G06F9/3887

Neural network accelerator with parameters resident on chip

One embodiment of an accelerator includes a computing unit; a first memory bank for storing input activations and a second memory bank for storing parameters used in performing computations, the second memory bank configured to store a sufficient amount of the neural network parameters on the computing unit to allow for latency below a specified level with throughput above a specified level. The computing unit includes at least one cell comprising at least one multiply accumulate (“MAC”) operator that receives parameters from the second memory bank and performs computations. The computing unit further includes a first traversal unit that provides a control signal to the first memory bank to cause an input activation to be provided to a data bus accessible by the MAC operator. The computing unit performs computations associated with at least one element of a data array, the one or more computations performed by the MAC operator.

PROCESSOR WITH TABLE LOOKUP UNIT

A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.

PROCESSOR HAVING MULTIPLE CORES, SHARED CORE EXTENSION LOGIC, AND SHARED CORE EXTENSION UTILIZATION INSTRUCTIONS
20230052630 · 2023-02-16 ·

An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.

Processor device for executing SIMD instructions
11500632 · 2022-11-15 · ·

In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.

Privacy-enhanced decision tree-based inference on homomorphically-encrypted data

A technique for computationally-efficient privacy-preserving homomorphic inferencing against a decision tree. Inferencing is carried out by a server against encrypted data points provided by a client. Fully homomorphic computation is enabled with respect to the decision tree by intelligently configuring the tree and the real number-valued features that are applied to the tree. To that end, and to the extent the decision tree is unbalanced, the server first balances the tree. A cryptographic packing scheme is then applied to the balanced decision tree and, in particular, to one or more entries in at least one of: an encrypted feature set, and a threshold data set, that are to be used during the decision tree evaluation process. Upon receipt of an encrypted data point, homomorphic inferencing on the configured decision tree is performed using a highly-accurate approximation comparator, which implements a “soft” membership recursive computation on real numbers, all in an oblivious manner.

Prefetch strategy control for parallel execution of threads based on one or more characteristics of a stream of program instructions indicative that a data access instruction within a program is scheduled to be executed a plurality of times

A single instruction multiple thread (SIMT) processor includes execution circuitry, prefetch circuitry and prefetch strategy selection circuitry. The prefetch strategy selection circuitry serves to detect one or more characteristics of a stream of program instructions that are being executed to identify whether or not a given data access instruction within a program will be executed a plurality of times. The prefetch strategy to use is selected from a plurality of selectable prefetch strategies in dependence upon the detection of such detected characteristics.

CONVERGENCE AMONG CONCURRENTLY EXECUTING THREADS

Convergence of threads executing common code sections is facilitated using instructions inserted at strategic locations in computer code sections. The inserted instructions enable the threads in a warp or other group to cooperate with a thread scheduler to promote thread convergence.

Contextual configuration adjuster for graphics

An embodiment of a graphics apparatus may include a context engine to determine contextual information, a recommendation engine communicatively coupled to the context engine to determine a recommendation based on the contextual information, and a configuration engine communicatively coupled to the recommendation engine to adjust a configuration of a graphics operation based on the recommendation. Other embodiments are disclosed and claimed.

Message based general register file assembly

In an example, an apparatus comprises a plurality of execution units, and logic, at least partially including hardware logic, to assemble a general register file (GRF) message and hold the GRF message in storage in a data port until all data for the GRF message is received. Other embodiments are also disclosed and claimed.

TECHNOLOGIES FOR SWITCHING NETWORK TRAFFIC IN A DATA CENTER

Technologies for switching network traffic include a network switch. The network switch includes one or more processors and communication circuitry coupled to the one or more processors. The communication circuity is capable of switching network traffic of multiple link layer protocols. Additionally, the network switch includes one or more memory devices storing instructions that, when executed, cause the network switch to receive, with the communication circuitry through an optical connection, network traffic to be forwarded, and determine a link layer protocol of the received network traffic. The instructions additionally cause the network switch to forward the network traffic as a function of the determined link layer protocol. Other embodiments are also described and claimed.