G06F15/781

TRACKING STREAMING ENGINE VECTOR PREDICATES TO CONTROL PROCESSOR EXECUTION
20230084716 · 2023-03-16 ·

In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.

Method and Apparatus for Vector Based Matrix Multiplication
20220283810 · 2022-09-08 ·

A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and a n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.

Method and Apparatus for Dual Issue Multiply Instructions
20220229782 · 2022-07-21 ·

A method is provided that includes performing, by a processor in response to a dual issue multiply instruction, multiplication of operands of the dual issue multiply instruction using multiplication units comprised in a data path of the processor and configured to operate together to determine a product of the operands, and storing, by the processor, the product in a storage location indicated by the dual issue multiply instruction.

Method and Apparatus for Dual Multiplication Units in a Data Path
20220206802 · 2022-06-30 ·

A processor is provided that includes a first multiplication unit in a first data path of the processor, the first multiplication unit configured to perform single issue multiply instructions, and a second multiplication unit in the first data path, the second multiplication unit configured to perform single issue multiply instructions, wherein the first multiplication unit and the second multiplication unit are configured to execute respective single issue multiply instructions in parallel.

Neural network processing

A data processing system operable to process a neural network, and comprising a plurality of processors. The data processing system is operable to determine whether to perform neural network processing using a single processor or using plural processors. When it is determined that plural processors should be used, a distribution of the neural network processing among two or more of the processors is determined and the two or more processors are each assigned a portion of the neural network processing to perform. A neural network processing output is provided as a result of the processors performing their assigned portions of the neural network processing.

INTEGRATED SCALING AND STRETCHING PLATFORM FOR OPTIMIZING MONOLITHIC INTEGRATION AND/OR HETEROGENEOUS INTEGRATION IN A SINGLE SEMICONDUCTOR DIE

The present invention provides a single monolithic the comprising a first schematic circuit manufactured based on a first technology node. A die area of the single monolithic die is smaller than a die area of another monolithic die with a second schematic circuit made based on the first technology node, wherein the first schematic circuit is the same as the second schematic circuit, and the first schematic circuit is a SRAM circuit, a logic circuit, a combination of SRAM and logic circuit, or a major function block circuit.

METHOD AND APPARATUS FOR PERMUTING STREAMED DATA ELEMENTS

A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

METHOD AND APPARATUS TO SORT A VECTOR FOR A BITONIC SORTING ALGORITHM
20220156073 · 2022-05-19 ·

A method is provided that includes performing, by a processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values in a first portion of the lanes are sorted in a first order indicated by the vector sort instruction and the values in a second portion of the lanes are sorted in a second order indicated by the vector sort instruction; and storing the sorted vector in a storage location.

Method and apparatus for vector based matrix multiplication

A method is provided that includes performing, by a processor in response to a vector matrix multiply instruction, multiplying an m×n matrix (A matrix) and a n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.

Symbiotic Network On Layers

The technology relates to a system on chip (SoC). The SoC may include a plurality of network layers which may assist electrical communications either horizontally or vertically among components from different device layers. In one embodiment, a system on chip (SoC) includes a plurality of network layers, each network layer including one or more routers, and more than one device layers, each of the plurality of network layers respectively bonded to one of the device layers. In another embodiment, a method for forming a system on chip (SoC) includes forming a plurality of network layers in an interconnect, wherein each network layer is bonded to an active surface of a respective device layer in a plurality of device layer.