Patent classifications
G06F9/30196
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling
- Eriko Nurvitadhi
- Balaji Vembu
- Nicolas C. Galoppo Von Borries
- Rajkishore Barik
- Tsung-Han Lin
- Kamal Sinha
- Nadathur Rajagopalan Satish
- Jeremy Bottleson
- Farshad Akhbari
- Altug Koker
- Narayan Srinivasa
- Dukhwan Kim
- Sara S. Baghsorkhi
- Justin E. Gottschlich
- Feng Chen
- Elmoustapha Ould-Ahmed-Vall
- Kevin Nealis
- Xiaoming Chen
- Anbang Yao
One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
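As an illustrative sketch only (none of the opcode values, names, or the choice of fused multiply-add come from the patent), the mechanism above can be modeled in Python: a decode step turns a single instruction word into one decoded operation that triggers a compound machine-learning compute operation.

```python
# Hypothetical model: one decoded instruction causes a compound (fused)
# machine-learning compute operation, here a matrix multiply-accumulate.

def decode(instruction):
    """Decode a single instruction word into an operation name and operands."""
    table = {0x10: "fused_multiply_add"}  # assumed opcode mapping
    return table[instruction["opcode"]], instruction["operands"]

def execute(op, operands):
    """Execute the decoded instruction as one complex compute operation."""
    if op == "fused_multiply_add":
        a, b, c = operands
        # The compound operation: a @ b + c from a single decoded instruction.
        return [[sum(x * y for x, y in zip(row, col)) + c_el
                 for col, c_el in zip(zip(*b), c_row)]
                for row, c_row in zip(a, c)]
    raise ValueError(op)

inst = {"opcode": 0x10,
        "operands": ([[1, 2], [3, 4]],      # source matrix A
                     [[1, 0], [0, 1]],      # source matrix B (identity)
                     [[10, 10], [10, 10]])} # accumulator C
op, args = decode(inst)
result = execute(op, args)
print(result)  # [[11, 12], [13, 14]]
```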
GEOMETRY MODEL FOR POINT CLOUD CODING
A method, computer program, and computer system for point cloud coding are provided. Data corresponding to a point cloud is received, and one or more geometric features are detected from among the data corresponding to the point cloud. A representation is determined for one or more of the detected geometric features, and the received data is encoded or decoded based on the determined representations, whereby the point cloud is reconstructed based on the decoded data.
Systems, methods, and apparatuses for matrix add, subtract, and multiply
Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication is detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
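A minimal software model of the element-by-element tile addition described by this instruction (a sketch of the semantics, not the hardware itself):

```python
# Model of the decoded tile-add instruction's semantics: for each data
# element position of the first source tile, add the value at the
# corresponding position of the second source tile and store the result
# into the corresponding position of the destination tile.

def tile_add(src1, src2):
    rows, cols = len(src1), len(src1[0])
    dst = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dst[i][j] = src1[i][j] + src2[i][j]
    return dst

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
print(tile_add(a, b))  # [[11, 22], [33, 44]]
```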
APPARATUSES, METHODS, AND SYSTEMS FOR 8-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS
Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate a plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
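The convert-multiply-accumulate dataflow above can be sketched in Python. The abstract does not name an 8-bit format, so the e4m3 decoder below (bias 7, NaN encodings ignored) is an assumption, as is packing each quadruple as a tuple of raw bytes:

```python
def fp8_e4m3_to_float(b):
    """Decode an assumed e4m3 8-bit float: 1 sign bit, 4 exponent bits
    (bias 7), 3 mantissa bits. NaN encodings are ignored for brevity."""
    s, e, m = (b >> 7) & 1, (b >> 3) & 0xF, b & 0x7
    val = (m / 8) * 2.0 ** -6 if e == 0 else (1 + m / 8) * 2.0 ** (e - 7)
    return -val if s else val

def qdot_accumulate(dst, src1, src2):
    """For each element (a quadruple of fp8 values) of src1 and the
    corresponding element of src2: convert to single precision, multiply
    pairwise, and accumulate all four products into dst."""
    for i, (row1, row2) in enumerate(zip(src1, src2)):
        for j, (quad1, quad2) in enumerate(zip(row1, row2)):
            dst[i][j] += sum(fp8_e4m3_to_float(a) * fp8_e4m3_to_float(b)
                             for a, b in zip(quad1, quad2))
    return dst

# One 1x1 tile; each element packs four fp8 bytes.
src1 = [[(0x38, 0x40, 0x3C, 0x38)]]  # [1.0, 2.0, 1.5, 1.0]
src2 = [[(0x38, 0x38, 0x40, 0x40)]]  # [1.0, 1.0, 2.0, 2.0]
dst = [[0.0]]
qdot_accumulate(dst, src1, src2)
print(dst)  # [[8.0]] : 1*1 + 2*1 + 1.5*2 + 1*2
```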
Matrix processing instruction with optional up/down sampling of matrix
A processor system comprises a shared memory and a processing element. The processing element includes a matrix processor unit and is in communication with the shared memory. The processing element is configured to receive a processor instruction specifying a data matrix and a matrix manipulation operation. A manipulation matrix based on the processor instruction is identified. The data matrix and the manipulation matrix are loaded into the matrix processor unit and a matrix operation is performed to determine a result matrix. The result matrix is outputted to a destination location.
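One way the identified manipulation matrix can realize the optional up/down sampling is as a plain matrix multiply. The sketch below is a guess at one such manipulation (a column-replication matrix giving 2x nearest-neighbour upsampling along one axis); the patent does not specify this construction:

```python
def matmul(a, b):
    """Plain matrix multiply, standing in for the matrix processor unit."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def upsample_cols_matrix(n, factor=2):
    """Build a manipulation matrix that, when right-multiplied, repeats
    each column of the data matrix `factor` times -- an illustrative
    nearest-neighbour upsampling along one axis."""
    m = [[0] * (n * factor) for _ in range(n)]
    for i in range(n):
        for k in range(factor):
            m[i][i * factor + k] = 1
    return m

data = [[1, 2], [3, 4]]
manip = upsample_cols_matrix(2)   # identified from the processor instruction
result = matmul(data, manip)      # matrix operation in the processor unit
print(result)  # [[1, 1, 2, 2], [3, 3, 4, 4]]
```

Downsampling would use the transposed construction (scaled to average repeated columns); the same instruction shape covers both directions.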
PROGRAMMABLE COARSE GRAINED AND SPARSE MATRIX COMPUTE HARDWARE WITH ADVANCED SCHEDULING
- Eriko Nurvitadhi
- Balaji Vembu
- Nicolas C. Galoppo Von Borries
- Rajkishore Barik
- Tsung-Han Lin
- Kamal Sinha
- Nadathur Rajagopalan Satish
- Jeremy Bottleson
- Farshad Akhbari
- Altug Koker
- Narayan Srinivasa
- Dukhwan Kim
- Sara S. Baghsorkhi
- Justin E. Gottschlich
- Feng Chen
- Elmoustapha Ould-Ahmed-Vall
- Kevin Nealis
- Xiaoming Chen
- Anbang Yao
One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.
Circuit for Fast Interrupt Handling
A circuit for fast interrupt handling is disclosed. An apparatus includes a processor circuit having an execution pipeline and a table configured to store a plurality of pointers that correspond to interrupt routines stored in a memory circuit. The apparatus further includes an interrupt redirect circuit configured to receive a plurality of interrupt requests. The interrupt redirect circuit may select a first interrupt request among a plurality of interrupt requests of a first type. The interrupt redirect circuit retrieves a pointer from the table using information associated with the request. Using the pointer, the execution pipeline retrieves a first program instruction from the memory circuit to execute a particular interrupt routine.
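A toy software model of this redirect mechanism: a pointer table indexed by interrupt number, and a dispatch step that selects one pending request of a given type and jumps through its pointer. The lowest-number selection policy and all names are assumptions for illustration:

```python
# Model of the interrupt redirect circuit: a table of "pointers" (here,
# handler callables) and a dispatch step that selects one pending request
# of a given type, looks up its pointer, and runs the routine.

class InterruptRedirect:
    def __init__(self):
        self.table = {}    # interrupt number -> handler "pointer"
        self.pending = []  # list of (irq_number, irq_type)

    def register(self, irq, handler):
        self.table[irq] = handler

    def request(self, irq, irq_type):
        self.pending.append((irq, irq_type))

    def dispatch(self, irq_type):
        """Select one pending request of the given type and run its routine."""
        candidates = [irq for irq, t in self.pending if t == irq_type]
        if not candidates:
            return None
        irq = min(candidates)            # assumed selection policy
        self.pending.remove((irq, irq_type))
        return self.table[irq]()         # "execution pipeline" runs it

redirect = InterruptRedirect()
redirect.register(3, lambda: "timer handled")
redirect.register(7, lambda: "uart handled")
redirect.request(7, "fast")
redirect.request(3, "fast")
print(redirect.dispatch("fast"))  # timer handled (irq 3 selected)
```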
Apparatuses, methods, and systems for hardware-assisted lockstep of processor cores
Systems, methods, and apparatuses relating to circuitry to implement lockstep of processor cores are described. In one embodiment, a hardware processor comprises a first processor core comprising a first control flow signature register and a first execution circuit, a second processor core comprising a second control flow signature register and a second execution circuit, and at least one signature circuit to perform a first state history compression operation on a first instruction that executes on the first execution circuit of the first processor core to produce a first result, store the first result in the first control flow signature register, perform a second state history compression operation on a second instruction that executes on the second execution circuit of the second processor core to produce a second result, and store the second result in the second control flow signature register.
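The state history compression idea can be illustrated with a toy model: each core folds the instructions it executes into a compact signature register, and a checker compares the two registers. The rotate-and-XOR fold below is an illustrative stand-in, not the patent's compression function:

```python
# Toy lockstep model: each core compresses its executed-instruction
# history into a control flow signature register; equal signatures mean
# the cores executed the same stream.

def compress(signature, instruction_word, width=32):
    """Fold one executed instruction into the running signature
    (rotate-left by 1, then XOR -- an assumed compression function)."""
    mask = (1 << width) - 1
    rotated = ((signature << 1) | (signature >> (width - 1))) & mask
    return rotated ^ instruction_word

def run_core(instruction_stream):
    sig = 0
    for word in instruction_stream:
        sig = compress(sig, word)
    return sig

stream = [0xDEADBEEF, 0x12345678, 0x0BADF00D]
sig_a = run_core(stream)  # first core's signature register
sig_b = run_core(stream)  # second core's signature register
print(sig_a == sig_b)     # True: cores are in lockstep

sig_c = run_core([0xDEADBEEF, 0x12345678, 0x0BADF00E])  # diverged core
print(sig_a == sig_c)     # False: divergence is detected
```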
Memory apparatus and data processing system including the same
A memory apparatus may include at least one memory, and a memory controller configured to receive an address signal and a command through shared pins, and to store data provided from an external source within the memory controller when a write command is input without the address signal.
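A toy model of the behaviour described above: a write command arriving without an address signal leaves the data held inside the controller rather than written to the memory array. The buffer-then-commit flow and all names are illustrative assumptions:

```python
# Model: command and address share pins; a WRITE with no address keeps
# the data inside the controller instead of the backing memory array.

class MemoryController:
    def __init__(self, size=16):
        self.memory = [0] * size  # backing memory array
        self.buffer = None        # data held inside the controller

    def command(self, cmd, data=None, address=None):
        if cmd == "WRITE" and address is None:
            self.buffer = data           # keep data in the controller
        elif cmd == "WRITE":
            self.memory[address] = data  # normal addressed write
        elif cmd == "COMMIT" and self.buffer is not None:
            self.memory[address] = self.buffer
            self.buffer = None

ctrl = MemoryController()
ctrl.command("WRITE", data=0xAB)    # no address signal: stays in controller
print(ctrl.memory[2], ctrl.buffer)  # 0 171
ctrl.command("COMMIT", address=2)
print(ctrl.memory[2])               # 171
```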