Patent classifications
G06F9/30149
Systems, methods, and apparatuses for matrix add, subtract, and multiply
Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication is detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
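As a rough software model of the element-by-element tile addition described in this abstract, the sketch below adds two equally shaped tiles position by position. The function name `tile_add` and the list-of-rows tile representation are assumptions made for illustration; the abstract describes decode and execution circuitry, not this code.

```python
def tile_add(src1, src2):
    """Element-by-element addition of two tiles given as lists of rows."""
    same_shape = len(src1) == len(src2) and all(
        len(row1) == len(row2) for row1, row2 in zip(src1, src2)
    )
    if not same_shape:
        raise ValueError("source tiles must have the same shape")
    # For each data element position of the first source, add the value at the
    # corresponding position of the second source and place the sum at the
    # same position of the destination tile.
    return [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(src1, src2)]


dst = tile_add([[1, 2], [3, 4]], [[10, 20], [30, 40]])
print(dst)  # [[11, 22], [33, 44]]
```

Subtraction and element-wise multiplication would follow the same pattern with a different per-element operator.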
METHOD AND SYSTEM FOR EXECUTING NEW INSTRUCTIONS
A method for executing new instructions is provided. The method is used in a processor and includes: receiving an instruction; when the received instruction is an unknown instruction, the processor executes the following steps through a conversion program: determining whether the received instruction is a new instruction; and converting the received instruction into at least one old instruction when the received instruction is a new instruction; and simulating the execution of the received instruction by executing the at least one old instruction.
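The flow in this abstract (receive an instruction, detect that it is unknown, check whether it is a new instruction, convert it to old instructions, then execute those) can be illustrated with a toy interpreter. Everything named here is hypothetical: the `ADD`/`MUL` native opcodes, the `MADD` new instruction, the `tmp` scratch register, and the conversion table are assumptions for the sketch, not details from the patent.

```python
def op_add(regs, dst, a, b):
    regs[dst] = regs[a] + regs[b]


def op_mul(regs, dst, a, b):
    regs[dst] = regs[a] * regs[b]


# "Old" instructions that this toy processor executes natively (assumed set).
NATIVE_OPS = {"ADD": op_add, "MUL": op_mul}


def expand_madd(dst, a, b, c):
    # Hypothetical new instruction: MADD dst, a, b, c  means  dst = a * b + c.
    return [("MUL", "tmp", a, b), ("ADD", dst, "tmp", c)]


# Conversion program: maps each known new instruction to old instructions.
CONVERSIONS = {"MADD": expand_madd}


def execute(regs, instr):
    opcode, *operands = instr
    if opcode in NATIVE_OPS:
        NATIVE_OPS[opcode](regs, *operands)
        return
    # Unknown instruction: determine whether it is a new instruction.
    expand = CONVERSIONS.get(opcode)
    if expand is None:
        raise ValueError(f"invalid instruction: {opcode}")
    # Simulate the new instruction by executing the converted old instructions.
    for old_instr in expand(*operands):
        execute(regs, old_instr)


regs = {"r0": 0, "r1": 3, "r2": 4, "r3": 5, "tmp": 0}
execute(regs, ("MADD", "r0", "r1", "r2", "r3"))
print(regs["r0"])  # 17
```

An opcode that is neither native nor in the conversion table is rejected, which mirrors the distinction the abstract draws between an unknown instruction and a new instruction.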
METHOD AND SYSTEM FOR CONVERTING INSTRUCTIONS
A method for converting instructions is provided. The method is used in a processor and includes: receiving an instruction, wherein the instruction is an unknown instruction; determining whether the received instruction is a new instruction; and converting the received instruction into at least one old instruction when the received instruction is a new instruction.
STREAM REFERENCE REGISTER WITH DOUBLE VECTOR AND DUAL SINGLE VECTOR OPERATING MODES
A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched, the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.
Chip and chip-based data processing method
Embodiments of the present specification provide chips and chip-based data processing methods. In an embodiment, a method comprises: obtaining data associated with one or more neural networks transmitted from a server; for each layer of a neural network of the one or more neural networks, configuring, based on the data, a plurality of operator units based on a type of computation each operator unit performs; and invoking the plurality of operator units to perform computations, based on neurons of a layer of the neural network immediately above, of the data for each neuron to produce a value of the neuron.
Encoding and decoding variable length instructions
Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are reordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
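The word-omission step in this abstract can be sketched as follows: instruction words that equal their predicted values are dropped, and the decoder restores them from the same predictions. The abstract does not say how the decoder learns which words were omitted, so the presence bitmask used here is an assumption, as are the function names. The variance-based bit reordering step is left out of this sketch.

```python
def encode(words, predicted):
    """Return (mask, kept); bit i of mask is set when word i is transmitted."""
    if len(words) != len(predicted):
        raise ValueError("words and predicted must have the same length")
    mask = 0
    kept = []
    for i, (word, guess) in enumerate(zip(words, predicted)):
        if word != guess:
            # Only words that differ from the prediction are kept.
            mask |= 1 << i
            kept.append(word)
    return mask, kept


def decode(mask, kept, predicted):
    """Rebuild the full word list from the kept words and the predictions."""
    remaining = iter(kept)
    return [
        next(remaining) if (mask >> i) & 1 else guess
        for i, guess in enumerate(predicted)
    ]


predicted = [0x00, 0x1F, 0x00, 0xA0]
instr = [0x00, 0x2B, 0x00, 0xA0]
mask, kept = encode(instr, predicted)
print(mask, kept)  # 2 [43]
print(decode(mask, kept, predicted) == instr)  # True
```

In this example only one of the four words differs from its prediction, so the encoded instruction carries a single word plus the mask, which is how different instructions end up with different encoded lengths.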
Apparatus and methods for vector operations
Aspects for vector operations in a neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
Vector friendly instruction format and execution thereof
- Robert C. Valentine
- Jesus Corbal San Adrian
- Roger Espasa Sans
- Robert D. Cavin
- Bret L. Toll
- Santiago Galan Duran
- Jeffrey G. Wiedemeier
- Sridhar Samudrala
- Milind Baburao Girkar
- Edward Thomas Grochowski
- Jonathan Cannon Hall
- Dennis R. Bradford
- Elmoustapha Ould-Ahmed-Vall
- James C Abel
- Mark Charney
- Seth Abraham
- Suleyman Sair
- Andrew Thomas Forsyth
- Lisa Wu
- Charles Yount
A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
Computing machine using a matrix space and matrix pointer registers for matrix and array processing
This disclosure relates to methods and mechanisms for matrix computing which include machine embodiments with one or more matrix storage spaces for holding matrices and arrays for computing, where a matrix or an array is accessible by its columns, by its rows, or both, individually, or concurrently. A set of methods and mechanisms to build a large capacity instruction set with multi-length instructions to load, store, and compute with these matrices and arrays are also disclosed. Methods and access control mechanisms with keys to secure, share, lock and unlock regions in the storage space for matrices and arrays under the control of an operating system or a virtual machine hypervisor by permitted threads and processes are also disclosed. Methods and mechanisms to handle long immediate operands for use by shorter instructions using a payload instruction are also disclosed. The structure of the instructions with key instruction fields and a method for determining instruction length are also disclosed.