Patent classifications
G06F9/30149
Tracking streaming engine vector predicates to control processor execution
In a method of operating a computer system, an instruction loop is executed by a processor in which each iteration of the instruction loop accesses a current data vector and an associated current vector predicate. The instruction loop is repeated when the current vector predicate indicates the current data vector contains at least one valid data element and the instruction loop is exited when the current vector predicate indicates the current data vector contains no valid data elements.
Apparatus and methods for vector operations
Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine a combiner configured to combine the one or more addition results into an output vector.
Method and apparatus for permuting streamed data elements
A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
Method and apparatus for implied bit handling in floating point multiplication
A method is provided that includes performing, by a processor in response to a floating point multiply instruction, multiplication of floating point numbers, wherein determination of values of implied bits of leading bit encoded mantissas of the floating point numbers is performed in parallel with multiplication of the encoded mantissas, and storing, by the processor, a result of the floating point multiply instruction in a storage location indicated by the floating point multiply instruction.
Enhanced Low Cost Microcontroller
An 8-bit microprocessor has a program memory having a 16-bit instruction word size and a data memory having an 8-bit data size. An instruction word has a payload size for an address of up to 12 bits. The microprocessor furthermore has a central processing unit coupled with the program memory and the data memory, a bank select register configured to select one of up to 64 memory banks, and an indirect addressing register operable to address up to 16 KB of data memory. The CPU is configured to execute a first move instruction having two instruction words and being configured to only access the lower 4 KB of the data memory and a second move instruction having three instruction words and configured to access the entire data memory.
SYSTEMS AND METHODS FOR PERFORMING INSTRUCTIONS TO TRANSFORM MATRICES INTO ROW-INTERLEAVED FORMAT
Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
COMPUTATION PROCESSING APPARATUS AND METHOD OF PROCESSING COMPUTATION
A computation processing apparatus includes: a memory; and a processor coupled to the memory and configured to: decode instructions; execute the instructions which is decoded and operate as a plurality of sub-computation processing apparatuses in accordance with a bit width of data to be computed; and observe an operation state of the computation processing apparatus, wherein, when observing that a subset of the plurality of sub-computation processing apparatuses does not execute an instruction or instructions, the processor parallelizes the instructions and outputs the parallelized instructions.
Single instruction multiple data add processors, methods, systems, and instructions
New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed.
Processor operable to ensure code integrity
A processor can be used to ensure that program code can only be used for a designed purpose and not exploited by malware. Embodiments of an illustrative processor can comprise logic operable to execute a program instruction and to distinguish whether the program instruction is a legitimate branch instruction or a non-legitimate branch instruction.
COMPUTING MACHINE ARCHITECTURE FOR MATRIX AND ARRAY PROCESSING
This invention discloses a novel paradigm, method and apparatus for Matrix Computing which include a novel machine architecture with an embedded storage space for holding matrices and arrays for computing which can be accessed by its columns or by its rows or both concurrently. A large capacity multi length instruction set with instructions and methods to load, store and compute with these matrices and arrays are also disclosed; a method and apparatus to secure, share, lock and unlock this embedded space for matrices under the control of an Operating System or a Virtual Machine Monitor by a plurality of threads and processes are also disclosed. A novel method and apparatus to handle immediate operands used by Immediate Instructions are also disclosed. The structure of the instructions with some key fields and a method for determining instruction length easily are also disclosed.