Patent classifications
G06F9/30149
PRE-STAGED INSTRUCTION REGISTERS FOR VARIABLE LENGTH INSTRUCTION SET MACHINE
Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
Variable-length instruction buffer management
A vector processor is disclosed including a variety of variable-length instructions. Computer-implemented methods are disclosed for efficiently carrying out a variety of operations in a time-conscious, memory-efficient, and power-efficient manner. Methods for more efficiently managing a buffer by controlling the threshold based on the length of delay line instructions are disclosed. Methods for disposing multi-type and multi-size operations in hardware are disclosed. Methods for condensing look-up tables are disclosed. Methods for in-line alteration of variables are disclosed.
METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.
Instruction packing scheme for VLIW CPU architecture
A processor is provided and includes a core that is configured to perform a decode operation on a multi-instruction packet comprising multiple instructions. The decode operation includes receiving the multi-instruction packet that includes first and second instructions. The first instruction includes a primary portion at a fixed first location and a secondary portion. The second instruction includes a primary portion at a fixed second location between the primary portion of the first instruction and the secondary portion of the first instruction. An operational code portion of the primary portion of each of the first and second instructions is accessed and decoded. An instruction packet including the primary and secondary portions of the first instruction is created, and a second instruction packet including the primary portion of the second instruction is created. The first and second instructions packets are dispatched to respective first and second functional units.
ACCESSING PHYSICAL MEMORY FROM A CPU OR PROCESSING ELEMENT IN A HIGH PERFOMANCE MANNER
A method and apparatus is described herein for accessing a physical memory location referenced by a physical address with a processor. The processor fetches/receives instructions with references to virtual memory addresses and/or references to physical addresses. Translation logic translates the virtual memory addresses to physical addresses and provides the physical addresses to a common interface. Physical addressing logic decodes references to physical addresses and provides the physical addresses to a common interface based on a memory type stored by the physical addressing logic.
STREAM REFERENCE REGISTER WITH DOUBLE VECTOR AND DUAL SINGLE VECTOR OPERATING MODES
A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.
DATA PROCESSING APPARATUS HAVING STREAMING ENGINE WITH READ AND READ/ADVANCE OPERAND CODING
A streaming engine employed in a digital signal processor specified a fixed data stream. Once started the data stream is read only and cannot be written. Once fetched the data stream is stored in a first-in-first-out buffer for presentation to functional units in the fixed order. Data use by the functional unit is controlled using the input operand fields of the corresponding instruction. A read only operand coding supplies the data an input of the functional unit. A read/advance operand coding supplies the data and also advances the stream to the next sequential data elements. The read only operand coding permits reuse of data without requiring a register of the register file for temporary storage.
SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
APPARATUS AND METHOD FOR VECTOR PACKED SIGNED/UNSIGNED SHIFT, ROUND, AND SATURATE
Apparatus and method for signed and unsigned shift, round and saturate using different data element values. For example, one embodiment of an apparatus comprises a decoder to decode an instruction having fields for a first packed data source operand to provide a first source data element and a second source data element, a second packed data source operand or immediate to provide a first shift value and a second shift value corresponding to the first source data element and second source data element, respectively, and a packed data destination operand to indicate a first result value and a second result value corresponding to the first source data element and second source data element, and execution circuitry to execute the decoded instruction to: shift the first source data element by an amount based on the first shift value to generate a first shifted data element; shift the second source data element by an amount based on the second shift value to generate a second shifted data element; update a saturation indicator responsive to detecting a saturation condition resulting from the shift of the first and/or second source data elements; round and/or saturate the first and second shifted data elements in accordance with a specified rounding mode and the saturation indicator, respectively, to generate the first and second result data elements; and store the first result value and the second result value in a first data element location and a second data element location in a destination register.
Systems, methods, and apparatuses for tile store
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.