Patent classifications
G06F9/3013
PROCESSING MIXED-SCALAR-VECTOR INSTRUCTIONS
Processing circuitry supports overlapped execution of vector instructions, in which at least one beat of a first vector instruction is performed in parallel with at least one beat of a second vector instruction. The processing circuitry also supports mixed-scalar-vector instructions, in which one of the destination register and the one or more source registers is a vector register and another is a scalar register. In a sequence that includes a first and a subsequent mixed-scalar-vector instruction, the processing circuitry permits instances of relaxed execution, which can lead to uncertain or incorrect results, when the two instructions are separated by fewer than a predetermined number of intervening instructions. In practice, the situations that lead to uncertain results are very rare, so providing relatively expensive dependency-checking circuitry to eliminate them is not justified.
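To show why overlapping can make a result uncertain, here is a toy beat-level model. It is a sketch under stated assumptions, not the patented circuitry. Each hypothetical 4-lane instruction is split into 4 beats, one lane per beat. Instruction A accumulates a vector into scalar r0, and instruction B scales a vector by r0. When B starts before A has finished, B's early beats read a partially accumulated r0. The function name `run` and the read-before-write ordering within a cycle are illustrative assumptions.

```python
BEATS = 4  # assumed: each 4-lane vector instruction is split into 4 beats, one lane per beat


def run(q1, q3, r0, overlap):
    """Toy model of two back-to-back mixed-scalar-vector instructions.

    A: r0 += sum(q1)        (vector source, scalar destination), one lane per beat
    B: q2[i] = q3[i] * r0   (scalar source, vector destination), one lane per beat
    `overlap` is the number of B's beats that run in the same cycles as A's last beats.
    """
    q2 = [0] * BEATS
    # Schedule entries are (cycle, instruction, beat). A starts at cycle 0.
    schedule = [(t, "A", t) for t in range(BEATS)]
    start_b = BEATS - overlap
    schedule += [(start_b + b, "B", b) for b in range(BEATS)]

    for t in range(max(s[0] for s in schedule) + 1):
        beats_now = [s for s in schedule if s[0] == t]
        # Assumption: reads in a cycle see r0 as it was before that cycle's writes.
        r0_seen = r0
        for _, instr, b in beats_now:
            if instr == "B":
                q2[b] = q3[b] * r0_seen
        for _, instr, b in beats_now:
            if instr == "A":
                r0 += q1[b]
    return r0, q2
```

With `q1 = [1, 2, 3, 4]`, `q3 = [1, 1, 1, 1]` and `r0 = 0`, sequential execution (`overlap=0`) gives `(10, [10, 10, 10, 10])`. A two-beat overlap gives `(10, [3, 6, 10, 10])`, because B's first two beats observe intermediate values of r0. Separating the instructions by enough intervening instructions removes the overlap.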
Enhanced Macroscalar predicate operations
Systems, apparatuses and methods are described for using enhanced Macroscalar predicate operations. These operations take enhanced predicate operands that designate both the element width and which elements are to be processed. The element width and the number of elements per vector are determined at run time rather than being fixed by the architectural definition of the instruction, which enables additional parallelism when processing smaller data elements. The instruction performs the requested operation on the elements specified by the enhanced control predicate, using the element width that the same predicate specifies, and returns the result as an enhanced predicate of the same element width.
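The idea of a predicate that carries its own element width can be sketched in Python. The following assumptions are not from the abstract: a fixed 128-bit vector, an AND as the example operation, and zeroing of inactive elements. The names `EnhancedPredicate`, `make_pred` and `pred_and` are hypothetical. The element count is derived from the width at run time, so the same operation processes 16 elements at 8-bit width but only 4 at 32-bit width.

```python
from dataclasses import dataclass

VECTOR_BITS = 128  # assumed vector length; the abstract does not fix one


@dataclass(frozen=True)
class EnhancedPredicate:
    width: int    # element width in bits, carried by the predicate at run time
    bits: tuple   # one boolean per element

    @property
    def num_elements(self) -> int:
        return VECTOR_BITS // self.width


def make_pred(width, active):
    """Build a predicate whose element count is derived from its run-time width."""
    n = VECTOR_BITS // width
    return EnhancedPredicate(width, tuple(i in active for i in range(n)))


def pred_and(control, a, b):
    """AND of two predicates over the elements enabled by `control`.

    Inactive elements are cleared (a zeroing convention assumed for this sketch),
    and the result keeps the control predicate's element width.
    """
    if not (control.width == a.width == b.width):
        raise ValueError("element widths must match")
    out = tuple(c and x and y for c, x, y in zip(control.bits, a.bits, b.bits))
    return EnhancedPredicate(control.width, out)
```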
Apparatus and method for storing bounded pointers
An apparatus and method are provided for storing bounded pointers. One example apparatus comprises storage having storage elements to store bounded pointers, each bounded pointer comprising a pointer value and associated attributes that include at least range information, and processing circuitry to store a bounded pointer in a chosen storage element. The storing process comprises storing the pointer value of the bounded pointer in the chosen storage element, and storing the range information of the bounded pointer in the same storage element, such that the range information indicates both a read range of the bounded pointer and a write range that differs from the read range. The read range comprises at least one memory address from which reading is allowed when using the bounded pointer, and the write range comprises at least one memory address to which writing is allowed when using the bounded pointer.
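A bounded pointer with distinct read and write ranges can be modelled as follows. This minimal sketch stores the bounds uncompressed as half-open `[base, limit)` intervals for clarity, whereas a hardware encoding would probably pack them more compactly. It checks every access against the relevant range. `BoundedPointer`, `load` and `store` are hypothetical names, and a dict stands in for memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedPointer:
    """Pointer value plus separate read and write ranges, each half-open [base, limit)."""
    value: int
    read_base: int
    read_limit: int
    write_base: int
    write_limit: int

    def can_read(self, addr: int) -> bool:
        return self.read_base <= addr < self.read_limit

    def can_write(self, addr: int) -> bool:
        return self.write_base <= addr < self.write_limit


def load(mem: dict, ptr: BoundedPointer, offset: int = 0) -> int:
    """Read through the pointer, faulting if the address is outside the read range."""
    addr = ptr.value + offset
    if not ptr.can_read(addr):
        raise PermissionError(f"read of {addr:#x} is outside the read range")
    return mem.get(addr, 0)


def store(mem: dict, ptr: BoundedPointer, data: int, offset: int = 0) -> None:
    """Write through the pointer, faulting if the address is outside the write range."""
    addr = ptr.value + offset
    if not ptr.can_write(addr):
        raise PermissionError(f"write to {addr:#x} is outside the write range")
    mem[addr] = data
```

For example, a pointer with read range `[0x1000, 0x1100)` and write range `[0x1000, 0x1080)` can read its whole buffer but can write only the first half, so the upper half is effectively read-only through that pointer.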
Systems and methods for performing matrix compress and decompress instructions
Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction whose format has fields specifying an opcode and the locations of a decompressed source matrix and a compressed destination matrix; decode circuitry to decode the fetched compress instruction; and execution circuitry that, responsive to the decoded compress instruction, generates a compressed result according to a compress algorithm and stores the compressed result to the specified compressed destination matrix. The source matrix is compressed either by packing its non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or by using fewer bits to represent one or more elements and using the header to identify the matrix elements represented by fewer bits.
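The first compression scheme packs the non-zero elements and records their positions in a header. It can be sketched as below. The header here is simply a list of `(row, col)` coordinates, which is an assumption made for readability, since a real instruction would use a fixed binary layout. `compress` and `decompress` are hypothetical names.

```python
def compress(matrix):
    """Pack non-zero elements together and record each one's (row, col) in a header."""
    shape = (len(matrix), len(matrix[0]))
    header, packed = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))
                packed.append(v)
    return shape, header, packed


def decompress(shape, header, packed):
    """Rebuild the dense matrix: zeros everywhere except the positions in the header."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, packed):
        out[r][c] = v
    return out
```

For a sparse input such as `[[0, 5, 0], [0, 0, 0], [7, 0, 1]]`, only three values and three positions are kept, and `decompress` restores the original matrix exactly.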
Method and Computing System for Handling Instruction Execution Using Affine Register File on Graphic Processing Unit
The present invention provides an affine engine design for the microarchitecture of the graphics processing unit (GPU). Operand-type detection is performed first, and physical scalar, affine, or vector registers and the corresponding ALUs are then allocated to execute each instruction so as to maximize performance and save energy. At runtime, affine and uniform instructions are executed by the affine engine, while general vector instructions are executed by a vector engine. Because affine and uniform instruction execution can be dispatched to the affine engine, the vector engine can enter a power-saving state, reducing the energy consumption of the GPU.
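Operand-type detection and dispatch can be illustrated with a small Python sketch. These choices are assumptions for illustration: an operand whose per-thread values are all equal is "uniform", one with a constant stride is "affine" and is represented as `(base, stride)`, and only addition is routed to the affine engine, because the sum of two affine operands is still affine. `classify` and `dispatch_add` are hypothetical names.

```python
def classify(lanes):
    """Classify a per-thread operand as uniform, affine (base + i*stride) or general vector."""
    if not lanes:
        raise ValueError("operand must have at least one lane")
    if all(x == lanes[0] for x in lanes):
        return "uniform", (lanes[0], 0)
    stride = lanes[1] - lanes[0]
    if all(lanes[i] - lanes[i - 1] == stride for i in range(1, len(lanes))):
        return "affine", (lanes[0], stride)
    return "vector", None


def dispatch_add(a_lanes, b_lanes):
    """Route an add to the affine engine when both operands are uniform or affine."""
    ka, pa = classify(a_lanes)
    kb, pb = classify(b_lanes)
    if ka != "vector" and kb != "vector":
        # Affine add works on two scalars (base and stride) instead of every lane.
        return "affine_engine", (pa[0] + pb[0], pa[1] + pb[1])
    return "vector_engine", [x + y for x, y in zip(a_lanes, b_lanes)]
```

For example, adding thread indices `[0, 1, 2, 3]` to a uniform `[10, 10, 10, 10]` gives `("affine_engine", (10, 1))`, so no per-lane work is needed and the vector engine could stay idle.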
Non-volatile memory device using efficient page collection mapping in association with cache and method of operating the same
Disclosed are a non-volatile memory device and a method of operating it. A non-volatile memory device in which m logical pages are stored in a single physical page includes: a plurality of registers, included in a flash translation layer (FTL), that store at least part of the data of a write command received from a file system; and a controller that controls operations of the registers based on the write command. Each register has a storage space associated with the size of the m logical pages, and the controller programs the data of the write command into the non-volatile memory device and also stores that data in the registers.
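The page-collection idea can be sketched as a toy FTL write path. Logical pages accumulate in a register buffer sized for m logical pages, and once m are collected they are programmed together as one physical page. A mapping table records where each logical page landed. This simplifies the abstract's arrangement by serving reads of not-yet-programmed pages from the buffer. `PageCollector` and its fields are hypothetical.

```python
class PageCollector:
    """Toy FTL write path: m logical pages are collected, then programmed as one physical page."""

    def __init__(self, m):
        self.m = m
        self.buffer = []   # registers: pending (lpn, data) pairs, capacity m logical pages
        self.flash = []    # programmed physical pages, each a list of m (lpn, data) pairs
        self.mapping = {}  # logical page number -> (physical page number, slot)

    def write(self, lpn, data):
        self.buffer.append((lpn, data))
        if len(self.buffer) == self.m:
            self._program()

    def _program(self):
        # Program the collected logical pages into the next free physical page.
        ppn = len(self.flash)
        self.flash.append(list(self.buffer))
        for slot, (lpn, _) in enumerate(self.buffer):
            self.mapping[lpn] = (ppn, slot)
        self.buffer.clear()

    def read(self, lpn):
        # Pending data in the registers is newer than anything already programmed.
        for pending_lpn, data in reversed(self.buffer):
            if pending_lpn == lpn:
                return data
        ppn, slot = self.mapping[lpn]
        return self.flash[ppn][slot][1]
```

With m = 4 and six writes, the first four logical pages fill physical page 0 and the last two remain in the registers until two more writes arrive.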
Methods and systems for hardware-based memory resource allocation
Methods and systems for memory resource allocation are disclosed. In an embodiment, a method for memory resource allocation involves reading a pool-specific configuration record from an array of memory mapped pool-specific configuration records according to a memory resource allocation request that is held in an address register of a memory mapped register interface, performing a memory resource allocation operation to service the memory resource allocation request, wherein performing the memory resource allocation operation involves interacting with a resource list according to a pointer in the pool-specific configuration record, advancing the pointer after the interaction, and updating the pointer in the pool-specific configuration record with the advanced pointer.
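The read, interact, advance and write-back sequence can be sketched with each pool's resource list treated as a ring. The pool index passed to `allocate` stands in for the address held in the memory-mapped address register. The ring structure, the separate `tail` pointer for freed entries and the names `PoolConfig` and `Allocator` are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PoolConfig:
    """Pool-specific configuration record (fields assumed for this sketch)."""
    resource_list: list   # e.g. free buffer addresses owned by this pool
    head: int = 0         # pointer to the next entry to hand out
    tail: int = 0         # pointer to the slot where the next freed entry is written
    count: int = 0        # number of entries currently available


class Allocator:
    def __init__(self, pools):
        # The list index plays the role of the record's memory-mapped address.
        self.records = [PoolConfig(list(r), 0, 0, len(r)) for r in pools]

    def allocate(self, pool_id):
        rec = self.records[pool_id]              # read the pool-specific record
        if rec.count == 0:
            return None                          # pool exhausted
        item = rec.resource_list[rec.head]       # interact with the list at the pointer
        rec.head = (rec.head + 1) % len(rec.resource_list)  # advance and write back
        rec.count -= 1
        return item

    def free(self, pool_id, item):
        rec = self.records[pool_id]
        if rec.count == len(rec.resource_list):
            raise ValueError("pool is already full")
        rec.resource_list[rec.tail] = item
        rec.tail = (rec.tail + 1) % len(rec.resource_list)
        rec.count += 1
```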
Conditional execution specification of instructions using conditional extension slots in the same execute packet in a VLIW processor
In one embodiment, a system includes a memory and a processor core. The processor core includes functional units and an instruction decode unit configured to determine whether an execute packet of instructions received by the processor core includes a first instruction that is designated for execution by a first functional unit of the functional units and a second instruction that is a condition code extension instruction that includes a plurality of sets of condition code bits, wherein each set of condition code bits corresponds to a different one of the functional units, and wherein the sets of condition code bits include a first set of condition code bits that corresponds to the first functional unit. When the execute packet includes the first and second instructions, the first functional unit is configured to execute the first instruction conditionally based upon the first set of condition code bits in the second instruction.
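The conditional-execution rule can be modelled in Python. The assumed encoding gives each set of condition code bits a register name and a zero-test flag: the guarded instruction runs when the register's zero-ness matches the flag. Instructions are plain dicts, and all writes in a packet commit together. The encoding, the unit names and `execute_packet` are hypothetical.

```python
def execute_packet(packet, regs):
    """Execute one VLIW packet, guarding units named in a condition code extension entry.

    A "CCE" entry maps a functional-unit name to (register, z): the unit's instruction
    runs only if (regs[register] == 0) equals z. Units without a set run unconditionally.
    """
    cce = next((i for i in packet if i["op"] == "CCE"), None)
    conds = cce["sets"] if cce else {}
    results = {}
    for instr in packet:
        if instr["op"] == "CCE":
            continue  # the extension slot carries conditions, not work
        unit = instr["unit"]
        if unit in conds:
            reg, z = conds[unit]
            if (regs[reg] == 0) != z:
                continue  # condition false: this unit's instruction is annulled
        results[instr["dst"]] = instr["fn"](regs)
    regs.update(results)  # all writes in the packet commit together
    return regs
```

In a packet where unit L1 is guarded on A0 being zero and unit S1 on B0 being zero, with A0 = 0 and B0 = 1, only the L1 instruction writes its result.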
Method and apparatus for processing data splicing instruction
The present disclosure discloses an instruction processing apparatus, comprising a first vector register adapted to store a first vector to be operated on, a second vector register adapted to store a second vector to be operated on, a decoder adapted to receive and decode a data splicing instruction, and an execution unit. The data splicing instruction indicates the first vector register as a first operand, the second vector register as a second operand, a splicing indicator, and a destination. The execution unit is coupled to the first vector register, the second vector register, and the decoder, and is adapted to execute the decoded data splicing instruction, so as to acquire a first part of the first vector from the first vector register and acquire a second part of the second vector from the second vector register according to the splicing indicator, splice the acquired first part of the first vector and the acquired second part of the second vector to form a third vector, and store the third vector into the destination.
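The abstract leaves the exact meaning of the splicing indicator open, so the sketch below assumes one simple interpretation. The indicator k selects the elements of the first vector from position k onward, and the result is filled with the first k elements of the second vector. This is similar in spirit to the AArch64 `EXT` instruction. The function name `splice` is hypothetical.

```python
def splice(v1, v2, k):
    """Return v1[k:] followed by v2[:k] (assumed semantics for splicing indicator k)."""
    n = len(v1)
    if len(v2) != n or not 0 <= k <= n:
        raise ValueError("vectors must have equal length and 0 <= k <= n")
    # First part of the first vector, then second part of the second vector,
    # giving a third vector of the same length.
    return v1[k:] + v2[:k]
```

With `v1 = [0, 1, 2, 3]` and `v2 = [4, 5, 6, 7]`, an indicator of 1 yields `[1, 2, 3, 4]`. The extremes k = 0 and k = 4 return `v1` and `v2` unchanged.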
Fused overloaded register file read to enable 2-cycle move from condition register instruction in a microprocessor
A computer system, processor, and method for processing information are disclosed. The system includes at least one computer processor; a register file associated with the at least one processor, preferably a condition register that stores status information, the register file having multiple locations for storing data; and multiple ports to write data to and read data from the register file. The system or processor includes an execution area, and the processor is configured to read from all the read ports in a first cycle and to read from all the read ports again in a second cycle. In an embodiment, the execution area includes a staging latch to store data from the first-cycle read operation, and in an aspect the computer system is configured to combine the data stored in the staging latch during the first read cycle with the data read during the second cycle.
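The two-cycle read can be sketched as follows. Several details are assumptions chosen only to make the arithmetic concrete: the condition register holds eight 4-bit fields, four read ports are available, and field 0 occupies the most significant nibble of the result. The first cycle reads four fields into a staging latch, and the second cycle reads the remaining four. The two halves are then concatenated. `read_ports` and `move_from_cr` are hypothetical names.

```python
READ_PORTS = 4  # assumed number of register-file read ports


def read_ports(regfile, indices):
    """One cycle of reads: at most READ_PORTS entries can be read at once."""
    indices = list(indices)
    if len(indices) > READ_PORTS:
        raise ValueError("more reads than available ports")
    return [regfile[i] for i in indices]


def move_from_cr(cr_fields):
    """Assemble a 32-bit value from eight 4-bit condition-register fields in two cycles."""
    staging_latch = read_ports(cr_fields, range(0, 4))  # cycle 1: held in the latch
    second_read = read_ports(cr_fields, range(4, 8))    # cycle 2: read directly
    value = 0
    for field in staging_latch + second_read:            # field 0 ends up most significant
        value = (value << 4) | (field & 0xF)
    return value
```

For fields `[1, 2, 3, 4, 5, 6, 7, 8]` the combined result is `0x12345678`.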