G06F9/30112

Data processing apparatus having streaming engine with read and read/advance operand coding
11693660 · 2023-07-04 · ·

A streaming engine employed in a digital signal processor specified a fixed data stream. Once started the data stream is read only and cannot be written. Once fetched, the data stream is stored in a first-in-first-out buffer for presentation to functional units in the fixed order. Data use by the functional unit is controlled using the input operand fields of the corresponding instruction. A read only operand coding supplies the data an input of the functional unit. A read/advance operand coding supplies the data and also advances the stream to the next sequential data elements. The read only operand coding permits reuse of data without requiring a register of the register file for temporary storage.

Mechanism for interrupting and resuming execution on an unprotected pipeline processor

Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, beginning execution of the first instruction, receiving one or more second instructions for execution on the instruction execution pipeline, the one or more second instructions associated with a higher priority task than the first instruction, storing a register state associated with the execution of the first instruction in one or more registers of a capture queue associated with the instruction execution pipeline, copying the register state from the capture queue to a memory, determining that the one or more second instructions have been executed, copying the register state from the memory to the one or more registers of the capture queue, and restoring the register state to the instruction execution pipeline from the capture queue.

STREAMING ENGINE WITH STREAM METADATA SAVING FOR CONTEXT SWITCHING
20230004391 · 2023-01-05 ·

A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces addresses of data elements. A steam head register stores data elements next to be supplied to functional units for use as operands. Stream metadata is stored in response to a stream store instruction. Stored stream metadata is restored to the stream engine in response to a stream restore instruction. An interrupt changes an open stream to a frozen state discarding stored stream data. A return from interrupt changes a frozen stream to an active state.

Monolithic vector processor configured to operate on variable length vectors using a vector length register

A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.

Bit width reconfiguration using a shadow-latch configured register file

A processor includes a front-end with an instruction set that operates at a first bit width and a floating point unit coupled to receive the instruction set in the processor that operates at the first bit width. The floating point unit operates at a second bit width and, based upon a bit width assessment of the instruction set provided to the floating point unit, the floating point unit employs a shadow-latch configured floating point register file to perform bit width reconfiguration. The shadow-latch configured floating point register file includes a plurality of regular latches and a plurality of shadow latches for storing data that is to be either read from or written to the shadow latches. The bit width reconfiguration enables the floating point unit that operates at the second bit width to operate on the instruction set received at the first bit width.

LARGE INTEGER MULTIPLICATION ENHANCEMENTS FOR GRAPHICS ENVIRONMENT

An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.

EVICTING AND RESTORING INFORMATION USING A SINGLE PORT OF A LOGICAL REGISTER MAPPER AND HISTORY BUFFER IN A MICROPROCESSOR COMPRISING MULTIPLE MAIN REGISTER FILE ENTRIES MAPPED TO ONE ACCUMULATOR REGISTER FILE ENTRY

A computer system, processor, programming instructions and/or method of processing data that includes a main register file having a plurality of entries for storing data; an accumulator register file having a plurality of entries for storing data wherein multiple main register file entries are mapped to one accumulator register file entry in the at least one accumulator register file; a logical register mapper to track and map logical registers to main register file entries, and a history buffer. Processing wide data width instructions includes evicting and restoring information from a single primary entry in the logical register mapper through a single read or write port in the logical register mapper without evicting or restoring the remaining other multiple main register file entries mapped in the accumulator register.

Multiply-accumulation in a data processing apparatus
11513796 · 2022-11-29 · ·

A data processing apparatus, a method of operating a data processing apparatus, a non-transitory computer readable storage medium, and an instruction are provided. The instruction specifies a first source register, a second source register, and a set of N accumulation registers. In response to the instruction control signals are generated, causing processing circuitry to extract N data elements from content of the first source register, perform a multiplication of each of the N data elements by content of the second source register, and apply a result of each multiplication to content of a respective target register of the set of N accumulation registers. As a result plural (N) multiplications are performed in a manner that effectively provides a multiplier N times the register width, but without requiring the register file to be made N times larger.

METHOD AND SYSTEM FOR OPTIMIZING ADDRESS CALCULATIONS
20220374236 · 2022-11-24 ·

The disclosed systems, structures, and methods are directed to optimizing address calculations in a computer. This is achieved in a compiler that identifies an address calculation in code that is being compiled and transforms the code by splitting the address calculation into a first portion in which an offset is determined and a second portion, in which the offset is combined with a base pointer to generate an address. The address and the base pointer have a first bit-length, and the offset has a second bit-length shorter than the first bit-length. The offset is determined using an operation performed at the second bit-length. In some implementations the first bit-length is 64 bits and the second bit-length is 32 bits.

Streaming engine with deferred exception reporting

This invention is a streaming engine employed in a digital signal processor. A fixed data stream sequence is specified by a control register. The streaming engine fetches stream data ahead of use by a central processing unit and stores it in a stream buffer. Upon occurrence of a fault reading data from memory, the streaming engine identifies the data element triggering the fault preferably storing this address in a fault address register. The streaming engine defers signaling the fault to the central processing unit until this data element is used as an operand. If the data element is never used by the central processing unit, the streaming engine never signals the fault. The streaming engine preferably stores data identifying the fault in a fault source register. The fault address register and the fault source register are preferably extended control registers accessible only via a debugger.