G06F9/30141

STREAMING ENGINE WITH ERROR DETECTION, CORRECTION AND RESTART
20210390018 · 2021-12-16 ·

Disclosed embodiments relate to a streaming engine employed in, for example, a digital signal processor. A fixed data stream sequence including plural nested loops is specified by a control register. The streaming engine includes an address generator producing addresses of data elements and a steam head register storing data elements next to be supplied as operands. The streaming engine fetches stream data ahead of use by the central processing unit core in a stream buffer. Parity bits are formed upon storage of data in the stream buffer which are stored with the corresponding data. Upon transfer to the stream head register a second parity is calculated and compared with the stored parity. The streaming engine signals a parity fault if the parities do not match. The streaming engine preferably restarts fetching the data stream at the data element generating a parity fault.

POWER EFFICIENT MULTI-BIT STORAGE SYSTEM
20210389952 · 2021-12-16 ·

Disclosed herein are embodiments related to a power efficient multi-bit storage system. In one configuration, the multi-bit storage system includes a first storage circuit, a second storage circuit, a prediction circuit, and a clock gating circuit. In one aspect, the first storage circuit updates a first output bit according to a first input bit, in response to a trigger signal, and the second storage circuit updates a second output bit according to a second input bit, in response to the trigger signal. In one aspect, the prediction circuit generates a trigger enable signal indicating whether at least one of the first output bit or the second output bit is predicted to change a state. In one aspect, the clock gating circuit generates the trigger signal based on the trigger enable signal.

Fused overloaded register file read to enable 2-cycle move from condition register instruction in a microprocessor

A computer system, processor, and method for processing information is disclosed that includes at least one computer processor, a register file associated with the at least one processor, preferably a condition register that stores status information, the register file having multiple locations for storing data, multiple ports to write data to and read data from the register file. The system or processor includes an execution area, and the processor is configured to read from all the read ports in a first cycle, and to read from all the read ports in a second cycle. In an embodiment, the execution area includes a staging latch to store data from a first cycle read operation, and in an aspect the computer system is configured to combine the data stored in the staging latch during a first read cycle with the data read from the second cycle.

Management of Thrashing in a GPU

Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory

Central Processing Unit

A central processing unit which achieves increased processing speed is provided. In a CPU constituted of a RISC architecture, a program counter which indicates an address in an instruction memory and a general-purpose register which is designated as an operand in an instruction to be decoded by an instruction decoder are constituted of asynchronous storage elements.

GENERAL PURPOSE REGISTER HIERARCHY SYSTEM AND METHOD
20220197649 · 2022-06-23 ·

A processing unit includes a first memory device and a second memory device. The first memory device includes a first plurality of general purpose registers (GPRs) and the second memory device includes a second plurality of GPRs. The second memory device includes fewer GPRs than the first memory device. Program data is stored at the first memory device and the second memory device based on expected frequency of accesses associated with the program data.

Methods And Apparatus For Selectively Extracting And Loading Register States
20220187899 · 2022-06-16 ·

Integrated circuits may include registers that store register states. Only a subset of the registers may store critical register states. The subset of registers may be specially demarcated, such as using synthesis directions in the hardware description, and may be coupled to dedicated extraction/loading circuitry. The extraction/loading circuitry may be implemented using soft or hard logic or can leverage existing programming or debugging circuitry on a programmable integrated circuit. The extraction/loading mechanism may also be implemented using multiplexers and associated control circuitry, scan chain circuitry, a memory-mapped interface, a tool-instantiated or user-instantiated finite state machine, or external memory interface logic. Accessing critical register states in this way can help improve efficiency with live migration events, debugging, retiming, and other integrated circuit operations.

MICROPROCESSOR WITH TIME COUNTER FOR STATICALLY DISPATCHING INSTRUCTIONS WITH PHANTOM REGISTERS
20230273796 · 2023-08-31 · ·

A processor includes a time counter and provides a method for statically dispatching fused instructions with first operation and second operation with preset execution times for forwarding of result data from the first operation to the second operation without writing to a register, and where the preset execution times are based on a time count from the time counter provided to an execution pipeline.

Issuing instructions on a vector processor

The present disclosure relates to a mechanism for issuing instructions in a processor (e.g., a vector processor) implemented as an overlay on programmable hardware (e.g., a field programmable gate array (FPGA) device). Implementations described herein include features for optimizing resource availability on programmable hardware units and enabling superscalar execution when coupled with a temporal single-instruction multiple data (SIMD). Systems described herein involve an issue component of a processor controller (e.g., a vector processor controller) that enables fast and efficient instruction issue while verifying that structural and data hazards between instructions have been resolved.

Methods and apparatus for selectively extracting and loading register states
11726545 · 2023-08-15 · ·

Integrated circuits may include registers that store register states. Only a subset of the registers may store critical register states. The subset of registers may be specially demarcated, such as using synthesis directions in the hardware description, and may be coupled to dedicated extraction/loading circuitry. The extraction/loading circuitry may be implemented using soft or hard logic or can leverage existing programming or debugging circuitry on a programmable integrated circuit. The extraction/loading mechanism may also be implemented using multiplexers and associated control circuitry, scan chain circuitry, a memory-mapped interface, a tool-instantiated or user-instantiated finite state machine, or external memory interface logic. Accessing critical register states in this way can help improve efficiency with live migration events, debugging, retiming, and other integrated circuit operations.