G06F9/32

Control of branch prediction for zero-overhead loop

In response to decoding a zero-overhead loop control instruction of an instruction set architecture, processing circuitry sets at least one loop control parameter for controlling execution of one or more iterations of a program loop body of a zero-overhead loop. Based on the at least one loop control parameter, loop control circuitry controls execution of the one or more iterations of the program loop body of the zero-overhead loop, the program loop body excluding the zero-overhead loop control instruction. Branch prediction disabling circuitry detects whether the processing circuitry is executing the program loop body of the zero-overhead loop associated with the zero-overhead loop control instruction, and dependent on detecting that the processing circuitry is executing the program loop body of the zero-overhead loop, disables branch prediction circuitry. This reduces power consumption during a zero-overhead loop when the branch prediction circuitry is unlikely to provide a benefit.

LOOP UNROLLING PROCESSING APPARATUS, METHOD, AND PROGRAM
20230161590 · 2023-05-25 · ·

The generation unit 4 generates arithmetic expressions. Here, N denotes the number of looping times of the loop processing. L denotes a designated lower limit of unroll stage number. M denotes a designated upper limit of the unroll stage number. Q denotes a quotient obtained by dividing N by L. R denotes a remainder obtained by dividing N by L. The arithmetic expressions include an arithmetic expression that represents executing loop processing whose number of looping times is a quotient obtained by dividing R by (M−L), with the unroll stage number M when R−Q*(M−L)>0 is not satisfied, and then executing, when a remainder obtained by dividing R by (M−L) is other than 0, processing of one loop with sum of the remainder and L as the unroll stage number, and then executing loop processing with the unroll stage number L.

METHOD AND SYSTEM FOR HARDWARE-ASSISTED PRE-EXECUTION
20230061576 · 2023-03-02 ·

One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.

Livelock recovery circuit for detecting illegal repetition of an instruction and transitioning to a known state

Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.

Techniques for efficiently transferring data to a processor

A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.

Extended memory neuromorphic component

Systems, apparatuses, and methods related to an extended memory neuromorphic component for performing operations in memory are described. An example apparatus can include a plurality of computing devices. Each of the computing devices can include a processing unit and a memory array. The example apparatus can further include a communication subsystem coupled to the at least one of the plurality of computing devices and to a neuromorphic component. At least one of the plurality of computing devices can receive a request from a host to perform an operation, receive an indication of data to be access in a memory device to perform the operation, and send an indication to the neuromorphic component to monitor the data to be accessed. The neuromorphic component can intercept data and determine that a portion of the data should be flagged.

Mergeable counter system and method

A system includes a first counter configured to increment or decrement in response to a triggering event. The first counter is sized to overflow. The system also includes a second counter configured to increment or decrement in response to a triggering event. The first counter and the second counter are merged to form a third counter in response to detecting an overflow triggering event for the first counter. A merge bit indicative of whether the first counter and the second counter are merged changes value in response to merging the first counter and the second counter.

Dropped write detection and correction

Implementations for dropped write detection and correction are described. An example method includes receiving a write command comprising data and associated metadata; increasing a value of a monotonic counter; generating updated metadata by adding the counter value to the metadata; atomically writing (a) the data and a first instance of the updated metadata to a first storage device, and (b) a second instance of the updated metadata to a second storage device; receiving a read request for the data; reading the first instance of the updated metadata from the first storage device; reading the second instance of the updated metadata from a second storage device; comparing the instances of metadata and the counter values within each instance of metadata; determining whether the first counter value matches the second counter value; and determining whether a dropped write has occurred based on whether the first counter values matches the second counter value.

CONTROL OF BRANCH PREDICTION FOR ZERO-OVERHEAD LOOP

In response to decoding a zero-overhead loop control instruction of an instruction set architecture, processing circuitry sets at least one loop control parameter for controlling execution of one or more iterations of a program loop body of a zero-overhead loop. Based on the at least one loop control parameter, loop control circuitry controls execution of the one or more iterations of the program loop body of the zero-overhead loop, the program loop body excluding the zero-overhead loop control instruction. Branch prediction disabling circuitry detects whether the processing circuitry is executing the program loop body of the zero-overhead loop associated with the zero-overhead loop control instruction, and dependent on detecting that the processing circuitry is executing the program loop body of the zero-overhead loop, disables branch prediction circuitry. This reduces power consumption during a zero-overhead loop when the branch prediction circuitry is unlikely to provide a benefit.

Data encryption based on immutable pointers
11620391 · 2023-04-04 · ·

Technologies disclosed herein provide cryptographic computing. An example processor includes a core to execute an instruction, where the core includes a register to store a pointer to a memory location and a tag associated with the pointer. The tag indicates whether the pointer is at least partially immutable. The core also includes circuitry to access the pointer and the tag associated with the pointer, determine whether the tag indicates that the pointer is at least partially immutable. The circuitry is further, based on a determination that the tag indicates the pointer is at least partially immutable, to obtain a memory address of the memory location based on the pointer, use the memory address to access encrypted data at the memory location, and decrypt the encrypted data based on a key and a tweak, where the tweak including one or more bits based, at least in part, on the pointer.