G06F9/321

CONTROL REGISTER SET TO FACILITATE PROCESSOR EVENT BASED SAMPLING

Techniques and mechanisms for configuring processor event-based sampling (PEBS) with a set of control registers. In an embodiment, a first control register of a processor is programmed to store a physical address of a location in a buffer which receives PEBS records. The first control register is further programmed or otherwise configured to store an indication of a size of the buffer. A second control register of the processor stores a physical address of a location in the buffer were a next PEBS record is to be stored. In another embodiment, the processor further comprises multiple control registers which variously configure PEBS generation on a per-counter basis.

METHOD AND SYSTEM FOR HARDWARE-ASSISTED PRE-EXECUTION
20230315471 · 2023-10-05 ·

One aspect provides a system for hardware-assisted pre-execution. During operation, the system determines a pre-execution code region comprising one or more instructions. The system increments a global counter upon initiating the one or more instructions. The system issues a first instruction, which involves setting, in a first entry for the first instruction in a data structure, a first prefetch region identifier with a current value of the global counter. Responsive to a head pointer of the data structure reaching the first entry, the system: determines, based on a non-zero value for the first prefetch region identifier, that the first entry is not available to be allocated; and advances the head pointer to a next entry in the data structure, which renders a load associated with the first entry as a non-blocking load. The system resets the global counter upon completing the one or more instructions.

Cache-based communication for trusted execution environments
11775360 · 2023-10-03 · ·

A method executes inter-enclave communication via cache memory of a processor. The method includes: instantiating a first enclave such that it is configured to execute a first communication thread, which is configured to read/write data to the cache memory; instantiating a second enclave such that it is configured to execute a second communication thread, which is configured to read/write data to cache memory; executing, by the first enclave, the first communication thread to send message data to the second enclave, executing the first communication thread comprising writing the message data to the cache memory; and executing, by the second enclave, the second communication thread to receive the message data. Executing the second communication thread can include: monitoring the cache memory to determine whether the data message is being sent; and based upon determining the data message is being sent, reading from the cache memory to receive the data message.

Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices

Delivering immediate values by using program counter (PC)-relative load instructions to fetch literal data in processor-based devices is disclosed. In this regard, a processing element (PE) of a processor-based device provides an execution pipeline circuit that comprises an instruction processing portion and a data access portion. Using a literal data access logic circuit, the PE detects a PC-relative load instruction within a fetch window that includes multiple fetched instructions. The PE determines that the PC-relative load instruction can be serviced using literal data that is available to the instruction processing portion of the execution pipeline circuit (e.g., located within the fetch window containing the PC-relative load instruction, or stored in a literal pool buffer), The PE then retrieves the literal data within the instruction processing portion of the execution pipeline circuit, and executes the PC-relative load instruction using the literal data.

Methods and apparatus for storing a copy of a current fetched instruction when a miss threshold is exceeded until a refill threshold is reached

Aspects of the present disclosure relate an apparatus comprising fetch circuitry and instruction storage circuitry. The fetch circuitry is to fetch instructions for execution by execution circuitry. The instruction storage circuitry is to store temporary copies of fetched instructions. The fetch circuitry is configured to preferentially fetch instructions from the instruction storage circuitry. The instruction storage circuitry is configured to, responsive to a storage condition being met, begin storing copies of consecutive fetched instructions, the storage condition indicating a utility of a current fetched instruction; and to, responsive to determining that a number of said stored consecutive instructions has reached a storage threshold, cease storing copies of subsequent fetched instructions.

DEBUGGING OF ACCELERATOR CIRCUIT FOR MATHEMATICAL OPERATIONS USING PACKET LIMIT BREAKPOINT
20230281106 · 2023-09-07 ·

Embodiments of the present disclosure relate to debugging of an accelerator circuit using a packet limit breakpoint. A vector circuit reads a subset of instruction packets from an instruction memory and receives a portion of input data from a data memory corresponding to the subset of instruction packets. The vector circuit executes a set of vector operations in accordance with multiple instruction packets from the subset using data from the received portion of input data identified in the multiple instruction packets to generate output data. A program counter control circuit coupled to the instruction memory triggers a breakpoint in a program stored in the instruction memory causing the accelerator circuit to stop executing remaining instruction packets in the program following the multiple instruction packets responsive to a number of instruction packets executed in the program from a time instant of an event reaching a predetermined number.

Method and system for managing memory leaks in a linear memory model

A method for managing memory leaks in a memory device includes grouping, by a garbage collection system, a plurality of similar memory allocations of the memory device into one or more Unique Fixed Identifiers (UFIs); identifying, by the garbage collection system, one of the one or more UFIs having a highest accumulated memory size and adding each of the plurality of memory allocations in the identified one of the one or more UFIs into a Potential Leak Candidate List (PLCL); identifying, by the garbage collection system, the memory leaks in the memory device by identifying unreferenced memory addresses associated with the plurality of memory allocations in the PLCL; and releasing, by the garbage collection system, the identified unreferenced memory addresses associated with the plurality of memory allocations corresponding to the memory leaks into the memory device.

Checker Cores for Fault Tolerant Processing
20230153203 · 2023-05-18 ·

Systems and methods are disclosed for checker cores for fault tolerant processing. For example, an integrated circuit (e.g., a processor) for executing instructions includes a processor core configured to execute instructions of an instruction set; an outer memory system configured to store instructions and data; and a checker core configured to receive committed instruction packets from the processor core and check the committed instruction packets for errors, wherein the checker core is configured to utilize a memory pathway of the processor core to access the outer memory system by receiving instructions and data read from the outer memory system as portions of committed instruction packets from the processor core. For example, data flow from the processor core to the checker core may be limited to committed instruction packets received via dedicated a wire bundle.

Branch stall elimination in pipelined microprocessors
11650821 · 2023-05-16 · ·

A system can include a microprocessor having a prefetch queue including a plurality of slots configured to store program counter values (PCVs) and instructions, a pipeline configured to receive instructions from the prefetch queue, and a select circuit coupled to the prefetch queue. The select circuit may selectively freeze a first slot of the plurality of slots and selectively output a frozen PCV and a frozen instruction from the first slot while frozen. The microprocessor can include write logic coupled to the prefetch queue and a comparator circuit coupled to the prefetch queue and the select circuit. The write logic may load data into unfrozen slots of the prefetch queue. The comparator circuit may compare a target PCV with the frozen PCV to determine a match. The select circuit indicates, to the pipeline, whether the frozen instruction is valid based on the comparing.

SOFTWARE-DIRECTED REGISTER FILE SHARING
20230144553 · 2023-05-11 ·

A computing system including one or more processor and one or more memory that stores application code that configures the processor to execute an application. The system includes logic to identify high and low register utilization regions of the application code and insert register acquire instructions and register release instructions in the application code by the compiler, such that when executed by the processor, the application code borrows and returns registers to an inter-block register pool when execution enters a high and low register utilization region, respectively.