G06F9/32

STATISTIC BASED CACHE PRE-FETCHER

The disclosure relates to technology for pre-fetching data. An apparatus comprises a processor core, pre-fetch logic, and a memory hierarchy. The pre-fetch logic is configured to generate cache pre-fetch requests for a program instruction identified by a program counter. The pre-fetch logic is configured to track one or more statistics with respect to the cache pre-fetch requests. The pre-fetch logic is configured to link the one or more statistics with the program counter. The pre-fetch logic is configured to determine a degree of the cache pre-fetch requests for the program instruction based on the one or more statistics. The memory hierarchy comprises main memory and a hierarchy of caches. The memory hierarchy further comprises a memory controller configured to pre-fetch memory blocks identified in the cache pre-fetch requests from a current level in the memory hierarchy into a higher level of the memory hierarchy.

NON-BLOCKING EXTERNAL DEVICE CALL
20230050383 · 2023-02-16 ·

Devices and techniques for non-blocking external device calls are described herein. Specifically, when a processor receives an instruction with a no-return indication from a thread for a device, the processor can increase a counter corresponding to the thread based on the no-return indication. The processor can then continue execution of the thread without waiting for a return value from the device. When a return value is received for the instruction, the processor can decrement the counter. While the counter is not zero, the processor prevents the thread from completing (exiting).

EFFICIENT PROCESSING OF NESTED LOOPS FOR COMPUTING DEVICE WITH MULTIPLE CONFIGURABLE PROCESSING ELEMENTS USING MULTIPLE SPOKE COUNTS
20230051544 · 2023-02-16 ·

Disclosed in some examples, are methods, systems, devices, and machine-readable mediums which provide for more efficient CGRA execution by assigning different initiation intervals to different PEs executing a same code base. The initiation intervals may be a multiple of each other and the PE with the lowest initiation interval may be used to execute instructions of the code that is to be executed at a greater frequency than other instructions than other instructions that may be assigned to PEs with higher initiation intervals.

UN-MARK INSTRUCTIONS ON AN INSTRUCTION MATCH TO REDUCE RESOURCES REQUIRED TO MATCH A GROUP OF INSTRUCTIONS

A method of performing instruction marking in a computer processor architecture includes fetching instructions from a memory unit by a fetching unit in the computer processor architecture. Instruction groups for marking are determined. Fetched instructions are matched to instruction groups for marking. The fetched instructions are marked. Some of the marked instructions are selectively unmarked. The marked and unmarked instructions are forwarded to a queue of instructions for processing in the computer processor architecture.

UN-MARK INSTRUCTIONS ON AN INSTRUCTION MATCH TO REDUCE RESOURCES REQUIRED TO MATCH A GROUP OF INSTRUCTIONS

A method of performing instruction marking in a computer processor architecture includes fetching instructions from a memory unit by a fetching unit in the computer processor architecture. Instruction groups for marking are determined. Fetched instructions are matched to instruction groups for marking. The fetched instructions are marked. Some of the marked instructions are selectively unmarked. The marked and unmarked instructions are forwarded to a queue of instructions for processing in the computer processor architecture.

Implicit integrity for cryptographic computing

In one embodiment, a processor includes a memory hierarchy and a core coupled to the memory hierarchy. The memory hierarchy stores encrypted data, and the core includes circuitry to access the encrypted data stored in the memory hierarchy, decrypt the encrypted data to yield decrypted data, perform an entropy test on the decrypted data, and update a processor state based on a result of the entropy test. The entropy test may include determining a number of data entities in the decrypted data whose values are equal to one another, determining a number of adjacent data entities in the decrypted data whose values are equal to one another, determining a number of data entities in the decrypted data whose values are equal to at least one special value from a set of special values, or determining a sum of n highest data entity value frequencies.

METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
20230037321 · 2023-02-09 ·

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.

Security enhancement in hierarchical protection domains

Methods and systems for allowing software components that operate at a specific exception level (e.g., EL-3 to EL-1, etc.) to repeatedly or continuously observe or evaluate the integrity of software components operating at a lower exception level (e.g., EL-2 to EL-0) to ensure that the software components have not been corrupted or compromised (e.g., subjected to malware, cyberattacks, etc.) include a computing device that identifies, by a component operating at a higher exception level (“HEL component”), at least one of a current vector base address (VBA), an exception raising instruction (ERI) address, or a control and system register value associated with a component operating at a lower exception level (“LEL component”). The computing device may perform a responsive action in response to determining that the current VBA, the ERT address, or control and system register value do not match the corresponding reference data.

Checker cores for fault tolerant processing
11556413 · 2023-01-17 · ·

Systems and methods are disclosed for checker cores for fault tolerant processing. For example, an integrated circuit (e.g., a processor) for executing instructions includes a processor core configured to execute instructions of an instruction set; an outer memory system configured to store instructions and data; and a checker core configured to receive committed instruction packets from the processor core and check the committed instruction packets for errors, wherein the checker core is configured to utilize a memory pathway of the processor core to access the outer memory system by receiving instructions and data read from the outer memory system as portions of committed instruction packets from the processor core. For example, data flow from the processor core to the checker core may be limited to committed instruction packets received via dedicated a wire bundle.

NON-TRANSITORY COMPUTER-READABLE MEDIUM, ANALYSIS DEVICE, AND ANALYSIS METHOD

The present disclosure relates to a non-transitory computer-readable recording medium storing an analysis program that causes a computer to execute a process. The process includes sampling an instruction address of one of instructions included in a program during execution of the program, identifying a first function that includes the sampled instruction address in an address range, rewriting mark information associated with the identified first function, identifying first information corresponding to the instruction address of the first function among a plurality of first information based on the rewritten mark information, identifying second information corresponding to the instruction address of the first function among a plurality of second information based on the rewritten mark information, storing the first information and the second information in a memory, and analyzing performance of the program based on the first information and the second information stored in the memory.