G06F9/30116

Logical register recovery within a processor

A computer system, processor, and method for processing information is disclosed that includes partitioning a logical register in the processor into a plurality of ranges of logical register entries based upon the logical register entry, assigning at least one recovery port of a history buffer to each range of logical register entries, initiating a flush recovery process for the processor, and directing history buffer entries to the assigned recovery port based upon the logical register entry associated with the history buffer entry.
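The routing step can be pictured with a small Python model. This is only a sketch, and its parameters are assumptions that do not come from the abstract: 32 logical register entries, 4 equal ranges, and one recovery port per range. `HistoryBufferEntry`, `port_for_lreg`, and `route_flush_recovery` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical sizing (not from the abstract): 32 logical register
# entries split into 4 equal ranges, one recovery port per range.
NUM_LREGS = 32
NUM_PORTS = 4


@dataclass
class HistoryBufferEntry:
    lreg: int   # logical register entry whose previous value was saved
    value: int  # the saved value to restore on a flush


def port_for_lreg(lreg: int, num_lregs: int = NUM_LREGS,
                  num_ports: int = NUM_PORTS) -> int:
    """Return the recovery port assigned to the range containing `lreg`."""
    if not 0 <= lreg < num_lregs:
        raise ValueError(f"logical register {lreg} out of range")
    range_size = num_lregs // num_ports  # assumes num_lregs % num_ports == 0
    return lreg // range_size


def route_flush_recovery(
        entries: Iterable[HistoryBufferEntry]) -> Dict[int, List[HistoryBufferEntry]]:
    """On a flush, steer each history buffer entry to its range's recovery port."""
    ports: Dict[int, List[HistoryBufferEntry]] = defaultdict(list)
    for entry in entries:
        ports[port_for_lreg(entry.lreg)].append(entry)
    return dict(ports)
```

Because the ranges are disjoint, entries bound for different ports never contend for the same port. This appears to be the point of the partitioning: restores for different ranges can presumably proceed in parallel.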

INSTRUCTION HANDLING FOR ACCUMULATION OF REGISTER RESULTS IN A MICROPROCESSOR

A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. In one aspect, the results of the dense math execution unit are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is then written to the main register file.
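The accumulate-then-move pattern can be illustrated with a toy model. In this sketch each accumulator entry holds a 2x2 tile (row-major), and the dense math operation is assumed to be an outer-product accumulate. The tile shape, the choice of operation, and all names here are illustrative assumptions, not details from the abstract.

```python
from typing import List


class ProcessorModel:
    """Toy model: a main register file plus a separate accumulator register file."""

    def __init__(self, num_main: int = 8, num_acc: int = 2, tile: int = 2) -> None:
        self.tile = tile
        # Each entry holds a tile x tile matrix flattened in row-major order.
        self.main_rf: List[List[int]] = [[0] * (tile * tile) for _ in range(num_main)]
        self.acc_rf: List[List[int]] = [[0] * (tile * tile) for _ in range(num_acc)]

    def dense_outer_product_acc(self, acc: int, x: List[int], y: List[int]) -> None:
        """Dense math op: acc += outer(x, y), written back to the SAME accumulator entry."""
        t = self.tile
        for i in range(t):
            for j in range(t):
                self.acc_rf[acc][i * t + j] += x[i] * y[j]

    def move_to_main(self, acc: int, reg: int) -> None:
        """Copy the final accumulated result into a main register file entry."""
        self.main_rf[reg] = list(self.acc_rf[acc])
```

A 2x2 matrix product is the sum of outer products of the columns of A with the rows of B. Accumulating those into accumulator entry 0 leaves the main register file untouched until the final `move_to_main`:

```python
p = ProcessorModel()
for col_a, row_b in zip([[1, 3], [2, 4]], [[5, 6], [7, 8]]):
    p.dense_outer_product_acc(0, col_a, row_b)
p.move_to_main(0, 3)  # main_rf[3] now holds [[1,2],[3,4]] @ [[5,6],[7,8]] flattened
```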

Circular shadow stack in audit mode
11861364 · 2024-01-02

Performing shadow stack functionality for a thread in an audit mode includes initiating execution of a thread at a processor. Execution of the thread includes initiating execution of executable code of an application binary as part of the thread and enabling shadow stack functionality for the thread in an audit mode. Based at least on the execution of the thread in the audit mode, at least a portion of the shadow stack is enabled to be a circular stack. In response to determining that usage of the shadow stack has reached a defined threshold, one or more currently used entries of the shadow stack are overwritten, which prevents the shadow stack from overflowing.
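Here is a small Python model of a circular shadow stack. It is a sketch under stated assumptions: the "defined threshold" is taken to be the full capacity, and audit mode is assumed to log return-address mismatches instead of terminating the thread. `AuditShadowStack` and its methods are hypothetical names.

```python
from typing import List, Optional, Tuple


class AuditShadowStack:
    """Toy shadow stack that wraps around in audit mode instead of overflowing."""

    def __init__(self, capacity: int, audit_mode: bool = True) -> None:
        self.capacity = capacity          # assumed threshold = full capacity
        self.audit_mode = audit_mode
        self.entries: List[Optional[int]] = [None] * capacity
        self.top = 0                      # index of the next free slot
        self.count = 0                    # number of live entries
        self.mismatches: List[Tuple[int, int]] = []  # audit log (expected, actual)

    def push(self, return_addr: int) -> None:
        if self.count == self.capacity:
            if not self.audit_mode:
                raise OverflowError("shadow stack overflow")
            # Threshold reached: the slot at `top` holds the oldest entry,
            # which is overwritten below (circular behavior).
        else:
            self.count += 1
        self.entries[self.top] = return_addr
        self.top = (self.top + 1) % self.capacity

    def pop_and_check(self, actual_return: int) -> None:
        if self.count == 0:
            if not self.audit_mode:
                raise RuntimeError("shadow stack underflow")
            return  # matching entry was overwritten earlier; nothing to compare
        self.top = (self.top - 1) % self.capacity
        expected = self.entries[self.top]
        self.count -= 1
        if expected != actual_return:
            if not self.audit_mode:
                raise RuntimeError("return address mismatch")
            self.mismatches.append((expected, actual_return))
```

Overwriting the oldest entries has a cost. The earliest return addresses can no longer be checked, which is why this sketch silently skips the check when the stack is empty in audit mode.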

HYPERVISOR-BASED REDIRECTION OF SYSTEM CALLS AND INTERRUPT-BASED TASK OFFLOADING
20210026950 · 2021-01-28

A security agent configured to initiate a security agent component as a hypervisor for a computing device is described herein. The security agent component may change a value of a processor configuration register, such as a Model Specific Register (MSR), to cause system calls to be redirected to the security agent, and may set an intercept on instructions that read the processor configuration register so that a process, thread, or component different from the processor of the computing device receives the original value of the processor configuration register instead of the updated value. The security agent component may also be configured to generate interrupts to offload task execution from the hypervisor to a security agent executing as a kernel-level component.
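A toy Python model shows the redirect-and-hide idea. The choice of register is an assumption: the sketch uses IA32_LSTAR (0xC0000082), the x86-64 MSR that holds the 64-bit SYSCALL entry point, as the "processor configuration register". The interrupt used for offloading is represented by a caller-supplied callback. `ToyHypervisor` and its methods are hypothetical.

```python
from typing import Callable, Dict, List

IA32_LSTAR = 0xC0000082  # x86-64 MSR holding the 64-bit SYSCALL entry point


class ToyHypervisor:
    def __init__(self, msrs: Dict[int, int]) -> None:
        self.msrs = msrs                          # real register state the CPU uses
        self.shadow: Dict[int, int] = {}          # values reported to intercepted reads
        self.kernel_agent_queue: List[str] = []   # tasks offloaded to the kernel agent

    def redirect_syscalls(self, agent_entry: int) -> None:
        # Remember the original value, then point system calls at the security agent.
        self.shadow[IA32_LSTAR] = self.msrs[IA32_LSTAR]
        self.msrs[IA32_LSTAR] = agent_entry

    def on_rdmsr(self, msr: int) -> int:
        # Intercepted read: report the original value if this MSR was modified.
        if msr in self.shadow:
            return self.shadow[msr]
        return self.msrs[msr]

    def syscall_target(self) -> int:
        # What the CPU would actually jump to on SYSCALL.
        return self.msrs[IA32_LSTAR]

    def offload(self, task: str, raise_interrupt: Callable[[], None]) -> None:
        # Queue work for the kernel-level agent, then signal it via an interrupt.
        self.kernel_agent_queue.append(task)
        raise_interrupt()
```

The two views diverge by design. `syscall_target()` reflects the patched value, while any reader going through `on_rdmsr` still sees the original one.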

OPTIMIZING SOFTWARE-DIRECTED INSTRUCTION REPLICATION FOR GPU ERROR DETECTION

A thread execution method in a processor includes executing original instructions of a first thread in a first execution lane of the processor, and interleaving execution of duplicated instructions of the first thread with execution of original instructions of a second thread in a second execution lane of the processor.
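The resulting lane assignment can be sketched as follows. The sketch assumes a simple one-for-one alternation of thread A's duplicates with thread B's originals in lane 1. The abstract does not specify the interleaving policy, and the result comparison shown is the usual way duplicated execution detects errors, not a detail taken from the source. All names are hypothetical.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple

Slot = Tuple[str, str]  # (stream label, instruction)


def build_schedule(thread_a: Sequence[str],
                   thread_b: Sequence[str]) -> Tuple[List[Slot], List[Slot]]:
    """Lane 0 runs A's originals; lane 1 interleaves A's duplicates with B's originals."""
    lane0 = [("A", ins) for ins in thread_a]
    lane1: List[Slot] = []
    for dup, orig_b in zip_longest(thread_a, thread_b):
        if dup is not None:
            lane1.append(("A-dup", dup))
        if orig_b is not None:
            lane1.append(("B", orig_b))
    return lane0, lane1


def mismatched_results(original: Sequence[int], duplicate: Sequence[int]) -> List[int]:
    """Indices where an original and its duplicate disagree (a detected error)."""
    return [i for i, (o, d) in enumerate(zip(original, duplicate)) if o != d]
```

Lane 1 does not sit idle while waiting to redo thread A's work, because it also makes forward progress on thread B.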

PROCESSOR INSTRUCTION SUPPORT FOR MITIGATING CONTROLLED-CHANNEL AND CACHE-BASED SIDE-CHANNEL ATTACKS

Detailed herein are systems, apparatuses, and methods for a computer architecture with instruction set support to mitigate against page fault and/or cache-based side-channel attacks. In an embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, the instruction comprising a first field that indicates an instruction pointer to a user-level event handler; and an execution unit to execute the decoded instruction to, after a swap of an instruction pointer that indicates where an event occurred from a current instruction pointer register into a user-level event handler pointer register, push the instruction pointer that indicates where the event occurred onto call stack storage, and change a current instruction pointer in the current instruction pointer register to the instruction pointer to the user-level event handler.
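The register movements can be traced with a small state model. It assumes that the event swap exchanges the current instruction pointer with the user-level event handler pointer register, so that execution resumes wherever that register pointed, for example at a trampoline containing the instruction. The abstract does not say what the swap places into the current instruction pointer, so that part is a labeled assumption. `CpuState`, `deliver_event`, `execute_handler_entry`, and `ret` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CpuState:
    ip: int = 0        # current instruction pointer register
    ueh_ptr: int = 0   # user-level event handler pointer register
    call_stack: List[int] = field(default_factory=list)


def deliver_event(cpu: CpuState) -> None:
    """Assumed event delivery: swap the current IP (where the event occurred)
    with the user-level event handler pointer register."""
    cpu.ip, cpu.ueh_ptr = cpu.ueh_ptr, cpu.ip


def execute_handler_entry(cpu: CpuState, handler_ip: int) -> None:
    """The decoded instruction: push the event IP, then jump to the handler."""
    cpu.call_stack.append(cpu.ueh_ptr)  # IP where the event occurred
    cpu.ip = handler_ip                  # first field: user-level handler IP


def ret(cpu: CpuState) -> None:
    """Ordinary return from the handler to where the event occurred."""
    cpu.ip = cpu.call_stack.pop()
```

The event IP is pushed onto the ordinary call stack. As a result, the handler can return to the interrupted code with a normal return, as `ret` shows.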

Data plane error detection for ternary content-addressable memory (TCAM) of a forwarding element

A method of detecting errors in a data plane of a packet forwarding element that includes a plurality of physical ternary content-addressable memories (TCAMs) is provided. The method configures a first set of physical TCAMs into a first logical TCAM. The method configures a second set of physical TCAMs into a second logical TCAM. The second logical TCAM includes the same number of physical TCAMs as the first logical TCAM. The method programs the first and second logical TCAMs to store the same set of data. The method requests a search for a particular content from the first and second logical TCAMs. The method generates an error signal when the first and second logical TCAMs do not produce the same search results.
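A minimal Python model shows the duplicate-and-compare check. It makes three assumptions: each TCAM entry is a `(value, care_mask)` pair, a logical TCAM is its physical TCAMs chained in priority order, and the lowest matching index wins. `LogicalTcam`, `build_pair`, and `checked_search` are hypothetical names.

```python
import copy
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value, care_mask); bits cleared in the mask are "don't care"


class LogicalTcam:
    """Several physical TCAMs chained in priority order (lowest index wins)."""

    def __init__(self, physical: List[List[Entry]]) -> None:
        self.physical = physical

    def search(self, key: int) -> Optional[int]:
        base = 0
        for tcam in self.physical:
            for i, (value, mask) in enumerate(tcam):
                if (key & mask) == (value & mask):
                    return base + i
            base += len(tcam)
        return None


def build_pair(physical: List[List[Entry]]) -> Tuple[LogicalTcam, LogicalTcam]:
    """Program two logical TCAMs with the same number of physical TCAMs and the same data."""
    return LogicalTcam(copy.deepcopy(physical)), LogicalTcam(copy.deepcopy(physical))


def checked_search(first: LogicalTcam, second: LogicalTcam,
                   key: int) -> Tuple[Optional[int], bool]:
    """Search both copies; the bool is the error signal (True on disagreement)."""
    r1, r2 = first.search(key), second.search(key)
    return r1, r1 != r2
```

If a bit flips in one copy (simulated by overwriting an entry in `second`), a key that hits the corrupted entry produces different indices, and the error signal is raised.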

SYSTEM AND ARCHITECTURE OF NEURAL NETWORK ACCELERATOR

A system includes a memory, a processor, and an accelerator circuit. The accelerator circuit includes an internal memory, an input circuit block, a filter circuit block, a post-processing circuit block, and an output circuit block to concurrently perform tasks of a neural network application assigned to the accelerator circuit by the processor.
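The concurrent block structure can be mimicked in software as a thread-per-block pipeline connected by FIFO queues. This is only an analogy, and every detail here is an assumption chosen for illustration: the input block copies a tile, the filter block performs a 1-D valid cross-correlation (the operation most CNN "convolution" layers compute), the post-processing block applies ReLU, and the output block drains results in order. `run_accelerator` and `_stage` are hypothetical names.

```python
import queue
import threading
from typing import Callable, List, Optional

Tile = List[int]


def _stage(fn: Callable[[Tile], Tile], inbox: "queue.Queue[Optional[Tile]]",
           outbox: "queue.Queue[Optional[Tile]]") -> None:
    """One circuit block: process items until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)  # forward the shutdown sentinel downstream
            return
        outbox.put(fn(item))


def run_accelerator(tiles: List[Tile], weights: List[int]) -> List[Tile]:
    k = len(weights)

    def load(tile: Tile) -> Tile:   # input block: copy into "internal memory"
        return list(tile)

    def filt(tile: Tile) -> Tile:   # filter block: 1-D valid cross-correlation
        return [sum(w * x for w, x in zip(weights, tile[i:i + k]))
                for i in range(len(tile) - k + 1)]

    def post(tile: Tile) -> Tile:   # post-processing block: ReLU
        return [max(0, v) for v in tile]

    queues: List[queue.Queue] = [queue.Queue() for _ in range(4)]
    workers = [threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate((load, filt, post))]
    for worker in workers:
        worker.start()
    for tile in tiles:
        queues[0].put(tile)
    queues[0].put(None)

    results: List[Tile] = []        # output block: drain results in arrival order
    while (out := queues[3].get()) is not None:
        results.append(out)
    for worker in workers:
        worker.join()
    return results
```

While the filter block works on one tile, the input block can already be loading the next. That overlap is the software counterpart of the blocks performing tasks concurrently.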

Multi-processor system with configurable cache sub-domains and cross-die memory coherency

Disclosed embodiments relate to a system with configurable cache sub-domains and cross-die memory coherency. In one example, a system includes R racks, each rack housing N nodes, each node incorporating D dies, each die containing C cores and a die shadow tag, each core including P pipelines and a core shadow tag, each pipeline associated with a data cache and data cache tags and being either non-coherent or coherent and belonging to one of X coherency domains. When a pipeline needs to read a cache line, it issues a read request to its associated data cache; then, if need be, it issues a read request to its associated core-level cache; then, if need be, to its associated die-level cache; and finally, if need be, it issues a no-cache remote read request to a target die that is mapped to hold the cache line.
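The read escalation order can be sketched as a lookup walk. This sketch leaves out shadow tags, coherency domains, and cache fills, and it models each cache as a dictionary from line address to data. The 64-byte line size and the address-interleaved `home_die` mapping are hypothetical, since the abstract only says the target die is "mapped to hold the cache line".

```python
from typing import Dict, Tuple

LINE_SIZE = 64  # assumed cache-line size in bytes


def home_die(addr: int, num_dies: int) -> int:
    """Hypothetical address-interleaved mapping of cache lines to dies."""
    return (addr // LINE_SIZE) % num_dies


def pipeline_read(addr: int,
                  data_cache: Dict[int, str],
                  core_cache: Dict[int, str],
                  die_cache: Dict[int, str],
                  remote_memory: Dict[int, Dict[int, str]],
                  num_dies: int) -> Tuple[str, str]:
    """Try each level in the order the abstract lists; return (data, source)."""
    for name, level in (("data cache", data_cache),
                        ("core cache", core_cache),
                        ("die cache", die_cache)):
        if addr in level:
            return level[addr], name
    target = home_die(addr, num_dies)
    # No-cache remote read: fetch directly from the die mapped to hold the line.
    return remote_memory[target][addr], f"remote die {target}"
```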

MMU assisted address sanitizer

Providing memory management unit (MMU)-assisted address sanitizing in processor-based devices is disclosed. In one aspect, a processor-based device provides an MMU that includes a last-level page table configured to store page table entry (PTE) tokens for validating memory accesses, as well as fragment order indicators representing a count of page fragments for each memory page in the system memory. Upon receiving a memory access request comprising a pointer token and a virtual address of a memory fragment within a memory page of the system memory, the MMU uses the virtual address and the fragment order indicator of the PTE corresponding to the virtual address to retrieve a PTE token for the virtual address from the last-level page table, and determines whether the PTE token corresponds to the pointer token. If so, the MMU performs the memory access request using the pointer; otherwise, it may raise an exception.
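The token check can be modeled with a one-level lookup. The sketch rests on three assumptions not stated in the abstract: 4 KiB pages, a fragment order k meaning the page is split into 2**k equal fragments, and "corresponds" meaning exact equality of tokens. `LastLevelPte`, `ToyMmu`, and `TokenMismatchError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

PAGE_SIZE = 4096  # assumed 4 KiB pages


class TokenMismatchError(Exception):
    """Raised when a pointer token does not match the fragment's PTE token."""


@dataclass
class LastLevelPte:
    frame: int            # physical frame number
    fragment_order: int   # assumed: page is split into 2**fragment_order fragments
    tokens: List[int]     # one PTE token per fragment


class ToyMmu:
    def __init__(self, page_table: Dict[int, LastLevelPte]) -> None:
        self.page_table = page_table  # virtual page number -> last-level PTE

    def access(self, pointer_token: int, vaddr: int) -> int:
        """Validate the access and return the translated physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pte = self.page_table[vpn]
        fragment_size = PAGE_SIZE >> pte.fragment_order
        pte_token = pte.tokens[offset // fragment_size]
        if pte_token != pointer_token:
            raise TokenMismatchError(
                f"token {pointer_token} != {pte_token} at {vaddr:#x}")
        return pte.frame * PAGE_SIZE + offset
```

With `fragment_order=2`, each 1 KiB fragment of a page carries its own token. A pointer into one fragment that carries the token of a neighboring fragment is therefore rejected.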