
Converting a stream of data using a lookaside buffer
20220283809 · 2022-09-08

A stream of data is accessed from a memory system by an autonomous memory access engine, converted on the fly by the memory access engine, and then presented to a processor for data processing. A portion of a lookup table (LUT) containing converted data elements is preloaded into a lookaside buffer associated with the memory access engine. As the stream of data elements is fetched from the memory system, each data element in the stream is replaced with a respective converted data element obtained from the LUT in the lookaside buffer according to the content of each data element, thereby forming a stream of converted data elements. The stream of converted data elements is then propagated from the memory access engine to a data processor.
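
A minimal software sketch of the conversion path, assuming byte-wide source elements and a 256-entry preloaded LUT portion; all names and widths here are illustrative, not taken from the patent:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LUT_PORTION 256  /* portion of the full LUT preloaded into the buffer */

/* Lookaside buffer holding the preloaded slice of the lookup table. */
static uint16_t lookaside[LUT_PORTION];

/* Preload: copy the relevant portion of the full LUT into the buffer. */
static void preload(const uint16_t *full_lut, size_t offset)
{
    memcpy(lookaside, full_lut + offset, sizeof(lookaside));
}

/* Convert on the fly: replace each element with its LUT entry, indexed
 * by the element's own content, to form the converted stream. */
static void convert_stream(const uint8_t *in, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = lookaside[in[i]];   /* content-indexed replacement */
}

int main(void)
{
    uint16_t full_lut[1024];
    for (int i = 0; i < 1024; i++)
        full_lut[i] = (uint16_t)(i * i);  /* e.g. a squaring table */

    uint8_t raw[4] = { 1, 2, 3, 4 };
    uint16_t converted[4];

    preload(full_lut, 0);
    convert_stream(raw, converted, 4);
    for (int i = 0; i < 4; i++)
        printf("%d -> %d\n", raw[i], converted[i]);
    return 0;
}
```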

Cache preload operations using streaming engine

A stream of data is accessed from a memory system using a stream of addresses generated in a first mode of operating a streaming engine in response to executing a first stream instruction. A block cache preload operation is performed on a cache in the memory system using a block of addresses generated in a second mode of operating the streaming engine in response to executing a second stream instruction.
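
A rough software model of the two modes, with the cache line size assumed to be 64 bytes and a compiler prefetch builtin standing in for the hardware preload; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LINE 64   /* assumed cache line size in bytes */

/* First mode: walk the generated address stream and consume the data,
 * which is delivered to the processor. */
static uint64_t stream_fetch(const uint8_t *base, size_t stride, size_t count)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += base[i * stride];
    return sum;
}

/* Second mode: generate a block of line addresses and only touch them so
 * the lines land in the cache; no data is delivered to the core. The
 * compiler builtin stands in for the hardware preload operation. */
static void block_preload(const uint8_t *base, size_t bytes)
{
    for (size_t off = 0; off < bytes; off += LINE)
        __builtin_prefetch(base + off);
}

int main(void)
{
    static uint8_t buf[4096];
    block_preload(buf, sizeof buf);    /* second stream instruction */
    printf("%llu\n", (unsigned long long)stream_fetch(buf, 1, sizeof buf));
    return 0;
}
```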

Cache structure using a logical directory

Disclosed herein is a method for operating access to a cache memory via an effective address comprising a tag field and a cache line index field. The method comprises splitting the tag field into a first group of bits and a second group of bits. The line index bits and the first group of bits are used to search a set directory, and a set identifier is generated indicating the set containing the respective cache line of the effective address. The set identifier, the line index bits and the second group of bits are then used to search a validation directory. In response to determining the presence of the cache line in the set based on this second search, a hit signal is generated.
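
A simplified model of the two-directory lookup, with invented field widths and the set identifier reduced to a way number for brevity:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define SETS 64
#define WAYS 4

static uint8_t  set_dir[SETS][WAYS];   /* first tag group per entry */
static uint16_t val_dir[SETS][WAYS];   /* second tag group per entry */
static bool     valid[SETS][WAYS];

/* Split the effective address into line index and two tag groups
 * (the field widths are illustrative, not taken from the patent). */
static bool lookup(uint32_t ea, int *way_out)
{
    uint32_t index  = (ea >> 6) & (SETS - 1);
    uint8_t  group1 = (uint8_t)((ea >> 12) & 0xFF);   /* first tag group */
    uint16_t group2 = (uint16_t)((ea >> 20) & 0xFFF); /* second tag group */

    for (int w = 0; w < WAYS; w++) {
        /* First search: the set directory, keyed by index + group1,
         * yields the candidate set/way (the "set identifier"). */
        if (set_dir[index][w] != group1)
            continue;
        /* Second search: the validation directory confirms the line. */
        if (valid[index][w] && val_dir[index][w] == group2) {
            *way_out = w;                             /* hit signal */
            return true;
        }
    }
    return false;
}

int main(void)
{
    uint32_t ea = 0x12345678;
    uint32_t index = (ea >> 6) & (SETS - 1);
    set_dir[index][2] = (uint8_t)((ea >> 12) & 0xFF);
    val_dir[index][2] = (uint16_t)((ea >> 20) & 0xFFF);
    valid[index][2]   = true;

    int way;
    if (lookup(ea, &way))
        printf("hit in way %d\n", way);
    else
        printf("miss\n");
    return 0;
}
```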

Control speculation in dataflow graphs
11385873 · 2022-07-12

Systems, apparatuses and methods may provide for technology that determines that a control loop is to be executed for an unspecified number of iterations and automatically forces the control loop to be executed for a fixed number of iterations in addition to the unspecified number of iterations, where execution of the control loop for the fixed number of iterations is conducted in parallel. In one example, the technology also removes one or more dataflow tokens associated with the execution of the control loop for the fixed number of iterations.
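
A software analogue of the technique, assuming a fixed speculative batch of four iterations; discarding the surplus results past the real loop exit plays the role of removing the extra dataflow tokens:

```c
#include <stdio.h>
#include <stdbool.h>

#define SPEC 4   /* fixed number of additional speculative iterations */

static bool cond(int i) { return i < 10; }   /* trip count unknown up front */
static int  body(int i) { return i * 2; }

int main(void)
{
    for (int i = 0; ; i += SPEC) {
        int results[SPEC];
        for (int k = 0; k < SPEC; k++)       /* runs in parallel in hardware */
            results[k] = body(i + k);
        for (int k = 0; k < SPEC; k++) {
            if (!cond(i + k))                /* the loop actually exited here */
                return 0;                    /* drop the surplus results: the
                                              * "removed" dataflow tokens */
            printf("iteration %d -> %d\n", i + k, results[k]);
        }
    }
}
```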

Apparatus and method for efficient process-based compartmentalization

An apparatus and method for efficient process-based compartmentalization. For example, one embodiment of a processor comprises: execution circuitry to execute instructions and process data; memory management circuitry coupled to the execution circuitry, the memory management circuitry to manage access to a system memory by a plurality of related processes using one or more process-specific translation structures and one or more shared translation structures to be shared by the related processes; and one or more control registers to store a process-specific base address pointer associated with a first process of the plurality of related processes and to store a shared base address pointer to identify the shared translation structures; wherein the memory management circuitry is to use the process-specific base address pointer in combination with a first linear address provided by the first process to walk the process-specific translation structures to identify any permissions and/or physical address associated with the first linear address, wherein if permissions are identified, the memory management circuitry is to use the permissions in place of any permissions specified in the shared translation structures.
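
A sketch of the permission-override behavior under stated assumptions; the walkers are stubbed with toy four-entry tables, and PERM_NONE is an invented marker for a process-specific entry that supplies no override:

```c
#include <stdint.h>
#include <stdio.h>

#define PERM_NONE 0u   /* per-process entry carries no permission override */

typedef struct { uint64_t pfn; uint32_t perms; } entry_t;

static const entry_t proc_table[4] = {       /* process-specific structures */
    { 0x10, PERM_NONE }, { 0x11, 2 }, { 0x12, PERM_NONE }, { 0x13, 1 },
};
static const entry_t shared_table[4] = {     /* shared structures */
    { 0x10, 3 }, { 0x11, 3 }, { 0x12, 3 }, { 0x13, 3 },
};

/* Stub for a hardware page walk; a real walker would traverse the
 * translation structures rooted at a base address pointer in memory. */
static entry_t walk(const entry_t *table, uint64_t la)
{
    return table[la & 0x3];   /* toy 4-entry "structure" */
}

/* The process-specific structures are walked first; permissions found
 * there are used in place of any specified in the shared structures. */
static entry_t translate(uint64_t la)
{
    entry_t e = walk(proc_table, la);
    if (e.perms != PERM_NONE)
        return e;                            /* process-specific override */
    return walk(shared_table, la);           /* fall back to shared perms */
}

int main(void)
{
    for (uint64_t la = 0; la < 4; la++) {
        entry_t e = translate(la);
        printf("la=%llu pfn=0x%llx perms=%u\n",
               (unsigned long long)la, (unsigned long long)e.pfn, e.perms);
    }
    return 0;
}
```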

Hardware cache coherency using physical address proxies

A cache memory subsystem includes a virtually-indexed virtually-tagged first-level data cache (L1D) and a physically-indexed physically-tagged second-level set-associative cache (L2). Each L2 entry is uniquely identified by a set index and a way number. The cache memory subsystem has an inclusive allocation policy. When a snoop request that specifies a physical memory line address hits in an entry in the L2, the cache memory subsystem forwards the snoop request to the L1D but substitutes a physical address proxy (PAP) for the physical memory line address. The PAP specifies the way number and the set index of the hit entry in the L2. To process the forwarded snoop request, the L1D uses N bits of the PAP to select S sets and uses the remaining PAP bits (diminutive PAP) for comparison with a diminutive PAP stored in each valid entry of the S selected sets.
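
An illustrative model of the forwarded snoop, assuming a 64-set, 4-way L1D in which 4 PAP bits supply the low set-index bits, leaving S = 4 candidate sets to probe; the encoding is invented for illustration:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define L1_SETS 64                       /* 6 set-index bits */
#define L1_WAYS 4
#define N_BITS  4                        /* PAP bits forming the low index bits */
#define S       (L1_SETS >> N_BITS)      /* candidate sets to probe: 4 */

static uint16_t dpap_store[L1_SETS][L1_WAYS];  /* diminutive PAP per entry */
static bool     v[L1_SETS][L1_WAYS];

/* Forwarded snoop: the PAP (L2 way number + L2 set index) stands in for
 * the physical memory line address. */
static bool snoop_l1d(uint32_t pap)
{
    uint32_t low  = pap & ((1u << N_BITS) - 1);  /* N PAP bits select sets */
    uint16_t dpap = (uint16_t)(pap >> N_BITS);   /* remaining bits to compare */

    for (uint32_t hi = 0; hi < S; hi++) {        /* unknown upper index bits */
        uint32_t set = (hi << N_BITS) | low;
        for (int w = 0; w < L1_WAYS; w++)
            if (v[set][w] && dpap_store[set][w] == dpap)
                return true;                     /* snoop hit in the L1D */
    }
    return false;
}

int main(void)
{
    uint32_t pap = 0x2A7;                        /* example PAP encoding */
    uint32_t set = (2u << N_BITS) | (pap & 0xF); /* install a matching line */
    v[set][1] = true;
    dpap_store[set][1] = (uint16_t)(pap >> N_BITS);
    printf(snoop_l1d(pap) ? "hit\n" : "miss\n");
    return 0;
}
```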

Methods and systems for translating virtual addresses in a virtual memory based system

An information handling system and method for translating virtual addresses to real addresses including a processor for processing data; memory devices for storing the data; a Page Walk Cache (PWC) for storing page directory entries; and a memory controller configured to control accesses to the memory devices. The processor in an embodiment is configured to send to the memory controller a page directory base and a plurality of memory offsets; and receive from the memory controller and store in the PWC at least one of the page directory entries. The memory controller is configured to: combine a first level page directory entry with a second level memory offset; read from memory a second page directory entry using the first level page directory entry and the second level memory offset; and send to the processor at least one of the page directory entries and a page table entry (PTE).
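
A toy version of the controller-side walk, assuming three levels and page tables held in a flat simulated memory; the index arithmetic is illustrative only:

```c
#include <stdint.h>
#include <stdio.h>

#define LEVELS 3

static uint64_t phys_mem[1 << 12];   /* simulated memory holding the tables */

/* Controller-side walk: combine the current directory entry with the next
 * level's offset to read the following entry; the intermediate entries are
 * returned so the processor can store them in its Page Walk Cache (PWC). */
static uint64_t walk(uint64_t dir_base, const uint64_t offs[LEVELS],
                     uint64_t entries[LEVELS])
{
    uint64_t entry = dir_base;
    for (int lvl = 0; lvl < LEVELS; lvl++) {
        entry = phys_mem[entry + offs[lvl]];  /* entry + offset -> next entry */
        entries[lvl] = entry;                 /* candidates for the PWC */
    }
    return entry;                             /* the final entry is the PTE */
}

int main(void)
{
    /* Build a tiny three-level table: 100 -> 200 -> PTE 0xABC. */
    phys_mem[0 + 5]   = 100;
    phys_mem[100 + 6] = 200;
    phys_mem[200 + 7] = 0xABC;

    uint64_t offs[LEVELS] = { 5, 6, 7 };
    uint64_t entries[LEVELS];
    printf("PTE = 0x%llx\n", (unsigned long long)walk(0, offs, entries));
    return 0;
}
```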

Backward compatibility by restriction of hardware resources

A new device executing an application on a new central processing unit (CPU) determines whether the application is for a legacy device having a legacy CPU. When the new device determines that the application is for the legacy device, it executes the application on the new CPU with selected available resources of the new device restricted to approximate or match a processing behavior of the legacy CPU.
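
A crude sketch of the selection step; the profiles and the particular resources chosen as restrictable are invented for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int freq_mhz;      /* clock the application observes */
    int active_cores;  /* cores exposed to the application */
    int queue_depth;   /* e.g. an adjustable instruction-queue depth */
} resources_t;

static const resources_t native_profile = { 3200, 16, 256 };  /* invented */
static const resources_t legacy_profile = { 1600,  8,  64 };  /* invented */

/* Restrict selected resources of the new device so its processing behavior
 * approximates or matches the legacy CPU when running a legacy application. */
static resources_t select_resources(bool app_is_for_legacy_device)
{
    return app_is_for_legacy_device ? legacy_profile : native_profile;
}

int main(void)
{
    resources_t r = select_resources(true);
    printf("%d MHz, %d cores, queue %d\n",
           r.freq_mhz, r.active_cores, r.queue_depth);
    return 0;
}
```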

Dynamic remapping of virtual address ranges using remap vector

For one or more stages of execution of a software application at a first processor, a remap vector of a second processor is reconfigured to represent a dynamic mapping of virtual address groups to physical address groups for that stage. Each bit position of the remap vector stores a value indicating whether a corresponding virtual address group is actively mapped to a corresponding physical address group. Address translation operations issued during a stage of execution are selectively processed based on the configuration of the remap vector for that stage: the value at the bit position associated with the corresponding virtual address group controls whether processing continues to obtain the virtual-to-physical address translation sought by the operation, or processing ceases and a fault is issued.
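
A software sketch of the gating step, assuming 2 MiB virtual address groups and a 64-bit remap vector; the final translation itself is stubbed out:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define GROUP_SHIFT 21        /* assumed 2 MiB virtual address groups */
#define GROUPS      64        /* one bit per group in a 64-bit vector */

static uint64_t remap_vector; /* reconfigured at each stage of execution */

/* Mark a virtual address group as actively mapped (or not) for this stage. */
static void set_group(unsigned g, bool mapped)
{
    if (mapped) remap_vector |=  (1ULL << g);
    else        remap_vector &= ~(1ULL << g);
}

/* Translation front end: continue only if the group's bit says it is
 * actively mapped; otherwise cease processing and issue a fault. */
static bool translate(uint64_t va, uint64_t *pa_out)
{
    unsigned g = (unsigned)((va >> GROUP_SHIFT) % GROUPS);
    if (!(remap_vector & (1ULL << g))) {
        fprintf(stderr, "fault: group %u not mapped in this stage\n", g);
        return false;
    }
    *pa_out = va;             /* stand-in for the real address translation */
    return true;
}

int main(void)
{
    uint64_t pa;
    set_group(3, true);                            /* stage configuration */
    if (translate((uint64_t)3 << GROUP_SHIFT, &pa))
        printf("translated: 0x%llx\n", (unsigned long long)pa);
    translate((uint64_t)5 << GROUP_SHIFT, &pa);    /* unmapped: faults */
    return 0;
}
```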