G06F12/0806

MULTI-CORE PROCESSOR AND STORAGE DEVICE
20230195688 · 2023-06-22 ·

A multi-core processor includes a plurality of cores, a shared memory, a plurality of address allocators, and a bus. The shared memory has a message queue including a plurality of memory regions for transmitting messages between the plurality of cores. The plurality of address allocators are configured to, each time addresses in a predetermined range corresponding to a reference memory region among the plurality of memory regions are received from a corresponding core among the plurality of cores, control the plurality of memory regions to be accessed in sequence by applying an offset determined according to an access count of the reference memory region to the addresses in the predetermined range. The bus is configured to connect the plurality of cores, the shared memory, and the plurality of address allocators to one another.

ASYNCHRONOUS CACHE FLUSHING
20170357583 · 2017-12-14 ·

Proactive flush logic in a computing system is configured to perform a proactive flush operation to flush data from a first memory in a first computing device to a second memory in response to execution of a non-blocking flush instruction. Reactive flush logic in the computing system is configured to, in response to a memory request issued prior to completion of the proactive flush operation, interrupt the proactive flush operation and perform a reactive flush operation to flush requested data from the first memory to the second memory.

ASYNCHRONOUS CACHE FLUSHING
20170357583 · 2017-12-14 ·

Proactive flush logic in a computing system is configured to perform a proactive flush operation to flush data from a first memory in a first computing device to a second memory in response to execution of a non-blocking flush instruction. Reactive flush logic in the computing system is configured to, in response to a memory request issued prior to completion of the proactive flush operation, interrupt the proactive flush operation and perform a reactive flush operation to flush requested data from the first memory to the second memory.

Memory Coloring for Executing Operations in Concurrent Paths of a Graph Representing a Model
20230195627 · 2023-06-22 ·

An electronic device that handles memory accesses includes a memory and a processor that supports a plurality of streams. The processor acquires a graph that includes paths of operations in a set of operations for processing instances of data through a model, each path of operations including a separate sequence of operations from the set of operations that is to be executed using a respective stream from among the plurality of streams. The processor then identifies concurrent paths in the graph, the concurrent paths being paths of operations between split points at which two or more paths of operations diverge and merge points at which the two or more paths of operations merge. The processor next executes operations in each of the concurrent paths using a respective stream, the executing including using memory coloring for handling memory accesses in the memory for the operations in each concurrent path.

Memory Coloring for Executing Operations in Concurrent Paths of a Graph Representing a Model
20230195627 · 2023-06-22 ·

An electronic device that handles memory accesses includes a memory and a processor that supports a plurality of streams. The processor acquires a graph that includes paths of operations in a set of operations for processing instances of data through a model, each path of operations including a separate sequence of operations from the set of operations that is to be executed using a respective stream from among the plurality of streams. The processor then identifies concurrent paths in the graph, the concurrent paths being paths of operations between split points at which two or more paths of operations diverge and merge points at which the two or more paths of operations merge. The processor next executes operations in each of the concurrent paths using a respective stream, the executing including using memory coloring for handling memory accesses in the memory for the operations in each concurrent path.

VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES

A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.

VARIABLE DISPATCH WALK FOR SUCCESSIVE CACHE ACCESSES

A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.

Generation and use of memory access instruction order encodings

Apparatus and methods are disclosed for controlling execution of memory access instructions in a block-based processor architecture using a hardware structure that indicates a relative ordering of memory access instruction in an instruction block. In one example of the disclosed technology, a method of executing an instruction block having a plurality of memory load and/or memory store instructions includes selecting a next memory load or memory store instruction to execute based on dependencies encoded within the block, and on a store vector that stores data indicating which memory load and memory store instructions in the instruction block have executed. The store vector can be masked using a store mask. The store mask can be generated when decoding the instruction block, or copied from an instruction block header. Based on the encoded dependencies and the masked store vector, the next instruction can issue when its dependencies are available.

Restricted speculative execution mode to prevent observable side effects

Embodiments of methods and apparatuses for restricted speculative execution are disclosed. In an embodiment, a processor includes configuration storage, an execution circuit, and a controller. The configuration storage is to store an indicator to enable a restricted speculative execution mode of operation of the processor, wherein the processor is to restrict speculative execution when operating in restricted speculative execution mode. The execution circuit is to perform speculative execution. The controller to restrict speculative execution by the execution circuit when the restricted speculative execution mode is enabled.