G06F9/3816

DATAFLOW-BASED GENERAL-PURPOSE PROCESSOR ARCHITECTURES
20230333852 · 2023-10-19

A dataflow-based general-purpose processor architecture and associated methods are disclosed. A circuit for the dataflow-based general-purpose processor architecture includes multiple processing elements (PEs) corresponding to multiple assigned central processing unit (CPU) instructions in program order, a register file, and multiple feedforward register lanes configured to map each of the multiple assigned CPU instructions on the multiple PEs to the register file or to another PE of the multiple PEs, thereby constructing a hardware datapath corresponding to a dataflow graph of the multiple assigned CPU instructions. Other aspects, embodiments, and features are also claimed and described.
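
As a rough illustration of the mapping described above (not taken from the patent), the following C sketch assigns one PE per instruction in program order and wires each source operand either to the register file or to the PE that last wrote that register; all structure names and the three-instruction program are hypothetical.

```c
#include <stdio.h>

#define NUM_REGS 8

typedef struct { int dst, src1, src2; } Insn;  /* register numbers only */

static void print_src(int producer_pe, int reg) {
    if (producer_pe < 0) printf("register file r%d", reg);
    else                 printf("PE%d", producer_pe);
}

int main(void) {
    /* r3 = r1 op r2; r4 = r3 op r1; r5 = r4 op r3 (operation kind omitted) */
    Insn prog[] = { {3, 1, 2}, {4, 3, 1}, {5, 4, 3} };
    int n = (int)(sizeof prog / sizeof prog[0]);

    int last_writer[NUM_REGS];           /* PE that last wrote each register */
    for (int r = 0; r < NUM_REGS; r++) last_writer[r] = -1;

    /* One PE per instruction, in program order: each source operand is
     * wired to the register file (-1) or to the PE producing that value. */
    for (int pe = 0; pe < n; pe++) {
        int s1 = last_writer[prog[pe].src1];
        int s2 = last_writer[prog[pe].src2];
        printf("PE%d: src1 <- ", pe); print_src(s1, prog[pe].src1);
        printf(", src2 <- ");         print_src(s2, prog[pe].src2);
        printf("\n");
        last_writer[prog[pe].dst] = pe;  /* later PEs read from this PE */
    }
    return 0;
}
```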

SYSTEM FOR MANAGING A GROUP OF ROTATING REGISTERS DEFINED ARBITRARILY IN A PROCESSOR REGISTER FILE
20230315472 · 2023-10-05

A processor core including an N-bit system memory interface; a register file comprising a plurality of general purpose registers of capacity less than N bits; a set of N-bit vector registers; in its instruction set, a register manipulation instruction executable with the following parameters: a) a value defining in the set of vector registers a buffer area formed by a plurality of consecutive vector registers, and b) a reference to a first general purpose register, the first general purpose register containing an index identifying a vector register within the buffer area; and an execution unit configured to, upon execution of a register manipulation instruction, read or write, in one cycle, N bits in a vector register identified from the value defining the buffer area and the index contained in the first general purpose register.
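
As a rough sketch of the addressing behavior described above, assuming the index simply wraps around within the buffer area (the names and the modulo rule are illustrative, not from the patent):

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { unsigned base, count; } VecBuffer;  /* consecutive vregs */

/* Resolve which vector register the general purpose register's index
 * selects, rotating within the buffer area. */
static unsigned resolve_vreg(VecBuffer buf, uint64_t gpr_index) {
    return buf.base + (unsigned)(gpr_index % buf.count);
}

int main(void) {
    VecBuffer buf = { .base = 4, .count = 3 };  /* buffer area: v4..v6 */
    for (uint64_t i = 0; i < 5; i++)
        printf("index %llu -> v%u\n", (unsigned long long)i,
               resolve_vreg(buf, i));
    return 0;
}
```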

DECENTRALIZED HOT CACHE LINE TRACKING FAIRNESS MECHANISM

Embodiments provide a decentralized hot cache line tracking fairness mechanism. In response to receiving an incoming request to access a cache line, a determination is made whether to grant access to the cache line based on a requested state and a serviced state maintained for the cache line in a structure comprising the requested and serviced states. In response to the determination to grant access to the cache line, the requested state and the serviced state are transferred along with the data of the cache line.
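
A minimal C model of how such a requested/serviced state pair could arbitrate access; the bitmask encoding and the round-reset rule are assumptions for illustration, not details from the abstract:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t requested;  /* bit i set: requester i has asked for the line */
    uint32_t serviced;   /* bit i set: requester i was already served     */
} LineTracking;

/* Grant only if this requester has not yet been serviced in the current
 * round; an already-serviced requester defers to unserved ones. */
static bool grant(LineTracking *t, unsigned req) {
    t->requested |= 1u << req;
    if (t->serviced & (1u << req))
        return false;
    t->serviced |= 1u << req;
    return true;
}

/* Once every current requester has been serviced, the round is over.
 * (Per the abstract, these states travel with the cache line's data.) */
static void maybe_reset(LineTracking *t) {
    if ((t->requested & ~t->serviced) == 0)
        t->requested = t->serviced = 0;
}

int main(void) {
    LineTracking t = { 0, 0 };
    printf("req0: %s\n", grant(&t, 0) ? "granted" : "deferred");
    printf("req0: %s\n", grant(&t, 0) ? "granted" : "deferred"); /* deferred */
    printf("req1: %s\n", grant(&t, 1) ? "granted" : "deferred");
    maybe_reset(&t);  /* round complete: fairness state cleared */
    printf("req0: %s\n", grant(&t, 0) ? "granted" : "deferred");
    return 0;
}
```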

CASTOUT HANDLING IN A DISTRIBUTED CACHE TOPOLOGY

Castout handling in a distributed cache topology, including: detecting, by a first cache of a plurality of caches, a cache miss; providing, by the first cache to each other cache of the plurality of caches, a message comprising: data indicating a cache address corresponding to the cache miss; and data indicating a cache line to be evicted.
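
A toy C sketch of the miss message described above, with hypothetical types: on a miss, the first cache broadcasts both the missed address and the castout victim to each peer cache.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_CACHES 4

typedef struct {
    uint64_t miss_addr;    /* cache address corresponding to the miss  */
    uint64_t victim_addr;  /* cache line chosen for eviction (castout) */
} CastoutMsg;

static void receive(int cache_id, const CastoutMsg *m) {
    /* A peer might claim the castout or service the miss; here we log. */
    printf("cache %d: miss@%#llx, victim@%#llx\n", cache_id,
           (unsigned long long)m->miss_addr,
           (unsigned long long)m->victim_addr);
}

int main(void) {
    CastoutMsg msg = { .miss_addr = 0x1000, .victim_addr = 0x2040 };
    for (int c = 1; c < NUM_CACHES; c++)   /* cache 0 broadcasts to peers */
        receive(c, &msg);
    return 0;
}
```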

COPROCESSOR OPERATION BUNDLING

In an embodiment, a processor includes a buffer in an interface unit. The buffer may be used to accumulate coprocessor instructions to be transmitted to a coprocessor. In an embodiment, the processor issues the coprocessor instructions to the buffer when they are ready to be issued to the coprocessor. The interface unit may accumulate the coprocessor instructions in the buffer, generating a bundle of instructions. The bundle may be closed based on various predetermined conditions, and then the bundle may be transmitted to the coprocessor. If a sequence of coprocessor instructions appears consecutively in a program, the rate at which the instructions are provided to the coprocessor at least matches, on average, the rate at which the coprocessor consumes the instructions, in an embodiment.
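
A simple C model of the bundling flow, assuming a fixed-capacity bundle that closes when full or when explicitly flushed; the bundle size and the close conditions are illustrative only:

```c
#include <stdint.h>
#include <stdio.h>

#define BUNDLE_CAP 4

typedef struct {
    uint32_t insn[BUNDLE_CAP];
    int count;
} Bundle;

static void transmit(Bundle *b) {
    printf("sending bundle of %d instruction(s)\n", b->count);
    b->count = 0;
}

/* Issue one coprocessor instruction into the interface-unit buffer;
 * close the bundle when it fills (other close conditions, e.g. an
 * intervening non-coprocessor event, are omitted here). */
static void issue(Bundle *b, uint32_t insn) {
    b->insn[b->count++] = insn;
    if (b->count == BUNDLE_CAP)
        transmit(b);
}

int main(void) {
    Bundle b = { .count = 0 };
    for (uint32_t i = 0; i < 6; i++)
        issue(&b, 0xC0000000u | i);
    if (b.count > 0)   /* flush a partial bundle on a close condition */
        transmit(&b);
    return 0;
}
```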

METHOD AND APPARATUS FOR USING A STORAGE SYSTEM AS MAIN MEMORY
20230153243 · 2023-05-18

A data access system includes a processor, multiple cache modules that function as the main memory, and a storage drive. Each cache module includes a final level cache (FLC) controller and a main memory cache. The processor sends read/write requests (with a physical address) to the cache module. The cache module includes two or more stages, each stage comprising an FLC controller and DRAM (with an associated controller). If the first-stage FLC module does not contain the physical address, the request is forwarded to a second-stage FLC module. If the second-stage FLC module does not contain the physical address, the request is forwarded to a partition of the storage drive reserved for main memory. The first-stage FLC module provides high-speed, low-power operation, while the second-stage FLC module is a low-cost implementation. Multiple FLC modules may connect to the processor in parallel.
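
The cascaded lookup can be pictured with the following C sketch; the stage structures, toy hit functions, and names are hypothetical stand-ins for the FLC controllers and DRAM described above:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {              /* stand-in for an FLC controller + DRAM */
    const char *name;
    bool (*lookup)(uint64_t phys_addr);
} FlcStage;

static bool stage1_lookup(uint64_t a) { return (a & 0xFF) == 0; } /* toy */
static bool stage2_lookup(uint64_t a) { return (a & 0x0F) == 0; } /* toy */

/* Try the fast first stage, then the low-cost second stage, then fall
 * through to the main-memory partition of the storage drive. */
static void read_request(FlcStage *s1, FlcStage *s2, uint64_t addr) {
    if (s1->lookup(addr))
        printf("%#llx: hit in %s\n", (unsigned long long)addr, s1->name);
    else if (s2->lookup(addr))
        printf("%#llx: hit in %s\n", (unsigned long long)addr, s2->name);
    else
        printf("%#llx: fetched from storage-drive partition\n",
               (unsigned long long)addr);
}

int main(void) {
    FlcStage s1 = { "stage-1 FLC (fast, low power)", stage1_lookup };
    FlcStage s2 = { "stage-2 FLC (low cost)",        stage2_lookup };
    read_request(&s1, &s2, 0x1000);
    read_request(&s1, &s2, 0x1010);
    read_request(&s1, &s2, 0x1013);
    return 0;
}
```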

WRITE CONTROL FOR READ-MODIFY-WRITE OPERATIONS IN CACHE MEMORY

In described examples, a processor system includes a processor core that generates memory write requests, and a cache memory with a memory controller having a memory pipeline. The cache memory has cache lines of length L and a minimum write length that is less than L. The memory pipeline determines whether the data payload of a write request includes a first chunk and error correction code (ECC) syndrome that correspond to a partial write and are writable by a first cache write operation, and a second chunk and ECC syndrome that correspond to a full write operation that can be performed separately from the first cache write operation. The memory pipeline performs a read-modify-write (RMW) operation to store the first chunk and ECC syndrome in the cache memory, and performs the full write operation to store the second chunk and ECC syndrome in the cache memory.
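
One way to picture the split is the C sketch below, which divides a write into ECC-chunk-aligned full writes and leading/trailing partial writes that need an RMW so the ECC syndrome can be recomputed over the whole chunk; the 16-byte chunk size is an assumption for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK 16  /* bytes covered by one ECC syndrome (illustrative) */

static void plan_write(uint64_t addr, uint64_t len) {
    uint64_t head = addr % CHUNK;
    if (head) {                 /* leading partial chunk -> RMW */
        uint64_t part = (CHUNK - head < len) ? CHUNK - head : len;
        printf("RMW partial write: %llu byte(s) at %#llx\n",
               (unsigned long long)part, (unsigned long long)addr);
        addr += part; len -= part;
    }
    while (len >= CHUNK) {      /* aligned full chunks -> direct write */
        printf("full write: %d bytes at %#llx\n", CHUNK,
               (unsigned long long)addr);
        addr += CHUNK; len -= CHUNK;
    }
    if (len)                    /* trailing partial chunk -> RMW */
        printf("RMW partial write: %llu byte(s) at %#llx\n",
               (unsigned long long)len, (unsigned long long)addr);
}

int main(void) {
    plan_write(0x104, 28);  /* 12-byte RMW head, then one full 16-byte chunk */
    return 0;
}
```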

HYBRID ALLOCATION OF DATA LINES IN A STREAMING CACHE MEMORY
20230359560 · 2023-11-09

Various embodiments include a system for managing cache memory in a computing system. The system includes a sectored cache memory that provides a mechanism for sharing the sectors in a cache line among multiple cache line allocations. Traditionally, different cache line allocations are assigned to different cache lines in the cache memory, and an allocation may not use all of the sectors of its cache line, leading to low utilization of the cache memory. With the present techniques, multiple allocations can share the same cache line, improving cache memory utilization relative to prior techniques. Further, sectors of cache allocations can be assigned so as to reduce data bank conflicts when accessing the cache memory. Reducing such data bank conflicts can improve memory access performance, even when cache lines are shared among multiple allocations.
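
A toy C allocator illustrating sector sharing, where a new allocation first claims free sectors in a partially used line before opening a new one; the line count, sector count, and placement policy are invented for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define SECTORS_PER_LINE 4
#define NUM_LINES 2

typedef struct { uint8_t used; } Line;   /* one bit per sector */

/* Claim n sectors, scanning lines in order so that a partly used line
 * is shared before a fresh line is opened. */
static int alloc_sectors(Line *lines, int n, int alloc_id) {
    for (int l = 0; l < NUM_LINES; l++) {
        int free = 0;
        for (int s = 0; s < SECTORS_PER_LINE; s++)
            if (!(lines[l].used & (1 << s))) free++;
        if (free >= n) {
            for (int s = 0; s < SECTORS_PER_LINE && n; s++)
                if (!(lines[l].used & (1 << s))) {
                    lines[l].used |= 1 << s;
                    printf("alloc %d -> line %d sector %d\n", alloc_id, l, s);
                    n--;
                }
            return l;
        }
    }
    return -1;  /* a real cache would evict to make room */
}

int main(void) {
    Line lines[NUM_LINES] = { {0}, {0} };
    alloc_sectors(lines, 2, 0);  /* uses half of line 0 */
    alloc_sectors(lines, 2, 1);  /* shares line 0 instead of opening line 1 */
    return 0;
}
```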

PREDICTION CLASS DETERMINATION

There is provided an apparatus, method and medium. The apparatus comprises processing circuitry to perform data processing in response to decoded instructions, and prediction circuitry to generate a prediction of a number of iterations of a fetching process. The fetching process is used to control fetching of data or instructions to be used in processing operations that are predicted to be performed by the processing circuitry. The processing circuitry is configured to tolerate performing one or more unnecessary iterations of the fetching process following an over-prediction of the number of iterations. The prediction circuitry is configured, for at least one prediction, to determine a class of a plurality of prediction classes, each of which corresponds to a range of numbers of iterations, and is also arranged to signal a predetermined number of iterations associated with that class to the processing circuitry to trigger at least the predetermined number of iterations of the fetching process.
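
The class-based quantization can be sketched as follows in C; the class ranges are invented, and the sketch signals the top of each range on the assumption that over-fetching is tolerated while the signalled value acts as a minimum:

```c
#include <stdio.h>

typedef struct { int lo, hi; } PredClass;   /* range of iteration counts */

static const PredClass classes[] = { {1, 2}, {3, 8}, {9, 32}, {33, 128} };
#define NUM_CLASSES (sizeof classes / sizeof classes[0])

/* Map a raw prediction to its class and return the predetermined
 * iteration count signalled for that class (top of the range here,
 * since unnecessary extra iterations are tolerated). */
static int signalled_iterations(int predicted) {
    for (unsigned c = 0; c < NUM_CLASSES; c++)
        if (predicted <= classes[c].hi)
            return classes[c].hi;
    /* Above all classes: clamp to the largest (a real design might
     * handle this case differently). */
    return classes[NUM_CLASSES - 1].hi;
}

int main(void) {
    int preds[] = { 1, 5, 20, 200 };
    for (unsigned i = 0; i < 4; i++)
        printf("predicted %3d -> signal %3d iterations\n",
               preds[i], signalled_iterations(preds[i]));
    return 0;
}
```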

TREATING MULTIPLE CACHE LINES AS A MERGED CACHE LINE TO STORE MULTIPLE BLOCKS OF DATA

Apparatus, method and code for fabrication of the apparatus, the apparatus comprising: a cache providing a plurality of cache lines, each cache line storing a block of data; cache access control circuitry, responsive to an access request, to determine whether a hit condition is present in the cache; and cache configuration control circuitry to set, in response to a merging trigger event, merge indication state identifying multiple cache lines to be treated as a merged cache line storing multiple blocks of data. When the merge indication state indicates that a given cache line is part of the merged cache line, the cache access control circuitry is responsive to detecting the hit condition for the given cache line to allow access to any of the blocks of data stored in the multiple cache lines forming the merged cache line.
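
A minimal C sketch of the merged-line behavior, pairing adjacent lines for illustration (the pairing rule and structures are hypothetical, not from the patent): a hit on either line of a merged pair permits access to both stored blocks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_LINES 4

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     merged;   /* set by cache configuration control circuitry
                          on a merging trigger event */
} CacheLine;

/* On a hit, a merged line also exposes its partner's block
 * (partner = i ^ 1 in this sketch's pairing rule). */
static void access_line(CacheLine *lines, uint64_t tag) {
    for (int i = 0; i < NUM_LINES; i++) {
        if (lines[i].valid && lines[i].tag == tag) {
            printf("hit on line %d", i);
            if (lines[i].merged)
                printf(" (merged: line %d's block also accessible)", i ^ 1);
            printf("\n");
            return;
        }
    }
    printf("miss for tag %#llx\n", (unsigned long long)tag);
}

int main(void) {
    CacheLine lines[NUM_LINES] = {
        { 0xA, true, true }, { 0xB, true, true },   /* merged pair 0/1 */
        { 0xC, true, false }, { 0, false, false },
    };
    access_line(lines, 0xA);
    access_line(lines, 0xC);
    return 0;
}
```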