G06F12/0215

Write merging on stores with different tags

Techniques for caching data are provided that include receiving, by a caching system, a first write memory command for a memory address, the first write memory command associated with a first color tag, determining, by a first sub-cache of the caching system, that the memory address is not cached in the first sub-cache, determining, by a second sub-cache of the caching system, that the memory address is not cached in the second sub-cache, storing first data associated with the first write memory command in a cache line of the second sub-cache, storing the first color tag in the second sub-cache, receiving a second write memory command for the cache line, the second write memory command associated with a second color tag, merging the second color tag with the first color tag, storing the merged color tag, and evicting the cache line based on the merged color tag.
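
The tag-merging step can be illustrated with a small sketch. The abstract does not say how color tags are encoded or merged, so this sketch assumes a bitmask encoding merged with bitwise OR; `WriteMissBuffer`, `Line`, and `evict_color` are hypothetical names standing in for the second sub-cache, not the patented design.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    data: dict = field(default_factory=dict)  # byte offset -> value
    color: int = 0                            # merged color tag (assumed bitmask)


class WriteMissBuffer:
    """Toy stand-in for the second sub-cache (hypothetical)."""

    def __init__(self):
        self.lines = {}  # line address -> Line

    def write(self, line_addr, offset, value, color):
        # Allocate the line on the first write, reuse it on later writes.
        line = self.lines.setdefault(line_addr, Line())
        line.data[offset] = value
        line.color |= color  # assumption: merging tags is a bitwise OR
        return line.color

    def evict_color(self, color):
        """Evict every line whose merged tag includes `color`."""
        hits = [addr for addr, line in self.lines.items() if line.color & color]
        return {addr: self.lines.pop(addr) for addr in hits}
```

Under these assumptions, two writes to the same line with tags `0b01` and `0b10` leave a merged tag of `0b11`, so the line is evicted when either color is drained.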

Methods and apparatus to facilitate atomic compare and swap in cache for a coherent level 1 data cache system

Methods, apparatus, systems and articles of manufacture to facilitate atomic compare and swap in cache for a coherent level 1 data cache system are disclosed. An example system includes a cache storage; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a key, a memory address, and a first set of data; retrieve a second set of data corresponding to the memory address; compare the second set of data to the key; based on the second set of data corresponding to the key, cause the first set of data to be stored at the memory address; and based on the second set of data not corresponding to the key, complete the memory operation without causing the first set of data to be stored at the memory address.
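
The compare-and-swap behavior described above can be sketched in software as follows. This is only a model: a lock stands in for the serialization a cache controller would provide, and returning the previous value is an assumption not stated in the abstract. `CacheModel` is a hypothetical name.

```python
import threading


class CacheModel:
    """Toy model of a cache storage plus controller (hypothetical)."""

    def __init__(self, contents):
        self.storage = dict(contents)   # memory address -> data
        self._lock = threading.Lock()   # models the controller's atomicity

    def compare_and_swap(self, address, key, new_data):
        with self._lock:
            current = self.storage[address]       # retrieve the second set of data
            if current == key:                    # compare it to the key
                self.storage[address] = new_data  # match: store the first set of data
            # No match: complete without storing anything.
            return current                        # assumption: old value is returned
```

A caller can tell whether the swap happened by checking that the returned value equals the key it supplied.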

SYSTEMS AND METHODS FOR REVISING PERMANENT ROM-BASED PROGRAMMING
20230281111 · 2023-09-07

An application program stored in a ROM includes a function lookup data structure in which each function called by the application program has an identifier and a memory address at which the function is located and can be executed. Upon startup, the function lookup data structure is copied to a RAM as a revised lookup data structure and is compared to a revision lookup data structure also written to that RAM or elsewhere. If the revision lookup data structure contains replacement functions having the same function identifiers but new memory addresses, these new memory addresses are written over the existing addresses in the revised lookup data structure for those replacement functions. The application program refers to the revised lookup data structure to find and execute the functions; thus the original application program on the ROM can continue to be used with revised functions.
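
The startup patching step can be sketched as a table copy followed by selective overwrites. Dictionaries stand in for the lookup data structures, and the identifiers and addresses used in the usage note are made up for illustration. Ignoring revision entries whose identifier is absent from the ROM table is an assumption; the abstract only covers matching identifiers.

```python
def build_revised_lookup(rom_lookup, revision_lookup):
    """Return the RAM copy of the ROM table with replacement addresses applied."""
    revised = dict(rom_lookup)  # copy the ROM table into RAM; the ROM is untouched
    for identifier, new_address in revision_lookup.items():
        if identifier in revised:              # same function identifier
            revised[identifier] = new_address  # overwrite with the new address
        # Assumption: identifiers unknown to the ROM table are ignored.
    return revised
```

For example, with a ROM table `{"init": 0x1000, "read": 0x1040}` and a revision table `{"read": 0x8000}`, the revised table resolves `"read"` to `0x8000` while `"init"` still points into the ROM.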

METHODS AND APPARATUS TO FACILITATE READ-MODIFY-WRITE SUPPORT IN A VICTIM CACHE

Methods, apparatus, systems and articles of manufacture are disclosed to facilitate read-modify-write support in a victim cache. An example apparatus includes a first storage coupled to a controller, a second storage coupled to the controller and parallel coupled to the first storage, and a storage queue coupled to the first storage, the second storage, and to the controller, the storage queue to obtain a memory operation from the controller indicating an address and a first set of data, obtain a second set of data associated with the address from at least one of the first storage and the second storage, merge the first set of data and the second set of data to produce a third set of data, and provide the third set of data for writing to at least one of the first storage and the second storage.
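
The merge at the heart of a read-modify-write can be sketched as a byte-wise select between old and new data. The byte-enable mask is an assumption made for this illustration; the abstract only states that the two sets of data are merged into a third.

```python
def merge_bytes(old: bytes, new: bytes, byte_enable: int) -> bytes:
    """Merge `new` into `old`; bit i of `byte_enable` selects byte i of `new`."""
    if len(old) != len(new):
        raise ValueError("old and new data must have the same length")
    return bytes(
        n if (byte_enable >> i) & 1 else o
        for i, (o, n) in enumerate(zip(old, new))
    )


# Bytes 0 and 2 come from the new data, bytes 1 and 3 keep the old data.
merged = merge_bytes(b"\x11\x22\x33\x44", b"\xaa\xbb\xcc\xdd", 0b0101)
```

The merged word (the third set of data) is what would then be written back to one of the storages.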

Methods and apparatus to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding

Methods, apparatus, systems and articles of manufacture are disclosed to facilitate fully pipelined read-modify-write support in level 1 data cache using store queue and data forwarding. An example apparatus includes a first storage, a second storage, a store queue coupled to the first storage and the second storage, the store queue operable to receive a first memory operation specifying a first set of data, process the first memory operation for storing the first set of data in at least one of the first storage and the second storage, receive a second memory operation, and prior to storing the first set of data in the at least one of the first storage and the second storage, feed back the first set of data for use in the second memory operation.
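
Data forwarding from a store queue can be sketched as follows: a later operation to the same address sees data that is still queued and has not yet reached storage. This is a simplified single-storage model; `StoreQueue`, `drain_one`, and the newest-entry-wins rule are assumptions for illustration.

```python
from collections import deque


class StoreQueue:
    """Toy store queue in front of one backing storage (hypothetical)."""

    def __init__(self, storage):
        self.storage = storage   # address -> data
        self.pending = deque()   # queued (address, data) writes, oldest first

    def store(self, address, data):
        self.pending.append((address, data))

    def load(self, address):
        # Forward from the newest matching pending write, if there is one.
        for pending_address, data in reversed(self.pending):
            if pending_address == address:
                return data
        return self.storage.get(address)

    def drain_one(self):
        # Commit the oldest pending write to storage.
        if self.pending:
            address, data = self.pending.popleft()
            self.storage[address] = data
```

A load issued between `store` and `drain_one` returns the queued value even though the backing storage still holds the old one.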

ATOMIC OPERATIONS AND HISTOGRAM OPERATIONS IN A CACHE PIPELINE

Methods, apparatus, systems and articles of manufacture to facilitate an atomic operation and/or a histogram operation in a cache pipeline are disclosed. An example system includes a cache storage coupled to an arithmetic component; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.
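
The histogram operation amounts to counting how often each value occurs and returning the counts as a vector. A minimal sketch, assuming the values are small non-negative integers used directly as bin indices (the `num_bins` parameter is an assumption of this illustration):

```python
def histogram_vector(data, num_bins):
    """Return a list where entry v is the number of times v occurs in `data`."""
    counts = [0] * num_bins
    for value in data:
        if not 0 <= value < num_bins:
            raise ValueError(f"value {value} is outside the bin range")
        counts[value] += 1
    return counts
```

For instance, the data `[0, 2, 2, 3]` with four bins yields the vector `[1, 0, 2, 1]`.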

Methods and apparatus to facilitate read-modify-write support in a victim cache

Methods, apparatus, systems and articles of manufacture are disclosed to facilitate read-modify-write support in a victim cache. An example apparatus includes a first storage coupled to a controller, a second storage coupled to the controller and parallel coupled to the first storage, and a storage queue coupled to the first storage, the second storage, and to the controller, the storage queue to obtain a memory operation from the controller indicating an address and a first set of data, obtain a second set of data associated with the address from at least one of the first storage and the second storage, merge the first set of data and the second set of data to produce a third set of data, and provide the third set of data for writing to at least one of the first storage and the second storage.

Methods and apparatus for inflight data forwarding and invalidation of pending writes in store queue

Methods, apparatus, systems and articles of manufacture are disclosed to forward and invalidate inflight data in a store queue. An example apparatus includes a cache storage, a cache controller coupled to the cache storage and operable to receive a first memory operation, determine that the first memory operation corresponds to a read miss in the cache storage, determine a victim address in the cache storage to evict in response to the read miss, issue a read-invalidate command that specifies the victim address, compare the victim address to a set of addresses associated with a set of memory operations being processed by the cache controller, and in response to the victim address matching a first address of the set of addresses corresponding to a second memory operation of the set of memory operations, provide data associated with the second memory operation.
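
The address comparison can be sketched as a scan over the operations still in flight. The abstract says that matching data is provided; marking the matching pending write invalid is inferred from the title only, and the newest-match-wins rule is an assumption. `PendingWrite` and `read_invalidate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    address: int
    data: int
    valid: bool = True


def read_invalidate(victim_address, cached_data, pending):
    """Return the data to evict for `victim_address`, preferring inflight writes."""
    data = cached_data
    for op in pending:  # oldest to newest, so the newest match wins
        if op.valid and op.address == victim_address:
            data = op.data    # forward the inflight data
            op.valid = False  # assumption: the pending write is invalidated
    return data
```

With no matching pending write, the function simply returns the data already held in the cache storage for the victim line.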

Memory device and operation method thereof

A memory device and an operation method thereof are provided. The memory device includes: a plurality of page buffers storing input data; a plurality of memory planes coupled to the page buffers, a plurality of weights being stored in the memory planes based on received addresses of the memory planes, the memory planes performing bit multiplication on the weights and the input data in the page buffers in parallel to generate a plurality of bit multiplication results in parallel, the bit multiplication results being stored back to the page buffers; and at least one accumulation circuit coupled to the page buffers, for performing bit accumulation on the bit multiplication results of the memory planes in parallel or sequentially to generate a multiply-accumulate (MAC) operation result.
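
The arithmetic behind bit multiplication followed by bit accumulation can be sketched as below. The sketch assumes unsigned operands of at most `bits` bits, models each bit multiplication as a logical AND, and weights each partial product by its bit positions during accumulation; the hardware partitioning across planes and page buffers is not modeled.

```python
def bit_mac(inputs, weights, bits=8):
    """Multiply-accumulate built from single-bit products (unsigned operands)."""
    total = 0
    for x, w in zip(inputs, weights):
        for i in range(bits):       # bit i of the input
            for j in range(bits):   # bit j of the weight
                bit_product = ((x >> i) & 1) & ((w >> j) & 1)  # bit multiplication
                total += bit_product << (i + j)                # bit accumulation
    return total
```

For operands that fit in `bits` bits, the result equals the ordinary sum of products, for example `bit_mac([3, 5], [2, 4])` equals `3*2 + 5*4`.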

MEMORY-NETWORK PROCESSOR WITH PROGRAMMABLE OPTIMIZATIONS

Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.