G06F9/3816

Tag processing for external caches
11726920 · 2023-08-15 · ·

A device includes a cache memory and a memory controller coupled to the cache memory. The memory controller is configured to receive a first read request from a cache controller over an interconnect, the first read request comprising first tag data identifying a first cache line in the cache memory, and determine that the first read request comprises a tag read request. The memory controller is further configured to read second tag data corresponding to the tag read request from the cache memory, compare the second tag data read from the cache memory to the first tag data received from the cache controller with the first read request, and if the second tag data matches the first tag data, initiate an action with respect to the first cache line in the cache memory.

PIPELINED READ-MODIFY-WRITE OPERATIONS IN CACHE MEMORY

Providing memory bandwidth compression using compressed memory controllers (CMCs) in a central processing unit (CPU)-based system is disclosed. In this regard, in some aspects, a CMC is configured to receive a memory read request to a physical address in a system memory, and read a compression indicator (CI) for the physical address from a master directory and/or from error correcting code (ECC) bits of the physical address. Based on the CI, the CMC determines a number of memory blocks to be read for the memory read request, and reads the determined number of memory blocks. In some aspects, a CMC is configured to receive a memory write request to a physical address in the system memory, and generate a CI for write data based on a compression pattern of the write data. The CMC updates the master directory and/or the ECC bits of the physical address with the generated CI.

CIRCUIT, SYSTEM, AND METHOD FOR MATRIX DECIMATION

A method is described herein. The method generally includes fetching a set of data from a memory coupled to a memory controller. The method generally includes determining a first subset of data from the set of data. The method generally includes determining a second subset of data from the set of data. The method generally includes determining a first element from the set of data. The method generally includes providing a vector including the first subset, the first element, and the second subset, wherein each element of the first subset is disposed in one portion of the vector and each element of the second subset is disposed in another portion of the vector. The method generally includes storing the vector into a register of the memory controller.

WIDENING MEMORY ACCESS TO AN ALIGNED ADDRESS FOR UNALIGNED MEMORY OPERATIONS

Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.

Writing zero data

Apparatuses, methods of operating apparatuses, interconnects for connecting apparatuses to one another, and methods of operating the interconnects are disclosed. A master apparatus can issue an individual all-zero-data write transaction specifying a data storage location to the interconnect, which conveys the individual all-zero-data write transaction to a target device which writes all-zero-data at the data storage location. No write data is conveyed with the individual all-zero-data write transaction, so that the individual all-zero-data write transaction may be used to clear the data storage location without adding to congestion of a write data channel in the interconnect.

Branch prediction throughput by skipping over cachelines without branches

According to one general aspect, an apparatus may include a branch prediction circuit configured to predict if a branch instruction will be taken or not. The apparatus may include a branch target buffer circuit configured to store a memory segment empty flag that indicates whether or not the memory segment after a target address includes at least one other branch instruction, wherein the memory segment empty flag was created during a commit stage of a prior occurrence of the branch instruction. The branch prediction circuit may be configured to skip over the memory segment if the memory segment empty flag indicates a lack of other branch instruction(s).

HARDWARE SECURE ELEMENT, RELATED PROCESSING SYSTEM, INTEGRATED CIRCUIT, AND DEVICE

A hardware secure element includes a processing unit and a receiver circuit configured to receive data comprising a command field and a parameter field adapted to contain a plurality of parameters. The hardware secure element also includes at least one hardware parameter check module configured to receive at an input a parameter to be processed selected from the plurality of parameters, and to process the parameter to be processed to verify whether the parameter has given characteristics. The hardware parameter check module has associated one or more look-up tables configured to receive at an input the command field and a parameter index identifying the parameter to be processed by the hardware parameter check module, and to determine for the command field and the parameter index a configuration data element.

PAGE MODIFICATION ENCODING AND CACHING
20210349828 · 2021-11-11 ·

Modifying a page stored in a non-volatile storage includes receiving one or more requests to modify data stored in the page with new data. One or more lines are identified in the page that include data to be modified by the one or more requests. The identified one or more lines correspond to one or more respective byte ranges each of a predetermined size in the page. Encoded data is created based on the new data and respective locations of the one or more identified lines in the page. The encoded data is cached, and at least a portion of the cached encoded data is used to rewrite the page in the non-volatile storage to include at least a portion of the new data.

DIRECT SWAP CACHING WITH ZERO LINE OPTIMIZATIONS

Systems and methods related to direct swap caching with zero line optimizations are described. A method for managing a system having a near memory and a far memory comprises receiving a request from a requestor to read a block of data that is either stored in the near memory or the far memory. The method includes analyzing a metadata portion associated with the block of data, the metadata portion comprising: both (1) information concerning whether the near memory contains the block of data or whether the far memory contains the block of data and (2) information concerning whether a data portion associated with the block of data is all zeros. The method further includes instead of retrieving the data portion from the far memory, synthesizing the data portion corresponding to the block of data to generate a synthesized data portion and transmitting the synthesized data portion to the requestor.

Widening memory access to an aligned address for unaligned memory operations

Unaligned atomic memory operations on a processor using a load-store instruction set architecture (ISA) that requires aligned accesses are performed by widening the memory access to an aligned address by the next larger power of two (e.g., 4-byte access is widened to 8 bytes, and 8-byte access is widened to 16 bytes). Data processing operations supported by the load-store ISA including shift, rotate, and bitfield manipulation are utilized to modify only the bytes in the original unaligned address so that the atomic memory operations are aligned to the widened access address. The aligned atomic memory operations using the widened accesses avoid the faulting exceptions associated with unaligned access for most 4-byte and 8-byte accesses. Exception handling is performed in cases in which memory access spans a 16-byte boundary.