Patent classifications
G06F12/0886
SLOT/SUB-SLOT PREFETCH ARCHITECTURE FOR MULTIPLE MEMORY REQUESTORS
A prefetch unit generates a prefetch address in response to an address associated with a memory read request received from a first or second cache. The prefetch unit includes a prefetch buffer arranged to store the prefetch address in an address buffer of a selected slot of the prefetch buffer, where each slot of the prefetch buffer includes an address buffer for storing a prefetch address and two sub-slots. Each sub-slot includes a data buffer for storing data that is prefetched using the prefetch address stored in the slot, and one of the two sub-slots is selected in response to a portion of the generated prefetch address. A subsequent hit on the prefetch buffer returns the prefetched data to the requestor in response to a memory read request received after the initial memory read request.
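A minimal Python sketch may make the slot/sub-slot arrangement concrete; the class names, slot count, and the single address bit used for sub-slot selection below are illustrative assumptions, not details from the abstract.

```python
# Toy model of a slot/sub-slot prefetch buffer (names and sizes assumed).
SUBSLOT_BIT = 6   # assumed: address bit that selects one of the two sub-slots

class PrefetchSlot:
    def __init__(self):
        self.address = None        # address buffer shared by the whole slot
        self.data = [None, None]   # two sub-slots, each a data buffer

class PrefetchBuffer:
    def __init__(self, num_slots=8):
        self.slots = [PrefetchSlot() for _ in range(num_slots)]

    def allocate(self, prefetch_addr, prefetched_data):
        """Store the prefetch address in a selected slot and place the data in
        the sub-slot chosen by a portion (one bit) of that address."""
        slot = self.slots[(prefetch_addr >> (SUBSLOT_BIT + 1)) % len(self.slots)]
        slot.address = prefetch_addr
        slot.data[(prefetch_addr >> SUBSLOT_BIT) & 1] = prefetched_data

    def lookup(self, read_addr):
        """Return prefetched data for a subsequent read request, or None on miss."""
        slot = self.slots[(read_addr >> (SUBSLOT_BIT + 1)) % len(self.slots)]
        if slot.address == read_addr:
            return slot.data[(read_addr >> SUBSLOT_BIT) & 1]
        return None

buf = PrefetchBuffer()
buf.allocate(0x1040, b"prefetched line")
assert buf.lookup(0x1040) == b"prefetched line"   # hit on a later read request
```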
DATA READING METHOD BASED ON VARIABLE CACHE LINE
A data reading and writing method based on a variable-length cache line. A lookup table stores the cache line information of each request. When a read request arrives at the cache, the cache line information is obtained according to the request index to determine whether the request hits. On a hit, the data in the cache is read and sent to the requester over multiple cycles; on a miss, one or more read requests are created, the offset, tag, and cache line size are recorded in the corresponding record of the lookup table, and the requests are sent to the DRAM. Once all the data has been returned and written to the cache, the corresponding record of the lookup table is set to valid.
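The hit/miss handling around the lookup table could be sketched as follows; the entry fields mirror those named in the abstract (offset, tag, cache line size, valid), while the dictionary-based cache and the dram_read helper are stand-ins assumed for illustration.

```python
from dataclasses import dataclass

@dataclass
class LookupEntry:
    offset: int
    tag: int
    line_size: int      # variable cache line size for this request
    valid: bool = False

lookup_table = {}       # request index -> LookupEntry
cache = {}              # tag -> cached bytes (stand-in for the cache array)

def dram_read(tag, line_size):
    return bytes(line_size)   # placeholder for the DRAM responses

def handle_read(request_index, tag, offset, line_size):
    entry = lookup_table.get(request_index)
    if entry is not None and entry.valid and entry.tag == tag:
        # Hit: read the data from the cache (in hardware, over multiple cycles).
        return cache[tag]
    # Miss: record offset, tag and line size, then send read requests to DRAM.
    lookup_table[request_index] = LookupEntry(offset, tag, line_size)
    cache[tag] = dram_read(tag, line_size)
    # Once all data has returned and been written, mark the entry valid.
    lookup_table[request_index].valid = True
    return cache[tag]

print(len(handle_read(request_index=3, tag=0x7A, offset=0, line_size=128)))  # 128
```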
Reducing memory latency in graphics operations
Methods and apparatus relating to reducing memory latency in graphics operations are described. In an embodiment, uniform data is transferred from a buffer to a General Register File (GRF) of a processor based at least in part on information stored in a gather table. The uniform data comprises data that is uniform across a plurality of primitives in a graphics operation. Other embodiments are also disclosed and claimed.
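A small sketch of the data movement described above; the gather-table layout (pairs of buffer offset and GRF register index) and the register-file size are assumptions for illustration.

```python
GRF_REGS = 128   # assumed number of General Register File entries

def gather_uniforms(uniform_buffer, gather_table, grf):
    """Copy uniform values from the buffer into GRF registers as directed by
    the gather table; the same values apply to every primitive being drawn."""
    for buffer_offset, grf_index in gather_table:
        grf[grf_index] = uniform_buffer[buffer_offset]
    return grf

uniform_buffer = {0: 1.0, 4: 0.5, 8: 2.0}    # uniform data shared across primitives
gather_table = [(0, 10), (4, 11), (8, 12)]   # (buffer offset, GRF register index)
grf = [0.0] * GRF_REGS
gather_uniforms(uniform_buffer, gather_table, grf)
print(grf[10:13])   # [1.0, 0.5, 2.0]
```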
CACHE MEMORY, MEMORY SYSTEM INCLUDING THE SAME, AND EVICTION METHOD OF CACHE MEMORY
In a cache memory used for communication between a host and a memory, the cache memory may include a plurality of cache sets, each comprising: a valid bit; N dirty bits; a tag; and N data sets respectively corresponding to the N dirty bits and each including data of a data chunk size substantially identical to a data chunk size of the host, wherein a data chunk size of the memory is N times as large as the data chunk size of the host, where N is an integer greater than or equal to 2.
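One way to picture a cache set with a single valid bit, N dirty bits, a tag, and N host-chunk-sized data sets is sketched below; N = 2, the 64-byte host chunk, and the write/evict methods are assumed values and behavior, not taken from the abstract.

```python
HOST_CHUNK = 64   # assumed host data chunk size in bytes
N = 2             # memory chunk is N times the host chunk, N >= 2

class CacheSet:
    def __init__(self):
        self.valid = False
        self.tag = None
        self.dirty = [False] * N                            # one dirty bit per data set
        self.data = [bytearray(HOST_CHUNK) for _ in range(N)]

    def write(self, tag, index, chunk):
        """Host-sized write into one data set; only that set's dirty bit is set."""
        assert len(chunk) == HOST_CHUNK and 0 <= index < N
        self.valid, self.tag = True, tag
        self.data[index][:] = chunk
        self.dirty[index] = True

    def evict(self):
        """Return the tag plus only the dirty data sets that need writing back."""
        writeback = [(i, bytes(d)) for i, d in enumerate(self.data) if self.dirty[i]]
        self.valid, self.dirty = False, [False] * N
        return self.tag, writeback

s = CacheSet()
s.write(tag=0x1F, index=1, chunk=bytes(range(64)))
tag, dirty_sets = s.evict()
print(tag, [i for i, _ in dirty_sets])   # 31 [1]
```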
PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO LOAD MULTIPLE DATA ELEMENTS TO DESTINATION STORAGE LOCATIONS OTHER THAN PACKED DATA REGISTERS
A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
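A behavioral sketch of what the instruction does, not its encoding: the source addresses sit in a packed register and the loaded elements land in a destination that is not a packed data register (modeled here as a plain buffer). The toy memory contents and names are assumptions.

```python
memory = {0x100: 11, 0x200: 22, 0x300: 33, 0x400: 44}   # toy addressable memory

def load_to_destination(packed_address_reg, destination):
    """For each memory-address-information element, load one data element and
    store it in the destination storage location (not a packed data register)."""
    for i, addr in enumerate(packed_address_reg):
        destination[i] = memory[addr]
    return destination

src_packed_reg = [0x100, 0x200, 0x300, 0x400]   # source packed memory address info
dest_buffer = [0] * len(src_packed_reg)         # destination outside the register file
print(load_to_destination(src_packed_reg, dest_buffer))   # [11, 22, 33, 44]
```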
OVERLAPPING RANGES OF PAGES IN MEMORY SYSTEMS
Systems, methods, and apparatus including computer-readable mediums for managing memories by overlapping ranges of pages in nonvolatile memory systems are provided. An example memory system includes a memory controller coupled to a memory and configured to: determine a range of logical addresses associated with a command; search for particular mapping tables covering the range of logical addresses in mapping pages in the memory; determine whether a starting address of the range of logical addresses is in an overlapped range of first and second sequential mapping pages, the overlapped range including logical addresses of one or more mapping tables duplicated in the first and second mapping pages; determine from which of the first and second mapping pages the particular mapping tables are to be loaded, based on a result of determining whether the starting address is in the overlapped range; and load the particular mapping tables from the determined mapping page.
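A sketch of how the page choice might be made when the starting address falls in the overlapped range; the page size, overlap width, and layout arithmetic below are all assumed for illustration rather than taken from the abstract.

```python
TABLES_PER_PAGE = 16               # assumed mapping tables per mapping page
OVERLAP = 2                        # assumed tables duplicated between sequential pages
STEP = TABLES_PER_PAGE - OVERLAP   # distance between consecutive page start tables
# Under these assumptions, page p holds tables [p*STEP, p*STEP + TABLES_PER_PAGE),
# so the last OVERLAP tables of page p are duplicated at the start of page p+1.

def choose_mapping_page(start_table):
    """Pick which mapping page to load for a range starting at start_table."""
    # Earliest page whose table range covers the starting table.
    earlier = max(0, (start_table - TABLES_PER_PAGE) // STEP + 1)
    # The start is in the overlapped range if the next page also covers it.
    in_overlap = start_table >= (earlier + 1) * STEP
    # In the overlapped range both pages hold the duplicated tables; the later
    # page reaches further forward, so loading it can cover a forward-growing
    # range with a single page load.
    return earlier + 1 if in_overlap else earlier

print(choose_mapping_page(14))   # table 14 is duplicated in pages 0 and 1 -> 1
print(choose_mapping_page(5))    # table 5 lives only in page 0 -> 0
```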
DIRECT SWAP CACHING WITH ZERO LINE OPTIMIZATIONS
Systems and methods related to direct swap caching with zero line optimizations are described. A method for managing a system having a near memory and a far memory comprises receiving a request from a requestor to read a block of data that is stored in either the near memory or the far memory. The method includes analyzing a metadata portion associated with the block of data, the metadata portion comprising both (1) information concerning whether the near memory or the far memory contains the block of data and (2) information concerning whether a data portion associated with the block of data is all zeros. The method further includes, when the metadata portion indicates that the data portion is all zeros, synthesizing the data portion corresponding to the block of data instead of retrieving it from the far memory, and transmitting the synthesized data portion to the requestor.
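The zero-line path can be sketched as follows; the metadata fields, the 64-byte block size, and the dictionary-backed memories are assumptions for illustration only.

```python
from dataclasses import dataclass

BLOCK_BYTES = 64   # assumed size of the data portion of a block

@dataclass
class Metadata:
    in_near_memory: bool   # whether near memory currently holds this block
    is_all_zeros: bool     # whether the block's data portion is all zeros

def read_block(addr, metadata, near_memory, far_memory):
    meta = metadata[addr]
    if meta.is_all_zeros:
        # Zero-line optimization: synthesize the all-zero data portion instead
        # of performing the slower far-memory access.
        return bytes(BLOCK_BYTES)
    source = near_memory if meta.in_near_memory else far_memory
    return source[addr]

metadata = {0x40: Metadata(in_near_memory=False, is_all_zeros=True)}
print(read_block(0x40, metadata, near_memory={}, far_memory={}))   # 64 zero bytes
```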