Patent classifications
G06F12/0886
System and methods for caching a small size I/O to improve caching device endurance
An apparatus comprising a memory and a controller. The memory may be configured to (i) implement a cache and (ii) store meta-data. The cache comprises one or more cache windows. Each of the one or more cache windows comprises a plurality of cache-lines configured to store information. Each of the cache-lines comprises a plurality of sub-cache lines. Each of the plurality of cache-lines and each of the plurality of sub-cache lines is associated with meta-data indicating one or more of a dirty state and an invalid state. The controller is connected to the memory and configured to (i) recognize sub-cache line boundaries and (ii) process I/O requests in multiples of a size of said sub-cache lines to minimize cache-fills.
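A minimal sketch of the sub-cache-line bookkeeping this abstract describes, in C. The sizes (16 sub-cache lines of 4 KiB per cache line) and all identifiers are assumptions for illustration, not taken from the patent; the point is that per-sub-cache-line valid/dirty bits let a small write be tracked and later flushed without filling the whole cache line.

```c
/* Illustrative sketch only: hypothetical sizes and names, not the
 * patent's implementation. */
#include <stdint.h>
#include <stdio.h>

#define SUBLINES_PER_LINE 16u          /* sub-cache lines per cache line (assumed) */
#define SUBLINE_SIZE      4096u        /* bytes per sub-cache line (assumed) */

struct cache_line {
    uint16_t valid_bits;               /* one bit per sub-cache line */
    uint16_t dirty_bits;               /* one bit per sub-cache line */
};

/* Round an arbitrary I/O range out to sub-cache-line boundaries so the
 * controller only transfers whole sub-cache lines, avoiding a full
 * cache-line fill for a small write. */
static void align_io(uint64_t off, uint64_t len,
                     uint64_t *first_sub, uint64_t *num_subs)
{
    uint64_t end = off + len;
    *first_sub = off / SUBLINE_SIZE;
    *num_subs  = (end + SUBLINE_SIZE - 1) / SUBLINE_SIZE - *first_sub;
}

int main(void)
{
    struct cache_line cl = {0, 0};
    uint64_t first, n;

    /* A 1 KiB write at offset 6 KiB touches only sub-cache line 1. */
    align_io(6144, 1024, &first, &n);
    for (uint64_t i = first; i < first + n; i++) {
        cl.valid_bits |= (uint16_t)(1u << i);  /* data now present */
        cl.dirty_bits |= (uint16_t)(1u << i);  /* needs flush later */
    }
    printf("subs %llu..%llu dirty=0x%04x\n",
           (unsigned long long)first,
           (unsigned long long)(first + n - 1), cl.dirty_bits);
    return 0;
}
```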
COMPUTE ACCELERATED STACKED MEMORY
An integrated circuit that includes a set of one or more logic layers that, when the integrated circuit is stacked in an assembly with a set of stacked memory devices, are electrically coupled to that set of stacked memory devices. The set of one or more logic layers includes a coupled chain of processing elements. The processing elements in the coupled chain may independently compute partial results as functions of data received, store partial results, and pass partial results directly to the next processing element in the chain. The processing elements in the chain may include interfaces that allow direct access to memory banks on one or more DRAMs in the stack. These interfaces may access the DRAM memory banks via TSVs (through-silicon vias) that are not used for global I/O, giving the processing elements more direct access to the data in the DRAM.
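A software model, in C, of the coupled chain of processing elements. The chain length, bank size, and the sum-reduction each element performs are assumptions for illustration (the abstract leaves the computed function open); what the sketch shows is each element combining its local bank data with the incoming partial result, storing it, and passing it directly to the next element.

```c
/* Illustrative sketch only: toy reduction over a chain of PEs. */
#include <stdint.h>
#include <stdio.h>

#define CHAIN_LEN  4
#define BANK_WORDS 8

struct pe {
    int64_t bank[BANK_WORDS];  /* stands in for direct DRAM-bank access */
    int64_t partial;           /* locally stored partial result */
};

/* Each PE sums its bank, adds the incoming partial result, stores it,
 * and passes it directly to the next PE in the chain. */
static int64_t pe_step(struct pe *p, int64_t incoming)
{
    int64_t sum = incoming;
    for (int i = 0; i < BANK_WORDS; i++)
        sum += p->bank[i];
    p->partial = sum;
    return sum;
}

int main(void)
{
    struct pe chain[CHAIN_LEN] = {0};
    for (int i = 0; i < CHAIN_LEN; i++)
        for (int j = 0; j < BANK_WORDS; j++)
            chain[i].bank[j] = i + 1;      /* toy data per bank */

    int64_t partial = 0;
    for (int i = 0; i < CHAIN_LEN; i++)    /* data flows PE0 -> PE3 */
        partial = pe_step(&chain[i], partial);

    printf("final result: %lld\n", (long long)partial);  /* 80 */
    return 0;
}
```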
Cryptographic system memory management
In one example, a system for managing encrypted memory comprises a processor to store a first message authentication code (MAC) based on data stored in system memory in response to a write operation to the system memory. The processor can also detect a read operation corresponding to the data stored in the system memory, calculate a second MAC based on the data retrieved from the system memory, determine that the second MAC does not match the first MAC, and recalculate the second MAC with a correction operation, wherein the correction operation comprises an XOR operation based on the data retrieved from the system memory and a replacement value for a device of the system memory. Furthermore, the processor can decrypt the data stored in the system memory in response to detecting that the recalculated second MAC matches the first MAC and transmit the decrypted data to cache, thereby correcting memory errors.
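A toy model of the correction loop in C. The checksum below stands in for a real keyed MAC, and the layout (eight devices contributing eight bytes each) is an assumption for illustration: on a MAC mismatch, each device's slice is XORed toward its replacement value in turn and the MAC retested until it verifies, locating and correcting the failed device.

```c
/* Illustrative sketch only: toy checksum in place of a keyed MAC,
 * assumed device layout. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define DEVICES   8
#define DEV_BYTES 8            /* bytes contributed by each device (assumed) */

static uint64_t toy_mac(const uint8_t *data, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ull;      /* FNV-1a, as a stand-in */
    for (size_t i = 0; i < n; i++)
        h = (h ^ data[i]) * 0x100000001b3ull;
    return h;
}

/* Retry the MAC with each device's slice corrected in turn. Returns the
 * index of the failed device, or -1 if no substitution verifies. */
static int correct(uint8_t *line, uint64_t stored_mac,
                   const uint8_t replacement[DEV_BYTES])
{
    for (int d = 0; d < DEVICES; d++) {
        uint8_t trial[DEVICES * DEV_BYTES];
        memcpy(trial, line, sizeof(trial));
        /* XOR operation on the retrieved data and the replacement
         * value: x ^ (x ^ r) == r, swapping in the replacement. */
        for (int b = 0; b < DEV_BYTES; b++)
            trial[d * DEV_BYTES + b] ^= line[d * DEV_BYTES + b]
                                      ^ replacement[b];
        if (toy_mac(trial, sizeof(trial)) == stored_mac) {
            memcpy(line, trial, sizeof(trial));  /* corrected data */
            return d;
        }
    }
    return -1;
}

int main(void)
{
    uint8_t line[DEVICES * DEV_BYTES] = {0};
    uint8_t repl[DEV_BYTES] = {0};                /* known-good value */
    uint64_t mac = toy_mac(line, sizeof(line));   /* stored on write */

    line[3 * DEV_BYTES] = 0xFF;                   /* corrupt device 3 */
    if (toy_mac(line, sizeof(line)) != mac)
        printf("failed device: %d\n", correct(line, mac, repl));  /* 3 */
    return 0;
}
```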
PROCESSOR INSTRUCTIONS FOR DATA COMPRESSION AND DECOMPRESSION
A processor is provided that includes compression instructions to compress multiple adjacent data blocks of uncompressed read-only data stored in memory into one compressed read-only data block and to store the compressed read-only data block in multiple adjacent blocks in the memory. During execution of an application that operates on the read-only data, one of the multiple adjacent blocks storing the compressed read-only block is read from memory, stored in a prefetch buffer, and decompressed in the memory controller. In response to a subsequent request, during execution of the application, for an adjacent data block in the compressed read-only data block, the uncompressed adjacent block is read directly from the prefetch buffer.
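A sketch in C of the read path through such a prefetch buffer. The group size of four blocks and the identity "decompressor" are assumptions for illustration; the point is that one memory fill serves subsequent requests for adjacent blocks of the same compressed group.

```c
/* Illustrative sketch only: assumed group layout, codec omitted. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE   64
#define GROUP_BLOCKS 4         /* adjacent blocks compressed as one group */

struct prefetch_buffer {
    int      valid;
    uint64_t group_base;                       /* first block of the group */
    uint8_t  data[GROUP_BLOCKS][BLOCK_SIZE];   /* decompressed blocks */
};

static int fills;              /* counts trips to memory */

/* Stand-in for the controller's decompressor: identity copy here, since
 * the sketch models the caching behavior, not the codec. */
static void decompress_group(const uint8_t *comp,
                             uint8_t out[GROUP_BLOCKS][BLOCK_SIZE])
{
    memcpy(out, comp, (size_t)GROUP_BLOCKS * BLOCK_SIZE);
    fills++;
}

static const uint8_t *read_block(struct prefetch_buffer *pb,
                                 const uint8_t *mem, uint64_t blk)
{
    uint64_t base = blk - (blk % GROUP_BLOCKS);
    if (!pb->valid || pb->group_base != base) {
        /* Miss: fetch and decompress the whole group once. */
        decompress_group(mem + base * BLOCK_SIZE, pb->data);
        pb->group_base = base;
        pb->valid = 1;
    }
    /* Adjacent blocks in the same group come straight from the buffer. */
    return pb->data[blk - base];
}

int main(void)
{
    static uint8_t mem[16 * BLOCK_SIZE];
    struct prefetch_buffer pb = {0};

    read_block(&pb, mem, 5);               /* fills buffer for blocks 4..7 */
    read_block(&pb, mem, 6);               /* served from the buffer */
    printf("memory fills: %d\n", fills);   /* prints 1 */
    return 0;
}
```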
APPLICATION PROGRAMMING INTERFACE FOR FINE GRAINED LOW LATENCY DECOMPRESSION WITHIN PROCESSOR CORE
Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression API receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which the data decompressed by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
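A sketch of the shape such an API could take, in C. Every identifier here is hypothetical, since the abstract names the four metadata operands but not a signature; the DE invocation is modeled as a plain function so the example runs.

```c
/* Illustrative sketch only: hypothetical API, not the patent's. */
#include <stddef.h>
#include <string.h>
#include <stdio.h>

/* The four metadata operands named in the abstract. */
struct de_metadata {
    const void *src;       /* operand 1: location of the compressed data */
    size_t      src_size;  /* operand 2: size of the compressed data     */
    void       *dst;       /* operand 3: where decompressed data goes    */
    size_t      dst_size;  /* operand 4: size of the decompressed data   */
};

/* The input handle the API receives: a data object holding compressed
 * data plus its metadata. */
struct de_object {
    struct de_metadata meta;
};

/* Hardware would execute the decompression instruction on the DE
 * circuitry here; the sketch stands that in with a copy (pretending a
 * compression ratio of 1) so the call is runnable. */
int de_decompress(const struct de_object *obj)
{
    const struct de_metadata *m = &obj->meta;
    if (m->dst_size < m->src_size)
        return -1;                     /* destination too small */
    memcpy(m->dst, m->src, m->src_size);
    return 0;
}

int main(void)
{
    char out[16];
    struct de_object obj = {
        .meta = { .src = "hello", .src_size = 6,
                  .dst = out,     .dst_size = sizeof(out) },
    };
    if (de_decompress(&obj) == 0)
        puts(out);                     /* prints "hello" */
    return 0;
}
```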
INCREASING PER CORE MEMORY BANDWIDTH BY USING FORGET STORES
Methods and apparatus relating to techniques for increasing per core memory bandwidth by using forget store operations are described. In an embodiment, a cache stores a buffer. Execution circuitry executes an instruction. The instruction causes one or more cachelines in the cache to be marked based on a start address for the buffer and a size of the buffer. A marked cacheline in the cache is to be prevented from being written back to memory. Other embodiments are also disclosed and claimed.
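A C model of the marking behavior, under assumed line and cache sizes; all identifiers are invented. Lines covering the buffer are flagged so that eviction invalidates them without a writeback, which is where the saved memory bandwidth comes from.

```c
/* Illustrative sketch only: software model of forget-store marking. */
#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64u
#define NLINES    1024

struct cacheline {
    uint64_t tag;
    int      dirty;
    int      forget;   /* set by the instruction: never write back */
};

static struct cacheline cache[NLINES];

/* Mark every line covered by [start, start+size): the effect the
 * abstract ascribes to the instruction. */
static void mark_forget(uint64_t start, uint64_t size)
{
    uint64_t first = start / LINE_SIZE;
    uint64_t last  = (start + size - 1) / LINE_SIZE;
    for (uint64_t l = first; l <= last; l++)
        cache[l % NLINES].forget = 1;
}

/* On eviction, a marked line is invalidated without spending memory
 * bandwidth on a writeback. */
static void evict(struct cacheline *cl)
{
    if (cl->dirty && !cl->forget)
        printf("writeback tag %llu\n", (unsigned long long)cl->tag);
    cl->dirty = cl->forget = 0;
}

int main(void)
{
    cache[2].dirty = 1;
    mark_forget(2 * LINE_SIZE, LINE_SIZE);   /* buffer covers line 2 */
    evict(&cache[2]);                        /* no writeback printed */
    return 0;
}
```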
SPECULATIVE DECOMPRESSION WITHIN PROCESSOR CORE CACHES
Methods and apparatus relating to speculative decompression within processor core caches are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into a plurality of cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the plurality of cachelines of the cache of the processor core in response to the second micro operation. The decompression instruction causes the DE circuitry to perform an out-of-order decompression of the plurality of cachelines. Other embodiments are also disclosed and claimed.
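An illustrative C model of the two micro operations. Treating each cacheline as an independently decompressible segment is an assumption made so that out-of-order consumption is well defined in this sketch; the abstract does not specify the compression format.

```c
/* Illustrative sketch only: loads complete out of order, and the DE
 * consumes cachelines as they arrive rather than in program order. */
#include <stdio.h>

#define NLINES 4

struct line { int arrived; int decompressed; };

/* Micro operation 1: a load completes for one cacheline (order chosen
 * by the memory system, not the program). */
static void load_complete(struct line *l) { l->arrived = 1; }

/* Micro operation 2: the DE scans for arrived-but-unprocessed lines
 * and handles them in whatever order they became ready. */
static void de_pass(struct line lines[NLINES])
{
    for (int i = 0; i < NLINES; i++)
        if (lines[i].arrived && !lines[i].decompressed) {
            lines[i].decompressed = 1;
            printf("decompressed line %d\n", i);
        }
}

int main(void)
{
    struct line lines[NLINES] = {0};
    int order[NLINES] = {2, 0, 3, 1};     /* out-of-order completion */
    for (int i = 0; i < NLINES; i++) {
        load_complete(&lines[order[i]]);
        de_pass(lines);                   /* consume as data arrives */
    }
    return 0;
}
```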
COMPRESSED CACHE MEMORY WITH PARALLEL DECOMPRESS ON FAULT
An embodiment of an integrated circuit may comprise hardware decompression accelerators and a compressed cache, each coupled to a core, a processor communicatively coupled to the hardware decompression accelerators and the compressed cache, and memory communicatively coupled to the processor. The memory stores microcode instructions that, when executed by the processor, cause the processor to load a page table entry in response to an indication of a page fault and determine whether the page table entry indicates that the page is to be decompressed on fault. If so, the processor modifies a first decompression work descriptor at a first address and a second decompression work descriptor at a second address based on information from the page table entry, and generates a first enqueue transaction to the hardware decompression accelerators with the first address of the first decompression work descriptor and a second enqueue transaction with the second address of the second decompression work descriptor. Other embodiments are disclosed and claimed.
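A sketch of that fault-path flow in C, with invented structures: a real system would use the platform's work-descriptor format and an ENQCMD-style doorbell, modeled here as a plain function call. The PTE flag bit, the frame mask, and the even split of the page between the two descriptors are all assumptions for illustration.

```c
/* Illustrative sketch only: hypothetical PTE flag, descriptor layout,
 * and enqueue mechanism. */
#include <stdint.h>
#include <stdio.h>

#define PTE_DECOMPRESS_ON_FAULT (1ull << 62)     /* assumed flag bit */

struct work_desc {
    uint64_t src;      /* compressed page data            */
    uint64_t dst;      /* destination page frame          */
    uint32_t len;      /* bytes for this accelerator's share */
};

static void enqueue(struct work_desc *wd)        /* stand-in for ENQCMD */
{
    printf("enqueued: src=%#llx len=%u\n",
           (unsigned long long)wd->src, (unsigned)wd->len);
}

/* Two descriptors at fixed addresses, so the fault path only rewrites
 * their fields before issuing the two enqueue transactions. */
static struct work_desc wd0, wd1;

static void on_page_fault(uint64_t pte, uint64_t dst_frame)
{
    if (!(pte & PTE_DECOMPRESS_ON_FAULT))
        return;                                   /* ordinary fault path */
    uint64_t src = pte & 0x000FFFFFFFFFF000ull;   /* frame bits (assumed) */
    wd0 = (struct work_desc){ src,        dst_frame,        2048 };
    wd1 = (struct work_desc){ src + 2048, dst_frame + 2048, 2048 };
    enqueue(&wd0);                                /* first transaction  */
    enqueue(&wd1);                                /* second transaction */
}

int main(void)
{
    on_page_fault(0x1000 | PTE_DECOMPRESS_ON_FAULT, 0x8000);
    return 0;
}
```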