Patent classifications
G06F12/0879
Tensor data distribution using grid direct-memory access (DMA) controller
In one embodiment, a method for tensor data distribution using a direct-memory access agent includes generating, by a first controller, source addresses indicating locations in a source memory where portions of a source tensor are stored. A second controller may generate destination addresses indicating locations in a destination memory where portions of a destination tensor are to be stored. The direct-memory access agent receives a source address generated by the first controller and a destination address generated by the second controller and determines a burst size. The direct-memory access agent may issue a read request comprising the source address and the burst size to read tensor data from the source memory and may store the tensor data into an alignment buffer. The direct-memory access agent then issues a write request comprising the destination address and the burst size to write data from the alignment buffer into the destination memory.
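The flow described in this abstract lends itself to a short simulation. Below is a minimal Python sketch with the two memories modeled as bytearrays and the two controllers as address generators; the names (`src_addr_gen`, `dst_addr_gen`, `dma_transfer`) and the fixed burst size are assumptions made for illustration, not details taken from the patent.

```python
# Minimal sketch of the described data flow. Memories are bytearrays,
# controllers are address generators; all names here are illustrative.

def src_addr_gen(base, stride, count):
    """First controller: yields source addresses of tensor portions."""
    for i in range(count):
        yield base + i * stride

def dst_addr_gen(base, stride, count):
    """Second controller: yields destination addresses of tensor portions."""
    for i in range(count):
        yield base + i * stride

def dma_transfer(src_mem, dst_mem, src_addrs, dst_addrs, burst):
    """DMA agent: for each (source, destination) address pair, read `burst`
    bytes into an alignment buffer, then write the buffer out."""
    for src, dst in zip(src_addrs, dst_addrs):
        align_buf = src_mem[src:src + burst]   # read request to source memory
        dst_mem[dst:dst + burst] = align_buf   # write request to destination

src_mem = bytearray(range(64))
dst_mem = bytearray(64)
dma_transfer(src_mem, dst_mem,
             src_addr_gen(0, 16, 4),  # gather strided source portions
             dst_addr_gen(0, 8, 4),   # scatter to a different layout
             burst=8)
assert dst_mem[:8] == src_mem[:8]
```

Decoupling the two address generators from the agent is what lets the same transfer engine repack a tensor between arbitrary source and destination layouts.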
Bulk memory initialization
The disclosure relates to technology for bulk initialization of memory in a computer system. The computer system comprises a processor core comprising a load store unit and a last level cache in communication with the processor core. The last level cache is configured to receive bulk store operations from the load store unit. Each bulk store operation includes a physical address in the memory to be initialized. The last level cache is configured to send multiple write transactions to the memory for each bulk store operation to perform a bulk initialization of the memory for each bulk store operation. The last level cache is configured to track status of the bulk store operations.
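As a rough illustration of how a single bulk store operation fans out into multiple write transactions, here is a minimal Python model; `LINE`, the `LastLevelCache` class, and its tracking dictionary are hypothetical names chosen for the sketch, not the patent's API.

```python
# Illustrative model: one bulk store op expands into line-sized write
# transactions, with per-op status tracking. Names and sizes are assumed.

LINE = 64  # bytes written per memory transaction (assumption)

class LastLevelCache:
    def __init__(self, memory):
        self.memory = memory   # backing memory, modeled as a bytearray
        self.pending = {}      # bulk-store op id -> write transactions left

    def bulk_store(self, op_id, phys_addr, length, value=0):
        """Receive one bulk store op from the load/store unit and expand it
        into multiple line-sized write transactions to memory."""
        n_lines = (length + LINE - 1) // LINE
        self.pending[op_id] = n_lines          # track status of the op
        for i in range(n_lines):
            addr = phys_addr + i * LINE
            self.memory[addr:addr + LINE] = bytes([value]) * LINE
            self.pending[op_id] -= 1           # one write transaction done
        if self.pending[op_id] == 0:
            del self.pending[op_id]            # bulk store complete

mem = bytearray(b"\xff" * 4096)
llc = LastLevelCache(mem)
llc.bulk_store(op_id=1, phys_addr=0, length=4096)  # zero 4 KiB in one op
assert mem == bytearray(4096)
```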
Processor with reduced interrupt latency
A processor with reduced interrupt latency is disclosed. An apparatus includes a processor core and a cache subsystem having a cache controller and a cache. The processor core is configured to submit, to the cache controller, requests for access to the cache, wherein a given request for access to the cache specifies whether the given request is abandonable or non-abandonable in an event of an interrupt request. In response to a particular interrupt request, the processor core may provide an indication to cause the cache controller to abandon requests for access to the cache identified as abandonable. After receiving an acknowledgement from the cache controller that the abandonable requests have been abandoned, the processor core may begin execution of an interrupt handler in order to service the interrupt request.
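A small Python sketch of the abandonable-request handshake, assuming an in-order request queue; the `Request` tag and the `flush_abandonable` method are invented for illustration of the behavior the abstract describes.

```python
# Sketch of the abandonable/non-abandonable protocol. All names are
# illustrative; the patent specifies the behavior, not this interface.

from dataclasses import dataclass
from collections import deque

@dataclass
class Request:
    addr: int
    abandonable: bool   # tagged by the core when the request is submitted

class CacheController:
    def __init__(self):
        self.queue = deque()

    def submit(self, req):
        self.queue.append(req)

    def flush_abandonable(self):
        """Drop abandonable requests, keep non-abandonable ones, and return
        an acknowledgement once only non-abandonable requests remain."""
        self.queue = deque(r for r in self.queue if not r.abandonable)
        return True  # ack: abandonable requests are gone

ctrl = CacheController()
ctrl.submit(Request(0x1000, abandonable=True))   # e.g. a prefetch
ctrl.submit(Request(0x2000, abandonable=False))  # e.g. a demand load
if ctrl.flush_abandonable():                     # interrupt request arrives
    pass  # core may now enter the interrupt handler with lower latency
assert len(ctrl.queue) == 1
```

The latency win comes from not waiting for speculative work (such as prefetches) to drain before the handler starts; only requests the core genuinely needs survive the flush.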
Adjusting characteristic of system based on profile
Various embodiments described herein provide for operation of a memory sub-system based on a profile (also referred to herein as an operational profile) that causes the memory sub-system to have a specific set of operational characteristics. Additionally, some embodiments can provide dynamic switching between profiles based on a set of conditions being satisfied, such as the current time of day or detection of a particular data input/output (I/O) pattern with respect to the memory sub-system.
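A hedged sketch of condition-driven profile switching follows. The profile fields, the overnight time window, and the I/O-pattern trigger are example conditions chosen for the sketch, not the specific conditions of the embodiments.

```python
# Example of profiles with distinct operational characteristics, switched
# when a condition set is satisfied. Fields and triggers are assumptions.

from dataclasses import dataclass
import datetime

@dataclass
class Profile:
    name: str
    write_cache_enabled: bool
    background_gc: bool

LOW_POWER = Profile("low-power", write_cache_enabled=False, background_gc=False)
THROUGHPUT = Profile("throughput", write_cache_enabled=True, background_gc=True)

def select_profile(now, recent_io_pattern):
    """Switch profiles based on time of day or a detected I/O pattern."""
    if recent_io_pattern == "sequential-write-heavy":
        return THROUGHPUT
    if 0 <= now.hour < 6:   # overnight window: favor low power
        return LOW_POWER
    return THROUGHPUT

profile = select_profile(datetime.datetime(2024, 1, 1, 3, 0), "idle")
assert profile is LOW_POWER
```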
Machine learning to improve caching efficiency in a storage system
A system and method improve caching efficiency in a data storage system by performing machine learning on metadata relating to extents of data blocks, rather than on individual blocks themselves. Once the storage devices are divided into extents, various metadata regarding access to the blocks within each extent are aggregated, and per-extent features are extracted. These features are used to train a regression model that is subsequently used to infer the most likely "hotness" value for each extent at a future time. These predicted values, which may be further classified (e.g., as "hot", "warm", or "cold") using thresholds, are used to implement the cache replacement policy. Embodiments scale to large and multi-layered caches and, by adjusting the extent size, may avoid common caching problems such as thrashing. Policy goal functions may be optimized by dynamically adjusting the classification thresholds.
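A toy end-to-end version of that pipeline in Python, assuming a single feature (past access count per extent) and a one-parameter least-squares fit standing in for a full regression model; the extent size and the classification thresholds are placeholders one would tune against the policy goal function.

```python
# Toy pipeline: aggregate block accesses into per-extent features, fit a
# simple regression for future "hotness", classify with thresholds.
# The one-parameter model and the constants below are stand-ins.

from collections import Counter

EXTENT_SIZE = 1024       # blocks per extent (tunable to avoid thrashing)
HOT, WARM = 4.0, 2.0     # classification thresholds (tunable per policy)

def extent_features(block_accesses):
    """Aggregate block-level accesses into per-extent access counts."""
    return Counter(block // EXTENT_SIZE for block in block_accesses)

def fit_scale(past, future):
    """One-parameter least squares: future ~ k * past."""
    num = sum(p * f for p, f in zip(past, future))
    den = sum(p * p for p in past) or 1.0
    return num / den

def classify(hotness):
    return "hot" if hotness >= HOT else "warm" if hotness >= WARM else "cold"

# Train on one epoch of history, then predict the next epoch.
past = extent_features([0, 1, 2, 1024, 2048, 2049, 2050, 2051])
future = extent_features([0, 5, 2048, 2049, 2050, 2051, 2052, 2053])
extents = sorted(set(past) | set(future))
k = fit_scale([past[e] for e in extents], [future[e] for e in extents])
for e in extents:
    predicted = k * past[e]
    print(e, classify(predicted))  # replacement policy prefers hotter extents
```

Working at extent granularity keeps the feature table small (one row per extent rather than per block), which is what lets the approach scale to large caches.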
Data burst queue management
Operations include establishing a queue storing a list of data burst commands to be communicated via a multiplexed interface coupled to a set of memory dies; communicating, during a first time period, a first data burst command in the queue to a first memory die of the set of memory dies via the multiplexed interface; and communicating, during a second time period, a second data burst command in the queue to a second memory die of the set of memory dies via the multiplexed interface, where a first latency associated with the first data burst command occurs during the second time period.
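A schematic Python timeline showing why this interleaving hides latency: command issue on the multiplexed interface is serialized, but each die's burst latency elapses in parallel with the next command period. The timing constants and names are arbitrary, chosen only to make the overlap visible.

```python
# Schematic timeline of interleaved burst commands on a shared interface.
# CMD_TIME and BURST_LATENCY are arbitrary illustrative constants.

from collections import deque

CMD_TIME = 1       # time units to issue one command on the shared interface
BURST_LATENCY = 3  # time units a die stays busy after receiving a command

queue = deque([("burst-read", "die0"), ("burst-read", "die1")])
t = 0
busy_until = {"die0": 0, "die1": 0}

while queue:
    cmd, die = queue.popleft()
    # Command issue is serialized on the multiplexed interface...
    issue_start, issue_end = t, t + CMD_TIME
    t = issue_end
    # ...but each die's latency elapses in parallel with later commands.
    busy_until[die] = issue_end + BURST_LATENCY
    print(f"{cmd} -> {die}: issued [{issue_start},{issue_end}), "
          f"busy until t={busy_until[die]}")

# die0's latency (t=1..4) overlaps die1's command period (t=1..2),
# matching the abstract's "first latency occurs during the second period".
```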