Patent classifications
G06F12/0879
COMPUTING ARCHITECTURE
A computing architecture comprises an off-chip memory, an on-chip cache unit, a prefetching unit, a global scheduler, a transmitting unit, a pre-recombination network, a post-recombination network, a main computing array, a write-back cache unit, a data dependence controller, and an auxiliary computing array. The architecture prefetches data tiles into the on-chip cache and performs computation tile by tile; during tile computation, a tile exchange network recombines the data structure, and a data dependence module handles any dependence relationships between different tiles. The architecture increases data reuse and processing flexibility, thereby reducing cache misses and memory bandwidth pressure.
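The tile-based flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the tile size, the transpose standing in for the tile exchange network, and the row-above dependence rule are all assumptions for the sake of the example.

```python
from collections import deque

TILE = 4  # assumed tile edge length


def tiles(matrix, t=TILE):
    """Split a square 2-D list into t x t tiles (the prefetch granularity)."""
    n = len(matrix)
    for i in range(0, n, t):
        for j in range(0, n, t):
            yield (i, j), [row[j:j + t] for row in matrix[i:i + t]]


def recombine(tile):
    """Stand-in for the tile exchange network: transpose the tile."""
    return [list(col) for col in zip(*tile)]


def process(matrix):
    prefetch_q = deque(tiles(matrix))  # tiles prefetched into the on-chip cache
    results = {}
    while prefetch_q:
        (i, j), tile = prefetch_q.popleft()
        # Assumed dependence rule: a tile may depend on the tile above it.
        if i > 0 and (i - TILE, j) not in results:
            prefetch_q.append(((i, j), tile))  # defer until the dependency is met
            continue
        results[(i, j)] = recombine(tile)  # compute on the recombined data
    return results
```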
BANDWIDTH BOOSTED STACKED MEMORY
A high bandwidth memory system. In some embodiments, the system includes: a memory stack having a plurality of memory dies and eight 128-bit channels; and a logic die, the memory dies being stacked on, and connected to, the logic die; wherein the logic die may be configured to operate a first channel of the 128-bit channels in: a first mode, in which a first 64 bits operate in pseudo-channel mode, and a second 64 bits operate as two 32-bit fine-grain channels, or a second mode, in which the first 64 bits operate as two 32-bit fine-grain channels, and the second 64 bits operate as two 32-bit fine-grain channels.
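The two channel modes can be summarized as a mapping from mode to sub-channel widths. A small sketch, with the list-of-widths representation being an illustrative choice rather than anything from the patent:

```python
def channel_layout(mode):
    """Return the sub-channel widths for one 128-bit channel in the given mode."""
    if mode == 1:
        # First 64 bits: one pseudo channel; second 64 bits: two fine-grain channels.
        return [64, 32, 32]
    if mode == 2:
        # Both 64-bit halves split into two 32-bit fine-grain channels each.
        return [32, 32, 32, 32]
    raise ValueError("unknown mode")
```

In either mode the widths sum to the full 128-bit channel; only the granularity of the sub-channels changes.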
MACHINE LEARNING TO IMPROVE CACHING EFFICIENCY IN A STORAGE SYSTEM
A system and method improve caching efficiency in a data storage system by performing machine learning processes on metadata relating to extents of data blocks, rather than on individual blocks. Once the storage devices are divided into extents, various metadata regarding access to the blocks within each extent are aggregated, and per-extent features are extracted. These features are used to train a data regression model that is subsequently used to infer a most likely "hotness" value for each extent at a future time. These predicted values, which may be further classified (e.g., as "hot", "warm", or "cold") using thresholds, are used to implement the cache replacement policy. Embodiments scale to large and multi-layered caches, and may avoid common caching problems such as thrashing by adjusting the extent size. Policy goal functions may be optimized by dynamically adjusting the classification thresholds.
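The aggregate-predict-classify pipeline can be sketched as below. The exponential-average predictor merely stands in for the trained regression model, and the extent size, smoothing factor, and thresholds are illustrative assumptions:

```python
EXTENT_BLOCKS = 256   # blocks per extent (assumed)
ALPHA = 0.5           # smoothing factor for the stand-in predictor


def extent_of(block):
    return block // EXTENT_BLOCKS


def aggregate(accesses):
    """Per-extent feature: total block accesses in the observation window."""
    feats = {}
    for block in accesses:
        feats[extent_of(block)] = feats.get(extent_of(block), 0) + 1
    return feats


def predict(history, feats):
    """Exponentially smoothed hotness per extent (regression-model stand-in)."""
    return {e: ALPHA * feats.get(e, 0) + (1 - ALPHA) * history.get(e, 0.0)
            for e in set(feats) | set(history)}


def classify(hotness, hot=8.0, warm=2.0):
    """The thresholds are the tunable policy knobs mentioned in the abstract."""
    return {e: "hot" if h >= hot else "warm" if h >= warm else "cold"
            for e, h in hotness.items()}
```

The cache replacement policy would then evict "cold" extents first; dynamically moving the `hot`/`warm` thresholds adjusts how aggressively the cache retains data.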
METHOD AND DEVICE FOR RAPIDLY SEARCHING CACHE
A method and a device for rapidly searching a cache are provided. The method includes: translating a source identifier (SID) to a domain identifier (DID) by searching a context cache according to an extended flag provided by software, wherein the extended flag indicates whether the current context entry stored in the context cache is a normal context entry or an extended context entry.
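A minimal sketch of the flag-keyed lookup: the entry layout, the `(SID, flag)` indexing, and the miss handling are assumptions made for illustration, not details from the patent.

```python
NORMAL, EXTENDED = 0, 1


class ContextCache:
    def __init__(self):
        self._entries = {}  # (sid, flag) -> cached context entry

    def fill(self, sid, flag, did):
        """Install a normal or extended context entry for this SID."""
        self._entries[(sid, flag)] = {"flag": flag, "did": did}

    def translate(self, sid, extended_flag):
        """Return the DID if an entry of the indicated type is cached."""
        entry = self._entries.get((sid, extended_flag))
        if entry is None:
            return None  # cache miss: fall back to a context-table walk (omitted)
        return entry["did"]
```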
Electronic device that accesses memory and data writing method
An electronic device capable of accessing a memory and a data writing method are provided. The electronic device includes a processing unit, a bus, and a memory controller. The processing unit includes a bus interface control circuit, and the processing unit generates a first write command through the bus interface control circuit according to a memory access command. The memory access command contains a first memory address and a target value, and the first write command contains the first memory address and the target value. The bus is coupled to the bus interface control circuit and configured to generate a second write command according to the first write command. The second write command contains a second memory address and the target value. The memory controller is coupled to the bus and configured to write the target value into the memory according to the second memory address.
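The two-stage command path can be sketched as three functions, one per component. The fixed address offset applied by the bus is an assumption for illustration; the patent does not specify how the second memory address is derived:

```python
BUS_OFFSET = 0x1000  # assumed remapping applied by the bus


def bus_interface(mem_access_cmd):
    """Processing unit: generate the first write command from the access command."""
    return {"addr": mem_access_cmd["addr"], "value": mem_access_cmd["value"]}


def bus(first_cmd):
    """Bus: generate the second write command with a remapped address."""
    return {"addr": first_cmd["addr"] + BUS_OFFSET, "value": first_cmd["value"]}


def memory_controller(second_cmd, memory):
    """Memory controller: write the target value at the second memory address."""
    memory[second_cmd["addr"]] = second_cmd["value"]


# Usage: a memory access command flows through all three stages.
memory = {}
memory_controller(bus(bus_interface({"addr": 0x40, "value": 0xAB})), memory)
```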
Processing data in memory using an FPGA
Processing data in memory using a field programmable gate array by reading a first portion of a data set to a burst block having a first data format, transforming a sub-portion of the first portion to an element block having a second data format, processing the sub-portion yielding a first results set, transforming the first results set to the first data format of the burst block, and writing the first results set to the burst block.
DRAM command streak efficiency management
A memory controller includes a command queue and an arbiter for selecting entries from the command queue for transmission to a DRAM. The arbiter transacts streaks of consecutive read commands and streaks of consecutive write commands. The arbiter transacts a streak for at least a minimum burst length based on a number of commands of a designated type available to be selected by the arbiter. Following the minimum burst length, the arbiter decides to start a new streak of commands of a different type based on a first set of one or more conditions indicating intra-burst efficiency.