Patent classifications
G06F12/0215
Systems and methods for improving cache efficiency and utilization
- Altug Koker
- Joydeep Ray
- Ben Ashbaugh
- Jonathan Pearce
- Abhishek Appu
- Vasanth Ranganathan
- Lakshminarayanan Striramassarma
- Elmoustapha Ould-Ahmed-Vall
- Aravindh Anantaraman
- Valentin Andrei
- Nicolas Galoppo von Borries
- Varghese George
- Yoav Harel
- Arthur Hunter, JR.
- Brent Insko
- Scott Janus
- Pattabhiraman K
- Mike Macpherson
- Subramaniam Maiyuran
- Marian Alin Petre
- Murali Ramadoss
- Shailesh Shah
- Kamal Sinha
- Prasoonkumar Surti
- Vikranth Vemulapalli
Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
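As a rough illustration of "default settings or an instruction will control cache operations", here is a minimal sketch, assuming a priority hint can optionally travel with each instruction and that a controller-wide switch decides whether such hints are honored. The names CachePriority, CacheController, resolve_priority and honor_instruction_hints are all hypothetical and are not taken from the patent.

```python
from enum import Enum
from typing import Optional


class CachePriority(Enum):
    LOW = 0
    NORMAL = 1
    HIGH = 2


class CacheController:
    """Hypothetical model: decides whether defaults or an instruction hint set cache priority."""

    def __init__(self, default_priority: CachePriority, honor_instruction_hints: bool = True):
        self.default_priority = default_priority
        self.honor_instruction_hints = honor_instruction_hints

    def resolve_priority(self, instruction_hint: Optional[CachePriority]) -> CachePriority:
        # The instruction controls the access only if it carries a hint and hints are enabled;
        # otherwise the default settings apply.
        if self.honor_instruction_hints and instruction_hint is not None:
            return instruction_hint
        return self.default_priority
```

With hints enabled, an access tagged HIGH is cached at HIGH priority and an untagged access falls back to the default. Disabling hints returns control to the defaults for every access.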
Methods and apparatus to facilitate read-modify-write support in a coherent victim cache with parallel data paths
Methods, apparatus, systems and articles of manufacture are disclosed to facilitate read-modify-write support in a coherent victim cache with parallel data paths. An example apparatus includes a random-access memory configured to be coupled to a central processing unit via a first interface and a second interface, the random-access memory configured to obtain, via a snoop interface, a read request indicating a first address to read; an address encoder coupled to the random-access memory, the address encoder to, when the random-access memory indicates a hit of the read request, generate a second address corresponding to a victim cache based on the first address; and a multiplexer coupled to the victim cache to transmit a response including data obtained from the second address of the victim cache.
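The snoop read path in this abstract has three stages: a tag lookup that signals a hit, an address encoder that turns the hit into a victim-cache location, and a multiplexer that returns that location's data. The sketch below models those stages in software. It assumes a fully associative victim cache with 64-byte lines, and the class and method names are hypothetical. Real hardware compares all tags in parallel, which this model stands in for with a list comprehension.

```python
from typing import List, Optional


class VictimCacheSnoopPath:
    """Hypothetical model of a snoop read: tag compare -> address encoder -> data multiplexer."""

    def __init__(self, num_slots: int, line_size: int = 64):
        self.line_size = line_size
        self.tag_ram: List[Optional[int]] = [None] * num_slots  # line address held by each slot
        self.data_ram: List[bytes] = [b""] * num_slots          # line data held by each slot

    def fill(self, slot: int, address: int, data: bytes) -> None:
        self.tag_ram[slot] = address // self.line_size
        self.data_ram[slot] = data

    def _hit_vector(self, address: int) -> List[bool]:
        # Compare the snooped line address against every tag (parallel in hardware).
        line = address // self.line_size
        return [tag == line for tag in self.tag_ram]

    @staticmethod
    def _encode(hit_vector: List[bool]) -> Optional[int]:
        # Address encoder: turn a one-hot hit vector into a victim-cache slot index.
        for slot, hit in enumerate(hit_vector):
            if hit:
                return slot
        return None

    def snoop_read(self, address: int) -> Optional[bytes]:
        slot = self._encode(self._hit_vector(address))
        if slot is None:
            return None              # miss: no response data from the victim cache
        return self.data_ram[slot]   # multiplexer selects the hit slot's data
```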
Intelligent write-amplification reduction for data storage devices configured on autonomous vehicles
Systems, methods and apparatus of intelligent write-amplification reduction for data storage devices configured on autonomous vehicles. For example, a data storage device of a vehicle includes: one or more storage media components; a controller configured to store data into and retrieve data from the one or more storage media components according to commands received in the data storage device; an address map configured to map between logical addresses specified in the commands received in the data storage device and physical addresses of memory cells in the one or more storage media components; and an artificial neural network configured to receive, as input and as a function of time, operating parameters indicative of a data access pattern, and to generate, based on the input, a prediction used to determine an optimized data placement scheme. The controller is configured to adjust the address map according to the optimized data placement scheme.
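The final step, adjusting the address map to follow a predicted placement, can be sketched without the neural network. The sketch assumes the prediction reduces to a set of "hot" logical addresses and that the placement scheme packs hot and cold data into separate physical ranges, a common way to limit write amplification. A plain access-frequency count stands in for the artificial neural network here. That substitution is deliberate and is not the patent's predictor. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def predict_hot_addresses(access_log: Iterable[int], top_k: int) -> Set[int]:
    """Stand-in predictor (a frequency count, NOT the neural network the abstract describes)."""
    return {lba for lba, _ in Counter(access_log).most_common(top_k)}


def apply_placement(address_map: Dict[int, int], hot: Set[int],
                    hot_base: int, cold_base: int) -> Dict[int, int]:
    """Remap logical addresses so predicted-hot data is packed together, apart from cold data."""
    new_map: Dict[int, int] = {}
    next_hot, next_cold = hot_base, cold_base
    for lba in sorted(address_map):
        if lba in hot:
            new_map[lba] = next_hot
            next_hot += 1
        else:
            new_map[lba] = next_cold
            next_cold += 1
    return new_map
```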
Cached result use through quantum gate rewrite
Techniques facilitating cached result use through quantum gate rewrite are provided. In one example, a computer-implemented method comprises converting, by a device operatively coupled to a processor, an input quantum circuit to a normalized form, resulting in a normalized quantum circuit; detecting, by the device, a match between the normalized quantum circuit and a cached quantum circuit among a set of cached quantum circuits; and providing, by the device, a cached run result of the cached quantum circuit based on the detecting.
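The normalize-then-match flow can be sketched as follows. The sketch assumes a circuit is a list of (gate name, qubits) tuples and uses a single rewrite rule: cancel back-to-back identical self-inverse gates. That rule is sound but far from a complete normal form, and the patent's actual rewrite rules are not given in the abstract. The normalized circuit is then used as a dictionary key for cached run results. The names Gate, normalize and CircuitResultCache are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gate = Tuple[str, Tuple[int, ...]]          # (gate name, qubits it acts on)
SELF_INVERSE = {"H", "X", "Y", "Z", "CX"}   # applying any of these twice is the identity


def normalize(circuit: List[Gate]) -> Tuple[Gate, ...]:
    """One simple rewrite rule: cancel back-to-back identical self-inverse gates."""
    out: List[Gate] = []
    for gate in circuit:
        if out and out[-1] == gate and gate[0] in SELF_INVERSE:
            out.pop()          # G followed by G on the same qubits cancels out
        else:
            out.append(gate)
    return tuple(out)


class CircuitResultCache:
    """Maps normalized circuits to previously obtained run results."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[Gate, ...], Dict[str, int]] = {}

    def store(self, circuit: List[Gate], result: Dict[str, int]) -> None:
        self._results[normalize(circuit)] = result

    def lookup(self, circuit: List[Gate]) -> Optional[Dict[str, int]]:
        return self._results.get(normalize(circuit))  # None means the circuit must be run
```

Because normalization uses a stack, cancellations cascade. For example, four consecutive H gates on the same qubit reduce to an empty circuit.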
STLB prefetching for a multi-dimension engine
A multi-dimension engine, connected to a system TLB, generates sequences of addresses to issue page address translation prefetch requests in advance of predictable accesses to elements within data arrays. Prefetch requests are filtered to avoid redundant requests for translations of the same page. Prefetch requests run ahead of data accesses but are tethered to stay within a reasonable range of them. The number of pending prefetches is limited. The system TLB stores a number of translations, the number being relative to the dimensions of the range of elements accessed from within the data array.
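The three controls named above are a redundancy filter, a run-ahead tether, and a cap on pending prefetches. The sketch below combines them for a 2-D array walked in row-major order. It assumes 4 KiB pages, a tether measured in elements, and a set that remembers every page already requested. A hardware filter would track only a small recent window. All names are hypothetical.

```python
from typing import Iterator, List, Set

PAGE_SIZE = 4096  # assumed page size in bytes


def page_sequence(base: int, rows: int, cols: int, elem_size: int, row_pitch: int) -> Iterator[int]:
    """Page number touched by each element of a 2-D array, in row-major access order."""
    for r in range(rows):
        for c in range(cols):
            yield (base + r * row_pitch + c * elem_size) // PAGE_SIZE


class TranslationPrefetcher:
    """Hypothetical filter/throttle for page-translation prefetches ahead of array accesses."""

    def __init__(self, pages: List[int], window: int, max_pending: int):
        self.pages = pages                # page number of each element, in access order
        self.window = window              # max distance (in elements) prefetch may run ahead
        self.max_pending = max_pending    # cap on outstanding translation requests
        self.next_index = 0               # next element the prefetcher will examine
        self.requested: Set[int] = set()  # pages already requested (redundancy filter)
        self.pending: Set[int] = set()    # requests issued but not yet completed

    def issue(self, data_index: int) -> List[int]:
        """Issue new prefetches, given that data accesses have reached element data_index."""
        issued: List[int] = []
        limit = min(len(self.pages), data_index + self.window)  # tether to the data stream
        self.next_index = max(self.next_index, data_index)
        while self.next_index < limit and len(self.pending) < self.max_pending:
            page = self.pages[self.next_index]
            self.next_index += 1
            if page in self.requested:
                continue  # filter: translation for this page was already requested
            self.requested.add(page)
            self.pending.add(page)
            issued.append(page)
        return issued

    def complete(self, page: int) -> None:
        self.pending.discard(page)
```

Take a 4 x 1024 array of 4-byte elements with an 8192-byte row pitch. Each row then occupies one page (pages 0, 2, 4, 6). With a window of 2048 elements, the first call issues only pages 0 and 2, even though the array spans four pages.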
Method and apparatus for flash memory storage mapping table maintenance via DRAM transfer
A method for maintaining a storage mapping table is introduced. After the total number of logical blocks programmed into a storage unit exceeds a specified number, an access interface is directed to program a corresponding group of a storage mapping table of a DRAM (Dynamic Random Access Memory) into a first block of the storage unit according to a group number of an unsaved group queue. A group mapping table of the DRAM is updated to indicate the location in the storage unit where the latest data of that group of the storage mapping table is stored. The group number is then removed from the unsaved group queue.
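Below is a minimal sketch of the flush sequence in this abstract. It assumes the storage mapping table is a flat list split into fixed-size groups, that flash is simulated by a dictionary keyed by (block, page), and that the programmed-block counter restarts after each save. The abstract does not say what happens to the counter, so that last point is an assumption. The class, method and parameter names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class MappingTableMaintainer:
    """Hypothetical model of saving dirty mapping-table groups from DRAM to flash."""

    def __init__(self, table: List[int], group_size: int, threshold: int, table_block: int):
        self.table = table              # storage mapping table held in DRAM (LBA -> physical)
        self.group_size = group_size    # mapping entries per group
        self.threshold = threshold      # the "specified number" of programmed logical blocks
        self.table_block = table_block  # flash block that receives saved groups
        self.blocks_programmed = 0
        self.next_page = 0
        self.unsaved: Deque[int] = deque()                 # unsaved group queue
        self.group_map: Dict[int, Tuple[int, int]] = {}    # group mapping table: group -> (block, page)
        self.flash: Dict[Tuple[int, int], List[int]] = {}  # simulated storage unit

    def update_entry(self, lba: int, physical: int) -> None:
        self.table[lba] = physical
        group = lba // self.group_size
        if group not in self.unsaved:
            self.unsaved.append(group)  # this group now differs from its saved copy

    def on_logical_block_programmed(self) -> None:
        self.blocks_programmed += 1
        if self.blocks_programmed > self.threshold and self.unsaved:
            self.save_next_group()
            self.blocks_programmed = 0  # assumption: restart the count after each save

    def save_next_group(self) -> None:
        group = self.unsaved[0]
        start = group * self.group_size
        location = (self.table_block, self.next_page)
        self.flash[location] = self.table[start:start + self.group_size]  # program the group
        self.next_page += 1
        self.group_map[group] = location  # record where the latest copy of the group lives
        self.unsaved.popleft()            # the group is saved, so drop it from the queue
```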
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT
Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format, the BF16 number format being a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where processing the instruction includes performing a multiply operation, performing an addition to a result of the multiply operation, and applying a rectified linear unit function to a result of the addition.
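The arithmetic in this instruction (multiply, add, then ReLU on BF16 inputs) can be emulated in software. The sketch below rounds operands to BF16 (1 sign bit, 8 exponent bits, 7 mantissa bits) using round-to-nearest-even and accumulates in FP32. The FP32 accumulator and the rounding mode are assumptions, since the abstract specifies neither, and NaN handling is omitted. The function names are hypothetical.

```python
import struct
from typing import Sequence


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to BF16 with round-to-nearest-even (NaN handling omitted in this sketch)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the 16 dropped bits
    bits = (bits >> 16) << 16            # keep sign, 8-bit exponent and 7-bit mantissa
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def dot_accumulate_relu(a: Sequence[float], b: Sequence[float], acc: float = 0.0) -> float:
    """BF16 operands, FP32 multiply-accumulate (assumed), then ReLU on the sum."""
    total = to_f32(acc)
    for x, y in zip(a, b):
        product = to_f32(to_bf16(x) * to_bf16(y))  # exact: 8-bit x 8-bit significands fit in FP32
        total = to_f32(total + product)
    return max(total, 0.0)
```

As an example of BF16 precision loss, 3.14159 rounds to 3.140625. A dot product of [1, 2] and [3, -4] sums to -5, which ReLU clamps to 0.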
Device and method for processing convolution operation using kernel
Provided are a method and apparatus for processing a convolution operation in a neural network. The apparatus may include a memory and a processor configured to: read, from the memory, one of a plurality of divided blocks of input data stored in the memory; generate an output block by performing the convolution operation on that block with a kernel; generate a feature map by using the output block; and write the feature map to the memory.
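Block-wise processing works because each input block can carry a small overlap (halo) with its neighbor. The sketch below splits the input into horizontal strips of output rows, reads each strip plus kh - 1 extra input rows, convolves it, and appends the output block to the feature map. It assumes 'valid' padding and the cross-correlation form used by most neural-network frameworks. The strip shape and the names conv2d_valid, conv2d_blocked and block_rows are hypothetical; the patent does not specify its block shape.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """Direct 'valid' 2-D convolution (cross-correlation form, as in most NN frameworks)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def conv2d_blocked(x: Matrix, k: Matrix, block_rows: int) -> Matrix:
    """Convolve one row strip at a time and assemble the output blocks into a feature map."""
    kh = len(k)
    out_rows = len(x) - kh + 1
    feature_map: Matrix = []
    for r0 in range(0, out_rows, block_rows):
        r1 = min(r0 + block_rows, out_rows)
        block = x[r0:r1 + kh - 1]                  # input block plus (kh - 1) halo rows
        feature_map.extend(conv2d_valid(block, k))  # output block for rows r0..r1-1
    return feature_map
```

The blocked result matches a single full-size convolution. Only one block plus its halo needs to be resident at a time.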
Continuous page read for memory
Subject matter disclosed herein relates to techniques to read memory in a continuous fashion.
Methods and apparatus to facilitate an atomic operation and/or a histogram operation in cache pipeline
Methods, apparatus, systems and articles of manufacture to facilitate an atomic operation and/or a histogram operation in cache pipeline are disclosed. An example system includes a cache storage coupled to an arithmetic component; and a cache controller coupled to the cache storage, wherein the cache controller is operable to: receive a memory operation that specifies a set of data; retrieve the set of data from the cache storage; utilize the arithmetic component to determine a set of counts of respective values in the set of data; generate a vector representing the set of counts; and provide the vector.
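Below is a minimal sketch of the histogram path: fetch a data set from cache storage, count each value, and return the counts as a vector. It assumes values are small non-negative integers that index directly into a fixed number of bins, and it models cache storage as a dictionary keyed by address. HistogramCache and its methods are hypothetical names.

```python
from typing import Dict, List, Sequence


class HistogramCache:
    """Hypothetical cache model whose controller can return a histogram of a cached data set."""

    def __init__(self, num_bins: int):
        self.num_bins = num_bins
        self.storage: Dict[int, List[int]] = {}  # address -> cached data set

    def write(self, address: int, data: Sequence[int]) -> None:
        self.storage[address] = list(data)

    def histogram(self, address: int) -> List[int]:
        data = self.storage[address]   # retrieve the data set from cache storage
        counts = [0] * self.num_bins   # the vector of per-value counts
        for value in data:
            if not 0 <= value < self.num_bins:
                raise ValueError(f"value {value} outside [0, {self.num_bins})")
            counts[value] += 1         # arithmetic step: increment this value's count
        return counts
```

For the data set [1, 3, 3, 0, 1, 3] with four bins, the returned vector is [1, 2, 0, 3].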