Patent classifications
G06F12/0215
PERFORMING LOAD AND STORE OPERATIONS OF 2D ARRAYS IN A SINGLE CYCLE IN A SYSTEM ON A CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
APPARATUS AND METHOD FOR PERFORMING ADDRESS TRANSLATION
An apparatus, system, and method for address translation are provided. Physical address information corresponding to virtual addresses is prefetched and stored, where at least some sequences of the virtual addresses are in a predefined order. The physical address information is prefetched based on identification information provided by a data processing activity, comprising at least a segment identifier and a portion of a virtual address to be translated. The storage has segments of entries, wherein each segment stores physical address information which corresponds to virtual addresses in a predefined order. This predefined order means that it is not necessary to store virtual addresses in the storage. Storage capacity and response speed are therefore gained.
INFORMATION PROCESSING APPARATUS
An information processing device having a processor and memory, and including one or more accelerators and one or more storage devices, wherein: the information processing device has one network for connecting the processor, the accelerators, and the storage devices; the storage devices have an initialization interface for accepting an initialization instruction from the processor, and an I/O issuance interface for issuing an I/O command; and the processor notifies the accelerators of the address of the initialization interface or the address of the I/O issuance interface.
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
METHODS AND APPARATUS TO FACILITATE READ-MODIFY-WRITE SUPPORT IN A COHERENT VICTIM CACHE WITH PARALLEL DATA PATHS
Methods, apparatus, systems and articles of manufacture are disclosed facilitate read-modify-write support in a coherent victim cache with parallel data paths. An example apparatus includes a random-access memory configured to be coupled to a central processing unit via a first interface and a second interface, the random-access memory configured to obtain a read request indicating a first address to read via a snoop interface, an address encoder coupled to the random-access memory, the address encoder to, when the random-access memory indicates a hit of the read request, generate a second address corresponding to a victim cache based on the first address, and a multiplexer coupled to the victim cache to transmit a response including data obtained from the second address of the victim cache.
AGGRESSIVE WRITE FLUSH SCHEME FOR A VICTIM CACHE
A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes: line type bits configured to store an indication that a corresponding cache line of the second sub-cache is configured to store write-miss data, and an eviction controller configured to evict a cache line of the second sub-cache storing write-miss data based on an indication that the cache line has been fully written.
NONVOLATILE MEMORY DEVICE INCLUDING COMBINED SENSING NODE AND CACHE READ METHOD THEREOF
A cache read method of a nonvolatile memory device including a plurality of page buffer units and cache latches, each page buffer units having a sensing latch and a sensing node line is provided. The method comprises performing a first on-chip valley search (OVS) read on a selected memory cell using a first sensing node line and a first sensing latch of a first page buffer unit of the plurality of page buffer units; storing first data sensed from the selected memory cell in the first sensing latch, the first data based on a result of the first OVS read; dumping the first data to sensing node lines of at least one page buffer unit, excluding the first page buffer unit, from among the plurality of page buffer units; and performing a second OVS read on the selected memory cell using the first sensing latch.
Apparatuses and methods for cache operations
The present disclosure includes apparatuses and methods for cache operations. An example apparatus includes a memory device including a plurality of subarrays of memory cells, where the plurality of subarrays includes a first subset of the respective plurality of subarrays and a second subset of the respective plurality of subarrays. The memory device includes sensing circuitry coupled to the first subset, the sensing circuitry including a sense amplifier and a compute component. The first subset is configured as a cache to perform operations on data moved from the second subset. The apparatus also includes a cache controller configured to direct a first movement of a data value from a subarray in the second subset to a subarray in the first subset.
Active random access memory
Systems and methods for processing commands at a random access memory. A series of commands are received to read data from the random access memory or to write data to the random access memory. The random access memory can process commands at a first rate when the series of commands matches a pattern, and at a second, slower, rate when the series of commands does not match the pattern. A determination is made as to whether the series of commands matches the pattern based on at least a current command and a prior command in the series of commands. A ready signal is asserted when said determining determines that the series of commands matches the pattern, where the random access memory is configured to receive and process commands faster than the second rate when the pattern is matched and the ready signal is asserted over a period of multiple commands.
Victim cache that supports draining write-miss entries
A caching system including a first sub-cache and a second sub-cache in parallel with the first sub-cache, wherein the second sub-cache includes a set of cache lines, line type bits configured to store an indication that a corresponding cache line of the set of cache lines is configured to store write-miss data, and an eviction controller configured to flush stored write-miss data based on the line type bits.