Patent classifications
G06F12/0207
BLOCK OPERATIONS FOR AN IMAGE PROCESSOR HAVING A TWO-DIMENSIONAL EXECUTION LANE ARRAY AND A TWO-DIMENSIONAL SHIFT REGISTER
A method is described that includes, on an image processor having a two dimensional execution lane array and a two dimensional shift register array, repeatedly shifting first content of multiple rows or columns of the two dimensional shift register array and repeatedly executing at least one instruction between shifts that operates on the shifted first content and/or second content that is resident in respective locations of the two dimensional shift register array that the shifted first content has been shifted into.
Apparatuses and methods for organizing data in a memory device
Systems, apparatuses, and methods related to organizing data to correspond to a matrix at a memory device are described. Data can be organized by circuitry coupled to an array of memory cells prior to the processing resources executing instructions on the data. The organization of data may thus occur on a memory device, rather than at an external processor. A controller coupled to the array of memory cells may direct the circuitry to organize the data in a matrix configuration to prepare the data for processing by the processing resources. The circuitry may be or include a column decode circuitry that organizes the data based on a command from the host associated with the processing resource. For example, data read in a prefetch operation may be selected to correspond to rows or columns of a matrix configuration.
APPARATUS AND METHOD FOR DYNAMICALLY MANAGING MEMORY
The present invention relates to a dynamic memory management method which includes generating an N-dimensional memory address space in which coordinates are in a range of N natural numbers, the sum of which is the number of bits; and mapping a predetermined linear memory address region to an address region in the N-dimensional memory address space.
SYSTEMS, METHODS, AND APPARATUSES FOR TILE LOAD
Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand to memory.
DETERMINISTIC MEMORY FOR TENSOR STREAMING PROCESSORS
Embodiments are directed to a deterministic streaming system with one or more deterministic streaming processors each having an array of processing elements and a first deterministic memory coupled to the processing elements. The deterministic streaming system further includes a second deterministic memory with multiple data banks having a global memory address space, and a controller. The controller initiates retrieval of first data from the data banks of the second deterministic memory as a first plurality of streams, each stream of the first plurality of streams streaming toward a respective group of processing elements of the array of processing elements. The controller further initiates writing of second data to the data banks of the second deterministic memory as a second plurality of streams, each stream of the second plurality of streams streaming from the respective group of processing elements toward a respective data bank of the second deterministic memory.
Banked memory architecture for multiple parallel datapath channels in an accelerator
The present disclosure relates to devices and methods for using a banked memory structure with accelerators. The devices and methods may segment and isolate dataflows in datapath and memory of the accelerator. The devices and methods may provide each data channel with its own register memory bank. The devices and methods may use a memory address decoder to place the local variables in the proper memory bank.
Dual address encoding for logical-to-physical mapping
Methods, systems, and devices for dual address encoding for logical-to-physical mapping are described. A memory device may identify a first physical address corresponding to a first logical block address generated by a host device and a second physical address corresponding to a second (consecutive) logical block address generated by a host device. The memory device may store the first physical address and second physical address in a single entry of a logical-to-physical mapping table that corresponds to the first logical block address. The memory device may transmit the logical-to-physical table to the host device for storage at the host device. The host device may subsequently transmit a single read command to the memory device that includes the first physical address and the second physical address based on the logical-to-physical table.
Systems and methods for improving computational speed of planning by tracking dependencies in hypercubes
A system for updating a hypercube includes an interface and a processor. The interface is configured to receive an indication to update a cell of the hypercube. The processor is configured to determine a primary dimension value associated with the cell; determine a group of dependencies based at least in part on the primary dimension value, wherein a dependency of the group of dependencies comprises one or more primary dimension values and a pattern; for the dependency of the group of dependencies, determine a set of source locations based at least in part on the one or more primary dimension values and the pattern; and mark the set of source locations as invalid.
Performing multiple point table lookups in a single cycle in a system on chip
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
SYSTEMS AND METHODS FOR ENERGY-EFFICIENT DATA PROCESSING
An energy-efficient sequencer comprising inline multipliers and adders causes a read source that contains matching values to output an enable signal to enable a data item prior to using a multiplier to multiply the data item with a weight to obtain a product for use in a matrix-multiplication in hardware. A second enable signal causes the output to be written to the data item.