Patent classifications
G06F15/8092
Apparatus and method of vector unit sharing
A reconfigurable vector processor is described that allows the size of its vector units to be changed in order to process vectors of different sizes. The reconfigurable vector processor comprises a plurality of processor units. Each of the processor units comprises a control unit for decoding instructions and generating control signals, a scalar unit for processing instructions on scalar data, and a vector unit for processing instructions on vector data under control of control signals. The reconfigurable vector processor architecture also comprises a vector control selector for selectively providing control signals generated by one processor unit of the plurality of processor units to the vector unit of a different processor unit of the plurality of processor units.
Reconfigurable Parallel Processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
System and method of loading and replication of sub-vector values
A processor includes a vector register configured to load data responsive to a special purpose load instruction. The processor also includes circuitry configured to replicate a selected sub-vector value from the vector register.
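As an illustration only, the replication step in this abstract can be modelled in software. The sketch below assumes a register represented as a flat Python list and equal-width sub-vectors; the function name `replicate_subvector` and its parameters are hypothetical and do not come from the patent.

```python
def replicate_subvector(register, sub_width, index):
    """Fill a register-sized list with copies of one selected sub-vector.

    register  : flat list modelling the loaded vector register (assumption)
    sub_width : number of elements per sub-vector (assumption)
    index     : which sub-vector to replicate, counted from 0
    """
    if sub_width <= 0 or len(register) % sub_width != 0:
        raise ValueError("sub_width must evenly divide the register length")
    count = len(register) // sub_width
    if not 0 <= index < count:
        raise IndexError("sub-vector index out of range")
    start = index * sub_width
    selected = register[start:start + sub_width]
    # Repeat the selected sub-vector across the full register width.
    return selected * count


reg = [10, 11, 20, 21, 30, 31, 40, 41]
print(replicate_subvector(reg, 2, 2))  # [30, 31, 30, 31, 30, 31, 30, 31]
```

The hardware described would do this with dedicated circuitry; the list repetition here only shows the data movement.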
APPARATUS, SYSTEMS, AND METHODS FOR LOW POWER COMPUTATIONAL IMAGING
The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
Matrix tiling to accelerate computing in redundant matrices
Systems and methods are provided for matrix tiling to accelerate computing in redundant matrices. The method may include identifying unique submatrices in the matrix; loading values of elements of each unique submatrix into a respective one of the array processors; applying the vector to inputs of each of the array processors; and adding outputs of the array processors according to locations of the unique submatrices in the matrix.
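One way to read the steps in this abstract is as a tiled matrix-vector product in which each distinct tile is stored only once. The following is a minimal sketch under that assumption: the function name `tiled_matvec`, the square tile size, and the use of a dictionary to stand in for the array processors are all illustrative choices, not details from the patent.

```python
def tiled_matvec(matrix, vector, tile):
    """Multiply matrix by vector, storing each distinct tile only once.

    Returns (result, number_of_unique_tiles).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows % tile or n_cols % tile:
        raise ValueError("tile size must divide both matrix dimensions")
    if len(vector) != n_cols:
        raise ValueError("vector length must match the column count")

    # Step 1: identify unique submatrices and record where each one occurs.
    locations = {}
    for bi in range(0, n_rows, tile):
        for bj in range(0, n_cols, tile):
            key = tuple(tuple(matrix[bi + r][bj:bj + tile]) for r in range(tile))
            locations.setdefault(key, []).append((bi, bj))

    # Steps 2-4: each unique tile plays the role of one loaded array processor;
    # its outputs are added into the result according to the tile's locations.
    result = [0] * n_rows
    for sub, places in locations.items():
        for bi, bj in places:
            segment = vector[bj:bj + tile]
            for r in range(tile):
                result[bi + r] += sum(a * b for a, b in zip(sub[r], segment))
    return result, len(locations)


m = [[1, 2, 0, 0],
     [3, 4, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 3, 4]]
print(tiled_matvec(m, [1, 1, 2, 2], 2))  # ([3, 7, 6, 14], 2)
```

In this example the 4x4 matrix has four tiles but only two distinct ones, so only two tiles would need to be loaded.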
METHOD FOR PERMUTING DIMENSIONS OF A MULTI-DIMENSIONAL TENSOR
A method performed by a processor for permuting dimensions of a multi-dimensional tensor is described. The multi-dimensional tensor contains an array of tensor values in three or more dimensions that are stored in a first storage unit. The array of tensor values is transferred from the first storage unit to a second storage unit by reading tensor values from the first storage that are arrayed along a first dimension of the multi-dimensional tensor and writing the corresponding tensor values to the second storage in locations corresponding to a second dimension of the multi-dimensional tensor. The dimensions of the multi-dimensional tensor may be further permuted by a programmable engine within the processor.
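The transfer described here, reading along one dimension and writing to locations that correspond to another, can be sketched for the simplest case of swapping the first two dimensions of a 3-D tensor. The sketch assumes both storage units are flat, row-major Python lists; the function name `swap_first_two_dims` is hypothetical.

```python
def swap_first_two_dims(src, shape):
    """Copy a row-major 3-D tensor so that dimensions 0 and 1 are exchanged.

    src   : flat list modelling the first storage unit (assumption)
    shape : (d0, d1, d2) of the source tensor
    Returns the flat list modelling the second storage unit, laid out
    row-major with shape (d1, d0, d2).
    """
    d0, d1, d2 = shape
    if len(src) != d0 * d1 * d2:
        raise ValueError("source length does not match shape")
    dst = [None] * len(src)
    for i in range(d0):          # read along the first dimension
        for j in range(d1):
            for k in range(d2):
                read_at = (i * d1 + j) * d2 + k
                write_at = (j * d0 + i) * d2 + k  # j now indexes the outer dimension
                dst[write_at] = src[read_at]
    return dst


print(swap_first_two_dims(list(range(12)), (2, 3, 2)))
# [0, 1, 6, 7, 2, 3, 8, 9, 4, 5, 10, 11]
```

Other permutations follow the same pattern with a different write-address formula.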
Apparatus, systems, and methods for low power computational imaging
The present application discloses a computing device that can provide a low-power, highly capable computing platform for computational imaging. The computing device can include one or more processing units, for example one or more vector processors and one or more hardware accelerators, an intelligent memory fabric, a peripheral device, and a power management module. The computing device can communicate with external devices, such as one or more image sensors, an accelerometer, a gyroscope, or any other suitable sensor devices.
Programmable Spatial Array for Matrix Decomposition
Programmable spatial array processing circuitry may be programmable to perform multiple different types of matrix decompositions. The programmable spatial array processing circuitry may include an array of processing elements. When programmed with first instructions, the array performs a first type of matrix decomposition. When programmed with second instructions, the array performs a second type of matrix decomposition. Individual processing elements of the programmable spatial array processing circuitry may avoid having individual instruction memories. Instead, there may be an instruction memory that provides a portion of the first instructions or a portion of the second instructions sequentially to one processing element of a row of processing elements to sequentially propagate to other processing elements of the row of processing elements.
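The instruction distribution described in this abstract, where a shared instruction memory feeds one processing element and instructions then pass along the row, can be pictured as a shift register. The sketch below assumes one hop per cycle and a single row; the function name `propagate_instructions` and the timing model are illustrative assumptions, not details from the patent.

```python
def propagate_instructions(instructions, row_length):
    """Return, cycle by cycle, which instruction each PE in a row holds.

    None marks an idle PE. The first PE receives instructions from the
    shared instruction memory; every other PE receives whatever its left
    neighbour held in the previous cycle (assumed one-cycle hop).
    """
    if row_length < 1:
        raise ValueError("row_length must be at least 1")
    row = [None] * row_length
    timeline = []
    # Feed every instruction, then enough empty slots to drain the row.
    feed = list(instructions) + [None] * (row_length - 1)
    for incoming in feed:
        row = [incoming] + row[:-1]
        timeline.append(list(row))
    return timeline


for cycle, state in enumerate(propagate_instructions(["a", "b"], 3)):
    print(cycle, state)
# 0 ['a', None, None]
# 1 ['b', 'a', None]
# 2 [None, 'b', 'a']
# 3 [None, None, 'b']
```

Under this model no processing element needs its own instruction memory, since each instruction reaches PE k exactly k cycles after it reaches the first PE.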
Reconfigurable parallel processing
Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
VECTOR PROCESSING UNIT
A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.