Patent classifications
G06F15/8076
OFFLOADING PROCESSING TASKS TO DECOUPLED ACCELERATORS FOR INCREASING PERFORMANCE IN A SYSTEM ON A CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
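One of the features listed above, a load with permute and zero insertion, can be illustrated with a minimal software model. The abstract does not describe the encoding, so everything here is an assumption made for illustration: the function name load_permute_zero, the use of a per-lane pattern list, and the convention that a negative pattern entry inserts a zero.

```python
from typing import List


def load_permute_zero(memory: List[int], base: int, pattern: List[int]) -> List[int]:
    """Model a load that permutes source elements and inserts zeros.

    Hypothetical convention: pattern[i] is the source offset (relative to
    `base`) for output lane i, and a negative entry means "insert zero".
    """
    return [0 if p < 0 else memory[base + p] for p in pattern]


if __name__ == "__main__":
    mem = [10, 20, 30, 40]
    # Lane 0 takes element 3, lane 1 is zeroed, lanes 2 and 3 both take element 0.
    print(load_permute_zero(mem, 0, [3, -1, 0, 0]))
```

In this model a single load both reorders the data and zero-fills selected lanes, which is the effect the feature name suggests; the hardware mechanism itself is not shown.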
Quick clearing of registers
A method of clearing registers, and logic designs that use AND and OR logic to propagate zero values supplied on write-enable signal buses when a clear instruction targeting more than one register is executed. This allows multiple architecturally visible registers to be cleared with a single instruction, regardless of the values on the data buses.
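The following is a rough behavioral sketch of one possible reading of this abstract, not the patented circuit. It assumes a per-register clear bit that is ORed into the write enable and ANDed (inverted) with the data bus, so a cleared register takes zero whatever the data bus holds. The names next_register_values, clear_mask, and WIDTH_MASK are hypothetical.

```python
from typing import List

WIDTH_MASK = 0xFFFFFFFF  # assumed 32-bit registers


def next_register_values(regs: List[int], write_enable: int,
                         data: List[int], clear_mask: int) -> List[int]:
    """Compute the next state of a register file for one cycle.

    Bit i of `write_enable` and `clear_mask` applies to register i.
    """
    out = []
    for i, current in enumerate(regs):
        clear = (clear_mask >> i) & 1
        # OR: a clear request forces the register to be written.
        we = ((write_enable >> i) & 1) | clear
        # AND: a clear request gates the data bus down to zero.
        gated = data[i] & (0 if clear else WIDTH_MASK)
        out.append(gated if we else current)
    return out


if __name__ == "__main__":
    # Registers 1 and 2 are cleared together; register 0 is written normally.
    print(next_register_values([7, 8, 9, 10], 0b0001, [1, 2, 3, 4], 0b0110))
```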
Monolithic vector processor configured to operate on variable length vectors using a vector length register
A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
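As an illustration of how a vector length register governs a vector instruction, here is a minimal sketch of a vector add that operates only on the first vl elements. The maximum length MAX_VL, the function name vadd, and the choice to leave the remaining destination elements unchanged are assumptions; the abstract does not specify them.

```python
from typing import List

MAX_VL = 8  # assumed maximum number of elements a vector register can hold


def vadd(dst: List[int], a: List[int], b: List[int], vl: int) -> List[int]:
    """Add the first `vl` elements of a and b.

    Elements of `dst` at positions >= vl are left unchanged (an assumed
    policy; other designs zero them or leave them undefined).
    """
    if not 0 <= vl <= MAX_VL:
        raise ValueError("vector length out of range")
    out = list(dst)
    for i in range(vl):
        out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    # With vl = 3, only three additions are performed.
    print(vadd([0] * 8, list(range(8)), [10] * 8, 3))
```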
True/false vector index registers and methods of populating thereof
Disclosed herein are vector index registers for storing or loading indexes of true and/or false results of comparison operations in vector processors. Each of the vector index registers stores multiple addresses for accessing multiple positions in operand vectors.
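A small sketch may help show the idea of populating index registers from a comparison. It assumes a less-than comparison and two index registers, one collecting the positions where the comparison is true and one where it is false; the names compare_to_index_registers and gather are hypothetical.

```python
from typing import List, Tuple


def compare_to_index_registers(a: List[int], b: List[int]) -> Tuple[List[int], List[int]]:
    """Compare a[i] < b[i] and record the positions of true and false results."""
    vir_true: List[int] = []
    vir_false: List[int] = []
    for i, (x, y) in enumerate(zip(a, b)):
        (vir_true if x < y else vir_false).append(i)
    return vir_true, vir_false


def gather(vec: List[int], vir: List[int]) -> List[int]:
    """Use the stored indexes to access multiple positions in an operand vector."""
    return [vec[i] for i in vir]


if __name__ == "__main__":
    a, b = [1, 5, 2, 9], [3, 3, 3, 3]
    t, f = compare_to_index_registers(a, b)
    print(t, f)            # positions where a < b, and where it is not
    print(gather(a, t))    # operate only on the elements that passed
```

Keeping the indexes rather than a bit mask lets later instructions visit only the selected positions, which is the use the abstract describes.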
Method and Apparatus for Desynchronizing Execution in a Vector Processor
In one implementation, a vector processor unit has preload registers for at least some of vector length, vector constant, vector address, and vector stride. Each preload register has an input and an output. All the preload register inputs are coupled to receive new vector parameters. Each preload register's output is coupled to a first input of a respective multiplexor, and the second inputs of all the respective multiplexors are coupled to the new vector parameters.
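The preload-register and multiplexor arrangement can be modeled behaviorally as follows. This is only a sketch: the class name PreloadSlot, the select_new control signal, and the idea that one slot exists per parameter (length, constant, address, stride) are assumptions layered on the abstract.

```python
from dataclasses import dataclass


@dataclass
class PreloadSlot:
    """One preload register plus its two-input multiplexor."""
    stored: int = 0

    def load(self, new_value: int) -> None:
        # The register input receives the new vector parameter.
        self.stored = new_value

    def mux(self, new_value: int, select_new: bool) -> int:
        # First mux input: the preload register output.
        # Second mux input: the new vector parameter, bypassing the register.
        return new_value if select_new else self.stored


if __name__ == "__main__":
    slots = {name: PreloadSlot() for name in ("length", "constant", "address", "stride")}
    slots["length"].load(16)
    print(slots["length"].mux(32, select_new=False))  # uses the preloaded value
    print(slots["length"].mux(32, select_new=True))   # bypasses with the new value
```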
Data Processing Method and Device, and Storage Medium
A data processing method and device, and a storage medium are provided. A processor of the data processing device comprises an index register group. Said method comprises: obtaining a first index value of each of at least one index register according to instruction codes, and determining the at least one index register according to the first index value, the instruction codes being generated by a compiler, and the at least one index register being at least one register in the index register group; and acquiring a first content stored in each of the at least one index register, and determining a first vector register according to the first content; and executing the instruction codes by accessing the first vector register.
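The two-level lookup described above (instruction field selects an index register, whose content selects a vector register) can be sketched as below. The function name resolve_vector_registers and the list-based register files are hypothetical stand-ins for the hardware.

```python
from typing import List


def resolve_vector_registers(index_fields: List[int],
                             index_regs: List[int],
                             vector_regs: List[List[int]]) -> List[List[int]]:
    """Resolve instruction operands through the index register group.

    index_fields: index values taken from the instruction codes.
    index_regs:   contents of the index register group.
    vector_regs:  the vector register file.
    """
    resolved = []
    for field in index_fields:
        vreg_number = index_regs[field]            # content of the index register
        resolved.append(vector_regs[vreg_number])  # vector register it names
    return resolved


if __name__ == "__main__":
    index_regs = [2, 0, 1]
    vector_regs = [[1, 1], [2, 2], [3, 3]]
    # The instruction names index registers 0 and 2, which point at v2 and v1.
    print(resolve_vector_registers([0, 2], index_regs, vector_regs))
```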
Apparatus and method of vector unit sharing
A reconfigurable vector processor is described that allows the size of its vector units to be changed in order to process vectors of different sizes. The reconfigurable vector processor comprises a plurality of processor units. Each of the processor units comprises a control unit for decoding instructions and generating control signals, a scalar unit for processing instructions on scalar data, and a vector unit for processing instructions on vector data under control of control signals. The reconfigurable vector processor architecture also comprises a vector control selector for selectively providing control signals generated by one processor unit of the plurality of processor units to the vector unit of a different processor unit of the plurality of processor units.
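A toy model of the vector control selector is sketched here. It assumes that control signals reduce to an operation name and that the selector is a list mapping each vector unit to the processor unit whose control signals drive it; run_vector_units, OPS, and these encodings are illustrative assumptions only.

```python
import operator
from typing import List

OPS = {"add": operator.add, "mul": operator.mul}  # assumed control-signal encodings


def run_vector_units(control: List[str], selector: List[int],
                     a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Run each vector unit under the control signals chosen by the selector.

    control[i]  : operation decoded by processor unit i's control unit.
    selector[j] : which processor unit's control signals drive vector unit j.
    """
    results = []
    for unit, source in enumerate(selector):
        op = OPS[control[source]]
        results.append([op(x, y) for x, y in zip(a[unit], b[unit])])
    return results


if __name__ == "__main__":
    control = ["add", "mul"]
    a, b = [[1, 2], [3, 4]], [[10, 10], [10, 10]]
    print(run_vector_units(control, [0, 1], a, b))  # independent units
    print(run_vector_units(control, [0, 0], a, b))  # unit 0 drives both: wider vector
```

When the selector points both vector units at the same control unit, the two units behave like one vector unit of twice the width, which is the resizing effect the abstract describes.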
NEURAL NETWORK PROCESSOR, CHIP AND ELECTRONIC DEVICE
The embodiments of the present disclosure provide a neural network processor, a chip and an electronic device. The neural network processor includes a scalar processing unit, a general register and a data migration engine. The scalar processing unit includes a plurality of scalar registers. The data migration engine is coupled to the general register and at least one of the scalar registers. The data migration engine is configured to cause data interaction between the scalar processing unit and the general register.
System and method of loading and replication of sub-vector values
A processor includes a vector register configured to load data responsive to a special purpose load instruction. The processor also includes circuitry configured to replicate a selected sub-vector value from the vector register.
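The load-then-replicate behavior can be sketched in software as follows. The parameter names (vec_len, sub_len, select) and the assumption that the sub-vector length divides the vector length evenly are not taken from the abstract.

```python
from typing import List


def load_and_replicate(memory: List[int], addr: int, vec_len: int,
                       sub_len: int, select: int) -> List[int]:
    """Load a vector register from memory, then replicate one sub-vector.

    The register is treated as vec_len // sub_len sub-vectors; the one at
    position `select` is copied across the whole register.
    """
    if vec_len % sub_len != 0:
        raise ValueError("sub-vector length must divide the vector length")
    reg = memory[addr:addr + vec_len]                      # the load
    sub = reg[select * sub_len:(select + 1) * sub_len]     # selected sub-vector
    return sub * (vec_len // sub_len)                      # replication


if __name__ == "__main__":
    mem = list(range(16))
    # Load 8 elements starting at address 4, replicate the second pair.
    print(load_and_replicate(mem, 4, 8, 2, 1))
```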