G06F9/345

REDUCED MEMORY WRITE REQUIREMENTS IN A SYSTEM ON A CHIP USING AUTOMATIC STORE PREDICATION

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

METHOD AND APPARATUS FOR VECTOR SORTING USING VECTOR PERMUTATION LOGIC
20230037321 · 2023-02-09 ·

A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, generating a control input vector for vector permutation logic comprised in the processor based on values in lanes of the vector and a sort order for the vector indicated by the vector sort instruction and storing the control input vector in a storage location.

MODEL GENERATION DEVICE, IN-VEHICLE DEVICE, AND MODEL GENERATION METHOD
20230037499 · 2023-02-09 · ·

Provided are: a selection information acquiring unit to acquire selection information for identifying a target model to be generated from among a plurality of generable neural network models; a model identification unit to identify the target model on the basis of the selection information acquired by the selection information acquiring unit; a weight acquiring unit to acquire a weight of the target model identified by the model identification unit; and a model generation unit to generate the target model identified by the model identification unit on the basis of the weight acquired by the weight acquiring unit and a weight map in which structure information on a structure of each of the plurality of neural network models and information for mapping a weight in the structure are defined.

MODEL GENERATION DEVICE, IN-VEHICLE DEVICE, AND MODEL GENERATION METHOD
20230037499 · 2023-02-09 · ·

Provided are: a selection information acquiring unit to acquire selection information for identifying a target model to be generated from among a plurality of generable neural network models; a model identification unit to identify the target model on the basis of the selection information acquired by the selection information acquiring unit; a weight acquiring unit to acquire a weight of the target model identified by the model identification unit; and a model generation unit to generate the target model identified by the model identification unit on the basis of the weight acquired by the weight acquiring unit and a weight map in which structure information on a structure of each of the plurality of neural network models and information for mapping a weight in the structure are defined.

PROCESSOR AND CONTROL METHOD OF PROCESSOR

A processor includes: an address generating unit that, when an instruction decoded by a decoding unit is an instruction to execute arithmetic processing on a plurality of operand sets each including a plurality of operands that are objects of the arithmetic processing, in parallel a plurality of times, generates an address set corresponding to each of the operand sets of the arithmetic processing for each time, based on a certain address displacement with respect to the plurality of operands included in each of the operand sets; a plurality of instruction queues that hold the generated address sets corresponding to the respective operand sets, in correspondence to respective processing units; and a plurality of processing units that perform the arithmetic processing in parallel on the operand sets obtained based on the respective address sets outputted by the plurality of instruction queues.

PROCESSOR AND CONTROL METHOD OF PROCESSOR

A processor includes: an address generating unit that, when an instruction decoded by a decoding unit is an instruction to execute arithmetic processing on a plurality of operand sets each including a plurality of operands that are objects of the arithmetic processing, in parallel a plurality of times, generates an address set corresponding to each of the operand sets of the arithmetic processing for each time, based on a certain address displacement with respect to the plurality of operands included in each of the operand sets; a plurality of instruction queues that hold the generated address sets corresponding to the respective operand sets, in correspondence to respective processing units; and a plurality of processing units that perform the arithmetic processing in parallel on the operand sets obtained based on the respective address sets outputted by the plurality of instruction queues.

Streaming engine with multi dimensional circular addressing selectable at each dimension
11709779 · 2023-07-25 · ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

Streaming engine with multi dimensional circular addressing selectable at each dimension
11709779 · 2023-07-25 · ·

A streaming engine employed in a digital data processor may specify a fixed read-only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.

STREAM REFERENCE REGISTER WITH DOUBLE VECTOR AND DUAL SINGLE VECTOR OPERATING MODES
20180011709 · 2018-01-11 ·

A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.

DATA PROCESSING APPARATUS HAVING STREAMING ENGINE WITH READ AND READ/ADVANCE OPERAND CODING
20180011707 · 2018-01-11 ·

A streaming engine employed in a digital signal processor specified a fixed data stream. Once started the data stream is read only and cannot be written. Once fetched the data stream is stored in a first-in-first-out buffer for presentation to functional units in the fixed order. Data use by the functional unit is controlled using the input operand fields of the corresponding instruction. A read only operand coding supplies the data an input of the functional unit. A read/advance operand coding supplies the data and also advances the stream to the next sequential data elements. The read only operand coding permits reuse of data without requiring a register of the register file for temporary storage.