IPIQ

G06F15/8053

Flexible precision neural inference processing unit

11537859 · 2022-12-27

International Business Machines Corporation

Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
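To illustrate what "variable precision" can mean for the vector-matrix multiplier, vector processor, and activation unit, here is a minimal Python sketch. It assumes symmetric uniform quantization to a configurable bit width, integer accumulation, a rescale step standing in for the vector processor, and ReLU as the activation. All function names and the quantization scheme are hypothetical illustrations, not the patented design.

```python
def quantize(values, bits, scale):
    """Hypothetical symmetric quantization to signed integers of `bits` width."""
    qmax = (1 << (bits - 1)) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def vector_matrix_multiply(x, w):
    """Integer vector-matrix product: y[j] = sum_i x[i] * w[i][j]."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]


def core_forward(x, w, bits, x_scale, w_scale):
    # Vector-matrix multiplier operating at the selected precision.
    qx = quantize(x, bits, x_scale)
    qw = [quantize(row, bits, w_scale) for row in w]
    acc = vector_matrix_multiply(qx, qw)
    # Vector processor (assumed role): rescale integer accumulators to real values.
    y = [a * x_scale * w_scale for a in acc]
    # Activation unit (assumed ReLU).
    return [max(0.0, v) for v in y]
```

With `bits=8`, `core_forward([1.0, -2.0], [[0.5, 1.0], [0.25, -1.0]], 8, 0.5, 0.25)` reproduces the exact result `[0.0, 3.0]`; lowering `bits` trades accuracy for cheaper arithmetic, since values outside the representable range are clipped.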

Vector processing unit

11520581 · 2022-12-06

Google Llc

A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
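The abstract does not say how data is spread across the vector memory banks. A common scheme, assumed here only for illustration, is low-order interleaving, where consecutive words land in consecutive banks so that processor units accessing neighbouring addresses in the same cycle do not collide. The function names below are hypothetical.

```python
from collections import Counter


def bank_and_row(address, num_banks):
    """Assumed low-order interleaving: consecutive words go to consecutive banks."""
    return address % num_banks, address // num_banks


def extra_cycles(addresses, num_banks):
    """Extra cycles when several same-cycle accesses hit the same bank.

    Each bank is assumed to serve one access per cycle, so the busiest
    bank determines how long the whole group of accesses takes.
    """
    per_bank = Counter(address % num_banks for address in addresses)
    return max(per_bank.values()) - 1 if per_bank else 0
```

Under this assumption, eight units reading addresses 0 to 7 from eight banks finish in one cycle, while a stride of 8 (addresses 0, 8, 16, 24) serializes on a single bank and costs three extra cycles.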

True/false vector index registers and methods of populating thereof

11507374 · 2022-11-22

Micron Technology, Inc.

Steven Jeffrey Wallach

Disclosed herein are vector index registers for storing or loading indexes of true and/or false results of comparison operations in vector processors. Each of the vector index registers stores multiple addresses for accessing multiple positions in operand vectors.
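A minimal Python model of the idea, assuming an element-wise comparison whose true positions go into one index register and false positions into another, after which either register can drive a gather from an operand vector. The function names are hypothetical; real hardware would do this in vector registers rather than Python lists.

```python
import operator


def populate_index_registers(a, b, compare):
    """Split element positions into 'true' and 'false' index lists."""
    true_idx, false_idx = [], []
    for i, (x, y) in enumerate(zip(a, b)):
        (true_idx if compare(x, y) else false_idx).append(i)
    return true_idx, false_idx


def gather(vector, indexes):
    """Use an index register to access selected positions of an operand vector."""
    return [vector[i] for i in indexes]
```

For example, comparing `[5, 1, 7, 3]` against `[4, 4, 4, 4]` with `operator.gt` yields true indexes `[0, 2]` and false indexes `[1, 3]`, and gathering with the true register returns `[5, 7]`.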

PROCESSOR WITH TABLE LOOKUP UNIT

20230049454 · 2023-02-16

A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.
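The table lookup loop can be modelled in software as processing a fixed number of indices per iteration, each lane reading one table entry. This Python sketch assumes a single in-memory table and a hypothetical `lanes` width; it is an illustration, not the claimed hardware.

```python
def table_lookup_loop(indices, table, lanes=4):
    """Look up table[i] for every index, `lanes` indices per loop iteration."""
    results = []
    for start in range(0, len(indices), lanes):
        # One hypothetical "table lookup vector instruction": each lane
        # reads one table entry using its own index.
        chunk = indices[start:start + lanes]
        results.extend(table[i] for i in chunk)
    return results


SQUARES = [i * i for i in range(16)]
```

With the example table of squares, `table_lookup_loop([3, 0, 15, 7, 2], SQUARES)` takes two iterations (four lanes, then one) and returns `[9, 0, 225, 49, 4]`.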

Using a vector processor to configure a direct memory access system for feature tracking operations in a system on a chip

11573795 · 2023-02-07

Nvidia Corporation

In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two-point and two-by-two-point lookups, and per-memory-bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region-based data movement operations.
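Two of the listed features, the min/max collector and automatic store predication, interact naturally: if the collector observes only the lanes a store actually writes, padding lanes in the final partial vector cannot distort the result. The Python sketch below models that interaction under those assumptions; the class and function names are hypothetical.

```python
class MinMaxCollector:
    """Tracks the running minimum and maximum of every value stored through it."""

    def __init__(self):
        self.min = None
        self.max = None

    def observe(self, values):
        for v in values:
            if self.min is None or v < self.min:
                self.min = v
            if self.max is None or v > self.max:
                self.max = v


def copy_with_minmax(data, lanes=4):
    """Copy `data` vector by vector, collecting min/max as a side effect of the stores."""
    out = [0] * len(data)
    collector = MinMaxCollector()
    for base in range(0, len(data), lanes):
        # Load a full vector; lanes past the end of the array hold padding (0 here).
        vec = data[base:base + lanes]
        vec = vec + [0] * (lanes - len(vec))
        # Store predication: only lanes inside the array are written, so the
        # padding never reaches memory or the collector.
        valid = min(lanes, len(data) - base)
        out[base:base + valid] = vec[:valid]
        collector.observe(vec[:valid])
    return out, collector.min, collector.max
```

For `copy_with_minmax([3, 7, 5, 9, 4])` the second vector carries three padding zeros, yet the reported minimum is 3 rather than 0 because predication masks those lanes.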

TRUE/FALSE VECTOR INDEX REGISTERS AND METHODS OF POPULATING THEREOF

20230077404 · 2023-03-16

Steven Jeffrey Wallach

BUILT-IN SELF-TEST FOR A PROGRAMMABLE VISION ACCELERATOR OF A SYSTEM ON A CHIP

20230125397 · 2023-04-27

Hardware accelerated anomaly detection using a min/max collector in a system on a chip

11636063 · 2023-04-25

Nvidia Corporation

DATA PROCESSING METHOD AND DEVICE, AND RELATED PRODUCT

20230068827 · 2023-03-02

The present disclosure relates to a data processing method and device, and related products. The product may include a control unit. The control unit may include an instruction caching unit, an instruction processing unit, and a storage queue unit. The instruction caching unit is configured to store a calculation instruction associated with an artificial neural network computation. The instruction processing unit may be configured to parse the calculation instruction to obtain a plurality of computation instructions. The storage queue unit may be configured to store an instruction queue, where the instruction queue may include a plurality of computation instructions or calculation instructions to be executed in queue order. By adopting the above method, the present disclosure may improve the computational efficiency of the related products when performing a neural network model computation.
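The control-unit pipeline described above (cache, parse, queue, issue in order) can be sketched in Python. The decomposition table that expands each calculation instruction into computation instructions is entirely hypothetical; the abstract does not say what the computation instructions are.

```python
from collections import deque

# Hypothetical decomposition rules: one calculation instruction expands
# into an ordered list of computation instructions.
DECOMPOSITION = {
    "fully_connected": ["load_input", "load_weights", "matmul", "add_bias", "store_output"],
    "relu": ["load_input", "max_zero", "store_output"],
}


class ControlUnit:
    def __init__(self):
        self.instruction_cache = []  # instruction caching unit
        self.queue = deque()         # storage queue unit (FIFO)

    def cache(self, calc_instruction):
        self.instruction_cache.append(calc_instruction)

    def process(self):
        # Instruction processing unit: parse each cached calculation
        # instruction into computation instructions and enqueue them in order.
        for calc in self.instruction_cache:
            for op in DECOMPOSITION[calc]:
                self.queue.append((calc, op))
        self.instruction_cache.clear()

    def issue_all(self):
        # Issue queued instructions strictly in queue order.
        issued = []
        while self.queue:
            issued.append(self.queue.popleft())
        return issued
```

Caching `"fully_connected"` then `"relu"`, calling `process()`, and then `issue_all()` yields eight computation instructions, starting with `("fully_connected", "load_input")` and ending with `("relu", "store_output")`.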

Dedicated vector sub-processor system

11630667 · 2023-04-18

Advanced Micro Devices, Inc.

A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.
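The abstract does not define how the high and low VGPR banks divide register data. One plausible reading, assumed here purely for illustration, is that each 64-bit register value is split into a high and a low 32-bit half held in paired banks, with a per-thread operand gathering component reassembling the halves. All names below are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


class VSPMemoryBank:
    """Assumed layout: paired high/low VGPR banks, selected by register number."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.high = [dict() for _ in range(num_banks)]  # high VGPR banks
        self.low = [dict() for _ in range(num_banks)]   # low VGPR banks, paired by index

    def write(self, reg, value):
        b = reg % self.num_banks
        self.high[b][reg] = (value >> 32) & MASK32
        self.low[b][reg] = value & MASK32


class OperandGatherer:
    """Per-thread operand gathering component (hypothetical)."""

    def __init__(self, thread_id):
        self.thread_id = thread_id
        self.operands = []

    def gather(self, bank, regs):
        # Read each register's halves from its paired banks and recombine them.
        for reg in regs:
            b = reg % bank.num_banks
            self.operands.append((bank.high[b][reg] << 32) | bank.low[b][reg])
        return self.operands
```

In this model, a gatherer assigned to a thread collects complete operands before execution, so the VSP never has to read the two halves of an operand separately.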
