G06F15/8076

Vector processor with vector first and multiple lane configuration
10877925 · 2020-12-29 · ·

A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. And, a vector first configuration can enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.

Apparatus and methods for generating dot product

Aspects for generating a dot product for two vectors in neural network are described herein. The aspects may include a controller unit configured to receive a vector load instruction that includes a first address of a first vector and a length of the first vector. The aspects may further include a direct memory access unit configured to retrieve the first vector from a storage device based on the first address of the first vector. Further still, the aspects may include a caching unit configured to store the first vector.

TRUE/FALSE VECTOR INDEX REGISTERS
20200371801 · 2020-11-26 ·

Disclosed herein are vector index registers for storing or loading indexes of true and/or false results of comparison operations in vector processors. Each of the vector index registers store multiple addresses for accessing multiple positions in operand vectors.

VECTOR INDEX REGISTERS
20200371802 · 2020-11-26 ·

Disclosed herein are vector index registers in vector processors that each store multiple addresses for accessing multiple positions in vectors. It is known to use scalar index registers in vector processors to access multiple positions of vectors by changing the scalar index registers in vector operations. By using a vector indexing register for indexing positions of one or more operand vectors, the scalar index register can be replaced and at least the continual changing of the scalar index register can be avoided.

QUICK CLEARING OF REGISTERS
20200371706 · 2020-11-26 ·

A method of clearing of registers and logic designs with AND and OR logics to propagate the zero values provided to write enable signal buses upon the execution of clear instruction of more than one registers, allowing more than one architecturally visible registers to be cleared with one signal instruction regardless of the values of data buses.

Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices

Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.

FACILITATING DATA PROCESSING USING SIMD REDUCTION OPERATIONS ACROSS SIMD LANES

Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.

VECTOR PROCESSOR WITH VECTOR FIRST AND MULTIPLE LANE CONFIGURATION
20200301875 · 2020-09-24 ·

A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. And, a vector first configuration can enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.

Processor Core Design Optimized for Machine Learning Applications

A computing system includes a plurality of functional units, each functional unit having one or more inputs and an output. There is a shared memory block coupled to the inputs and outputs of the plurality of functional units. There is a private memory block assigned to each of the plurality of functional units. An inter functional unit data bypass (IFUDB) block is coupled to the plurality of functional units. The IFUDB is configured to route signals between the one or more functional units without use of the shared memory block.

VECTOR REGISTERS IMPLEMENTED IN MEMORY
20200117454 · 2020-04-16 ·

Systems and methods related to implementing vector registers in memory. A memory system for implementing vector registers in memory can include an array of memory cells, where a plurality of rows in the array serve as a plurality of vector registers as defined by an instruction set architecture. The memory system for implementing vector registers in memory can also include a processing resource configured to, responsive to receiving a command to perform a particular vector operation on a particular vector register, access a particular row of the array serving as the particular register to perform the vector operation.