G06F15/8076

Quick clearing of registers

A method of clearing of registers and logic designs with AND and OR logics to propagate the zero values provided to write enable signal buses upon the execution of clear instruction of more than one registers, allowing more than one architecturally visible registers to be cleared with one signal instruction regardless of the values of data buses.

VECTOR REGISTERS IMPLEMENTED IN MEMORY
20220066777 · 2022-03-03 ·

Systems and methods related to implementing vector registers in memory. A memory system for implementing vector registers in memory can include an array of memory cells, where a plurality of rows in the array serve as a plurality of vector registers as defined by an instruction set architecture. The memory system for implementing vector registers in memory can also include a processing resource configured to, responsive to receiving a command to perform a particular vector operation on a particular vector register, access a particular row of the array serving as the particular register to perform the vector operation.

Apparatus and methods for combining vectors

Aspects for vector combination in neural network are described herein. The aspects may include a direct memory access unit configured to receive aa first vector, a second vector, and a controller vector. The first vector, the second vector, and the controller vector may each include one or more elements indexed in accordance with a same one-dimensional data structure. The aspects may further include a computation module configured to select one of the one or more control values, determine that the selected control value satisfies a predetermined condition, and select one of the one or more first elements that corresponds to the selected control value in the one-dimensional data structure as an output element based on a determination that the selected control value satisfies the predetermined condition.

Vector Processor with Vector First and Multiple Lane Configuration
20210117375 · 2021-04-22 ·

A vector processor with a vector first and multi-lane configuration. A vector operation for a vector processor can include a single vector or multiple vectors as input. Multiple lanes for the input can be used to accelerate the operation in parallel. And, a vector first configuration can enhance the multiple lanes by reducing the number of elements accessed in the lanes to perform the operation in parallel.

Methods and systems for fast set-membership tests using one or more processors that support single instruction multiple data instructions

Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (SIMD) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.

Methods and systems for fast set-membership tests using one or more processors that support single instruction multiple data instructions

Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (SIMD) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.

Methods and systems for fast set-membership tests using one or more processors that support single instruction multiple data instructions

Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (SIMD) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.

Methods and systems for fast set-membership tests using one or more processors that support single instruction multiple data instructions

Methods and apparatuses for determining set-membership using Single Instruction Multiple Data (SIMD) architecture are presented herein. Specifically, methods and apparatuses are discussed for determining, in parallel, whether multiple values in a first set of values are members of a second set of values. Many of the methods and systems discussed herein are applied to determining whether one or more rows in a dictionary-encoded column of a database table satisfy one or more conditions based on the dictionary-encoded column. However, the methods and systems discussed herein may apply to many applications executed on a SIMD processor using set-membership tests.

Processor core design optimized for machine learning applications

A computing system includes a plurality of functional units, each functional unit having one or more inputs and an output. There is a shared memory block coupled to the inputs and outputs of the plurality of functional units. There is a private memory block assigned to each of the plurality of functional units. An inter functional unit data bypass (IFUDB) block is coupled to the plurality of functional units. The IFUDB is configured to route signals between the one or more functional units without use of the shared memory block.

PROCESSOR AND CONTROL METHOD THEREOF

A processor is provided. The processor includes a plurality of processing elements configured to be arranged in a matrix form, and a controller configured to control the plurality of processing elements during a plurality of cycles to process a target data, control first processing elements so that each of the first processing elements operates data provided from adjacent first processing elements and the input first element and inputs each of second elements included in a second row among the plurality of elements to second processing elements arranged in the second row among the plurality of processing elements, control the second processing elements so that each of the second processing elements operates data provided from adjacent second processing elements and the input second element, and operates data provided from the adjacent first processing elements in the same column among the first processing elements and pre-stored operation data.