G06F17/153

METHOD FOR IMPROVING CONVOLUTIONAL NEURAL NETWORK TO PERFORM COMPUTATIONS
20220398429 · 2022-12-15

A method for improving a convolutional neural network (CNN) to perform computations is provided. The method includes the following steps: determining a number of a plurality of multipliers to be N and a number of a plurality of adders to be N according to a number of convolution kernels used by a plurality of convolution layers; and in response to an i-th convolutional layer of the convolutional neural network performing a convolution operation and the N convolution kernels of the i-th convolutional layer all having a size of K×1×1, using the N multipliers and the N adders to perform one multiplication operation and one addition operation for each of the N convolution kernels of the i-th convolutional layer in one cycle, such that the N outputs of the N convolution kernels of the i-th convolutional layer are obtained after K cycles.
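
The schedule the abstract describes can be sketched in a few lines. This is an illustrative model only, not the patented circuit: the function name and data layout are hypothetical, and the inner loop stands in for the N multipliers and N adders operating in parallel within one cycle.

```python
# Sketch of the K x 1 x 1 convolution schedule described above:
# N multipliers and N adders each perform one multiply and one add
# per cycle, so the N kernel outputs finish after K cycles.
def conv_1x1_schedule(inputs, kernels):
    """inputs: K channel values at one spatial position.
    kernels: N lists of K weights each (one K x 1 x 1 kernel per list)."""
    K = len(inputs)
    N = len(kernels)
    acc = [0.0] * N          # one accumulator (adder) per kernel
    for cycle in range(K):   # K cycles in total
        x = inputs[cycle]    # one input channel consumed per cycle
        for n in range(N):   # models the N multipliers working in parallel
            acc[n] += kernels[n][cycle] * x
    return acc               # N outputs, available after K cycles
```

Each kernel accumulates exactly one product per cycle, which is why the output latency is K cycles regardless of N.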

Systems and methods for vectorized FFT for multidimensional convolution operations
11526731 · 2022-12-13

A new approach is proposed to support efficient convolution for deep learning by vectorizing multi-dimensional input data for multi-dimensional fast Fourier transform (FFT) and direct memory access (DMA) for data transfer. Specifically, a deep learning processor (DLP) includes a plurality of tensor engines each configured to perform convolution operations by applying one or more kernels on multi-dimensional input data for pattern recognition and classification based on a neural network, wherein each tensor engine includes, among other components, one or more vector processing engines each configured to vectorize the multi-dimensional input data at each layer of the neural network to generate a plurality of vectors and to perform multi-dimensional FFT on the generated vectors and/or the kernels to create output for the convolution operations. Each tensor engine further includes a data engine configured to prefetch the multi-dimensional data and/or the kernels to both on-chip and external memories via DMA.
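
The mathematical identity the tensor engines exploit is the convolution theorem: transform both operands, multiply pointwise, transform back. A minimal NumPy sketch of that identity (not the DLP's vectorization or DMA machinery) for the 2-D case:

```python
import numpy as np

# 2-D linear convolution via FFT (convolution theorem):
# conv(data, kernel) = IFFT( FFT(data) * FFT(kernel) ),
# with both operands zero-padded to the full output size.
def fft_conv2d(data, kernel):
    H, W = data.shape
    kh, kw = kernel.shape
    sh, sw = H + kh - 1, W + kw - 1          # full linear-convolution size
    F = np.fft.rfft2(data, s=(sh, sw))       # zero-pads before transforming
    G = np.fft.rfft2(kernel, s=(sh, sw))
    return np.fft.irfft2(F * G, s=(sh, sw))  # back to the spatial domain
```

The zero-padding to (H+kh−1, W+kw−1) is what turns the FFT's inherently circular convolution into the linear convolution a CNN layer needs.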

Methods and Circuits of Spatial Alignment

According to one implementation of the present disclosure, a method includes performing a spatial alignment of at least one of first or second data tiers of a circuit; and performing a computation based on the spatial alignment of the at least one of the first and second data tiers. According to another implementation of the present disclosure, a circuit includes: compute circuitry; and at least first and second data tiers of two or more data tiers positioned at least partially overlapping one another. In an example, each of the at least first and second data tiers is coupled to the compute circuitry. In certain implementations, the positioning of the first and second data tiers at least partially overlapping one another corresponds to a spatial alignment.

Mapping convolution to a partition channel convolution engine

A processor system comprises two groups of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate matrix for each channel. Each register stores at least one data element from each matrix. The hardware channel convolution processor unit is configured to multiply each data element in a first and second portion of the first group of registers with a corresponding data element in the second group of registers to determine corresponding multiplication results and sum together the multiplication results for each specific channel to determine two corresponding channel convolution result data elements in a corresponding channel convolution result matrix.
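
The per-channel multiply-and-sum the hardware unit performs corresponds to a depthwise convolution step. A minimal sketch under that reading (array shapes and the function name are assumptions, not the patent's register layout):

```python
import numpy as np

# One channel-convolution step as described: each channel of the data
# portion is multiplied elementwise with the weight matrix for that same
# channel, and the products are summed per channel, yielding one result
# element per channel of the channel convolution result matrix.
def channel_conv_element(data_portion, weights):
    """data_portion, weights: arrays of shape (C, kh, kw)."""
    return (data_portion * weights).sum(axis=(1, 2))  # one sum per channel
```

Summing only within each channel (never across channels) is what distinguishes this from an ordinary convolution, matching the abstract's separate weight matrix per channel.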

Neural network weight distribution from a grid of memory elements

Neural inference chips for computing neural activations are provided. In various embodiments, a neural inference chip comprises at least one neural core, a memory array, an instruction buffer, and an instruction memory. The instruction buffer has a position corresponding to each of a plurality of elements of the memory array. The instruction memory provides at least one instruction to the instruction buffer. The instruction buffer advances the at least one instruction between positions in the instruction buffer. The instruction buffer provides the at least one instruction to at least one of the plurality of elements of the memory array from its associated position in the instruction buffer when the memory of the at least one of the plurality of elements contains data associated with the at least one instruction. Each element of the memory array provides a data block from its memory to its horizontal buffer in response to the arrival of an associated instruction from the instruction buffer. The horizontal buffer of each element of the memory array provides a data block to the horizontal buffer of another of the elements of the memory array or to the at least one neural core.
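
The dataflow above can be caricatured in a few lines of Python. This toy collapses the cycle-by-cycle timing into two passes and uses a hypothetical tag-matching rule; it only illustrates the idea that an instruction triggers a memory element to emit a block into its horizontal buffer, and blocks then hop buffer-to-buffer toward the core.

```python
# Toy model of the grid dataflow: the instruction visits each element in
# turn; an element whose memory holds data matching the instruction's tag
# copies a block into its horizontal buffer. Blocks then move through the
# chain of horizontal buffers toward the neural core at the end of the row.
def run_grid(memories, tag):
    """memories: one dict per element, mapping tag -> data block."""
    horizontal = [None] * len(memories)
    for pos, mem in enumerate(memories):     # instruction advances position by position
        if tag in mem:                       # data present: emit to horizontal buffer
            horizontal[pos] = mem[tag]
    delivered = []
    for pos in range(len(horizontal) - 1, -1, -1):  # blocks hop toward the core
        if horizontal[pos] is not None:
            delivered.append(horizontal[pos])
    return delivered
```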

Circuit for neural network convolutional calculation of variable feature and kernel sizes
11514136 · 2022-11-29

A circuit for performing parallel convolutional computation for features and kernels of variable sizes may receive inputs of an m×n matrix of feature data, an m×n matrix of convolution data, and a (2m−1)×(2n−1) matrix of kernel data. A feature manager of the circuit may hold m rows of n data buffers storing the input feature data and rotating values between rows during one restricted convolution calculation. A kernel manager of the circuit may hold a (2m−1)×(2n−1) matrix of data buffers storing the input kernel data in the buffers and cyclically rotating values in upwards, downwards, leftwards and rightwards directions for different restricted convolution calculations. A row convolution engine of the circuit may hold m row convolution processors, each storing and updating input convolution data by multiplication-and-accumulation (MAC) operations on its input feature and kernel data rows. The circuit produces accumulated convolutional data.
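
A hypothetical sketch of the two ingredients named above: a row-wise multiply-accumulate over one "restricted" convolution window, and cyclic rotation of the kernel buffers between calculations (NumPy's `roll` stands in for the hardware rotation; the function names and shapes are assumptions).

```python
import numpy as np

# One restricted convolution step: each of the m row processors
# multiply-accumulates its feature row against the corresponding row of
# the current kernel window; the row results are summed together.
def restricted_conv_step(features, kernel_window):
    """features, kernel_window: arrays of shape (m, n)."""
    return float(np.sum(features * kernel_window))

# Cyclic rotation of the kernel buffers in the up/down/left/right
# directions used to set up the next restricted convolution.
def rotate_kernel(kernel, dr, dc):
    return np.roll(np.roll(kernel, dr, axis=0), dc, axis=1)
```

Rotating the oversized (2m−1)×(2n−1) kernel buffer, rather than re-loading it, is what lets one set of buffers serve every relative offset between feature and kernel.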

Rotation and translation invariant representation of an object
11514647 · 2022-11-29

A non-transitory computer readable medium that stores instructions that once executed by a computer cause the computer to execute the stages of: calculating a first function that represents an object that is three dimensional; calculating a second function that is a convolution or an approximated convolution of (a) the first function applied on points of the object, and (b) another function that is the first function composed with a function that sends points of the object to opposite points; wherein the second function is translation invariant; and calculating the translation and rotation invariant features of the object, based on the second function.
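
Convolving f with x ↦ f(−x) yields the autocorrelation of f, whose Fourier transform is |F|², so it is unchanged when f is translated; that appears to be the translation invariance the abstract relies on. A minimal 1-D/n-D sketch of this identity (illustrative mathematics, not the patent's feature pipeline):

```python
import numpy as np

# Autocorrelation of f, i.e. the convolution of f with x -> f(-x).
# Its Fourier transform is F * conj(F) = |F|^2, which is invariant
# under (circular) translations of f.
def autocorrelation(f):
    F = np.fft.fftn(f)
    return np.real(np.fft.ifftn(F * np.conj(F)))
```

Because a shift of f only changes the phase of F, and |F|² discards phase, the autocorrelation of a shifted signal is identical to that of the original.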

DATA PROCESSING METHOD AND CIRCUIT BASED ON CONVOLUTION COMPUTATION
20220374494 · 2022-11-24

A data processing method and circuit based on convolution computation are provided. In the data processing method, a shared memory structure is provided, convolution computation of data in batches or duplicated data is provided, an allocation mechanism for storing data into multiple memories is provided, and a signed padding mechanism is provided. Therefore, a flexible and efficient convolution computation mechanism and structure are provided.

DATA PROCESSING METHOD AND CIRCUIT BASED ON CONVOLUTION COMPUTATION
20220374495 · 2022-11-24

A data processing method and circuit based on convolution computation are provided. In the data processing method, a shared memory structure is provided, convolution computation of data in batches or duplicated data is provided, an allocation mechanism for storing data into multiple memories is provided, and a signed padding mechanism is provided. Therefore, a flexible and efficient convolution computation mechanism and structure are provided.

DATA PROCESSING METHOD AND CIRCUIT BASED ON CONVOLUTION COMPUTATION
20220374493 · 2022-11-24

A data processing method and circuit based on convolution computation are provided. In the data processing method, a shared memory structure is provided, convolution computation of data in batches or duplicated data is provided, an allocation mechanism for storing data into multiple memories is provided, and a signed padding mechanism is provided. Therefore, a flexible and efficient convolution computation mechanism and structure are provided.