IPIQ

G06F2207/4824

ELEMENTS FOR IN-MEMORY COMPUTE

20230004354 · 2023-01-05 ·

A memory array arranged in multiple columns and rows. Computation circuits that each calculate a computation value from cell values in a corresponding column. A column multiplexer cycles through multiple data lines that each corresponds to a computation circuit. Cluster cycle management circuitry determines a number of multiplexer cycles based on a number of columns storing data of a compute cluster. A sensing circuit obtains the computation values from the computation circuits via the column multiplexer as the column multiplexer cycles through the data lines. The sensing circuit combines the obtained computation values over the determined number of multiplexer cycles. A first clock may initiate the multiplexer to cycle through its data lines for the determined number of multiplexer cycles, and a second clock may initiate each individual cycle. The multiplexer or additional circuitry may be utilized to modify the order in which data is written to the columns.
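The cycling and combining behavior described in this abstract can be sketched in software. This is only an illustration under stated assumptions: each computation circuit is assumed to sum its column's cell values, and the sensing circuit is assumed to combine values by addition. The abstract fixes neither choice, and the names `cluster_result`, `first_col`, and `num_cols` are hypothetical.

```python
def cluster_result(array, first_col, num_cols):
    """Combine per-column computation values for one compute cluster."""
    # One computation circuit per column; assumed here to sum the column's cells.
    column_values = [sum(row[c] for row in array) for c in range(len(array[0]))]
    total = 0
    # The number of multiplexer cycles is set by the number of cluster columns.
    for cycle in range(num_cols):
        # Each cycle selects one data line; the sensing circuit accumulates it.
        total += column_values[first_col + cycle]
    return total


array = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
print(cluster_result(array, 0, 2))  # cluster stored in columns 0 and 1
print(cluster_result(array, 2, 2))  # cluster stored in columns 2 and 3
```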

Semiconductor device having neural network

11568224 · 2023-01-31 ·

Semiconductor Energy Laboratory Co., Ltd.

A semiconductor device capable of efficiently recognizing images utilizing a neural network is provided. The semiconductor device includes a shift register group, a D/A converter, and a product-sum operation circuit. The product-sum operation circuit includes an analog memory and stores a parameter of a filter. The shift register group captures image data and outputs part of the image data to the D/A converter while shifting the image data. The D/A converter converts the part of the input image data into analog data and outputs the analog data to the product-sum operation circuit.

Device for computing an inner product

11567731 · 2023-01-31 ·

National Chung Cheng University

Tay-Jyi Lin

A device for computing an inner product includes an index unit, a storage operation unit, a redundant to 2's complement (RTC) converter, a mapping table, and a multiply-accumulate (MAC) module. The index unit, storing index values, is coupled to word lines. The storage operation unit includes the word lines and bit lines and stores data values. The mapping table stores coefficients corresponding to the index values. The index unit enables the word line according to a count value and the index value, such that the storage operation unit accumulates the data values corresponding to the bit lines and the enabled word line, thereby generating accumulation results. The RTC converter converts the accumulation results into a total data value in 2's complement format. The MAC module operates based on the total data value and the coefficient to generate an inner product value.
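The grouping idea in this abstract can be sketched as follows: data values that share an index are accumulated first, and each accumulated total is multiplied by its coefficient only once. This is a software illustration, not the claimed circuit. The redundant-to-2's-complement conversion is not modeled, and the function and argument names are hypothetical.

```python
def indexed_inner_product(data, indices, coefficients):
    """Inner product of `data` with coefficients looked up through `indices`."""
    total = 0
    # The count value steps through every entry of the mapping table.
    for count, coeff in enumerate(coefficients):
        # "Enable" the word lines whose stored index matches the count value
        # and accumulate the data values they hold.
        group_sum = sum(d for d, i in zip(data, indices) if i == count)
        # One multiply-accumulate per coefficient instead of one per data value.
        total += coeff * group_sum
    return total


data = [3, 1, 4, 1, 5]
indices = [0, 1, 0, 2, 1]
coefficients = [2, -1, 10]
print(indexed_inner_product(data, indices, coefficients))
```

With three distinct coefficients, this sketch performs three multiplications rather than five, which is the saving the grouping is meant to provide.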

Multiplication-free approximation for neural networks and sparse coding

11714977 · 2023-08-01 ·

Intel Corporation

Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.

Data output method, data acquisition method, device, and electronic apparatus

11562241 · 2023-01-24 ·

Beijing Baidu Netcom Science and Technology Co., Ltd.

A data output method, a data acquisition method, a device, and an electronic apparatus are provided, and a specific technical solution is: reading a first data sub-block, and splicing the first data sub-block into a continuous data stream, wherein the first data sub-block is a data sub-block in transferred data in a neural network; compressing the continuous data stream to acquire a second data sub-block; determining, according to a length of the first data sub-block and a length of the second data sub-block, whether there is a gain in compression of the continuous data stream; outputting the second data sub-block if there is the gain in the compression of the continuous data stream.
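The gain check described in this abstract can be sketched with a general-purpose compressor. The use of `zlib` and the fallback of emitting the uncompressed stream when there is no gain are assumptions made for illustration; the abstract names no codec and only states what happens when a gain exists. The function name `output_block` is hypothetical.

```python
import zlib


def output_block(sub_blocks):
    """Return (payload, compressed_flag) for a list of byte sub-blocks."""
    # Splice the sub-blocks into one continuous data stream.
    stream = b"".join(sub_blocks)
    # Compress the stream (zlib is an assumed stand-in codec).
    compressed = zlib.compress(stream)
    # There is a gain only if the compressed form is strictly shorter.
    if len(compressed) < len(stream):
        return compressed, True
    # Assumed fallback: emit the original stream when compression does not help.
    return stream, False


payload, flag = output_block([b"\x00" * 512, b"\x00" * 512])
print(flag, len(payload))
```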

Neural network processor using dyadic weight matrix and operation method thereof

11562046 · 2023-01-24 ·

Samsung Electronics Co., Ltd.

A neural network (NN) processor includes an input feature map buffer configured to store an input feature matrix, a weight buffer configured to store a weight matrix trained in a form of a dyadic matrix, a transform circuit configured to perform a Walsh-Hadamard transform on an input feature vector obtained from the input feature matrix and a weight vector included in the weight matrix to output a transformed input feature vector and a transformed weight vector, and an arithmetic circuit configured to perform an element-wise multiplication (EWM) on the transformed input feature vector and the transformed weight vector.
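The reason a Walsh-Hadamard transform followed by element-wise multiplication can stand in for a matrix-vector product is sketched below. The sketch assumes the common definition of a dyadic matrix, W[i][j] = w[i XOR j] for a single weight vector w, and a power-of-two vector length; neither detail is stated in the abstract. The function names are hypothetical.

```python
def wht(v):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                # Butterfly: sum and difference of paired elements.
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v


def dyadic_matvec(w, x):
    """Compute y[i] = sum_j w[i ^ j] * x[j] using transforms and an EWM."""
    n = len(w)
    # Element-wise multiplication in the transformed domain.
    prod = [a * b for a, b in zip(wht(w), wht(x))]
    # The inverse transform is the same transform scaled by 1/n.
    return [p // n for p in wht(prod)]


w = [1, 2, 3, 4]
x = [5, 6, 7, 8]
print(dyadic_matvec(w, x))
```

For integer inputs the final division is exact, so integer division loses nothing here.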

Neural network accelerator

11562218 · 2023-01-24 ·

Samsung Electronics Co., Ltd.

Disclosed is a neural network accelerator including a first bit operator generating a first multiplication result by performing multiplication on first feature bits of input feature data and first weight bits of weight data, a second bit operator generating a second multiplication result by performing multiplication on second feature bits of the input feature data and second weight bits of the weight data, an adder generating an addition result by performing addition based on the first multiplication result and the second multiplication result, a shifter shifting a number of digits of the addition result depending on a shift value to generate a shifted addition result, and an accumulator generating output feature data based on the shifted addition result.
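The multiply, add, shift, and accumulate sequence in this abstract can be sketched for unsigned operands. The 2-bit chunk width, the two-chunk split, and the unsigned interpretation are assumptions chosen for illustration, and `split_multiply` is a hypothetical name.

```python
def split_multiply(feature, weight, chunk_bits=2, chunks=2):
    """Multiply two unsigned integers from small partial products."""
    mask = (1 << chunk_bits) - 1
    # Split each operand into chunks, least significant chunk first.
    f = [(feature >> (chunk_bits * i)) & mask for i in range(chunks)]
    w = [(weight >> (chunk_bits * i)) & mask for i in range(chunks)]
    acc = 0
    for shift_units in range(2 * chunks - 1):
        # Bit operators: partial products that share the same shift value.
        # Adder: sum those partial products before shifting.
        partial = sum(
            f[i] * w[shift_units - i]
            for i in range(chunks)
            if 0 <= shift_units - i < chunks
        )
        # Shifter, then accumulator.
        acc += partial << (chunk_bits * shift_units)
    return acc


print(split_multiply(13, 11))
```

Adding partial products that share a shift value before shifting means one shifter pass serves several bit operators.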

Mixed-precision computation unit

11561767 · 2023-01-24 ·

Arm Limited

The present disclosure advantageously provides a mixed precision computation (MPC) unit for executing one or more mixed-precision layers of an artificial neural network (ANN). The MPC unit includes a multiplier circuit configured to input a pair of operands and output a product, a first adder circuit coupled to the multiplier circuit, a second adder circuit, coupled to the first adder circuit, configured to input a pair of operands, an accumulator circuit, coupled to the multiplier circuit and the first adder circuit, configured to output an accumulated value, and a controller, coupled to the multiplier circuit, the first adder circuit, the second adder circuit and the accumulator circuit, configured to input a mode control signal. The controller has a plurality of operating modes including a high precision mode, a low precision add mode and a low precision multiply mode.

Transposing neural network matrices in hardware

11704547 · 2023-07-18 ·

Google LLC

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying a transpose operation to be performed on a first neural network matrix; and generating instructions that when executed by the hardware circuit cause the hardware circuit to transpose the first neural network matrix by performing first operations, wherein the first operations include repeatedly performing the following second operations: for a current subdivision of the first neural network matrix that divides the first neural network matrix into one or more current submatrices, updating the first neural network matrix by swapping an upper right quadrant and a lower left quadrant of each current submatrix, and subdividing each current submatrix into respective new submatrices to update the current subdivision.
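The repeated swap-and-subdivide procedure can be sketched in software for a square matrix whose side is a power of two. That size restriction and the function name are assumptions of this sketch; the abstract describes instructions for a hardware matrix unit, not this code.

```python
def transpose_by_quadrant_swaps(matrix):
    """Transpose a 2^k x 2^k matrix by repeated quadrant swaps."""
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    size = n  # side length of each current submatrix
    while size > 1:
        half = size // 2
        # Visit every submatrix of the current subdivision.
        for r0 in range(0, n, size):
            for c0 in range(0, n, size):
                # Swap the upper-right and lower-left quadrants.
                for i in range(half):
                    for j in range(half):
                        upper_right = m[r0 + i][c0 + half + j]
                        lower_left = m[r0 + half + i][c0 + j]
                        m[r0 + i][c0 + half + j] = lower_left
                        m[r0 + half + i][c0 + j] = upper_right
        # Subdivide each submatrix into its four quadrants.
        size = half
    return m


matrix = [[4 * r + c for c in range(4)] for r in range(4)]
print(transpose_by_quadrant_swaps(matrix))
```

The approach works because the transpose of a block matrix [[A, B], [C, D]] is [[Aᵀ, Cᵀ], [Bᵀ, Dᵀ]]: swapping B and C puts each block in its final position, and the later passes transpose each block in place.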

METHOD AND DEVICE FOR BINARY CODING OF SIGNALS IN ORDER TO IMPLEMENT DIGITAL MAC OPERATIONS WITH DYNAMIC PRECISION

20230014185 · 2023-01-19 ·

A computer-implemented method for coding a digital signal intended to be processed by a digital computing system includes the steps of: receiving a sample of the digital signal quantized on a number N_d of bits, decomposing the sample into a plurality of binary words of parameterizable bit size N_p, coding the sample through a plurality of pairs of values, each pair comprising one of the binary words and an address corresponding to the position of the binary word in the sample, transmitting the pairs of values to an integration unit in order to carry out a MAC operation between the sample and a weighting coefficient.
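The decomposition into (word, address) pairs and the subsequent MAC can be sketched as below. The sketch assumes an unsigned sample, numbers addresses from the least significant word, and keeps every word including zeros; the abstract specifies none of these. The names `encode`, `mac`, `n_d`, and `n_p` are hypothetical stand-ins for the sample width and the word size.

```python
def encode(sample, n_d, n_p):
    """Split an n_d-bit unsigned sample into (word, address) pairs of n_p bits."""
    mask = (1 << n_p) - 1
    num_words = -(-n_d // n_p)  # ceiling division
    return [((sample >> (addr * n_p)) & mask, addr) for addr in range(num_words)]


def mac(acc, pairs, weight, n_p):
    """Accumulate sample * weight from the coded pairs."""
    for word, addr in pairs:
        # Each small product is shifted back to the position given by its address.
        acc += (word * weight) << (addr * n_p)
    return acc


pairs = encode(0b10110100, n_d=8, n_p=2)
print(pairs)
print(mac(0, pairs, weight=7, n_p=2))
```

Changing `n_p` changes how many narrow multiplications make up one MAC, which is one way to read "dynamic precision" in the title.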
