G06F7/544

MEMORY DEVICE AND OPERATING METHOD THEREOF

A memory device includes a memory array for storing a plurality of vector data, each of which has an MSB vector and an LSB vector. The memory array includes a plurality of memory units, each of which has a first bit and a second bit. The first bit is used to store the MSB vector of each vector data, and the second bit is used to store the LSB vector of each vector data. A multiplying operation is executed on each vector data, and a first group-counting operation and a second group-counting operation are executed on the MSB vector and the LSB vector of each vector data, respectively. The threshold voltage distribution of each memory unit is divided into N states, where N is a positive integer less than 2 to the power of 2, and the effective bit number stored by each memory unit is less than 2.
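
The relationship between the number of threshold-voltage states and the effective bit number can be illustrated with a short calculation. This is only a sketch: it assumes the effective bit number is the base-2 logarithm of the state count, which the abstract does not state explicitly, and the function name `effective_bits` is hypothetical.

```python
import math


def effective_bits(num_states: int) -> float:
    """Bits of information a cell can hold if it distinguishes num_states levels.

    Assumption: effective bits = log2(number of states).
    """
    if num_states < 1:
        raise ValueError("num_states must be a positive integer")
    return math.log2(num_states)


if __name__ == "__main__":
    # A conventional two-bit cell uses 4 states; the abstract requires N < 4.
    for n in (1, 2, 3, 4):
        print(f"N = {n}: {effective_bits(n):.3f} effective bits")
```

Under this assumption, every allowed N (1, 2, or 3) gives fewer than 2 effective bits, while N = 4 would give exactly 2.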

Accelerated mathematical engine

Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, output register and shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.

Kernel Decomposition and Activation Broadcasting in Deep Neural Networks (DNNs)

A DNN accelerator may perform 1×N kernel decomposition to decompose a convolutional kernel into kernel vectors, each of which includes multiple weights. Through the kernel decomposition, a weight operand may be generated from a filter. The DNN accelerator converts an input tensor into input operands. An input operand includes activations and has the same size as the weight operand. The DNN accelerator may read a first activation in the input operand from memory to an internal memory of a first PE and read a second activation in the input operand from the memory to an internal memory of a second PE. The first PE may receive the second activation from the second PE through activation broadcasting between the two PEs and perform MAC operations on the input operand and weight operand. The second PE may perform MAC operations on another input operand in the input tensor and the weight operand.
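
As a rough illustration of 1×N kernel decomposition, the sketch below splits a two-dimensional kernel into its row vectors and accumulates one MAC pass per row. It assumes a stride of 1, no padding, and a single channel; the function names are hypothetical, and the PE-to-PE activation broadcasting described in the abstract is not modeled.

```python
def conv2d_direct(image, kernel):
    """Reference 2D convolution (cross-correlation form), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + r][j + c] * kernel[r][c]
                for r in range(kh) for c in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def decompose_kernel(kernel):
    """Split a KxN kernel into K kernel vectors of shape 1xN (one per row)."""
    return [list(row) for row in kernel]


def conv2d_decomposed(image, kernel):
    """Same result as conv2d_direct, computed as a sum of 1xN MAC passes."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for r, weight_operand in enumerate(decompose_kernel(kernel)):
        for i in range(oh):
            for j in range(ow):
                # The input operand has the same size as the weight operand.
                input_operand = image[i + r][j:j + kw]
                out[i][j] += sum(a * w for a, w in zip(input_operand, weight_operand))
    return out


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(conv2d_decomposed(image, kernel) == conv2d_direct(image, kernel))
```

Because each output is a sum over kernel rows, the per-row passes can be assigned to different processing elements, which is the kind of work split the abstract describes.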

COMPUTER PROCESSOR FOR HIGHER PRECISION COMPUTATIONS USING A MIXED-PRECISION DECOMPOSITION OF OPERATIONS
20230214215 · 2023-07-06

Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, the values of which are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution is to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
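
The general idea of a mixed-precision decomposition can be sketched as follows. This is not the claimed circuit: it assumes each 64-bit value is split into a sum of 32-bit floating-point pieces (a leading piece plus residuals), and it omits the per-operand shared exponent mentioned in the abstract. The helper names `to_f32`, `split`, and `mul_decomposed` are hypothetical.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split(x: float, parts: int = 2) -> list:
    """Decompose x into `parts` lower-precision pieces whose sum approximates x."""
    pieces = []
    residual = x
    for _ in range(parts):
        piece = to_f32(residual)
        pieces.append(piece)
        residual -= piece  # what the pieces so far have not captured
    return pieces


def mul_decomposed(x: float, y: float, parts: int = 2) -> float:
    """Multiply via partial products of the lower-precision pieces."""
    xs, ys = split(x, parts), split(y, parts)
    # Each partial product of two 32-bit values is exact in 64-bit arithmetic.
    return sum(a * b for a in xs for b in ys)


if __name__ == "__main__":
    x, y = 1.0 / 3.0, 2.0 / 7.0
    print("exact     :", x * y)
    print("one piece :", to_f32(x) * to_f32(y))
    print("two pieces:", mul_decomposed(x, y))
```

With two pieces per operand, the recombined product is far closer to the full-precision result than a product of single 32-bit roundings, which is the motivation for performing the arithmetic among several lower-precision values.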

METHOD AND APPARATUS WITH BIT-SERIAL DATA PROCESSING OF A NEURAL NETWORK

A processor-implemented data processing method includes encoding a plurality of weights of a filter of a neural network using an inverted two's complement fixed-point format; generating weight data based on values of the encoded weights corresponding to same filter positions of a plurality of filters; and performing an operation on the weight data and input activation data using a bit-serial scheme to control when to perform an activation function with respect to the weight data and input activation data.
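
A bit-serial scheme can be sketched as processing one weight bit-plane at a time and accumulating shifted partial sums. The example below deliberately uses ordinary two's complement rather than the inverted two's complement format named in the abstract, whose encoding is not described here; it also does not model the control of when the activation function is applied. All function names are hypothetical.

```python
def to_twos_complement(value: int, bits: int) -> int:
    """Encode a signed integer as an unsigned bit pattern of the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def bit_serial_dot(weights, activations, bits: int = 4) -> int:
    """Dot product computed one weight bit-plane per step."""
    encoded = [to_twos_complement(w, bits) for w in weights]
    total = 0
    for b in range(bits):
        # Sum the activations whose weight has bit b set.
        partial = sum(a for e, a in zip(encoded, activations) if (e >> b) & 1)
        # In two's complement the most significant bit carries negative weight.
        scale = -(1 << b) if b == bits - 1 else (1 << b)
        total += scale * partial
    return total


if __name__ == "__main__":
    weights = [3, -4, 7, -1]
    activations = [2, 5, -3, 6]
    print(bit_serial_dot(weights, activations))
    print(sum(w * a for w, a in zip(weights, activations)))
```

Each step needs only additions and a shift, so a multiplier is avoided at the cost of one pass per weight bit.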

Processing system and method for binary weight convolutional neural network

The present invention provides a processing system for a binary weight convolutional neural network. The system comprises: at least one storage unit for storing data and instructions; at least one control unit for acquiring the instructions stored in the storage unit and sending out a control signal; and, at least one calculation unit for acquiring, from the storage unit, node values of a layer in a convolutional neural network and corresponding binary weight value data and obtaining node values of a next layer by performing addition and subtraction operations. With the system of the present invention, the data bit width during the calculation process of a convolutional neural network is reduced, the convolutional operation speed is improved, and the storage capacity and operational energy consumption are reduced.
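
When weights are restricted to two values, the multiply in each multiply-accumulate reduces to an addition or a subtraction. The sketch below assumes the binary weights are +1 and -1 and that a layer is fully connected; the function name `binary_weight_layer` is hypothetical and does not reflect the patent's storage, control, or calculation unit design.

```python
def binary_weight_layer(nodes, weight_signs):
    """Compute next-layer node values using only additions and subtractions.

    nodes: list of input node values.
    weight_signs: one row per output node, each entry +1 or -1 (assumed encoding).
    """
    outputs = []
    for row in weight_signs:
        acc = 0
        for value, sign in zip(nodes, row):
            if sign == 1:
                acc += value   # weight +1: add the node value
            elif sign == -1:
                acc -= value   # weight -1: subtract the node value
            else:
                raise ValueError("binary weights must be +1 or -1")
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    nodes = [1.5, -2.0, 0.5]
    weight_signs = [[1, -1, 1], [-1, -1, 1]]
    print(binary_weight_layer(nodes, weight_signs))
```

Since each weight needs only one bit of storage and no multiplier, this is consistent with the reduced bit width and energy consumption the abstract claims.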

CALCULATING DEVICE, CALCULATION PROGRAM, AND CALCULATION METHOD
20230214184 · 2023-07-06

According to one embodiment, a calculating device includes a processor configured to repeat an update processing. The update processing includes an update of first and second variable sets. The first variable set includes a first variable x_i. The second variable set includes a second variable y_i. The update of the second variable set includes updating the second variable y_i by adding a second function F_i to the second variable y_i before the update. The second function F_i includes the first variable x_i as a variable. The second function F_i includes a parameter a_i. An ordinal number p is one integer not less than 1 and not more than N. An ordinal number q is one integer not less than 1 and not more than N. The ordinal number q is different from the ordinal number p. A parameter a_p is different from a parameter a_q.
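
The shape of one update step can be sketched as below. Nearly everything concrete here is an assumption made for illustration: the abstract gives neither the form of the second function nor the rule for updating the first variable set, so the sketch uses a hypothetical linear function with per-index parameters and a hypothetical first-variable update scaled by a step size `dt`.

```python
def second_function(x_i: float, a_i: float, dt: float) -> float:
    """Hypothetical form of the second function; the real form is not given."""
    return -dt * a_i * x_i


def update_step(x, y, a, dt: float = 0.5):
    """One update of the first variable set x and the second variable set y."""
    if len(set(a)) < 2:
        # The abstract requires at least two differing parameters.
        raise ValueError("at least two parameters must differ")
    # Hypothetical first-variable update (not described in the abstract).
    new_x = [x_i + dt * y_i for x_i, y_i in zip(x, y)]
    # Second-variable update: add the second function to the previous value.
    new_y = [y_i + second_function(x_i, a_i, dt)
             for x_i, y_i, a_i in zip(new_x, y, a)]
    return new_x, new_y


if __name__ == "__main__":
    x, y = [1.0, 1.0], [0.0, 0.0]
    a = [1.0, 2.0]  # per-index parameters that differ from one another
    for _ in range(3):
        x, y = update_step(x, y, a)
    print(x, y)
```

The point of the sketch is only that each index i carries its own parameter, so two variables starting from the same state can evolve differently.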

CALCULATING DEVICE, CALCULATION PROGRAM, AND CALCULATION METHOD
20230214446 · 2023-07-06

According to one embodiment, a calculating device includes a processor configured to perform a matrix transformation processing and an update processing. The matrix transformation processing includes deriving a second matrix by transforming first row vectors included in a first matrix. The update processing includes an update of first and second variable sets. The update of the second variable set includes obtaining the second variable set after the update by adding a first update function of the updated first variable set to the second variable set. The first update function includes at least one of a first or a second multiply-add operation. The first multiply-add operation includes a multiply-add operation of the updated first variable set and a component of the second matrix. The second multiply-add operation includes a multiply-add operation of a component of the second matrix and a variable dependent on the updated first variable set.

MULTIPURPOSE MULTIPLY-ACCUMULATOR ARRAY
20230214185 · 2023-07-06

Embodiments of the present disclosure include a multipurpose multiply-accumulator (MAC) array circuit comprising one or more input memories for receiving operands and a plurality of multiply-accumulator circuits each selectively coupled to the one or more input memories to receive at least a pair of operands and generate a result. Each of the plurality of multiply-accumulator circuits receives operands from the one or more input memories independently. Additionally, selection of operands from the one or more input memories is controlled based on at least an operation and/or data types, where different operation and/or data types configure the plurality of multiply-accumulator circuits to receive different pairs of operands from the one or more input memories to execute particular operation types.