G06F7/5443

Bit string accumulation in multiple registers
11579843 · 2023-02-14

Methods, systems, and apparatuses related to performing bit string accumulation within a compute or memory device are described. A logic circuit with processing capability and a register within or near memory, for example, can perform multiple iterations of a recursive operation using several bit strings. Results of the various iterations may be written to the register, and subsequent iterations of the recursive operation using the bit strings may be performed. Results of the iterations of the recursive operation may be accumulated within the register. Accumulated results may be written as data to another register or to memory that is external to or separate from the logic circuit.
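
The abstract does not name the recursive operation. The sketch below assumes it is a multiply-and-add over pairs of bit strings (a MAC) and that the register has a fixed width that wraps on overflow. Under those assumptions, the accumulate-in-register-then-write-out flow could look like this. The function name, the `width` parameter and the wrap-around behavior are all hypothetical.

```python
def accumulate_bit_strings(pairs, width=32):
    """Hypothetical sketch: iterate a multiply-accumulate over bit-string pairs in one register."""
    mask = (1 << width) - 1                          # model a fixed-width register
    register = 0
    for a_bits, b_bits in pairs:
        product = int(a_bits, 2) * int(b_bits, 2)    # one iteration of the (assumed) recursive operation
        register = (register + product) & mask       # accumulate in place; wraps at the register width
    return register

# A plain list stands in for memory external to the logic circuit (assumption).
memory = []
memory.append(accumulate_bit_strings([("0011", "0101"), ("0010", "0111")]))  # 3*5 + 2*7
```

Only the final accumulated value leaves the register. The intermediate products never reach memory, which is the point of accumulating near or in memory.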

Processing apparatus and electronic device including the same

Provided are a processing apparatus and an electronic device including the same. The processing apparatus includes a bit cell line comprising bit cells connected in series, a mirror circuit unit configured to generate a mirror current by replicating, at a given ratio, a current flowing through the bit cell line, a charge charging unit configured to charge to a voltage corresponding to the mirror current as the mirror current replicated by the mirror circuit unit is applied, and a voltage measuring unit configured to output a value corresponding to a multiply-accumulate (MAC) operation of weights and inputs applied to the bit cell line, based on the voltage charged by the charge charging unit.
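
The abstract does not say how the series bit cells map weights and inputs to a line current. The readout chain after that point can still be illustrated with an idealized model. A constant mirror current `k * I_line` charging a capacitor `C` for time `t` gives `V = k * I_line * t / C`, and that relation can be inverted to recover the MAC value. This sketch assumes the line current equals the MAC value times a hypothetical unit current, and it ignores mirror mismatch and capacitor nonlinearity. All names and parameter values are illustrative.

```python
def readout_mac(line_current_a, mirror_ratio, charge_time_s, capacitance_f, unit_current_a):
    """Idealized readout: mirror the line current, charge a capacitor, convert voltage back to a MAC value."""
    mirror_current = mirror_ratio * line_current_a             # mirror circuit: I_m = k * I_line
    voltage = mirror_current * charge_time_s / capacitance_f   # constant current into a capacitor: V = I*t/C
    # Assumed encoding: I_line = MAC * unit_current, so invert the chain to recover the MAC value.
    mac = voltage * capacitance_f / (charge_time_s * mirror_ratio * unit_current_a)
    return voltage, mac

# Example: a line current of 6 unit currents (1 uA each), mirrored at 0.5, charging 1 pF for 1 us.
v, mac = readout_mac(6e-6, 0.5, 1e-6, 1e-12, 1e-6)
```

The mirror ratio lets the charging current be scaled down relative to the line current, so the capacitor reaches a measurable voltage without loading the bit cell line.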

Neural network processor for handling differing datatypes
11580353 · 2023-02-14

Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural network operations on the received input data and kernel coefficients. The MAD circuits are configured to support both fixed-point precision (e.g., INT8) and floating-point precision (e.g., FP16) operands. In floating-point mode, each MAD circuit multiplies the integer bits of the input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, the input data and kernel coefficients are multiplied directly. In both modes, the output data is stored in an accumulator and may be fed back as accumulated values for further multiply-add operations in subsequent processing cycles.
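
The floating-point path described above can be sketched in software. The sketch makes three assumptions:

- Each FP16 operand is treated as an integer significand times a power of two.
- The two significands are multiplied as integers, and the two exponents are added to locate the product's binary point.
- The product is shifted to align with a fixed-point accumulator that has `frac_bits` fractional bits.

The accumulator format, the truncating shift and all names are hypothetical. This is not the circuit's actual datapath.

```python
import math

SIG_BITS = 11  # FP16 significand width, including the hidden bit

def split(v):
    """Return (sig, exp) with v == sig * 2**exp and sig an integer (exact for FP16-representable v)."""
    m, e = math.frexp(v)                 # v = m * 2**e with 0.5 <= |m| < 1
    sig = int(m * (1 << SIG_BITS))       # integer significand
    return sig, e - SIG_BITS

def mad_fp(acc, frac_bits, x, w):
    """Floating-point mode: integer-multiply significands, add exponents, align, then accumulate."""
    sx, ex = split(x)
    sw, ew = split(w)
    prod = sx * sw                       # integer multiply of the significands
    shift = (ex + ew) + frac_bits        # exponent sum locates the binary point relative to the accumulator
    aligned = prod << shift if shift >= 0 else prod >> -shift  # right shift floors (truncates toward -inf)
    return acc + aligned

def mad_int(acc, x, w):
    """Fixed-point mode: operands are multiplied directly and accumulated."""
    return acc + x * w
```

For example, `mad_fp(0, 16, 0.75, -2.5)` returns the integer for -1.875 in a format with 16 fractional bits. Feeding that result back with another product models the accumulator feedback across processing cycles.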

Inference apparatus, convolution operation execution method, and program
11580369 · 2023-02-14

An inference apparatus comprises a plurality of PEs (Processing Elements) and a control part. By controlling the plurality of PEs, the control part performs a convolution operation in a convolutional neural network. The operation uses each of a plurality of pieces of input data and a weight group containing a plurality of weights corresponding to each piece of input data. Each of the plurality of PEs executes a computation that includes multiplying a single piece of the input data by a single weight. Each PE also executes the multiplications included in the convolution operation using the elements with non-zero values in each of the plurality of pieces of input data.
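
One common way to use only non-zero input elements is to scatter each non-zero input into the outputs it contributes to. Each multiplication is then one input element times one weight, as in the abstract, and zero inputs issue no multiplications. The sketch below applies this to a 1-D "valid" convolution (cross-correlation, as in CNN layers). The scatter formulation and the function name are assumptions for illustration, not the patent's PE scheduling.

```python
def conv1d_skip_zeros(x, w):
    """1-D valid convolution that multiplies only non-zero input elements; returns (output, multiply count)."""
    n_out = len(x) - len(w) + 1
    out = [0] * n_out
    mults = 0
    for i, xi in enumerate(x):
        if xi == 0:
            continue                      # zero input element: every product would be zero, so skip it
        for k, wk in enumerate(w):
            o = i - k                     # output position that pairs x[i] with weight w[k]
            if 0 <= o < n_out:
                out[o] += xi * wk         # one input element times one weight
                mults += 1
    return out, mults

out, mults = conv1d_skip_zeros([1, 0, 2, 0, 3], [1, -1])
```

Here the dense computation needs 8 multiplications, but only 4 are issued because two of the five inputs are zero.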

SECURE INVERSE SQUARE ROOT COMPUTATION SYSTEM, SECURE NORMALIZATION SYSTEM, METHODS THEREFOR, SECURE COMPUTATION APPARATUS, AND PROGRAM

The bit decomposition unit (11) generates a bit representation $\{a_0\}, \ldots, \{a_{\lambda-1}\}$ of $a$. A first bit sequence generator (12) calculates $\{a'_i\} = \{a_i\} \vee \{a_{i+1}\}$ to generate $\{a'_0\}, \ldots, \{a'_{\lambda'-1}\}$. A flag sequence generator (13) generates $\{x_0\}, \ldots, \{x_{\lambda'-1}\}$, indicating the most significant bit of $\{a'_0\}, \ldots, \{a'_{\lambda'-1}\}$. A normalization multiplier generator (14) generates $[c']$ by concatenating the bits $\{x_{\lambda'-1}\}, \ldots, \{x_0\}$. A second bit sequence generator (15) sets $\{a''_i\} = \{a_{2i}\}$ to generate $\{a''_0\}, \ldots$. A flag calculator (16) computes the sum $\sum_j \{x_j\}\{a'_j\}$ to obtain a share value $\{r\}$. A normalization unit (18) calculates $[b] := [c'][c'][2a]$ when $r = 1$ and $[b] := [c'][c'][a]$ when $r = 0$. An inverse square root calculator (19) calculates $[w] := \sqrt{2}\,[1/\sqrt{b}]$ when $r = 1$ and $[w] := [1/\sqrt{b}]$ when $r = 0$. An inverse normalization unit (20) computes $[1/\sqrt{a}] := [w][c']$.
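
The final steps rest on an identity:

- When $r = 1$: $1/\sqrt{a} = c' \cdot \sqrt{2} / \sqrt{c'^2 \cdot 2a}$.
- When $r = 0$: $1/\sqrt{a} = c' / \sqrt{c'^2 a}$.

The sketch below checks this identity on plain integers and ignores secret sharing entirely. It also makes assumptions that are not taken from the abstract:

- $c'$ is chosen as a power of two that moves the most significant bit of $b$ to position $\lambda - 1$.
- $r$ is the parity that forces the extra factor of 2.

These may not match the patent's exact flag and multiplier construction. The function name is hypothetical.

```python
import math

def inv_sqrt_normalized(a, lam):
    """Plaintext sketch of 1/sqrt(a) via power-of-two normalization (no secret sharing)."""
    if not 0 < a < (1 << lam):
        raise ValueError("a must be a positive lam-bit integer")
    msb = a.bit_length() - 1            # index of the most significant 1 bit of a
    shift = lam - 1 - msb               # left shift that moves that bit to position lam - 1
    r = shift % 2                       # an odd shift cannot be split evenly into c' * c'
    c = 1 << (shift // 2)               # normalization multiplier c', a power of two
    b = c * c * (2 * a if r else a)     # b = c'^2 * 2a when r = 1, c'^2 * a when r = 0
    w = math.sqrt(2) / math.sqrt(b) if r else 1 / math.sqrt(b)
    return w * c, b, r                  # 1/sqrt(a) = w * c'
```

Because $b$ always falls in $[2^{\lambda-1}, 2^{\lambda})$, the inverse square root only ever has to be approximated over one fixed range. Only $b$, $w$ and the power-of-two multiplier $c'$ depend on the magnitude of $a$.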

MEMORY DEVICE FOR PERFORMING CONVOLUTION OPERATION
20230043170 · 2023-02-09

A memory device performs a convolution operation. The memory device includes first to N-th processing elements (PEs), a first analog-to-digital converter (ADC), a first shift adder, and a first accumulator. The first to N-th PEs, where N is a natural number greater than or equal to 2, are each associated with at least one weight value of a weight feature map. They are configured to perform a partial convolution operation with at least one input value of an input feature map. The first ADC is configured to receive a first partial convolution operation result from the first to N-th PEs. The first shift adder shifts an output of the first ADC. The first accumulator accumulates the output of the first shift adder.
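
The abstract does not say what the shift encodes. A common reason for a shift adder after the ADC is bit-serial input: each cycle applies one bit plane of the inputs, the ADC digitizes that cycle's partial sum, and the shift restores the bit's significance before accumulation. The sketch below assumes that bit-serial interpretation, unsigned inputs and an ideal ADC. The function name and the `bits` parameter are hypothetical.

```python
def bit_serial_mac(inputs, weights, bits=4):
    """Bit-serial dot product: per-cycle partial sums are shifted by bit position and accumulated."""
    if any(not 0 <= x < (1 << bits) for x in inputs):
        raise ValueError("inputs must be unsigned and fit in `bits` bits")
    acc = 0                                                    # the accumulator
    for k in range(bits):                                      # one cycle per input bit, LSB first
        plane = [(x >> k) & 1 for x in inputs]                 # k-th bit of every input
        partial = sum(p * w for p, w in zip(plane, weights))   # value the (ideal) ADC would report
        acc += partial << k                                    # shift adder: restore bit significance
    return acc
```

The result equals the ordinary dot product. For example, `bit_serial_mac([3, 5, 7, 1], [2, -1, 4, 3])` gives 6 - 5 + 28 + 3 = 32.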

SEMICONDUCTOR MEMORY APPARATUS AND OPERATING METHOD THEREOF
20230040775 · 2023-02-09

A semiconductor memory apparatus may include the following:

- a data adjusting circuit configured to conditionally adjust a weight data value for a MAC (Multiplication and ACcumulation) operation, based on comparing the weight data value to a reference data value, and to generate flag information indicating whether the weight data value has been adjusted;
- a memory cell array circuit configured to store the adjusted weight data value output by the data adjusting circuit;
- a data calculation circuit configured to recover, based on the flag information, a result based on the weight data value from a result based on the adjusted weight data value, so as to perform the MAC operation on an input data value and the weight data value.
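
The abstract does not specify the adjustment. As a purely hypothetical example, the sketch assumes that weights above the reference value are stored as their 8-bit complement `MAX_W - w`, with the flag set. The product with the original weight can then be recovered exactly as `x * w = x * MAX_W - x * (MAX_W - w)`. The adjustment rule, bit width and names are assumptions, not the patent's method.

```python
N_BITS = 8
MAX_W = (1 << N_BITS) - 1   # all-ones value of an unsigned 8-bit weight

def adjust_weight(w, ref):
    """Hypothetical rule: store the bitwise complement of weights above `ref` and flag them."""
    if w > ref:
        return MAX_W - w, 1          # MAX_W - w equals ~w within N_BITS bits
    return w, 0

def mac_with_recovery(xs, stored):
    """MAC over stored (adjusted) weights, undoing the adjustment wherever the flag is set."""
    total = 0
    for x, (w_adj, flag) in zip(xs, stored):
        partial = x * w_adj                                      # result based on the adjusted weight
        total += (x * MAX_W - partial) if flag else partial      # recover x * w from x * (MAX_W - w)
    return total

stored = [adjust_weight(w, 127) for w in [200, 10, 130, 64]]
```

With this rule, every stored weight is at most the reference value (127), so large weights never need their high-value bit patterns in the cell array. The flag costs one extra bit per weight.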

Reconfigurable input precision in-memory computing

Technology for reconfigurable input precision in-memory computing is disclosed herein. Reconfigurable input precision allows the bit resolution of input data to be changed to meet the requirements of in-memory computing operations. Voltage sources (which may include DACs) provide voltages that represent input data to memory cell nodes. The resolution of the voltage sources may be reconfigured to change the precision of the input data. In one parallel mode, the number of DACs in a DAC node sets the resolution. In one serial mode, the number of cycles over which a DAC provides voltages sets the resolution. The memory system can therefore use relatively low-resolution voltage sources, avoiding the need for complex high-resolution voltage sources (e.g., high-resolution DACs). Lower-resolution voltage sources can take up less area and/or use less power than higher-resolution ones.
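
The serial mode can be pictured as slicing a wide input into low-resolution chunks, one per cycle. The abstract does not say how per-cycle results are combined. This sketch assumes each cycle's contribution is weighted by its bit significance, so that a `dac_bits`-bit DAC run for `cycles` cycles delivers `dac_bits * cycles` bits of input precision. The function name and parameters are hypothetical.

```python
def serial_dac_mac(x, weight, dac_bits, cycles):
    """Multiply a (dac_bits * cycles)-bit input by a weight using a low-resolution DAC over several cycles."""
    if not 0 <= x < 1 << (dac_bits * cycles):
        raise ValueError("input exceeds the configured precision")
    mask = (1 << dac_bits) - 1
    acc = 0
    for c in range(cycles):
        level = (x >> (dac_bits * c)) & mask          # DAC level driven in this cycle (LSB slice first)
        acc += (level * weight) << (dac_bits * c)     # weight this cycle's result by its significance
    return acc

# A 2-bit DAC over 3 cycles gives 6-bit input precision: 45 = 0b10_11_01.
result = serial_dac_mac(45, 3, dac_bits=2, cycles=3)
```

Raising `cycles` increases precision at the cost of latency. A parallel mode would presumably trade area instead, with several DACs each driving one slice in the same cycle (an interpretation, not stated in the abstract).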

NEURAL NETWORK FACILITATING FIXED-POINT EMULATION OF FLOATING-POINT COMPUTATION
20230008856 · 2023-01-12

A DNN accelerator can perform fixed-point emulation of floating-point computation. In a multiplication operation on two floating-point matrices, the DNN accelerator determines an extreme exponent for a row in the first floating-point matrix and another extreme exponent for a column in the second floating-point matrix. The row and column are converted to fixed-point vectors based on these extreme exponents. The two fixed-point vectors are fed into a PE array in the DNN accelerator. The PE array multiplies the two fixed-point vectors and generates a fixed-point inner product. The fixed-point inner product is then converted back to a floating-point inner product based on the extreme exponents. The floating-point inner product is an element of the matrix resulting from the multiplication of the two floating-point matrices. That matrix can be accumulated with another matrix resulting from a fixed-point emulation of a floating-point matrix multiplication.

Physics simulation on machine-learning accelerated hardware platforms
11550971 · 2023-01-10

At least one machine-accessible storage medium that provides instructions that, when executed by a machine, will cause the machine to perform operations. The operations comprise configuring a simulated environment to be representative of a physical device based, at least in part, on an initial description of the physical device that describes its structural parameters. The operations further comprise performing a physics simulation with an artificial intelligence ("AI") accelerator. The AI accelerator includes a matrix multiply unit for computing convolution operations via a plurality of multiply-accumulate units. When performing the physics simulation, the operations further comprise computing a field response of the physical device to an excitation source within the simulated environment. The field response is computed, at least in part, with the convolution operations to perform spatial differencing.
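
Spatial differencing maps onto convolution because a finite-difference stencil is just a small kernel. As an illustration (not the patent's kernels), the standard five-point Laplacian stencil applied as a 3×3 convolution reduces each update to multiply-accumulates, which is the work a MAC-based matrix multiply unit is built for. The kernel is symmetric, so cross-correlation and convolution coincide. The function and constant names are hypothetical, and the grid spacing is taken as 1.

```python
# Five-point Laplacian stencil with grid spacing h = 1 (an illustrative choice).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]

def conv2d_valid(field, kernel):
    """'Valid' 2-D convolution written as explicit multiply-accumulates."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(field) - kh + 1):
        row = []
        for j in range(len(field[0]) - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += kernel[di][dj] * field[i + di][j + dj]   # one multiply-accumulate
            row.append(acc)
        out.append(row)
    return out

# f(x, y) = x^2 + y^2 has Laplacian 4 everywhere, and the stencil reproduces it exactly.
field = [[x * x + y * y for y in range(5)] for x in range(5)]
lap = conv2d_valid(field, LAPLACIAN)
```

In a time-stepped field solver, a convolution like this would be applied once per step to update the field from its neighbors.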