G06F7/50

Temperature Compensation Circuit and Method for Neural Network Computing-in-memory Array
20230048640 · 2023-02-16 ·

The disclosure discloses a temperature compensation circuit and method for a neural network computing-in-memory array. Reference arrays sparsely inserted in the computing-in-memory array are adopted to provide a reference voltage for ADCs, so that an input voltage and a reference voltage of the ADCs have a same temperature coefficient. Finally, after analog-to-digital conversion by the ADC, the digital output of the ADC is not affected by the external temperature, thereby ensuring the operational precision of the neural network. According to the temperature compensation circuit of the disclosure, the reference arrays have the same structure as the computing-in-memory array. The insertion density of the reference arrays is related to the temperature field where the computing-in-memory arrays are located. One reference array may provide the reference voltage of the ADC for a plurality of computing-in-memory arrays, thereby minimizing the increase of area and power consumption caused by inserting the reference arrays.

Temperature Compensation Circuit and Method for Neural Network Computing-in-memory Array
20230048640 · 2023-02-16 ·

The disclosure discloses a temperature compensation circuit and method for a neural network computing-in-memory array. Reference arrays sparsely inserted in the computing-in-memory array are adopted to provide a reference voltage for ADCs, so that an input voltage and a reference voltage of the ADCs have a same temperature coefficient. Finally, after analog-to-digital conversion by the ADC, the digital output of the ADC is not affected by the external temperature, thereby ensuring the operational precision of the neural network. According to the temperature compensation circuit of the disclosure, the reference arrays have the same structure as the computing-in-memory array. The insertion density of the reference arrays is related to the temperature field where the computing-in-memory arrays are located. One reference array may provide the reference voltage of the ADC for a plurality of computing-in-memory arrays, thereby minimizing the increase of area and power consumption caused by inserting the reference arrays.

PARTIAL SUM MANAGEMENT AND RECONFIGURABLE SYSTOLIC FLOW ARCHITECTURES FOR IN-MEMORY COMPUTATION
20230047364 · 2023-02-16 ·

Methods and apparatus for performing machine learning tasks, and in particular, to a neural-network-processing architecture and circuits for improved handling of partial accumulation results in weight-stationary operations, such as operations occurring in compute-in-memory (CIM) processing elements (PEs). One example PE circuit for machine learning generally includes an accumulator circuit, a flip-flop array having an input coupled to an output of the accumulator circuit, a write register, and a first multiplexer having a first input coupled to an output of the write register, having a second input coupled to an output of the flip-flop array, and having an output coupled to a first input of the first accumulator circuit.

PARTIAL SUM MANAGEMENT AND RECONFIGURABLE SYSTOLIC FLOW ARCHITECTURES FOR IN-MEMORY COMPUTATION
20230047364 · 2023-02-16 ·

Methods and apparatus for performing machine learning tasks, and in particular, to a neural-network-processing architecture and circuits for improved handling of partial accumulation results in weight-stationary operations, such as operations occurring in compute-in-memory (CIM) processing elements (PEs). One example PE circuit for machine learning generally includes an accumulator circuit, a flip-flop array having an input coupled to an output of the accumulator circuit, a write register, and a first multiplexer having a first input coupled to an output of the write register, having a second input coupled to an output of the flip-flop array, and having an output coupled to a first input of the first accumulator circuit.

Automated optimization of large-scale quantum circuits with continuous parameters

The disclosure describes the implementation of automated techniques for optimizing quantum circuits of the size and type expected in quantum computations that outperform classical computers. The disclosure shows how to handle continuous gate parameters and report a collection of fast algorithms capable of optimizing large-scale-scale quantum circuits. For the suite of benchmarks considered, the techniques described obtain substantial reductions in gate counts. In particular, the techniques in this disclosure provide better optimization in significantly less time than previous approaches, while making minimal structural changes so as to preserve the basic layout of the underlying quantum algorithms. The results provided by these techniques help bridge the gap between computations that can be run on existing quantum computing hardware and more advanced computations that are more challenging to implement in quantum computing hardware but are the ones that are expected to outperform what can be achieved with classical computers.

Neural network processor for handling differing datatypes
11580353 · 2023-02-14 · ·

Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.

Neural network processor for handling differing datatypes
11580353 · 2023-02-14 · ·

Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.

OPERATION APPARATUS

An embodiment of the present disclosure provides an operation apparatus which includes a storage unit, a control unit and a compute unit. The technical solution provided in this disclosure can reduce resource consumption of convolution operation, improve the speed of convolution operation and reduce operation time.

OPERATION APPARATUS

An embodiment of the present disclosure provides an operation apparatus which includes a storage unit, a control unit and a compute unit. The technical solution provided in this disclosure can reduce resource consumption of convolution operation, improve the speed of convolution operation and reduce operation time.

SIGN-EFFICIENT ADDITION AND SUBTRACTION FOR STREAMINGCOMPUTATIONS IN CRYPTOGRAPHIC ENGINES
20230042366 · 2023-02-09 ·

Aspects of the present disclosure involve techniques and cryptographic processors configured to perform the techniques that include sign-efficient addition and subtraction operations that use Montgomery reduction and are capable of facilitating fast streaming operations. The techniques involve receiving a first number and a second number, where the first number and second number are within a target interval, and performing a modular operation to obtain a third number, the third number being within the same target interval and representing a sum or a difference of a rescaled first number and a rescaled second number, and wherein the modular operation includes a Montgomery reduction.