Patent classifications
G06F2207/382
METHOD AND APPARATUS WITH FLOATING POINT PROCESSING
A processor-implemented includes receiving a first floating point operand and a second floating point operand, each having an n-bit format comprising a sign field, an exponent field, and a significand field, normalizing a binary value obtained by performing arithmetic operations for fields corresponding to each other in the first and second floating point operands for an n-bit multiplication operation, determining whether the normalized binary value is a number that is representable in the n-bit format or an extended normal number that is not representable in the n-bit format, according to a result of the determining, encoding the normalized binary value using an extension bit format in which an extension pin identifying whether the normalized binary value is the extended normal number is added to the n-bit format, and outputting the encoded binary value using the extended bit format, as a result of the n-bit multiplication operation.
Conversion hardware mechanism
An apparatus to facilitate a computer number format conversion is disclosed. The apparatus comprises a control unit to receive to receive data format information indicating a first precision data format that input data is to be received and converter hardware to receive the input data and convert the first precision data format to a second precision data format based on the data format information.
Reconfigurable processor circuit architecture
A representative reconfigurable processing circuit and a reconfigurable arithmetic circuit are disclosed, each of which may include input reordering queues; a multiplier shifter and combiner network coupled to the input reordering queues; an accumulator circuit; and a control logic circuit, along with a processor and various interconnection networks. A representative reconfigurable arithmetic circuit has a plurality of operating modes, such as floating point and integer arithmetic modes, logical manipulation modes, Boolean logic, shift, rotate, conditional operations, and format conversion, and is configurable for a wide variety of multiplication modes. Dedicated routing connecting multiplier adder trees allows multiple reconfigurable arithmetic circuits to be reconfigurably combined, in pair or quad configurations, for larger adders, complex multiplies and general sum of products use, for example.
APPARATUS AND METHOD WITH MULTI-FORMAT DATA SUPPORT
An apparatus with multi-format data support includes: a receiver configured to receive a plurality of data corresponding to a plurality of data formats; one or more processors configured to: multiply the plurality of data using one or more multipliers; perform a first alignment on a result of the multiplication based on an exponent value of the plurality of data; add a result of the first alignment; and perform a second alignment on a result of the addition based on the exponent value and an operation result of a previous cycle.
Deep neural network with low-precision dynamic fixed-point in reconfigurable hardware design
A system for operating a floating-to-fixed arithmetic framework includes a floating-to-fix arithmetic framework on an arithmetic operating hardware such as a central processing unit (CPU) for computing a floating pre-trained convolution neural network (CNN) model to a dynamic fixed-point CNN model. The dynamic fixed-point CNN model is capable of implementing a high performance convolution neural network (CNN) on a resource limited embedded system such as mobile phone or video cameras.
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
Multiplication and accumulation(MAC) operator and processing-in-memory (PIM) device including the MAC operator
A MAC operator includes a plurality of multipliers configured to perform a multiplication operation on a floating-point format first data and a floating-point format second data to output a floating-point format multiplication result data, a plurality of floating-point-to-fixed-point converters configured to receive the floating-point format multiplication result data from each of the plurality of multipliers and convert into a fixed-point format multiplication result data to be output, and an adder tree configured to perform an addition operation on the fixed-point format multiplication result data that is output from the plurality of floating-point-to-fixed-point converters. If a first mantissa of the first data and a second mantissa of the second data are composed of ‘M’-bit (‘M’ being a natural number), each of the plurality of multipliers is configured to perform the multiplication operation so that the fixed-point format multiplication result data includes a mantissa of 2*(M+1) bits.
Computing device and method
The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.
PROCESSING ELEMENT AND NEURAL PROCESSING DEVICE INCLUDING SAME
The present disclosure discloses a processing element and a neural processing device including the processing element. The processing element includes a weight register configured to store a weight, an input activation register configured to store an input activation, a flexible multiplier configured to receive a first sub-weight of a first precision included in the weight, receive a first sub-input activation of the first precision included in the input activation, and generate result data by performing multiplication calculation of the first sub-weight and the first sub-input activation as the first precision or a second precision different from the first precision according to the first sub-weight and the first sub-input activation and a saturating adder configured to generate a partial sum by using the result data.
DEEP LEARNING ACCELERATION WITH MIXED PRECISION
A device for deep learning acceleration with mixed precision may include a first data port configured to receive a map data segment and a second data port configured to receive a kernel data segment. The device may include a precision mode port configured to receive an indication of an input precision mode that indicates a word length for the map data segment and for the kernel data segment. The device may include a multiplier component configured to generate a multiplier component output based on the input precision mode and based on multiplying the map data segment and the kernel data segment. The device may include an adder component configured to generate an adder component output based on the input precision mode and based on the multiplier component output. The device may include an output port configured to output the adder component output.