G06F7/49947

ARITHMETIC OPERATION DEVICE AND ARITHMETIC OPERATION METHOD

An arithmetic operation device causes a convolution arithmetic unit to perform a convolution arithmetic operation between a filter and target data corresponding to the size of the filter in each of a plurality of convolution layers constituting a neural network. The arithmetic operation device includes: a bit reduction unit that, for each convolution layer, reduces a bit string corresponding to a first bit number from the least significant bit of the target data and reduces a bit string corresponding to a second bit number from the least significant bit of a weight that is an element of the filter; and a bit addition unit that inputs the target data and the weight reduced by the bit reduction unit to the convolution arithmetic unit, and adds a bit string corresponding to a third bit number, obtained by adding the first bit number and the second bit number, to the least significant bit of the convolution arithmetic operation result output from the convolution arithmetic unit.
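The scheme can be sketched as follows, assuming truncation-by-shift for the bit reduction and a plain integer dot product for the convolution (the function and parameter names are illustrative, not from the disclosure):

```python
def conv_with_bit_reduction(data, weights, n1, n2):
    """Sketch of the abstract's scheme: drop n1 low bits from each
    target-data element and n2 low bits from each weight, run the
    integer dot product on the reduced values, then append n1 + n2
    zero bits to the least significant end of the result to restore
    its scale."""
    reduced_data = [d >> n1 for d in data]       # bit reduction unit
    reduced_w = [w >> n2 for w in weights]
    acc = sum(d * w for d, w in zip(reduced_data, reduced_w))
    return acc << (n1 + n2)                      # bit addition unit
```

The shift back by `n1 + n2` keeps the output on the same scale as an unreduced convolution; precision is lost only in the truncated low bits of the operands.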

ELIMINATION OF ROUNDING ERROR ACCUMULATION
20210405966 · 2021-12-30

The present invention extends to methods, systems, and computing system program products for eliminating rounding error accumulation in iterative calculations for Big Data or streamed data. Embodiments of the invention include iteratively calculating a function for a primary computation window of a pre-defined size while incrementally calculating the function for one or more backup computation windows started at different time points, and, whenever one of the backup computation windows reaches the pre-defined size, swapping the primary computation window and that backup computation window. The result(s) of the function is/are generated by either the iterative calculation performed for the primary computation window or the incremental calculation performed for a backup computation window that reaches the pre-defined size. Eliminating rounding error accumulation enables a computing system to run iterative calculations steadily and smoothly for an unlimited number of iterations without accumulating rounding error.
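A minimal sketch of the primary/backup swap for a sliding-window sum (an assumed structure for illustration, not the patent's exact algorithm): the primary sum is updated iteratively (add the new value, subtract the expired one), which accumulates floating-point rounding error over time, while the backup sum is built add-only and replaces the primary once it covers a full window:

```python
class SwappingWindowSum:
    """Sliding-window sum with a primary (iterative) accumulator and a
    backup (incremental, add-only) accumulator that are swapped each
    time the backup reaches the window size, discarding any rounding
    error the iterative updates have accumulated."""
    def __init__(self, window):
        self.window = window
        self.buf = []          # current window contents
        self.primary = 0.0     # iterative sum: add new, subtract old
        self.backup = 0.0      # incremental sum: add-only
        self.backup_len = 0

    def push(self, x):
        self.buf.append(x)
        if len(self.buf) > self.window:
            old = self.buf.pop(0)
            self.primary += x - old    # iterative update
        else:
            self.primary += x
        self.backup += x
        self.backup_len += 1
        if self.backup_len == self.window:
            self.primary = self.backup  # swap: backup becomes primary
            self.backup = 0.0
            self.backup_len = 0
        return self.primary
```

Because the backup is restarted right after each swap, it always covers exactly the last `window` elements at the moment it reaches full size, so the swap is a drop-in replacement of the primary's value.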

Parallelized rounding for decimal floating point to binary coded decimal conversion

A computer-implemented method includes: receiving, using a processor, a decimal floating point number; and using a floating point unit within the processor to convert the decimal floating point number into a binary coded decimal number, wherein the floating point unit starts a conversion loop after a rounding loop has started, and the rounding loop and the conversion loop run in parallel once started.
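For reference, the conversion half of the problem is packing decimal digits into 4-bit BCD nibbles; a serial software sketch is below (the patent's contribution, running the rounding and conversion loops in parallel in hardware, is not modeled here):

```python
def int_to_bcd(n):
    """Pack the decimal digits of a non-negative integer into binary
    coded decimal, one 4-bit nibble per digit, least significant digit
    in the lowest nibble."""
    bcd = 0
    shift = 0
    while True:
        bcd |= (n % 10) << shift   # place next digit in its nibble
        n //= 10
        shift += 4
        if n == 0:
            return bcd
```

For example, 942 becomes the bit pattern `0x942`: one nibble per decimal digit.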

DEEP NEURAL NETWORK OPERATION METHOD AND APPARATUS

A deep neural network operation method and apparatus are provided. The method comprises: obtaining an input feature map of a network layer; displacing, according to a preset displacement parameter, each of the channels of the input feature map of the network layer along the axes to obtain a displaced feature map, wherein the preset displacement parameter comprises displacement amounts of each channel along the axes; and performing a convolution operation on the displaced feature map with a 1×1 convolution kernel to obtain an output feature map of the network layer. The operation efficiency of the DNN can be improved through the above method.
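The shift-then-1×1-convolve idea can be sketched in plain Python (cyclic shifts and per-channel `(dy, dx)` displacement amounts are illustrative assumptions; a real implementation might zero-pad at the borders instead):

```python
def shift_channels(fmap, shifts):
    """fmap: list of channels, each an HxW grid (list of rows).
    shifts: one (dy, dx) displacement per channel. Each channel is
    rolled cyclically along its two spatial axes."""
    out = []
    for ch, (dy, dx) in zip(fmap, shifts):
        h, w = len(ch), len(ch[0])
        out.append([[ch[(r - dy) % h][(c - dx) % w] for c in range(w)]
                    for r in range(h)])
    return out

def conv1x1(fmap, kernel):
    """1x1 convolution: a per-pixel weighted sum across channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[sum(k * ch[r][c] for k, ch in zip(kernel, fmap))
             for c in range(w)] for r in range(h)]
```

The displacement mixes spatial context into each pixel's channel stack, so the cheap 1×1 kernel can stand in for a larger spatial kernel.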

Circuitry for Floating-point Power Function

Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2^(second input × log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
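The exp2/log2 decomposition plus corner patching can be sketched as follows; the specific corner conditions handled are illustrative (loosely following IEEE 754 pow conventions), not the patent's list:

```python
import math

def power_with_corner_check(x, y):
    """Compute x**y as 2**(y * log2(x)) on the main datapath, with
    corner conditions checked separately and used to correct or
    replace the initial result."""
    # Corner conditions handled outside the exp2/log2 datapath:
    if y == 0.0:
        return 1.0                          # pow(x, 0) is 1, even for NaN x
    if x == 1.0:
        return 1.0
    if x < 0.0:
        if y == int(y):                     # negative base, integral exponent
            r = 2.0 ** (y * math.log2(-x))
            return -r if int(y) % 2 else r
        return float('nan')                 # negative base, fractional exponent
    if x == 0.0:
        return float('inf') if y < 0 else 0.0
    # Main datapath: initial power result.
    return 2.0 ** (y * math.log2(x))
```

The point of the corner check is that log2 is undefined or loses information exactly at these inputs (zero, one, negatives, special exponents), so the main datapath alone cannot produce the correct answer there.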

INSTRUCTIONS TO CONVERT FROM FP16 TO BF8

Techniques for converting FP16 data elements to BF8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
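Since FP16 (1 sign, 5 exponent, 10 mantissa bits) and BF8/E5M2 (1 sign, 5 exponent, 2 mantissa bits) share the same exponent width, the scalar conversion amounts to rounding away the low 8 mantissa bits. A bit-level sketch with round-to-nearest-even (an illustrative rounding mode; the instruction's exact NaN and overflow behavior is not modeled):

```python
def fp16_bits_to_bf8(h):
    """Convert a raw FP16 bit pattern (16-bit int) to a BF8/E5M2 bit
    pattern (8-bit int) by rounding off the low 8 mantissa bits with
    ties-to-even."""
    if (h & 0x7C00) == 0x7C00:               # Inf/NaN exponent
        # Keep Inf as-is; force a NaN payload bit so NaN stays NaN.
        return ((h >> 8) | (1 if h & 0x03FF else 0)) & 0xFF
    low = h & 0xFF                           # bits to be discarded
    h >>= 8
    if low > 0x80 or (low == 0x80 and h & 1):
        h += 1                               # round up (ties to even)
    return h & 0xFF
```

A rounding carry that overflows the mantissa naturally increments the exponent field, because sign, exponent, and mantissa are contiguous in the bit pattern.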

COMPUTING APPARATUS AND METHOD, BOARD CARD, AND COMPUTER READABLE STORAGE MEDIUM
20220188071 · 2022-06-16 ·

The present disclosure relates to a computing device for processing a multi-bit-width value, an integrated circuit board card, a method, and a computer readable storage medium. The computing device may be included in a combined processing apparatus, which may further include a general interconnection interface and another processing device. The computing device interacts with the other processing device to jointly complete a computing operation specified by a user. The combined processing apparatus may further include a storage device connected to the computing device and the other processing device and configured to store data of both. The solution of the present disclosure can split the multi-bit-width value so that the processing capability of the processor is not influenced by the bit width.
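The splitting idea can be illustrated with a classic limb decomposition: a wide operand is broken into fixed-width pieces a narrow multiplier can handle, and the partial products are recombined at the right bit offsets (the limb width and function name are illustrative choices, not the disclosure's):

```python
def split_multiply(a, b, limb_bits=8):
    """Multiply two wide non-negative integers using only
    limb_bits-wide multiplies: split each operand into limbs, multiply
    every limb pair, and accumulate each partial product shifted to
    its weight."""
    def limbs(x):
        out = []
        while x:
            out.append(x & ((1 << limb_bits) - 1))
            x >>= limb_bits
        return out or [0]
    acc = 0
    for i, la in enumerate(limbs(a)):
        for j, lb in enumerate(limbs(b)):
            acc += (la * lb) << ((i + j) * limb_bits)
    return acc
```

Because only limb-width multiplies are ever issued, the datapath width of the multiplier no longer limits the bit width of the values it can process.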

Multiplier Circuit Array, MAC and MAC Pipeline including Same, and Methods of Configuring Same
20220171604 · 2022-06-02 ·

An integrated circuit comprising a MAC pipeline including a plurality of MACs connected in series to perform concatenated multiply and accumulate operations, wherein each MAC includes a multiplier circuit array, including a plurality of multiplier circuits, to multiply first data and weight data and generate product data. The plurality of multiplier circuits, in one embodiment, includes a first multiplier circuit to multiply first portions of the first data and the weight data to generate a first field, and a second multiplier circuit to multiply second portions of the first data and the weight data to generate a second field, wherein the product data includes data which is representative of the first field and the second field. An accumulator circuit adds the product data, output from the associated multiplier circuit array, and second data. The multiply cores of the first and second multiplier circuits are separate and different.
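One reading of the two-portion multiplier array is a packed dual multiply: each operand carries two independent narrow lanes, separate multiplier circuits handle the lanes, and the product packs the two fields side by side. A sketch under that assumption (the 8-bit lane width is illustrative):

```python
def packed_dual_multiply(x, w):
    """Treat each 16-bit operand as two independent 8-bit lanes,
    multiply the low lanes and the high lanes in separate multiplier
    circuits, and pack the two 16-bit product fields into one word."""
    lo = (x & 0xFF) * (w & 0xFF)                 # first multiplier circuit
    hi = ((x >> 8) & 0xFF) * ((w >> 8) & 0xFF)   # second multiplier circuit
    return (hi << 16) | lo                       # product data: two fields

def mac_pipeline(inputs, weights, init=0):
    """Series-connected MAC stages: each stage multiplies its input by
    its weight and adds the previous stage's accumulation."""
    acc = init
    for x, w in zip(inputs, weights):
        acc = acc + x * w                        # one MAC stage
    return acc
```

Keeping the two multiply cores separate lets the same array serve either one wide multiply or two narrow ones, depending on how the fields are interpreted downstream.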

METHOD FOR UPDATING AN ARTIFICIAL NEURAL NETWORK
20220164664 · 2022-05-26 ·

According to one aspect, the disclosure proposes a method for updating an artificial neural network whose initial weights are stored in a memory at least in an integer format. The method includes: a processing unit determining the error gradients at the output of the layers of the neural network; the processing unit retrieving the initial weights from memory; the processing unit updating the initial weights, comprising, for each initial weight, a first calculation of a corrected weight in the integer format of that initial weight; and the processing unit replacing the value of the initial weights stored in the memory with the value of the corrected weights.
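A minimal sketch of the update step, assuming integer gradients and a shift-scaled gradient step as the correction rule (the learning-rate shift and the rule itself are illustrative, not from the disclosure):

```python
def update_weights_int(weights, grads, lr_shift=8):
    """For each stored integer weight, compute a corrected weight in
    the same integer format (here: subtract the gradient scaled down
    by a power-of-two shift) and replace the stored value with it."""
    for i, (w, g) in enumerate(zip(weights, grads)):
        corrected = w - (g >> lr_shift)   # corrected weight, still integer
        weights[i] = corrected            # replace the value in memory
    return weights
```

Keeping the corrected weight in the original integer format means the update never needs a floating-point copy of the weights in memory.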