G06F7/57

ARITHMETIC PROCESSING DEVICE AND ARITHMETIC METHOD
20220357925 · 2022-11-10

An arithmetic processing device includes: a first multiplier circuit configured to calculate a product DY of a divisor Y and an approximate value D obtained by approximating the reciprocal 1/Y of the divisor Y; a dividend operation circuit configured to compare a dividend X with the divisor Y and, based on the comparison result, generate an operation value that is either twice the dividend X or equal to the dividend X; a second multiplier circuit configured to calculate a product of the approximate value D and the operation value as an initial value R(0) of a partial remainder R(n); a third multiplier circuit configured to calculate a product DY*q(n) of the product DY and a partial quotient q(n) that is a predetermined number of upper bits of the partial remainder R(n); and a first addition circuit configured to calculate a new partial remainder R(n) by subtracting the product DY*q(n) from the partial remainder R(n).
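The recurrence in this abstract can be illustrated with a minimal Python sketch using exact rationals. Several details are assumptions not stated in the abstract: the operands are taken to be mantissas in [1, 2), the dividend is doubled when X < Y, the approximation D is produced by rounding 1/Y to 10 fractional bits (a stand-in for a lookup table), the partial quotient is the integer part of the partial remainder, and the remainder is shifted left by one digit after each subtraction. The names `approx_reciprocal` and `prescaled_divide` are hypothetical.

```python
import math
from fractions import Fraction


def approx_reciprocal(y, bits=10):
    """Round 1/y to `bits` fractional bits (assumed stand-in for a lookup table)."""
    scale = 1 << bits
    return Fraction(round(Fraction(1) / y * scale), scale)


def prescaled_divide(x, y, digit_bits=4, steps=8):
    """Approximate (X or 2X) / Y for mantissas X, Y assumed to lie in [1, 2)."""
    x, y = Fraction(x), Fraction(y)
    d = approx_reciprocal(y)
    dy = d * y                           # first multiplier: DY, close to 1
    doubled = x < y                      # assumed comparison rule
    operand = 2 * x if doubled else x    # dividend operation circuit
    r = d * operand                      # second multiplier: R(0)
    radix = 1 << digit_bits
    quotient = Fraction(0)
    weight = Fraction(1)
    for _ in range(steps):
        q = math.floor(r)                # partial quotient: upper bits of R(n)
        quotient += q * weight
        r = (r - dy * q) * radix         # third multiplier, adder, assumed shift
        weight /= radix
    return quotient, doubled


q, doubled = prescaled_divide(1.5, 1.25)
print(float(q), doubled)
```

Because DY is close to 1, the upper bits of the partial remainder serve directly as the next quotient digit, so no separate digit-selection table is modeled here. The `doubled` flag would let a caller adjust the result exponent.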

Processing element and operating method thereof in neural network

A processing element and an operating method thereof in a neural network are disclosed. The processing element may include a first multiplexer selecting one of a first value stored in a first memory and a second value stored in a second memory, a second multiplexer selecting one of a first data input signal and an output value of the first multiplexer, a third multiplexer selecting one of the output value of the first multiplexer and a second data input signal, a multiplier multiplying an output value of the second multiplexer by an output value of the third multiplexer, a fourth multiplexer selecting one of the output value of the second multiplexer and an output value of the multiplier, and a third memory storing an output value of the fourth multiplexer.
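The multiplexer datapath described above can be modeled in a few lines of Python. This is only a behavioral sketch: the select-signal names (`sel1` to `sel4`) and their polarity (0 picks the first listed input, 1 the second) are assumptions, and the function `processing_element` is hypothetical.

```python
def mux(select, first, second):
    """Two-input multiplexer: 0 selects `first`, 1 selects `second` (assumed polarity)."""
    return second if select else first


def processing_element(mem1, mem2, in1, in2, sel1, sel2, sel3, sel4):
    """Return the value that would be stored in the third memory."""
    m1 = mux(sel1, mem1, mem2)      # first mux: first memory or second memory
    m2 = mux(sel2, in1, m1)         # second mux: first data input or first mux output
    m3 = mux(sel3, m1, in2)         # third mux: first mux output or second data input
    product = m2 * m3               # multiplier
    return mux(sel4, m2, product)   # fourth mux: bypass or product


# Multiply the two external inputs.
print(processing_element(3, 5, 7, 11, sel1=0, sel2=0, sel3=1, sel4=1))
# Square the value held in the second memory.
print(processing_element(3, 5, 7, 11, sel1=1, sel2=1, sel3=0, sel4=1))
```

With `sel4` set to 0 the multiplier is bypassed and the second multiplexer's output is stored unchanged.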

Neuromorphic arithmetic device and operating method thereof

The neuromorphic arithmetic device comprises an input monitoring circuit, a partial sum data generator, and a shift adder. The input monitoring circuit monitors whether the first bits of at least one first digit of a plurality of feature data and a plurality of weight data are all zeros, and outputs a monitoring result. The partial sum data generator performs an arithmetic operation that generates a plurality of partial sum data based on the plurality of feature data and the plurality of weight data and, in response to the monitoring result, skips the arithmetic operation that would generate first partial sum data corresponding to the first bits. The shift adder generates the first partial sum data with a zero value, and generates result data based on the zero-valued first partial sum data and second partial sum data, that is, the partial sum data other than the first partial sum data.
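The zero-skipping idea can be sketched as a bit-serial dot product in Python. This is a simplification under stated assumptions: each digit is a single bit, only the feature data are monitored (the abstract also mentions weight data), features are non-negative integers, and the function name `zero_skipping_dot` is hypothetical.

```python
def zero_skipping_dot(features, weights, bits=8):
    """Bit-serial dot product that skips bit positions where every feature bit is 0."""
    result = 0
    skipped = []
    for pos in range(bits):
        plane = [(f >> pos) & 1 for f in features]   # one bit of every feature
        if not any(plane):                           # input monitoring: all zeros
            skipped.append(pos)                      # partial sum is taken as zero
            continue
        partial = sum(b * w for b, w in zip(plane, weights))
        result += partial << pos                     # shift adder
    return result, skipped


total, skipped = zero_skipping_dot([4, 12, 8], [1, 2, 3], bits=4)
print(total, skipped)
```

The result matches an ordinary dot product; the benefit in hardware would be the arithmetic avoided for the skipped positions.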

FPGA specialist processing block for machine learning

The present disclosure describes a digital signal processing (DSP) block that includes a plurality of columns of weight registers and a plurality of inputs configured to receive a first plurality of values and a second plurality of values. The first plurality of values is stored in the plurality of columns of weight registers after being received. Additionally, the DSP block includes a plurality of multipliers configured to simultaneously multiply each value of the first plurality of values by each value of the second plurality of values.
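A rough behavioral model of such a block is sketched below in Python. It reads "each value of the first plurality by each value of the second plurality" literally as all pairwise products, which is an assumption; the hardware would compute the products in parallel, whereas this model simply enumerates them. The class `DspBlockModel` and its method names are hypothetical.

```python
class DspBlockModel:
    """Toy model: weights are preloaded into columns, then multiplied by inputs."""

    def __init__(self, num_columns):
        self.columns = [[] for _ in range(num_columns)]

    def load_weights(self, column, values):
        """Store the first plurality of values in one column of weight registers."""
        self.columns[column] = list(values)

    def multiply(self, inputs):
        """Return products[column][weight_index][input_index] (assumed all-pairs reading)."""
        return [[[w * x for x in inputs] for w in col] for col in self.columns]


block = DspBlockModel(num_columns=2)
block.load_weights(0, [1, 2])
block.load_weights(1, [3, 4])
print(block.multiply([10, 20]))
```

Keeping the weights resident in registers means only the second plurality of values has to be supplied on each subsequent multiplication.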

Data Processing Method and Interaction System
20230101493 · 2023-03-30

In a data processing method applied to a programmable chip, a logic classification unit (LCU) obtains, based on first data received through a data bus, at least first target classified data (TCD) and second TCD, and sends the first and second TCD to a corresponding first arithmetic logic unit (ALU) and a corresponding second ALU based on a preset mapping relationship. The LCU classifies target execution information obtained through preprocessing an entry by a ternary content addressable memory (TCAM) and service data, so that an instruction memory determines first and second information, and sends the first and second information to the corresponding first and second ALUs. The first ALU sends, through the data bus, data calculated from the first TCD and the first information, and the second ALU sends, through the data bus, data calculated from the second TCD and the second information.

Reuse in-flight register data in a processor

Devices and techniques for short-thread rescheduling in a processor are described herein. When an instruction for a thread completes, a result is produced. The condition that the same thread is scheduled in a next execution slot and that the next instruction of the thread will use the result can be detected. In response to this condition, the result can be provided directly to an execution unit for the next instruction.
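The forwarding condition can be illustrated with a small Python simulation. The model is an assumption made for illustration: every instruction adds its source registers, each thread has its own register file, and a result is written back one slot after it is produced, so a same-thread instruction in the very next slot must take the value directly from the in-flight result. The names `Instr` and `run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    thread: int
    dest: str
    srcs: tuple


def run(schedule, regfiles):
    """Execute one instruction per slot; return the slots that used a forwarded result."""
    in_flight = None                 # (thread, dest, value) not yet written back
    forwarded_slots = []
    for slot, ins in enumerate(schedule):
        operands, used_forward = [], False
        for src in ins.srcs:
            same_thread = in_flight is not None and in_flight[0] == ins.thread
            if same_thread and in_flight[1] == src:
                operands.append(in_flight[2])            # bypass the register file
                used_forward = True
            else:
                operands.append(regfiles[ins.thread][src])
        if used_forward:
            forwarded_slots.append(slot)
        if in_flight is not None:                        # delayed write-back
            thread, dest, value = in_flight
            regfiles[thread][dest] = value
        in_flight = (ins.thread, ins.dest, sum(operands))
    if in_flight is not None:
        thread, dest, value = in_flight
        regfiles[thread][dest] = value
    return forwarded_slots


regs = {0: {"a": 1, "b": 2, "c": 0}, 1: {"a": 10, "b": 20, "c": 0}}
program = [
    Instr(0, "c", ("a", "b")),
    Instr(0, "a", ("c", "b")),   # same thread, next slot, reads "c": forwarded
    Instr(1, "c", ("a", "b")),
    Instr(0, "b", ("a", "c")),
]
print(run(program, regs), regs)
```

Only the second slot meets both parts of the condition (same thread in the next slot, and the next instruction consumes the result); the other slots read the register file as usual.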

ARITHMETIC LOGIC UNIT, FLOATING-POINT NUMBER MULTIPLICATION CALCULATION METHOD, AND DEVICE

An arithmetic logic unit comprises multiple (N) adjustment circuits and a multiplier-accumulator. Each of the N adjustment circuits obtains an input floating-point number of a pre-selected input type, and converts the input number to one or more output floating-point numbers of an operation type and precision. The multiplier-accumulator is connected to the N adjustment circuits, and is configured to perform operations on input floating-point numbers of the operation type. The multiplier-accumulator receives a group of floating-point numbers of the operation type from the N adjustment circuits as inputs, performs an operation on the group of floating-point numbers, and generates an operation result floating-point number of the operation type. The multiplier-accumulator then converts the operation result floating-point number to an output floating-point number of a desired type different from the operation type.
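The convert, operate, convert flow can be sketched in Python using the `struct` module's half-precision (`e`) and single-precision (`f`) formats. The concrete types are assumptions chosen for illustration: FP16 as the input and output type and FP32 as the operation type, with each input mapped to exactly one operation-type number (the abstract also allows one input to become several). The helper names `to_fp16`, `to_fp32`, `mac_fp32`, and `alu_dot` are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mac_fp32(pairs):
    """Multiply-accumulate with every intermediate rounded to FP32 (operation type)."""
    acc = 0.0
    for a, b in pairs:
        acc = to_fp32(acc + to_fp32(a * b))
    return acc


def alu_dot(a_vals, b_vals):
    """Adjust FP16 inputs to FP32, accumulate in FP32, emit the result as FP16."""
    a32 = [to_fp32(to_fp16(a)) for a in a_vals]   # adjustment circuits
    b32 = [to_fp32(to_fp16(b)) for b in b_vals]
    return to_fp16(mac_fp32(zip(a32, b32)))       # convert result to output type


print(alu_dot([1.5, 2.0], [0.5, 0.25]))
```

Accumulating in a wider operation type than the input type limits rounding error inside the multiplier-accumulator, and the final conversion returns the result in the type the caller expects.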