G06F7/575

Neuromorphic arithmetic device and operating method thereof

The neuromorphic arithmetic device comprises an input monitoring circuit that outputs a monitoring result by monitoring that first bits of at least one first digit of a plurality of feature data and a plurality of weight data are all zeros, a partial sum data generator that skips an arithmetic operation that generates a first partial sum data corresponding to the first bits of a plurality of partial sum data in response to the monitoring result while performing the arithmetic operation of generating the plurality of partial sum data, based on the plurality of feature data and the plurality of weight data, and a shift adder that generates the first partial sum data with a zero value and result data, based on second partial sum data except for the first partial sum data among the plurality of partial sum data and the first partial sum data generated with the zero value.

Neuromorphic arithmetic device and operating method thereof

The neuromorphic arithmetic device comprises an input monitoring circuit that outputs a monitoring result by monitoring that first bits of at least one first digit of a plurality of feature data and a plurality of weight data are all zeros, a partial sum data generator that skips an arithmetic operation that generates a first partial sum data corresponding to the first bits of a plurality of partial sum data in response to the monitoring result while performing the arithmetic operation of generating the plurality of partial sum data, based on the plurality of feature data and the plurality of weight data, and a shift adder that generates the first partial sum data with a zero value and result data, based on second partial sum data except for the first partial sum data among the plurality of partial sum data and the first partial sum data generated with the zero value.

SCHEDULING TASKS USING SWAP FLAGS
20230097760 · 2023-03-30 ·

A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.

SCHEDULING TASKS USING SWAP FLAGS
20230097760 · 2023-03-30 ·

A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.

SYNCHRONIZING SCHEDULING TASKS WITH ATOMIC ALU
20230033355 · 2023-02-02 ·

A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.

SYNCHRONIZING SCHEDULING TASKS WITH ATOMIC ALU
20230033355 · 2023-02-02 ·

A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.

Bypassing zero-value multiplications in a hardware multiplier

A device (e.g., integrated circuit chip) includes a first operand register, a second operand register, a multiplication unit, and a hardware logic component. The first operand register is configured to store a first operand value. The second operand register is configured to store a second operand value. The multiplication unit is configured to at least multiply the first operand value with the second operand value. The hardware logic component is configured to detect whether a zero value is provided and in response to a detection that the zero value is being provided: cause an update of at least the first operand register to be disabled, and cause a result of a multiplication of the first operand value with the second operand value to be a zero-value result.

Bypassing zero-value multiplications in a hardware multiplier

A device (e.g., integrated circuit chip) includes a first operand register, a second operand register, a multiplication unit, and a hardware logic component. The first operand register is configured to store a first operand value. The second operand register is configured to store a second operand value. The multiplication unit is configured to at least multiply the first operand value with the second operand value. The hardware logic component is configured to detect whether a zero value is provided and in response to a detection that the zero value is being provided: cause an update of at least the first operand register to be disabled, and cause a result of a multiplication of the first operand value with the second operand value to be a zero-value result.

Binary neural network accelerator engine methods and systems
11488002 · 2022-11-01 · ·

Disclosed are methods, apparatus and systems for a binary neural network accelerator engine. One example circuit is designed to perform a multiply-and-accumulate (MAC) operation using logic circuits that include a first set of exclusive nor (XNOR) gates to generate a product vector based on a bit-wise XNOR operation two vectors. The result is folded and operated on by another set of logic circuits that provide an output for a series of adder circuits. The MAC circuit can be implemented as part of binary neural network at a small footprint to effect power and cost savings.

Binary neural network accelerator engine methods and systems
11488002 · 2022-11-01 · ·

Disclosed are methods, apparatus and systems for a binary neural network accelerator engine. One example circuit is designed to perform a multiply-and-accumulate (MAC) operation using logic circuits that include a first set of exclusive nor (XNOR) gates to generate a product vector based on a bit-wise XNOR operation two vectors. The result is folded and operated on by another set of logic circuits that provide an output for a series of adder circuits. The MAC circuit can be implemented as part of binary neural network at a small footprint to effect power and cost savings.