IPIQ

G06F7/533

Multiply-accumulate “0” data gating

11656846 · 2023-05-23

Intel Corporation

In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
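As a rough software illustration of the zero-gating idea in this abstract, the sketch below skips both the multiply and the accumulate step whenever either input is zero. It is a minimal model under stated assumptions, not the claimed hardware logic: the function name `gated_mac` and the gated-operation counter are hypothetical and exist only for illustration.

```python
from typing import Iterable, Tuple


def gated_mac(pairs: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Accumulate sum(a * b), skipping work when either input is zero.

    Returns (accumulator, gated_count). Hypothetical software model only;
    real hardware would gate clocks or operands rather than branch.
    """
    acc = 0
    gated = 0
    for a, b in pairs:
        if a == 0 or b == 0:
            # A zero operand forces a zero product, so the accumulator
            # would not change: neither unit needs to do any work.
            gated += 1
            continue
        acc += a * b
    return acc, gated


result, skipped = gated_mac([(3, 4), (0, 7), (5, 0), (2, 2)])
print(result, skipped)
```

The result matches an ungated dot product; the counter only shows how many multiply-accumulate steps a gating scheme of this kind could avoid on sparse inputs.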

Apparatus and method using neural network

11625224 · 2023-04-11

Canon Kabushiki Kaisha

An apparatus includes a first holding unit and a second holding unit configured to hold first-type data and second-type data, respectively, a first operation unit configured to execute a first product-sum operation based on the first-type data, a branch unit configured to output an operation result of the first product-sum operation in parallel, a sampling unit configured to sample the operation result and to output a sampling result, and a second operation unit configured to execute a second product-sum operation based on the second-type data and the sampling result.

Neural processing accelerator

11645224 · 2023-05-09

Samsung Electronics Co., Ltd.

A system for calculating. A scratch memory is connected to a plurality of configurable processing elements by a communication fabric including a plurality of configurable nodes. The scratch memory sends out a plurality of streams of data words. Each data word is either a configuration word used to set the configuration of a node or of a processing element, or a data word carrying an operand or a result of a calculation. Each processing element performs operations according to its current configuration and returns the results to the communication fabric, which conveys them back to the scratch memory.
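The mixed stream of configuration words and data words described in this abstract can be pictured with a small model. The sketch below is only an illustration under assumptions: the class names `ConfigWord`, `DataWord`, and `ProcessingElement`, the two operations `"add"` and `"mul"`, and the two-operand behaviour are all hypothetical and are not taken from the patent.

```python
import operator
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ConfigWord:
    op: str  # hypothetical configuration: "add" or "mul"


@dataclass
class DataWord:
    value: int  # an operand or a result


Word = Union[ConfigWord, DataWord]
OPS = {"add": operator.add, "mul": operator.mul}


class ProcessingElement:
    """Toy element: a config word sets the operation, data words feed it."""

    def __init__(self) -> None:
        self.op = OPS["add"]
        self.pending: Optional[int] = None

    def accept(self, word: Word) -> Optional[DataWord]:
        if isinstance(word, ConfigWord):
            # Reconfigure and drop any half-collected operand pair.
            self.op = OPS[word.op]
            self.pending = None
            return None
        if self.pending is None:
            self.pending = word.value  # first operand, wait for the second
            return None
        result = DataWord(self.op(self.pending, word.value))
        self.pending = None
        return result


def run_stream(stream: List[Word]) -> List[int]:
    """Feed a stream to one element and collect the results sent back."""
    pe = ProcessingElement()
    results = []
    for word in stream:
        out = pe.accept(word)
        if out is not None:
            results.append(out.value)
    return results


stream = [ConfigWord("mul"), DataWord(3), DataWord(4),
          ConfigWord("add"), DataWord(3), DataWord(4)]
print(run_stream(stream))
```

The same pair of operands yields a different result after each configuration word, which is the point of interleaving configuration and data in one stream.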

Multiplier pipelining optimization with a bit folding correction

09778910 · 2017-10-03

Intel Corporation

One embodiment provides a system. The system includes a register to store an operand; a multiplier; and optimizer logic to initiate a square/multiply stage to operate on the operand, initiate a reduction stage prior to completion of the square/multiply stage, and determine whether a carry propagation has occurred.

Method and apparatus for performing deep learning operations

20220035629 · 2022-02-03

Samsung Electronics Co., Ltd.

Disclosed is an apparatus and method for performing deep learning operations. The apparatus includes a systolic array comprising multiplier accumulator (MAC) units, and a control circuit configured to control an operation of a multiplexer connected to at least one of the MAC units and operations of the MAC units according to a plurality of operation modes.

Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits

11366636 · 2022-06-21

Intel Corporation

An apparatus and method for efficiently performing a multiply add or multiply accumulate operation. For example, one embodiment of a processor comprises: a decoder to decode an instruction specifying an operation, the instruction comprising a first operand identifying a multiplier and a second operand identifying a multiplicand; and fused multiply-add (FMA) execution circuitry comprising first multiplication circuitry to perform a multiplication using the multiplicand and multiplier to generate a result for multipliers and multiplicands falling within a first precision range, and second multiplication circuitry to be used instead of the first multiplication circuitry for multipliers and multiplicands falling within a second precision range.
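The idea of routing low-precision operands to a second, simpler multiplication path can be sketched in software. The example below assumes, based only on the entry's title, that the "second precision range" covers ternary values in {-1, 0, 1}; the function name `fma_dispatch` and the path labels are hypothetical, and the code models the selection logic rather than any actual circuit.

```python
from typing import Tuple

TERNARY = (-1, 0, 1)  # assumed "second precision range", taken from the title


def fma_dispatch(multiplier: int, multiplicand: int, addend: int) -> Tuple[int, str]:
    """Compute multiplier * multiplicand + addend and report the path used.

    Hypothetical model: ternary operand pairs take a sign-selection path
    that needs no general multiplier; everything else takes the full path.
    """
    if multiplier in TERNARY and multiplicand in TERNARY:
        if multiplier == 0 or multiplicand == 0:
            product = 0
        elif multiplier == multiplicand:
            product = 1   # (+1)(+1) or (-1)(-1)
        else:
            product = -1  # opposite signs
        return product + addend, "ternary"
    return multiplier * multiplicand + addend, "full"


print(fma_dispatch(-1, 1, 5))
print(fma_dispatch(3, 4, 5))
```

Both paths return the same arithmetic result as an ordinary multiply-add; only the means of forming the product differs.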

Neural processing accelerator

11360930 · 2022-06-14

Samsung Electronics Co., Ltd.

Processing apparatus and processing method

11720353 · 2023-08-08

SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD

The present disclosure provides a processing device and method. The device includes: an input/output module, a controller module, a computing module, and a storage module. The input/output module is configured to store and transmit input and output data; the controller module is configured to decode a computation instruction into a control signal that controls the operations of the other modules; the computing module is configured to perform the four arithmetic operations, logical operations, shift operations, and complement operations on data; and the storage module is configured to temporarily store instructions and data. The present disclosure can execute a composite scalar instruction accurately and efficiently.
