Patent classifications
G06F7/5095
LSTM circuit with selective input computation
An apparatus is described. The apparatus includes a long short-term memory (LSTM) circuit having a multiply-accumulate (MAC) circuit. The MAC circuit has circuitry that relies on a stored product term, rather than explicitly performing a multiplication operation to determine the product term, if an accumulation of differences between consecutive, preceding input values has not reached a threshold.
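The selective computation described in this abstract can be illustrated with a small software sketch. It is not the patented circuit: the class name SelectiveMultiplier, the signed accumulation of differences, and the reset of the accumulated difference after each real multiplication are assumptions made for illustration.

```python
class SelectiveMultiplier:
    """Reuses a stored product while the input has drifted less than a threshold.

    Hypothetical model: the abstract does not specify how differences are
    accumulated, so signed accumulation with a reset on recompute is assumed.
    """

    def __init__(self, weight, threshold):
        self.weight = weight
        self.threshold = threshold
        self.last_input = None      # most recent input value seen
        self.stored_product = None  # product from the last real multiplication
        self.drift = 0              # accumulated differences since that multiplication
        self.multiplies = 0         # count of multiplications actually performed

    def product(self, x):
        if self.stored_product is not None:
            # Accumulate the difference between consecutive input values.
            self.drift += x - self.last_input
            self.last_input = x
            if abs(self.drift) < self.threshold:
                # Threshold not reached: rely on the stored product term.
                return self.stored_product
        # First input, or threshold reached: multiply explicitly and restart.
        self.stored_product = self.weight * x
        self.multiplies += 1
        self.last_input = x
        self.drift = 0
        return self.stored_product


unit = SelectiveMultiplier(weight=3, threshold=4)
outputs = [unit.product(x) for x in [10, 11, 12, 13, 15, 15]]
```

In this run only two of the six inputs trigger a multiplication; the others return the stored product, which trades a bounded input error for fewer multiplier activations.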
Multiply accumulate (MAC) unit with split accumulator
In a multiply accumulate (MAC) unit, an accumulator may be implemented in two or more stages. For example, a first accumulator may accumulate products from the multiplier of the MAC unit, and a second accumulator may periodically accumulate the running total of the first accumulator. Each time the first accumulator's running total is accumulated by the second accumulator, the first accumulator may be initialized to begin a new accumulation period. In one embodiment, the number of values accumulated by the first accumulator within an accumulation period may be a user-adjustable parameter. In one embodiment, the bit width of the input of the second accumulator may be greater than the bit width of the output of the first accumulator. In another embodiment, an adder may be shared between the first and second accumulators, and a multiplexor may switch the accumulation operations between the first and second accumulators.
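The two-stage accumulation in this abstract can be modeled in a few lines. The following is a minimal behavioral sketch, assuming unbounded Python integers in place of fixed bit widths; the names SplitAccumulatorMAC, period, and flush are hypothetical and do not come from the patent.

```python
class SplitAccumulatorMAC:
    """Behavioral model of a MAC unit with a two-stage (split) accumulator."""

    def __init__(self, period):
        if period < 1:
            raise ValueError("period must be at least 1")
        self.period = period  # adjustable number of values per accumulation period
        self.first = 0        # first accumulator: running total of products
        self.second = 0       # second accumulator: sum of completed periods
        self.count = 0        # products accumulated in the current period

    def mac(self, a, b):
        # The first accumulator takes the multiplier output directly.
        self.first += a * b
        self.count += 1
        if self.count == self.period:
            self.flush()

    def flush(self):
        # The second accumulator absorbs the first accumulator's running
        # total, and the first accumulator is initialized for a new period.
        self.second += self.first
        self.first = 0
        self.count = 0

    def total(self):
        # Full result: completed periods plus the partial current period.
        return self.second + self.first


mac_unit = SplitAccumulatorMAC(period=4)
for i in range(1, 11):
    mac_unit.mac(i, 2)
```

After ten products with a period of four, two periods have been moved to the second accumulator and two products remain in the first. In hardware the point of the split is that the first accumulator can be narrow because it only holds one period's worth of products.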
Apparatus and method for performing accumulation operations
An apparatus has processing circuitry to perform an accumulation operation in which a first addend is added to a second addend. The apparatus has storage circuitry to store the second addend in a plurality of lanes, each lane having a significance different to that of each other lane. Each lane within at least a subset of the lanes comprises at least one overlap bit having the same bit significance as a bit in an adjacent more significant lane in the plurality of lanes. The accumulation operation includes selecting an accumulating lane out of the plurality of lanes and performing an addition operation between bits of the accumulating lane and the first addend. The at least one overlap bit of the accumulating lane enables the addition operation to be performed without a possibility of overflowing the accumulating lane.
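The role of the overlap bits can be seen in a small numeric model. This is a sketch under stated assumptions rather than the claimed apparatus: lanes are unsigned, LANE_BITS and OVERLAP_BITS are arbitrary illustrative widths, and the helper names are hypothetical.

```python
LANE_BITS = 16                             # assumed storage width of each lane
OVERLAP_BITS = 4                           # assumed number of overlap bits per lane
PAYLOAD_BITS = LANE_BITS - OVERLAP_BITS    # non-overlapping bits per lane
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def lane_value(lanes):
    """Value represented by the lanes; lane i has weight 2**(i * PAYLOAD_BITS)."""
    return sum(lane << (i * PAYLOAD_BITS) for i, lane in enumerate(lanes))


def accumulate(lanes, lane_index, addend):
    """Add a payload-sized addend to one lane without touching the others."""
    if not 0 <= addend <= PAYLOAD_MASK:
        raise ValueError("addend must fit in the payload bits")
    if lanes[lane_index] + addend >= (1 << LANE_BITS):
        raise OverflowError("propagate overlap bits before accumulating further")
    # Carries out of the payload land in the lane's own overlap bits.
    lanes[lane_index] += addend


def propagate_overlap(lanes):
    """Move each lane's overlap bits into the adjacent more significant lane."""
    for i in range(len(lanes) - 1):
        carry = lanes[i] >> PAYLOAD_BITS
        lanes[i] &= PAYLOAD_MASK
        lanes[i + 1] += carry


lanes = [0, 0, 0]
for _ in range(15):
    accumulate(lanes, 0, PAYLOAD_MASK)     # 15 worst-case additions to lane 0
before = lane_value(lanes)
propagate_overlap(lanes)
after = lane_value(lanes)
```

With four overlap bits, a lane whose overlap bits are clear can absorb 15 worst-case additions before any carry has to be passed to its neighbor, and propagating the overlap bits leaves the represented value unchanged.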
Accumulation systems and methods
Example accumulation systems and methods are described. In one implementation, data is received for processing. A multiplication operation is performed on the received data to generate multiplied data. An addition operation is performed on the multiplied data to generate a result. At least a portion of the least significant bits of the result is stored in a first region of an accumulation buffer of a convolution core, and at least a portion of the remaining bits of the result is stored in a shared memory that is separate from the convolution core.
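The split storage in this abstract can be sketched as follows. The 16-bit boundary LOW_BITS, the dictionaries standing in for the accumulation buffer and the shared memory, and all function names are assumptions for illustration; the abstract does not give widths or interfaces.

```python
LOW_BITS = 16                      # assumed width kept in the local accumulation buffer
LOW_MASK = (1 << LOW_BITS) - 1


def split_result(result):
    """Return (least significant bits, remaining upper bits) of a result."""
    return result & LOW_MASK, result >> LOW_BITS


def join_result(low, high):
    """Reassemble a result from its two stored parts."""
    return (high << LOW_BITS) | low


def multiply_add_store(a, b, addend, accumulation_buffer, shared_memory, index):
    """Multiply, add, then store the two halves of the result separately."""
    result = a * b + addend
    low, high = split_result(result)
    accumulation_buffer[index] = low   # stays inside the convolution core
    shared_memory[index] = high        # kept outside the convolution core
    return result


accumulation_buffer = {}               # stand-in for the core's local buffer
shared_memory = {}                     # stand-in for the separate shared memory
full = multiply_add_store(300, 500, 7, accumulation_buffer, shared_memory, index=0)
restored = join_result(accumulation_buffer[0], shared_memory[0])
```

Here the result 150007 is stored as low bits 18935 in the local buffer and upper bits 2 in the shared memory, and joining the two parts recovers the full value.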