Patent classifications
G06F7/50
USING SPARSITY METADATA TO REDUCE SYSTOLIC ARRAY POWER CONSUMPTION
A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.
USING SPARSITY METADATA TO REDUCE SYSTOLIC ARRAY POWER CONSUMPTION
A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.
PARTIAL SUM COMPRESSION
A method for performing a neural network operation. In some embodiments, method includes: calculating a first plurality of products, each of the first plurality of products being the product of a weight and an activation; calculating a first partial sum, the first partial sum being the sum of the products; and compressing the first partial sum to form a first compressed partial sum.
PARTIAL SUM COMPRESSION
A method for performing a neural network operation. In some embodiments, method includes: calculating a first plurality of products, each of the first plurality of products being the product of a weight and an activation; calculating a first partial sum, the first partial sum being the sum of the products; and compressing the first partial sum to form a first compressed partial sum.
COMPUTE IN MEMORY-BASED MACHINE LEARNING ACCELERATOR ARCHITECTURE
Certain aspects of the present disclosure provide techniques for processing machine learning model data with a machine learning task accelerator, including: configuring one or more signal processing units (SPUs) of the machine learning task accelerator to process a machine learning model; providing model input data to the one or more configured SPUs; processing the model input data with the machine learning model using the one or more configured SPUs; and receiving output data from the one or more configured SPUs.
COMPUTE IN MEMORY-BASED MACHINE LEARNING ACCELERATOR ARCHITECTURE
Certain aspects of the present disclosure provide techniques for processing machine learning model data with a machine learning task accelerator, including: configuring one or more signal processing units (SPUs) of the machine learning task accelerator to process a machine learning model; providing model input data to the one or more configured SPUs; processing the model input data with the machine learning model using the one or more configured SPUs; and receiving output data from the one or more configured SPUs.
CONFIGURABLE COMPUTING UNIT WITHIN MEMORY
A configurable computing unit within memory including a first input transistor, a first weight transistor, a first resistor, a second input transistor, a second weight transistor, and a second resistor is provided. The first input transistor, the first weight transistor, and the first resistor are coupled in series between a first readout bit line and a common signal line. The first input transistor is coupled to a first input bit line, and the first weight transistor receives a first weight bit. The second input transistor, the second weight transistor, and the second resistor are coupled in series between the first readout bit line and the common signal line. The second input transistor is coupled to a second input bit line, and the second weight transistor receives the second weight bit.
CONFIGURABLE COMPUTING UNIT WITHIN MEMORY
A configurable computing unit within memory including a first input transistor, a first weight transistor, a first resistor, a second input transistor, a second weight transistor, and a second resistor is provided. The first input transistor, the first weight transistor, and the first resistor are coupled in series between a first readout bit line and a common signal line. The first input transistor is coupled to a first input bit line, and the first weight transistor receives a first weight bit. The second input transistor, the second weight transistor, and the second resistor are coupled in series between the first readout bit line and the common signal line. The second input transistor is coupled to a second input bit line, and the second weight transistor receives the second weight bit.
MEMORY DEVICE INCLUDING TERNARY MEMORY CELL
Provided is a memory device for a logic-in-memory. The memory cell includes: a ternary memory cell for storing ternary data: and a weight cell for controlling a current flowing in an operation line on the basis of a weight signal transmitted from the ternary memory cell and an activation signal transmitted via an activation line, wherein the weight cell includes a first transistor for receiving an input of weight data from a first node corresponding to a stored value of the ternary memory cell, a second transistor for receiving an input of inversed weight data from a second node corresponding to an inversed stored value of the ternary memory cell, and a third transistor for receiving an input of an activation signal transmitted via the activation line.
MEMORY DEVICE INCLUDING TERNARY MEMORY CELL
Provided is a memory device for a logic-in-memory. The memory cell includes: a ternary memory cell for storing ternary data: and a weight cell for controlling a current flowing in an operation line on the basis of a weight signal transmitted from the ternary memory cell and an activation signal transmitted via an activation line, wherein the weight cell includes a first transistor for receiving an input of weight data from a first node corresponding to a stored value of the ternary memory cell, a second transistor for receiving an input of inversed weight data from a second node corresponding to an inversed stored value of the ternary memory cell, and a third transistor for receiving an input of an activation signal transmitted via the activation line.