Patent classifications
H03M7/46
NEAR-OPTIMAL TRANSITION ENCODING CODES
A method of encoding input data includes dividing the input data into a plurality of data packets, an input packet of the plurality of data packets including a plurality of digits in a first base system, base-converting the input packet from the first base system to generate a base-converted packet including a plurality of converted digits in a second base system, the second base system having a base value lower than that of the first base system, and incrementing the converted digits to generate a coded packet for transmission through a communication channel.
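The steps in this abstract (base conversion to a lower base, then incrementing each digit) can be illustrated with a minimal sketch. The function names, the choice of base 8 to base 7, and the fixed output length are assumptions made for illustration and do not come from the patent. One consequence of the increment, under these assumptions, is that no coded digit is ever zero.

```python
def encode_packet(digits, src_base, dst_base, out_len):
    """Base-convert a packet (most significant digit first) and add 1 to each digit."""
    if dst_base >= src_base:
        raise ValueError("the second base must be lower than the first base")
    # Interpret the input digits as one integer.
    value = 0
    for d in digits:
        if not 0 <= d < src_base:
            raise ValueError("digit out of range for the first base")
        value = value * src_base + d
    # Emit exactly out_len digits in the second base, least significant first.
    converted = []
    for _ in range(out_len):
        value, remainder = divmod(value, dst_base)
        converted.append(remainder)
    if value:
        raise ValueError("out_len is too small to hold the packet")
    converted.reverse()
    # Increment every converted digit, so coded digits lie in 1..dst_base.
    return [c + 1 for c in converted]


def decode_packet(coded, src_base, dst_base, in_len):
    """Invert encode_packet: decrement, then convert back to the first base."""
    value = 0
    for c in coded:
        value = value * dst_base + (c - 1)
    digits = []
    for _ in range(in_len):
        value, remainder = divmod(value, src_base)
        digits.append(remainder)
    digits.reverse()
    return digits


coded = encode_packet([7, 7, 7], src_base=8, dst_base=7, out_len=4)
print(coded)  # [2, 4, 4, 1]
print(decode_packet(coded, src_base=8, dst_base=7, in_len=3))  # [7, 7, 7]
```

Three base-8 digits hold values up to 511, which needs four base-7 digits, so the coded packet is one digit longer than the input packet in this example.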
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format
Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
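The arithmetic described here (multiply the second and third source operands, add the result to the first) can be emulated in software as a rough sketch. The helper names are hypothetical, bfloat16 is modeled as the upper 16 bits of an IEEE 754 float32, and the accumulation runs in Python's double precision rather than in whatever accumulator width the hardware uses. NaN inputs are not handled.

```python
import struct


def to_bf16_bits(x):
    """Round a float to bfloat16 (round-to-nearest-even) and return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a rounding bias that depends on the lowest bit that will be kept.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF


def from_bf16_bits(b):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def dot_accumulate(src0, src1_bits, src2_bits):
    """Return src0 + sum(src1[i] * src2[i]) for bfloat16-encoded src1 and src2."""
    total = src0
    for a, b in zip(src1_bits, src2_bits):
        total += from_bf16_bits(a) * from_bf16_bits(b)
    return total


a = [to_bf16_bits(v) for v in (1.0, 2.0)]
b = [to_bf16_bits(v) for v in (3.0, 0.5)]
print(hex(to_bf16_bits(1.0)))       # 0x3f80
print(dot_accumulate(1.0, a, b))    # 5.0
```

Keeping the accumulator wider than the multiplicands is the usual motivation for a mixed-format dot product, since bfloat16 carries only 8 bits of significand precision.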
Neural network activation compression with non-uniform mantissas
Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, and in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
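To make the idea of a block floating-point format with non-uniform mantissas concrete, here is a small sketch. Everything specific in it is an assumption for illustration: the level set `LEVELS`, the 3-bit intermediate mantissa, the sign/index code layout, and the function names. The abstract does not specify any of these.

```python
import math

# Hypothetical non-uniform mantissa levels; a 2-bit index selects one of them.
LEVELS = (0, 1, 3, 7)


def compress_block(values, mant_bits=3):
    """Return (shared_exponent, [(sign, level_index), ...]) for one block."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [(0, 0)] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e is shared by the block.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mant_bits - shared_exp)
    codes = []
    for v in values:
        # Uniform integer mantissa first, clamped to mant_bits bits.
        mag = min(round(abs(v) * scale), 2 ** mant_bits - 1)
        # Then snap to the nearest non-uniform level (lossy step).
        idx = min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - mag))
        codes.append((1 if v < 0 else 0, idx))
    return shared_exp, codes


def decompress_block(shared_exp, codes, mant_bits=3):
    """Rebuild approximate values from the shared exponent and level indices."""
    scale = 2.0 ** (shared_exp - mant_bits)
    return [(-1 if sign else 1) * LEVELS[idx] * scale for sign, idx in codes]


exp, codes = compress_block([0.5, -0.25, 0.0, 0.875])
print(exp, codes)                     # 0 [(0, 2), (1, 1), (0, 0), (0, 3)]
print(decompress_block(exp, codes))   # [0.375, -0.125, 0.0, 0.875]
```

In this sketch each value costs 3 bits (sign plus a 2-bit level index) instead of 4 (sign plus a 3-bit mantissa), at the price of extra quantization error such as 0.5 being stored as 0.375.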
Data compression techniques
Techniques and solutions are described for compressing data and facilitating access to compressed data. Compression can be applied to proper data subsets of a data set, such as to columns of a table. Using various methods, the proper data subsets can be evaluated to be included in a group of proper data subsets to be compressed using a first compression technique, where unselected proper data subsets are not compressed using the first compression technique. Data in the data set can be reordered based on a reordering sequence for the proper data subsets. Reordering data in the data set can improve compression when at least a portion of the proper data subsets are compressed. A data structure is provided that facilitates accessing specified data stored in a compressed format.
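A rough sketch of why reordering can help column compression follows. It uses run-length encoding as the "first compression technique" and a simple run-count threshold to select columns; both choices, along with the function names and the `max_ratio` parameter, are assumptions for illustration rather than the method claimed in the patent.

```python
from itertools import groupby


def rle(column):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def reorder_rows(rows, column_order):
    """Sort rows by the given sequence of column indices."""
    return sorted(rows, key=lambda row: tuple(row[c] for c in column_order))


def compress_table(rows, column_order, max_ratio=0.5):
    """Reorder rows, then RLE-compress only the columns that shrink enough."""
    ordered = reorder_rows(rows, column_order)
    result = []
    for column in zip(*ordered):
        runs = rle(column)
        if len(runs) <= max_ratio * len(column):
            result.append(("rle", runs))           # selected for compression
        else:
            result.append(("raw", list(column)))   # left uncompressed
    return result


rows = [("b", 1, "x"), ("a", 2, "y"), ("b", 3, "z"), ("a", 4, "w")]
for entry in compress_table(rows, column_order=[0]):
    print(entry)
# ('rle', [('a', 2), ('b', 2)])
# ('raw', [2, 4, 1, 3])
# ('raw', ['y', 'w', 'x', 'z'])
```

Without the reordering, the first column reads b, a, b, a and run-length encoding produces four runs of length one, so it would not be selected.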
Weight data compression method, weight data decompression method, weight data compression device, and weight data decompression device
A weight data compression method includes: generating a 4-bit data string of 4-bit data items each expressed as any one of nine 4-bit values, by dividing ternary weight data into data items each having 4 bits; and generating first compressed data including a first flag value string and a first non-zero value string by (i) generating the first flag value string by assigning one of 0 and 1 as a first flag value of a 1-bit flag to a 4-bit data item 0000 and assigning an other of 0 and 1 as a second flag value of the 1-bit flag to a 4-bit data item other than 0000 among the 4-bit data items in the 4-bit data string and (ii) generating the first non-zero value string by converting the 4-bit data item other than 0000 into a 3-bit data item having any one of eight 3-bit values.
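The flag-plus-3-bit scheme in this abstract can be sketched as follows. The 2-bit encoding of each ternary weight (0 as 00, +1 as 01, -1 as 11), the flag polarity, and the ordering of the 3-bit codes are assumptions for illustration; the abstract only fixes that there are nine possible 4-bit items, one of which is 0000, and eight 3-bit values for the rest.

```python
# Assumed 2-bit code per ternary weight; two weights form one 4-bit item.
CODE = {0: 0b00, 1: 0b01, -1: 0b11}
DECODE = {bits: w for w, bits in CODE.items()}

# The eight 4-bit items other than 0000, in ascending order.
NONZERO_ITEMS = sorted(
    (CODE[a] << 2) | CODE[b] for a in CODE for b in CODE if (a, b) != (0, 0)
)
TO_3BIT = {item: i for i, item in enumerate(NONZERO_ITEMS)}


def compress(weights):
    """Return (flag string, non-zero value string) for ternary weights."""
    if len(weights) % 2:
        raise ValueError("expected an even number of weights")
    flags, nonzero = [], []
    for i in range(0, len(weights), 2):
        item = (CODE[weights[i]] << 2) | CODE[weights[i + 1]]
        if item == 0:
            flags.append(0)                 # 0000 costs only the 1-bit flag
        else:
            flags.append(1)
            nonzero.append(TO_3BIT[item])   # other items cost flag + 3 bits
    return flags, nonzero


def decompress(flags, nonzero):
    """Rebuild the ternary weights from the two strings."""
    values = iter(nonzero)
    weights = []
    for flag in flags:
        item = NONZERO_ITEMS[next(values)] if flag else 0
        weights += [DECODE[item >> 2], DECODE[item & 0b11]]
    return weights


flags, nonzero = compress([0, 0, 1, -1, 0, 0, 0, 1])
print(flags, nonzero)               # [0, 1, 0, 1] [4, 0]
print(decompress(flags, nonzero))   # [0, 0, 1, -1, 0, 0, 0, 1]
```

Under these assumptions a 0000 item shrinks from 4 bits to 1 and every other item stays at 4 bits (1 flag bit plus 3 value bits), so the scheme never expands the data and gains the most on sparse weights.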
Feature reordering based on sparsity for improved memory compression transfers during machine learning jobs
A processing device for executing a machine learning neural network operation includes memory and a processor. The processor is configured to receive input data at a layer of the machine learning neural network operation, receive a plurality of sorted filters to be applied to the input data, apply the plurality of sorted filters to the input data to produce a plurality of different feature maps, compress the plurality of different feature maps according to a sparsity of the feature maps and store the plurality of different feature maps in the memory.
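As a loose illustration of compressing feature maps according to their sparsity, the sketch below measures the fraction of zeros in each map, orders the maps from most to least sparse, and stores sufficiently sparse maps as a bitmask plus their non-zero values. The mask format, the `threshold` parameter, and all function names are assumptions; the abstract does not describe the compression format, and in the patent the ordering comes from pre-sorted filters rather than from sorting the maps afterwards.

```python
def sparsity(fmap):
    """Fraction of zero entries in a 2-D feature map (list of rows)."""
    flat = [v for row in fmap for v in row]
    return sum(1 for v in flat if v == 0) / len(flat)


def compress_map(fmap):
    """Encode a map as (bitmask of non-zero positions, list of non-zero values)."""
    flat = [v for row in fmap for v in row]
    mask = [int(v != 0) for v in flat]
    values = [v for v in flat if v != 0]
    return mask, values


def compress_by_sparsity(fmaps, threshold=0.5):
    """Order maps by descending sparsity and mask-compress the sparse ones."""
    order = sorted(range(len(fmaps)), key=lambda i: sparsity(fmaps[i]), reverse=True)
    stored = []
    for i in order:
        if sparsity(fmaps[i]) >= threshold:
            stored.append((i, "mask", compress_map(fmaps[i])))
        else:
            stored.append((i, "raw", fmaps[i]))
    return stored


fmaps = [[[1, 2], [3, 0]], [[0, 0], [0, 5]]]
for entry in compress_by_sparsity(fmaps):
    print(entry)
# (1, 'mask', ([0, 0, 0, 1], [5]))
# (0, 'raw', [[1, 2], [3, 0]])
```

Grouping maps of similar sparsity next to each other is one way the stored layout could become more regular, which is the kind of effect the reordering in the abstract appears to target.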