Patent classifications
H03M7/46
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT
Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
DATA COMPRESSION APPARATUS, DATA DECOMPRESSION APPARATUS, DATA COMPRESSION METHOD, DATA DECOMPRESSION METHOD, AND COMPUTER READABLE MEDIUM
A data compression apparatus of the invention includes a data acquisition unit to acquire n integers from encoding data, an integer division unit to divide each integer of the n integers into a second integer represented by low-order bits whose number of divided bits is b and a first integer represented by high-order bits obtained by excluding the low-order bits from each integer of the n integers and to output n first integers and n second integers, a first encoding unit to encode and output the n first integers as a first code represented by binary data having a number of bits that is a natural-number times the number of unit bits of L, and a second encoding unit to encode and output the n second integers as a second code.
Method and apparatus for compressing weights of neural network
A method of compressing weights of a neural network includes compressing a weight set including the weights of a the neural network, determining modified weight sets by changing at least one of the weights, calculating compression efficiency values for the determined modified weight sets based on a result of compressing the weight set and results of compressing the determined modified weight sets, determining a target weight of the weights satisfying a compression efficiency condition among the weights based on the calculated compression efficiency values, and determining a final compression result by compressing the weights based on a result of replacing the determined target weight.
Parallel Processing of Data Having Data Dependencies for Accelerating the Launch and Performance of Operating Systems and Other Computing Applications
Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded (“RLE”) data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
Parallel Processing of Data Having Data Dependencies for Accelerating the Launch and Performance of Operating Systems and Other Computing Applications
Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded (“RLE”) data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
USING PREDICATES IN CONDITIONAL TRANSCODER FOR COLUMN STORE
A storage device is disclosed. The storage device may comprise storage for input encoded data. A controller may process read requests and write requests from a host computer on the data in the storage. An in-storage compute controller may receive a predicate from the host computer to be applied to the input encoded data. A transcoder may include an index mapper to map an input dictionary to an output dictionary, with one entry in the input dictionary mapped to an entry in the output dictionary, and another entry in the input dictionary mapped to a “don't care” entry in the output dictionary.
USING PREDICATES IN CONDITIONAL TRANSCODER FOR COLUMN STORE
A storage device is disclosed. The storage device may comprise storage for input encoded data. A controller may process read requests and write requests from a host computer on the data in the storage. An in-storage compute controller may receive a predicate from the host computer to be applied to the input encoded data. A transcoder may include an index mapper to map an input dictionary to an output dictionary, with one entry in the input dictionary mapped to an entry in the output dictionary, and another entry in the input dictionary mapped to a “don't care” entry in the output dictionary.
Parallel processing of data having data dependencies for accelerating the launch and performance of operating systems and other computing applications
Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded (“RLE”) data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
Parallel processing of data having data dependencies for accelerating the launch and performance of operating systems and other computing applications
Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded (“RLE”) data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
Dynamic sequencing of data partitions for optimizing memory utilization and performance of neural networks
Optimized memory usage and management is crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data dimension, an apportionment sequence is calculated for the input data to be processed by the NN or DNN that optimizes the efficient use of the local and external memory components. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters—e.g., processing weights) into one or more portions as well as how such portions of input data (and its associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize the efficient use of the local and/or external memory components.