H03M7/702

Methods and apparatus for thread-based scheduling in multicore neural networks
11775810 · 2023-10-03

Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using the thread dependency graph to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency counts are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies, without requiring a centralized scheduler.
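
A minimal sketch of the compile-time half in Python (all names here are illustrative, not taken from the patent): each thread's dependency count is simply its in-degree in the thread dependency graph.

```python
# Illustrative only: derive each thread's dependency count (its
# in-degree in the thread dependency graph) at "compile time".

def dependency_counts(threads, edges):
    """threads: iterable of thread ids; edges: (producer, consumer) pairs.
    Returns the count value distributed to each core with its threads."""
    counts = {t: 0 for t in threads}
    for _, consumer in edges:
        counts[consumer] += 1
    return counts

# Example: t0 feeds t1 and t2; both feed t3.
edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3"), ("t2", "t3")]
print(dependency_counts(["t0", "t1", "t2", "t3"], edges))
# {'t0': 0, 't1': 1, 't2': 1, 't3': 2}
```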

Compression technique for deep neural network weights

Various embodiments include methods and devices for compression and decompression of weight data sets. Some embodiments may include compressing weight data by receiving a weight data set of binary numbers representing weight values, generating a frame payload including a compressed first frame of a first subset of the weight values in the weight data set, and generating a block of compressed weight data having the frame payload. Some embodiments may include decompressing weight data by retrieving a block of compressed weight data, in which the block includes a frame header associated with a frame payload, the frame header includes a normalization factor indicator, and the frame payload includes compressed weight values, and by generating a first decompressed frame comprising decompressed versions of the compressed weight values in the frame payload.
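
A hedged Python sketch of the compression half. The frame layout used here (a header holding a little-endian float normalization factor and a 16-bit weight count, followed by an int8-quantized payload) is assumed for illustration, not taken from the application:

```python
# Hypothetical frame-based weight compressor: each frame is scaled by a
# per-frame normalization factor recorded in the frame header, then
# quantized into the payload. Layout is an assumption, not the patent's.
import struct

FRAME = 64  # weights per frame (assumed)

def compress(weights):
    block = b""
    for i in range(0, len(weights), FRAME):
        frame = weights[i:i + FRAME]
        norm = max(abs(w) for w in frame) or 1.0        # normalization factor
        header = struct.pack("<fH", norm, len(frame))   # factor + count
        payload = bytes(int(round(w / norm * 127)) & 0xFF for w in frame)
        block += header + payload
    return block

blk = compress([0.5, -0.25, 0.0])
print(len(blk))   # 9 bytes: 6-byte header + 3 quantized weights
```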

Multi-pixel caching scheme for lossless encoding
11653009 · 2023-05-16

Systems and methods are provided for a multi-pixel caching scheme for lossless encoders. The systems and methods can include obtaining a sequence of pixels; determining repeating sub-sequences of the sequence (each consisting of a single repeated pixel) and non-repeating sub-sequences; and, responsive to the determination, encoding the repeating sub-sequences using a run-length of the repeated pixel and encoding the non-repeating sub-sequences using a multi-pixel cache. Encoding with the multi-pixel cache comprises encoding non-repeating sub-sequences already stored in the cache as their location within the cache, and encoding non-repeating sub-sequences not stored in the cache using the values of their pixels.
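
A small Python sketch of how such an encoder might work; the token formats and the cache-fill policy below are assumptions, not the claimed design:

```python
# Illustrative encoder: run-length-encode repeated pixels, and look up
# non-repeating sub-sequences in a small multi-pixel cache.

def encode(pixels, cache):
    tokens, i = [], 0
    while i < len(pixels):
        # Measure the run of the pixel at position i.
        run = 1
        while i + run < len(pixels) and pixels[i + run] == pixels[i]:
            run += 1
        if run > 1:                                  # repeating sub-sequence
            tokens.append(("RUN", pixels[i], run))   # run-length of the pixel
            i += run
            continue
        # Gather a maximal non-repeating sub-sequence starting at i.
        j = i
        while j + 1 < len(pixels) and pixels[j + 1] != pixels[j]:
            j += 1
        if j + 1 < len(pixels):          # a run begins at j; stop before it
            seq, i = tuple(pixels[i:j]), j
        else:                            # reached the end of the sequence
            seq, i = tuple(pixels[i:j + 1]), j + 1
        if seq in cache:
            tokens.append(("HIT", cache[seq]))       # cache location only
        else:
            cache[seq] = len(cache)                  # naive fill policy
            tokens.append(("RAW", seq))              # literal pixel values
    return tokens

cache = {}
print(encode([1, 2, 3, 3, 3, 1, 2, 7, 7, 1, 2], cache))
# [('RAW', (1, 2)), ('RUN', 3, 3), ('HIT', 0), ('RUN', 7, 2), ('HIT', 0)]
```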

METHODS AND APPARATUS FOR THREAD-BASED SCHEDULING IN MULTICORE NEURAL NETWORKS
20230153595 · 2023-05-18

Systems, apparatus, and methods for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using the thread dependency graph to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency counts are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies, without requiring a centralized scheduler.
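
The run-time half can be sketched the same way: each core keeps only local counts, decrements a dependent's count when one of its dependencies completes, and runs any thread whose count reaches zero. The single-loop version below is illustrative; the claimed design distributes this bookkeeping across cores.

```python
# Illustrative run-time loop: execute threads whose dependency count
# has reached zero, then signal their dependents. No central scheduler.
from collections import deque

def run_ready_threads(counts, dependents, body):
    """counts: thread id -> remaining dependency count (from compile time);
    dependents: thread id -> threads that wait on it; body: id -> callable."""
    ready = deque(t for t, c in counts.items() if c == 0)
    while ready:
        tid = ready.popleft()
        body[tid]()                          # execute the thread
        for d in dependents.get(tid, ()):    # a dependency was fulfilled
            counts[d] -= 1
            if counts[d] == 0:
                ready.append(d)              # now eligible to run

counts = {"t0": 0, "t1": 1, "t2": 1, "t3": 2}
dependents = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"]}
body = {t: (lambda t=t: print("executed", t)) for t in counts}
run_ready_threads(counts, dependents, body)
# executed t0, t1, t2, t3
```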

Decompression and compression of neural network data using different compression schemes
11537853 · 2022-12-27

Described herein is a neural network accelerator (NNA) with a decompression unit that can be configured to perform multiple types of decompression. The decompression unit may include a separate subunit for each decompression type. The subunits can be coupled to form a pipeline in which partially decompressed results generated by one subunit are input for further decompression by another subunit. Depending on which types of compression were applied to incoming data, any number of the subunits may be used to produce a decompressed output. In some embodiments, the decompression unit is configured to decompress data that has been compressed using a zero value compression scheme, a shared value compression scheme, or both. The NNA can also include a compression unit implemented in a manner similar to that of the decompression unit.
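
A toy model of the pipeline in Python: a shared-value stage expands codebook indices into weight values, and a zero-value stage re-inserts zeros from a bitmask. The stage order and data layout are assumptions for illustration.

```python
# Illustrative two-stage decompression pipeline: either subunit can be
# used alone, or both can be chained, mirroring the pipelined subunits.

def shared_value_stage(indices, codebook):
    """Map each index to its shared (codebook) value."""
    return [codebook[i] for i in indices]

def zero_value_stage(nonzeros, mask):
    """Re-insert zeros: mask[i] == 1 marks a stored nonzero element."""
    it = iter(nonzeros)
    return [next(it) if bit else 0.0 for bit in mask]

def decompress(indices, codebook, mask=None):
    """Use one subunit or both, depending on how the data was compressed."""
    values = shared_value_stage(indices, codebook)
    return zero_value_stage(values, mask) if mask is not None else values

codebook = [-0.5, 0.25, 1.0]
mask = [1, 0, 0, 1, 1, 0]        # positions of nonzero weights
print(decompress([2, 0, 1], codebook, mask))
# [1.0, 0.0, 0.0, -0.5, 0.25, 0.0]
```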

MATRIX MULTIPLICATION AND ACCUMULATION OPERATIONS ON COMPRESSED MATRICES

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contains compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.
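
As a rough illustration of the MMA-on-compressed-data idea (plain Python rather than the actual CUDA or driver API, with an assumed values-plus-column-indices layout for the compressed operand):

```python
# Conceptual sketch: D = A @ B + C where A stays compressed as per-row
# nonzero values plus their column indices; only nonzeros contribute.

def mma_compressed(a_vals, a_idx, B, C):
    """a_vals[r], a_idx[r]: nonzero values of row r and their columns."""
    rows, cols = len(C), len(C[0])
    D = [row[:] for row in C]                 # accumulate into a copy of C
    for r in range(rows):
        for v, k in zip(a_vals[r], a_idx[r]):
            for c in range(cols):
                D[r][c] += v * B[k][c]
    return D

# A = [[1, 0, 0, 2]] compressed: values [1, 2] at columns [0, 3].
B = [[1, 1], [1, 1], [1, 1], [10, 20]]
print(mma_compressed([[1, 2]], [[0, 3]], B, [[0, 0]]))
# [[21, 41]]
```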

APPLICATION PROGRAMMING INTERFACE TO COMPRESS DATA

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contains compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.
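
A companion sketch for the compress step, again illustrative only: a 2:4-style structured-sparsity compression that keeps the two largest-magnitude values per group of four and records their positions as metadata. The 2:4 pattern is an assumption drawn from common GPU sparsity formats, not a claim of this application.

```python
# Conceptual 2:4-style compression: two survivors per group of four,
# with original column positions kept as sparsity metadata.

def compress_2of4(row):
    vals, idx = [], []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        keep = sorted(range(len(group)),
                      key=lambda i: abs(group[i]), reverse=True)[:2]
        for i in sorted(keep):
            vals.append(group[i])
            idx.append(g + i)          # metadata: original column
    return vals, idx

print(compress_2of4([0.0, 3.0, -1.0, 0.0, 2.0, 0.0, 0.0, -4.0]))
# ([3.0, -1.0, 2.0, -4.0], [1, 2, 4, 7])
```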

PERFORMING MATRIX VALUE INDICATION

Apparatuses, systems, and techniques to perform an operation to indicate one or more non-zero values within one or more matrices of data; to perform an API to compress one or more matrices of data; to perform a matrix multiply accumulate (MMA) operation on two or more matrices of data, wherein at least one of the two or more matrices contains compressed data; and/or to perform an API to decompress one or more matrices of data. In at least one embodiment, one or more circuits are configured to receive and compile one or more instructions to perform computational operations for a sparse matrix multiplication.
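
For the "indicate non-zero values" operation, the simplest conceivable form of the indication is a bitmask over the matrix. The sketch below (illustrative, not the vendor API) produces such a mask for downstream compress and MMA steps.

```python
# Conceptual non-zero indication: a 0/1 mask marking non-zero entries.

def nonzero_mask(matrix):
    return [[1 if v != 0 else 0 for v in row] for row in matrix]

M = [[0.0, 1.5, 0.0],
     [2.0, 0.0, -3.0]]
print(nonzero_mask(M))   # [[0, 1, 0], [1, 0, 1]]
```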

Encoding and Decoding Variable Length Instructions
20220261248 · 2022-08-18

Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are re-ordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
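
A hedged sketch of the scheme as described: bits are ordered by how often they vary across sample encodings, packed into fixed-width words, and trailing words that match their predicted values are omitted. The word width, the variance measure, and the prediction source are all assumptions here.

```python
# Illustrative sketch, not the patent's exact scheme.
W = 4  # bits per instruction word (assumed)

def bit_order(samples):
    """Order bit positions most-varying first, from empirical samples,
    so that stable bits cluster into the later (omittable) words."""
    n = len(samples[0])
    ones = [sum(s[i] for s in samples) for i in range(n)]
    flips = [min(c, len(samples) - c) for c in ones]  # variability of bit i
    return sorted(range(n), key=lambda i: flips[i], reverse=True)

def encode(instr, order, predicted):
    """Re-order bits, pack into W-bit words, and omit trailing words
    that match their predicted values."""
    bits = [instr[i] for i in order]
    words = [tuple(bits[i:i + W]) for i in range(0, len(bits), W)]
    while words and words[-1] == predicted[len(words) - 1]:
        words.pop()
    return words

samples = [[1, 0, 0, 1, 0, 0, 0, 0],
           [0, 1, 0, 1, 0, 0, 0, 0],
           [1, 1, 0, 1, 0, 0, 0, 0]]
order = bit_order(samples)
predicted = [(1, 1, 0, 1), (0, 0, 0, 0)]     # e.g. most common word values
print(encode(samples[0], order, predicted))  # [(1, 0, 0, 1)] - one word saved
```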

Compression Technique For Deep Neural Network Weights
20220321143 · 2022-10-06

Various embodiments include methods and devices for compression and decompression of weight data sets. Some embodiments may include compressing weight data by receiving a weight data set of binary numbers representing weight values, generating a frame payload including a compressed first frame of a first subset of the weight values in the weight data set, and generating a block of compressed weight data having the frame payload. Some embodiments may include decompressing weight data by retrieving a block of compressed weight data, in which the block includes a frame header associated with a frame payload, the frame header includes a normalization factor indicator, and the frame payload includes compressed weight values, and by generating a first decompressed frame comprising decompressed versions of the compressed weight values in the frame payload.
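
Since this publication shares its abstract with the granted patent above, here is the matching decompression half of the earlier compression sketch, under the same assumed frame layout (a little-endian float normalization factor and a 16-bit weight count per frame header, followed by an int8-quantized payload); the layout is an assumption for illustration, not the claimed format.

```python
# Illustrative decompressor: read each frame header, recover the
# normalization factor, and rescale the quantized payload.
import struct

def decompress(block):
    weights, off = [], 0
    while off < len(block):
        norm, n = struct.unpack_from("<fH", block, off)   # frame header
        off += struct.calcsize("<fH")
        for b in block[off:off + n]:
            q = b - 256 if b > 127 else b                 # back to signed int8
            weights.append(q / 127 * norm)
        off += n
    return weights

# One frame: normalization factor 0.5, weights quantized to 127 and -127.
block = struct.pack("<fH", 0.5, 2) + bytes([127, 0x81])
print(decompress(block))   # [0.5, -0.5]
```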