Patent classifications
H03M7/6058
Techniques to enable stateful decompression on hardware decompression acceleration engines
A hardware decompression acceleration engine including: an input buffer for receiving to-be-decompressed data from a software layer of a host computer; a decompression processing unit coupled to the input buffer for decompressing the to-be-decompressed data, the decompression processing unit further receiving first and second flags from the software layer of the host computer, wherein the first flag is indicative of a location of the to-be-decompressed data in a to-be-decompressed data block and the second flag is indicative of a presence of an intermediate state; and an output buffer for storing decompressed data from the decompression processing unit.
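A minimal software-side sketch of the stateful-chunk idea follows, using Python's zlib.decompressobj to stand in for the engine's intermediate state; the flag names and values (FIRST_CHUNK, MIDDLE_CHUNK, LAST_CHUNK) are illustrative assumptions, not taken from the patent.

```python
import zlib

# Hypothetical flag values for a chunk's location within the block
# (names are illustrative, not from the patent).
FIRST_CHUNK, MIDDLE_CHUNK, LAST_CHUNK = 0, 1, 2

def decompress_block_in_chunks(compressed: bytes, chunk_size: int = 4096) -> bytes:
    """Feed a compressed block to a stateful decompressor chunk by chunk,
    mimicking a host passing location/state flags to a hardware engine."""
    engine = zlib.decompressobj()          # holds the intermediate state
    out = bytearray()
    for off in range(0, len(compressed), chunk_size):
        chunk = compressed[off:off + chunk_size]
        location_flag = (FIRST_CHUNK if off == 0
                         else LAST_CHUNK if off + chunk_size >= len(compressed)
                         else MIDDLE_CHUNK)
        has_state_flag = off != 0          # intermediate state exists after chunk 1
        # A real engine would consult the two flags to load/store its history
        # window; here zlib's decompressobj carries that state implicitly.
        out += engine.decompress(chunk)
    out += engine.flush()
    return out

data = b"stateful decompression " * 200
assert decompress_block_in_chunks(zlib.compress(data)) == data
```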
Flexible hardware for high throughput vector dequantization with dynamic vector length and codebook size
The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized in both the number of operations and memory utilization to enhance the overall performance of a NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time, on the fly, as part of an exemplary data processing function of the NN, as part of an inline de-quantization operation to obtain the needed one or more neuron weight values.
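As a rough illustration of codebook-based weight de-quantization: the sketch below builds a toy vector codebook and a lookup table of indices, then reconstructs weights by index lookup at "run time". The segment length, codebook size, and the trivial nearest-centroid quantizer are all assumptions for illustration (a real design would use k-means or similar).

```python
import numpy as np

def build_codebook(weights: np.ndarray, vec_len: int, n_codes: int):
    """Map contiguous weight segments to a small vector codebook."""
    segs = weights.reshape(-1, vec_len)
    # Toy quantizer: take the first n_codes segments as centroids and map
    # every segment to its nearest centroid (k-means in practice).
    codebook = segs[:n_codes].copy()
    dists = ((segs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1).astype(np.uint8)   # stored lookup table
    return codebook, indices

def dequantize(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Fast runtime lookup: indices -> weight vectors, flattened back."""
    return codebook[indices].reshape(-1)

w = np.random.randn(1024).astype(np.float32)
cb, idx = build_codebook(w, vec_len=8, n_codes=16)
approx = dequantize(cb, idx)               # used on the fly during inference
print(approx.shape, idx.nbytes, "bytes of indices vs", w.nbytes, "of weights")
```

The storage win comes from keeping one byte-sized index per segment instead of vec_len full-precision weights, at the cost of approximation error.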
Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization
A deep neural network (“DNN”) module compresses and decompresses neuron-generated activation data to reduce the utilization of memory bus bandwidth. The compression unit receives an uncompressed chunk of data generated by a neuron in the DNN module. The compression unit generates a mask portion and a data portion of a compressed output chunk. The mask portion encodes the presence and location of the zero and non-zero bytes in the uncompressed chunk of data. The data portion stores truncated non-zero bytes from the uncompressed chunk of data. A decompression unit receives a compressed chunk of data from memory in the DNN processor or memory of an application host. The decompression unit decompresses the compressed chunk of data using the mask portion and the data portion.
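A small sketch of the mask/data split described above, assuming a byte-granular zero mask; the byte-truncation step the abstract mentions is omitted here for simplicity, and the function names are mine.

```python
def compress_chunk(chunk: bytes):
    """Split a chunk into a zero/non-zero bitmask and the non-zero bytes."""
    mask = bytearray((len(chunk) + 7) // 8)
    data = bytearray()
    for i, b in enumerate(chunk):
        if b != 0:
            mask[i // 8] |= 1 << (i % 8)   # mark position of a non-zero byte
            data.append(b)                 # keep only the non-zero payload
    return bytes(mask), bytes(data)

def decompress_chunk(mask: bytes, data: bytes, length: int) -> bytes:
    """Re-expand zeros using the mask; pull non-zero bytes from the payload."""
    out = bytearray(length)
    it = iter(data)
    for i in range(length):
        if mask[i // 8] & (1 << (i % 8)):
            out[i] = next(it)
    return bytes(out)

chunk = bytes([0, 0, 7, 0, 42, 0, 0, 9] * 8)   # sparse activation bytes
m, d = compress_chunk(chunk)
assert decompress_chunk(m, d, len(chunk)) == chunk
```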
Storage device and operating method of the storage device
A storage device may include: a memory device for extracting bits having a first logic value from among the bits included in data received from outside the memory device, generating a plurality of compressed data chunks including the bits having the first logic value and position information representing the positions of the bits having the first logic value in the data, and outputting the plurality of compressed data chunks in response to a data output command; and a memory controller for receiving the plurality of compressed data chunks from the memory device and recovering the data based on the bits having the first logic value, which are included in the plurality of compressed data chunks, and the position information.
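In spirit, this is a positional encoding of the bits carrying one logic value. A minimal sketch, assuming the first logic value is 1 and a fixed word width (both assumptions, not from the abstract):

```python
def compress_ones(word: int, width: int):
    """Record the positions of set bits; sparse words compress well."""
    return [i for i in range(width) if (word >> i) & 1]

def recover(positions, width: int) -> int:
    """Controller side: rebuild the word from the recorded positions."""
    word = 0
    for p in positions:
        word |= 1 << p
    return word

word = 0b0000_0100_0001_0000
assert recover(compress_ones(word, 16), 16) == word
```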
Dynamically partitioning workload in a deep neural network module to reduce power consumption
A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.
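A toy software analogue of the partition-then-combine flow: the workload is split so all "neurons" (worker threads here) are occupied, each produces a partial output, and the partials are summed into the final value. Function names and the thread-pool stand-in are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_dot(inputs, weights):
    return sum(i * w for i, w in zip(inputs, weights))

def partitioned_workload(inputs, weights, n_neurons: int) -> float:
    size = (len(inputs) + n_neurons - 1) // n_neurons   # maximize occupancy
    parts = [(inputs[i:i + size], weights[i:i + size])
             for i in range(0, len(inputs), size)]
    with ThreadPoolExecutor(max_workers=n_neurons) as pool:
        partials = pool.map(lambda p: partial_dot(*p), parts)
    return sum(partials)   # groups combine partial values into the final output

x = list(range(100)); w = [0.5] * 100
assert partitioned_workload(x, w, n_neurons=8) == partial_dot(x, w)
```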
Enhancing processing performance of artificial intelligence/machine learning hardware by data sharing and distribution as well as reuse of data in neuron buffer/line buffer
An exemplary artificial intelligence/machine learning hardware computing environment having an exemplary DNN module cooperating with one or more memory components can perform data sharing and distribution as well as reuse of buffer data to reduce the number of memory component reads/writes, thereby enhancing overall hardware performance and reducing power consumption. Illustratively, data from a cooperating memory component is read according to a selected operation of the exemplary hardware and written to a corresponding other memory component for use by one or more processing elements (e.g., neurons). The data is read in such a manner as to optimize the engagement of the one or more processing elements for each processing cycle as well as to reuse data previously stored in the one or more cooperating memory components. Operatively, the written data is copied to a shadow memory buffer prior to being consumed by the processing elements.
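One way to picture the shadow-buffer step, as a minimal sketch (the class and method names are mine): data written for the processing elements is snapshotted into a shadow buffer, so the primary buffer can be refilled and reused while the neurons consume a stable copy.

```python
class LineBuffer:
    def __init__(self, size: int):
        self.primary = [0] * size
        self.shadow = [0] * size

    def write(self, values):
        self.primary[:len(values)] = values   # fill from a memory component

    def commit(self):
        self.shadow = list(self.primary)      # snapshot before consumption

    def consume(self):
        return list(self.shadow)              # neurons read the snapshot

buf = LineBuffer(4)
buf.write([1, 2, 3, 4]); buf.commit()
buf.write([5, 6, 7, 8])                       # refill overlaps consumption
assert buf.consume() == [1, 2, 3, 4]
```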
Reducing power consumption in a neural network processor by skipping processing operations
A deep neural network (“DNN”) module can determine whether processing of certain values in an input buffer or a weight buffer by neurons can be skipped. For example, the DNN module might determine whether neurons can skip the processing of values in entire columns of a neuron buffer. Processing of these values might be skipped if an entire column of an input buffer or a weight buffer contains only zeros, for example. The DNN module can also determine whether processing of single values in rows of the input buffer or the weight buffer can be skipped (e.g., if the values are zero). Neurons that complete their processing early as a result of skipping operations can assist other neurons with their processing. A combination operation can be performed following the completion of processing that transfers the results of the processing operations performed by a neuron to their correct owner.
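Column-level zero skipping is safe because an all-zero column in either operand contributes nothing to the accumulated output. A NumPy sketch of that check (the function name is mine, and this shows only the column case, not row-level or work-stealing behavior):

```python
import numpy as np

def dot_skip_zero_columns(inputs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Skip any column that is all zeros in either operand buffer; those
    columns cannot contribute to the accumulated output."""
    active = ~(np.all(inputs == 0, axis=0) | np.all(weights == 0, axis=0))
    # Only the surviving columns are multiplied and accumulated.
    return inputs[:, active] @ weights[:, active].T

x = np.array([[1., 0., 2., 0.], [3., 0., 4., 0.]])    # columns 1 and 3 are zero
w = np.array([[0.5, 9., 1.0, 0.], [2.0, 9., 0.5, 0.]])
np.testing.assert_allclose(dot_skip_zero_columns(x, w), x @ w.T)
```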
Data compression using reduced numbers of occurrences
Systems, apparatus, and methods are provided for compressing data. A method may include receiving an input data block to be compressed, determining numbers of occurrences for distinct symbols in the input data block, generating reduced numbers of occurrences for the distinct symbols based on the numbers of occurrences for the distinct symbols, and encoding the input data block using the reduced numbers of occurrences as the probability distribution of the distinct symbols in the input data block.
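A plausible reading of "reduced numbers of occurrences" is scaling the raw symbol counts down to a small fixed total, as entropy coders commonly do with frequency tables; the sketch below assumes that interpretation, and the target total of 256 is an arbitrary choice.

```python
from collections import Counter

def reduced_occurrences(block: bytes, target_total: int = 256):
    """Scale raw symbol counts down toward a fixed total while keeping every
    distinct symbol's count >= 1, yielding a compact probability table."""
    counts = Counter(block)
    n = len(block)
    # Real coders then adjust the table so it sums exactly to target_total.
    return {s: max(1, (c * target_total) // n) for s, c in counts.items()}

block = b"aaaaabbbc" * 50
print(reduced_occurrences(block))   # fed to the coder as the distribution
```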
Accelerating memory compression of a physically scattered buffer
Embodiments herein describe using compression engines in a processor subsystem to compress only the data fragments stored locally. That is, an application may be allocated a buffer where the physical memory of that buffer is spread across multiple processor subsystems. Rather than asking a single actor (e.g., a single host processor or compression engine) to compress all the fragments of the buffer, a compression library can instead instruct the individual compression engines in each of the processor subsystems to compress only the fragments stored in local memory in the same processor subsystem. Doing so leverages the memory affinity between the compression engines and the local memory, which can reduce the overall time required to perform compression.
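A rough sketch of the dispatch pattern, with worker threads standing in for per-subsystem compression engines and zlib standing in for the hardware codec; the subsystem-keyed layout and names are assumptions for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_scattered(fragments_by_subsystem: dict) -> dict:
    """Hand each fragment to the 'engine' of the subsystem that holds it,
    instead of funneling every fragment through a single actor."""
    def compress_local(item):
        subsystem_id, fragment = item
        return subsystem_id, zlib.compress(fragment)   # local engine, local data
    with ThreadPoolExecutor(max_workers=len(fragments_by_subsystem)) as pool:
        return dict(pool.map(compress_local, fragments_by_subsystem.items()))

buffer_fragments = {0: b"fragment on node 0 " * 100,
                    1: b"fragment on node 1 " * 100}
compressed = compress_scattered(buffer_fragments)
print({k: len(v) for k, v in compressed.items()})
```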
Minimizing memory reads and increasing performance by leveraging aligned blob data in a processing unit of a neural network environment
The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the management of data among the various memory components of the NN/DNN. By inserting selected padding into the input data to align the input data in memory, data reads/writes can be optimized for processing by the NN/DNN, thereby enhancing the overall performance of a NN/DNN. Operatively, an operations controller/iterator can generate one or more instructions that insert the selected padding into the data. The data padding can be calculated using various characteristics of the input data, of the NN/DNN, and of the cooperating memory components. Padding on the output data can be utilized to support the data alignment at the memory components and the cooperating processing units of the NN/DNN.
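The padding computation itself is simple arithmetic: pad each row until its byte length hits the next multiple of the alignment boundary. A NumPy sketch under assumed parameters (64-byte boundary, zero padding; both are illustrative choices):

```python
import numpy as np

def pad_rows_to_alignment(data: np.ndarray, align_bytes: int = 64) -> np.ndarray:
    """Pad each row so its byte length is a multiple of the alignment
    boundary, so every row read starts on an aligned address."""
    row_bytes = data.shape[1] * data.itemsize
    pad_elems = (-row_bytes % align_bytes) // data.itemsize
    return np.pad(data, ((0, 0), (0, pad_elems)))   # zero padding per row

x = np.ones((4, 13), dtype=np.float32)   # 52-byte rows
padded = pad_rows_to_alignment(x)        # 64-byte rows -> 16 float32 columns
print(padded.shape)                      # (4, 16)
```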