H03M7/702

Neural network accelerator with reconfigurable memory

Described herein is a neural network accelerator (NNA) with reconfigurable memory resources for forming a set of local memory buffers comprising at least one activation buffer, at least one weight buffer, and at least one output buffer. The NNA supports a plurality of predefined memory configurations that are optimized for maximizing throughput and reducing overall power consumption in different types of neural networks. The memory configurations differ with respect to at least one of a total amount of activation, weight, or output buffer memory, or a total number of activation, weight, or output buffers. A memory configuration can be selected based on the type of neural network being executed and the memory behavior of that specific network.
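
As a rough illustration of the configuration-selection idea, the sketch below models predefined memory configurations as a lookup table keyed by network type; all names, sizes, and buffer counts are assumptions for illustration, not values from the disclosure.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MemoryConfig:
        activation_kb: int   # total activation buffer memory
        weight_kb: int       # total weight buffer memory
        output_kb: int       # total output buffer memory
        num_activation: int  # number of activation buffers
        num_weight: int      # number of weight buffers
        num_output: int      # number of output buffers

    # Hypothetical predefined configurations tuned for different network types.
    CONFIGS = {
        "cnn": MemoryConfig(256, 128, 128, 2, 2, 1),  # activation-heavy
        "mlp": MemoryConfig(64, 384, 64, 1, 4, 1),    # weight-heavy
        "rnn": MemoryConfig(128, 256, 128, 2, 2, 2),  # balanced, double-buffered
    }

    def select_config(network_type: str) -> MemoryConfig:
        """Pick a predefined memory configuration for the network being run."""
        return CONFIGS[network_type]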

Reducing decompression time without impacting compression ratio

Aspects include computing devices, systems, and methods for implementing decompression of a compressed page. A computing device may determine the decompression block of a compressed page that contains a code instruction requested in a memory access request. Decompression blocks other than the one containing the requested code instruction may be selected for decompression based on being situated between an end of the compressed page and the decompression block containing the requested code instruction. Decompression blocks not identified for decompression may be replaced with a fault or exception code. The computing device may decompress the decompression blocks identified for decompression, starting at the end of the compressed page and terminating the decompression of the compressed page once all blocks are filled with decompressed blocks, faults, or exception code. The remaining decompression blocks of the compressed page may be decompressed after, or concurrently with, execution of the requested code instruction.
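
A minimal sketch of this partial-decompression flow, assuming zlib-compressed blocks and a placeholder fault filler (both are assumptions; the actual block codec and fault handling are unspecified here):

    import zlib

    FAULT_CODE = b"\xfa" * 64  # hypothetical fault/exception filler

    def decompress_block(block: bytes) -> bytes:
        # Stand-in for the device's block decompressor.
        return zlib.decompress(block)

    def decompress_page(compressed_blocks, requested_index):
        """Decompress from the end of the page back to the requested block;
        earlier blocks are filled with fault code and handled later."""
        page = [FAULT_CODE] * requested_index \
             + [None] * (len(compressed_blocks) - requested_index)
        # Walk from the end of the compressed page toward the requested block.
        for i in range(len(compressed_blocks) - 1, requested_index - 1, -1):
            page[i] = decompress_block(compressed_blocks[i])
        return page  # remaining blocks decompressed after/concurrently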

DISTRIBUTABLE HASH FILTER FOR NONPROBABILISTIC SET INCLUSION
20250077498 · 2025-03-06

In certain embodiments, a method includes recursively performing a procedure in which an allowed set of object identifiers and a hash function are used to update a bit array, and a disallowed set of object identifiers and the same hash function are used to further update the bit array where collisions occur. The procedure is repeated with a new hash function and a new allowed set, containing the object identifiers from the original allowed set that collided with the disallowed set, until a round in which no collisions occur. A data structure comprising the bit arrays created during each recursive round is then generated and compressed.
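
One way to read the recursion is as a cascade of exact filters. The sketch below is an interpretation under stated assumptions: a salted SHA-256 hash stands in for the per-round hash function, and "further update where collisions occur" is taken to mean clearing bits that disallowed identifiers hash into.

    import hashlib

    def bit_index(identifier, salt, size):
        """Salted hash of an identifier to a bit position (one salt per round)."""
        digest = hashlib.sha256(f"{salt}:{identifier}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % size

    def build_filter(allowed, disallowed, size=1024):
        arrays, salt = [], 0
        while allowed:
            bits = [0] * size
            for a in allowed:                                # mark allowed ids
                bits[bit_index(a, salt, size)] = 1
            colliding = [d for d in disallowed
                         if bits[bit_index(d, salt, size)]]
            for d in colliding:                              # clear collisions
                bits[bit_index(d, salt, size)] = 0
            arrays.append(bits)
            # Allowed ids whose bits were cleared move to the next round,
            # paired with the disallowed ids they collided with.
            allowed = [a for a in allowed
                       if bits[bit_index(a, salt, size)] == 0]
            disallowed = colliding
            salt += 1                                        # new hash function
        return arrays  # the rounds' bit arrays, ready to be compressed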

Transformation functions for compression and decompression of data in computing environments and systems

One or more transformation functions can be used together with one or more compression/decompression techniques. A transformation function transforms data (e.g., a data object) into a form more suitable for compression and/or decompression, so that the data can be compressed and/or decompressed more effectively. In addition, multiple data objects can be associated with different transformation functions and/or compression/decompression techniques, so that different approaches can be taken for data objects that vary significantly from each other and change over time, in an effort to find an optimal approach for each. Objects can be associated with transformation functions dynamically to accommodate changes in the data, and an extensible system can allow for growth and adaptation to new forms of data not currently present or expected.
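
A small sketch of how objects might be associated with transformation functions ahead of a generic codec; the registry keys, the delta transform, and the use of zlib are all illustrative assumptions.

    import json
    import zlib

    TRANSFORMS = {
        # Transform data into a form that compresses better.
        "json": lambda obj: json.dumps(obj, sort_keys=True).encode(),
        "ints": lambda xs: b"".join(                 # delta-encode integers
            (x - p).to_bytes(4, "big", signed=True)
            for p, x in zip([0] + list(xs), xs)),
    }

    def compress_object(kind, obj):
        transformed = TRANSFORMS[kind](obj)  # transform first...
        return zlib.compress(transformed)    # ...then compress

    # Associations could be updated dynamically as data characteristics change.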

Neural network activation compression with non-uniform mantissas

Apparatus and methods for training a neural network accelerator using quantized precision data formats are disclosed, in particular for storing activation values from a neural network in a compressed format having lossy or non-uniform mantissas for use during forward and backward propagation training of the neural network. In certain examples of the disclosed technology, a computing system includes processors, memory, and a compressor in communication with the memory. The computing system is configured to perform forward propagation for a layer of a neural network to produce first activation values in a first block floating-point format. In some examples, activation values generated by forward propagation are converted by the compressor to a second block floating-point format having a non-uniform and/or lossy mantissa. The compressed activation values are stored in the memory, where they can be retrieved for use during back propagation.
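
The sketch below illustrates the general block floating-point idea with a shared exponent and a few mantissa bits per value; it uses plain uniform rounding rather than the non-uniform mantissa of the disclosure, and treats the whole tensor as one block for brevity.

    import numpy as np

    def to_block_fp(acts, mantissa_bits=3):
        """Share one exponent across the block; keep a few mantissa bits each."""
        exp = np.floor(np.log2(np.max(np.abs(acts)) + 1e-30))  # shared exponent
        scale = 2.0 ** (exp - mantissa_bits)
        mantissas = np.round(acts / scale).astype(np.int8)     # lossy mantissas
        return mantissas, exp

    def from_block_fp(mantissas, exp, mantissa_bits=3):
        """Reconstruct approximate activations for back propagation."""
        return mantissas.astype(np.float32) * 2.0 ** (exp - mantissa_bits)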

Electronic apparatus and method for controlling thereof

A method for controlling an electronic apparatus is provided. The method includes selecting a generic-purpose artificial intelligence model, generating a compressed artificial intelligence model based on the selected generic-purpose model, and generating a dedicated artificial intelligence model based on the compressed model. Generating the compressed model includes acquiring a rank for a singular value decomposition (SVD) algorithm based on a compression rate; compressing and training the selected generic-purpose model based on the acquired rank and converting it into the compressed artificial intelligence model; determining the performance of the compressed model against a predetermined first threshold value; and, based on that performance being lower than the first threshold value, generating the dedicated artificial intelligence model.
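
A hedged sketch of the rank-from-compression-rate step and the SVD factorization; the rank formula and the accuracy check are assumptions, not the patent's specific method.

    import numpy as np

    def rank_for_rate(shape, rate):
        """Choose r so the factored size is about rate * original size."""
        m, n = shape
        return max(1, int(rate * m * n / (m + n)))

    def compress_layer(W, rate):
        r = rank_for_rate(W.shape, rate)
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :r] * s[:r], Vt[:r, :]  # W is approximated by A @ B

    W = np.random.randn(256, 512)
    A, B = compress_layer(W, rate=0.25)
    # If the compressed model's performance falls below the first threshold,
    # the abstract's flow proceeds to generate a dedicated model instead.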

SYSTEMS AND METHODS FOR COMPRESSION OF ARTIFICIAL INTELLIGENCE
20250192803 · 2025-06-12

Provided are systems, methods, and apparatuses for compression of artificial intelligence models. In one or more examples, the systems, devices, and methods include categorizing data based on an analysis of a distribution of the data, generating compressed data based on the data and on a compression algorithm that is selected based on the categorization, and storing the compressed data in a storage device. In one or more examples, the systems, devices, and methods include identifying an address associated with compressed data based on a request for the compressed data, determining a decompression algorithm based on the address, and decompressing the compressed data using the determined decompression algorithm.
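
As an illustration of distribution-based codec selection, the sketch below uses byte-level entropy as the categorization cue and maps categories to standard codecs; the heuristic, threshold, and codec choices are assumptions.

    import lzma
    import math
    import zlib
    from collections import Counter

    def categorize(data: bytes) -> str:
        """Crude distribution analysis: byte-value entropy picks the category."""
        n = len(data)
        if n == 0:
            return "low_entropy"
        counts = Counter(data)
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        return "high_entropy" if entropy > 6.0 else "low_entropy"

    CODECS = {"high_entropy": zlib, "low_entropy": lzma}

    def store(data: bytes):
        category = categorize(data)
        blob = CODECS[category].compress(data)
        # In the described scheme the decompression algorithm is recovered
        # from the storage address; here the category tag plays that role.
        return category, blob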

DATA COMPRESSION FOR A NEURAL NETWORK
20250190788 · 2025-06-12

Systems and methods for generating a representative value of a data set by first compressing a portion of values in the data set to determine a first common value and further compressing a subset of the portion of values to determine a second common value. The representative value is generated by taking the difference between the first common value and the second common value, wherein the representative value corresponds to a mathematical relationship between the first and second common values and each value within the subset of the portion of values. The representative value requires less storage than the first and second common values.
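
One possible reading of the scheme, sketched under assumptions: the two "common values" are taken to be shared bases (here, minima) of the portion and of its subset, and the representative value is their small difference.

    values = [1000, 1002, 1003, 1010, 1011]   # the data set (illustrative)

    first_common = min(values)         # common value of the whole portion: 1000
    subset = values[3:]                # a subset of the portion: [1010, 1011]
    second_common = min(subset)        # common value of the subset: 1010
    representative = second_common - first_common  # 10: small delta to store
    # Each subset value relates to both common values through the delta:
    offsets = [v - second_common for v in subset]  # [0, 1]
    restored = [first_common + representative + o for o in offsets]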

Neural network model compression with quantizability regularization
12362764 · 2025-07-15

A method, computer program, and computer system are provided for compressing a neural network model. A multi-dimensional tensor corresponding to a set of weight coefficients associated with a neural network is reshaped. A subset of weight coefficients is identified from among the set of weight coefficients. A model of the neural network is compressed based on the identified subset of weight coefficients.
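
A minimal sketch of reshaping a weight tensor and compressing based on an identified subset; magnitude-based selection stands in for the patent's actual criterion, which is not specified in the abstract.

    import numpy as np

    def compress_weights(W, keep=0.1):
        flat = W.reshape(-1)                  # reshape the weight tensor
        k = max(1, int(keep * flat.size))
        idx = np.argsort(np.abs(flat))[-k:]   # identified subset of weights
        out = np.zeros_like(flat)
        out[idx] = flat[idx]                  # zero the rest for compression
        return out.reshape(W.shape)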

ENCODING AND DECODING VARIABLE LENGTH INSTRUCTIONS
20250224958 · 2025-07-10

Methods of encoding and decoding are described which use a variable number of instruction words to encode instructions from an instruction set, such that different instructions within the instruction set may be encoded using different numbers of instruction words. To encode an instruction, the bits within the instruction are re-ordered and formed into instruction words based upon their variance as determined using empirical or simulation data. The bits in the instruction words are compared to corresponding predicted values and some or all of the instruction words that match the predicted values are omitted from the encoded instruction.
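
A hedged sketch of the encode path: bits are permuted by a precomputed variance ordering, grouped into words, and words matching their predictions are omitted. The explicit presence mask is an assumption added so the example stays decodable; the word size and orderings are likewise illustrative.

    WORD = 8  # bits per instruction word (assumed)

    def encode(instr_bits, order, predicted_words):
        """instr_bits: list of 0/1 bits; order: variance-sorted permutation."""
        reordered = [instr_bits[i] for i in order]   # most-variant bits first
        words = [tuple(reordered[i:i + WORD])
                 for i in range(0, len(reordered), WORD)]
        mask, kept = [], []
        for word, predicted in zip(words, predicted_words):
            if word == predicted:
                mask.append(0)       # word matches prediction: omit it
            else:
                mask.append(1)
                kept.append(word)
        return mask, kept            # the decoder re-inserts predicted words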