Patent classifications
H03M7/6023
Lossless exponent and lossy mantissa weight compression for training deep neural networks
Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
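The split into exponent and mantissa lookup tables described above can be illustrated with a minimal Python sketch. It assumes IEEE 754 float32 weights, stores sign bits separately, and uses uniform truncation of the mantissa as the lossy step. The abstract does not specify any of these choices, and the function names are hypothetical.

```python
import struct

def split_float32(x):
    """Return the (sign, exponent, mantissa) bit fields of x as a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def join_float32(sign, exponent, mantissa):
    """Rebuild a float from float32 bit fields."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def compress(weights, index_bits=4):
    """Assumed scheme: exact exponent LUT, uniformly quantized mantissa LUT."""
    fields = [split_float32(w) for w in weights]
    signs = [s for s, _, _ in fields]
    # Lossless exponent LUT: one entry per distinct exponent value.
    exp_lut = sorted({e for _, e, _ in fields})
    exp_pos = {e: i for i, e in enumerate(exp_lut)}
    exp_idx = [exp_pos[e] for _, e, _ in fields]
    # Lossy mantissa LUT: keep the top index_bits of the 23-bit mantissa;
    # each LUT entry is the midpoint of its bucket.
    shift = 23 - index_bits
    mant_lut = [(i << shift) | (1 << (shift - 1)) for i in range(1 << index_bits)]
    mant_idx = [m >> shift for _, _, m in fields]
    return signs, exp_lut, exp_idx, mant_lut, mant_idx

def decompress(signs, exp_lut, exp_idx, mant_lut, mant_idx):
    """Look up each exponent and mantissa and reassemble the weights."""
    return [join_float32(s, exp_lut[e], mant_lut[m])
            for s, e, m in zip(signs, exp_idx, mant_idx)]
```

With `index_bits=4`, each restored normal-range weight keeps its exact exponent and differs from the original by at most 1/32 of its magnitude.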
Storing data and parity via a computing system
A method includes generating a plurality of parity blocks from a plurality of lines of data blocks. The plurality of lines of data blocks are stored in data sections of memory of a cluster of computing devices of the computing system by distributing storage of individual data blocks of the plurality of lines of data blocks among unique data sections of the cluster of computing devices. The plurality of parity blocks are stored in parity sections of memory of the cluster of computing devices by distributing storage of parity blocks of the plurality of parity blocks among unique parity sections of the cluster of computing devices.
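As a rough illustration of distributing data and parity blocks across distinct devices, the Python sketch below assumes single XOR parity per line and a rotating parity device. The abstract does not say how parity is computed or placed, so both choices and all names are hypothetical.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks column by column into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def place_line(line_index, data_blocks, num_devices):
    """Assign one line's data blocks and its parity block to distinct devices.

    The parity device rotates with the line index (an assumption), so parity
    sections are spread over the cluster instead of sitting on one device.
    """
    if len(data_blocks) + 1 > num_devices:
        raise ValueError("need one device per block in the line")
    parity_device = line_index % num_devices
    placement = {parity_device: ("parity", xor_parity(data_blocks))}
    device = (parity_device + 1) % num_devices
    for block in data_blocks:
        placement[device] = ("data", block)
        device = (device + 1) % num_devices
    return placement

def recover(placement, lost_device):
    """Rebuild the block of a lost device by XOR-ing all surviving blocks."""
    survivors = [blk for dev, (_, blk) in placement.items() if dev != lost_device]
    return xor_parity(survivors)
```

Because every block of a line sits on a different device, losing any single device costs at most one block per line, which XOR parity can rebuild.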
Compression circuit, storage system, and compression method
According to one embodiment, a compression circuit generates substrings from input data for (3+M) cycles, the input data being N bytes per cycle, a byte length of each substring being greater than or equal to (N×(1+M)+1); obtains a set of matches, each of the matches including at least one piece of past input data that was previously input and corresponds to at least a part of each of the substrings; selects a subset of matches from the set of matches including the input data of one cycle; and outputs the subset of matches. M is zero or a natural number. N is an integer greater than or equal to two.
Data compression device, memory system and method
According to one embodiment, a data compression device includes a dictionary match determination unit, an extended matching generator, a match selector and a match connector. The dictionary match determination unit searches for first past input data matching first new input data. The extended matching generator compares second past input data subsequent to the first past input data with second new input data subsequent to the first new input data. The match selector generates compressed data by replacing a part of the input data with match information output from the dictionary match determination unit or the extended matching generator. The match connector replaces a plurality of match information in the compressed data with single match information.
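The match connector step can be pictured with a short Python sketch. It assumes LZ77-style tokens, where a literal is an integer byte value and a match is a hypothetical `(distance, length)` tuple, and it merges adjacent matches that share a distance. The abstract does not state the token format or the merge condition.

```python
def connect_matches(tokens):
    """Replace runs of adjacent matches with the same distance by one match.

    Two back-to-back matches with equal distance copy from contiguous source
    bytes, so a single match with the summed length decodes identically.
    """
    merged = []
    for token in tokens:
        prev = merged[-1] if merged else None
        if isinstance(token, tuple) and isinstance(prev, tuple) and token[0] == prev[0]:
            merged[-1] = (prev[0], prev[1] + token[1])
        else:
            merged.append(token)
    return merged

def decode(tokens):
    """Expand literals and (distance, length) matches back into bytes."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # byte-wise copy handles overlap
        else:
            out.append(token)
    return bytes(out)
```

For example, `[97, 98, 99, (3, 2), (3, 4), 100]` and its connected form `[97, 98, 99, (3, 6), 100]` both decode to `b"abcabcabcd"`.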
System and method for mitigating effects of hash collisions in hardware data compression
Systems and methods are provided for mitigating effects of hash collisions in hardware data compression, for example reducing or avoiding the side effects of hash collisions, or reducing or avoiding slowdowns caused by hash collisions. In an aspect, a processor-implemented method includes: hashing an input data byte sequence to produce a hash value, the input data byte sequence being located at a sequence address within an input data stream; and storing, in a hash table at a hash address corresponding to the hash value, the sequence address and a portion of the input data byte sequence. In an aspect, to further avoid hash collisions, hash memory accesses are distributed among a plurality of parallel hash banks to increase throughput. Another aspect virtually extends a hash depth by extending a data match search around broken hash links, going backward in the data sequence.
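Storing a portion of the byte sequence next to its address lets a lookup reject a colliding entry without reading the input stream. The Python sketch below illustrates the idea under several assumptions: 3-byte sequences, a 2-byte stored portion, one entry per slot, and a deliberately weak additive hash chosen only to make collisions easy to provoke. The class and function names are hypothetical.

```python
def toy_hash(seq, table_bits=4):
    """Deliberately weak hash (byte sum) so that collisions are easy to show."""
    return sum(seq) & ((1 << table_bits) - 1)

class HashTable:
    def __init__(self, table_bits=4, tag_len=2):
        self.table_bits = table_bits
        self.tag_len = tag_len
        self.slots = [None] * (1 << table_bits)

    def lookup_and_insert(self, data, address):
        """Return a candidate earlier address for the 3 bytes at address, or None.

        Each slot holds (sequence address, leading bytes of the sequence).
        A slot whose stored bytes differ from the current sequence is treated
        as a collision and ignored. A matching tag is still only a candidate:
        the full match must be verified against the input data afterwards.
        """
        seq = data[address:address + 3]
        slot = toy_hash(seq, self.table_bits)
        tag = seq[:self.tag_len]
        entry = self.slots[slot]
        self.slots[slot] = (address, tag)  # newest occurrence replaces the old
        if entry is not None and entry[1] == tag:
            return entry[0]
        return None
```

With the input `b"abcbacabc"`, the sequences `abc` and `bac` hash to the same slot, but the stored portion `ab` differs from `ba`, so the colliding entry is rejected rather than followed.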
Homogenizing data sparsity using a butterfly multiplexer
A data-sparsity homogenizer includes a plurality of multiplexers and a controller. The plurality of multiplexers receives 2^N bit streams of non-homogeneous sparse data in which non-zero value data is clumped together. The plurality of multiplexers is arranged in 2^N rows and N columns. Each input of a multiplexer in a first column receives a respective bit stream of the 2^N bit streams of non-homogeneous sparse data, and the multiplexers in a last column output 2^N bit streams of sparse data that is more homogeneous than the non-homogeneous sparse data of the 2^N input bit streams. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2^N bit streams of sparse data that is more homogeneous than the non-homogeneous sparse data of the 2^N input bit streams.
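A butterfly arrangement of 2^N rows and N columns can be sketched in Python as follows. The sketch assumes that the multiplexer at column c, row r chooses between row r and its partner row r XOR 2^c, and that the controller derives the select bits from the time step. The abstract specifies neither the wiring nor the control policy, so both are illustrative assumptions, as are the function names.

```python
def butterfly(values, select):
    """Route 2**N values through N columns of 2-input multiplexers.

    select[c][r] is the select bit of the multiplexer at column c, row r:
    False passes row r straight through, True takes partner row r ^ (1 << c).
    """
    if len(values) != 1 << len(select):
        raise ValueError("need 2**N values for N columns")
    for c, column in enumerate(select):
        values = [values[r ^ (1 << c)] if column[r] else values[r]
                  for r in range(len(values))]
    return values

def homogenize(streams):
    """streams[r][t] is the value on input row r at time step t.

    Assumed control policy: at step t, column c swaps every pair when bit c
    of t is set, so output row j receives input row j XOR (t mod 2**N).
    """
    rows = len(streams)
    n = rows.bit_length() - 1
    out = [[] for _ in range(rows)]
    for t in range(len(streams[0])):
        select = [[bool((t >> c) & 1)] * rows for c in range(n)]
        routed = butterfly([stream[t] for stream in streams], select)
        for r in range(rows):
            out[r].append(routed[r])
    return out
```

With four input rows whose non-zero values are clumped into rows 0 and 1 over four time steps, every output row ends up carrying two non-zero values, while the total number of non-zero values is unchanged.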
Methods and apparatus for thread-based scheduling in multicore neural networks
Systems, apparatus, and methods are provided for thread-based scheduling within a multicore processor. Neural networking uses a network of connected nodes (aka neurons) to loosely model the neuro-biological functionality found in the human brain. Various embodiments of the present disclosure use thread dependency graph analysis to decouple scheduling across many distributed cores. Rather than using thread dependency graphs to generate a sequential ordering for a centralized scheduler, the individual thread dependencies define a count value for each thread at compile-time. Threads and their thread dependency counts are distributed to each core at run-time. Thereafter, each core can dynamically determine which threads to execute based on fulfilled thread dependencies without requiring a centralized scheduler.
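Dependency-count scheduling can be sketched in Python as a sequential simulation. The round-robin assignment of threads to cores, the one-thread-per-core-per-round loop, and every name below are assumptions made for illustration and do not come from the abstract.

```python
def dependency_counts(graph):
    """'Compile-time' step: graph maps each thread to the threads it depends on."""
    return {thread: len(deps) for thread, deps in graph.items()}

def run(graph, num_cores=2):
    """Simulate cores that each pick ready threads from their own local table."""
    counts = dependency_counts(graph)
    dependents = {thread: [] for thread in graph}
    for thread, deps in graph.items():
        for dep in deps:
            dependents[dep].append(thread)

    # 'Run-time' distribution: each core gets its threads with their counts.
    cores = [{} for _ in range(num_cores)]
    owner = {}
    for i, thread in enumerate(sorted(graph)):
        cores[i % num_cores][thread] = counts[thread]
        owner[thread] = i % num_cores

    order = []
    remaining = len(graph)
    while remaining:
        progressed = False
        for core_id, local in enumerate(cores):
            ready = sorted(t for t, count in local.items() if count == 0)
            if not ready:
                continue
            thread = ready[0]
            del local[thread]
            order.append((core_id, thread))
            remaining -= 1
            progressed = True
            # Completion notifies dependents; no central scheduler orders them.
            for child in dependents[thread]:
                cores[owner[child]][child] -= 1
        if not progressed:
            raise ValueError("dependency cycle: no thread can become ready")
    return order
```

For the graph `{"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}`, thread `d` becomes ready only after both `b` and `c` have completed and decremented its count to zero.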
Sensor data compression in a plasma tool
Systems and methods for compressing data are described. One of the methods includes receiving a plurality of measurement signals from one or more sensors coupled to a radio frequency (RF) transmission path of a plasma tool. The RF transmission path is from an output of an RF generator to an electrode of a plasma chamber. The method includes converting the plurality of measurement signals from an analog form to a digital form to sample data and processing the data to reduce an amount of the data. The amount of the data is compressed to output compressed data. The method includes sending the compressed data to a controller for controlling the plasma tool.
Techniques to configure physical compute resources for workloads via circuit switching
Embodiments are generally directed to apparatuses, methods, techniques and so forth to select two or more processing units of a plurality of processing units to process a workload, and configure a circuit switch to link the two or more processing units to process the workload, the two or more processing units each linked to each other via paths of communication and the circuit switch.