Patent classifications
G06F12/0207
Memory device, semiconductor system, and data processing system
A memory device includes a memory cell array and a peripheral circuit. The memory cell array includes a plurality of memory regions each identified by a row address and a column address. The peripheral circuit accesses the memory cell array by performing a burst operation that supports a variable burst address gap, based on an address, a burst length and a burst address gap provided from a memory controller. The burst address gap is the numerical difference between adjacent column addresses on which the burst operation is to be performed.
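As a minimal sketch of the burst address gap described above, the following Python snippet lists the column addresses one burst would touch. The function name `burst_column_addresses` and its parameters are hypothetical, and the snippet assumes addresses simply advance by the gap with no wrap-around, which the abstract does not specify.

```python
def burst_column_addresses(start_col, burst_length, gap):
    """Return the column addresses touched by one burst.

    Assumption: addresses advance by `gap` starting at `start_col`.
    Wrap-around and bounds handling are not described in the abstract
    and are left out of this sketch.
    """
    if burst_length < 1 or gap < 1:
        raise ValueError("burst_length and gap must be positive")
    return [start_col + i * gap for i in range(burst_length)]


# A conventional burst is the special case gap == 1.
print(burst_column_addresses(start_col=8, burst_length=4, gap=1))  # [8, 9, 10, 11]
print(burst_column_addresses(start_col=8, burst_length=4, gap=3))  # [8, 11, 14, 17]
```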
Host performing an embedding operation and computing system including the same
A computing system capable of reducing data movement during an embedding operation and efficiently processing the embedding operation includes a host and a memory system. The host divides a plurality of feature tables, each including a respective plurality of embedding vectors for a respective plurality of elements, into a first feature table group and a second feature table group; generates a first embedding table configured of the first feature table group; and sends a request for a generation operation of a second embedding table configured of the second feature table group to the memory system. The memory system generates the second embedding table according to the generation operation request provided by the host. The host divides the plurality of feature tables into the first feature table group and the second feature table group based on the number of elements included in each of the plurality of feature tables.
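The split by element count can be sketched as follows. The abstract only says that the division depends on the number of elements in each feature table, so the threshold rule, the direction of the split (small tables on the host, large tables in the memory system), and the name `split_feature_tables` are all assumptions made for illustration.

```python
def split_feature_tables(element_counts, threshold):
    """Partition table names into (first_group, second_group).

    Assumption: tables with at most `threshold` elements stay on the host
    (first group) and larger tables are delegated to the memory system
    (second group). The abstract does not state the actual rule.
    """
    first_group = [name for name, n in element_counts.items() if n <= threshold]
    second_group = [name for name, n in element_counts.items() if n > threshold]
    return first_group, second_group


# Hypothetical feature tables and their element counts.
counts = {"country": 200, "device": 50, "user_id": 1_000_000}
host_tables, memory_tables = split_feature_tables(counts, threshold=1000)
print(host_tables)    # ['country', 'device']
print(memory_tables)  # ['user_id']
```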
System and method for memory management
Embodiments of the disclosure provide methods and systems for memory management. The method can include: receiving a request for allocating target node data to a memory space, wherein the memory space includes a buffer and an external memory, and the target node data comprises property data and structural data and represents a target node of a graph having a plurality of nodes and edges; determining a node degree associated with the target node data; and allocating the target node data to the memory space based on the determined node degree.
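One possible reading of degree-based allocation is sketched below. The policy (high-degree nodes go to the buffer while it has room, everything else goes to external memory), the threshold, and the name `allocate_node` are assumptions. The abstract states only that allocation depends on the node degree.

```python
def allocate_node(node_id, property_data, neighbors, buffer, external,
                  degree_threshold, buffer_capacity):
    """Place one node's data in the buffer or in external memory.

    Assumption: a node whose degree reaches `degree_threshold` is kept in
    the buffer while space remains; all other nodes go to external memory.
    """
    degree = len(neighbors)  # node degree = number of incident edges
    record = {"property": property_data, "structure": list(neighbors)}
    if degree >= degree_threshold and len(buffer) < buffer_capacity:
        buffer[node_id] = record
        return "buffer"
    external[node_id] = record
    return "external"


# Hypothetical star graph: node "a" is connected to every other node.
buffer, external = {}, {}
graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
placement = {
    node: allocate_node(node, {"label": node}, nbrs, buffer, external,
                        degree_threshold=2, buffer_capacity=8)
    for node, nbrs in graph.items()
}
print(placement)  # {'a': 'buffer', 'b': 'external', 'c': 'external', 'd': 'external'}
```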
Selective data structure encoding for deep neural network training
Methods, systems, apparatuses, and computer-readable storage media described herein are directed to techniques for efficient data encoding for neural network training. In particular, the embodiments described herein train a DNN based on a selective encoding (e.g., compressing) of data structures that are generated during training. For example, multiple training sessions may be performed where, in each training session, a different set of data structures generated by various operators of the DNN is encoded. Memory allocation information generated based on each training session is analyzed to determine which combination of encoded data structures results in a reduction of memory required to train the DNN.
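The search over encoding combinations can be illustrated with a small sketch. It assumes an exhaustive search and a caller-supplied `measure_peak_memory` callback that stands in for running one training session and reading its memory allocation information. Both are hypothetical, as are the toy memory figures. The abstract does not say how the combinations are enumerated.

```python
from itertools import combinations


def best_encoding_set(candidates, measure_peak_memory):
    """Return (encoded_set, peak_memory) with the lowest measured peak.

    Assumption: every combination of candidate data structures is tried,
    one "training session" per combination.
    """
    best_set = frozenset()
    best_mem = measure_peak_memory(best_set)  # baseline: nothing encoded
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            encoded = frozenset(combo)
            mem = measure_peak_memory(encoded)
            if mem < best_mem:
                best_set, best_mem = encoded, mem
    return best_set, best_mem


# Toy stand-in for real sessions: encoding some structures saves memory,
# while encoding others costs more than it saves.
BASELINE_MB = 100
DELTA_MB = {"conv_out": -10, "relu_out": -30, "pool_out": +5}


def toy_measure(encoded):
    return BASELINE_MB + sum(DELTA_MB[name] for name in encoded)


chosen, peak = best_encoding_set(list(DELTA_MB), toy_measure)
print(sorted(chosen), peak)  # ['conv_out', 'relu_out'] 60
```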
Memory management for overlap data between tiles of neural networks
Techniques for providing an overlap data buffer to store portions of tiles between passes of chained layers of a neural network are described. One accelerator circuit includes one or more processing units to execute instructions corresponding to the chained layers in multiple passes. In a first pass, the processing unit(s) receives a first input tile of an input feature map from a primary buffer and performs a first operation on the first input tile to obtain a first output tile. The processing unit stores the first output tile in the primary buffer and identifies a portion of the first output tile as corresponding to overlap data between tiles of the input feature map. The processing unit stores the portion in a secondary buffer. In a second pass, the processing unit retrieves the portion to avoid fetching the portion that overlaps and computing the overlap data again.
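The effect of the overlap buffer can be shown with a one-dimensional sketch. Two chained layers are assumed purely for illustration: an elementwise doubling (`layer1`) followed by a 3-tap sliding sum (`layer2`). The list `overlap` plays the role of the secondary buffer, so the first layer never recomputes values shared between neighbouring tiles. All names and both layer choices are hypothetical.

```python
KERNEL = 3  # width of the second layer's sliding window (assumed)


def layer1(xs):
    """First chained layer: elementwise doubling (illustrative choice)."""
    return [2 * x for x in xs]


def layer2(xs):
    """Second chained layer: sliding sum over KERNEL neighbours."""
    return [sum(xs[i:i + KERNEL]) for i in range(len(xs) - KERNEL + 1)]


def run_tiled(inputs, tile):
    """Process `inputs` tile by tile, reusing overlap data between tiles."""
    overlap = []  # secondary buffer: tail of the previous layer-1 output
    out = []
    for start in range(0, len(inputs), tile):
        fresh = layer1(inputs[start:start + tile])  # only new elements
        window = overlap + fresh                    # reuse, do not recompute
        out.extend(layer2(window))
        overlap = window[-(KERNEL - 1):]            # keep what the next tile needs
    return out


data = list(range(1, 9))
print(run_tiled(data, tile=4))  # [12, 18, 24, 30, 36, 42]
print(layer2(layer1(data)))     # [12, 18, 24, 30, 36, 42]
```

The tiled result matches the untiled computation while `layer1` runs exactly once per input element.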
Non-volatile memory based processors and dataflow techniques
A monolithic integrated circuit (IC) includes one or more compute circuits, one or more non-volatile memory circuits, one or more communication channels and one or more communication interfaces. The one or more communication channels can communicatively couple the one or more compute circuits, the one or more non-volatile memory circuits and the one or more communication interfaces together. The one or more communication interfaces can communicatively couple one or more circuits of the monolithic integrated circuit to one or more circuits external to the monolithic integrated circuit.
High bandwidth memory system with crossbar switch for dynamically programmable distribution scheme
A system comprises a processor coupled to a plurality of memory units. Each of the plurality of memory units includes a request processing unit and a plurality of memory banks. Each request processing unit includes a plurality of decomposition units and a crossbar switch, the crossbar switch communicatively connecting each of the plurality of decomposition units to each of the plurality of memory banks. The processor includes a plurality of processing elements and a communication network communicatively connecting the plurality of processing elements to the plurality of memory units. At least a first processing element of the plurality of processing elements includes a control logic unit and a matrix compute engine. The control logic unit is configured to access the plurality of memory units using a dynamically programmable distribution scheme.
Parallel processing device
There is provided a parallel processing device which allows consecutive parallel data processing to be performed. The parallel processing device includes: a plurality of addition units configured to selectively receive input data from among the output data of a plurality of input units, according to configuration values for each addition unit, and to perform addition operations on the input data in parallel; and a plurality of delay units configured to delay input data for one cycle. Each delay unit of the plurality of delay units delays output data from each addition unit of the plurality of addition units and outputs the delayed output data to each input unit of the plurality of input units.
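A cycle-level sketch of configurable adders with one-cycle feedback is given below. It assumes that each adder's configuration is a list of input-unit indices to sum, and that each adder's delayed output feeds the input unit with the same index. Both are interpretations made for illustration, and the names `step` and `run` are hypothetical. The example configurations compute a parallel prefix sum in two cycles.

```python
def step(inputs, config):
    """One cycle: adder j sums the input units selected by config[j]."""
    return [sum(inputs[i] for i in selected) for selected in config]


def run(inputs, configs):
    """Run one cycle per configuration, feeding outputs back as inputs."""
    state = list(inputs)
    for config in configs:
        # Delay units: this cycle's adder outputs become next cycle's inputs.
        state = step(state, config)
    return state


# Two-cycle prefix sum over four values (assumed configuration values).
cycle1 = [[0], [0, 1], [1, 2], [2, 3]]
cycle2 = [[0], [1], [0, 2], [1, 3]]
print(run([1, 2, 3, 4], [cycle1, cycle2]))  # [1, 3, 6, 10]
```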
Technologies for performing column architecture-aware scrambling
Technologies for scrambling functions in a column-addressable memory architecture include a device having a memory and circuitry. The memory includes a matrix storing individually addressable bit data, and the matrix is formed by rows and columns. The circuitry is to receive a request to perform a write operation of one or more bit values to one of the columns. The circuitry is further to determine a scrambler state at each location of the column, the location corresponding to a respective row and column index. The scrambler state is indicative of a function used to determine a value at the respective column location. Each of the bit values is scrambled as a function of the scrambler state for the respective column location and written thereto.
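A column write with per-location scrambling might look like the sketch below. The scrambler function (a one-bit value derived from the row and column index) and the use of XOR are assumptions chosen so that the operation is easy to invert. The abstract does not define the actual function, and the names here are hypothetical.

```python
def scrambler_state(row, col):
    """Hypothetical scrambler state: one bit derived from (row, col)."""
    return (row * 31 + col * 17) & 1


def write_column(matrix, col, bits):
    """Scramble each bit with the state of its location, then store it."""
    for row, bit in enumerate(bits):
        matrix[row][col] = bit ^ scrambler_state(row, col)


def read_column(matrix, col):
    """Undo the scrambling; XOR with the same state restores the bit."""
    return [matrix[row][col] ^ scrambler_state(row, col)
            for row in range(len(matrix))]


matrix = [[0] * 4 for _ in range(4)]
write_column(matrix, col=2, bits=[1, 1, 1, 1])
print([row[2] for row in matrix])  # stored (scrambled): [1, 0, 1, 0]
print(read_column(matrix, col=2))  # recovered: [1, 1, 1, 1]
```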
Operation device of convolutional neural network, operation method of convolutional neural network and computer program stored in a recording medium to execute the method thereof
A convolutional operation method generates a feature data matrix corresponding to an output data matrix by performing a general matrix multiplication (GEMM) operation on an input data matrix with a set filter matrix. The method includes updating, by at least one processor, a register mapping table so that first destination register addresses of redundant components, which indicate data redundant with each other among a plurality of components of the input data matrix, correspond to a same second destination register address. The method further includes performing, by the at least one processor, a convolutional operation by reusing a register having the same second destination register address with respect to the redundant components, based on the register mapping table.
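The register-reuse idea can be sketched for a one-dimensional convolution. Unrolling the input for GEMM (im2col style) duplicates elements. Here each unrolled slot plays the role of a first destination register address, and the mapping table points duplicates at one shared register. The 1-D setting, the dictionary-based table, and the function names are assumptions made for illustration.

```python
def build_register_map(input_len, kernel):
    """Map each unrolled slot to a shared register per source element."""
    mapping = {}        # unrolled slot -> shared (second) register address
    reg_of_source = {}  # source element index -> shared register address
    slot = 0
    for out_pos in range(input_len - kernel + 1):
        for k in range(kernel):
            src = out_pos + k
            # Redundant components (same source) get the same register.
            reg = reg_of_source.setdefault(src, len(reg_of_source))
            mapping[slot] = reg
            slot += 1
    return mapping, reg_of_source


def convolve_with_map(inputs, filt):
    """Convolve by reading every operand through the mapping table."""
    kernel = len(filt)
    mapping, reg_of_source = build_register_map(len(inputs), kernel)
    registers = [0] * len(reg_of_source)
    for src, reg in reg_of_source.items():
        registers[reg] = inputs[src]  # each input element is loaded once
    n_out = len(inputs) - kernel + 1
    return [sum(registers[mapping[o * kernel + k]] * filt[k]
                for k in range(kernel))
            for o in range(n_out)]


mapping, _ = build_register_map(input_len=4, kernel=2)
print([mapping[s] for s in range(6)])            # [0, 1, 1, 2, 2, 3]
print(convolve_with_map([1, 2, 3, 4], [1, 10]))  # [21, 32, 43]
```

Six unrolled slots are served by four registers in this example, which is the kind of redundancy the mapping table is meant to remove.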