Patent classifications
G06F9/355
STOCHASTIC HYPERDIMENSIONAL ARITHMETIC COMPUTING
Stochastic hyperdimensional arithmetic computing is provided. Hyperdimensional computing (HDC) is a neurally inspired computation model based on the observation that the human brain operates on high-dimensional representations of data, called hypervectors. Although HDC is powerful at reasoning over and associating abstract information, it is weak at extracting features from complex data. Consequently, most existing HDC solutions rely on expensive pre-processing algorithms for feature extraction. This disclosure proposes StocHD, a novel end-to-end hyperdimensional system that supports accurate, efficient, and robust learning over raw data. StocHD expands HDC functionality into the computing domain by mathematically defining stochastic arithmetic over HDC hypervectors. StocHD enables an entire learning application, including the feature extractor, to be processed using the HDC data representation, enabling uniform, efficient, robust, and highly parallel computation. This disclosure further provides a novel fully digital and scalable processing-in-memory (PIM) architecture that exploits the memory-centric nature of HDC to support extensively parallel computation.
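To make "stochastic arithmetic over hypervectors" concrete, here is a minimal sketch in the spirit of the abstract. The encoding (a scalar x in [0,1] mapped to a random binary hypervector whose bits are 1 with probability x), the AND-based multiply, and the multiplexer-based scaled add are classic stochastic-computing constructions, assumed here for illustration; they are not details taken from the patent itself.

```cpp
// Illustrative stochastic arithmetic over binary hypervectors.
#include <bitset>
#include <cstdio>
#include <random>

constexpr std::size_t D = 10000;  // hypervector dimensionality
using HV = std::bitset<D>;

// Encode a scalar in [0,1]: each bit is set with probability x.
HV encode(double x, std::mt19937& rng) {
    std::bernoulli_distribution bit(x);
    HV hv;
    for (std::size_t i = 0; i < D; ++i) hv[i] = bit(rng);
    return hv;
}

// Decode: the fraction of set bits estimates the encoded value.
double decode(const HV& hv) { return double(hv.count()) / D; }

int main() {
    std::mt19937 rng(42);
    HV a = encode(0.6, rng), b = encode(0.5, rng);

    // Multiply: bitwise AND of independent encodings, since
    // P(a_i & b_i) = P(a_i) * P(b_i).
    double prod = decode(a & b);

    // Scaled add: pick each bit from a or b with probability 1/2,
    // yielding (x + y) / 2.
    HV sel = encode(0.5, rng);
    HV sum = (a & sel) | (b & ~sel);

    std::printf("0.6 * 0.5 ~ %.3f\n", prod);            // expect ~0.30
    std::printf("(0.6 + 0.5)/2 ~ %.3f\n", decode(sum)); // expect ~0.55
    return 0;
}
```

Because every bit position is independent, both operations are embarrassingly parallel bitwise ops, which is what makes a memory-centric PIM substrate a natural fit.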
Memory deallocation across a trust boundary
A method of memory deallocation across a trust boundary between a first software component and a second software component is described. Some memory is shared between the first and second software components. An in-memory message-passing facility is implemented using the shared memory. The first software component deallocates memory from the shared memory that was allocated by the second software component. The deallocation is done by: taking at least one allocation to be freed from the message-passing facility; and freeing the at least one allocation using a local deallocation mechanism, while validating that memory accesses to memory owned by allocation-related data structures within the shared memory remain within the shared memory.
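A minimal sketch of this flow, assuming a fixed shared region and a simple queue of pointers to free; the names (SharedRegion, FreeQueue, drain_free_queue) are illustrative stand-ins, not structures from the patent.

```cpp
// Trusted side of a cross-trust-boundary deallocation path.
#include <cstdint>
#include <cstdio>
#include <queue>

struct SharedRegion {
    std::uint8_t* base;
    std::size_t   size;
    bool contains(const void* p) const {
        auto a = reinterpret_cast<std::uintptr_t>(p);
        auto b = reinterpret_cast<std::uintptr_t>(base);
        return a >= b && a < b + size;
    }
};

// Stand-in for the in-memory message-passing facility: the other
// component enqueues allocations it wants freed.
using FreeQueue = std::queue<void*>;

// Drain the queue, validating that every pointer lies inside the
// shared region before handing it to the local deallocation mechanism.
void drain_free_queue(FreeQueue& q, const SharedRegion& region) {
    while (!q.empty()) {
        void* p = q.front();
        q.pop();
        if (!region.contains(p)) {
            std::fprintf(stderr, "rejected pointer outside shared region\n");
            continue;  // never follow pointers that escape the region
        }
        // A real allocator would also verify that free-list and header
        // structures reachable from p stay within the region before
        // touching them; this sketch models only the accept/reject path.
        std::printf("freeing %p via local allocator\n", p);
    }
}

int main() {
    static std::uint8_t pool[256];
    SharedRegion region{pool, sizeof(pool)};
    FreeQueue q;
    q.push(pool + 16);   // valid: inside the shared region
    int unrelated = 0;
    q.push(&unrelated);  // invalid: memory outside the region
    drain_free_queue(q, region);
    return 0;
}
```

The validation step is the crux: the trusted component must assume every pointer and every piece of allocator metadata it receives from the untrusted side is hostile until proven to be in bounds.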
True/false vector index registers and methods of populating thereof
Disclosed herein are vector index registers for storing or loading the indexes of true and/or false results of comparison operations in vector processors. Each of the vector index registers stores multiple addresses for accessing multiple positions in operand vectors.
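A minimal software model of populating such a register, assuming a greater-than comparison; the names (IndexReg, populate_true_indices) are illustrative, not from the claims.

```cpp
// Populate a vector index register with positions where an
// elementwise comparison is true.
#include <cstdio>
#include <vector>

using Vec = std::vector<int>;
using IndexReg = std::vector<std::size_t>;  // software stand-in for a VIR

IndexReg populate_true_indices(const Vec& a, const Vec& b) {
    IndexReg vir;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] > b[i]) vir.push_back(i);  // record index of each true result
    return vir;
}

int main() {
    Vec a = {5, 1, 7, 3}, b = {2, 4, 6, 8};
    IndexReg vir = populate_true_indices(a, b);
    // The stored indices can later drive gathers/scatters over the vectors.
    for (std::size_t i : vir)
        std::printf("a[%zu]=%d > b[%zu]=%d\n", i, a[i], i, b[i]);
    return 0;
}
```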
METADATA PREDICTOR
Embodiments for a metadata predictor are disclosed. An index pipeline generates indices in an index buffer; the indices are used to read out a memory device. A prediction cache is populated with metadata of instructions read from the memory device. A prediction pipeline generates a prediction using the metadata of the instructions from the prediction cache, with the populating of the prediction cache being performed asynchronously to the operation of the prediction pipeline.
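A minimal software model of this decoupled structure, assuming the metadata is branch-predictor-style (taken/target) records; the table contents, sizes, and names are illustrative assumptions only.

```cpp
// Decoupled index pipeline -> prediction cache -> prediction pipeline.
#include <cstdio>
#include <deque>
#include <unordered_map>
#include <vector>

struct Metadata { int target; bool taken; };

int main() {
    // The "memory device" holding instruction metadata.
    std::vector<Metadata> table = { {100, true}, {200, false}, {300, true} };

    // Index pipeline: indices accumulate in a buffer ahead of use.
    std::deque<int> index_buffer = {0, 2, 1};

    // Populate the prediction cache from the memory device; in hardware
    // this runs ahead of (asynchronously to) the prediction pipeline.
    std::unordered_map<int, Metadata> prediction_cache;
    while (!index_buffer.empty()) {
        int idx = index_buffer.front();
        index_buffer.pop_front();
        prediction_cache[idx] = table[idx];
    }

    // Prediction pipeline: forms predictions from cached metadata only,
    // never waiting directly on the slower memory device.
    for (int idx : {0, 1, 2}) {
        const Metadata& m = prediction_cache.at(idx);
        std::printf("idx %d -> %s, target %d\n",
                    idx, m.taken ? "taken" : "not taken", m.target);
    }
    return 0;
}
```

The point of the decoupling is latency hiding: the index pipeline and cache fill can run arbitrarily far ahead, so the prediction pipeline sees only cache-speed accesses.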
LOOK-UP TABLE READ
A digital data processor includes an instruction memory storing instructions that specify data processing operations and a data operand field; an instruction decoder coupled to the instruction memory for recalling instructions from the instruction memory and determining the operation and the data operand; and an operational unit coupled to a data register file and the instruction decoder to perform an operation upon an operand corresponding to an instruction decoded by the instruction decoder and to store the results of the operation. The operational unit is configured to perform a table recall in response to a look-up table read instruction by recalling data elements from a specified location, and from a location adjacent to the specified location, in a specified number of at least one table, and storing the recalled data elements in successive slots in a destination register. The recalled data elements include at least one interpolated data element from the adjacent location.
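A minimal sketch of the recall-plus-interpolate behavior, assuming linear interpolation between the specified and adjacent table entries; the function and struct names, and the choice of interpolation, are illustrative assumptions.

```cpp
// Look-up table read that recalls a base element, its adjacent
// element, and an interpolated element into successive result slots.
#include <cstdio>
#include <vector>

struct LutReadResult { float base, adjacent, interpolated; };

LutReadResult lut_read(const std::vector<float>& table,
                       std::size_t idx, float frac) {
    float base = table[idx];
    float adj  = table[idx + 1];                // adjacent location
    float interp = base + frac * (adj - base);  // linear interpolation
    return {base, adj, interp};  // stored in successive destination slots
}

int main() {
    std::vector<float> table = {0.0f, 1.0f, 4.0f, 9.0f};  // e.g., x^2 samples
    LutReadResult r = lut_read(table, 1, 0.5f);
    std::printf("base=%.2f adjacent=%.2f interpolated=%.2f\n",
                r.base, r.adjacent, r.interpolated);
    return 0;
}
```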
COMPUTING ARCHITECTURE
A computing architecture comprises an off-chip memory, an on-chip cache unit, a prefetching unit, a global scheduler, a transmitting unit, a pre-recombination network, a post-recombination network, a main computing array, a write-back cache unit, a data dependence controller, and an auxiliary computing array. The architecture reads data tiles into the on-chip cache by prefetching and performs computation on the tiles; during tile computation, a tile exchange network recombines the data structure, and a data dependence module handles any data dependences that may exist between different tiles. The architecture increases data utilization and improves data-processing flexibility, thereby reducing cache misses and memory-bandwidth pressure.
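A minimal sketch of the tile-prefetch pattern the abstract describes, modeled as software double buffering: the next tile is fetched while the current one is computed. Tile size, buffer count, and the reduction used as the "computation" are illustrative assumptions; the recombination networks and dependence controller appear only as comments.

```cpp
// Double-buffered tile prefetch overlapping fetch with compute.
#include <cstdio>
#include <vector>

constexpr std::size_t TILE = 4;

void prefetch_tile(const std::vector<int>& mem, std::size_t t,
                   std::vector<int>& buf) {
    for (std::size_t i = 0; i < TILE; ++i) buf[i] = mem[t * TILE + i];
}

int compute_tile(const std::vector<int>& buf) {
    // A real design would first pass the tile through the
    // pre-recombination network to reshape its layout for the
    // main computing array, and consult the dependence controller
    // before consuming tiles that depend on earlier results.
    int acc = 0;
    for (int v : buf) acc += v;
    return acc;
}

int main() {
    std::vector<int> off_chip(16);
    for (std::size_t i = 0; i < off_chip.size(); ++i) off_chip[i] = int(i);

    std::vector<int> bufs[2] = {std::vector<int>(TILE), std::vector<int>(TILE)};
    const std::size_t n_tiles = off_chip.size() / TILE;
    prefetch_tile(off_chip, 0, bufs[0]);
    for (std::size_t t = 0; t < n_tiles; ++t) {
        if (t + 1 < n_tiles)  // fetch next tile while computing current
            prefetch_tile(off_chip, t + 1, bufs[(t + 1) % 2]);
        std::printf("tile %zu -> %d\n", t, compute_tile(bufs[t % 2]));
    }
    return 0;
}
```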
METHOD OF EXECUTING OPERATION, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
A method of executing an operation in deep learning training, an electronic device, and a computer-readable storage medium, which relate to the field of artificial intelligence, and in particular to deep learning. The method of executing an operation in deep learning training includes: acquiring an instruction for the operation, the operation including a plurality of vector operations; determining, for each vector operation of the plurality of vector operations, two source operand vectors for a comparison; and executing the vector operation on the two source operand vectors using an instruction format for the vector operation, so as to obtain an operation result including a destination operand vector.
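A minimal sketch of one such vector operation, assuming an elementwise-maximum comparison (common in training kernels such as ReLU or gradient clipping); the choice of comparison and the names are illustrative assumptions, not the patented instruction format.

```cpp
// One vector operation: two source operand vectors compared
// elementwise, result written to a destination operand vector.
#include <algorithm>
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;

// vmax dst, src0, src1 : dst[i] = max(src0[i], src1[i])
Vec vmax(const Vec& src0, const Vec& src1) {
    Vec dst(src0.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = std::max(src0[i], src1[i]);
    return dst;
}

int main() {
    Vec a = {1.f, 5.f, 3.f}, b = {4.f, 2.f, 6.f};
    Vec d = vmax(a, b);
    for (float v : d) std::printf("%.1f ", v);  // 4.0 5.0 6.0
    std::printf("\n");
    return 0;
}
```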
IMMEDIATE OFFSET OF LOAD STORE AND ATOMIC INSTRUCTIONS
One embodiment provides a graphics processor including a processing resource with a register file, memory, a cache memory, and load/store/cache circuitry to process load, store, and prefetch messages from the processing resource. The circuitry supports an immediate address offset that is used to adjust the address supplied for a memory access requested through the circuitry. Supporting the immediate address offset removes the need to execute additional instructions that adjust the address before the memory access instruction executes.
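A minimal sketch of the idea, assuming a simple message structure; the encoding and names below are purely illustrative, not the processor's actual message format.

```cpp
// Folding an immediate offset into the load message itself, so no
// separate address-computation instruction is needed.
#include <cstdint>
#include <cstdio>

struct LoadMessage {
    std::uintptr_t base;        // address from the register file
    std::int32_t   imm_offset;  // applied by the load/store/cache circuitry
};

std::uintptr_t effective_address(const LoadMessage& m) {
    return m.base + m.imm_offset;  // adjustment done inside the unit
}

int main() {
    // Without immediate support: add r1, r0, #64 ; load r2, [r1]
    // With immediate support:    load r2, [r0 + #64]   (one instruction)
    LoadMessage m{0x1000, 64};
    std::printf("effective address = 0x%zx\n",
                static_cast<std::size_t>(effective_address(m)));
    return 0;
}
```

The saving is the eliminated dependent add: the address arithmetic moves off the instruction stream and into the memory-access circuitry itself.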