Patent classifications
G06F9/544
Memory access instructions
Processing circuitry performs processing operations specified by program instructions, and a decoder decodes memory access instructions to generate control signals to control the processing circuitry to perform memory access operations. The memory access instructions have respective encodings specifying protected memory access instructions corresponding to protected memory access operations and less-protected memory access instructions corresponding to less-protected memory access operations. The less-protected memory access operations are associated with less restrictive memory access conditions than the protected memory access operations.
Efficient sharing and compression expansion of data across processing systems
A mechanism is described for facilitating sharing of data and compression expansion of models at autonomous machines. A method of embodiments, as described herein, includes detecting a first processor processing information relating to a neural network at a first computing device, where the first processor comprises a first graphics processor and the first computing device comprises a first autonomous machine. The method further includes facilitating the first processor to store one or more portions of the information in a library at a database, where the one or more portions are accessible to a second processor of a computing device.
Memory device for swapping data and operating method thereof
An operating method of a memory device, which includes a first memory region and a second memory region, includes reading first data from the first memory region and storing the read first data in a data buffer block, performing a first XOR operation on the first data provided from the data buffer block and second data read from the second memory region to generate first result data, writing the first data stored in the data buffer block in the second memory region, performing a second XOR operation on the first data and the first result data to generate the second data, storing the generated second data in the data buffer block, and writing the second data stored in the data buffer block in the first memory region.
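The sequence of reads, XOR operations, and writes in this abstract can be traced in software. The following is a minimal sketch, assuming each memory region is modelled as a Python bytearray of equal length; the function name xor_swap_regions and the variable names are hypothetical and do not come from the patent.

```python
def xor_swap_regions(region1: bytearray, region2: bytearray) -> None:
    """Swap two equal-length regions in place, following the abstract's steps."""
    if len(region1) != len(region2):
        raise ValueError("regions must have the same length")

    # Read first data from the first region into the data buffer.
    data_buffer = bytes(region1)

    # First XOR: first data (from the buffer) with second data (from region 2).
    first_result = bytes(a ^ b for a, b in zip(data_buffer, region2))

    # Write the buffered first data into the second region.
    region2[:] = data_buffer

    # Second XOR: first data with the first result regenerates the second data,
    # because a ^ (a ^ b) == b.
    second_data = bytes(a ^ r for a, r in zip(data_buffer, first_result))

    # Store the regenerated second data in the buffer, then write it to region 1.
    data_buffer = second_data
    region1[:] = data_buffer


if __name__ == "__main__":
    r1 = bytearray(b"\x0f\xa5")
    r2 = bytearray(b"\xf0\x5a")
    xor_swap_regions(r1, r2)
    print(r1.hex(), r2.hex())
```

The sketch keeps the intermediate XOR result in a separate variable for readability; how the device itself holds that result is not stated in the abstract.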
Processing unit and method for computing a convolution using a hardware-implemented spiral algorithm
Disclosed is a processing unit for computing a convolution of an activations matrix (e.g., an N×N activations matrix) and a weights kernel (e.g., an M×M weights kernel). The processing unit specifically employs an array of processing elements and a hardware-implemented spiral algorithm to compute the convolution. Due to this spiral algorithm, the need for a discrete data setup logic block is avoided, activation values from the activations matrix can be pre-loaded into processing elements only one time so that the need to repeatedly access the activations matrix is avoided, and the computation can be completed in a relatively low number of clock cycles, which is independent of the number of activation values in the activations matrix and which is equal to the number of weight values in the weights kernel. Also disclosed is an associated processing method.
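The claim that the cycle count equals the number of weight values, regardless of N, can be illustrated with a small software simulation. This is a sketch under several assumptions, not the patented hardware: one processing element per output position, one weight broadcast per simulated cycle, weights visited in an outward spiral from the kernel centre, an odd kernel size, zero padding at the borders, and no kernel flip (cross-correlation). The names spiral_offsets and spiral_convolve are hypothetical, and the simulation reads neighbouring activations by index where hardware would presumably shift them between processing elements.

```python
def spiral_offsets(m):
    """Return (row, col) offsets from the kernel centre in outward spiral order."""
    if m % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    r = c = 0
    offsets = [(0, 0)]
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    step, d = 1, 0
    while len(offsets) < m * m:
        for _ in range(2):  # two legs share each step length: 1, 1, 2, 2, ...
            dr, dc = directions[d % 4]
            for _ in range(step):
                r, c = r + dr, c + dc
                if len(offsets) < m * m:
                    offsets.append((r, c))
            d += 1
        step += 1
    return offsets


def spiral_convolve(act, weights):
    """Simulate one weight broadcast per cycle across an N x N array of PEs."""
    n, m = len(act), len(weights)
    half = m // 2
    acc = [[0] * n for _ in range(n)]  # one accumulator per processing element
    cycles = 0
    for dr, dc in spiral_offsets(m):
        w = weights[half + dr][half + dc]  # weight broadcast this cycle
        for i in range(n):
            for j in range(n):
                r, c = i + dr, j + dc
                if 0 <= r < n and 0 <= c < n:  # zero padding outside the matrix
                    acc[i][j] += w * act[r][c]
        cycles += 1
    return acc, cycles


if __name__ == "__main__":
    activations = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    result, cycles = spiral_convolve(activations, kernel)
    print(result, cycles)
```

Because consecutive spiral offsets differ by a single step in one direction, each cycle would need only a nearest-neighbour move of activation values, which is one plausible reason a spiral order suits an array of processing elements.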
Data layout optimization on processing in memory architecture for executing neural network model
The present disclosure relates to a method for scheduling a computation graph on a processing in memory (PIM) enabled device comprising a memory block assembly. The method comprises allocating a first node of the computation graph on a first memory block of a first array of memory blocks in the memory block assembly and allocating a second node of the computation graph on a second memory block of a second array of memory blocks in the memory block assembly, wherein output data of the first node is used for executing the second node. The memory block assembly can be configured to support data transfer from the first memory block to the second memory block via an internal data coupling in the memory block assembly.
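One way to picture this allocation is to place each node of the computation graph in the array whose index matches the node's depth, so that a producer and its consumer land in different arrays. The following is a minimal sketch under that assumption; the depth-based policy, the function name allocate_nodes, and the blocks_per_array parameter are hypothetical and are not taken from the disclosure.

```python
from graphlib import TopologicalSorter


def allocate_nodes(graph, blocks_per_array):
    """Map each node to (array_index, block_index).

    graph maps a node to the set of nodes whose output it consumes.
    """
    depth = {}
    next_free_block = {}
    placement = {}
    # static_order() yields every predecessor before the nodes that use it.
    for node in TopologicalSorter(graph).static_order():
        preds = graph.get(node, ())
        # A node sits one array after its deepest producer; sources sit in array 0.
        d = 1 + max((depth[p] for p in preds), default=-1)
        depth[node] = d
        block = next_free_block.get(d, 0)
        if block >= blocks_per_array:
            raise ValueError(f"array {d} has no free memory block for {node!r}")
        placement[node] = (d, block)
        next_free_block[d] = block + 1
    return placement


if __name__ == "__main__":
    # "b" and "c" consume the output of "a"; "d" consumes both.
    g = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(allocate_nodes(g, blocks_per_array=4))
```

In this sketch the internal data coupling mentioned in the abstract would correspond to the transfer from a block in array d to a block in a later array.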
Efficient High Bandwidth Shared Memory Architectures for Parallel Machine Learning and AI Processing of Large Data Sets and Streams
The present disclosure relates to systems and methods to implement efficient high-bandwidth shared memory systems particularly suited for parallelizing and operating large scale machine learning and AI computing systems necessary to efficiently process high volume data sets and streams.
METHOD AND SYSTEM FOR PROCESSING A STREAM OF INCOMING MESSAGES SENT FROM A SPECIFIC INPUT MESSAGE SOURCE AND VALIDATING EACH INCOMING MESSAGE OF THAT STREAM BEFORE SENDING THEM TO A SPECIFIC TARGET SYSTEM
Methods and systems are provided for processing a stream of incoming messages sent from a specific input message source and validating each incoming message of that stream before sending them to a specific target system.
MEMORY DEALLOCATION ACROSS A TRUST BOUNDARY
A method of memory deallocation across a trust boundary between a first software component and a second software component is described. Some memory is shared between the first and second software components. An in-memory message passing facility is implemented using the shared memory. The first software component is used to deallocate memory from the shared memory which has been allocated by the second software component. The deallocation is done by: taking at least one allocation to be freed from the message passing facility; and freeing the at least one allocation using a local deallocation mechanism while validating that memory accesses to memory owned by data structures related to memory allocation within the shared memory are within the shared memory.
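The validation step can be sketched as a bounds check applied to each allocation taken from the message queue before it is freed. This is a simplified model under stated assumptions: the shared memory is a range of offsets from 0 to size, the allocation metadata is a dictionary that the other component may have corrupted, and the message passing facility is a deque. The class SharedHeap and its members are hypothetical names, not the patent's.

```python
from collections import deque


class SharedHeap:
    """Toy model of a shared region whose metadata is not trusted."""

    def __init__(self, size):
        self.size = size
        self.live = {}        # offset -> length; lives in shared memory, untrusted
        self.queue = deque()  # in-memory message passing facility (offsets to free)

    def in_bounds(self, offset, length):
        """True only if [offset, offset + length) lies inside the shared region."""
        return offset >= 0 and length > 0 and offset + length <= self.size

    def drain_and_free(self):
        """Free queued allocations, rejecting any that would leave the region."""
        freed, rejected = [], []
        while self.queue:
            offset = self.queue.popleft()
            length = self.live.get(offset)
            # Validate before touching anything the metadata points at.
            if length is None or not self.in_bounds(offset, length):
                rejected.append(offset)
                continue
            del self.live[offset]  # local deallocation mechanism
            freed.append(offset)
        return freed, rejected


if __name__ == "__main__":
    heap = SharedHeap(1024)
    heap.live = {0: 64, 960: 128}  # second entry overruns the region
    heap.queue.extend([0, 960])
    print(heap.drain_and_free())
```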
SYSTEMS AND METHODS OF SUBJECT STATE CHANGE NOTIFICATION
Systems and methods for notifying one or more observers of one or more state changes are provided. For instance, at least one subject can be configured to write data to a buffer in a shared memory space. One or more observers can have an associated notification group that includes one or more buffer identifiers corresponding to a buffer in the shared memory space. A scheduler can be configured to detect one or more state changes associated with one or more buffers in the shared memory space and to provide an update notification to at least one observer based at least in part on the one or more detected state changes.
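The relationship between subjects, notification groups, and the scheduler can be sketched as follows. This is a minimal illustration, assuming state changes are detected by polling a per-buffer version counter; the classes SharedSpace and Scheduler, the version counter, and the callback-based observers are hypothetical choices, since the abstract does not specify a detection mechanism.

```python
class SharedSpace:
    """Shared memory space holding named buffers written by subjects."""

    def __init__(self):
        self.data = {}
        self.version = {}

    def write(self, buffer_id, value):
        self.data[buffer_id] = value
        # Bump a counter so the scheduler can detect the state change.
        self.version[buffer_id] = self.version.get(buffer_id, 0) + 1


class Scheduler:
    """Detects buffer changes and notifies observers whose group includes them."""

    def __init__(self, space):
        self.space = space
        self.groups = []     # (observer callback, set of buffer identifiers)
        self.last_seen = {}  # buffer id -> version at the previous poll

    def register(self, callback, buffer_ids):
        self.groups.append((callback, set(buffer_ids)))

    def poll(self):
        changed = {
            b for b, v in self.space.version.items() if self.last_seen.get(b) != v
        }
        self.last_seen = dict(self.space.version)
        for callback, group in self.groups:
            hits = changed & group
            if hits:
                callback(sorted(hits))  # update notification


if __name__ == "__main__":
    space = SharedSpace()
    scheduler = Scheduler(space)
    scheduler.register(print, ["pose", "speed"])
    space.write("pose", (1.0, 2.0))
    space.write("battery", 0.8)  # not in the observer's notification group
    scheduler.poll()
```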
INTER-PROCESS COMMUNICATION METHOD AND APPARATUS
Example inter-process communication methods and apparatus are described. One example method includes writing, based on a calling convention, communication information into a register group of a first execution entity of a sending process. First data of target data is written into a stack memory of the first execution entity. The communication information includes a first value indicating a data amount of the first data, and the target data is data to be sent by the sending process to a first receiving process. Information in the register group of the first execution entity is replicated to a context of the first execution entity. The first data from the stack memory of the first execution entity is replicated to a stack memory of a second execution entity of the first receiving process based on the first value in the context of the first execution entity and the address of the stack memory.
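The data flow in this abstract (registers, then context, then a stack-to-stack copy) can be mimicked with a toy model. This is a sketch only: real execution entities, register groups, and contexts belong to the operating system and the hardware, and here they are plain Python objects. The class ExecutionEntity, the register names "len" and "addr", and the function send are hypothetical.

```python
class ExecutionEntity:
    """Toy stand-in for a thread-like entity with registers, context and stack."""

    def __init__(self, stack_size=64):
        self.registers = {}
        self.context = {}
        self.stack = bytearray(stack_size)


def send(sender, receiver, payload):
    """Copy payload from the sender's stack to the receiver's stack."""
    n = len(payload)
    if n > len(sender.stack) or n > len(receiver.stack):
        raise ValueError("payload does not fit in stack memory")

    # Sending process: write communication information into the register group
    # (assumed convention: data amount in "len", stack address in "addr").
    sender.registers = {"len": n, "addr": 0}
    # Sending process: write the first data into its own stack memory.
    sender.stack[0:n] = payload

    # Replicate the register group into the sender's context.
    sender.context = dict(sender.registers)

    # Replicate the data to the receiver's stack, using only the saved context.
    amount = sender.context["len"]
    addr = sender.context["addr"]
    receiver.stack[0:amount] = sender.stack[addr:addr + amount]
    return amount


if __name__ == "__main__":
    a, b = ExecutionEntity(), ExecutionEntity()
    copied = send(a, b, b"ping")
    print(copied, bytes(b.stack[:copied]))
```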