G06F9/544

Automatic memory management method, corresponding micro-controller unit and computer program product

Methods, microprocessors, and systems are provided for implementing an artificial neural network. Data buffers in virtual memory are coupled to respective processing layers in the artificial neural network. An ordered visiting sequence of layers of the artificial neural network is obtained. A virtual memory allocation schedule is produced as a function of the ordered visiting sequence of layers of the artificial neural network, the schedule including a set of instructions for memory allocation and deallocation operations applicable to the data buffers. A physical memory configuration dataset is computed as a function of the virtual memory allocation schedule for the artificial neural network, the dataset including sizes and addresses of physical memory locations for the artificial neural network.
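
The two stages described above, a virtual allocation schedule followed by a physical layout, can be sketched roughly as follows. This is a minimal illustration and not the patented method: the `Buffer` fields, the function names, the first-fit placement policy, and the three-layer example are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    name: str
    size: int       # bytes
    first_use: int  # position in the visiting sequence where the buffer is first needed
    last_use: int   # position where it is last needed


def build_schedule(buffers, num_layers):
    """Virtual schedule: an ordered list of ("alloc" | "free", buffer_name)."""
    schedule = []
    for step in range(num_layers):
        # Allocate before freeing so a layer's inputs and outputs coexist.
        schedule += [("alloc", b.name) for b in buffers if b.first_use == step]
        schedule += [("free", b.name) for b in buffers if b.last_use == step]
    return schedule


def assign_addresses(schedule, buffers):
    """Replay the schedule with first-fit placement (an assumed policy).

    Returns ({name: (offset, size)}, peak_bytes).
    """
    sizes = {b.name: b.size for b in buffers}
    live = {}       # buffers currently allocated: name -> (offset, size)
    placement = {}  # final physical layout: name -> (offset, size)
    peak = 0
    for op, name in schedule:
        if op == "free":
            del live[name]
            continue
        size = sizes[name]
        offset = 0
        for start, length in sorted(live.values()):
            if offset + size <= start:
                break  # the gap before this live block is large enough
            offset = max(offset, start + length)
        live[name] = placement[name] = (offset, size)
        peak = max(peak, offset + size)
    return placement, peak


# Hypothetical three-layer chain visited in the order 0, 1, 2.
buffers = [
    Buffer("input", 64, 0, 0),
    Buffer("act0", 128, 0, 1),
    Buffer("act1", 64, 1, 2),
    Buffer("output", 32, 2, 2),
]
schedule = build_schedule(buffers, num_layers=3)
placement, peak = assign_addresses(schedule, buffers)
```

In this example `act1` reuses offset 0 once `input` has been freed, so the peak footprint is 192 bytes rather than the 288 bytes needed if every buffer had its own region.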

Optimization method for graph processing based on heterogeneous FPGA data streams

The present invention relates to an optimization method for graph processing based on heterogeneous FPGA data streams. The method can balance processing loads between the CPU processing module and the FPGA processing module during acceleration of graph data processing.

Using a sharded distributed cache as a pipeline integration buffer
11463551 · 2022-10-04

Systems and methods of operating a distributed cache in a fast producer, slow consumer environment are disclosed. A system implements a distributed cache including a plurality of shards. Each shard includes a set of item containers selected from a plurality of containers. A first event related to a first item container in the set of item containers is received and the first item container is updated to include the first event. The first item container is positioned in at least one consumption queue. A second event related to the first item container in the set of item containers is received and the first item container is updated without changing the position of the first item container in the at least one consumption queue.
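
The key behaviour here, that a later event updates a container without moving it in the consumption queue, can be sketched as below. The class and method names are hypothetical, and using an insertion-ordered map as both the container store and the queue is an assumption of this sketch, not something the abstract specifies.

```python
from collections import OrderedDict


class Shard:
    """One cache shard; the ordered map doubles as the consumption queue."""

    def __init__(self):
        self._containers = OrderedDict()  # item_id -> list of pending events

    def put(self, item_id, event):
        if item_id in self._containers:
            # Later event: update the container in place, queue position unchanged.
            self._containers[item_id].append(event)
        else:
            # First event: create the container at the tail of the queue.
            self._containers[item_id] = [event]

    def take(self):
        """Pop the oldest container with all events accumulated so far."""
        if not self._containers:
            return None
        return self._containers.popitem(last=False)


class ShardedCache:
    def __init__(self, num_shards):
        self._shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, item_id):
        # Assumed routing rule: hash of the item id modulo the shard count.
        return self._shards[hash(item_id) % len(self._shards)]

    def put(self, item_id, event):
        self.shard_for(item_id).put(item_id, event)


# A fast producer sends two events for "a" before the slow consumer runs.
shard = Shard()
shard.put("a", "created")
shard.put("b", "created")
shard.put("a", "updated")
first = shard.take()
second = shard.take()
```

Because the second event for `"a"` is folded into the existing container, the slow consumer sees one entry per item rather than one per event, and `"a"` is still consumed before `"b"`.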

Systems and methods for improved neural network execution
11449363 · 2022-09-20

A method and system for computing one or more outputs of a neural network having a plurality of layers is provided. The method and system can include determining a plurality of sub-computations from total computations of the neural network to execute in parallel, wherein the computations to execute in parallel involve computations from multiple layers. The method and system can also include avoiding repeating overlapped computations and/or multiple memory reads and writes during execution.

Methods and apparatus for data pipelines between cloud computing platforms

Methods, apparatus, systems and articles of manufacture are disclosed to establish a data pipeline between cloud computing platforms. An example apparatus includes at least one memory, machine readable instructions in the apparatus, and processor circuitry to execute the machine readable instructions to at least extract a data producer name from data, the data to be provided from a data producer to a data consumer, identify a buffer identifier based on a mapping of the data producer name to the buffer identifier, cause transmission of the data to a buffer associated with the buffer identifier, and cause transmission of the data from the buffer to the data consumer based on an association between the buffer identifier and a data consumer name, the data consumer name corresponding to the data consumer.
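
The two lookups in this abstract (producer name to buffer identifier, then buffer identifier to consumer name) might look roughly like the following. The `PipelineRouter` class, the `"producer"` key on each record, and the callback-style consumers are assumptions made for illustration only.

```python
from collections import deque


class PipelineRouter:
    def __init__(self, producer_to_buffer, buffer_to_consumer):
        self.producer_to_buffer = producer_to_buffer  # producer name -> buffer id
        self.buffer_to_consumer = buffer_to_consumer  # buffer id -> consumer name
        self.buffers = {buf_id: deque() for buf_id in buffer_to_consumer}

    def ingest(self, record):
        """Extract the producer name and place the record in the mapped buffer."""
        buffer_id = self.producer_to_buffer[record["producer"]]
        self.buffers[buffer_id].append(record)
        return buffer_id

    def deliver(self, buffer_id, consumers):
        """Drain one buffer into the consumer associated with it."""
        consumer_name = self.buffer_to_consumer[buffer_id]
        sink = consumers[consumer_name]
        delivered = 0
        while self.buffers[buffer_id]:
            sink(self.buffers[buffer_id].popleft())
            delivered += 1
        return delivered


# Hypothetical names: one producer on platform A, one consumer on platform B.
router = PipelineRouter(
    producer_to_buffer={"sensor-feed": "buf-1"},
    buffer_to_consumer={"buf-1": "analytics"},
)
received = []
router.ingest({"producer": "sensor-feed", "value": 21})
router.ingest({"producer": "sensor-feed", "value": 22})
count = router.deliver("buf-1", {"analytics": received.append})
```

Keeping the two mappings separate means a producer can be re-pointed at a different buffer, or a buffer at a different consumer, without the other side knowing.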

Address mapping between shared memory modules and cache sets
20220283936 · 2022-09-08

A memory module system with a global shared context. A memory module system can include a plurality of memory modules and at least one processor, which can implement the global shared context. The memory modules of the system can provide the global shared context at least in part by providing an address space shared between the modules and applications running on the modules. The address space sharing can be achieved by having logical addresses global to the modules, and each logical address can be associated with a certain physical address of a specific module.
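
As a rough illustration of a logical address that is global to the modules yet resolves to a physical address on one specific module, consider the sketch below. The contiguous, equal-sized partitioning and the names `translate` and `to_logical` are assumptions; the abstract does not say how the association is computed.

```python
def translate(logical_addr, module_size, num_modules):
    """Map a global logical address to (module_index, physical_addr).

    Assumes each module contributes one contiguous, equal-sized range.
    """
    if not 0 <= logical_addr < module_size * num_modules:
        raise ValueError("logical address outside the shared address space")
    return divmod(logical_addr, module_size)


def to_logical(module_index, physical_addr, module_size):
    """Inverse mapping under the same assumption."""
    return module_index * module_size + physical_addr


# Four hypothetical modules of 64 KiB each.
module, physical = translate(0x1_0010, module_size=0x1_0000, num_modules=4)
```

Any application on any module can use the same logical address; only the translation step needs to know which module actually holds the data.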

Service schedule optimization for background execution limits

Scheduling optimizations for services are described. In one example, a priority category, such as a high, low, or other priority category, can be determined for a service of an application executing on a computing device. If the application is running as a background application on the computing device, an exception to the start of the service can be returned by the operating system of the device, due to background execution limits on the device. In that case, the start of the service can be managed by a service manager of the application based on the priority category for the service. If the priority category for the service is high, the background application can call a foreground service. The call for the foreground service can bring the application to the foreground, and the service manager can again call for the start of the service after the foreground service is running.

Multiprocessor system and method for controlling shared memory
11422937 · 2022-08-23

A multiprocessor system includes a shared memory, and first and second processors. The shared memory includes a queue configured to store messages. The first processor transmits the messages to the shared memory. The second processor receives the messages stored in the shared memory. The first memory stores a first head pointer indicating a vacant position head of the queue and a first tail pointer indicating a vacant position tail of the queue. The second memory stores a second head pointer indicating a position of a head of the messages stored in the queue and a second tail pointer indicating a tail position of the messages stored in the queue. The first processor increments the first head pointer and copies a value identical to a value of the first head pointer to the second tail pointer, when transmitting the messages.
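
One way to read the pointer scheme is as a ring buffer in which each processor keeps its own pointer pair and mirrors one value to the other side. The sketch below is single-threaded and makes several assumptions the abstract does not state: the first and second memories are local to the first and second processors respectively, the queue wraps around, one slot is kept empty to distinguish full from empty, and the receiver mirrors its head pointer back to the sender's vacant tail.

```python
class MirroredPointerQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # the queue in shared memory
        # First memory (sender side): bounds of the vacant region.
        self.vacant_head = 0
        self.vacant_tail = 0
        # Second memory (receiver side): bounds of the stored messages.
        self.msg_head = 0
        self.msg_tail = 0

    def send(self, message):
        """First processor: write, advance the vacant head, mirror it."""
        nxt = (self.vacant_head + 1) % self.capacity
        if nxt == self.vacant_tail:
            return False  # queue is full
        self.slots[self.vacant_head] = message
        self.vacant_head = nxt
        self.msg_tail = self.vacant_head  # copy to the second tail pointer
        return True

    def receive(self):
        """Second processor: read, advance the message head, mirror it."""
        if self.msg_head == self.msg_tail:
            return None  # queue is empty
        message = self.slots[self.msg_head]
        self.msg_head = (self.msg_head + 1) % self.capacity
        self.vacant_tail = self.msg_head  # assumed symmetric copy-back
        return message
```

With this layout each processor only polls pointers held in its own memory, which is presumably the point of storing a copy on each side; a real implementation would also need memory barriers or equivalent ordering guarantees.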

Pipeline arbitration

A method includes receiving, by a first stage in a pipeline, a first transaction from a previous stage in the pipeline; in response to the first transaction comprising a high priority transaction, processing the high priority transaction by sending it to a buffer; receiving a second transaction from the previous stage; in response to the second transaction comprising a low priority transaction, processing the low priority transaction by monitoring a full signal from the buffer while sending the low priority transaction to the buffer; in response to the full signal being asserted and no high priority transaction being available from the previous stage, pausing processing of the low priority transaction; in response to the full signal being asserted and a high priority transaction being available from the previous stage, stopping processing of the low priority transaction and processing the high priority transaction; and in response to the full signal being de-asserted, processing the low priority transaction by sending it to the buffer.
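
The branching in this abstract can be condensed into a small decision function. This is only a sketch of the stated conditions: the `Action` names and the `next_action` signature are hypothetical, and the case where the buffer is not full while a high priority transaction is also waiting is not covered by the abstract, so the sketch assumes the low priority transaction simply continues.

```python
from enum import Enum, auto


class Action(Enum):
    SEND_HIGH = auto()             # forward a high priority transaction
    SEND_LOW = auto()              # keep sending the low priority transaction
    PAUSE_LOW = auto()             # hold the low priority transaction in place
    PREEMPT_LOW_FOR_HIGH = auto()  # stop low priority work, process high priority
    IDLE = auto()                  # nothing to do


def next_action(low_in_progress, buffer_full, high_available):
    """Choose what the stage does next from the three observed conditions."""
    if low_in_progress:
        if not buffer_full:
            return Action.SEND_LOW  # full signal de-asserted (assumed to win)
        if high_available:
            return Action.PREEMPT_LOW_FOR_HIGH
        return Action.PAUSE_LOW
    return Action.SEND_HIGH if high_available else Action.IDLE
```

Note that only low priority traffic is gated on the full signal in this reading; the abstract describes high priority transactions as being sent to the buffer without that check.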

Neural network internal data fast access memory buffer
11436486 · 2022-09-06

Systems, apparatuses, and methods for optimizing neural network training with a first-in, last-out (FILO) buffer are disclosed. A processor executes a training run of a neural network implementation by performing multiple passes and adjusting weights of the neural network layers on each pass. Each training phase includes a forward pass and a backward pass. During the forward pass, each layer, in order from first layer to last layer, stores its weights in the FILO buffer. An error is calculated for the neural network at the end of the forward pass. Then, during the backward pass, each layer, in order from last layer to first layer, retrieves the corresponding weights from the FILO buffer. Gradients are calculated based on the error so as to update the weights of the layer for the next pass through the neural network.
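
The reason a first-in, last-out buffer fits this access pattern is that the backward pass visits layers in exactly the reverse of the forward order. The toy sketch below shows only that ordering; the `FiloBuffer` class, the function names, and the placeholder weight values are assumptions, and no real gradient computation is attempted.

```python
class FiloBuffer:
    """First-in, last-out buffer backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, layer_index, weights):
        self._items.append((layer_index, weights))

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def forward_pass(layer_weights, filo):
    # Layers run first to last; each stores its weights as it goes.
    for index, weights in enumerate(layer_weights):
        filo.push(index, weights)


def backward_pass(num_layers, filo):
    # Layers run last to first; each pop returns exactly the layer needed next.
    visited = []
    for expected in reversed(range(num_layers)):
        index, weights = filo.pop()
        assert index == expected
        visited.append((index, weights))
    return visited


# Hypothetical three-layer network with placeholder weights.
layer_weights = [[0.1, 0.2], [0.3], [0.4, 0.5]]
filo = FiloBuffer()
forward_pass(layer_weights, filo)
visited = backward_pass(len(layer_weights), filo)
```

Since every retrieval is from the top of the stack, no addresses or layer lookups are needed during the backward pass, and the buffer is empty again when the pass completes.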