G06F2212/254

ON-DEMAND SHARED DATA CACHING METHOD, COMPUTER PROGRAM, AND COMPUTER READABLE MEDIUM APPLICABLE FOR DISTRIBUTED DEEP LEARNING COMPUTING
20230236980 · 2023-07-27

Disclosed are an on-demand shared data caching method, a computer program, and a computer readable medium applicable to distributed deep learning computing. The method includes a step of dynamically building a distributed shared memory cache space, in which a distributed shared memory deployment and data file access management module is added to a deep learning framework to build the distributed shared memory cache space from a set of memories of multiple computing nodes of a cluster computer; and a distributed deep learning computing step, in which the computing node overrides a Dataset API of the deep learning framework to execute the distributed deep learning computing. When a data file is read, it is accessed directly if it already exists in the distributed shared memory cache space; otherwise, it is obtained from the originally specified directory location and stored in the distributed shared memory cache space.
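The read path described in this abstract can be sketched as a read-through cache behind a dataset wrapper. This is a minimal single-process illustration, not the disclosed implementation: `SharedCache` is a hypothetical stand-in for the distributed shared memory cache space (here just a dictionary), and `CachedDataset` is a hypothetical stand-in for an overridden Dataset API.

```python
from pathlib import Path
import tempfile


class SharedCache:
    """Hypothetical stand-in for the distributed shared memory cache space."""

    def __init__(self):
        self._store = {}   # file path -> file contents
        self.hits = 0
        self.misses = 0

    def read(self, path):
        key = str(path)
        if key in self._store:
            # File already cached: serve it directly from memory.
            self.hits += 1
            return self._store[key]
        # Otherwise read from the original directory location and cache it.
        self.misses += 1
        data = Path(path).read_bytes()
        self._store[key] = data
        return data


class CachedDataset:
    """Hypothetical dataset wrapper whose item access goes through the cache."""

    def __init__(self, paths, cache):
        self.paths = list(paths)
        self.cache = cache

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        return self.cache.read(self.paths[index])


def demo():
    with tempfile.TemporaryDirectory() as directory:
        sample = Path(directory) / "sample.bin"
        sample.write_bytes(b"abc")
        cache = SharedCache()
        dataset = CachedDataset([sample], cache)
        first = dataset[0]    # miss: loaded from disk, then cached
        second = dataset[0]   # hit: served from the cache
        return first, second, cache.hits, cache.misses
```

In a real cluster the dictionary would be replaced by memory contributed by several nodes; the hit-or-fill logic in `read` is the only part this sketch is meant to show.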

Parallel processing device
11526432 · 2022-12-13

There is provided a parallel processing device which allows consecutive parallel data processing to be performed. The parallel processing device includes: a plurality of addition units configured to selectively receive input data from among the output data of a plurality of input units, according to configuration values for each of the addition units, and to perform addition operations on the input data in parallel; and a plurality of delay units configured to delay input data by one cycle. Each of the delay units delays output data from each of the addition units and outputs the delayed output data to each of the input units.

Retrieving data in a storage network
11513685 · 2022-11-29

A method for execution by a dispersed storage and task (DST) client module includes issuing a read threshold number of read slice requests to storage units of a set of storage units. One or more encoded slices of a selected read threshold number of encoded slices are received. When a next encoded data slice of a decode threshold number of encoded data slices is received within a response timeframe, outputting of the next encoded data slice is initiated. When the next encoded data slice is not received within the response timeframe, receiving of another decode threshold number of encoded slices of the set of encoded slices is facilitated. The other decode threshold number of encoded slices are decoded to produce recovered encoded data slices, where the recovered encoded data slices include at least a recovered next encoded data slice. Outputting of the recovered next encoded data slice is initiated.

Generating Recovered Data in a Storage Network
20230056072 · 2023-02-23

A storage network operates by: issuing a read threshold number of read slice requests to storage units of a set of storage units, where the read threshold number of read slice requests identifies a read threshold number of encoded slices of a set of encoded slices corresponding to a data segment; when one or more other encoded data slices of the read threshold number of encoded slices are not received within a time threshold, facilitating receiving a decode threshold number of encoded slices of the set of encoded slices; decoding the decode threshold number of encoded slices to produce recovered encoded data slices, wherein a number of the recovered encoded data slices corresponds to the read threshold number minus a number of the encoded slices received within the time threshold; and outputting the recovered encoded data slices and the encoded slices of the read threshold number of encoded slices received within the time threshold.
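The retrieval flow in the two abstracts above can be illustrated with a toy erasure code. The abstracts do not name a coding scheme, so this sketch assumes a simple XOR parity code (two data slices plus one parity slice, giving a decode threshold of 2) purely for illustration; `encode`, `read_segment`, and the `timed_out` set are hypothetical names.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode(segment):
    """Split an even-length segment into slices 0 and 1, plus XOR parity slice 2."""
    half = len(segment) // 2
    s0, s1 = segment[:half], segment[half:]
    return {0: s0, 1: s1, 2: xor_bytes(s0, s1)}


def read_segment(storage, timed_out, read_ids=(0, 1), decode_threshold=2):
    """Return (segment, number_of_recovered_slices).

    storage:   slice id -> slice bytes held by the storage units
    timed_out: ids of slices that do not arrive within the time threshold
    """
    # Slices of the read threshold number that arrive in time are used as-is.
    received = {i: storage[i] for i in read_ids if i not in timed_out}
    missing = [i for i in read_ids if i not in received]
    recovered = {}
    if missing:
        # Gather a decode threshold number of slices from responsive units.
        available = {i: s for i, s in storage.items() if i not in timed_out}
        if len(available) < decode_threshold:
            raise RuntimeError("fewer than a decode threshold of slices available")
        (a, slice_a), (b, slice_b) = list(available.items())[:2]
        # With XOR parity, any two slices determine the third.
        third = ({0, 1, 2} - {a, b}).pop()
        recovered = {third: xor_bytes(slice_a, slice_b)}
    slices = {**received, **recovered}
    return b"".join(slices[i] for i in read_ids), len(recovered)
```

For example, `read_segment(encode(b"abcdef"), timed_out={1})` rebuilds slice 1 from slice 0 and the parity slice, so the number of recovered slices (1) equals the read threshold (2) minus the slices received in time (1).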

SYSTEMS, METHODS, AND APPARATUS FOR TRANSFERRING DATA BETWEEN INTERCONNECTED DEVICES

A method for transferring data may include writing, from a producing device, data to a storage device through an interconnect, determining a consumer device for the data, prefetching the data from the storage device, and transferring, based on the determining, the data to the consumer device through the interconnect. The method may further comprise receiving, at a prefetcher for the storage device, an indication of a relationship between the producing device and the consumer device, and determining the consumer device based on the indication. The method may further comprise placing the data in a stream at the storage device based on the relationship between the producing device and the consumer device. The indication may be provided by an application associated with the consumer device. Receiving the indication may include receiving the indication through a coherent memory protocol for the interconnect.
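As a rough illustration of the flow in this abstract, the sketch below models a storage device whose prefetcher is told which consumer is related to which producer, and which forwards newly written data to that consumer. All class and method names (`StorageDevice`, `indicate`, `Consumer.receive`) are hypothetical; the interconnect and the coherent memory protocol are not modeled.

```python
class Consumer:
    """Hypothetical consumer device that receives pushed data."""

    def __init__(self):
        self.inbox = {}

    def receive(self, key, data):
        self.inbox[key] = data


class StorageDevice:
    """Hypothetical storage device with a relationship-aware prefetcher."""

    def __init__(self):
        self.blocks = {}
        self.consumer_of = {}   # producer name -> consumer (the "indication")

    def indicate(self, producer, consumer):
        # Indication of a producer/consumer relationship, e.g. from an application.
        self.consumer_of[producer] = consumer

    def write(self, producer, key, data):
        self.blocks[key] = data
        # Determine the consumer from the recorded relationship, if any.
        consumer = self.consumer_of.get(producer)
        if consumer is not None:
            # Prefetch the stored data and transfer it to the consumer.
            consumer.receive(key, self.blocks[key])


device = StorageDevice()
gpu = Consumer()
device.indicate("cpu0", gpu)
device.write("cpu0", "batch-1", b"payload")   # forwarded to gpu
device.write("cpu1", "batch-2", b"other")     # no known consumer, stored only
```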

Techniques for configuring parallel processors for different application domains

In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.

PARALLEL PROCESSING DEVICE
20230071941 · 2023-03-09

A parallel processing device includes: a plurality of memories configured to output a plurality of pieces of memory output data respectively; a plurality of input units configured to output a plurality of pieces of input unit output data respectively; a plurality of addition units configured to receive the plurality of pieces of input unit output data, perform a parallel processing function and a data path configuration function according to a plurality of configuration values, and output a plurality of pieces of addition unit output data; and a plurality of delay units configured to delay the plurality of pieces of addition unit output data according to a clock signal, and output the plurality of pieces of delay data respectively. The plurality of pieces of input unit output data are selected from the plurality of pieces of memory output data and a plurality of pieces of delay data respectively.
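One way to picture the datapath in this abstract is a per-cycle function in which each input unit selects either memory output data or delay data, each addition unit sums the input-unit outputs named by its configuration value, and the results become the delay data for the next cycle. The encoding of the configuration values below (a tuple of input indices per addition unit, and a boolean selector per input unit) is an assumption made for this sketch only.

```python
def cycle(memory_out, delay_data, use_memory, adder_config):
    """Simulate one clock cycle and return the new addition-unit outputs."""
    # Each input unit outputs memory data or the previous cycle's delay data.
    input_out = [m if sel else d
                 for m, d, sel in zip(memory_out, delay_data, use_memory)]
    # Each addition unit adds the input-unit outputs selected by its config.
    return [sum(input_out[j] for j in cfg) for cfg in adder_config]


def tree_sum(values):
    """Sum four values in two cycles by reconfiguring the adders each cycle."""
    n = len(values)
    delay = [0] * n
    # Cycle 1: input units pass memory data; adders 0 and 1 form pair sums.
    delay = cycle(values, delay, [True] * n, [(0, 1), (2, 3), (), ()])
    # Cycle 2: input units pass the delayed pair sums; adder 0 combines them.
    delay = cycle(values, delay, [False] * n, [(0, 1), (), (), ()])
    return delay[0]
```

Because the delay units feed results back to the input units, a new configuration can be applied on every cycle without writing intermediate sums back to memory, which is what makes the consecutive processing in `tree_sum` possible.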

GLOBAL CAPABILITIES TRANSFERRABLE ACROSS NODE BOUNDARIES
20170371663 · 2017-12-28

Example implementations relate to global capabilities transferrable across node boundaries. For example, in an implementation, a switch that routes traffic between a node and global memory may receive an instruction from the node. The switch may recognize that data referenced by the instruction is a global capability, and the switch may process that global capability accordingly.

Multi-channel communications between controllers in a storage system
11681640 · 2023-06-20

Enabling multi-channel communications between controllers in a storage array, including: creating a plurality of logical communications channels between two or more storage array controllers; inserting, into a buffer utilized by a direct memory access (‘DMA’) engine of a first storage array controller, a data transfer descriptor describing data stored in memory of the first storage array controller and a location to write the data to memory of a second storage array controller; retrieving, in dependence upon the data transfer descriptor, the data stored in memory of the first storage array controller; and writing, via a predetermined logical communications channel, the data into the memory of the second storage array controller in dependence upon the data transfer descriptor.
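The descriptor-driven transfer in this abstract can be sketched in a few lines. This is a software model under stated assumptions, not the claimed hardware: `TransferDescriptor`, `Controller`, and `run_dma` are hypothetical names, controller memory is a `bytearray`, and a logical channel is represented only by a per-channel byte counter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class TransferDescriptor:
    src_offset: int   # where the data sits in the first controller's memory
    length: int       # number of bytes to move
    dst_offset: int   # where to write in the second controller's memory
    channel: int      # predetermined logical communications channel


class Controller:
    def __init__(self, size):
        self.memory = bytearray(size)
        self.dma_buffer = deque()   # buffer used by the DMA engine


def run_dma(src, dst, channels):
    """Drain the source controller's DMA buffer, one descriptor at a time."""
    while src.dma_buffer:
        d = src.dma_buffer.popleft()
        # Retrieve the data described by the descriptor.
        data = bytes(src.memory[d.src_offset:d.src_offset + d.length])
        # Write it to the destination location named in the descriptor.
        dst.memory[d.dst_offset:d.dst_offset + d.length] = data
        channels[d.channel] += d.length   # bytes carried per logical channel


first, second = Controller(16), Controller(16)
first.memory[0:4] = b"DATA"
first.dma_buffer.append(TransferDescriptor(0, 4, 8, channel=1))
channel_bytes = [0, 0]
run_dma(first, second, channel_bytes)
```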

In-flight packet processing

A method for supporting in-flight packet processing is provided. Packet processing devices (microengines) can send a request for packet processing to a packet engine before a packet comes in. The request offers a twofold benefit. First, the microengines add themselves to a work queue to request processing. Once the packet becomes available, the header portion is automatically provided to the corresponding microengine for packet processing. Only one bus transaction is needed for the microengines to start packet processing. Second, the microengines can process packets before the entire packet is written into the memory. This is especially useful for large packets because the packets do not have to be written into the memory completely before the microengines process them.
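The work-queue behavior described here can be modeled as follows. The sketch assumes a fixed header length and uses hypothetical names (`PacketEngine`, `request_work`, `on_packet_start`); it only shows that a microengine queued in advance receives the header as soon as the first bytes of a packet arrive, before the full packet is stored.

```python
from collections import deque


class PacketEngine:
    """Hypothetical packet engine with a work queue of waiting microengines."""

    def __init__(self, header_len=4):
        self.work_queue = deque()
        self.header_len = header_len   # assumed fixed header size

    def request_work(self, engine_id):
        # A microengine asks for work before any packet has arrived.
        self.work_queue.append(engine_id)

    def on_packet_start(self, first_bytes):
        """Called when the start of a packet arrives; the rest may still be in flight."""
        if not self.work_queue:
            return None   # no microengine is waiting
        engine_id = self.work_queue.popleft()
        # Hand the header portion to the waiting microengine right away.
        return engine_id, first_bytes[:self.header_len]


engine = PacketEngine()
engine.request_work("me0")
engine.request_work("me1")
assignment = engine.on_packet_start(b"HDR1payload-still-arriving")
```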