G06F12/0842

LOW LATENCY INTER-CHIP COMMUNICATION MECHANISM IN A MULTI-CHIP PROCESSING SYSTEM

Systems and methods of multi-chip processing with low latency and reduced congestion. In a multi-chip processing system, each chip includes a plurality of clusters arranged in a mesh design. A respective interconnect controller is disposed at the end of each column, and the column is linked to a corresponding remote column in the other chip. A shared cache controller in the column is paired with a corresponding cache controller in the remote column, and the pair of cache controllers is configured to control data caching for the same set of main memory locations. Communications between cross-chip cache controllers are performed within linked columns of clusters via the column-specific inter-chip interconnect controllers.
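
As a rough illustration of the column-paired scheme, the Python sketch below models how a physical address could map to a home column, so that a cross-chip request travels only through that column's inter-chip interconnect controller. The column count, cache line size, and all function names are illustrative assumptions, not details from the abstract.

```python
# Minimal sketch of the column-paired cache-controller mapping described
# above. All names and parameters here are assumptions for illustration.

NUM_COLUMNS = 8          # columns of clusters per chip (assumed)
CACHE_LINE_BYTES = 64    # cache line size (assumed)

def home_column(phys_addr: int) -> int:
    """Hash a physical address to the column whose paired cache
    controllers (one per chip) own that set of memory locations."""
    line = phys_addr // CACHE_LINE_BYTES
    return line % NUM_COLUMNS

def route(phys_addr: int, local_chip: int, owner_chip: int) -> str:
    """Decide whether a request stays on-chip or crosses the
    column-specific inter-chip interconnect controller."""
    col = home_column(phys_addr)
    if local_chip == owner_chip:
        return f"local cache controller, column {col}"
    # Cross-chip traffic never leaves its column: it goes through the
    # interconnect controller at the end of that same column.
    return f"remote cache controller via column-{col} interconnect"

print(route(0x1FC0, local_chip=0, owner_chip=1))
```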

Disaggregated system domain

An approach is disclosed that configures a computer system node from components that are each connected to an intra-node network. The configuring is performed by selecting a set of components, including at least one processor, and assigning each of the components a different address range within the node. An operating system is run on the processor included in the node with the operating system accessing each of the assigned components.
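
For illustration, here is a minimal Python sketch of composing such a node by assigning each selected component a non-overlapping address range on the intra-node network; the component fields and sizes are hypothetical.

```python
# Illustrative sketch of configuring a node from disaggregated components,
# each given a distinct address range. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class Component:
    kind: str   # "processor", "memory", "accelerator", ...
    size: int   # bytes of address space the component needs

def compose_node(components: list[Component], base: int = 0x0) -> dict:
    """Assign each selected component a different, non-overlapping
    address range within the node."""
    node_map, cursor = {}, base
    for i, comp in enumerate(components):
        node_map[f"{comp.kind}{i}"] = (cursor, cursor + comp.size - 1)
        cursor += comp.size
    return node_map

node = compose_node([Component("processor", 0x1000),
                     Component("memory", 0x100000),
                     Component("accelerator", 0x8000)])
for name, (lo, hi) in node.items():
    print(f"{name}: {lo:#x}-{hi:#x}")
```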

Allocation of spare cache reserved during non-speculative execution and speculative execution
11561903 · 2023-01-24

A cache system having cache sets, a connection to a line identifying an execution type, a connection to a line identifying a status of speculative execution, and a logic circuit that can: allocate a first subset of the cache sets when the execution type is a first type indicating non-speculative execution; allocate a second subset when the execution type changes from the first type to a second type indicating speculative execution; and reserve at least one cache set when the execution type is the second type. When the execution type changes from the second type back to the first type and the status of speculative execution indicates that a result of speculative execution is to be accepted, the logic circuit can reconfigure the second subset for use while the execution type is the first type, and allocate the at least one reserved cache set when the execution type next changes from the first type to the second type.
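
A behavioral Python sketch of this allocation logic follows; the subset sizes and the set-rotation policy on accepted speculation are assumptions made for illustration, not details from the patent.

```python
# Behavioral model of the spare-cache allocation logic described above.
# Subset sizes and the rotation policy on acceptance are assumptions.

NON_SPEC, SPEC = "non-speculative", "speculative"

class SpareCacheAllocator:
    def __init__(self, num_sets=8, num_spare=2):
        self.main = list(range(num_sets - num_spare))   # first subset
        self.spare = list(range(num_sets - num_spare, num_sets))
        self.spec = []                                  # second subset
        self.mode = NON_SPEC

    def enter_speculation(self):
        # Execution type changes to the second type: allocate the
        # reserved spare sets as the second subset for speculative data.
        self.spec, self.mode = list(self.spare), SPEC

    def exit_speculation(self, accept: bool):
        self.mode = NON_SPEC
        if accept:
            # Result accepted: reconfigure the second subset for normal
            # use and retire an equal number of old sets as new spares
            # (assumed rotation policy).
            retired = self.main[:len(self.spec)]
            self.main = self.main[len(self.spec):] + self.spec
            self.spare = retired
        # If rejected, speculative contents are simply discarded and
        # the same sets remain reserved as spares.
        self.spec = []

alloc = SpareCacheAllocator()
alloc.enter_speculation()
alloc.exit_speculation(accept=True)
print(alloc.main, alloc.spare)
```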

Using a track format code in a cache control block for a track in a cache to process read and write requests to the track in the cache

Provided are a computer program product, system, and method for using a track format code in a cache control block for a track in a cache to process read and write requests to the track in the cache. A track format table associates track format codes with track format metadata. A determination is made as to whether the track format table has track format metadata matching the track format metadata of a track staged into the cache. If so, the track format code associated with the matching track format metadata is determined from the track format table. A cache control block for the track being added to the cache is generated including the determined track format code when the track format table has the matching track format metadata.
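
As an illustration, the Python sketch below models the table lookup and control-block generation; the field names and the fallback of carrying full metadata when no code matches are assumptions, not claims from the patent.

```python
# Sketch of the track-format-code scheme described above; field names
# and the no-match fallback policy are assumptions.

track_format_table: dict[int, bytes] = {}  # code -> track format metadata

def lookup_code(metadata: bytes):
    """Return the track format code whose table entry matches the
    staged track's metadata, or None if there is no match."""
    for code, meta in track_format_table.items():
        if meta == metadata:
            return code
    return None

def make_cache_control_block(track_id: int, metadata: bytes) -> dict:
    code = lookup_code(metadata)
    if code is not None:
        # Matching entry: the control block carries only the compact
        # code, so reads/writes recover the format without re-parsing.
        return {"track": track_id, "format_code": code}
    # No match: fall back to carrying the full metadata (assumed policy).
    return {"track": track_id, "format_metadata": metadata}

track_format_table[0] = b"\x01\x10"   # code 0 -> some format metadata
print(make_cache_control_block(42, b"\x01\x10"))
```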

Frozen time cache for multi-host read operations

Aspects of a storage device including a memory and a controller are provided. The controller may receive a prefetch request to retrieve data for a host having a promoted stream. The controller may access a frozen time table indicating hosts for which data has been prefetched and the frozen times associated with the host and other hosts. The controller can determine whether the host has a higher priority than the other hosts included in the frozen time table based on the corresponding frozen times and data access parameters associated with the host. The controller may determine to prefetch the data for the host in response to the prefetch request when the host has a higher priority than the other hosts. The controller can receive a host read command associated with the promoted stream from the host and provide the prefetched data to the host in response to the host read command.
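
A minimal Python sketch of one plausible priority check follows; the abstract does not specify how frozen times and access parameters are combined, so the tuple comparison used here is purely an assumption.

```python
# Sketch of a frozen-time priority check for servicing prefetch
# requests. The priority rule (older frozen time, then higher access
# count, wins) is an assumed policy, not taken from the patent.

import time

frozen_time_table = {}   # host_id -> time its prefetched data was frozen
access_counts = {}       # host_id -> data access parameter (assumed)

def should_prefetch(host_id: str) -> bool:
    """Decide whether to service a prefetch request for a host with a
    promoted stream, based on frozen times and access parameters."""
    now = time.monotonic()
    mine = (now - frozen_time_table.get(host_id, now),
            access_counts.get(host_id, 0))
    for other, frozen in frozen_time_table.items():
        if other == host_id:
            continue
        theirs = (now - frozen, access_counts.get(other, 0))
        if theirs > mine:   # another host currently has higher priority
            return False
    return True

frozen_time_table.update({"hostA": 10.0, "hostB": 5.0})
access_counts.update({"hostA": 3, "hostB": 7})
print(should_prefetch("hostA"))
```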

Data set and node cache-based scheduling method and device

Disclosed is a data set and node cache-based scheduling method, which includes: obtaining storage resource information of each host node; in response to receiving a training task, obtaining operation information of the training task and, according to the operation information and the storage resource information, screening for host nodes that satisfy the space required by the training task; if no host node satisfies the required space, scoring each host node according to the storage resource information; selecting, according to the scoring results, a host node from among all of the host nodes to execute the training task; and obtaining and deleting an obsolete data set cache on the selected host node, and executing the training task on that host node.
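
For illustration, here is a compact Python sketch of this scheduling flow; the scoring metric (free space plus reclaimable obsolete cache) and the record fields are assumptions made for the example.

```python
# Sketch of the data set and node cache-based scheduling flow above.
# The scoring function and the host record fields are assumptions.

def schedule(training_task: dict, hosts: list[dict]) -> dict:
    required = training_task["required_space"]

    # Screen host nodes whose free storage already satisfies the task.
    fitting = [h for h in hosts if h["free_space"] >= required]
    if fitting:
        return fitting[0]

    # No host node fits: score each by how much space reclaiming its
    # obsolete data set cache would free (assumed scoring metric).
    target = max(hosts, key=lambda h: h["free_space"] + h["obsolete_cache"])

    # Delete the obsolete data set cache on the selected host node,
    # then execute the training task there.
    target["free_space"] += target.pop("obsolete_cache")
    return target

host = schedule({"required_space": 100},
                [{"name": "n1", "free_space": 40, "obsolete_cache": 80},
                 {"name": "n2", "free_space": 60, "obsolete_cache": 20}])
print(host["name"])
```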

PARALLEL PROCESSOR OPTIMIZED FOR MACHINE LEARNING
20230008138 · 2023-01-12

A parallel processor system for machine learning includes an arithmetic logic unit (ALU) array including several ALUs and a controller to provide instructions for the ALUs. The system includes a direct memory access (DMA) block containing multiple DMA engines that access an external memory to retrieve data. An input-stream buffer decouples the DMA block from the ALU array and aligns and reorders the retrieved data. The DMA engines operate in parallel and include rasterization logic capable of performing a three-dimensional (3-D) rasterization.
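
As an illustration of the 3-D rasterization such a DMA engine could perform, the Python sketch below generates external-memory addresses for a three-dimensional tile; the strides, dimension order, and generator interface are assumptions, not details from the publication.

```python
# Sketch of 3-D rasterization address generation for a single DMA
# engine fetching a tensor tile from external memory. Strides and the
# generator interface are illustrative assumptions.

def rasterize_3d(base: int, dims: tuple[int, int, int],
                 strides: tuple[int, int, int]):
    """Yield external-memory addresses for a 3-D region, innermost
    dimension first, so the input-stream buffer can align and reorder
    the returned data for the ALU array."""
    dz, dy, dx = dims
    sz, sy, sx = strides
    for z in range(dz):
        for y in range(dy):
            for x in range(dx):
                yield base + z * sz + y * sy + x * sx

# e.g. a 2x2x4 tile of 4-byte elements in a row-major 3-D tensor
addrs = list(rasterize_3d(0x1000, (2, 2, 4), (4096, 256, 4)))
print([hex(a) for a in addrs[:4]])
```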