Patent classifications
G06F15/17318
Partitionable networked computer
A computer including a plurality of processing nodes arranged in two-dimensional arrays in respective front and rear layers. Each processing node has a set of activatable links. When a link is activated, transmission of data items between the nodes connected via that link is enabled; when it is not activated, transmission of data items between those nodes is prevented. The set of activatable links includes a respective link connecting the processing node to each adjacent node in the array and to a facing processing node in the other layer. An allocation engine is configured to receive an allocation instruction and is connected to the processing nodes to selectively activate the links in a configuration.
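The link structure and the allocation engine's activation rule can be sketched as follows. This is a minimal illustration, not the patented implementation: nodes are modelled as (layer, row, column) tuples, and the assumed allocation rule activates exactly those links whose two endpoints both fall inside the allocated partition.

```python
from itertools import product

def build_links(rows, cols):
    """Enumerate the activatable links of a two-layer rows x cols array:
    each node links to its in-array neighbours and to the facing node."""
    links = set()
    for layer, r, c in product((0, 1), range(rows), range(cols)):
        node = (layer, r, c)
        # links to adjacent nodes in the same two-dimensional array
        for dr, dc in ((0, 1), (1, 0)):
            nr, nc = r + dr, c + dc
            if nr < rows and nc < cols:
                links.add(frozenset({node, (layer, nr, nc)}))
        # link to the facing processing node in the other layer
        if layer == 0:
            links.add(frozenset({node, (1, r, c)}))
    return links

def allocate(links, nodes):
    """Allocation instruction (assumed semantics): activate only links
    whose two endpoints both belong to the allocated set of nodes."""
    nodes = set(nodes)
    return {link for link in links if link <= nodes}
```

Allocating only the front layer of a 2x2 array, for example, activates the four in-layer links and leaves all facing links inactive, partitioning the machine by layer.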
COMPUTER SYSTEM AND COMPUTER
A computer system comprising a plurality of computers, each including at least one processor chip with a plurality of processor cores. The at least one processor chip forms a plurality of regions, each constructed from at least one processor core. Each of the plurality of processor cores carries out calculation processing for executing a predetermined program and inter-core communication processing, which is communication between the plurality of processor cores. The computer system comprises: a regulation module which controls a voltage and a frequency supplied to each of the plurality of regions; and a determination module which determines a power mode of each of the plurality of regions and outputs an instruction to the regulation module.
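The division of labour between the determination module and the regulation module can be sketched as below. The mode names and voltage/frequency values are assumptions for illustration; the abstract does not specify them.

```python
# Hypothetical power-mode table (assumed values, not from the patent).
POWER_MODES = {
    "compute": {"voltage_mv": 900, "freq_mhz": 2000},  # calculation-heavy region
    "comms":   {"voltage_mv": 750, "freq_mhz": 1200},  # inter-core communication
    "idle":    {"voltage_mv": 600, "freq_mhz": 400},
}

class RegulationModule:
    """Controls the voltage and frequency supplied to each region."""
    def __init__(self):
        self.settings = {}
    def apply(self, region, voltage_mv, freq_mhz):
        self.settings[region] = (voltage_mv, freq_mhz)

class DeterminationModule:
    """Determines the power mode of each region and instructs the regulator."""
    def __init__(self, regulator):
        self.regulator = regulator
    def set_mode(self, region, mode):
        cfg = POWER_MODES[mode]
        self.regulator.apply(region, cfg["voltage_mv"], cfg["freq_mhz"])
```

The split keeps policy (which mode a region should be in) separate from mechanism (driving the voltage/frequency rails), mirroring the two claimed modules.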
Communication in a Computer Having Multiple Processors
A computer comprising a plurality of processors, each of which is configured to perform operations on data during a compute phase for the computer and, following a pre-compiled synchronisation barrier, exchange data with at least one other of the processors during an exchange phase for the computer, wherein each of the processors in the computer is indexed and the data exchange operations carried out by each processor in the exchange phase depend upon its index value.
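The compute/barrier/exchange pattern with index-dependent exchange can be sketched with threads. The choice of exchange rule here (each processor sends to its index-plus-one neighbour, modulo the number of processors) is an assumption for illustration; the abstract only requires that the exchange depend on the index.

```python
from threading import Barrier, Thread

def run_bsp(num_procs):
    """Compute phase, then a synchronisation barrier, then an exchange
    phase in which processor i sends to processor (i + 1) % num_procs."""
    barrier = Barrier(num_procs)
    outbox = [None] * num_procs
    inbox = [None] * num_procs

    def worker(index):
        outbox[index] = index * index        # compute phase (illustrative op)
        barrier.wait()                       # pre-compiled synchronisation barrier
        dest = (index + 1) % num_procs       # exchange depends on index value
        inbox[dest] = outbox[index]

    threads = [Thread(target=worker, args=(i,)) for i in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inbox
```

Because every processor derives its destination from its own index, no runtime arbitration is needed: the exchange schedule is fixed at compile time.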
APPARATUS AND METHOD TO PERFORM ALL-TO-ALL COMMUNICATION WITHOUT PATH CONFLICT IN A NETWORK INCLUDING PLURAL TOPOLOGICAL STRUCTURES
An apparatus stores connection information indicating connection relationship among topological structures in a network, in which first-type topological structures are coupled to second-type topological structures. The apparatus stores first transfer-patterns each indicating a combination of input and output ports for performing all-to-all communication without path conflict in each of the first-type topological structures, and second transfer-patterns each indicating a combination of input and output ports for performing all-to-all communication without path conflict in each of the second-type topological structures. The apparatus identifies paths from transmission sources to transmission destinations for a combination of the first and second transfer-patterns, and determines, based on the identified paths, a transfer-pattern with which to perform all-to-all communication without path conflict from the transmission sources to the transmission destinations, and determines output ports in each of the first- and second-type topological structures, corresponding to the identified paths.
Memory mapping in a processor having multiple programmable units
The disclosure includes, in general, among other aspects, an apparatus having multiple programmable units integrated within a processor. The apparatus has circuitry to map addresses in a single address space to resources within the multiple programmable units where the single address space includes addresses for different ones of the resources in different ones of the multiple programmable units and where there is a one-to-one correspondence between respective addresses in the single address space and resources within the multiple programmable units.
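A one-to-one mapping between a single flat address space and per-unit resources can be sketched arithmetically. The fixed resources-per-unit count is an assumption; the disclosure does not state one.

```python
RESOURCES_PER_UNIT = 256  # assumed resource count per programmable unit

def decode(address):
    """Map an address in the single address space to a (unit, resource) pair."""
    return divmod(address, RESOURCES_PER_UNIT)

def encode(unit, resource):
    """Inverse mapping; together with decode this gives the one-to-one
    correspondence between addresses and per-unit resources."""
    return unit * RESOURCES_PER_UNIT + resource
```

Since encode and decode are exact inverses, every address names exactly one resource in exactly one programmable unit, and vice versa.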
Interconnect Distributed Virtual Memory Message Preemptive Responding
Aspects include computing devices, apparatus, and methods for accelerating distributive virtual memory (DVM) message processing in a computing device. DVM message interceptors may be positioned in various locations within a DVM network of a computing device so that DVM messages may be intercepted before reaching certain DVM destinations. A DVM message interceptor may receive a broadcast DVM message from first DVM source. The DVM message interceptor may determine whether a preemptive DVM message response should be returned to the DVM source on behalf of the DVM destination. When certain criteria are met, the DVM message interceptor may generate a preemptive DVM message response to the broadcast DVM message, and send the preemptive DVM message response to the DVM source.
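The interceptor's decision can be sketched as below. The criterion used here (the destination is known to be idle, so a response can safely be issued on its behalf) is an assumed example; the abstract leaves the criteria unspecified.

```python
class DvmInterceptor:
    """Intercepts broadcast DVM messages before they reach a destination."""
    def __init__(self, idle_destinations):
        # Assumed criterion: destinations known to be idle may be
        # answered preemptively on their behalf.
        self.idle = set(idle_destinations)

    def handle(self, message):
        """Return a preemptive DVM response to the source when the
        criterion is met, else None (the message is forwarded normally)."""
        if message["dest"] in self.idle:
            return {"type": "response", "to": message["src"], "preemptive": True}
        return None
```

Responding at the interceptor spares the source from waiting on a destination that has nothing to invalidate, which is the acceleration the aspect targets.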
Autonomous memory architecture
An autonomous memory device in a distributed memory sub-system can receive a database downloaded from a host controller. The autonomous memory device can pass configuration routing information and initiate instructions to disperse portions of the database to neighboring die using an interface that handles inter-die communication. Information is then extracted from the pool of autonomous memory and passed through a host interface to the host controller.
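The dispersal step can be sketched as below. The even split of the database across the local die and its neighbours is an assumed policy, and the inter-die interface is modelled as a direct attribute write for brevity.

```python
class AutonomousMemoryDie:
    """One die in the autonomous memory pool."""
    def __init__(self, die_id):
        self.die_id = die_id
        self.portion = None
        self.neighbors = []   # dies reachable over the inter-die interface

    def load(self, database):
        """Keep one portion of the downloaded database and disperse the
        remaining portions to neighbouring dies (assumed even split)."""
        chunk = len(database) // (len(self.neighbors) + 1)
        self.portion = database[:chunk]
        rest = database[chunk:]
        for i, neighbor in enumerate(self.neighbors):
            neighbor.portion = rest[i * chunk:(i + 1) * chunk]
```

The host only talks to one die; the configuration routing and dispersal happen autonomously inside the pool.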
Method for implementing a line speed interconnect structure
A method for line speed interconnect processing. The method includes receiving initial inputs from an input communications path, performing a pre-sorting of the initial inputs by using a first stage interconnect parallel processor to create intermediate inputs, and performing the final combining and splitting of the intermediate inputs by using a second stage interconnect parallel processor to create resulting outputs. The method further includes transmitting the resulting outputs out of the second stage at line speed.
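The two-stage flow can be sketched functionally: a first stage pre-sorts initial inputs into intermediate buckets, and a second stage combines and splits them into per-destination outputs. Modelling each input as a (destination, payload) pair is an assumption for illustration.

```python
def stage_one_presort(initial_inputs, num_buckets):
    """First-stage interconnect: pre-sort initial inputs into
    intermediate buckets keyed by destination."""
    buckets = [[] for _ in range(num_buckets)]
    for dest, payload in initial_inputs:
        buckets[dest % num_buckets].append((dest, payload))
    return buckets

def stage_two_combine_split(buckets):
    """Second-stage interconnect: combine each bucket and split the
    result per destination to form the outputs."""
    outputs = {}
    for bucket in buckets:
        for dest, payload in bucket:
            outputs.setdefault(dest, []).append(payload)
    return outputs
```

Because the first stage has already grouped traffic by destination, the second stage's combining and splitting is a local pass over each bucket, which is what lets the pipeline keep up with line speed.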
NETWORK-ON-CHIP DATA PROCESSING METHOD AND DEVICE
The present application relates to a network-on-chip data processing method. The method is applied to a network-on-chip processing system, the network-on-chip processing system is used for executing machine learning calculation, and the network-on-chip processing system comprises a storage device and a calculation device. The method comprises: accessing the storage device in the network-on-chip processing system by means of a first calculation device in the network-on-chip processing system, and obtaining first operation data; performing an operation on the first operation data by means of the first calculation device to obtain a first operation result; and sending the first operation result to a second calculation device in the network-on-chip processing system. According to the method, operation overhead can be reduced and data read/write efficiency can be improved.
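The access-compute-send sequence can be sketched as below. The operation performed (a sum) and the object model are assumptions for illustration; the method only specifies reading operation data, computing a result, and sending it onward.

```python
class StorageDevice:
    """The storage device in the network-on-chip processing system."""
    def __init__(self, data):
        self.data = data
    def read(self, key):
        return self.data[key]

class CalculationDevice:
    """A calculation device that reads operands, computes, and forwards."""
    def __init__(self, storage):
        self.storage = storage
        self.received = []

    def compute_and_send(self, key, other):
        operand = self.storage.read(key)   # access storage: first operation data
        result = sum(operand)              # perform an operation (assumed: sum)
        other.received.append(result)      # send result to the second device
        return result
```

Forwarding the result device-to-device avoids a round trip through storage, which is where the claimed reduction in operation overhead comes from.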
TOPOLOGIES AND ALGORITHMS FOR MULTI-PROCESSING UNIT INTERCONNECTED ACCELERATOR SYSTEMS
An accelerator system can include one or more clusters of eight processing units. The processing units can include seven communication ports. Each cluster of eight processing units can be organized into two subsets of four processing units. Each processing unit can be coupled to each of the other processing units in the same subset by a respective set of two bi-directional communication links. Each processing unit can also be coupled to a corresponding processing unit in the other subset by a respective single bi-directional communication link. Input data can be divided into one or more groups of four subsets of data. Each processing unit can be configured to sum corresponding subsets of the input data received on the two bi-directional communication links from the other processing units in the same subset with the input data of the respective processing unit to generate a respective set of intermediate data. Each processing unit can be configured to sum a corresponding set of intermediate data received on the one bi-directional communication link from the corresponding processing unit in the other subset with the intermediate data of the respective processing unit to generate respective sum data. Each processing unit can be configured to broadcast the sum data of the respective processing unit to the other processing units in the same subset on the respective sets of two bi-directional communication links.
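The three-step reduction over the 4+4 topology can be sketched numerically. For brevity each unit's data subset is collapsed to a single number, and the per-link message passing is replaced by direct sums; the step structure (sum within subset, sum across the pairing link, broadcast within subset) follows the abstract.

```python
def cluster_allreduce(inputs):
    """inputs: 8 numbers, one per processing unit; units 0-3 form subset A,
    units 4-7 form subset B, and unit i pairs with unit i + 4."""
    assert len(inputs) == 8
    # Step 1: each unit sums the data of its own subset (received over the
    # two bi-directional links per in-subset peer) -> intermediate data.
    inter = [sum(inputs[0:4])] * 4 + [sum(inputs[4:8])] * 4
    # Step 2: each unit adds the intermediate data received over the single
    # link from its paired unit in the other subset -> sum data.
    sums = [inter[i] + inter[(i + 4) % 8] for i in range(8)]
    # Step 3: broadcast the sum data within the subset (already identical
    # here because every unit computed the same total).
    return sums
```

After step 2 every unit holds the full-cluster sum, so the in-subset broadcast of step 3 completes the all-reduce with each unit's heaviest traffic kept on its doubled in-subset links.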