Patent classifications
G06F15/17312
Multiple dies hardware processors and methods
- Nevine Nassif
- Yen-Cheng Liu
- Krishnakanth V. Sistla
- Gerald Pasdast
- Siva Soumya Eachempati
- Tejpal Singh
- Ankush Varma
- Mahesh K. Kumashikar
- Srikanth Nimmagadda
- Carleton L. Molnar
- Vedaraman Geetha
- Jeffrey D. Chamberlain
- William R. Halleck
- George Z. Chrysos
- John R. Ayers
- Dheeraj R. Subbareddy
Methods and apparatuses relating to hardware processors with multiple interconnected dies are described. In one embodiment, a hardware processor includes a plurality of physically separate dies, and an interconnect to electrically couple the plurality of physically separate dies together. In another embodiment, a method to create a hardware processor includes providing a plurality of physically separate dies, and electrically coupling the plurality of physically separate dies together with an interconnect.
Method for deploying a task in a supercomputer, method for implementing a task in a supercomputer, corresponding computer program and supercomputer
A method for deploying a task includes allocating nodes to the task; determining, in the network, a subnetwork for interconnecting the allocated nodes that satisfies one or more predefined determination criteria, including a first criterion according to which the subnetwork uses only links that are not allocated to any other already-deployed task or that are allocated to fewer than N other already-deployed tasks, N being a predefined number greater than or equal to one; allocating the subnetwork, and in particular the links belonging to it, to the task; and implementing inter-node communication routes in the allocated subnetwork.
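The first criterion can be illustrated with a minimal sketch. It assumes the network is given as a list of undirected links and that a per-link count of already-deployed tasks is available; the names `find_subnetwork`, `usage`, and `n_max` are hypothetical, and the breadth-first search is only one simple way to pick a subnetwork, not necessarily the one the patent uses.

```python
from collections import deque

def find_subnetwork(links, usage, allocated_nodes, n_max):
    """Return a set of links connecting all allocated nodes, or None.

    links: list of (a, b) tuples, each an undirected link (assumed format).
    usage: dict mapping a link to the number of deployed tasks using it.
    n_max: the predefined number N; a link is usable if its count is < N.
    """
    adjacency = {}
    for link in links:
        if usage.get(link, 0) >= n_max:
            continue  # link is already shared by too many tasks
        a, b = link
        adjacency.setdefault(a, []).append((b, link))
        adjacency.setdefault(b, []).append((a, link))

    targets = list(allocated_nodes)
    root = targets[0]
    parent = {root: None}  # node -> (previous node, link used to reach it)
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour, link in adjacency.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = (node, link)
                queue.append(neighbour)

    if any(t not in parent for t in targets):
        return None  # no subnetwork satisfies the criterion

    # Keep only the links on the paths from each allocated node to the root.
    subnetwork = set()
    for node in targets:
        while parent[node] is not None:
            node, link = parent[node][0], parent[node][1]
            subnetwork.add(link)
    return subnetwork

links = [("n1", "s1"), ("n2", "s1"), ("s1", "s2"),
         ("n3", "s2"), ("n1", "s3"), ("s3", "n3")]
usage = {("s1", "s2"): 1}  # one deployed task already uses this link
subnet = find_subnetwork(links, usage, ["n1", "n3"], n_max=1)
```

With `n_max=1` the search must avoid the link between `s1` and `s2`, so the route through `s3` is selected. Allocating the result to the task would then amount to incrementing `usage` for each returned link.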
ACCELERATION SYSTEM FOR FACILITATING PROCESSING OF API CALLS
One embodiment includes acceleration systems that operate as intermediaries between an API processing system and its clients to reduce API call round-trip latencies. The acceleration systems form a network of interconnected systems distributed across the globe. A given acceleration system establishes a network connection with a given client and receives a request for processing an API call over the connection. The programming function associated with the API call is configured in the API processing system. The acceleration system facilitates the processing of the API call over an established connection with the API processing system.
SYNCHRONIZATION IN MULTI-CHIP SYSTEMS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining, for each pair of adjacent chips in a plurality of chips connected in a series-ring arrangement of a semiconductor device, a corresponding loop latency for round-trip data transmissions between the pair of chips; identifying, from among the loop latencies, a maximum loop latency; determining a ring latency for a data transmission originating from a chip of the plurality of chips to be transmitted around the series-ring arrangement and back to the chip; and comparing half of the maximum loop latency to one N-th of the ring latency, where N is the number of chips in the plurality of chips, and storing the greater value as an inter-chip latency of the semiconductor device, the inter-chip latency representing an operational characteristic of the semiconductor device.
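The comparison described in this abstract reduces to a short calculation. The sketch below assumes the loop and ring latencies have already been measured and are expressed in the same unit; the function name `inter_chip_latency` and the sample values are hypothetical.

```python
def inter_chip_latency(loop_latencies, ring_latency, num_chips):
    """Return the greater of half the maximum loop latency and ring_latency / N.

    loop_latencies: round-trip latency for each pair of adjacent chips.
    ring_latency: latency for one full trip around the series ring.
    num_chips: N, the number of chips in the ring.
    """
    if not loop_latencies or num_chips <= 0:
        raise ValueError("need at least one loop latency and one chip")
    half_max_loop = max(loop_latencies) / 2
    per_hop_ring = ring_latency / num_chips
    return max(half_max_loop, per_hop_ring)

# Four chips in a ring, so four adjacent pairs (illustrative cycle counts).
latency = inter_chip_latency([10, 12, 8, 14], ring_latency=32, num_chips=4)
```

Here half the maximum loop latency is 7 and one quarter of the ring latency is 8, so the stored inter-chip latency is 8.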
PROGRAMMABLE DEVICE, HIERARCHICAL PARALLEL MACHINES, AND METHODS FOR PROVIDING STATE INFORMATION
Programmable devices, hierarchical parallel machines and methods for providing state information are described. In one such programmable device, programmable elements are provided. The programmable elements are configured to implement one or more finite state machines. The programmable elements are configured to receive an N-digit input and provide an M-digit output as a function of the N-digit input. The M-digit output includes state information from fewer than all of the programmable elements. Other programmable devices, hierarchical parallel machines and methods are also disclosed.
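As a rough illustration of an output that exposes state from only some elements, the sketch below models each programmable element as a two-state machine that latches once it sees its programmed N-digit pattern. The class name, the latching behaviour, and the `reported` index list are all assumptions made for the example; the abstract does not specify them.

```python
class ProgrammableDevice:
    """Toy model: each element is a two-state (idle/matched) machine."""

    def __init__(self, patterns, reported):
        # patterns[i]: the N-digit input that moves element i to "matched".
        self.patterns = [tuple(p) for p in patterns]
        # reported: indices of the elements whose state appears in the output.
        self.reported = list(reported)
        self.state = [0] * len(self.patterns)

    def step(self, input_digits):
        """Consume one N-digit input and return the M-digit output."""
        digits = tuple(input_digits)
        for i, pattern in enumerate(self.patterns):
            if digits == pattern:
                self.state[i] = 1  # latch: the element stays matched
        return [self.state[i] for i in self.reported]

# Three elements, N = 2 input digits, M = 2 output digits (element 1 is hidden).
device = ProgrammableDevice([(0, 1), (1, 0), (1, 1)], reported=[0, 2])
outputs = [device.step(d) for d in [(1, 0), (1, 1), (0, 1)]]
```

Element 1 latches on the first input, but because it is not in `reported`, that change never shows up in the M-digit output.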
System and method for communication efficient sparse-reduce
Systems and methods for building a distributed learning framework, including generating a sparse communication network graph with a high overall spectral gap. The generating includes computing model parameters in distributed shared memory of a cluster of a plurality of worker nodes; determining a spectral gap of an adjacency matrix for the cluster using a stochastic reduce convergence analysis, wherein a spectral reduce is performed using a sparse reduce graph with a highest possible spectral gap value for a given network bandwidth capability; and optimizing the communication graph by iteratively performing the computing and determining until a threshold condition is reached. Each of the plurality of worker nodes is controlled using tunable approximation based on available bandwidth in a network in accordance with the generated sparse communication network graph.
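To make the spectral gap concrete, the following sketch computes it for a ring-shaped communication graph. It assumes a symmetric, doubly stochastic mixing matrix in which each node averages itself with its two ring neighbours, defines the gap as one minus the second-largest eigenvalue magnitude, and estimates that magnitude by power iteration; none of these choices are stated in the abstract, and the function names are hypothetical.

```python
import math

def ring_mixing_matrix(n):
    """Each node averages itself and its two ring neighbours (needs n >= 3)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] += 1.0 / 3.0
    return w

def spectral_gap(w, iterations=500):
    """Estimate 1 - |lambda_2| for a symmetric, doubly stochastic matrix w.

    The top eigenvector of such a matrix is the uniform vector, so removing
    the mean at every step restricts power iteration to the other eigenvectors.
    """
    n = len(w)
    v = [1.0 if i == 0 else 0.0 for i in range(n)]
    magnitude = 0.0
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the uniform component
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 1.0  # every other eigenvalue is zero
        v = [x / norm for x in v]
        v = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        magnitude = math.sqrt(sum(x * x for x in v))
    return 1.0 - magnitude

gap_small_ring = spectral_gap(ring_mixing_matrix(4))
gap_large_ring = spectral_gap(ring_mixing_matrix(6))
```

Under these assumptions the gap shrinks as the ring grows, which is the usual trade-off: a sparser graph relative to its size costs less bandwidth per step but mixes model parameters more slowly.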
Switch fabric having a serial communications interface and a parallel communications interface
A switch fabric is disclosed that includes a serial communications interface and a parallel communications interface. The serial communications interface is configured for connecting a plurality of slave devices to a master device in parallel to transmit information between the plurality of slave devices and the master device, and the parallel communications interface is configured for separately connecting the plurality of slave devices to the master device to transmit information between the plurality of slave devices and the master device, and to transmit information between individual ones of the plurality of slave devices. The parallel communications interface may comprise a dedicated parallel communications channel for each one of the plurality of slave devices. The serial communications interface may comprise a multidrop bus, and the parallel communications interface may comprise a cross switch.
SELECTIVELY CONNECTABLE CONTENT-ADDRESSABLE MEMORY
A switching system includes a content-addressable memory (CAM) and several processing nodes. The CAM can be selectively connected to any one or more of the processing nodes during operation of the switching system, without having to power down or otherwise reboot the switching system. The CAM is selectively connected to a processing node in that electrical paths between the CAM and the processing nodes can be established, torn down, and re-established during operation of the switching system. The switching system can include a connection matrix to selectively establish electrical paths between the CAM and the processing nodes.
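The selective connection can be pictured with a small software model. This is a sketch under stated assumptions: the connection matrix is reduced to a set of currently connected node identifiers, the CAM to a dictionary lookup, and the class and method names are hypothetical.

```python
class SwitchingSystem:
    """Toy model of a CAM shared by processing nodes via a connection matrix."""

    def __init__(self, node_ids):
        self.node_ids = set(node_ids)
        self.connected = set()  # stands in for the connection matrix
        self.cam = {}           # content -> associated data

    def connect(self, node_id):
        """Establish the path between the CAM and a node at run time."""
        if node_id not in self.node_ids:
            raise KeyError(node_id)
        self.connected.add(node_id)

    def disconnect(self, node_id):
        """Tear the path down again; the system keeps running."""
        self.connected.discard(node_id)

    def lookup(self, node_id, key):
        """Search the CAM on behalf of a node that is currently connected."""
        if node_id not in self.connected:
            raise RuntimeError(f"node {node_id} has no path to the CAM")
        return self.cam.get(key)

system = SwitchingSystem(["node0", "node1"])
system.cam["00:11:22:33:44:55"] = "port 7"  # illustrative entry
system.connect("node0")
result = system.lookup("node0", "00:11:22:33:44:55")
system.disconnect("node0")
system.connect("node0")  # the path can be re-established without a restart
```

A lookup from `node1`, which was never connected, raises an error in this model, mirroring a node that has no electrical path to the CAM.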