Patent classifications
G06F15/17356
Processing Element-Centric All-to-All Communication
In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
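The three stages described above can be sketched as a small simulation. This is a hedged, illustrative sketch only, assuming (for simplicity) that the number of PEs per cluster equals the number of clusters so each PE can act as its cluster's gateway to one remote cluster; all names and data layouts are hypothetical, not the patented implementation.

```python
# Hypothetical sketch of a three-stage, PE-centric hierarchical all-to-all.
# Assumption: pes_per_cluster == num_clusters, so PE g of cluster c serves
# as the gateway between cluster c and cluster g.

def hierarchical_all_to_all(payloads, num_clusters, pes_per_cluster):
    """payloads[(sc, sp)][(dc, dp)] = item from source PE (sc, sp)
    to destination PE (dc, dp)."""
    assert pes_per_cluster == num_clusters

    # Stage 1: intra-cluster parallel data communication. Within each
    # cluster sc, gateway PE dc accumulates every payload bound for
    # cluster dc.
    staged = {(c, g): {} for c in range(num_clusters)
              for g in range(pes_per_cluster)}
    for (sc, sp), dests in payloads.items():
        for (dc, dp), item in dests.items():
            staged[(sc, dc)][(sc, sp, dc, dp)] = item

    # Stage 2: inter-cluster data exchange. Gateway PE g of cluster c
    # swaps buffers with gateway PE c of cluster g, so each buffer
    # lands in its destination cluster.
    received = {(c, g): staged[(g, c)] for c in range(num_clusters)
                for g in range(pes_per_cluster)}

    # Stage 3: intra-cluster data distribution to the final
    # destination PE within each cluster.
    out = {(c, p): {} for c in range(num_clusters)
           for p in range(pes_per_cluster)}
    for (c, g), buf in received.items():
        for (sc, sp, dc, dp), item in buf.items():
            out[(dc, dp)][(sc, sp)] = item
    return out
```

Compared with a flat all-to-all, only the gateway exchange in stage 2 crosses cluster boundaries, which is the usual motivation for this kind of hierarchical scheme.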
Heterogeneous ML Accelerator Cluster with Flexible System Resource Balance
Aspects of the disclosure are directed to a heterogeneous machine learning accelerator system with compute and memory nodes connected by high-speed chip-to-chip interconnects. While existing remote/disaggregated memory may require memory expansion via remote processing units, aspects of the disclosure add memory nodes into machine learning accelerator clusters via the chip-to-chip interconnects without needing assistance from remote processing units, achieving higher performance, a simpler software stack, and/or lower cost. The memory nodes may support prefetch and intelligent compression to enable the use of low-cost memory without performance degradation.
INTERMEDIATE APPARATUS, COMMUNICATION METHOD, AND PROGRAM
An intermediate device 10A is disposed between a requester 30 and a responder 50 that transfer data using RDMA. The intermediate device 10A extracts the combination of the requester 30's QPN and the responder 50's QPN from packets transmitted and received when the connection is established, and registers the combination in a QPN table 15. The intermediate device 10A manages the MSN in a WQ table 17: when it receives a request flagged Last or Only from the requester 30, it advances the MSN and returns a pseudo-response that includes the updated MSN.
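The intermediary's bookkeeping can be illustrated with a minimal sketch. This is an assumption-laden toy model, not the InfiniBand wire format: packets are plain values, and the names `qpn_table`, `wq_table`, and the `Last`/`Only` flags are simplified stand-ins for the structures named in the abstract.

```python
# Toy model of the intermediate device's QPN and MSN bookkeeping.
# Field names and packet shapes are illustrative only.

class Intermediary:
    def __init__(self):
        self.qpn_table = {}  # requester QPN -> responder QPN (QPN table 15)
        self.wq_table = {}   # requester QPN -> current MSN (WQ table 17)

    def on_connect(self, req_qpn, resp_qpn):
        # Register the QPN pair learned from the packets exchanged
        # during connection establishment.
        self.qpn_table[req_qpn] = resp_qpn
        self.wq_table[req_qpn] = 0

    def on_request(self, req_qpn, flag):
        # Advance the MSN only on a request flagged Last or Only, and
        # return a pseudo-response carrying the updated MSN.
        if flag in ("Last", "Only"):
            self.wq_table[req_qpn] += 1
            return {"dst_qpn": self.qpn_table[req_qpn],
                    "msn": self.wq_table[req_qpn]}
        return None  # intermediate packets do not advance the MSN
```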
METHOD FOR RUNTIME-BASED CONFIGURATION OF A DEVICE-INTERNAL SIGNAL TRANSMISSION IN A CONTROL DEVICE, AND CORRESPONDINGLY OPERABLE CONTROL DEVICE AND MOTOR VEHICLE
Disclosed herein is a method for the runtime-based configuration of device-internal signal transmission in a control device. In an AUTOSAR-compliant runtime environment, upon start-up of the control device, each publisher component provides at least one publisher list of the signals it provides, and each subscriber component provides at least one subscriber list of the signals it requires. For each subscriber component, it is determined which signal path each required signal must take through the control device from one of the publisher components to that subscriber component, given the publisher lists. From the required signals for which an identical signal path is identified, a signal group is defined by storing a common transmission command in the runtime environment for the signals of the respective signal group.
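The grouping step can be sketched as follows, assuming signal paths can be represented as hashable route descriptors; `route`, `publishers`, and `subscribers` are hypothetical names, not the AUTOSAR API.

```python
# Illustrative sketch: group each subscriber's required signals by the
# signal path they must take through the control device, so that each
# group can share one common transmission command.

from collections import defaultdict

def build_signal_groups(publishers, subscribers, route):
    """publishers: {component: set of provided signals} (publisher lists)
    subscribers: {component: set of required signals} (subscriber lists)
    route(pub, sub) -> hashable path descriptor through the device."""
    provider = {sig: comp for comp, sigs in publishers.items()
                for sig in sigs}
    groups = {}
    for sub, needed in subscribers.items():
        by_path = defaultdict(list)
        for sig in needed:
            by_path[route(provider[sig], sub)].append(sig)
        # Signals sharing an identical path form one signal group.
        groups[sub] = list(by_path.values())
    return groups
```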
Interface between a bus and an inter-thread interconnect
A processing apparatus comprising: a bus; a first processor connected to the bus and configured to communicate over the bus according to a bus protocol; a second, multithreaded processor; and an inter-thread interconnect based on a system of channels. The apparatus also comprises an interface between the bus and the inter-thread interconnect, comprising a bus side implementing the bus protocol and an interconnect side for interfacing with the system of channels. The first processor is thereby operable to communicate with a designated one of the second processor's threads via the bus and a respective channel of the inter-thread interconnect.
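The interface described above can be caricatured in software: a bus-side write addressed to a thread is forwarded onto that thread's channel on the interconnect side. This is a speculative sketch using FIFO queues as channels; all names are illustrative assumptions, not the patented hardware.

```python
# Toy adapter between a bus and a per-thread channel interconnect.
# Each channel is modeled as a FIFO queue.

import queue

class BusToChannelInterface:
    def __init__(self, num_threads):
        # One channel per thread of the multithreaded processor.
        self.channels = [queue.Queue() for _ in range(num_threads)]

    def bus_write(self, thread_id, word):
        # Bus side: a write addressed to a designated thread is
        # forwarded onto that thread's channel.
        self.channels[thread_id].put(word)

    def channel_read(self, thread_id):
        # Interconnect side: the designated thread dequeues words
        # from its own channel.
        return self.channels[thread_id].get()
```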
Transmission device and communication system for artificial intelligence chips
An artificial intelligence (AI) switch chip includes a first AI interface, a first network interface, and a controller. The first AI interface is used by the AI switch chip to couple to a first AI chip in a first server. The first network interface is used by the AI switch chip to couple to a second server. The controller receives, through the first AI interface, data from the first AI chip, and then sends the data to the second server through the first network interface. By using the AI switch chip, when a server needs to send data in an AI chip to another server, an AI interface may be used to directly receive the data from the AI chip, and then the data is sent to the other server through one or more network interfaces coupled to the controller.
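The controller's forwarding role lends itself to a short sketch: data received over the AI interface is split and sent across the coupled network interfaces. The round-robin split and all names here are illustrative assumptions, not the chip's actual behavior.

```python
# Hypothetical sketch of the AI switch chip's controller forwarding
# data from the AI interface to one or more network interfaces.

class AISwitchController:
    def __init__(self, network_interfaces):
        # One send callable per coupled network interface.
        self.nics = network_interfaces

    def forward_from_ai_chip(self, data, chunk_size=4):
        # Split the payload received over the AI interface and send
        # chunks over the network interfaces in round-robin order.
        for i in range(0, len(data), chunk_size):
            nic = self.nics[(i // chunk_size) % len(self.nics)]
            nic(data[i:i + chunk_size])
```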