G06F15/17

Diagonal torus network

A device is disclosed that includes multiple channels and multiple processing nodes. Each processing node includes input/output (I/O) ports coupled to the channels and channel control modules coupled to the I/O ports. Each processing node is configured to select, by the channel control module in a first operation, a first I/O port of the I/O ports; communicate a first message, via the first I/O port, to a first processing node over a first channel or a second processing node over a second channel orthogonal to the first channel in a logic representation; select, by the channel control module in a second operation, a second I/O port of the I/O ports; and communicate a second message, via the second I/O port, to a third processing node over a third channel extending in a diagonal direction and non-orthogonal to the first and second channels in the logic representation.
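
As a rough illustration of the topology this abstract describes, the sketch below computes a node's neighbors in a 2D torus with and without diagonal channels. All names and the grid model are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch: neighbor computation for a node in an n x n diagonal
# torus. Orthogonal channels reach the four axis-aligned neighbors; diagonal
# channels add neighbors that are non-orthogonal in the logical grid.

def torus_neighbors(x, y, n, diagonal=True):
    """Return the wrap-around neighbors of node (x, y) in an n x n torus."""
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]        # orthogonal channels
    if diagonal:
        steps += [(1, 1), (-1, -1), (1, -1), (-1, 1)]  # diagonal channels
    return [((x + dx) % n, (y + dy) % n) for dx, dy in steps]

# A plain 2D torus node has 4 neighbors; a diagonal torus node has 8.
print(len(torus_neighbors(0, 0, 4, diagonal=False)))  # 4
print(len(torus_neighbors(0, 0, 4)))                  # 8
```

The diagonal channels shorten worst-case hop counts because a single hop can advance along both axes at once.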

Inter-processor communication method, electronic assembly, and electronic device

An inter-processor communication method, an electronic assembly, and an electronic device are provided. The electronic device includes at least a first core and a second core. A plurality of communication channels is defined between the first core and the second core, each having a different communication performance. The inter-processor communication method includes: acquiring to-be-transmitted data, the to-be-transmitted data being data transmitted between the first core and the second core; selecting, from the plurality of communication channels, a communication channel corresponding to the to-be-transmitted data as a target communication channel; and transmitting the to-be-transmitted data via the target communication channel.
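
The channel-selection step can be sketched as below. The channel names and the size-and-latency policy are illustrative assumptions; the abstract only says the channels differ in communication performance.

```python
# Hypothetical sketch: choosing a target communication channel for a piece of
# to-be-transmitted data from several channels with different performance.

CHANNELS = {
    "mailbox":       {"max_bytes": 64,         "latency_us": 1},   # small control messages
    "shared_memory": {"max_bytes": 4096,       "latency_us": 5},
    "dma":           {"max_bytes": 16 * 2**20, "latency_us": 50},  # bulk transfers
}

def select_channel(payload: bytes) -> str:
    """Pick the lowest-latency channel whose capacity fits the payload."""
    fitting = [(c["latency_us"], name)
               for name, c in CHANNELS.items() if len(payload) <= c["max_bytes"]]
    if not fitting:
        raise ValueError("payload exceeds every channel's capacity")
    return min(fitting)[1]

print(select_channel(b"x" * 16))    # mailbox
print(select_channel(b"x" * 1024))  # shared_memory
```

Small messages take the low-latency path while large transfers fall through to the high-bandwidth one.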

Methods and apparatuses for remote direct memory access page fault handling
12411798 · 2025-09-09

A method for performing a remote direct memory access (RDMA) read command includes reserving a memory space in a buffer to receive data associated with an address range, sending an RDMA read instruction to a remote module to request the data, and receiving the data from the remote module. A method for performing an RDMA write command includes receiving, from a remote module, a request-to-send command requesting to send data associated with an address range, reserving a memory space in a buffer to receive data associated with the address range, sending an RDMA read instruction to the remote module, and receiving the data from the remote module.
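
The three steps of the read flow (reserve, request, receive) can be sketched as follows. All class and method names are illustrative; a real implementation would use RDMA verbs against registered memory regions rather than Python objects.

```python
# Hypothetical sketch of the RDMA read flow the abstract outlines: reserve a
# buffer slot for an address range, issue the read to the remote module, then
# land the returned data in the reserved slot.

class RdmaReader:
    def __init__(self):
        self.buffer = {}  # (addr, length) -> reserved/filled slot

    def reserve(self, addr, length):
        """Step 1: reserve memory in the buffer for the expected data."""
        self.buffer[(addr, length)] = None

    def send_read(self, remote, addr, length):
        """Step 2: ask the remote module for the data."""
        return remote.read(addr, length)

    def receive(self, addr, length, data):
        """Step 3: place the received data into the reserved slot."""
        assert (addr, length) in self.buffer, "no reservation for this range"
        self.buffer[(addr, length)] = data

class FakeRemote:
    """Stand-in for the remote module's memory."""
    def __init__(self, memory):
        self.memory = memory
    def read(self, addr, length):
        return self.memory[addr:addr + length]

remote = FakeRemote(bytes(range(16)))
reader = RdmaReader()
reader.reserve(4, 4)
data = reader.send_read(remote, 4, 4)
reader.receive(4, 4, data)
print(reader.buffer[(4, 4)])  # b'\x04\x05\x06\x07'
```

Reserving before sending the read is what lets a page fault on the local side be resolved before any remote data is in flight.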

DEEP LEARNING DATA COMPRESSION USING MULTIPLE HARDWARE ACCELERATOR ARCHITECTURES
20250298717 · 2025-09-25

Deep learning data compression using multiple hardware accelerator architectures is provided herein. A system includes a computing device and first and second hardware accelerators coupled thereto. The first and second hardware accelerators may be of different types, such as a tensor streaming processor and a field programmable gate array. The first and second hardware accelerators may be directly connected to one another, such as by a chip-to-chip connection. The first and second accelerators may implement different stages of a data pipeline, such as lossless and lossy compression stages of a learned image compression.
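
The staged pipeline can be sketched as below, with each "accelerator" modeled as a function: a lossy quantization stage standing in for the first device and a lossless stage (modeled with zlib) standing in for the second. The split, stage contents, and names are illustrative assumptions.

```python
# Hypothetical sketch: a two-stage data pipeline split across two accelerators,
# as the abstract describes (a lossy stage on one device, a lossless stage on
# another).

import zlib

def lossy_stage(samples, step=16):
    """Accelerator 1 (e.g. a tensor streaming processor): coarse quantization."""
    return bytes((s // step) * step for s in samples)

def lossless_stage(payload):
    """Accelerator 2 (e.g. an FPGA): entropy coding, modeled here with zlib."""
    return zlib.compress(payload)

samples = bytes(range(256))
compressed = lossless_stage(lossy_stage(samples))
restored = zlib.decompress(compressed)
print(len(compressed) < len(samples))    # True: quantized data compresses well
print(restored == lossy_stage(samples))  # True: the lossless stage round-trips
```

The lossy stage discards precision to make the data highly repetitive, which is exactly what the downstream lossless stage exploits; in a real system the hand-off between stages would go over the chip-to-chip connection.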

Acceleration unit, acceleration assembly, acceleration device, and electronic device

The present disclosure relates to an acceleration unit, an acceleration assembly, an acceleration device, and an electronic device. The acceleration unit is included in a combined processing device, which further includes an interconnection interface and another processing device. The acceleration unit interacts with the other processing device to jointly complete a computing operation specified by a user. The combined processing device further includes a storage device, which is connected to the acceleration unit and the other processing device, respectively, and provides data services for both. With the content of the present disclosure, high-speed processing of massive data may be realized.
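
A minimal sketch of the cooperation pattern, under the assumption that the shared storage device is how the two sides exchange work; all names and the toy computation are illustrative.

```python
# Hypothetical sketch: an acceleration unit and another processing device
# jointly completing an operation through a shared storage device.

class SharedStorage:
    """Storage device serving both sides of the combined processing device."""
    def __init__(self):
        self.data = {}

class OtherProcessor:
    """General-purpose side: stages input into the shared storage."""
    def stage_input(self, storage, key, values):
        storage.data[key] = values

class AccelerationUnit:
    """Accelerator side: reads staged input and writes the result back."""
    def compute(self, storage, key):
        storage.data[key + "_out"] = sum(storage.data[key])

storage = SharedStorage()
host = OtherProcessor()
accel = AccelerationUnit()
host.stage_input(storage, "job", [1, 2, 3, 4])
accel.compute(storage, "job")
print(storage.data["job_out"])  # 10
```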

Circuits and methods for coherent writing to host systems

A circuit system includes slow-running logic circuitry that generates write data and a write command for a write request. The circuit system also includes fast-running logic circuitry that receives and stores the write data and the write command from the slow-running logic circuitry. A host system generates a write response upon receiving the write command from the fast-running logic circuitry and sends the write response to the fast-running logic circuitry. In response to receiving the write response from the host system, the fast-running logic circuitry sends the write data to the host system before providing the write response to the slow-running logic circuitry.
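
The ordering constraint in the abstract, with the write data reaching the host before the response is passed back upstream, can be sketched as an event trace. The class and method names are illustrative assumptions; the real design is hardware logic, not callbacks.

```python
# Hypothetical sketch of the write ordering: the fast-running logic stores the
# write from the slow-running logic, forwards the command to the host, and only
# after the host's write response does it (1) send the data to the host and
# then (2) pass the response back to the slow-running logic.

events = []

class Host:
    def handle_command(self, fast):
        events.append("host: write response")
        fast.on_write_response()
    def receive_data(self, data):
        events.append(f"host: data {data!r}")

class FastLogic:
    def __init__(self, host, slow):
        self.host, self.slow = host, slow
    def submit(self, data):
        self.data = data                   # store write data + command
        events.append("fast: command to host")
        self.host.handle_command(self)
    def on_write_response(self):
        self.host.receive_data(self.data)  # data to host first...
        events.append("fast: response to slow")
        self.slow.on_response()            # ...then response upstream

class SlowLogic:
    def on_response(self):
        events.append("slow: response received")

slow = SlowLogic()
fast = FastLogic(Host(), slow)
fast.submit(b"payload")
print(events)
```

Running the trace shows the host receives the data before the slow-running logic ever sees the response, which is the coherency guarantee the abstract describes.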