G06F13/28

Efficient storage architecture for high speed packet capture
11704063 · 2023-07-18 · ·

An embodiment may involve a network interface module; volatile memory configured to temporarily store data packets received from the network interface module; high-speed non-volatile memory; an interface connecting to low-speed non-volatile memory; a first set of processors configured to perform a first set of operations that involve: (i) reading the data packets from the volatile memory, (ii) arranging the data packets into chunks, each chunk containing a respective plurality of the data packets, and (iii) writing the chunks to the high-speed non-volatile memory; and a second set of processors configured to perform a second set of operations in parallel to the first set of operations, where the second set of operations involve: (i) reading the chunks from the high-speed non-volatile memory, (ii) compressing the chunks, (iii) arranging the chunks into blocks, each block containing a respective plurality of the chunks, and (iv) writing the blocks to the low-speed non-volatile memory.

REDUCING TRANSACTIONS DROP IN REMOTE DIRECT MEMORY ACCESS SYSTEM
20230014415 · 2023-01-19 ·

In order to reduce remote direct memory access (RDMA) requests drop in RDMA systems, a requesting device transmits a message that includes a prefetch operation to a responding device. The prefetch operation indicates a memory area to be loaded by the responding device to a memory of the responding device before receiving a new RDMA request or a RDMA command.

Data transmission system and operation method thereof
11704264 · 2023-07-18 · ·

A data transmission system and an operation method thereof are provided. The data transmission system includes a host, a first device and a second device. The host is configured to set a voltage base of a transmission signal, and configured to pull down or up the transmission signal based on the voltage base of the transmission signal to form a plurality of glitches. The first device is connected to the host to receive the transmission signal. The first device obtains a digital content of the transmission signal according to the glitches, if the voltage base of the transmission signal is set as a first base. The second device is connected to the host to receive the transmission signal. The second device obtains the digital content of the transmission signal according to the glitches, if the voltage base of the transmission signal is set as a second base.

Data transmission system and operation method thereof
11704264 · 2023-07-18 · ·

A data transmission system and an operation method thereof are provided. The data transmission system includes a host, a first device and a second device. The host is configured to set a voltage base of a transmission signal, and configured to pull down or up the transmission signal based on the voltage base of the transmission signal to form a plurality of glitches. The first device is connected to the host to receive the transmission signal. The first device obtains a digital content of the transmission signal according to the glitches, if the voltage base of the transmission signal is set as a first base. The second device is connected to the host to receive the transmission signal. The second device obtains the digital content of the transmission signal according to the glitches, if the voltage base of the transmission signal is set as a second base.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

Computing device and method

The present disclosure provides a computation device. The computation device is configured to perform a machine learning computation, and includes an operation unit, a controller unit, and a conversion unit. The storage unit is configured to obtain input data and a computation instruction. The controller unit is configured to extract and parse the computation instruction from the storage unit to obtain one or more operation instructions, and to send the one or more operation instructions and the input data to the operation unit. The operation unit is configured to perform operations on the input data according to one or more operation instructions to obtain a computation result of the computation instruction. In the examples of the present disclosure, the input data involved in machine learning computations is represented by fixed-point data, thereby improving the processing speed and efficiency of training operations.

SYSTEM AND METHOD FOR FACILITATING EFFICIENT MANAGEMENT OF DATA STRUCTURES STORED IN REMOTE MEMORY

A system and method are provided for facilitating efficient management of data structures stored in remote memory. During operation, the system receives a request to allocate memory for a first part in a data structure stored in a remote memory associated with a compute node in a network. The system pre-allocates a buffer in the remote memory for a plurality of parts in the data structure and stores a first local descriptor associated with the buffer in a local worker table stored in a volatile memory of the compute node. The first local descriptor facilitates servicing future access requests to the first and other parts in the data structure. The system stores a first global descriptor for the buffer in a shared global table stored in the remote memory and generates a first reference corresponding to the first part, thereby facilitating faster traversals of the data structure.

SYSTEM AND METHOD FOR FACILITATING EFFICIENT MANAGEMENT OF DATA STRUCTURES STORED IN REMOTE MEMORY

A system and method are provided for facilitating efficient management of data structures stored in remote memory. During operation, the system receives a request to allocate memory for a first part in a data structure stored in a remote memory associated with a compute node in a network. The system pre-allocates a buffer in the remote memory for a plurality of parts in the data structure and stores a first local descriptor associated with the buffer in a local worker table stored in a volatile memory of the compute node. The first local descriptor facilitates servicing future access requests to the first and other parts in the data structure. The system stores a first global descriptor for the buffer in a shared global table stored in the remote memory and generates a first reference corresponding to the first part, thereby facilitating faster traversals of the data structure.

DEEP NEURAL NETWORK (DNN) ACCELERATORS WITH WEIGHT LAYOUT REARRANGEMENT

An DNN accelerator includes a DMA engine that can rearrange weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on a structure of a PE array, e.g., based on the number of activated PE columns in the PE array. Then the DMA engine may partition a virtual bank into a plurality of virtual sub-banks. The DMA engine may also identify data blocks from different ones of the plurality of virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine form a linear data structure by interleaving the data blocks. The DMA engine can write the linear data structure into another memory (e.g., SRAM).

DEEP NEURAL NETWORK (DNN) ACCELERATORS WITH WEIGHT LAYOUT REARRANGEMENT

An DNN accelerator includes a DMA engine that can rearrange weight data layout. The DMA engine may read a weight tensor from a memory (e.g., DRAM). The weight tensor includes weights arranged in a 3D matrix. The DMA engine may partition the weight tensor into a plurality of virtual banks based on a structure of a PE array, e.g., based on the number of activated PE columns in the PE array. Then the DMA engine may partition a virtual bank into a plurality of virtual sub-banks. The DMA engine may also identify data blocks from different ones of the plurality of virtual sub-banks. A data block may include a plurality of input channels and may have a predetermined spatial size and storage size. The DMA engine form a linear data structure by interleaving the data blocks. The DMA engine can write the linear data structure into another memory (e.g., SRAM).