G06F2213/2806

ENGINE ARCHITECTURE FOR PROCESSING FINITE AUTOMATA

An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network.
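
To make the pooling idea concrete, a minimal C sketch follows of a scheduler drawing from a flat pool of HPUs spread across the clusters of a selected super-cluster; the structure names, pool dimensions, and the hna_schedule function are illustrative assumptions, not the disclosed implementation.

/* Illustrative sketch of an HNA scheduler drawing from a pooled set of HPUs.
 * All names (hna_instruction, hpu_pool, hna_schedule) are hypothetical. */
#include <stddef.h>
#include <stdbool.h>

#define SUPER_CLUSTERS   2
#define CLUSTERS_PER_SC  4
#define HPUS_PER_CLUSTER 8

typedef struct {
    unsigned pattern_id;      /* regular-expression pattern to match */
    size_t   payload_offset;  /* offset into the input stream        */
} hna_instruction;

typedef struct {
    bool busy;
} hpu;

/* The HPUs of every cluster in the selected super-cluster form one flat pool. */
static hpu hpu_pool[SUPER_CLUSTERS][CLUSTERS_PER_SC][HPUS_PER_CLUSTER];

/* Assign an instruction to the first idle HPU in the selected super-cluster.
 * Returns true if an HPU was found, false if the pool is saturated. */
bool hna_schedule(unsigned super_cluster, const hna_instruction *insn)
{
    for (unsigned c = 0; c < CLUSTERS_PER_SC; c++) {
        for (unsigned h = 0; h < HPUS_PER_CLUSTER; h++) {
            hpu *u = &hpu_pool[super_cluster][c][h];
            if (!u->busy) {
                u->busy = true;
                (void)insn;   /* a real engine would latch *insn into the HPU */
                return true;
            }
        }
    }
    return false;
}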

Asymmetric read/write architecture for enhanced throughput and reduced latency

The present disclosure relates to asymmetric read/write architectures for enhanced throughput and reduced latency. One example embodiment includes an integrated circuit. The integrated circuit includes a network interface. The integrated circuit also includes a communication bus interface. The integrated circuit is configured to establish a communication link with a processor of a host computing device over the communication bus interface, which includes mapping to memory addresses associated with the processor of the host computing device. The integrated circuit is also configured to receive payload data for transmission over the network interface in response to the processor of the host computing device writing payload data to the mapped memory addresses using one or more programmed input/outputs (PIOs). Further, the integrated circuit is configured to write payload data received over the network interface to the memory of the host computing device using direct memory access (DMA).
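
As a rough illustration of the asymmetry, the C sketch below pushes host-to-device payloads with programmed I/O stores into a mapped device window and posts receive buffers so the device can place inbound payloads into host memory by DMA; the window, descriptor fields, and function names are assumptions made for the example.

/* Sketch of the asymmetric data path: host-to-device payloads are pushed by the
 * CPU with programmed I/O into a mapped device window, while device-to-host
 * payloads are placed into host memory by DMA.  Layouts are illustrative. */
#include <stdint.h>
#include <stddef.h>

/* --- transmit path: CPU writes the payload into the mapped device window (PIO) --- */
void pio_send(volatile uint8_t *mapped_window, const uint8_t *payload, size_t len)
{
    /* Each store is a PIO write that lands directly in device memory. */
    for (size_t i = 0; i < len; i++)
        mapped_window[i] = payload[i];
}

/* --- receive path: the device writes the payload into host memory via DMA --- */
typedef struct {
    uint64_t host_addr;   /* physical address of a host receive buffer */
    uint32_t length;      /* number of bytes the device may write      */
    uint32_t flags;
} rx_dma_descriptor;

void post_rx_buffer(volatile rx_dma_descriptor *ring, unsigned slot,
                    uint64_t host_buf_addr, uint32_t buf_len)
{
    ring[slot].host_addr = host_buf_addr;
    ring[slot].length    = buf_len;
    ring[slot].flags     = 1u;   /* hypothetical "buffer available" flag */
}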

THROUGHPUT IN OPENFABRICS ENVIRONMENTS
20170346899 · 2017-11-30

Disclosed herein are systems, methods, and processes to improve throughput in OpenFabrics and Remote Direct Memory Access (RDMA) computing environments. Data and a header are received. Buffers in which the data and the header are to be written are identified. Placement information for the data and the header is determined based on a size of each buffer, a page-boundary alignment of the data, and a header alignment of the header. The data and the header are written to the buffer(s) using the placement information. In such computing environments, throughput can be improved by writing data on page boundaries and the header on a header boundary in a second-to-last buffer.
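
A hedged C sketch of the placement computation follows: data is aligned to a page boundary and the header is placed on its own alignment boundary in the second-to-last buffer; the page size, header alignment, and field names are assumptions, not values from the disclosure.

/* Rough sketch of the placement computation: the data starts on a page boundary,
 * and the header is placed on a header boundary in the second-to-last buffer. */
#include <stddef.h>

#define PAGE_SIZE 4096u   /* assumed page size       */
#define HDR_ALIGN 64u     /* assumed header boundary */

static size_t align_up(size_t value, size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

typedef struct {
    size_t data_offset;    /* page-aligned offset of the data            */
    size_t header_buffer;  /* index of the buffer that holds the header  */
    size_t header_offset;  /* header-aligned offset inside that buffer   */
} placement_info;

placement_info place(size_t buffer_count, size_t data_write_offset,
                     size_t header_buffer_fill)
{
    placement_info p;
    p.data_offset   = align_up(data_write_offset, PAGE_SIZE);     /* data on a page boundary */
    p.header_buffer = buffer_count >= 2 ? buffer_count - 2 : 0;   /* second-to-last buffer   */
    p.header_offset = align_up(header_buffer_fill, HDR_ALIGN);    /* header on its boundary  */
    return p;
}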

OPTIMIZED CREDIT RETURN MECHANISM FOR PACKET SENDS
20170235693 · 2017-08-17

Method and apparatus for implementing an optimized credit return mechanism for packet sends. A Programmed Input/Output (PIO) send memory is partitioned into a plurality of send contexts, each comprising a memory buffer including a plurality of send blocks configured to store packet data. A storage scheme using FIFO semantics is implemented, with each send block associated with a respective FIFO slot. In response to receiving packet data written to the send blocks and detecting that the data in those send blocks has egressed from a send context, the corresponding freed FIFO slots are detected, and the lowest slot for which credit return indicia has not been returned is determined. The highest slot in the sequence of freed slots from that lowest slot is then determined, and corresponding credit return indicia is returned. In one embodiment an absolute credit return count is implemented for each send context, with an associated absolute credit sent count tracked by software that writes to the PIO send memory, the two absolute credit counts being used for flow control.
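
The batched scan can be pictured with the small C sketch below, which walks the contiguous run of freed FIFO slots starting at the lowest slot not yet credited and returns them as one absolute count; the data layout and function name are illustrative assumptions, not the disclosed hardware.

/* Minimal sketch of the batched credit-return scan over FIFO slots. */
#include <stdbool.h>
#include <stdint.h>

#define SEND_BLOCKS 64

typedef struct {
    bool     freed[SEND_BLOCKS];   /* FIFO slots whose packet data has egressed */
    unsigned lowest_unreturned;    /* lowest slot with no credit returned yet   */
    uint64_t absolute_credits;     /* running absolute credit-return count      */
} send_context;

/* Returns the number of credits released by this call (possibly zero). */
unsigned return_credits(send_context *ctx)
{
    unsigned released = 0;
    unsigned slot = ctx->lowest_unreturned;

    /* Walk the contiguous run of freed slots starting at the lowest one. */
    while (released < SEND_BLOCKS && ctx->freed[slot]) {
        ctx->freed[slot] = false;
        slot = (slot + 1) % SEND_BLOCKS;   /* FIFO semantics: slots wrap */
        released++;
    }

    ctx->lowest_unreturned = slot;
    ctx->absolute_credits += released;     /* software compares this against its
                                              own absolute credit-sent count */
    return released;
}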

Fixed ethernet frame descriptor

Systems and techniques for a fixed Ethernet frame descriptor are described herein. A descriptor set-up message may be received at a network interface controller (NIC). Here, the descriptor set-up message includes an Ethernet frame descriptor. The NIC may then use the Ethernet frame descriptor to transmit, across a physical interface of the NIC, multiple Ethernet frames, each of which uses the same Ethernet frame descriptor from the set-up message.
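
One way to picture the reuse is the C sketch below, in which a single descriptor installed by the set-up message is applied to every subsequent frame instead of being fetched per frame; the descriptor fields and NIC-side functions are hypothetical.

/* Illustrative reuse pattern: one descriptor from the set-up message is applied
 * to every later frame rather than a fresh descriptor being fetched per frame. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint16_t vlan_tag;
    uint16_t frame_length;
    uint8_t  dst_mac[6];
    uint8_t  src_mac[6];
} ethernet_frame_descriptor;

typedef struct {
    ethernet_frame_descriptor fixed;   /* descriptor installed by the set-up message */
    int                       valid;
} nic_state;

/* Handle the descriptor set-up message once. */
void nic_setup(nic_state *nic, const ethernet_frame_descriptor *desc)
{
    nic->fixed = *desc;
    nic->valid = 1;
}

/* Every later frame is transmitted with the same, previously installed descriptor. */
int nic_transmit(nic_state *nic, const uint8_t *payload, size_t len)
{
    if (!nic->valid)
        return -1;
    /* a real NIC would frame the payload using nic->fixed and push it to the PHY */
    (void)payload;
    (void)len;
    return 0;
}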

Semiconductor device and systems using the same
11204799 · 2021-12-21

A semiconductor device capable of suppressing performance degradation, and systems using the same, are provided. The semiconductor device includes a plurality of processors CPU1 and CPU2, a scheduling device 10 (ID1) connected to the processors CPU1 and CPU2 for controlling the processors CPU1 and CPU2 to execute a plurality of tasks in real time, memories 17 and 18 accessed by the processors CPU1 and CPU2 to store data produced by executing the tasks, and an access monitor circuit 15 for monitoring accesses to the memories by the processors CPU1 and CPU2. When an access to the memory 18 is detected by the access monitor circuit 15, the data stored in the memory 18 is transferred according to destination information associated with that data.
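
A brief C sketch of the monitoring step is given below: when the watched memory region is accessed, the stored data is transferred toward the destination recorded alongside it; the callback shape and structure fields are assumptions for illustration only.

/* Software model of the access-monitor reaction: an access to the watched region
 * triggers a transfer of the stored data to its recorded destination. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint8_t   data[256];
    size_t    length;
    uintptr_t destination;   /* destination information stored with the data */
} monitored_region;

/* Called whenever the access monitor detects an access to the region. */
void on_access_detected(monitored_region *region)
{
    /* Transfer the data to the recorded destination (modeled here as a copy). */
    memcpy((void *)region->destination, region->data, region->length);
}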

Computing system for transmitting completion early between serially connected electronic devices

A computing system includes a host, a first electronic device including a memory and an accelerator, and a second electronic device including a direct memory access (DMA) engine. Based on a command transmitted from the host through the first electronic device, the DMA engine transmits data and completion information of the command to the first electronic device. The memory includes a data buffer storing the data and a completion queue buffer storing the completion information. The accelerator executes a calculation on the data. The DMA engine transmits the data to the first electronic device and then transmits the completion information to the first electronic device.
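
The ordering can be sketched in C as below, where the DMA engine first copies the data into the first device's data buffer and only then posts the completion entry to its completion queue, so the accelerator can begin its calculation as soon as the completion appears; buffer sizes and names are illustrative assumptions.

/* Sketch of data-then-completion delivery into the first device's memory. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CQ_DEPTH 16

typedef struct {
    uint8_t  data_buffer[4096];        /* data buffer                  */
    uint64_t completion_queue[CQ_DEPTH];/* completion queue buffer     */
    unsigned cq_tail;
} first_device_memory;

void dma_deliver(first_device_memory *dev, const uint8_t *payload, size_t len,
                 uint64_t command_id)
{
    if (len > sizeof dev->data_buffer)
        len = sizeof dev->data_buffer;

    /* 1. the data first ... */
    memcpy(dev->data_buffer, payload, len);

    /* 2. ... then the completion information for the same command. */
    dev->completion_queue[dev->cq_tail] = command_id;
    dev->cq_tail = (dev->cq_tail + 1) % CQ_DEPTH;
}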

SYSTEMS AND METHODS FOR ENABLING CONCURRENT APPLICATIONS TO PERFORM EXTREME WIDEBAND DIGITAL SIGNAL PROCESSING WITH MULTICHANNEL COHERENCY

A method for digital signal processing of sensor data includes receiving digitized samples of sensor signals via a network connection; converting the digitized samples into a standardized format; storing the converted digitized samples in a shared memory data structure in memory of a single instruction multiple data (SIMD) processor; and providing a plurality of applications with zero-copy read access to the converted digitized samples stored in the shared memory data structure.
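
A compact C sketch of the zero-copy idea follows: samples are converted once into a standardized format inside a shared block, and each consumer receives a read-only pointer into that block rather than a copy; the block layout and conversion are assumptions for the example.

/* Shared block of standardized samples with zero-copy reader access. */
#include <stdint.h>
#include <stddef.h>

#define MAX_SAMPLES (1u << 20)

typedef struct {
    uint64_t valid_samples;
    float    samples[MAX_SAMPLES];    /* standardized sample format */
} shared_sample_block;

/* Convert raw digitized samples (here assumed 16-bit integers) into the block. */
void ingest(shared_sample_block *block, const int16_t *raw, size_t count)
{
    if (count > MAX_SAMPLES)
        count = MAX_SAMPLES;
    for (size_t i = 0; i < count; i++)
        block->samples[i] = (float)raw[i] / 32768.0f;
    block->valid_samples = count;
}

/* Zero-copy read access: consumers get a pointer into the shared block itself. */
const float *acquire_view(const shared_sample_block *block, size_t *count_out)
{
    *count_out = (size_t)block->valid_samples;
    return block->samples;
}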

PERIPHERAL COMPONENT INTERCONNECT EXPRESS DEVICE AND METHOD OF OPERATING THE SAME
20220300442 · 2022-09-22

Provided are a Peripheral Component Interconnect Express (PCIe) device and a method of operating the same. The PCIe device may include a performance analyzer, a delay time information generator, and a command fetcher. The performance analyzer may measure throughputs of a plurality of functions and generate throughput analysis information indicating a comparison result between the throughputs of the plurality of functions and throughput limits corresponding to the plurality of functions. The delay time information generator may generate a delay time for delaying a command fetch operation for each of the plurality of functions based on the throughput analysis information. The command fetcher may fetch a target command from a host based on a delay time of a function corresponding to the target command.
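
A minimal C sketch of the throttling loop described above is given below: measured per-function throughput is compared with its limit, a delay is derived from the overshoot, and the fetcher consults that delay before fetching a command for the function; the names and the delay formula are assumptions.

/* Per-function delay generation from throughput analysis, consulted by the fetcher. */
#include <stdint.h>

#define NUM_FUNCTIONS 8

typedef struct {
    uint64_t measured_mbps[NUM_FUNCTIONS];   /* performance analyzer output    */
    uint64_t limit_mbps[NUM_FUNCTIONS];      /* per-function throughput limits */
    uint32_t delay_us[NUM_FUNCTIONS];        /* delay time information         */
} pcie_dev;

/* Delay-time information generator: zero delay while under the limit,
 * otherwise a delay that grows with the overshoot (illustrative formula). */
void update_delays(pcie_dev *dev)
{
    for (unsigned f = 0; f < NUM_FUNCTIONS; f++) {
        if (dev->measured_mbps[f] <= dev->limit_mbps[f])
            dev->delay_us[f] = 0;
        else
            dev->delay_us[f] =
                (uint32_t)(dev->measured_mbps[f] - dev->limit_mbps[f]);
    }
}

/* Command fetcher: look up the delay of the target command's function before fetching. */
uint32_t fetch_delay_for(const pcie_dev *dev, unsigned function_id)
{
    return dev->delay_us[function_id % NUM_FUNCTIONS];
}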