G06F13/1663

Tensorized direct memory access descriptors

To reduce direct memory access (DMA) overhead, a tensorized descriptor can be used to generate a series of memory descriptors to perform a series of DMA data transfers. The tensorized descriptor may include attributes such as a stride and a memory descriptor template, which can be used to generate the series of memory descriptors. Hence, instead of having to retrieve each of the memory descriptors to perform the series of DMA transfers, a single tensorized descriptor can be retrieved to perform a series of data transfers.

Data processing engine arrangement in a device

A device may include a plurality of data processing engines. Each of the data processing engines may include a memory pool having a plurality of memory banks, a plurality of cores each coupled to the memory pool and configured to access the plurality of memory banks, a memory mapped switch coupled to the memory pool and a memory mapped switch of at least one neighboring data processing engine, and a stream switch coupled to each of the plurality of cores and to a stream switch of the at least one neighboring data processing engine.

LOW LATENCY EFFICIENT SHARING OF RESOURCES IN MULTI-SERVER ECOSYSTEMS
20180011807 · 2018-01-11 ·

A method is provided in one example embodiment and includes receiving by a network element a request from a network device connected to the network element to update a shared resource maintained by the network element; subsequent to the receipt, identifying a Base Address Register Resource Table (“BRT”) element assigned to a Peripheral Component Interconnect (“PCI”) adapter of the network element associated with the network device, wherein the BRT points to the shared resource; changing an attribute of the identified BRT from read-only to read/write to enable the identified BRT to be written by the network device; and notifying the network device that the attribute of the identified BRT has been changed, thereby enabling the network device to update the shared resource via a Base Address Register (“BAR”) comprising the identified BRT.

Inter-Process Signaling Mechanism
20180011804 · 2018-01-11 · ·

The disclosed embodiments provide a mechanism to support implementation of semaphores or messaging signals between masters in a multi-master system, or between tasks in a single master system. A semaphore flag register contains one or more bits indicating whether resources are free or busy. The register is aliased to allow atomic read-and-clear of individual bits in the register. Masters poll the status of a resource until the resource reads as free. Alternatively, interrupts or events per master can be implemented to indicate availability of a resource.

SYSTEMS, METHODS, AND DEVICES FOR QUEUE MANAGEMENT WITH A COHERENT INTERFACE
20230236994 · 2023-07-27 ·

A method may include accessing, by a first apparatus, a queue, wherein the queue may be accessible by a second apparatus, and the first apparatus may be connected to the second apparatus by a coherent interface, and indicating, by the coherent interface, to the second apparatus, the accessing. The indicating may include indicating by a monitoring mechanism. The indicating may include generating a monitoring request. The indicating may include generating, based on the monitoring request, an alert. The queue may include a submission queue. The queue may include a completion queue. The accessing may include reading an entry from the queue. The accessing may include writing an entry to the queue. The entry may include a command. The entry may include a completion. The first apparatus may include a host, and the second apparatus may include a device. The queue may be located at the host.

SERVICE MESH ARCHITECTURE FOR INTEGRATION WITH ACCELERATOR SYSTEMS

A processing apparatus can include a memory device having a user space for executing user applications. The processing apparatus can further include infrastructure communication circuitry that can receive a request from a user application executing in the user space. The infrastructure communication circuitry can perform a service mesh operation, in response to the request, without a sidecar proxy. Other systems and methods are described.

Method for transferring packets of a communication protocol

A method for transferring packets of a communication protocol via a memory-based interface between two processing units. The method includes providing, in each of the processing units, a send area including a read index section, a write index section, and a send buffer, and a receive area including a read index section, a write index section and a receive buffer. Each processing unit repeats as sending steps: reading a read index from the receive area; writing at least one send packet into the send buffer (from a starting write address to an ending write address, the ending write address maximally corresponding to a buffer address assigned to the read read index, and writing a changed write index into the send area.

SYSTEM-ON-CHIP OPERATING MULTIPLE CPUS OF DIFFERENT TYPES, AND OPERATION METHOD FOR SAME
20230020191 · 2023-01-19 ·

A system-on-chip (SoC) for operating a plurality of different central processing units and a method for operating the same are provided. The SoC includes a plurality of central processing units (CPUs) that execute respective software programs independently of each other, a bus interconnector for connecting the plurality of CPUs, and at least one access control device that is connected to the bus interconnector and controls each access to a physical resource shared by the plurality of CPUs via the bus interconnector, for each CPU.

LOCATION AGNOSTIC DATA ACCESS

Apparatuses, systems, and techniques to enable a program to access data regardless of where said data is stored. In at least one embodiment, a system enables a program to access data regardless of where said data is stored, based on, for example, one or more locations encoding one or more addresses of said data.

INTERCONNECT FOR DIRECT MEMORY ACCESS CONTROLLERS

A computing device is provided, including a plurality of memory devices, a plurality of direct memory access (DMA) controllers, and an on-chip interconnect. The on-chip interconnect may be configured to implement control logic to convey a read request from a primary DMA controller of the plurality of DMA controllers to a source memory device of the plurality of memory devices. The on-chip interconnect may be further configured to implement the control logic to convey a read response from the source memory device to the primary DMA controller and one or more secondary DMA controllers of the plurality of DMA controllers.