Patent classifications
G06F12/0835
SYSTEMS AND METHODS FOR PROFILING HOST-MANAGED DEVICE MEMORY
The disclosed computer-implemented method may include (1) receiving, at a storage device via a cache-coherent interconnect, a first request to access data at one or more host addresses of a coherent memory space of an external host processor, (2) updating, in response to the first request, one or more statistics associated with accessing the data at the one or more host addresses, (3) receiving, at the storage device via the cache-coherent interconnect, a second request to perform an operation associated with the one or more statistics, and (4) using the one or more statistics to perform the operation. Various other methods, systems, and computer-readable media are also disclosed.
DISTRIBUTED USER MODE PROCESSING
A first processing unit such as a graphics processing unit (GPU) pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information including results generated by executing the one or more first commands.
SECURING DATA DIRECT I/O FOR A SECURE ACCELERATOR INTERFACE
The present disclosure includes systems and methods for securing data direct I/O (DDIO) for a secure accelerator interface, in accordance with various embodiments. Historically, DDIO has enabled performance advantages that have outweighed its security risks. DDIO circuitry may be configured to secure DDIO data by using encryption circuitry that is manufactured for use in communications with main memory along the direct memory access (DMA) path. DDIO circuitry may be configured to secure DDIO data by using DDIO encryption circuitry manufactured for use by or manufactured within the DDIO circuitry. Enabling encryption and decryption in the DDIO path by the DDIO circuitry has the potential to close a security gap in modem data central processor units (CPUs).
SYSTEM AND METHOD FOR IMPLEMENTING A NETWORK-INTERFACE-BASED ALLREDUCE OPERATION
An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and a circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations for enhancing performance of the allreduce operation.
METHOD TO MINIMIZE HOT/COLD PAGE DETECTION OVERHEAD ON RUNNING WORKLOADS
Methods and apparatus to minimize hot/cold page detection overhead on running workloads. A page meta data structure is populated with meta data associated with memory pages in one or more far memory tier. In conjunction with one or more processes accessing memory pages to perform workloads, the page meta data structure is updated to reflect accesses to the memory pages. The page meta data, which reflects the current state of memory, is used to determine which pages are “hot” pages and which pages are “cold” pages, wherein hot pages are memory pages with relatively higher access frequencies and cold pages are memory pages with relatively lower access frequencies. Variations on the approach including filtering meta data updates on pages in memory regions of interest and applying a filter(s) to trigger meta data updates based on (a) condition(s). A callback function may also be triggered to be executed synchronously with memory page accesses.
TECHNIQUES ASSOCIATED WITH MAPPING SYSTEM MEMORY PHYSICAL ADDRESSES TO PROXIMITY DOMAINS
Examples include techniques associated with mapping system memory physical addresses to proximity domains. Examples include mapping system memory physical addresses for a memory coupled with a multi-die system to proximity domains that include cores of a multi-core processor and the associated level 3 (L3) cache for use by each core included in a respective proximity domain. The mapping is to facilitate cache line ownership of a cache line in an L3 cache by an input/output device or agent located on a separate die from the multi-core processor.
Securing data direct I/O for a secure accelerator interface
The present disclosure includes systems and methods for securing data direct I/O (DDIO) for a secure accelerator interface, in accordance with various embodiments. Historically, DDIO has enabled performance advantages that have outweighed its security risks. DDIO circuitry may be configured to secure DDIO data by using encryption circuitry that is manufactured for use in communications with main memory along the direct memory access (DMA) path. DDIO circuitry may be configured to secure DDIO data by using DDIO encryption circuitry manufactured for use by or manufactured within the DDIO circuitry. Enabling encryption and decryption in the DDIO path by the DDIO circuitry has the potential to close a security gap in modern data central processor units (CPUs).
Gateway fabric ports
A gateway for interfacing a host with a subsystem for acting as a work accelerator to the host. The gateway enables the transfer of batches of data to the subsystem at precompiled data exchange synchronisation points. The gateway acts to route data between accelerators which are connected in a scaled system of multiple gateways and accelerators using a global address space set up at compile time of an application to run on the computer system.
Last-level collective hardware prefetching
A last-level collective hardware prefetcher (LLCHP) is described. The LLCHP is to detect a first off-chip memory access request by a first processor core of a plurality of processor cores. The LLCHP is further to determine, based on the first off-chip memory access request, that first data associated with the first off-chip memory access request is associated with second data of a second processor core of the plurality of processor cores. The LLCHP is further to prefetch the first data and the second data based on the determination.
Methods and apparatuses involving radar system data paths
Exemplary aspects for a specific example concern a radar system having sensor circuitry including multiple radar sensors to provide sensor data via multiple virtual channels and multiple data types, a memory circuit with memory buffers, and a bus-interface circuit to control bus interconnects for bus communications involving a radar signal transmitter and the memory circuit. Radar signals are received and processed, via data acquisition path circuitry in multiple circuit paths and via streams of data in response to and to accommodate the operations of the sensor circuitry. A master controller conveys data, via the bus-interface circuit, to the buffers for the sensor data, and generates selectable-type transactions to be linked in selected ones of the buffers, in response to the data provided from the sensor circuitry and based on the sensor data being provided via different ones of the multiple virtual channels and of the multiple data types.