Patent classifications
G06F12/0835
Memory architecture for efficient spatial-temporal data storage and access
Described herein are systems, methods, and non-transitory computer readable media for memory address encoding of multi-dimensional data in a manner that optimizes the storage and access of such data in linear data storage. The multi-dimensional data may be spatial-temporal data that includes two or more spatial dimensions and a time dimension. An improved memory architecture is provided that includes an address encoder that takes a multi-dimensional coordinate as input and produces a linear physical memory address. The address encoder encodes the multi-dimensional data such that two multi-dimensional coordinates close to one another in multi-dimensional space are likely to be stored in close proximity to one another in linear data storage. In this manner, the number of main memory accesses, and thus the overall memory access latency, is reduced, particularly in real-world applications in which the probabilities of moving along each dimension are nearly equal.
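The abstract does not say which encoding the address encoder uses. As an illustration only, the sketch below uses Z-order (Morton) bit interleaving, a well-known mapping that tends to keep nearby coordinates at nearby linear addresses. The function names and the 8-bit width per dimension are assumptions, not taken from the patent.

```python
def interleave_bits(coords, bits_per_dim):
    """Interleave the bits of each coordinate into one linear address (Z-order)."""
    address = 0
    ndims = len(coords)
    for bit in range(bits_per_dim):
        for dim, value in enumerate(coords):
            # Move bit `bit` of this coordinate to its interleaved position.
            address |= ((value >> bit) & 1) << (bit * ndims + dim)
    return address


def encode(x, y, t, bits_per_dim=8):
    """Hypothetical address encoder: (x, y, t) -> linear address."""
    for value in (x, y, t):
        if not 0 <= value < (1 << bits_per_dim):
            raise ValueError("coordinate out of range")
    return interleave_bits((x, y, t), bits_per_dim)
```

With 8 bits per dimension, a row-major layout with x varying fastest would place (0, 0, 0) and (0, 0, 1) 65,536 addresses apart, whereas this interleaving places them 4 apart. Locality is likely rather than guaranteed: neighbors that straddle a power-of-two boundary can still land far apart.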
RECORDING A MEMORY VALUE TRACE FOR USE WITH A SEPARATE CACHE COHERENCY PROTOCOL TRACE
A computer system records a replayable execution trace based on recording cache coherency protocol (CCP) messages into a first trace, and on recording memory snapshot(s) into a second trace. Based on determining that tracing of execution of a first execution context is to be enabled, the computer system initiates logging, into the second trace, of one or more memory snapshots of a memory space of the first execution context, and enables a hardware tracing feature of a processor. Enabling the tracing feature causes the processor to log, into the first trace, CCP message(s) generated in response to one or more memory accesses into the memory space of the first execution context. After enabling the hardware tracing feature of the processor, the computer system also logs or otherwise handles a write into the memory space of the first execution context by a second execution context.
SYSTEM AND METHOD FOR IMPLEMENTING A NETWORK-INTERFACE-BASED ALLREDUCE OPERATION
An apparatus is provided that includes a network interface to transmit and receive data packets over a network; a memory including one or more buffers; an arithmetic logic unit to perform arithmetic operations for organizing and combining the data packets; and circuitry to receive, via the network interface, data packets from the network; aggregate, via the arithmetic logic unit, the received data packets in the one or more buffers at a network rate; and transmit, via the network interface, the aggregated data packets to one or more compute nodes in the network, thereby optimizing latency incurred in combining the received data packets and transmitting the aggregated data packets, and hence accelerating a bulk data allreduce operation. One embodiment provides a system and method for performing the allreduce operation. During operation, the system performs the allreduce operation by pacing network operations to enhance performance of the allreduce operation.
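The receive, aggregate, and transmit steps can be modeled in software as follows. This is a minimal sketch that assumes each compute node contributes one equal-length vector and that the combining operation is element-wise. The function name `nic_allreduce` is hypothetical, and the sketch does not model packet pacing or network-rate behavior.

```python
def nic_allreduce(contributions, op=lambda a, b: a + b):
    """Combine one vector per node and return one copy of the result per node."""
    if not contributions:
        return []
    length = len(contributions[0])
    if any(len(vec) != length for vec in contributions):
        raise ValueError("all contributions must have the same length")
    # The buffer stands in for the memory buffers named in the abstract.
    buffer = list(contributions[0])
    for packet in contributions[1:]:
        for i, value in enumerate(packet):
            # The arithmetic logic unit's role: combine incoming data in place.
            buffer[i] = op(buffer[i], value)
    # Transmit step: every participating node receives the same aggregate.
    return [list(buffer) for _ in contributions]
```

For example, `nic_allreduce([[1, 2], [3, 4], [5, 6]])` gives each of the three nodes the vector `[9, 12]`. Passing `op=max` turns the same routine into a maximum reduction.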
CACHE COHERENT ACCELERATION FUNCTION VIRTUALIZATION
The embodiments herein describe a virtualization framework for cache coherent accelerators where the framework incorporates a layered approach for accelerators in their interactions between a cache coherent protocol layer and the functions performed by the accelerator. In one embodiment, the virtualization framework includes a first layer containing the different instances of accelerator functions (AFs), a second layer containing accelerator function engines (AFEs) in each of the AFs, and a third layer containing accelerator function threads (AFTs) in each of the AFEs. Partitioning the hardware circuitry using multiple layers in the virtualization framework allows the accelerator to be quickly re-provisioned in response to requests made by guest operating systems or virtual machines executing in a host. Further, using the layers to partition the hardware permits the host to re-provision sub-portions of the accelerator while the remaining portions of the accelerator continue to operate as normal.
I/O agent
Techniques are disclosed relating to an I/O agent circuit of a computer system. The I/O agent circuit may receive, from a peripheral component, a set of transaction requests to perform a set of read transactions that are directed to one or more of a plurality of cache lines. The I/O agent circuit may issue, to a first memory controller circuit configured to manage access to a first one of the plurality of cache lines, a request for exclusive read ownership of the first cache line such that data of the first cache line is not cached outside of the memory and the I/O agent circuit in a valid state. The I/O agent circuit may receive exclusive read ownership of the first cache line, including receiving the data of the first cache line. The I/O agent circuit may then perform the set of read transactions with respect to the data.
Protection of data in memory of an integrated circuit using a secret token
Methods, systems, apparatuses, and computer program products are provided for protecting data in a memory of an integrated circuit (IC). A process token is obtained in a special purpose IC from a host that is external to and communicatively connected to the special purpose IC. The process token is stored in a first memory portion of the special purpose IC. In response to receiving a processing request from the host, the processing request is processed, and data generated by processing the processing request is written in a second memory portion of the special purpose IC. When a read request is received to read the data in the second memory portion, a determination is made whether the read request includes a read token that matches the previously stored process token. If the read token matches the process token, the data in the second memory portion may be returned to the host.
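The token check can be sketched in software as follows. This is an illustrative model only: the class and method names are hypothetical, the "processing" is a placeholder that reverses the request bytes, and the constant-time comparison via `hmac.compare_digest` is a design choice of this sketch, not something the abstract specifies.

```python
import hmac


class TokenProtectedMemory:
    """Toy model of an IC that returns results only to a holder of the process token."""

    def __init__(self):
        self._process_token = None  # stands in for the first memory portion
        self._result = None         # stands in for the second memory portion

    def store_token(self, token: bytes) -> None:
        self._process_token = token

    def process(self, request: bytes) -> None:
        # Placeholder processing step (assumption): reverse the request bytes.
        self._result = request[::-1]

    def read(self, read_token: bytes):
        # Return the data only when the read token matches the stored token.
        if self._process_token is None:
            return None
        if not hmac.compare_digest(read_token, self._process_token):
            return None
        return self._result
```

A host that stored `b"secret"` and then submitted `b"abc"` would get `b"cba"` back from `read(b"secret")`, while `read(b"wrong")` returns `None`.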
SYSTEM, METHOD AND APPARATUS FOR PEER-TO-PEER COMMUNICATION
In an embodiment, an apparatus includes: a first downstream port to couple to a first peer device; a second downstream port to couple to a second peer device; and a peer-to-peer (PTP) circuit to receive a memory access request from the first peer device, the memory access request having a target associated with the second peer device, where the PTP circuit is to convert the memory access request from a coherent protocol to a memory protocol and send the converted memory access request to the second peer device. Other embodiments are described and claimed.
Memory system, method of operating the same and data processing system for supporting address translation using host resource
A memory system includes: a memory device suitable for storing map information; and a controller suitable for storing a portion of the map information in a map cache, and accessing the memory device based on the map information stored in the map cache or accessing the memory device based on a physical address that is selectively provided together with an access request from a host, wherein the map cache includes a write map cache suitable for storing map information corresponding to a write command, and a read map cache suitable for storing map information corresponding to a read command, and wherein the controller provides the host with map information that is outputted from the read map cache.
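The split between a read map cache and a write map cache might be modeled as below. This is a rough sketch under stated assumptions: the `Controller` class, its method names, and the dictionary-based map table are all hypothetical, and the sketch trusts a host-supplied physical address without validating it, which a real controller would likely not do.

```python
class Controller:
    """Toy controller with separate read and write map caches."""

    def __init__(self, map_table):
        self.map_table = dict(map_table)  # full map information in the memory device
        self.read_map_cache = {}          # logical -> physical, filled by reads
        self.write_map_cache = {}         # logical -> physical, filled by writes

    def read(self, logical, host_physical=None):
        """Return (physical address used, map info to hand back to the host)."""
        if host_physical is not None:
            # The host supplied a physical address with the request: use it directly.
            return host_physical, None
        if logical not in self.read_map_cache:
            self.read_map_cache[logical] = self.map_table[logical]
        physical = self.read_map_cache[logical]
        # Map information from the read map cache is provided to the host.
        return physical, (logical, physical)

    def write(self, logical, new_physical):
        """Record the new mapping in the write map cache and the map table."""
        self.write_map_cache[logical] = new_physical
        self.map_table[logical] = new_physical
        # Assumption: drop any stale read-cache entry for this logical address.
        self.read_map_cache.pop(logical, None)
```

After a first `read(0)` returns the mapping, the host can pass that physical address with later requests so the controller skips its own lookup.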
Coherency tracking apparatus and method for an attached coprocessor or accelerator
An apparatus and method for hybrid software-hardware coherency. An apparatus comprises one or more processing elements to process data; a memory controller to couple the one or more processing elements to a device memory; an interconnect to couple the one or more processing elements to a host processor memory and to couple a host processor to the device memory; one or more device caches to store cache lines read from the host processor memory and/or the device memory; coherency circuitry to manage an ownership indication for each cache line, the ownership indication to be set to a first value to indicate ownership by the host processor and to be set to a second value to indicate ownership by the processing device, wherein the coherency circuitry is to transfer ownership of a first cache line from the processing device to the host processor by updating the ownership indication from the second value to the first value, the coherency circuitry to provide indirect access to the cache line by the processing device while the ownership indication is set to the first value, the coherency circuitry to maintain the ownership indication at the first value until receiving a request to change the ownership indication.
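The per-line ownership indication can be pictured with a small state table. The sketch below is a simplification: the names `HOST`, `DEVICE`, and `CoherencyTracker` are hypothetical, untracked lines are assumed to be host-owned, and the strings "direct" and "indirect" merely label which access path the device would take.

```python
HOST = 0    # first value: line owned by the host processor
DEVICE = 1  # second value: line owned by the processing device


class CoherencyTracker:
    """Toy model of a per-cache-line ownership indication."""

    def __init__(self):
        self.owner = {}  # cache line address -> HOST or DEVICE

    def request_ownership(self, line, new_owner):
        # Ownership changes only in response to an explicit request.
        self.owner[line] = new_owner

    def transfer_to_host(self, line):
        # Update the indication from the second value to the first value.
        self.owner[line] = HOST

    def device_access_path(self, line):
        # Assumption: lines that are not tracked default to host ownership.
        if self.owner.get(line, HOST) == DEVICE:
            return "direct"
        return "indirect"
```

Once `transfer_to_host(0x40)` has run, `device_access_path(0x40)` reports "indirect" until a later `request_ownership(0x40, DEVICE)` changes the indication again.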