Patent classifications
G06F2213/28
MEMORY DEVICE FOR WAFER-ON-WAFER FORMED MEMORY AND LOGIC
A memory device includes an array of memory cells configured on a die or chip and coupled to sense lines and access lines of the die or chip and a respective sense amplifier configured on the die or chip coupled to each of the sense lines. Each of a plurality of subsets of the sense lines is coupled to a respective local input/output (I/O) line on the die or chip for communication of data on the die or chip and a respective transceiver associated with the respective local I/O line, the respective transceiver configured to enable communication of the data to one or more devices off the die or chip.
Techniques for managing context information for a storage device
Disclosed herein are techniques for managing context information for data stored within a non-volatile memory of a computing device. According to some embodiments, the method can include (1) loading, into a volatile memory of the computing device, the context information from the non-volatile memory, where the context information is separated into a plurality of silos, (2) writing transactions into a log stored within the non-volatile memory, and (3) each time a condition is satisfied: (i) identifying a next silo of the plurality of silos to be written into the non-volatile memory, (ii) updating the next silo to reflect the transactions that apply to the next silo, and (iii) writing the next silo into the non-volatile memory. In turn, when an inadvertent shutdown of the computing device occurs, the silos of which the context information is comprised can be sequentially accessed and restored in an efficient manner.
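The three numbered steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `SiloedContext`, the integer keys, the `key % num_silos` silo assignment, the "every N transactions" flush condition, and the in-memory lists standing in for non-volatile memory are all assumptions made for illustration.

```python
class SiloedContext:
    """Toy model: context information split into silos, flushed one silo at a time."""

    def __init__(self, num_silos, flush_every):
        self.num_silos = num_silos
        self.flush_every = flush_every  # assumed condition: every N logged transactions
        # Stand-ins for non-volatile memory: persisted silos and the transaction log.
        self.nvm_silos = [{"applied": 0, "data": {}} for _ in range(num_silos)]
        self.nvm_log = []  # entries are (sequence number, key, value)
        # Step 1: load the context information into volatile memory.
        self.ram_silos = [{"applied": s["applied"], "data": dict(s["data"])}
                          for s in self.nvm_silos]
        self.next_silo = 0
        self.seq = 0

    def write(self, key, value):
        # Step 2: write the transaction into the log.
        self.seq += 1
        self.nvm_log.append((self.seq, key, value))
        # Step 3: each time the condition is satisfied, flush the next silo.
        if self.seq % self.flush_every == 0:
            self._flush_next()

    def _pending(self, index, applied):
        """Log entries that belong to silo `index` and are newer than `applied`."""
        return [(k, v) for seq, k, v in self.nvm_log
                if k % self.num_silos == index and seq > applied]

    def _flush_next(self):
        i = self.next_silo                                   # (i) identify the next silo
        silo = self.ram_silos[i]
        for key, value in self._pending(i, silo["applied"]):  # (ii) update it
            silo["data"][key] = value
        silo["applied"] = self.seq
        self.nvm_silos[i] = {"applied": self.seq,            # (iii) write it out
                             "data": dict(silo["data"])}
        self.next_silo = (i + 1) % self.num_silos

    def recover(self):
        """After a simulated crash: read silos in order, replay newer log entries."""
        restored = {}
        for i, silo in enumerate(self.nvm_silos):
            data = dict(silo["data"])
            for key, value in self._pending(i, silo["applied"]):
                data[key] = value
            restored.update(data)
        return restored
```

In this sketch each persisted silo records the log sequence number it reflects, so recovery replays only the entries newer than that point for each silo. Log trimming is omitted.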
SECURE MEMORY ISOLATION FOR SECURE ENDPOINTS
A single input/output (I/O) controller for both secure partitionable endpoints (PEs) and non-secure PEs is enabled in a trusted execution environment (TEE) where secure memory portions are isolated from non-secure PEs. Security attributes for certain endpoints indicate secure memory access privilege of owning entities of the certain endpoints. A security monitor has exclusive access to the address translation control tables (TCE) stored in secure memory associated with a secure endpoint. When owning entity reassignment occurs, the endpoints are reinitialized to support a change in ownership from an outgoing owning entity having secure memory access to an incoming owning entity not having secure memory access.
USING A HARDWARE SEQUENCER IN A DIRECT MEMORY ACCESS SYSTEM OF A SYSTEM ON A CHIP
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Using a hardware sequencer in a direct memory access system of a system on a chip
In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
System, method, and electronic device for cloud-based configuration of FPGA configuration data
Embodiments of the present invention provide a system, a method, and an electronic device for the cloud-based configuration of FPGA configuration data. The system includes a control module internal to an FPGA and a storage module external to the FPGA. The storage module is configured to store configuration data transmitted from a cloud, and the control module is configured to retrieve the configuration data from the storage module and to configure a corresponding processing unit of the FPGA according to the configuration data. In the embodiments of the present invention, the control module internal to the FPGA is provided, and configuration data is retrieved from the storage module external to the FPGA to configure the corresponding processing unit of the FPGA. Accordingly, during FPGA data migration, the configuration data stored in the external storage module can be directly migrated by using a general data migration method, thereby implementing live migration of FPGA data.
Address translation services buffer
An address translation buffer or ATB is provided for emulating or implementing the PCIe (Peripheral Component Interconnect Express) ATS (Address Translation Services) protocol within a PCIe-compliant device. The ATB operates in place of (or in addition to) an address translation cache (ATC), but is implemented in firmware or hardware without requiring the robust set of resources associated with a permanent hardware cache (e.g., circuitry for cache control and lookup). A component of the device (e.g., a DMA engine) requests translation of an untranslated address, via a host input/output memory management unit for example, and the response (including a translated address) is stored in the ATB for use for a single DMA operation (which may involve multiple transactions across the PCIe bus).
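The single-use behavior described above can be sketched in Python as follows. This is a rough model only: the class `TranslationBuffer`, the `request_translation` callback (standing in for a translation request to the host), the 4 KiB page size, and the slot count are hypothetical and do not come from the abstract or the PCIe specification.

```python
PAGE = 4096  # assumed translation granularity


class TranslationBuffer:
    """Toy ATB: one stored translation per in-flight DMA operation."""

    def __init__(self, request_translation, capacity=4):
        # request_translation(untranslated_page_base) -> translated_page_base;
        # a hypothetical stand-in for asking the host to translate an address.
        self.request_translation = request_translation
        self.capacity = capacity
        self.slots = {}  # op_id -> (untranslated page base, translated page base)

    def begin(self, op_id, untranslated_addr):
        """Request one translation and hold it for the lifetime of this DMA operation."""
        if len(self.slots) >= self.capacity:
            raise RuntimeError("no free ATB slot")
        base = untranslated_addr - untranslated_addr % PAGE
        self.slots[op_id] = (base, self.request_translation(base))

    def translate(self, op_id, untranslated_addr):
        """Reuse the stored translation for each bus transaction of the operation."""
        base, translated = self.slots[op_id]
        offset = untranslated_addr - base
        if not 0 <= offset < PAGE:
            raise ValueError("address outside the translated page")
        return translated + offset

    def end(self, op_id):
        """Discard the entry when the operation completes; nothing is cached afterwards."""
        del self.slots[op_id]
```

Unlike a cache, this structure needs no lookup by address or eviction policy: an entry is found by the operation that owns it and is dropped as soon as that operation ends.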
Transposing Memory Layout of Weights in Deep Neural Networks (DNNs)
A compute block includes a DMA engine that reads data from an external memory and writes the data into a local memory of the compute block. An MAC array in the compute block may use the data to perform convolutions. The external memory may store weights of one or more filters in a memory layout that comprises a sequence of sections for each filter. Each section may correspond to a channel of the filter and may store all the weights in the channel. The DMA engine may convert the memory layout to a different memory layout, which includes a sequence of new sections for each filter. Each new section may include a weight vector that includes a sequence of weights, each of which is from a different channel. The DMA engine may also compress the weights, e.g., by removing zero valued weights, before the conversion of the memory layout.
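The layout conversion described above amounts to a transpose from channel-major order to position-major order within each filter. The following Python sketch shows that reordering for a single filter stored as a flat list. The function name and parameters are illustrative assumptions, and the optional compression step is not modeled.

```python
def transpose_filter_layout(flat, num_channels, weights_per_channel):
    """Reorder one filter's weights.

    Input layout:  one section per channel, each holding all weights of that channel.
    Output layout: one section per weight position, each holding one weight
                   from every channel (a "weight vector").
    """
    if len(flat) != num_channels * weights_per_channel:
        raise ValueError("flat length does not match the given dimensions")
    return [flat[c * weights_per_channel + p]
            for p in range(weights_per_channel)   # new section index
            for c in range(num_channels)]         # position inside the weight vector


# Two channels with three weights each: channel 0 = 1, 2, 3; channel 1 = 10, 20, 30.
example = transpose_filter_layout([1, 2, 3, 10, 20, 30],
                                  num_channels=2, weights_per_channel=3)
```

Here `example` holds the weight vectors `[1, 10]`, `[2, 20]`, and `[3, 30]` laid out back to back.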
SYSTEM AND METHOD FOR FACILITATING EFFICIENT MANAGEMENT OF DATA STRUCTURES STORED IN REMOTE MEMORY
A system and method are provided for facilitating efficient management of data structures stored in remote memory. During operation, the system receives a request to allocate memory for a first part in a data structure stored in a remote memory associated with a compute node in a network. The system pre-allocates a buffer in the remote memory for a plurality of parts in the data structure and stores a first local descriptor associated with the buffer in a local worker table stored in a volatile memory of the compute node. The first local descriptor facilitates servicing future access requests to the first and other parts in the data structure. The system stores a first global descriptor for the buffer in a shared global table stored in the remote memory and generates a first reference corresponding to the first part, thereby facilitating faster traversals of the data structure.
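One way to picture the pre-allocation and the two descriptor tables is the Python sketch below. All names (`RemoteMemory`, `Worker`, `Descriptor`, `resolve`), the fixed part size per structure, and the `(buffer_id, slot)` reference format are assumptions for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    buffer_id: int
    base: int        # start address of the buffer in remote memory
    part_size: int
    capacity: int    # number of parts the buffer can hold
    used: int = 0


class RemoteMemory:
    """Stand-in for remote memory: a bump allocator plus the shared global table."""

    def __init__(self):
        self.next_free = 0
        self.reservations = 0   # counts allocation round trips
        self.global_table = {}  # buffer_id -> (base, part_size, capacity)

    def reserve(self, nbytes):
        base = self.next_free
        self.next_free += nbytes
        self.reservations += 1
        return base


class Worker:
    """Compute-node side: keeps local descriptors in a local worker table."""

    def __init__(self, remote, parts_per_buffer=4):
        self.remote = remote
        self.parts_per_buffer = parts_per_buffer
        self.local_table = {}  # struct_id -> Descriptor of the current buffer

    def alloc_part(self, struct_id, part_size):
        desc = self.local_table.get(struct_id)
        if desc is None or desc.used == desc.capacity:
            # Pre-allocate room for several parts with one remote reservation.
            base = self.remote.reserve(part_size * self.parts_per_buffer)
            desc = Descriptor(len(self.remote.global_table), base,
                              part_size, self.parts_per_buffer)
            self.local_table[struct_id] = desc
            self.remote.global_table[desc.buffer_id] = (base, part_size,
                                                        desc.capacity)
        ref = (desc.buffer_id, desc.used)  # reference handed back for the new part
        desc.used += 1
        return ref


def resolve(remote, ref):
    """Turn a reference into a remote address via the shared global table."""
    buffer_id, slot = ref
    base, part_size, _ = remote.global_table[buffer_id]
    return base + slot * part_size
```

In this model, later allocations for the same structure are served from the local descriptor and touch remote memory only when the current buffer is full.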
Multi-socket network interface controller with consistent transaction ordering
Computing apparatus includes a host computer, including multiple non-uniform memory access (NUMA) nodes, including at least first and second NUMA nodes, which include first and second local memories and first and second host bus interfaces for connection to first and second peripheral component buses, respectively. A network interface controller (NIC) is to receive a definition of a memory region extending over respective first and second parts of the first and second local memories and to receive a memory mapping with respect to the memory region that is applicable to both the first and second local memories, and to apply the memory mapping in writing data to the memory region via first and second NIC bus interfaces in a sequence of direct memory access (DMA) transactions to the respective first and second parts of the first and second local memories in response to packets received through a network port.
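As a rough illustration of writing to one memory region that spans two local memories, the Python sketch below splits a write into an ordered sequence of DMA transactions, one per part of the region it touches. The function name, the `(bus_interface, part_length)` description of the region, and the tuple format of the plan are hypothetical; the abstract does not describe the mapping at this level of detail.

```python
def plan_dma_writes(parts, offset, length):
    """Split a write to a multi-part memory region into per-part DMA transactions.

    parts:  list of (bus_interface, part_length) in region order; each part is
            assumed to live in the local memory reached through that interface.
    offset: byte offset of the write within the whole region.
    length: number of bytes to write.

    Returns a list of (bus_interface, offset_within_part, nbytes) in region order.
    """
    plan = []
    part_start = 0
    for bus, part_len in parts:
        part_end = part_start + part_len
        lo = max(offset, part_start)
        hi = min(offset + length, part_end)
        if lo < hi:  # the write overlaps this part
            plan.append((bus, lo - part_start, hi - lo))
        part_start = part_end
    if sum(nbytes for _, _, nbytes in plan) != length:
        raise ValueError("write falls outside the memory region")
    return plan


# A region made of 4 KiB in the first node's memory and 4 KiB in the second's.
region = [("bus0", 4096), ("bus1", 4096)]
plan = plan_dma_writes(region, offset=4000, length=200)
```

With these inputs, `plan` routes the first 96 bytes through `bus0` and the remaining 104 bytes through `bus1`, in that order.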