Patent classifications
G06F2213/28
METHODS AND SYSTEMS FOR ESTABLISHING DIRECT COMMUNICATIONS BETWEEN A SERVER COMPUTER AND A SMART NETWORK INTERFACE CONTROLLER
This disclosure describes processes for performing direct memory access (“DMA”) between memory of a host and memory of a smart network interface controller (“SNIC”) connected to a bus of the host. The host runs a host thread in a processor of the host and the SNIC runs a SNIC thread in a processor of the SNIC. The host thread and the SNIC thread facilitate direct access of the SNIC thread to memory locations in the memory of the host. The SNIC thread can fetch data directly from and/or write data directly to the memory locations of the memory of the host over the bus.
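As a rough illustration of this handshake (not the patent's actual implementation), the C sketch below has a host thread publish a buffer descriptor (an address and a length) that a simulated SNIC thread then reads directly, standing in for a DMA fetch over the bus. All type and variable names are invented.

/* Hypothetical sketch: a "host thread" publishes a buffer descriptor that a
 * simulated "SNIC thread" uses to read host memory directly, standing in for
 * a DMA fetch over the bus. All names here are invented for illustration. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct dma_descriptor {
    _Atomic int ready;      /* host sets to 1 once addr/len are valid          */
    uintptr_t   host_addr;  /* a physical bus address would be used on real HW */
    size_t      len;
};

static char host_buffer[64] = "payload staged in host memory";
static struct dma_descriptor desc;

static void *snic_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&desc.ready))       /* poll until the host publishes */
        ;
    char local[64];
    memcpy(local, (void *)desc.host_addr, desc.len);  /* stands in for a bus DMA read */
    printf("SNIC fetched: %s\n", local);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, snic_thread, NULL);

    /* Host thread exposes the buffer's location to the SNIC thread. */
    desc.host_addr = (uintptr_t)host_buffer;
    desc.len = strlen(host_buffer) + 1;
    atomic_store(&desc.ready, 1);

    pthread_join(t, NULL);
    return 0;
}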
CIRCUITRY AND METHODS FOR DIRECT MEMORY ACCESS INSTRUCTION SET ARCHITECTURE SUPPORT FOR FLEXIBLE DENSE COMPUTE USING A RECONFIGURABLE SPATIAL ARRAY
Systems, methods, and apparatuses for direct memory access instruction set architecture support for flexible dense compute using a reconfigurable spatial array are described. In one embodiment, a processor includes a first type of hardware processor core that includes a two-dimensional grid of compute circuits, a memory, and a direct memory access circuit coupled to the memory and the two-dimensional grid of compute circuits; and a second, different type of hardware processor core that includes a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction including a first field to identify a base address of two-dimensional data in the memory, a second field to identify a number of elements in each one-dimensional array of the two-dimensional data, a third field to identify a number of one-dimensional arrays of the two-dimensional data, a fourth field to identify an operation to be performed by the two-dimensional grid of compute circuits, and a fifth field to indicate that the direct memory access circuit is to move the two-dimensional data indicated by the first field, the second field, and the third field into the two-dimensional grid of compute circuits and that the two-dimensional grid of compute circuits is to perform the operation on the two-dimensional data according to the fourth field; and an execution circuit to execute the decoded single instruction according to the fields.
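One plausible, purely illustrative way to picture the five fields is the C struct below, together with a toy execute step that first moves the 2-D data and then applies the operation; the field names, the GRID_OP_SUM opcode, and the summation operation are assumptions, not taken from the ISA itself.

/* Illustrative encoding only: one possible software view of the five-field
 * instruction, plus a toy "execute" step that moves the 2-D data and applies
 * an operation. Field names and the GRID_OP_SUM opcode are assumptions. */
#include <stdint.h>
#include <stdio.h>

enum grid_op { GRID_OP_SUM = 0 };          /* field 4: operation for the grid */

struct dense_dma_insn {
    const int32_t *base;   /* field 1: base address of the 2-D data   */
    uint32_t cols;         /* field 2: elements per 1-D array (row)   */
    uint32_t rows;         /* field 3: number of 1-D arrays           */
    enum grid_op op;       /* field 4: operation to perform           */
    uint8_t dma_move;      /* field 5: DMA circuit moves the data in  */
};

static int64_t execute(const struct dense_dma_insn *insn)
{
    int64_t acc = 0;
    if (!insn->dma_move)
        return acc;
    for (uint32_t r = 0; r < insn->rows; r++)            /* "DMA" each row into the grid */
        for (uint32_t c = 0; c < insn->cols; c++)
            if (insn->op == GRID_OP_SUM)
                acc += insn->base[r * insn->cols + c];   /* grid applies the operation */
    return acc;
}

int main(void)
{
    int32_t data[2][3] = { {1, 2, 3}, {4, 5, 6} };
    struct dense_dma_insn insn = { &data[0][0], 3, 2, GRID_OP_SUM, 1 };
    printf("result = %lld\n", (long long)execute(&insn));   /* prints 21 */
    return 0;
}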
SYSTEM AND ARCHITECTURE OF PURE FUNCTIONAL NEURAL NETWORK ACCELERATOR
An accelerator circuit including a control interface to receive a stream of instructions, a first memory to store input data, and an engine circuit including a dispatch circuit to decode an instruction of the stream of instructions into a plurality of commands, a plurality of queue circuits, each of the plurality of queue circuits supporting a queue data structure to store a respective one of the plurality of commands decoded from the instruction, and a plurality of command execution circuits, each of the plurality of command execution circuits to receive and execute a command extracted from a corresponding one of the plurality of queue circuits.
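The sketch below is a rough software analogue, under invented names, of that dispatch path: one instruction is decoded into several commands, each command is pushed onto its own queue, and each queue is drained by a matching execution unit.

/* Rough software analogue, not the accelerator's actual design: one
 * instruction is decoded into several commands, each pushed onto its own
 * queue and drained by a matching execution unit. Names are invented. */
#include <stdio.h>

#define NUM_QUEUES  3
#define QUEUE_DEPTH 4

struct command { int opcode; int operand; };

struct queue {
    struct command slots[QUEUE_DEPTH];
    int head, tail;
};

static struct queue queues[NUM_QUEUES];   /* one queue per command class */

static void push(struct queue *q, struct command c)
{
    q->slots[q->tail++ % QUEUE_DEPTH] = c;
}

static int pop(struct queue *q, struct command *c)
{
    if (q->head == q->tail)
        return 0;
    *c = q->slots[q->head++ % QUEUE_DEPTH];
    return 1;
}

int main(void)
{
    int instruction = 42;

    /* Dispatch circuit: decode one instruction into one command per queue. */
    for (int i = 0; i < NUM_QUEUES; i++)
        push(&queues[i], (struct command){ .opcode = i, .operand = instruction });

    /* Each command-execution unit drains its own queue. */
    for (int i = 0; i < NUM_QUEUES; i++) {
        struct command c;
        while (pop(&queues[i], &c))
            printf("unit %d executes opcode %d, operand %d\n", i, c.opcode, c.operand);
    }
    return 0;
}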
INTERCONNECT LAYER SEND QUEUE RESERVATION SYSTEM
Systems and methods for an interconnect layer send queue reservation system are provided. In one example, a method involves performing a transfer of data (e.g., an NVLog) from a storage system to a secondary storage system. A send queue having a fixed number of slots is maintained within an interconnect layer interposed between a file system and a Remote Direct Memory Access (RDMA) layer of the storage system. The interconnect layer implements an application programming interface (API) for the reservation system. A deadlock situation is avoided by, during a suspendable phase of a write transaction, making a reservation for slots within the send queue via the reservation system for the transfer of data. When the reservation is successful, the write transaction proceeds with a modify phase, during which the reservation is consumed and the interconnect layer is caused to perform an RDMA operation to carry out the transfer of data.
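The fragment below is a hedged sketch of what such a reservation API could look like in software; the function names (iq_reserve, iq_consume) and the slot count are invented and not taken from the storage system's real interconnect layer. The point it illustrates is that the reservation can fail during the suspendable phase, so the non-suspendable modify phase never blocks on a full send queue.

/* Hedged sketch of a reservation API; iq_reserve/iq_consume and the slot
 * count are invented names, not the real interconnect-layer interface. */
#include <stdbool.h>
#include <stdio.h>

#define SEND_QUEUE_SLOTS 8

struct send_queue {
    int free_slots;   /* slots neither reserved nor in use */
};

/* Suspendable phase: fail (so the caller can suspend and retry) rather than
 * block, which is what keeps the later modify phase free of deadlock. */
static bool iq_reserve(struct send_queue *q, int slots)
{
    if (q->free_slots < slots)
        return false;
    q->free_slots -= slots;
    return true;
}

/* Modify phase: the reservation is consumed and the RDMA operation is issued. */
static void iq_consume(struct send_queue *q, int slots)
{
    printf("posting RDMA transfer using %d reserved slot(s)\n", slots);
    q->free_slots += slots;   /* slots return to the pool on completion */
}

int main(void)
{
    struct send_queue q = { .free_slots = SEND_QUEUE_SLOTS };
    if (iq_reserve(&q, 2))
        iq_consume(&q, 2);
    else
        printf("reservation failed; suspend and retry before the modify phase\n");
    return 0;
}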
SEMICONDUCTOR DEVICE
A semiconductor device executes the processing of a neural network. The memory MEM1 holds a plurality of pixel values and j compressed weighting factors. The decompressor DCMP restores the j compressed weighting factors to k (k≥j) uncompressed weighting factors. The DMA controller DMAC1 reads the j compressed weighting factors from the memory MEM1 and transfers them to the decompressor DCMP. The n (n>k) accumulators in the accumulator unit ACCU multiply a plurality of pixel values by the k uncompressed weighting factors and accumulate the multiplication results in time series. A switch circuit SW1 provided between the decompressor DCMP and the accumulator unit ACCU transfers the k uncompressed weighting factors restored by the decompressor DCMP to the n accumulators based on the correspondence represented by the identifier.
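The toy C program below illustrates only the decompress, route, and multiply-accumulate flow; the compression scheme (non-zero weights stored as index/value pairs) and the sizes chosen for j and k are assumptions made for the example.

/* Toy illustration: the compression scheme (non-zero weights kept as
 * index/value pairs) and the sizes below are assumptions for the example. */
#include <stdio.h>

#define K 8   /* uncompressed weighting factors          */
#define J 3   /* compressed (non-zero) weighting factors */

struct packed_weight { int index; float value; };

int main(void)
{
    /* MEM1: pixel values plus j compressed weights, fetched by the DMAC. */
    float pixels[K] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    struct packed_weight compressed[J] = { {0, 0.5f}, {3, -1.0f}, {7, 2.0f} };

    /* DCMP: restore the k uncompressed weighting factors. */
    float weights[K] = { 0 };
    for (int i = 0; i < J; i++)
        weights[compressed[i].index] = compressed[i].value;

    /* ACCU: multiply each pixel by its routed weight and accumulate the
     * products in time series into a running sum. */
    float acc = 0.0f;
    for (int i = 0; i < K; i++)
        acc += pixels[i] * weights[i];

    printf("accumulated result = %f\n", acc);
    return 0;
}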
Automatically generating training data for a lidar using simulated vehicles in virtual space
Automated training dataset generators that generate feature training datasets for use in real-world autonomous driving applications based on virtual environments are disclosed herein. The feature training datasets may be associated with training a machine learning model to control real-world autonomous vehicles. In some embodiments, an occupancy grid generator is used to generate an occupancy grid indicative of an environment of an autonomous vehicle from an imaging scene that depicts the environment. The occupancy grid is used to control the vehicle as the vehicle moves through the environment. In further embodiments, a sensor parameter optimizer may determine parameter settings for use by real-world sensors in autonomous driving applications. The sensor parameter optimizer may determine, based on operation of the autonomous vehicle, an optimal parameter setting from among the parameter settings, where the optimal parameter setting may be applied to a real-world sensor associated with real-world autonomous driving applications.
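As a purely conceptual sketch (grid size, resolution, and obstacle coordinates are arbitrary choices, not values from the disclosure), the snippet below bins simulated obstacle points from a virtual scene into a small 2-D occupancy grid of the kind a planner could consume.

/* Conceptual sketch only: simulated obstacle points from a virtual scene are
 * binned into a small 2-D occupancy grid; grid size and resolution are
 * arbitrary choices for the example. */
#include <stdio.h>

#define GRID_W 8
#define GRID_H 8
#define CELL_SIZE 1.0f   /* metres per cell */

struct point { float x, y; };

int main(void)
{
    struct point obstacles[] = { {0.5f, 0.5f}, {3.2f, 4.7f}, {6.9f, 1.1f} };
    int grid[GRID_H][GRID_W] = { 0 };

    /* Mark each cell that contains a simulated obstacle as occupied. */
    for (size_t i = 0; i < sizeof obstacles / sizeof obstacles[0]; i++) {
        int cx = (int)(obstacles[i].x / CELL_SIZE);
        int cy = (int)(obstacles[i].y / CELL_SIZE);
        if (cx >= 0 && cx < GRID_W && cy >= 0 && cy < GRID_H)
            grid[cy][cx] = 1;
    }

    /* Print the grid with y increasing upward, '#' marking occupied cells. */
    for (int y = GRID_H - 1; y >= 0; y--) {
        for (int x = 0; x < GRID_W; x++)
            putchar(grid[y][x] ? '#' : '.');
        putchar('\n');
    }
    return 0;
}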
System and method for multi-node buffer transfer
A method, computer program product, and computing system for receiving, at a local node, a request to buffer data on a remote persistent cache memory system of a remote node. A target memory address within the remote persistent cache memory system may be sent from the local node via a remote procedure call (RPC). The data may be sent from the local node to the target memory address within the remote persistent cache memory system via a remote direct memory access (RDMA) command.
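A minimal mock of that two-step flow is sketched below; no real RPC framework or RDMA verbs are used, both steps are simulated in one process, and the function names are invented.

/* Minimal in-process mock of the two-step flow: a stand-in RPC returns a
 * target address in the remote node's persistent cache, then a stand-in
 * RDMA write copies the data there. Function names are invented. */
#include <stdio.h>
#include <string.h>

static char remote_persistent_cache[256];   /* remote node's cache memory */

/* Stand-in for the RPC: the remote node returns a target memory address. */
static void *rpc_request_buffer(size_t len)
{
    (void)len;                               /* a real peer would check capacity */
    return remote_persistent_cache;
}

/* Stand-in for the RDMA write command issued by the local node. */
static void rdma_write(void *target, const void *data, size_t len)
{
    memcpy(target, data, len);
}

int main(void)
{
    const char data[] = "block to be buffered remotely";
    void *target = rpc_request_buffer(sizeof data);   /* step 1: RPC          */
    rdma_write(target, data, sizeof data);            /* step 2: RDMA command */
    printf("remote cache now holds: %s\n", remote_persistent_cache);
    return 0;
}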
Method and Apparatus for Embedded Processor to Perform Fast Data Communication, and Storage Medium
A method and an apparatus for an embedded processor to perform fast data communication, and a storage medium, are provided. The method comprises: dividing an internal memory into multiple on-chip storage units sequentially assigned with consecutive addresses; configuring a memory interface controller connected to the internal memory, the memory interface controller comprising multiple memory interface control units; configuring an on-chip processor and a DMA controller respectively connected to the memory interface controller, the DMA controller comprising multiple request allocation units in one-to-one correspondence with the memory interface control units; and configuring a dedicated functional module connected to the DMA controller. When the on-chip processor or the DMA controller issues a read and/or write request, the memory interface control unit matches the corresponding on-chip storage unit so as to read and/or write data in the internal memory, and returns the read data to the original requesting module.
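Assuming the on-chip storage units each cover a consecutive address range (an assumption made for the example), the address-to-unit mapping performed by the request allocation step might look like the small C sketch below.

/* Sketch under assumptions: each on-chip storage unit covers a consecutive
 * address range, and the request allocation step picks the matching memory
 * interface control unit from the request address. Sizes are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define NUM_UNITS 4
#define UNIT_SIZE 1024u   /* consecutive addresses covered by one storage unit */

/* Map an internal-memory address to the storage unit (and its matching
 * memory interface control unit) that serves it. */
static unsigned allocate_unit(uint32_t addr)
{
    return addr / UNIT_SIZE;
}

int main(void)
{
    uint32_t requests[] = { 0x0000, 0x0400, 0x0800, 0x0C00 };
    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++)
        printf("request at 0x%04X -> control unit %u of %u\n",
               (unsigned)requests[i], allocate_unit(requests[i]), NUM_UNITS);
    return 0;
}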
DYNAMIC MEMORY PROTECTION DEVICE SYSTEM AND METHOD
A microcontroller includes a memory, direct memory access (DMA) controllers, and a microprocessor. The microprocessor maintains one or more memory protection (MP) configurations to control access to protected memory areas of the microcontroller. In response to a secure service call of an unsecure user-application, the microprocessor executes a state machine that disables interrupt requests and determines whether DMA controller configurations and MP configurations satisfy secure-service criteria. When the secure-service criteria are satisfied, at least one secure operation associated with the secure service call is performed, and memory areas accessed during the execution of the at least one secure operation are cleaned. The interrupt requests are then re-enabled and a response to the secure service call is generated.
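The following is a simplified software walk-through of that state machine, with hypothetical names throughout: interrupts are disabled, the DMA and MP configurations are checked, the secure operation runs, the memory it touched is scrubbed, and interrupts are re-enabled before the response is returned.

/* Simplified walk-through with hypothetical names: disable interrupts, check
 * the DMA and MP configurations, run the secure operation, scrub the memory
 * it touched, then re-enable interrupts and return a response. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool irq_enabled = true;

static bool dma_config_ok(void) { return true; }   /* placeholder checks for the  */
static bool mp_config_ok(void)  { return true; }   /* secure-service criteria     */

static int secure_service_call(void)
{
    irq_enabled = false;                           /* 1. disable interrupt requests     */

    if (!dma_config_ok() || !mp_config_ok()) {     /* 2. verify secure-service criteria */
        irq_enabled = true;
        return -1;
    }

    char scratch[32];                              /* 3. perform the secure operation   */
    snprintf(scratch, sizeof scratch, "secret intermediate");

    memset(scratch, 0, sizeof scratch);            /* 4. clean the accessed memory      */

    irq_enabled = true;                            /* 5. re-enable interrupt requests   */
    return 0;                                      /* 6. response to the service call   */
}

int main(void)
{
    int ret = secure_service_call();
    printf("secure service returned %d (interrupts enabled: %d)\n", ret, irq_enabled);
    return 0;
}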
Multi-socket network interface controller with consistent transaction ordering
Computing apparatus includes a host computer, including a host memory, a CPU, and at least first and second host bus interfaces. A network interface controller (NIC) includes a network port, for connection to a packet communication network, and first and second NIC bus interfaces, which communicate via first and second peripheral component buses with the first and second host bus interfaces, respectively. Packet processing logic, in response to packets received through the network port, writes data to the host memory concurrently via both the first and second NIC bus interfaces in a sequence of direct memory access (DMA) transactions and, after writing the data in any given DMA transaction, writes a completion report to the host memory with respect to the given DMA transaction while verifying that the completion report will be available to the CPU only after all the data in the given DMA transaction have been written to the host memory.
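The sketch below models only that ordering guarantee, not the dual bus interfaces: the data is written first, and a release/acquire pair ensures the completion report becomes visible to the host CPU only after all the data from the transaction.

/* Models only the ordering guarantee: the data write happens first, and a
 * release store of the completion report ensures the host CPU (acquire load)
 * sees the report only after all the data. The dual buses are not modeled. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char host_memory[64];
static _Atomic uint32_t completion_report;

/* NIC side: one DMA transaction followed by its completion report. */
static void nic_dma_transaction(const char *payload, uint32_t txn_id)
{
    memcpy(host_memory, payload, strlen(payload) + 1);        /* data write(s) */
    atomic_store_explicit(&completion_report, txn_id,
                          memory_order_release);              /* report last   */
}

/* Host CPU side: the report is trusted only after an acquire load. */
int main(void)
{
    nic_dma_transaction("received packet data", 1);
    if (atomic_load_explicit(&completion_report, memory_order_acquire) == 1)
        printf("transaction 1 complete, data: %s\n", host_memory);
    return 0;
}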