G06F9/544

ACCELERATION OF COMMUNICATIONS

Examples described herein relate to a network interface device that includes packet processing circuitry and circuitry. In some examples, the circuitry is to execute a first process of partitioned processes to provide a remote procedure call (RPC) interface for a second process. In some examples, the second process of the partitioned processes includes a business logic. In some examples, resource and deployment definitions of the partitioned processes are based on an Interface Description Language (IDL) and a memory allocation.

Allocation of memory resources to SIMD workgroups

A memory subsystem for use with a single-instruction multiple-data (SIMD) processor comprising a plurality of processing units configured for processing one or more workgroups each comprising a plurality of SIMD tasks, the memory subsystem comprising: a shared memory partitioned into a plurality of memory portions for allocation to tasks that are to be processed by the processor; and a resource allocator configured to, in response to receiving a memory resource request for first memory resources in respect of a first-received task of a workgroup, allocate to the workgroup a block of memory portions sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.
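The allocation rule in the abstract above can be illustrated with a minimal sketch. It assumes a toy model in which shared memory is a flat array of equal-sized portions and a workgroup's block must be contiguous; the class name `WorkgroupAllocator` and its methods are hypothetical and are not taken from the patent.

```python
class WorkgroupAllocator:
    """Toy model: shared memory split into equal-sized portions (an assumption)."""

    def __init__(self, num_portions):
        self.free = [True] * num_portions  # True means the portion is unallocated
        self.blocks = {}                   # workgroup id -> (block start, portions per task)

    def request(self, workgroup_id, task_index, portions_per_task, tasks_in_workgroup):
        """Return (start, length) of the portions for one task, or None if no room."""
        if workgroup_id not in self.blocks:
            # First-received task of the workgroup: reserve a block large enough
            # for every task of the workgroup to get the same amount of memory.
            needed = portions_per_task * tasks_in_workgroup
            start = self._find_contiguous(needed)
            if start is None:
                return None
            for i in range(start, start + needed):
                self.free[i] = False
            self.blocks[workgroup_id] = (start, portions_per_task)
        # Later tasks of the same workgroup are served from the reserved block.
        start, per_task = self.blocks[workgroup_id]
        return (start + task_index * per_task, per_task)

    def _find_contiguous(self, needed):
        """Find the first run of `needed` free portions; return its start index."""
        run = 0
        for i, is_free in enumerate(self.free):
            run = run + 1 if is_free else 0
            if run == needed:
                return i - needed + 1
        return None


alloc = WorkgroupAllocator(16)
first = alloc.request("wg0", task_index=2, portions_per_task=2, tasks_in_workgroup=4)
later = alloc.request("wg0", task_index=0, portions_per_task=2, tasks_in_workgroup=4)
```

Reserving the whole block on the first request means later tasks of the same workgroup never compete for memory with other workgroups, at the cost of holding portions before they are used.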

DATA PROCESSING SYSTEM, OPERATING METHOD THEREOF, AND COMPUTING SYSTEM USING THE SAME
20230061729 · 2023-03-02

A data processing system includes a controller configured to receive a neural network operation processing request from a host device; and an in-memory computing device including a plurality of processing elements. The in-memory computing device is configured to receive an input feature map and a weight filter from the controller, and perform a neural network operation in the plurality of processing elements based on the weight filter and a plurality of division maps generated from the input feature map. During the neural network operation, the in-memory computing device does not move a reused element, that is, an element among those constituting the division maps that is operated on at least twice, between the processing elements.

ENSURING KEY EVENT DELIVERY TO A HOST FROM A CLIENT DURING A HIGH EVENT RATE
20230064833 · 2023-03-02

A baseboard management controller in an information handling system receives a human interface device input and converts the human interface device input into a human interface device scan code for storage at a human interface device descriptor buffer. The baseboard management controller determines whether the human interface device descriptor buffer is full and, when the buffer is full, copies the human interface device scan codes in the human interface device descriptor buffer to a memory segment and sends an error code to a host system. The baseboard management controller also clears the human interface device descriptor buffer and re-populates the human interface device descriptor buffer with the human interface device scan code and the human interface device scan codes from the memory segment.

PARALLEL MATRIX MULTIPLICATION TECHNIQUE OPTIMIZED FOR MEMORY FETCHES

A matrix multiplication circuit comprises a memory storage device, processing circuitry, a parallel multiply circuit, and buffer circuits. The parallel multiply circuit simultaneously performs a count of multiplies in a parallel multiplication operation. The buffer circuits include prefetch buffer circuits each having a storage array dimension corresponding to the count of multiplies in the parallel multiplication operation. The processing circuitry loads a first prefetch buffer circuit with values from a first matrix; fetches a value of a second matrix and, in parallel with the fetch, preloads a second prefetch buffer circuit with another value from the first matrix; initiates a parallel multiply of the fetched value of the second matrix and the values in the first prefetch buffer circuit; and stores partial product results of the parallel multiply, including adding a current partial product result to a previously stored partial product result.
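The data flow in the abstract above can be sketched in software. The sketch assumes that each prefetch buffer holds one column of the first matrix, that one scalar of the second matrix is fetched at a time, and that the idle buffer is refilled a whole column at once; the abstract describes preloading value by value in parallel with each fetch, which sequential Python can only imitate. The function name `matmul_prefetch` is hypothetical.

```python
def matmul_prefetch(a, b):
    """Compute a @ b for non-empty list-of-lists matrices using two column buffers."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]

    # Two prefetch buffers, each as wide as the number of multiplies done at once.
    buffers = [[a[i][0] for i in range(rows)], [0] * rows]
    active = 0

    for k in range(inner):
        # Preload the idle buffer with the next column of the first matrix.
        # In hardware this would overlap with the fetches from the second matrix.
        if k + 1 < inner:
            idle = 1 - active
            for i in range(rows):
                buffers[idle][i] = a[i][k + 1]

        for j in range(cols):
            value = b[k][j]  # fetch one value of the second matrix
            # "Parallel multiply": one fetched value times every buffered value.
            products = [x * value for x in buffers[active]]
            # Accumulate each partial product onto the previously stored result.
            for i in range(rows):
                c[i][j] += products[i]

        active = 1 - active  # swap buffer roles for the next column
    return c


result = matmul_prefetch([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# result == [[19, 22], [43, 50]]
```

Each value fetched from the second matrix is used for a full column of multiplies before the next fetch, which is one way to read the "optimized for memory fetches" claim in the title.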

HIGH-SPEED SCANNING PARSER FOR SCALABLE COLLECTION OF STATISTICS AND USE IN PREPARING DATA FOR MACHINE LEARNING

A parser is deployed early in a machine learning pipeline to read raw data and collect useful statistics about the raw data's content to determine which items of raw data exhibit a proxy for feature importance for the machine learning model. The parser operates at high speeds that approach the disk's absolute throughput while utilizing a small memory footprint. Utilization of the parser enables the machine learning pipeline to receive a fraction of the total raw data that would otherwise be available. Several scans through the data are performed, by which proxies for feature importance are indicated and irrelevant features may be discarded and thereby not forwarded to the machine learning pipeline. This reduces the amount of memory and other hardware resources used at the server and also expedites the machine learning process.
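A streaming scan of the kind described above might look like the following sketch. The abstract does not say which statistics are collected or in what format the raw data arrives, so this assumes CSV input and uses two simple proxies, fill rate and whether a column ever changes value. The names `scan_columns` and `keep_columns` and the `min_fill` threshold are hypothetical.

```python
import csv
import io


def scan_columns(stream):
    """One pass over CSV text, keeping only constant-size state per column."""
    reader = csv.reader(stream)
    header = next(reader)
    stats = {name: {"filled": 0, "first": None, "varies": False} for name in header}
    rows = 0
    for row in reader:
        rows += 1
        for name, cell in zip(header, row):
            if cell == "":
                continue  # empty cells count against the fill rate
            s = stats[name]
            s["filled"] += 1
            if s["first"] is None:
                s["first"] = cell
            elif cell != s["first"]:
                s["varies"] = True  # column carries more than one distinct value
    return rows, stats


def keep_columns(rows, stats, min_fill=0.5):
    """Keep columns that vary and are mostly filled (assumed proxy for importance)."""
    return [
        name
        for name, s in stats.items()
        if rows and s["varies"] and s["filled"] / rows >= min_fill
    ]


raw = "id,const,sparse,x\n1,a,,0.5\n2,a,,0.7\n3,a,q,0.9\n4,a,,0.1\n"
rows, stats = scan_columns(io.StringIO(raw))
selected = keep_columns(rows, stats)
# selected == ["id", "x"]
```

Because the per-column state does not grow with the number of rows, memory use stays small regardless of input size, which matches the small-footprint goal stated in the abstract.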

HARDWARE ASSISTED EFFICIENT MEMORY MANAGEMENT FOR DISTRIBUTED APPLICATIONS WITH REMOTE MEMORY ACCESSES
20230114263 · 2023-04-13

Systems, apparatuses and methods may provide for technology that uses centralized hardware to detect a local allocation request associated with a local thread, detect a remote allocation request associated with a remote thread, wherein the remote allocation request bypasses a remote operating system, and process the local allocation request and the remote allocation request with respect to a central heap, wherein the central heap is shared by the local thread and the remote thread. The local allocation request and the remote allocation request may include one or more of a first request to allocate a memory block of a specified size, a second request to allocate multiple memory blocks of a same size, a third request to resize a previously allocated memory block, or a fourth request to deallocate the previously allocated memory block.
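The four request types listed above resemble the familiar allocate, allocate-many, resize, and free operations. A minimal software sketch of a heap shared by local and remote requesters follows; it tracks only sizes and owners, not addresses, and does not model the hardware or the operating-system bypass. The class `CentralHeap` and its method names are hypothetical.

```python
import itertools


class CentralHeap:
    """Toy shared heap: local and remote threads use the same bookkeeping."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.blocks = {}                # handle -> (owner, size)
        self._ids = itertools.count(1)  # source of fresh handles

    def allocate(self, owner, size):
        """First request type: one block of a specified size."""
        if self.used + size > self.capacity:
            return None
        handle = next(self._ids)
        self.blocks[handle] = (owner, size)
        self.used += size
        return handle

    def allocate_many(self, owner, count, size):
        """Second request type: several blocks of the same size, all or nothing."""
        if self.used + count * size > self.capacity:
            return None
        return [self.allocate(owner, size) for _ in range(count)]

    def resize(self, handle, new_size):
        """Third request type: resize a previously allocated block."""
        owner, old_size = self.blocks[handle]
        if self.used - old_size + new_size > self.capacity:
            return False
        self.blocks[handle] = (owner, new_size)
        self.used += new_size - old_size
        return True

    def free(self, handle):
        """Fourth request type: deallocate a previously allocated block."""
        _, size = self.blocks.pop(handle)
        self.used -= size


heap = CentralHeap(100)
local_block = heap.allocate("local-thread", 40)
remote_blocks = heap.allocate_many("remote-thread", 2, 20)
```

Both requesters go through the same object here, which mirrors the idea of a single central heap arbitrating local and remote requests.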

TOPOLOGY OF ACCELERATORS
20230111351 · 2023-04-13

A topology of accelerators is provided, including a plurality of accelerators and a broadcast buffer. Each of the plurality of accelerators corresponds to a first memory and obtains input data from an external second memory, wherein each accelerator can directly access only its corresponding first memory, and the broadcast buffer is coupled between one of the plurality of accelerators and the corresponding first memory. When receiving a write command and the input data from the accelerator to which it is coupled, the broadcast buffer is configured to write the input data into the corresponding first memory according to the write command, and when broadcast is enabled, the broadcast buffer is configured to broadcast the write command and the weight data in the input data. This application can improve the access performance of the accelerators and reduce the access delay.

Systems and methods for accelerating data computation

Systems and methods for precomputing data and storing cache objects corresponding to the precomputed data are described. A system creates a new cache object when a user interacts with the system. The system precomputes formulas in the newly created cache object by replacing the formulas with corresponding calculated values. The system precomputes the formulas in the background (i.e., the user is not presented with the precomputed values while the user is manipulating the data). The system may persistently store a precomputed version cache object in a dedicated version cache storage for later use. If updates are performed to the structure and/or values of a version represented in a precomputed version cache object, affected parts of the version cache object are invalidated by replacing calculated values with the underlying formulas.
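The precompute-then-invalidate cycle described above can be sketched as follows. The sketch assumes that formulas are Python callables that read other cells through a lookup function, and that "invalidating" a cell means dropping its calculated value so the formula is used again. The class `VersionCache` and its methods are hypothetical, and persistence and background execution are not modeled.

```python
class VersionCache:
    """Toy cache object: formula cells are replaced by calculated values."""

    def __init__(self, cells):
        # cells maps a name to either a plain value or a formula (a callable).
        self.formulas = {k: v for k, v in cells.items() if callable(v)}
        self.values = {k: v for k, v in cells.items() if not callable(v)}
        self.depends = {}  # formula name -> set of cell names it read

    def precompute(self):
        """Replace every formula that has no calculated value with its result."""
        for name in self.formulas:
            if name not in self.values:
                self._evaluate(name)

    def _evaluate(self, name):
        reads = set()

        def get(other):
            reads.add(other)  # record the dependency for later invalidation
            if other not in self.values:
                self._evaluate(other)
            return self.values[other]

        self.values[name] = self.formulas[name](get)
        self.depends[name] = reads

    def update(self, name, value):
        """Change an input value and invalidate only the affected formulas."""
        self.values[name] = value
        self._invalidate(name)

    def _invalidate(self, changed):
        for name, reads in list(self.depends.items()):
            if changed in reads and name in self.values:
                del self.values[name]  # fall back to the underlying formula
                self._invalidate(name)


cache = VersionCache({
    "a": 2,
    "b": 3,
    "total": lambda get: get("a") + get("b"),
    "double": lambda get: get("total") * 2,
})
cache.precompute()    # total and double now hold calculated values
cache.update("a", 10) # total and double are invalidated, b is untouched
```

Tracking which cells each formula reads is what lets an update invalidate only the affected parts rather than the whole cache object.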

CONSOLE COMMAND COMPOSITION
20230116173 · 2023-04-13

Techniques for facilitating the composition of console commands for storage systems and appliances. The techniques include receiving a command prefix at a management console and accessing a plurality of first parameter designations associated with the command prefix from a first hierarchical level of a command tree. The techniques include receiving a selection of a first parameter designation from among the first parameter designations and accessing a plurality of second parameter designations associated with the first parameter designation from a second hierarchical level of the command tree. The techniques include receiving a selection of a second parameter designation from among the second parameter designations and merging the command prefix, the first parameter designation, and the second parameter designation to form a console command for performing a specified task or operation. The techniques include performing the specified task or operation by executing the console command, which may have its own parameters.
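The two-level lookup and merge described above can be sketched with a nested dictionary standing in for the command tree. The prefixes, parameter designations, and function names below are all hypothetical examples and do not come from any real storage console; executing the composed command is not modeled.

```python
# Hypothetical command tree: prefix -> first-level designations -> second-level designations.
COMMAND_TREE = {
    "volume": {
        "create": ["--size", "--name"],
        "delete": ["--name"],
    },
    "snapshot": {
        "list": ["--volume"],
    },
}


def first_level(prefix):
    """Designations offered after the command prefix is received."""
    return sorted(COMMAND_TREE[prefix])


def second_level(prefix, first):
    """Designations offered after a first-level designation is selected."""
    return list(COMMAND_TREE[prefix][first])


def compose(prefix, first, second):
    """Merge the prefix and both selected designations into one console command."""
    if second not in COMMAND_TREE[prefix][first]:
        raise ValueError(f"{second!r} is not valid after {prefix} {first}")
    return " ".join([prefix, first, second])


options = first_level("volume")           # ["create", "delete"]
command = compose("volume", "create", "--size")
# command == "volume create --size"
```

Validating each selection against the tree before merging means only commands that exist in the tree can be composed.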