Patent classifications
H03M7/6017
Technologies for providing accelerated functions as a service in a disaggregated architecture
Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
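The lookup described in this abstract can be illustrated with a minimal sketch. It assumes the database is a simple in-memory mapping from kernel identifiers to sled identifiers; the names `KernelDatabase`, `register`, `find_sled`, and `assign_task` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Optional, Set


class KernelDatabase:
    """Hypothetical stand-in for the database of kernels and associated sleds."""

    def __init__(self) -> None:
        self.sleds_by_kernel: Dict[str, Set[str]] = {}

    def register(self, kernel_id: str, sled_id: str) -> None:
        # Record that an accelerator device on this sled is configured with the kernel.
        self.sleds_by_kernel.setdefault(kernel_id, set()).add(sled_id)

    def find_sled(self, kernel_id: str) -> Optional[str]:
        sleds = self.sleds_by_kernel.get(kernel_id)
        # Assumption: pick the lexicographically smallest sled id so the sketch
        # is deterministic; the abstract does not describe a selection policy.
        return min(sleds) if sleds else None


def assign_task(db: KernelDatabase, kernel_id: str,
                assignments: Dict[str, List[str]]) -> str:
    """Resolve the kernel to a sled and record the task assignment."""
    sled = db.find_sled(kernel_id)
    if sled is None:
        raise LookupError(f"no sled is configured with kernel {kernel_id!r}")
    assignments.setdefault(sled, []).append(kernel_id)
    return sled
```

A caller would register each sled's kernels up front, then call `assign_task` once per incoming request.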
STORAGE DEVICE
The storage device includes a first memory, a process device that stores data in the first memory and reads the data from the first memory, and an accelerator that includes a second memory different from the first memory. The accelerator stores, in the second memory, compressed data read from one or more storage drives, decompresses the compressed data stored in the second memory to generate plaintext data, extracts the data designated by the process device from the plaintext data, and transmits the extracted data to the first memory.
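The data path in this abstract can be sketched in a few lines. The sketch assumes zlib as the compression format and a byte offset plus length as the way the process device designates data; both are illustrative assumptions, and `accelerator_extract` is a hypothetical name.

```python
import zlib


def accelerator_extract(compressed: bytes, offset: int, length: int) -> bytes:
    """Decompress in accelerator-local memory and return only the designated slice."""
    # Stage the compressed data in the accelerator's own (second) memory.
    second_memory = compressed
    # Decompress there, so the full plaintext never occupies the first memory.
    plaintext = zlib.decompress(second_memory)
    # Only the designated range is transmitted to the first memory.
    return plaintext[offset:offset + length]
```

The point of the arrangement is that the first memory receives only the extracted bytes rather than the whole decompressed block.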
TECHNOLOGIES FOR DIVIDING WORK ACROSS ACCELERATOR DEVICES
Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
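As a rough illustration of the division step, the sketch below splits a list of work items in proportion to a per-accelerator weight and runs the resulting tasks in parallel. The weights stand in for the "configuration of each accelerator device", threads stand in for accelerators, and the names `divide_job`, `run_chunk`, and `run_job` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def divide_job(items: List[int], weights: Dict[str, int]) -> Dict[str, List[int]]:
    """Split items into one contiguous task per accelerator, proportional to weight."""
    total = sum(weights.values())
    names = list(weights)
    tasks, start = {}, 0
    for i, name in enumerate(names):
        # The last accelerator takes the remainder so no item is dropped.
        if i == len(names) - 1:
            end = len(items)
        else:
            end = start + len(items) * weights[name] // total
        tasks[name] = items[start:end]
        start = end
    return tasks


def run_chunk(kernel: Callable[[int], int], chunk: List[int]) -> List[int]:
    return [kernel(x) for x in chunk]


def run_job(items: List[int], weights: Dict[str, int],
            kernel: Callable[[int], int]) -> List[int]:
    """Schedule each task on its (simulated) accelerator and merge the outputs in order."""
    tasks = divide_job(items, weights)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {n: pool.submit(run_chunk, kernel, c) for n, c in tasks.items()}
        output: List[int] = []
        for name in tasks:  # dict order preserves the original item order
            output.extend(futures[name].result())
    return output
```

Contiguous chunks keep the merge step trivial; a job analysis that found dependencies between items would need a different split.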
Method and apparatus for high performance compression and decompression
An apparatus and method for performing efficient lossless compression. For example, one embodiment of an apparatus comprises: first compression circuitry to identify and replace one or more repeated bit strings from an input data stream with distances to the one or more repeated bit strings, the first compression circuitry to generate a first compressed data stream comprising literal-length data identifying a first instance of each repeated bit string and distance data comprising distances from the first instance to each repeated instance of the repeated bit string; second compression circuitry to perform sorting, tree generation, and length calculations for literal-length values and distance values of the first compressed data stream, the second compression circuitry comprising: variable length code mapping circuitry to map each literal-length value and distance value to a variable length code; header generation circuitry to generate a header for a final compressed bit stream using the length calculations; and a transcoder to substitute the variable length codes in place of the literal-length and distance values to generate a compressed bit stream body, wherein the transcoder operates in parallel with the header generation circuitry; and bit stream merge circuitry to combine the header with the compressed bit stream body to generate a final lossless compressed bitstream.
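The two stages named in this abstract resemble the familiar LZ77-plus-Huffman structure. The sketch below is a simplified software illustration of that general idea, not the claimed circuitry: a greedy matcher that emits literals and (length, distance) pairs, an expander to check the round trip, and a Huffman code-length calculation. Header generation, the transcoder, and bit stream merging are omitted, and the window size and minimum match length are arbitrary assumptions.

```python
import heapq
from collections import Counter


def lz77_tokens(data: bytes, window: int = 32, min_match: int = 3) -> list:
    """Replace repeated strings with (length, distance) references to earlier data."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_match:
            tokens.append(("match", best_len, best_dist))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_expand(tokens: list) -> bytes:
    """Inverse of lz77_tokens, used here only to verify the round trip."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, length, dist = token
            for _ in range(length):
                out.append(out[-dist])  # copies may overlap their own output
    return bytes(out)


def huffman_code_lengths(symbols) -> dict:
    """Build a Huffman tree over symbol frequencies and return each code length."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # The unique index breaks frequency ties so the symbol lists are never compared.
    heap = [(n, idx, [s]) for idx, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    next_idx = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # every merge pushes these symbols one level deeper
        heapq.heappush(heap, (n1 + n2, next_idx, s1 + s2))
        next_idx += 1
    return lengths
```

In a full encoder the code lengths would be computed separately for literal-length values and distance values, then mapped to variable length codes.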
METHOD AND SYSTEM FOR COMPRESSING APPLICATION DATA FOR OPERATIONS ON MULTI-CORE SYSTEMS
A system and method to compress application control data, such as weights for a layer of a convolutional neural network, is disclosed. A multi-core system for executing at least one layer of the convolutional neural network includes a storage device storing a compressed weight matrix of a set of weights of the at least one layer of the convolutional network and a decompression matrix. The compressed weight matrix is formed by matrix factorization and quantization of a floating point value of each weight to a floating point format. A decompression module is operable to obtain an approximation of the weight values by decompressing the compressed weight matrix through the decompression matrix. A plurality of cores executes the at least one layer of the convolutional neural network with the approximation of weight values to produce an inference output.
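The decompression step can be sketched as a matrix product. The example assumes the factorization has already been computed offline and that IEEE half precision is the target floating point format; neither detail is stated in the abstract, and `quantize_fp16`, `matmul`, and `decompress_weights` are hypothetical helper names.

```python
import struct
from typing import List

Matrix = List[List[float]]


def quantize_fp16(x: float) -> float:
    """Round a value to IEEE half precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def decompress_weights(compressed: Matrix, decompression: Matrix) -> Matrix:
    """Approximate the original weights as compressed @ decompression."""
    return matmul(compressed, decompression)
```

For an m-by-n weight matrix and rank r, the two factors hold r(m + n) values instead of mn, which is where the storage saving comes from when r is small.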
COMPRESSION AND DECOMPRESSION ENGINES AND COMPRESSED DOMAIN PROCESSORS
Compressed domain processors configured to perform operations on data compressed in a format that preserves order. The compressed domain processors may include operations such as addition, subtraction, multiplication, division, sorting, and searching. In some cases, compression engines for compressing the data into the desired formats are provided.
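One simple way to see why an order-preserving format allows sorting and searching without decompression is an order-preserving dictionary code, sketched below. This is a generic illustration rather than the format the patent describes, it does not cover the arithmetic operations, and the function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def build_codebook(values: Sequence[str]) -> List[str]:
    """Sorted distinct values; a value's code is its rank in this list."""
    return sorted(set(values))


def encode(codebook: List[str], value: str) -> int:
    i = bisect.bisect_left(codebook, value)
    if i == len(codebook) or codebook[i] != value:
        raise KeyError(value)
    return i


def decode(codebook: List[str], code: int) -> str:
    return codebook[code]


def sort_compressed(codes: List[int]) -> List[int]:
    # Codes compare the same way as the original values, so sorting the codes
    # yields the codes of the sorted values without decompressing anything.
    return sorted(codes)
```

Because `a < b` holds for two values exactly when `encode(a) < encode(b)`, range searches can likewise run on the codes alone.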
System and components for encoding integers
A system for encoding and decoding data-tokens. In some examples, the system may be configured to encode and decode integers. In other cases, the system may be configured to encode and decode symbols or bytes of data.
Realtime multimodel lossless data compression system and method
Methods and systems for processing telemetry data that contains multiple data types are disclosed. Optimum multimodal encoding approaches can be used to achieve data-specific compression performance for heterogeneous datasets by distinguishing data types and their characteristics in real time and applying the most effective compression method to each data type. Using an optimum encoding diagram for heterogeneous data, a data classification algorithm classifies input data blocks into predefined categories, such as Unicode, telemetry, RCS and IR for telemetry datasets, plus an unknown class that covers non-studied data types, and then assigns them to the corresponding compression models.
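The classify-then-route flow can be sketched as follows. The classifier here is a deliberately crude assumption (a block is "unicode" if it decodes as UTF-8, otherwise "unknown"), and the pairing of categories with the standard-library `bz2` and `zlib` codecs is arbitrary; the telemetry, RCS, and IR categories from the abstract are left out because the text does not say how they are recognized.

```python
import bz2
import zlib
from typing import Tuple

# Hypothetical category-to-model table: (compress, decompress) per category.
MODELS = {
    "unicode": (bz2.compress, bz2.decompress),
    "unknown": (zlib.compress, zlib.decompress),
}


def classify_block(block: bytes) -> str:
    """Assign a block to a predefined category, falling back to 'unknown'."""
    try:
        block.decode("utf-8")
        return "unicode"
    except UnicodeDecodeError:
        return "unknown"


def compress_block(block: bytes) -> Tuple[str, bytes]:
    """Route the block to the compression model registered for its category."""
    category = classify_block(block)
    compress, _ = MODELS[category]
    return category, compress(block)


def decompress_block(category: str, payload: bytes) -> bytes:
    _, decompress = MODELS[category]
    return decompress(payload)
```

The category tag has to travel with each compressed block so the decoder can select the matching model.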
Method for reducing read ports and accelerating decompression in memory systems
A decompression system includes a first memory including a first write port configured to receive decompressed data from a decompressor, and a first read port configured to receive a back-reference read request, the first memory being configured to output the decompressed data to the decompressor in response to receiving the back-reference read request at the first read port, and a second memory including a second write port electrically coupled to the first write port and configured to receive the decompressed data, the second memory being configured to buffer the decompressed data for retrieval by a receiver.
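The arrangement can be modeled in software to show why the receiver never needs to read the first memory. In this sketch every write lands in both memories, back-reference reads go only to the first memory, and the receiver drains only the second; the class and method names are hypothetical, and a real design would bound the history to a fixed window.

```python
from collections import deque


class DualMemoryDecompressor:
    """Toy model of two memories with coupled write ports."""

    def __init__(self) -> None:
        self.first_memory = bytearray()  # history; serves back-reference reads
        self.second_memory = deque()     # output buffer; drained by the receiver

    def _write(self, byte: int) -> None:
        # Coupled write ports: the same decompressed byte reaches both memories.
        self.first_memory.append(byte)
        self.second_memory.append(byte)

    def literal(self, byte: int) -> None:
        self._write(byte)

    def back_reference(self, distance: int, length: int) -> None:
        # Reads use only the first memory's read port.
        for _ in range(length):
            self._write(self.first_memory[-distance])

    def receive(self, n: int) -> bytes:
        # The receiver reads only the second memory, so it never competes
        # with back-reference reads for a port on the first memory.
        count = min(n, len(self.second_memory))
        return bytes(self.second_memory.popleft() for _ in range(count))
```

With the output path moved to the second memory, the first memory can get by with a single read port dedicated to back-references.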