H03M7/42

Parallel processing of data having data dependencies for accelerating the launch and performance of operating systems and other computing applications

Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded (“RLE”) data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
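The two-phase flow described in the abstract can be illustrated with a minimal sketch. The token format (`("lit", bytes)` and `("ref", distance, length)`) and the function names are assumptions made for illustration and are not taken from the disclosure. Each block is partially decompressed in parallel into literals (independent data) and deferred back-references (dependent data); a sequencing pass then places the literals and resolves the references in file order, so a reference that points at bytes produced by an earlier reference (linked dependent data) also resolves correctly. The sketch assumes every reference points at a valid earlier position.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def partial_decompress(block):
    """Phase 1, per block: split tokens into literals and deferred references.

    Positions are block-local; no other block's data is needed here, so
    blocks can be processed in parallel.
    """
    literals, refs, pos = [], [], 0
    for token in block:
        if token[0] == "lit":
            literals.append((pos, token[1]))
            pos += len(token[1])
        else:  # ("ref", distance, length): copy from earlier output
            _, distance, length = token
            refs.append((pos, distance, length))
            pos += length
    return literals, refs, pos


def decompress(blocks):
    """Phase 2: sequence the partial results and resolve dependent data."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_decompress, blocks))

    sizes = [size for _, _, size in partials]
    starts = [0] + list(accumulate(sizes))[:-1]  # output offset of each block
    out = bytearray(sum(sizes))

    # Independent data can be written anywhere, in any order.
    for start, (literals, _, _) in zip(starts, partials):
        for pos, data in literals:
            out[start + pos:start + pos + len(data)] = data

    # Dependent data is resolved in file order, byte by byte, so that
    # overlapping and linked references see already-resolved bytes.
    for start, (_, refs, _) in zip(starts, partials):
        for pos, distance, length in refs:
            dst = start + pos
            for i in range(length):
                out[dst + i] = out[dst + i - distance]
    return bytes(out)


blocks = [
    [("lit", b"abc"), ("ref", 3, 6)],  # reference overlaps its own output
    [("lit", b"X"), ("ref", 4, 3)],    # reference reaches into block 0
]
result = decompress(blocks)
```

In this sketch only phase 1 runs in parallel; the reference pass is kept sequential for clarity, whereas the abstract describes a plurality of processors taking part in the sequencing as well.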

TECHNOLOGIES FOR OFFLOADING ACCELERATION TASK SCHEDULING OPERATIONS TO ACCELERATOR SLEDS

Technologies for offloading acceleration task scheduling operations to accelerator sleds include a compute device to receive a request from a compute sled to accelerate the execution of a job, which includes a set of tasks. The compute device is also to analyze the request to generate metadata indicative of the tasks within the job, a type of acceleration associated with each task, and a data dependency between the tasks. Additionally, the compute device is to send an availability request, including the metadata, to one or more micro-orchestrators of one or more accelerator sleds communicatively coupled to the compute device. The compute device is further to receive availability data from the one or more micro-orchestrators, indicative of which of the tasks each micro-orchestrator has accepted for acceleration on the associated accelerator sled. Additionally, the compute device is to assign the tasks to the one or more micro-orchestrators as a function of the availability data.
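The request, availability and assignment exchange can be sketched as follows. All class names, fields and the acceptance rule (supported acceleration type plus a free-slot limit) are hypothetical and chosen only to make the flow concrete; the abstract does not specify how a micro-orchestrator decides what to accept. The dependency list travels with the metadata but is not used for ordering in this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    accel_type: str                     # type of acceleration the task needs
    depends_on: list = field(default_factory=list)  # data dependencies


@dataclass
class MicroOrchestrator:
    sled: str
    supported: set                      # acceleration types this sled offers
    free_slots: int

    def availability(self, metadata):
        """Answer an availability request with the task ids this sled accepts."""
        accepted = [t.task_id for t in metadata if t.accel_type in self.supported]
        return accepted[: self.free_slots]


def assign(metadata, orchestrators):
    """Assign each task to the first micro-orchestrator that accepted it."""
    assignment = {}
    for orch in orchestrators:
        for task_id in orch.availability(metadata):
            assignment.setdefault(task_id, orch.sled)
    return assignment


metadata = [
    Task("t1", "crypto"),
    Task("t2", "compress", depends_on=["t1"]),
    Task("t3", "crypto", depends_on=["t2"]),
]
orchestrators = [
    MicroOrchestrator("sled-a", {"crypto"}, free_slots=1),
    MicroOrchestrator("sled-b", {"crypto", "compress"}, free_slots=3),
]
assignment = assign(metadata, orchestrators)
```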

Technologies for providing accelerated functions as a service in a disaggregated architecture

Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
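A minimal sketch of the lookup step, assuming the database is a simple mapping from kernel name to the sleds already configured with that kernel. The kernel names, sled names and the "first match wins" policy are illustrative assumptions, not details from the abstract.

```python
def assign_task(kernel, kernel_db):
    """Return a sled that already has an accelerator configured with `kernel`."""
    sleds = kernel_db.get(kernel, [])
    if not sleds:
        raise LookupError(f"no accelerator sled is configured with kernel {kernel!r}")
    return sleds[0]  # hypothetical policy: take the first matching sled


# Hypothetical database: kernel name -> sleds holding that kernel.
kernel_db = {
    "aes-encrypt": ["sled-1", "sled-3"],
    "gzip": ["sled-2"],
}
chosen = assign_task("gzip", kernel_db)
```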

TECHNOLOGIES FOR DIVIDING WORK ACROSS ACCELERATOR DEVICES

Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
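One way to picture the divide, schedule and execute sequence is the sketch below. It assumes the "configuration" of each accelerator reduces to a single relative throughput weight and that the job is a list of independent items; both are simplifications introduced here, and threads stand in for accelerator devices.

```python
from concurrent.futures import ThreadPoolExecutor


def divide(items, weights):
    """Split `items` into one contiguous chunk per accelerator, sized by weight."""
    total = sum(weights)
    chunks, start = [], 0
    for i, weight in enumerate(weights):
        if i == len(weights) - 1:
            end = len(items)  # last accelerator takes whatever remains
        else:
            end = start + round(len(items) * weight / total)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_job(items, weights, kernel):
    """Run `kernel` over every chunk in parallel and merge the outputs in order."""
    chunks = divide(items, weights)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        parts = pool.map(lambda chunk: [kernel(x) for x in chunk], chunks)
    return [y for part in parts for y in part]


chunk_sizes = [len(c) for c in divide(list(range(10)), [1, 3, 1])]
output = run_job(list(range(10)), [1, 3, 1], lambda x: x * x)
```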

CONTEXT INITIALIZATION IN ENTROPY CODING

A decoder includes an entropy decoder configured to derive a number of bins of the binarizations from the data stream using binary entropy decoding by selecting a context among different contexts and updating probability states associated with the different contexts, dependent on previously decoded portions of the data stream; a desymbolizer configured to debinarize the binarizations of the syntax elements to obtain integer values of the syntax elements; a reconstructor configured to reconstruct the video based on the integer values of the syntax elements using a quantization parameter, wherein the entropy decoder is configured to distinguish between 126 probability states and to initialize the probability states associated with the different contexts according to a linear equation of the quantization parameter, wherein the entropy decoder is configured to, for each of the different contexts, derive a slope and an offset of the linear equation from first and second four bit parts of a respective 8 bit initialization value.
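The abstract does not give the constants of the linear equation. As an illustration, the sketch below uses the context initialization published in the HEVC (ITU-T H.265) specification, which follows the same scheme: the upper four bits of the 8-bit initialization value select a slope, the lower four bits select an offset, and the result is clipped to 1..126 before being split into a most-probable-symbol flag and a state index. Treat the specific constants as an assumption about how this claim maps to HEVC; the function names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def init_context(init_value, slice_qp):
    """Derive (p_state_idx, val_mps) from an 8-bit init value and the slice QP."""
    slope_idx = init_value >> 4      # first four-bit part
    offset_idx = init_value & 15     # second four-bit part
    m = slope_idx * 5 - 45           # slope of the linear equation
    n = (offset_idx << 3) - 16       # offset of the linear equation
    # Python's >> floors negative values, matching an arithmetic right shift.
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp)) >> 4) + n)
    val_mps = 0 if pre_ctx_state <= 63 else 1
    p_state_idx = pre_ctx_state - 64 if val_mps else 63 - pre_ctx_state
    return p_state_idx, val_mps


# 154 = 0x9A gives slope 0 and offset 64, so the result does not depend on QP.
neutral = init_context(154, 30)
```

The 126 distinguishable states correspond to the clipped range 1..126, which maps onto 63 state indices for each of the two most-probable-symbol values.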

DATA COMPRESSOR, DATA DECOMPRESSOR, AND DATA COMPRESSION/DECOMPRESSION SYSTEM
20210258020 · 2021-08-19

A technique is provided to keep the retrieval of a conversion rule from taking a longer time. The technique provides: a conversion table that includes a predetermined number of entry regions, each capable of storing a mapping between first data and second data smaller in size than the first data, where the entry regions are divided into a plurality of bank regions and each bank region includes fewer entry regions than the predetermined number; a determination unit configured to uniquely determine, among the plurality of bank regions, the bank region corresponding to the first data; and a processing unit configured to search the entry regions of the determined bank region at most the predetermined number of times each, to output the second data when the second data corresponding to the first data is stored, and, when it is not stored, to register the second data corresponding to the first data in an entry region in which no other piece of second data is stored and to output the first data.
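The banked lookup can be sketched as below. The bank-selection function (byte sum modulo the bank count), the table dimensions, and the use of a (bank, slot) pair as the shorter "second data" are assumptions made for illustration. The point of the structure is that a lookup scans only one small bank rather than the whole table. The abstract does not say what happens when a bank is full, so this sketch simply leaves the word unregistered in that case.

```python
class BankedTable:
    """Conversion table split into banks; each lookup scans a single bank."""

    def __init__(self, num_banks=4, entries_per_bank=4):
        self.banks = [[None] * entries_per_bank for _ in range(num_banks)]

    def bank_of(self, word):
        """Uniquely map first data to a bank (hypothetical rule: byte sum)."""
        return sum(word) % len(self.banks)

    def encode(self, word):
        """Return a short code on a hit; on a miss, register and return raw data."""
        bank_index = self.bank_of(word)
        bank = self.banks[bank_index]
        for slot, stored in enumerate(bank):
            if stored == word:
                return ("code", bank_index, slot)   # second data
        for slot, stored in enumerate(bank):
            if stored is None:                      # first free entry region
                bank[slot] = word
                break
        return ("raw", word)                        # first data, uncompressed


table = BankedTable()
first = table.encode(b"abcd")    # miss: registered, emitted raw
second = table.encode(b"abcd")   # hit: emitted as a (bank, slot) code
```

A decompressor that applies the same registration rule to the raw words it receives would rebuild an identical table and could therefore map each code back to its word.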