Parallel processing of data having data dependencies for accelerating the launch and performance of operating systems and other computing applications
11669526 · 2023-06-06
Assignee
Inventors
- Paul L. Master (Sunnyvale, CA)
- Frederick Curtis Furtek (Menlo Park, CA, US)
- Kim Knuttila (Port Moody, CA)
- L. Brian McGann (Santa Clara, CA, US)
Cpc classification
H03M7/42
ELECTRICITY
G06F3/0679
PHYSICS
H03M7/46
ELECTRICITY
H03M7/40
ELECTRICITY
H03M7/60
ELECTRICITY
International classification
H03M7/30
ELECTRICITY
H03M7/40
ELECTRICITY
H03M7/42
ELECTRICITY
Abstract
Representative embodiments are disclosed for a rapid and highly parallel decompression of compressed executable and other files, such as executable files for operating systems and applications, having compressed blocks including run length encoded (“RLE”) data having data-dependent references. An exemplary embodiment includes a plurality of processors or processor cores to identify a start or end of each compressed block; to partially decompress, in parallel, a selected compressed block into independent data, dependent (RLE) data, and linked dependent (RLE) data; to sequence the independent data, dependent (RLE) data, and linked dependent (RLE) data from a plurality of partial decompressions of a plurality of compressed blocks, to obtain data specified by the dependent (RLE) data and linked dependent (RLE) data, and to insert the obtained data into a corresponding location in an uncompressed file. The representative embodiments are also applicable to other types of data processing for applications having data dependencies.
Claims
1. An apparatus for parallel decompression of a first compressed file, the first compressed file comprising a plurality of compressed blocks having independent data and dependent, run length encoded (“RLE”) data having data-dependent references spanning one or more compressed blocks of the plurality of compressed blocks, the apparatus comprising: one or more processors or processor cores configured to identify a start or an end of a selected compressed block of the plurality of compressed blocks; the one or more processors or processor cores further configured to partially decompress the selected compressed block, of the plurality of compressed blocks, into the independent data, the dependent RLE data, and any linked dependent RLE data; and the one or more processors or processor cores further configured to sequence the independent data, the dependent RLE data, and any linked dependent RLE data from a plurality of partial decompressions of the plurality of compressed blocks to obtain data specified by the dependent RLE data and any linked dependent RLE data, and to insert the obtained data into an uncompressed data file.
2. The apparatus of claim 1, wherein the one or more processors or processor cores are further configured to partially decompress multiple compressed blocks, of the plurality of compressed blocks, in parallel, into the independent data, the dependent RLE data, and any linked dependent RLE data.
3. The apparatus of claim 1, wherein a first processor or processor core of the one or more processors or processor cores, following the identification of the start or the end of each compressed block of the plurality of compressed blocks, is further configured to transfer a single or individuated compressed block to a second processor or processor core of the one or more processors or processor cores.
4. The apparatus of claim 1, wherein the one or more processors or processor cores comprise a plurality of processors or processor cores, the plurality of processors or processor cores operative in a plurality of pipelined stages.
5. The apparatus of claim 1, wherein the one or more processors or processor cores comprise a plurality of processors or processor cores, the plurality of processors or processor cores further configured to identify the start or the end of one or more compressed blocks of the plurality of compressed blocks, in parallel.
6. The apparatus of claim 4, further comprising: a memory circuit coupled to the plurality of processors or processor cores; wherein a group of processors or processor cores of the plurality of processors or processor cores are arranged as a first pipelined stage, of the plurality of pipelined stages, and are further configured to identify the start or the end of a selected compressed block of the plurality of compressed blocks by performing a partial decompression of the selected compressed block, each processor or processor core of the group of processors or processor cores of the first pipelined stage commencing the partial decompression at a starting point in the selected compressed block having a predetermined or variable offset from one or more partial decompression starting points of other processors or processor cores of the first pipelined stage; and wherein at least one of the processors or processor cores of the first pipelined stage is further configured to create and store, in the memory circuit, metadata indicating the start or the end of each compressed block of the plurality of compressed blocks.
7. The apparatus of claim 1, wherein the one or more processors or processor cores are further configured to perform a cyclic redundancy check of the uncompressed data file.
8. The apparatus of claim 1, further comprising: a memory circuit coupled to the one or more processors or processor cores; wherein at least one processor or processor core, of the one or more processors or processor cores, is further configured to compress the uncompressed data file to form a second compressed file having a plurality of second compressed blocks which have data dependencies solely within individual second compressed blocks of the plurality of second compressed blocks and to store the second compressed file in the memory circuit.
9. The apparatus of claim 1, wherein at least one processor or processor core, of the one or more processors or processor cores, is further configured, when linked dependent RLE data are in the plurality of compressed blocks, to tag or identify the linked dependent RLE data; and wherein at least one processor or processor core, of the one or more processors or processor cores, is further configured to use the tag or identification of the linked dependent RLE data to sequence the obtaining of the data specified by the linked dependent RLE data subsequent to the obtaining of the data specified by the dependent RLE data.
10. The apparatus of claim 1, further comprising: a memory circuit configured to store metadata indicating the start or the end of each compressed block of the plurality of compressed blocks.
11. The apparatus of claim 10, wherein the one or more processors or processor cores are further configured to utilize the stored metadata to identify the start or the end of the selected compressed block of the plurality of compressed blocks.
12. A processor-implemented method for parallel decompression of a first compressed file, the first compressed file comprising a plurality of compressed blocks having independent data and dependent, run length encoded (“RLE”) data having data-dependent references spanning one or more compressed blocks of the plurality of compressed blocks, the method comprising: using one or more processors or processor cores, identifying a start or an end of each compressed block of the plurality of compressed blocks; using the one or more processors or processor cores, partially decompressing a selected compressed block, of the plurality of compressed blocks, into the independent data, the dependent RLE data, and any linked dependent RLE data; and using the one or more processors or processor cores, sequencing the independent data, the dependent RLE data, and any linked dependent RLE data from a plurality of partial decompressions of the plurality of compressed blocks, obtaining data specified by the dependent RLE data and any linked dependent RLE data, and inserting the obtained data into an uncompressed data file.
13. The processor-implemented method of claim 12, further comprising: using the one or more processors or processor cores, partially decompressing a corresponding selected compressed block of the plurality of compressed blocks, in parallel, into the independent data, the dependent RLE data, and any linked dependent RLE data.
14. The processor-implemented method of claim 12, further comprising: using the one or more processors or processor cores, identifying the start or the end of a selected compressed block of the plurality of compressed blocks by performing a partial decompression of the selected compressed block; and using the one or more processors or processor cores, creating and storing, in a memory circuit, metadata indicating the start or the end of each compressed block of the plurality of compressed blocks.
15. The processor-implemented method of claim 14, further comprising: using the one or more processors or processor cores, commencing the partial decompression at a starting point in the selected compressed block having a predetermined or variable offset from one or more partial decompression starting points of other processors or processor cores of the one or more processors or processor cores.
16. The processor-implemented method of claim 12, further comprising: using the one or more processors or processor cores, performing a cyclic redundancy check of the uncompressed data file.
17. The processor-implemented method of claim 12, further comprising: using the one or more processors or processor cores, compressing the uncompressed data file to form a second compressed file having a plurality of second compressed blocks which have data dependencies solely within individual second compressed blocks of the plurality of second compressed blocks and storing the second compressed file in a memory circuit.
18. The processor-implemented method of claim 12, further comprising: using the one or more processors or processor cores, when linked dependent RLE data are in the plurality of compressed blocks, tagging or identifying the linked dependent RLE data, and using the tag or identification of the linked dependent RLE data to sequence the obtaining of the data specified by the linked dependent RLE data subsequent to the obtaining of the data specified by the dependent RLE data.
19. The processor-implemented method of claim 12, wherein the method is operative in a smartphone or tablet to accelerate booting of an operating system or to accelerate launching of a computing application.
20. An apparatus for parallel decompression of a compressed file, the compressed file comprising a plurality of compressed blocks having independent data and dependent, run length encoded (“RLE”) data having data-dependent references spanning one or more compressed blocks of the plurality of compressed blocks, the apparatus comprising: a memory circuit; and a plurality of processors or processor cores, at least one processor or processor core of the plurality of processors or processor cores configured to identify a start or an end of a selected compressed block of the plurality of compressed blocks; at least one processor or processor core of the plurality of processors or processor cores configured to partially decompress a selected compressed block, of the plurality of compressed blocks, into the independent data, the dependent RLE data, and linked dependent RLE data; at least one processor or processor core of the plurality of processors or processor cores configured to sequence the independent data, the dependent RLE data, and the linked dependent RLE data from a plurality of partial decompressions of the plurality of compressed blocks; at least one processor or processor core of the plurality of processors or processor cores configured to obtain data specified by the dependent RLE data and linked dependent RLE data; and at least one processor or processor core of the plurality of processors or processor cores configured to insert the obtained data into an uncompressed data file.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The objects, features and advantages of the present invention will be more readily appreciated upon reference to the following disclosure when considered in conjunction with the accompanying drawings, wherein like reference numerals are used to identify identical components in the various views, and wherein reference numerals with alphabetic characters are utilized to identify additional types, instantiations or variations of a selected component embodiment in the various views, in which:
(2)
(3)
(4)
(5)
(6)
DETAILED DESCRIPTION OF REPRESENTATIVE EMBODIMENTS
(7) While the present invention is susceptible of embodiment in many different forms, there are shown in the drawings and will be described herein in detail specific exemplary embodiments thereof, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated. In this respect, before explaining at least one embodiment consistent with the present invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of components set forth above and below, illustrated in the drawings, or as described in the examples. Methods and apparatuses consistent with the present invention are capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein, as well as the abstract included below, are for the purposes of description and should not be regarded as limiting.
(8) As mentioned above, for purposes of the present disclosure, “independent” data, as used herein, means and refers to data which does not have any data dependencies within that particular or selected sequence, stream or string of data. For example, in data compression, such independent data may also be described as “literal” data, such as a string of characters or digits (such as ASCII code) which are not themselves run length encoded and, therefore, that sequence, stream or string of data does not depend for its interpretation, processing, or decompression upon another, separate or second sequence, stream or string of data. In contrast, “dependent” data, as used herein, means and refers to a sequence, stream or string of data which has data dependencies and, therefore, that sequence, stream or string of data does depend for its interpretation, processing, or decompression upon another, separate or second sequence, stream or string of data, such as being dependent upon independent (e.g., literal) data or further dependent upon other dependent data, forming a chain or linkage of data dependencies, i.e., multiple levels of data dependencies. For example, in data compression, such dependent data may be RLE data which simply consists of and provides a reference to or refers to the position or location (e.g., bit or character position or relative bit or character position) and size (e.g., count or string length) of other data, which may be independent (literal) data or additional RLE data, and which may be within the same or a different block. In addition, also in data compression, such dependent data may be RLE data which simply provides a reference to other RLE data, and in such case, may be referred to herein as “linked” dependent data or “multiple” dependent data (i.e., data having multiple levels of data dependencies), such as linked RLE data consisting of first RLE data which provides a reference to the location and size of another, second sequence, stream or string of data which also includes second RLE data within that second sequence, stream or string of data which provides a reference to the location and size of another, third sequence, stream or string of data, and so on, creating potentially innumerable levels of data dependencies.
(9) Also as mentioned above, while the various representative embodiments are illustrated and discussed with reference to an example of parallel decompression of a compressed file having data-dependent references spanning multiple compressed blocks, such as files compressed using Gzip, those having skill in the art will recognize the wider applicability of the exemplary embodiments, allowing for the accelerated, parallel processing of applications and/or data having data dependencies. For example, the various parallel and pipelined embodiments may also be utilized with other types of data compression/decompression (such as Blosc), encryption/decryption, encoding/decoding (e.g., error correction coding and decoding), hashing/dehashing, etc. All such embodiments are considered equivalent and within the scope of the claims herein.
(10) In addition, the various representative embodiments may be implemented in many different ways. As discussed in greater detail below, the various pipelined and parallel processing may be implemented on a plurality of processors, a plurality of processor cores, or potentially multithreaded (or hyper multithreaded) on a single processor, for example and without limitation. It should be noted, however, that each such embodiment may provide greater or lesser acceleration of the selected application.
(11) In another representative embodiment, each component functionality of the pipelined and parallel processing of the various representative embodiments may be assigned as a corresponding task, in real time, to any processor or processor core. In addition, this may be done deterministically, such as to a next processor or processor core of a predetermined sequence of processor or processor core, or non-deterministically, such as to a next available processor or processor core of a predetermined sequence of processor or processor core, e.g., a round-robin assignment to the next core which happens to be available for the task, all for example and without limitation. Accordingly, and reference herein to a processor or processor core performing a selected functionality should be understood to mean and include a task having that selected functionality which has or will be assigned to a processor or processor core.
(12) As mentioned above, the various embodiments have demonstrated a dramatic improvement in the decompression time required to boot or launch applications, consistently across a wide variety of systems and devices. It should be noted that the representative methodologies are machine-dependent and pertain the actual physical acceleration of the operation of such devices, including smartphones and computing tablets, such as for accelerating the booting or launching of applications stored in a physical memory circuit, and are incapable of being performed by a human being, any thought process, or any pen and paper process. It should also be noted that the needs for the various embodiments have arisen largely due to the use of compressed executable files stored in a nonvolatile memory having comparatively reduced storage capacity, and the noticeable delays caused by the typical serial decompression of these compressed files of the prior art. As such, the various embodiments result in a measurable and dramatic improvement in performance of these various devices, providing a technological improvement in computing speed and computer or processor functioning, and addresses a long-felt technological need, for example and without limitation.
(13) The various representative embodiments may also be implemented at any of a variety of levels, as discussed in greater detail below.
(14)
(15) The various apparatus 200 and 300 embodiments, illustrated in
(16) Not separately illustrated in
(17)
(18) The dashed lines in
(19) For the file decompression example, and as discussed in greater detail below, one or more processor cores 120 are utilized in the distribution pipeline stage 305 and may be referred to as one or more processor cores 120.sub.A, e.g., as one or more distributor processor cores 120.sub.A and/or one or more distributor processor cores 120.sub.A1, 120.sub.A2 through 120.sub.AN; one or more processor cores 120 are utilized in the computation pipeline stage 310 and may be referred to as one or more processor cores 120.sub.B, e.g., as one or more decompression processor cores 120.sub.B and/or one or more decompression processor cores 120.sub.B1, 120.sub.B2 through 120.sub.BN; (for the decompression functionality); one or more processor cores 120 are utilized in the aggregation pipeline stage 315 and may be referred to as one or more processor cores 120.sub.C, processor cores 120.sub.D, and processor cores 120.sub.E, e.g., for the decompression functionality, one or more processor cores 120 are ordering (or “next”) processor cores 120.sub.C, one or more processor cores 120 are aggregation and merge processor cores 120.sub.D and/or one or more aggregation and merge processor cores 120.sub.D1, 120.sub.D2 through 120.sub.DN; and one or more processor cores 120 are CRC (cyclic redundancy check) processor cores 120.sub.E, all for example and without limitation. Any processor core 120 can be utilized for any of these functions and multiple functions, which functionality may be assigned in any way selected by the user, designer or programmer, and may be statically or dynamically assigned, as mentioned above. For example, as used in the aggregation pipeline stage 315, the functionality of an ordering (or “next”) processor core 120.sub.C and/or a CRC (cyclic redundancy check) processor core 120.sub.E may simply be subsumed or incorporated within the functionality of one or more aggregation and merge processor cores 120.sub.D, and are referred to separately solely for ease of description, it being understood that the functionality of an ordering (or “next”) processor core 120.sub.C and/or a CRC (cyclic redundancy check) processor core 120.sub.E may be included within any of the various processor cores 120 of the aggregation pipeline stage 315, as claimed herein.
(20) Continuing with the file decompression example, the pipeline stages may also be described somewhat differently. In the computation pipeline stage 310, for example, all of the processing of data is performed by one or more processor cores 120.sub.C, such as to decompress a selected compressed block and obtain both dependent data (e.g., RLE data) and independent data (e.g., literal data), with the independent data then provided or written in order (by one or more processor cores 120.sub.C) to the memory 140 and stored as a decompressed file 150. In the aggregation pipeline stage 315, for example, all of the remaining processing of dependent data is performed by one or more processor cores 120.sub.D, such as to obtain all of the specified run length encoded data and provide or write it in to the appropriate location in the decompressed file 150 in the memory 140. From this point of view, and for the broader applicability to parallel processing of data having data dependencies, the computation pipeline stage 310 may also be described equivalently as an independent data pipeline stage and the aggregation pipeline stage 315 may also be described equivalently as an dependent data pipeline stage.
(21) Typically for a Gzip application, the file is compressed using a combination of Lempel Ziv (or Lempel Ziv Welch) encoding (such as LZ77) followed by Huffman encoding. Once compressed, the boundaries between blocks are not byte aligned, but are bit boundaries. Typically for a Gzip or Gunzip application, each block has a header block (e.g., 2 bits) which indicates whether the block is uncompressed, or compressed using a fixed or dynamic (custom) Huffman table. When fixed Huffman encoding is utilized, the end of the variable length block has a known bit pattern; conversely, when dynamic or custom Huffman encoding is utilized, the end of the variable length block may have any bit pattern. Following compression, bits having a cyclic redundancy check (CRC) are also added, typically in a first header. The last variable length block of the compressed file will have a header bit set to indicate that it is the last in the sequence of variable length blocks. The only block having a known beginning is the first variable length block, having a known starting point following the first header, and all other variable length blocks have unknown starting (and usually unknown ending) points. The size of the uncompressed output file (150) is also typically unknown prior to the decompression.
(22) Referring to the one or more distributor processor cores 120.sub.A, a distributor processor core 120.sub.A receives the compressed file 130 comprising a plurality of variable length blocks, and determines the beginning and/or end of each variable length block, from the first to the last variable length block, i.e., a distributor processor core 120.sub.A determines the end of the current variable length block, which means it has effectively and automatically determined the start of the next variable length block. Once it finds the end of the current variable length block, a distributor processor core 120.sub.A transfers the current variable length block to the next (or next available) decompression processor core 120.sub.B, while maintaining the ordering or ordering information of the variable length blocks. A distributor processor core 120.sub.A performs decompression, such as building a dynamic Huffman table or using a fixed Huffman table, generally (and preferably only) to the extent needed to find the end of the current variable length block as quickly as practicable. Accordingly, when a distributor processor core 120.sub.A has a Huffman table (or partial Huffman table) for use in decompression (i.e., the current variable length block was not uncompressed), it will also transfer that information to the next (or next available) decompression processor core 120.sub.B, such as by providing a pointer to location in the memory 140 location storing the fixed or dynamic Huffman table, so that a decompression processor core 120.sub.B does not repeat any of the decompression work performed by the one or more distributor processor cores 120.sub.A in an earlier pipeline stage.
(23) When the current variable length block is uncompressed, the current variable length block may be transferred by the distributor processor core 120.sub.A to memory 140, and a pointer to location in the memory 140 location storing the uncompressed block may be provided to the ordering (or “next”) processor cores 120.sub.C, for tracking the location of the uncompressed data within the overall decompressed executable file (150).
(24) As illustrated in
(25) In an exemplary embodiment, for a higher level (level 2) implementation, the one or more distributor processor cores 120.sub.A also record all of the start (or end) of variable length block data, as metadata, for the entire sequence of variable length blocks. This compressed file metadata is determined during the execution of the first decompression and stored in memory (e.g., nonvolatile memory 105), for each such compressed executable file, which may be on installation of the executable file or upon first execution of the executable file, for example and without limitation, and is then stored (e.g., as metadata file 130A, in either or both the first or second memories 130, 140, and illustrated as metadata file 130A stored in the first, nonvolatile memory 130) utilized for all subsequent decompressions of that compressed executable file. Using the stored metadata for the compressed executable file, having all of the start (or end) positions of all of the variable length blocks, for any and all subsequent decompressions, the one or more distributor processor cores 120.sub.A may then locate the start of each variable length block and transfer each such variable length block to a next (or next available) decompression processor core 120.sub.B, without performing any decompression steps whatsoever, resulting in even greater acceleration, on the order of about a 2× to 4× (2 to 4 times) improvement in speed.
(26) Each decompression processor core 120.sub.B, then, receives a single variable length block having compressed data from the one or more distributor processor cores 120.sub.A, and may also receive a pointer to a Huffman table in memory 140, which may be a dynamic or fixed Huffman table. When the decompression processor core 120.sub.B receives only the variable length block having compressed data, it will use the information within the variable length block to create the dynamic or fixed Huffman table for use in the decompression. Using the Huffman table, the decompression processor core 120.sub.B partially decompresses the variable length block having compressed data into two types of data: independent data (e.g., literal data) and dependent data such as run length encoded (RLE) data.
(27) An example of such independent data (e.g., literal data) and dependent data (e.g., run length encoded (RLE) data) is illustrated in
(28) Additional examples are illustrated in
(29) The primary function of the one or more ordering (or “next”) processor cores 120.sub.C of the aggregation pipeline stage 315, however, is to maintain ordering (sequencing) of all of the partially decompressed data provided by each of the decompression processor core 120.sub.B, which may be provided to the ordering (or “next”) cores 120.sub.C out of order, depending upon how rapidly each decompression processor core 120.sub.B partially decompresses its current variable length block, generally with one or more processor cores 120.sub.B operating in parallel within the computation pipeline stage 310.
(30) A number of variations are available for how the partially decompressed data (comprising independent (literal) data and dependent (RLE) data) is provided to and stored in memory 140 to form the uncompressed executable file 150.
(31) In one variation, of the partially decompressed data, the independent (literal) data is moved (written or copied by the decompression processor core 120.sub.B) to the memory 140 in its corresponding location within the decompressed file 150, and the dependent (RLE) data is transferred to the one or more ordering (or “next”) processor cores 120.sub.C. Alternatively, both the independent (literal) data and dependent (RLE) data may be transferred by the decompression processor core 120.sub.B to the one or more ordering (or “next”) processor cores 120.sub.C, with the one or more ordering (or “next”) processor cores 120.sub.C moving or copying the independent (literal) data to the memory 140 in its corresponding location within the decompressed file 150 and providing the dependent (RLE) data to the one or more processor cores 120.sub.D which then use the dependent (RLE) data to complete the decompression and create the complete decompressed file 150. In another variation, the ordered, partially decompressed data (having been properly ordered by the one or more ordering (or “next”) processor cores 120.sub.C) is provided directly to the one or more aggregation and merge processor cores 120.sub.D, which transfer(s) the independent (literal) data into known locations in the memory 140, and knowing those memory locations, then uses the dependent (RLE) data to complete the decompression and create the complete decompressed file 150. In another variation, the partially decompressed data (in any order) generated by the decompression processor cores 120.sub.B is provided directly to the one or more aggregation and merge processor cores 120.sub.D, which also provides the proper ordering of the variable length blocks (eliminating the separate functionality of the one or more processor cores 120.sub.C), which also transfer(s) the independent (literal) data into known locations in the memory 140, and knowing those memory locations, also then uses the dependent (RLE) data to complete the decompression and create the complete decompressed file 150.
(32) The dependent (RLE) data, such as consisting of the RLE location (which is to be filled in) (such as a pointer to a memory 140 location), a starting position (or bit position) and a size (such as number of positions (or bits)) is or has then been provided to the one or more aggregation and merge cores 120.sub.D. The data held in the memory 140, to this point, comprises an ordered sequence of independent (literal) data, with many “holes” to be filled in by the dependent (RLE) data, with references as far back as 32 Kbytes, for example. Stated another way, to this point, the uncompressed data file looks like “swiss cheese”, having ordered and valid independent data and empty regions at known locations to be filled in.
(33) The one or more aggregation and merge processor cores 120.sub.D utilize the dependent (RLE) data, generally in sequence (although not required to be done in sequence) and going back as far as provided by the type of data compression, such as 32 Kbytes for Gzip, locate the specified starting location and copies the specified number of bits or positions into the dependent (RLE) location in the memory 140. As this occurs, the vacancies left in the uncompressed data file are filled in, providing uncompressed and sequenced data needed for use in decompressing subsequent dependent (RLE) data, and the look back “window” for the dependent (RLE) data moves forward.
(34) This aggregation and merge process performed by one or more processor cores 120.sub.D may also be run in parallel across a plurality of aggregation and merge processor cores 120.sub.D, and provides additional acceleration on the order of about a two times (2×) improvement in speed. To do this, however, and maintain proper ordering and data dependencies as may be needed, an additional function is performed by the one or more decompression processor cores 120.sub.B, namely, “tagging” linked or multiply-dependent (RLE) data, which itself references back to (or is linked) to previous dependent (RLE) data. Such a tag or other identifier then informs the aggregation and merge processor cores 120.sub.D of this additional, second-level (or secondary or multiple level) data dependency, such that the linked dependent (RLE) data is then decompressed in sequential order corresponding to the tagged linkage, or more simply, decompressed in any order subsequent to the decompression of the single-level dependent (RLE) data.
(35) Stated another way, the determinations of the independent (literal) data may be done rapidly and in parallel, and in any order, (e.g., by the decompression processor cores 120.sub.B), usually in a computation (or decompression) pipeline stage 310, or in both the distribution pipeline stage 305 and the computation (or decompression) pipeline stage 310. Technically, the determinations of the single-level (non-linked) dependent (RLE) data could also be done rapidly and in parallel, and in any order, (e.g., by the decompression processor cores 120.sub.B), except insofar as the data-dependencies may span multiple compressed blocks, which may not necessarily be decompressed in order by the decompression processor cores 120.sub.B, but may be arriving out of order to the aggregation and merge processor cores 120.sub.D, which then order the partially decompressed blocks. As a result, in exemplary embodiments, once the independent (literal) data has been determined and the blocks have been ordered (e.g., by an ordering (or “next”) processor core 120.sub.C or equivalently by an aggregation and merge processor core 120.sub.D programmed or configured with the block ordering functionality), the determinations of the single-level (non-linked) dependent (RLE) data also may be performed rapidly and in parallel, and in any order, such as by the aggregation and merge processor cores 120.sub.D. When block order has been preserved for decompression of the independent (literal) data, however, the determinations of the dependent (RLE) data also may be done rapidly and in parallel, and in any order, (e.g., by the decompression processor cores 120.sub.B or by the aggregation and merge processor cores 120.sub.D). Following those determinations, the linked dependent (RLE) data is then decompressed, and depending upon the level of linkage, such as a first level dependency on dependent (RLE) data, a second level dependency on linked dependent (RLE) data which has a first level dependency on dependent (RLE) data, a third level dependency on linked dependent (RLE) data which has a second level dependency on linked dependent (RLE) data which has a first level dependency on dependent (RLE) data, etc., those determinations of the linked dependent (RLE) data for any given level could also be done rapidly and in parallel, and in any order within that level or linkage, e.g., all first level decompressions may be done in parallel and in any order, followed sequentially by all second level decompressions being done in parallel and in any order, followed sequentially by all second level decompressions being done in parallel and in any order, etc. Accordingly, depending upon the desired complexity of implementation, the tagging process for linked dependent (RLE) data may also reflect the linkage levels, such as tagging for primary, secondary, tertiary, etc., linkages, which information may also be utilized by the aggregation and merge processor cores 120.sub.D for added parallel processing in decompressing linked dependent (RLE) data.
(36) Following all of the dependent (RLE) data insertions by the one or more processor cores 120.sub.D, a complete, uncompressed file 150 is now available in memory 140 for execution, for example and without limitation. A cyclic redundancy check (CRC) check may then be performed, by either the one or more aggregation and merge processor cores 120.sub.D, or by another processor core 120, such as a CRC processor core 120E.
(37) In an exemplary embodiment, for another higher level (level 3) implementation, and depending upon the memory space available in nonvolatile memory 105, the uncompressed file 150 available in memory 140 may be recompressed, creating a second compressed file 130B which can be utilized to replace the first compressed file 130 in memory 140. The compression, however, is performed slightly differently, such that the second compressed file 130B does not have inter-block data dependencies. For example, the second compressed file 130B may consist of a plurality of blocks (typically having a size on the order of several megabytes, each of which may be comprised of smaller sub-blocks, typically on the order of 32 kbytes each), with data dependencies occurring solely within such a larger block (but data dependencies may occur across sub-blocks). When used subsequently, these recompressed blocks of the second compressed file 130B may be transferred directly to one or more decompression processor cores 120.sub.B, each of which may then decompress the block it has received, independently and in parallel, into independent (literal) data and dependent (RLE) data, and which may further directly use the dependent (RLE) data to form the uncompressed file 150. Preferably, metadata indicating or specifying the block boundaries of these recompressed blocks of the second compressed file 130B is also maintained, to allow for the ready distribution of the blocks directly to the one or more decompression processor cores 120.sub.B.
(38)
(39)
(40) As the variable length blocks are determined and transferred by the one or more distributor processor cores 120.sub.A, in step 220, typically operating in parallel, each of the one or more decompression processor cores 120.sub.B receives a variable length block (step 235) in the computation (or decompression) pipeline stage 310, as they are being generated in the distribution pipeline stage 305, and commences partial decompression or other processing, step 240, also typically operating in parallel. The partial decompression may include generating a Huffman table (if not generated in step 215 and transferred with the variable length block), and using the Huffman table, generating independent (literal) data and dependent (RLE) data, including any linked dependent (RLE) data. Multiply-dependent or linked dependent (RLE) data may also be tagged, as mentioned above, as part of the partial decompression of step 240. The independent (literal) data and dependent (RLE) data is then transferred for further processing by another processor core 120 or into memory, or both, depending upon the implementation, as mentioned above, step 245. The partial decompression and transferring of steps 240 and 245 is repeated until the end of the block is reached, step 250, at which point that decompression processor core 120.sub.B may receive another variable length block for partial decompression, returning to step 235, with the process continuing until the last block has been received and processed (i.e., no more variable length blocks have been received) in step 252, and the decompression or other processing, transformation or computation may end, return step 254.
(41) The independent (literal) data and dependent (RLE) data (including linked dependent (RLE) data) is then ordered, step 255, such as by an ordering (or “next”) core 120.sub.C ordering or sequencing the independent (literal) data and dependent (RLE) data from a plurality of variable length blocks, provided by the one or more decompression processor cores 120.sub.B, and is typically performed by one or more ordering (or “next”) processor cores 120.sub.C. The independent (literal) data is then stored in memory (140), step 260, and the uncompressed data identified by the dependent (RLE) data is obtained and inserted in the corresponding dependent (RLE) location in the uncompressed file held in memory 140, by the one or more aggregation and merge processor cores 120.sub.D, step 265. When additional variable length blocks are being processed, the method continues to perform steps 255, 260 and 265. When the last block has been processed, step 270, if not stored previously in step 228, any generated metadata may be stored for future use, step 275, the uncompressed file is checked (e.g., CRC checking as mentioned above) and output or otherwise available for execution, step 280. As an option, as mentioned above, the uncompressed file may be recompressed into blocks which do not have inter-block data dependencies to form a second compressed file 130B, which may then be stored in memory 105, step 285. Following step 280 or optional step 285, the methodology is complete and may end, step 290. It should be noted that the step of storing any metadata may also be performed as part of or after step 220, such as by a distributor processor core 120.sub.A, as each start or end of a compressed block is determined. Also, as mentioned above, many of the steps outlined in
(42) The present disclosure is to be considered as an exemplification of the principles of the invention and is not intended to limit the invention to the specific embodiments illustrated. In this respect, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of components set forth above and below, illustrated in the drawings, or as described in the examples. Systems, methods and apparatuses consistent with the present invention are capable of other embodiments and of being practiced and carried out in various ways.
(43) Although the invention has been described with respect to specific embodiments thereof, these embodiments are merely illustrative and not restrictive of the invention. In the description herein, numerous specific details are provided, such as examples of electronic components, electronic and structural connections, materials, and structural variations, to provide a thorough understanding of embodiments of the present invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, components, materials, parts, etc. In other instances, well-known structures, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the present invention. In addition, the various Figures are not drawn to scale and should not be regarded as limiting.
(44) Reference throughout this specification to “one embodiment”, “an embodiment”, or a specific “embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention and not necessarily in all embodiments, and further, are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any specific embodiment of the present invention may be combined in any suitable manner and in any suitable combination with one or more other embodiments, including the use of selected features without corresponding use of other features. In addition, many modifications may be made to adapt a particular application, situation or material to the essential scope and spirit of the present invention. It is to be understood that other variations and modifications of the embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered part of the spirit and scope of the present invention.
(45) It will also be appreciated that one or more of the elements depicted in the Figures can also be implemented in a more separate or integrated manner, or even removed or rendered inoperable in certain cases, as may be useful in accordance with a particular application. Integrally formed combinations of components are also within the scope of the invention, particularly for embodiments in which a separation or combination of discrete components is unclear or indiscernible. In addition, use of the term “coupled” herein, including in its various forms such as “coupling” or “couplable”, means and includes any direct or indirect electrical, structural or magnetic coupling, connection or attachment, or adaptation or capability for such a direct or indirect electrical, structural or magnetic coupling, connection or attachment, including integrally formed components and components which are coupled via or through another component.
(46) A processor 110 may be any type of processor, and may be embodied as one or more processors 110, configured, designed, programmed or otherwise adapted to perform the functionality discussed herein. As the term processor is used herein, a processor 110 may include use of a single integrated circuit (“IC”), or may include use of a plurality of integrated circuits or other components connected, arranged or grouped together, such as controllers, microprocessors, digital signal processors (“DSPs”), parallel processors, multiple core processors, custom ICs, application specific integrated circuits (“ASICs”), field programmable gate arrays (“FPGAs”), adaptive computing ICs, associated memory (such as RAM, DRAM and ROM), and other ICs and components, whether analog or digital. As a consequence, as used herein, the term processor should be understood to equivalently mean and include a single IC, or arrangement of custom ICs, ASICs, processors, microprocessors, controllers, FPGAs, adaptive computing ICs, or some other grouping of integrated circuits which perform the functions discussed below, with associated memory, such as microprocessor memory or additional RAM, DRAM, SDRAM, SRAM, MRAM, ROM, FLASH, EPROM or E.sup.2PROM. A processor (such as processor 110), with its associated memory, may be adapted or configured (via programming, FPGA interconnection, or hard-wiring) to perform the methodology of the invention, as discussed above. For example, the methodology may be programmed and stored, in a processor 110 with its associated memory (and/or memory 140) and other equivalent components, as a set of program instructions or other code (or equivalent configuration or other program) for subsequent execution when the processor is operative (i.e., powered on and functioning). Equivalently, when the processor 110 may implemented in whole or part as FPGAs, custom ICs and/or ASICs, the FPGAs, custom ICs or ASICs also may be designed, configured and/or hard-wired to implement the methodology of the invention. For example, the processor 110 may be implemented as an arrangement of analog and/or digital circuits, controllers, microprocessors, DSPs and/or ASICs, collectively referred to as a “controller”, which are respectively hard-wired, programmed, designed, adapted or configured to implement the methodology of the invention, including possibly in conjunction with a memory 140.
(47) The memory 140 and/or memory 105, which may include a data repository (or database), may be embodied in any number of forms, including within any computer or other machine-readable data storage medium, memory device or other storage or communication device for storage or communication of information, currently known or which becomes available in the future, including, but not limited to, a memory integrated circuit (“IC”), or memory portion of an integrated circuit (such as the resident memory within a processor 110), whether volatile or non-volatile, whether removable or non-removable, including without limitation RAM, FLASH, DRAM, SDRAM, SRAM, MRAM, FeRAM, ROM, EPROM or E.sup.2PROM, or any other form of memory device, such as a magnetic hard drive, an optical drive, a magnetic disk or tape drive, a hard disk drive, other machine-readable storage or memory media such as a floppy disk, a CDROM, a CD-RW, digital versatile disk (DVD) or other optical memory, or any other type of memory, storage medium, or data storage apparatus or circuit, which is known or which becomes known, depending upon the selected embodiment. The memory 140 may be adapted to store various look up tables, parameters, coefficients, other information and data, programs or instructions (of the software of the present invention), and other types of tables such as database tables.
(48) As indicated above, the processor 110 is hard-wired or programmed, using software and data structures of the invention, for example, to perform the methodology of the present invention. As a consequence, the system and method of the present invention may be embodied as software which provides such programming or other instructions, such as a set of instructions and/or metadata embodied within a non-transitory computer readable medium, discussed above. In addition, metadata may also be utilized to define the various data structures of a look up table or a database. Such software may be in the form of source or object code, by way of example and without limitation. Source code further may be compiled into some form of instructions or object code (including assembly language instructions or configuration information). The software, source code or metadata of the present invention may be embodied as any type of code, such as C, C++, SystemC, LISA, XML, Java, Brew, SQL and its variations (e.g., SQL 99 or proprietary versions of SQL), DB2, Oracle, or any other type of programming language which performs the functionality discussed herein, including various hardware definition or hardware modeling languages (e.g., Verilog, VHDL, RTL) and resulting database files (e.g., GDSII). As a consequence, a “construct”, “program construct”, “software construct” or “software”, as used equivalently herein, means and refers to any programming language, of any kind, with any syntax or signatures, which provides or can be interpreted to provide the associated functionality or methodology specified (when instantiated or loaded into a processor or computer and executed, including the processor 110, for example).
(49) The software, metadata, or other source code of the present invention and any resulting bit file (object code, database, or look up table) may be embodied within any tangible, non-transitory storage medium, such as any of the computer or other machine-readable data storage media, as computer-readable instructions, data structures, program modules or other data, such as discussed above with respect to the memory 140, e.g., a floppy disk, a CDROM, a CD-RW, a DVD, a magnetic hard drive, an optical drive, or any other type of data storage apparatus or medium, as mentioned above.
(50) Furthermore, any signal arrows in the drawings/Figures should be considered only exemplary, and not limiting, unless otherwise specifically noted. Combinations of components of steps will also be considered within the scope of the present invention, particularly where the ability to separate or combine is unclear or foreseeable. The disjunctive term “or”, as used herein and throughout the claims that follow, is generally intended to mean “and/or”, having both conjunctive and disjunctive meanings (and is not confined to an “exclusive or” meaning), unless otherwise indicated. As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
(51) The foregoing description of illustrated embodiments of the present invention, including what is described in the summary or in the abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed herein. From the foregoing, it will be observed that numerous variations, modifications and substitutions are intended and may be effected without departing from the spirit and scope of the novel concept of the invention. It is to be understood that no limitation with respect to the specific methods and apparatus illustrated herein is intended or should be inferred. It is, of course, intended to cover by the appended claims all such modifications as fall within the scope of the claims.