Hardware Implementable Data Compression/Decompression Algorithm
20220368345 · 2022-11-17
Inventors
CPC classification
H03M7/40
ELECTRICITY
International classification
Abstract
A hardware implementable lossless data compression/decompression algorithm is disclosed, where the input data string is described in terms of consecutive groups of alternating same type bits, where one of these groups of same type bits is defined as a preferred group with the other groups having either a lower or a higher number of same type bits, where the data string is partitioned into variable length processing strings whose length is determined by the occurrence of the preferred group or of a determined number of bits consisting of groups of a lower number of same type bits, where these variable length processing strings are processed as a function of the configuration and content of each processing string only, where consecutive processing strings are additionally processed based on their content only, where processing is performed in a loop until a certain target performance is achieved, where processing is done without any data analysis, and where no negative compression gain results for any content of an input string.
Claims
1. A data compression decompression algorithm or method, comprising: a string of binary bits; wherein said string is described in terms of the number of bits in consecutive groups of bits; wherein each of the said consecutive groups of bits is a group of bits of the same bit type, either 0 (0 logic) or 1 (1 logic); wherein the said bit types in any two of said consecutive groups of bits are of opposite type, alternating from 0 to 1 or from 1 to 0; and a first bit in the said string of bits is used as reference to determine the bit type in each of said consecutive groups of said alternating same type bits.
2. A data compression decompression algorithm or method, comprising: a string of binary bits; wherein said string is described in terms of the change in-between two consecutive bits, with said change being either constant from bit-to-bit (0-to-0 or 1-to-1) or opposite from bit-to-bit (0-to-1 or 1-to-0); wherein said string, once described in terms of the change in-between two consecutive bits, is then described according to claim 1.
3. The algorithm of claim 1 further comprising: a group of certain number of bits that have a preferred bit pattern; wherein with respect to said preferred bit pattern, the rest of the bits in the said string of bits form groups characterized either by the same preferred bit pattern or by a different bit pattern; wherein the said different bit pattern is compared to the said preferred bit pattern as having a lower grade or a higher grade; and wherein the said string can have groups of said preferred bit pattern only, or groups of said lower grade and groups of said preferred bit pattern, or groups of said higher grade and groups of said preferred bit pattern, or groups of said lower grade, groups of said higher grade, and groups of said preferred bit pattern.
4. The algorithm of claim 3 wherein: the said lower grade represents any objective measure, such as representing a smaller (lower) binary number when compared to the binary number represented by the said preferred bit pattern, or a smaller (lower) number of binary 1 bits when compared to the number of binary 1 bits within the preferred bit pattern, or a smaller (lower) number of binary 0 bits when compared to the number of binary 0 bits within the said preferred bit pattern, or a smaller (lower) number of same type bits as compared to the number of same type bits in the said preferred bit pattern; and wherein the said higher grade represents any objective measure, such as representing a larger (higher) binary number when compared to the binary number represented by the said preferred bit pattern, or a larger (higher) number of binary 1 bits when compared to the number of binary 1 bits within the said preferred bit pattern, or a larger (higher) number of binary 0 bits when compared to the number of binary 0 bits within the said preferred bit pattern, or a larger (higher) number of same type bits as compared to the number of same type bits in the said preferred bit pattern.
5. The algorithm of claim 3 wherein: the said preferred bit pattern is a group of a certain number of bits of same bit type; wherein the said lower grade is a group of bits of a number of bits of same bit type wherein the number is smaller than the number of bits in said preferred bit pattern; and wherein the said higher grade is a group of bits of a number of bits of same bit type wherein the number is larger than the number of bits in said preferred bit pattern.
6. The algorithm of claim 3 wherein: the said preferred bit pattern is a group of four bits of same bit type; the said lower grade is a group of one, two, or three bits of same bit type; the said higher grade is a group of five or more bits of same bit type.
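Purely as an illustrative aid (not part of the claims), the grouping and grading scheme of claims 3 through 6 can be sketched in code, assuming the preferred bit pattern of claim 6 (a run of exactly four same type bits); all function and variable names here are hypothetical:

```python
# Sketch of the group-grading scheme of claims 3-6, assuming the preferred
# bit pattern is a run of exactly 4 same-type bits (claim 6). Names are
# illustrative only, not taken from the disclosure.

def run_lengths(bits: str) -> list[int]:
    """Split a bit string into consecutive groups of alternating
    same-type bits and return the length of each group."""
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

def grade(run_length: int, preferred: int = 4) -> str:
    """Grade a group relative to the preferred pattern length."""
    if run_length < preferred:
        return "lower"
    if run_length == preferred:
        return "preferred"
    return "higher"

# '110001111100' -> runs of 2, 3, 5, 2 -> lower, lower, higher, lower
print([grade(r) for r in run_lengths("110001111100")])
```

The first bit of the string (claim 1) would additionally serve as the reference for recovering the actual bit type of each group.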
7. The algorithm of claim 4 comprising: a second group of bits of said preferred bit pattern that is detected in the said string of bits; a third group of bits of said preferred bit pattern or of said higher grade that immediately follows the said second group, wherein said immediately means that in-between the last bit of said second group and the first bit of said third group there are zero bits; wherein the bits in the said third group can be either of the same bit type or the opposite bit type as the bits in the said second group; and wherein the said third group is said to be of special grade.
8. The algorithm of claim 4 comprising: a first group of said preferred bit pattern or of said higher grade that is detected in the said string of bits; wherein in-between the first bit in the said string and the said first group there are zero bits, meaning that the said first group is first in the said string; and wherein the said first group is said to be of first grade.
9. The algorithm of claim 4, further comprising: a first group of said preferred bit pattern or of said higher grade is detected in the said string of bits; wherein in-between the first bit in the said string and the first bit in said first group there are zero bits, meaning that the said first group is first in the said string; a second group of bits of said preferred bit pattern is detected in the said string of bits; wherein when the said first group does not exist, in-between the first bit in the said string and the first bit in said second group there is at least one bit, and when the said first group exists, said second group follows said first group and in-between the last bit of said first group and first bit of said second group there is at least one bit; a third group of bits of said preferred bit pattern or of said higher grade immediately follows the said second group, wherein said immediately means that in-between the last bit of said second group and the first bit of said third group there are zero bits; wherein the bits in the said third group can be either of same or opposite bit type as the bits in the said second group; a fourth group of said preferred bit pattern which follows the said third group and in-between the last bit of said third group and first bit of said fourth group there is at least one bit; a fifth group of said preferred bit pattern which follows the said fourth group and in-between the last bit of said fourth group and first bit of said fifth group there is at least one bit; wherein at least one of the following pairs exist in the said string, as said first group and said second group, said second group and said first bit and not said first group, said third group and said fourth group, said fourth group and said fifth group, and wherein according to these pairs, in-between said first group and said second group, or in-between said first bit in the said string and said second group, or in-between said third group and said fourth 
group, or in-between said fourth group and said fifth group there is one or more groups of only said lower grade; and wherein summing all the bits for all groups of said lower grade that exist in-between one of said pairs, a number that is characteristic, or a characteristic number, is formed for the said groups of lower grade, with said characteristic number greater than zero.
10. The algorithm of claim 9, comprising: a set of said characteristic numbers that are accepted for use, wherein said numbers are in-between one and a determined maximum characteristic number greater than one; wherein a group of bits characterized by the said maximum characteristic number does not always end with a said preferred bit pattern; and wherein all other groups of bits characterized by characteristic numbers that are smaller than the said maximum characteristic number end with the said preferred bit pattern or a determined, defined ending.
11. The algorithm of claim 10 wherein: the number of all possible binary combinations of bits in a group characterized by a said accepted characteristic number depends on the said characteristic number; the number of all acceptable binary combinations of bits in a group characterized by a said accepted characteristic number is smaller than said possible binary combinations when the said characteristic number is larger than or equal to the number of bits in the said preferred bit pattern, where said number of acceptable binary combinations is smaller than said possible binary combinations because the possible binary combinations that contain groups of said preferred bit pattern or said higher grade are not accepted as acceptable binary combinations, which acceptable binary combinations contain only groups of said lower grade; the said number of acceptable binary combinations is equal to the said number of possible binary combinations when the said characteristic number is smaller than the number of bits in said preferred bit pattern, because, since said characteristic number is smaller than the number of bits in said preferred bit pattern, binary combinations within said possible binary combinations that have groups of bits of said preferred bit pattern or said higher grade cannot exist; each of the said acceptable binary combinations is uniquely described by an orderly combination of said possible binary combinations (orderly possible binary combination), wherein orderly means in a specific controlled order; the combinations making up the difference between the said number of possible binary combinations and the said number of acceptable binary combinations are called remain combinations; each of the said remain combinations is uniquely described by a said orderly possible binary combination that was not used to describe any of the said acceptable binary combinations; and the number of said acceptable binary combinations plus the number of said remain combinations is equal to the number of said possible binary combinations.
12. The algorithm of claim 11 further comprising: a determined set of groups of bits each characterized by a said characteristic number belonging to the said set of accepted characteristic numbers generate said remain combinations; all the rest of groups each characterized by one of all other characteristic numbers in the said set of accepted characteristic numbers that are not used by the said determined set of groups, are using said generated remain combinations; wherein the characteristic numbers used by the said groups that generate remain combinations are small numbers; wherein the characteristic numbers used by the said groups that use the remain combinations are large numbers; wherein any of the said large numbers are larger than any of the said small numbers and the said small numbers and the said large numbers together represent the said full set of accepted characteristic numbers; wherein the number of all possible binary combinations of bits generated by one remain combination when this remain combination is used depends on the difference between the characteristic number of the group using the remain combination and the characteristic number of the group generating the remain combination.
13. The algorithm of claim 11 further comprising: a group of last bits in the said string; wherein the size of said group of last bits depends on the size of said maximum characteristic number and on the number of bits in said group of preferred bit pattern; wherein said group of last bits may contain one or more groups of said preferred pattern, or of lower grade, or of higher grade, or of special grade; and wherein said group of last bits is processed differently than the rest of the bits in the said string of bits.
14. The algorithm of claim 12 further comprising: an identifier that describes either a said acceptable binary combination or a said remain combination; a specially assigned bit to specify if the first bit in the said string of bits following the bits represented by said acceptable binary combination or said remain combination is of the same type or of the opposite type as the last bit from the combination represented by the said acceptable binary combination or said remain combination.
15. The algorithm of claim 12 further comprising: an identifier that immediately precedes the said orderly possible binary combination that describes either a said acceptable binary combination or a said remain combination; a specially assigned bit placed immediately after said orderly possible binary combination, wherein the said specially assigned bit will specify if the first bit in the said string of bits following the bits represented by said acceptable binary combination or said remain combination is of the same type or of the opposite type as the last bit from the combination represented by the said acceptable binary combination or said remain combination; wherein the total number of such identifiers is limited; wherein the size, or number of bits, of such identifiers is such that the number of bits of such identifiers plus the number of bits of said orderly possible binary combination plus the said specially assigned bit is smaller than or equal to the number of bits of the said characteristic number plus the number of bits in the said preferred bit pattern.
16. The algorithm of claim 13 further comprising: an id-fier that is used to describe the said first group or said third group of bits; wherein the total number of such id-fiers is limited; and wherein the number of bits of said id-fier depends on the number of bits in said preferred bit pattern or said higher grade group such that the number of bits in said id-fier is smaller or equal to the number of bits for the smallest content of said group 1 or group 3.
17. The algorithm of claim 15, further comprising: a group of last bits in the said string; wherein the size of said group of last bits depends on the size of said maximum characteristic number and on the number of bits in said group of preferred bit pattern; wherein said group of last bits may contain one or more groups of said preferred pattern, or of lower grade, or of higher grade, or of special grade; and wherein said group of last bits is treated differently than the rest of the bits in the said string of bits; an id-fier that is used to describe the said first group or said third group of bits; wherein the total number of such id-fiers is limited; and wherein the number of bits of said id-fier depends on the number of bits in said preferred bit pattern or said higher grade such that the number of bits in said id-fier is smaller than or equal to the number of bits for the smallest content of said group 1 or group 3; wherein a combination of multiple groups described by said identifiers and id-fiers only, by said identifiers and id-fiers together with said orderly possible binary combinations, and by said group of last bits, can describe any of said string of bits; wherein said identifiers and id-fiers create a common pool of ids; wherein said ids are limited, and wherein said orderly possible binary combinations correspond to all acceptable characteristic numbers; and wherein by so describing any said string, a data output characterized by a compression gain is obtained.
18. The algorithm of claim 17, further comprising: a combination of a reduced set of groups out of said combination of multiple groups which exist within a segment of said string of input bits; wherein the said reduced set refers to either a reduced set of said ids, or a reduced set of said characteristic numbers, or both; and wherein said reduced set of groups can be consecutive or within a limited distance from each other.
19. The algorithm of claim 18, further comprising: a limited number of binary words; wherein said binary words are formed by comprising ids only, or ids and select bits that are part of said orderly possible binary combinations; wherein said binary words are formed such that each of said binary words represent the least number of bits that are common to a collection of said ids only or said ids followed by said orderly possible binary combinations that describe only one of said characteristic numbers, and such that in order to fully describe all possible binary combinations of every individual characteristic number, the least number of said binary words are necessary.
20. The algorithm of claim 19, wherein any of said binary words is transformed into a form equivalent to the original binary word form, and wherein the transformation is done for such objectives as improving the algorithm performance.
21. The algorithm of claim 19, wherein: any one of said multiple groups is partitioned in two parts, wherein one part consists of the applicable of said binary words, and the second part is the remainder after the said applicable binary word is removed; and wherein the part containing the said applicable binary word partition and the part containing the said remainder partition are processed separately.
22. The algorithm of claim 21 wherein the said part containing the said applicable binary word partition is processed by combining two or more of said binary words.
23. The algorithm of claim 21 wherein: the said part containing the said applicable binary word partition is processed by pairing every two consecutive said binary words and by assigning a new optimized binary combination to each pair; and said new optimized binary combination has optimized properties, such as being unique and having a minimum number of bits, where said minimum number of bits is smaller than or equal to the number of bits in the two said binary words creating the pair.
24. The algorithm of claim 17 wherein the said string of bits is described in terms of the change in-between two consecutive bits, and then in terms of groups of alternating same type bits, with the first bit in the said string of bits being used as reference to determine the bit type in each of said consecutive groups of said alternating same type bits.
25. The algorithm of claim 17, wherein said string of bits is partitioned into multiple segments, and wherein multiple parallel devices are used, where each device processes one of said multiple segments, and where the output of each of said multiple devices is merged in the same order in which the string was partitioned, and where the resulting merged output represents the total algorithm output.
26. The algorithm of claim 17 wherein the said string to be processed is pre-processed to determine the optimal way this processing can be done to achieve optimal performance, and where said pre-processing can include any optimizations, such as re-assigning said orderly possible binary combinations or said identifiers or said alternating same type bits, or such as adjusting the size of the said preferred bit pattern.
27. The algorithm of claim 11 wherein the said remain combinations are reassigned to binary combinations of different properties, where such different properties may be either a smaller number of bits or properties that eliminate a specific undesired bit pattern.
28. The algorithm of claim 18 wherein the said reduced set of ids or the employed said possible binary combinations corresponding to the said reduced set of characteristic numbers are changed, including reassigned or adjusted in size, in order to achieve optimal performance.
29. A data compression decompression method wherein a binary segment of limited and variable length is processed in hardware or software based on the configuration and content of that segment only.
30. The method of claim 29 wherein the said binary segment is described in terms of groups of alternating same type bits, wherein said segment comprises a group of preferred bit pattern and one or more groups of lower grade, or comprises a group of preferred bit pattern or of higher grade, or comprises a fixed length group of several said lower grade groups, and wherein said segment is converted into a unique, optimized, equivalent output structure comprising an identifier that may be associated to an orderly possible binary combination.
31. The method of claim 29 wherein the said binary segment is first described in terms of alternating same type bits; wherein said first described segment comprises a group of preferred pattern and one or more groups of lower grade, or comprises a group of preferred pattern or of higher grade; and wherein said first described segment is converted into a first unique, optimized, equivalent output structure comprising an identifier that may be associated to an orderly possible binary combination; wherein the same said binary segment is second described in terms of the change in-between two consecutive bits, which is then described in terms of alternating same type bits; wherein said second described segment comprises a group of preferred pattern and one or more groups of lower grade, or comprises a group of preferred pattern or of higher grade; wherein said second described segment is converted into a second unique, optimized, equivalent output structure comprising an identifier that may be associated to an orderly possible binary combination; and wherein the said first unique, optimized, equivalent output structure is compared to the said second unique, optimized, equivalent output structure, and either the said first or the said second output structure is chosen as the final output based on specific determined criteria.
32. The method of claim 30 wherein specific and representative parts of two consecutive unique, optimized, equivalent output structures corresponding to two consecutive of said binary segments are paired to generate new optimized binary combinations.
33. The method of claim 32, wherein an input string comprising multiple of said binary segments is processed.
34. The method of claim 33, wherein customizable settings are used to optimize and reconfigure the hardware or software implementation options, resulting in optimized compression/decompression performance such as gain, hardware complexity, or execution speed, to optimally match an application's performance targets.
35. The method of claim 33, wherein the processing is repeated in a loop of multiple processing cycles wherein the output of the current processing cycle is used as input for the next processing cycle, and where this repeated loop processing is tracked by specific constructs such as counters.
36. A data compression decompression method wherein an input string is partitioned into processing strings of such length, or such limited number of bits, that each of these partitioned processing strings can be processed in hardware in one step in order to achieve compression gain or decompression restoration.
37. The method of claim 36 wherein said processing is performed based on the content of each of said processing strings only.
38. The method of claim 36 wherein each of the said processing strings yields a compressed output that uniquely corresponds to that processing string, or wherein a compressed output yields a unique decompressed said processing string.
39. The method of claim 36 wherein the partitioning of said input string occurs every time a group of preferred bit pattern is detected, where the bits in-between any two consecutive such detections have specially formulated properties, where a maximum size of a partition is set when such detection does not occur, and where the bits in the said maximum size partition have specially formulated properties.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0065] Embodiments will be described, by way of example, with reference to the drawings.
DETAILED DESCRIPTION OF THE INVENTION
[0082] At the outset it should be noted that the examples presented in the disclosure are in no way limiting, and the skilled person will appreciate that the disclosure is equally applicable to multiple variations and alternatives, and multiple optimizations are possible to increase the performance metrics of the algorithm, such as the algorithm compression/decompression gain or the execution speed of one compression/decompression cycle.
[0122] In order to properly introduce these concepts, the focus shifts to describing the four input PS in the string 300 in terms of FB and AB. The string 300 can also be described with the RB transformation, leading to a completely different outcome, but as mentioned, the key differentiating aspects of using the RB transformation will be detailed at the appropriate time in this disclosure.
[0123] The description, or representation, of the four input PS from string 300 in terms of FB and AB is introduced using
[0130] Adding the bits of all groups of lower grade in a core results in a unique number representing that core. This number is called in this disclosure the core bit sum, and is referred to as Sum, or core characteristic number. To exemplify, the three Sum numbers for the three full PS are: [0131] a) For PS_1 (410), Sum is 1 plus 2 plus 2 plus 3->Sum=8 [0132] b) For PS_2 (420), Sum is 2 plus 1 plus 1 plus 3 plus 1->Sum=8 [0133] c) For PS_3 (430), Sum is 1->Sum=1
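The core bit sum above is plain addition over the lower-grade groups of a core; a minimal sketch (function name illustrative, not from the disclosure):

```python
# Illustrative computation of the core bit sum (Sum): the core of a full PS
# is a sequence of lower-grade groups (1, 2, or 3 same-type bits each), and
# Sum is the total number of bits across those groups.

def core_sum(group_lengths: list[int]) -> int:
    assert all(1 <= g <= 3 for g in group_lengths), "a core holds lower-grade groups only"
    return sum(group_lengths)

print(core_sum([1, 2, 2, 3]))     # PS_1 (410) -> 8
print(core_sum([2, 1, 1, 3, 1]))  # PS_2 (420) -> 8
print(core_sum([1]))              # PS_3 (430) -> 1
```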
[0134] The three full PS are classified, for the purpose of use in one or more embodiments in this disclosure, as “Sum_4” (or Sum_delimiter). In string 300 therefore, there are three PS, classified as 8_4, 8_4, 1_4.
[0135] There is always one single termination PS per IFDS, and this termination PS has a different format than the full PS, a format which will be detailed at the appropriate time in this disclosure. For string 300, this termination PS (PS_4) consists of one bit.
[0136] In conclusion with regard to
[0137] It was mentioned that a link bit always follows a delimiter with one exception. This exception is detailed next, with regard to
[0138] After a delimiter, any number of same type bits can follow: one, two, three, four, five, six, or more. If this number is smaller than four, then these bits will be part of the core of the next PS, and the three full PS examples with regard to string 300 in
[0139] In
[0147] The exemplification of this exception PS rule is shown in
[0160] Concluding, there are three types of input PS used in this algorithm—full PS, exception PS, and termination PS. [0161] The full PS format consists of core, delimiter, and link bit, where the delimiter is always four same type bits. [0162] The exception PS format consists of delimiter only, where the delimiter is of any number of same type bits greater than or equal to four. An exception PS occurs immediately after a full PS and can also be the first input PS in an IFDS. [0163] A termination PS always occurs at the end of an IFDS. The format of a termination PS is discussed at the appropriate time, later in the disclosure.
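One plausible reading of the three PS types just summarized can be sketched at the level of run lengths; this simplified sketch ignores link bits and other details of the disclosure, and all names are illustrative:

```python
# Simplified partitioning sketch based on the three PS types above. Input is
# the list of run lengths (groups of alternating same-type bits): runs of
# 1-3 bits accumulate into a core, a run of >= 4 bits acts as a delimiter.
# A delimiter reached with a non-empty core closes a full PS (class Sum_4);
# a delimiter with an empty core is an exception PS; whatever is left at the
# end of the string is the termination PS. This is an assumption-laden
# reading, not the disclosed implementation.

def partition(runs: list[int], delimiter: int = 4):
    segments, core = [], []
    for r in runs:
        if r < delimiter:
            core.append(r)
        elif core:
            segments.append(("full", sum(core)))  # class "Sum_4"
            core = []
        else:
            segments.append(("exception", r))
    if core:
        segments.append(("termination", sum(core)))
    return segments

# illustrative run lengths: cores of Sum 8 and Sum 1, an exception run of 5,
# and one trailing termination bit
print(partition([1, 2, 2, 3, 4, 1, 4, 5, 1]))
```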
[0164] It is time now to start focusing on introducing the necessary concepts to describe the compressed output of the algorithm, as used in one or more of the embodiments.
[0165] The first concept to focus on from this point of view is the output structure that corresponds to the core part of a full PS, an output structure that has key implications on how the output of the algorithm will look.
[0166] As discussed: [0167] The core part of a full PS consists of multiple groups of same type bits, where these multiple groups are of 1, 2, or 3 same type bits only, i.e. multiple groups of lower grade. [0168] Adding the bits in all these multiple groups results in a unique number characteristic for that respective input PS—this unique number is called Sum or characteristic number. [0169] The full PS is said to be of class “Sum 4”. Since Sum characterizes the core content, Sum makes sense only for a full PS, and Sum is always greater than or equal to 1. [0170] It is fundamental for this disclosure that the full PS, therefore the core, is described in terms of AB.
[0171] Given all the above,
[0175] Continuing, as shown in
[0183] These configurations that are acceptable as part of a core must be uniquely described for use in the algorithm output, using the minimum number of bits. In other words, every unique input core acceptable configuration (ICAC) must have a unique output description configuration (ODC), where this ODC has a minimum number of bits. Sum4 and Sum5 will be used to exemplify this correspondence, and
[0191] Next, note in
[0196] As mentioned above, there is one remain ODC (RODC) for Sum4, and there are three RODC for Sum5. The higher the Sum, the more RODC. In
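The RODC counts quoted above can be reproduced by a small calculation, under the reading that a core of characteristic number Sum is a composition of Sum into groups of 1, 2, or 3 bits, and that its ODC is a (Sum-1)-bit code; function names here are illustrative:

```python
# ICAC count for a given Sum = number of compositions of Sum into parts of
# at most 3 (each lower-grade group has 1, 2, or 3 bits). RODC count =
# possible (Sum-1)-bit codes minus the ICAC count.

from functools import lru_cache

@lru_cache(maxsize=None)
def acceptable(sum_n: int) -> int:
    """Input core acceptable configurations (ICAC) for a given Sum."""
    if sum_n == 0:
        return 1
    return sum(acceptable(sum_n - part) for part in (1, 2, 3) if part <= sum_n)

def remain(sum_n: int) -> int:
    """Remain ODC (RODC) count for a given Sum."""
    return 2 ** (sum_n - 1) - acceptable(sum_n)

print(remain(4))  # -> 1, the one RODC for Sum4 stated above
print(remain(5))  # -> 3, the three RODC for Sum5
```

The same formula reproduces the figures given later for class 11_4 (504 acceptable configurations, Remain of 520).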
[0197] Important to note at this point is that Sums greater than or equal to the number of bits in the delimiter feature RODC, while Sums smaller than the number of bits in the delimiter do not feature RODC.
[0198] With respect to
[0204] Note in
[0214] Now that the core of a full PS is described in terms of how it will appear in the output of the algorithm (ICAC will be ODC), it is time to introduce and discuss the entire output format corresponding to a full PS. [0215] The format for a full PS has been shown to be core (ICAC), followed by the 4 bit delimiter, followed by the link. [0216] The corresponding output format of this full PS will be introduced directly here, as identifier, followed by ODC, followed by link.
[0218] In order to explain this newly introduced identifier concept,
[0222] Note that above, choices that have not yet been properly introduced or explained are made. These choices refer to the identifier length of four bits, and the identifier absolute value of 0100. These choices are explained next. [0223] As mentioned, the full PS 1101 has eight actual data bits. [0224] For a compression gain equal to zero, the number of input data bits must be equal to the number of output data bits. [0225] In the output data bits, the link bit is an integral part of the output, because if the link bit did not exist in the output, the actual value of the bits following a delimiter could not be restored during decompression. The link bit specified in the format of the input full PS is in fact determined for the output use. [0226] Therefore, in the output, for the considered example, the ODC plus the link bit account for four bits already. Accordingly, for the compression gain to be equal to zero, the choice for the identifier must be four bits in length. [0227] Another way to look at the choice of size 4 for the identifier is that the identifier size is equal to the delimiter size (the input-output correspondence therefore is that the delimiter size in the input equals the identifier size in the output, and the core (Sum) size in the input equals the (Sum-1) ODC size plus link bit in the output).
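The zero-gain accounting in the paragraph above can be written out as a check, using the bit counts stated there (4-bit delimiter and identifier, (Sum-1)-bit ODC, one link bit); names are illustrative:

```python
# Zero-gain accounting for a full PS of class Sum_4: the input side carries
# Sum core bits plus a 4-bit delimiter; the output side carries a 4-bit
# identifier, a (Sum - 1)-bit ODC, and the link bit.

DELIMITER_BITS = 4
IDENTIFIER_BITS = 4
LINK_BITS = 1

def input_bits(sum_n: int) -> int:
    return sum_n + DELIMITER_BITS

def output_bits(sum_n: int) -> int:
    return IDENTIFIER_BITS + (sum_n - 1) + LINK_BITS

# class 4_4: eight input data bits, eight output bits -> zero gain
print(input_bits(4), output_bits(4))
```

With these counts, input and output sizes are equal for every Sum, which is exactly the break-even point the identifier length was chosen to achieve.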
[0228] Therefore, the motivation for choosing the identifier length of four bits is clear. In order to explain the chosen absolute value, reference is made to
[0234] At this point, based on the information included in the algorithm output, the full PS can be fully and uniquely restored during decompression (the 0100 identifier means a 4_4 class full input PS, where the core is 010, i.e. a 121 AB format, and where the link to the next PS is 0, i.e. there are one or more bits after this PS of the same type as the delimiter). With the FB (not shown), the actual value of every bit in the input PS is also restored.
[0235] The core identifiers listed in
[0236] As mentioned above, classes 4_4 to 14_4 generate RODC configurations that are used by the algorithm for coverage of other constructs and for compression gain. The next focus of this disclosure is to explain how these RODC configurations are used for the stated objective. In order to address this fundamental aspect, a close-up look at a real-life data string is needed. [0237] Such a real-life data string will have same type bit groups of 4, 5 and greater, as well as input full PS of classes 1_4 to 14_4, all covered, as described in
[0240] Before describing how RODC are used to cover the above stated constructs, a closer look at the Remain configurations is required. [0241] According to
[0243] Along the same lines, an important point to make is with regard to class groups where the Remain number is of the same order of magnitude as the Need number. [0244] For example, if class 11_4 is considered (where Remain=520 and Need=504), it can be derived that this class makes available a five bit configuration together with an eleven bit configuration. [0245] The five bit configuration, for example, is derived and shown to be 10111. [0246] Of course, rather than making available a five bit and an eleven bit configuration, it may be preferred to use 520 14 bit configurations (and that is perfectly equivalent). [0247] The point made here is that such classes, where the Remain number is of the same order of magnitude as the Need number, provide a wide range of RODC. Similar discussions can be made for all available configurations, of all classes.
[0248] To understand how these Remain configurations are used to achieve the above stated objective, the configuration made available by class 4_4 will be used for the discussion. This configuration is chosen because it is the most straightforward example and the easiest to follow. The discussion can then be similarly extended to any and all Remain configurations.
[0249] The Remain configuration made available by class 4_4 is, as derived above, 0100111, a seven bit configuration. [0250] This seven bit configuration of class 4_4 can cover 2 configurations of class 5_4, 4 configurations of class 6_4, etc.—for every consecutive class increase a factor of 2 increase in coverage is noted. [0251] For example, the named class 4_4 Remain configuration has the power to cover 2.sup.10 (1024) configurations in class 14_4, or 2.sup.16 (65536) configurations in class 20_4. This number, 65536, is called 20_4 Worth of one 4_4 Remain configuration. [0252] In
[0253] Besides the Worth factor of a Remain configuration into a higher class, just defined above, another very important aspect must be noted with regard to the potential use of these Remain configurations. [0254] This aspect regards the rate of increase of a Remain configuration into a higher class, versus the rate of increase of the Need configurations into that same higher class. [0255] This aspect has already been discussed above, and it was shown that the rate of increase of one Remain configuration is two between two consecutive classes, while the rate of increase for Need is notably less than two between the same two consecutive classes. [0256] This is yet another fundamental aspect for one or more of the embodiments developed in this disclosure. In order to benefit the most from this rate increase difference, high order classes must be exploited. [0257] To substantiate this statement, an example is considered. The class 4_4 Remain configuration has a Worth rate increase factor into class 14_4 of 2.sup.10 (1024), and a Need rate increase factor into the same class of 3136/7 (448); in other words, over ten classes, the power of one Remain configuration increases by a net factor of more than 2 over the Need.
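Both the Worth factor and the Remain-versus-Need comparison reduce to simple arithmetic, sketched here (Python; the function and variable names are illustrative, the numbers are those quoted in the disclosure):

```python
# Worth of a Remain configuration into a higher class: coverage doubles
# with every one-class increase.

def worth(source_class: int, target_class: int) -> int:
    """Target-class configurations covered by one source-class Remain."""
    return 2 ** (target_class - source_class)

assert worth(4, 5) == 2        # 2 configurations of class 5_4
assert worth(4, 6) == 4        # 4 configurations of class 6_4
assert worth(4, 14) == 1024    # 2^10
assert worth(4, 20) == 65536   # the "20_4 Worth" of one 4_4 Remain

# Remain versus Need rate of increase from class 4_4 to class 14_4
# (Need counts 7 and 3136 as quoted above):
remain_rate = worth(4, 14)     # 1024
need_rate = 3136 / 7           # 448
assert remain_rate / need_rate > 2   # net factor above 2 over ten classes
```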
[0258] These two considerations (the worth factor of Remain configurations, and the rate increase differences between Remain and Need) represent the basis of usage of Remain configurations coming from classes 4_4 to 14_4, in order to achieve the goals of full coverage of an arbitrary input data string (i.e. coverage of any x_4 class and coverage of an open string).
[0259] For the delimiter size and all other factors exemplified in this disclosure, the worth factor and the rate increase differences accumulate sufficient power to achieve the goals of full coverage of an arbitrary input data string, all as defined above, while providing notable compression gain, when classes are considered up to and including 28_4. At that point for example, just the single class 4_4 RODC configuration will have the power to cover all class 28_4 configurations (which are in the order of 14 million). In
[0260] It must be noted that a person skilled in the art can improve the algorithm performance by covering up to higher order classes. The objective of the disclosure is achieved as presented here, allowing persons skilled in the art to develop versions and implementations to achieve multiple optimizations.
[0261] Returning to
[0266] As defined, Exception PS (1201) do not need a link bit. The full PS classes at 1202 require a link bit. While necessary and highly beneficial, the link bit does introduce a disadvantage, since it reduces by a factor of 2 the configurations that can be covered by the respective output class. For example, the 4_4 class has a four bit identifier, a three bit ODC, and a one bit link. If the link bit were eliminated in some way (such that the link bit is not required anymore), the ODC would be four bits in length, resulting in the class covering 16 configurations instead of just 8. This is the motivation why 28 in 1203 (the open string) is broken down in class 25_3, class 26_2 and class 27_1, and why 1204 in
[0270] Finally, to conclude the above discussion, note that at this point, with the classes outlined in
[0271] It is time to introduce the format of a termination PS: [0272] As described, the largest class, in term of bit length, is class 28_4. That is a 32 bit PS. [0273] A termination PS is defined as a string of data, 31 bits or less in length, representing the last bits before the end of an IFDS. [0274] In other words, the last 31 bits or less of an IFDS are not processed through the normal algorithm procedure as described above, no matter what these last 31 bits or less contain, including if these bits contain a full PS of any class that fits in the 31 bit space, an open string (a 28), or 31 bits of same type (an exception PS). The following need to be clarified: [0275] a. If any class, as specified in
[0277] This different processing named above at b. is introduced with respect to
[0282] This concludes the basic implementation of the algorithm, with the following performance highlights from the compression gain point of view: [0283] Linear monotonic gain for exception PS of ten or more same type bits. Linear monotonic gain means gain 1 for 10 same type bits, gain 2 for 11 same type bits, . . . , gain 91 for 100 same type bits, and so on. [0284] Gain 1 for about 6% of all ICAC (Need) configurations of classes 15_4 to 28_4 and all classes at 1203 and 1204 in
[0288] The following immediate extensions of the basic implementation of the algorithm can be made. There are four extensions, detailed below. None of these extensions interferes with the basic compression algorithm just presented; they simply use the output of the basic compression algorithm to obtain additional gain. [0289] 1. When an open string occurs (a class 28), clearly a delimiter or an exception PS (a group of four or greater same type bits) cannot occur immediately following the open string, otherwise a 28_4 class would have been considered. That means that the identifiers used for the exception PS (identifiers 0000 and 1111) can be redistributed for use across all other classes and constructs shown in
[0300] Extensions 2, 3, and 4 are mutually exclusive in this order.
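The linear monotonic gain for exception PS noted in the performance highlights above can be written as a one-line formula (a sketch; it simply follows the stated pattern of gain 1 at 10 bits, gain 2 at 11 bits, growing by one per extra bit):

```python
# Linear monotonic gain for an exception PS of n same-type bits.

def exception_gain(n: int) -> int:
    """Gain in bits for a run of n same-type bits (zero below 10)."""
    return n - 9 if n >= 10 else 0

assert exception_gain(10) == 1
assert exception_gain(11) == 2
assert exception_gain(100) == 91
```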
[0301] A key final improvement within this disclosure is introduced next, an improvement that is used for one or more additional embodiments of this disclosure. This improvement ensures further additional compression gain. The improvement uses the un-altered output of the basic implementation of the algorithm and of any of the extensions one to four presented above; it therefore does not interfere with or alter in any way the process presented up to now, including the basic processing and all four extensions. This improvement is simply an addition to the algorithm, to ensure additional gain.
[0302] This improvement works as follows.
[0303] To introduce and describe this improvement very clearly, two new concepts are being introduced. [0304] The first concept is the concept of well defined identifier, and the second concept is the concept of root identifier. [0305] In order to understand the concepts of “well defined identifiers” and “root identifiers”, reference to
[0306] With respect to
[0312] Therefore, the concept of root identifier is introduced: [0313] A root identifier represents the minimum common root to describe a group of well defined identifiers within a single class only.
[0314] To understand this definition for the root identifier, class 4_4 is being referred to again. [0315] As described above, class 4_4 consists of seven well defined identifiers. According to
[0318] For class 28_4, since all ICAC of this class are covered by the remain configuration of class 4_4, the root identifier is directly this remain configuration. Therefore, the root identifier for class 28_4 is 0100111 (1409 in
[0319] Determining the root identifiers for all classes of this algorithm (all classes outlined in
[0320] In
[0328] The root identifier transformation from column 1503 to column 1504 needs to be clarified: [0329] Taking class 4_4 of full PS as an example again, where there were three root identifiers, one each of 5 bits, 6 bits, and 7 bits (see above), i.e. one each of class 5, 6, and 7, the above transformation means nothing else but to say that class 4_4 has one root identifier of class 6 and five root identifiers of class 7, or seven root identifiers of class 7, both of which are perfectly equivalent to the initial class 5, 6, 7 distribution. [0330] It can be seen that any transformation of root identifiers can be made, as long as it makes sense practically. This means that one cannot transform class 7 root identifiers from PS class 4_4 into class 8 or larger, because there are not that many bits in an original class 4_4 PS, so such a transformation has no practical basis. [0331] Many optimizations to achieve improved algorithm performance can be made by a person skilled in the art; again, the examples offered in this disclosure have the goal to support the disclosure and provide the means for a person skilled in the art, and these examples do not limit the disclosure in any way.
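The equivalence of these transformations can be verified by converting every root identifier to class-7 units, since one class-k root identifier is worth 2^(7-k) class-7 root identifiers (a sketch; the helper name is illustrative):

```python
# Equivalence of root-identifier transformations for full PS class 4_4,
# measured in class-7 units.

def in_class7_units(roots: dict) -> int:
    """Total worth of a {class: count} root-identifier set in class-7 units."""
    return sum(count * 2 ** (7 - cls) for cls, count in roots.items())

original    = {5: 1, 6: 1, 7: 1}   # one root each of class 5, 6, 7
transformed = {6: 1, 7: 5}         # one class 6 and five class 7
flattened   = {7: 7}               # seven class 7 roots

# all three distributions describe the same coverage
assert in_class7_units(original) == in_class7_units(transformed) \
    == in_class7_units(flattened) == 7
```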
[0332] These root identifiers, as mentioned, are found already in the output of the algorithm, after the basic and extensions processing.
[0341] When an extension identifier occurs (such as an open string 28 for extension 1, or the 11 bit identifiers for extensions 2, 3, or 4), this full extension identifier is placed in the 1613 part of the output, while all the output data for the constituent PS of the extension (such as the 25 PS for extension 4) are written in 1623.
[0342] Now, with 1623, nothing further is done for this improvement. The additional processing that is done for this improvement concerns 1613 (the root identifiers partition). From the point of view of the above named processing of the root identifiers, there are several steps, as follows: [0343] 1. The root identifiers are paired, as shown by 1614, 1615, 1616. [0344] 2. The formed pairs are described based on their classes. For example, RI1 is of class 4 (four bit root identifier), and RI5 is of class 5 (five bits). According to column 1504, there are 32 combinations (4 times 8) to describe the class4-class5 pair. [0345] a. That is exactly what this improvement is doing, for every possible pair (there are 13 classes times 13 classes, therefore 169 possible pairs, each pair with a different number of combinations according to column 1504 in
[0346] In term of implementation details of step 2 above, there are two tiers: [0347] i. Tier 1: Combinations of two root identifiers of class 4, 5, and 6 [0348] The output of such combinations has the format shown in
[0358] The additional compression gain resulting from implementing this improvement is as follows: [0359] a. For tier 1, gain of 1 is noted for about 6% of all combinations [0360] b. For tier 2-1, gain of 1 is noted for about 23% of all combinations [0361] c. For tier 2-2, gain of 1 is noted for 100% of all combinations where the sum of the two classes of root identifiers is greater than or equal to 21, and no gain for the rest of the combinations in this tier (sum of the two classes between 11 and 20)
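The pair counting of step 2 and the tier 2-2 gain rule can be sketched as follows (Python; only the class-4 and class-5 root-identifier counts are quoted from column 1504 in this text, so the table below is deliberately partial and illustrative):

```python
# Root-identifier pair combinations and the tier 2-2 gain rule.

roots_per_class = {4: 4, 5: 8}   # per column 1504: 4 class-4 and 8 class-5 roots

def pair_combinations(class_a: int, class_b: int) -> int:
    """Combinations available to describe a pair of root identifiers."""
    return roots_per_class[class_a] * roots_per_class[class_b]

assert pair_combinations(4, 5) == 32   # "4 times 8" for a class4-class5 pair
assert 13 * 13 == 169                  # 13 classes -> 169 possible pairs

def tier22_gain(class_a: int, class_b: int) -> int:
    """Tier 2-2: gain of 1 whenever the two classes sum to 21 or more."""
    return 1 if class_a + class_b >= 21 else 0

assert tier22_gain(16, 16) == 1        # sum 32: gain of 1
assert tier22_gain(10, 10) == 0        # sum 20: no gain
```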
[0362] The performance in term of this additional compression gain is significant, and some remarks and comments are required: [0363] i. This additional compression gain is on top of the compression gain obtained from the basic algorithm and extensions [0364] ii. Looking at the distribution of root identifiers over the PS classes, it is noted that the lower the number of bits in the root identifier, the more ICAC combinations are covered in a PS class. [0365] That is also intuitive to see from the examples discussed above. For example, it has been shown that for full PS class 4_4, the root identifier of class 5 (5 bits long) covers 4 out of 7 ICAC (i.e. 57% of all ODC of class 4_4), the root identifier of class 6 covers 2 out of 7 ICAC (i.e. 29% of all ODC of class 4_4), and the root identifier of class 7 covers 1 out of 7 ICAC (i.e. 14% of all ODC of class 4_4). Also, it was shown that a class 7 root identifier covers the entire class 28_4. [0366] iii. Based on (ii.) above, the lower the sum of the two classes of root identifiers in a combination, the higher the weight in term of coverage of PS in the algorithm. [0367] Accordingly, tier 1 combinations have a significant weight in term of coverage, while tier 2_2 has a notably lower comparative coverage (for example, the 16*16 root identifier combinations account for less than 0.2% of total PS in the algorithm). [0368] iv. Another important observation is that this improvement strongly complements the performance of the basic algorithm. [0369] In the basic algorithm (without any of the four extensions), classes 1_4 to 14_4 had zero gain, while this improvement provides notable gain exactly for those classes, since those classes have lower root identifier classes.
[0370] As a final note concerning the algorithm, the differentiating aspects are outlined when the RB (relative bit) transformation is employed. As outlined when the RB transformation has been defined and exemplified in connection with
[0371] Clearly, the classic approach when there are two different representations of the same input data is to perform a data analysis and determine which of the two representations generates better results. This would be one approach here as well. However, the current implementation of the algorithm does not feature any data analysis, and this stance is extremely beneficial for hardware implementation, as has been discussed and will be further shown in the hardware section. Therefore, in order to benefit from having two very different representations of the input data, and therefore possibly two very different outcomes in term of compression gain, while preserving the great benefit of requiring no data analysis, the following solution is outlined: [0372] a. Consider the two formats for the input data: [0373] i. The regular IFDS, and [0374] ii. The IFDS obtained after the RB transformation [0375] b. Run the two inputs through the algorithm. This can be done through two parallel chains of similar hardware (implementation 1), or by running the algorithm once for each data set (implementation 2). While theoretically the first solution would suggest that double the hardware is necessary, and the second solution would suggest that the execution time will be double, neither is completely true, since the first solution can be additionally pipelined and/or only certain sections parallelized, while the second solution can similarly benefit from pipelining. In any case, a hardware penalty will be predominantly visible for implementation 1, and an execution time penalty will be predominantly visible for implementation 2, but neither will be double. [0376] c. Once the two algorithm outputs are generated, the output with the highest compression gain will be chosen.
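The no-analysis dual-representation flow above can be sketched as follows. Here `compress` is a placeholder for the full algorithm, and `rb_transform` follows the relative-bit description of claim 2 under the assumption that 0 encodes "no change" and 1 encodes "change" between consecutive bits, with the first bit kept as reference:

```python
# Run both input representations through the algorithm and keep the best,
# with no data analysis on the input itself.

def rb_transform(bits: str) -> str:
    # Relative-bit form: first bit kept as reference, then 0 where two
    # consecutive bits are equal, 1 where they differ (assumed encoding).
    return bits[0] + "".join(
        "0" if a == b else "1" for a, b in zip(bits, bits[1:]))

def compress(bits: str) -> str:
    return bits              # placeholder for basic algorithm + extensions

def best_output(ifds: str) -> str:
    out_regular = compress(ifds)               # chain 1: regular IFDS
    out_rb = compress(rb_transform(ifds))      # chain 2: RB-transformed IFDS
    # choose the output with the highest compression gain (shortest output)
    return min(out_regular, out_rb, key=len)

assert rb_transform("1101") == "1011"
```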
[0377] An average (considered conservative) estimation of the gain achieved by the entire algorithm (basic, plus extensions, plus improvement) is as follows: [0378] When everything is considered, i.e. the basic algorithm, the four extensions, and the improvement, all as described, about 10% of all PS feature a gain of 1 bit. [0379] The average length of a PS is considered to be 20 bits (a PS can be between 4 and 32 bits in length). [0380] A uniform distribution of PS and classes is also considered. [0381] All this results in a gain of 1 bit at every 200 bits.
[0382] Beyond the above estimation, the distribution of PS and classes in an IFDS varies greatly. Since the gain coverage over all classes is fairly uniform in the algorithm, an IFDS containing predominantly low-number classes may feature a higher gain than the estimated average above, and this is just one variable that gives a conservative flavour to the above estimation.
[0383] As mentioned in the summary section, the compression process can be repeated using the output of the just completed cycle as the new IFDS. An unlimited number of cycles can be processed for unlimited compression, the only theoretical limit being that an IFDS cannot be smaller than a practical limit of about 1000 bits. When implementing such a multi-cycle compression, the only addition is a header where a counter keeps track of the number of executed cycles, so that the decompression is processed accordingly. The above quoted number of 1000 bits results primarily from the extensions, from the headers (such as the mentioned counter), and of course from the need to provide statistical variability of PS and classes. This is a theoretical limit; practically, having an IFDS smaller than the order of 1M is not really justified. The decompression perfectly mirrors the compression process, leading to a restored file identical to the initial IFDS, which was the input to the first cycle.
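The multi-cycle loop with its cycle-counter header can be sketched as follows (Python; `compress_once` is a placeholder for one full pass of basic plus extensions plus improvement, and the stopping conditions are those stated above):

```python
# Multi-cycle compression with a cycle-count header.

MIN_IFDS_BITS = 1000   # practical lower limit quoted in the disclosure

def compress_once(data: str) -> str:
    return data            # placeholder for one compression cycle

def multi_cycle_compress(ifds: str, max_cycles: int):
    """Returns (cycle_count, data); the count goes into the header so
    decompression can mirror the same number of cycles."""
    cycles, data = 0, ifds
    while cycles < max_cycles and len(data) > MIN_IFDS_BITS:
        out = compress_once(data)
        if len(out) >= len(data):      # no further gain: stop
            break
        data = out
        cycles += 1
    return cycles, data
```

With the identity placeholder no gain is ever achieved, so the loop stops immediately; a real `compress_once` would shrink `data` each cycle until the 1000 bit floor or the cycle budget is reached.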
[0384] In the remainder of this disclosure, the suggested hardware implementation is described. Similar to the algorithm, it should be noted that the suggested hardware implementation is by example and is in no way limiting, and the skilled person will appreciate that the disclosed implementation is equally applicable to multiple variations and alternatives, and that multiple optimizations are possible to increase the performance metrics of the hardware, such as the execution speed or the hardware complexity. It must also be noted that the suggested hardware implementation is described at a high level, outlining key steps; therefore the low level details, which are not relevant for the substance and objective of this disclosure, may not be addressed.
[0385] The suggested hardware implementation architecture is a memory intensive pipelined architecture, as depicted in
[0425] Note the fully serial flow of data within the chip—the ideal case for a pipelined implementation. In fact, as presented above, the flow of data is already pipelined (registers are present at every block—1802, 1804, 1806 and all on that level, 1811 and all on that level, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825).
[0426] As mentioned at the start of this hardware section, this suggested hardware implementation architecture is a memory intensive pipelined architecture. The pipelined part of the proposed architecture has been clarified. The memory intensive part refers to the fact that major blocks are represented by memory (such as 1806, 1807, 1808, 1809, 1810, 1816, 1820, and 1825). This memory is of two types: [0427] Functional memory, required by the algorithm to operate. This memory is represented by blocks 1806, 1807, 1808, 1809, 1810, 1816b, and 1820. The total size of this memory is about 210M of 32 bit. [0428] Data memory, required to write the algorithm output. This memory is represented by block 1825. As mentioned, the size of this memory depends on the application, with a suggested size of 100M of 64 bit.
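For scale, the quoted memory sizes translate to roughly the following byte counts (sketch arithmetic only, using the word counts and widths stated above):

```python
# Rough byte sizes of the quoted on-chip memories.

functional_bits = 210_000_000 * 32   # functional memory: 210M words of 32 bit
data_bits = 100_000_000 * 64         # data memory: 100M words of 64 bit

functional_bytes = functional_bits // 8
data_bytes = data_bits // 8

assert functional_bytes == 840_000_000   # about 0.84 GB of functional memory
assert data_bytes == 800_000_000         # about 0.8 GB of data memory
```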
[0429] The chip's internal memory requirements seem large for today's capabilities, but are doable. High speed internal and external memory architectures are to be employed for the implementation—this is not the object of this disclosure.
[0430] Obviously, all this memory can be external to the chip. [0431] In this version of chip implementation, where most of the memory is external, the chip architecture is altered by means of additional external data and address busses (one or more, depending on the data throughput desired from the chip). [0432] Several versions are possible that are apparent to a person skilled in digital design. By having external memory, there will be a penalty in the chip data throughput: in the best case just an increased memory access time, and in the worst case multiplexing and serializing the access to the output memory. [0433] Two extreme examples are provided as two implementation options of having the memory external to the chip: [0434] Example 1 is when all external memory banks have their own dedicated busses. [0435] Blocks 1806 and 1807. These will require an extra 33 bit address bus and an extra 32 bit data bus. [0436] Block 1816b. This will require an extra 32 bit address bus and an extra 32 bit data bus. [0437] Block 1825. The data bus already exists for this (1827 and 1801), and an address bus is not needed, because data is written or read serially in a LIFO fashion. Only a controller is needed for this. [0438] This implementation will require a chip package of about 250 pins. The actual remaining silicon chip is fairly small (just the controllers and some small memory), and is I/O dominated. In term of throughput, this is not affected as compared to the initially proposed all-included architecture, other than an increased memory access time. [0439] Example 2 is when the memory is in two banks, one functional and one output, and the access to the functional memory is all multiplexed. [0440] Blocks 1806, 1807, and 1816b will require a 34 bit address and 32 bit data bus.
[0441] This arrangement will produce a savings of about 70 pins, so the package pin requirement drops to about 180 pins, but the throughput is affected, since the memory accesses performing the functions for 1806/1807 and 1816b need to be serialized.
[0442] As an estimated speed performance of the chip, or data throughput, considering a 2 GHz clock, for the chip version with the internal memory: it is estimated that ten clock cycles are needed to process one PS. Since, as outlined above at estimating the compression performance of the algorithm, the average length of one PS is considered to be 20 bits and one bit compression is achieved at every 200 bits, it means that in term of chip speed, one bit compression is obtained at about 100 clock cycles, or, for the considered 2 GHz clock, the chip performance is 20M bits per second gain. To compress a 20G file, given as an example, into a 1M file, as a result of a multi-cycle compression, takes about 200 cycles and about 1000 seconds.
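The throughput arithmetic above can be reproduced directly (sketch; the constants are those quoted in the estimation):

```python
# Estimated compression-chip throughput at a 2 GHz clock.

CLOCK_HZ = 2_000_000_000     # 2 GHz clock
CYCLES_PER_PS = 10           # clock cycles to process one PS
AVG_PS_BITS = 20             # considered average PS length
BITS_PER_GAINED_BIT = 200    # one bit of compression gain per 200 input bits

ps_per_gained_bit = BITS_PER_GAINED_BIT // AVG_PS_BITS       # 10 PS
cycles_per_gained_bit = ps_per_gained_bit * CYCLES_PER_PS    # 100 clock cycles
gain_bits_per_second = CLOCK_HZ // cycles_per_gained_bit

assert cycles_per_gained_bit == 100
assert gain_bits_per_second == 20_000_000    # 20M bits per second gain
```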
[0443] For decompression, a different chip is required. The controllers are different, and the memory content for the functional memory is different.
[0444] a. 1901 represents the input data bus where the compressed data is received. This data bus is preferred to be 64 bits, because the compressed data is written in 64 bit format (see the compression chip above), but 32 bits is acceptable. [0445] b. 1902 is a small data buffer. The goal of this buffer is to stream the flow of data in case an input delay in receiving data occurs. The second goal of this buffer is to transform the data into 64 bit format in case 1901 is of 32 bit. [0446] c. 1903 is the new 64 bit data bus [0447] d. 1904 is a controller, implementing the following attributions: [0448] i. Extract the FB [0449] ii. Extract the headers, such as the counter indicating how many cycles occurred in generating the compressed data. The same number of cycles will be replicated during decompression. [0450] iii. Transform the 64 bit format of the data into 192 bit wide words. [0451] e. 1905 is a controller that disassembles the 192 bit words into words that represent processed root identifier pairs and the details of the two root identifiers that have been processed in pair. Since the processed root identifier pair has a wide range of bit lengths (from 8 to 32 bits) and the details part corresponding to each root identifier also varies widely (from zero bits to 32 bits, and more for extensions 2, 3, 4), the complexity of this controller is notable, and generating the disassembled output can be slow. To resolve this problem and to seamlessly generate the disassembled output fast, the following solution is proposed for this controller: [0452] i. The controller takes 96 bits from the received 192 bit words, obtaining 96 bit words. [0453] ii. The controller takes the first 32 bits of this 96 bit word. This 32 bit word is certain to contain the entire processed root identifier pair, but it may also contain full or part of the details of the two root identifiers, and more.
All that is important is that it contains the processed root identifier pair, and that these are the first bits in this string, from the most significant bit. [0454] iii. The controller reverses this 32 bit word, i.e. the most significant bit becomes the least significant bit. [0455] iv. This reversed 32 bit word becomes an address to a memory. The address will be 32 bits long, but the memory content will recognize just the addresses that are relevant. For example, if the address contains an eight bit word (starting from the least significant bit) that represents a processed root identifier pair, then, since that eight bit word is unique, the remaining 24 bit content in the address is ignored. In other words, a 32 bit word will generate a unique output from this memory (the memory determining automatically which of the 32 bits are relevant, starting from the least significant bit). [0456] v. The output of the memory will indicate: [0457] i. The two unprocessed root identifiers. Since the root identifiers are up to 16 bits long, the two root identifiers will be output from the memory as two 16 bit words. [0458] ii. How many bits in the 32 bit address represented the processed root identifier pair. Since the processed root identifier pair is up to 32 bits long, this information is provided by the memory in five bit format. This information is used by the controller to eliminate these bits from the 96 bit word and know where the detail bits for the two root identifiers start in this string. [0459] iii. How many bits each root identifier has in the details part. Since each root identifier can have up to 32 bits in the detail part (without extensions 2, 3, 4), and the three extensions must be added, this information is output by the memory as six bits for each root identifier. This information is used by the controller to know how many bits to consider for the details of the two root identifiers from the 96 bit word.
If any of the extensions 2, 3, 4 have been flagged as present, then the controller will consider up to nine of the next 96 bit words for the details of this root identifier pair. [0460] iv. Therefore, the output of the memory will be 16 bits for the first unprocessed root identifier, plus 16 bits for the second unprocessed root identifier, plus 5 bits for the number of bits of the processed root identifier pair in the 96 bit word, plus six bits each for the number of bits in the 96 bit word (or multiple 96 bit words in case of extensions 2, 3, 4) representing the details part corresponding to each root identifier—total 49 bits. [0461] vi. The controller will process the 49 bit output of the memory by creating the two output PS, putting together each unprocessed root identifier and the corresponding details. [0462] vii. The controller will eliminate the bits for the processed root identifier pair and the bits for the two corresponding details of the root identifiers from the 96 bit word (or multiple 96 bit words for extensions 2, 3, 4), and thereby prepare the next 96 bit word for the next processing. [0463] f. 1906 is the memory associated to controller 1905, as described above. The size of this memory is 32k of 49 bit words, and it has a 32 bit word address. [0464] g. 1907 is a buffer of 8 (33) locations of 32 bit words, containing the first output PS obtained from controller 1905 and memory 1906 as described above, from the root identifiers and details. 1907 will contain the first root identifier in the pair, and therefore will contain the first output PS. It is obviously very important in the decompression to keep the order of the PS. The extra 25 locations (33-8) are only used in case of extensions 2, 3, 4. [0465] h. 1908 is a buffer of 8 (33) locations of 32 bit words, containing the second output PS obtained from controller 1905 and memory 1906 as described above, from the root identifiers and details.
1908 will contain the second root identifier in the pair, and therefore will contain the second output PS. It is obviously very important in the decompression to keep the order of the PS. The extra 25 locations (33-8) are only used in case of extensions 2, 3, 4. [0466] i. 1909 is a straight-forward controller merging the output PS from 1907 and 1908 (i.e. putting them in order, second after first). A series of memory locations containing serial output PS is formed. Now, what is left in order to restore the IFDS is to transform this series of output PS into input PS. [0467] j. 1910 is a controller that interprets the current PS coming from 1909. The controller simply compares this current PS with the identifier for extension 1 (the open string of class 28), or for extensions 2, 3, 4 (the three 11 bit identifiers). If the comparison is true for extension 1, it sends the data to 1912; if the comparison is true for extension 2, it sends the data to 1913; if the comparison is true for extension 3, it sends the data to 1914; if the comparison is true for extension 4, it sends the data to 1915; and if the comparison is not true for any of the extensions, it sends the data to 1911. To compare the current PS with the identifier for extension 1, controller 1910 will require a memory of 15M of 32 bit words. [0468] k. 1911 is a memory that transforms output PS into the corresponding input PS for the basic algorithm. The size of this memory is 68M of 32 bit words, and it is the reversal of 1806. [0469] l. 1912 is a memory that transforms output PS into the corresponding input PS for extension 1. The size of this memory is 68M of 32 bit words, and it is the reversal of 1807. [0470] m. 1913 is a memory that transforms output PS into the corresponding input PS for extension 2. The size of this memory is 1k of 32 bit words, and it is the reversal of 1808. [0471] n. 1914 is a memory that transforms output PS into the corresponding input PS for extension 3.
The size of this memory is 8k of 32 bit words, and it is the reversal of 1809. [0472] o. 1915 is a memory that transforms output PS into the corresponding input PS for extension 4. The size of this memory is 2.5M of 32 bit words, and it is the reversal of 1810. [0473] p. 1916 is a buffer, of 64 locations of 32 bits, which takes the data as it is generated from any of 1911, 1912, 1913, 1914, and 1915. [0474] q. 1917 is a controller that uses the FB and headers extracted by 1902. If the headers indicate that this is the last decompression cycle, the real bit value in the PS from 1916 is restored. If this is not the last cycle, the counter for the number of cycles is decremented, and after the current decompression cycle completes, a new cycle is restarted. [0475] r. 1918 is the data memory in which the uncompressed data is written. Similar considerations apply as for the 1825 data memory in the proposed compression chip. A 200M of 32 bit words size is suggested, for the same target of about 6 Gbit uncompressed final output, with a similar discussion for an off-chip data memory. [0476] s. 1919 is the internal 64 bit data bus going back to block 1902, this data bus being used during the multi-cycle decompression, as described. Similarly as for the compression chip, this internal bus is primarily motivated when the data memory is internal. [0477] t. 1920 is the 32 bit data bus for chip output, to output the uncompressed data for external uses, or for the external data memory when such is used. [0478] u. Finally, 1921, the border, signifies the entire chip.
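The reverse-and-prefix-match lookup performed by controller 1905 and memory 1906 (steps ii–v above) can be sketched as follows; the table entry here is hypothetical, the real memory content being defined by the algorithm:

```python
# Bit reversal plus unique-prefix lookup, as described for controller 1905.

def reverse32(word: int) -> int:
    """Reverse a 32 bit word: the MSB becomes the LSB."""
    return int(f"{word:032b}"[::-1], 2)

# Memory keyed on unique prefixes read from the least significant bit of
# the reversed word; real entries would hold the 49 bit output described
# above. The (prefix, bit-count) entry below is made up for illustration.
prefix_table = {(0b10110100, 8): "pair-A"}

def lookup(reversed_word: int) -> str:
    """Return the entry whose prefix matches the low-order bits; any
    higher-order bits of the 32 bit address are ignored."""
    for (prefix, nbits), value in prefix_table.items():
        if reversed_word & ((1 << nbits) - 1) == prefix:
            return value
    raise KeyError("no prefix recognized")

assert reverse32(1) == 2 ** 31           # MSB <-> LSB swap
assert lookup(0b10110100) == "pair-A"    # exact 8 bit prefix match
```

Because every recognized prefix is unique, two addresses sharing the same low-order prefix resolve to the same entry regardless of their remaining bits, which is exactly the "rest of the content is ignored" behavior described for memory 1906.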
[0479] Similar to the compression chip, the decompression chip features a fully serial flow of data within the chip, the ideal case for a pipelined implementation; as for the compression chip, the flow of data is already pipelined. Also similarly, the decompression chip features functional memory and data memory. The functional memory is about 160M of 32 bit words (smaller than the functional memory in the compression chip, since the memory that extracts the root identifiers and details from PS is not necessary in the decompression chip). The data memory is the same size as in the compression chip. The functional memory for the decompression chip consists of 1906, 1910, 1911, 1912, 1913, 1914, and 1915, while the data memory consists of 1918.
[0480] External memories can be used for the decompression chip as well, with the same discussion and potential penalties. The functional memories that are primarily preferred to be external are 1910, 1911, 1912, and 1915, with 1910 representing one functional bank and 1911, 1912, and 1915 representing the second functional bank, where the two banks need to be serialized in an implementation similar to the solution of example 2 described for the compression chip. Estimated performances are similar to those of the compression chip.
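The serialization of the two functional banks implies two dependent external-memory accesses per PS: the first bank (1910) resolves which reverse table applies, and only then can the second bank (1911/1912/1915) be addressed. A minimal sketch, where the bank contents, keys, and unit latencies are illustrative assumptions rather than the disclosed memory maps:

```python
# Hedged sketch of two serialized external functional banks. Because the
# address into bank B depends on the result from bank A, the two accesses
# cannot overlap: total latency is the sum of the two bank latencies.

def serialized_lookup(ps, bank_a, bank_b, lat_a=1, lat_b=1):
    table_name = bank_a.get(ps, "basic")    # access 1: routing decision (1910)
    value = bank_b[table_name].get(ps, ps)  # access 2: reverse mapping (1911/1912/1915)
    return value, lat_a + lat_b             # serialized, not pipelined, per PS

bank_a = {"psX": "ext1"}                          # hypothetical routing entries
bank_b = {"basic": {}, "ext1": {"psX": "inX"}}    # hypothetical reverse entries
```

This dependency is what makes the bank serialization a throughput consideration, consistent with the performance discussion given for example 2 of the compression chip.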
[0481] To conclude, final remarks about the full disclosure are provided, with the goal of outlining possible attractive optimizations or modifications.
[0482] The embodiments discussed in this disclosure use delimiters of size four. Consequently, in the PS core, groups of same type bits smaller than the delimiter (i.e., smaller than four, i.e., groups of 1, 2, 3) are acceptable. Briefly, the discussion here aims to outline the consequences of choosing a size larger than four and a size smaller than four.

[0483] a. For a size larger than four:

[0484] a. Consider a size five.

[0485] b. Preliminary analysis shows that a size five may generate more gain, because the identifiers will be of size five, and many more remain configurations will be available. However, the main drawback is that the size of the functional memory will increase to very large sizes. Accordingly, the general outline will be:

[0486] i. The hardware complexity increases.

[0487] ii. The functional memory sizes in particular increase substantially.

[0488] iii. The gain, once the processing is activated, appropriately increases.

[0489] iv. The execution speed decreases, since the hardware complexity and the processing are higher.

[0490] b. For a size smaller than four:

[0491] a. Consider a size three.

[0492] b. Preliminary analysis shows that a size three may not be possible, because identifiers of size three are required, leading to not having sufficient classes to generate sufficient remain configurations. Just for conformity, assuming that size three actually works (which, again, it seems it does not), the general outline would be:

[0493] i. The hardware complexity decreases, including in particular the functional memory sizes, which decrease substantially. External functional memory would never be justified.

[0494] ii. The processing is activated with increased probability as compared to size four; however, more cases are estimated to generate gain zero.

[0495] iii. The execution speed increases, since the hardware complexity and the processing are lower.
[0496] Concluding, changing the size has notable implications on all aspects and performances. While in some cases a larger or a smaller size can be beneficial, it is considered that, for the embodiments as described in this disclosure, size four provides good trade-offs between gain, complexity, and execution speed. A person skilled in the field can, however, alter and re-engineer the embodiments of this disclosure to optimize the performances at a higher level for other sizings, or for other types of delimiters. From the simple analysis provided above, a larger size (size five) is attractive for such investigations and optimizations.
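The delimiter-size trade-off discussed above can be given a purely numerical caricature. The formulas below are the author's illustrative simplifications, not sizings taken from the disclosure: they assume only that identifiers are d bits long (so at most 2**d identifier patterns exist) and that groups shorter than the delimiter remain acceptable in the PS core.

```python
# Illustrative only: rough proxies for the delimiter-size trade-off.
# The 2**d identifier-pattern count is an assumed simplification; the real
# class and memory sizes in the disclosure follow from the PS construction.

def delimiter_tradeoff(d):
    return {
        "acceptable_group_sizes": list(range(1, d)),  # groups shorter than the delimiter
        "identifier_patterns": 2 ** d,                # room for classes / remain configurations
    }

# Under this caricature, size 3 leaves only 8 patterns (too few classes,
# matching the analysis above), while size 5 leaves 32 (more remain
# configurations, at the cost of much larger functional memories).
```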
[0497] Another possible attractive optimization or modification refers to an aspect discussed across this disclosure, namely the RB transformation. The suggested hardware outlined in
[0498] Yet another attractive optimization, as briefly discussed in the disclosure, is to extend to the use of a higher order Sum (higher order core). For example, an attractive Sum would be Sum36, where the Need configurations can be described with Sum-4 bits, i.e. the worth factor of the remain RODC configurations greatly increases. That can be an attractive pursuit, with the note that the functional memory needs increase quite substantially, requiring all the functional memory to be external memory (at least at the current level of technology). Having the functional memory as an external memory is not an impediment, other than a possibly slower speed (with proper data busses and an adjusted chip architecture, as outlined above in the hardware section), so this optimization is notably attractive.
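The memory pressure mentioned for Sum36 can be illustrated with a back-of-envelope count, under a single assumption that is the author's and not the disclosure's: that a directly addressed functional memory needs one entry per (Sum-4)-bit Need configuration.

```python
# Hedged back-of-envelope: if Need configurations take Sum-4 bits (as the
# disclosure states for Sum36), a directly addressed table would hold
# 2**(Sum-4) entries. The direct-addressing assumption is illustrative.

def need_table_entries(sum_order):
    return 2 ** (sum_order - 4)  # one entry per Need configuration

# Sum36 -> 2**32 entries, i.e. billions of words, which is why all the
# functional memory would have to be external at current technology levels.
```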
[0499] Finally, another attractive optimization pursuit is a multi-chip parallel architecture. In such an architecture, the chip will consist of the controllers only, largely as described. Multiple such chips (for example, a 32-chip parallel architecture) will access external memory banks. The applications of such parallel architectures can be, for example, live compression, transmission, and decompression of raw high definition lossless video content, even where the transmission medium meets only the lowest available standards.
[0500] The applications of the chips and chip-sets outlined in this disclosure, based on the disclosed algorithm, are countless, advancing the current state of the art in communications, high definition and hi-fi video and audio transmission (including cell-phone and social media), audio/video cameras, laptops/computers, internet, data storage applications, conferencing, etc., including:

[0501] Integrated in the transceiver chain of a cell-phone, for the highest quality audio/video, social media communications, and related uses.

[0502] Integrated in a laptop, wireless/wireline, for the highest quality video conferencing, data storage, communication between users, and related uses.

[0503] Integrated in appropriate devices for the highest quality, reliability, storage capabilities, and other performances as a function of application, for multimedia applications, internet, high definition audio/video downloads, cloud computing, IoT (Internet of Things), ADAS, GPS, and related uses.
[0504] From reading the present disclosure, other variations and modifications will be apparent to the skilled person. Such variations and modifications may involve equivalent and other features which are already known in the art or are implied by the embodiments presented in this disclosure. Such variations and modifications may increase the performance of the algorithm, such as improving the processing speed or the gain. For example, modifying the sizing (as outlined above), increasing the Sum order, or implementing parallel processing paths for AB versus RB representations, among others suggested explicitly or not in this disclosure, will achieve such improvements. There are countless such variations and modifications possible, as covered by or derived from the embodiments of this disclosure, and a chip implementing the object of the disclosure can be designed to be reconfigurable, since one optimization, alteration, or modification may be useful in one application and may be skipped in another; therefore, one chip is to be available with variable settings as a function of the application.
[0505] Although the appended claims are directed to particular combinations of features, it should be understood that the scope of the disclosure of the present invention also includes any novel feature or any novel combination of features disclosed herein either explicitly or implicitly or any generalisation thereof, whether or not it relates to the same invention as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present invention.
[0506] Features which are described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. The applicant hereby gives notice that new claims may be formulated to such features and/or combinations of such features during the prosecution of the present application or of any further application derived therefrom.
[0507] For the sake of completeness it is also stated that the term “comprising” does not exclude other elements or steps, the term “a” or “an” does not exclude a plurality, and reference signs in the claims shall not be construed as limiting the scope of the claims.