Method and apparatus for adaptive data compression
10084477 ยท 2018-09-25
Assignee
Inventors
Cpc classification
H03M7/46
ELECTRICITY
H03M7/40
ELECTRICITY
International classification
H03M7/00
ELECTRICITY
H03M7/30
ELECTRICITY
H03M7/46
ELECTRICITY
Abstract
Adaptively compressing an input string (10) comprising a sequence of symbols in order to create a plurality of segment dictionaries D.sub.m, with the steps of: generating a lookup map (110); generating a key value segment S.sub.m,n; searching the lookup map for each symbol received in the input string (120, 130); upon detecting a symbol is not stored in the lookup map, adding the symbol by storing the symbol at a next sequential key index in the lookup map lookup map (135) and assigning a next sequential key value entry to the symbol and adding this key value to the key value segment S.sub.m,n (150); upon detecting the symbol is stored in the lookup map, adding the corresponding key value assigned to this symbol to the next sequential entry of the key value segment S.sub.m,n (150); wherein a new key value segment S.sub.m,n+1 of the lookup map is generated if the number of different symbols equals the number of available key values k=2.sup.n for the opened/current key value segment S.sub.m,n (141, 142), and where-in the lookup map is converted into a segment dictionary D.sub.m if the maximal key value size k.sub.nmax=2.sup.nmax is reached (132, 133, 134), with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries D.sub.m.
Claims
1. A computer method for adaptively compressing an input string (10) comprising a sequence of symbols in order to create a plurality of segment dictionaries Dm, the method comprising the steps of: generating a lookup map (110); generating a key value segment Sm,n; searching the lookup map for each symbol received in the input string (120, 130); upon detecting a symbol is not stored in the lookup map, adding the symbol by storing the symbol at a next sequential key index in the lookup map (135) and assigning a next sequential key value entry to the symbol and adding this key value to the key value segment Sm,n (150); upon detecting the symbol is stored in the lookup map, adding the corresponding key value assigned to this symbol to the next sequential entry of the key value segment Sm,n (150); wherein a new key value segment Sm,n+i of the lookup map is generated if the number of different symbols equals the number of available key values k=2.sup.n for the opened/current key value segment Sm,n (141, 142), and wherein the lookup map is converted into a segment dictionary Dm if the maximal key value size knmax=2.sup.nmax is reached (132, 133, 134), with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries Dm.
2. The method of claim 1, wherein the plurality of segment dictionaries Dm with the corresponding key value segments Sm,n form a column store database.
3. The method of claim 1, wherein the maximal key value size kn,max is predeterminable.
4. The method of claim 3, wherein the maximal key value size kn,max is predeterminable.
5. An apparatus (20) for adaptively compressing an input string (10) having a sequence of symbols, the apparatus (20) comprising a database storage (DB) for storing a plurality of segment dictionaries (Dm), a receiver (30) for receiving an input string (10), an analyzer (40) for excerpting each symbol from the sequence of symbols of the input string (10), a generator (60) for generating a lookup map (Lm) and a key value segment (Sm,n), an engine (50) for detecting whether a symbol is contained in a lookup map (Lm) , and a converter (70) to convert a lookup map (Lm) into a segment dictionary (Dm), wherein the generator (60) generates a lookup map (Lm) and a key value segment (Sm,n) once the receiver (30) receives an input string (10), and the engine (50) searches the lookup map (Lm) for each symbol received in the input string (10) excerpted by the analyzer (40), wherein upon detecting a symbol is not stored in the lookup map (Lm) , the symbol is added to the lookup map (Lm) by storing the symbol at a next sequential key index in the lookup map (Lm) and a next sequential key value entry is assigned to the symbol and added to the key value segment (Sm,n), and wherein upon detecting the symbol is already stored in the lookup map (Lm) , the corresponding key value assigned to this symbol is added to the next sequential entry of the key value segment (Sm,n) wherein the generator (60) generates a new key value segment (Sm,n+i) of the lookup map (Lm) if the number of different symbols equals the number of available key values k=2<n>for the opened new key value segment (Sm,n) and wherein the converter (70) converts the lookup map into a segment dictionary (Dm) if the maximal key value size knmax=2.sup.nmax is reached, with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries (Dm).
6. The apparatus of claim 5, wherein the plurality of segment dictionaries (Dm) with the corresponding key value segments (Sm,n) are stored in the database storage (DB) as a column store database.
7. The apparatus of claim 6, wherein the converter (70) prompts the generator (60) to generate a new lookup map (Lm+i) upon converting the opened lookup map (Lm) into a segment dictionary (Dm).
8. A non-transitory computer readable storage medium storing one or more sequences of instructions for adaptively compressing an input string (10) comprising a sequence of symbols in order to create a plurality of segment dictionaries Dm, which instructions when executed by one or more processors cause performing: generating a lookup map (110); generating a key value segment Sm,n; searching the lookup map for each symbol received in the input string (120, 130); upon detecting a symbol is not stored in the lookup map, adding the symbol by storing the symbol at a next sequential key index in the lookup map (135) and assigning a next sequential key value entry to the symbol and adding this key value to the key value segment Sm,n (150); upon detecting the symbol is stored in the lookup map, adding the corresponding key value assigned to this symbol to the next sequential entry of the key value segment Sm,n (150); wherein a new key value segment Sm,n+i of the lookup map is generated if the number of different symbols equals the number of available key values k=2.sup.n for the opened/current key value segment Sm,n (141, 142), and wherein the lookup map is converted into a segment dictionary Dm if the maximal key value size knmax=2.sup.nmax is reached (132, 133, 134), with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries Dm.
9. The storage medium of claim 8, further comprising sequences of instructions which when executed cause the plurality of segment dictionaries Dm with the corresponding key value segments Sm,n to form a column store database.
10. The storage medium of claim 8, wherein the maximal key value size kn,max is predeterminable.
11. The storage medium of claim 8, wherein the maximal key value size kn,max is predeterminable.
12. The storage medium of claim 8, further comprising a digital database (DB) comprising for one column a plurality of segment dictionaries Dm, m being any positive integral number denoting the consecutive numbering of segment dictionaries Dm, each segment dictionary Dm having a number of correlated key value segments Sm,n with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size of the key values, and a segment dictionary Dm having a maximum number of knmax=2.sup.nmax entries.
13. The storage medium of claim 12, further comprising sequences of instructions which when executed cause the segment dictionaries Dm and correlated key value segments Sm,n to form a column store database.
14. The apparatus of claim 5, wherein the converter (70) prompts the generator (60) to generate a new lookup map (Lm+i) upon converting the opened lookup map (Lm) into a segment dictionary (Dm).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
DETAILED DESCRIPTION
(8)
(9) In the embodiment shown in
(10)
(11) With reference to
(12) Therefore, the first character taken from the input string 10 by the analyzer 40 (cf. step 120), namely N is not to be found in the lookup map L.sub.1 by the engine and is therefore entered in the lookup map L.sub.1 at the first key index (cf. sequence 2 in
(13) Then the second character is taken from input string 10 (cf. steps 160, 120 of
(14) Then, the third character, namely N, is read (steps 160, 120 of
(15) This is repeated for the next three characters OON already contained in the lookup map L.sub.1 and resulting in the bit key entries 110 in the first key value segment S.sub.1,1 as depicted in
(16) Then a new character, namely a blank , is identified in input string 10, resulting in a new entry in the first lookup map L.sub.1 (cf. sequence 4 of
(17) Step 160 then refers back to step 120, and the following character B is also new to the lookup map L.sub.1 such that this character is entered into the fourth key index of first lookup map L.sub.1 (sequence 5 of
(18) The following character E in input string 10 (which is the fourteenth character in the string) is another character which is not yet contained in the lookup map. As this the fifth new character, it is entered in the fifth key index of the first lookup map L.sub.1 (cf. sequence 6 of
(19) This third key value segment S.sub.1,3 is the last key value segment as it provides entry of up to eight different symbols (k.sub.nmax=2.sup.3). This eighth character in the example input string 10 is the character S (which is the 24.sup.th character in this string), and it is assigned key value 7 and is the last entry in the first lookup map L.sub.1 (cf. sequences 7 to 9 in
(20) As the following two characters on positions 25 and 26 are already contained in the dictionary ( and A), there are two more entries into the third key value segment S.sub.1,3 so that the last three entries of this segment read 725.
(21) The next position in input string 10 (position number 27) contains the ninth different character, namely a -. As the maximum of eight key values is reached, the current key value segment S.sub.1,3 is finalized and the first lookup map L.sub.1 is converted into first segment dictionary D.sub.1 by the converter 70 (cf. bottom of
(22) It is to be noted that in the creation of the second lookup map L.sub.2 in the embodiment shown in
(23) As shown in the depiction of
(24)
(25) The graph of
(26) By using the method of the invention for adaptively compressing the input string comprising the sequence of temperature measurement values, a relatively small dictionary size is chosen, unlike known methods which would work with a dictionary size covering the whole span of around 14 C. to 1,000 C.
(27) As can be seen from the depiction of
(28) This approach leads to a compressed segment stream as illustrated in
(29) In the heating-up phase (first four lines in
(30) In contrast thereto, the two stable operating phases at around 825 C. and 870 C., respectively, each only have one segment dictionary with accordingly longer key value segments (i.e. key value segments with more entries) as the incoming temperature measurement readings are repeatedly the same values within a small interval (cf. lines five and ten of
(31)
(32) According to the teachings of the invention, the key value segments are filled quite fast with the sixteen available values such that the key value segments of dictionaries D.sub.1 to D.sub.4 and D.sub.6 to D.sub.9 remain relatively short. In the stable phases, the temperatures are more repetitive such that are also repetitive and accordingly longer (cf. particularly key value segments S.sub.5,4 and S.sub.10,4).
(33) As a reading example, the first temperature entry of segment dictionary D.sub.1 equaling 14 C. corresponds to the first 13 measurement values (all four entries of key value segments S.sub.1,1 and S.sub.1,2 and the first nine entries of key value segments S.sub.1,3). The sequence of the measurement values stored in segment dictionary D.sub.10 and key value segments S.sub.10,1 and S.sub.10,2 is 864 C., 866 C., 864 C., 863 C., 863 C., 870 C., 863 C., etc.
(34) The example of