Method and apparatus for adaptive data compression

10084477 ยท 2018-09-25

Assignee

Inventors

Cpc classification

International classification

Abstract

Adaptively compressing an input string (10) comprising a sequence of symbols in order to create a plurality of segment dictionaries D.sub.m, with the steps of: generating a lookup map (110); generating a key value segment S.sub.m,n; searching the lookup map for each symbol received in the input string (120, 130); upon detecting a symbol is not stored in the lookup map, adding the symbol by storing the symbol at a next sequential key index in the lookup map lookup map (135) and assigning a next sequential key value entry to the symbol and adding this key value to the key value segment S.sub.m,n (150); upon detecting the symbol is stored in the lookup map, adding the corresponding key value assigned to this symbol to the next sequential entry of the key value segment S.sub.m,n (150); wherein a new key value segment S.sub.m,n+1 of the lookup map is generated if the number of different symbols equals the number of available key values k=2.sup.n for the opened/current key value segment S.sub.m,n (141, 142), and where-in the lookup map is converted into a segment dictionary D.sub.m if the maximal key value size k.sub.nmax=2.sup.nmax is reached (132, 133, 134), with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries D.sub.m.

Claims

1. A computer method for adaptively compressing an input string (10) comprising a sequence of symbols in order to create a plurality of segment dictionaries Dm, the method comprising the steps of: generating a lookup map (110); generating a key value segment Sm,n; searching the lookup map for each symbol received in the input string (120, 130); upon detecting a symbol is not stored in the lookup map, adding the symbol by storing the symbol at a next sequential key index in the lookup map (135) and assigning a next sequential key value entry to the symbol and adding this key value to the key value segment Sm,n (150); upon detecting the symbol is stored in the lookup map, adding the corresponding key value assigned to this symbol to the next sequential entry of the key value segment Sm,n (150); wherein a new key value segment Sm,n+i of the lookup map is generated if the number of different symbols equals the number of available key values k=2.sup.n for the opened/current key value segment Sm,n (141, 142), and wherein the lookup map is converted into a segment dictionary Dm if the maximal key value size knmax=2.sup.nmax is reached (132, 133, 134), with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries Dm.

2. The method of claim 1, wherein the plurality of segment dictionaries Dm with the corresponding key value segments Sm,n form a column store database.

3. The method of claim 1, wherein the maximal key value size kn,max is predeterminable.

4. The method of claim 3, wherein the maximal key value size kn,max is predeterminable.

5. An apparatus (20) for adaptively compressing an input string (10) having a sequence of symbols, the apparatus (20) comprising a database storage (DB) for storing a plurality of segment dictionaries (Dm), a receiver (30) for receiving an input string (10), an analyzer (40) for excerpting each symbol from the sequence of symbols of the input string (10), a generator (60) for generating a lookup map (Lm) and a key value segment (Sm,n), an engine (50) for detecting whether a symbol is contained in a lookup map (Lm) , and a converter (70) to convert a lookup map (Lm) into a segment dictionary (Dm), wherein the generator (60) generates a lookup map (Lm) and a key value segment (Sm,n) once the receiver (30) receives an input string (10), and the engine (50) searches the lookup map (Lm) for each symbol received in the input string (10) excerpted by the analyzer (40), wherein upon detecting a symbol is not stored in the lookup map (Lm) , the symbol is added to the lookup map (Lm) by storing the symbol at a next sequential key index in the lookup map (Lm) and a next sequential key value entry is assigned to the symbol and added to the key value segment (Sm,n), and wherein upon detecting the symbol is already stored in the lookup map (Lm) , the corresponding key value assigned to this symbol is added to the next sequential entry of the key value segment (Sm,n) wherein the generator (60) generates a new key value segment (Sm,n+i) of the lookup map (Lm) if the number of different symbols equals the number of available key values k=2<n>for the opened new key value segment (Sm,n) and wherein the converter (70) converts the lookup map into a segment dictionary (Dm) if the maximal key value size knmax=2.sup.nmax is reached, with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries (Dm).

6. The apparatus of claim 5, wherein the plurality of segment dictionaries (Dm) with the corresponding key value segments (Sm,n) are stored in the database storage (DB) as a column store database.

7. The apparatus of claim 6, wherein the converter (70) prompts the generator (60) to generate a new lookup map (Lm+i) upon converting the opened lookup map (Lm) into a segment dictionary (Dm).

8. A non-transitory computer readable storage medium storing one or more sequences of instructions for adaptively compressing an input string (10) comprising a sequence of symbols in order to create a plurality of segment dictionaries Dm, which instructions when executed by one or more processors cause performing: generating a lookup map (110); generating a key value segment Sm,n; searching the lookup map for each symbol received in the input string (120, 130); upon detecting a symbol is not stored in the lookup map, adding the symbol by storing the symbol at a next sequential key index in the lookup map (135) and assigning a next sequential key value entry to the symbol and adding this key value to the key value segment Sm,n (150); upon detecting the symbol is stored in the lookup map, adding the corresponding key value assigned to this symbol to the next sequential entry of the key value segment Sm,n (150); wherein a new key value segment Sm,n+i of the lookup map is generated if the number of different symbols equals the number of available key values k=2.sup.n for the opened/current key value segment Sm,n (141, 142), and wherein the lookup map is converted into a segment dictionary Dm if the maximal key value size knmax=2.sup.nmax is reached (132, 133, 134), with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size, and m being any positive integral number denoting the consecutive numbering of segment dictionaries Dm.

9. The storage medium of claim 8, further comprising sequences of instructions which when executed cause the plurality of segment dictionaries Dm with the corresponding key value segments Sm,n to form a column store database.

10. The storage medium of claim 8, wherein the maximal key value size kn,max is predeterminable.

11. The storage medium of claim 8, wherein the maximal key value size kn,max is predeterminable.

12. The storage medium of claim 8, further comprising a digital database (DB) comprising for one column a plurality of segment dictionaries Dm, m being any positive integral number denoting the consecutive numbering of segment dictionaries Dm, each segment dictionary Dm having a number of correlated key value segments Sm,n with n being any positive integral number 1 to nmax, nmax denoting the maximal bit size of the key values, and a segment dictionary Dm having a maximum number of knmax=2.sup.nmax entries.

13. The storage medium of claim 12, further comprising sequences of instructions which when executed cause the segment dictionaries Dm and correlated key value segments Sm,n to form a column store database.

14. The apparatus of claim 5, wherein the converter (70) prompts the generator (60) to generate a new lookup map (Lm+i) upon converting the opened lookup map (Lm) into a segment dictionary (Dm).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1a shows a schematic block diagram depiction illustrating an example of an adaptive dictionary setup of the invention, and

(2) FIG. 1b shows the first lookup map of the example of FIG. 1a in subsequent filling states until creation of the first segment dictionary.

(3) FIG. 2 is a flow diagram illustrating the method of the invention.

(4) FIG. 3 shows a schematic block diagram illustrating an apparatus of the invention.

(5) FIG. 4 is a graph showing a temperature profile of a turbine in operation.

(6) FIG. 5 is a schematic depiction of the compressed dictionary database setup of the temperature data of FIG. 4 according to the invention.

(7) FIG. 6 is a more detailed illustration of the depiction of FIG. 5 showing the content of the compressed dictionary database setup.

DETAILED DESCRIPTION

(8) FIG. 1a shows a schematic depiction illustrating the invention. Particularly, FIG. 1a shows an incoming string 10 and its adaptive compression (compressed segment stream D) according to the invention comprising a plurality of segment dictionaries D.sub.1, D.sub.2. The dictionaries are accompanied by key value segments S.sub.1,1, S.sub.1,2, S.sub.1,3, S.sub.2,2, S.sub.2,3. The key value segments S.sub.m,n are sorted according to the n-bit keys. The key value segments S.sub.m,n are stored in front of the respective segment dictionaries D.sub.m. Below the compressed segment stream D, the bitwise data stream of the key value segments is illustrated, i.e. the actual bit entries are shown.

(9) In the embodiment shown in FIG. 1a, the maximal key value size is set to be k.sub.nmax=8, resulting in a bit size of nmax=3 as k.sub.nmax=2.sup.n. Each segment dictionary D.sub.m of the embodiment according to the invention can thus contain up to a maximum of eight character entries.

(10) FIG. 2 shows the flow diagram of the method of the invention, and FIG. 3 illustrates an exemplary embodiment of an apparatus 20 to perform the method of the invention. The apparatus 20 can be a computer database system, and the exemplary embodiment comprises a database storage DB for storing a plurality of segment dictionaries D.sub.m, a receiver 30 for receiving an input string 10, an analyzer 40 for excerpting each symbol from the sequence of symbols of the input string 10, a generator 60 for generating a lookup map L.sub.m and a key value segment S.sub.m,n, an engine 50 for detecting whether a symbol is contained in a lookup map L.sub.m, and a converter 70 to convert a lookup map L.sub.m into a segment dictionary D.sub.m.

(11) With reference to FIG. 1a, an incoming character string 10 consists of a plurality of various characters. The incoming or input string 10 is read from the left to right as indicated in FIG. 1a by arrow R. When starting (step 100 in FIG. 2) the method to adaptively compress an input string 10 received by the receiver 30, a new lookup map L.sub.1 (L.sub.m with m=1) with k.sub.nmax=2.sup.3=8 empty array cells (key indices) is generated (cf. step 110 in FIG. 2) by generator 60, and there will first be no entry in the lookup map L.sub.1 (cf. FIG. 1b, top, sequence 1).

(12) Therefore, the first character taken from the input string 10 by the analyzer 40 (cf. step 120), namely N is not to be found in the lookup map L.sub.1 by the engine and is therefore entered in the lookup map L.sub.1 at the first key index (cf. sequence 2 in FIG. 1b, and steps 130, 131, 135 of FIG. 2), and an according key value 0 is entered in a first key value segment S.sub.1,1 containing the one-bit keys (cf. first entry in key value S.sub.1,1 segment in the depiction of FIG. 1a and steps 140, 150 of FIG. 2).

(13) Then the second character is taken from input string 10 (cf. steps 160, 120 of FIG. 2), namely O, which is also not contained in the first lookup map L.sub.1 yet and therefore entered in the next sequential key index of the first lookup map L.sub.1 (cf. sequence 3 in FIG. 1b and repetition of steps 130, 131, 135 of FIG. 2) with an according key value 1 entered into first key value segment S.sub.1,1 at the next position (cf. second entry in key value S.sub.1,1 segment in the depiction of FIG. 1a and repetition of steps 140, 150 of FIG. 2).

(14) Then, the third character, namely N, is read (steps 160, 120 of FIG. 2). As this character is already contained in the lookup map, namely in the first key index of the first lookup map L.sub.1, there is no additional entry of this character into the lookup map L.sub.1. However, the corresponding key value 0 is entered in the next sequential entry of the first key value segment S.sub.1,1 (cf. FIG. 2: step 130 directly refers to steps 140 and 150).

(15) This is repeated for the next three characters OON already contained in the lookup map L.sub.1 and resulting in the bit key entries 110 in the first key value segment S.sub.1,1 as depicted in FIG. 1a.

(16) Then a new character, namely a blank , is identified in input string 10, resulting in a new entry in the first lookup map L.sub.1 (cf. sequence 4 of FIG. 1b). As this is the third character in the lookup map L.sub.1, the one-bit key segment S.sub.1,1 is not available anymore, and the next, i.e. second key value segment for two-bit keys S.sub.1,2 is started, bearing entry 2 for the character (blank) (cf. steps 140, 141, 142, 150 of FIG. 2).

(17) Step 160 then refers back to step 120, and the following character B is also new to the lookup map L.sub.1 such that this character is entered into the fourth key index of first lookup map L.sub.1 (sequence 5 of FIG. 1b) and a corresponding key value 3 is added to the second key value segment S.sub.1,2. The next five characters NOB N are already contained in the lookup map so that no new entries are done to the first lookup map L.sub.1, but of course resulting in the new key value assignments 01320 to the second key value segment S.sub.2,1.

(18) The following character E in input string 10 (which is the fourteenth character in the string) is another character which is not yet contained in the lookup map. As this the fifth new character, it is entered in the fifth key index of the first lookup map L.sub.1 (cf. sequence 6 of FIG. 1b), and as the two-bit key value segment S.sub.1,2 is now full (containing 2.sup.n with n=2 values), the third key value segment S.sub.1,3 is started with the entry 4 assigned to the new character E (cf. FIG. 1a).

(19) This third key value segment S.sub.1,3 is the last key value segment as it provides entry of up to eight different symbols (k.sub.nmax=2.sup.3). This eighth character in the example input string 10 is the character S (which is the 24.sup.th character in this string), and it is assigned key value 7 and is the last entry in the first lookup map L.sub.1 (cf. sequences 7 to 9 in FIG. 1b).

(20) As the following two characters on positions 25 and 26 are already contained in the dictionary ( and A), there are two more entries into the third key value segment S.sub.1,3 so that the last three entries of this segment read 725.

(21) The next position in input string 10 (position number 27) contains the ninth different character, namely a -. As the maximum of eight key values is reached, the current key value segment S.sub.1,3 is finalized and the first lookup map L.sub.1 is converted into first segment dictionary D.sub.1 by the converter 70 (cf. bottom of FIG. 1b after sequence 9, and steps 131, 132, 133 of FIG. 2). Then, a new (second) lookup map L.sub.2 is opened, with an according key value segment (step 134). The new character - is entered into the first entry position or key index of the second lookup map L.sub.2 and an according entry is added to the segment key S.sub.2,2.

(22) It is to be noted that in the creation of the second lookup map L.sub.2 in the embodiment shown in FIG. 1, no 1-bit key segment is created but directly the 2-bit key segment S.sub.2,2 is opened as the first three characters starting at position 27 -JU are three distinct different characters so that the 1-bit key segment would only consist of the two entries 01. It is of course understood that there could as well be a 1-bit segment S.sub.2,1.

(23) As shown in the depiction of FIGS. 1a and 3 the key segments may be stored in front of the respective corresponding segment dictionary.

(24) FIG. 4 shows a graph with a temperature profile of a turbine in operation, and FIG. 5 is a schematic depiction of the plurality of segment dictionaries obtained from the temperature data of the graph of FIG. 4.

(25) The graph of FIG. 4 shows a turbine that is put into operation on 1.sup.st January. In the beginning, the operating temperature of the turbine is at around room temperature, and then increases at a rate of roughly 100 to 120 C. per hour until it reaches a first operating temperature at around 8255 C. This level is maintained during a long term operation until 4.sup.th January when a reconfiguration mode is run which brings the temperature down to around 640 C. within around two hours and then directly back up to second operation level at around 8708 C. within another two hours. The turbine is then operated at this second temperature level for several days.

(26) By using the method of the invention for adaptively compressing the input string comprising the sequence of temperature measurement values, a relatively small dictionary size is chosen, unlike known methods which would work with a dictionary size covering the whole span of around 14 C. to 1,000 C.

(27) As can be seen from the depiction of FIG. 5, nmax is chosen to be 4, leading to a maximum key value size of k.sub.nmax=2.sup.4=16, and an according maximum segment dictionary size of 16 entries.

(28) This approach leads to a compressed segment stream as illustrated in FIG. 5 with a plurality of segment dictionaries D.sub.1 to D.sub.10, each having associated four key value segments S.sub.1,1, S.sub.1,2, S.sub.1,3, S.sub.1,4; S.sub.2,1, S.sub.2,2, S.sub.2,3, S.sub.2,4; . . . ; S.sub.10,1, S.sub.10,2, S.sub.10,3, S.sub.10,4.

(29) In the heating-up phase (first four lines in FIG. 5, corresponding to segment dictionaries D.sub.1 to D.sub.4) and the reconfiguration phase (lines six to nine in FIG. 5, corresponding to segment dictionaries D.sub.6 to D.sub.9), the key value segments are rather short and there is a higher number of segment dictionaries as the temperature measurement readings vary fast in these phases, i.e. there are a lot of different values.

(30) In contrast thereto, the two stable operating phases at around 825 C. and 870 C., respectively, each only have one segment dictionary with accordingly longer key value segments (i.e. key value segments with more entries) as the incoming temperature measurement readings are repeatedly the same values within a small interval (cf. lines five and ten of FIG. 5, corresponding to segment dictionaries D.sub.5 to D.sub.10).

(31) FIG. 6 shows the depiction of FIG. 5 in more detail, particularly by showing the actual entries in the key value segments and the segment dictionaries. The entries in segment dictionaries D.sub.1 to D.sub.4 and D.sub.6 to D.sub.9 reflect the rapid temperature changes in the high gradient portions of the temperature profile of FIG. 4, the first temperature increase starting at 14 C. in D.sub.1 to 801 C. in D.sub.4, whereas the entries in segment dictionaries D.sub.5 to D.sub.10 reflect the stable plateau phases of the temperature profile of FIG. 4 with temperatures of around 825 C. in segment dictionary D.sub.5 and 870 C. in segment dictionary D.sub.10 (at the end of segment dictionary D.sub.5 the decrease phase already starts as reflected by the last three to four entries).

(32) According to the teachings of the invention, the key value segments are filled quite fast with the sixteen available values such that the key value segments of dictionaries D.sub.1 to D.sub.4 and D.sub.6 to D.sub.9 remain relatively short. In the stable phases, the temperatures are more repetitive such that are also repetitive and accordingly longer (cf. particularly key value segments S.sub.5,4 and S.sub.10,4).

(33) As a reading example, the first temperature entry of segment dictionary D.sub.1 equaling 14 C. corresponds to the first 13 measurement values (all four entries of key value segments S.sub.1,1 and S.sub.1,2 and the first nine entries of key value segments S.sub.1,3). The sequence of the measurement values stored in segment dictionary D.sub.10 and key value segments S.sub.10,1 and S.sub.10,2 is 864 C., 866 C., 864 C., 863 C., 863 C., 870 C., 863 C., etc.

(34) The example of FIGS. 4 to 6 shows how the invention can create a database compression which adapts to the nature of the data to be compressed, i.e. having a higher number of segment dictionaries with short key value segments in cases of highly varying data, and having a very low number of segment dictionaries with accordingly longer key value segments in case of stable or less varying data, all of the segment dictionaries having the same maximum size.