METHOD AND DEVICE FOR THE LOSSLESS COMPRESSION OF A DATA STREAM
20200007154 · 2020-01-02
Inventors
Cpc classification
H03M7/30
ELECTRICITY
International classification
Abstract
Provided is a method and a device for the lossless compression of a data stream which includes a sequence of structured data objects which have a list of properties which each contain a key value pair, the method having the following steps: dividing the structured data objects of the data stream into a constant data object portion which has key value pairs with constant values and into variable data object portions which have key value pairs with variable values; transmitting the constant data object portion of the structured data objects once to a receiver; and transmitting the variable data object portions of the divided data objects of the data stream to the receiver.
Claims
1. A method for a lossless compression of a data stream having a sequence of structured data objects having a list of properties which each contain a key-value pair, the method comprising: (a) decomposing the structured data objects of the data stream into a constant data object portion, which has key-value pairs having constant values, and into variable data object portions, which have key-value pairs having variable values; (b) transmitting the constant data object portion of the structured data objects once to a receiver; and (c) transmitting the variable data object portions of the decomposed data objects of the data stream to the receiver.
2. The method as claimed in claim 1, wherein the structured data objects of the data stream have JavaScript Object Notation, JSON, and data objects.
3. The method as claimed in claim 1, wherein each key-value pair has a key formed by a character string or a number.
4. The method as claimed in claim 1, wherein the key-value pair has a value formed by a data object, an array, a character string, a numerical value or a logical value.
5. The method as claimed in claim 1, wherein the variable data object portion of the decomposed structured data object of the data stream has an array, the components of which are key-value pairs that are streamed asynchronously to the receiver.
6. The method as claimed in claim 1, wherein the variable data object portions of a data stream each contain a unique identification identifier for identifying their association with the data stream.
7. The method as claimed in claim 1, wherein changes in the variable data object portion of the structured data object of the data stream relative to a reference data object are determined and transmitted to the receiver.
8. The method as claimed in claim 7, wherein the reference data object is formed by a structured data object of the data stream having a predetermined endurance.
9. The method as claimed in claim 7, wherein the reference data object of the data stream is identified as such and is transmitted to the receiver.
10. The method as claimed in claim 8, wherein the endurance of the reference data object is defined by a number of transmission cycles or a validity period.
11. The method as claimed in claim 1, wherein decomposing the structured data objects of the data stream is carried out at the run time during the transmission of the data stream to the receiver.
12. The method as claimed in claim 1, wherein, on the part of the receiver, the structured data objects of the data stream are reconstructed for further data processing on the basis of the obtained constant data object portion and the received variable data object portions.
13. The method as claimed in claim 1, wherein, on the part of the receiver, the structured data objects of the data stream are reconstructed for further data processing on the basis of the buffer-stored reference data objects and the received changes in the variable data object portions.
14. The method as claimed in claim 1, wherein the data stream is an event data stream, in particular a sensor data stream.
15. A compression device for a lossless compression of a data stream comprising a sequence of structured data objects having a list of properties which each consists of a key-value pair, wherein the device comprises: (a) a data decomposition unit suitable for decomposing the structured data objects of the data stream into a constant data object portion, which has key-value pairs having constant values, and into variable data object portions, which have key-value pairs having variable values; and (b) a data transmission unit, which transmits once the constant data object portion and separately the variable data object portions of the decomposed data objects to a receiver.
Description
BRIEF DESCRIPTION
[0033] Some of the embodiments will be described in detail, with reference to the following figures, wherein like designations denote like members, wherein:
[0034]
[0035]
DETAILED DESCRIPTION
[0036] As can be discerned in
[0037] The data stream DS consists of a sequence of structured data objects. These data objects each have a list of so-called properties which each contain a key-value pair.
[0038] A first step S1 involves decomposing the structured data objects of the data stream into a constant data object portion and into variable data object portions. In this case, the constant data object portion has key-value pairs having constant values. The variable data object portions have key-value pairs having variable values.
[0039] Afterward, step S2 involves transmitting the formed constant data object portion of the structured data objects once to a receiver.
[0040] A further step S3 involves transmitting the variable data object portions of the decomposed data objects of the data stream to the receiver E.
[0041] In this exemplary embodiment of the method according to the embodiments of the invention as illustrated in
[0042]
[0043] As illustrated in
[0044] In one possible embodiment, the structured data objects DO of the data stream DS have JavaScript Object Notation, JSON, data objects. The JSON data object forms a data format suitable for the transmission and storage of structured data. With a JSON data format, data can be nested. In the case of JSON, a null value, a Boolean value, a number, a character string, an array and an object are provided as data types. In this case, an object contains a list of properties. However, objects without properties, so-called empty objects, are also permissible. A property of the object consists of a key and a value, wherein each key is permitted to be contained in an object only once. In one preferred embodiment, the key is formed by a character string. The value of the key-value pair can be formed by an object, by an array, by a character string, by a number or by a Boolean expression.
[0045] In one possible embodiment of the method according to the embodiments of the invention, each key-value pair of the structured data object DO has a key formed by a character string or a number. Furthermore, in one possible embodiment, the key-value pair of the JSON data object has a value formed by a data object, an array, a character string, a numerical value or a logical value.
[0046] In one possible embodiment, the variable data object portions of the structured data objects of the data stream DS, said structured data objects being decomposed by the data decomposition unit 2, can have an array, the components of which are key-value pairs that are streamed asynchronously by the data transmission unit 3 through the receiver 5.
[0047] In a further possible embodiment of the method according to the embodiments of the invention, the variable data object portions of a data stream each contain a unique identification identifier for identifying their association with the respective data stream DS.
[0048] In one possible embodiment, changes in the variable data object portion of the structured data object of the data stream DS relative to a reference data object are determined and transmitted to the receiver 5. In this case, the reference data object can be formed by a structured data object of the data stream having a predetermined endurance. In one possible embodiment, the reference data object is firstly transmitted to the receiver 5 and buffer-stored there for further use. The reference data object is preferably identified as such and is transmitted to the receiver 5 by the data transmission unit 3 of the compression device 1 and is buffer-stored in the receiver.
[0049] In a further possible embodiment of the method according to the embodiments of the invention, the endurance of the reference data object is defined by a number of transmission cycles. In a further possible embodiment, the endurance of the reference data object is defined by a specific validity period and/or expiry time.
[0050] In one possible embodiment, decomposing the structured data object at the data decomposition unit 2 is carried out at the run time during the transmission of the data stream DS to the receiver 5. On the part of the receiver 5, the structured data objects DO of the original data stream DS can be reconstructed for further data processing on the basis of the obtained constant data object portion and the received variable data object portions. In one possible embodiment, on the part of the receiver 5, the structured data objects DO of the data stream DS are reconstructed for further data processing on the basis of the buffer-stored reference data objects and received changes in the variable data object portions. The data stream received from the data source 4 is preferably an event stream. In one possible embodiment, the data source 4 is a sensor that generates sensor data which are transmitted as a sensor data stream from the data source 4 to the compression device 1.
[0051] The compression of the data streams DS by the compression device 1 makes it possible to lower the data rate of the data transmission. A data stream DS, which can be a data stream of event data, comprises a sequence of structured data objects. In one possible embodiment, a single data stream DS is generated by a data source 4, for example a sensor or some other unit of the system. Said data stream can consist for example of a sequence of data triples comprising for example a sensor ID of the sensor 4, a time stamp and a sensor value.
[0052] By way of example, a structured data object of the data stream DS in JSON notation can be represented as follows:
TABLE-US-00001 { SensorID : ABC1234, Timestamp : 2016-06-10T13:47:26.0123+02:00, Sensorvalue : 987.654 }.
[0053] In one possible embodiment, firstly an event data object is generated from a generated sensor signal or sensor data sequence. An event constitutes a structured data configuration comprising further information over and above the pure sensor data. By way of example, an event data object containing sensor data of a heat sensor can specify the dimension or unit of the sensor data, that is to say for example C. or F. or K, depending on how the relevant sensor or the data source 4 is designed or configured. In association with a published/subscriber approach, a consumer or a data processing unit can subscribe to an event that supplies a temperature value from a specific location, wherein it is of secondary importance, for example, whether the sensor value originates from a heat sensor or a thermal imaging camera. In the case of higher-level events generated by correlation of different data, a numerical value may sometimes be totally absent. By way of example, a fire alarm event can be derived from the sensor data of a smoke detector and from the sensor data of a heat sensor, wherein the two sensors, i.e. the smoke detector and the heat sensor, monitor the same spatial region and their sensor data must originate from a given time window.
[0054] The compression device 1 according to the embodiments of the invention allows a lossless compression of the data stream originating from the data source 4. This means that no loss of information is brought about by the method according to the embodiments of the invention and the compression device 1 according to the embodiments of the invention. Precisely if the value of an event is not of a numerical nature, but rather constitutes a complex data configuration, a loss of information should be avoided. This holds true, in particular, if the source of the event stream is already filtered with respect to a time or value criterion. Each event that is transmitted or emitted by a data source or event source 4 has to be able to be completely reconstructed at the associated receivers 5.
[0055] A discrete sequence of sensor data or events originating from the same data source 4 can be transmitted as a data stream or a so-called stream. One example of a sequence of primitive events or sensor data reads as follows:
[0056] This sequence of data objects DO can be represented as a stream as follows:
TABLE-US-00002 {Sensor data Stream : { SensorID : string , [ { Timestamp : string1, Sensorvalue : number1 } , { Timestamp : string2, Sensorvalue : number2 } , ..., { Timestamp : stringN, Sensorvalue : numberN } ] } }
[0057] In this case, the constant portion of the sequence of sensor data objects, namely the sensor IDs, is withdrawn or bracketed out from the sequence.
[0058] The variable portions of the sensor data sequence, namely the time stamps and values, are combined into a JSON array in the example illustrated. The streaming mechanism ensures that the data are transmitted asynchronously and continuously to the receiver 5, that is to say that the data objects DO of the data stream DS that transmits the array can already be read at the receiver 5 while the data transmission unit of the compression device 1 is still generating further data objects and writing them into the array or the stream.
[0059] In the case of discrete event data and a portion that is variable in a defined manner, the streaming can generally be carried out by bracketing out the constant portions of the structured data object as follows:
[0060] In this case, K.sub.s denotes the keys of the key-value pairs, and C.sub.s denotes the constant values. V.sub.s denotes variable values. The sequence of data objects DO represented above can be summarized as follows:
[0061] Here the components of the JSON array are streamed asynchronously.
[0062] In one possible embodiment variant, a procedure for implementing the bracketing out can be embodied as follows. The following procedure functions for both sensor data and event data:
TABLE-US-00003 obj.get( ); // all data from the source are read into the object. constObj = newObj( ); // ... obtains the constant portions of the object varObj = newObj( ); // ... obtains the variable portions of the object. for all ( pair obj ) { if ( pair.value.isVariable( ) ) { // decomposition of the object into ... varObj.add( pair.key, pair.value ); // ... variable and ... } else { constObj.add( pair.key, pair.value ); // ... constant portions } } stream.send( constObj ); // the constant portion of the object is transmitted stream.send( [ ); // the array is opened for streaming stream.send( varObj ); // the variable portion of the object is transmitted while ( TRUE ) { obj.get( ) // reading in the next object. for all ( pair obj ) { if ( pair.value.isVariable( ) } { // filtering of the variable portions } stream.send( varObj ); // varObj.add( pair.key, pair.value ); } only the variable portion of the object is transmitted } stream.send( ] ); // merely for the sake of completeness
[0063] This procedure involves determining which portions of the data object are deemed to be constant, and which are deemed to be variable. In one possible embodiment, the evaluation is carried out during the run time. Alternatively, it may also not be at the run time when the data evaluation is carried out.
[0064] On the part of the receiver, the processing is effected in the reverse order. Each received data object can be updated by the next data object and be completed for further data processing.
[0065] If no streaming mechanism is available in the system, it is likewise possible to carry out the data reduction or data compression in accordance with the method according to the embodiments of the invention. In this case, with respect to the variable portion of the data object, a unique identification identifier is sent as well, on the basis of which the variable object portions can be assigned again to the object portion having the constant attributes:
[0066] In this case, K.sub.s denotes the keys, C.sub.s denotes constant values, and V.sub.s denotes variable values.
[0067] This can be summarized as follows:
[0068] The complete data sets can thus be reconstructed again as far as necessary on the part of the receiver 5 on the basis of the identification identifier.
[0069] Streaming of the variable portions of discrete data objects can thus be carried out. In some cases, however, it is not known which portions of a data object are variable portions and which are constant portions or when the variable portions change and when not. In these cases, a differentiation can be carried out dynamically. The transmission of time stamps (Timestamp) at intervals of 1.5 seconds is represented below as an example. The construct is given as follows:
TABLE-US-00004 { Timestamp : { Year : number , Month : number , Day : number , Hour : number , Minute : number , Second : number , Millisecond : number } }
[0070] In the example illustrated, all of the components of the object are variable, albeit not always. One specific point in time is for example:
TABLE-US-00005 { Timestamp : { Year : 2016 , Month : 7 , Day : 29 , Hour : 1 , Minute : 23 , Second : 45 Millisecond : 678,01 } }
[0071] In order to represent the time stamp one and a half seconds later, it suffices to update only the seconds and milliseconds:
TABLE-US-00006 { Timestamp : { Second : 47 , Millisecond : 178,01 } }
[0072] One hour later it is necessary merely to transmit the following data object if no further transmissions have taken place in the interim:
TABLE-US-00007 { Timestamp : { Hour : 2 } }
[0073] As can be discerned from the example above, the method according to the embodiments of the invention and the compression device according to the embodiments of the invention offer a considerable potential for data reduction.
[0074] For the case where only a few changes are present even in the variable portion of a data object, in one possible embodiment it is possible to transmit only these changes. A procedure in which this is implemented may read as follows:
TABLE-US-00008 obj.get( ); // data from the source are read into the object refObj = obj; // creates a copy as reference object send( obj ); // complete transmission of the data object while ( TRUE ) { obj.get( ); newObj = reduce( obj, refObj ); // construction of a reduced copy of obj newObj.add( EventID, obj.getValue( Event.ID ) }; // for recovery refObj = obj; // object becomes reference object send( newObj ); // transmission of the reduced data object }
[0075] The reduced function specified in the above procedure implements the following table for each key-value pair of the data object. It is assumed here that the JSON data objects obj and refObj differ only in the values, but not in the keys. The reduction can be extended to a JSON array by virtue of only the differing components of the array being transmitted.
TABLE-US-00009 Key Value reduced result .fwdarw. newObj obj K V1 = { k1 : v1 , k2 : return v2 , . . . , kn : vn } refObj K V2 = { k1 : w1 , k2 : { w2 , . . . , kn : wn } ( if (v1 w1 ) k1 : v1 ) ( if ( v2 w2 ) , k2 : v2 ) . . . ( if ( vn wn ) , kn : vn ) } obj K V1 = { v1 , return v2 , . . . , vn } refObj K V2 = { w1 , [ w2 , . . . , wn } ( if (v1 w1 ) v1 ) , ( if ( v2 w2 ) v2 ) , . . . ( if ( vn wn ) vn ) ]
[0076] In the case of reducing a data object, an empty data object { } can be transmitted in the simplest case. An empty data object is permissible according to the JSON definition.
[0077] In the case of reducing an array, the result may deviate from the JSON definition since a comma is always transmitted, even if components are missing on account of equality. In the extreme case that may appear as follows:
[ ,, . . . ,].
[0078] A construct of this type is not provided in the conventional JSON definition, but during the reconstruction of an array may help in finding the correct place for the changed value, for example in an array the third component: [ ,,w, . . . ].
[0079] The reception on the part of the receiver 5 takes place in the reverse order. Each received data object is updated by the next data object and completed and processed further internally.
[0080] In one possible embodiment, an extension is possible in the case of recursive data structures. In the case of a recursive structure of the values of the data objects or arrays, the reduction can also be applied to the values. In the case of objects, the above table can be extended as follows:
[0081] . . . (if (vi !=wi) ki: *reduce(vi)*,) * . . . .
[0082] In the case of arrays, the table can be extended in a manner as represented as follows:
[0083] . . . (if(vi !=wi) reduce(vi)) * . . . .
[0084] The reduction of atomic values, such as, for example, a character string or strings, numerical values, or Boolean values, is the values themselves. The recursion can be anchored as a result.
[0085] In a further possible embodiment of the method according to the embodiments of the invention, it is possible to carry out an extension with respect to the reference object. In one possible embodiment variant, the transmission of structured data or data objects is realized in the sense that each data object is utilized as reference object, and so only the changes with respect thereto have to be transmitted. In an alternative embodiment variant deviating therefrom, a reference object can be given a longer endurance. Depending on the application or configuration, every 10.sup.th, 20.sup.th or for example 100.sup.th data object can be declared as a reference object by means of the transmission cycles being counted as well. Instead of the line:
[0086] refObj=obj; object becomes reference object
the following lines can be used in the procedure represented above:
TABLE-US-00010 if ( count++ >= N ) { // N denotes endurance in transmission cycles refObj = obj; // object becomes reference object count = 0; }
[0087] In an alternative embodiment, a temporal endurance can also be taken into account or defined as a replacement for the number of transmission cycles by means of the counter being replaced by a point in time, as follows:
TABLE-US-00011 if ( getTime( ) >= nextTime ) { // nextTime denotes the point in time for the transmission of the next reference object. refObj = obj; nextTime = getTime( ) + deltaT; // deltaT denotes the endurance of the reference object.
[0088] On the part of the receiver 5 there must be clarity in respect of which data object is a reference object. Therefore, corresponding information is preferably communicated, e.g. a further attribute, which can be set whenever the corresponding condition is met:
TABLE-US-00012 if ( count++ >= N ) { // or if ( getTime( ) >= nextTime ) { ... newObj.add( isRefObj, true }; } send( newObj };
[0089] On the part of the receiver 5, the reference object is selected and updated with the subsequent received objects and completed in order afterward to be able to carry out the data processing. The reference object is maintained until a new reference object is received by the receiver 5. For this purpose, the reference object is preferably stored in a buffer memory provided for this purpose.
[0090] The method according to the embodiments of the invention and the compression device according to the embodiments of the invention serve for the efficient data transmission of structured data or data objects. In this case, a lossless compression without loss of information is effected. In the method according to the embodiments of the invention, the constant portion of the structured data object is preferably transmitted only once and afterward the variable data object portions are transmitted separately. In one possible embodiment, at the run time it is possible to decide whether there has been a change in an object component in relation to a reference data object. In this case, only the changes are transmitted to the receiver 5. The method according to the embodiments of the invention also allows a data transmission without streaming.
[0091] Although the invention has been illustrated and described in greater detail with reference to the preferred exemplary embodiment, the invention is not limited to the examples disclosed, and further variations can be inferred by a person skilled in the art, without departing from the scope of protection of the invention.
[0092] For the sake of clarity, it is to be understood that the use of a or an throughout this application does not exclude a plurality, and comprising does not exclude other steps or elements.