Device for and method of determining a length of a relevant history
10922289 · 2021-02-16
Assignee
Inventors
CPC classification
G06F18/214
PHYSICS
H03M7/30
ELECTRICITY
G06F16/215
PHYSICS
G06F17/18
PHYSICS
G06N7/00
PHYSICS
International classification
G06F16/215
PHYSICS
H03M7/30
ELECTRICITY
G06F17/18
PHYSICS
Abstract
A device (100) for and a method of determining a length of a history of instances of a first type of data are provided. The device comprises a second data set unit (104), a first data set unit (102), a first conditional entropy unit (106), a threshold unit (108), and a length determining unit (114). The first data set unit, the second data set unit and the threshold unit obtain input data. The first conditional entropy unit determines a first conditional entropy for a second data set given a first data set. The length determining unit searches for a length of the relevant history by determining a smallest length for sequences of a reduced first set of data for which a second conditional entropy for the second set of data, given the reduced first set of data, is within a maximum entropy increase threshold from the first conditional entropy.
Claims
1. A device including a processor for determining a length of a history of instances of a first type of data for reducing a data set comprising instances of the first type of data, the instances of the first type of data comprising information for predicting at least one instance of a second type of data, the device comprising: a second data set unit for obtaining a second set of data comprising instances of the second type of data, instances of the second type of data comprising data being based on a characteristic of a physical entity; a first data set unit for obtaining a first set of data comprising sequences of instances of the first type of data, each sequence providing a history of instances of the first type of data for a corresponding element of the second set, each sequence comprising instances preceding the moment in time at which the corresponding element of the second set is determined, the instances of the first type of data being ordered according to time in the sequences, instances of the first type of data comprising data being based on measured sensor data; a first conditional entropy unit for obtaining a first conditional entropy for the second set of data given the first set of data; a threshold unit for obtaining a maximum entropy increase threshold indicating a factor by which the first conditional entropy may increase when the length of the sequences in the first set of data is reduced; a length determining unit for determining the length of the relevant history by determining a smallest length for the sequences of a reduced first set of data for which a second conditional entropy for the second set of data given the reduced first set of data is within the maximum entropy increase threshold from the first conditional entropy, the reduced first set of data comprising sequences of the smallest length and the sequences of the reduced set of data comprising the most recent instances of the first type of data of their corresponding sequences in the
first set of data; and an output unit for providing the smallest length for the sequences as the length of the relevant history for reducing the amount of history stored for the data of the first type.
2. A data reduction system including a processor, comprising: a device for determining a length of a history of instances of a first type of data for reducing a data set comprising instances of the first type of data, the instances of the first type of data comprising information for predicting at least one instance of a second type of data, the device comprising: a second data set unit for obtaining a second set of data comprising instances of the second type of data, instances of the second type of data comprising data being based on a characteristic of a physical entity; a first data set unit for obtaining a first set of data comprising sequences of instances of the first type of data, each sequence providing a history of instances of the first type of data for a corresponding element of the second set, each sequence comprising instances preceding the moment in time at which the corresponding element of the second set is determined, the instances of the first type of data being ordered according to time in the sequences, instances of the first type of data comprising data being based on measured sensor data; a first conditional entropy unit for obtaining a first conditional entropy for the second set of data given the first set of data; a threshold unit for obtaining a maximum entropy increase threshold indicating a factor by which the first conditional entropy may increase when the length of the sequences in the first set of data is reduced; a length determining unit for determining the length of the relevant history by determining a smallest length for the sequences of a reduced first set of data for which a second conditional entropy for the second set of data given the reduced first set of data is within the maximum entropy increase threshold from the first conditional entropy, the reduced first set of data comprising sequences of the smallest length and the sequences of the reduced set of data comprising the most recent instances of the first type of data of
their corresponding sequences in the first set of data; and an output unit for providing the smallest length for the sequences as the length of the relevant history for reducing the amount of history stored for the data of the first type; and a data reduction unit for obtaining a reduced set of data being based on the first set of data and comprising sequences of instances of the first type of data, wherein a length of the sequences of the reduced set is based on the determined length provided by the device for determining a length of a history of instances of a first type of data.
3. A computer-implemented method of determining a length of a history of instances of a first type of data for reducing a data set comprising instances of the first type of data, the instances of the first type of data comprising information for predicting instances of a second type of data, the method comprising: obtaining a second set of data comprising instances of the second type of data, instances of the second type of data comprising data being based on a characteristic of a physical entity; obtaining a first set of data comprising sequences of instances of the first type of data, each sequence providing a history of instances of the first type of data for a corresponding element of the second set, each sequence comprising instances preceding the moment in time at which the corresponding element of the second set is determined, the instances of the first type of data being ordered according to time in the sequences, instances of the first type of data comprising data being based on measured sensor data; obtaining a first conditional entropy for the second set of data given the first set of data; obtaining a maximum entropy increase threshold indicating a factor by which the first conditional entropy may increase when the length of the sequences in the first set of data is reduced; determining the length of the relevant history by determining a smallest length for the sequences of a reduced first set of data for which a second conditional entropy for the second set of data given the reduced first set of data is within the maximum entropy increase threshold from the first conditional entropy, the reduced first set of data comprising sequences of the smallest length and the sequences of the reduced set of data comprising the most recent instances of the first type of data of their corresponding sequences in the first set of data; and providing the smallest length for the sequences as the length of the relevant history for reducing the amount of history stored for the
first type of data.
4. The method according to claim 3, wherein the determining of the length of the relevant history comprises: obtaining a temporary reduced first set of data from the first set of data, wherein in the temporary reduced first set for each sequence an oldest instance has been removed compared to the sequences of the first set of data; obtaining the second conditional entropy for the second set of data given the temporary reduced first set; comparing the second conditional entropy with the first conditional entropy for determining whether the second conditional entropy is within the maximum entropy increase threshold from the first conditional entropy; and if the second conditional entropy is within the maximum entropy increase threshold from the first conditional entropy, then removing from the sequences of the temporary reduced first set of data the oldest instances, and the obtaining of the second conditional entropy and the comparing of the second conditional entropy with the first conditional entropy are performed once again; or if the second conditional entropy is not within the maximum entropy increase threshold from the first conditional entropy, then the determined length of the relevant history is the length of the sequences of the temporary reduced first set of data plus one.
5. The method according to claim 3, wherein the obtaining of the first conditional entropy comprises estimating the first conditional entropy by applying a Context Tree Weighting method to the second set and first set to obtain a first context tree; and using a weighted probability of the root of the first context tree to calculate an estimation of first conditional entropy, wherein in the Context Tree Weighting method every unique sequence of the first set is represented by a path in the first context tree and counts stored in the nodes of the first context tree are based on the corresponding elements of the second set.
6. The method according to claim 3, wherein the obtaining of the second conditional entropy comprises estimating the second conditional entropy by applying the Context Tree Weighting method to the second set and the reduced first set or the temporary reduced first set to obtain a second context tree; and using a weighted probability of the root of the second context tree to calculate the estimated second conditional entropy, wherein in the Context Tree Weighting method every unique sequence of the reduced first set or the temporary reduced first set is represented by a path in the second context tree and counts stored in the nodes of the second context tree are based on the corresponding elements of the second data set.
7. The method according to claim 5, wherein instead of completely applying the Context Tree Weighting method to the second set and the reduced first set or the temporary reduced first set, the second context tree is obtained by removing leaves and nodes from the first context tree that have, seen from the root, a depth that is larger than the length of the sequences of the reduced first set or the temporary reduced first set.
8. The method according to claim 3, wherein the instances of the first type of data are discrete values.
9. The method according to claim 8, further comprising quantizing instances of continuous data to obtain the discrete values of the instances of the first type of data, wherein the quantizing is performed such that a loss of mutual information between the second type of data and the first type of data is within a maximum information loss threshold.
10. The method according to claim 3, further comprising reducing the first data set by deleting in every sequence of the first set of data oldest instances of the first type of data until the sequence has the determined length of the relevant history.
11. The method according to claim 10, further comprising receiving a recent history of instances of the first type of data; and using the reduced first set of data and the second set of data to predict an instance of the second type of data or to train a prediction model.
12. A non-transitory computer-readable medium having one or more executable instructions stored thereon, which when executed by a processor, cause the processor to perform a method for determining a length of a history of instances of a first type of data for reducing a data set comprising instances of the first type of data, the instances of the first type of data comprising information for predicting instances of a second type of data, the method comprising: obtaining a second set of data comprising instances of the second type of data, instances of the second type of data comprising data being based on a characteristic of a physical entity; obtaining a first set of data comprising sequences of instances of the first type of data, each sequence providing a history of instances of the first type of data for a corresponding element of the second set, each sequence comprising instances preceding the moment in time at which the corresponding element of the second set is determined, the instances of the first type of data being ordered according to time in the sequences, instances of the first type of data comprising data being based on measured sensor data; obtaining a first conditional entropy for the second set of data given the first set of data; obtaining a maximum entropy increase threshold indicating a factor by which the first conditional entropy may increase when the length of the sequences in the first set of data is reduced; determining the length of the relevant history by determining a smallest length for the sequences of a reduced first set of data for which a second conditional entropy for the second set of data given the reduced first set of data is within the maximum entropy increase threshold from the first conditional entropy, the reduced first set of data comprising sequences of the smallest length and the sequences of the reduced set of data comprising the most recent instances of the first type of data of their corresponding sequences in the first set
of data; and providing the smallest length for the sequences as the length of the relevant history for reducing the amount of history stored for the first type of data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) These and other aspects of the invention will be apparent from and elucidated further with reference to the embodiments described by way of example in the following description and with reference to the accompanying drawings, in which
(6) The figures are purely diagrammatic and not drawn to scale. In the Figures, elements which correspond to elements already described may have the same reference numerals.
DETAILED DESCRIPTION OF EMBODIMENTS
(9) The second data set unit 104 is arranged for obtaining a second set of data that comprises instances of the second type of data. Instances of the second type of data comprise a characteristic of a physical entity. The second set is, for example, indicated by Y. The set Y comprises several instances y.sub.i that are a characteristic of the physical entity at different moments in time. For example, the instances y.sub.i indicate whether the motor 304 or 354 operates well or malfunctions. For example, a first element of the set is: y.sub.1(machine 1, t=15)=1 if the motor of machine 1 operates well at the moment in time 15, and a second element of the set is: y.sub.2(machine 2, t=25)=0 if the motor of machine 2 malfunctions at the moment in time 25.
(10) The first data set unit 102 is arranged for obtaining a first set of data that comprises sequences of instances of the first type of data. Each sequence provides a history of instances of the first type of data for a corresponding element of the second set. Each sequence comprises instances preceding, and optionally of, the moment in time at which the corresponding element of the second set is determined. The instances of the first type of data are ordered according to time in the sequences. Instances of the first type of data comprise measured sensor data. For example, the first set is indicated with X and comprises several sequences x.sub.1, x.sub.2, . . . Sequence x.sub.1 relates to instance y.sub.1 of the second set Y and, thus, comprises instances of data of sensor 302 of an interval of time that precedes (and optionally comprises) the moment in time at which the characteristic y.sub.1=1 of the motor 304 is obtained. For example, sequence x.sub.1=(xi.sub.(motor sensor 302, t=15)xi.sub.(motor sensor 302, t=14)xi.sub.(motor sensor 302, t=13)xi.sub.(motor sensor 302, t=12) . . . ), wherein xi refers to an instance of the first type of data. Sequence x.sub.2 relates to instance y.sub.2 of the second set Y and comprises instances of data of sensor 352 of an interval of time that precedes (and optionally comprises) the moment in time at which the characteristic of motor 354 is obtained. For example, sequence x.sub.2=(xi.sub.(motor sensor 352, t=25)xi.sub.(motor sensor 352, t=24)xi.sub.(motor sensor 352, t=23)xi.sub.(motor sensor 352, t=22) . . . ). The sequences x.sub.1 and x.sub.2 comprise the same number of instances of the first type of data; thus, in other words, the intervals of time for which the sequences x.sub.1 and x.sub.2 comprise instances of the first type have the same length.
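By way of illustration only, the layout of the sets Y and X of paragraphs (9) and (10) may be sketched as follows. This is a minimal sketch and not part of the embodiments; the concrete sensor values are invented, and each sequence is ordered with the most recent instance first.

```python
# Illustrative layout of the second set Y (status labels) and the first
# set X (per-label histories of sensor readings, most recent first).

# y_i = 1: motor operates well; y_i = 0: motor malfunctions
Y = [1, 0]  # y_1 for machine 1 at t=15, y_2 for machine 2 at t=25

# Each sequence holds sensor readings preceding (and optionally of) the
# moment in time at which the corresponding y_i was determined.
X = [
    [21, 21, 20, 19],  # machine 1, sensor readings at t=15, 14, 13, 12
    [35, 33, 30, 28],  # machine 2, sensor readings at t=25, 24, 23, 22
]

assert len(X) == len(Y)        # one history per element of the second set
assert len(X[0]) == len(X[1])  # all sequences have the same length
```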
(11) The first conditional entropy unit 106 is arranged for obtaining a first conditional entropy for the second set of data given the first set of data. In the example of the data of machines 300 and 350, the first conditional entropy is indicated by H.sub.1(Y|X). Conditional entropy is well defined in the field of information theory. Later in this document embodiments of calculating or estimating the first conditional entropy are provided. The first conditional entropy unit 106 may be coupled to the first data set unit 102 and the second data set unit 104 and may be arranged for receiving the first set of data and the second set of data.
(12) The threshold unit 108 is arranged for obtaining a maximum entropy increase threshold indicating a factor by which the first conditional entropy may increase when the length of the sequences in the first set of data is reduced. For example, in practical embodiments, the maximum entropy increase threshold is indicated by α and the value of α is a number between zero and one. α=1 means that the first conditional entropy may increase by 100%, α=0.5 means that the first conditional entropy may increase by 50%, etc. Assume that the first set is reduced to X.sub.r, then the maximum entropy increase threshold defines that H(Y|X.sub.r)<=(1+α)H.sub.1(Y|X) (in which the lengths of the sequences of the reduced first set X.sub.r are shorter than the sequences of X).
(13) The length determining unit 114 is arranged for determining the length of the relevant history by determining a smallest length for the sequences of a reduced first set of data for which a second conditional entropy for the second set of data given the reduced first set of data is within the maximum entropy increase threshold from the first conditional entropy. The reduced first set of data comprises sequences of the smallest length and the sequences of the reduced set of data comprise the most recent instances of the first type of data of their corresponding sequences in the first set of data. The length determining unit 114 may be coupled to the first data set unit 102, to the second data set unit 104, to the threshold unit 108, and to the first conditional entropy unit 106 for receiving the first set of data, the second set of data, the maximum entropy increase threshold and the first conditional entropy, respectively.
(14) The threshold unit 108 does not determine to which length the sequences in the reduced first set X.sub.r may be reduced, but provides a value that is part of a condition for determining the length of the relevant history. The length determining unit 114 determines the minimum length of the sequences in the reduced first data set such that the condition H.sub.2(Y|X.sub.r)<=(1+α)H.sub.1(Y|X) is true. H.sub.2(Y|X.sub.r) is the second conditional entropy for the second set given the reduced first data set. Thus, the length determining unit 114 performs a search to find this minimum length. Such a search may be executed in different ways. Embodiments are discussed later in the document. It is further to be noticed that a reduced sequence still comprises the most recent instances of the first type of data. For example, if the above discussed sequence x.sub.1 is reduced to 3 elements only, then the reduced sequence is: x.sub.1r=(xi.sub.(motor sensor 302, t=15)xi.sub.(motor sensor 302, t=14)xi.sub.(motor sensor 302, t=13)).
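One possible form of this search, sketched here for illustration only, iteratively drops the oldest instance from every sequence as long as the condition holds. The function names are hypothetical, and the simple counting estimator cond_entropy is merely a stand-in for any estimator of the conditional entropy, such as the Context Tree Weighting based estimation discussed later in the document.

```python
import math
from collections import Counter

def cond_entropy(Y, X):
    """Empirical H(Y|X) from joint counts; a stand-in estimator."""
    n = len(Y)
    rows = [tuple(r) for r in X]
    pxy, px = Counter(zip(rows, Y)), Counter(rows)
    return -sum(c / n * math.log2(c / px[x]) for (x, y), c in pxy.items())

def relevant_history_length(X, Y, alpha):
    """Shorten sequences (most recent instance first) while the condition
    H2(Y|Xr) <= (1 + alpha) * H1(Y|X) remains fulfilled."""
    h1 = cond_entropy(Y, X)
    length = len(X[0])
    while length > 1:
        trial = [row[:length - 1] for row in X]  # drop the oldest instance
        if cond_entropy(Y, trial) <= (1 + alpha) * h1:
            length -= 1        # the reduction is still acceptable
        else:
            break              # the previous length was the minimum
    return length
```

For example, when Y depends only on the most recent instance of each sequence, the search reduces the length to one; when Y depends on the oldest instance, the full length is kept.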
(15) The length determining unit 114 may comprise a reduction unit 110 that creates the reduced first set X.sub.r on the basis of the first set of data by removing a specific number of oldest instances from the sequences of the first set of data. The length determining unit 114 may comprise a second conditional entropy unit for calculating a second conditional entropy for the second set given the reduced first set of data: H.sub.2(Y|X.sub.r).
(16) The output unit 116 is arranged for providing the smallest length for the sequences as the length of the relevant history to, for example, a data reduction arrangement for reducing the amount of history stored for the data of the first type. The output unit is coupled to the length determining unit to receive the smallest length for the sequences. If this value is provided to, for example, a data reduction arrangement that has to reduce the size of the first set of data X, the oldest instances are removed from all sequences in X until the sequences have the length that is indicated by the value.
(17) The first data set unit 102, the second data set unit 104 and the threshold unit 108 may comprise data storages in which the first set of data, the second set of data and the maximum entropy increase threshold are, respectively, stored. These units may also share a common memory in which this data may be stored. The first data set unit 102, the second data set unit 104 and the threshold unit 108 may in addition, or alternatively, comprise an input at which the first set of data, the second set of data and the maximum entropy increase threshold are, respectively, received. These units may also share a common input. Such an input may also comprise a user interface, such as a graphical user interface, at which a user can provide input for at least one of the first set of data, the second set of data or the maximum entropy increase threshold.
(18) The first data set unit 102, the second data set unit 104 and the threshold unit 108 may all comprise dedicated hardware at which the first set of data, the second set of data and the maximum entropy increase threshold are, respectively, generated or they are implemented on a general purpose processor which runs a computer program comprising instructions that generate the first set of data, the second set of data and the maximum entropy increase threshold, respectively. The first conditional entropy unit 106 and the length determining unit 114 may comprise dedicated hardware which is configured to perform the task of the respective units. The first conditional entropy unit 106 and the length determining unit 114 may also comprise a general purpose processor which runs a computer program that comprises instructions for executing the tasks of the respective units. Also, in another embodiment, the device 100 for determining a length of a relevant history may comprise a computer that comprises a memory or a data storage, optional inputs, outputs and user interface and that comprises a general purpose processor that runs a computer program that comprises instructions to perform at least one of the tasks of one of the units of the device 100 for determining a length of a relevant history of instances of a first type of data.
(21) Hereinafter the operation of the device for determining a length of a relevant history of instances of a first type of data and of the method of determining a length of a relevant history of instances of a first type of data is discussed in more detail by defining the first set of data as a matrix X and the second set of data as the vector Y. The sequences of the first set of data are rows r.sub.i in the matrix X and a first element y.sub.1 of the vector Y is a corresponding instance of the second type of data that belongs to the first row r.sub.1 of the matrix X. Furthermore, the first element x.sub.i1 of each row r.sub.i is the most recent instance of that sequence and each subsequent instance of the row is an older instance of the first type of data. Thus, the columns of the matrix represent time and columns with a higher number relate to older moments in time. Using a matrix for the first type of data, defining that the sequences are provided in the rows of the matrix X, defining that the first elements of the rows are the most recent elements of the rows and representing the second set of data by the vector Y are just implementation details. Embodiments of the invention are not limited to these details. A practitioner in the art would directly understand that the columns of the matrix may also comprise the sequences and that the time order of the instances in the sequences may be different as well.
(22) Thus, the first set of data is matrix
(23) X=(r.sub.1; r.sub.2; . . . ; r.sub.m)=(x.sub.11 x.sub.12 . . . x.sub.1n; x.sub.21 x.sub.22 . . . x.sub.2n; . . . ; x.sub.m1 x.sub.m2 . . . x.sub.mn)
wherein the rows r.sub.i comprise the sequences with instances of the first type of data, and x.sub.ij are instances of the first type of data. The matrix X comprises m sequences with instances of the first type of data, and, thus, the matrix has m rows. Each sequence has a length n and, thus, the matrix comprises n columns. It is to be noted that the number of columns n is a system variable that may depend on the amount of historical data available and the processing capabilities of the device 100 or method 200. It is to be noted that the number of rows m is also a system variable that may depend on the amount of historical data available and the processing capabilities of the device 100 or method 200. If one has a given set of historical data, an increase in n results in a reduction of m, and vice versa.
(24) Thus, the second set of data is
(25) Y=(y.sub.1, y.sub.2, . . . , y.sub.m)
wherein y.sub.i is an instance of the second type of data that relates to the sequence of row r.sub.i.
(26) In an embodiment, there is a first test machine, such as, for example, machine 300 of
(27) Temperature sequence=(T.sub.t=1, T.sub.t=2, T.sub.t=3, . . . T.sub.t=1000)
(28) Motor sequence=(M.sub.t=1, M.sub.t=2, M.sub.t=3, . . . M.sub.t=1000)
(29) And if we assume that the maximum length of the rows in the matrix X is 100, we can generate the above discussed matrix X and vector Y from these sequences by:
(30) r.sub.1=(T.sub.t=100, T.sub.t=99, . . . , T.sub.t=1), y.sub.1=M.sub.t=100; r.sub.2=(T.sub.t=101, T.sub.t=100, . . . , T.sub.t=2), y.sub.2=M.sub.t=101; . . . ; r.sub.901=(T.sub.t=1000, T.sub.t=999, . . . , T.sub.t=901), y.sub.901=M.sub.t=1000
(31) Thus, each row r.sub.i of matrix X comprises a history of the sensor 302 of an interval of time that immediately precedes the moment of time that the characteristic of the motor is determined. Note that, if the length of the sequences (rows r.sub.i) is increased, the number of rows in matrix X and the number of elements in Y decreases.
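Under the assumption that the rows of X are sliding windows over the temperature sequence, with each label taken at the most recent instant of its window (one plausible reading of this example; the function name and the stand-in data below are invented), the construction may be sketched as:

```python
def build_xy(temperature, motor, n):
    """Build matrix X (windows of length n, most recent instance first)
    and vector Y (motor characteristic at the end of each window)."""
    X, Y = [], []
    for k in range(n - 1, len(temperature)):
        X.append(temperature[k - n + 1 : k + 1][::-1])  # most recent first
        Y.append(motor[k])
    return X, Y

# Stand-in sequences for t = 1 .. 1000 (invented for illustration).
T = list(range(1, 1001))             # temperature readings
M = [t % 2 for t in range(1, 1001)]  # motor status

X, Y = build_xy(T, M, 100)
assert len(X) == len(Y) == 901  # 1000 - 100 + 1 rows
assert len(X[0]) == 100
```

This also illustrates the remark in paragraph (31): with a fixed amount of historical data, increasing the window length n decreases the number of rows, here m = 1000 − n + 1.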
(32) In another embodiment, there are two test machines at which instances of the first data type and instances of the second data type were collected. For example, the machines 300 and 350 of
(33) Machine 1:
(34) Temperature sequence machine 1=(T1.sub.t=1, T1.sub.t=2, T1.sub.t=3, . . . , T1.sub.t=500)
(35) Motor sequence machine 1=(M1.sub.t=1, M1.sub.t=2, M1.sub.t=3, . . . , M1.sub.t=500)
(36) Machine 2:
(37) Temperature sequence machine 2=(T2.sub.t=1, T2.sub.t=2, T2.sub.t=3, . . . , T2.sub.t=500)
(38) Motor sequence machine 2=(M2.sub.t=1, M2.sub.t=2, M2.sub.t=3, . . . , M2.sub.t=500) And if we assume that the length of the rows in the matrix X is 200, we can generate the above discussed matrix X and vector Y from these sequences by:
(39) r.sub.1=(T1.sub.t=200, T1.sub.t=199, . . . , T1.sub.t=1), y.sub.1=M1.sub.t=200; . . . ; r.sub.301=(T1.sub.t=500, T1.sub.t=499, . . . , T1.sub.t=301), y.sub.301=M1.sub.t=500; r.sub.302=(T2.sub.t=200, T2.sub.t=199, . . . , T2.sub.t=1), y.sub.302=M2.sub.t=200; . . . ; r.sub.602=(T2.sub.t=500, T2.sub.t=499, . . . , T2.sub.t=301), y.sub.602=M2.sub.t=500
(40) Please note that the above discussed embodiments of obtaining the matrix X and the vector Y from the data obtained from one or more machines are just examples. Data from more machines of the same type may be used. Data from different moments in time (e.g. of different days, weeks or months) may be used, etc.
(41) In an embodiment, the instances of data of the first type are discrete values and, in an embodiment, the values of the instances of data of the first type can only be chosen from a small set of possible values. If a sensor generates more values, the sensor data may be quantized, i.e., put in bins and each bin is represented by a discrete value. To prevent too much loss of information, in an advantageous embodiment, the data is quantized such that the mutual information between the non-quantized data and Y and the mutual information between the quantized data and Y do not differ much from each other. Thus, I(X.sub.non-quantized; Y)≈I(X.sub.quantized; Y). For example, Cardinal (Quantization with an Information-Theoretic Distortion Measure) describes a method using a Lloyd quantizer to quantize data such that the mutual information between X and Y does not much reduce as the result of the quantizing. The document Quantization with an Information-Theoretic Distortion Measure, Jean Cardinal, Oct. 23, 2002, published by the Université Libre de Bruxelles on the website http://www.ulb.ac.be/do/publications/RT_2002.html, and also published on the website http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.3058, is included by reference.
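The idea of quantizing while bounding the mutual-information loss can be sketched as follows. This is an illustration only: the simple equal-width binning below is a crude stand-in for the information-theoretic Lloyd-style quantizer of Cardinal, and the function names (mutual_information, quantize, choose_bins) are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def quantize(values, n_bins):
    """Equal-width binning: each value is mapped to a discrete bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def choose_bins(values, ys, max_loss, max_bins=64):
    """Smallest bin count whose mutual-information loss w.r.t. the
    non-quantized data stays within the maximum information loss threshold."""
    full = mutual_information(values, ys)
    for b in range(2, max_bins + 1):
        if full - mutual_information(quantize(values, b), ys) <= max_loss:
            return b
    return max_bins
```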
(42) In order to determine which portion of the rows r.sub.i can be defined as not relevant, in other words, in order to determine which columns are not relevant, initially a first conditional entropy H.sub.1(Y|X) is determined. If the probabilities of the individual instances p(x), p(y) are known and if the mutual probabilities of combinations of instances p(x,y) are known, the conditional entropy can be calculated by
(43) H(Y|X)=−Σ.sub.x Σ.sub.y p(x,y)log.sub.2(p(x,y)/p(x))
One may obtain the probabilities of the individual instances p(x), p(y) and the mutual probabilities of combinations of instances p(x,y) by estimating these probabilities on the basis of the available data: one may count, for example, how many times a first value is present in the matrix X and estimate the probability of this first value by dividing the count by the number of elements in the matrix X.
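As an illustration of this counting approach (a minimal sketch; the function name is invented), formula (43) can be evaluated directly from the joint counts of the rows of X and the elements of Y:

```python
import math
from collections import Counter

def conditional_entropy(Y, X):
    """Empirical H(Y|X) = -sum over (x,y) of p(x,y) * log2(p(x,y)/p(x)),
    where probabilities are estimated by counting occurrences."""
    n = len(Y)
    rows = [tuple(r) for r in X]       # each row r_i as a hashable context
    pxy = Counter(zip(rows, Y))        # counts for combinations (x, y)
    px = Counter(rows)                 # counts for the rows themselves
    return -sum(c / n * math.log2(c / px[x]) for (x, y), c in pxy.items())
```

If Y is fully determined by the rows of X the result is zero; if Y is independent of X the result approaches the entropy of Y itself.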
(44) The first conditional entropy H.sub.1(Y|X) can also be estimated by building a first context tree with the so-termed Context Tree Weighting method and using the weighted probability of the root of the first context tree to calculate an estimation of the first conditional entropy. The Context Tree Weighting method is well known in the art of information theory and (text-)compression and has, for example, been described by Willems et al in papers i) The Context-Tree Weighting Method: Basic Properties, Willems et al, IEEE Transactions on Information Theory, Vol 42, No 3, pp 653-664; ii) Reflections on The Context-Tree Weighting Method: Basic Properties, Willems et al., Newsletter of the IEEE Information Theory Society, 1997; and iii) 1996 IT Society Paper Award, Reflection on the Prize Paper: The Context-Tree Weighting Method: Basic Properties, Willems et al, IEEE Information Theory Society Newsletter, Vol. 47, No 1, March 1997, pp 19-27. The above mentioned papers i), ii) and iii) are hereby included by reference.
(45) The Context-Tree Weighting Method is applied in such a way that unique rows r.sub.i in the matrix X form paths in the first context tree and the counts registered in the nodes of the first context tree correspond to the number of occurrences of the corresponding elements y.sub.i of Y. Initially the Context-Tree Weighting Method was described for binary data, together with how it may be extended towards text data. Today the practitioner in the art has several implementations at his disposal for other forms of non-binary data. Thus, if the discrete data in X or Y has more than two different possible values, one of such implementations must be used to build the first context tree. For example, the article Context Tree Weighting: Multi-alphabet Sources of Tjalkens et al, Proceedings Fourteenth Symposium on Information Theory in the Benelux, Veldhoven, The Netherlands, May 17-18, 1993, pp. 128-135 describes how the Context-Tree Weighting method may be applied to non-binary data. The document Context Tree Weighting: Multi-alphabet Sources is herewith included by reference.
(46) The Context-Tree Weighting method provides means to calculate for every node a weighted probability P.sub.w. It has been proven that the first conditional entropy may be estimated on the basis of the weighted probability P.sub.w of the root of the first context tree by the formula
(47) H.sub.1(Y|X)≈−(1/N)log.sub.2 P.sub.w(root),
wherein N is the number of elements of vector Y.
(48) In the Context-Tree Weighting method an estimated probability P.sub.e is calculated for each node on the basis of the counts stored in that node. In the Context-Tree Weighting method the weighted probability P.sub.w of a leaf node is equal to the estimated probability P.sub.e of that leaf node. For a specific node that is connected to one or more nodes at a deeper level in the context tree, the weighted probability P.sub.w is determined by a formula that depends on the estimated probability P.sub.e of that specific node and the weighted probabilities P.sub.w of the one-level-deeper nodes to which it is connected. These calculations are well described in the previously introduced documents i), ii) and iii).
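Paragraphs (46)-(48) can be made concrete with the following sketch for binary data. It is an illustrative implementation, not the patented device: the Krichevsky-Trofimov estimator is assumed for P.sub.e (the estimator used in the Willems et al. papers for the binary case), and the standard CTW recursion P.sub.w = 1/2 P.sub.e + 1/2 ∏ P.sub.w(children) is applied; an absent child contributes a factor 1, since an unvisited subtree has P.sub.e = 1.

```python
import math

# Illustrative P_e / P_w calculation for a binary context tree, using the
# Krichevsky-Trofimov (KT) estimator. All names are chosen for this sketch.

class Node:
    def __init__(self):
        self.counts = [0, 0]   # occurrences of y == 0 and of y == 1
        self.children = {}

def build_context_tree(X, Y):
    root = Node()
    for row, y in zip(X, Y):
        node = root
        node.counts[y] += 1
        for symbol in row:
            node = node.children.setdefault(symbol, Node())
            node.counts[y] += 1
    return root

def kt_estimate(counts):
    """Estimated probability P_e from the counts (a, b) stored in a node:
    the sequential KT rule P(next symbol) = (count + 1/2) / (total + 1)."""
    a, b = counts
    p = 1.0
    for i in range(a):
        p *= (i + 0.5) / (i + 1)
    for j in range(b):
        p *= (j + 0.5) / (a + j + 1)
    return p

def weighted_prob(node):
    """Weighted probability P_w: equal to P_e at a leaf node; otherwise
    1/2 * P_e + 1/2 * (product of the P_w of the one-level-deeper nodes)."""
    pe = kt_estimate(node.counts)
    if not node.children:
        return pe
    prod = 1.0
    for child in node.children.values():
        prod *= weighted_prob(child)
    return 0.5 * pe + 0.5 * prod

def conditional_entropy_estimate(X, Y):
    """Formula (47): H(Y|X) is estimated as -(1/N) * log2 P_w(root)."""
    root = build_context_tree(X, Y)
    return -math.log2(weighted_prob(root)) / len(Y)

X = [[0, 1], [0, 1], [1, 0]]
Y = [1, 1, 0]
print(conditional_entropy_estimate(X, Y))   # -> 1.0 for this small example
```

Note that only the counts are stored in the tree; P.sub.e and P.sub.w can always be recomputed from them, which is what makes the tree-reduction of paragraphs (55)-(56) cheap.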
(49) A maximum entropy increase threshold α defines how much the first conditional entropy H.sub.1(Y|X) may increase if the amount of data in the matrix X is reduced, and more in particular, if for every row r.sub.i one or more of the oldest instances x.sub.ij of the first type of data are removed. This maximum entropy increase threshold α defines that the optimal length of the rows r.sub.i of a reduced matrix X.sub.r is the minimum length at which the condition H.sub.2(Y|X.sub.r)≤(1+α)H.sub.1(Y|X) is fulfilled, wherein X.sub.r is a submatrix of matrix X comprising the same number of rows as matrix X, and each row of matrix X.sub.r has a reduced length and is equal to a front portion of its corresponding row of matrix X. In other words, reduced matrix X.sub.r has only a first number of columns of matrix X, and more specifically, the number of columns is equal to the minimum length.
(50) A search algorithm is provided for finding the minimum length. This algorithm is based on the chain rule from information theory, which reads, when translated to the specific problem of this document:
H(Y|X.sub.n columns)≤H(Y|X.sub.r,n-1 columns)≤H(Y|X.sub.r,n-2 columns)≤ . . . ≤H(Y|X.sub.r,one column)
(51) Thus, one may start with generating a reduced matrix X.sub.r,n-1 columns, determine a second conditional entropy H.sub.2(Y|X.sub.r,n-1 columns) and check whether H.sub.2≤(1+α)H.sub.1 is still fulfilled. If yes, the reduced matrix is further reduced towards X.sub.r,n-2 columns by deleting the last column, which holds the oldest instances of each sequence, a new second conditional entropy H.sub.2(Y|X.sub.r,n-2 columns) is determined, and one checks whether the condition H.sub.2≤(1+α)H.sub.1 is fulfilled. If yes, the size of the reduced matrix is further reduced by deleting the last column and determining the second conditional entropy. As soon as the condition H.sub.2≤(1+α)H.sub.1 is no longer fulfilled, one knows that not the last reduced matrix, but the previously reduced matrix X.sub.r,i columns comprises the optimal relevant history of the first type of data, and the number of columns i of the previously reduced matrix X.sub.r,i columns is the length of the relevant history of instances of the first type of data. In other words, the length of the relevant history is the number of columns of the last reduced matrix X.sub.r plus one.
(52) In this algorithm a sort of linear forward search is executed: start with the smallest conditional entropy and search in the direction of larger conditional entropies to find the point where the condition is no longer fulfilled. It is to be noted that other search algorithms for ordered lists of data may also be applied. An example of such another search algorithm is a binary search.
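The linear forward search of paragraphs (50)-(52) can be sketched as follows. For brevity this sketch uses a simple empirical plug-in estimate of the conditional entropy in place of the CTW-based estimate of paragraph (53); the function names are chosen for this illustration, and the rows of X are assumed to hold the most recent instance first, so that deleting the last column removes the oldest instances.

```python
import math
from collections import Counter

def conditional_entropy(X, Y):
    """Empirical plug-in estimate of H(Y|X) in bits (a stand-in here for the
    CTW-based estimate): sum over (x, y) of -p(x, y) * log2 p(y | x)."""
    n = len(Y)
    joint = Counter(zip(map(tuple, X), Y))
    marg = Counter(map(tuple, X))
    h = 0.0
    for (x, y), c in joint.items():
        h -= (c / n) * math.log2(c / marg[x])
    return h

def relevant_history_length(X, Y, alpha):
    """Linear forward search: drop the last column (the oldest instances)
    until H_2 > (1 + alpha) * H_1; the previous length is then the length
    of the relevant history."""
    h1 = conditional_entropy(X, Y)
    n_cols = len(X[0])
    for length in range(n_cols - 1, 0, -1):
        Xr = [row[:length] for row in X]          # keep the front portion
        if conditional_entropy(Xr, Y) > (1 + alpha) * h1:
            return length + 1                     # last length that still held
    return 1

# Example: Y is the exclusive-or of the two most recent instances, so a
# history of length 2 suffices and length 1 does not.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
Y = [a ^ b for a, b, c in X]
print(relevant_history_length(X, Y, 0.05))   # -> 2
```

A binary search over the lengths, as mentioned above, would work on the same monotone sequence of conditional entropies and need only O(log n) entropy evaluations.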
(53) In the above described search algorithm, the size of the matrix X is reduced in each subsequent step and a second conditional entropy H.sub.2(Y|X.sub.r) is determined. The determining of the second conditional entropy may also be based, similarly as discussed above, on the Context-Tree Weighting method: a second context tree is obtained and the weighted probability P.sub.w of the root of the second context tree is used to estimate the second conditional entropy by the formula
(54) H.sub.2(Y|X.sub.r)≈−(1/N)log.sub.2 P.sub.w(root).
(55) The context trees built by the Context-Tree Weighting method have specific characteristics: the most recent instances of each row of matrix X (or of the reduced matrix X.sub.r) are represented by edges from the root to the nodes at the first depth level. The second most recent instances of each row of X are represented by edges from the nodes at the first depth level to the nodes at the second depth level, etc. The edges from the nodes that precede the leaf nodes to the leaf nodes represent the oldest instances of each row of X. Therefore, instead of reducing the size of the matrix and rebuilding the second context tree in every step, the second context tree may be derived from its predecessor (and the first obtained second context tree may be derived from the first context tree). This is done as follows: in a first search step, if the second context tree has to be obtained for the first time, the second context tree is a copy of the first context tree in which the leaf nodes and the edges ending in the leaf nodes are removed and in which the weighted probabilities P.sub.w of the nodes are updated according to the new situation. The Context-Tree Weighting method describes how to calculate the weighted probabilities P.sub.w of the nodes based on the counts stored in the nodes. In a subsequent search step, the size of the second context tree is reduced by removing the leaf nodes and the edges ending in the leaf nodes and updating the weighted probabilities P.sub.w of the nodes.
(56) The updating of the weighted probabilities of the (reduced) second context tree can be performed as follows: as discussed previously, according to the Context-Tree Weighting method, one can calculate for every node in the second context tree an estimated probability P.sub.e which only depends on the counts stored in the respective node. The counts in the nodes do not change when the depth of the second context tree is reduced and, thus, the estimated probabilities P.sub.e do not change either when the leaf nodes and their corresponding edges are removed. The weighted probability P.sub.w of a node is a function of the estimated probability P.sub.e of the node and of the weighted probabilities P.sub.w of the nodes at a larger depth to which it is connected. Thus, when the leaf nodes and their corresponding edges are removed from the second context tree, one only has to re-calculate the weighted probabilities P.sub.w, starting with the weighted probabilities P.sub.w of the new leaf nodes and moving towards the root of the second context tree.
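The tree reduction of paragraphs (55)-(56) can be sketched as follows for the binary case. This is an illustrative sketch, not the patented device: the KT estimator is assumed for P.sub.e, all names are chosen for this illustration, and re-evaluating the P.sub.w recursion over the pruned tree is exactly the bottom-up re-calculation described above, since the counts (and hence P.sub.e) of the remaining nodes are untouched.

```python
import math

# Illustrative sketch: prune the leaf level of a binary context tree and
# re-estimate the conditional entropy from the root's recomputed P_w.

class Node:
    def __init__(self):
        self.counts = [0, 0]   # occurrences of y == 0 and of y == 1
        self.children = {}

def build_context_tree(X, Y):
    root = Node()
    for row, y in zip(X, Y):
        node = root
        node.counts[y] += 1
        for symbol in row:
            node = node.children.setdefault(symbol, Node())
            node.counts[y] += 1
    return root

def kt_estimate(counts):
    """Estimated probability P_e (KT rule) from the counts stored in a node."""
    a, b = counts
    p = 1.0
    for i in range(a):
        p *= (i + 0.5) / (i + 1)
    for j in range(b):
        p *= (j + 0.5) / (a + j + 1)
    return p

def weighted_prob(node):
    """P_w = P_e at a leaf; 1/2 * P_e + 1/2 * prod(children P_w) otherwise."""
    pe = kt_estimate(node.counts)
    if not node.children:
        return pe
    prod = 1.0
    for child in node.children.values():
        prod *= weighted_prob(child)
    return 0.5 * pe + 0.5 * prod

def prune_leaf_level(node, max_depth, depth=0):
    """Remove the deepest level (the leaf nodes and the edges ending in them);
    the counts of the remaining nodes stay unchanged, so P_e is unchanged."""
    if depth + 1 == max_depth:
        node.children = {}
    else:
        for child in node.children.values():
            prune_leaf_level(child, max_depth, depth + 1)

# Build a depth-2 context tree, prune it to depth 1 and re-estimate H_2.
X = [[0, 1], [0, 1], [1, 0]]
Y = [1, 1, 0]
root = build_context_tree(X, Y)
prune_leaf_level(root, max_depth=2)
h2 = -math.log2(weighted_prob(root)) / len(Y)
```

Because only P.sub.w has to be recomputed, each search step costs one bottom-up pass over the already-reduced tree instead of a full rebuild from the reduced matrix.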
(58) The carrier of a computer program may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Further the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant method.
(59) The computer program 480 may be a computer program for a distributed processor system and may comprise computer code which causes a first processor system to perform a subset of the steps of the above discussed method and which causes a second processor system to perform another subset of the steps of the above discussed method. The subset of steps and the another subset of steps may be mutually exclusive.
(60) In summary, this document provides a device for and a method of determining a length of a history of instances of a first type of data. The device comprises a second data set unit, a first data set unit, a first conditional entropy unit, a threshold unit and a length determining unit. The first data set unit, the second data set unit and the threshold unit obtain input data. The first conditional entropy unit determines a first conditional entropy for a second data set given a first data set. The determining unit searches for a length of the relevant history by determining a smallest length for sequences of a reduced first set of data for which a second conditional entropy for the second set of data given the reduced first set of data is within a maximum entropy increase threshold from the first conditional entropy.
(61) It is to be noted that the invention may be implemented in hardware and/or software, using programmable components. A method for implementing the invention has the steps corresponding to the functions defined for the system as described with reference to
(62) It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without deviating from the invention. For example, functionality illustrated to be performed by separate units, processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization. The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.
(63) It is noted that in this document the word comprising does not exclude the presence of other elements or steps than those listed, and the word a or an preceding an element does not exclude the presence of a plurality of such elements; that any reference signs do not limit the scope of the claims; that the invention may be implemented by means of both hardware and software; and that several means or units may be represented by the same item of hardware or software, and a processor may fulfill the function of one or more units, possibly in cooperation with hardware elements. Further, the invention is not limited to the embodiments, and the invention lies in each and every novel feature or combination of features described above or recited in mutually different dependent claims.