Data reduction for reducing a data set

10762064 ยท 2020-09-01

Assignee

Inventors

Cpc classification

International classification

Abstract

A data reduction device (150) for and a method of reducing a data set based on a subset of variables from a set of variables are provided. Instances of the plurality of variables comprise information to predict an instance of a further type of data. The device comprises a first data set unit (102), a second data set unit (104), a searching unit (110) and a data reduction unit (152). The first data set unit obtains a first set comprising tuples of instances of data. The second data set unit obtains a second set comprising instances of the further type of data. Each instance of the second set corresponds to one of the tuples of the first set. The searching unit obtains a reduced set of variables that represents an at least local optimum of an optimization function being a combination of a first mutual information value between the reduced first set and the second set and a penalty value being based on a number of variables in the reduced set of variables.

Claims

1. A data reduction machine for reducing a data set based on a subset of variables from a set of variables comprising a plurality of variables, the plurality of variables representing different types of data, instances of the plurality of variables being based on sensor data and comprising information to predict an instance of a further type of data, the further type of data relating to a characteristic of a physical entity, the data reduction machine comprising: a first data set unit configured for obtaining a first set of data comprising tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables; a second data set unit configured for obtaining a second set of data comprising instances of the further type of data, wherein each instance of the second set of data corresponds to one of the tuples of the first set of data; a searching unit configured for obtaining a reduced set of variables being a subset of the set of variables, the reduced set of variables representing an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, the reduced first set of data comprising reduced tuples obtained from the tuples of the first set of data, and the reduced tuples only comprising instances of the variables of the reduced set of variables, the optimization function being based on a combination of a first mutual information value between the reduced first set of data and the second set of data and a penalty value that increases with an increasing number of variables in the reduced set of variables; and a data reduction unit configured for generating a reduced data set from a data set that comprises tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables, the reduced data set comprising instances of data of variables that are present in the reduced set of variables.

2. A method of reducing a data set based on a subset of variables from a set of variables comprising a plurality of variables, the plurality of variables representing different types of data, instances of the plurality of variables being based on sensor data and comprising information to predict an instance of a further type of data, the further type of data relating to a characteristic of a physical entity, the method comprising: obtaining a first set of data comprising tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables; obtaining a second set of data comprising instances of the further type of data, wherein each instance of the second set of data corresponds to one of the tuples of the first set of data; searching for a reduced set of variables being a subset of the set of variables, the reduced set of variables representing an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, the reduced first set of data comprising reduced tuples obtained from the tuples of the first set of data, and the reduced tuples only comprising instances of the variables of the reduced set of variables, the optimization function being based on a combination of a first mutual information value between the reduced first set of data and the second set of data and a penalty value that increases with an increasing number of variables in the reduced set of variables; and generating a reduced data set from a data set that comprises tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables, the reduced data set comprising instances of data of variables that are present in the reduced set of variables.

3. The method according to claim 2, wherein the first mutual information value between the reduced first set of data and the second set of data is estimated by determining a second mutual information value between the second set of data and clustered data, wherein the clustered data comprises clusters that are derived from the reduced first set of data.

4. The method according to claim 3, wherein the clusters are derived from a combination of the reduced first set of data and the second set of data.

5. The method according to claim 4, wherein the clusters represents tuples of data that are a combination of a specific tuple of the reduced first set of data that is extended with a value that is derived from an instance from the second set of data that corresponds to the specific tuple of the reduced first set of data.

6. The method according to claim 5, wherein the clusters are formed on basis of data that comprises unique tuples of the reduced first set of data that are extended with a value that is derived from the instances from the second set of data that correspond to the respective unique tuple.

7. The method according to claim 3, wherein the clustered data is obtained by a k-means clustering algorithm.

8. The method according to claim 7, wherein the k-means clustering algorithms uses a Euclidean distance measure to determine distances between the tuples to be clustered.

9. The method according to claim 2, wherein searching for the reduced set of variables representing the at least local optimum of the optimization function is performed by applying an iterative search method to find the at least local maximum to find the reduced set of variables that has a relatively large value for the optimization function.

10. The method according to claim 9, wherein the iterative search method is based on simulated annealing.

11. The method according to claim 9, wherein each iteration of the iterative search method comprises: forming a proposal of the reduced set of variables to be assessed; determining the value of the optimization function that is provided by the proposed reduced set of variables; storing the proposed reduced set of variables as a best reduced set of variables if the proposed reduced set of variables provides a best value for the optimization function until the currently executed iteration; deciding whether the proposed reduced set of variables is accepted as a basis for a subsequent proposal of the reduced set of variables to be assessed in a subsequent iteration.

12. The method according to claim 9, wherein the iterative search method starts a first iteration with a proposal of the reduced set of variables comprising a single variable of which the mutual information between that the data of the single variable and the second set of data is highest compared to the other variables of the set of variables.

13. The method according to claim 12, wherein, in the second or later iterations, the proposal of the reduced set of variables to be assessed is formed by extending or reducing the last accepted reduced set of variables, wherein one of the variables of the set of variables is randomly selected, the randomly selected variable being added to the last accepted reduced set of variables if the randomly selected variable is not yet present in the last accepted reduced set of variables, and the randomly selected variable is removed from the last accepted reduced set of variables if the randomly selected variable is present in the last accepted reduced set of variables.

14. The method according to claim 13, wherein the randomly selected variable is obtained by a random selection that is based on a probability for each variable that is a sum of a first portion of a uniform probability value for all variables and a second portion of a non-uniform probability value derived from the mutual information between the specific variable and the second set of data.

15. A machine, comprising: a plurality of sensors configured to generate sensor data; a physical entity; and a processor to perform a method for reducing a data set based on a subset of variables from a set of variables comprising a plurality of variables, the plurality of variables representing different types of data, instances of the plurality of variables being based on sensor data and comprising information to predict an instance of a further type of data, the further type of data relating to a characteristic of the physical entity, the method comprising: obtaining a first set of data comprising tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables; obtaining a second set of data comprising instances of the further type of data, wherein each instance of the second set of data corresponds to one of the tuples of the first set of data; searching for a reduced set of variables being a subset of the set of variables, the reduced set of variables representing an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, the reduced first set of data comprising reduced tuples obtained from the tuples of the first set of data, and the reduced tuples only comprising instances of the variables of the reduced set of variables, the optimization function being based on a combination of a first mutual information value between the reduced first set of data and the second set of data and a penalty value that increases with an increasing number of variables in the reduced set of variables; and generating a reduced data set from a data set that comprises tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables, the reduced data set comprising instances of data of variables that are present in the reduced set of variables.

16. A non-transitory computer-readable medium having one or more executable instructions stored thereon, which when executed by a processor, cause the processor to perform a method for reducing a data set based on a subset of variables from a set of variables comprising a plurality of variables, the plurality of variables representing different types of data, instances of the plurality of variables being based on sensor data and comprising information to predict an instance of a further type of data, the further type of data relating to a characteristic of the physical entity, the method comprising: obtaining a first set of data comprising tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables; obtaining a second set of data comprising instances of the further type of data, wherein each instance of the second set of data corresponds to one of the tuples of the first set of data; searching for a reduced set of variables being a subset of the set of variables, the reduced set of variables representing an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, the reduced first set of data comprising reduced tuples obtained from the tuples of the first set of data, and the reduced tuples only comprising instances of the variables of the reduced set of variables, the optimization function being based on a combination of a first mutual information value between the reduced first set of data and the second set of data and a penalty value that increases with an increasing number of variables in the reduced set of variables; and generating a reduced data set from a data set that comprises tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables, the reduced data set comprising instances of data of variables that are present in the reduced set of variables.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) These and other aspects of the invention will be apparent from and elucidated further with reference to the embodiments described by way of example in the following description and with reference to the accompanying drawings, in which

(2) FIG. 1 schematically shows an embodiment of a data reduction device for reducing a data set based on a subset of variables from a set of variables comprising a plurality of variables,

(3) FIG. 2 schematically shows an embodiment of a method of reducing a data set based on a subset of variables from a set of variables comprising a plurality of variables,

(4) FIG. 3 schematically shows two machines of the same type having a plurality of sensors, and

(5) FIG. 4 schematically shows an embodiment of a computer program product.

(6) The figures are purely diagrammatic and not drawn to scale. In the Figures, elements which correspond to elements already described may have the same reference numerals.

DETAILED DESCRIPTION OF EMBODIMENTS

(7) FIG. 3 schematically shows two machines 300, 350 of the same type. Both machines have a plurality of sensors 302 . . . 310, 352 . . . 360. In the following of this document, the data of one specific sensor is termed a variable. Instances of one variable originate from one type of sensor that has the same function. If at one specific moment in time all sensors 302 . . . 310, 352 . . . 360 of one machine 300, 350 measure and provide a value, instances of the variables are collected and together they form a tuple of instances of the variables. In this document it is assumed that the different variables have a specific order and that the instances of the variables are in the same order in the tuples. Both machines may also have a physical entity 312, 362 of which a characteristic may be determined or may be predicted. In the following of this document, a determined characteristic of physical entity is an instance of a further type of data. It is not necessary that the instances of the data of the further type originate from one specific physical entity, but it is relevant that the instances of the data of the further type originate from one specific type of physical entity, which means: the physical entities are equal to each other and that have the same function. A specific obtained characteristic of the physical entity belongs to a corresponding tuple of instances of the variables that is obtained from the same machine 300, 350 at the same moment of time.

(8) For example, the physical entity 312, 362 is a motor and the characteristic is the operational status of the motor, for example, operates well or malfunctions. If the motors drives, for example, a conveyer belt in the respective machines 300, 350, the sensors may sense different characteristics that may be relevant to predict the operational status of the motors. For example, a first sensor 302, 352 may measure the temperature of the motor. For example, a second sensor 304, 354 may measure the (electrical) power usage of the motor. For example, a third sensor 306, 356 may measure the speed of the conveyer belt. For example, a fourth sensor 308, 358 may measure the weight of the goods that are currently placed on the conveyer belt. For example, a fifth sensor 310, 360 may measure the amount of dust in the air present near the motor. Thus, the variables are: temperature of the motor, power usage of the motor, speed of the conveyor belt, weight of good on conveyor belt and amount of dust near motor. At regular moments in time, the operational status of the motor may be determined and at the same moments all sensor sense their specific characteristic. Then instances of the variables are obtained and these instances form one tuple, in the example, namely the 5-tuple (temp, power, speed, weight, dust-amount). At that specific moment in time the operational status of the motor may be determined and this determined characteristics belong to, corresponds to, the tuple with instances of the variables obtained/determined at the same moment in time. Please note that it may be difficult to obtain all measurements at the exact same moment in time. One may also read within the same interval of time instead of at the same moment of time.

(9) It is noted that in other practical embodiments the machines 300, 350 have a multitude of sensors 302 . . . 310, 352 . . . 360. The number of sensors per machine 300, 350 may be in the order of at least one hundred sensors, or in other embodiments, at least one thousand sensors. The sensors 302 . . . 310, 352 . . . 360 may provide measurements every relatively short interval or time, for example, every minute, or every five minutes. Therefore, every relatively short interval of time a large amount of data is generated by the sensors 302 . . . 310, 352 . . . 360 and every day an enormous amount of data has to be processed and/or stored. The sensor data is often stored in log-files which have quite often a size in the order of one or a multitude of Terabytes. Examples of machines that deliver such amounts of data are, for example, medical scanners, such as CT scanners, PET scanners, MRI scanners, etc.

(10) FIG. 3 is an example of sources of data for the variables and for the characteristic of the physical entity. The example will be used in the subsequent discussion of the embodiments of the invention.

(11) FIG. 1 schematically shows an embodiment of a data reduction device 150 that comprises a device 100 for selecting a subset of variables from a set of variables comprising a plurality of variables. For example, the set of variables is S (and is, for example, defined by the sensors 302 . . . 320, 352 . . . 360 of the machines 300, 350 of FIG. 3). The number of variables in the set of variables S is indicated by |S|. The variables are indicated with v. The subset is indicated by S* and S*.Math.S. Preferably, the set of variable S comprises a plurality of variables v, and the subset S* comprises at least one variable. It is assumed that there is a relation between the instances of variables and an instance of a further type of data. The instances of the variables comprise information that may be relevant for predicting the instance of the further type. The further type of data relates to a characteristics of a physical entity, for example, the operational status of the motor 322 or 362 of one of the machines 300, 350, respectively. The instance of the further type of data is indicated with y.

(12) The device 100 comprises a first data set unit 102 for obtaining a first of set data X comprising tuples x.sub.i of instances of data. Each tuples x.sub.i comprises an instance of each variable v.sub.j in the set of variable S. An instance of variable v.sub.j in tuple x.sub.i is indicated by x.sub.ij. Each tuple x.sub.i in the first set of data X comprises the same number of instances x.sub.ij, as the number of variables in the set of variables S. In the context of FIG. 3, one tuple x.sub.i is an observation of time of all sensor values at a specific moment of time or within a specific interval. The total number of observations, the total number of tuples x.sub.i in the first set of data X, is indicated with the number n. The tuples x.sub.i have all an equal length: they all have a number of instances that is equal to the number of variables in the set of variables S. In embodiments, the data present in the first set of data is discrete data that is obtained by quantizing continuous data provided by the sensors.

(13) In an embodiment, the first set of data is represented by a matrix X:

(14) X = [ x 11 .Math. x 1 .Math. S .Math. .Math. .Math. x n 1 .Math. x n .Math. S .Math. ]
It is to be noted that the rows in the matrix X are the tuples x.sub.i and that the columns represent the variables v.sub.j. Embodiments are not limited to the use of the above matrix X. The use of a matrix with the tuples as rows and the columns representing variables is just an implementation detail and a skilled person knows that matrices can be transposed or that an ordering between the columns or between the rows may be changed as well.

(15) The device 100 also comprises a second data set unit 104 for obtaining a second set of data Y comprising instances the further type of data. As discussed earlier, this further type of data indicates a characteristics of a physical entity, for example, in the context of FIG. 3, the operational status of a specific type of motor of the machines. The instances of the further type of data are indicated with y.sub.i. Each instance y.sub.i corresponds to a tuple x.sub.i in the first set of data X, which means that the instance is determined at the same moment of time, or during the same interval of time, as the instances of the variables of the tuples are obtained. Because there is a corresponding instance y.sub.i of the further type of data for each tuple x.sub.i, there are n instances of the further type in the second set of data Y. In other words, there are n observations of the characteristic of the physical entity. The characteristics of the physical entity may be represented by binary values, but also by numbers from higher order numerical systems. If the characteristics of the physical entity are continuous values, the values may be quantized before being used in the device or method of this document.

(16) In an embodiment, the second set of data Y is represented by:

(17) Y = [ y 1 .Math. y n ]

(18) The device 100 also comprises a searching unit 110 for obtaining a reduced set of variables S.sub.r. The searching unit 110 is coupled to the first data set unit 102 and to the second data set unit 104 for receiving the first set of data X and the second set of data Y, respectively. The reduced set of variables is a subset of the set of variables S, S.sub.r .Math.S. The reduced set of variables S.sub.r represents an at least local maximum of an optimization function that is a function of a reduced first set of data, the second set of data and the reduced set of variables. The reduced first set of data comprising reduced tuples being based on the tuples of the first set of data, and the reduced tuples only comprising instances of the reduced set of variables. The reduced first set of data being indicated by X.sub.r and each reduced tuple in the reduced first set of data X.sub.r is indicated with x.sub.ri. Hence, the optimization function is (X.sub.r, Y, S.sub.r). The optimization function being a combination of a first mutual information value between the reduced first set of data and the second set of data minus a penalty value that increases with an increasing number of variables in the reduced set of variables. The first mutual information is, in accordance with information theory, indicated by I(X.sub.r, Y). The penalty value is a function of the number of elements in the reduced set of variables, p(|S.sub.r|), wherein |S.sub.r| is the number of variables in the reduced set of variables S.sub.r. If the number of elements in the reduced set of variables is higher, the penalty value is also higher. Thus, the optimization function is (X.sub.r, Y, S.sub.r)=I(X.sub.r, Y)p(|S.sub.r|). In an example, the penalty value is the number of elements in the reduced set of variables S.sub.r that is normalized for the number of variables in the set of variables S, hence, p(|S.sub.r|)=|S.sub.r|/|S|, wherein |S| is the number of variables in the set of variables S.

(19) The device 100 optionally comprises an output 112 for providing the reduced set of variables S.sub.r as the selected subset of variables S*. The output 112 is coupled to the searching unit 110 for receiving the reduced set of variables S.sub.r.

(20) In an example, the device 100 for selecting a subset of variables from a set of variables comprising a plurality of variables is part of a data reduction device 150. The data reduction device also comprises a data reduction unit 152 that is coupled to the output 112 for receiving the selected subset of variables S* or is directly coupled to the searcher unit for receiving the selected subset of variables S*. The data reduction unit 152 is configured to reduce the tuples x.sub.p in a data set X to obtain a reduced data set X.sub.r such that the reduced tuples x.sub.rp only comprise instances x.sub.pj of variables that are present in the selected subset of variables S*. The data reduction unit 152 may be configured to store the reduced data set X.sub.r. The data reduction unit 152 may be configured to provide the reduced data set X.sub.r to a subsequent device that uses, for example, the reduced data set X.sub.r in predictions of events (i.e. instance y of the further type of data). In a specific embodiment, the data set X is the first set of data X, and, thus, the data reduction unit 152 may generate a reduced first set of data X.sub.r that can be used as a training set in machine learning applications.

(21) FIG. 2 schematically shows an embodiment of a method 200 of selecting a subset of variables S* from a set of variables S comprising a plurality of variables v.sub.j. The variables v.sub.j represent different type of data based on measurements of sensors, for example the sensors 302 . . . 310, 352 . . . 360 of the machines 300, 350 of FIG. 3. The instances x.sub.ij of the variables v.sub.j comprising information for predict an instance y of further type of data. The further type of data relating to a characteristic of a physical entity, for example, the physical entities 312, 362 of the machines 300, 350 of FIG. 3.

(22) The method 200 comprises obtaining 202 a first set of data X comprising tuples x.sub.i of instances of data. Each tuple comprises an instance x.sub.ij of each variable v.sub.j of the set of variables S.

(23) More details of the first set of data X, the second set of data Y, the set of variables S, and the reduced set of variables S* have been discussed in the context of FIG. 1. The earlier discussed characteristics of these forms of data also apply to this method. Hereinafter more details of these forms of data are discussed and these details also apply to the device 100 of FIG. 1.

(24) The method 200 comprises obtaining 204 a second set of data Y comprising instances y.sub.i of the further type of data. Each instance y.sub.i of the second set of data Y corresponds to one of the tuples x.sub.i of the first set of data X.

(25) The method 200 comprises searching 206 for a reduced set of variables S.sub.r that is a subset of the set of variables S. The reduced set of variables S.sub.r represents an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, hence, (X.sub.r, Y, S.sub.r). The reduced first set of data X.sub.r comprising reduced tuples xr.sub.i that are based on the tuples x.sub.i of the first set of data X The reduced tuples xr.sub.i only comprising instances x.sub.ij of variables of the reduced set of variables S.sub.r. The optimization function being a combination of a first mutual information value I between the reduced first set of data X.sub.r and the second set of data Y minus a penalty value pv that increases with an increasing number of variables in the reduced set of variables. Hence, (X.sub.r, Y, S.sub.r)=I(X.sub.r, Y)p(|S.sub.r|).

(26) The method 200 optionally comprises providing 220 the reduced set of variables S.sub.r as the selected subset of variables S* to, for example, a data reduction arrangement for reducing the amount of data in the first set of data X.

(27) The method 200 comprises generating 222 the reduced first set of data X.sub.r as discussed in the context of the data reduction device 150.

(28) The method 200 may be implemented in and executed by a general purpose computer by means of loading a program into the computer, and the program is operative to cause the processor of the computer to perform the method 200. The Input/Output interfaces of the computer are used to receive the first set of data, the second set of data, to output the reduced set of variables and/or to output the reduced first set of data. The memory of the computer is used to store the set of data and the sets of variables. The method 200 may also be implemented in and executed by dedicated hardware that is specifically designed to execute the method 200.

(29) Optionally, the searching 206 for the reduced set of variables S.sub.r is based on an iterative searching algorithm. Optionally, the iterative searching algorithm is based on simulated annealing. Several details of the iterative searching algorithm are discussed hereinafter. It is to be noted that other searching algorithms that are capable of finding an at least local maximum of the optimization function can be used as well. Simulated annealing is, in particular, advantageous because it is able to explore many areas of the solutions space and is capable of finding a relatively good local maximum and possibly the global maximum. In particular, in the context of simulated annealing, an imaginary temperature function that decreases (or remains equal) during each iteration is important as well as a probability function to accept whether the solution that is assessed during the current iteration is accepted as a basis for a solution to be assessed in a subsequent iteration. For an example of such an imaginary temperature function and probability function, and also for more information about simulated annealing, a reference is provided to the article: Convergence Theorems for a Class of Simulated Annealing Algorithms on R.sup.d of Claude J. P. Blisle, Journal of Applied Probability, Vol. 29, No. 4 (December, 1992), pp. 885-895. The article of C. J. P. Blisle is herewith incorporated by reference. In particular, this article discusses a Metropolis function for the probability function and the inventors have found that this function leads to good results in the context of this document. In particular, Blisle discusses a logarithmic cooling schedule, for example, the temperature is set to temp/log((floor((t1)/tmax))*tmax+exp(1))), where t is the current iteration step, floor( ) is a function that returns the integer part of a value, and temp and tmax are variables. The inventors have used temp=200 or temp=100 and tmax=10. It has to be noted that other values may be used as well for the variables temp and tmax.

(30) Optionally, during each iteration of the iterative searching algorithm the subsequent data is used: the combination of S.sub.r_best, f.sub.best which are the best reduced set of variables found until the current iteration and its corresponding value of the optimization function, respectively; S.sub.r_accepted which is a reduced set of variables that is accepted at the end of one of the previous iterations as the reduced set of variables that forms the basis for a current or a subsequent proposal of a reduced set of variables that has to be assessed in the current or a subsequent iteration; and S.sub.r_proposal, which is a proposal for the reduced set of variables that must be assessed during the current iteration, and X.sub.r_proposal which is a proposed reduced first set of data that comprises proposed reduces tuples of data x.sub.r_proposed that only comprise instance of variables of the proposed reduced set of variables S.sub.r_proposal. Being assessed means in this context that, in the iteration, a value for the optimization function is determined for the proposed reduced set of variables S.sub.r_proposal and it is checked whether this is the best solution until the current iteration and it is checked whether this is a good solution to be accepted as a basis for a subsequent proposed reduced set of variables.

(31) Optionally, the iterative searching algorithm starts a first iteration with proposing 208 a proposal of the reduced set of variables S.sub.r_proposal comprising a single variable of which the mutual information between the data of the single variable and the second set of data is highest compared to the other variables of the set of variables. This is a greedy first selection of the reduced set of variables to be assessed and is a good starting point for finding an at least local maximum of the optimization function. If it is assumed that the data of the single variable v.sub.j is

(32) D = [ d 1 .Math. d n ] = [ x 1 j .Math. x nj ]
and, as discussed previously, the second set of data

(33) Y = [ y 1 .Math. y n ] ,
then the mutual information is calculated by

(34) I ( Y , D ) = .Math. k = 1 .Math. n .Math. l = 1 .Math. n p ( y k , d l ) log ( p ( y k , d l ) p ( y k ) p ( d l ) ) ,
wherein P(y.sub.k) is the probability that a value of Y is equal to the value of y.sub.k, P(d.sub.1) is the probability that a value of D is equal to the value of d.sub.1, and P(y.sub.k, d.sub.1) is the probability that a value of Y is equal to y.sub.k and a corresponding value of D is equal to d.sub.1. During this initialization of the iterative searching algorithm, one may also set the other data that is used in the subsequent iterations. For example, the best reduced set of variables S.sub.r_best found until this iteration is set to the selected single variable. The value of the optimization function .sub.best belonging to the best reduced set of variables S.sub.r_best may be set to the value of the optimization function for a reduced first set that is based on the best reduced set of variable S.sub.r_best. The accepted reduced set of variables S.sub.r_accepted may be set to the first proposal of the reduced set of variables S.sub.r_proposal.

(35) Optionally, each iteration of the iterative searching algorithm comprises determining 210 the value of the optimization function for the proposed reduced first set of data X.sub.r_proposal that is based on proposed reduced set of variables S.sub.r_proposal. Determining 210 the value of the optimization function for proposed reduced first set of data X.sub.r_proposal includes forming the proposed reduced first set of data X.sub.r_proposal that has reduced tuples x.sub.r_proposal wherein each reduced tuple xr.sub.i_proposal is based on a corresponding tuple x.sub.i of the first set of data X and comprises only instances of variables that are present in the proposed reduced set of variables S.sub.r_proposal. Thus, in the stage of determining 210, (X.sub.r_proposal, Y, S.sub.r_proposal) is determined.

(36) The determining 210 of the value of the optimization function includes determining the mutual information I between the proposed reduced first set of data X.sub.r_proposal and the second set of data Y. Optionally, this determining of the mutual information comprises estimating the mutual information between the proposed reduced first set of data X.sub.r_proposal and the second set of data Y by determining 212 a second mutual information I.sub.2 between the second set of data Y and clustered data Q. The clustered data Q comprises clusters that are based on the proposed reduced first set of data X.sub.r_proposal. Thus, I(X.sub.r_proposal, Y)I.sub.2(Q, Y).

(37) Optionally, the stage of determining 212 the second mutual information I.sub.2 comprises clustering 214 data. Clusters Q are formed that are at least based on the proposed reduced first set of data X.sub.r_proposal. Hereinafter a heuristic is discussed that may be used for clustering the data. The inventors have found that in the hereinafter discussed heuristic maintains a relatively large portion of the mutual information between the proposed reduced first set of data X.sub.r_proposal and the second set of data Y such that I(X.sub.r_proposal, Y)I.sub.2(Q, Y). It is to be noted that embodiments of clustering the data is not limited to this heuristic. In the context of this document, it is important that the clusters are formed such that a large portion of mutual information between the proposed reduced first set of data X.sub.r_proposal and the second set of data Y is maintained. A reference to an alternative algorithm is provided of which it is known that it is capable of maintaining a relatively large amount of mutual information between the proposed reduced first set of data X.sub.r_proposal if the proposed reduced first set of data X.sub.r_proposal is clustered by this algorithm: For example, Cardinal (Quantization with an Information-Theoretic Distortion Measure) describes a method using a modified Lloyd's algorithms to quantize data such that the mutual information between X and Y does not much reduce as the result of the quantizing. The document Quantization with an Information-Theoretic Distortion Measure, Jean Cardinal, Oct. 23, 2002, published by the Universit Libre de Bruxelles on the website http://www.ulb.ac.be/di/publications/RT_2002.html, and also published on the website http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.20.3058, is included by reference.

(38) In the heuristic, the data that is the basis for forming the clusters is based on a combination of the reduced tuples of X.sub.r_proposal and corresponding instances of the second set of data Y. The data that is the basis for the clustering is indicated by Z and, Z=g(X.sub.r_proposal, Y), wherein g is a function that generates data that are based on the reduced tuples of the proposed reduced first set of data X.sub.r_proposal and on the corresponding instances of the second set of data Y. Optionally, Z comprises tuples of instances of data.

(39) Optionally, the total number of tuples of data in Z corresponds to the number of unique tuples in the proposed reduced first set of data X.sub.r_proposal.

(40) Optionally, each tuple in Z starts with one of the unique tuples of the proposed reduced first set of data X.sub.r_proposal and is extended with a value that is based on the instances of data of the second set of data Y that correspond to, belong to, the one of the unique tuples. For example, in the proposed reduced first set of data X.sub.r_proposal there may be several unique tuples and a first unique tuple ux.sub.1 occurs, for example, two times. Then, in the second set of data Y, there are two instances y.sub.k, y.sub.1 of data that correspond to the unique tuple ux.sub.1 and the value that is used to extend the unique tuple ux.sub.1 in the data Z is a function of y.sub.k, y.sub.1. In an embodiment, the value that is going to be added to the unique tuple ux.sub.1 is in the same order as the possible values of the instances in the unique tuple ux.sub.1. For example, if the possible values of the instances in the unique tuple can range from 0 to 10, then the value that is added is normalized towards a value that may also end up between 0 and 10.

(41) In other words, in an embodiment, the data that is going to be used for the clustering may be the subsequent data:

(42) Z = [ ux 1 v 1 .Math. .Math. ux m v m ] ,
wherein m is the number of unique tuples in the proposed reduced first set of data X.sub.r_proposal; wherein ux.sub.1 is one of the unique tuples and forms the beginning of the rows of Z; and wherein v.sub.1 is a value that is added to each row of Z and is based on the instances of data of the second set of data Y that correspond to, belong to, the unique tuple ux.sub.1 of that row. Optionally, the value v.sub.1, assuming that the instances of the second set of data Y may have a binary value (e.g. 0 or 1), may be calculated by: v.sub.1=P (instance of Y y.sub.i=1 only for l.sup.th unique sequence)|S|{square root over (n)}, wherein |S| is the number of variables in the set of variables; wherein n is the square root of the number of rows in the proposed reduced first set of data X.sub.r_proposal (and, thus, also the number of rows in the first set of data X and the number of rows of the second set of data Y, hence, the value of n is the number of observations in the first and second set of data X, Y); wherein is a variable that may be chosen to, for example, normalize the possible values of v.sub.1 to a specific range; and wherein P (instance of Y y.sub.i=1 only for l.sup.th unique sequence) represents the probability that instances of the second set of data Y, which are the instances that correspond to, belong to, the unique tuple ux.sub.1 have the value of 1. In a practical embodiment is larger than 0 and smaller than or equal to 1. If appropriate, a value larger than 1 may also be used.

(43) Then in the stage of clustering 214 the data Z, a number of clusters are formed in which tuples of data Z (e.g. rows of matrix Z) are grouped together on basis of a distance measure. Thus, each cluster represents tuples of data Z that are, according to the distance measure, close to each other. The distance measure may be a Euclidean distance, or may be a distance measure that is based on information theory such as, for example, a KL distortion (sometimes referred to as KL distance). In the stage of clustering, for example, n clusters are formed, wherein n is the above discussed number of rows/observations in the proposed reduced first set of data X.sub.r_proposal. Also another number of clusters may be formed, but it is known from information theory that when n clusters are formed a relatively large amount of mutual information is maintained, while the number of clusters is relatively small. The clustering may be based on a k-means algorithm with uses one of the above examples of a distance measure.

(44) In the subsequent description, the clusters are represented by data Q, which may be a

(45) Q = [ 1 .Math. p ] ,
wherein p is the number of clustersin other words, each cluster may be represented by a tuple number in data Q. Each cluster is indicated by q.sub.i. Subsequently it is important to realize that: each cluster q.sub.i comprises a number of tuples from the data Z, each tuple of Z corresponds to a number of reduced proposed tuples xr.sub.i_proposal in the proposed reduced first set of data X.sub.r_proposal (via the unique tuples of the proposed reduced first set of data X.sub.r_proposal) and, thus, each cluster q.sub.i represents a number of tuples (i.e. observations) of the proposed reduced first set of data X.sub.r_proposal and this number is indicated by |q.sub.i|.

(46) If clustering 214 the data is ready, in the stage of determining 212 the second mutual information I.sub.2(Q, Y), the second mutual information can be calculated by

(47) I ( Q , Y ) = .Math. y .Math. q p ( y , q ) log ( p ( y , q ) p ( y ) p ( q ) ) ,
wherein P(q) represents the probability that a tuple of the proposed reduced first set of data X.sub.r_proposal is represented by, ended up in, segment q, P(y) represents the probability that an instance of the second set of data Y is equal to y, P(y,q) represents the probability a tuple of the proposed reduced first set of data X.sub.r_proposal is represented by segment q and that the corresponding instance of the second set of data Y is equal to y. Thus, P(q)=|q|(total number tuples in proposed reduced first set of data X.sub.r_proposal)=|q|/n. Thus, P(y)=(number occurances of y in Y)/(total number instances in second set of data Y)=(number occurances of y in Y)/n. P(y,q)=(number occurrences of y in segment q)/(total number tuples in proposed reduced first set of data X.sub.r_proposal)=(number occurances of y in segment q)/n.

(48) Optionally, after determining the value of the optimization function for the proposed subset of variables, each iteration of the iterative searching algorithm comprises storing 216 the proposed reduced set of variables S.sub.r_proposal as the best reduced set of variables S.sub.best if the proposed reduced set of variables S.sub.r_proposal provides a best value for the optimization function until the currently executed iteration. This stage includes comparing the value of the optimization function (X.sub.r_proposal, Y, S.sub.r_proposal) as determined in stage 210 with a previously stored best value .sub.best of the optimization function. If (X.sub.r_proposal, Y, S.sub.r_proposal)>.sub.best then S.sub.best=S.sub.r_proposed and .sub.best=(X.sub.r_proposal, Y, S.sub.r_proposal).

(49) Optionally, each iteration of the iterative searching algorithm comprises deciding 218 whether the proposed reduced set of variables S.sub.r_proposal is accepted as a basis for a subsequent proposal of the reduced set of variables to be assessed in a subsequent iteration. If the proposed reduced set of variables S.sub.r_proposal is accepted as the basis, then S.sub.r_accepted=S.sub.r_proposal. In an embodiment, in particular if simulated annealing is used, the earlier described probability function in the article of C. J. P. Blisle (Convergence Theorems for a Class of Simulated Annealing Algorithms on R.sup.d) can be used as an acceptance function. In another embodiment, the acceptance of the proposed reduced set of variables S.sub.r_proposal may be based on comparing the value of the optimization function (X.sub.r_proposal, Y, S.sub.r_proposal) with the previously stored best value .sub.best of the optimization function. For example, if the value of the optimization function (X.sub.r_proposal, Y, S.sub.r_proposal) is larger than the previously stored best value .sub.best minus a threshold value, then the proposed reduced set of variables S.sub.r_proposal is accepted as a basis for a subsequent proposal of the reduced set of variables.

(50) Optionally, each iteration of the iterative searching algorithm comprises deciding 219 whether another iteration has to be made or that the iterations stop and that the best found reduced set of variables S.sub.r_best is considered to be the result of the searching 206 for the reduced set of variables S.sub.r. In specific embodiments, the decision whether the iterations have to stop may be based on the number of iterations performed until that moment in time, the amount of computing power used until that moment in time, or the time spent on the searching for the reduced set of variables S.sub.r. Another stop criterion may be that the iterative searching algorithm administers how large the improvement of the best value for optimization function is during a specific number of previous iterations and if the improvement is below a certain threshold, it may be decided to stop the iterations. In another embodiment, in particular if simulated annealing is used, the iterations may stop if the temperature is below a certain threshold temperature.

(51) Optionally, each iteration of the iterative searching algorithm comprises forming 222 a proposal of the reduced set of variables S.sub.r_proposal to be assessed. To explore areas of the solution space, in many iterations, the proposed reduced set of variables S.sub.r_proposal is different from a previous assessed proposed reduced set of variables S.sub.r_proposal. In an embodiment, the previously accepted set of variables S.sub.r_accepted is either enlarged with a single variable or reduced with one variable. This may be done by randomly selecting one of the variables v.sub.random of the set of variables S. If the selected variable v.sub.random is not yet present in the previously accepted set of variables S.sub.r_accepted, then the proposed reduced set of variables S.sub.r_proposal is equal to previously accepted set of variables S.sub.r_accepted plus the selected variable v.sub.random. If the selected variable v.sub.random is already present in the previously accepted set of variables S.sub.r_accepted, then the proposed reduced set of variables S.sub.r_proposal is equal to previously accepted set of variables S.sub.r_accepted minus the selected variable v.sub.random. If this would result in an empty proposed reduced set of variables S.sub.r_proposal, then another variable v.sub.random2 is randomly selected and the proposed reduced set of variables S.sub.r_proposal comprises one variable, namely the another randomly selected variable v.sub.ramdom2. Randomly selecting the variables from the set of variables S may be based on a uniform selection of one of the variables. In another embodiment, the probability that a specific variable is selected may, in addition or alternatively, depend on the mutual information between the data of that specific variable and the second set of data Y. Thus, the selection probability may be

(52) P select ( v i ) = ( 1 - ) 1 .Math. S .Math. + I ( Y , data of variable v ) .Math. v I ( Y , data of variable v ) ,
wherein is a value selected from the interval [0,1] and that is used to control the contribution of the uniform probabilities and the probabilities based on the mutual information. I(Y, data of variable v) can be determined, as discussed once before in this document by: the data of the single variable is D=(d.sub.1, . . . , d.sub.n) and the second set of data Y=(y.sub.r, . . . , y.sub.n), then the mutual information is calculated by

(53) 0 I ( Y , D ) = .Math. k = 1 .Math. n .Math. l = 1 .Math. n p ( y k , d l ) log ( p ( y k , d l ) p ( y k ) p ( d l ) ) .
If is a value that is close to 1, then the selection of the randomly selected variable v.sub.random strongly depends on the mutual information between the data of that variable and the second set of data Y. This may be advantageous if a variable must be added to the previously accepted reduced set of variables S.sub.r_accepted, because the probability that the value of the optimization function increases if that variable is added is largehowever, this may be disadvantageous if the variable must be removed from the previously accepted reduced set of variables S.sub.r_accepted. Therefore it is wise to use that is closer to the center of the range [0,1] than to the boundaries of the range. In an alternative embodiment, it is first determined whether a variable is added to the previously accepted reduced set of variables S.sub.r_accepted or that a variable is removed from the previously accepted reduced set of variables S.sub.r_accepted this may be done by using uniform probabilities for adding and removing. Subsequently, depending on the selected action, the selection probability is based strongly based on a uniformly distribution probability (for removing) or based on a probability that strongly depends on a mutual information of the data of the variable and the second set of data Y (for adding). In yet another embodiment, it is first randomly determined whether a variable is added or removed (with equal probabilities). Subsequently: if a variable must be added, the random selection of the variable to be added uses probabilities that are strongly based on a mutual information of the data of the variable and the second set of data Y; if a variable must be removed, the probabilities for the random selection may be inverse proportional to this mutual information (e.g. proportional to the conditional entropy for the second set of data given the data of the single variable, H(Y|D)). Also other embodiments of proposing a reduced set of variables S.sub.r_proposal may be usedfor example, embodiments are not limited to adding or removing one variable per iteration.

(54) In line with the above discussed stages of determining 212 a second mutual information I.sub.2 and of clustering 214 data, the searching unit 110 of device 100 may comprise a second mutual information determining unit 106 for determining a second mutual information I2 and a clustering unit 108 for clustering data. The second mutual information determining unit 106 and the clustering unit 108 are configured to perform the function of the above discussed stages of determining 212 a second mutual information I.sub.2 and of clustering 214 data and have similar embodiments with similar effects.

(55) FIG. 4 schematically shows an embodiment of a computer program product 470 which program is operative to cause a processor to perform one of the previously discussed methods. Embodiments also extends to computer program products 470, particularly computer programs 480 on or in a carrier 470, adapted for putting the invention into practice. The computer program product may comprises a computer program 480. The program may be in the form of source code, object code, a code intermediate source and object code such as partially compiled form, or in any other form suitable for use in the implementation of the one of the above discussed methods. It will also be appreciated that such a program may have many different architectural designs. For example, a program code implementing the functionality of the method or device may be subdivided into one or more subroutines. Many different ways to distribute the functionality among these subroutines will be apparent to the skilled person. The subroutines may be stored together in one executable file to form a self-contained program. Such an executable file may comprise computer executable instructions, for example processor instructions and/or interpreter instructions (e.g. Java interpreter instructions). Alternatively, one or more or all of the subroutines may be stored in at least one external library file and linked with a main program either statically or dynamically, e.g. at run-time. The main program contains at least one call to at least one of the subroutines. Also, the subroutines may comprise function calls to each other. An embodiment relating to a computer program product 470 comprises computer executable instructions 480 corresponding to each of the processing steps of at least one of the methods set forth. These instructions may be subdivided into subroutines and/or be stored in one or more files that may be linked statically or dynamically. Another embodiment relating to a computer program product 470 comprises computer executable instructions 480 corresponding to each of the means of at least one of the systems and/or products set forth. These instructions may be subdivided into subroutines and/or be stored in one or more files that may be linked statically or dynamically.

(56) The carrier of a computer program may be any entity or device capable of carrying the program. For example, the carrier may include a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disk. Further the carrier may be a transmissible carrier such as an electrical or optical signal, which may be conveyed via electrical or optical cable or by radio or other means. When the program is embodied in such a signal, the carrier may be constituted by such cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant method.

(57) The computer program 480 may be a computer program for a distributed processor system and may comprise computer code which causes a first processor system to perform a subset of the steps of the above discussed method and which causes a second processor system to perform another subset of the steps of the above discussed method. The subset of steps and the another subset of steps may be mutually exclusive.

(58) In summary, this document provides a device for and a method of selecting a subset of variables from a set of variables are provided. Instances of the plurality of variables comprise information to predict an instance of a further type of data. The device comprises a first data set unit, a second data set unit and a searching unit. The first data set unit obtains a first set comprising tuples of instances of data. The second data set unit obtains a second set comprising instances of the further type of data. Each instance of the second set corresponds to one of the tuples of the first set. The searching unit obtains a reduced set of variables that represents an at least local optimum of an optimization function being a combination of a first mutual information value between the reduced first set and the second set and a penalty value being based on a number of variables in the reduced set of variables.

(59) Further Embodiments are Defined in the Subsequent Clauses: 1. A device (100) for selecting a subset of variables from a set of variables comprising a plurality of variables, the plurality of variables representing different types of data, instances of the plurality of variables comprising information to predict an instance of a further type of data, the further type of data relating to a characteristic of a physical entity, the device comprising:

(60) a first data set unit (102) for obtaining a first set of data comprising tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables,

(61) a second data set unit (104) for obtaining a second set of data comprising instances of the further type of data, wherein each instance of the second set of data corresponds to one of the tuples of the first set of data,

(62) a searching unit (110) for obtaining a reduced set of variables being a subset of the set of variables, the reduced set of variables representing an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, the reduced first set of data comprising reduced tuples obtained from the tuples of the first data set and the reduced tuples only comprising instances of the variables of the reduced set of variables, the optimization function being based on a combination of a first mutual information value between the reduced first set of data and the second set of data and a penalty value that increases with an increasing number of variables in the reduced set of variables, and

(63) an output (112) for providing the reduced set of variables as the selected subset of variables to, for example, a data reduction arrangement for reducing the amount of data in the first set of data. 2. A method (200) of selecting a subset of variables from a set of variables comprising a plurality of variables, the plurality of variables representing different types of data, instances of the plurality of variables comprising information for predict an instance of a further type of data, the further type of data relating to a characteristic of a physical entity, the method comprising:

(64) obtaining (202) a first set of data comprising tuples of instances of data, wherein each tuple comprises an instance of each variable of the set of variables,

(65) obtaining (204) a second set of data comprising instances of the further type of data, wherein each instance of the second set of data corresponds to one of the tuples of the first set of data,

(66) searching (206) for a reduced set of variables being a subset of the set of variables, the reduced set of variables representing an at least local optimum of an optimization function being a function of a reduced first set of data, the second set of data and the reduced set of variables, the reduced first set of data comprising reduced tuples obtained from the tuples of the first data set and the reduced tuples only comprising instances of the variables of the reduced set of variables, the optimization function being based on a combination of a first mutual information value between the reduced first set of data and the second set of data and a penalty value that increases with an increasing number of variables in the reduced set of variables, and

(67) providing (220) the reduced set of variables as the selected subset of variables to, for example, a data reduction arrangement for reducing the amount of data in the first set of data. 3. A method (200) of selecting a subset of variables according to clause 2, wherein the first mutual information value between the reduced first set of data and the second set of data is estimated by determining (212) a second mutual information value between the second set of data and clustered data, wherein the clustered data comprises clusters that are derived from the reduced first set of data. 4. A method (200) of selecting a subset of variables according to clause 3, wherein the clusters are derived from a combination of the reduced first set of data and the second set of data. 5. A method (200) of selecting a subset of variables according to clause 4, wherein the clusters represents tuples of data that are a combination of a specific tuple of the reduced first set of data that is extended with a value that is derived from an instance from the second set of data that corresponds to the specific tuple of the reduced first set of data. 6. A method (200) of selecting a subset of variables according to clause 5, wherein the clusters are formed on basis of data that comprises unique tuples of the reduced first set of data that are extended with a value that is derived from the instances from the second set of data that correspond to the respective unique tuple. 7. A method (200) of selecting a subset of variables according to any one of the clauses 3 to 6, wherein the clustered data is obtained by a k-means clustering algorithm. 8. A method (200) of selecting a subset of variables according to clause 7, wherein the k-means clustering algorithms uses a Euclidean distance measure to determine distances between the tuples to be clustered. 9. A method (200) of selecting a subset of variables according to any one of the clauses 2 to 8, wherein searching (206) for the reduced set of variables representing the at least local optimum of the optimization function is performed by applying an iterative search method to find the at least local maximum to find the reduced set of variables that has a relatively large value for the optimization function. 10. A method (200) of selecting a subset of variables according to clause 9, wherein the iterative search method is based on simulated annealing. 11. A method (200) of selecting a subset of variables according to any one of the clauses 9 or 10, wherein each iteration of the iterative search method comprising

(68) forming (222) a proposal of the reduced set of variables to be assessed,

(69) determining (210) the value of the optimization function that is provided by the proposed reduced set of variables,

(70) storing (216) the proposed reduced set of variables as a best reduced set of variables if the proposed reduced set of variables provides a best value for the optimization function until the currently executed iteration,

(71) deciding (218) whether the proposed reduced set of variables is accepted as a basis for a subsequent proposal of the reduced set of variables to be assessed in a subsequent iteration. 12. A method (200) of selecting a subset of variables according to any one of the clauses 9 or 11, wherein the iterative search method starts (208) a first iteration with a proposal of the reduced set of variables comprising a single variable of which the mutual information between that the data of the single variable and the second set of data is highest compared to the other variables of the set of variables. 13. A method (200) of selecting a subset of variables according to any one of the clauses 11 or 12 in so far clause 12 refers to clause 11, wherein, in the second or later iterations, the proposal of the reduced set of variables to be assessed is formed by extending or reducing the last accepted reduced set of variables, wherein one of the variables of the set of variables is randomly selected, the randomly selected variable being added to the last accepted reduced set of variables if the randomly selected variable is not yet present in the last accepted reduced set of variables, and the randomly selected variable is removed from the last accepted reduced set of variables if the randomly selected variable is present in the last accepted reduced set of variables. 14. A method (200) of selecting a subset of variables according to clause 13, wherein the randomly selected variable is obtained by a random selection that is based on a probability for each variable that is a sum of a first portion of a uniform probability value for all variables and a second portion of a non-uniform probability value derived from the mutual information between the specific variable and the second set of data. 15. Computer program product (470) for selecting a subset of variables from a set of variables comprising a plurality of variables, which program (280) is operative to cause a processor to perform one of the methods according to one of the clauses 2 to 14.

(72) It is to be noted that the invention may be implemented in hardware and/or software, using programmable components.

(73) It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional units and processors. However, it will be apparent that any suitable distribution of functionality between different functional units or processors may be used without deviating from the invention. For example, functionality illustrated to be performed by separate units, processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization. The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these.

(74) It is noted, that in this document the word comprising does not exclude the presence of other elements or steps then those listed and the word a or an preceding an element does not exclude the presence of a plurality of such elements, that any reference signs do not limit the scope of the claims, that the invention may be implemented by means of both hardware and software, and that several means or units may be represented by the same item of hardware or software, and a processor may fulfill the function of one or more units, possibly in cooperation with hardware elements. Further, the invention is not limited to the embodiments, and the invention lies in each and every novel feature or combination of features described above or recited in mutually different dependent claims.