UNSUPERVISED STATISTICAL METHOD FOR MULTIVARIATE IDENTIFICATION OF ATYPICAL SENSORS

20230039304 · 2023-02-09

    Abstract

    A method for identifying atypical sensors measuring characteristics of individuals. Curves of characteristic of individuals are collected, the curves being measured by each sensor. For a given sensor, a reference curve is processed to calculate a dissimilarity index between the reference curve and each of the other curves of the sensor and the dissimilarity processing is iteratively repeated for each curve resulting from the same sensor to obtain the dissimilarity index for each curve. The dissimilarity processing is repeated for the other sensors to obtain a table of dissimilarity indices. An atypicality index is calculated for each individual from a multivariate statistical processing of the tables. Atypical individuals and atypical sensors are identified.

    Claims

    1-13. (canceled)

    14. A method for identifying at least one atypical sensor from a plurality of sensors measuring a set of characteristics of a set of individuals from a population of events, implemented by a computer software executed by a processor, comprising: collecting, for each sensor, data curves measured by said each sensor, each data curve being representative of a characteristic of an individual; processing, for a given sensor, a reference data curve to calculate a dissimilarity index, which is representative of a distance between the reference data curve and each of other data curves from the given sensor; a first iteration, wherein the step of processing is iteratively repeated for each curve resulting from the given sensor, so as to obtain a dissimilarity index for said each curve of the given sensor; a second iteration, wherein the steps of collecting and processing are performed for other sensors, so as to obtain, for said each of the other sensors, a table of dissimilarity indices of the respective data curves thereof with a set of the other data curves thereof; calculating an atypicality index for each individual from a multivariate statistical processing on all or part of the tables of dissimilarity indices resulting from the second iteration; identifying at least one atypical individual depending on the calculated atypicality indices; and identifying at least one atypical sensor, by performing a statistical processing depending on the dissimilarity indices calculated for said each sensor and depending on said at least one atypical individual identified.

    15. The method of claim 14, wherein the step of processing further comprises: subtracting the reference data curve successively from said each of the other data curves of the given sensor, so as to obtain difference curves; squaring the difference curves to provide resulting curves; adding the resulting curves to obtain a single sum curve; and determining the dissimilarity index of the reference data curve as being equal to a square root of the average of the single sum curve.

    16. The method of claim 14, wherein the step of processing comprises calculating correlation coefficients between the reference data curve and said each of other data curves generated by the given sensor; and calculating an average of the correlation coefficients.

    17. The method of claim 14, wherein the step of processing comprises calculating a multivariate dissimilarity index by applying an abnormality detection method to values of measurements of a curve relative to that of said each of the other curves.

    18. The method of claim 14, further comprising, upstream of the step of processing, a preliminary step of preparing data before processing is performed, in which the data curves are time scaled to a same temporality so that all data curves have a same number of points and aligned on same indices.

    19. The method of claim 14, wherein data originate from said plurality of sensors integrated in an equipment for producing electronic components and are representative of physical parameters.

    20. The method of claim 14, wherein data originate from said plurality of sensors integrated into an aircraft for performing flight tests and are representative of physical parameters.

    21. The method of claim 14, wherein data originate from said plurality of sensors measuring physiological parameters.

    22. The method of claim 14, wherein data originate from said plurality of sensors generating spectral data.

    23. The method of claim 14, applied to an image, wherein the image is characterised by a matrix of pixels, and said plurality of sensors measures a level of colors, grey, blue, red or green, for each pixel.

    24. The method of claim 14, applied to a hyper-spectral image, wherein the hyper-spectral image is characterised by a matrix of pixels, each pixel being characterised by a wavelength, and said each sensor senses a given wavelength.

    25. A computer program product comprising program code instructions, when executed by one or more processors, configure the one or more processors to implement the method of claim 14.

    26. A computer memory storing the computer program product of claim 25.

    Description

    BRIEF DESCRIPTION OF FIGURES

    [0041] The invention will be better understood on reading the following description, given by way of non-limiting example, and made with reference to the figures which represent:

    [0042] FIG. 1 is a flowchart of the method according to the present invention,

    [0043] FIG. 2 is a table of three measurement cycles carried out by a sensor,

    [0044] FIG. 3 shows the curves corresponding to the measurement cycles of FIG. 2,

    [0045] FIG. 4 is a table of values representative of the distance between each of the curves with all other curves generated by the same sensor, said table originating from a first iteration, and

    [0046] FIG. 5 is a table of dissimilarity indices of the curves from a set of two sensors, said table originating from a second iteration step.

    DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION

    [0047] The present invention is implemented by computer software executed by a calculator, such as a processor, of a computer.

    [0048] The present description is given by way of non-limiting example, each characteristic of an embodiment being able to be combined with any other characteristic of any other embodiment in an advantageous manner.

    [0049] As previously indicated, the present invention aims at a method for identifying sensors called atypical sensors from a plurality of sensors, the steps of which are represented in FIG. 1.

    [0050] Said sensors measure a set of characteristics of a set of individuals from a population of events. The characteristics can in particular be physical characteristics such as the pressure, the temperature, the luminosity. Thus, each individual is characterised by characteristics which can be different, and are measured by said sensors. In other words, each sensor can measure a different characteristic of the individual.

    [0051] Individuals can be individuals in the statistical sense. They can be events or parts thereof.

    [0052] The events can be varied. It can be, still without limitation, a flight test, a step of manufacturing a component or a system, the monitoring of a given space by a monitoring system.

    [0053] Thus, the plurality of sensors can comprise pressure, temperature, humidity, luminosity, current, displacement, image sensors, etc. The nature of each sensor can be different, both in terms of the type of data measured and in terms of measurement technology. For example, for a set of pressure sensors, there may be sensors based on strain gauges, and/or capacitive pressure sensors, and/or piezoresistive pressure sensors, and/or even resonant pressure sensors.

    [0054] The sensors can be of the analog or digital type.

    [0055] The data from the sensors may relate to the same physical parameter, or different physical parameters. For example, a first sensor can measure pressures, and a second temperatures. In a preferred embodiment, the physical parameters measured by the sensors are different. Indeed, this allows having more information on the production equipment.

    [0056] These measurements are generally carried out with one or more measurement points per second: at the end of a measurement cycle, for each sensor, a data curve, called curve, is available. This curve can be temporal. More specifically, each curve relates to a characteristic of an individual and can comprise, on the ordinate, a dimension specific to the parameter measured by the sensor and, on the abscissa, a temporal dimension.

    [0057] The sensors are in the present non-limiting example of the invention integrated into a physical environment.

    [0058] A sensor called atypical sensor can mean that said sensor has a problem in its operation and is dysfunctional. It can also mean that it measures a characteristic of a set of individuals among a population of events which is atypical in the sense that this characteristic has a behaviour that does not correspond to the expected or usual behaviour of the characteristic of such an event.

    [0059] This atypicality can highlight a failure of a system that impacts the magnitudes of the characterised environment. It can also highlight the appearance of an unforeseen event or a non-conformity in this environment, for example the appearance of a foreign body when monitoring a space.

    [0060] The event, or the individual, measured by the sensor in which the atypicality is highlighted is also considered as atypical.

    [0061] The first step of the method is a step 50 of collecting, for each sensor, curves of data measured by said sensor, each curve of data being representative of a characteristic of an individual.

    [0062] For example, FIG. 2 represents a table of data resulting from three cycles of measurements C1, C2 and C3 carried out by a sensor and FIG. 3 represents the corresponding curves.

    [0063] The data characterising the curves collected in the present invention can result from sensors integrated in equipment for producing electronic components, for example semiconductors. These sensors can provide a set of measurements of physical parameters, such as temperature, pressure, humidity, etc. Thus, identifying one or more sensors that have generated atypical curves can highlight that the products that were being manufactured by said production equipment when the sensor(s) generated these atypical curves are likely to have one or more non-conformities. Thus, the invention may constitute means for carrying out predictive maintenance. Also, the method can allow highlighting one or more defective sensors.

    [0064] The curves can alternatively result from sensors integrated into aircraft for carrying out flight tests and intended to measure hundreds of physical parameters. In this implementation, the sensors can be integrated into a system or subsystem of an aircraft, or in its surroundings. In this application, a curve of a sensor corresponds to a cycle of measurements which are carried out during a flight. In this implementation, the method can highlight that the system or the subsystem monitored by said sensor has a non-nominal behaviour. Also, the method can allow highlighting one or more defective sensors.

    [0065] Still alternatively, the data can originate from sensors intended to measure physiological parameters of a subject. They can then be positioned on the subject or close thereto. In this mode of application, without limitation, the series of sensors can comprise movement sensors, or even sensors allowing producing an electroencephalogram. In this implementation, the method can highlight that the subject may have predispositions to a given disease or disorder. Also, the method can allow highlighting one or more defective sensors.

    [0066] A set of curves can also represent a set of images. These images can be obtained by a CMOS (“Complementary Metal-Oxide-Semiconductor”) sensor or a CCD (“Charge-Coupled Device”) sensor, which can originate from a camera, such as a thermal camera. Indeed, an image can be characterised by a matrix of pixels. Each pixel can be characterised by a level of grey if the image is in black and white, or by a level of green, blue and red if the image is in colour. In the case of a black and white image, the sensor measures a level of grey. In the case of a colour image, three sensors measure the level of red, the level of green and the level of blue of each pixel. The individuals can be all or part of an image. Preferably, each individual is a row or a column of pixels. For a black and white image, a measured curve is then a level of grey depending on the pixels of all or part of an image. For a colour image, a curve is then a level of red, blue or green depending on the pixels of all or part of an image.

    [0067] The method can allow detecting an atypical zone in a hyper-spectral image. The hyper-spectral image can be defined by a matrix of pixels, each of which being characterised by a wavelength. Each sensor is configured to sense a given wavelength. Individuals can be all or part of the image. Preferably, each individual is a row or a column of pixels. Each curve is a wavelength depending on the pixel of all or part of a hyper-spectral image.

    [0068] The detection of abnormal images in a set of images is advantageously a field of application of the invention, for example, to identify in an unsupervised manner, “abnormal” elements in a video or on satellite images repeated over time.

    [0069] The method includes a curve processing step 100 in which, for a given sensor, all curves resulting from said sensor are recovered in order to calculate, for a considered curve called “reference curve”, an index representative of the distance between said reference curve and each of the other curves generated by this sensor.

    [0070] This step is described below in three embodiments, each of which implements different manners of obtaining this index.

    [0071] As described below, this index can take the form of a dissimilarity index.

    [0072] Advantageously, the dissimilarity index allows facilitating the evaluation of the atypical character of a curve.

    [0073] In a first embodiment of the invention, the curve processing step is implemented as follows.

    [0074] For a given sensor, a curve is selected, as previously mentioned, this curve is called “reference curve”. This reference curve is characterised by the data of one of the columns C1 to C3 of FIG. 2.

    [0075] This reference curve is successively subtracted from each of the other curves, point to point. By way of example, as shown in FIG. 2, if C1 is the reference curve, the data thereof, at each instant t, are successively subtracted from the data at the same instants t of the curves C2 and C3.

    [0076] More specifically, for a given sensor, on a set of N processed curves, for each reference curve, it results in N curves called “difference curves”, of size identical to the reference curves. Each difference curve is then squared, also point to point.

    [0077] The squared difference curves are added to obtain a single sum curve; the last operation consisting in averaging this sum curve and taking the square root thereof. At the end of these operations, an index representative of the distance between said reference curve and each of the other curves generated by the sensor is obtained, as shown in the table of FIG. 4. This index is called herein “dissimilarity index”.

    [0078] In the table of FIG. 4, it can be seen that the dissimilarity index of the curve C3 is abnormally high relative to the dissimilarity indices of the curves C1 and C2, from which the method according to the invention deduces that C3 has an atypical behaviour compared to C1 and C2.

    [0079] To summarise, the processing step 100 consists in calculating the square root of the average of the sum of the squared deviations, as an index of dissimilarity associated with a curve obtained by a sensor.

    [0080] Thus, thanks to the method according to the present invention, the information relating to the difference between the reference curve and the set of the other curves is aggregated into a single value, the dissimilarity index.

    [0081] Generally, the curve processing step according to the first embodiment of the invention can be formulated as follows, with $N$ the number of curves, $P$ the number of sensors, $n \in \{1, \dots, N\}$, $p \in \{1, \dots, P\}$, and $C_{p,n}$ the $n$-th curve of the sensor $p$:

    [0082] For all curves $C_{p,k}$ with $k \in \{1, \dots, N\}$, the point-to-point squared difference is performed:


    $(C_{p,n} - C_{p,k})^{2}$  [Equation 1]

    [0083] The point-to-point sum is then performed:


    $S = \sum_{k=1}^{N} (C_{p,n} - C_{p,k})^{2}$  [Equation 2]

    [0084] Finally, a dissimilarity index is obtained by calculating the square root of the average of all points of the sum curve:


    $\varphi = \sqrt{\operatorname{average}(S)}$  [Equation 3]

    [0085] These processing operations are then repeated for each curve n resulting from the same sensor, iteratively during a first iteration step 200. In other words, each curve resulting from said sensor successively becomes the reference curve. At the end of this step, a dissimilarity index for each of the curves is therefore obtained, as represented in FIG. 4.
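    The first embodiment (Equations 1 to 3) and the first iteration step 200 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `rms_dissimilarity` and `dissimilarity_indices` and the sample values are assumptions made for the example, and curves are assumed to be equal-length lists of numbers.

```python
import math


def rms_dissimilarity(curves, n):
    """Dissimilarity index of curve n against all curves of one sensor."""
    ref = curves[n]
    # Equations 1 and 2: point-to-point sum over k of the squared differences
    # (the k == n term contributes zero).
    sum_curve = [
        sum((ref[t] - other[t]) ** 2 for other in curves)
        for t in range(len(ref))
    ]
    # Equation 3: square root of the average of the sum curve.
    return math.sqrt(sum(sum_curve) / len(sum_curve))


def dissimilarity_indices(curves):
    """First iteration (step 200): each curve becomes the reference in turn."""
    return [rms_dissimilarity(curves, n) for n in range(len(curves))]


# Hypothetical data: three cycles of one sensor, the third deviating.
c1 = [1.0, 2.0, 3.0, 4.0]
c2 = [1.0, 2.0, 3.0, 4.0]
c3 = [3.0, 4.0, 5.0, 6.0]
indices = dissimilarity_indices([c1, c2, c3])
```

    With these sample values, the index of the third curve is larger than that of the first two, which is the pattern the description associates with an atypical curve.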

    [0086] In a second exemplary embodiment of the invention, the curve processing step 100 is implemented as follows.

    [0087] For a given sensor and a given reference curve, the correlation coefficient is calculated between the reference curve and each of the other curves generated by said sensor. This correlation coefficient is obtained by any method known as such to the person skilled in the art, such as the Bravais-Pearson method, the Spearman coefficient method, etc.

    [0088] More specifically, for a given sensor, over a set of N processed curves, for each reference curve, it results in N-1 correlation coefficients.

    [0089] For each of the curves, the average of the correlation coefficients is then calculated. This results in an index representative of the distance between the reference curve and each of the other curves generated by the sensor. This index is referred to herein as the “dissimilarity index”.

    [0090] Analogously to the processing step 100 described for the first embodiment of the invention, the information relating to the deviation between the reference curve and the set of the other curves is advantageously aggregated into a single value.

    [0091] Generally, the curve processing step 100 according to the second embodiment of the invention can be formulated as follows, with $N$ the number of curves, $P$ the number of sensors, $n \in \{1, \dots, N\}$, $p \in \{1, \dots, P\}$, and $k \neq n$:

    [0092] The correlation coefficient between curve $n$ and each of the other $N-1$ curves is calculated:


    $\operatorname{corr}(C_{p,n}, C_{p,k}), \quad n \neq k$  [Equation 4]

    [0093] The average correlation for a given parameter $p$ and a curve $n$ is calculated:


    $\varphi = \operatorname{average}_{k \neq n}\left[\operatorname{corr}(C_{p,n}, C_{p,k})\right]$  [Equation 5]

    [0094] Also, these processing operations are then iteratively repeated for each curve resulting from the same sensor, during a first iteration step 200. At the end of this step, a dissimilarity index for each of the curves is therefore obtained, as represented in FIG. 3 and FIG. 4.
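    The second embodiment can be sketched in the same way. The example below assumes the Bravais-Pearson coefficient, one of the options named above, and assumes that no curve is constant (a constant curve has zero variance and would cause a division by zero). The names `pearson` and `mean_correlation_indices` and the sample curves are illustrative only.

```python
import math


def pearson(x, y):
    """Bravais-Pearson correlation coefficient of two equal-length curves."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither curve is constant


def mean_correlation_indices(curves):
    """Equations 4 and 5 for every reference curve n (first iteration)."""
    indices = []
    for n, ref in enumerate(curves):
        # N-1 coefficients: the reference against each other curve.
        coeffs = [pearson(ref, other) for k, other in enumerate(curves) if k != n]
        indices.append(sum(coeffs) / len(coeffs))
    return indices


# Hypothetical data: the third curve runs opposite to the other two.
curves = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
corr_indices = mean_correlation_indices(curves)
```

    Note that, unlike the index of the first embodiment, an average correlation is lowest for the curve that differs most from the others, so any threshold applied to it would select values below, rather than above, the threshold.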

    [0095] In a third exemplary embodiment, the processing step 100 comprises a step of calculating a multivariate dissimilarity index obtained for example with an abnormality detection method known as such to the person skilled in the art, such as the Mahalanobis distance method, applied to the values of the measurements of a curve relative to that of each of the other curves; the time points then being the variables in the statistical sense of the term.

    [0096] Generally, the curve processing step 100 according to the third embodiment of the invention can be formulated as follows, with $N$ the number of curves, $P$ the number of sensors, $n \in \{1, \dots, N\}$, $p \in \{1, \dots, P\}$, and $M_{p}$ the matrix containing the set of the curves for the parameter $p$:

    [0097] All dissimilarity indices for the $N$ curves are calculated in a multivariate way by the chosen abnormality detection method $F$, the time points then being the variables in the statistical sense of the term:


    $\Phi = (\varphi_{1}, \dots, \varphi_{N}) = F[M_{p}]$  [Equation 6]

    [0098] This step is repeated for each of the curves of a sensor. More particularly, steps 100 and 200 are combined during this calculation.

    [0099] At the end of this step, a dissimilarity index for each of the curves is therefore obtained, as represented in FIG. 4.
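    As one possible reading of Equation 6, assuming the Mahalanobis distance is the chosen method, each curve $n$ of the sensor $p$ can be treated as a vector $x_n$ of $T$ time points. The symbols $x_n$, $T$, $\mu$ and $\Sigma$ are introduced here for illustration only:

\[
\varphi_n = \sqrt{(x_n - \mu)^{\top} \, \Sigma^{-1} \, (x_n - \mu)},
\qquad
\mu = \frac{1}{N} \sum_{k=1}^{N} x_k,
\qquad
\Sigma = \frac{1}{N-1} \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^{\top}
\]

    Because the sample covariance $\Sigma$ has rank at most $N-1$, it is not invertible when there are at least as many time points as curves; in that case a pseudo-inverse or a regularised covariance would have to be substituted, a choice the description leaves open.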

    [0100] During a second iteration step 300, the curve processing step 100, which comprises the first iteration step 200, is carried out for the curves from the other sensors so as to obtain the dissimilarity index of each of the curves, for each of these sensors. This second iteration step is represented in FIG. 5.
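    The second iteration step 300 can be sketched as a loop over sensors that assembles one table of dissimilarity indices, with one row per individual and one column per sensor. The sketch assumes the index of the first embodiment, a dictionary keyed by hypothetical sensor names, and the same ordering of individuals for every sensor; none of these details are specified by the description.

```python
import math


def rms_indices(curves):
    """One dissimilarity index per curve of a single sensor (first embodiment)."""
    n_points = len(curves[0])
    indices = []
    for ref in curves:
        # Sum of squared point-to-point differences against every curve.
        total = sum(
            (ref[t] - other[t]) ** 2 for other in curves for t in range(n_points)
        )
        indices.append(math.sqrt(total / n_points))
    return indices


def dissimilarity_table(curves_by_sensor):
    """Step 300: rows are individuals, columns are sensors."""
    columns = [rms_indices(curves) for curves in curves_by_sensor.values()]
    return [list(row) for row in zip(*columns)]


# Hypothetical data: three individuals measured by two sensors.
curves_by_sensor = {
    "pressure": [[1.0, 1.0], [1.0, 1.0], [3.0, 3.0]],
    "temperature": [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]],
}
table = dissimilarity_table(curves_by_sensor)
```

    Here the third individual stands out on the pressure column only, while the temperature column is uniformly zero.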

    [0101] The method also comprises a step 400 of calculating an atypicality index for each individual. Said atypicality index may be unique for each individual. It can be determined from a multivariate statistical processing on all or part of the tables of dissimilarity indices resulting from the second iteration step 300. For example, the multivariate statistical processing can comprise the application of a multivariate statistical algorithm for non-functional data known to the person skilled in the art, for example, the Mahalanobis distance, the Hotelling T.sup.2, etc. These multivariate statistical methods generally require a processing of the table of dissimilarity indices resulting from the iteration step 300 in order to obtain one or more values therefrom in a format allowing them to be compared with a threshold. The multivariate statistical methods also require the predetermination of a statistical threshold, aiming at selecting the values above (respectively below) said threshold, and at considering them as atypical. Conversely, the values below (respectively above) said threshold will be considered as typical. The threshold can be refined depending on the desired detection sensitivity. The threshold can thus be based on a predefined maximum acceptable rate of atypical curves.

    [0102] The method comprises a step of identifying atypical individuals 600, from the population of events. This is carried out from the calculated atypicality indices. They can then be compared with said threshold.
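    Steps 400 and 600 can be illustrated with a deliberately simplified statistic. The sketch below does not compute a full Mahalanobis distance: it uses a standardised Euclidean distance, which coincides with the Mahalanobis distance only under the assumption that the indices of different sensors are uncorrelated. The function names, the sample table and the threshold value of 2.0 are all hypothetical.

```python
import math
import statistics


def atypicality_indices(table):
    """Standardised distance of each individual (row) from the column means."""
    columns = list(zip(*table))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a column with zero spread carries no
    # information, so 1.0 is used to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, stdevs)))
        for row in table
    ]


def atypical_individuals(table, threshold):
    """Step 600: indices of the rows whose atypicality exceeds the threshold."""
    scores = atypicality_indices(table)
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical table of dissimilarity indices: five individuals, two sensors.
table = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [6.0, 7.0]]
flagged = atypical_individuals(table, threshold=2.0)
```

    In practice the threshold would be set from the acceptable rate of atypical curves mentioned above rather than fixed by hand.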

    [0103] The method then comprises a step of identifying at least one atypical sensor. For this, a statistical processing is carried out, depending on the dissimilarity indices calculated for said sensor and depending on the atypical individual identified in the previous step.

    [0104] In order to identify the sensors generating atypical curves, a statistical processing is carried out. For each sensor, the dissimilarity index of the curve characterising the identified atypical individual can be compared to the average of the dissimilarity indices of the other curves from this sensor, which characterise the other individuals. This comparison can, for example, yield a distance expressed as a number of standard deviations from the average.

    [0105] Thus, the greater the distance, the more the sensor is involved in the atypicality of the individual. On the contrary, the lower the distance, the less the sensor is involved in the atypicality of the individual.
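    The per-sensor comparison described above can be sketched as follows, for one individual already identified as atypical. The function name `sensor_involvement`, the use of the population standard deviation and the handling of a zero spread are assumptions of this example.

```python
import statistics


def sensor_involvement(table, individual):
    """Per-sensor distance, in standard deviations, for one atypical individual."""
    scores = []
    for column in zip(*table):
        # Indices of the other individuals for this sensor.
        others = [v for i, v in enumerate(column) if i != individual]
        mean = statistics.fmean(others)
        spread = statistics.pstdev(others)
        deviation = abs(column[individual] - mean)
        if spread > 0:
            scores.append(deviation / spread)
        else:
            # No spread among the other individuals: any deviation is unbounded.
            scores.append(float("inf") if deviation > 0 else 0.0)
    return scores


# Hypothetical table: five individuals (rows), two sensors (columns);
# the last individual is assumed to have been flagged as atypical.
table = [[1.0, 5.0], [3.0, 5.2], [1.0, 4.8], [3.0, 5.0], [12.0, 5.1]]
involvement = sensor_involvement(table, individual=4)
most_involved = involvement.index(max(involvement))
```

    With these values the first sensor is far more involved than the second, so its curves would be the ones to plot and inspect first.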

    [0106] The curves of the most involved sensors can be plotted and compared to the curves of non-atypical individuals resulting from the same sensors.

    [0107] Thereafter, a more accurate cause of the atypicality of the sensors can be highlighted. For example, if a number greater than a predetermined threshold of curves from an atypical sensor are atypical, the sensor can be considered as faulty.

    [0108] In the case where a sensor is identified as faulty, it can be disabled, either manually or automatically. It can then be reset or changed.

    [0109] If the number of atypical curves from an atypical sensor is less than a given threshold, the individual can be considered as faulty.

    [0110] For the proper operation of the algorithm implemented by the present invention, in these three embodiments of the invention, all curves must have the same number of points and must be aligned on the same indices, in particular, the same time dimension, or the same spectrum.

    [0111] To this end, the method according to the invention preferably includes, before carrying out the processing step 100, a preliminary step of preparing the data before processing so that they can be directly exploited by the algorithm, in which the curves are time scaled such that they all respect the same temporality. In other words, in this data preparation step, the curves are all aligned with one another, for example with a method known per se to the person skilled in the art, such as the dynamic time warping method.
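    As a much simpler stand-in for dynamic time warping, the sketch below brings curves to a common number of points by linear interpolation over evenly spaced indices. It only equalises curve lengths and does not warp the time axis to align features, so it is an assumption-laden simplification rather than the method named above. The function name `resample` and the sample curves are illustrative.

```python
def resample(curve, n_points):
    """Linearly interpolate a curve onto n_points evenly spaced indices."""
    if n_points < 2 or len(curve) < 2:
        raise ValueError("need at least two points on both sides")
    last = len(curve) - 1
    out = []
    for i in range(n_points):
        # Fractional position of the i-th output point on the input curve.
        pos = i * last / (n_points - 1)
        lo = min(int(pos), last - 1)
        frac = pos - lo
        out.append(curve[lo] * (1 - frac) + curve[lo + 1] * frac)
    return out


# Hypothetical curves of different lengths brought to four points each.
short = [0.0, 10.0]
long = [1.0, 2.0, 3.0, 4.0, 5.0]
aligned = [resample(c, 4) for c in (short, long)]
```

    After this step every curve has the same number of points on the same indices, which is the precondition stated above for the three embodiments.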

    [0112] Advantageously, the invention constitutes a predictive maintenance tool because it allows anticipating a problem on a production machine.

    [0113] Furthermore, it allows, depending on its field of application, detecting atypical aircraft flights which may reveal, for example, abnormal behaviour of an aircraft component, a failure of one or more sensors, etc.

    [0114] In another field, the present invention allows predicting the appearance of symptoms, such as tremor or blockage episodes of a patient suffering from Parkinson's disease. Similarly, the data can come from an electroencephalogram or an electrocardiogram, which can reveal a pathology if atypical curves are detected.

    [0115] An example of a computer program product is now described. Said program product comprises program code instructions which, when executed by one or more processors, configure the processor(s) to implement the method according to any of the embodiments described above.

    [0116] An example of a computer memory is now described. Said computer memory stores the aforementioned computer program product. By way of example, it can be a USB key, a hard drive or even cloud storage.

    [0117] More generally, it should be noted that the implementations and embodiments considered above have been described by way of non-limiting examples, and that other variants are consequently possible.