METHOD FOR DETERMINING THE LEVEL OF UNUSUALNESS OF INDIVIDUALS, IN PARTICULAR IN ORDER TO STATISTICALLY DETECT UNUSUAL INDIVIDUALS IN A MULTIVARIATE CONTEXT
20230018313 · 2023-01-19
Abstract
A method for determining the level of unusualness of individuals, in particular in order to statistically detect unusual individuals in a set of previously gathered data resulting from measurements of parameters of individuals taken by a plurality of measuring systems. The data is pre-processed, and a multivariate unusualness index is determined on the set of measurements for each individual based on the pre-processed data, the index being transformed by a function so as to be between 0 and 1. Unusual individuals are then identified.
Claims
1-11. (canceled)
12. A method for identifying unusual electronic components in a sample of electronic components, implemented by a processor-based computer, comprising: statistically detecting the unusual electronic components in a set of previously gathered data on parameters or physical characteristics of the electronic components, derived from measurements of parameters or of the physical characteristics of the electronic components performed by a plurality of measuring systems; pre-processing the data to provide pre-processed data; determining a multivariate unusualness index from the pre-processed data, the multivariate unusualness index being transformed by a function so as to be between 0 and 1, on all measurements for each electronic component; and identifying the unusual electronic components.
13. The method of claim 12, wherein the pre-processing step is performed by a standardization step in which for each measured parameter j, each electronic component x(i,j) is centered by the empirical average μ.sub.j of a set of values of said each measured parameter j and divided by an empirical standard deviation σ.sub.j.
14. The method of claim 12, wherein the pre-processing step is performed by a robust standardization step implementing robust statistical indicators.
15. The method of claim 12, further comprising, subsequent to the determining step, identifying and selecting a subset of the electronic components whose multivariate unusualness index is zero.
16. The method of claim 12, wherein deviations in an absolute value from a mean on each variable p, for said each electronic component i and for each difference greater than a reference value k are summed to determine a raw multivariate unusualness index.
17. The method of claim 16, further comprising applying the following transformation to the raw multivariate unusualness indices, with Φ the distribution function of an inverse Gaussian law:
18. The method of claim 12, wherein the step of determining the multivariate unusualness index further comprises determining a reference subset from an unusualness index of electronic components and determining a multivariate unusualness index of a new electronic component; wherein the identifying step is applied afterwards to the new electronic component with regards to the unusualness indices of the electronic components of the reference subset, the new electronic component then being integrated into the reference subset if the new electronic component is considered as a non-unusual, and being rejected if the new electronic component is considered as unusual.
19. The method of claim 12, wherein the identifying step further comprises: sorting the multivariate unusualness indices z.sub.i of the electronic components in an ascending order; measuring a deviation between two consecutive unusualness indices z.sub.i; determining groups of electronic components having statistically similar unusualness indices; and identifying the unusual electronic components based on a predefined maximum rate of acceptable unusual electronic components.
20. The method of claim 19, further comprising (a) determining whether a group of electronic components having the highest unusualness indices includes a number of electronic components lower than a maximum threshold of acceptable unusual electronic components, the maximum threshold being predetermined as a function of a predefined maximum rate; and repeating, in response to a determination that the number of electronic components in the group of electronic components is lower than the maximum threshold, the step (a) for another group of electronic components, in a successive and decreasing order, until a sum of the electronic components of the groups is lower than the maximum threshold to determine a set of groups.
21. A processor configured to: pre-process previously gathered data on characteristics of electronic components, derived from measurements of parameters of the electronic components performed by a plurality of measuring systems; determine a multivariate unusualness index from the pre-processed data, the multivariate unusualness index being transformed by a function so as to be between 0 and 1, on all measurements for each electronic component; and identify unusual electronic components in a sample of the electronic components.
22. A computer program product comprising program code instructions which, when executed by at least one processor, configure said at least one processor to implement the method of claim 12.
23. A computer memory storing the computer program product of claim 22.
Description
BRIEF DESCRIPTION OF THE FIGURES
[0056] The invention will be better understood upon reading the following description, provided as a non-limiting example, and made with reference to the figures which represent:
[0057]
[0058]
[0059] In these figures, identical reference numerals from one figure to another refer to identical or similar elements. Moreover, for clarity, the drawings are not to scale, unless stated otherwise.
DESCRIPTION OF THE EMBODIMENTS
[0060] The present invention is implemented by computer software executed by a calculator, such as a processor, of a computer.
[0061] The method for determining the level of unusualness of individuals according to the present invention, which in particular allows unusual individuals to be detected statistically, is applied to a set of previously gathered data characterised by one or several measurement(s) carried out by measuring systems on a plurality of individuals.
[0062] In the present application, a measuring system may be an electrical tester, a biological measuring system, a physical measuring sensor or more generally any tool adapted to carry out a measurement.
[0063] For example, this data set may be organised in the form of a data table whose rows represent n individuals and whose columns represent p variables corresponding to the measurements performed on the individuals. The values of these measurements are numerical or binary.
[0064] Advantageously, the data set may contain a different number of measurements for each individual without this being detrimental to the proper operation of the method according to the invention. Furthermore, the number of measurements may be greater than or equal to the number of individuals, and conversely, the number of individuals may be greater than or equal to the number of measurements, without this impacting the operation of the method according to the invention.
[0065] The data set is derived from a sample of at least three individuals. The number of variables may be comprised between one and several thousand, the only limit being the computing power of the computer means with which the invention is intended to be implemented.
[0066] As an example of a non-limiting application, the individuals may be electronic components, batches of vaccines, etc. Preferably, the individuals are electronic components.
[0067] Furthermore, the measurements may be carried out on parameters or physical characteristics of individuals produced by means of an industrial production, for example at the end of production, in the context of a quality control of said individuals.
[0068] At the end of production, the quality control of electronic components comprises several steps. At first, a control operation, also called wafer testing, is carried out on a wafer grouping together several electronic components by means of microscopic needles. Equipment simulates the operation of the component and measures its physical characteristics and the parameters during operation. Said tests may be carried out under different environmental conditions, such as at cold, ambient or hot temperature.
[0069] Thus, for one component, the number of tests carried out may vary between ten and several thousands. A wafer may comprise a few hundred to several thousand electronic components.
[0070] Another control operation is usually carried out after the assembly phase, called final test.
[0071] The physical characteristics and the parameters may be currents, voltages, frequencies, delays but also accelerations or else luminosity.
[0072] The method aims to identify unusual electronic components during each of the control operations of said electronic components.
[0073] As shown in the flowchart of
[0077] The data pre-processing allows obtaining data independent of their unit or measurement scale.
[0078] In the following text, we define x.sub.1, . . . , x.sub.n, n observations characterised by p quantitative variables. More specifically, in the example of application of the method according to the invention, x.sub.i represents all of the measurements carried out on the part i.
[0079] In this pre-processing step 100, we look to standardise initial data x.sub.i, j in order to obtain new data y.sub.i, j.
[0080] In an embodiment of the invention, the pre-processing step 100 may be carried out by a non-robust standardisation step which could be characterised as described hereinbelow.
[0081] For each measured parameter j, each value x.sub.i, j is centred by the empirical mean μ.sub.j of all of the values of the parameter j and divided by the empirical standard deviation σ.sub.j:
y.sub.i, j=(x.sub.i, j−μ.sub.j)/σ.sub.j
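The non-robust standardisation described above can be sketched as follows. This is only an illustrative sketch: the function name `standardise`, the list-of-rows layout and the sample values are hypothetical, and the use of the population standard deviation (division by n) for the "empirical standard deviation" is an assumption.

```python
from statistics import fmean, pstdev


def standardise(table):
    """Centre and scale each column of a table given as a list of rows."""
    columns = list(zip(*table))
    means = [fmean(col) for col in columns]
    # Assumption: "empirical standard deviation" is taken as the
    # population standard deviation (division by n).
    stds = [pstdev(col) for col in columns]
    return [
        [(x - mu) / sigma for x, mu, sigma in zip(row, means, stds)]
        for row in table
    ]


# Hypothetical measurements: 4 components (rows), 2 parameters (columns)
# expressed on very different scales.
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y = standardise(x)
```

After standardisation the two columns carry the same values, which illustrates that the pre-processed data no longer depend on the unit or measurement scale. A parameter whose values are all identical has a zero standard deviation and would need separate handling.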
[0082] In another embodiment of the invention, the pre-processing step 100 may be carried out by a robust standardisation step.
[0083] More specifically, the robust standardisation step may implement robust statistical indicators such as the median or the truncated mean as a position parameter, i.e., instead of the empirical mean μ.sub.j in the equation formulated hereinabove, and of the interquartile range or the absolute median deviation for the scale parameter, i.e., instead of the empirical standard deviation σ.sub.j in the equation formulated hereinabove.
[0084] Concretely, the robust standardisation step is preferred to the non-robust standardisation step in the presence of data that do not follow a Normal law or when the values of these data are very extreme. Indeed, non-robust estimators such as the mean are sensitive to the presence of extreme values, unlike the median.
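The sensitivity of the mean and standard deviation to extreme values can be illustrated with a small sketch. The choice of the median and the median absolute deviation is one of the combinations mentioned above; the function names and the sample values are hypothetical.

```python
from statistics import fmean, median, pstdev


def classical_scores(values):
    """Centre by the mean and scale by the (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def robust_scores(values):
    """Centre by the median and scale by the median absolute deviation."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    return [(v - centre) / mad for v in values]


# Hypothetical parameter with one extreme value in last position.
values = [9.8, 10.0, 10.1, 10.2, 9.9, 50.0]
classical = classical_scores(values)
robust = robust_scores(values)
```

With these values the extreme measurement inflates both the mean and the standard deviation, so its classical score stays moderate, whereas its robust score is very large. A median absolute deviation of zero (more than half of the values identical) would need separate handling.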
[0085] Afterwards, the step 200 of determining a multivariate unusualness index is implemented on all measurements for each individual.
[0086] More specifically, in this step, one seeks to determine a raw multivariate unusualness index for each part or individual i from the pre-processed data y.sub.i,j.
[0087] To this end, for each individual i, the deviations in absolute value from the mean on each of the p variables are summed, for any deviation greater than a reference value k, with k∈ℝ.sup.+.
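A possible reading of this computation is sketched below. It assumes that, the data being standardised, the deviation from the mean on variable j is simply |y.sub.i, j|, and that the deviations exceeding k are summed in full; the function name and the sample values are hypothetical.

```python
def raw_unusualness_index(row, k):
    """Sum the absolute standardised deviations of one individual that exceed k."""
    return sum(abs(y) for y in row if abs(y) > k)


# Hypothetical pre-processed data: 3 individuals, 3 variables.
y = [
    [0.1, -0.3, 0.2],   # no deviation exceeds k
    [2.5, -2.2, 2.1],   # moderate deviations on every variable
    [0.2, 6.0, -0.1],   # one strong deviation on a single variable
]
k = 2.0
indices = [raw_unusualness_index(row, k) for row in y]
```

The first individual obtains a raw index of zero, while the second and third obtain comparable non-zero indices, one through several moderate deviations and the other through a single strong one.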
[0088] Raw unusualness indices with values in ℝ.sup.+ undergo a transformation f which projects them into the interval [0, 1]. Unusualness indices z.sub.i whose value is comprised between 0 and 1 are obtained, which makes them comparable with each other.
z.sub.i=f(index.sub.i) [Equation 4]
[0089] As an alternative to the previous transformation, it is possible to apply the following transformation to the raw unusualness indices:
[0090] With Φ the distribution function of an inverse Gaussian law with a mean and a dispersion equal to 1.
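As an illustration, the distribution function of an inverse Gaussian law can be evaluated with the standard closed form based on the normal distribution function. The sketch below assumes that the transformed index is obtained by applying Φ directly to the raw index and that a dispersion of 1 corresponds to a shape parameter of 1; both points are assumptions, and the function names are hypothetical.

```python
import math


def _std_normal_cdf(t):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def inverse_gaussian_cdf(x, mean=1.0, shape=1.0):
    """Distribution function of an inverse Gaussian law at x."""
    if x <= 0.0:
        return 0.0
    root = math.sqrt(shape / x)
    first = _std_normal_cdf(root * (x / mean - 1.0))
    second = math.exp(2.0 * shape / mean) * _std_normal_cdf(-root * (x / mean + 1.0))
    return first + second


# Hypothetical raw indices; a raw index of zero stays at zero.
raw = [0.0, 2.5, 6.8]
z = [inverse_gaussian_cdf(v) for v in raw]
```

Under these assumptions a raw index of zero maps to zero, and larger raw indices map to values approaching 1.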
[0091] Advantageously, the step 200 of determining a multivariate unusualness index allows creating an index that aggregates the univariate unusualnesses, even the low ones.
[0092] More particularly, the unusualness index allows identifying as potentially unusual both individuals that are weakly unusual in a univariate manner over a large number of parameters and individuals that are strongly unusual over one or a few parameters.
[0093] It should be noted that individuals whose unusualness index is zero do not have any unusualness, and that the more an individual has an unusualness index close to 1, the more likely it will be unusual.
[0094] It should herein be understood that an individual whose unusualness index is non-zero could be either non-unusual or unusual.
[0095] Hence, the present invention advantageously allows detecting individuals having no unusualness, to the extent that the unusualness index of an individual could be zero. If all individuals have a zero unusualness index, there is no unusual individual in the analysed sample of individuals.
[0096] Thus, following the determination step 200, it is possible to identify and select a subset of individuals that are certainly not unusual, by identifying and selecting the individuals whose unusualness index is zero.
[0097] This subset is herein called “reference subset”. This selection of the “best” individuals, best in the sense that they have no unusualness, is not merely the mirror image of the procedure that consists in eliminating all unusual individuals, because the latter statistically assesses the level of unusualness, while the selection of a reference subset goes further by eliminating individuals that have a little unusualness even though this is not statistically proven.
[0098] The selection of a reference subset is particularly advantageous, in particular because it allows isolating parts having no unusualness and using these parts. This is essential in some kinds of industry: for example, in the space industry or in large space programs, only parts that do not have any unusualness are sent into space.
[0099] To summarise, the method according to the invention allows identifying unusual individuals and individuals having no unusualness.
[0100] Furthermore, the multivariate unusualness index is computable even when there are more variables than observations, unlike most multivariate detection methods.
[0101] This specificity is particularly advantageous to the extent that too many variables in comparison with the number of observations could generate noise that might prevent or complicate the identification of unusual individuals.
[0102] Finally, these features advantageously allow applying the method to data sets including missing data.
[0103] The standardisation between 0 and 1 of the unusualness indices, upon completion of the step 200 of determining said indices, confers a major industrial advantage, in particular when the traceability of the parts is not ensured during their quality control after production.
[0104] Indeed, in a particular embodiment of the invention, the step 200 of determining a multivariate unusualness index includes a sub-step in which an unusualness index is determined for individuals forming a subset called the “reference subset”, and a sub-step in which an unusualness index of a new individual is determined.
[0105] Afterwards, the step 300 of identifying unusual individuals is applied to the new individual with regards to the unusualness indices of the individuals of the reference subset.
[0106] Said new individual is then directly rejected if it is considered as unusual.
[0107] Alternatively, the step 200 of determining a multivariate unusualness index may include sub-steps in which the unusualness index of individuals forming the reference subset is determined, then limits of unusualness indices of said reference subset are determined on the basis of the unusualness indices of its individuals. For example, these limits are defined according to the standard deviation of the reference subset, of the type μ±kσ with k=3 for example, μ and σ being determined according to the values of the reference subset.
[0108] Afterwards, analogously to what has been previously described, a sub-step of determining an unusualness index of a new individual is carried out.
[0109] Thus, it is possible to achieve a turnover of the “first in, first out” type, known to a person skilled in the art by the acronym “FIFO”.
[0110] Consequently, it is possible to eliminate any unusual part in an integrated manner in the industrial production process, continuously.
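The dynamic mode described above can be sketched as a fixed-size reference window. This is a hedged sketch only: the class name `ReferenceWindow`, the window size and the sample indices are hypothetical, only the upper limit μ+kσ is checked (an assumption, since a low unusualness index is not a sign of unusualness), and the population standard deviation is assumed.

```python
from collections import deque
from statistics import fmean, pstdev


class ReferenceWindow:
    """Reference subset of unusualness indices with first-in, first-out turnover."""

    def __init__(self, indices, max_size, k=3.0):
        # A deque with maxlen silently drops its oldest element when full.
        self.window = deque(indices, maxlen=max_size)
        self.k = k

    def upper_limit(self):
        """Limit of the type mu + k * sigma computed on the reference subset."""
        return fmean(self.window) + self.k * pstdev(self.window)

    def accept(self, z_new):
        """Reject an unusual new individual, otherwise integrate it."""
        if z_new > self.upper_limit():
            return False
        self.window.append(z_new)
        return True


# Hypothetical reference subset and two new individuals.
reference = ReferenceWindow([0.10, 0.12, 0.08, 0.11, 0.09], max_size=5)
first_accepted = reference.accept(0.13)
second_accepted = reference.accept(0.90)
```

In this sketch the first new individual is integrated and displaces the oldest member of the reference subset, while the second is rejected on the fly without any need to trace it afterwards.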
[0111] These dynamic modes of implementation of the invention are particularly advantageous in cases where, during production, the traceability of the produced individuals is not physically ensured, so that the unusual individuals detected with conventional, so-called “post-processing”, methods for detecting unusual individuals cannot be set aside.
[0112] The post-processing methods consist in statistically identifying the unusual individuals from a sample of individuals, then physically identifying the unusual individuals to eliminate them. Hence, this requires the traceability of each produced individual and processing a sample of individuals. The present invention suppresses this constraint and allows eliminating individuals on the fly.
[0113] It should be noted that thanks to the standardisation of the unusualness indices, that of the observed new individual is comparable to that of the other individuals, even when it originates from a new batch, which is not necessarily the case with known statistical detection methods.
[0114] The step 300 of identifying the unusual individuals may preferably consist in implementing the following sub-steps:
[0115] sort the determined unusualness indices z.sub.i of the individuals in ascending order, as shown in the graph of
z.sub.(1)≤ . . . ≤z.sub.(n) [Equation 6]
[0116] measure the deviation between two consecutive unusualness indices z.sub.i:
w.sub.i=z.sub.(i)−z.sub.(i−1) [Equation 7]
[0117] These differences w.sub.i follow the following distribution law:
F(w)=1−(1−w).sup.m with a mean [Equation 8]
[0119] This last sub-step is carried out by identifying groups of individuals with statistically similar unusualness indices. This identification is carried out according to the deviations between two successive unusualness indices.
[0120] In the graph of
[0121] More specifically, the deviations between two successive unusualness indices greater than the a-th percentile are identified:
[0122] In the embodiment represented in
[0123] These deviations are calculated, because theoretically the differences of the deviations follow the distribution law previously described in [Math.7], and the method could be based on a theoretical threshold to identify groups of individuals.
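A theoretical threshold of this kind can be obtained by inverting the distribution law F(w)=1−(1−w).sup.m, which gives the quantile w=1−(1−α).sup.1/m for a probability level α. In the sketch below, the function names are hypothetical, and the values chosen for the parameter m and for α are purely illustrative assumptions.

```python
def gap_cdf(w, m):
    """Distribution function F(w) = 1 - (1 - w)**m of the gaps."""
    return 1.0 - (1.0 - w) ** m


def gap_quantile(alpha, m):
    """Gap value below which a proportion alpha of the gaps is expected."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical settings: m = 50 and the 95th percentile.
threshold = gap_quantile(0.95, 50)
```

A gap between two consecutive sorted indices that exceeds such a threshold would then be treated as a boundary between two groups.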
[0124] Thus, a comparison between individuals is advantageously carried out so that if an individual is statistically unusual but not significantly more than the other individuals then it is not identified as unusual.
[0125] Thanks to this identification operation, groups of individuals with a substantially similar or analogous level of unusualness are determined. In this manner, all of the individuals forming a group are identified as unusual or as non-unusual. Thus, the reliability of the detection of unusual individuals is considerably increased to the extent that the method guarantees that two individuals having a very close unusualness index are both identified as unusual or as non-unusual. And it is therefore possible to go beyond the detection of unusual individuals, since it is thus possible to generate a subset of individuals with homogeneous unusualness indices.
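The formation of groups can be sketched as a single pass over the sorted indices, opening a new group whenever the gap to the previous index exceeds a threshold. The function name, the sample indices and the fixed threshold value are hypothetical; in the method described above the threshold is derived from a percentile of the gaps.

```python
def group_by_gaps(indices, threshold):
    """Split sorted indices into groups wherever a gap exceeds the threshold."""
    ordered = sorted(indices)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            groups.append([current])      # large gap: start a new group
        else:
            groups[-1].append(current)    # small gap: same group
    return groups


# Hypothetical unusualness indices of 7 individuals.
z = [0.02, 0.05, 0.03, 0.40, 0.42, 0.95, 0.04]
groups = group_by_gaps(z, threshold=0.2)
```

Two individuals with very close indices always fall into the same group, so they are necessarily classified together in the subsequent identification.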
[0126] Advantageously, the approach of the method of the present invention is thus to identify groups of unusual individuals.
[0127] The groups of individuals with a substantially similar or analogous level of unusualness being determined and the determined unusualness indices z.sub.i of the individuals being sorted in increasing order, the groups are, in effect, sorted in increasing order according to the level of unusualness of their respective individuals, as shown in
[0128] The next step of the method is the identification of the groups of unusual individuals according to a predetermined maximum rate of acceptable unusual individuals.
[0129] In the embodiment represented in
[0130] More specifically, during this step, it is determined whether the group of individuals in which the individuals have the highest unusualness indices, called the “first group”, includes a number of individuals less than or equal to the predetermined maximum threshold of acceptable unusual individuals.
[0131] If this is the case, then all of the individuals of that group are identified as unusual, otherwise they are considered as non-unusual.
[0132] In the case where the individuals of the first group are identified as unusual, the method proceeds to the group whose individual(s) has/have the immediately lower unusualness indices, and then to each of the next groups in succession, as long as the cumulative number of individuals in the groups examined so far is less than or equal to the maximum threshold; the individuals of each group so examined are then identified as unusual.
[0133] In other words, if the individuals of the first group are identified as unusual, one seeks to know whether those of the group in which the individuals have unusualness indices immediately lower than those of the first group, called “second group”, are unusual too.
[0134] In the example represented in
[0135] Afterwards, the numbers of individuals in the first and second groups are added together.
[0136] If the sum of the individuals of the first and second groups is less than or equal to the maximum threshold, then all of the individuals of the second group are also identified as unusual, otherwise they are considered as non-unusual.
[0137] In the example represented in
[0138] These operations are iterated until a group whose individuals are not unusual is identified, i.e., until the sum of the individuals of the studied groups is greater than the maximum threshold. From this group onwards, all subsequent groups, i.e., the groups whose individuals have unusualness indices lower than those of the studied groups, therefore consist of non-unusual individuals.
[0139] In the example represented in
[0140] The individuals of the first to third groups are considered as unusual, the individuals of the other groups being considered as non-unusual.
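The iteration over the groups can be sketched as follows. The function name, the group contents and the threshold value are hypothetical; the maximum threshold is passed directly as a count, and the way it is derived from the maximum rate is left open.

```python
def unusual_groups(groups, max_count):
    """Return the groups identified as unusual.

    `groups` is sorted by increasing level of unusualness, so the groups
    are examined from the last one backwards.
    """
    flagged, cumulative = [], 0
    for group in reversed(groups):
        if cumulative + len(group) > max_count:
            break  # this group and all lower ones are non-unusual
        cumulative += len(group)
        flagged.append(group)
    return flagged


# Hypothetical example: 20 individuals identified by number, in 4 groups,
# with a maximum threshold of 4 acceptable unusual individuals.
groups = [list(range(0, 16)), [16, 17], [18], [19]]
flagged = unusual_groups(groups, max_count=4)
```

In this sketch the three groups with the highest indices together contain four individuals and are identified as unusual, while adding the fourth group would exceed the threshold, so it is considered as non-unusual.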
[0141] Advantageously, thanks to these operations, the rate of false alarms, i.e. of error in the identifications of unusual individuals, is substantially reduced, and that being so, without the need for imposing a strict rate of unusual individuals to be met.
[0142] Once detected, the unusual individuals could finally be identified thanks to physical traceability, or else be eliminated directly thanks to the dynamic mode of implementation.
[0143] Alternatively, in another mode of implementation of the method according to the invention, the step 300 of identifying the unusual individuals may consist in applying a statistical method on the unusualness index to detect unusual individuals.
[0144] For example, it is possible to calculate a limit of the type μ±kσ with k=3 for example, μ and σ being determined according to the values of the sample. These unusual individuals are then eliminated from the production, if the production process of the individuals allows the traceability of the produced individuals.
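Such a limit can be sketched on the unusualness indices of a sample as follows. The function name and the sample values are hypothetical, only the upper limit μ+kσ is applied (an assumption), and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev


def flag_by_sigma_limit(indices, k=3.0):
    """Flag the individuals whose index exceeds mu + k * sigma of the sample."""
    upper = fmean(indices) + k * pstdev(indices)
    return [z > upper for z in indices]


# Hypothetical sample: 19 low indices and one high index in last position.
z = [0.05] * 19 + [0.90]
flags = flag_by_sigma_limit(z)
```

Because μ and σ are computed on the sample itself, a single extreme index in a very small sample may never exceed the limit, which is one reason for setting k according to the sample size.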
[0145] The present invention then allows returning to a univariate case to the extent that there is only one unusualness index per individual, but by eliminating or considerably limiting false alarms in comparison with the statistical detection methods of the prior art.
[0146] An example of a processor is now described. Said processor is configured to:
[0147] pre-process data previously gathered on characteristics of the electronic components, derived from measurements of parameters of the electronic components carried out by a plurality of measuring systems,
[0148] determine a multivariate unusualness index, said index being transformed by a function f so as to be comprised between 0 and 1, on all measurements for each electronic component, from the pre-processed data,
[0149] identify unusual electronic components in a sample of electronic components.
[0150] The processor may contain several calculation cores, and have a clock frequency of several gigahertz.
[0151] The processor may be integrated into a computer and connected to a compatible motherboard.
[0152] Also, several processors may be arranged so as to carry out, in parallel, all of the calculations necessary for the different steps. This increases the computing power, thereby accelerating the completion of the different steps, allowing more data to be processed and limiting the overheating of the processors.
[0153] An example of a computer program product is now described. Said program product comprises program code instructions which, when they are executed by one or several processor(s), configure the processor(s) to implement any of the methods in any mode. Said program code instructions may be coded in the Python or C++ language for example.
[0154] More generally, it should be noted that the implementations and embodiments of the invention considered hereinabove have been described as non-limiting examples and that other variants could consequently be considered.