METHOD FOR PREDICTING A FEEDSTUFF AND/OR FEEDSTUFF RAW MATERIAL
20220261399 · 2022-08-18
Assignee
Inventors
- Ingolf REIMANN (Reinheim, DE)
- Joachim REISING (Kleinostheim, DE)
- Christoph MUELLER (Offenbach am Main, DE)
Cpc classification
G06F16/2458
PHYSICS
G16C20/20
PHYSICS
International classification
Abstract
A computer-implemented method for predicting a feedstuff and/or feedstuff raw material is described. The method comprises providing a near infrared (NIR) spectrum of a sample of an unknown feedstuff raw material and/or feedstuff. The absorption intensities of wavelengths or wavenumbers in the spectrum are transformed to give a query vector. A set of database vectors of a population of spectra of known feedstuff raw materials and/or feedstuffs is also provided, and an outlier database vector is removed based on different comparison methods. The similarity between the query vector and each of the database vectors is analyzed to produce a similarity value, and the feedstuff raw material and/or feedstuff of the database vector with the highest similarity is assigned to the sample.
Claims
1-15. (canceled)
16. A computer-implemented method for predicting a feedstuff and/or feedstuff raw material, the method comprising:
a) providing a near infrared spectrum of a sample of an unknown feedstuff raw material and/or feedstuff;
b) transforming absorption intensities of wavelengths or wavenumbers in the spectrum of step a) to produce a query vector;
c) providing a set of database vectors of a population of spectra of known feedstuff raw materials and/or feedstuffs, wherein an outlier is removed from the set of database vectors, wherein step c) further comprises one or more of c1) to c3):
  c1) removing a pair of database vectors being the most dissimilar to each other in a set of database vectors from the set of database vectors, the removing comprising:
    c1a) calculating a similarity measure and/or a distance measure of each database vector in a set of database vectors to the other database vectors in the set of database vectors to give similarity values of pairs of database vectors;
    c1b) ranking the similarity values obtained in step c1a) in descending order, when a similarity measure is calculated in step c1a), or in ascending order, when a distance measure is calculated in step c1a), wherein in any case the bottom-ranked similarity value relates to the two database vectors being the most dissimilar to each other; and
    c1c) pairwise removing at least two database vectors with the lowest ranking in step c1b) from the set of database vectors;
  c2) removing a database vector being the most dissimilar on average to the other database vectors in a set of database vectors from the set of database vectors, the removing comprising:
    c2a) calculating a similarity measure and/or a distance measure of each database vector in a set of database vectors to the other database vectors in the set of database vectors to give similarity values of each of a database vector to the other database vectors;
    c2b) forming the sum of the similarity values obtained for each database vector in step c2a), and calculating the average similarity value for each database vector;
    c2c) ranking the average similarity values obtained in step c2b) in descending order when a similarity measure is calculated in step c2b), or in ascending order when a distance measure is calculated in step c2b), wherein in any case the bottom-ranked average similarity value relates to the database vector being the most dissimilar on average to all other database vectors; and
    c2d) removing the database vector with the lowest ranking in step c2c) from the set of database vectors;
  c3) removing a database vector being the most dissimilar to the centroid of a set of database vectors from the set of database vectors, the removing comprising:
    c3a) determining the centroid of all database vectors in a set of database vectors;
    c3b) calculating a similarity measure and/or a distance measure of each database vector to the centroid of step c3a) to give a similarity value for each database vector to the centroid;
    c3c) ranking the similarity values obtained in step c3b) in descending order when a similarity measure is calculated in step c3b), or in ascending order when a distance measure is calculated in step c3b), wherein in any case the bottom-ranked similarity value relates to the database vector being the most dissimilar to the centroid; and
    c3d) removing at least the database vector with the lowest ranking in step c3c) from the set of database vectors;
d) calculating a similarity measure and/or a distance measure between the query vector of step b) and each database vector of step c) to give a similarity value for each database vector with the query vector;
e) ranking the similarity values obtained in step d) in descending order when a similarity measure is calculated in step d), or in ascending order when a distance measure is calculated in step d), wherein in any case the top-ranked database vector has the highest similarity with the query vector; and
f) assigning the feedstuff raw material and/or feedstuff of the database vector with the highest similarity in step e) to the sample of step a).
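The three outlier criteria of steps c1) to c3) can be illustrated with a minimal Python sketch. It assumes the Euclidean distance as the distance measure (the claim leaves the measure open), and the function names and the toy data are hypothetical and chosen only for this illustration.

```python
import numpy as np

def pairwise_distances(vectors):
    """Euclidean distance between every pair of row vectors (steps c1a/c2a)."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def most_dissimilar_pair(vectors):
    """c1: indices of the two vectors that are farthest from each other."""
    dist = pairwise_distances(vectors)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    return sorted((int(i), int(j)))

def most_dissimilar_on_average(vectors):
    """c2: index of the vector with the largest mean distance to the others."""
    dist = pairwise_distances(vectors)
    mean_dist = dist.sum(axis=1) / (len(vectors) - 1)
    return int(np.argmax(mean_dist))

def most_dissimilar_to_centroid(vectors):
    """c3: index of the vector farthest from the centroid of the set."""
    centroid = vectors.mean(axis=0)
    return int(np.argmax(np.linalg.norm(vectors - centroid, axis=1)))

def remove(vectors, indices):
    """Return the set of vectors without the rows listed in `indices`."""
    return np.delete(vectors, indices, axis=0)

# Toy set of five 2-dimensional "database vectors"; the last one is an outlier.
toy = np.array([[1.0, 1.0], [1.2, 1.0], [0.8, 0.9], [1.0, 1.3], [5.0, 5.0]])

pair = most_dissimilar_pair(toy)                # criterion c1
avg_outlier = most_dissimilar_on_average(toy)   # criterion c2
centroid_outlier = most_dissimilar_to_centroid(toy)  # criterion c3
cleaned = remove(toy, [centroid_outlier])
```

With a similarity measure instead of a distance measure, the `argmax` calls would become `argmin` calls, which corresponds to the descending versus ascending ranking distinguished in the claim.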
17. The method of claim 16, wherein a database vector with a similarity value of 0 is removed from the set of database vectors in step c1b), c2c), and/or c3c).
18. The method of claim 16, wherein the vector in steps b) and c) is a multi-dimensional vector, with each dimension corresponding to an absorption intensity of a specific wavelength or wavenumber.
19. The method of claim 16, wherein a corresponding outlier spectrum is removed from the infrared spectra of known feedstuff raw materials and/or feedstuffs which are to be transformed into the set of database vectors, and the steps c1), c2), and/or c3) are carried out with the infrared spectra of a population of known feedstuff raw materials and/or feedstuffs.
20. The method of claim 16, wherein in step b) and/or c) the absorption intensities of equidistant wavelengths or wavenumbers in a spectrum are transformed to give a vector of a spectrum in step b) and/or c).
21. The method of claim 16, wherein the distances of the absorption intensities being transformed to vectors in step b) are identical with the distances of the absorption intensities transformed to vectors in step c).
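One way to obtain vectors with identical, equidistant spacing for query and database spectra is to interpolate every spectrum onto one shared grid. The following Python sketch assumes linear interpolation with `numpy.interp`; the grid range, the spacing, the sample values, and the function name `spectrum_to_vector` are hypothetical and not taken from the claims.

```python
import numpy as np

def spectrum_to_vector(wavenumbers, absorbances, grid):
    """Interpolate absorption intensities onto a shared equidistant grid."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    absorbances = np.asarray(absorbances, dtype=float)
    order = np.argsort(wavenumbers)  # np.interp needs increasing x values
    return np.interp(grid, wavenumbers[order], absorbances[order])

# Hypothetical equidistant grid: 4000 to 10000 cm^-1 in steps of 1000 cm^-1.
grid = np.linspace(4000.0, 10000.0, 7)

# Made-up spectrum recorded at other, non-matching wavenumbers.
wn = np.array([10000.0, 8500.0, 7000.0, 5500.0, 4000.0])
ab = np.array([0.10, 0.25, 0.40, 0.55, 0.70])

query_vector = spectrum_to_vector(wn, ab, grid)
```

Applying the same `grid` to the database spectra gives vectors whose dimensions correspond to the same wavenumbers, so that distances between query and database vectors are meaningful.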
22. The method of claim 16, wherein the population of spectra of known feedstuff raw materials and/or feedstuffs of step c) comprises at least 50 spectra of samples of each feedstuff raw material and/or feedstuff from each of its global growing areas.
23. The method of claim 16, wherein step e) comprises: e1) counting the number of occurrences of each of the feedstuff raw materials and/or feedstuffs among the top-ranked database vectors in the ranking of step e), wherein said number of occurrences is indicated by the variable N; e2) weighting the first N similarity values of each of the feedstuff raw materials and/or feedstuffs according to their position in the ranking of step e1) to give weighted rank positions of each of the feedstuff raw materials and/or feedstuffs; and e3) forming the sum of the weighted rank positions of step e2) for the feedstuff raw materials and/or feedstuffs to give scores of each of the feedstuff raw materials and/or feedstuffs, wherein the highest score indicates the highest similarity.
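The scoring of steps e1) to e3) can be sketched in Python as follows. The claim does not specify the weighting function, so the linear rank weight `top_k - position`, the cutoff `top_k`, the function name `score_materials`, and the material labels are assumptions made only for this example.

```python
from collections import defaultdict

def score_materials(ranked_labels, top_k=5):
    """Score each material by summing rank weights over the top_k hits.

    ranked_labels: material labels of the database vectors, best match first.
    The weight for 0-based rank position p is top_k - p; this linear
    weighting is an assumption, not taken from the claim.
    """
    scores = defaultdict(int)
    for position, label in enumerate(ranked_labels[:top_k]):
        scores[label] += top_k - position
    return dict(scores)

# Hypothetical ranking of database vectors for one query, best match first.
ranked = ["wheat", "corn", "corn", "wheat", "corn", "soy"]

scores = score_materials(ranked)
best = max(scores, key=scores.get)
```

In this toy ranking the single nearest neighbour is labelled "wheat", but "corn" occurs three times among the top five hits and obtains the higher score, which shows how the combination of voting and weighting can differ from a plain top-1 assignment.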
24. The method of claim 16, wherein step a) comprises recording a near infrared spectrum of a sample of an unknown feedstuff raw material and/or feedstuff.
25. A system for predicting a feedstuff raw material and/or feedstuff, the system comprising: a processing unit adapted to carry out the method of claim 16, and a database, comprising: a set of database vectors of a population of spectra of known feedstuff raw materials and/or feedstuffs, and/or a population of spectra of known feedstuff raw materials and/or feedstuffs, wherein the set of database vectors and/or the population of spectra is free from outliers.
26. The system of claim 25, further comprising a near infrared spectrometer.
Description
EXAMPLE
[0137] In the example, four different types of filters, indicated as Filter 1 to Filter 4, were compared for their suitability for cleaning up a dataset for predicting a material class, i.e. a feedstuff raw material and/or feedstuff. The results for these filters were compared with the case where no filter was used, indicated as Filter 0. Hence, the example with Filter 0 was a comparison example not according to the invention, and the examples with Filters 1 to 4 were according to the invention. In detail, the filters were:
[0138] Filter 0: no filter,
[0139] Filter 1: the 2 most distant spectra per class were removed,
[0140] Filter 2: the average distance of each spectrum in a class was calculated and the most distant spectrum was removed,
[0141] Filter 3: the score for each spectrum in a class was calculated using a combination of majority-voting and weighting and the spectrum with the lowest score was removed,
[0142] Filter 4: the spectrum with the highest distance to the centroid was removed.
[0143] The filters were used for cleaning up two datasets of NIR spectra for predicting a material class, specifically a feedstuff raw material and/or feedstuff. The two datasets contained spectra measured on two different infrared spectrometers, a NIRS™ DS2500 Feed Analyzer from Foss and an MPA FT-NIR Analyzer or TANGO FT-NIR Analyzer from Bruker. Each of the four filters was applied to the datasets: the criterion of the filter was calculated, and the spectra identified as most distant, or as having the lowest score in the majority-voting and weighting procedure, were removed from the dataset. Next, the criterion was recalculated on the remaining spectra and applied again, and this was repeated until 20% of the spectra had been removed from the datasets.
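The repeated application of a filter until 20% of the spectra are removed can be sketched as follows. This is a minimal Python illustration for a single class using the centroid criterion of Filter 4 with the Euclidean distance; the distance measure, the synthetic data, and the function name `iterative_centroid_filter` are assumptions and not taken from the example.

```python
import numpy as np

def iterative_centroid_filter(vectors, fraction=0.2):
    """Repeatedly drop the vector farthest from the centroid until
    `fraction` of the original vectors has been removed."""
    kept = np.asarray(vectors, dtype=float)
    n_remove = int(round(fraction * len(kept)))
    for _ in range(n_remove):
        centroid = kept.mean(axis=0)  # criterion recalculated after each removal
        worst = int(np.argmax(np.linalg.norm(kept - centroid, axis=1)))
        kept = np.delete(kept, worst, axis=0)
    return kept

# Synthetic class of 10 "spectra" with 4 points each, clustered around 1.0.
rng = np.random.default_rng(0)
spectra = rng.normal(0.0, 0.05, size=(10, 4)) + 1.0
spectra[3] += 2.0  # two planted outliers
spectra[7] -= 2.0

cleaned = iterative_centroid_filter(spectra)
```

Recomputing the centroid after every removal matters because each removed outlier shifts the centroid, which can change which spectrum is the most distant in the next pass.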
[0144] After application of the filters, a nearest neighbor search was carried out to predict the material code for a set of query spectra. In detail, 20 runs with 200 random query spectra each were carried out, and for each run the number of correctly predicted material codes out of the 200 queries was counted.
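The evaluation procedure can be sketched in Python as follows. The sketch assumes a Euclidean nearest neighbor search and uses synthetic, well-separated data; how the query spectra were drawn in the example is not stated, so the sampling shown here, the material codes, and the function names are hypothetical.

```python
import numpy as np

def predict_material(query, db_vectors, db_labels):
    """Return the label of the database vector closest to the query."""
    distances = np.linalg.norm(db_vectors - query, axis=1)
    return db_labels[int(np.argmin(distances))]

def count_correct(queries, true_labels, db_vectors, db_labels):
    """Number of queries whose predicted label matches the true label."""
    return sum(
        predict_material(q, db_vectors, db_labels) == t
        for q, t in zip(queries, true_labels)
    )

# Synthetic database: two hypothetical material codes, 50 vectors each.
rng = np.random.default_rng(1)
centers = {"material_A": 0.0, "material_B": 1.0}
db_labels = ["material_A"] * 50 + ["material_B"] * 50
db_vectors = np.array([centers[l] + rng.normal(0, 0.05, 8) for l in db_labels])

# 20 runs with 200 random queries each; queries are noisy copies of
# randomly chosen database vectors (an assumption for this sketch).
results = []
for run in range(20):
    idx = rng.integers(0, len(db_labels), size=200)
    true_labels = [db_labels[i] for i in idx]
    queries = db_vectors[idx] + rng.normal(0, 0.05, size=(200, 8))
    results.append(count_correct(queries, true_labels, db_vectors, db_labels))
```

Each entry of `results` corresponds to one row value in the result tables, i.e. the number of correct predictions out of 200 queries for one run.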
[0145] The results for the data set with spectra measured on a NIRS™ DS2500 Feed Analyzer from Foss are summarized in table 1 and the results for the data set with spectra measured on an MPA FT-NIR Analyzer or TANGO FT-NIR Analyzer from Bruker are summarized in table 2.
TABLE 1
Results for cleaning up a dataset with spectra measured on a Foss NIR analyzer (number of correctly predicted material codes out of 200 queries per run).

Run       filter 0   filter 1   filter 2   filter 3   filter 4
1         197        198        199        198        199
2         195        196        195        198        195
3         198        198        198        198        198
4         196        197        197        200        197
5         198        199        199        199        199
6         196        196        195        198        195
7         196        197        197        198        197
8         196        196        197        197        197
9         200        200        200        200        200
10        197        197        197        197        197
11        198        199        199        200        199
12        196        197        197        197        197
13        199        200        200        200        200
14        194        194        194        193        194
15        195        197        197        198        197
16        197        198        198        196        198
17        199        199        199        198        199
18        197        198        197        198        197
19        196        197        197        198        197
20        195        197        197        199        197
Average   196.75     197.50     197.45     198        197.45
[0146] Each of the filters 1 to 4 generally improved the prediction of a material code compared to filter 0. The results for filter 1, filter 2 and filter 4 are almost identical. Filter 3 gave the largest improvement in the prediction results for the dataset of spectra measured on a Foss NIR analyzer. Specifically, filter 3 gave better results than all other filters in 8 of 20 runs.
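The averages and the count of runs in which filter 3 outperformed all other filters can be recomputed from Table 1 with a short Python sketch. The values are copied from the table; the variable names are chosen only for this illustration.

```python
# Rows are runs 1 to 20; columns are filter 0, 1, 2, 3, 4 (values from Table 1).
table1 = [
    (197, 198, 199, 198, 199), (195, 196, 195, 198, 195),
    (198, 198, 198, 198, 198), (196, 197, 197, 200, 197),
    (198, 199, 199, 199, 199), (196, 196, 195, 198, 195),
    (196, 197, 197, 198, 197), (196, 196, 197, 197, 197),
    (200, 200, 200, 200, 200), (197, 197, 197, 197, 197),
    (198, 199, 199, 200, 199), (196, 197, 197, 197, 197),
    (199, 200, 200, 200, 200), (194, 194, 194, 193, 194),
    (195, 197, 197, 198, 197), (197, 198, 198, 196, 198),
    (199, 199, 199, 198, 199), (197, 198, 197, 198, 197),
    (196, 197, 197, 198, 197), (195, 197, 197, 199, 197),
]

# Column-wise average over the 20 runs.
averages = [sum(col) / len(table1) for col in zip(*table1)]

# Runs in which filter 3 is strictly better than every other column.
filter3_wins = sum(
    row[3] > max(row[0], row[1], row[2], row[4]) for row in table1
)
```

The count uses a strict comparison, so runs in which filter 3 merely ties with another filter (for example run 18) are not counted as wins.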
TABLE 2
Results for cleaning up a dataset with spectra measured with a Bruker NIR analyzer (number of correctly predicted material codes out of 200 queries per run).

Run       filter 0   filter 1   filter 2   filter 3   filter 4
1         191        193        193        198        193
2         194        195        195        198        195
3         196        196        196        196        196
4         192        194        194        198        194
5         197        198        199        199        199
6         194        194        195        197        196
7         194        194        194        198        194
8         191        194        194        199        194
9         191        194        194        197        194
10        190        190        192        195        192
11        197        198        197        199        197
12        195        195        197        198        197
13        197        197        197        200        197
14        197        198        197        199        197
15        196        197        195        198        196
16        194        196        195        197        195
17        196        197        198        198        198
18        196        196        196        197        196
19        197        197        197        198        197
20        195        195        195        197        195
Average   194.5      195.4      195.5      197.8      195.6
[0147] Each of the filters 1 to 4 generally improved the prediction of a material code compared to filter 0. The results for filter 1, filter 2 and filter 4 are almost identical. Again, filter 3 gave the largest improvement in the prediction results, here for the dataset of spectra measured on a Bruker NIR analyzer. The improvement with filter 3 was even stronger than with the dataset of spectra measured on a Foss NIR analyzer. Specifically, filter 3 gave better results than all other filters in 17 of 20 runs. This is also reflected in the markedly higher average for filter 3 (197.8) compared with the other cases (194.5 to 195.6).
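The size of the improvement on the two datasets can be compared directly from the reported averages. The following Python sketch uses only the average values for filter 0 and filter 3 given in Tables 1 and 2; the dictionary layout and names are chosen for illustration.

```python
# Reported averages (correct predictions out of 200) for filter 0 and filter 3.
reported_averages = {
    "Foss": {"filter 0": 196.75, "filter 3": 198.0},
    "Bruker": {"filter 0": 194.5, "filter 3": 197.8},
}

# Gain of filter 3 over the unfiltered case, per dataset.
gains = {
    device: avg["filter 3"] - avg["filter 0"]
    for device, avg in reported_averages.items()
}
```

The gain is about 1.25 correct predictions per run for the Foss dataset and about 3.3 for the Bruker dataset, which is the sense in which the improvement was stronger on the Bruker data.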
[0148] In summary, all options for removing spectral outliers from the database of vectors according to the present invention led to an improvement in the prediction of a material code, compared to the case where no spectral outliers were removed. Generally, the use of filter 3, a combination of majority-voting and weighting for removal of spectral outliers from the dataset, gave the best improvement in the prediction results for a material code. Further, the improvement was observed on both datasets and thus does not depend on the specific NIR device on which the NIR spectra of the datasets were measured.