Method for the spectrometric characterization of microorganisms

11661620 · 2023-05-30

    Abstract

    The invention relates to a method for the spectrometric characterization of microorganisms, comprising: providing a test microorganism; acquiring spectrometric measurement data from the test microorganism under potential exposure to variance that is not based on taxonomic classification; selecting a classifier which is trained to determine the identity of a microorganism on a second taxonomic level; and applying the classifier to the measurement data in order to determine the identity of the test microorganism on the second taxonomic level, wherein the classifier is variance-conditioned in such a way that it largely or completely masks out the effect of variance in the characterization of the test microorganism on the second taxonomic level.

    Claims

    1. A method for the spectrometric characterization of microorganisms, comprising: providing a test microorganism whose identity is known on a first taxonomic level; acquiring spectrometric measurement data from the test microorganism under conditions which allow the influence of at least one source of variance that is not based on a taxonomic classification of the test microorganism; selecting a classifier which is trained to determine an identity of a microorganism on a second taxonomic level which is subordinate to the first taxonomic level, where possible identities of the classifier on the second taxonomic level are assigned to the known identity of the test microorganism on the first taxonomic level, and applying the classifier to the measurement data in order to determine the identity of the test microorganism on the second taxonomic level; wherein the classifier is variance-conditioned by obtaining it through training on targetedly variance-loaded spectrometric reference data of different known reference microorganisms which exhibit the same identity as the test microorganism on the first taxonomic level and cover different identities on the second taxonomic level, where the training includes the stipulation of giving greater weighting to spectral characteristics of a first type from the reference data which promote the differentiation of the different identities on the second taxonomic level, than to spectral characteristics of a second type from the reference data which are affected by the targeted variance, in order to largely or completely mask out an effect of variance in the characterization of the test microorganism on the second taxonomic level.

    2. The method according to claim 1, wherein the provision includes isolation of the test microorganism from a habitat.

    3. The method according to claim 2, wherein the habitat is a biological and/or chemical matrix.

    4. The method according to claim 3, wherein the isolation of the test microorganism includes the removal of the matrix.

    5. The method according to claim 1, wherein the provision of the test microorganism includes a multiplication step.

    6. The method according to claim 1, wherein the test microorganism is sterilized before the spectrometric measurement data are acquired.

    7. The method according to claim 6, wherein the sterilization includes exposure of the test microorganism to a metabolism-inhibiting liquid or to an impact of energy.

    8. The method according to claim 1, wherein the first taxonomic level and the second taxonomic level are immediately adjacent to each other.

    9. The method according to claim 8, wherein the first taxonomic level corresponds to a species and the second taxonomic level corresponds to a subspecies.

    10. The method according to claim 8, wherein the first taxonomic level corresponds to a species and the second taxonomic level comprises different varieties, e.g. pathogenic and non-pathogenic varieties, resistant and sensitive (susceptible) varieties, or different strains of the species.

    11. The method according to claim 1, wherein the identity of the test microorganism on the first taxonomic level was determined in advance by means of at least one of the following methods: (i) mass spectrometry, (ii) infrared spectrometry, (iii) growth on selective media (“API (Analytical Profile Index) test”) and (iv) gene sequence analyses.

    12. The method according to claim 1, wherein the one or more variances are of atmospheric origin.

    13. The method according to claim 12, wherein the one or more variances contain different values on at least one of the following scales: temperature, humidity, pressure, and carbon dioxide content of ambient air.

    14. The method according to claim 1, wherein the classifier is obtained and trained with the aid of one or more methods of machine learning.

    15. The method according to claim 14, wherein said methods of machine learning comprise at least one of artificial neural networks (ANN) or linear discriminant analyses (LDA).

    16. The method according to claim 1, wherein the characterization uses infrared spectrometric methods.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    (1) The invention can be better understood by referring to the following illustrations. The elements in the illustrations are not necessarily to scale, but are primarily intended to illustrate the principles of the invention (mostly schematically):

    (2) FIG. 1A shows a principal component analysis (PCA) in a three-dimensional space of infrared spectra of different serotypes (SV) of Streptococcus pneumoniae as a function of a varying relative humidity (rH).

    (3) FIG. 1B shows the evaluation of the same spectra as in FIG. 1A, but by means of a linear discriminant analysis directed at removing the influence of the varying humidity in the spectral characteristics.

    (4) FIG. 2 shows a greatly simplified diagram of the measurement setup of a Fourier transform infrared spectrometer (FT-IR) in transmission with acquired spectrum.

    (5) FIG. 3 shows an example sequence of steps in the preparation of a spectrometric microorganism sample before the measurement (counterclockwise).

    (6) FIG. 4 shows a principal component analysis (PCA) in a two-dimensional space of infrared spectra of different serotypes (SV) of Legionella pneumophila as a function of a varying relative humidity (rH), where the serotypes are grouped by pathogenicity.

    DETAILED DESCRIPTION

    (7) While the invention has been illustrated and explained with reference to a number of embodiments, those skilled in the art will recognize that various changes in form and detail can be made without departing from the scope of the technical teaching, as defined in the enclosed claims.

    (8) FIG. 1A shows a principal component analysis (PCA) of infrared spectra of different serotypes (SV) of the microorganism Streptococcus pneumoniae as a function of four different humidity levels between 10% and 80% relative humidity (rH #1 to rH #4), which were set specifically for the measurement in the measurement chamber of the infrared spectrometer. As is to be expected, the individual accumulations of data points are spread over the serotypes. In FIG. 1A, the four resulting data clouds of two example serotypes are emphasized with differently chain-dotted ovals (SV #1 and SV #2) for the sake of clarity. The measurements were each based on several strains/isolates of one serotype, as reflected in the elongated data clouds for the individual humidity levels. The different strains/isolates are labeled by means of different shapes such as pyramids, cubes and spheres. In addition, the separation of the measurements of the same serotype for the different humidity levels is clearly visible. Surprisingly, it is also apparent that the humidity dependence of the spectral characteristics does not cause data clouds of different serotypes to merge into one another and/or become mixed up. Instead, the individual data clouds of the different serotypes remain separate (“stratified”), including for different humidity levels, and can therefore be distinguished in the course of a characterization.

    (9) This finding gave reason to assume that this property of stable distinguishability even when the variance during a spectrometric measurement is considerable (10% to 80% on the humidity scale) can be used to eliminate this variance in the characterization by means of advanced evaluation methods, such as methods of machine learning, and thus to obviate the need for complex conversions of the spectrometers used for this purpose.

    (10) The spectra on which the diagram in FIG. 1A is based were accordingly evaluated a second time. A linear discriminant analysis was used with the specified purpose of identifying and increasing the weighting of those spectral characteristics, e.g. principal components, which maximize the distinctiveness of the individual, known serotypes, whereas those spectral characteristics which are influenced by the variances within the individual serotype classes, which essentially originate from the different strains (albeit a variance within the organism) and humidity levels during the measurement, were to be masked out as much as is possible.

    (11) The result is shown in FIG. 1B. As can be seen, the four different humidity levels, which appear in FIG. 1A as a stratum of four clearly separate data clouds per serotype, were merged, regardless of humidity, into one continuous, spindle-shaped data cloud per serotype, without causing the individual serotype data clouds to be superimposed on or to overlap each other. This result can be used to create a variance-conditioned classifier which, when applied to a new spectrum of Streptococcus pneumoniae with as yet unknown serotype, can assign the newly determined spectral characteristics to the known spatial volumes. To this end, the volumes of the multi-dimensional feature space which underlie the data clouds can be marked out, and an examination is carried out to ascertain into which volume of the feature space the important spectral characteristics of a new spectrum are projected.
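
    The weighting principle behind FIG. 1B can be illustrated with a deliberately small sketch. It assumes only two classes and two invented features (one serotype-specific, one humidity-sensitive) and uses Fisher's two-class linear discriminant, not the multi-class analysis on measured spectra described above. All numbers, names and the function make_class are hypothetical.

```python
# Toy illustration of Fisher's linear discriminant for two classes.
# Feature 0: hypothetical serotype-specific band intensity.
# Feature 1: hypothetical humidity-sensitive band intensity.
# All values are invented for illustration; they are not measured data.

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, centre):
    """2x2 within-class scatter matrix of one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - centre[0], p[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Return w = Sw^-1 (mean_a - mean_b), normalised to unit length."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]

def make_class(serotype_level):
    """Three strains at four humidity levels; humidity shifts feature 1 strongly."""
    humidity_shift = {10: 0.0, 30: 2.0, 55: 4.0, 85: 6.0}  # % rH -> invented shift
    strain_offset = [-0.2, 0.0, 0.2]
    return [(serotype_level + off, shift + off)
            for shift in humidity_shift.values() for off in strain_offset]

class_a = make_class(1.0)   # stands in for one serotype
class_b = make_class(3.0)   # stands in for another serotype
w = fisher_direction(class_a, class_b)

# Project every measurement onto the discriminant direction.
scores_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
scores_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
print("weights:", w)
```

    In this toy case the weight on the humidity-sensitive feature comes out close to zero, so the projected scores of the two classes stay separated at every humidity level, which mirrors the merging of the humidity strata into one data cloud per serotype.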

    (12) For an infrared spectrometric characterization measurement in transmission, a Fourier transform spectrometer (FT-IR), which provides a high resolution, can be used. See diagram of the measurement setup in FIG. 2. The spectra are typically measured from 4000 cm⁻¹ to 500 cm⁻¹. The area highlighted in FIG. 2 (bottom) with a broken line between around 1350 cm⁻¹ and around 700 cm⁻¹ is deemed to be particularly productive for the spectral specificity. Several hundred spectra are measured and summed at acquisition rates of twenty spectra per second in order to improve the signal-to-noise ratio.
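
    The effect of summing several hundred spectra can be sketched with simulated data. The sketch assumes a flat, invented signal with independent Gaussian noise and a scan count of 400 as one possible reading of "several hundred"; averaging is used in place of summing, which leaves the signal-to-noise ratio unchanged.

```python
import random
import statistics

def coadd(spectra):
    """Average a list of equally long spectra point by point."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]

def noisy_spectrum(true_signal, noise_sd, rng):
    """Add independent Gaussian noise to every point of the signal."""
    return [value + rng.gauss(0.0, noise_sd) for value in true_signal]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
true_signal = [1.0] * 200            # flat hypothetical absorbance, 200 points
n_scans = 400                        # assumed value for "several hundred" scans
scans = [noisy_spectrum(true_signal, 0.1, rng) for _ in range(n_scans)]

single_noise = statistics.pstdev(scans[0])        # noise of one scan
coadded_noise = statistics.pstdev(coadd(scans))   # noise after co-adding
acquisition_seconds = n_scans / 20                # at twenty spectra per second

print(single_noise, coadded_noise, acquisition_seconds)
```

    For uncorrelated noise the residual noise is expected to fall roughly with the square root of the number of scans, so 400 scans, which take 20 seconds at the stated rate, should reduce it by a factor of about twenty.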

    (13) The infrared spectra are based on thousands of vibrations of the functional groups and the polar bonds in the biological material; these in turn originate from all the components of the microorganism cells, such as DNA, RNA, proteins, internal structures, membranes, and cell walls, through to energy stores. There are no obvious assignments of molecules to individual characteristics in the spectra, even though certain spectral ranges can be preferentially assigned to certain molecular species: the fatty acid range from 3050 to 2800 cm⁻¹ with vibrations of the CH₂ and CH₃ groups, the amide range from 1750 to 1500 cm⁻¹ with peptide bonds, the polysaccharide range from 1200 to 900 cm⁻¹. The range from 900 to 700 cm⁻¹ is sometimes called the fingerprint range because it contains something from all molecules and is very important for differentiating between the varieties.
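
    The spectral ranges named above can be held in a small lookup table, for example to select the wavenumber windows passed on to an evaluation. The following is a minimal sketch; the names SPECTRAL_RANGES and ranges_for are hypothetical, and the treatment of the shared boundary at 900 cm⁻¹ (reported for both neighbouring ranges) is an assumption.

```python
# Wavenumber ranges in cm^-1 as (low, high), taken from the description.
SPECTRAL_RANGES = {
    "fatty acid": (2800, 3050),
    "amide": (1500, 1750),
    "polysaccharide": (900, 1200),
    "fingerprint": (700, 900),
}

def ranges_for(wavenumber):
    """Return the names of all ranges that contain the given wavenumber."""
    return [name for name, (low, high) in SPECTRAL_RANGES.items()
            if low <= wavenumber <= high]

print(ranges_for(2900))
print(ranges_for(900))
print(ranges_for(2000))
```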

    (14) In a slightly modified embodiment, the infrared spectra can also be measured in reflected light. In this case the samples are prepared on a metallically reflective substrate, for example one made of aluminum. It is also possible to use Raman spectroscopy, which has the advantage that the spectra of the prepared microorganisms can also be measured in liquids and that much smaller quantities of sample material are required.

    (15) The knowledge gained from FIGS. 1A and 1B can be implemented to create a variance-conditioned classifier as per the steps below:

    (16) (i) Prepare the Reference Microorganisms and Specify the Variance(s).

    (17) The first task is to specify the classes to be distinguished. If knowledge of the species, e.g. Streptococcus pneumoniae, as the identity on the first taxonomic level is assumed, the objective can be to determine the corresponding serotypes as possible identities on the second, subordinate taxonomic level. As an example, the 23 serotypes of Streptococcus pneumoniae which are found most frequently in clinical tests can be selected. The reference biomass of these microorganisms can be obtained from the publicly operated depositories such as the Leibniz Institute DSMZ—German Collection of Microorganisms and Cell Cultures GmbH in Braunschweig.

    (18) To give adequate consideration to the variance within the organism, a representative selection of microorganisms of the classes to be distinguished can be taken into account. Depending on availability, this can be three to six different strains per serotype in the example of Streptococcus pneumoniae; in the case of the 23 most common serotypes, 69 to 138 strains could be used for compiling the reference data and creating the classifier.

    (19) The next task is to specify the parameter whose variance is to be imposed on the recording of the reference data and whose variable occurrence during an infrared spectrometric measurement appears possible. This can be an atmospheric variance, e.g. humidity, pressure, gas concentration or temperature. In principle, more than one variance parameter can be taken into account when recording the reference data, for example both humidity and temperature. However, broader coverage in respect of the conceivable variances is also associated with a corresponding increase in the work required to measure the reference data, since the different representative values or reference values of the variance parameters have to be recorded in combination with each other. A list of reference points of the variance parameter(s) is selected which cover all realistic conditions during a spectrometric measurement. It should be possible to interpolate between the values of this representative selection of reference points.
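
    The growth in measurement effort when variance parameters are combined can be estimated with a short calculation. The sketch uses the 23 serotypes and three to six strains per serotype from the example, the four humidity values given as examples in this description, and two temperature values that are purely hypothetical; replicates are ignored.

```python
from itertools import product

serotypes = 23
strains_per_serotype = (3, 6)        # lower and upper bound from the example
strain_counts = tuple(serotypes * n for n in strains_per_serotype)

humidity_points = [10, 30, 55, 85]   # % relative humidity, example values
temperature_points = [20, 30]        # degrees Celsius, hypothetical values

# Every reference point of one parameter is combined with every point of the other.
conditions = list(product(humidity_points, temperature_points))

measurements_min = strain_counts[0] * len(conditions)
measurements_max = strain_counts[1] * len(conditions)

print(strain_counts, len(conditions), measurements_min, measurements_max)
```

    With these assumptions, 69 to 138 strains measured under eight combined conditions give 552 to 1104 reference measurements before replicates, which illustrates why each additional variance parameter multiplies the recording work.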

    (20) (ii) Recording the Reference Data

    (21) First the strains of the reference microorganisms can be prepared in a standardized way. For example, after incubation on or in a suitable culture medium and, if necessary, after being sterilized to prevent biological contaminations, they can be deposited on a specimen slide for infrared spectrometry in several replicates and then introduced into the measurement chamber. The measurement chamber is maintained at a constant, predetermined value in respect of the variance parameter(s), for example 10% relative humidity at 20° C. After the specimen slide is introduced, it is preferable to wait a certain length of time, e.g. five to ten minutes, so that the prepared biomass of the reference microorganisms can become acclimatized to the preset conditions.

    (22) After all the parameters have settled, the reference data of the prepared reference microorganisms can be recorded under the preset conditions. This procedure is repeated under the appropriately varied conditions, i.e. for example at 30%, 55% and 85% relative humidity and constant 20° C. Each change in the variance value should be followed by an acclimatization period of several minutes to allow the transient processes to decay and to obtain reproducible stable results.

    (23) This method of recording reference data can be supplemented by measurements of subordinate variances, which result, for example, from slightly different incubation conditions (biological replicates), or preparation conditions (e.g. technical replicates, use of different batches of reagents/agents or chemicals), or from measurements taken on different spectrometers to allow for instrumental variances. The reference data thus recorded are checked for completeness, obvious outliers (e.g. using local outlier factor (LOF) methods), and/or plausibility, and are corrected and/or re-recorded where necessary.
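
    An outlier check based on the local outlier factor can be sketched as follows. This is a minimal textbook-style implementation for small data sets, assuming distinct points and ignoring ties in neighbour distances; the feature vectors and the choice k = 3 are invented for illustration.

```python
import math

def local_outlier_factors(points, k=3):
    """Return one LOF score per point; values well above 1 suggest an outlier."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])                # k nearest neighbours of i
        k_distance.append(dist[i][order[k - 1]])    # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Six hypothetical replicate feature vectors plus one obviously deviating one.
replicates = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
              (0.05, 0.05), (0.2, 0.1), (5.0, 5.0)]
scores = local_outlier_factors(replicates, k=3)
print([round(s, 2) for s in scores])
```

    A measurement whose score lies far above those of the other replicates would be a candidate for correction or re-recording; the threshold at which this is done is a matter of protocol and is not specified here.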

    (24) (iii) Training of the Variance-Conditioned Classifier

    (25) It is preferable to use methods of machine learning, e.g. artificial neural networks (ANN) or linear discriminant analyses (LDA). In respect of the class affiliation, e.g. serotype #1, serotype #2, . . . , serotype #23 in the previously described example of Streptococcus pneumoniae, the training is supervised. Regarding the variance conditions, i.e. different ambient conditions (e.g. humidity) or other influencing factors (e.g. varying incubation, preparation, spectrometer), the training of the classifier is unsupervised, however. This is equivalent to the requirement to emphasize the significance of those spectral characteristics in the reference data which maximize the distinctiveness of the individual classes (here serotypes of the species Streptococcus pneumoniae), whereas those spectral characteristics which are strongly influenced by the variances have a lower weighting and are thus virtually masked out. The spectral characteristics can manifest themselves in the principal components, for example.

    (26) In simple terms, and for the purpose of illustration (without any claim to strict scientific correctness), the machine learning algorithm identifies those partial volumes in a usually multi-dimensional, multivariate feature space which are each to be assigned to one of the classes distinguished (i.e. identities on the second taxonomic level). An unexpected aspect of this basically known method of taking interferences into account was that atmospheric variances such as relative humidity do not cause the spectral characteristics of one serotype/strain to overlap with those of other serotypes/strains when the humidity varies, but instead they remain separate, and thus ensure distinguishability in a space of spectral characteristics, also under such varying conditions.

    (27) As is usual in such training phases which use reference data, there is the option to test the efficiency of the resulting classifier by means of a cross-validation. If appropriate, the machine learning algorithm can be adjusted on the basis of the results of the cross-validation in order to further improve the accuracy of the classifier.
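
    One way of carrying out such a cross-validation is sketched below. It assumes that all spectra of one strain are held out together, so that the test always involves a strain unseen in training, and it uses a simple nearest-centroid rule as a stand-in for the trained classifier. Both choices, and all data values, are assumptions made for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def train(records):
    """Map each class label to the centroid of its training vectors."""
    by_class = {}
    for label, _strain, vector in records:
        by_class.setdefault(label, []).append(vector)
    return {label: centroid(vs) for label, vs in by_class.items()}

def predict(model, vector):
    """Return the label of the nearest class centroid."""
    return min(model, key=lambda label: math.dist(model[label], vector))

def leave_one_strain_out(records):
    """Hold out each strain in turn and return the overall accuracy."""
    strains = sorted({strain for _label, strain, _v in records})
    correct = 0
    for held_out in strains:
        training = [r for r in records if r[1] != held_out]
        testing = [r for r in records if r[1] == held_out]
        model = train(training)
        correct += sum(predict(model, v) == label for label, _s, v in testing)
    return correct / len(records)

# (serotype, strain, features); second feature mimics a humidity-driven shift.
records = [
    ("SV #1", "A", (1.0, 0.0)), ("SV #1", "A", (1.1, 3.0)),
    ("SV #1", "B", (0.9, 0.2)), ("SV #1", "B", (1.2, 3.1)),
    ("SV #2", "C", (3.0, 0.1)), ("SV #2", "C", (3.1, 2.9)),
    ("SV #2", "D", (2.9, 0.0)), ("SV #2", "D", (3.2, 3.2)),
]
accuracy = leave_one_strain_out(records)
print(accuracy)
```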

    (28) (iv) Validation (Optional)

    (29) When the taxonomic assignment of one or more test microorganisms is known, a validation test run can be conducted under conditions which permit the expected variance (e.g. varying relative humidity) in order to verify the efficiency on the basis of external data.

    (30) This procedure for creating a classifier can be repeated to create a variance-conditioned classifier database with a very wide range of microorganisms, which in turn can be identified on different taxonomic levels. Reference data can preferably be acquired and processed from pathogens which occur in the clinical environment with the greatest frequency.

    (31) After the variance-conditioned classifier is created, the method for characterizing a microorganism can be conducted as follows, see the schematic sequence in FIG. 3:

    (32) First, the identity of the test microorganism must be known or must have been determined on the first taxonomic level, e.g. the species, using a mass spectrometer such as the MALDI Biotyper® (Bruker Daltonik GmbH, Bremen, Germany). On this basis, the variance-conditioned classifier that is appropriate for the identity determined is selected. By way of example, attention is drawn in this context to the method described in EP 3 083 981 A1.
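
    The selection step can be pictured as a lookup keyed by the identity on the first taxonomic level. The registry below is a hypothetical sketch; the entries are placeholder descriptions, not actual classifier objects or product names.

```python
# Hypothetical registry: first-level identity -> variance-conditioned classifier.
CLASSIFIER_REGISTRY = {
    "Streptococcus pneumoniae": "serotype classifier (placeholder)",
    "Legionella pneumophila": "pathogenicity-group classifier (placeholder)",
}

def select_classifier(first_level_identity):
    """Return the classifier trained for this species, or fail loudly."""
    if first_level_identity not in CLASSIFIER_REGISTRY:
        raise LookupError(
            f"no variance-conditioned classifier for {first_level_identity!r}")
    return CLASSIFIER_REGISTRY[first_level_identity]

print(select_classifier("Streptococcus pneumoniae"))
```

    Failing explicitly for an unlisted species avoids applying a classifier whose possible identities are not assigned to the known identity of the test microorganism.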

    (33) To obtain sufficient biomass, the test microorganism can be incubated in a nutrient solution or on a flat nutrient medium. The microorganism cells thus grown can then be removed from the nutrient medium, for example by separating them from the nutrient solution, e.g. by centrifuging or filtering, or by sampling from an agar plate. For the purpose of sterilization, the microorganisms thus harvested can be re-suspended in an activity-inhibiting liquid such as ethanol (e.g. 70% v/v).

    (34) Microorganisms react very sensitively to changes in growth conditions, such as different media, temperatures, nutrients, changes in the gas supply (oxygen and others), moisture, incubation period etc. These factors can bring about changes in cell composition and in metabolism, which can be detected with infrared spectrometry. For the purpose of incubation, the cell material of a pure single colony can be spread onto an agar plate using a spatula in order to bring about confluent growth. This technique enables the sampling of cells in a very reproducible mixture of the different growth phases which are always present in colonies. For most clinically relevant strains, the optimum incubation period is around 16 to 24 hours, and the incubation temperature frequently used for bacteria is around 35° C. to 37° C. The sample material of an incubated test microorganism can be harvested directly from the center of the cell layer e.g. using a calibrated platinum loop with a diameter of one millimeter (step A).

    (35) When the test microorganism is grown on a flat nutrient medium such as agar, biomass can be sampled from one or more colonies and deposited directly on a spectrometric specimen slide. It is important to ensure uniform distribution, with the option to sterilize the biomass by irradiating it with ultraviolet light (e.g. in the case of Streptococcus pneumoniae). Alternatively, the biomass can likewise be re-suspended in a metabolism-inhibiting liquid (step B). The liquid can also be de-ionized water, which does not usually exert any metabolism-inhibiting effect. In this case too, the test microorganism can be sterilized by ultraviolet radiation or another energy source (e.g. heat) after being deposited on a test site of a specimen slide.

    (36) Care must be taken that no residues of the nutrient medium, which could interfere with the measurement result, adhere to the test microorganism taken out of or from the nutrient medium. To achieve uniform distribution of the biomass of the test microorganism in the suspension, small cylinders or beads of reaction-inert material such as steel can be added to the suspension and the sealed suspension vessel can then be shaken (step C). The suspension is then aliquoted and applied gently, e.g. by means of a pipette with a plastic tip, onto the specimen slide in replicates (step D), the number of which may vary from protocol to protocol. Uniform application with homogeneous layer thickness promises the best measurement results (step E). After all samples under investigation have been applied to the specimen slide, it is left to stand for several minutes, e.g. ten to thirty minutes, at a specified temperature, e.g. 37° C., for the suspensions to dry (step F). If the test microorganism is applied to the specimen slide as soon as it is harvested from the incubation vessel without any further re-suspension, the drying can be omitted completely, or at least it can be made much shorter.

    (37) The specimen slides thus prepared can then be introduced into a measurement chamber of a spectrometer and measured sample by sample under conditions which allow the influence of at least one source of variance. Several positions on the specimen slide can also be coated with test standard biomass to check the technical performance of the spectrometer, for example in line with the applicant's method explained in EP 3 392 342 A1.

    (38) The spectra recorded can be subjected to an analysis with the variance-conditioned classifier created in advance after the usual processing steps, such as baseline subtraction, smoothing and calculation of the second derivative. As described above, only (or at least predominantly) those spectral characteristics that are not influenced by the variance, or only to a slight degree, are taken into account here, whereas those spectral characteristics which exhibit a high variance-induced variation are largely or completely masked out.
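
    The processing steps named above can be sketched in simplified form. The baseline is taken here as a constant offset, the smoothing as a moving average and the second derivative as a central finite difference; these are assumptions chosen for brevity, and other filters may be preferred in practice.

```python
def subtract_baseline(values):
    """Remove a constant offset, here simply the minimum value (an assumption)."""
    floor = min(values)
    return [v - floor for v in values]

def smooth(values, window=5):
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out

def second_derivative(values, step=1.0):
    """Central finite difference; returns len(values) - 2 interior points."""
    return [(values[i - 1] - 2 * values[i] + values[i + 1]) / step ** 2
            for i in range(1, len(values) - 1)]

def preprocess(values, window=5, step=1.0):
    """Apply the three steps in the order given in the description."""
    return second_derivative(smooth(subtract_baseline(values), window), step)

# Hypothetical absorbance values on an evenly spaced wavenumber grid.
spectrum = [0.20, 0.22, 0.30, 0.55, 0.90, 0.60, 0.32, 0.24, 0.21]
print(preprocess(spectrum))
```

    Working with the second derivative suppresses slowly varying background contributions and emphasizes band positions, which is one reason it is frequently chosen before classification.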

    (39) Processing the measurement data with the variance-conditioned classifier leads to the spectrum under investigation being assigned to one of the possible identities on the second taxonomic level. In the example of Streptococcus pneumoniae, this means one of the referenced serotypes. Only in rare cases is a reliable characterization not possible, for example because of unforeseen disturbances during the incubation, sample preparation or measurement, or because the identity of the test microorganism sought on the second taxonomic level is not included in the reference data (e.g. in the case of a very rare serotype which is of almost no relevance in clinical practice).
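
    The assignment, including the rare case in which no reliable characterization is possible, can be sketched with a distance threshold. The sketch assumes that each class volume is approximated by a sphere around a class centre in the discriminant space; the centres, the threshold and the function name assign_identity are hypothetical.

```python
import math

def assign_identity(scores, class_centres, max_distance):
    """Return the nearest class, or None if the spectrum lies outside all volumes."""
    best = min(class_centres, key=lambda name: math.dist(class_centres[name], scores))
    if math.dist(class_centres[best], scores) > max_distance:
        return None          # e.g. a serotype not covered by the reference data
    return best

# Invented class centres in a two-dimensional discriminant space.
centres = {"SV #1": (0.0, 0.0), "SV #2": (4.0, 0.0)}

print(assign_identity((0.3, 0.2), centres, max_distance=1.0))
print(assign_identity((10.0, 10.0), centres, max_distance=1.0))
```

    Returning no result for a spectrum far from every class volume is preferable to forcing it into the nearest class, since the identity sought may simply be absent from the reference data.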

    (40) FIG. 4 shows a further example of the continuing distinguishability of serotypes even under varying levels of humidity in a principal component space, which is shown in two dimensions here for the sake of clarity (PC 2, PC 3). The basis is the species Legionella pneumophila, of which fifteen serotypes are taken into account in the illustration of spectral characteristics. The serotypes are not individually classified, however, but grouped into strongly pathogenic and less pathogenic serotypes. This distinguishes the first serotype SV #1 (data cloud on the right), which is responsible for 70 to 80 percent of all legionellosis in humans, from the next fourteen SV #2 to SV #15, which are much less pathogenic, if at all. Similarly, it would also be possible to classify susceptible (sensitive) and resistant serotypes/strains together in groups.
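
    Grouping serotypes before training amounts to relabelling the reference data. A minimal sketch for the Legionella pneumophila example follows; the label strings and the function name pathogenicity_group are hypothetical.

```python
def pathogenicity_group(serotype_number):
    """Map serotypes SV #1 to SV #15 onto the two groups used in FIG. 4."""
    if not 1 <= serotype_number <= 15:
        raise ValueError("only serotypes 1 to 15 are covered in this example")
    return "strongly pathogenic" if serotype_number == 1 else "less pathogenic"

# Training labels for all fifteen serotypes considered in the example.
labels = {f"SV #{n}": pathogenicity_group(n) for n in range(1, 16)}
print(labels["SV #1"], "/", labels["SV #2"])
```

    The same relabelling could be applied to susceptible and resistant serotypes or strains, with the classifier then trained on the group labels instead of the individual serotypes.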

    (41) The reference data of the different serotypes of the reference microorganisms Legionella pneumophila, whose underlying strains are coded with different symbols such as triangles, squares and circles, were acquired under four different relative humidities (arid 10%, semi-arid 30%, humid 55% and tropical 85%). In the diagram, this variance essentially manifests itself in the elongation of the data clouds along the principal component axis PC 2. However, it is clear that, irrespective of the variance, the data cloud belonging to serotype 1 (SV #1) is sufficiently removed from the grouped data cloud of the other serotypes SV #2 to SV #15 to ensure the distinguishability on the basis of spectral characteristics. If a clinic experiences an increase in the number of cases of diarrhea which can be ascribed in a first analysis, for example with the established mass spectrometric MALDI-TOF method, to the bacterial species Legionella pneumophila, an appropriately trained classifier can be used in the subsequent infrared spectrometric analysis of the isolated and incubated pathogen to distinguish the particularly pathogenic serotype SV #1 from the other less dangerous serotypes SV #2 to SV #15 in order to start a specific treatment in the case of a positive result. This procedure can of course be transferred to other microorganisms. The flexibility of the classifier creation described here is boundless.

    (42) Starting from the afore-described methods, variance-conditioned classifiers are determined for a plurality of possible microorganisms and also for a plurality of possible sources of variance, individually and also several in combination, during a spectrometric measurement. With knowledge of the identity of a microorganism to be characterized on a first taxonomic level, a spectrometric sub-characterization of the identity on a second subordinate taxonomic level can thus be robustly and reliably carried out by selecting the appropriate variance-conditioned classifier.

    (43) Further embodiments of the invention are conceivable in addition to the embodiments described by way of example. With knowledge of this disclosure, those skilled in the art can easily design further advantageous embodiments, which are to be covered by the scope of protection of the claims, including any equivalents as the case may be.