DEVICE AND METHOD FOR DETECTING AND IDENTIFYING EXTRACELLULAR VESICLES IN A LIQUID DISPERSION SAMPLE

20210372910 · 2021-12-02

    Abstract

    Device and method for detecting dispersed extracellular vesicles in a liquid dispersion sample, said method using an electronic data processor for classifying the sample as having, or not having, extracellular vesicles present, the method comprising the use of the electronic data processor for pre-training a machine learning classifier with a plurality of extracellular vesicle liquid dispersion specimens comprising the steps of: emitting a laser modulated by a modulation frequency onto each specimen; capturing a temporal signal from laser light backscattered by each specimen for a plurality of temporal periods of a predetermined duration for each specimen; calculating specimen DCT or Wavelet transform coefficients from the captured signal for each of the temporal periods; using the calculated coefficients to pre-train the machine learning classifier; wherein the method further comprises the steps of: using a laser emitter having a focusing optical system coupled to the emitter to emit a laser modulated by a modulation frequency onto the sample; using a light receiver to capture a signal from laser light backscattered by the sample for a plurality of temporal periods of a predetermined duration; calculating sample DCT or Wavelet transform coefficients from the captured signal for each of the temporal periods; using the pre-trained machine learning classifier to classify the calculated sample coefficients as having, or not having, extracellular vesicles present.

    Claims

    1. A method for detecting dispersed extracellular vesicles in a liquid dispersion sample, said method using an electronic data processor for classifying the sample as having, or not having, extracellular vesicles present, the method comprising the use of the electronic data processor for pre-training a machine learning classifier with a plurality of extracellular vesicle liquid dispersion specimens comprising the steps of: emitting a laser modulated by a modulation frequency onto each specimen; capturing a temporal signal from laser light backscattered by each specimen for a plurality of temporal periods of a predetermined duration for each specimen; calculating specimen DCT or Wavelet transform coefficients from the captured signal for each of the temporal periods; using the calculated coefficients to pre-train the machine learning classifier; wherein the method further comprises the steps of: using a laser emitter having a focusing optical system coupled to the emitter to emit a laser modulated by a modulation frequency onto the sample; using a light receiver to capture a signal from laser light backscattered by the sample for a plurality of temporal periods of a predetermined duration; calculating sample DCT or Wavelet transform coefficients from the captured signal for each of the temporal periods; and using the pre-trained machine learning classifier to classify the calculated sample coefficients as having, or not having, extracellular vesicles present.

    2. The method according to the previous claim wherein the extracellular vesicles have a particle size, in any particle direction, below 1 μm.

    3. The method according to claim 1, further comprising the electronic data processor classifying, if present, the extracellular vesicle into one of a plurality of extracellular vesicle type classes by using the machine learning classifier which has been pre-trained using a plurality of extracellular vesicle liquid dispersion specimen type classes.

    4. (canceled)

    5. The method according to claim 1, wherein the laser is further modulated by one or more additional modulation frequencies.

    6. The method according to claim 1, wherein the specimen modulation frequency and the sample modulation frequency are identical.

    7. (canceled)

    8. The method according to claim 1, wherein the captured plurality of temporal periods of a predetermined duration are obtained by splitting a captured temporal signal of a longer duration than the predetermined duration.

    9. The method according to claim 8, wherein the split temporal periods are overlapping temporal periods.

    10. (canceled)

    11. The method according to claim 1, wherein the electronic data processor is further arranged to pre-train and classify using time domain histogram-derived or time domain statistics-derived features from the captured signal, including features selected from the list consisting of: wNakagami; μNakagami; entropy; standard deviation; and combinations thereof.

    12. The method according to claim 1, wherein the focusing optical system is a convergent lens having a polymeric photoconcentrator arranged at the tip of an optical fibre or waveguide.

    13. The method according to claim 12, wherein the lens has a focusing spot corresponding to a beam waist of ⅓th to ¼th of a base diameter of the lens.

    14. The method according to claim 11, wherein the lens has a Numerical Aperture, NA, above 0.5.

    15. (canceled)

    16. (canceled)

    17. (canceled)

    18. (canceled)

    19. The method according to claim 1, wherein the calculation of transform coefficients comprises selecting a minimum subset of transform coefficients such that a predetermined percentage of the total energy of the signal is preserved by the transform.

    20. (canceled)

    21. The method according to claim 1, further comprising signal capture at a sampling frequency of at least five times the modulation frequency.

    22. (canceled)

    23. (canceled)

    24. A non-transitory storage media including program instructions for implementing a method for detecting extracellular vesicles in a liquid dispersion sample, the program instructions including instructions executable by an electronic data processor to carry out the method of claim 1.

    25. A device for detecting dispersed extracellular vesicles in a liquid dispersion sample, said device comprising a laser emitter; a focusing optical system coupled to the emitter; an infrared light receiver; and an electronic data processor arranged to classify the sample as having, or not having, extracellular vesicles present using a machine learning classifier which has been pre-trained using a plurality of extracellular vesicle liquid dispersion specimens by a method comprising: emitting a laser modulated by a modulation frequency onto each specimen; capturing a temporal signal from laser light backscattered by each specimen for a plurality of temporal periods of a predetermined duration for each specimen; calculating specimen DCT or Wavelet transform coefficients from the captured signal for each of the temporal periods; using the calculated coefficients to pre-train the machine learning classifier; wherein the electronic data processor is further arranged to: use the laser emitter to emit a laser modulated by a modulation frequency onto the sample; use the light receiver to capture a signal from laser light backscattered by the sample for a plurality of temporal periods of a predetermined duration; calculating sample DCT or Wavelet transform coefficients from the captured signal for each of the temporal periods; and using the pre-trained machine learning classifier to classify the calculated sample coefficients as having, or not having, extracellular vesicles present.

    26. The device according to claim 25, wherein the electronic data processor is further arranged to classify, if present, the extracellular vesicle into one of a plurality of extracellular vesicle type classes by using the machine learning classifier which has been pre-trained using a plurality of extracellular vesicle liquid dispersion specimen type classes.

    27. The device according to claim 25, wherein the extracellular vesicles have a particle size, in any particle direction, below 1 μm.

    28. The device according to claim 25, wherein the laser is an infrared laser.

    29. The device according to claim 25, wherein the split temporal periods are overlapping temporal periods.

    30. The device according to claim 25, wherein the electronic data processor is further arranged to pre-train and classify using time domain histogram-derived or time domain statistics-derived features from the captured signal, including features selected from the list consisting of: wNakagami; μNakagami; entropy; standard deviation; and combinations thereof.

    31. The device according to claim 25, wherein the focusing optical system is a convergent lens having a focusing spot corresponding to a beam waist of ⅓th to ¼th of a base diameter of the lens.

    32. (canceled)

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0072] The following figures provide preferred embodiments for illustrating the description and should not be seen as limiting the scope of invention.

    [0073] FIG. 1: Schematic representation of the optical setup according to an embodiment.

    [0074] FIG. 2: Schematic representation of the optic concentrator according to an embodiment.

    [0075] FIG. 3: Schematic representation of the signal processing flow according to an embodiment.

    [0076] FIG. 4: Graphical representations of the 2D distribution of the two most significant DCT-derived features extracted from the simulated back-scattered signals for EVs type a and EVs type b (crosses vs circles).

    [0077] FIG. 5: Schematic representation according to an embodiment of how data is split for training and testing, considering an example of an experiment including three classes of particles, wherein “n” represents the number of evaluation runs, i.e. the number of different combinations between train and test sets.

    [0078] FIG. 6: Signal charts for an experiment for complex solutions containing complex biological nanoparticles.

    DETAILED DESCRIPTION

    [0079] The disclosure is described in more detail below.

    [0080] In FIG. 1 an optical setup is depicted (100). A pigtailed 980 nm laser (500 mW, Lumics, ref. LU0980M500) (105) was included in the optical setup. A 50/50 fiber coupler with a 1×2 topology (110) is used for connecting two inputs: the laser (105) and the photodetector (115) (back-scattered signal acquisition module). The optical fiber tip (120) was then spliced to the output of the fiber coupler (110) and inserted into a metallic capillary (125) controlled by the motorized micromanipulator (130). This configuration allowed both laser light guidance to the optical fiber tip (120) through the optical fiber and the acquisition of the back-scattered signal through a photodetector (PDA 36A-EC, Thorlabs) (115). In addition to the photodetector, the back-scattered signal acquisition module also comprised an analog-to-digital acquisition board (National Instruments DAQ) (135), which was connected to the photodetector (115) for transmitting the acquired signal to the laptop where it is stored for further processing (145). A digital-to-analog output of the DAQ (135) was also connected to the laser for modulating its signal using a sinusoidal signal with a fundamental frequency of 1 kHz. A liquid sample (140) is loaded over a glass coverslip and a fiber with the photoconcentrator (120) on its extremity is inserted into the sample.

    [0081] The type of photoconcentrator is presented in FIG. 2 and consists of a polymeric lens fabricated through a guided wave photopolymerization method. This photoconcentrator is characterized by a converging spherical lens with a NA>0.5, able to focus the laser beam onto a highly focused spot corresponding to a beam waist of about ⅓ to ¼ of the base diameter of the lens. Additionally, a base diameter of 6-8 μm (205) and a curvature radius of 2-3.5 μm are also suitable. The fiber tip with the photoconcentrator is immersed into the liquid sample and the back-scattered signal is acquired at different locations of the tip in the solution.

    [0082] Reference is made to FIG. 3 to explain signal acquisition and processing. The raw back-scattered signal was acquired through a photodetector (PDA 36A-EC, Thorlabs) connected to an Analog-to-Digital converter (National Instruments DAQ) at a sampling rate of 5 kHz for all the Experiments (I-VII). After each acquisition, the original signal was passed through the following processing steps. First, the signal was filtered using a second-order 500 Hz Butterworth high-pass filter (305): because the input irradiation laser was modulated using a 1 kHz sinusoidal signal, this preserves the modulation component while removing noisy low-frequency components of the acquired signal (e.g. the 50 Hz electrical grid component). Then, the entire signal acquired for each particle and condition is split into epochs of 2 seconds (310). The z-score of each 2-second signal portion is computed in order to remove noisy signal epochs (315): z-scored 2-second portions whose magnitude exceeds a threshold value between 5 and 10 are removed (315). After these steps, a dataset of 2 s signal portions is obtained with a Signal to Noise Ratio (SNR) sufficient for EV type identification (320).
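    The preprocessing chain of FIG. 3 (305 to 320) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in the experiments. The function names, the biquad realisation of the Butterworth filter, the use of non-overlapping epochs, the reading of the rejection rule as "discard an epoch when any z-scored sample exceeds the threshold in magnitude", and the synthetic test signal are all assumptions.

```python
import math
import statistics


def butterworth_highpass(signal, cutoff_hz, fs_hz):
    """Second-order Butterworth high-pass realised as one biquad (bilinear transform)."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) gives the Butterworth response
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def split_epochs(signal, fs_hz, epoch_s=2.0):
    """Cut the signal into consecutive, non-overlapping epochs of epoch_s seconds."""
    n = int(round(epoch_s * fs_hz))
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def zscore(epoch):
    """Subtract the epoch mean and divide by the epoch standard deviation."""
    mean = statistics.fmean(epoch)
    sd = statistics.pstdev(epoch)
    return [(v - mean) / sd for v in epoch]


def keep_clean_epochs(epochs, threshold=5.0):
    """Return the z-scored epochs whose samples all stay within +/- threshold."""
    kept = []
    for epoch in epochs:
        if statistics.pstdev(epoch) == 0.0:
            continue  # a flat epoch carries no information and cannot be z-scored
        z = zscore(epoch)
        if max(abs(v) for v in z) <= threshold:
            kept.append(z)
    return kept


# Synthetic 6 s recording: 1 kHz modulation tone plus 50 Hz mains hum (illustrative only).
fs = 5000
raw = [math.sin(2 * math.pi * 1000 * k / fs) + 0.5 * math.sin(2 * math.pi * 50 * k / fs)
       for k in range(6 * fs)]
raw[3 * fs] += 50.0  # artificial spike inside the second epoch

filtered = butterworth_highpass(raw, cutoff_hz=500.0, fs_hz=fs)
epochs = split_epochs(filtered, fs_hz=fs)
clean = keep_clean_epochs(epochs, threshold=5.0)
print(len(epochs), "epochs acquired,", len(clean), "kept")
```

    With a 5 kHz sampling rate the 500 Hz cut-off sits well below the 1 kHz modulation tone and well above the 50 Hz grid component, which is why a low filter order is sufficient in this sketch.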

    [0083] A total of 54 features are extracted (FIG. 3, 325) from the back-scattered signal to characterize each class that could be separated in two main types: time-domain and frequency-domain features. The first set can be divided into two subsets: time-domain statistics and time-domain histogram-derived features. The frequency-domain set is also divided into two groups: Discrete Cosine Transform (DCT)-derived features and Wavelet features. The 54 features considered are summarized in table 1.
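    As a rough illustration of the time-domain set, the sketch below computes the six statistics (SD, RMS, skewness, kurtosis, IQR, entropy) and the two Nakagami parameters for one signal portion. Several choices here are assumptions rather than details taken from the disclosure: the entropy is a Shannon entropy over a fixed-width histogram with a hypothetical `n_bins`, the Nakagami parameters use the simple moment estimator (ω as the mean of the squared signal, μ as ω² divided by the variance of the squared signal) instead of a distribution fit, and the dictionary keys are illustrative names.

```python
import math
import statistics


def time_domain_features(epoch, n_bins=32):
    """Return eight time-domain descriptors of one signal portion."""
    n = len(epoch)
    mean = statistics.fmean(epoch)
    sd = statistics.pstdev(epoch)
    rms = math.sqrt(statistics.fmean([v * v for v in epoch]))
    skew = statistics.fmean([((v - mean) / sd) ** 3 for v in epoch])
    kurt = statistics.fmean([((v - mean) / sd) ** 4 for v in epoch])
    q1, _, q3 = statistics.quantiles(epoch, n=4)

    # Shannon entropy (bits) of a fixed-width histogram of the samples.
    lo, hi = min(epoch), max(epoch)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in epoch:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)

    # Nakagami moment estimates (an assumption; the disclosure fits the distribution).
    squares = [v * v for v in epoch]
    omega = statistics.fmean(squares)
    mu = omega ** 2 / statistics.pvariance(squares)

    return {"sd": sd, "rms": rms, "skew": skew, "kurt": kurt, "iqr": q3 - q1,
            "entropy": entropy, "mu_nakagami": mu, "omega_nakagami": omega}


# Illustrative input: five full periods of a unit sine sampled at 1000 points.
epoch = [math.sin(2 * math.pi * 5 * k / 1000) for k in range(1000)]
features = time_domain_features(epoch)
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

    A constant epoch would make the standard deviation and histogram width zero, so a production version would need to guard against that case.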

    [0084] The following time-domain statistics features are extracted from each 2-second signal portion: Standard Deviation (SD), Root Mean Square (RMS), Skewness (Skew), Kurtosis (Kurt), Interquartile Range (IQR) and Entropy (E), given their adequacy in differentiating, with statistical significance, synthetic particles of different types and yeast cells. Because the Nakagami distribution has been widely used to describe the back-scattered echo in statistical terms, mainly within the biomedical area, the Probability Density Function (PDF)-derived μ.sub.Nakagami and ω.sub.Nakagami parameters that best fit the distribution of each 2-second signal portion to the Nakagami distribution are also considered. These were the time-domain histogram-derived parameters considered in the classification. In total, eight features obtained through time-domain analysis of the back-scattered signal are used by the proposed method. The Discrete Cosine Transform (DCT) is applied to the original short-term signal portions to extract frequency-derived information, because it captures minimal periodicities of the analyzed signal, its coefficients are uncorrelated and, in contrast to the Fast Fourier Transform (FFT), it does not inject high-frequency artefacts into the transformed data. Considering that the first n coefficients of the DCT of the scattering echo signal are defined by the following equation:

    \[ E_i^{DCT}[l] = \sum_{k=0}^{N-1} \varepsilon_i[k] \cos\left[ \frac{\pi l (2k+1)}{2N} \right], \quad \text{for } l = 1, \ldots, n, \qquad (1) \]

    [0085] in which ε.sub.i is the signal envelope estimated using the Hilbert transform; by sorting the DCT coefficients from the highest to the lowest value of magnitude and obtaining the following vector:


    \[ y_i = \left( E_i^{DCT}[l^1], \ldots, E_i^{DCT}[l^n] \right)^T, \qquad (2) \]

    [0086] in which E.sup.DCT.sub.i[l.sup.1] represents the highest DCT coefficient in magnitude, it is possible to determine the percentage of the total signal energy that each set of coefficients represents (ordered from the highest to the lowest). The percentage for each set of coefficients (from the first to the nth coefficient) can be obtained by dividing the norm of the vector formed by the first up to the nth coefficient by the norm of the vector composed of all the n coefficients. Thus, the following DCT-derived features are used for characterizing each 2 s signal portion: the number of coefficients needed to represent about 98% of the total energy of the original signal (N.sub.DCT); the first 20, 30 or 40 DCT coefficients extracted from the vector defined in (2); the Area Under the Curve (AUC) of the DCT spectrum for all the frequencies (from 0 to 2.5 kHz) (AUC.sub.DCT); the maximum amplitude of the DCT spectrum (Peak.sub.DCT); and the signal power spectrum obtained through the DCT considering all the values within the frequency range analyzed (from 0 to 2.5 kHz) (P.sub.DCT) (see Table 1). The remaining 12 features were extracted after decomposition of each 2-second signal portion using wavelets.sup.21 (see Table 1). Two mother wavelets, Haar and Daubechies (Db10), are selected to characterize each back-scattered signal portion. Six features for each type of mother wavelet, based on the relative power of the Wavelet packet-derived reconstructed signal (one to six levels), are therefore extracted from each short-term 2-second signal.

    [0087] The disclosure is able to detect and identify different types of extracellular vesicles because it extracts frequency-derived features from the backscattering signal that are sensitive to the particle's dimension, optical polarizability and microscopic refractive index.
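    The DCT-derived feature N.sub.DCT described above can be sketched as follows. The code evaluates Equation (1) directly, sorts the coefficients by magnitude as in (2), and counts how many are needed before the ratio of vector norms reaches the chosen fraction. It is a minimal sketch under stated assumptions: the envelope ε.sub.i is taken as already computed (the Hilbert transform step is not shown), the function names and the synthetic envelope are hypothetical, and the direct O(N·n) summation stands in for whatever fast transform an implementation would use.

```python
import math


def dct_coefficients(envelope, n_coeffs):
    """Equation (1): E[l] = sum_k eps[k] * cos(pi * l * (2k + 1) / (2N)), for l = 1..n."""
    big_n = len(envelope)
    return [
        sum(e * math.cos(math.pi * l * (2 * k + 1) / (2 * big_n))
            for k, e in enumerate(envelope))
        for l in range(1, n_coeffs + 1)
    ]


def sort_by_magnitude(coeffs):
    """Vector (2): coefficients ordered from the highest to the lowest magnitude."""
    return sorted(coeffs, key=abs, reverse=True)


def n_dct(sorted_coeffs, fraction=0.98):
    """Smallest number of leading coefficients whose norm reaches `fraction` of the full norm."""
    total = math.sqrt(sum(c * c for c in sorted_coeffs))
    partial_sq = 0.0
    for count, c in enumerate(sorted_coeffs, start=1):
        partial_sq += c * c
        if math.sqrt(partial_sq) >= fraction * total:
            return count
    return len(sorted_coeffs)


# Synthetic envelope built from two cosine basis functions (l = 3 and l = 7).
N = 64
envelope = [3.0 * math.cos(math.pi * 3 * (2 * k + 1) / (2 * N))
            + 1.0 * math.cos(math.pi * 7 * (2 * k + 1) / (2 * N))
            for k in range(N)]

y = sort_by_magnitude(dct_coefficients(envelope, n_coeffs=20))
print("largest coefficients:", round(y[0], 3), round(y[1], 3))
print("N_DCT at 98%:", n_dct(y, 0.98))
```

    The ratio of norms follows the wording of the paragraph above; a ratio of squared norms would be the other common way to express a preserved energy percentage and would give a different count.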

    [0088] Nanoparticle motion is influenced by both the diffusivity D and the response of the particle to the optical potential exerted on it by the highly focused electromagnetic field. Therefore, the variability of the particle position along time is given by Equation 3:

    \[ \sigma(t) = \frac{k_B T}{k_{potential}} \left[ 1 - e^{\left( -\frac{2 k_{potential} D t}{k_B T} \right)} \right] \qquad (3) \]

    [0089] Where k.sub.potential determines the response of the particle to the optical potential and depends on the particle polarizability α, which is presented in equation 4:

    \[ k_{potential} = \left( \frac{2\pi}{c} \nabla I \right) \alpha \cdot \frac{1}{x} \]

    [0090] Where ∇I represents the gradient of the electromagnetic field over 1D and x is the coordinate of a given point in 1D subjected to the forces exerted by the applied electromagnetic field. The particle polarizability α is defined as:

    \[ \alpha = n_m^2 r^3 \left( \frac{\frac{n_p^2}{n_m} - 1}{\frac{n_p^2}{n_m} + 2} \right) \qquad (4) \]

    [0091] Where n.sub.p is the microscopic refractive index of the particle and n.sub.m is the refractive index of the media.

    [0092] Equations 3 and 4 contrast with the “simpler” formulation used to describe the Brownian motion of nanoparticles in state-of-the-art methods (e.g. dynamic light scattering), which solely depends on the diffusivity D of the particle within the dispersion. This simple Brownian motion is given by the variability of the particle position along time (σ(t)):

    \[ \sigma(t) = 2Dt, \quad \text{with } D = \frac{k_B T}{6 \pi \eta r} \qquad (5) \]

    [0093] where k.sub.B is the Boltzmann constant, T is the absolute temperature, η is the viscosity of the fluid and r the radius of the particle. Thus, this mathematical formulation of the Brownian motion states that the particle position along time (σ(t)) just depends on nanoparticles' radius.
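    To make the contrast between Equations 3 and 5 concrete, the following sketch evaluates both expressions for one particle. It is a worked illustration only: the temperature, the viscosity of water and the two stiffness values for k.sub.potential are assumed numbers chosen for the example, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def stokes_einstein_diffusivity(temperature_k, viscosity_pa_s, radius_m):
    """D = k_B T / (6 pi eta r), the diffusivity in Equation (5)."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def sigma_free(diffusivity, t):
    """Equation (5): simple Brownian motion, sigma(t) = 2 D t."""
    return 2.0 * diffusivity * t


def sigma_trapped(diffusivity, k_potential, temperature_k, t):
    """Equation (3): motion inside the optical potential."""
    kt = K_B * temperature_k
    # 1 - exp(x) is computed as -expm1(x) to stay accurate for very small |x|.
    return -(kt / k_potential) * math.expm1(-2.0 * k_potential * diffusivity * t / kt)


# Assumed conditions: water at 25 degrees C, particle radius 100 nm.
T = 298.15        # K
ETA = 0.89e-3     # Pa.s, approximate viscosity of water (assumption)
RADIUS = 100e-9   # m
D = stokes_einstein_diffusivity(T, ETA, RADIUS)

# Two hypothetical trap stiffness values differing by a factor of 2.
for k_potential in (1e-6, 2e-6):  # N/m
    for t in (1e-6, 1e-3, 1.0):   # s
        print(f"k={k_potential:.0e} t={t:.0e} "
              f"free={sigma_free(D, t):.3e} trapped={sigma_trapped(D, k_potential, T, t):.3e}")
```

    For short times the two expressions coincide, because the exponential in Equation 3 expands to 2Dt at first order. For long times Equation 3 saturates at k.sub.BT/k.sub.potential, so two particles of equal radius but different polarizability give different signals, which Equation 5 alone cannot capture.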

    [0094] Reference is made to FIG. 4 to illustrate the results obtained for the intensity of the light scattered by an ensemble of two populations of different EVs with approximately the same size (populations a and b) using theoretical simulations. Two populations of EVs a and b were used and are characterized by: r.sub.a=100 nm, r.sub.b=120 nm, and a ratio between k.sub.potential,a and k.sub.potential, b of 2. FIG. 4 highlights the instrumental role of considering optical polarizability and microscopic refractive index together with particle's dimension to obtain a perfect separation between two different classes of EVs (FIG. 4A), recapitulating the experimental results obtained in the lab (FIG. 4B). Class separation was not achieved when just the simple Brownian motion was considered (FIG. 4C).

    [0095] A classification algorithm, namely a Random Forests classifier, is used to detect EVs in liquid samples.

    [0096] Reference to FIG. 5 is made to explain the Leave-One-Out procedure (400), which was performed to ensure that the data used for evaluating the performance of a classifier belong to a subject/entity that was never involved in the training. Thus, if a dataset is composed of data from n subjects/entities, the evaluation is divided into n testing rounds, in which, in each round, the data from one subject are used for testing and the data from the remaining n−1 subjects are used for classifier training. In the next round, the data subset from another subject, which was used for training in the previous round, is used separately for testing the classifier. Then, the classifier performance is determined based on the mean values obtained after the n testing rounds.

    [0097] The above-mentioned method and device were used in several experiments to prove their feasibility and potential for the intended objective. Because Experiments II, IV, V, VI and VII were designed not to individualize a specific particle and identify it, but instead to detect the presence of a given type of nanoparticle in solution, the above Leave-One-Out based method was slightly modified. The factor that differentiated the 2-second signal portions acquired during experiments involving nanoparticles and microparticles was the location at which they were acquired. Thus, signal portions used for testing were acquired at different locations from the ones considered for training during the Experiments with nanoparticles, as a way to avoid overfitting effects. Note that, in these cases, it was not possible to individualize particles due to their nanoscale dimensions and the inability of our fiber tools to trap them.

    [0098] The most accurate classification rate for each of the Experiments/Problems and each evaluation run was obtained by determining the most suitable combination of values for three parameters (FIG. 5; 405): number of trees, number of predictors to sample and minimum leaf size (see Table 1). The classifier is then trained with that combination of values (FIG. 5, 405). The most effective combination was determined using five-fold cross-validation (FIG. 5, 405), for each Experiment and evaluation run, during the training phase. Before training, the samples were normalized: for each feature, the mean value across the training samples was subtracted from each sample, and the result was divided by the corresponding feature standard deviation. Test input samples must also be normalized according to this procedure, using the previously obtained training mean and standard deviation for each feature. This maps the new test feature vectors into the training feature space.
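    The round structure of this procedure can be sketched as below. The code only produces the train/test index splits and averages a score over the rounds; the classifier itself is left out. The function names, the `evaluate` callback and the group labels are hypothetical and serve only to illustrate the idea that each subject, entity or acquisition location is held out exactly once.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group."""
    unique = list(dict.fromkeys(groups))  # distinct labels in first-seen order
    for held_out in unique:
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def mean_score(groups, evaluate):
    """Average a user-supplied evaluate(train_idx, test_idx) score over all rounds."""
    scores = [evaluate(train, test) for _, train, test in leave_one_group_out(groups)]
    return sum(scores) / len(scores)


# Each signal portion is tagged with the entity (or acquisition spot) it came from.
groups = ["spot_a", "spot_a", "spot_b", "spot_c", "spot_c", "spot_c"]
rounds = list(leave_one_group_out(groups))
for held_out, train_idx, test_idx in rounds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

    Splitting by group rather than by individual signal portion is what prevents portions from the same entity appearing on both sides of the split.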
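    The normalization step can be sketched as follows: the mean and standard deviation are estimated on the training samples only and then reused unchanged on the test samples. The function names and the toy feature values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics


def fit_scaler(train_rows):
    """Per-feature mean and standard deviation, estimated on training samples only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def apply_scaler(rows, means, stds):
    """Subtract the training mean and divide by the training standard deviation."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy example with two features per sample.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

means, stds = fit_scaler(train)
train_scaled = apply_scaler(train, means, stds)
test_scaled = apply_scaler(test, means, stds)  # reuses the training statistics
print(train_scaled)
print(test_scaled)
```

    Recomputing the statistics on the test set would leak information about the test samples into the evaluation, which is why the training values are reused.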

    TABLE 1. List of parameters tuned during the classifier training stage for model optimization.

    Training parameter          Values
    Nr. of Trees                5, 20, 30, 40, 50, 60, 70, 80, 90, 100
    Min. Leaf Size              3, 5, 7
    Nr. Predictors To Sample    5, 7, 9, 11, 13, 15
    Nr. of Optimization Runs    10 × 3 × 6 = 180

    Nr.: Number. Min.: Minimum.
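    The 180 optimization runs in Table 1 are the Cartesian product of the three value lists. The sketch below enumerates that grid and picks the combination with the best score. The scoring function is a hypothetical placeholder standing in for the five-fold cross-validated accuracy of a Random Forests classifier, which is not implemented here; the dictionary keys are also illustrative names.

```python
from itertools import product

# Values taken from Table 1.
N_TREES = [5, 20, 30, 40, 50, 60, 70, 80, 90, 100]
MIN_LEAF_SIZE = [3, 5, 7]
N_PREDICTORS = [5, 7, 9, 11, 13, 15]


def parameter_grid():
    """Every combination of the three tuned parameters."""
    return [
        {"n_trees": t, "min_leaf_size": m, "n_predictors": p}
        for t, m, p in product(N_TREES, MIN_LEAF_SIZE, N_PREDICTORS)
    ]


def select_best(grid, cv_score):
    """Return the combination with the highest cross-validation score."""
    return max(grid, key=cv_score)


def placeholder_cv_score(params):
    """Hypothetical stand-in for a five-fold cross-validated accuracy."""
    return -abs(params["n_trees"] - 60) - abs(params["n_predictors"] - 9) - params["min_leaf_size"]


grid = parameter_grid()
best = select_best(grid, placeholder_cv_score)
print(len(grid), "combinations; best under the placeholder score:", best)
```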

    [0099] The two selected cell lines and their EVs used in Experiments II, VI and VII were derived from the gastric cancer cell line MKN45: HST6, genetically modified to present shorter/truncated O-glycans at its surface due to the over-expression of the ST6GalNAc1 sialyltransferase, and Mock, the corresponding control cells transfected with the empty vector, which does not induce any change in O-glycans. The Mock and HST6 cancer cell lines differ only in the O-glycans (carbohydrates) attached to their surface.

    [0100] Shorter or truncated O-glycans are considered predictive markers of poor prognosis in certain cancers. These phenomena are frequently associated with an incomplete glycans synthesis during cell glycosylation, in comparison with the cellular pathway under healthy conditions.

    [0101] Experiment II tested the identification and classification of eukaryotic cells in Phosphate-Buffered Saline (PBS) in a four-class problem. Three types of solutions were prepared to test the proposed single-cell identification method. Two of them were composed of the differently glycosylated cancer cells described above, Mock and HST6, suspended in PBS (1×). The third solution contained 8 μm Polystyrene (PS) synthetic microspheres also suspended in PBS (1×).

    [0102] Experiment IV tested the identification and classification of bacterial cells in PBS in a three-classes problem: (1) “no particle trapped”; (2) “Lactobacillus Acidophilus yogurt bacteria trapped”, and (3) “Streptococcus Thermophilus yogurt bacteria trapped” (target dimensions: 1.5-0.6 μm).

    [0103] Experiments VI and VII tested the identification and classification of extracellular vesicles produced by HST6 and Mock cells.

    [0104] Experiment VI tested Mock- and HST6-derived exosomes suspended in PBS through the proposed method and device. Classes considered: “Class 1: No exosomes (only blank solution)”; “Class 2: Presence of Mock-derived exosomes in suspension” and “Class 3: Presence of HST6-derived exosomes in suspension”.

    [0105] Experiment VII was carried out in challenging conditions using PBS supplemented with Fetal Bovine Serum (FBS) to resuspend EVs, a complex liquid medium with high concentrations of proteins, sugars and lipids. This FBS was treated to remove the native EVs. FIG. 6 shows the backscattered signals obtained with three different types of samples: EV-free FBS with cell culture media (A), EV-free FBS supplemented with Mock EVs with cell culture media (B), and EV-free FBS supplemented with HST6 EVs with cell culture media (C).

    [0106] Table 2 summarizes experimental results obtained with the present disclosure, in particular results regarding the differentiation performance between cells or EVs through the proposed method and device.

    [0107] The term “comprising” whenever used in this document is intended to indicate the presence of stated features, integers, steps, components, but not to preclude the presence or addition of one or more other features, integers, steps, components or groups thereof. The disclosure should not be seen in any way restricted to the embodiments described and a person with ordinary skill in the art will foresee many possibilities to modifications thereof. The above described embodiments are combinable. The following claims further set out particular embodiments of the disclosure.

    TABLE 2. Results of extracellular vesicles (EVs) identification in three different scenarios (Experiments V, VI and VII)

    Class                                Spots  Portions  Runs (n)  Train      Test      Test Accuracy  Test F-Measure
    Experiment II
    1: No particle or cell               16     852       1000      3183 ± 18  207 ± 18  0.963 ± 0.045  0.917 ± 0.101
    2: Mock cell                         18     813
    3: HST6 cell                         16     903
    4: PS particle                       16     821
    Experiment IV
    1: No particle                       4      135       17        257 ± 6    97 ± 5    0.877 ± 0.047  0.808 ± 0.074
    2: Lactobacillus bacteria            4      144
    3: Streptococcus bacteria            5      185
    Experiment V
    1: Blank solution                    8      135       100       34 ± 0     38 ± 1    0.901 ± 0.130  0.865 ± 0.195
    2: 100 nm polystyrene nanoparticles  7      95
    Experiment VI
    1: Blank solution (PBS)              10     290       500       790 ± 13   72 ± 13   0.918 ± 0.109  0.823 ± 0.209
    2: Mock EVs in PBS                   13     339
    3: HST6 EVs in PBS                   15     433
    Experiment VII
    1: Blank solution (FBS)              13     390       500       851 ± 38   146 ± 36  0.982 ± 0.039  0.939 ± 0.127
    2: Mock EVs in FBS                   14     369
    3: HST6 EVs in FBS                   14     369

    Spots: number of different acquisition spots. Portions: total nr. of 2 s signal portions. Runs (n): nr. of test evaluation runs. Train, Test: nr. of train samples and nr. of test samples. Values given once per experiment are listed on the row of its first class.

    TABLE 3. EVs identification performance difference considering an exposed perpendicularly cleaved optical fiber and an optical fiber with the photoconcentrator on its extremity.

                                Assay I                               Assay II
    Culture media               Culture media with FBS (free of EVs)  Culture media with FBS (free of EVs)
    EVs concentration           200 μL solution with EVs (1:1000)     20 μL solution with EVs (1:10)
    Acquisition height (h)      Fixed (only x, y random)              Random (all x, y, z values random)
    N (evaluation runs)         300                                   300

                                F-Measure (%)    Accuracy (%)         F-Measure (%)    Accuracy (%)
    With photoconcentrator      0.9430 ± 0.1195  0.9873 ± 0.0254      0.8443 ± 0.1718  0.9200 ± 0.0810
    Without photoconcentrator   -                -                    0.6980 ± 0.1788  0.7860 ± 0.1354