METHOD FOR DETECTING LUNG CANCER
20220412873 · 2022-12-29
Assignee
Inventors
Cpc classification
G01N15/1456
PHYSICS
International classification
Abstract
The present invention relates to a diagnostic method for determining lung disease. The method comprises obtaining a plurality of spectra produced by spectroscopic interrogations of a plurality of cells. The method comprises determining a feature of interest from each spectrum of the plurality of spectra. The method comprises determining a distribution of the features of interest. The method comprises diagnosing a lung disease in dependence on the distribution of features of interest.
Claims
1. A diagnostic method for determining lung disease comprising: obtaining a plurality of spectra produced by spectroscopic interrogations of a plurality of cells; determining a feature of interest from each spectrum of the plurality of spectra; determining a distribution of the features of interest; and diagnosing a lung disease in dependence on the distribution of features of interest.
2. A diagnostic method according to claim 1 wherein lung disease is diagnosed in case the distribution is asymmetric.
3. A diagnostic method according to claim 2 further comprising determining a ratio of outliers to non-outliers in the distribution of features of interest, and determining asymmetry based on the ratio.
4. A diagnostic method according to claim 3 wherein the distribution is asymmetric in case the ratio of outliers to non-outliers is above a threshold.
5. A diagnostic method according to claim 4 wherein the threshold is at least 0.05, preferably at least 0.1, preferably at least 0.15.
6. A diagnostic method according to any of claims 3 to 5 wherein the outliers are determined in dependence on a one-sided boundary.
7. A diagnostic method according to claim 6 wherein the one-sided boundary is determined in dependence on a mean of the features of interest and/or in dependence on a standard deviation of the features of interest.
8. A diagnostic method according to any of claims 2 to 7 further comprising determining an asymmetry measure of the distribution of features of interest, and determining asymmetry based on the asymmetry measure.
9. A diagnostic method according to claim 8 wherein the asymmetry measure is a skew, a Pearson's skew, and/or a kurtosis.
10. A diagnostic method according to any preceding claim wherein lung disease is diagnosed in case the distribution has a spread above a threshold.
11. A diagnostic method according to claim 10 further comprising determining a ratio of outliers to non-outliers in the distribution of features of interest, and determining a spread above a threshold based on the ratio.
12. A diagnostic method according to claim 11 wherein the outliers are determined in dependence on a two-sided boundary.
13. A diagnostic method according to any of claims 10 to 12, further comprising determining a standard deviation as measure of the spread, wherein lung disease is diagnosed in case the standard deviation is above a threshold.
14. A diagnostic method according to any preceding claim wherein the plurality of cells are from the upper respiratory tract.
15. A diagnostic method according to any preceding claim wherein the plurality of cells are buccal cells.
16. A diagnostic method according to any preceding claim wherein the spectroscopic interrogations are infrared spectroscopic interrogations, Fourier-transform infrared spectroscopic interrogations, benchtop spectroscopic interrogations, and/or Raman spectroscopic interrogations.
17. A diagnostic method according to any preceding claim wherein at least 20 spectra are obtained with each spectrum from a different cell, preferably at least 50 spectra with each spectrum from a different cell, more preferably at least 75 spectra with each spectrum from a different cell, yet more preferably at least 100 spectra with each spectrum from a different cell.
18. A diagnostic method according to any preceding claim wherein the feature of interest is a peak area in a spectroscopic band of interest.
19. A diagnostic method according to any of claims 1 to 17 wherein the feature of interest is a mean value, an ordinary arithmetic mean, a weighted arithmetic mean or a centroid within a spectroscopic band of interest.
20. A diagnostic method according to any of claims 1 to 17 wherein the feature of interest is a value at a wavenumber of interest.
21. A diagnostic method according to any of claims 1 to 17 wherein the feature of interest is a wavenumber at which a spectroscopic maximum or minimum occurs within a spectroscopic band of interest.
22. A diagnostic method according to any of claims 18 to 21 wherein the spectroscopic band of interest or wavenumber of interest is one or more of: in the region of 1150 cm⁻¹; between 1140 and 1160 cm⁻¹; in the region of 1080 cm⁻¹; between 1070 and 1090 cm⁻¹; in the region of 1065 cm⁻¹; between 1060 and 1070 cm⁻¹; in the region of 1050 cm⁻¹; and between 1040 and 1060 cm⁻¹.
23. A diagnostic method according to any of claims 1 to 17 wherein the feature of interest is a combination of two or more of the features of interest of claims 18 to 22.
24. A diagnostic method according to any preceding claim wherein the lung disease is lung cancer or a non-cancerous respiratory disease, optionally a chronic obstructive pulmonary disease.
25. A diagnostic method according to any preceding claim wherein each spectroscopic interrogation is of a portion of a single cell, preferably of a portion of a single cell including the nucleus.
26. A diagnostic method according to claim 25 wherein the portion includes cytoplasm.
27. A diagnostic method according to any preceding claim further comprising normalising spectra to an amide II peak height.
28. A diagnostic method according to any preceding claim further comprising calculating second derivatives of the spectra.
29. A diagnostic method according to any preceding claim further comprising obtaining a plurality of cells from a subject; and/or performing spectroscopic interrogations of the plurality of cells.
30. A computer program comprising code means to carry out a method according to any preceding claim.
31. A computer readable medium carrying a computer program according to claim 30.
32. A system comprising a computer enabled to run the computer program according to claim 30.
33. The system according to claim 32, further comprising a spectrometer.
Description
[0033] These and other aspects of the present invention will become apparent from the following exemplary embodiments, which are described with reference to the figures.
[0036] A sample of buccal cells is collected from a subject and fixed, for example in 4% formaldehyde or 10% neutral buffered formalin (NBF), for 20 minutes. The cell suspensions are cytospun onto substrates suitable for IR transmission, for example calcium fluoride (CaF2) or zinc selenide (ZnSe) IR windows, e.g. 1 mm thick and 22 mm in diameter. Other suitable protocols for cell preparation may be used; for example, cytospinning may be omitted; the cells may be permitted to sediment; excess fluid may be evaporated off; or cells may be smeared directly onto a window.
[0037] The sample of buccal cells is analysed with a suitable FTIR instrument. In an example, the sample is analysed with a benchtop FTIR spectrometer with a conventional (globar) light source. Suitable examples include a Perkin Elmer Spotlight 200i FT-IR microscope coupled to a Frontier spectrometer controlled with Spectrum 10 software, or a Thermo Fisher Scientific Nicolet iN10 Mx Infrared Imaging Microscope controlled with OMNIC Picta software. A benchtop FTIR spectrometer may be cooled with liquid nitrogen and may have a mercury cadmium telluride (MCT) detector. Examples of suitable IR detectors include a liquid nitrogen-cooled MCT single element detector or a liquid nitrogen-cooled FPA detector in a 64×64 array. In an example, single point transmission measurements are recorded using a 15×15 μm aperture. A larger aperture may be selected to interrogate a larger portion of a cell. An aperture may be selected to cover substantially an entire cell. The aperture is advantageously selected to be smaller than the cell diameter in order to minimise Mie scattering.
[0038] Single point transmission measurements are taken for 100 individual non-apoptotised, undamaged cells per sample, selected at random (e.g. manually, or automatically with cell identification by automated image processing) from the sample of buccal cells. The measurement interrogates a portion of a single cell focusing on the nucleus, the portion preferably including the nucleus and some of the cytoplasm (in a variant the portion may include only nucleus, or only cytoplasm). Data are recorded at room temperature between 4000 and 600 cm⁻¹ and the system is optimised to maximise signal at 1800-1000 cm⁻¹. 16 interferograms are averaged at 4 cm⁻¹ resolution before Fourier transformation. Absorbance spectra are calculated using as reference a background measurement (16 interferograms averaged at 4 cm⁻¹ resolution) taken from a clear area of the window. Background spectra are recorded for example before the first cell measurement and then after every 15 cells.
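The absorbance calculation mentioned above can be sketched as follows. This is a minimal illustration that assumes the conventional definition A = -log10(I_sample / I_background) applied point by point to single-beam spectra; in practice the instrument software normally performs this step. The function name and the intensity values are hypothetical.

```python
import math


def absorbance(sample, background):
    """Return A = -log10(I_sample / I_background) for each spectral point."""
    if len(sample) != len(background):
        raise ValueError("spectra must have the same number of points")
    return [-math.log10(s / b) for s, b in zip(sample, background)]


# Hypothetical single-beam intensities at three wavenumbers.
sample_beam = [80.0, 50.0, 10.0]
background_beam = [100.0, 100.0, 100.0]
print(absorbance(sample_beam, background_beam))
```

Because the background is re-recorded after every 15 cells in the example above, each cell spectrum would be referenced to the most recent background measurement.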
[0039] Other examples of benchtop FTIR spectrometer systems include a Bruker HYPERION 3000 FTIR Microscope coupled with an INVENIO spectrometer and OPUS software, or a Shimadzu AIM-9000 Microscope coupled with an IRTracer-100 spectrometer and AIMsolution software.
[0040] In a variant a synchrotron light source is used rather than a benchtop FTIR spectrometer with a conventional (globar) light source as described above. In an example a synchrotron light source is provided by the Diamond Light Source (Harwell Science and Innovation Campus, Didcot) using FTIR microspectroscopy at beamline 22. In this example FTIR data are recorded using a Bruker IFS 66s spectrometer, fitted with a KBr beamsplitter and coupled to a Bruker Hyperion 3000 microscope with a suitable IR detector, operated in an example with OPUS 7.0. A white light image is recorded using a 36× objective on the microscope.
[0041] A variety of alternatives for sample analysis to obtain FTIR data are possible; for example, a 30×30 μm aperture may be used, background readings may be taken every 5 minutes while taking measurements, or 256 interferograms or more may be averaged, amongst many other alternatives known to the person skilled in the art.
[0042] Absorbance spectra data may be pre-processed. Absorbance spectra data can be pre-processed to normalise absorbance spectra, for example to the amide II peak height between 1465 and 1575 cm⁻¹. Absorbance spectra data can be pre-processed to calculate the second derivatives, for example using 13-point Savitzky-Golay smoothing in order to narrow broad peaks and correct any baseline drift. Alternative procedures to normalise spectra and/or find a suitable derivative of the spectra may be used, as are well known in the art. Pre-processing may also include the steps of water subtraction, water vapour subtraction and/or baseline correction, as are well known in the art.
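The two pre-processing steps described above might be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the amide II peak height is taken as the maximum absorbance between 1465 and 1575 cm⁻¹, the data points are assumed to be evenly spaced, and the Savitzky-Golay second derivative is obtained from a least-squares quadratic fit in a 13-point window (the polynomial order is not stated in the text). All function names are hypothetical.

```python
def normalise_to_amide_ii(wavenumbers, absorbance, low=1465.0, high=1575.0):
    """Divide the spectrum by its maximum absorbance between low and high."""
    in_band = [a for w, a in zip(wavenumbers, absorbance) if low <= w <= high]
    if not in_band:
        raise ValueError("no data points inside the amide II band")
    peak_height = max(in_band)
    if peak_height <= 0:
        raise ValueError("amide II peak height must be positive")
    return [a / peak_height for a in absorbance]


def savgol_second_derivative(values, spacing, window=13):
    """Second derivative from a least-squares quadratic fit in a moving window.

    Only interior points are returned, so the output is window - 1 points
    shorter than the input. Evenly spaced data are assumed.
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be an odd number of at least 3")
    m = window // 2
    offsets = range(-m, m + 1)
    s2 = sum(i ** 2 for i in offsets)
    s4 = sum(i ** 4 for i in offsets)
    denom = window * s4 - s2 ** 2
    # Convolution weights giving twice the fitted quadratic coefficient.
    coeffs = [2.0 * (window * i ** 2 - s2) / denom for i in offsets]
    result = []
    for centre in range(m, len(values) - m):
        segment = values[centre - m:centre + m + 1]
        result.append(sum(c * v for c, v in zip(coeffs, segment)) / spacing ** 2)
    return result
```

With a 3-point window the weights reduce to the familiar second difference (1, -2, 1), which is a quick way to check the construction.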
[0043] Specific bands of interest within the 1200-900 cm⁻¹ region show particularly large differences between normalised spectra of samples from patients with cancer and samples from healthy subjects. An example of four bands of interest is: 1140-1160 cm⁻¹; 1070-1090 cm⁻¹; 1060-1070 cm⁻¹; and 1040-1060 cm⁻¹. Another example of bands of interest includes a band in the region of 1050 cm⁻¹, a band in the region of 1065 cm⁻¹, a band in the region of 1080 cm⁻¹ and a band in the region of 1150 cm⁻¹.
[0044] The means and standard deviations of the cancer group and the healthy group may be analysed to determine bands with particularly large differences.
[0045] For each band of interest, the peak area of each individual spectrum within the band is determined. A straight line is defined between the start and end points of the normalised second derivative spectrum within that band. The area between the straight line and the peak/trough of the normalised second derivative spectrum in the band of interest is calculated (referred to as the peak area).
[0046] The peak areas of the spectra are analysed to identify samples from patients with cancer.
[0047] Chi-squared testing of the calculated peak areas for a set of measurements from a sample (including data from around 100 individual cell spectra from the same patient) is performed to determine whether the data are normally distributed. Across different subjects, some with lung cancer and some without lung cancer, it is observed that many of the sets of measurements have data that are not normally distributed. Wilcoxon rank-sum analysis is performed to show whether the data from different patients have similar or dissimilar distributions. It is observed that many of the patients have data with dissimilar distributions.
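A Wilcoxon rank-sum comparison of the peak areas from two patients might be sketched as follows. This illustration covers only the rank-sum step, not the chi-squared normality test. It assumes the large-sample normal approximation without a tie correction to the variance, which is reasonable for around 100 spectra per sample but is not necessarily the variant used here. The function name is hypothetical.

```python
from statistics import NormalDist


def rank_sum_test(sample_a, sample_b):
    """Return (z, two-sided p-value) for the Wilcoxon rank-sum test,
    using the normal approximation and average ranks for ties."""
    combined = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2.0  # average of the ranks i+1 .. j
        rank_sum_a += mean_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    expected = n_a * (n_a + n_b + 1) / 2.0
    variance = n_a * n_b * (n_a + n_b + 1) / 12.0
    z = (rank_sum_a - expected) / variance ** 0.5
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

A small p-value suggests that the two sets of peak areas come from dissimilar distributions; a value near 1 gives no evidence of a difference.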
[0048] It is observed that the distribution of peak areas from a sample belonging to a control group (subjects without lung cancer) and the distribution of peak areas from a sample belonging to a cancer group (subjects with lung cancer) are dissimilar. The spectra of a particular sample, with a number of spectra from a random selection of cells, form a cluster with a number of outliers. For the control group the cluster is typically narrower, the outliers are fewer, and the distribution is relatively symmetric; for the cancer group the cluster is more spread out, the number of outliers is greater, and the asymmetry is more pronounced. It is thought that, in cancer patients, a proportion of the randomly selected cells from a sample is altered, and that the distribution of spectra therefore becomes shifted.
[0049] In order to distinguish a sample from a subject without lung cancer from a sample from a subject with lung cancer, a variety of measures of the distribution can be used. For example, for a set of measurements from a sample (i.e. for around 100 individual cell spectra from the same patient) the proportion of outliers compared to non-outliers, with reference to a particular boundary, can give a suitable measure for the distribution.
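The outlier proportion described above can be sketched as follows. This is a minimal illustration that assumes a one-sided boundary and a fixed threshold on the ratio of outliers to non-outliers; the function names, the boundary value and the peak area values are hypothetical, and the threshold of 0.1 is one of the values mentioned in the claims.

```python
def outlier_ratio(values, boundary, side="above"):
    """Ratio of outliers to non-outliers relative to a one-sided boundary."""
    if side == "above":
        outliers = sum(1 for v in values if v > boundary)
    elif side == "below":
        outliers = sum(1 for v in values if v < boundary)
    else:
        raise ValueError("side must be 'above' or 'below'")
    non_outliers = len(values) - outliers
    if non_outliers == 0:
        return float("inf")
    return outliers / non_outliers


def flag_sample(values, boundary, threshold, side="above"):
    """True if the outlier ratio exceeds the threshold."""
    return outlier_ratio(values, boundary, side) > threshold


# Hypothetical peak areas for 100 cells: 90 in the cluster, 10 beyond the boundary.
peak_areas = [0.0] * 90 + [5.0] * 10
print(outlier_ratio(peak_areas, boundary=1.0))
print(flag_sample(peak_areas, boundary=1.0, threshold=0.1))
```

Note that the ratio is taken against the non-outliers rather than against the total number of cells, so 10 outliers in 100 cells gives 10/90 rather than 0.1.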
[0054] The distinction illustrated in the examples correctly classifies 3 of the 4 cancer samples and 13 of the 15 healthy samples, giving a sensitivity of 75% and a specificity of 87%. In other examples the classifier correctly identifies patients with cancer with a sensitivity of 60% and a specificity of 77.8%, and in other examples the sensitivity is 60% and the specificity is 66%.
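The first pair of figures follows directly from the counts given above: sensitivity is the fraction of cancer samples correctly classified, and specificity is the fraction of healthy samples correctly classified. A short check, with hypothetical function names:

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of diseased samples that are correctly classified."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of healthy samples that are correctly classified."""
    return true_negatives / (true_negatives + false_positives)


print(f"{sensitivity(3, 1):.0%}")   # 3 of 4 cancer samples: 75%
print(f"{specificity(13, 2):.0%}")  # 13 of 15 healthy samples: 87%
```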
[0055] It is known that smoking can be a confounding factor in the analysis of samples from the respiratory pathway. It is however observed that samples obtained from subjects who are smokers and are without lung cancer show the same pattern as samples obtained from subjects who are not smokers and are without lung cancer. The distinction between samples from subjects with or without lung cancer is not affected by whether or not the subject is a smoker.
[0056] It is known that chronic obstructive pulmonary diseases can be a confounding factor in the analysis of samples from the respiratory pathway. It is however observed that samples obtained from subjects without cancer but with a non-cancerous respiratory disease (including chronic obstructive pulmonary diseases) are distinct from samples obtained from subjects with lung cancer. Samples obtained from subjects with a non-cancerous respiratory disease may show a different distribution than samples obtained from subjects without a respiratory disease.
[0057] In the illustrated example a sample of buccal cells is collected and analysed. The sample of buccal cells can be collected by a buccal swab or an oral wash. In a variant the sample is collected from one or more sites in the upper respiratory tract, including other mouth, dental or tongue tissue (e.g. by swab collection), sputum, saliva, or throat, nose or pernasal tissue (e.g. by swab collection).
[0058] In the illustrated example the boundary 2 and the threshold 4 are selected based on the data shown in
[0059] In the illustrated example the boundary is a one-sided boundary, and only outliers on one side of a cluster are considered, but in an alternative the boundary is a two-sided boundary, one on either side of the cluster, and outliers on either side of the cluster are considered.
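The two-sided alternative might be sketched as follows. The text does not state how the boundaries are placed, so this illustration assumes that they are set at the mean plus or minus k standard deviations of a hypothetical reference set of values (for example from control samples); both the reference set and the multiplier k are assumptions, as are the function names.

```python
from statistics import mean, stdev


def boundaries_from_reference(reference_values, k=2.0):
    """Return (lower, upper) as the reference mean minus/plus k sample
    standard deviations. The choice of k is an assumption."""
    centre = mean(reference_values)
    spread = stdev(reference_values)
    return centre - k * spread, centre + k * spread


def two_sided_outlier_ratio(values, lower, upper):
    """Ratio of outliers (below lower or above upper) to non-outliers."""
    outliers = sum(1 for v in values if v < lower or v > upper)
    non_outliers = len(values) - outliers
    if non_outliers == 0:
        return float("inf")
    return outliers / non_outliers


lower, upper = boundaries_from_reference([1.0, 2.0, 3.0, 4.0, 5.0], k=1.0)
print(two_sided_outlier_ratio([0.0, 2.0, 3.0, 4.0, 6.0], lower, upper))
```

A one-sided boundary corresponds to using only one of the two returned values and counting outliers on that side of the cluster alone.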
[0060] In the illustrated example only one band of interest is considered for the classification, but in an alternative two or more bands of interest are considered.
[0061] In the illustrated example the peak area in a particular band of interest is determined and analysed, but a variety of alternative measures can be used to quantify features of interest in a spectrum. Some examples include:
[0062] an absorbance (or a derivative of the absorbance) at a specific wavenumber;
[0063] a mean of the absorbance (or of a derivative of the absorbance) over a range of wavenumbers (a band of interest); the mean may be an ordinary arithmetic mean or a weighted arithmetic mean;
[0064] a peak position, i.e. a wavenumber at which a peak or trough absorbance (or a derivative of the absorbance) occurs within a band of interest;
[0065] a centroid of the absorbance (or a derivative of the absorbance) over a range of wavenumbers (a band of interest).
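The alternative measures listed above could be computed along the following lines. This is a sketch with hypothetical function names. It assumes that the value at a specific wavenumber is taken from the nearest sampled point, that the weighting for the weighted mean is supplied as a function of wavenumber, and that the centroid is the value-weighted mean wavenumber within the band.

```python
def in_band(wavenumbers, values, low, high):
    """Return the (wavenumber, value) pairs that fall inside the band."""
    return [(w, v) for w, v in zip(wavenumbers, values) if low <= w <= high]


def value_at(wavenumbers, values, target):
    """Value at the sampled wavenumber closest to the target."""
    return min(zip(wavenumbers, values), key=lambda p: abs(p[0] - target))[1]


def band_mean(wavenumbers, values, low, high, weight=lambda w: 1.0):
    """Weighted arithmetic mean over the band; the default weight of 1
    gives the ordinary arithmetic mean."""
    points = in_band(wavenumbers, values, low, high)
    total_weight = sum(weight(w) for w, _ in points)
    return sum(weight(w) * v for w, v in points) / total_weight


def peak_position(wavenumbers, values, low, high, trough=False):
    """Wavenumber at which the maximum (or minimum) occurs in the band."""
    pick = min if trough else max
    return pick(in_band(wavenumbers, values, low, high), key=lambda p: p[1])[0]


def centroid(wavenumbers, values, low, high):
    """Value-weighted mean wavenumber within the band."""
    points = in_band(wavenumbers, values, low, high)
    return sum(w * v for w, v in points) / sum(v for _, v in points)
```

Each function returns one number per spectrum, so any of them can stand in for the peak area when building the distribution over the cells of a sample.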
[0066] A combination of two or more of the measures quantifying features of interest in a spectrum may be used.
[0067] Other measures to quantify the distribution, and thereby to distinguish the control from the cancer group, include for example:
a standard deviation Γ;
a full width at half maximum for a histogram of the distribution;
a range between top and bottom e.g. quartiles, deciles, or percentiles;
a mean absolute deviation s;
a skew y;
a Pearson's skew;
a kurtosis c;
each computed over the N elements of the set of data {x_1, ..., x_N} and their ordinary arithmetic mean.
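Several of the measures listed above can be sketched as follows. The exact definitions intended are not stated in the text, so this illustration assumes common textbook forms: the population standard deviation (dividing by N), the mean absolute deviation about the arithmetic mean, the skew as the third central moment divided by the cube of the standard deviation, Pearson's second skewness coefficient 3(mean - median)/standard deviation, and the kurtosis as the fourth central moment divided by the fourth power of the standard deviation (not the excess kurtosis). The function name is hypothetical.

```python
from statistics import mean, median, pstdev


def distribution_measures(values):
    """Summary measures of a set of feature values, using population
    (divide-by-N) forms throughout; these forms are assumptions."""
    n = len(values)
    centre = mean(values)
    sd = pstdev(values)
    m3 = sum((x - centre) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - centre) ** 4 for x in values) / n  # fourth central moment
    return {
        "standard_deviation": sd,
        "mean_absolute_deviation": sum(abs(x - centre) for x in values) / n,
        "skew": m3 / sd ** 3,
        "pearson_skew": 3.0 * (centre - median(values)) / sd,
        "kurtosis": m4 / sd ** 4,
    }


# Hypothetical peak areas with a tail on one side.
print(distribution_measures([1.0, 1.0, 1.0, 1.0, 10.0]))
```

A symmetric set of values gives a skew and a Pearson's skew of zero, whereas a tail of outliers on one side moves both away from zero, which matches the asymmetry described for the cancer group.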
[0068] In the illustrated example infrared spectroscopy data is used, but in an alternative Raman spectroscopy or another type of spectroscopy is used.
[0069] Various other modifications will be apparent to those skilled in the art.
[0070] It will be understood that the present invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.
[0071] Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.
[0072] The term ‘comprising’ as used in this specification and claims preferably means ‘consisting at least in part of’.