CHROMATOGRAM DATA PROCESSING DEVICE
20200088700 · 2020-03-19
Assignee
Inventors
Cpc classification
G01N30/7233
PHYSICS
International classification
Abstract
A peak detection unit collects peak information by executing peak detection on data obtained by performing LC/MS analysis on a plurality of specimens. A same-component candidate extraction unit extracts peaks between which retention time difference and m/z value difference are equal to or smaller than an allowable value among two or more peaks for specimens different from each other, and a spectrum similarity determination unit calculates similarity between mass spectra corresponding to the two or more peaks, respectively. When the similarity is equal to or larger than a predetermined value, it is determined that the two or more peaks are attributable to the same component, and a retention-time and m/z-value correction unit performs correction to eliminate any difference between the retention times or m/z values of peaks. A data array table production unit produces a data array table based on peak information after the retention time and m/z value correction.
Claims
1. A chromatogram data processing device configured to process data of a plurality of specimens collected by using an analysis device including a chromatograph configured to separate a plurality of components contained in a specimen in a time direction and a detection unit configured to acquire signal intensities in a second dimension different from the time direction for the specimen after being separated by the chromatograph, the chromatogram data processing device comprising: a) a peak detection unit configured to execute peak detection on a plurality of sets of chromatogram data of the plurality of specimens and to collect peak information including a retention time for each detected peak; b) a same component determination unit configured to determine, when difference between at least retention times of two or more peaks derived from specimens different from each other is zero or within a predetermined range, whether the peaks are attributable to a same component based on similarity between signal intensity waveforms along the second dimension or between signal intensity values at a value of the second dimension, and correct the retention times and/or values of the second dimension of one or more of the peaks as necessary; and c) a data list production unit configured to arrange, based on data corrected by the same component determination unit, the retention time and the second dimension in one of a column direction and a row direction, and information for identifying a plurality of specimens in the other of the column direction and the row direction, and produce a data list in a table format including, as a matrix element, a signal intensity value at a retention time and a second dimension value of a specimen.
2. The chromatogram data processing device according to claim 1, wherein the same component determination unit calculates similarity between signal intensity waveforms in the direction of the second dimension in retention times of peak tops of two or more peaks derived from specimens different from each other, and determines whether the peaks are attributable to the same component based on the similarity.
3. (canceled)
4. The chromatogram data processing device according to claim 1, wherein the detection unit is a mass spectrometer, and the same component determination unit determines whether the peaks are attributable to the same component based on similarity between mass spectrum waveforms.
5. The chromatogram data processing device according to claim 1, wherein the detection unit is a photodiode array detector or an ultraviolet-visible absorption spectroscopic detector, and the same component determination unit determines whether the peaks are attributable to the same component based on similarity between absorption spectrum waveforms.
6. The chromatogram data processing device according to claim 1, wherein the similarity is similarity between spectrum patterns along the second dimension.
7. The chromatogram data processing device according to claim 1, wherein the similarity is similarity of a ratio of signal intensity values at a plurality of second dimension values along the second dimension.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0023]
[0024]
[0025]
[0026]
DESCRIPTION OF EMBODIMENTS
[0027] The following describes an LC-MS as an exemplary analysis device including a chromatogram data processing device according to the present invention with the accompanying drawings.
[0028]
[0029] The LC-MS of the present example includes a measurement unit 1 configured to execute measurement on a specimen, a data processing unit 2, and an input unit 3 and a display unit 4 as user interfaces.
[0030] The measurement unit 1 includes a liquid chromatograph unit (LC unit) 11 and a mass spectrometer (MS unit) 12. Although not illustrated, the LC unit 11 includes a pump configured to supply a mobile phase at a constant flow speed, an injector configured to inject a specimen into the supplied mobile phase, and a column configured to separate various components contained in the specimen in the time direction. The MS unit 12 includes an ion source configured to ionize components of elution liquid eluted from a column exit of the LC unit 11 upstream of the MS unit 12, a quadrupole mass filter configured to separate generated ions in accordance with the mass-to-charge ratio, a mass separator such as a time-of-flight mass separator, and a detector configured to detect the separated ions.
[0031] The data processing unit 2 includes, as functional blocks, a data storage unit 20, a peak detection unit 21, a same-component candidate extraction unit 22, a spectrum similarity determination unit 23, a retention-time and m/z-value correction unit 24, a data array table production unit 25, and a multivariate analysis processing unit 26. The data storage unit 20 stores, for each specimen, a data file recording signal intensity values as a function of two parameters, the retention time and the mass-to-charge ratio; in other words, three-dimensional chromatogram data.
[0032] The data processing unit 2 is physically a personal computer. The function of each functional block described above may be achieved when dedicated data processing software installed on the personal computer is executed by the computer.
[0033]
[0034] The following describes characteristic data processing at the LC-MS of the present example with reference to these drawings. This data processing performs multivariate analysis to determine differences and similarities between a plurality of specimens based on data files for the specimens, which are stored in the data storage unit 20 in advance.
[0035] An operator (user) specifies, through the input unit 3, a plurality of data files to be subjected to multivariate analysis (step S1). When the processing is started, the peak detection unit 21 reads the specified data files from the data storage unit 20. Then, peak picking is performed in accordance with a predetermined criterion on the three-dimensional chromatogram data stored in each data file, and the retention time, the mass-to-charge ratio, and the signal intensity value at the peak top of each detected peak are collected as peak information (step S2). Typically, a large number of peaks are detected from the data in one data file corresponding to one specimen.
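The peak picking of step S2 can be sketched as follows. This is only a minimal illustration, assuming that the three-dimensional chromatogram data is held as a list of rows (one row per retention time, one column per mass-to-charge ratio) and that the "predetermined criterion" is a simple local maximum along the time direction with a minimum height. The function name pick_peaks and the parameter min_height are hypothetical and do not appear in the description.

```python
def pick_peaks(intensity, rts, mzs, min_height):
    """Collect (retention time, m/z, intensity) at each peak top.

    Assumption: a peak top is a local maximum along the time axis of one
    m/z channel whose intensity is at least min_height.
    """
    peaks = []
    for j, mz in enumerate(mzs):
        # Chromatogram (intensity versus time) for this single m/z channel.
        trace = [row[j] for row in intensity]
        for i in range(1, len(trace) - 1):
            if trace[i] >= min_height and trace[i - 1] < trace[i] >= trace[i + 1]:
                peaks.append((rts[i], mz, trace[i]))
    return peaks


# Toy data for one specimen: 5 time points, 2 m/z channels.
rts = [1.0, 1.1, 1.2, 1.3, 1.4]
mzs = [100.0, 150.0]
intensity = [
    [0.0, 0.0],
    [5.0, 1.0],
    [20.0, 2.0],
    [6.0, 1.0],
    [0.0, 0.0],
]
peaks = pick_peaks(intensity, rts, mzs, min_height=10.0)
print(peaks)
```

In this toy data only the m/z 100.0 channel exceeds the height limit, so a single peak is collected. A practical detector would typically also smooth the data and estimate the baseline, which is omitted here.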
[0036] The same-component candidate extraction unit 22 extracts, from among the peaks detected in data files different from each other, two or more peaks between which the retention time difference is equal to or smaller than a predetermined allowable value and the mass-to-charge ratio difference is equal to or smaller than a predetermined allowable value. The allowable values are preferably determined as appropriate in advance. The retention time allowable value may be determined by taking into account, for example, variance and variation in the flow speed of the mobile phase at the LC unit 11. The mass-to-charge ratio allowable value may be determined mainly by taking into account device performance such as the mass accuracy of the MS unit 12. A pair of peaks extracted in this manner, each from a different data file, are candidates for peaks attributable to the same component.
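The candidate extraction described above can be sketched as a pairwise comparison. The following is a minimal illustration, assuming that each peak is a dictionary with the keys "specimen", "rt", and "mz"; the function name extract_candidates and the tolerance values are hypothetical and chosen only for the example.

```python
from itertools import combinations


def extract_candidates(peaks, rt_tol, mz_tol):
    """Return index pairs of peaks from different specimens whose retention
    time difference and m/z difference are both within the allowable values."""
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(peaks), 2):
        if p["specimen"] == q["specimen"]:
            continue  # only peaks from different specimens are compared
        if abs(p["rt"] - q["rt"]) <= rt_tol and abs(p["mz"] - q["mz"]) <= mz_tol:
            pairs.append((i, j))
    return pairs


peaks = [
    {"specimen": "A", "rt": 5.02, "mz": 301.14},
    {"specimen": "B", "rt": 5.06, "mz": 301.15},
    {"specimen": "B", "rt": 7.50, "mz": 301.14},
    {"specimen": "A", "rt": 5.03, "mz": 450.20},
]
candidates = extract_candidates(peaks, rt_tol=0.1, mz_tol=0.02)
print(candidates)
```

Only peaks 0 and 1 satisfy both tolerances while coming from different specimens. The exhaustive pairwise loop is quadratic in the number of peaks; sorting by retention time first would be a natural refinement for large data sets.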
[0037] Then, based on data in the data files, the spectrum similarity determination unit 23 produces a mass spectrum at the retention time of each of the plurality of peaks included in one pair of peaks extracted as described above, in other words, of each of the peaks that are candidates for peaks attributable to the same component. Then, spectrum pattern similarity between the mass spectra is calculated in accordance with a predetermined algorithm (step S3). When the plurality of peaks are peaks attributable to the same component, high similarity should be obtained between the spectrum patterns of the mass spectra corresponding to the plurality of respective peaks. Thus, it is determined whether the calculated similarity is equal to or larger than a predetermined threshold (step S4). When the similarity is equal to or larger than the threshold, it is determined that the plurality of peaks are peaks attributable to the same component (step S5).
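Steps S3 to S5 can be sketched as follows. This is a minimal illustration, assuming that the two mass spectra have already been resampled onto the same m/z bins and that the "predetermined algorithm" is the cosine of the two intensity vectors. The function names and the threshold of 0.9 are hypothetical; the description does not fix a particular algorithm or threshold value.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors on common m/z bins."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_component(spec_a, spec_b, threshold=0.9):
    """Steps S4 and S5: positive when the similarity reaches the threshold."""
    return cosine_similarity(spec_a, spec_b) >= threshold


spec_a = [100.0, 40.0, 10.0, 0.0]
spec_b = [50.0, 21.0, 5.0, 0.0]   # almost a scaled copy of spec_a
spec_c = [0.0, 5.0, 80.0, 60.0]   # a clearly different pattern

print(same_component(spec_a, spec_b))
print(same_component(spec_a, spec_c))
```

Because the cosine is insensitive to overall scale, two spectra of the same component measured at different concentrations still score close to 1, which is why a pattern-based measure suits this determination.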
[0038] As illustrated in
[0039] When it is determined that a plurality of peaks are peaks attributable to the same component, any difference between the plurality of peaks in the retention time needs to be eliminated. Thus, the retention-time and m/z-value correction unit 24 equalizes the retention times by using one or both of the retention times. For example, the average of a plurality of retention times may be calculated, and the retention times may be equalized to the average. In addition, any difference between the plurality of peaks in the mass-to-charge ratio needs to be eliminated, and thus the retention-time and m/z-value correction unit 24 equalizes the mass-to-charge ratios by using one or both of the mass-to-charge ratios as in the case of the retention times (step S6).
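The correction of step S6 can be sketched as follows, using the averaging mentioned above as the equalization rule. This is a minimal illustration, assuming that each peak is a dictionary with the keys "specimen", "rt", "mz", and "intensity"; the function name equalize is hypothetical.

```python
def equalize(peaks):
    """Replace the retention time and m/z of every matched peak with the
    average over the group; signal intensities are left unchanged."""
    n = len(peaks)
    mean_rt = sum(p["rt"] for p in peaks) / n
    mean_mz = sum(p["mz"] for p in peaks) / n
    # Return new dictionaries so that the original peak information is kept.
    return [dict(p, rt=mean_rt, mz=mean_mz) for p in peaks]


matched = [
    {"specimen": "A", "rt": 5.0, "mz": 301.0, "intensity": 1200.0},
    {"specimen": "B", "rt": 5.5, "mz": 301.5, "intensity": 900.0},
]
corrected = equalize(matched)
for p in corrected:
    print(p)
```

Averaging is only one of the options the description allows; adopting the values of one designated peak for the whole group would satisfy the same requirement.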
[0040] Then, it is determined whether the processing at steps S3 to S6 has been executed for all peaks extracted based on the retention time and the mass-to-charge ratio as candidates for peaks attributable to the same component (step S7). The process returns from step S7 to step S3 when any peak is unprocessed. Accordingly, through repetition of the processing at steps S3 to S7, whether peaks are attributable to the same component is determined for all peaks extracted based on the retention time and the mass-to-charge ratio, and the processing of equalizing retention times and mass-to-charge ratios is performed for a plurality of peaks determined to be attributable to the same component.
[0041] When the determination is positive at step S7, the data array table production unit 25 arranges, based on peak information after the retention times and the mass-to-charge ratios are corrected, the retention times and the mass-to-charge ratios in the longitudinal direction and specimen identification information (for example, specimen numbers and specimen names) in the lateral direction as illustrated in
[0042] As described above, in the LC-MS of the present example, when retention time difference and mass-to-charge ratio difference of the same component are present in data obtained for different specimens, the differences can be appropriately corrected and the corresponding peaks can be handled as identical peaks. Accordingly, the accuracy of a result of the multivariate analysis based on the data array table is improved.
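The data array table produced from the corrected peak information can be sketched as follows. This is a minimal illustration, assuming that each row is keyed by a (retention time, m/z) pair, that each column is a specimen, and that a specimen with no peak at a given row receives 0.0. The fill value and the function name build_table are assumptions made for the example, not details taken from the description.

```python
def build_table(peaks, specimens):
    """Arrange (rt, m/z) pairs as rows and specimens as columns, with the
    signal intensity value as each matrix element."""
    keys = sorted({(p["rt"], p["mz"]) for p in peaks})
    table = {k: {s: 0.0 for s in specimens} for k in keys}
    for p in peaks:
        table[(p["rt"], p["mz"])][p["specimen"]] = p["intensity"]
    return table


# Peak information after retention time and m/z correction.
corrected = [
    {"specimen": "A", "rt": 5.25, "mz": 301.25, "intensity": 1200.0},
    {"specimen": "B", "rt": 5.25, "mz": 301.25, "intensity": 900.0},
    {"specimen": "B", "rt": 7.5, "mz": 410.2, "intensity": 300.0},
]
table = build_table(corrected, specimens=["A", "B"])
for (rt, mz), row in table.items():
    print(rt, mz, row)
```

Because the two matched peaks now share exactly the same retention time and m/z, they fall into a single row. Without the correction they would occupy two separate rows, each with an empty cell, which is the situation that degrades the multivariate analysis.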
[0043] Various similarity measures can be used as the similarity between a plurality of mass spectra at step S3; for example, Pearson's product-moment correlation coefficient can be used. As is well known, Pearson's product-moment correlation coefficient is the same as the cosine (cos) of the angle between the two vectors after each has been mean-centered. Alternatively, for example, the Euclidean distance, Mahalanobis distance, Minkowski distance, Chebyshev distance, or Manhattan distance can also be used as a measure of similarity, a smaller distance indicating a higher similarity.
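Several of the measures named above can be sketched as follows. This is a minimal illustration, assuming intensity vectors on common m/z bins. Pearson's coefficient is computed here as the cosine of the mean-centered vectors, and the Manhattan, Euclidean, and Chebyshev distances are obtained as the Minkowski distance of order 1, 2, and infinity, respectively. The function names are hypothetical, and the Mahalanobis distance is omitted because it additionally requires a covariance matrix.

```python
import math


def pearson(a, b):
    """Pearson's correlation as the cosine of the mean-centered vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    return dot / math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))


def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean,
    p=math.inf Chebyshev)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]   # scaled copy of a
c = [1.0, 5.0, 3.0, 0.0]

print(pearson(a, b))
print(minkowski(a, c, 1), minkowski(a, c, 2), minkowski(a, c, math.inf))
```

Note that the distances are sensitive to overall intensity scale, whereas the correlation coefficient is not, so spectra would normally be normalized before a distance is used for this determination.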
[0044] Whether peaks are attributable to the same component may also be determined by using, in place of the similarity between the spectrum patterns of mass spectra, the similarity (in other words, the difference or distance) of a signal intensity value at a particular mass-to-charge ratio, or of a ratio of signal intensity values at a plurality of mass-to-charge ratios.
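The ratio-based alternative can be sketched as follows. This is a minimal illustration, assuming that each spectrum is a dictionary mapping a mass-to-charge ratio to a signal intensity value and that the determination compares the intensity ratio at two chosen mass-to-charge ratios. The function names, the chosen m/z values, and the limit max_diff are hypothetical.

```python
def intensity_ratio(spectrum, mz_a, mz_b):
    """Ratio of the signal intensity at mz_a to that at mz_b.
    Assumes the intensity at mz_b is non-zero."""
    return spectrum[mz_a] / spectrum[mz_b]


def same_by_ratio(spec_1, spec_2, mz_a, mz_b, max_diff):
    """Positive when the two intensity ratios differ by at most max_diff."""
    r1 = intensity_ratio(spec_1, mz_a, mz_b)
    r2 = intensity_ratio(spec_2, mz_a, mz_b)
    return abs(r1 - r2) <= max_diff


spec_1 = {150.0: 800.0, 122.0: 400.0}   # ratio 2.0
spec_2 = {150.0: 410.0, 122.0: 200.0}   # ratio 2.05
spec_3 = {150.0: 100.0, 122.0: 400.0}   # ratio 0.25

print(same_by_ratio(spec_1, spec_2, 150.0, 122.0, max_diff=0.2))
print(same_by_ratio(spec_1, spec_3, 150.0, 122.0, max_diff=0.2))
```

A ratio is independent of the absolute amount of the component, so it remains comparable between specimens of different concentration, at the cost of using far less of the spectrum than a full pattern comparison.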
[0045] As is clear from the above description, when the spectrum patterns of mass spectra are too simple, it is difficult to determine whether peaks are attributable to the same component. Thus, for example, a mass spectrum in which only protonated (or deprotonated) ions are observed is not well suited to the determination of whether peaks are attributable to the same component, and a mass spectrum in which the compound structure is reflected, such as a mass spectrum containing fragments produced by an electron ionization (EI) method or an ISD spectrum obtained using in-source dissociation (ISD), is more suitable. For the same reason, an MS/MS (MS.sup.n) spectrum obtained by MS/MS analysis or MS.sup.n analysis is suitable for the determination of peaks attributable to the same component.
[0046] The chromatogram data processing device according to the present invention is also applicable to processing of data obtained by other various chromatograph devices as well as an LC-MS and a GC-MS. Specifically, the chromatogram data processing device is also applicable to processing of data obtained by an LC including a PDA detector, an ultraviolet-visible absorption spectroscopic detector, a spectral fluorescence detector, a differential refractive index detector, an electric conductivity detector, or the like as a detector, or by a GC including a thermal conductivity detector, an electron capture detector, a flame photometric detector, a hydrogen flame ionization detector, or the like as a detector.
[0047] The above-described embodiment is merely an example of the present invention, and it is clear that variations, modifications, additions, and the like made as appropriate within the scope of the gist of the present invention, including at points other than those described above, are included in the claims of the present application.
REFERENCE SIGNS LIST
[0048] 1 . . . Measurement unit
[0049] 11 . . . Liquid chromatograph unit (LC unit)
[0050] 12 . . . Mass spectrometer (MS unit)
[0051] 2 . . . Data processing unit
[0052] 20 . . . Data storage unit
[0053] 21 . . . Peak detection unit
[0054] 22 . . . Same-component candidate extraction unit
[0055] 23 . . . Spectrum similarity determination unit
[0056] 24 . . . Retention-time and m/z-value correction unit
[0057] 25 . . . Data array table production unit
[0058] 26 . . . Multivariate analysis processing unit
[0059] 3 . . . Input unit
[0060] 4 . . . Display unit