CHROMATOGRAM DATA PROCESSING METHOD AND DEVICE
20190011408 · 2019-01-10
Abstract
The wavelength spectrums of the peaks detected on a chromatogram created from the observation data to be processed are extracted to create a spectrum set in which the intensity values of the spectrums are normalized. One wavelength spectrum is selected from the set, and the vector of the wavelength spectrum at each point in time of measurement in the observation data is projected so as to be perpendicular to the vector of the selected spectrum; that is, its component along the selected spectrum is removed. The vectors of the wavelength spectrums in the set are projected in the same way, so that the selected spectrum is erased from the set. These processes are repeated until the set contains no spectrum, and the resulting signals are added. The signal obtained by the addition indicates the waveform shape of the unknown baseline.
Claims
1-6. (canceled)
7. A chromatogram data processing method that processes three-dimensional chromatogram data collected for a sample to be measured and having a time axis, a signal intensity axis, and a third parameter axis, the chromatogram data processing method comprising: a peak spectrum acquiring step of acquiring a spectrum corresponding to each of a plurality of peaks detected on a chromatogram created based on the three-dimensional chromatogram data to be processed, the spectrum indicating a relation between a third parameter value and a signal intensity value; and a baseline chromatogram waveform estimating step of estimating a waveform shape of a baseline chromatogram indicating a baseline in the chromatogram based on a time series signal, by projecting a multi-dimensional vector, which is a vector representation of a spectrum at each of a plurality of points in time of measurement based on the three-dimensional chromatogram data, in a direction perpendicular to a vector representation of the spectrum obtained at the peak spectrum acquiring step, and calculating a magnitude of a projection vector obtained accordingly as the time series signal.
8. The chromatogram data processing method of claim 7, further comprising: estimating a baseline spectrum indicating a relation between the third parameter value and the signal intensity value at the baseline, based on the signal intensity value of the baseline for each third parameter value, the signal intensity value being determined by fitting the baseline chromatogram waveform obtained at the baseline chromatogram waveform estimating step to a part or each of a plurality of parts of the chromatogram of each third parameter value created based on the three-dimensional chromatogram data.
9. The chromatogram data processing method of claim 7, wherein in the peak spectrum acquiring step, a filter configured to output zero in response to input of a signal waveform with a slow time variation and to output nonzero in response to input of a signal waveform with an abrupt time variation is used to estimate a time range in which the peak is present on the chromatogram, and a spectrum corresponding to a peak is obtained within the time range.
10. The chromatogram data processing method of claim 9, wherein the filter is a Savitzky-Golay filter.
11. The chromatogram data processing method of claim 9, wherein the filter includes a filter parameter set to maximize a ratio between an output with respect to a signal of a time range in which only a baseline is present or in which only a baseline is estimated to be present in a given chromatogram, and an output with respect to a signal of a time range in which a peak is present or in which a peak is estimated to be present.
12. A chromatogram data processing device that processes three-dimensional chromatogram data collected for a sample to be measured and having a time axis, a signal intensity axis, and a third parameter axis, the chromatogram data processing device comprising: a peak spectrum acquisition unit configured to acquire a spectrum corresponding to each of a plurality of peaks detected on a chromatogram created based on the three-dimensional chromatogram data to be processed, the spectrum indicating a relation between a third parameter value and a signal intensity value; and a baseline chromatogram waveform estimation unit configured to estimate a waveform shape of a baseline chromatogram indicating a baseline in the chromatogram based on a time series signal, by projecting a multi-dimensional vector, which is a vector representation of a spectrum at each of a plurality of points in time of measurement based on the three-dimensional chromatogram data, in a direction perpendicular to a vector representation of the spectrum obtained by the peak spectrum acquisition unit, and calculating a magnitude of a projection vector obtained accordingly as the time series signal.
Description
DESCRIPTION OF EMBODIMENTS
[0047] An embodiment of the chromatogram data processing method and device according to the present invention will be described in detail below with reference to the accompanying drawings.
[0049] The LC device of the present embodiment includes an LC unit 1, a data processing unit 2, an input unit 3, and a display unit 4. In the LC unit 1, a liquid feed pump 12 sucks a mobile phase from a mobile phase container 11 and transfers the mobile phase to an injector 13 at a constant flow rate. The injector 13 injects a sample liquid into the mobile phase at a predetermined timing. The injected sample liquid is pushed by the mobile phase and introduced into a column 14. While the sample liquid passes through the column 14, the compounds in the sample liquid are separated in the time direction and eluted. A PDA detector 15 connected to an outlet of the column 14 repeatedly measures the absorbance distribution (absorption spectrum) of the eluate, which is introduced successively over time, over a predetermined wavelength range. An analog-digital converter (ADC) 16 converts the signals obtained through the measurement to digital data, and inputs the digital data to the data processing unit 2 as three-dimensional chromatogram data.
[0050] The data processing unit 2, which receives the data described above, includes functional blocks such as a chromatogram data storage unit 21, a baseline estimation unit 22, a peak chromatogram calculation unit 23, a qualitative processing unit 24, and a quantitative processing unit 25. The chromatogram data storage unit 21 stores the three-dimensional chromatogram data. The baseline estimation unit 22 estimates a baseline of a chromatogram on the basis of the three-dimensional chromatogram data. The peak chromatogram calculation unit 23 uses the estimated baseline to calculate a peak chromatogram from which the baseline has been removed. The qualitative processing unit 24 identifies compounds by performing qualitative processing on the basis of the calculated peak chromatogram. The quantitative processing unit 25 performs a quantitative calculation on the basis of the peak chromatogram. The input unit 3 connected to the data processing unit 2 is used by the user to input various parameters and the like required for data processing, and the display unit 4 displays graphs such as chromatograms, together with the qualitative and quantitative results, to the user.
[0051] Typically, the data processing unit 2 is embodied by a personal computer or by a workstation with higher performance than a personal computer. The functional blocks described above may be implemented by running, on the computer, dedicated data processing software installed on it in advance.
[0052] In an LC analytical device of the present embodiment, the LC unit 1 performs an LC analysis on one sample. Consequently, the three-dimensional chromatogram data as illustrated in Portion (a) of
[0054] In this example, the absorption spectrum and the chromatogram are each handled as a multi-dimensional vector. For example, because an absorption spectrum consists of the absorbance values at discrete wavelengths, it can be expressed as (a(1), a(2), a(3), ..., a(n)), which defines a multi-dimensional vector having a(m) as its components. Here, a(m) represents the absorbance at the m-th wavelength (m = 1 to n).
[0055] The three-dimensional chromatogram data D to be processed, which has the three dimensions of wavelength, time, and signal intensity, can be modeled as the sum, over the compounds in the sample, of the direct (outer) product of a vector representation of the chromatogram signal of each compound, which indicates the relation between time and signal intensity value, and a vector representation of the wavelength spectrum (absorption spectrum) signal of that compound, which indicates the relation between wavelength and signal intensity value. That is, the three-dimensional chromatogram data D can be modeled by the following formula (1):
$$D = C_1 S_1^{T} + C_2 S_2^{T} + \cdots + C_m S_m^{T} \qquad (1)$$
[0056] where C_1 to C_m denote vector representations of the chromatogram signals of the first to m-th compounds, and S_1 to S_m denote vector representations of the wavelength spectrum signals of the first to m-th compounds.
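The bilinear model of formula (1) can be checked numerically. The sketch below is only an illustration and assumes NumPy; the array sizes and the Gaussian peak and band shapes are hypothetical, not taken from the embodiment. It builds D as a sum of outer products C_k S_k^T and confirms that this equals a single matrix product of a time-by-compound matrix and the transpose of a wavelength-by-compound matrix.

```python
import numpy as np

n_times, n_wl = 200, 30
t = np.arange(n_times, dtype=float)
wl = np.arange(n_wl, dtype=float)

# Hypothetical chromatogram signals C_k: one Gaussian peak per compound (columns).
C = np.column_stack([np.exp(-0.5 * ((t - c) / 2.0) ** 2) for c in (60.0, 140.0)])
# Hypothetical wavelength spectrums S_k: one absorption band per compound (columns).
S = np.column_stack([np.exp(-0.5 * ((wl - c) / 3.0) ** 2) for c in (8.0, 20.0)])

# Formula (1): D = C_1 S_1^T + ... + C_m S_m^T, a sum of outer products.
D = sum(np.outer(C[:, k], S[:, k]) for k in range(C.shape[1]))

# Same model as one matrix product; rows are times, columns are wavelengths.
assert np.allclose(D, C @ S.T)
print(D.shape, np.linalg.matrix_rank(D))  # (200, 30) 2
```

With two compounds whose spectrums are linearly independent, D has rank 2; each additional compound (or a baseline with its own spectrum) adds one more rank-one term.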
[0057] In this example, the following three conditions are defined as prerequisites for estimating the baseline. These conditions generally hold for LC analysis and the like.
[0058] [Condition A] At least at one end of the chromatogram data to be processed (that is, at the start point or the end point of the entire measurement time range), the baseline is the main signal component and there is no other significant signal component; in particular, there is no signal component derived from compounds.
[0059] [Condition B] The baseline varies sufficiently slowly compared with the signal variation of the peaks derived from compounds; in other words, its time variation is small.
[0060] [Condition C] The wavelength spectrum of the baseline is different from the wavelength spectrum of any of the compounds.
[0061] Under the three conditions described above, the baseline estimation unit 22 and the peak chromatogram calculation unit 23 calculate the peak chromatogram from which the baseline is removed by estimating the baseline through the following procedures.
[0062] First, the baseline estimation unit 22 reads out the three-dimensional chromatogram data obtained through the LC analysis (hereinafter referred to as observation data) from the chromatogram data storage unit 21 (step S1), and estimates, on the basis of the wavelength spectrums calculated from the observation data, the waveform shape of the baseline chromatogram, which indicates the time variation of the baseline in the chromatogram (step S2). At this stage only the shape of the waveform is determined; the intensity value at each wavelength is still unknown.
[0063] Next, the peak chromatogram calculation unit 23 estimates the intensity value of the baseline chromatogram at each wavelength, in other words, the wavelength spectrum of the baseline (hereinafter, referred to as a baseline spectrum) on the basis of the baseline chromatogram waveform estimated at step S2 (step S3). The peak chromatogram calculation unit 23 then obtains a baseline signal in the chromatogram at each wavelength, by a direct product of the vector of the baseline chromatogram and the vector of the baseline spectrum (step S4). The peak chromatogram calculation unit 23 then subtracts the baseline signal from the observation data for each wavelength, thereby calculating a peak chromatogram which has only a peak waveform and from which the baseline has been removed (step S5).
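Steps S4 and S5 reduce to one outer product and one subtraction. The following is a minimal sketch assuming NumPy and observation data stored as a time-by-wavelength array; the function name `remove_baseline` and the toy data are hypothetical.

```python
import numpy as np

def remove_baseline(D, baseline_chrom, baseline_spec):
    """Steps S4-S5 (sketch).

    D:              observation data, shape (n_times, n_wavelengths)
    baseline_chrom: baseline chromatogram waveform, shape (n_times,)
    baseline_spec:  baseline spectrum, shape (n_wavelengths,)
    """
    # S4: baseline signal at every wavelength = direct (outer) product.
    baseline = np.outer(baseline_chrom, baseline_spec)
    # S5: subtract it wavelength by wavelength, leaving only the peaks.
    return D - baseline

# Hypothetical data: one peak on a linearly drifting baseline, 3 wavelengths.
t = np.arange(100, dtype=float)
peak = np.outer(np.exp(-0.5 * ((t - 50) / 3) ** 2), [1.0, 0.5, 0.2])
b_chrom = 1.0 + 0.01 * t
b_spec = np.array([0.3, 0.3, 0.4])
D = peak + np.outer(b_chrom, b_spec)
peak_chrom = remove_baseline(D, b_chrom, b_spec)
print(np.allclose(peak_chrom, peak))  # True
```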
[0064] The processes from step S2 to step S5 described above will be explained in detail. Here, to simplify the explanation, the sample is assumed to only include two compounds: a first compound and a second compound. In this case, the observation data D is expressed by the formula (2).
$$D = C_1 S_1^{T} + C_2 S_2^{T} \qquad (2)$$
[0065] If the wavelength spectrum S_1 of the first compound is known, a vector R_1 perpendicular to the vector of S_1 (that is, S_1^T R_1 = 0) can be determined. Multiplying the observation data D by R_1 then yields αC_2, a constant multiple of the vector of the chromatogram signal C_2 of the second compound, where the constant α = S_2^T R_1 is the inner product of the vector of the wavelength spectrum S_2 and the vector R_1, as shown in the following formula (3):
$$D R_1 = C_1 S_1^{T} R_1 + C_2 S_2^{T} R_1 = 0 + C_2 \left(S_2^{T} R_1\right) = \alpha C_2 \qquad (3)$$
[0066] Because αC_2 differs from C_2 only by the scalar factor α, it indicates the waveform shape of the chromatogram signal C_2 of the second compound.
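Formula (3) can be verified on synthetic data. In this sketch (NumPy assumed; all peak positions, widths, and band shapes are hypothetical) the two peaks overlap strongly in time, yet multiplying D by a vector R_1 perpendicular to S_1 leaves only a scaled copy of C_2. Any R_1 with S_1^T R_1 = 0 would do; here, as an arbitrary choice, it is the component of S_2 perpendicular to S_1, which makes the constant positive.

```python
import numpy as np

t = np.arange(200, dtype=float)
wl = np.arange(30, dtype=float)
C1 = np.exp(-0.5 * ((t - 90) / 4) ** 2)     # hypothetical compound 1
C2 = np.exp(-0.5 * ((t - 100) / 6) ** 2)    # hypothetical compound 2, overlapping C1
S1 = np.exp(-0.5 * ((wl - 10) / 4) ** 2)    # known spectrum of compound 1
S2 = np.exp(-0.5 * ((wl - 16) / 4) ** 2)
D = np.outer(C1, S1) + np.outer(C2, S2)     # formula (2)

# R1: the part of S2 perpendicular to S1, so S1 . R1 = 0.
u1 = S1 / np.linalg.norm(S1)
R1 = S2 - (S2 @ u1) * u1

signal = D @ R1                  # left-hand side of formula (3)
alpha = S2 @ R1                  # the constant alpha = S_2^T R_1
print(np.allclose(signal, alpha * C2))  # True
```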
[0067] The same applies when the sample includes three or more compounds, as long as the wavelength spectrum of only one of them is unknown and the wavelength spectrums of all the others are known; in that case, R_1 is chosen perpendicular to all of the known spectrums. Therefore, if the baseline is regarded as the chromatogram of a single unknown compound, and the wavelength spectrums of all peaks appearing on the chromatogram (the peaks corresponding to all compounds) are known, the shape of the unknown baseline can be determined. The baseline is estimated here on this principle.
[0069] The baseline estimation unit 22 extracts all peaks by performing peak detection on the chromatogram created on the basis of the observation data. It then extracts the wavelength spectrum at the point in time of measurement of the top of each peak on the chromatogram, and obtains a spectrum set {S_n} in which these wavelength spectrums are collected (step S10).
[0070] In this process, condition B described above is used so that the spectrum obtained for the peak of each compound is free from the influence of the wavelength spectrum of the baseline. Because the baseline varies slowly with time, it can be approximated well locally by a low-order polynomial. In contrast, the signal of a peak derived from a compound varies much more abruptly than the baseline. Consequently, when the signal is approximated by a polynomial in a time range where a peak is present, a systematic approximation error arises. This systematic error arises in the same way at every wavelength at which the peak has intensity. Therefore, the spectrum formed by the systematic errors extracted at the individual wavelengths is a wavelength spectrum corresponding to a pure chromatogram peak, without the influence of the baseline.
[0071] Thus, in this example, a filter is used to detect a peak. This filter is configured to output zero in response to a signal with only a slow change, such as the baseline chromatogram, or with a simple change, such as a linear or exponential change, and to output a nonzero systematic error in response to a peak signal. However, at a peak, this systematic error needs to be generated in the same way at all wavelengths. In other words, the output at each wavelength needs to equal the height of the peak multiplied by a constant determined by the shape of the peak chromatogram, independently of the ratio between the baseline and the peak height in the chromatogram. In that sense, the filter may be a linear filter, for example a Savitzky-Golay filter.
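The filter requirement can be illustrated with a Savitzky-Golay residual. This is a sketch under stated assumptions, not the embodiment's implementation: it uses NumPy only, a hypothetical 15-point window with a quadratic local fit (step S21 mentions a quadratic function), computes the SG coefficients by least squares, and simply leaves the window-width edges at zero. Because the filter is linear and reproduces any quadratic exactly, a slow quadratic baseline produces zero output, and the output for "baseline + peak" depends only on the peak.

```python
import numpy as np

def sg_residual(x, half_width=7, order=2):
    """Residual of a Savitzky-Golay (local least-squares polynomial) fit.

    Returns x minus the SG-smoothed value at each point whose full window
    fits inside the signal; the first and last `half_width` points are
    left at zero for simplicity.
    """
    offsets = np.arange(-half_width, half_width + 1)
    # Columns 1, j, j^2, ... of the local polynomial model.
    A = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps a window to its fitted centre value.
    kernel = np.linalg.pinv(A)[0]
    smoothed = np.convolve(x, kernel[::-1], mode="valid")
    residual = np.zeros(len(x))
    residual[half_width:-half_width] = x[half_width:-half_width] - smoothed
    return residual

t = np.arange(200, dtype=float)
baseline = 1.0 + 0.01 * t + 1e-4 * t**2          # slow, quadratic drift
peak = np.exp(-0.5 * ((t - 100) / 2.0) ** 2)     # narrow peak

print(np.abs(sg_residual(baseline)).max() < 1e-9)  # True: no output for the baseline
# Linearity: the output for baseline + peak equals the output for the peak alone.
print(np.allclose(sg_residual(baseline + 2 * peak), 2 * sg_residual(peak)))  # True
```

Where SciPy is available, `scipy.signal.savgol_filter` could replace the hand-built kernel; the least-squares version is used here only to keep the sketch dependent on NumPy alone.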
[0073] First, based on condition A described above, the wavelength spectrum at an end of the measurement time range is defined as a temporary baseline spectrum (step S20). The measurement time range has two ends, the start point (measurement start point) and the end point (measurement end point), and the wavelength spectrum at either end may be used. However, to improve the accuracy of the baseline estimation, it is preferable to carry out the subsequent processes once with the wavelength spectrum at the start point and once with that at the end point, and to take the average of the two baselines estimated accordingly.
[0074] Next, for each wavelength, the chromatogram signal is approximated by a quadratic function using the Savitzky-Golay filter to obtain a residual (systematic error) signal (step S21). Then, for each wavelength, the time range in which the residual signal is present, in other words, in which the filter output is nonzero, is extracted, and the wavelength spectrum in that time range is obtained (step S22). Consequently, the wavelength spectrums corresponding to all peaks on the chromatogram are obtained. However, any obtained wavelength spectrum that is substantially the same as the temporary baseline spectrum is removed (step S23), because such a spectrum may cause a calculation error.
[0075] In this manner, the spectrum set {S_n} is created by collecting the wavelength spectrums finally obtained (step S24).
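Steps S21 to S24 might be sketched as follows. Several details here are assumptions rather than statements of the embodiment: "nonzero" filter output is approximated by a hypothetical relative threshold `rel_threshold`; the mask is widened by one window so that the side lobes of one peak's residual stay in one time range; the spectrum of each range is taken as the residual vector at the time where its norm is largest (per [0070], that vector is proportional to the pure peak spectrum); and "substantially the same" in step S23 is tested with a hypothetical cosine limit `cos_limit`. NumPy is assumed, and all data are synthetic.

```python
import numpy as np

def sg_residual(x, half_width=7, order=2):
    """x minus its Savitzky-Golay fit; the window-width edges are left at 0."""
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.linalg.pinv(np.vander(offsets, order + 1, increasing=True))[0]
    res = np.zeros(len(x))
    res[half_width:-half_width] = x[half_width:-half_width] - np.convolve(
        x, kernel[::-1], mode="valid")
    return res

def build_spectrum_set(D, temp_baseline_spec, half_width=7,
                       rel_threshold=0.05, cos_limit=0.99):
    """Steps S21-S24 (sketch). D has shape (n_times, n_wavelengths)."""
    # S21: SG residual of each wavelength channel along the time axis.
    R = np.column_stack([sg_residual(D[:, w], half_width) for w in range(D.shape[1])])
    norms = np.linalg.norm(R, axis=1)
    # S22: times where the filter output is effectively nonzero, widened by
    # one window so that each peak yields a single contiguous time range.
    active = norms > rel_threshold * norms.max()
    kernel = np.ones(2 * half_width + 1)
    active = np.convolve(active.astype(float), kernel, mode="same") > 0.5
    spectra, t = [], 0
    while t < len(active):
        if not active[t]:
            t += 1
            continue
        start = t
        while t < len(active) and active[t]:
            t += 1
        best = start + int(np.argmax(norms[start:t]))
        spectra.append(R[best] / norms[best])      # normalized peak spectrum
    # S23: drop spectrums nearly parallel to the temporary baseline spectrum.
    b = temp_baseline_spec / np.linalg.norm(temp_baseline_spec)
    # S24: the remaining spectrums form the set {S_n}.
    return [s for s in spectra if abs(s @ b) < cos_limit]

# Hypothetical data: two compounds on a slowly rising quadratic baseline.
t = np.arange(200, dtype=float)
wl = np.arange(30, dtype=float)
S1 = np.exp(-0.5 * ((wl - 8) / 3) ** 2)
S2 = np.exp(-0.5 * ((wl - 20) / 3) ** 2)
Sb = 1.0 + 0.02 * wl
b = 1.0 + 0.005 * t + 2e-5 * t**2
D = (np.outer(np.exp(-0.5 * ((t - 60) / 2) ** 2), S1)
     + np.outer(0.5 * np.exp(-0.5 * ((t - 140) / 2) ** 2), S2)
     + np.outer(b, Sb))
spectra = build_spectrum_set(D, D[0])  # D[0]: spectrum at the start point (step S20)
print(len(spectra))  # 2
```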
[0076] Reference is made to
[0077] On the other hand, when the value of the largest perpendicular component is equal to or greater than the predetermined value, the process proceeds from step S13 to step S14, in which the vector of the wavelength spectrum at each point in time of measurement obtained on the basis of the observation data is projected so as to be perpendicular to the vector of the selected wavelength spectrum Smax (step S14). More specifically, when the vector of the wavelength spectrum at a certain measurement point is denoted by A, A is updated to $A - (A \cdot S_{\max})\, S_{\max}$, where Smax is taken as a unit vector. Moreover, the vectors of all the wavelength spectrums included in the spectrum set {S_n} are projected in the same way so as to be perpendicular to the vector of Smax (step S15). Consequently, the magnitude of the vector of the selected wavelength spectrum Smax becomes zero, and that spectrum is removed from the spectrum set {S_n}. It is then determined whether any wavelength spectrum remains in the spectrum set {S_n} (step S16). If one remains, the process returns from step S16 to step S12, and the processes from step S12 to step S16 are repeated.
[0078] When the determination at step S16 is No, or when the determination at step S12 is No, only the perpendicular components remain in the observation data, and they contain the chromatogram signal C_2 (here, the unknown baseline chromatogram) multiplied by various coefficients. In other words, multiples of C_2 with varying coefficients are present, and a coefficient may be negative. Therefore, a signal representing the baseline chromatogram waveform is calculated by making the coefficient of each of these chromatogram signals positive and then adding all their signal values at each point in time of measurement (step S17). In this manner, the baseline chromatogram waveform is estimated.
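The loop of steps S12 to S17 is essentially repeated orthogonal projection (deflation). The sketch below assumes NumPy, that row i of D is the wavelength spectrum A at the i-th point in time, and that the input spectrums are normalized; `tol` is a hypothetical stand-in for the "predetermined value" of step S13. For the final signal it takes the norm of each projected row, following the "magnitude of a projection vector" wording of claim 7; this gives the baseline shape as long as the baseline does not change sign, and the sign-adjusted sum of step S17 is an alternative.

```python
import numpy as np

def estimate_baseline_shape(D, spectra, tol=1e-3):
    """Steps S12-S17 (sketch): deflate the peak spectrums, return baseline shape.

    D:       observation data, shape (n_times, n_wavelengths); row i is the
             wavelength spectrum A at the i-th point in time of measurement.
    spectra: the spectrum set {S_n} as normalized 1-D arrays.
    tol:     hypothetical stand-in for the predetermined value of step S13.
    """
    A = np.array(D, dtype=float)                # work on a copy
    S = [np.asarray(s, dtype=float) for s in spectra]
    while S:                                    # S16: repeat while the set is non-empty
        norms = [np.linalg.norm(s) for s in S]
        k = int(np.argmax(norms))               # S12: largest perpendicular component
        if norms[k] < tol:                      # S13: nothing significant left
            break
        u = S.pop(k) / norms[k]                 # unit vector along Smax
        A -= np.outer(A @ u, u)                 # S14: A <- A - (A . u) u at every time
        S = [s - (s @ u) * u for s in S]        # S15: project the rest of the set
    # Magnitude of the projected vector at each point in time (cf. claim 7).
    return np.linalg.norm(A, axis=1)

# Hypothetical data: two compounds plus a slowly rising baseline.
t = np.arange(200, dtype=float)
wl = np.arange(30, dtype=float)
S1 = np.exp(-0.5 * ((wl - 8) / 3) ** 2)
S2 = np.exp(-0.5 * ((wl - 20) / 3) ** 2)
Sb = 1.0 + 0.02 * wl                            # baseline spectrum (condition C)
b = 1.0 + 0.005 * t + 2e-5 * t**2               # baseline chromatogram (condition B)
D = (np.outer(np.exp(-0.5 * ((t - 60) / 2) ** 2), S1)
     + np.outer(0.5 * np.exp(-0.5 * ((t - 140) / 2) ** 2), S2)
     + np.outer(b, Sb))
shape = estimate_baseline_shape(D, [S1 / np.linalg.norm(S1), S2 / np.linalg.norm(S2)])
print(np.allclose(shape / shape.max(), b / b.max()))  # True
```

The result recovers the baseline only up to a scale factor, which is why step S3 is still needed to determine the intensity at each wavelength.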
[0079] Next, a procedure of estimating the baseline spectrum at step S3 described above will be explained.
[0080] Under condition A described above, the signal at the end of the measurement time range consists only of the baseline signal. It is therefore natural to fit the estimated baseline chromatogram waveform so that it matches the height, in other words the signal level, of the baseline signal there. However, the signal at the end of the measurement time range is not always suitable as a reference for fitting, for example because of the influence of noise. For this reason, the entire measurement time range is divided into an empirically determined number of sections to obtain a plurality of partial time ranges, and the baseline chromatogram waveform is fitted to each partial time range in the same manner. The partial time range to which the baseline chromatogram waveform fits best is then regarded as the time range in which only the baseline component is present.
[0081] Here, a determination on whether the baseline chromatogram waveform fits well can be made according to the following procedure.
[0082] First, for each wavelength and each partial time range, a residual signal is calculated by fitting the baseline chromatogram waveform to the chromatogram based on the observation data. The L1 norm of the residual signal is used as a score value indicating the degree of fitting error; the better the fit, the smaller the score value.
[0083] However, as the baseline rises, the estimation error of the chromatogram tends to increase accordingly. To correct for this influence, it is preferable to calculate the peak-to-peak value of the input signal within the partial time range and to divide the score value by the square root of that peak-to-peak value. Moreover, when two or more partial time ranges have close score values, in other words, when the difference between their score values is within a predetermined range, it is preferable to give priority to the partial time range closer to the end of the measurement time range that is estimated to include only the baseline. This can be achieved by multiplying the score value of the partial time range closest to that end by a weight of 1 and multiplying the score values of partial time ranges farther from that end by larger weights, up to at most 6. The score values obtained at the individual wavelengths are then summed for each partial time range to calculate its final score value. In this summation, weights may be assigned on the basis of, for example, empirical knowledge or the measured signal-to-noise (SN) ratio of the device.
[0084] Once the score value of each partial time range has been calculated in this manner, the partial time range with the minimum score value is selected as the baseline section. The intensity of the baseline at each wavelength, i.e., the baseline spectrum, is then determined from the fit of the baseline chromatogram waveform in the baseline section.
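A possible sketch of the step S3 search, under several assumptions: NumPy; equal-length partial time ranges with a hypothetical default of 8 sections; a pure least-squares scaling fit at each wavelength (no offset term, since the baseline signal is the outer product of shape and spectrum); and the L1 score divided by the square root of the peak-to-peak value, then summed over wavelengths with equal weights. The end-preference weights and the per-wavelength weights described in [0083] are omitted for brevity.

```python
import numpy as np

def estimate_baseline_spectrum(D, shape, n_sections=8):
    """Step S3 (sketch): fit the baseline shape to each partial time range.

    D:          observation data, shape (n_times, n_wavelengths)
    shape:      baseline chromatogram waveform from step S2, shape (n_times,)
    n_sections: hypothetical, empirically chosen number of partial ranges
    Returns the baseline spectrum fitted in the best-scoring partial range.
    """
    bounds = np.linspace(0, D.shape[0], n_sections + 1).astype(int)
    best_score, best_spec = np.inf, None
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        b, X = shape[lo:hi], D[lo:hi]
        # Least-squares scale of the baseline shape at every wavelength.
        spec = (b @ X) / (b @ b)
        resid = X - np.outer(b, spec)
        l1 = np.abs(resid).sum(axis=0)           # L1 norm per wavelength
        ptp = np.ptp(X, axis=0)                  # peak-to-peak per wavelength
        # Divide by sqrt(peak-to-peak), then sum over wavelengths.
        score = np.sum(l1 / np.sqrt(np.maximum(ptp, 1e-12)))
        if score < best_score:
            best_score, best_spec = score, spec
    return best_spec

# Hypothetical data; the baseline shape is known only up to a scale factor.
t = np.arange(200, dtype=float)
wl = np.arange(30, dtype=float)
S1 = np.exp(-0.5 * ((wl - 8) / 3) ** 2)
Sb = 1.0 + 0.02 * wl
b = 1.0 + 0.005 * t + 2e-5 * t**2
D = np.outer(np.exp(-0.5 * ((t - 60) / 2) ** 2), S1) + np.outer(b, Sb)
spec = estimate_baseline_spectrum(D, 2.0 * b)
print(np.allclose(spec, Sb / 2.0))  # True
```

Because the shape from step S2 carries an arbitrary scale, the fitted spectrum carries the inverse scale; their outer product, which is what step S4 uses, is unaffected.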
[0085] The criterion for evaluating the residual signal is not limited to the L1 norm; an L2 norm, the difference between the maximum and minimum values, or the like may also be used. Instead of dividing the time range into partial time ranges, a moving window or a weighted moving window may also be used, with the fitting performed for each window.
[0086] The baseline chromatogram waveform and the baseline spectrum are calculated as described above. As explained earlier, the direct product of the baseline chromatogram waveform and the baseline spectrum gives the estimated baseline signal at each wavelength. A peak chromatogram in which the baseline is corrected, in other words, from which the influence of the baseline is removed, is obtained by subtracting the baseline at each wavelength from the chromatogram at that wavelength obtained on the basis of the observation data.
[0088] In the processes described above, the peak chromatogram is calculated directly from the baseline estimation result. However, instead of removing the baseline by using the estimation result as it is, the estimation result may also be used to select one of the baselines calculated by various other methods and algorithms.
[0089] For example, as described in the example in
[0090] Moreover, with the data processing method of the present embodiment, the pure wavelength spectrum of each peak on the chromatogram is calculated first, followed by the baseline chromatogram waveform and then the baseline spectrum. Thus, as soon as the pure wavelength spectrums of the peaks have been calculated, the compounds may already be identified from the wavelengths and other features of the absorption peaks that appear on those spectrums.
[0091] Moreover, it is evident that the chromatogram data processing method and the LC analytical device of the embodiment described above are merely examples of the present invention, and are encompassed in the scope of claims of the present application even if modifications, additions, and corrections are made as appropriate within the spirit of the present invention.
[0092] For example, the chromatograph detector that acquires the three-dimensional chromatogram data to be processed in the present invention need not be a multi-channel detector such as the PDA detector described above. It may also be an ultraviolet-visible spectrophotometer, an infrared spectrometer, a near-infrared spectrophotometer, a fluorescence spectrophotometer, or the like capable of high-speed wavelength scanning. Moreover, the three-dimensional chromatogram data may be acquired by a liquid chromatograph mass spectrometer or a gas chromatograph mass spectrometer, in which a mass spectrometer functions as the detector, as described above.
[0093] Not only data obtained by analysis through the column of a chromatograph, but also data obtained by detecting, with a PDA detector or the like, a sample introduced by a flow injection analysis (FIA) method, are three-dimensional data having the three dimensions of time, wavelength, and signal intensity. Such data is therefore substantially the same as the three-dimensional chromatogram data collected by a liquid chromatograph, and the present invention is evidently applicable to a device that processes such data.
DESCRIPTION OF REFERENCE CHARACTERS
[0094] 1 LC Unit
[0095] 11 Mobile Phase Container
[0096] 12 Liquid Feed Pump
[0097] 13 Injector
[0098] 14 Column
[0099] 15 PDA Detector
[0100] 2 Data Processing Unit
[0101] 21 Chromatogram Data Storage Unit
[0102] 22 Baseline Estimation Unit
[0103] 23 Peak Chromatogram Calculation Unit
[0104] 24 Qualitative Processing Unit
[0105] 25 Quantitative Processing Unit
[0106] 3 Input Unit
[0107] 4 Display Unit