DECODING MULTIPLEXED MASS SPECTRAL DATA
20220277943 · 2022-09-01
CPC classification
H01J49/0036
ELECTRICITY
Abstract
There is provided a method of decoding a first data set obtained from a time of flight (ToF) mass analyser operating according to an encoded frequency pulsing (EFP) scheme. The method comprises generating a mock data set based on a model set of ions taking account of the EFP pattern and the flight time distribution of the ions. The model set of ions is then iteratively updated using the first data set to determine a second, decoded data set.
Claims
1. A method of decoding mass spectral data that has been obtained from a time of flight (ToF) mass analyser operating according to an encoded frequency pulsing (EFP) scheme wherein ions are pulsed into the ToF mass analyser multiple times with non-uniform time intervals between each pulse at such a rate that the mass spectral data set contains a set of multiplexed ion signals representing ion arrival times recorded at a detector for ions from different ion pulses, the method comprising: obtaining a first data set to be decoded, the first data set representing a set of multiplexed ion arrival times recorded using the ToF mass analyser; and decoding the first data set to determine a second data set, the second data set representing one or more demultiplexed mass spectra relating to the flight times for the ions that were pulsed into the ToF mass analyser to generate the first data set, wherein the decoding comprises: (i) generating a mock data set representing a set of multiplexed ion arrival times for a model set of ions, wherein the step of generating the mock data set accounts for the EFP pattern used to pulse ions into the ToF mass analyser and the flight time distribution for the model set of ions in the ToF mass analyser, (ii) comparing the mock data set with the first data set; (iii) updating the model set of ions based on the comparison; (iv) repeating steps (i)-(iii) to iteratively update the model set of ions; and (v) using the updated model set of ions to determine the second data set.
2. The method of claim 1, wherein generating the mock data set comprises obtaining a set of notional unbroadened flight times for the model set of ions, a first step of converting the unbroadened flight times into a corresponding set of broadened flight times accounting for the flight time distribution in the ToF mass analyser, and a second step of generating a multiplexed mock data set by encoding the broadened flight times based on the EFP pattern.
3. The method of claim 1, wherein the ions have been separated upstream of the ToF mass analyser using an upstream ion separation device such that the intensity of ion species arriving at the ToF mass analyser changes over time.
4. The method of claim 3, wherein the first step further comprises applying a correlation function to the model set of ions representing the time variation of the ions arriving at the ToF mass analyser due to the upstream ion separation.
5. The method of claim 3, wherein the second step further comprises mapping the EFP pattern onto a sampling pattern of the upstream ion separation.
6. The method of claim 3, wherein the upstream ion separation comprises a mass and/or ion mobility separation.
7. The method of claim 1, comprising providing for output as the second data set a model set of broadened flight times.
8. The method of claim 1, comprising providing for output as the second data set a model set of unbroadened flight times.
9. The method of claim 1, comprising providing for output as the second data set a model set of ion arrival time data that has been assigned a flight time on the basis of the broadened flight time signals.
10. A method of mass spectrometry comprising: passing ions to a time of flight (ToF) mass analyser; operating the ToF mass analyser according to an encoded frequency pulsing (EFP) scheme wherein ions are pulsed into the ToF mass analyser multiple times with non-uniform time intervals between each pulse at such a rate that the mass spectral data set contains overlapping ion signals representing ion arrival times recorded at a detector for ions from different ion pulses to generate a first data set representing a set of multiplexed ion arrival times recorded using the ToF mass analyser; and decoding the first data set to determine a second data set, the second data set representing one or more demultiplexed mass spectra relating to the flight times for the ions that were pulsed into the ToF mass analyser to generate the first data set, wherein the decoding comprises: (i) generating a mock data set representing a set of multiplexed ion arrival times for a model set of ions, wherein the step of generating the mock data set accounts for the EFP pattern used to pulse ions into the ToF mass analyser and the flight time distribution for the model set of ions in the ToF mass analyser, (ii) comparing the mock data set with the first data set; (iii) updating the model set of ions based on the comparison; (iv) repeating steps (i)-(iii) to iteratively update the model set of ions; and (v) using the updated model set of ions to determine the second data set.
11. The method of claim 10, further comprising separating the ions upstream of the ToF mass analyser.
12. The method of claim 11, comprising separating the ions using an ion mobility separator device.
13. The method of claim 11, comprising separating the ions according to mass or mass to charge ratio.
14. The method of claim 1, wherein the ToF mass analyser is a multi-reflecting time of flight mass analyser.
15. A computer readable storage medium storing software code that when executing on a data processor performs a method of decoding mass spectral data that has been obtained from a time of flight (ToF) mass analyser operating according to an encoded frequency pulsing (EFP) scheme wherein ions are pulsed into the ToF mass analyser multiple times with non-uniform time intervals between each pulse at such a rate that the mass spectral data set contains a set of multiplexed ion signals representing ion arrival times recorded at a detector for ions from different ion pulses, the method comprising: obtaining a first data set to be decoded, the first data set representing a set of multiplexed ion arrival times recorded using the ToF mass analyser; and decoding the first data set to determine a second data set, the second data set representing one or more demultiplexed mass spectra relating to the flight times for the ions that were pulsed into the ToF mass analyser to generate the first data set, wherein the decoding comprises: (i) generating a mock data set representing a set of multiplexed ion arrival times for a model set of ions, wherein the step of generating the mock data set accounts for the EFP pattern used to pulse ions into the ToF mass analyser and the flight time distribution for the model set of ions in the ToF mass analyser, (ii) comparing the mock data set with the first data set; (iii) updating the model set of ions based on the comparison; (iv) repeating steps (i)-(iii) to iteratively update the model set of ions; and (v) using the updated model set of ions to determine the second data set.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0061] Various embodiments will now be described, by way of example only, and with reference to the accompanying drawings in which:
DETAILED DESCRIPTION
[0072] Various embodiments will now be described with respect to encoded frequency pulsing (EFP). EFP improves the duty cycle of long time of flight (ToF) instruments by overlaying spectra initiated at different start times (pushes). The time over which a single encoded spectrum is accumulated is called a transient, and the length of the transient corresponds to the flight time of the highest significant mass to charge ratio entering the ion flight path. As the push times are staggered within a transient, ions pushed in one transient may be recorded in the next.
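The transient bookkeeping described in [0072] can be illustrated with a minimal sketch. It assumes integer time bins, a flight time shorter than the transient length T, and a hypothetical push pattern; the function name and all values are illustrative and not taken from the source.

```python
def arrival(push_time: int, flight_time: int, T: int) -> tuple[int, int]:
    """Return (transient offset, arrival bin) for one ion packet.

    Times are integer bins measured from the start of the transient in which
    the push occurs. Offset 0 means the ions arrive in the same transient;
    offset 1 means they are recorded in the next transient.
    """
    if not (0 <= push_time < T and 0 <= flight_time < T):
        raise ValueError("push and flight times must lie within one transient")
    absolute = push_time + flight_time
    return absolute // T, absolute % T


# Hypothetical example: a transient of 100 bins with five non-uniform pushes.
T = 100
push_times = [0, 13, 37, 58, 81]
flight_time = 45  # one ion species

for t_p in push_times:
    offset, t_k = arrival(t_p, flight_time, T)
    print(f"push at bin {t_p:2d} -> transient +{offset}, arrival bin {t_k}")
```

In this example the last two pushes (bins 58 and 81) land in the following transient, at bins 3 and 26, which is why a decoder has to account for ions pushed in one transient but recorded in the next.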
[0073] In the present embodiments it will be assumed that the push pattern repeats from transient to transient. However, it will be understood that this need not be the case, and the push pattern may change from transient to transient.
[0074] Further, whilst the decoding scheme in the present embodiments will be performed with reference to transients, it will be appreciated that this need not be the case and that the data could be recorded continuously and decoded with reference to any arbitrary time intervals.
[0075] In a steady state situation, wherein the intensity of a particular species is not changing significantly from push to push across the transient, the original signal might be modelled as a sequence of discrete species at various times of flight. A simple example of this is shown in
[0076] In a ToF instrument, the impulses are blurred into peaks due to a small, but finite, flight time distribution. In
[0077] EFP transforms the flight time distribution into an arrival time distribution according to the chosen pattern of pushes. In steady state the transformed spectrum wraps around modulo the transient time as shown
[0078] The arrival time distribution of ions is then effectively sampled (by a Poisson process) to give the observed spectrum of ion arrivals. In reality, other instrument effects such as detector response may come into play, and may also be modelled appropriately, but for simplicity these will not be discussed here.
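The steady-state forward model of [0075] to [0078] (discrete species, ToF blurring, wrap-around encoding by the push pattern, then Poisson sampling) can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian peak shape, the species, the push times and the helper names are hypothetical, and detector response is ignored, as in the text.

```python
import math
import random


def gaussian_kernel(sigma: float, half_width: int) -> list[float]:
    """Normalised, truncated Gaussian used here as a stand-in ToF peak shape."""
    w = [math.exp(-0.5 * (o / sigma) ** 2) for o in range(-half_width, half_width + 1)]
    total = sum(w)
    return [x / total for x in w]


def blur(species: dict[int, float], T: int, kernel: list[float]) -> list[float]:
    """Turn impulses {flight-time bin: intensity} into blurred peaks."""
    half = len(kernel) // 2
    v = [0.0] * T
    for t_j, intensity in species.items():
        for o, w in enumerate(kernel):
            j = t_j + o - half
            if 0 <= j < T:  # peaks are assumed to sit well inside the transient
                v[j] += intensity * w
    return v


def encode_steady_state(v: list[float], push_times: list[int]) -> list[float]:
    """Overlay one copy of v per push, shifted by the push time and wrapped mod T."""
    T = len(v)
    x = [0.0] * T
    for t_p in push_times:
        for t_j, value in enumerate(v):
            x[(t_j + t_p) % T] += value
    return x


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplicative Poisson sampler (adequate for small rates)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


T = 200
species = {40: 5.0, 47: 2.0, 120: 8.0}   # hypothetical flight-time bins and intensities
push_times = [0, 23, 61, 97, 150]          # hypothetical non-uniform push pattern
v = blur(species, T, gaussian_kernel(sigma=1.5, half_width=5))
expected = encode_steady_state(v, push_times)         # arrival time distribution
rng = random.Random(0)
observed = [poisson(lam, rng) for lam in expected]    # sampled ion arrivals
```

Because every push contributes a shifted copy of the blurred spectrum, the noise-free expectation carries `len(push_times)` times the total species intensity, while the sampled counts fluctuate around it.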
[0079] A steady-state decoding method might aim to reconstruct the original sequence of discrete species, as in
[0080] The reconstruction might be produced using a maximum entropy deconvolution method, Richardson-Lucy deconvolution or another technique that enforces non-negativity constraints and, perhaps, some form of regularisation.
[0081] In outline, the approach would be to model the transformation from a (hidden space) sequence of discrete species through a ToF-blurred flight time distribution (visible space) (
[0082] In
[0084] The upper diagonal of B indicates that the arrival times are folded in (modulo the transient length) from the previous transient. The blank regions of the matrix indicate zero elements.
[0085] Assuming the transient length, $T$, arrival times, $t_k$, flight times, $t_j$, and push times, $t_p$, are digitised consistently, the mapping $B$, involving rotated copies of the push pattern, has components:
[0086] It will be convenient later to make use of the flexibility of tensor notation, so the mapping, B, is written as a mixed second order tensor. The “mod T” qualifier allows ions originating in the previous transient to be considered in the current transient and is appropriate for a system in steady state.
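The component formula for B is not reproduced in this text. One natural reading, assumed here, is that B marks flight-time bin j, pushed at t_p, as arriving in bin k = (t_j + t_p) mod T, so that each column is a rotated copy of the push pattern. The sketch builds B under that assumption (integer bins, rows indexed by k, columns by j; helper names hypothetical) and checks that the wrapped entries fall above the diagonal, consistent with [0084].

```python
def build_B(T: int, push_times: list[int]) -> list[list[int]]:
    """Assumed steady-state encoding: B[k][j] counts pushes with (j + t_p) % T == k."""
    B = [[0] * T for _ in range(T)]
    for j in range(T):
        for t_p in push_times:
            B[(j + t_p) % T][j] += 1
    return B


def is_wrapped(k: int, j: int, T: int, push_times: list[int]) -> bool:
    """True if some push sends flight bin j past the end of the transient into bin k."""
    return any(j + t_p >= T and (j + t_p) % T == k for t_p in push_times)


T, push_times = 10, [0, 3, 7]   # hypothetical sizes
B = build_B(T, push_times)

# Every column holds one entry per push: the push pattern rotated by t_j.
assert all(sum(B[k][j] for k in range(T)) == len(push_times) for j in range(T))
# Entries folded in from the previous transient (mod T) sit above the diagonal (k < j).
assert all(k < j for k in range(T) for j in range(T) if is_wrapped(k, j, T, push_times))
```

Dropping the `% T` and extending the row index over several transients gives the non-wrapping form that the time-resolved system needs.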
[0087] In order to make a practical decoding scheme, derivatives of some objective function, ϕ(d, m(h)), are required, where d is the vector of data values, m is the vector of mock data and h is a vector of “hidden” values from which the mock data are generated.
[0088] An example of a suitable decoding algorithm is the Richardson-Lucy algorithm, as will be described below. However, other suitable algorithms may be used, as appropriate.
[0089] A useful result here is that, if a scalar function $f(y)$ of a vector $y$ has derivative vector $\partial f / \partial y^j$, then its derivative vector with respect to a vector $x$ is:
$\dfrac{\partial f}{\partial x^i} = P^j_i \dfrac{\partial f}{\partial y^j}$
where $y^j = P^j_i x^i$. Note that the Einstein Summation Convention (ESC) is used here: a sum is implied when the same index appears both raised and lowered in the same statement, e.g., $P^j_i x^i \equiv \sum_i P^j_i x^i$.
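The chain rule in [0089] can be checked numerically. The sketch below uses a hypothetical linear map P (stored as `P[j][i]`) and the example function f(y) = Σ_j (y^j)², and compares the back-projected gradient P^j_i ∂f/∂y^j with central finite differences. The matrix, the point x and the helper names are illustrative only.

```python
def forward(P: list[list[float]], x: list[float]) -> list[float]:
    """y^j = P^j_i x^i (sum over i)."""
    return [sum(P[j][i] * x[i] for i in range(len(x))) for j in range(len(P))]


def f(y: list[float]) -> float:
    return sum(yj ** 2 for yj in y)


def grad_f_y(y: list[float]) -> list[float]:
    return [2.0 * yj for yj in y]


def grad_f_x(P: list[list[float]], x: list[float]) -> list[float]:
    """df/dx^i = P^j_i df/dy^j: project the y-gradient back through P (sum over j)."""
    gy = grad_f_y(forward(P, x))
    return [sum(P[j][i] * gy[j] for j in range(len(P))) for i in range(len(x))]


def finite_difference(P, x, eps=1e-6):
    """Central-difference estimate of df/dx for comparison."""
    grads = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grads.append((f(forward(P, xp)) - f(forward(P, xm))) / (2 * eps))
    return grads


P = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]       # maps a 3-vector x to a 2-vector y
x = [0.5, -1.0, 2.0]
analytic = grad_f_x(P, x)
numeric = finite_difference(P, x)
print(analytic)   # [-3.0, -20.0, 42.0]
```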
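[0090] As a specific example of a deconvolution procedure, consider the basic Richardson-Lucy algorithm with:
$\phi(d, m(h)) = \sum_k \left( d_k \ln m^k - m^k \right) - \alpha \sum_i h^i,$
where $\alpha$ defines an exponential prior on the values of $h$, so that:
$\dfrac{\partial \phi}{\partial m^k} = \dfrac{d_k}{m^k} - 1_k,$
where $1_k$ is a co-vector of ones.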
[0091] The mock data, m, can then be written in terms of a hidden space vector, h, via a visible space vector, v, and β, a small constant background contribution to each data point, so that:
$m^k = \beta 1^k + B^k_j v^j = \beta 1^k + B^k_j C^j_i h^i.$
[0092] The visible space vector, $v$, is generated by applying the “intrinsic correlation function”, $C$, to $h$, and is then mapped to the mock data in data space by applying $B$.
[0093] The benefit of this division of the mapping from h to m is that any correlations required to be in the output are imposed through the application of C before the encoding transformation involving B is made to data space.
[0094] This is particularly useful when the data are (or are proportional to) a histogram of ion arrivals so that the arrivals associated with a particular species and a particular push may be separated in time.
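The split in [0091] to [0094] of the mapping into an intrinsic correlation C (hidden to visible) followed by the encoding B (visible to data) can be sketched as below. The Gaussian form of C, the wrap-around form assumed for B, the sizes, the push pattern and the background value are all assumptions for illustration, and the helper names are hypothetical.

```python
import math


def build_C(T, sigma, half_width):
    """C[j][i]: spreads hidden impulse i over visible bins j (assumed Gaussian, truncated)."""
    offsets = range(-half_width, half_width + 1)
    w = [math.exp(-0.5 * (o / sigma) ** 2) for o in offsets]
    total = sum(w)
    C = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for o, wo in zip(offsets, w):
            if 0 <= i + o < T:
                C[i + o][i] = wo / total
    return C


def build_B(T, push_times):
    """B[k][j]: number of pushes that send visible bin j to arrival bin k (mod T, assumed)."""
    B = [[0.0] * T for _ in range(T)]
    for j in range(T):
        for t_p in push_times:
            B[(j + t_p) % T][j] += 1.0
    return B


def matvec(M, x):
    return [sum(row[i] * x[i] for i in range(len(x))) for row in M]


def mock_data(h, B, C, beta):
    """m^k = beta 1^k + B^k_j v^j with v^j = C^j_i h^i; returns (m, v)."""
    v = matvec(C, h)                      # correlations imposed in visible space first
    m = [beta + x for x in matvec(B, v)]  # then encoded into data space
    return m, v


T, beta = 30, 0.01
h = [0.0] * T
h[10] = 4.0                               # one hypothetical sharp species in hidden space
m, v = mock_data(h, build_B(T, [0, 4, 17]), build_C(T, sigma=1.0, half_width=3), beta)
```

In this sketch every push copy in `m` carries the same blurred peak shape that `C` imposed in `v`, so the arrivals from one species and one push stay grouped in time.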
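[0095] The maximum of $\phi$ will be found where
$\dfrac{\partial \phi}{\partial h^i} = B^k_j C^j_i \left( \dfrac{d_k}{m^k} - 1_k \right) - \alpha 1_i = 0,$
which leads to:
$B^k_j C^j_i r_k = B^k_j C^j_i 1_k + \alpha 1_i,$
where $r_k = d_k / m^k$ are the data to mock data ratios, one for each data point, and $1_i$ is a co-vector of ones with the dimension of $h$.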
[0096] The Richardson-Lucy update rescales the components of $h$ towards this condition through:
$h^i \leftarrow h^i \, \dfrac{g_i}{z_i}$ (no sum over $i$),
where $g_i = B^k_j C^j_i r_k$ and $z_i = B^k_j C^j_i 1_k + \alpha 1_i$.
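The multiplicative update of [0096] can be sketched on a generic composite matrix A = BC (rows indexed by data point k, columns by hidden point i). The sketch assumes that ϕ is the Poisson log-likelihood with an exponential prior of strength α. Under that assumption, the update coincides with an expectation-maximisation step, so ϕ should not decrease from one iteration to the next. The matrix, counts and helper names are hypothetical.

```python
import math


def rl_step(h, A, d, alpha, beta):
    """One update h_i <- h_i * g_i / z_i with g = A^T r and z = A^T 1 + alpha."""
    K, I = len(A), len(h)
    m = [beta + sum(A[k][i] * h[i] for i in range(I)) for k in range(K)]
    r = [d[k] / m[k] for k in range(K)]                      # data to mock data ratios
    g = [sum(A[k][i] * r[k] for k in range(K)) for i in range(I)]
    z = [sum(A[k][i] for k in range(K)) + alpha for i in range(I)]
    return [h[i] * g[i] / z[i] for i in range(I)]


def phi(h, A, d, alpha, beta):
    """Poisson log-likelihood (constant terms dropped) minus alpha * sum(h)."""
    m = [beta + sum(A[k][i] * h[i] for i in range(len(h))) for k in range(len(A))]
    return sum(d[k] * math.log(m[k]) - m[k] for k in range(len(A))) - alpha * sum(h)


A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0],
     [0.2, 0.0, 0.3]]          # hypothetical composite mapping B C
d = [4.0, 7.0, 5.0, 1.0]       # hypothetical counts
h = [1.0, 1.0, 1.0]
history = [phi(h, A, d, alpha=0.1, beta=0.01)]
for _ in range(25):
    h = rl_step(h, A, d, alpha=0.1, beta=0.01)
    history.append(phi(h, A, d, alpha=0.1, beta=0.01))
```

The multiplicative form keeps every `h[i]` non-negative without an explicit constraint, which is one reason Richardson-Lucy style updates suit count data.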
[0097] Once the termination criteria have been met, the output may be taken to be $v$ or, if $h$ is sufficiently sparse, a data point may be mapped to a visible space point with a majority of responsibility for it, according to responsibilities $R^j_k$ such that:
so that the visible space output is:
or proportional to it.
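The responsibilities of [0097] are not written out in this text. One plausible reading, adopted here purely as an assumption, is R^j_k = B^k_j v^j / m^k, i.e. the share of mock count k contributed by visible point j. Each data value is then assigned in full to the visible point with the largest responsibility. The sketch implements that reading on a hypothetical two-point example (helper names also hypothetical); the background share β/m^k is left unassigned.

```python
def responsibilities(B, v, beta):
    """R[j][k] = B[k][j] * v[j] / m[k] (assumed form), with m[k] = beta + sum_j B[k][j] v[j]."""
    K, J = len(B), len(v)
    m = [beta + sum(B[k][j] * v[j] for j in range(J)) for k in range(K)]
    return [[B[k][j] * v[j] / m[k] for k in range(K)] for j in range(J)]


def assign_output(B, v, d, beta):
    """Give each data value d[k] to the visible point with the largest responsibility."""
    R = responsibilities(B, v, beta)
    J, K = len(v), len(d)
    s = [0.0] * J
    for k in range(K):
        best = max(range(J), key=lambda j: R[j][k])
        if R[best][k] > 0.0:          # skip data points explained only by background
            s[best] += d[k]
    return s


# Hypothetical 3 data bins x 2 visible points.
B = [[1.0, 0.0],
     [1.0, 1.0],
     [0.0, 1.0]]
v = [3.0, 1.0]
d = [2.0, 5.0, 1.0]
print(assign_output(B, v, d, beta=0.01))   # [7.0, 1.0]
```

The middle data bin is shared by both visible points, but the first point holds three quarters of the responsibility, so its whole count of 5 is assigned there.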
[0098] At termination the hidden space vector, h, may also be of interest, particularly as it may exhibit higher resolution than v or s, but it must be remembered that it is unphysical in the sense that it does not have the required correlations of visible space.
[0099] The termination criterion could be as simple as reaching a fixed number of iterations, or could be met when the mock data are determined to be sufficiently similar to the recorded data.
[0100] An outline of the steady state algorithm will now be provided.
Steady State Algorithm Outline
1) Setup
[0101] Project unit data back to hidden space to get normalisation constants,
$z_i = B^k_j C^j_i 1_k + \alpha 1_i.$
2) On Data Input, Set Initial Model
[0102] Set each of the $h^i$ to some constant value greater than zero.
Project h forward to get initial mock data, m,
$m^k \leftarrow \beta 1^k + B^k_j C^j_i h^i.$
3) Until Termination
[0103] Calculate the data to mock data ratios,
$r_k \leftarrow d_k / m^k.$
Project the ratios back to hidden space,
$g_i \leftarrow B^k_j C^j_i r_k.$
Update the hidden sources,
$h^i \leftarrow h^i g_i / z_i.$
Project the hidden sources forward to mock data via visible space,
$v^j \leftarrow C^j_i h^i,$
$m^k \leftarrow \beta 1^k + B^k_j v^j.$
4) Assign Values to Output Spectrum
[0104] Optionally, set the output spectrum, $s$, using $v$ and $m$ to construct responsibilities, $R^j_k$,
Alternatively, copy the visible space vector to the output,
$s^j = v^j.$
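The steady state outline can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the patent's implementation. B is read as sending flight bin j, pushed at t_p, to arrival bin (t_j + t_p) mod T, and C as a truncated Gaussian. The initial model is a constant 1.0, and iteration stops after a fixed count or once the mock data change by less than a relative tolerance. All names and values are hypothetical.

```python
import math


def build_B(T, push_times):
    """Steady-state encoding (assumed form): B[k][j] counts pushes with (j + t_p) % T == k."""
    B = [[0.0] * T for _ in range(T)]
    for j in range(T):
        for t_p in push_times:
            B[(j + t_p) % T][j] += 1.0
    return B


def build_C(T, sigma, half_width):
    """Intrinsic correlation (assumed Gaussian): spreads hidden bin i over visible bins j."""
    offsets = range(-half_width, half_width + 1)
    w = [math.exp(-0.5 * (o / sigma) ** 2) for o in offsets]
    total = sum(w)
    C = [[0.0] * T for _ in range(T)]
    for i in range(T):
        for o, wo in zip(offsets, w):
            if 0 <= i + o < T:
                C[i + o][i] = wo / total
    return C


def matvec(M, x):
    """Forward projection M x."""
    return [sum(row[i] * x[i] for i in range(len(x))) for row in M]


def rmatvec(M, y):
    """Back-projection M^T y."""
    return [sum(M[k][i] * y[k] for k in range(len(M))) for i in range(len(M[0]))]


def objective(d, m, h, alpha):
    """Poisson log-likelihood (constants dropped) minus alpha * sum(h)."""
    return sum(dk * math.log(mk) - mk for dk, mk in zip(d, m)) - alpha * sum(h)


def decode_steady_state(d, B, C, alpha=0.01, beta=1e-3, max_iter=200, tol=1e-9):
    # 1) Setup: project unit data back to hidden space, z_i = B^k_j C^j_i 1_k + alpha.
    z = [zi + alpha for zi in rmatvec(C, rmatvec(B, [1.0] * len(d)))]
    # 2) Initial model: every h^i set to the same positive constant, then projected forward.
    h = [1.0] * len(z)
    v = matvec(C, h)
    m = [beta + x for x in matvec(B, v)]
    # 3) Iterate until the iteration limit or until the mock data stop changing.
    for _ in range(max_iter):
        r = [dk / mk for dk, mk in zip(d, m)]              # data to mock data ratios
        g = rmatvec(C, rmatvec(B, r))                      # project ratios back to hidden space
        h = [hi * gi / zi for hi, gi, zi in zip(h, g, z)]  # multiplicative update
        v = matvec(C, h)                                   # hidden -> visible
        m_new = [beta + x for x in matvec(B, v)]           # visible -> data
        change = max(abs(a - b) / b for a, b in zip(m_new, m))
        m = m_new
        if change < tol:
            break
    # 4) Output spectrum: copy the visible space vector, s^j = v^j.
    return h, v, m


# Hypothetical example: 40 bins, five pushes, two species, noise-free data.
T = 40
B = build_B(T, [0, 7, 19, 26, 33])
C = build_C(T, sigma=1.2, half_width=4)
h_true = [0.0] * T
h_true[12], h_true[27] = 30.0, 12.0
d = [1e-3 + x for x in matvec(B, matvec(C, h_true))]
h, s, m = decode_steady_state(d, B, C)
```

The hidden vector `h` is returned alongside the output `s` because, as [0098] notes, it may be of interest in its own right even though it lacks the correlations of visible space.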
[0105] The above analysis assumes steady state conditions, in which the ion species in each pulse are substantially the same. However, it is often desirable to couple ToF mass analysers to an upstream ion separation device such as an ion mobility separator, or a mass separation device (which may comprise a scanning quadrupole mass filter, for example). Traditional ToF mass analysers require separation timescales of the order of 20 μs to 200 μs for mass ranges up to a few thousand, depending on the ToF mass analyser geometry. In contrast, typical faster IMS peak widths are of the order of 0.4 ms to 1 ms, depending on the IMS geometry. The two separation timescales are therefore well matched: the ToF separation timescale is significantly shorter than the IMS separation timescale, so multiple ToF mass spectra can be individually acquired across each IMS peak. This allows, for example, two-dimensional nested data sets to be produced, wherein one dimension is the ToF mass and the other dimension is the IMS separation time.
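The timescale comparison in [0105] reduces to simple arithmetic. Using only the ranges quoted there, the number of consecutive ToF spectra that fit across one IMS peak runs from about 2 to about 50:

```python
# Ranges quoted in the text, in microseconds.
tof_timescale_us = (20.0, 200.0)     # traditional ToF separation timescale
ims_peak_width_us = (400.0, 1000.0)  # typical faster IMS peak widths

fewest = ims_peak_width_us[0] / tof_timescale_us[1]  # narrow IMS peak, slow ToF
most = ims_peak_width_us[1] / tof_timescale_us[0]    # wide IMS peak, fast ToF
print(f"ToF spectra per IMS peak: {fewest:.0f} to {most:.0f}")  # 2 to 50
```

As the transient lengthens, for example with a longer flight path, fewer transients fit across each peak, which is the situation that [0106] goes on to describe.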
[0106] However, the advent of ToF mass analysers which have a relatively long flight path, such as multi-reflecting ToF mass analysers, has enabled ions to be analysed with a relatively high mass resolution. The ions therefore have a relatively long flight time through such mass analysers. When coupled with an upstream ion separation device, this means that the intensity of a given ion species may change over time from transient to transient such that steady state conditions can no longer be assumed to apply.
[0107] The present embodiments provide a decoding algorithm that is able to decode multiplexed mass spectral data sets with sub-transient time resolution.
[0108] To do this, instead of modelling the data as a single sequence of discrete species, the model might consist of a number of channels corresponding to time points associated with upstream ion separation (UIS) (see
[0109] The hidden space UIS time points may be chosen to sample the expected time variations with sufficient granularity.
[0110] In terms of the analysis above, we now have a mapping with components $C^{ju}_{it}$ from a hidden space array, $h^{it}$, to a visible space array, $v^{ju}$. The index pairs $it$ and $ju$ indicate that the correlation is over flight time ($i$, $j$) and UIS time points ($t$, $u$). This mapping is the product $C^{ju}_{it} = G^{ju}_{j't'} F^{j't'}_{it}$ of $F^{j't'}_{it}$, which applies the ToF blurring, and $G^{ju}_{j't'}$, which applies the required correlations between UIS time points. There is also the visible space to data space mapping with components $B^k_{ju}$, which is the product of $P^{j'p}_{ju}$, which applies the pusher pattern and maps UIS time points to push times, and $Q^k_{j'p}$, which collapses the distinct push time spectra into a single spectrum.
[0111] These mappings are visualised as having been unfolded into matrix operations in
[0112] In contrast to the steady state condition for the components of $B^k_j$, those of the corresponding mapping in the time-resolved system, $Q^k_{j'p}$, have the condition:
without the “mod T” qualifier which enforced the wrap-around boundary condition of the steady state system.
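The factorisation of [0110] to [0112] can be sketched as four small operators acting on a hidden array `h[i][t]` (flight-time bin i, UIS time point t). F blurs along flight time and G correlates neighbouring UIS time points. P linearly interpolates each visible spectrum from the UIS time points to the absolute push times, and Q adds each push-time spectrum into a data record spanning several transients, with no mod T wrap. The Gaussian blur, the three-point smoothing for G, the linear interpolation for P, the helper names and all sizes are assumptions for illustration.

```python
import math


def apply_F(h, kernel):
    """F^{j't'}_{it}: blur along flight time, separately for each UIS time point."""
    n_i, n_t, half = len(h), len(h[0]), len(kernel) // 2
    out = [[0.0] * n_t for _ in range(n_i)]
    for i in range(n_i):
        for t in range(n_t):
            for o, w in enumerate(kernel):
                j = i + o - half
                if 0 <= j < n_i:
                    out[j][t] += w * h[i][t]
    return out


def apply_G(vp, weights=(0.25, 0.5, 0.25)):
    """G^{ju}_{j't'}: smooth across neighbouring UIS time points (edges clamped)."""
    n_t = len(vp[0])
    return [[sum(w * row[min(max(u + o - 1, 0), n_t - 1)] for o, w in enumerate(weights))
             for u in range(n_t)] for row in vp]


def interp(xs, ys, x):
    """Piecewise-linear interpolation, held constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for a in range(len(xs) - 1):
        if xs[a] <= x <= xs[a + 1]:
            f = (x - xs[a]) / (xs[a + 1] - xs[a])
            return (1.0 - f) * ys[a] + f * ys[a + 1]


def apply_P(v, uis_times, push_times):
    """P^{j'p}_{ju}: resample each flight-time channel from UIS time points to pushes."""
    return [[interp(uis_times, row, tau) for tau in push_times] for row in v]


def apply_Q(x, push_times, n_data):
    """Q^k_{j'p}: arrival bin k = t_j + t_p over the whole record (no mod T)."""
    m = [0.0] * n_data
    for j, row in enumerate(x):
        for p, tau in enumerate(push_times):
            if j + tau < n_data:      # arrivals after the recorded window are dropped
                m[j + tau] += row[p]
    return m


T, N = 40, 2                                   # transient length (bins), transients
pattern = [0, 9, 22]                           # hypothetical pushes, repeated each transient
push_times = [n * T + t for n in range(N) for t in pattern]
uis_times = [0.0, 20.0, 40.0, 60.0, 80.0]      # hypothetical hidden UIS time points
w = [math.exp(-0.5 * o * o) for o in range(-2, 3)]
kernel = [x / sum(w) for x in w]

h = [[0.0] * len(uis_times) for _ in range(T)]
for t in range(len(uis_times)):
    h[10][t] = 1.0                             # one species, constant in UIS time
m = apply_Q(apply_P(apply_G(apply_F(h, kernel)), uis_times, push_times), push_times, N * T)
```

With a species that is constant in UIS time, each of the six pushes contributes one unit of signal. Making the species present only at the first UIS time point instead leaves the second transient empty, which is the sub-transient variation the steady state model cannot represent.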
[0113] In principle, the entire time series of N transients could be analysed together so that the spectra for all UIS time points are modelled at once. There is a subtle distinction between the case where acquisition has already begun when recording of transient data is turned on and the case where acquisition and recording are started simultaneously. In the latter case, the first recorded transient contains no ions pushed in a previous transient; in the former case, it may.
[0114] The latter case is easier to deal with, as all relevant data are available and the number of model transients is the same as the number of data transients. In the former case, however, there are missing, unrecorded data that are correlated with the data in the first transient, because the first transient may contain ions pushed in the previous transient.
[0115] Accordingly, there must be a model for the previous transient (transient 0) as it must account for a portion of the data in the first transient. This case is of practical interest because a) it may reflect actual practice and b) it allows us to take a sub-interval of the full time series data which does not start at the start of the acquisition.
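The need for a transient 0 model in [0113] to [0115] can be made concrete with a small sketch. When recording starts after acquisition has begun, pushes from the unrecorded transient 0 still produce arrivals early in transient 1. Integer bins, a push pattern that repeats every transient, the function name and the example values are assumptions.

```python
def arrivals_in_window(flight_time, pattern, T, N, include_transient_0):
    """Arrival bins recorded in transients 1..N for one species.

    Push times repeat the same pattern each transient. If include_transient_0 is
    True, pushes from the unrecorded transient 0 (times t - T) are also modelled.
    """
    pushes = [n * T + t for n in range(N) for t in pattern]
    if include_transient_0:
        pushes = [t - T for t in pattern] + pushes
    window = N * T
    return sorted(tau + flight_time for tau in pushes if 0 <= tau + flight_time < window)


pattern, T, N, flight_time = [0, 13, 37, 58, 81], 100, 2, 45   # hypothetical values
without_0 = arrivals_in_window(flight_time, pattern, T, N, include_transient_0=False)
with_0 = arrivals_in_window(flight_time, pattern, T, N, include_transient_0=True)
print(sorted(set(with_0) - set(without_0)))   # [3, 26]
```

The two extra arrivals at bins 3 and 26 of the first recorded transient can only be explained by a model that includes transient 0.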
[0116] The situation for N=2 is illustrated in
[0117] If the option of assigning data to points in visible space is to be used, it must take a different form from that used for the steady state analysis. This is because part of the mapping from visible space to data space, $B^k_{ju} = Q^k_{j'p} P^{j'p}_{ju}$, namely $P^{j'p}_{ju}$, resamples or interpolates from UIS time points to push times, so by construction it shares responsibility for each data point among several visible space points. For the moment, the output will simply be taken to be the visible space array, $v^{ju}$.
[0121] An outline of an algorithm will now be given in the context of decoding a fixed number of transients on sub-transient time scales. This scheme may be iterated along a longer time series of transients.
[0122] An outline of the non-steady state algorithm will now be provided.
Sub-transient time scale decoding algorithm outline:
1) Setup
[0123] Project unit data back to hidden space to get normalisation constants,
$z_{it} = B^k_{ju} C^{ju}_{it} 1_k + \alpha 1_{it}.$
2) On Data Input
[0124] Processing will start once both transients are available.
3) Set Initial Model
[0125] Set each of the $h^{it}$ to some constant value greater than zero.
Project h forward to get initial mock data, m,
$m^k \leftarrow \beta 1^k + B^k_{ju} C^{ju}_{it} h^{it}.$
4) Until Termination
[0126] Calculate the data to mock data ratios,
$r_k \leftarrow d_k / m^k.$
Project the ratios back to hidden space,
$g_{it} \leftarrow B^k_{ju} C^{ju}_{it} r_k.$
Update the hidden sources,
$h^{it} \leftarrow h^{it} g_{it} / z_{it}.$
Project the hidden sources forward to mock data via visible space,
$v^{ju} \leftarrow C^{ju}_{it} h^{it},$
$m^k \leftarrow \beta 1^k + B^k_{ju} v^{ju}.$
5) Assign Values to Output Spectra
[0127] Copy the visible space vector to the output,
$s^{ju} = v^{ju}.$
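The sub-transient outline can be sketched by assembling the composite mapping A = BC as a dense matrix, one column per hidden element (i, t), by passing unit hidden arrays through a forward model, and then running the same multiplicative update as in the steady state case. The sketch makes several simplifying assumptions, none taken from the source. Acquisition and recording start together, so no transient 0 model is needed. F is a Gaussian blur, G is the identity, and P uses linear interpolation. The helper names, the sizes and the fixed iteration count are also illustrative.

```python
import math


def apply_F(h, kernel):
    """F: blur each UIS time point's spectrum along flight time (truncated at the edges)."""
    n_i, n_t, half = len(h), len(h[0]), len(kernel) // 2
    return [[sum(kernel[o] * h[j - o + half][t] for o in range(len(kernel))
                 if 0 <= j - o + half < n_i) for t in range(n_t)] for j in range(n_i)]


def forward(h, kernel, uis_times, push_times, n_data):
    """Hidden h[i][t] -> mock data without background: F, G (identity here), P, Q."""
    v = apply_F(h, kernel)
    m = [0.0] * n_data
    for j, row in enumerate(v):
        for tau in push_times:
            # P: linear interpolation of channel j at push time tau, clamped at the ends.
            u = min(max(tau, uis_times[0]), uis_times[-1])
            a = max(b for b in range(len(uis_times) - 1) if uis_times[b] <= u)
            f = (u - uis_times[a]) / (uis_times[a + 1] - uis_times[a])
            # Q: arrival bin = flight bin + push time, with no mod T wrap-around.
            if j + tau < n_data:
                m[j + tau] += (1.0 - f) * row[a] + f * row[a + 1]
    return m


def build_A(n_i, n_t, kernel, uis_times, push_times, n_data):
    """Dense A = B C: column i * n_t + t is the mock data of a unit hidden element."""
    columns = []
    for i in range(n_i):
        for t in range(n_t):
            h = [[0.0] * n_t for _ in range(n_i)]
            h[i][t] = 1.0
            columns.append(forward(h, kernel, uis_times, push_times, n_data))
    return [[col[k] for col in columns] for k in range(n_data)]


def objective(d, m, h, alpha):
    """Poisson log-likelihood (constants dropped) minus alpha * sum(h)."""
    return sum(dk * math.log(mk) - mk for dk, mk in zip(d, m)) - alpha * sum(h)


def decode_sub_transient(d, A, kernel, n_t, alpha=0.01, beta=1e-3, n_iter=200):
    K, I = len(A), len(A[0])
    # 1) Setup: z_it = B^k_ju C^ju_it 1_k + alpha, i.e. column sums of A plus alpha.
    z = [sum(A[k][i] for k in range(K)) + alpha for i in range(I)]
    # 3) Initial model: constant positive hidden values, projected forward.
    h = [1.0] * I
    m = [beta + sum(A[k][i] * h[i] for i in range(I)) for k in range(K)]
    # 4) A fixed number of multiplicative updates (one simple termination rule).
    for _ in range(n_iter):
        r = [d[k] / m[k] for k in range(K)]
        g = [sum(A[k][i] * r[k] for k in range(K)) for i in range(I)]
        h = [h[i] * g[i] / z[i] for i in range(I)]
        m = [beta + sum(A[k][i] * h[i] for i in range(I)) for k in range(K)]
    # 5) Output: visible array v[j][u], here F applied to the decoded hidden array.
    h2d = [h[i * n_t:(i + 1) * n_t] for i in range(I // n_t)]
    return h, apply_F(h2d, kernel), m


# Hypothetical setup: two transients of 20 bins, three pushes per transient.
T, N = 20, 2
push_times = [n * T + t for n in range(N) for t in (0, 5, 13)]
uis_times = [0, 20, 40]                        # hidden UIS time points (bins)
w = [math.exp(-0.5 * o * o) for o in range(-2, 3)]
kernel = [x / sum(w) for x in w]
n_i, n_t, n_data = T, len(uis_times), N * T
A = build_A(n_i, n_t, kernel, uis_times, push_times, n_data)

h_true = [[0.0] * n_t for _ in range(n_i)]
h_true[6] = [2.0, 6.0, 10.0]                   # one species whose intensity rises in UIS time
d = [1e-3 + x for x in forward(h_true, kernel, uis_times, push_times, n_data)]
h, v, m = decode_sub_transient(d, A, kernel, n_t)
```

Building A column by column keeps the sketch short and makes the adjoint (back-projection) trivial, at the cost of memory that grows with the number of hidden elements times the number of data points. A practical decoder would more likely apply F, G, P and Q and their transposes directly.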
[0129] These projections are performed using a model of the type described above. For instance, either the steady state or the non-steady state algorithm presented above may be used to move between hidden, visible and data space.
[0130] The visible space vector may then be copied to the output (step 906). However, as explained above, other outputs would also be possible.
[0132] The present embodiments thus provide techniques for decoding EFP multiplexed mass spectral data wherein peak detection has been performed on a transient by transient basis. This is done using a model including two conceptual steps: a first step accounting for the broadening of flight times due to an ion flight time distribution (moving from hidden space into visible space) and a second step encoding flight times into arrival times via the pattern of pulse times (moving from visible space into data space).
[0133] The observed data (in data space) can then be demultiplexed via the visible space back to hidden space. The output may thus comprise any of the unbroadened flight times (in hidden space, to generate a ‘super-resolution’ spectrum), the broadened flight time signals (the visible spectrum), or the arrival time data assigned to flight time on the basis of the broadened flight time signals.
[0134] This approach can also be extended to data in which the time profiles associated with upstream ion separation are shorter than the duration of a transient, by including time point correlations in the broadening described above and allowing each flight time to have a response that may vary with time.
[0135] Although the present invention has been described with reference to preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made without departing from the scope of the invention as set forth in the accompanying claims.