METHODS OF DETERMINING DNA BARCODES FOR EFFICIENT SPECIES CATEGORIZATION USING NANOPORE TRANSLOCATION
20220243253 · 2022-08-04
Inventors
Cpc classification
G06F17/18
PHYSICS
International classification
Abstract
Methods of accurately determining DNA barcodes using a cylindrical nanopore system. The methods include steps of leveraging the average velocity of a double-stranded DNA segment passing through a single cylindrical nanopore, measured through repeated scanning, to accurately determine protein tag locations on the double-stranded DNA segment. As such, the methods provide for the accurate calculation of a barcode for the double-stranded DNA segment based on protein tag locations without underestimation or overestimation issues. The underlying concept and the methods are equally applicable to other multi-nanopore systems which use the dwell time and time-of-flight velocities to measure the barcodes.
Claims
1. A method of calculating a distance between sequential protein tags within a segment of double-stranded DNA, the method comprising the steps of: passing the segment of double-stranded DNA through a singular cylindrical nanopore formed within a test chamber, the segment of double-stranded DNA including a plurality of monomers, a first protein tag, and a subsequent protein tag, each of the plurality of monomers and each protein tag having an equal size, shape, and volume; calculating an average scanning velocity of the segment of double-stranded DNA by dividing a length of the segment of double-stranded DNA by an average scanning time for the double-stranded DNA taken for multiple scans; calculating an estimated distance between a first protein tag and a subsequent protein tag of the segment of double-stranded DNA by measuring, for the first protein tag and the subsequent protein tag, a dwell time and a dwell velocity based on an entry time into the singular cylindrical nanopore and an exit time from the singular cylindrical nanopore; using the estimated distance between the first protein tag and the subsequent protein tag, calculating an estimated number of monomers of the plurality of monomers that are disposed between the first protein tag and the subsequent protein tag; calculating a weighted velocity of the segment of double-stranded DNA using the dwell velocity for each of the first protein tag and the subsequent protein tag, the average scanning velocity of the segment of double-stranded DNA, and the estimated number of monomers; and calculating the distance between the first protein tag and the subsequent protein tag by multiplying the weighted velocity of the segment of double-stranded DNA by a time delay between the entry time of the first protein tag and the entry time of the subsequent protein tag.
2. The method of claim 1, wherein the test chamber includes two opposing longitudinal walls joined together by two opposing lateral walls, such that the singular cylindrical nanopore is formed between the two opposing longitudinal walls, wherein a central axis of the singular cylindrical nanopore is parallel to each of the two opposing lateral walls.
3. The method of claim 1, wherein the singular cylindrical nanopore has an associated diameter of 2σ, where σ is a diameter of each of the plurality of monomers, the first protein tag, and the subsequent protein tag.
4. The method of claim 1, wherein the weighted velocity of the segment of double-stranded DNA is calculated using
5. The method of claim 4, further comprising a step of passing the segment of double-stranded DNA through the singular cylindrical nanopore in an opposing direction.
6. The method of claim 5, further comprising a step of calculating the weighted velocity of the segment of double-stranded DNA in an upward direction through the singular cylindrical nanopore using
7. The method of claim 1, further comprising a step of retaining at least a portion of the segment of double-stranded DNA within the singular cylindrical nanopore throughout each of the multiple scans.
8. The method of claim 1, further comprising a step of passing the segment of double-stranded DNA through the singular cylindrical nanopore in an opposing direction and repeating the steps of calculating the average scanning velocity, calculating the estimated distance between the first protein tag and the subsequent protein tag, calculating the estimated number of monomers of the plurality of monomers, calculating the weighted velocity of the segment of double-stranded DNA, and calculating the distance between the first protein tag and the subsequent protein tag.
9. The method of claim 8, further comprising a step of applying a bias voltage to the test chamber in a reverse direction prior to passing the segment of double-stranded DNA through the singular cylindrical nanopore in the opposing direction.
10. The method of claim 1, further comprising repeating the steps of calculating a distance between sequential protein tags for a plurality of protein tags within the segment of double-stranded DNA.
11. A method of calculating a distance between sequential protein tags within a segment of double-stranded DNA, the method comprising the steps of: applying a first voltage to a first side of a test chamber that defines a singular nanopore therethrough; based on the applied first voltage, passing the segment of double-stranded DNA through the first side of the singular nanopore defined by the test chamber, the segment of double-stranded DNA including a plurality of monomers, a first protein tag, and a subsequent protein tag, each of the plurality of monomers and each protein tag having an equal size, shape, and volume; applying a second voltage to a second side of the test chamber, the second side of the test chamber opposite the first side of the test chamber, such that a bias voltage applied to the test chamber reverses; based on the applied second voltage, passing the segment of double-stranded DNA through the second side of the singular nanopore in a direction toward the first side of the test chamber; calculating an average scanning velocity of the segment of double-stranded DNA by dividing a length of the segment of double-stranded DNA by an average scanning time for the double-stranded DNA between the first side of the singular nanopore and the second side of the singular nanopore; calculating an estimated distance between a first protein tag and a subsequent protein tag of the segment of double-stranded DNA by measuring, for the first protein tag and the subsequent protein tag, a dwell time and a dwell velocity based on an entry time into the singular nanopore and an exit time from the singular nanopore; using the estimated distance between the first protein tag and the subsequent protein tag, calculating an estimated number of monomers of the plurality of monomers that are disposed between the first protein tag and the subsequent protein tag; calculating a weighted velocity of the segment of double-stranded DNA using the dwell velocity for each of the 
first protein tag and the subsequent protein tag, the average scanning velocity of the segment of double-stranded DNA, and the estimated number of monomers; and calculating the distance between the first protein tag and the subsequent protein tag by multiplying the weighted velocity of the segment of double-stranded DNA by a time delay between the entry time of the first protein tag and the entry time of the subsequent protein tag.
12. The method of claim 11, wherein the test chamber includes a first longitudinal wall disposed at the first side opposite a second longitudinal wall disposed at the second side, with two opposing lateral walls joining the first longitudinal wall to the second longitudinal wall, such that the singular nanopore is formed between the two opposing longitudinal walls, wherein a central axis of the singular nanopore is parallel to each of the two opposing lateral walls.
13. The method of claim 11, wherein the singular nanopore is cylindrical in shape.
14. The method of claim 13, wherein the singular nanopore has an associated diameter of 2σ, where σ is a diameter of each of the plurality of monomers, the first protein tag, and the subsequent protein tag.
15. The method of claim 11, wherein the weighted velocity of the segment of double-stranded DNA is calculated using
16. The method of claim 11, further comprising a step of retaining at least a portion of the segment of double-stranded DNA within the singular cylindrical nanopore throughout each of the multiple scans.
17. The method of claim 11, further comprising repeating the steps of calculating a distance between sequential protein tags for a plurality of protein tags within the segment of double-stranded DNA.
18. A method of generating a barcode for a segment of double-stranded DNA by calculating a distance between sequential protein tags within the segment of double-stranded DNA, the method comprising the steps of: passing the segment of double-stranded DNA through a singular cylindrical nanopore formed within a test chamber, the segment of double-stranded DNA including a plurality of monomers, a first protein tag, and a subsequent protein tag, each of the plurality of monomers and each protein tag having an equal size, shape, and volume; calculating an average scanning velocity of the segment of double-stranded DNA by dividing a length of the segment of double-stranded DNA by an average scanning time for the double-stranded DNA taken for multiple scans; for each of a plurality of protein tags on the segment of double-stranded DNA: calculating an estimated distance between a first protein tag and a subsequent protein tag of the segment of double-stranded DNA by measuring, for the first protein tag and the subsequent protein tag, a dwell time and a dwell velocity based on an entry time into the singular cylindrical nanopore and an exit time from the singular cylindrical nanopore; using the estimated distance between the first protein tag and the subsequent protein tag, calculating an estimated number of monomers of the plurality of monomers that are disposed between the first protein tag and the subsequent protein tag; calculating a weighted velocity of the segment of double-stranded DNA using the dwell velocity for each of the first protein tag and the subsequent protein tag, the average scanning velocity of the segment of double-stranded DNA, and the estimated number of monomers; and calculating the distance between the first protein tag and the subsequent protein tag by multiplying the weighted velocity of the segment of double-stranded DNA by a time delay between the entry time of the first protein tag and the entry time of the subsequent protein tag; and generating 
the barcode for the segment of double-stranded DNA by arranging the plurality of protein tags of the segment of double-stranded DNA in sequential order.
19. The method of claim 18, wherein the singular cylindrical nanopore has an associated diameter of 2σ, where σ is a diameter of each of the plurality of monomers, the first protein tag, and the subsequent protein tag.
20. The method of claim 18, wherein the weighted velocity of the segment of double-stranded DNA is calculated using
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] For a fuller understanding of the invention, reference should be made to the following detailed description, taken in connection with the accompanying drawings, in which:
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
DETAILED DESCRIPTION OF THE INVENTION
[0032] In the following detailed description of the preferred embodiments, reference is made to the accompanying drawings, which form a part thereof, and within which are shown by way of illustration specific embodiments by which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the invention.
[0033] As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. As used in this specification and the appended claims, the term “or” is generally employed in its sense including “and/or” unless the context clearly dictates otherwise.
[0034] The present invention includes methods of accurately determining DNA barcodes using a cylindrical nanopore as opposed to a dual nanopore architecture. The methods of the present invention explain the underestimation of DNA tags caused by the fast-moving nucleotides in between the barcodes of a strand using tension propagation theory [8]. Instead, the methods described herein, schematic and graphical diagrams of which are shown in
[0035] As shown in particular in
[0036] In addition, still referring to
[0037] Similar to a double nanopore setup, the single cylindrical nanopore 16 includes a periodical variation in the differential bias applied at nanopore 16 to scan the co-captured DNA multiple times. The force bias direction is altered when either of the end tags is detected at the nanopore preventing the DNA chain from escaping the nanopore for a long time. Specifically, referring to the embodiment shown in
[0038] As shown in
[0039] As described above, the entry time $t_i(m)$ and exit time $t_f(m)$ of each tag 22 and monomer with index $m$ are recorded as the monomer/tag passes through the nanopore 16 membrane during each scan event, from which the dwell time $W(m)$ is calculated. As shown in
$$W^{U \to D}(m) = t_f^{U \to D}(m) - t_i^{U \to D}(m) \tag{1a}$$
$$W^{D \to U}(m) = t_f^{D \to U}(m) - t_i^{D \to U}(m) \tag{1b}$$
with $t_i^{U \to D}(m)$ and $t_f^{U \to D}(m)$ being the arrival and exit times of a monomer with index $m$ through nanopore 16 traveling in a downward direction, as shown in
$$\nu_{\text{dwell}}^{U \to D}(m) = t_{\text{pore}} / W^{U \to D}(m) \tag{2a}$$
$$\nu_{\text{dwell}}^{D \to U}(m) = t_{\text{pore}} / W^{D \to U}(m) \tag{2b}$$
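The dwell-time and dwell-velocity definitions of Equations 1a-2b can be illustrated with a minimal Python sketch. The `PassEvent` record, the function names, and the numerical values are hypothetical and chosen only for illustration; `t_pore` is assumed here to denote the pore thickness, in the same length units as the desired velocity.

```python
from dataclasses import dataclass


@dataclass
class PassEvent:
    """Entry and exit times of one monomer or tag during a single scan (hypothetical record)."""
    index: int      # monomer/tag index m along the chain
    t_entry: float  # t_i(m), arrival time at the nanopore
    t_exit: float   # t_f(m), exit time from the nanopore


def dwell_time(event: PassEvent) -> float:
    """Equations 1a/1b: W(m) = t_f(m) - t_i(m), for either scan direction."""
    w = event.t_exit - event.t_entry
    if w <= 0:
        raise ValueError("exit time must be later than entry time")
    return w


def dwell_velocity(event: PassEvent, t_pore: float) -> float:
    """Equations 2a/2b: v_dwell(m) = t_pore / W(m)."""
    return t_pore / dwell_time(event)


# Illustrative values only (arbitrary units).
tag = PassEvent(index=154, t_entry=10.0, t_exit=14.0)
print(dwell_time(tag))                  # 4.0
print(dwell_velocity(tag, t_pore=2.0))  # 0.5
```

The same two functions serve both scan directions, since only the recorded times differ between a downward and an upward scan.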
[0040] The presence of tags with heavier mass ($m_{\text{tag}} > m_{\text{bulk}}$) and larger solvent friction ($\gamma_{\text{tag}} > \gamma_{\text{bulk}}$) introduces a large variation in the dwell time and, hence, a large variation in the dwell velocities of the dsDNA monomers and tags, as shown in
as shown in
[0041] If the dsDNA were a rigid rod, the barcode distance $d_{mn}^{U \to D}$ between tags $T_m$ and $T_n$ would be calculated by:
$$d_{mn}^{U \to D} = \nu_{mn}^{U \to D} \times \tau_{mn}^{U \to D} \tag{3a}$$
$$\nu_{mn}^{U \to D} = \tfrac{1}{2}\left[\nu_{\text{dwell}}^{U \to D}(m) + \nu_{\text{dwell}}^{U \to D}(n)\right] \tag{3b}$$
$$\tau_{mn}^{U \to D} = t_i^{U \to D}(n) - t_i^{U \to D}(m) \tag{3c}$$
for $U \to D$ translocation; the same set of equations is obtained for $D \to U$ translocation by interchanging the indices $U$ and $D$. Equations 3a-3c provide the shortest distance between the tags, but not necessarily the contour length, or the actual distance, between the tags. Such a calculation is therefore likely to underestimate the barcode distances.
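As a worked illustration of Equations 3a-3c, the following Python sketch computes the rigid-rod estimate from two dwell velocities and two entry times. The function name and all numerical values are hypothetical.

```python
def rigid_rod_distance(v_dwell_m: float, v_dwell_n: float,
                       t_entry_m: float, t_entry_n: float) -> float:
    """Rigid-rod barcode estimate between tags T_m and T_n (Equations 3a-3c)."""
    v_mn = 0.5 * (v_dwell_m + v_dwell_n)  # Eq. 3b: mean of the two tag dwell velocities
    tau_mn = t_entry_n - t_entry_m        # Eq. 3c: delay between the two tag arrivals
    return v_mn * tau_mn                  # Eq. 3a


# Illustrative values only (arbitrary units).
print(rigid_rod_distance(0.25, 0.75, t_entry_m=100.0, t_entry_n=300.0))  # 100.0
```

Because only the slow tag velocities enter the estimate, any faster motion of the monomers between the two tags is not accounted for, which is consistent with the underestimation noted above.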
[0042] Unlike for a rigid rod, tension propagation is important in the motion of the semi-flexible dsDNA chain in the presence of an external bias force, as the motion of the dsDNA subchain on the cis side decouples into two domains [8, 9]. As the dsDNA travels through the nanopore 16, after the tag 22 $T_m$ translates through the nanopore 16, the preceding monomers are quickly dragged into the nanopore 16 by the tension front of the dsDNA, similar to the uncoiling of a rope pulled from one end. As such, faster motion occurs as the monomer strand translates through the nanopore 16, hitting a maximum at the subsequent tag 22 $T_{m \pm 1}$ with greater inertia and viscous drag. At this tension propagation time, the faster motion of the monomers (shown in
[0043] Accordingly, a first improved method for accurately determining tag 22 locations, without underestimations, includes measuring a barcode from known end-to-end tag 22 distances. By adding additional tags 22 disposed at the approximate ends of a dsDNA chain or by considering two end tags 22 ($T_1$ and $T_8$, with a distance therebetween being defined as $d_{18} \cong L$), an average velocity for the dsDNA chain is calculated by:
$$\nu_{\text{chain}}^{U \to D} \approx \nu_{18}^{U \to D} = d_{18} / \tau_{18}^{U \to D} \tag{4}$$
where $\tau_{18}^{U \to D}$ is the time delay of arrival for tags 22 $T_1$ and $T_8$ at the nanopore 16 for the $U \to D$ scan direction. The barcode distance between tags 22 $T_m$ and $T_n$ is then calculated by multiplying the time delay by the velocity $\nu_{18}^{U \to D}$:
$$d_{mn}^{U \to D} = \nu_{18}^{U \to D} \times \tau_{mn}^{U \to D} \tag{5}$$
The method is effective for estimating widely spaced barcodes; however, the method may be prone to overestimating barcode distances if multiple tags 22 are next to each other.
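The one-step calculation of Equations 4 and 5 can be sketched in Python as follows. The function names and the arrival times are hypothetical; the end-to-end distance of 747 is the difference between the $T_8$ and $T_1$ positions (901 and 154) listed in Table 1.

```python
def chain_velocity_from_end_tags(d_end_to_end: float,
                                 t_entry_first: float,
                                 t_entry_last: float) -> float:
    """Eq. 4: average chain velocity from the known distance between the two end tags."""
    tau = t_entry_last - t_entry_first
    if tau <= 0:
        raise ValueError("the last end tag must arrive after the first end tag")
    return d_end_to_end / tau


def one_step_distance(v_chain: float, t_entry_m: float, t_entry_n: float) -> float:
    """Eq. 5: barcode distance as the average chain velocity times the tag time delay."""
    return v_chain * (t_entry_n - t_entry_m)


# Hypothetical arrival times (arbitrary units); d_18 = 901 - 154 = 747 from Table 1.
v_18 = chain_velocity_from_end_tags(747.0, t_entry_first=0.0, t_entry_last=1494.0)
print(v_18)                                                         # 0.5
print(one_step_distance(v_18, t_entry_m=400.0, t_entry_n=830.0))    # 215.0
```

A single chain-wide velocity is applied to every tag pair, which is why closely spaced tags, whose slow passage lowers the local segment velocity, may be overestimated by this approach.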
[0044] As such, a second improved method including a two-step process can be employed to correct for overestimations, using the experimentally measured average scan time for the entire chain to estimate the average velocity of the dsDNA chain. The scan length $L_{\text{scan}}$ is the maximum length up to which the dsDNA segment (including monomers and tags 22) remains captured inside nanopore 16 for scanning events. The scan length denotes the theoretical maximum beyond which the dsDNA will escape from the nanopore 16, $L \approx L_{\text{scan}}$. The average scanning velocity from a number of repeated scans, such as 500 independent scans, is calculated by Equation 6:
where $\tau_{\text{scan}}(i)$ is the scan time for the $i^{\text{th}}$ event, $N_{\text{scan}}$ is the number of scanning events, and the average chain velocity is $\nu_{\text{chain}} \approx$
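Equation 6 is not reproduced in this text. The Python sketch below therefore follows the wording of claim 1, which describes dividing the length of the segment by the average scanning time taken over multiple scans; this reading is an assumption, and the function name and scan times are hypothetical.

```python
from typing import Sequence


def average_scan_velocity(scan_length: float, scan_times: Sequence[float]) -> float:
    """Average scanning velocity, ASSUMED to be L_scan divided by the mean scan time.

    This follows the claim wording, not the (unreproduced) Equation 6.
    """
    if not scan_times:
        raise ValueError("at least one scan time is required")
    mean_tau = sum(scan_times) / len(scan_times)  # (1 / N_scan) * sum_i tau_scan(i)
    return scan_length / mean_tau


# Hypothetical scan times for a 1,024-monomer chain (arbitrary units).
print(average_scan_velocity(1024.0, [2000.0, 2100.0, 1900.0, 2192.0]))  # 0.5
```

In practice the list would hold one entry per scanning event, for example the 500 independent scans mentioned above.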
[0045] During the first step of the method, the barcode distance between $T_m$ and $T_n$ is calculated from only the tag velocities $\nu_{\text{dwell}}(m)$ and $\nu_{\text{dwell}}(n)$, using Equations 3a-3c. The estimated distance $d_{mn}$ is used to approximately calculate the number of monomers $N_{mn} = d_{mn}^{U \to D} / b_1$ present in the segment joining the two tags $T_m$ and $T_n$, with $b_1$ being the bond length. In the second step, the segment velocity is re-calculated by accounting for the weighted velocity contributions from both the tags 22 and the non-tag monomers as:
[0046] The same set of equations for the $D \to U$ direction is obtained by interchanging $U$ with $D$. The barcodes are finally calculated by multiplying the weighted two-step velocity by the tag time delay as:
$$d_{mn}^{U \to D} = \nu_{\text{weight}}^{U \to D} \times \tau_{mn}^{U \to D} \tag{8}$$
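The two-step procedure can be sketched in Python as follows. Equation 7 is not reproduced in this text, so the weighting used below is an assumed placeholder and not the patent's formula: a count-weighted mean in which the two tags contribute their dwell velocities and each of the estimated intervening monomers contributes the average scanning velocity. The function name, parameters, and numerical values are hypothetical.

```python
def two_step_distance(v_dwell_m: float, v_dwell_n: float,
                      t_entry_m: float, t_entry_n: float,
                      v_scan: float, bond_length: float) -> float:
    """Two-step barcode estimate between tags T_m and T_n (sketch under stated assumptions)."""
    tau_mn = t_entry_n - t_entry_m

    # Step 1: rigid-rod estimate from the tag dwell velocities (Equations 3a-3c).
    d_est = 0.5 * (v_dwell_m + v_dwell_n) * tau_mn

    # Estimated number of monomers between the tags: N_mn = d_est / bond length.
    n_mn = d_est / bond_length

    # Step 2: weighted segment velocity.
    # ASSUMPTION: placeholder weighting standing in for the unreproduced Equation 7.
    v_weight = (v_dwell_m + v_dwell_n + n_mn * v_scan) / (n_mn + 2.0)

    # Equation 8: distance = weighted velocity x tag time delay.
    return v_weight * tau_mn


# Illustrative values only (arbitrary units).
d = two_step_distance(0.25, 0.75, t_entry_m=100.0, t_entry_n=300.0,
                      v_scan=0.6, bond_length=1.0)
print(round(d, 2))
```

With this placeholder weighting, tags separated by many monomers yield a segment velocity close to the average scanning velocity, while adjacent tags yield a velocity close to the mean tag dwell velocity, which matches the qualitative behavior described for the two-step method.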
[0047] The two-step method accurately captures barcode distances across the range of the dsDNA segment, independent of the proximity of the sequential tags. The underlying concept used in the single nanopore case is equally applicable to other multi-nanopore systems which use the dwell time and time of flight velocities to measure the barcodes.
Experimental Results
[0048] To test the methods described herein, an in silico coarse-grained (CG) model of a dsDNA segment including 1,024 monomers interspersed with 8 barcodes at different distances shown in
TABLE 1. Tag positions along dsDNA

Tag #        T_1   T_2   T_3   T_4   T_5   T_6   T_7   T_8
Position     154   369   379   399   614   625   696   901
Separation   154   215    10    20   215    11    71   205
TABLE 2. Barcodes measured from different methods

Tag #   Relative Distance   Method of          One-Step    Two-Step
        w.r.t. T_5          Equations 3a-3c    Method      Method
T_1     460                 373 ± 122          459 ± 59    460 ± 43
T_2     245                 197 ± 67           250 ± 39    250 ± 32
T_3     235                 183 ± 63           237 ± 38    237 ± 32
T_4     215                 167 ± 54           211 ± 35    211 ± 30
T_5     0                   0                  0           0
T_6     11                  11 ± 3             14 ± 4      11 ± 3
T_7     82                  68 ± 23            86 ± 23     86 ± 21
T_8     287                 230 ± 73           287 ± 65    287 ± 73
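The reference values in Tables 1 and 2 follow directly from the tag positions, and the short Python sketch below reproduces that arithmetic. The positions and the mean rigid-rod estimates are copied from the tables; the function names are hypothetical, and measuring the first separation from index 0 is an assumption that is consistent with the tabulated value of 154.

```python
# Tag positions (monomer index) copied from Table 1.
POSITIONS = {"T1": 154, "T2": 369, "T3": 379, "T4": 399,
             "T5": 614, "T6": 625, "T7": 696, "T8": 901}

# Mean estimates from the Equations 3a-3c column of Table 2.
RIGID_ROD_MEAN = {"T1": 373, "T2": 197, "T3": 183, "T4": 167,
                  "T5": 0, "T6": 11, "T7": 68, "T8": 230}


def separations(positions):
    """Distance of each tag from the previous tag (first tag: from index 0, an assumption)."""
    values = list(positions.values())
    previous = [0] + values[:-1]
    return [cur - prev for prev, cur in zip(previous, values)]


def relative_distances(positions, reference):
    """Absolute distance of every tag from the reference tag."""
    ref = positions[reference]
    return {tag: abs(pos - ref) for tag, pos in positions.items()}


def percent_deviation(estimates, truth):
    """Signed deviation of each estimate from the true distance, in percent."""
    return {tag: 100.0 * (estimates[tag] - truth[tag]) / truth[tag]
            for tag in truth if truth[tag] != 0}


truth = relative_distances(POSITIONS, "T5")
print(separations(POSITIONS))
print(truth)
print(percent_deviation(RIGID_ROD_MEAN, truth))
```

None of the resulting deviations for the rigid-rod column is positive, in line with the underestimation attributed to Equations 3a-3c.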
Conclusion
[0049] With the barcode determination method described above, tested using an in silico Brownian dynamics scheme on a model dsDNA with known barcode locations, a broad distribution of DNA tags can be accurately identified for species classification without overestimation or underestimation issues. The method includes scanning the dsDNA through a cylindrical nanopore multiple times and uses the dwell time data of the tags in conjunction with a weighted extrapolation scheme to calculate the average velocities of the chain segment between two tags. Using one of the tags as a reference, the barcodes are calculated by multiplying the time delays between sequential tags by the corresponding segment velocities obtained using Equations 6 and 7.
References
[0050] All referenced publications are incorporated herein by reference in their entirety. Furthermore, where a definition or use of a term in a reference, which is incorporated by reference herein, is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
[0051] [1] R. Vernooy, E. Haribabu, M. R. Muller, et al., PLoS Biol. 8(7), e1000417 (2010).
[0052] [2] N. J. Besansky, D. W. Severson, and M. T. Ferdig, Trends in Parasitology, 19, 545, (2003).
[0053] [3] N. Techen, I. Parveen, Z. Pan, and I. A Khan, Current Opinion in Biotechnology, 25, 103 (2014).
[0054] [4] X. Xiong, F. Yuan, M. Huang, L. Lu, X. Xiong, and J. Wen, J Food Prot, 82, 1200 (2019).
[0055] [5] E. H.-K. Wong and R. H. Hanner, Food Research International, 41, 828 (2008).
[0056] [6] P. D. N. Hebert, S. Ratnasingham, and J. R. de Waard, Proc R Soc Biol Sci Ser B, 270, 96 (2003).
[0057] [7] S. Pud, S. Chao, M. Belkin, D. Verschueren, T. Huijben, C. van Engelenburg, C. Dekker, and A. Aksimentiev, Nano Lett. 16, 8021 (2016).
[0058] [8] T. Sakaue, Phys. Rev. E 76, 021803 (2007).
[0059] [9] T. Ikonen, A. Bhattacharya, T. Ala-Nissila and W. Sung, J. Chem. Phys. 137, 085101 (2012).
[0060] [10] Y. Zhang, X. Liu, Y. Zhao, J. K. Yu, W. Reisner, and W. B. Dunbar, Small 14, 1801890 (2018).
[0061] [11] X. Liu, Y. Zhang, R. Nagel, W. Reisner, W. B. Dunbar, Small 15, 1901704 (2019).
[0062] [12] X. Liu, P. Zimny, Y. Zhang, A. Rana, R. Nagel, W. Reisner, and W. B. Dunbar, Small 16, 1905379 (2020).
[0063] The advantages set forth above, and those made apparent from the foregoing description, are efficiently attained. Since certain changes may be made in the above construction without departing from the scope of the invention, it is intended that all matters contained in the foregoing description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
[0064] It is also to be understood that the following claims are intended to cover all of the generic and specific features of the invention herein described, and all statements of the scope of the invention that, as a matter of language, might be said to fall therebetween.