BARCODED TRANSPOSASE COMPLEX AND APPLICATION THEREOF IN HIGH-THROUGHPUT SEQUENCING

Abstract

Provided are a barcoded transposase complex and an application thereof in high-throughput sequencing. Also provided is a transposase recognition element having the structure X(m)Y(f)N(n), in which X(m) represents a transposase recognition region having a double-stranded nucleic acid structure, Y(f) represents a spacer region having a single-stranded DNA structure, and N(n) represents a sample barcode having a single-stranded DNA structure. High-molecular-weight DNA is treated with the barcoded transposase complex to obtain a large number of barcoded DNA fragments. The barcoded DNA fragments obtained from each high-molecular-weight DNA are mixed to obtain a mixed sample. The mixed sample is captured with a carrier having a molecular barcode and treated with an exonuclease, and the transposase is then released. A DNA library is constructed using stLFR technology. The barcoded transposase complex can be applied to hybrid sequencing on a high-throughput sequencing platform.

Claims

1. A transposase recognition element, which is characterized by the following (a) and/or (b): (a) a transferred strand contains a fixed sequence; (b) a non-transferred strand contains a U base.

2. A transposase recognition element, which is characterized in that: the transposase recognition element has a structure of X(m)Y(f)N(n); wherein X(m) denotes a transposase recognition region and has a double-stranded nucleic acid structure; Y(f) denotes a spacer region and has a single-stranded DNA structure; N(n) denotes a sample barcode and has a single-stranded DNA structure; optionally, in the transposase recognition region, a portion of T in one strand is replaced with U.

3. (canceled)

4. A barcoded transposase complex, which is formed of a transposase and a transposase recognition element; wherein the transposase recognition element is the transposase recognition element according to claim 1.

5. A method for preparing a barcoded DNA fragment, comprising the following steps: providing high-molecular-weight DNA and treating with the barcoded transposase complex according to claim 4.

6. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing high-molecular-weight DNA and preparing a barcoded DNA fragment using the method according to claim 5; and (2) treating with an exonuclease and releasing the transposase.

7. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing high-molecular-weight DNA and preparing a barcoded DNA fragment using the method according to claim 5; (2) capturing with a carrier containing a molecular barcode; and (3) treating with an exonuclease and releasing the transposase.

8. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing n pieces of high-molecular-weight DNA and preparing barcoded DNA fragments using the method according to claim 5, respectively, wherein n is a natural number greater than or equal to 2; (2) mixing the barcoded DNA fragments obtained after each high-molecular-weight DNA is subjected to step (1), to obtain a mixed sample; and (3) treating with an exonuclease and releasing the transposase; optionally, the method further comprises the following step: (4) performing library construction using a single-tube long fragment read (stLFR) technology to obtain the DNA library.

9. (canceled)

10. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing n pieces of high-molecular-weight DNA and preparing barcoded DNA fragments using the method according to claim 5, respectively, wherein n is a natural number greater than or equal to 2; (2) mixing the barcoded DNA fragments obtained after each high-molecular-weight DNA is subjected to step (1), to obtain a mixed sample; (3) capturing the mixed sample obtained in step (2) with a carrier containing a molecular barcode; and (4) treating with an exonuclease and releasing the transposase; optionally, the method further comprises the following step: (5) performing library construction using a single-tube long fragment read (stLFR) technology to obtain the DNA library.

11. (canceled)

12. A kit for preparing a barcoded DNA fragment, comprising a transposase and a transposase recognition element, wherein the transposase recognition element is the transposase recognition element according to claim 1.

13. (canceled)

14. A kit for constructing a DNA library, comprising a transposase and a transposase recognition element, wherein the transposase recognition element is the transposase recognition element according to claim 1.

15. (canceled)

16. Use of the transposase recognition element according to claim 1 in DNA sequencing.

17. (canceled)

18. (canceled)

19. (canceled)

20. A barcoded transposase complex, which is formed of a transposase and a transposase recognition element; wherein the transposase recognition element is the transposase recognition element according to claim 2.

21. A method for preparing a barcoded DNA fragment, comprising the following steps: providing high-molecular-weight DNA and treating with the barcoded transposase complex according to claim 20.

22. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing high-molecular-weight DNA and preparing a barcoded DNA fragment using the method according to claim 21; and (2) treating with an exonuclease and releasing the transposase.

23. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing high-molecular-weight DNA and preparing a barcoded DNA fragment using the method according to claim 21; (2) capturing with a carrier containing a molecular barcode; and (3) treating with an exonuclease and releasing the transposase.

24. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing n pieces of high-molecular-weight DNA and preparing barcoded DNA fragments using the method according to claim 21, respectively, wherein n is a natural number greater than or equal to 2; (2) mixing the barcoded DNA fragments obtained after each high-molecular-weight DNA is subjected to step (1), to obtain a mixed sample; and (3) treating with an exonuclease and releasing the transposase; optionally, the method further comprises the following step: (4) performing library construction using a single-tube long fragment read (stLFR) technology to obtain the DNA library.

25. A method for constructing a DNA library, comprising the following steps in sequence: (1) providing n pieces of high-molecular-weight DNA and preparing barcoded DNA fragments using the method according to claim 21, respectively, wherein n is a natural number greater than or equal to 2; (2) mixing the barcoded DNA fragments obtained after each high-molecular-weight DNA is subjected to step (1), to obtain a mixed sample; (3) capturing the mixed sample obtained in step (2) with a carrier containing a molecular barcode; and (4) treating with an exonuclease and releasing the transposase; optionally, the method further comprises the following step: (5) performing library construction using a single-tube long fragment read (stLFR) technology to obtain the DNA library.

26. A kit for preparing a barcoded DNA fragment, comprising a transposase and a transposase recognition element, wherein the transposase recognition element is the transposase recognition element according to claim 2.

27. A kit for constructing a DNA library, comprising a transposase and a transposase recognition element, wherein the transposase recognition element is the transposase recognition element according to claim 2.

28. Use of the transposase recognition element according to claim 2 in DNA sequencing.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0082] FIG. 1 is a structure diagram of elements of a barcoded transposase-loading fragment.

[0083] FIG. 2 is a structure diagram of a barcoded transposase complex.

[0084] FIG. 3 is a structure diagram of elements of a hybridization capture sequence-contained magnetic bead carrier.

[0085] FIG. 4 is a flowchart of library construction.

[0086] FIG. 5 is an electrophoresis diagram of step 11 according to Example 2.

[0087] FIG. 6 illustrates results of quantification using a Qubit™ double-stranded DNA high-sensitivity fluorescence quantification kit and calculation of a polymerase chain reaction (PCR) yield in Example 3.

[0088] FIG. 7 is a diagram illustrating results of electrophoresis detection in Example 3.

DETAILED DESCRIPTION

[0089] The following examples facilitate a better understanding of the present disclosure and do not limit the present disclosure. The experimental methods in the following examples are conventional methods unless otherwise specified. The experimental materials used in the following examples are purchased from conventional biochemical reagent stores unless otherwise specified. The quantitative experiments in the following examples were each performed in triplicate, and the results were averaged. Unless otherwise specified, among the nucleic acid molecules in the examples, A refers to an adenine deoxyribonucleotide, C refers to a cytosine deoxyribonucleotide, G refers to a guanine deoxyribonucleotide, T refers to a thymine deoxyribonucleotide, and U refers to a uracil ribonucleotide.

Example 1. Establishment of Method

[0090] Transposase, a commonly used tool enzyme for next-generation library construction, can achieve rapid fragmentation of DNA. In the present disclosure, a barcoded transposase-loading fragment is designed and prepared. The barcoded transposase-loading fragment is self-assembled with a transposase to form a barcoded transposase complex, and when the barcoded transposase complex is subjected to a transposition reaction, high-molecular-weight DNA is fragmented and barcoded. Further, after the transposition reaction is performed, the transposase is not subjected to denaturation treatment and retains the integrity of the nucleic acid molecule fragments while occupying and protecting enzyme digestion recognition sites of the nucleic acid molecule fragments, protecting the nucleic acid molecule fragments from an action of an exonuclease.

[0091] 1. Preparation of barcoded DNA fragments

[0092] (1) Preparation of high-molecular-weight DNA

[0093] The high-molecular-weight DNA, also known as long-fragment DNA, is typically longer than 40 kb.

[0094] For example, the high-molecular-weight DNA may be genomic DNA obtained through DNA extraction of a biological sample.

[0095] (2) Preparation of a barcoded transposase-loading fragment

[0096] The barcoded transposase-loading fragment has a structure of X(m)Y(f)N(n).

[0097] X(m) denotes a transposase recognition region, which has a double-stranded nucleic acid structure (one strand consists of A, T, C and G, and the other strand consists of A, T, C, G and U) and a size of 19 bp.

[0098] Y(f) denotes a spacer region, which has a single-stranded DNA structure and a size of 15-30 nt (may specifically be 20 nt). The spacer region is used for separating the transposase recognition region and a sample barcode (reducing a direct effect of the sample barcode on the transposase) and may also be used for designing sequencing primers in a subsequent process.

[0099] N(n) denotes the sample barcode, which has a single-stranded DNA structure and a size of 8-12 nt (may specifically be 10 nt), where each nucleotide is any one of A, T, C and G. Each sample corresponds to a unique sample barcode for distinguishing a source of the sample.

[0100] Specifically, each sample barcode listed in Table 1 (in Table 1, the sequences are all in a 5′→3′ direction) may be used.

TABLE-US-00001 TABLE 1
Name                     Sequence
Molecular barcode 01     ATCGGACCTA
Molecular barcode 02     GATTCCGTCC
Molecular barcode 03     CGGCAGTAAG
Molecular barcode 04     TCAATTAGGT
Molecular barcode 05     CGGATACGAA
Molecular barcode 06     GCTCGTTACC
Molecular barcode 07     TTATACGTTG
Molecular barcode 08     AACGCGACGT
Molecular barcode 09     GCTAGCAGAA
Molecular barcode 10     CTATCTTCCT
Molecular barcode 11     AAGCAAGAGC
Molecular barcode 12     TGCGTGCTTG
Molecular barcode 13     CGGATTGCCG
Molecular barcode 14     GAATCCTGAT
Molecular barcode 15     TCTGGAATGA
Molecular barcode 16     ATCCAGCATC
Molecular barcode 17     CATCACTCAC
Molecular barcode 18     CAGCTGACTC
Molecular barcode 19     TTCGCAGACA
Molecular barcode 20     TTGTACCAAT
Molecular barcode 21     ACCACAATCG
Molecular barcode 22     GGAAGTCTGT
Molecular barcode 23     AGAGTGTGGA
Molecular barcode 24     GCTTGTGGTG
Molecular barcode 25     TTGTCCTCTA
Molecular barcode 26     ATTCGCTAGG
Molecular barcode 27     CGATGACTAC
Molecular barcode 28     ACAGCTCAGC
Molecular barcode 29     TATCTAGGTT
Molecular barcode 30     GAGATGGCAA
Molecular barcode 31     CGCAAGATCT
Molecular barcode 32     GCCGATAGCG
Molecular barcode 33     CCATCGTTGC
Molecular barcode 34     TGAACGATTA
Molecular barcode 35     TAGAGCGAAC
Molecular barcode 36     ATGTGTGAGA
Molecular barcode 37     ATCCTAACAG
Molecular barcode 38     CGCGTCTGCG
Molecular barcode 39     GATGATCCTT
Molecular barcode 40     GCTCAACGCT
Molecular barcode 41     ATGCATCTAA
Molecular barcode 42     AGCTCTGGAC
Molecular barcode 43     CTATCACGTG
Molecular barcode 44     GGACTAGTGG
Molecular barcode 45     GCCAAGTCCA
Molecular barcode 46     CCTGTCAAGC
Molecular barcode 47     TAGAGGTCTT
Molecular barcode 48     TATGGCAACT
Molecular barcode 49     CTGCGTACAT
Molecular barcode 50     ATCTCATTAA
Molecular barcode 51     AAGTGGCGCA
Molecular barcode 52     GGCCTTAATG
Molecular barcode 53     TCTGAGGCGG
Molecular barcode 54     CGAGCCGATT
Molecular barcode 55     GATAACCGGC
Molecular barcode 56     TCAATATTCC
Molecular barcode 57     TCCGTTGAAT
Molecular barcode 58     CAGTACAGTT
Molecular barcode 59     ATTGAGGTAC
Molecular barcode 60     ATTAGAAGTC
Molecular barcode 61     CAACGCTTCA
Molecular barcode 62     GGATCGCACG
Molecular barcode 63     TGCCTTCCGA
Molecular barcode 64     GCGACATCGG
Molecular barcode 65     CATTCTAAGT
Molecular barcode 66     CAGGCTTGGA
Molecular barcode 67     ATCATCGTCT
Molecular barcode 68     GTCTTGTGAG
Molecular barcode 69     AGTAGGAACG
Molecular barcode 70     TCACAACCAC
Molecular barcode 71     GCAGGCCTTC
Molecular barcode 72     TGGCAAGCTA
Molecular barcode 73     GAGCATTGTC
Molecular barcode 74     TGTGATTAGC
Molecular barcode 75     CCTATGGACT
Molecular barcode 76     TAGGCGATAG
Molecular barcode 77     AGACCACGAT
Molecular barcode 78     GTATTAGCCA
Molecular barcode 79     CTCTGCACTG
Molecular barcode 80     ACCAGCCTGA
Molecular barcode 81     GCGTGAGTAT
Molecular barcode 82     CGCGGAGCAT
Molecular barcode 83     CAAGTTCACA
Molecular barcode 84     AGCACCTCTC
Molecular barcode 85     TTACAGTGCA
Molecular barcode 86     TTGCCTAGGC
Molecular barcode 87     GCTATGATGG
Molecular barcode 88     AATTACCATG
Molecular barcode 89     AGACATGGTG
Molecular barcode 90     CCAGACATAT
Molecular barcode 91     ACGCTTCCTT
Molecular barcode 92     GACGTCTTGA
Molecular barcode 93     TACTGAGCGG
Molecular barcode 94     TGTACACACC
Molecular barcode 95     CTTACGTGAA
Molecular barcode 96     GTGTGGAACC
Molecular barcode 97     AAGAATACCT
Molecular barcode 98     GTTGCATTCG
Molecular barcode 99     CGCCGTTGAA
Molecular barcode 100    TTCCGCCGAG
Molecular barcode 101    CCATTACCGT
Molecular barcode 102    ACGTCGGATC
Molecular barcode 103    TGTATCGTGA
Molecular barcode 104    GAAGAGAATC
Molecular barcode 105    CATTAATTCT
Molecular barcode 106    TGACGCTGGT
Molecular barcode 107    GAGCCTGACG
Molecular barcode 108    CTAGAGCAGG
Molecular barcode 109    AGTTGAGTTA
Molecular barcode 110    GCGGTCACTA
Molecular barcode 111    TTCACTCCAC
Molecular barcode 112    ACCATGAGAC
Molecular barcode 113    TAGGTTGTTC
Molecular barcode 114    CTGACTCTGG
Molecular barcode 115    ACTGCCTGTT
Molecular barcode 116    GTCATGGAGC
Molecular barcode 117    GGATAGACAT
Molecular barcode 118    CCTCGACAAG
Molecular barcode 119    TACCGAAGCA
Molecular barcode 120    AGATACTCCA
Molecular barcode 121    TTGATCAAGG
Molecular barcode 122    TGCCACTTCC
Molecular barcode 123    GTAGAATGTT
Molecular barcode 124    GACTCGCGTC
Molecular barcode 125    AGTGTTATAG
Molecular barcode 126    ACACGAGACT
Molecular barcode 127    CATAGGCCGA
Molecular barcode 128    CCGTCTGCAA
Molecular barcode 129    ACTCATACGC
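
Because each sample is identified only by its barcode, any subset of Table 1 chosen for an experiment should at least match the stated length range (8-12 nt), use only A, T, C and G, and contain no duplicates. The following is a minimal sketch of such a check on the first five entries of Table 1. The function names and the pairwise Hamming-distance report are illustrative assumptions and are not part of the disclosed method.

```python
from itertools import combinations

# First five sample barcodes from Table 1 (5'->3').
BARCODES = {
    "01": "ATCGGACCTA",
    "02": "GATTCCGTCC",
    "03": "CGGCAGTAAG",
    "04": "TCAATTAGGT",
    "05": "CGGATACGAA",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def check_barcodes(barcodes, min_len=8, max_len=12):
    """Report basic sanity properties of a barcode set (hypothetical helper)."""
    seqs = list(barcodes.values())
    return {
        "length_ok": all(min_len <= len(s) <= max_len for s in seqs),
        "alphabet_ok": all(set(s) <= set("ATCG") for s in seqs),
        "unique": len(set(seqs)) == len(seqs),
        # Smallest pairwise distance; larger values tolerate more read errors.
        "min_distance": min(hamming(a, b) for a, b in combinations(seqs, 2)),
    }


report = check_barcodes(BARCODES)
print(report)
```

A larger minimum distance means that a sequencing error in the barcode is less likely to turn one sample's barcode into another's.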

[0101] (3) The barcoded transposase-loading fragment is co-incubated with a transposase to obtain a barcoded transposase complex.

[0102] (4) The high-molecular-weight DNA obtained in step (1) is fragmented and barcoded using the barcoded transposase complex obtained in step (3) to obtain a large number of barcoded DNA fragments, where each of the fragments has a size of 200-2000 bp. For each high-molecular-weight DNA, the used barcoded transposase complex contains a unique sample barcode so that the barcoded DNA fragments derived from each high-molecular-weight DNA contain the unique sample barcode and all the barcoded DNA fragments derived from each high-molecular-weight DNA contain the same sample barcode.

[0103] Note: after step (4) is completed, the transposase is not released.

[0104] 2. Sample mixing before hybridization capture

[0105] The products obtained after each high-molecular-weight DNA is subjected to step 1 are mixed to obtain a mixed sample.

[0106] 3. Hybridization capture of the barcoded DNA fragments

[0107] The mixed sample obtained in step 2 is taken and mixed with a high-throughput hybridization capture sequence-contained magnetic bead carrier (the high-throughput magnetic bead carrier includes a very large number of types of hybridization capture sequence-contained magnetic bead carriers), and the hybridization capture sequence-contained magnetic bead carrier captures the barcoded DNA fragments through hybridization of DNA sequences.

[0108] The hybridization capture sequence-contained magnetic bead carrier is a magnetic bead to which a specific nucleic acid molecule has been attached. The specific nucleic acid molecule has a partially double-stranded structure. A segment at one end of a first strand is reverse complementary to a segment at one end of a second strand to form the partially double-stranded structure. The first strand is attached to the magnetic bead at its free end, and contains a molecular barcode (located in a non-double-stranded structure of the specific nucleic acid molecule) in the strand. The second strand contains a transposon capture region (located in the non-double-stranded structure of the specific nucleic acid molecule, where the transposon capture region is reverse complementary to a capture recognition region) at its free end.

[0109] Each magnetic bead contains multiple specific nucleic acid molecules that are the same (that is, all the specific nucleic acid molecules on each magnetic bead contain the same molecular barcode). For all hybridization capture sequence-contained magnetic bead carriers, other moieties of the specific nucleic acid molecules are the same except for the sequence of the molecular barcode. Hybridization capture sequence-contained magnetic bead carriers that contain the same specific nucleic acid molecule (that is, contain the same molecular barcode) are considered as one type of hybridization capture sequence-contained magnetic bead carrier.

[0110] 4. Removing excess oligonucleotides on the magnetic bead through enzyme digestion

[0111] Since the transposase in step 3 is not subjected to denaturation treatment, the transposase still retains the integrity of the DNA while occupying and protecting an enzyme digestion recognition site of the DNA. Moreover, on a magnetic bead modified with a large number of oligonucleotides of the same sequence, only 1% of the oligonucleotides can be used for binding to the DNA, and the remaining 99% of exposed oligonucleotides would participate in subsequent adapter ligation and PCR and compete with the real product. Therefore, the excess oligonucleotides on the surface of the magnetic bead should be cleaved using an exonuclease, while the inserted DNA fragment is protected from digestion by the exonuclease.

[0112] After the enzyme digestion, a denaturing agent for transposase is added to terminate the action of the exonuclease while denaturing the transposase so that the transposase is completely released from the DNA.

[0113] 5. The product in step 4 is taken, and library construction is performed using an stLFR technology to obtain a DNA library.

[0114] 6. The DNA library obtained in step 5 is taken and subjected to high-throughput sequencing. The sequencing results are then attributed to each sample through the sample barcode, and the short reads generated through sequencing are assembled into the original long-fragment DNA information through the molecular barcode information carried on the stLFR magnetic bead, achieving haplotype sequencing.
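
The two-level attribution described in step 6 (first by sample barcode, then by the molecular barcode of the bead) can be sketched as a nested grouping. This is a minimal illustration under stated assumptions: reads are assumed to have already been parsed into (sample barcode, bead barcode, sequence) tuples, the sample names, bead labels and toy reads are hypothetical, and no error correction or assembly is attempted.

```python
from collections import defaultdict

# Hypothetical mapping of sample barcodes to sample names; the two
# sequences are molecular barcodes 01 and 02 from Table 1.
SAMPLE_BARCODES = {
    "ATCGGACCTA": "sample_1",
    "GATTCCGTCC": "sample_2",
}


def assign_reads(reads):
    """Group reads by sample, then by bead (molecular) barcode.

    reads: iterable of (sample_barcode, bead_barcode, sequence) tuples.
    Returns (by_sample, unassigned), where by_sample[sample][bead] is a
    list of sequences and unassigned holds reads with unknown barcodes.
    """
    by_sample = defaultdict(lambda: defaultdict(list))
    unassigned = []
    for sample_bc, bead_bc, seq in reads:
        sample = SAMPLE_BARCODES.get(sample_bc)
        if sample is None:
            unassigned.append((sample_bc, bead_bc, seq))
        else:
            by_sample[sample][bead_bc].append(seq)
    return by_sample, unassigned


# Toy reads (hypothetical). "bead_A" carries fragments from both samples.
toy_reads = [
    ("ATCGGACCTA", "bead_A", "ACGTACGT"),
    ("ATCGGACCTA", "bead_A", "TTGACCAA"),
    ("GATTCCGTCC", "bead_A", "GGCATTCA"),
    ("GATTCCGTCC", "bead_B", "CATGCATG"),
    ("NNNNNNNNNN", "bead_B", "AAAACCCC"),
]

by_sample, unassigned = assign_reads(toy_reads)
for sample in sorted(by_sample):
    for bead in sorted(by_sample[sample]):
        print(sample, bead, by_sample[sample][bead])
print("unassigned:", len(unassigned))
```

In this sketch the same bead barcode may appear under more than one sample, and the sample barcode is what keeps those reads apart before reads sharing a bead barcode are assembled.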

[0115] A flowchart of the library construction is shown in FIG. 4.

Example 2. Specific Application of the Method

[0116] 1. Preparation of a barcoded transposase complex

[0117] (1) Preparation of a barcoded transposase-loading fragment

[0118] The barcoded transposase-loading fragment was formed of a single-stranded nucleic acid molecule A1 and a single-stranded nucleic acid molecule A2.

[0119] The barcoded transposase-loading fragment was prepared by a specific method as follows: the single-stranded nucleic acid molecule A1 and the single-stranded nucleic acid molecule A2 (both at a concentration of 100 μM) were mixed in equal volumes and subjected to annealing to obtain a product solution. Annealing parameters: 70° C. for 3 min; cooling to 20° C. (at a cooling rate of 0.1° C./s); 20° C. for 30 min; hold at 4° C. The product solution contained the barcoded transposase-loading fragment at a concentration of 50 μM.

TABLE-US-00002
Single-stranded nucleic acid molecule A1 (Sequence 1): 5′Phos-CGATCCTTGGTGATCNNNNNNNNNN custom-character AGATGTGTATAAGAGACAG-3′.
Single-stranded nucleic acid molecule A2 (Sequence 2): 5′Phos-CTGUCTCUTATACACAUCT-3′.

[0120] In the single-stranded nucleic acid molecule A1, 10 N underlined by the straight line constituted a sample barcode, where N represented any one of A, T, C and G. Each sample corresponded to a unique sample barcode for distinguishing a source of the sample.

[0121] The bold moiety of the single-stranded nucleic acid molecule A1 and the single-stranded nucleic acid molecule A2 formed a double-stranded structure (the double-stranded structure was a transposase recognition region), and the remaining moiety was a single-stranded structure. The moiety underlined by the squiggle of the single-stranded nucleic acid molecule A1 was a spacer region, and the italic moiety of the single-stranded nucleic acid molecule A1 was a capture recognition region.
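
The pairing between A1 and A2 described above can be checked computationally. The sketch below assumes that the bold moiety of A1 is its 3′-terminal 19 nt (AGATGTGTATAAGAGACAG), which is an inference from the 19 bp size given for the recognition region since the typographic emphasis is not preserved here. It treats U in A2 as T for base-pairing purposes, and the helper names are illustrative.

```python
# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Assumed transposase recognition region of A1 (3'-terminal 19 nt of Sequence 1).
RECOGNITION_A1 = "AGATGTGTATAAGAGACAG"
# Sequence 2 (A2), in which some T positions are replaced with U.
A2 = "CTGUCTCUTATACACAUCT"

# U pairs like T, so substitute before comparing.
a2_as_dna = A2.replace("U", "T")
is_complementary = reverse_complement(RECOGNITION_A1) == a2_as_dna

# 1-based positions of U in A2.
u_positions = [i + 1 for i, base in enumerate(A2) if base == "U"]

print("A2 pairs with the A1 recognition region:", is_complementary)
print("U positions in A2:", u_positions)
```

Under these assumptions, A2 is the exact reverse complement of the 19-nt region, with U at three positions of the strand.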

[0122] (2) Preparation of the barcoded transposase complex

[0123] 16.52 μl of Tn5 transposase (purchased from BGI, Cat. No. BGE005, with a concentration of 1 U/μl), 17.08 μl of coupling buffer (6.3±0.1 g glycerol dissolved in 5 ml TE buffer), 17.92 μl of TE buffer and 4.48 μl of the product solution obtained in step (1) were uniformly mixed on ice and incubated at 30° C. for 1 h to obtain a product solution. The product solution was stored at −20° C. until use. The product solution contained the barcoded transposase complex.
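
As a quick arithmetic check on the assembly mix above, the following sketch sums the stated volumes and derives the resulting concentrations. It assumes simple additive volumes and uses only the values given in this paragraph and the 50 μM stock concentration from step (1). The variable names are illustrative.

```python
# Volumes (in microliters) stated for the complex assembly.
volumes_ul = {
    "Tn5 transposase (1 U/ul)": 16.52,
    "coupling buffer": 17.08,
    "TE buffer": 17.92,
    "loading fragment (50 uM)": 4.48,
}

total_ul = sum(volumes_ul.values())

# Final loading-fragment concentration after dilution into the mix.
fragment_final_um = 50.0 * volumes_ul["loading fragment (50 uM)"] / total_ul

# Final transposase activity per microliter of the mix.
tn5_final_u_per_ul = 1.0 * volumes_ul["Tn5 transposase (1 U/ul)"] / total_ul

print(f"total volume: {total_ul:.2f} ul")
print(f"loading fragment: {fragment_final_um:.2f} uM")
print(f"Tn5 transposase: {tn5_final_u_per_ul:.3f} U/ul")
```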

[0124] 2. Fragmentation and barcoding of high-molecular-weight DNA

[0125] The high-molecular-weight DNA was: NA12878 (CORIELL, Cat. No. NA12878), genomic DNA of Escherichia coli DH5α, genomic DNA of Arabidopsis lyrata, and Lambda DNA (ThermoFisher, Cat. No. SD0011), respectively.

[0126] 10 ng of high-molecular-weight DNA was taken and added to a 0.2 ml centrifuge tube, and nuclease-free water was added to 36.8 μl. Then, 10 μl of 5×tagmentation buffer (purchased from BGI, Cat. No. BGE005B01) and 3.2 μl of 16-fold diluent (prepared by diluting the product solution obtained in (2) of step 1 to 16-fold volume with TE buffer, which was performed on ice) were added, uniformly mixed and incubated at 55° C. for 10 min to obtain a product solution. The 0.2 ml centrifuge tube containing the product solution was transferred to ice. The product solution contained barcoded DNA fragments.
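
The composition of the tagmentation reaction can be verified in the same way. This sketch assumes additive volumes and uses only the figures in this paragraph. It confirms that the 5× buffer ends up at 1× and shows how much undiluted complex solution the 16-fold diluent corresponds to. The variable names are illustrative.

```python
# Values stated for one tagmentation reaction.
DNA_PLUS_WATER_UL = 36.8   # 10 ng DNA brought to this volume with water
BUFFER_5X_UL = 10.0        # 5x tagmentation buffer
DILUTED_COMPLEX_UL = 3.2   # 16-fold diluted complex solution
DILUTION_FOLD = 16
DNA_INPUT_NG = 10.0

total_ul = DNA_PLUS_WATER_UL + BUFFER_5X_UL + DILUTED_COMPLEX_UL

# Final buffer strength, expressed as a multiple of 1x.
buffer_final_x = 5.0 * BUFFER_5X_UL / total_ul

# DNA concentration in the reaction.
dna_ng_per_ul = DNA_INPUT_NG / total_ul

# Volume of undiluted complex solution contained in the diluent added.
undiluted_equiv_ul = DILUTED_COMPLEX_UL / DILUTION_FOLD

print(f"reaction volume: {total_ul:.1f} ul")
print(f"buffer strength: {buffer_final_x:.2f}x")
print(f"DNA: {dna_ng_per_ul:.2f} ng/ul")
print(f"undiluted complex equivalent: {undiluted_equiv_ul:.2f} ul")
```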

[0127] For each type of high-molecular-weight DNA, the barcoded transposase complex used in the above steps contained a unique sample barcode so that the obtained barcoded DNA fragments contained the unique sample barcode.

[0128] 3. Preparation of hybridization capture sequence-contained magnetic bead carrier

[0129] The hybridization capture sequence-contained magnetic bead carrier was a magnetic bead to which a specific nucleic acid molecule had been attached. The specific nucleic acid molecule consisted of a single-stranded nucleic acid molecule B1 and a single-stranded nucleic acid molecule B2 and had a partially double-stranded structure. The 5′-end of the single-stranded nucleic acid molecule B1 was attached to the magnetic bead. A 3′-end segment of the single-stranded nucleic acid molecule B1 was reverse complementary to a 3′-end segment of the single-stranded nucleic acid molecule B2 to form the partially double-stranded structure. The single-stranded nucleic acid molecule B1 contained molecular barcode 1, molecular barcode 2 and molecular barcode 3 (located in a non-double-stranded structure of the specific nucleic acid molecule). In the single-stranded nucleic acid molecule B1, the 5′-end sequence (Sequence 3) was AAAAAAAAAATGTGAGCCAAGGAGTTG (located upstream of the three molecular barcodes). In the single-stranded nucleic acid molecule B2, the 5′-end contained a transposon capture region (located in the non-double-stranded structure of the specific nucleic acid molecule, where the transposon capture region was reverse complementary to the capture recognition region).

TABLE-US-00003 Single-stranded nucleic acid molecule B2 (Sequence 4): 5′- custom-character CCATAGTCCATGCTA-3′.

[0130] The region underlined by the straight line of the single-stranded nucleic acid molecule B2 was the moiety that was reverse complementary to the 3′-end segment of the single-stranded nucleic acid molecule B1. The region underlined by the squiggle of the single-stranded nucleic acid molecule B2 was the transposon capture region.

[0131] Each of the molecular barcode 1, the molecular barcode 2 and the molecular barcode 3 consisted of ten nucleotides, where each nucleotide was any one of A, T, C and G. A total of 1536 types of molecular barcodes 1, 1536 types of molecular barcodes 2 and 1536 types of molecular barcodes 3 were disposed. Each magnetic bead contained multiple specific nucleic acid molecules that were the same (that is, all the specific nucleic acid molecules on each magnetic bead contained the same molecular barcode 1, the same molecular barcode 2 and the same molecular barcode 3). Hybridization capture sequence-contained magnetic bead carriers that contained the same specific nucleic acid molecule (that is, contained the same molecular barcode 1, the same molecular barcode 2 and the same molecular barcode 3) were considered as one type of hybridization capture sequence-contained magnetic bead carrier. For each hybridization capture sequence-contained magnetic bead carrier, other moieties of the specific nucleic acid molecules were the same except for sequences of the molecular barcode 1, the molecular barcode 2 and the molecular barcode 3. There were 1536×1536×1536 types of magnetic bead carriers in total.
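
The total number of bead types follows directly from the three barcode positions. The short sketch below evaluates the product and, for context, compares the 1536 barcodes per position with the number of possible 10-nt sequences. The variable names are illustrative.

```python
TYPES_PER_POSITION = 1536   # barcode types at each of the three positions
N_POSITIONS = 3             # molecular barcode 1, 2 and 3
BARCODE_LENGTH = 10         # nucleotides per molecular barcode

# Combinatorial number of distinct bead types.
total_bead_types = TYPES_PER_POSITION ** N_POSITIONS

# Number of possible 10-mers over A, T, C and G, of which 1536 are used.
possible_10mers = 4 ** BARCODE_LENGTH

print(f"bead types: {total_bead_types:,}")        # 3,623,878,656
print(f"possible 10-mers: {possible_10mers:,}")   # 1,048,576
```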

[0132] 4. Preparation of a mixed sample

[0133] The product solution of NA12878 obtained in step 2 and the product solution of the genomic DNA of Escherichia coli DH5α obtained in step 2 were taken and mixed in equal volumes to obtain a mixed sample 1.

[0134] The product solution of the genomic DNA of Escherichia coli DH5α obtained in step 2 and the product solution of the genomic DNA of Arabidopsis lyrata obtained in step 2 were taken and mixed in equal volumes to obtain a mixed sample 2.

[0135] The product solution of the genomic DNA of Escherichia coli DH5α obtained in step 2 and the product solution of Lambda DNA obtained in step 2 were taken and mixed in equal volumes to obtain a mixed sample 3.

[0136] The three mixed samples were placed on ice.

[0137] The three mixed samples obtained in step 4 were separately subjected to subsequent steps 5 to 10.

[0138] 5. Capture of the barcoded DNA fragments

[0139] (1) The hybridization capture sequence-contained magnetic bead carrier prepared in step 3 was taken and added to a 1.5 ml centrifuge tube (magnetic beads were in an amount of 30×1.1 million), the centrifuge tube was placed on a magnet for 2 min until the liquid was clear, and the supernatant was discarded. The beads were washed with 1X low salt wash buffer (LSWB), and the supernatant was discarded. The beads were washed again with 1X LSWB, and the supernatant was discarded.

[0140] (2) After step (1) was completed, the centrifuge tube was added with 55 μl of capture buffer (containing 100 mM Tris-HCl with a pH of 7.5, 200 mM MgCl₂ and 0.1% Tween-20, and the balance was water) to resuspend the beads.

[0141] (3) A new 1.5 ml centrifuge tube was taken and added with 50 μl of the magnetic bead suspension obtained in step (2) and 7.5 μl of a mixed sample obtained in step 4. The mixture was gently inverted ten times to mix, briefly centrifuged and incubated with rotation on a vertical mixer (at 60° C. for 10 min and then at 45° C. for 50 min).

[0142] (4) After step (3) was completed, the centrifuge tube was taken and naturally cooled to room temperature, and added with 26 μl of ligation buffer I (containing 250 mM Tris-HCl with a pH of 7.5, 5 mM adenosine triphosphate (ATP) and 50 mM dithiothreitol (DTT), and the balance was water) and 4 μl of T4 DNA ligase (purchased from BGI, Cat. No. 01E004MM, with a concentration of 600 U/μl). The mixture was gently inverted ten times to mix, briefly centrifuged and incubated with rotation on a vertical mixer (at 25° C. for 1 h).

[0143] 6. Removing excess oligonucleotides on the magnetic beads through enzyme digestion

[0144] (1) After step 5 was completed, the centrifuge tube was taken and placed on a magnet for 2 min until the liquid was clear, and the supernatant was discarded. The beads were washed with 1X LSWB, and the supernatant was discarded.

[0145] (2) After step (1) was completed, the centrifuge tube was placed on ice and added with 95 μl of digestion buffer I (containing 33 mM Tris-HCl with a pH of 7.5, 66 mM potassium acetate, 10 mM magnesium acetate and 0.5 mM DTT, and the balance was water) and 5 μl of an exonuclease mixture (containing 3.75 μl of exonuclease I and 1.25 μl of exonuclease III). The mixture was gently inverted ten times to mix, briefly centrifuged and incubated on a vertical mixer (at 37° C. for 10 min). Exonuclease I: purchased from BGI, Cat. No. 01E010ML, with a concentration of 20 U/μl. Exonuclease III: purchased from BGI, Cat. No. 01E011HL, with a concentration of 100 U/μl.

[0146] 7. Release of the transposase through adding a denaturing agent

[0147] (1) After step 6 was completed, the centrifuge tube was added with 11 μl of 1% SDS aqueous solution, covered with a tube cap, shaken, uniformly mixed and incubated on a vertical mixer at room temperature for 10 min.

[0148] (2) After step (1) was completed, the centrifuge tube was briefly centrifuged and placed on a magnet for 2 min until the liquid was clear, and the supernatant was discarded.

[0149] (3) After step (2) was completed, the centrifuge tube was taken and washed three times. The steps of each washing were as follows: the centrifuge tube was added with 150 μl of 1X LSWB, shaken and placed on a magnet for 2 min until the liquid was clear, and the supernatant was discarded.

[0150] 8. Addition of an adapter

[0151] (1) After step 7 was completed, the centrifuge tube was taken and added with 20 μl of pre ligation buffer (containing 50 mM Tris-HCl with a pH of 7.5 and 20 mM MgCl₂, and the balance was water) and 4 μl of pre ligation enzyme (single-strand DNA-binding (SSB) protein, purchased from BGI, Cat. No. BGE006, with a concentration of 500 μg/ml). The mixture was vortexed to be uniformly mixed and incubated on a vertical mixer at 37° C. for 30 min.

[0152] (2) After step (1) was completed, the centrifuge tube was taken and naturally cooled to room temperature, and added with 48 μl of ligation buffer II (containing 150 mM Tris-HCl with a pH of 7.8, 3 mM ATP, 1.5 mM DTT, 0.15 mM bovine serum albumin (BSA), 30 mM MgCl.sub.2 and 30% PEG8000, and the balance was water), 18 μl of an adapter solution and 10 μl of T4 DNA ligase (purchased from BGI, Cat. No. 01E004MM, with a concentration of 600 U/μl). The mixture was vortexed to be uniformly mixed and incubated on a vertical mixer at room temperature for 2 h.

[0153] The active ingredient provided by the adapter solution was adapter. In the adapter solution, the adapter had a concentration of 16.67 μM. The adapter consisted of a single-stranded DNA molecule adapter-1A and a single-stranded DNA molecule adapter-2A.

TABLE-US-00004
Adapter-1A (Sequence 5): 5′Phos-TCTGCTGAGTCGAGAACGTCT/3ddC/-3′.
Adapter-2A (Sequence 6): 5′-CTCGACTCAGCAG/3ddA/-3′.
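The volumes added in steps (1) and (2) of the adapter addition can be summed to estimate the final adapter concentration in the on-bead ligation. The sketch below is only a bookkeeping aid: the names are illustrative, and it assumes that the liquid retained by the beads after washing is negligible.

```python
# Volumes (μl) added to the beads in step 8, as stated in the text.
# Assumption: residual wash buffer on the beads is ignored.
LIGATION_COMPONENTS_UL = {
    "pre ligation buffer": 20.0,
    "pre ligation enzyme (SSB)": 4.0,
    "ligation buffer II": 48.0,
    "adapter solution": 18.0,
    "T4 DNA ligase": 10.0,
}
ADAPTER_STOCK_UM = 16.67      # adapter concentration in the adapter solution
LIGASE_STOCK_U_PER_UL = 600   # T4 DNA ligase stock

total_ul = sum(LIGATION_COMPONENTS_UL.values())
adapter_final_um = (
    LIGATION_COMPONENTS_UL["adapter solution"] * ADAPTER_STOCK_UM / total_ul
)
ligase_units = LIGATION_COMPONENTS_UL["T4 DNA ligase"] * LIGASE_STOCK_U_PER_UL

print(f"total volume: {total_ul:g} ul")
print(f"adapter final concentration: {adapter_final_um:.2f} uM")
print(f"T4 DNA ligase: {ligase_units:g} U")
```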

[0154] “3ddC” refers to a cytosine dideoxyribonucleotide at the 3′-end, and “3ddA” refers to an adenine dideoxyribonucleotide at the 3′-end.

9. PCR amplification

[0155] (1) After step 8 was completed, the centrifuge tube was added with 80 μl of 1X LSWB and placed on a magnet for 2 min until the liquid was clear, and the supernatant was discarded.

[0156] (2) After step (1) was completed, the centrifuge tube was added with 180 μl of 1X LSWB and placed on a magnet for 2 min until the liquid was clear, and the supernatant was discarded.

[0157] (3) After step (2) was completed, the centrifuge tube was added with 2.25 μl of PCR enzyme and 147.75 μl of PCR buffer, uniformly mixed and subjected to the PCR amplification.

[0158] PCR enzyme: PfuTurbo Cx Hotstart DNA polymerase, purchased from Agilent Technologies, Inc., Cat. No. 600414, with a concentration of 2.5 U/μl.

[0159] PCR buffer contained 5% dimethylsulfoxide (DMSO), 1 M betaine, 6 mM MgSO.sub.4, 0.6 mM deoxyribonucleoside triphosphate (dNTP), 0.5 μM PCR primer-F and 0.5 μM PCR primer-R.

TABLE-US-00005
PCR primer-F (Sequence 7): 5′-TGTGAGCCAAGGAGTTG-3′.
PCR primer-R (Sequence 8): 5′Phos-GAGACGTTCTCGACTCAGCAGA-3′.
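The relationship between PCR primer-R and the adapter can be checked directly from the listed sequences. The short sketch below, with illustrative helper names, shows that primer-R is the reverse complement of adapter-1A when the 3′-terminal dideoxycytidine is written as C. This is consistent with primer-R annealing to the ligated adapter-1A strand, although the text does not state this explicitly.

```python
# Sequences are copied from the tables in the text; the 3'-terminal
# dideoxynucleotide of adapter-1A (ddC) is written as a plain "C".
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


ADAPTER_1A = "TCTGCTGAGTCGAGAACGTCT" + "C"
PCR_PRIMER_R = "GAGACGTTCTCGACTCAGCAGA"

primer_matches_adapter = reverse_complement(ADAPTER_1A) == PCR_PRIMER_R
print(primer_matches_adapter)  # True
```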

[0160] Reaction parameters for the PCR amplification: heated lid at 105° C.; 98° C. for 3 min; nine cycles of 95° C. for 30 s, 58° C. for 30 s and 72° C. for 2 min; 72° C. for 10 min; and hold at 4° C.
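For scheduling purposes, the cycling program above can be written as a small data structure and its programmed duration summed. This is a minimal sketch with illustrative names; it counts only the programmed hold times and ignores ramping between temperatures and the final 4° C. hold.

```python
# Thermocycler program from the text: (temperature in °C, seconds).
INITIAL_DENATURATION = (98, 180)
CYCLE_STEPS = [(95, 30), (58, 30), (72, 120)]
CYCLES = 9
FINAL_EXTENSION = (72, 600)


def programmed_seconds(initial, cycle_steps, cycles, final):
    """Sum programmed hold times; ramp times are not included."""
    per_cycle = sum(seconds for _, seconds in cycle_steps)
    return initial[1] + cycles * per_cycle + final[1]


total_seconds = programmed_seconds(
    INITIAL_DENATURATION, CYCLE_STEPS, CYCLES, FINAL_EXTENSION
)
print(f"{total_seconds} s = {total_seconds / 60:g} min")
```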

[0161] (4) After step (3) was completed, the centrifuge tube was placed on a magnet for 2 min until the liquid was clear, and the supernatant was collected.

[0162] 10. Purification of the PCR product

[0163] The supernatant obtained in step 9 was taken and purified using DNA clean beads to obtain a product solution (the solvent was TE buffer), that is, a library solution.

[0164] The library solution was taken and quantified using a Qubit™ double-stranded DNA high-sensitivity fluorescence quantification kit, and the DNA concentration was ≥3 ng/μL.

[0165] 11. The library solution obtained in step 10 was taken and detected through electrophoresis.

[0166] The results are shown in FIG. 5. In FIG. 5, Marker is GeneRuler 1 kb Plus DNA Ladder, the lane 1 corresponds to a library solution obtained from the mixed sample 1, the lane 2 corresponds to a library solution obtained from the mixed sample 2, and the lane 3 corresponds to a library solution obtained from the mixed sample 3.

Example 3. An artificial sequence has higher interruption efficiency than a natural transposase recognition sequence

[0167] 1. Preparation of a barcoded transposase complex C

[0168] (1) Preparation of a barcoded transposase-loading fragment

[0169] The barcoded transposase-loading fragment was formed of a single-stranded nucleic acid molecule A1 and a single-stranded nucleic acid molecule C (a natural transposase recognition sequence).

[0170] The barcoded transposase-loading fragment was prepared by a specific method as follows: the single-stranded nucleic acid molecule A1 and the single-stranded nucleic acid molecule C (both at a concentration of 100 μM) were mixed in equal volumes and subjected to annealing to obtain a product solution. Annealing parameters: at 70° C. for 3 min; cooled to 20° C. (with a cooling rate of 0.1° C./s), at 20° C. for 30 min, and held at 4° C. The product solution contained the barcoded transposase-loading fragment at a concentration of 50 μM.
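The stated 50 μM product concentration follows from mixing two 100 μM strands in equal volumes. The sketch below reproduces that arithmetic and the duration of the cooling ramp; the function names are illustrative, and complete annealing of the two strands is assumed.

```python
def annealed_duplex_um(stock_a_um, stock_b_um):
    """Duplex concentration after equal-volume mixing.

    Each strand is diluted two-fold; the duplex is limited by the scarcer
    strand. Complete annealing is assumed.
    """
    return min(stock_a_um, stock_b_um) / 2


def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to ramp between two temperatures at a fixed rate."""
    return abs(start_c - end_c) / rate_c_per_s


duplex_um = annealed_duplex_um(100, 100)
cooling_s = ramp_seconds(70, 20, 0.1)
# 3 min at 70 °C, the cooling ramp, then 30 min at 20 °C
program_s = 3 * 60 + cooling_s + 30 * 60

print(f"duplex: {duplex_um:g} uM")
print(f"cooling ramp: {cooling_s:.0f} s, whole program: {program_s / 60:.1f} min")
```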

TABLE-US-00006
Single-stranded nucleic acid molecule A1 (Sequence 1): 5′Phos-CGATCCTTGGTGATCNNNNNNNNNN custom-character AGATGTGTATAAGAGACAG-3′.
Single-stranded nucleic acid molecule C (Sequence 9): 5′Phos-CTGTCTCTTATACACATCT-3′.

[0171] In the single-stranded nucleic acid molecule A1, 10 N underlined by the straight line constituted a sample barcode, where N represented any one of A, T, C and G. Each sample corresponded to a unique sample barcode for distinguishing a source of the sample.
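To illustrate the size of the barcode space, a 10-nt barcode drawn from A, T, C and G allows 4^10 distinct sequences. The sketch below assigns a unique random barcode to each sample. It is a simplified illustration with hypothetical sample names, not the barcode design used in the disclosure; practical designs usually also enforce a minimum distance between barcodes and avoid homopolymer runs.

```python
import random

BARCODE_LENGTH = 10
barcode_space = 4 ** BARCODE_LENGTH  # number of possible 10-nt barcodes


def assign_barcodes(sample_names, length=BARCODE_LENGTH, seed=0):
    """Assign each sample a distinct random barcode (illustrative only)."""
    rng = random.Random(seed)
    used = set()
    assignment = {}
    for name in sample_names:
        barcode = "".join(rng.choice("ACGT") for _ in range(length))
        while barcode in used:  # redraw on the rare collision
            barcode = "".join(rng.choice("ACGT") for _ in range(length))
        used.add(barcode)
        assignment[name] = barcode
    return assignment


# Hypothetical sample names, for illustration only.
barcodes = assign_barcodes(["sample_1", "sample_2", "sample_3"])
print(barcode_space)
for sample, barcode in barcodes.items():
    print(sample, barcode)
```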

[0172] The bold moiety of the single-stranded nucleic acid molecule A1 and the single-stranded nucleic acid molecule C formed a double-stranded structure (the double-stranded structure was a transposase recognition region), and the remaining moiety was a single-stranded structure. The moiety underlined by the squiggle of the single-stranded nucleic acid molecule A1 was a spacer region, and the italic moiety of the single-stranded nucleic acid molecule A1 was a capture recognition region.
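The double-stranded structure described above can be checked from the listed sequences: the 19 nucleotides at the 3′-end of molecule A1 are exactly the reverse complement of molecule C. The sketch below uses illustrative names and takes the 3′-terminal segment of A1 as it appears in the table.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# 3'-terminal segment of molecule A1 and full molecule C, from the table.
A1_3PRIME_SEGMENT = "AGATGTGTATAAGAGACAG"
MOLECULE_C = "CTGTCTCTTATACACATCT"

forms_duplex = reverse_complement(A1_3PRIME_SEGMENT) == MOLECULE_C
print(forms_duplex, len(MOLECULE_C))  # True 19
```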

[0173] (2) Preparation of the barcoded transposase complex C

[0174] 16.52 μl of Tn5 transposase (purchased from BGI, Cat. No. BGE005, with a concentration of 1 U/μl), 17.08 μl of coupling buffer (6.3±0.1 g glycerol dissolved in 5 ml TE buffer), 17.92 μl of TE buffer and 4.48 μl of the product solution obtained in step (1) were uniformly mixed on ice and incubated at 30° C. for 1 h to obtain a product solution C. The product solution C was stored at −20° C. until use. The product solution C contained the barcoded transposase complex C.
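The assembly volumes above sum to 56 μl, which places the loading fragment at about 4 μM in the product solution. The following sketch shows the arithmetic; the names are illustrative, and the 50 μM fragment concentration is the value stated for the annealed product solution of step (1).

```python
# Volumes (μl) from the text; names are illustrative.
ASSEMBLY_UL = {
    "Tn5 transposase (1 U/ul)": 16.52,
    "coupling buffer": 17.08,
    "TE buffer": 17.92,
    "loading fragment (50 uM)": 4.48,
}
FRAGMENT_STOCK_UM = 50.0
TN5_STOCK_U_PER_UL = 1.0

assembly_total_ul = sum(ASSEMBLY_UL.values())
fragment_final_um = (
    ASSEMBLY_UL["loading fragment (50 uM)"] * FRAGMENT_STOCK_UM / assembly_total_ul
)
tn5_units = ASSEMBLY_UL["Tn5 transposase (1 U/ul)"] * TN5_STOCK_U_PER_UL

print(f"total volume: {assembly_total_ul:.2f} ul")
print(f"loading fragment: {fragment_final_um:.2f} uM")
print(f"Tn5 transposase: {tn5_units:.2f} U")
```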

[0175] 2. Preparation of a barcoded transposase complex A

[0176] (1) Preparation of a barcoded transposase-loading fragment

[0177] The barcoded transposase-loading fragment was formed of the single-stranded nucleic acid molecule A1 and a single-stranded nucleic acid molecule A2.

[0178] The barcoded transposase-loading fragment was prepared by a specific method as follows: the single-stranded nucleic acid molecule A1 and the single-stranded nucleic acid molecule A2 (both at a concentration of 100 μM) were mixed in equal volumes and subjected to annealing to obtain a product solution. Annealing parameters: at 70° C. for 3 min; cooled to 20° C. (with a cooling rate of 0.1° C./s), at 20° C. for 30 min, and held at 4° C. The product solution contained the barcoded transposase-loading fragment at a concentration of 50 μM.

TABLE-US-00007
Single-stranded nucleic acid molecule A1 (Sequence 1): 5′Phos-CGATCCTTGGTGATCNNNNNNNNNN custom-character AGATGTGTATAAGAGACAG-3′.
Single-stranded nucleic acid molecule A2 (Sequence 2): 5′Phos-CTGUCTCUTATACACAUCT-3′.

[0179] In the single-stranded nucleic acid molecule A1, 10 N underlined by the straight line constituted a sample barcode, where N represented any one of A, T, C and G. Each sample corresponded to a unique sample barcode for distinguishing a source of the sample.

[0180] The bold moiety of the single-stranded nucleic acid molecule A1 and the single-stranded nucleic acid molecule A2 formed a double-stranded structure (the double-stranded structure was a transposase recognition region), and the remaining moiety was a single-stranded structure. The moiety underlined by the squiggle of the single-stranded nucleic acid molecule A1 was a spacer region, and the italic moiety of the single-stranded nucleic acid molecule A1 was a capture recognition region.
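Comparing the listed sequences shows how molecule A2 differs from the natural recognition sequence (molecule C of step 1): three of its T bases are replaced with U, and the sequence is otherwise identical. The sketch below, with illustrative names, locates these positions.

```python
# Sequences copied from the tables in the text.
MOLECULE_A2 = "CTGUCTCUTATACACAUCT"
MOLECULE_C = "CTGTCTCTTATACACATCT"

# 1-based positions (from the 5'-end) at which A2 carries U.
u_positions = [i + 1 for i, base in enumerate(MOLECULE_A2) if base == "U"]

# Replacing every U with T recovers molecule C if the two strands
# differ only by the T-to-U substitutions.
same_apart_from_u = MOLECULE_A2.replace("U", "T") == MOLECULE_C
t_count_in_c = MOLECULE_C.count("T")

print(u_positions)          # [4, 8, 17]
print(same_apart_from_u)    # True
print(f"{len(u_positions)} of {t_count_in_c} T positions carry U")
```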

[0181] (2) Preparation of the barcoded transposase complex A

[0182] 16.52 μl of Tn5 transposase (purchased from BGI, Cat. No. BGE005, with a concentration of 1 U/μl), 17.08 μl of coupling buffer (6.3±0.1 g glycerol dissolved in 5 ml TE buffer), 17.92 μl of TE buffer and 4.48 μl of the product solution obtained in step (1) were uniformly mixed on ice and incubated at 30° C. for 1 h to obtain a product solution A. The product solution A was stored at −20° C. until use. The product solution A contained the barcoded transposase complex A.

[0183] 3. Fragmentation and barcoding of high-molecular-weight DNA

[0184] The high-molecular-weight DNA was: NA12878 (CORIELL, Cat. No. NA12878).

[0185] 10 ng of high-molecular-weight DNA was taken and added to a 0.2 ml centrifuge tube, and nuclease-free water was added to 38 μl. Then, 10 μl of 5×tagmentation buffer (purchased from BGI, Cat. No. BGE005B01) and 2 μl of 16-fold diluent (prepared by diluting the product solution C obtained in step 1 or the product solution A obtained in step 2 to 16-fold volume with TE buffer, which was performed on ice) were added, uniformly mixed and incubated at 55° C. for 10 min to obtain a product solution. The 0.2 ml centrifuge tube containing the product solution was transferred to ice. The product solution contained barcoded DNA fragments.
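The tagmentation reaction described above has a total volume of 50 μl, which brings the 5× buffer to 1× and the DNA to 0.2 ng/μl. The sketch below reproduces this setup and the volume of TE buffer needed for a 16-fold dilution; the function names and the 1 μl stock volume in the usage line are illustrative assumptions.

```python
def tagmentation_setup(dna_ng=10.0, dna_plus_water_ul=38.0,
                       buffer_ul=10.0, buffer_fold=5.0, complex_ul=2.0):
    """Summarize the tagmentation reaction using the volumes in the text."""
    total_ul = dna_plus_water_ul + buffer_ul + complex_ul
    return {
        "total_ul": total_ul,
        "buffer_final_fold": buffer_ul * buffer_fold / total_ul,
        "dna_ng_per_ul": dna_ng / total_ul,
    }


def diluent_volume_ul(stock_ul, fold):
    """Volume of diluent that brings stock_ul to a `fold`-fold final volume."""
    return stock_ul * (fold - 1)


setup = tagmentation_setup()
te_ul = diluent_volume_ul(1.0, 16)  # hypothetical 1 μl of product solution

print(setup)
print(f"1 ul of complex + {te_ul:g} ul of TE buffer gives a 16-fold dilution")
```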

[0186] 3. Release of the transposase through adding a denaturing agent

[0187] (1) After step 2 was completed, the centrifuge tube was added with 5 μl of 1% SDS aqueous solution, covered with a tube cap, shaken, uniformly mixed and incubated on a vertical mixer at room temperature for 10 min.

[0188] (2) After step (1) was completed, the centrifuge tube was briefly centrifuged, 67 μl of DNA clean beads were added for purification, and the purified product was eluted in 20 μl of TE buffer.

[0189] 4. Addition of an adapter

[0190] (1) After step 3 was completed, a new centrifuge tube was taken, and 5 μl of the product solution obtained in step 3, 25 μl of ligation buffer II (containing 150 mM Tris-HCl with a pH of 7.8, 3 mM ATP, 1.5 mM DTT, 0.15 mM BSA, 30 mM MgCl.sub.2 and 30% PEG8000, and the balance was water), 1.5 μl of an adapter solution, 1 μl of T4 DNA ligase (BGI, Cat. No. 01E004MM, with a concentration of 600 U/μl) and 18.5 μl of water were added. The mixture was vortexed to be uniformly mixed and incubated at room temperature for 1 h.

[0191] The active ingredient provided by the adapter solution was adapter. In the adapter solution, the adapter had a concentration of 16.67 μM. The adapter consisted of a single-stranded DNA molecule adapter-1A and a single-stranded DNA molecule adapter-2A.

TABLE-US-00008
Adapter-1A (Sequence 5): 5′Phos-TCTGCTGAGTCGAGAACGTCT/3ddC/-3′.
Adapter-2A (Sequence 6): 5′-CTCGACTCAGCAG/3ddA/-3′.

[0192] “3ddC” refers to a cytosine dideoxyribonucleotide at the 3′-end, and “3ddA” refers to an adenine dideoxyribonucleotide at the 3′-end.
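The pairing between the two adapter strands can be checked from the listed sequences. In the sketch below the dideoxynucleotides are written as ordinary bases (an assumption made for the comparison only). Adapter-2A, including its 3′ ddA, is the reverse complement of the first 14 nucleotides of adapter-1A, leaving an 8-nt single-stranded 3′ tail on adapter-1A.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Dideoxy termini (ddC, ddA) are written as plain C and A for the comparison.
ADAPTER_1A = "TCTGCTGAGTCGAGAACGTCT" + "C"
ADAPTER_2A = "CTCGACTCAGCAG" + "A"

paired_region = reverse_complement(ADAPTER_2A)
pairs_at_5prime_end = ADAPTER_1A.startswith(paired_region)
single_stranded_tail = ADAPTER_1A[len(paired_region):]

print(pairs_at_5prime_end)                               # True
print(len(paired_region), "bp duplex; tail:", single_stranded_tail)
```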

[0193] (2) After step (1) was completed, 60 μl of DNA clean beads were added for purification, and the purified product was eluted in 20 μl of TE buffer.

5. PCR amplification

[0194] (1) The product solution obtained in step 4 was added with 1 μl of PCR enzyme and 25 μl of PCR buffer 2, uniformly mixed and subjected to the PCR amplification.

[0195] PCR enzyme: PfuTurbo Cx Hotstart DNA polymerase, purchased from Agilent Technologies, Inc., Cat. No. 600414, with a concentration of 2.5 U/μl.

[0196] PCR buffer 2 contained 10% DMSO, 2 M betaine, 12 mM MgSO.sub.4, 1.2 mM dNTP, 1 μM PCR primer 2-F and 1 μM PCR primer-R.

TABLE-US-00009
PCR primer 2-F (Sequence 10): 5′-TTGTCTTCCTAAGATGTGTATAAGAGACAG-3′.
PCR primer-R (Sequence 8): 5′-GAGACGTTCTCGACTCAGCAGA-3′.
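A comparison of the listed sequences shows that the 3′-end of PCR primer 2-F is identical to the 19-nt 3′-terminal segment of molecule A1, preceded by an 11-nt 5′ extension. The sketch below, with illustrative names, makes that comparison; it does not assign a function to the 5′ extension, because the text does not describe one.

```python
# Sequences copied from the tables in the text.
PCR_PRIMER_2F = "TTGTCTTCCTAAGATGTGTATAAGAGACAG"
A1_3PRIME_SEGMENT = "AGATGTGTATAAGAGACAG"

# The primer ends with the same 19 nt that terminate molecule A1.
shares_3prime_end = PCR_PRIMER_2F.endswith(A1_3PRIME_SEGMENT)
five_prime_extension = PCR_PRIMER_2F[: len(PCR_PRIMER_2F) - len(A1_3PRIME_SEGMENT)]

print(shares_3prime_end)                              # True
print(five_prime_extension, len(five_prime_extension))
```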

[0197] Reaction parameters for the PCR amplification: heated lid at 105° C.; 98° C. for 3 min; eleven cycles of 95° C. for 30 s, 58° C. for 30 s and 72° C. for 2 min; 72° C. for 10 min; and hold at 4° C.

[0198] 6. Purification of the PCR product

[0199] The product obtained in step 5 was taken and purified using DNA clean beads to obtain 20 μl product solution (the solvent was TE buffer).

[0200] The product solution obtained in step 6 was quantified using a Qubit™ double-stranded DNA high-sensitivity fluorescence quantification kit, and the PCR yield was calculated after the quantification. The results are shown in FIG. 6. In FIG. 6, 1 and 2 correspond to the product solution C obtained in step 1 (two replicates), and 3 and 4 correspond to the product solution A obtained in step 2 (two replicates).
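The text does not spell out the yield calculation. One plausible reading, sketched below, is that the yield is the measured concentration multiplied by the 20 μl elution volume from step 6. The readings in the example are placeholders chosen purely for illustration; they are not the values plotted in FIG. 6.

```python
ELUTION_VOLUME_UL = 20.0  # volume of the purified product solution (step 6)


def pcr_yield_ng(concentration_ng_per_ul, volume_ul=ELUTION_VOLUME_UL):
    """Total DNA mass (ng) from a concentration reading and elution volume."""
    return concentration_ng_per_ul * volume_ul


# Placeholder readings in ng/μl, NOT data from the disclosure.
placeholder_readings = {
    "complex C, replicate 1": 1.0,
    "complex A, replicate 1": 2.0,
}

yields_ng = {name: pcr_yield_ng(c) for name, c in placeholder_readings.items()}
for name, mass in yields_ng.items():
    print(f"{name}: {mass:g} ng")
```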

[0201] The product solution obtained in step 6 was analyzed by electrophoresis. The results are shown in FIG. 7. In FIG. 7, Marker is GeneRuler 1 kb Plus DNA Ladder, lanes 1 and 2 correspond to the product solution C obtained in step 1 (two replicates), and lanes 3 and 4 correspond to the product solution A obtained in step 2 (two replicates).

INDUSTRIAL APPLICATION

[0202] The present disclosure has the following advantages: (1) the present disclosure provides a stLFR-based multi-sample mixed library construction technology, which successfully solves the problems of mixed library construction and sequencing of large samples; (2) the present disclosure may significantly reduce the complexity of library construction, improve the throughput of library construction, improve the utilization rate of a sequencing instrument and reduce the costs of library construction and sequencing for a single sample; (3) the present disclosure is applicable to resequencing and de novo assembly of samples with a small genome and samples with a requirement for a specific amount of data; (4) the present disclosure may further reduce the starting amount of a single sample to less than 1.5 ng, which is applicable to resequencing and de novo assembly of rare samples and samples with very low biomass; and (5) high-throughput automated library construction can be readily achieved.

CPC classification: C12N15/1065; C12N9/00; C40B70/00; C12N15/1093; C40B20/04; C12N15/66; C12N15/10

International classification: C12N15/10