Sequencing library, and preparation and use thereof

11702690 · 2023-07-18

Abstract

The present invention discloses a sequencing library comprising a nucleotide sequence. The sequence comprises a linker sequence and two target sequences: the two ends of the linker sequence are respectively linked to the target sequences, and the two target sequences are direct repeat sequences. The present invention further discloses the preparation and use of the sequencing library. The present invention overcomes the high error rate of current DNA sequencing technologies with very low coverage bias, and can be used to detect low-frequency mutations in different kinds of samples.

Claims

1. A method for preparing a polynucleotide for sequencing, the polynucleotide comprising a linker and exactly two target sequences, wherein the two target sequences are respectively linked to the two ends of the linker, the nucleotide sequence of each of the target sequences is different from the nucleotide sequence of the linker, and the two target sequences are direct repeat sequences, the method comprising: obtaining a double-stranded circular polynucleotide having nicks or gaps in both strands; and subjecting the double-stranded circular polynucleotide with nicks or gaps in both strands to strand displacement amplification, thereby forming the polynucleotide for sequencing.

2. The method of claim 1, wherein a reverse complementary region exists in the linker.

3. The method of claim 1, wherein the end of at least one of the target sequences opposite the end linked to the linker is further linked to an additional sequence which is not a target sequence, and at least part of the additional sequence is the same as part of the linker.

4. The method of claim 1, wherein a length of the target sequence is less than a sequencing read length of a DNA sequencing machine.

5. The method of claim 3, wherein a sum of lengths of the additional sequence and the target sequences is less than a sequencing read length of a DNA sequencing machine.

6. The method of claim 1, wherein a length of the target sequence is less than a sequencing read length of a DNA sequencing machine.

7. A method for preparing a polynucleotide for sequencing, the polynucleotide consisting of a linker and exactly two target sequences, wherein the two target sequences are respectively linked to the two ends of the linker, the nucleotide sequence of each of the target sequences is different from the nucleotide sequence of the linker, and the two target sequences are direct repeat sequences, the method comprising: obtaining a double-stranded circular polynucleotide having nicks or gaps in both strands; and subjecting the double-stranded circular polynucleotide with nicks or gaps in both strands to strand displacement amplification, thereby forming the polynucleotide for sequencing.

8. A method for preparing a sequencing library, comprising the steps of: obtaining a double-stranded circular polynucleotide having nicks or gaps in both strands; subjecting the double-stranded circular polynucleotide with nicks or gaps in both strands to strand displacement amplification, thereby forming a polynucleotide; and subjecting the polynucleotide to standard high-throughput sequencing library preparation; wherein the polynucleotide comprises a linker and two target sequences, the two ends of the linker are respectively linked to the target sequences, and the two target sequences are direct repeat sequences.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a flow chart of a process for constructing a sequencing library according to the present invention (using a primer without a nicking base). A DNA macromolecule is fragmented and ligated to an adaptor containing a nicking base (such as dUTP, 8-oxo-dGTP, or a nicking endonuclease recognition site), followed by single-stranded circularization. A complementary strand of the circularized DNA molecule is synthesized using an ordinary primer without a nicking base, and a gap is generated by nicking (the nicking mode being selected according to the nicking base in the adaptor). Strand displacement is then performed to obtain the sequence consisting of the linker and two copies of the target sequence. A standard high-throughput sequencing library is constructed from the double-stranded DNA obtained after strand displacement and sequenced, and the data are analyzed.

(2) FIG. 2 is a flow chart of a process for constructing a sequencing library according to the present invention (using a primer with a nicking base). A DNA macromolecule is fragmented and ligated to an adaptor containing a nicking base (such as dUTP, 8-oxo-dGTP, or a nicking endonuclease recognition site), followed by single-stranded circularization. A complementary strand of the circularized DNA molecule is synthesized using a primer with a nicking base, and a nick or gap is generated by nicking (the nicking mode being selected according to the nicking base in the adaptor). Strand displacement is then performed to obtain the sequence consisting of the linker and two copies of the target sequence. A standard high-throughput sequencing library is constructed from the double-stranded DNA obtained after strand displacement and sequenced, and the data are analyzed.

(3) FIG. 3 is a flow chart of a process for constructing a sequencing library according to the present invention (using a primer with a nicking base). A DNA macromolecule is fragmented and ligated to an adaptor containing a nicking base (such as dUTP, 8-oxo-dGTP, or a nicking endonuclease recognition site), followed by double-stranded circularization. In the circularized DNA molecule, a gap is generated by nicking (the nicking mode being selected according to the nicking base in the adaptor), and strand displacement synthesis is performed to obtain the sequence consisting of the linker and two copies of the target sequence. A standard high-throughput sequencing library is constructed from the double-stranded DNA obtained after strand displacement and sequenced, and the data are analyzed.

(4) FIG. 4 shows use of the method for screening genes of interest: a complementary strand of a circularized DNA molecule is synthesized using one or more primers matching the gene(s) of interest, nicking is then performed, and a sequencing library is constructed after strand displacement synthesis, thereby effectively enriching and sequencing the genes of interest.

DETAILED DESCRIPTION

(5) The implementations of the present invention are described in detail below with reference to the embodiments. However, those skilled in the art will understand that the following embodiments are provided solely to illustrate the present invention and are not intended to limit its scope. Where no specific conditions are given in the embodiments, conventional conditions or the conditions recommended by the manufacturer were followed. Reagents or instruments without a specified manufacturer are conventional commercially available products.

(6) One of the innovative points of the present invention is as follows. A short DNA molecule is ligated to an adaptor sequence and circularized (single-stranded or double-stranded), and nicking then yields a double-stranded circular DNA molecule with two, three or more nicks/gaps. This molecule is amplified with a strand-displacing polymerase to obtain a sequence consisting of two target sequences, sharing at least a partially identical region, joined by one linker; a sequencing library is then constructed from this product and sequenced. Specifically, the present invention can be implemented through at least the following schemes.

(7) Scheme 1 (Double-Gap Scheme by Single-Stranded Circularization):

(8) The DNA is first randomly fragmented into fragments shorter than half the read length of a next-generation sequencing machine (preferably, the fragment length plus the length of the 5′ adaptor sequence is less than half the read length), and the adaptor sequence is then ligated; the adaptor sequence contains a nicking base (e.g., dUTP, 8-oxo-dGTP, or a nicking endonuclease recognition site). The DNA is denatured at a high temperature and immediately cooled to obtain single strands. The single-stranded DNA containing the adaptor sequence is circularized using a single-stranded DNA ligase. A complementary strand of the circularized DNA molecule is synthesized using an ordinary primer without a nicking base, a nick/gap is generated by nicking (the nicking mode being selected according to the nicking base in the adaptor), and strand displacement synthesis is performed. A standard high-throughput sequencing library is constructed from the double-stranded DNA obtained after strand displacement and sequenced, and the data are analyzed.
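The preferred length rule above reduces to simple integer arithmetic. The following Python sketch, with hypothetical function names, computes the largest fragment length that keeps the fragment plus the 5′ adaptor strictly below half the read length. The 250-nt read length and 35-nt adaptor in the usage lines are illustrative assumptions only (35 nt is the length of UO-adaptor1 in Example 1, used here merely as a stand-in); the text does not fix which read length or adaptor length applies.

```python
def max_fragment_length(read_length: int, adaptor_length: int) -> int:
    """Largest integer fragment length (bp) such that
    fragment + 5' adaptor is strictly less than half the read length."""
    # (read_length - 1) // 2 is the largest integer strictly below read_length / 2
    return max((read_length - 1) // 2 - adaptor_length, 0)


def fits_half_read(fragment_length: int, adaptor_length: int, read_length: int) -> bool:
    """True if fragment + adaptor < read_length / 2 (compared without division)."""
    return 2 * (fragment_length + adaptor_length) < read_length


if __name__ == "__main__":
    # Illustrative values only: 250-nt read length, 35-nt adaptor (assumptions)
    print(max_fragment_length(250, 35))
    print(fits_half_read(89, 35, 250), fits_half_read(90, 35, 250))
```

Comparing `2 * (fragment + adaptor)` with the read length avoids fractional halves when the read length is odd.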

(9) Scheme 2 (Triple-Gap and Multi-Gap Scheme by Single-Stranded Circularization):

(10) The DNA is first randomly fragmented into fragments shorter than half the read length of a next-generation sequencing machine (preferably, the fragment length plus the length of the 5′ adaptor sequence is less than half the read length), and the adaptor sequence is then ligated; the adaptor sequence contains nicking bases (e.g., dUTP, 8-oxo-dGTP, or nicking endonuclease recognition sites; the number of nicking bases is not limited). The DNA is denatured at a high temperature and immediately cooled to obtain single strands. The single-stranded DNA containing the adaptor sequence is circularized using a single-stranded DNA ligase. A complementary strand of the circularized DNA molecule is synthesized using a primer containing nicking bases (e.g., dUTP, 8-oxo-dGTP, or nicking endonuclease recognition sites; the number of nicking bases is not limited), a nick/gap is generated by nicking (the nicking mode being selected according to the nicking base in the adaptor), and strand displacement synthesis is performed. A standard high-throughput sequencing library is constructed from the double-stranded DNA obtained after strand displacement and sequenced, and the data are analyzed.

(11) Scheme 3 (Double-Stranded Circularization):

(12) The DNA is first randomly fragmented into fragments shorter than half the read length of a next-generation sequencing machine (preferably, the fragment length plus the length of the 5′ adaptor sequence is less than half the read length), and the adaptor sequence is then ligated. Either the adaptor sequence contains a nicking base (e.g., dUTP, 8-oxo-dGTP, or a nicking endonuclease recognition site), or the DNA molecule or the adaptor sequence is dephosphorylated, so that the dephosphorylated ends remain unligated during circularization. Double-stranded circularization is performed using a DNA ligase. In the circularized DNA molecule, a gap is generated by nicking (the nicking mode being selected according to the nicking base in the adaptor; if the adaptor already contains a gap or has been dephosphorylated, nicking is omitted), and strand displacement synthesis is performed. A standard high-throughput sequencing library is constructed from the double-stranded DNA obtained after strand displacement and sequenced, and the data are analyzed.

EXAMPLE 1

Construction of a Whole Genomic DNA Library According to Scheme 1 (Double-Gap Scheme) (Illumina Platform)

(13) 1) DNA Fragmentation

(14) Instruments and reagents:

(15) Ultrasonicator: Covaris S2 Focused-ultrasonicator

(16) Shearing tube: Covaris Microtube 6×16 mm, Catalog #: 520045

(17) QIAGEN MinElute Gel Extraction Kit (250), Catalog #: 28606

(18) Takara 20 bp DNA Ladder (Dye Plus), Takara Code: 3420A

(19) 5 μg of purified PhiX 174 genomic DNA in a 50 μl shearing volume was sheared into fragments of 150-200 bp using the ultrasonicator (Covaris S2 Focused-ultrasonicator; intensity: 5, duty cycle: 10%, cycles per burst: 200, temperature: 4° C., time: 60 s, number of cycles: 5).

(20) After 4% agarose gel electrophoresis (80 V, 70 min, 1× TAE), fragments of 60-90 bp (sized against the Takara 20 bp DNA Ladder) were recovered by gel extraction with the QIAGEN MinElute Gel Extraction Kit according to the kit instructions.

(21) 2) End Repair A-Tailing

(22) Reagents: New England Biolabs: NEBNext® Ultra™ DNA Library Prep Kit for Illumina®, Catalog #: E7370S

(23) DNA fragment: 55.5 μl

(24) End Prep Enzyme Mix: 3 μl

(25) End Repair Reaction Buffer (10×): 6.5 μl

(26) In total: 65 μl

(27) 30 min at 20° C., and 30 min at 65° C.

(28) 3) Ligation of the Adaptor Sequence

(29) Reagents: New England Biolabs: NEBNext® Ultra™ DNA Library Prep Kit for Illumina®, Catalog #: E7370S

(30) Blunted DNA: 65 μl

(31) Blunt/TA Ligase Master Mix: 15 μl

(32) Ligation Enhancer: 1 μl

(33) Adaptor sequence: UO-A (50 pmol): 2 μl

(34) In total: 83 μl

(35) 30 min at 20° C., 5 min at 65° C., then immediately placed on ice for 3 min.

(36) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).
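Each recipe in these examples lists component volumes and a stated total, so a consistency check is easy to script. The Python sketch below transcribes the end repair and adaptor ligation recipes of this example and sums them; the dictionary and function names are hypothetical and not part of any kit software.

```python
# Component volumes (μl) transcribed from the end repair and ligation recipes above
END_REPAIR = {
    "DNA fragment": 55.5,
    "End Prep Enzyme Mix": 3.0,
    "End Repair Reaction Buffer (10x)": 6.5,
}
ADAPTOR_LIGATION = {
    "Blunted DNA": 65.0,
    "Blunt/TA Ligase Master Mix": 15.0,
    "Ligation Enhancer": 1.0,
    "Adaptor UO-A": 2.0,
}


def reaction_total(components: dict[str, float]) -> float:
    """Sum of component volumes, rounded to 0.01 μl to absorb float noise."""
    return round(sum(components.values()), 2)


def volumes_balance(components: dict[str, float], stated_total: float) -> bool:
    """True if the components add up to the stated reaction volume."""
    return abs(reaction_total(components) - stated_total) < 1e-9


if __name__ == "__main__":
    for name, recipe, total in [("End repair", END_REPAIR, 65.0),
                                ("Adaptor ligation", ADAPTOR_LIGATION, 83.0)]:
        print(name, reaction_total(recipe), volumes_balance(recipe, total))
```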

(37) Adaptor sequences: UO-A was formed by mixing equal volumes of 100 pmol UO-adaptor1 and 100 pmol UO-adaptor2 (each dissolved in annealing buffer: 10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 mM NaCl) and annealing them (5 min at 94° C., then gradual cooling to 25° C. at 0.1° C. per second).
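Assuming a linear ramp at the stated rate, and excluding the 5-min hold at 94° C., the cooling phase of the annealing program lasts:

```latex
t_{\mathrm{ramp}} = \frac{94\,^{\circ}\mathrm{C} - 25\,^{\circ}\mathrm{C}}{0.1\,^{\circ}\mathrm{C\,s^{-1}}} = 690\ \mathrm{s} = 11.5\ \mathrm{min}
```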

(38) UO-adaptor1 (SEQ ID NO: 1): 5′-pGATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT-3′
UO-adaptor2 (SEQ ID NO: 2): 5′-pGTGGGCAGTCGGTGAACGACTGAUCT-3′

(39) Note: the adaptor sequences include, but are not limited to, UO-adaptor1 and UO-adaptor2 in this example. The same applies below.
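As a rough check of how UO-adaptor1 and UO-adaptor2 pair, the Python sketch below finds the longest perfectly complementary stretch between two oligos, reading U as T. It is a string-matching sketch only: it assumes Watson-Crick pairing without mismatches and ignores duplex stability. The function names are hypothetical.

```python
# Complement table; uracil (U) pairs with A, just like thymine
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' oligo (U is read as T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_duplex(top: str, bottom: str) -> tuple[int, int, int]:
    """Longest perfectly complementary stretch between two oligos (both 5'->3').

    Returns (length, start in top, start in revcomp(bottom)), found as the
    longest common substring of top and revcomp(bottom) by dynamic programming.
    """
    top = top.upper().replace("U", "T")
    rc = revcomp(bottom)
    best = (0, 0, 0)
    prev = [0] * (len(rc) + 1)
    for i in range(1, len(top) + 1):
        cur = [0] * (len(rc) + 1)
        for j in range(1, len(rc) + 1):
            if top[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run of matches
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


# 5'-phosphates ("p") omitted; sequences as listed for UO-A
UO_ADAPTOR1 = "GATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT"
UO_ADAPTOR2 = "GTGGGCAGTCGGTGAACGACTGAUCT"

if __name__ == "__main__":
    length, start_top, start_rc = longest_duplex(UO_ADAPTOR1, UO_ADAPTOR2)
    print(length, UO_ADAPTOR1[start_top:start_top + length], start_rc)
```

For UO-A the sketch reports a 10-bp duplex formed by the first 10 nt of UO-adaptor1 (GATCAGTCGT). The 3′-terminal T of UO-adaptor2 is left unpaired as a single-base overhang, and the remaining nucleotides of both oligos stay single-stranded. This partial pairing is consistent with the kit description at the end of this section, which calls the annealed pair a Y-shaped structure.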

(40) 4) Single-Stranded Circularization

(41) New England Biolabs: Exonuclease I (E. coli), Catalog #: M0293

(42) New England Biolabs: Exonuclease III (E. coli), Catalog #: M0206

(43) Epicentre: CircLigase II ssDNA Ligase, Catalog #: CL9025K

(44) DNA: 24 μl

(45) 3 min at 95° C., then immediately placed on ice for 3 min

(46) 10× circligase buffer: 6 μl

(47) 10 mmol MnCl2: 1.5 μl

(48) CircLigase II (100 U/μl): 1.5 μl

(49) 2 h at 60° C., and 10 min at 80° C.

(50) Digestion of linear and dimeric DNA:

(51) Exonuclease I (E. coli): 1 μl

(52) Exonuclease III (E. coli): 1 μl

(53) 1 h at 37° C.

(54) The product was purified with MinElute Reaction Cleanup Kit.

(55) 5) Complementary Strand Synthesis

(56) New England Biolabs: Klenow Fragment (3′→5′ exo-), Catalog #: M0212S

(57) New England Biolabs: USER™ Enzyme, Catalog #: M5505S

(58) NEB buffer 4: 2 μl

(59) Primer (UO-p1, 10 μM): 1 μl

(60) DNA: 15.8 μl

(61) 3 min at 95° C., then immediately placed on ice for 3 min.

(62) After that, the following was added:

(63) 2.5 mM dNTP: 0.5 μl

(64) 100× BSA: 0.2 μl

(65) Klenow Fragment (3′→5′ exo-): 1 μl

(66) In total: 20 μl

(67) 30 min at 20° C. and 20 min at 75° C.

(68) USER™ Enzyme: 1 μl

(69) 30 min at 37° C.

(70) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(71) UO-p1 (SEQ ID NO: 3): 5′-AGCACGTACGACTGATCT-3′
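UO-p1 is the reverse complement of the first 17 nt of UO-adaptor1 plus one further base, so the primer's 3′-terminal T would pair with the base just 5′ of the adaptor. The Python sketch below checks this by string matching. The extra A placed before the adaptor in the test template is an assumption standing for that base in the circularized strand (plausibly the A-tail of the insert, which the text does not state). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")  # U pairs with A, like T


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' oligo."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_site(primer: str, template: str) -> int:
    """0-based start in `template` (5'->3') of the stretch the primer pairs with,
    or -1 if the full-length perfect complement is absent."""
    return template.upper().replace("U", "T").find(revcomp(primer))


UO_P1 = "AGCACGTACGACTGATCT"
UO_ADAPTOR1 = "GATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT"  # 5'-phosphate omitted

if __name__ == "__main__":
    # Hypothetical template: one assumed A followed by UO-adaptor1
    print(binding_site(UO_P1, "A" + UO_ADAPTOR1))
    # Without that base, the full 18-nt complement is not found
    print(binding_site(UO_P1, UO_ADAPTOR1))
```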

(72) 6) Strand Displacement Synthesis

(73) New England Biolabs: Bst 2.0 WarmStart® DNA Polymerase, Catalog #: M0538S

(74) DNA: 16.5 μl

(75) Isothermal Amplification Buffer: 2 μl

(76) 2.5 mM dNTP: 0.5 μl

(77) Bst 2.0 WarmStart® DNA Polymerase: 0.5 μl

(78) 30 min at 60° C.

(79) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(80) 7) Construction of the Illumina Library from the Strand Displacement Product

(81) Commercial kits for constructing standard Illumina libraries, such as the TruSeq DNA Sample Preparation Kits, were used. The construction specifically included the following steps:

(82) (1) End repair A-tailing (the same as the “end repair A-tailing” section above)

(83) (2) Ligation of adaptor sequence for sequencing

(84) Blunted DNA: 65 μl

(85) Blunt/TA Ligase Master Mix: 15 μl

(86) Ligation Enhancer: 1 μl

(87) NEXTflex™ DNA Barcodes (Bioo Scientific Corporation, Catalog #: 514101): 0.5 μl; in total: 83 μl

(88) 30 min at 20° C.

(89) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(90) (3) PCR amplification

(91) DNA: 24 μl

(92) NEXTflex™ Primer Mix (Bioo Scientific Corporation, Catalog #: 514101): 1 μl

(93) KAPA HiFi HotStart ReadyMix (Kapa Biosystems, Catalog #: KK2601): 25 μl

(94) In total: 50 μl

(95) Cycling conditions for PCR amplification:

(96) Initial denaturation at 98° C. for 45 s; 13 cycles of 98° C. for 15 s, 65° C. for 30 s and 72° C. for 60 s; final extension at 72° C. for 4 min; hold at 4° C.

(97) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(98) After 2% agarose gel electrophoresis and gel extraction (QIAGEN MinElute Gel Extraction Kit), fragments of 300-500 bp were recovered.

(99) The eluted DNA is the constructed library, which can be sequenced on a next-generation sequencing platform.

EXAMPLE 2

Construction of a Whole Genomic DNA Library According to Scheme 2 (with the Triple-Gap Scheme as an Example)

(100) (1) The DNA fragmentation, end repair A-tailing, adaptor ligation and single-stranded circularization steps were the same as those in Example 1.

(101) (2) Complementary strand synthesis

(102) New England Biolabs: Klenow Fragment (3′→5′ exo-), Catalog #: M0212S

(103) New England Biolabs: USER™ Enzyme, Catalog #: M5505S

(104) NEB buffer 4: 2 μl

(105) Primer (UO-p1-2, 10 μM): 1 μl

(106) DNA: 15.8 μl

(107) 3 min at 95° C., then immediately placed on ice for 3 min.

(108) After that, the following was added:

(109) 2.5 mM dNTP: 0.5 μl

(110) 100× BSA: 0.2 μl

(111) Klenow Fragment (3′→5′ exo-): 1 μl

(112) In total: 20 μl

(113) 30 min at 20° C., 20 min at 75° C.

(114) USER™ Enzyme: 1 μl

(115) 30 min at 37° C., 5 min at 50° C., then immediately placed on ice.

(116) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(117) UO-p1-2 (SEQ ID NO: 4): 5′-AGCACGTACGACTGAUCT-3′
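UO-p1-2 differs from UO-p1 only in carrying U instead of T at the 16th nucleotide. USER™ Enzyme is a mixture of uracil DNA glycosylase and endonuclease VIII that excises uracil and leaves a single-nucleotide gap. The Python sketch below locates uracils and splits an oligo into the pieces flanking each gap. It models only where the strand break falls, not whether short flanking pieces (such as the 2-nt 3′ remnant) dissociate, and its function names are hypothetical.

```python
def uracil_positions(seq: str) -> list[int]:
    """0-based positions of uracil (U) in an oligo written 5'->3'."""
    return [i for i, base in enumerate(seq.upper()) if base == "U"]


def user_excision(seq: str) -> list[str]:
    """Model USER cleavage: each U is removed, leaving a 1-nt gap, and the
    strand is split into the pieces flanking the gaps (empty pieces dropped)."""
    pieces, start = [], 0
    for i in uracil_positions(seq):
        pieces.append(seq[start:i])
        start = i + 1  # skip the excised uracil
    pieces.append(seq[start:])
    return [p for p in pieces if p]


UO_P1 = "AGCACGTACGACTGATCT"
UO_P1_2 = "AGCACGTACGACTGAUCT"
UO_ADAPTOR2 = "GTGGGCAGTCGGTGAACGACTGAUCT"

if __name__ == "__main__":
    print(uracil_positions(UO_P1_2), user_excision(UO_P1_2))
    print(uracil_positions(UO_ADAPTOR2), user_excision(UO_ADAPTOR2))
```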

(118) The product can then be used to construct next-generation and third-generation sequencing libraries.

EXAMPLE 3

Construction of a Whole Genomic DNA Library According to Scheme 3 (Double-Stranded Circularization, with an Adaptor Containing a Nicking Site)

(119) (1) DNA was fragmented to about 700 bp (fragmentation conditions: duty cycle: 5%, intensity: 3, cycles per burst: 200, time: 75 s); end repair A-tailing and adaptor ligation were the same as in Example 1, except that the adaptor was UO-A2, formed by annealing the following two sequences:

(120) (SEQ ID NO: 5) 5′-AGCACGTACGACTGAUCT-3′
(SEQ ID NO: 6) 5′-pGATCAGTCGTACGTGCT-3′

(121) (2) End phosphorylation

(122) 44 μl DNA, 10 U T4 PNK (T4 Polynucleotide Kinase, NEB, M0201S), 50 mM Tris-HCl pH 7.5, 10 mM MgCl2, 1 mM ATP, 10 mM DTT; 30 min at 37° C.; the product was purified with 1× AMPure XP magnetic beads.

(123) (3) Double-stranded circularization

(124) NEBNext® Quick Ligation Module (NEB, E6056S)

(125) DNA: 35 μl

(126) T4 quick ligase: 5 μl

(127) 5× ligase buffer: 10 μl

(128) 30 min at 20° C.

(129) The product was purified with 1× Ampure XP magnetic beads.

(130) (4) Enzymatic digestion

(131) Exonuclease I (E. coli): 1 μl

(132) Exonuclease III (E. coli): 1 μl

(133) USER™ Enzyme: 1 μl

(134) DNA: 42 μl

(135) NEB buffer 4: 5 μl

(136) 1 h at 37° C.

(137) The product was purified with MinElute Reaction Cleanup Kit.

(138) (5) Strand displacement synthesis

(139) New England Biolabs: Bst 2.0 WarmStart® DNA Polymerase, Catalog #: M0538S

(140) DNA: 16.5 μl

(141) Isothermal Amplification Buffer: 2 μl

(142) 2.5 mM dNTP: 0.5 μl

(143) Bst 2.0 WarmStart DNA Polymerase: 0.5 μl

(144) 60 min at 60° C.

(145) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(146) The product can be used for constructing the first-, second- and third-generation sequencing libraries.

EXAMPLE 4

Construction of a Whole Genomic DNA Library According to Scheme 3 (Double-Stranded Circularization)

(147) (1) DNA fragmentation (about 700 bp; fragmentation conditions: duty cycle: 5%, intensity: 3, cycles per burst: 200, time: 75 s) and end repair A-tailing.

(148) (2) 5′ dephosphorylation (NEB: M0289)

(149) DNA: 44 μl

(150) Antarctic Phosphatase: 1 μl

(151) Antarctic Phosphatase Reaction Buffer: 5 μl

(152) 60 min at 37° C.; the product was purified with 1× AMPure XP magnetic beads.

(153) (3) Double-stranded circularization

(154) NEBNext® Quick Ligation Module (NEB, E6056S)

(155) DNA: 34 μl

(156) UO-A3: 1 μl

(157) T4 quick ligase: 5 μl

(158) 5× ligase buffer: 10 μl

(159) 30 min at 20° C.

(160) The product was purified with 1× Ampure XP magnetic beads.

(161) The adaptor UO-A3 was formed by annealing the following two sequences:

(162) (SEQ ID NO: 7) 5′-pGATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT-3′
(SEQ ID NO: 8) 5′-pAGCTGCTATTGAGAGTAAGCACGTACGACTGATCT-3′

(163) (4) Enzymatic digestion

(164) Exonuclease I (E. coli): 1 μl

(165) Exonuclease III (E. coli): 1 μl

(166) DNA: 43 μl

(167) NEB buffer 4: 5 μl

(168) 1 h at 37° C.

(169) The product was purified with MinElute Reaction Cleanup Kit.

(170) (5) Strand displacement synthesis

(171) New England Biolabs: Bst 2.0 WarmStart® DNA Polymerase, Catalog #: M0538S

(172) DNA: 16.5 μl

(173) Isothermal Amplification Buffer: 2 μl

(174) 2.5 mM dNTP: 0.5 μl

(175) Bst 2.0 WarmStart® DNA Polymerase: 0.5 μl

(176) 60 min at 60° C.

(177) The product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter, Inc).

(178) The product can be used for constructing the first-, second- and third-generation sequencing libraries.

EXAMPLE 5

Construction of a Target Region Capture Library

(179) A library of human genomic DNA was constructed according to the method of Example 1, and the target regions of the PCR product were captured.

(180) Exon Probe Hybridization

(181) Exon probe hybridization was performed on the PCR product by using SureSelect Human All Exon Kits from Agilent in this experiment. Formulation of hybridization buffer:

(182) SureSelect Hyb #1 (orange cap, or bottle): 25 μl

(183) SureSelect Hyb #2 (red cap): 1 μl

(184) SureSelect Hyb #3 (yellow cap): 10 μl

(185) SureSelect Hyb #4 (black cap, or bottle): 13 μl

(186) In total: 49 μl, 5 min at 65° C.

(187) Formulation of capture library mixture:

(188) SureSelect Library: 5 μl

(189) SureSelect RNase Block (purple cap): 0.5 μl

(190) ddH2O: 1.5 μl

(191) In total: 7 μl, 2 min at 65° C.

(192) Formulation of sample mixture:

(193) Purified DNA (about 700 ng): 3.4 μl

(194) SureSelect Indexing Block #1 (green cap): 2.5 μl

(195) SureSelect Block #2 (blue cap): 2.5 μl

(196) SureSelect Indexing Block #3 (brown cap): 0.6 μl

(197) In total: 9 μl, 5 min at 95° C., held at 65° C.

(198) 13 μl of the prepared hybridization buffer was added to the capture library mixture (7 μl), the sample mixture (9 μl) was then added to a total volume of 29 μl, and hybridization was performed at 65° C. for 24 h.

(199) Hybridized fragments were captured with streptavidin magnetic beads (Invitrogen™ Dynabeads® M-280 Streptavidin, Catalog #: 11205D). 50 μl of beads were washed three times with 200 μl SureSelect Binding Buffer and resuspended in 200 μl SureSelect Binding Buffer; the hybridization product was added and incubated at room temperature for 30 min to bind to the beads; the beads were then washed once with SureSelect Wash 1 and three times with SureSelect Wash 2, and resuspended in 36.5 μl ddH2O. See the instruction manual of the Agilent SureSelect Human All Exon Kits for details.

(200) (7) PCR after probe hybridization

(201) Instruments and reagents:

(202) PCR machine: Eppendorf Mastercycler pro S

(203) Agilent: Herculase II Fusion DNA Polymerases, Catalog #: 600677

(204) Beckman Coulter, Inc: Agencourt AMPure XP, Item No. A63880

(205) The reaction formula was as follows:

(206) Resuspended magnetic beads in exon probe hybridization: 36.5 μl

(207) MP PCR primer 1.0 (10 pmol): 1 μl

(208) MP PCR primer 2.0 (10 pmol): 1 μl

(209) 5× Herculase II Reaction Buffer: 10 μl

(210) dNTPs (100 mM; 25 mM each dNTP): 0.5 μl

(211) Herculase II Fusion DNA Polymerase: 1 μl

(212) In total: 50 μl.

(213) Cycling conditions for PCR amplification:

(214) Initial denaturation at 98° C. for 2 min; 12 cycles of 98° C. for 30 s, 65° C. for 30 s and 72° C. for 30 s; final extension at 72° C. for 10 min; hold at 4° C.

(215) Primer sequences:

(216) MP PCR primer 1.0 (SEQ ID NO: 9): 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′
MP PCR primer 2.0 (SEQ ID NO: 10): 5′-CAAGCAGAAGACGGCATACGAGAT-3′

(217) After PCR, the product was purified with Agencourt AMPure XP magnetic beads as summarized below. A 1.8× volume of beads was added to the amplified product, and the DNA was allowed to bind to the beads for 5 min at room temperature. The supernatant was removed, and the beads were washed twice with 70% ethanol, air-dried and eluted with 16 μl ddH2O; see the kit instructions for details.
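Assuming the entire 50 μl PCR reaction above is purified, the 1.8× bead ratio corresponds to:

```latex
V_{\mathrm{beads}} = 1.8 \times 50\ \mathrm{\mu l} = 90\ \mathrm{\mu l}
```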

(218) The eluted DNA is the constructed human exon library, which can be sequenced on a next-generation sequencing platform.

EXAMPLE 6

DNA Library Construction from Cell-Free DNA in Peripheral Blood

(219) (1) Extraction of cell-free DNA from peripheral blood and determination of fragment size

(220) Instruments and reagents:

(221) QIAGEN: QIAamp Circulating Nucleic Acid Kit, catalog #: 55114

(222) Agilent: 2100 bioanalyzer

(223) 2 ml of plasma was taken, and plasma DNA (cell-free circulating DNA) was extracted with the QIAGEN QIAamp Circulating Nucleic Acid Kit and eluted with 20 μl ddH2O (see the kit instructions). The size distribution of the extracted fragments was determined with an Agilent 2100 Bioanalyzer. The results show that the cell-free DNA fragments from patients with liver cancer concentrate around 164 bp, with a distribution range of about 110-210 bp; the concentration is 4.78 ng/μl, and the total amount of DNA is about 100 ng.
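The stated total amount follows from the measured concentration and the 20 μl elution volume, which also gives the approximate yield per millilitre of plasma:

```latex
m_{\mathrm{DNA}} = 4.78\ \mathrm{ng/\mu l} \times 20\ \mathrm{\mu l} = 95.6\ \mathrm{ng} \approx 100\ \mathrm{ng},
\qquad
\frac{95.6\ \mathrm{ng}}{2\ \mathrm{ml\ plasma}} = 47.8\ \mathrm{ng/ml}
```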

(224) (2) End repair A-tailing, adaptor ligation, single-stranded circularization, complementary strand synthesis, strand displacement and subsequent Illumina library construction were performed on the peripheral blood DNA as in Example 1.

EXAMPLE 7

Analysis of Sequencing Data of the Phage PhiX174 Library in Example 1

(225) About 1 G of paired-end sequencing data (read length 2×125 = 250 bp) was obtained on a HiSeq 2500. The data analysis is as follows:

(226) 1. There are 1,410,463 reads in total, of which 631,353 have the correct structure.

(227) 2. The size of the target sequence ranges from 30 to 107 bp, with a mean of 91.87 bp, a standard deviation of 14.42 bp and a median of 94 bp.

(228) 3. Paired-end high-throughput sequencing is performed on the constructed library. The two target sequences from one paired sequencing read are compared with each other, and inconsistent sequences are removed. The sequencing error rate is defined as the proportion of sites in the consensus sequence that differ from the reference sequence, and the error rate of the tested DNA is calculated on this basis. Assuming that there are no low-frequency mutations in the sample, the sequencing error rate of this method is 10⁻⁵. Sequencing errors are distributed differently across bases (relative to the reference genome); see Table 1.

(229) TABLE 1 Sequencing error rate of different bases measured by the described method

Sequencing error type    Error rate
A => C    1.85E−06
T => G    1.25E−06
A => G    6.56E−06
T => C    7.55E−06
A => T    3.59E−06
T => A    2.80E−06
C => A    3.11E−05
G => T    3.22E−05
C => G    9.94E−06
G => C    7.42E−06
C => T    1.67E−05
G => A    1.34E−05

(230) The calculation results show that the single-base error rate of this method (10⁻⁵) is much lower than that of next-generation sequencing (1%) and far lower than those of some existing improved methods. The method therefore almost completely eliminates the high error rate of next-generation sequencing and enables precise sequencing of DNA molecules on next-generation sequencing platforms.
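The consensus and error-rate definitions in item 3 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the analysis pipeline used here. It assumes a single read spanning target + linker + target, an exactly known linker (the linker in the usage lines is a hypothetical placeholder), and equal-length target copies. Because the copies are direct repeats, they are compared in the same orientation. Disagreeing positions are masked as N; discarding the whole read pair would be an equally plausible reading of "inconsistent sequences are removed".

```python
from typing import Optional


def split_on_linker(read: str, linker: str) -> Optional[tuple[str, str]]:
    """Split target + linker + target at the first linker occurrence.

    Returns None when the linker is missing or the two flanks differ in length
    (treated here, as a simplifying assumption, as an incorrect structure).
    """
    i = read.find(linker)
    if i < 0:
        return None
    left, right = read[:i], read[i + len(linker):]
    if not left or len(left) != len(right):
        return None
    return left, right


def consensus(copy1: str, copy2: str) -> str:
    """Keep bases where the two direct-repeat copies agree; mask others as N."""
    return "".join(a if a == b else "N" for a, b in zip(copy1, copy2))


def error_rate(cons: str, reference: str) -> float:
    """Fraction of called (non-N) consensus sites that differ from the reference."""
    called = [(c, r) for c, r in zip(cons, reference) if c != "N"]
    if not called:
        return 0.0
    return sum(c != r for c, r in called) / len(called)


if __name__ == "__main__":
    linker = "GGGGCCCC"  # hypothetical placeholder, not the patent's linker
    read = "ACGTACGT" + linker + "ACGTACGA"
    copies = split_on_linker(read, linker)
    if copies is not None:
        cons = consensus(*copies)
        print(cons, error_rate(cons, "ACCTACGT"))
```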

(231) 4. Distribution of sequencing coverage

(232) Based on the sequencing results, the coverage of the detected sequences across the whole PhiX174 genome is analyzed. The results show that the method provided in the present invention effectively reduces amplification bias, and that the sequencing data cover the whole genome effectively and uniformly.

(233) If the starting template is amplified fully evenly, the sequencing depth at any site in the genome should equal the genome-wide average sequencing depth; that is, their ratio should be 1, and its base-10 logarithm should be 0. If the starting template is not amplified uniformly, the sequencing depth at certain sites will differ markedly from the genome-wide average; that is, the ratio will be greater or less than 1, and its logarithm greater or less than 0.

(234) With the libraries constructed according to Chinese Patent Nos. 201310651462.5 and 201410448968.0, the logarithms of the ratios of the sequencing depth at almost all sites to the genome-wide average depth deviate seriously from 0: for most sites they fall below −1, while for a small portion of sites they are greater than 0, reaching up to 4, which means that the amplification fold at some sites is tens to hundreds of times the genome-wide average. This is because rolling circle replication of circular DNA is strongly biased and amplifies certain sites very highly; these highly amplified sites raise the genome-wide average depth, which in turn lowers the ratio of depth to average depth at most other sites. In the present invention, by contrast, the logarithms of these ratios at almost all sites are uniformly distributed around 0. Even at the site with the largest bias, the ratio of the sequencing depth to the genome-wide sequencing depth is also less than 1. Uniform replication of the whole genome is thus achieved, and the amplification product covers the whole genome more completely and uniformly. In summary, the technology provided in the present invention amplifies circular DNA molecules effectively and uniformly.
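The depth-ratio statistic described above can be computed per site as in the Python sketch below (the function name is hypothetical). The usage lines, with made-up depths, also reproduce the mechanism described for rolling circle libraries: a single highly amplified site raises the mean depth, which pushes the log ratio of every other site below 0.

```python
import math


def log_depth_ratios(depths: list[int]) -> list[float]:
    """Return log10(depth_i / mean depth) for every site.

    0 means the site is covered exactly at the genome-wide average; sites with
    zero depth have no finite logarithm and are reported as -inf.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mean = sum(depths) / len(depths)
    return [math.log10(d / mean) if d > 0 else float("-inf") for d in depths]


if __name__ == "__main__":
    print(log_depth_ratios([10, 10, 10, 10]))  # perfectly even coverage
    print(log_depth_ratios([1, 1, 1, 97]))     # one hyper-amplified site
```

In the skewed example the mean depth is 25, so each of the three low sites scores about −1.40, below −1, even though only one site is abnormal; the hyper-amplified site scores about 0.59.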

(235) Another advantage of this technology is that sequencing accuracy is independent of sequencing depth. Unlike tagging methods, which can determine a DNA sequence accurately only at a very high sequencing depth, this approach can accurately sequence large genomes (such as the human genome).

(236) Using the method of the present invention, the molecular composition of DNA in cells can be determined accurately, and the DNA composition of a normal or diseased (such as cancerous) cell population can be represented more faithfully. In cancer detection, the method can detect whether a potentially carcinogenic mutation has occurred in a tissue or organ of a normal individual, enabling early diagnosis and prevention of cancer. In cancer research, the method can be used to detect the distribution of DNA mutations in populations with cancer; to find potential small clonal populations in cancer tissues so as to understand the heterogeneous structure of tumors; to elucidate the role of mutations in cancer development; and to find cancer stem cells. In cancer treatment, the method can be used to find cancer stem cell populations, for which specific drug targets can then be designed to achieve effective treatment. For normal individuals, the method can be used to detect DNA mutations in normal cells in order to trace the phylogenetic lineage of normal tissues; to count DNA mutations in a tissue of individuals of different ages in order to estimate the DNA mutation rate; and to detect whether a normal individual carries mutations associated with various diseases, so as to prevent the occurrence of those diseases.

(237) The method is also effective for constructing libraries from cell-free DNA in peripheral blood and for detecting low-frequency mutation sites in peripheral blood, so that the occurrence and development of cancers, as well as harmful fetal mutations in prenatal diagnosis, can be effectively detected and evaluated non-invasively.
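Why a lower error rate matters for low-frequency mutations can be shown with a simple binomial model: if a site is sequenced to depth `n` and `k` reads show an alternative base, the question is how likely `k` or more such reads are from sequencing errors alone at a per-base error rate `e`. This is one common illustrative test, not the patent's procedure; the function name `binomial_upper_tail` and both error rates below are hypothetical.

```python
import math


def binomial_upper_tail(k, n, e):
    """P(X >= k) for X ~ Binomial(n, e), summed in log space to avoid overflow.

    Here n is the depth at a site, k the number of reads showing the
    alternative base, and e an assumed per-base error rate.
    """
    if not 0.0 < e < 1.0:
        raise ValueError("error rate must be strictly between 0 and 1")
    log_e, log_1me = math.log(e), math.log1p(-e)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_e + (n - i) * log_1me)
        total += math.exp(log_pmf)
    return min(total, 1.0)


# A 1% variant: 10 alternative reads at depth 1000.
p_noisy = binomial_upper_tail(10, 1000, 1e-2)  # hypothetical 1% error rate
p_clean = binomial_upper_tail(10, 1000, 1e-4)  # hypothetical 0.01% error rate
print(p_noisy > 0.4, p_clean < 1e-6)  # True True
```

With a 1% error rate, 10 alternative reads out of 1000 are entirely explainable by errors, whereas at 0.01% the same observation is extremely unlikely to arise from errors, so the variant becomes detectable at the same depth.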

(238) Ancient human DNA sequencing is a primary means of studying human evolution, but it faces many problems, the most serious of which are the small amount of ancient human DNA that can be extracted, severe degradation, and severe microbial contamination. The method can be used to construct libraries from a very small amount of DNA (single- or double-stranded), and the constructed library can be used for exon capture (removing microbial contamination from the genome), thereby effectively solving the problems in constructing ancient DNA libraries.

(239) Based on the present invention, a sequencing library construction kit is provided, comprising an end repair A-tailing reagent, a DNA ligase, an adaptor sequence, a single-stranded circularization reagent, a second-strand synthesis reagent, a nicking enzyme, a strand displacement reagent, dNTPs (2.5 mM) and BSA (100×). Specifically, the kit can comprise the following:

(240) End repair A-tailing reagent: 10× end repair A-tailing buffer (500 mM Tris-HCl, 100 mM MgCl₂, 100 mM DTT, 10 mM ATP, 4 mM dATP, 4 mM dCTP, 4 mM dGTP, 4 mM dTTP, pH 7.5 at 25° C.), T4 DNA Polymerase (3 U/μl), Klenow DNA Polymerase (0.5 U/μl), T4 Polynucleotide Kinase (10 U/μl) and a thermophilic modified DNA polymerase (5 U/μl).

(241) DNA ligase: T4 DNA ligase (20 U/μl) and 5× T4 DNA ligase buffer (250 mM Tris-HCl, 50 mM MgCl₂, 5 mM ATP, 50 mM DTT, pH 7.5 at 25° C.).

(242) Adaptor sequence:

(243) Y-shaped structure formed by annealing 5′-pGATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT-3′ (SEQ ID NO: 1) and 5′-pGTGGGCAGTCGGTGAACGACTGAUCT-3′ (SEQ ID NO: 2).

(244) Single-stranded circularization reagent: a single-stranded cyclase (100 U/μl), 50 mM MnCl₂ and 10× single-stranded cyclase buffer (0.33 M Tris-acetate (pH 7.5), 0.66 M potassium acetate and 5 mM DTT).

(245) Second-strand synthesis reagent: DNA Polymerase I (E. coli) (10 U/μl) and 10× buffer (500 mM NaCl, 100 mM Tris-HCl, 100 mM MgCl₂, 10 mM DTT, pH 7.9 at 25° C.).

(246) Nicking enzyme: Uracil DNA glycosylase (UDG) (1 U/μl), DNA glycosylase-lyase Endonuclease VIII (1 U/μl)

(247) Strand displacement reagent: Bst DNA polymerase large fragment (8 U/μl) and 10× Bst DNA polymerase buffer (200 mM Tris-HCl, 100 mM (NH₄)₂SO₄, 100 mM KCl, 20 mM MgSO₄, 1% Triton® X-100, pH 8.8 at 25° C.).
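The Y shape of the adaptor in paragraph (243) arises because the two oligonucleotides are complementary over only part of their length. The sketch below locates that double-stranded stem, assuming (as is usual for Y-shaped adaptors, but not stated in the source) that the 5′ end of SEQ ID NO: 1 pairs with the 3′ end of SEQ ID NO: 2, and allowing a few unpaired bases at the 3′ end of SEQ ID NO: 2. Uracil is paired as T. The helper names are hypothetical.

```python
# 5'->3' adaptor oligos from paragraph (243); the 5' phosphate ("p") is omitted.
SEQ1 = "GATCAGTCGTACGTGCTTACTCTCAATAGCAGCTT"
SEQ2 = "GTGGGCAGTCGGTGAACGACTGAUCT"

# Watson-Crick partners; uracil (U) pairs like thymine (T).
PAIR = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement written as DNA (A/C/G/T only)."""
    return "".join(PAIR[b] for b in reversed(seq))


def find_stem(top, bottom, max_overhang=3):
    """Return (overhang, stem_length) for the duplex where top's 5' end meets bottom's 3' end.

    Tries 0..max_overhang unpaired nucleotides at bottom's 3' end and keeps the
    longest run of consecutive base pairs starting at top's 5' end.
    """
    top = top.replace("U", "T")
    rc = reverse_complement(bottom)
    best = (0, 0)
    for overhang in range(max_overhang + 1):
        stem = 0
        for a, b in zip(top, rc[overhang:]):
            if a != b:
                break
            stem += 1
        if stem > best[1]:
            best = (overhang, stem)
    return best


print(find_stem(SEQ1, SEQ2))  # (1, 10)
```

Under this assumption the two oligos form a 10-bp stem with one unpaired T at the 3′ end of SEQ ID NO: 2, and the remainder of each oligo stays single-stranded as the two arms of the Y. A single 3′-T overhang would be consistent with ligation to the A-tailed fragments produced by the end repair A-tailing reagent, although the source does not state this explicitly.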

(248) Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and substitutions can be made to those details in light of all the teachings disclosed, and all of them fall within the scope of the present invention, which is defined by the appended claims and any equivalents thereof.