METHOD TO CONSTRUCT WHOLE-GENOME HIGH-THROUGHPUT SEQUENCING LIBRARY AND TEST KIT THEREOF

Abstract

The present disclosure relates to a method for constructing a whole genome high-throughput sequencing library comprising the following steps: (1) extracting a sample gDNA; (2) fragmenting said sample gDNA by enzyme cleavage, filling ends of the gDNA and adding A base to the gDNA fragments to obtain an A-added gDNA; (3) connecting the A-added gDNA with a linker combination to obtain a connected product, wherein said linker combination comprises two parts: a Y-shaped reverse linker and a high GC clamp linker; (4) purifying said connected product to obtain a purified product; and (5) screening the fragment of said purified product to obtain a sequencing library. The present disclosure also relates to a kit for constructing a whole genome high-throughput sequencing library.

Claims

1. A method for constructing a whole genome high-throughput sequencing library, comprising the steps of: 1) extracting a sample gDNA; 2) fragmenting said sample gDNA by enzyme cleavage, filling ends of the gDNA and adding A base to the gDNA fragments to obtain an A-added gDNA; 3) connecting the A-added gDNA with a linker combination to obtain a connected product, said linker combination comprises two parts: a Y-shaped reverse linker and a high GC clamp linker; 4) purifying said connected product to obtain a purified product; 5) screening the fragment of said purified product to obtain a sequencing library.

2. The method according to claim 1, characterized in that said Y-shaped reverse linker sequence is reverse complementary to a normal Y-shaped linker sequence and has the following sequence: TABLE-US-00033 5′pCAAGCAGAAGACGGCATACGAGATNNNNNNNGTGACTGGAGTTCAGA CGTGTGCTCTTCCGAT*C*T 3′ 5′pGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCG CCGTATCATT3′ wherein N represents random degenerate base A/T/C/G, * represents thio-modification and p represents phosphorylation modification.

3. The method according to claim 1, characterized in that said Y-shaped reverse linker is annealed to form the structure of: TABLE-US-00034 5′ -CAAGCAGAAGACGGCATACGAGATNNNNNNNGTGACTGGAGTTCAG ACGTGTGCTCTTCCGATCT-3′ CGAGAAGGCTAG-5′ 3′ -TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTG′.

4. The method according to claim 1, characterized in that said high GC clamp linker is formed by annealing two sequences: one sequence is a GC clamp sequence, which is 5-50 bp in length; the other sequence contains two parts, one part is reverse complementary to the GC clamp sequence and the other part is reverse complementary to the sequence at the P7 end of the Y-shaped reverse linker.

5. The method according to claim 4, characterized in that said GC clamp sequences are as follows. TABLE-US-00035 Sequence 1: 5′ TCGACTGCGTG3′ Sequence 2: 5′ CGTATGCCGTCTTCTGCTTGCACGCAGTC3′ wherein the 5′ end of sequence 1, the 5′ end and the 3′ end of sequence 2 are end closed.

6. The method according to claim 5, characterized in that said high GC clamp linker is annealed to form the structure of: TABLE-US-00036 5′-TCGACTGCGTG-3′ 3′-CTGACGCACGTTCGTCTTCTGCCGTATGC-5′.

7. The method according to claim 1, characterized in that (a) said two parts of said linker combination are annealed and connected together by the principle of base complementarity during the connecting step 3) and then connected to the gDNA fragment in step 2) to form the final library; or (b) said method is applicable to a sequencing platform employing patterned flow-through technology; or (c) said sample is selected from the group consisting of a cell line, peripheral blood, cord blood, amniotic fluid, chorion, placenta, umbilical cord, saliva and pharyngeal swab.

8. (canceled)

9. (canceled)

10. A kit for constructing a whole genome high-throughput sequencing library, characterized in that it comprises the following components: reagents required for fragmenting a sample gDNA, filling ends of the gDNA and adding A base, including enzymes and buffers required for fragmenting, filling ends of the gDNA and adding A base; connecting reagents, including ligase, ligation buffer and a linker combination required for the connecting step, said linker combination comprises two parts: a Y-shaped reverse linker and a high GC clamp linker; and reagents and devices required for purifying a connected product to obtain a purified product, and for screening a fragment of the purified product to obtain a sequencing library.

11. The kit according to claim 10, characterized in that said Y-shaped reverse linker sequence is reverse-complementary to a normal Y-shaped linker sequence and has the following sequence: TABLE-US-00037 5′pCAAGCAGAAGACGGCATACGAGATNNNNNNNGTGACTGGAGTTCAGA CGTGTGCTCTTCCGAT*C*T 3′ 5′pGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCG CCGTATCATT3′ where N represents random degenerate base A/T/C/G, * represents thio-modification and p represents phosphorylation modification.

12. The kit according to claim 10, characterized in that the Y-shaped reverse linker is annealed to form the following structure: TABLE-US-00038 5′ -CAAGCAGAAGACGGCATACGAGATNNNNNNNGTGACTGGAGTTCAG ACGTGTGCTCTTCCGATCT-3′ CGAGAAGGCTAG-5′ 3′ -TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTG′.

13. The kit according to claim 10, characterized in that said high GC clamp linker is formed by annealing two sequences: one sequence is a GC clamp sequence which is 5-50 bp in length; the other sequence contains two parts, one part is reverse complementary to the GC clamp sequence and the other part is reverse complementary to the sequence at the P7 end of the Y-shaped reverse linker.

14. The kit according to claim 10, characterized in that said GC clamp sequences are as follows: TABLE-US-00039 Sequence 1: 5′ TCGACTGCGTG3′ Sequence 2: 5′ CGTATGCCGTCTTCTGCTTGCACGCAGTC3′, the 5′ end of sequence 1, the 5′ end and the 3′ end of sequence 2 are end closed.

15. The kit according to claim 10, characterized in that said high GC clamp linker is annealed to form the structure of: TABLE-US-00040 5′-CGCTGCGTG-3′ 3′- CTGACGCACGTTCGTCTTCTGCCGTATGC-5′.

16. The kit according to claim 10, characterized in that (a) said two parts of said linker combination are annealed and connected together by the principle of base complementarity during the connecting step and then connected to said gDNA fragment to form the final library; or (b) said kit is applicable to a sequencing platform employing patterned flow-through technology; or (c) said sample is selected from the group consisting of a cell line, peripheral blood, cord blood, amniotic fluid, chorion, placenta, umbilical cord, saliva and pharyngeal swab.

17. (canceled)

18. (canceled)

19. A Y-shaped reverse linker, characterized in that said Y-shaped reverse linker sequence is inversely complementary to a normal Y-shaped linker sequence.

20. The Y-shaped reverse linker according to claim 19, characterized in that said Y-shaped reverse linker has the following sequence: TABLE-US-00041 5′pCAAGCAGAAGACGGCATACGAGATNNNNNNNGTGACTGGAGTTCAGA CGTGTGCTCTTCCGAT*C*T 3′ 5′pGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCG CCGTATCATT3′ where N represents random degenerate base A/T/C/G, * represents thio-modification and p represents phosphorylation modification.

21. The Y-shaped reverse linker according to claim 19, characterized in that said Y-shaped reverse linker is annealed to form the following structure: TABLE-US-00042 5′ -CAAGCAGAAGACGGGCATACGAGATNNNNNNNGTGACTGGAGTTCA GACGTGTGCTCTTCCGATCT-3′ CGAGAAGGCTAG-5′ 3′ -TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTG′.

22. A high GC clamp linker, characterized in that said high GC clamp linker is formed by annealing two sequences: one sequence is a GC clamp sequence, which is 5-50 bp in length; the other sequence contains two parts, one part is reverse-complementary to the GC clamp sequence and the other part is reverse-complementary to the sequence at the P7 end of the Y-shaped reverse linker.

23. The high GC clamp linker according to claim 22, characterized in that said GC linker sequences are as follows: TABLE-US-00043 Sequence 1: 5′ TCGACTGCGTG3′ Sequence 2: 5′ CGTATGCCGTCTTCTGCTTGCACGCAGTC3′, the 5′ end of sequence 1, the 5′ end and the 3′ end of sequence 2 are end closed.

24. The high GC clamp linker according to claim 23, characterized in that said high GC clamp linker is annealed to form the structure of: TABLE-US-00044 5′-TCGACTGCGTG-3′ 3′- CTGACGCACGTTCGTCTTCTGCCGTATGC-5′.

25. The Y-shaped reverse linker according to claim 19, further comprises a high GC clamp linker, characterized in that said high GC clamp linker is formed by annealing two sequences: one sequence is a GC clamp sequence, which is 5-50 bp in length; the other sequence contains two parts, one part is reverse-complementary to the GC clamp sequence and the other part is reverse-complementary to the sequence at the P7 end of the Y-shaped reverse linker.

Description

DESCRIPTION OF THE FIGURES

[0062] FIG. 1 shows a flowchart of a method for constructing a whole genome high-throughput sequencing library according to the present disclosure.

[0063] FIG. 2 shows the structure of a normal Y-shaped linker (TruSeq linker), consisting of P7, P5 sequences, index sequences and sequencing primer sequences (R1 SP and R2 SP).

[0064] FIG. 3 shows the structure of a Y-shaped reverse linker according to the present disclosure, consisting of the reverse complementary P7, P5 sequences, index sequences and sequencing primer sequences (R1 SP and R2 SP).

[0065] FIG. 4 shows a novel clamp linker combination according to the present disclosure, consisting of a Y-shaped reverse complementary linker and a high GC clamp linker.

[0066] FIG. 5 shows a whole genome high-throughput sequencing library structure according to the present disclosure, wherein a fragmented, end-filled and A-added gDNA is connected to a novel linker combination to form a second generation sequencing library.

[0067] FIG. 6 shows the distribution of insert fragments of DNA libraries with different input amounts in accordance with example 3 of the present disclosure.

[0068] FIG. 7 shows the distribution of sequencing depths and densities of DNA libraries with different input amounts in accordance with example 3 of the present disclosure.

EXAMPLES

[0069] The present disclosure will be described in detail below with reference to the drawings and in conjunction with examples.

[0070] The specific sequence of the common Y-shaped linker (TruSeq linker) used in the following examples is as follows.

5′ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGAT*C*T 3′
5′ pGATCGGAAGAGCACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG 3′

[0071] where N represents randomly degenerate bases A/T/C/G and * represents thio-modification, and p represents phosphorylation modification.
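The reverse-complement relationships described for the linkers can be checked with a short script. The following is only a sketch in Python: the sequences are copied from paragraph [0070] and from claims 2 and 5 with the p and * modification marks omitted, and the helper names `reverse_complement` and `gc_fraction` are illustrative and not part of the disclosure.

```python
# Sequences are copied from the text; the p (phosphate) and * (thio) marks are omitted.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


# First strand of the common Y-shaped (TruSeq) linker, paragraph [0070].
NORMAL_STRAND_1 = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Second strand of the Y-shaped reverse linker, as recited in claim 2.
REVERSE_STRAND_2 = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"
# First 20 nt (P7 end) of the first strand of the Y-shaped reverse linker.
REVERSE_P7_END = "CAAGCAGAAGACGGCATACG"
# High GC clamp linker, sequences 1 and 2 as recited in claim 5.
CLAMP_SEQ_1 = "TCGACTGCGTG"
CLAMP_SEQ_2 = "CGTATGCCGTCTTCTGCTTGCACGCAGTC"

# Reverse-linker strand 2 is the reverse complement of normal strand 1
# without its 3'-terminal T.
strand_matches = reverse_complement(NORMAL_STRAND_1[:-1]) == REVERSE_STRAND_2
# The 5' part of clamp sequence 2 is complementary to the P7 end of the reverse linker.
p7_matches = CLAMP_SEQ_2[:20] == reverse_complement(REVERSE_P7_END)
# The 3' part of clamp sequence 2 is complementary to clamp sequence 1
# without its first 2 nt.
clamp_matches = CLAMP_SEQ_2[20:] == reverse_complement(CLAMP_SEQ_1[2:])

print(strand_matches, p7_matches, clamp_matches)
print(f"GC fraction of clamp sequence 1: {gc_fraction(CLAMP_SEQ_1):.2f}")
```

With the sequences as transcribed above, all three comparisons evaluate to True. The comparison is restricted to strand and segment pairs whose lengths line up exactly; the index (N) regions are not compared.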

Example 1

[0072] The standard cell line NA12878 genomic DNA was used to construct PCR-free libraries by using a normal Y-shaped linker (TruSeq linker), a Y-shaped reverse linker according to the disclosure, and a novel linker combination according to the disclosure (Y-shaped reverse linker+high GC clamp linker), respectively, and the PCR library was used as a control. The sequencing data of the PCR-free library constructed by the three different linkers and the PCR library were compared by sequencing and data analysis.

[0073] NA12878 gDNA as the sample was used in the example, and PCR-free whole genome high-throughput sequencing libraries were constructed using the normal Y-shaped linker, the Y-shaped reverse linker according to the disclosure, and the Y-shaped reverse linker+high GC clamp linker according to the disclosure, respectively. The libraries were subjected to 150PE double-end sequencing on NovaSeq, and the sequencing results were analyzed using bioinformatics. The following was the specific protocol.

[0074] Step 1: A reaction mixture as shown in Table 1 was prepared. 3 tubes were prepared for subsequent connection of 3 different linkers. Then, the reaction procedures of fragmentation, end-filling and addition of A base as shown in Table 2 were run together.

TABLE 1
Component                 Volume
NA12878 gDNA (300 ng)     X μl
WGS reactive enzyme f     5 μl
WGS buffer f              2.5 μl
Sterile H2O               (17.5 - X) μl
Total volume              25 μl

TABLE 2
Reaction temperature      Reaction time
4° C.                     1 min
32° C.                    6 min
65° C.                    30 min
4° C.                     ∞

[0075] Hot cap temperature: 70° C., volume: 25 μl.
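The volume X of gDNA and the complementary volume of sterile water in Table 1 follow from the concentration of the gDNA stock. A minimal sketch of that arithmetic is given below; the stock concentration of 50 ng/μl and the function name `step1_volumes` are hypothetical and are used only for illustration.

```python
def step1_volumes(input_ng, stock_ng_per_ul, enzyme_ul=5.0, buffer_ul=2.5, total_ul=25.0):
    """Return per-component volumes (in μl) for the Table 1 reaction mixture."""
    gdna_ul = input_ng / stock_ng_per_ul                    # X in Table 1
    water_ul = total_ul - enzyme_ul - buffer_ul - gdna_ul   # 17.5 - X in Table 1
    if water_ul < 0:
        raise ValueError("gDNA stock is too dilute for a 25 μl reaction")
    return {
        "gdna_ul": gdna_ul,
        "enzyme_ul": enzyme_ul,
        "buffer_ul": buffer_ul,
        "water_ul": water_ul,
    }


# Hypothetical stock concentration of 50 ng/μl; this value is not from the text.
volumes = step1_volumes(input_ng=300, stock_ng_per_ul=50)
print(volumes)
```

The check on `water_ul` reflects the constraint implied by Table 1 that X cannot exceed 17.5 μl.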

[0076] Step 2: The connecting components listed in Table 3 for tubes 1, 2 and 3 were added to the corresponding reaction solutions from Step 1 (fragmentation, end-filling and addition of A base), and the connecting procedure was run as shown in Table 4.

TABLE 3
Component                              1        2        3
Previous reaction system               25 μl    25 μl    25 μl
WGS connecting solution                10 μl    10 μl    10 μl
WGS ligase                             5 μl     5 μl     5 μl
WGS normal Y-shaped linker (3 μM)      5 μl     -        -
WGS Y-shaped reverse linker (3 μM)     -        5 μl     5 μl
WGS high GC clamp linker (3 μM)        -        -        5 μl
Sterile H2O                            5 μl     5 μl     -
Total volume                           50 μl    50 μl    50 μl

TABLE 4
Reaction temperature      Reaction time
20° C.                    15 min
4° C.                     ∞
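The linker amounts in Table 3 can be related to the amount of gDNA fragments by a rough molar estimate. The sketch below assumes an average mass of 660 g/mol per base pair and a mean fragment length of 300 bp (of the order of the insert sizes reported in Table 5); both values are assumptions made for illustration and are not stated in the protocol.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one double-stranded base pair


def fragment_pmol(mass_ng, mean_length_bp):
    """Approximate amount of double-stranded fragments in pmol."""
    return mass_ng * 1000.0 / (AVG_BP_MASS_G_PER_MOL * mean_length_bp)


def linker_pmol(conc_um, volume_ul):
    """Amount of linker in pmol (μM multiplied by μl gives pmol)."""
    return conc_um * volume_ul


fragments = fragment_pmol(mass_ng=300, mean_length_bp=300)  # assumed 300 bp fragments
linkers = linker_pmol(conc_um=3, volume_ul=5)               # 5 μl of 3 μM linker
ratio = linkers / fragments

print(f"fragments: {fragments:.2f} pmol, linker: {linkers:.0f} pmol, ratio: {ratio:.1f}")
```

Under these assumptions each linker is present at roughly a tenfold molar excess over the fragments; a different fragment length or input mass changes the ratio proportionally.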

[0077] Step 3: The connected products were purified using the High-throughput Sequencing Library Construction DNA Purification Kit (magnetic bead method).

[0078] Step 4: The purified libraries were screened for fragments using the High-throughput Sequencing Library Construction DNA Purification Kit (magnetic bead method), wherein 0.49× beads were added for binding, the beads were discarded and the supernatant was retained, and 0.15× beads were further added to the supernatant for binding, washing and elution.
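The bead volumes for the two-step screening can be sketched as follows. The sketch assumes that both the 0.49× and the 0.15× ratios are taken relative to the starting volume of the purified library, which is the usual convention for double-sided bead selection but is not stated explicitly in the text; the 50 μl sample volume is hypothetical.

```python
def double_sided_bead_volumes(sample_ul, first_ratio=0.49, second_ratio=0.15):
    """Return (first bead volume, second bead volume, cumulative bead ratio).

    Both ratios are interpreted relative to the starting sample volume
    (an assumption, see the lead-in).
    """
    first_ul = first_ratio * sample_ul    # binds the largest fragments; beads discarded
    second_ul = second_ratio * sample_ul  # added to the retained supernatant
    return first_ul, second_ul, first_ratio + second_ratio


# Hypothetical 50 μl purified library.
first_ul, second_ul, cumulative = double_sided_bead_volumes(sample_ul=50)
print(f"first: {first_ul:.1f} μl, second: {second_ul:.1f} μl, cumulative ratio: {cumulative:.2f}x")
```

In this reading, fragments bound at 0.49× are removed with the first beads, and fragments that bind only at the cumulative 0.64× ratio are recovered in the elution.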

[0079] Step 5: The screened libraries were purified for qPCR quantification.

[0080] Step 6: Based on the qPCR quantification results, the libraries were sequenced by NovaSeq 150PE double-end sequencing according to the standard operating procedures of the sequencer.

[0081] Step 7: The sequencing results were subjected to basic statistics and performance analysis, and the basic statistics are shown in Table 5.

TABLE 5
Type                        Control-PCR   Normal Y-shaped linker   Y-shaped reverse linker   Y-shaped reverse linker + high GC clamp linker
Sample name                 NA12878       RWGSNA12878A2            RWGSNA12878R2A            RWGSNA12878GC11rep2
Number of sequences (M)     722.6         722.6                    722.6                     722.6
Comparison ratio %          94.54         94.06                    94.24                     95.50
Redundancy %                15.08         28.81                    18.12                     4.94
Average sequencing depth    29.88         21.94                    26.01                     30.04
Coverage %                  98.10         98.11                    98.18                     98.20
10× coverage %              97.09         96.90                    97.31                     97.44
20× coverage %              92.68         65.85                    85.83                     92.55
Insert fragment size        370           286                      316                       309

Note: The first column NA12878 in Table 5 is the PCR process control, and the samples and volumes are consistent with this example, and the library was constructed using the PCR amplification method as a control.

The analysis results showed that the redundancy of the library constructed according to the novel linker combination of the present disclosure (4.78%) was significantly lower than those of the other libraries (14.69%, 27.6%, and 18.25%) under the same amount of data. Because its redundancy was significantly lower than those of the other three libraries, its average sequencing depth was deeper and the 20× coverage was higher, which was comparable to the performance of the library constructed by the PCR process. It indicates that the use of the novel linker combination according to the present disclosure effectively reduces the redundancy of the PCR-free library and the overall quality control data performs optimally.
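The link between redundancy and average sequencing depth can be illustrated with a rough estimate computed from the Table 5 values. The sketch below assumes that the data amount counts individual 150 nt reads and that the genome size is about 3.1 Gb; neither assumption is stated in the text, and the estimate is not the calculation used to produce Table 5, so it only approximates the reported depths.

```python
GENOME_SIZE_BP = 3.1e9  # assumed human genome size
READ_LENGTH_BP = 150    # 150PE reads, each read counted separately (assumption)


def estimated_depth(reads_m, mapped_pct, redundancy_pct):
    """Rough depth after discarding unaligned and redundant sequences."""
    raw_depth = reads_m * 1e6 * READ_LENGTH_BP / GENOME_SIZE_BP
    return raw_depth * (mapped_pct / 100.0) * (1.0 - redundancy_pct / 100.0)


# (sequences in M, comparison ratio %, redundancy %, reported average depth) from Table 5
LIBRARIES = {
    "Control-PCR": (722.6, 94.54, 15.08, 29.88),
    "Normal Y-shaped linker": (722.6, 94.06, 28.81, 21.94),
    "Y-shaped reverse linker": (722.6, 94.24, 18.12, 26.01),
    "Y-shaped reverse linker + high GC clamp linker": (722.6, 95.50, 4.94, 30.04),
}

estimates = {
    name: estimated_depth(reads_m, mapped, redundancy)
    for name, (reads_m, mapped, redundancy, _reported) in LIBRARIES.items()
}

for name, (_reads, _mapped, _redundancy, reported) in LIBRARIES.items():
    print(f"{name}: estimated {estimates[name]:.1f}x, reported {reported:.2f}x")
```

The estimates differ from the reported depths by up to about two units, but they rank the four libraries in the same order, which is consistent with the explanation that lower redundancy leaves more usable data at the same data amount.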

TABLE 6
Type                                             Sample                SNP accuracy %   SNP sensitivity %   INDEL accuracy %   INDEL sensitivity %   CNV accuracy %   CNV sensitivity %   Repeat consistency %
Control-PCR                                      NA12878               99.54            98.96               94.90              94.3                  51.1             96.6                78.1
Normal Y-shaped linker                           RWGSNA12878A2         99.28            98.86               96.86              95.45                 49.45            91.06               93.75
Y-shaped reverse linker                          RWGSNA12878R2A        99.37            99.13               97.76              97.33                 48.07            91.38               87.50
Y-shaped reverse linker + high GC clamp linker   RWGSNA12878GC11rep2   99.27            99.22               97.37              97.49                 51.30            97.10               96.90

[0082] The results of the performance analysis are shown in Table 6. For SNPs, the accuracy of the novel linker combination was comparable to those of the other libraries and the PCR control, and its sensitivity was comparable to that of the Y-shaped reverse linker library and slightly higher than those of the normal Y-shaped linker library and the PCR library. For INDELs, the accuracy and sensitivity of the novel linker combination according to the present disclosure were comparable to those of the Y-shaped reverse linker library, and both were higher than those of the normal Y-shaped linker library and the PCR control. For CNVs, the accuracy of the novel linker combination library according to the disclosure was comparable to that of the PCR library and higher than those of the normal Y-shaped linker library and the Y-shaped reverse linker library, and its sensitivity was also higher than those of the other three libraries. The repeat consistency was also significantly higher than those of the other three libraries. The results showed that, in the performance analysis, the overall data performance of the libraries constructed with the novel linker combination was better than those of the other three libraries.

Example 2

[0083] Human genomic DNA was used to construct PCR-free libraries using the normal Y-shaped linker (TruSeq linker), the Y-shaped reverse linker, the combination of the normal Y-shaped linker and the high GC clamp linker, and the novel linker combination of the present disclosure, respectively. The libraries were sequenced together with the phix library. The number of phix sequences measured in each library was analyzed, and the index hopping ratio was calculated. Meanwhile, the redundancy under the same amount of data was compared.

[0084] The principle for testing the hopping ratio using phix is as follows: the phix library insert fragment is derived from viral genomic DNA. Its gene sequences are known precisely and its GC ratio is about 40%, which is close to the GC ratio of the human genome. Its gene sequence is far from the human gene sequence, and the phix library does not contain an index. Therefore, the phix library is sequenced together with the library to be tested, and the number of phix sequences assigned to the library to be tested is analyzed. The ratio of phix sequences to the total number of sequences in the library is calculated as the hopping ratio. The following is the specific protocol.

[0085] Step 1: A PCR-free library was constructed with each of the four linkers.

[0086] Step 2: Based on the qPCR quantification results, the phix library and the PCR-free library constructed with four different linkers were sequenced together in 150PE double-end sequencing according to the sequencer standard operation protocol.

[0087] Step 3: The sequencing results were compared with the human genome reference sequence and phix gene sequence, and the number of sequences aligned to the human genome reference sequence and the number of sequences aligned to the phix gene sequence were counted. The statistical results are shown in Table 7 below.

TABLE 7
Total number of Phix sequences: 44687582
Sample name     Number of library sequences   Number of Phix sequences measured in the library   Hopping ratio   Redundancy   Remarks
RDWGS-304       28774642                      993                                                0.0035%         9.80%        Normal Y-shaped linker
RDWGS-347R      40284997                      473                                                0.0012%         7.93%        Y-shaped reverse linker
RDWGS-305GC     19890566                      2261                                               0.0114%         5.44%        Normal Y-shaped linker + high GC clamp linker
RDWGS-348RGC    35998134                      1161                                               0.0032%         3.09%        Y-shaped reverse linker + high GC clamp linker

[0088] The results showed that the hopping ratio of the Y-shaped reverse linker library was lower than that of the normal Y-shaped linker library. Meanwhile, the hopping ratio of the PCR-free library constructed with the Y-shaped reverse linker + high GC clamp linker was lower than that of the library constructed with the normal Y-shaped linker + high GC clamp linker. Regardless of whether the high GC clamp linker was added, PCR-free libraries using Y-shaped reverse linkers showed a lower index hopping ratio, indicating that the specific structure of the Y-shaped reverse linker can effectively reduce the index hopping ratio. Whether combined with the normal Y-shaped linker or the Y-shaped reverse linker, the high GC clamp linker can effectively reduce the redundancy.
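The hopping ratios in Table 7 can be recomputed from the counts using the definition in paragraph [0084], i.e. the number of phix sequences measured in a library divided by the total number of sequences in that library. The sketch below uses the counts from Table 7; the function and variable names are illustrative.

```python
def hopping_ratio_pct(phix_in_library, library_sequences):
    """Index hopping ratio in percent, as defined in paragraph [0084]."""
    return 100.0 * phix_in_library / library_sequences


# sample name: (number of library sequences, phix sequences measured in the library)
TABLE_7 = {
    "RDWGS-304": (28774642, 993),       # normal Y-shaped linker
    "RDWGS-347R": (40284997, 473),      # Y-shaped reverse linker
    "RDWGS-305GC": (19890566, 2261),    # normal Y-shaped linker + high GC clamp linker
    "RDWGS-348RGC": (35998134, 1161),   # Y-shaped reverse linker + high GC clamp linker
}

ratios = {
    sample: hopping_ratio_pct(phix, total)
    for sample, (total, phix) in TABLE_7.items()
}

for sample, ratio in ratios.items():
    print(f"{sample}: {ratio:.4f}%")
```

Rounded to four decimal places, the computed values reproduce the hopping ratios listed in Table 7 (0.0035%, 0.0012%, 0.0114% and 0.0032%).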

Example 3

[0089] The library was constructed using different template input amounts, sequenced, and the data was analyzed to compare the sequencing data quality and the performance analysis results of different input amounts.

[0090] The novel linker combination according to the present disclosure was used in the example to construct whole genome high-throughput sequencing libraries with input amounts of 200 ng and 300 ng respectively, using NA12878 genomic DNA as the sample. The libraries were sequenced by 150PE double-end sequencing on NovaSeq, and the sequencing results were analyzed using bioinformatics to compare the quality of libraries constructed with different input amounts. The following is the specific scheme.

[0091] Step 1: A reaction mixture was prepared as shown in Table 8. Four tubes were prepared, two with 200 ng DNA input amount and the other two with 300 ng DNA input amount. Then, the reaction procedures of fragmentation, end-filling and addition of A base were run together as shown in Table 9.

TABLE 8
Component                 200 ng           300 ng
NA12878 gDNA              X μl             Y μl
WGS Reactive Enzyme f     5 μl             5 μl
WGS buffer f              2.5 μl           2.5 μl
Sterile H2O               (17.5 - X) μl    (17.5 - Y) μl
Total volume              25 μl            25 μl

TABLE 9
Reaction temperature      Reaction time
4° C.                     1 min
32° C.                    6 min
65° C.                    30 min
4° C.                     ∞

[0092] Hot cap temperature: 70° C., volume: 25 μl.

[0093] Step 2: The connecting components listed in Table 10 were added to each of the reaction solutions from Step 1 (fragmentation, end-filling and addition of A base), and the connecting procedure was run as shown in Table 11.

TABLE 10
Component                              Volume
Previous reaction system               25 μl
WGS connecting solution                10 μl
WGS ligase                             5 μl
WGS Y-shaped reverse linker (3 μM)     5 μl
WGS high GC clamp linker (3 μM)        5 μl
Total volume                           50 μl

TABLE 11
Reaction temperature      Reaction time
20° C.                    15 min
4° C.                     ∞

[0094] Hot cap temperature: off; volume: 50 μl

[0095] Step 3: The connected products were purified using the High-throughput Sequencing Library Construction DNA Purification Kit (magnetic bead method).

[0096] Step 4: The purified libraries were screened for fragments using the High-throughput Sequencing Library Construction DNA Purification Kit (magnetic bead method), wherein the screening conditions were: after binding with 0.49× beads, the beads were discarded and the supernatant was retained, and 0.15× beads were further added to the supernatant for binding, washing and elution.

[0097] Step 5: The screened libraries were purified for qPCR quantification.

[0098] Step 6: Based on the qPCR quantification results, the libraries were sequenced by NovaSeq 150PE double-end sequencing according to the standard operating procedures of the sequencer.

[0099] Step 7: The sequencing results are subjected to basic statistics as well as performance analysis, and the basic statistics are shown in Table 12.

TABLE 12
Type                        200 ng      200 ng      300 ng      300 ng
Sample name                 NA12878-1   NA12878-3   NA12878-5   NA12878-7
Number of sequences (M)     799.3       799.2       799.4       799.4
Comparison ratio %          99.50       99.47       99.69       99.70
Redundancy %                5.91        5.18        7.53        7.29
Average sequencing depth    33.59       34.29       33.65       33.36
Coverage %                  99.22       99.22       99.22       99.22
4× coverage %               99.11       99.11       99.12       99.12
20× coverage %              96.62       96.97       96.74       96.54
Insert fragment size        290         300         303         292

[0100] The analysis results show that the quality of the library with 200 ng input amount is comparable to that of the library with 300 ng input amount in terms of basic statistics. FIG. 6 and FIG. 7 show the insert fragment distribution and the depth and density distribution, respectively. There is no significant difference between the libraries in insert fragment size or in depth and density distribution. In the insert fragment distribution, the horizontal coordinate is the fragment size (bp) and the vertical coordinate is the count, which reflects the size distribution of DNA fragments in the library. In the depth and density distribution, the horizontal coordinate is the sequencing depth and the vertical coordinate is the count, which reflects the uniformity of sequencing. The narrower the peak, the closer the sequencing depths at the individual positions, i.e., the more uniform the data coverage across the genome, which is more beneficial for the detection of mutations and CNV.
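The coverage and uniformity measures discussed above can be sketched on a per-position depth list. In the sketch below, the depth values are hypothetical and serve only to show the calculation; N× coverage is taken as the fraction of positions whose depth is at least N, and the coefficient of variation is used as one possible measure of peak width, which is an assumption and not a metric named in the text.

```python
from statistics import mean, pstdev


def coverage_at(depths, threshold):
    """Fraction of positions covered at or above the given depth."""
    return sum(depth >= threshold for depth in depths) / len(depths)


def coefficient_of_variation(depths):
    """Spread of the depth distribution relative to its mean (smaller is more uniform)."""
    return pstdev(depths) / mean(depths)


# Hypothetical per-position depths; not data from the examples.
depths = [30, 32, 28, 35, 19, 31, 33, 29, 30, 34]

print(f"4x coverage:  {coverage_at(depths, 4):.0%}")
print(f"20x coverage: {coverage_at(depths, 20):.0%}")
print(f"mean depth:   {mean(depths):.1f}")
print(f"CV:           {coefficient_of_variation(depths):.3f}")
```

A narrower depth peak corresponds to a smaller coefficient of variation and to N× coverage values that stay high as N approaches the mean depth.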

TABLE 13
Type      Sample     SNP accuracy %   SNP sensitivity %   INDEL accuracy %   INDEL sensitivity %   CNV accuracy %   CNV sensitivity %   Repeat consistency %
200 ng    12878-1    99.30            99.18               97.12              96.75                 24.52            96.90               90.63
200 ng    12878-3    99.32            99.20               97.23              96.91                 17.45            96.90               93.75
300 ng    12878-5    99.33            99.22               97.51              97.47                 53.61            97.39               96.88
300 ng    12878-7    99.20            99.20               97.31              97.24                 53.57            97.39               93.75

[0101] The results of the performance analysis are shown in Table 13. The sensitivity and accuracy data of the two different input amounts of SNP & INDEL performed comparably, the sensitivity of CNV also performed comparably, and the performance of repeat results was also basically comparable. However, the accuracy of 300 ng input amount of CNV was significantly higher than that of 200 ng.

Example 4

[0102] The whole genome high-throughput sequencing library was constructed and sequenced, and data subsets corresponding to sequencing depths of 15×, 30× and 40× were extracted. Data analysis, basic statistics and performance analysis were performed to compare the data performance under different sequencing depths.

[0103] The novel linker combination according to the present disclosure was used in the example to construct whole genome high-throughput sequencing libraries with an input amount of 300 ng, using NA12878 genomic DNA as the sample. The libraries were sequenced by 150PE double-end sequencing on NovaSeq, and the sequencing results were analyzed using bioinformatics to compare the library quality at different sequencing depths. The following is the specific scheme.

[0104] Step 1: A reaction mixture was prepared as shown in Table 14. Then, the reaction procedures of fragmentation, end-filling and addition of A base as shown in Table 15 were run.

TABLE 14
Component                 Volume
NA12878 gDNA              X μl
WGS Reactive Enzyme f     5 μl
WGS buffer f              2.5 μl
Sterile H2O               (17.5 - X) μl
Total volume              25 μl

TABLE 15
Reaction temperature      Reaction time
4° C.                     1 min
32° C.                    6 min
65° C.                    30 min
4° C.                     ∞

[0105] Hot cap temperature: 70° C., volume: 25 μl

[0106] Step 2: The connecting components listed in Table 16 were added to the reaction solution from Step 1 (fragmentation, end-filling and addition of A base), and the connecting procedure was run as shown in Table 17.

TABLE 16
Component                              Volume
Previous reaction system               25 μl
WGS connecting solution                10 μl
WGS ligase                             5 μl
WGS Y-shaped reverse linker (3 μM)     5 μl
WGS high GC clamp linker (3 μM)        5 μl
Total volume                           50 μl

TABLE 17
Reaction temperature      Reaction time
20° C.                    15 min
4° C.                     ∞

[0107] Hot cap temperature: off; volume: 50 μl

[0108] Step 3: The connected products were purified using the High-throughput Sequencing Library Construction DNA Purification Kit (magnetic bead method).

[0109] Step 4: The purified libraries were screened for fragments using the High-throughput Sequencing Library Construction DNA Purification Kit (magnetic bead method), wherein the screening conditions are: 0.49× beads were added for binding, the supernatant was taken, and 0.15× beads were further added for binding, washing and elution.

[0110] Step 5: The screened libraries were purified for qPCR quantification.

[0111] Step 6: Based on the qPCR quantification results, the libraries were subjected to NovaSeq 150PE double-end sequencing according to the standard operating procedures of the sequencer.

[0112] Step 7: The data amounts corresponding to sequencing depths of 15×, 30× and 40× were calculated theoretically, and data subsets of those sizes were extracted from the beginning of the sequencing data for basic statistical analysis. (Note: there is some deviation between the theoretically calculated data amounts and the actually extracted data amounts. The subsets extracted for the theoretical 15×, 30× and 40× data amounts correspond to actual depths of 17×, 33× and 38×, respectively.) The results are shown in Table 18.

TABLE 18
Depth                       17×     17×     17×     17×     33×     33×     33×     33×     38×     38×     38×     38×
Sample                      5       6       7       8       5       6       7       8       5       6       7       8
Number of sequences (M)     400     400     400     400     799     799     799     799     933     933     933     933
Comparison rate %           99.70   99.71   99.72   99.68   99.69   99.70   99.70   99.68   99.68   99.69   99.70   99.67
Average sequencing depth    16.44   16.92   16.37   33.65   32.48   33.36   32.44   39.10   37.74   38.77   37.75   16.44
Coverage %                  99.18   99.18   99.18   99.18   99.22   99.22   99.22   99.22   99.23   99.23   99.23   99.23
4× coverage %               99.0    99.0    99.0    98.9    99.1    99.1    99.1    99.1    99.1    99.1    99.1    99.1
20× coverage %              23.4    19.9    23.0    19.9    96.7    95.8    96.5    95.5    98.2    98.0    98.2    97.9
Redundancy %                6.5     6.1     5.9     5.7     7.5     7.2     7.3     6.5     8.0     7.7     7.8     6.9
Insert size                 303     276     292     271     303     276     292     271     303     276     292     271

[0113] The analysis results show that the 20× coverage increases with sequencing depth: when the sequencing depth is 17×, the 20× coverage is only 20-23%. The other QC metrics (data amount, average sequencing depth and redundancy) also rise with increasing sequencing depth. Overall, the basic statistics at 17× are poor, while the basic statistics at 33× and 38× are comparable.

TABLE 19
Type    Sample       SNP accuracy %   SNP sensitivity %   INDEL accuracy %   INDEL sensitivity %   CNV accuracy %   CNV sensitivity %   Repeat consistency %
17×     NA12878-5    99.15            97.18               94.39              90.09                 60.4             90.9                87.5
17×     NA12878-6    98.99            96.93               94.12              89.43                 60.5             88.2                87.5
17×     NA12878-7    99.11            97.23               94.41              90.22                 60.6             91.8                71.9
17×     NA12878-8    98.96            96.9                94.08              89.3                  61.8             91.7                87.5
33×     NA12878-5    99.33            99.22               97.51              97.47                 53.6             97.4                96.9
33×     NA12878-6    99.22            99.21               97.30              97.23                 54.3             96.6                90.6
33×     NA12878-7    99.30            99.23               97.45              97.42                 53.6             97.4                90.6
33×     NA12878-8    99.20            99.20               97.31              97.24                 55.0             96.4                93.8
38×     NA12878-5    99.34            99.3                99.82              95.24                 53.0             97.4                96.9
38×     NA12878-6    99.25            99.29               99.79              94.99                 53.8             96.6                90.6
38×     NA12878-7    99.33            99.3                99.82              94.66                 53.1             97.7                90.6
38×     NA12878-8    99.23            99.29               99.79              94.87                 54.4             97.2                93.8

[0114] The results of the performance analysis are shown in Table 19. The accuracy and sensitivity of SNPs performed comparably at different sequencing depths, the accuracy and sensitivity of INDEL were lower at 17× than at 33×, while performed comparably at 33× and 38×. The accuracy of CNV was higher at 17× than at 33×, and performed comparably at 33× and 38×. The sensitivity of CNV was lower at 17× than at 33×, and performed comparably at 33× and 38×. The consistency of repeat was lower at 17× than at 33×, and performed comparably at 33× and 38×. Overall, the performance analysis at 17× was a little worse than at 33× and could not meet the analysis demand, while the performance analysis at 38× was basically comparable to that at 33×. Considering the sequencing results as well as the cost, the sequencing depth of 33× is the optimal sequencing depth.

CPC classification: C12Q1/6806; C12Q2521/301; C12N15/11; C12N15/1093; C12Q2525/191; C40B50/06; C12Q1/6869 (all under CHEMISTRY; METALLURGY)

International classification: C12N15/11; C12N15/10; C12Q1/6806; C12Q1/6869; C40B50/06 (all under CHEMISTRY; METALLURGY)
