Methods and reagents for enhanced next generation sequencing library conversion and incorporation of molecular barcodes into targeted and random nucleic acid sequences
20210017572 · 2021-01-21
CPC classification
C12N15/1037
CHEMISTRY; METALLURGY
C12N15/1068
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
C40B40/06
CHEMISTRY; METALLURGY
Abstract
Novel engineered compositions, reagents, and methods are described that facilitate next generation sequencing (NGS) analysis of both random and specific nucleic acid sequences in a sample by providing high-efficiency target enrichment and improved error suppression. Specifically, the design and use of engineered NGS Dual Adapter molecules with homology regions (HRs) targeted at the 5′ and 3′ regions of a double-stranded DNA target, unique molecular identifiers (UMIs), and NGS adapters allow target libraries to be created for subsequent NGS analysis.
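The abstract credits UMIs with improved error suppression but does not spell out a read-level algorithm. A common approach, sketched here as an assumption rather than as the applicants' method, is to group reads that share the same UMI1-UMI2 pair into a family and take a per-position majority vote, so that a polymerase or sequencing error present in only one read of a family is outvoted. The name `consensus_by_umi` and the tie rule (emit `N`) are illustrative choices, and reads within a family are assumed to be pre-aligned and of equal length.

```python
from collections import Counter, defaultdict


def consensus_by_umi(reads):
    """Collapse reads that share a (UMI1, UMI2) pair into one consensus sequence.

    reads: iterable of (umi1, umi2, sequence) tuples. Sequences in a family are
    assumed to be aligned and equally long (a simplifying assumption).
    Returns a dict mapping (umi1, umi2) -> consensus sequence.
    """
    families = defaultdict(list)
    for umi1, umi2, seq in reads:
        families[(umi1, umi2)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        bases = []
        # Walk the family column by column and take a majority vote.
        for column in zip(*seqs):
            counts = Counter(column).most_common()
            if len(counts) > 1 and counts[0][1] == counts[1][1]:
                bases.append("N")  # tie between the top two bases: no call
            else:
                bases.append(counts[0][0])
        consensus[key] = "".join(bases)
    return consensus
```

For example, three reads tagged `("AC", "GT")` reading `ACGTA`, `ACGTA` and `ACCTA` collapse to `ACGTA`, because the single `C` at the third position is outvoted.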
Claims
1. An engineered nucleic acid molecule selected from the group consisting of: 1A. a linear double-stranded DNA (dsDNA) Dual Adapter molecule configured to prepare a sample nucleic acid for targeted next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first homology region (HR1) that comprises at least about a 10 bp first target homology sequence that is substantially homologous to a region located at or 5′ to a predetermined region within a first target DNA, cDNA, and/or RNA molecule, wherein HR1 optionally comprises a modified nucleotide at or in proximity to the 5′ or 3′ terminal nucleotide, wherein the modified nucleotide comprises an affinity tag to facilitate separation, wherein optionally the affinity tag is a biotin molecule or a hapten; b. a second homology region (HR2) that comprises at least about a 10 bp second target homology sequence that is substantially homologous to a region located at or 3′ to a predetermined region within the first target DNA, cDNA, and/or RNA molecule, wherein HR2 optionally comprises a modified nucleotide at or in proximity to the 5′ or 3′ terminal nucleotide, wherein the modified nucleotide comprises an affinity tag to facilitate separation, wherein optionally the affinity tag is a biotin molecule or a hapten; c. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 1 bp and a random nucleic acid sequence; d. a second unique molecular identifier (UMI2) disposed 3′ of, and optionally flanking, HR2 that comprises at least 1 bp and a random nucleic acid sequence; e. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; f. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; g. 
optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; h. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and i. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1B. a linear dsDNA Dual Adapter molecule configured to prepare a sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first 5′- or 3′-protruding single-stranded homology region (HR1) that comprises at least about a 10 nucleotide first target homology sequence that is substantially homologous to a region located at or 5′ to a predetermined region within a first target DNA, cDNA, and/or RNA molecule, wherein HR1 optionally comprises a modified nucleotide at or in proximity to the 5′ or 3′ terminal nucleotide, wherein the modified nucleotide comprises an affinity tag to facilitate separation, wherein optionally the affinity tag is a biotin molecule or a hapten; b. 
a second 5′- or 3′-protruding single-stranded homology region (HR2) that comprises at least about a 10 nucleotide second target homology sequence that is substantially homologous to a region located at or 3′ to a predetermined region within the first target DNA, cDNA, and/or RNA molecule, wherein HR2 optionally comprises a modified nucleotide at or in proximity to the 5′ or 3′ terminal nucleotide, wherein the modified nucleotide comprises an affinity tag to facilitate separation, wherein optionally the affinity tag is a biotin molecule or a hapten; c. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 1 bp and a random nucleic acid sequence; d. a second unique molecular identifier (UMI2) disposed 3′ of, and optionally flanking, HR2 that comprises at least 1 bp and a random nucleic acid sequence; e. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; f. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; g. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; h. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and i. 
optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1C. a circular dsDNA Dual Adapter molecule composed of sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first target insertion region that corresponds to a region located at or 5′ to a predetermined region within a first target DNA, cDNA, and/or RNA molecule; b. a second target insertion region that corresponds to a region located at or 3′ to a predetermined region within the first target DNA, cDNA, and/or RNA molecule; c. the predetermined region of the first target nucleic acid, wherein the first and second target insertion regions flank the 5′ and 3′ termini of the predetermined region, respectively; d. a first unique molecular identifier (UMI1) disposed 5′ of the first target insertion region that comprises at least 1 bp and a random nucleic acid sequence; e. a second unique molecular identifier (UMI2) disposed 3′ of the second target insertion region that comprises at least 1 bp and a random nucleic acid sequence; f. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; g. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; h. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; i. 
optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and j. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1D. a dsDNA Dual Adapter molecule configured to prepare a sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first topoisomerase leaving group (TLG1) located at the 3′ end of the molecule and containing a nicking endonuclease recognition sequence, optionally a nicking enzyme sequence for an enzyme selected from the group consisting of Nt.BsmAI, Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, and Nb.BssSI; b. a second topoisomerase leaving group (TLG2) located at the 5′ end of the molecule and containing a nicking endonuclease recognition sequence, optionally a nicking enzyme sequence for an enzyme selected from the group consisting of Nt.BsmAI, Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, and Nb.BssSI; c. a first topoisomerase enzyme binding site (TOPO1) at or near TLG1; d. a second topoisomerase enzyme binding site (TOPO2) at or near TLG2; e. a first unique molecular identifier (UMI1) disposed 5′ of the TOPO1 site that comprises at least 1 bp and a random nucleic acid sequence; f. a second unique molecular identifier (UMI2) disposed 3′ of the TOPO2 site that comprises at least 1 bp and a random nucleic acid sequence; g. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; h. 
a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; i. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; j. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and k. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1E. a circular dsDNA Dual Adapter molecule that comprises sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule also comprising: a. a target nucleic acid molecule that is fragmented genomic DNA, cDNA, or an amplicon, optionally a PCR amplicon; b. a first topoisomerase enzyme binding site (TOPO1) 5′ of the sample nucleic acid; c. a second topoisomerase enzyme binding site (TOPO2) 3′ of the sample nucleic acid; d. a first unique molecular identifier (UMI1) disposed 5′ of the TOPO1 site that comprises at least 1 bp and a random nucleic acid sequence; e. a second unique molecular identifier (UMI2) disposed 3′ of the TOPO2 site that comprises at least 1 bp and a random nucleic acid sequence; f. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; g. 
a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; h. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; i. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and j. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1F. a topoisomerase-charged double-stranded DNA (dsDNA) Dual Adapter molecule configured to prepare a sample nucleic acid library for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first topoisomerase enzyme covalently bound at or near a 3′ overhang or blunt first end (TOPO1), the first end further comprising a 3′ thymine overhang or blunt end; b. a second topoisomerase enzyme covalently bound at or near a 3′ overhang or blunt second end (TOPO2), the second end further comprising a 3′ thymine overhang or blunt end; c. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, TOPO1 that comprises at least 1 bp and a random nucleic acid sequence; d. a second unique molecular identifier (UMI2) disposed 3′ of, and optionally flanking, TOPO2 that comprises at least 1 bp and a random nucleic acid sequence; e. 
a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; f. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second primer binding sites comprise different nucleotide sequences and bind different primers; g. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; h. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and i. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1G. a dsDNA Dual Adapter molecule composed of sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a nucleic acid consisting of fragmented genomic DNA, cDNA, or an amplicon, optionally a PCR amplicon; b. a first topoisomerase enzyme binding site (TOPO1) 5′ of the sample nucleic acid; c. a second topoisomerase enzyme binding site (TOPO2) 3′ of the sample nucleic acid; d. a first unique molecular identifier (UMI1) disposed 5′ of the TOPO1 site that comprises at least 1 bp and a random nucleic acid sequence; e. a second unique molecular identifier (UMI2) disposed 3′ of the TOPO2 site that comprises at least 1 bp and a random nucleic acid sequence; f. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; g. 
a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; h. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; i. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and j. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1H. a double-stranded DNA (dsDNA) Dual Adapter molecule configured to prepare a sample nucleic acid for targeted next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first unique molecular identifier (UMI1) that comprises at least 1 bp and a random nucleic acid sequence disposed 5′ of, and optionally flanking, a 3′ T-overhang, an A-overhang, a CG overhang, a blunt end, or any other ligatable nucleic acid sequence; b. a second unique molecular identifier (UMI2) that comprises at least 1 bp and a random nucleic acid sequence disposed 3′ of, and optionally flanking, a 3′ T-overhang, an A-overhang, a CG overhang, a blunt end, or any other ligatable nucleic acid sequence; c. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; d. 
a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; e. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; f. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and g. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1J. a circular dsDNA Dual Adapter molecule composed of sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. sample nucleic acid; b. a first unique molecular identifier (UMI1) disposed 5′ of the sample nucleic acid that comprises at least 1 bp and a random nucleic acid sequence; c. a second unique molecular identifier (UMI2) disposed 3′ of the sample nucleic acid that comprises at least 1 bp and a random nucleic acid sequence; d. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; e. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; f. 
optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; g. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and h. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1K. a single-stranded nucleic acid probe configured to prepare a sample nucleic acid for targeted next generation sequencing (NGS), the probe comprising: a. uracil; b. a first homology region (HR1) that comprises at least about a 100 bp homology sequence, optionally at least about a 200 bp homology sequence, that is substantially homologous to a predetermined region within a target DNA, cDNA, and/or RNA molecule; c. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 2 bp and a random nucleic acid sequence; d. a second unique molecular identifier (UMI2) disposed 3′ of, and optionally flanking, HR1 that comprises at least 2 bp and a random nucleic acid sequence; e. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; f. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; and g. 
a modified nucleotide at the 5′ or 3′ end that facilitates binding to a molecule or particle that is capable of separating the probes from a hybridization mixture; 1L. a double-stranded nucleic acid probe/target complex composed of a sample nucleic acid for targeted next generation sequencing (NGS), the probe/target complex comprising: a. a first homology region (HR1) that comprises at least about a 100 bp homology sequence, optionally at least about a 200 bp homology sequence, that is hybridized to a target sequence, or to a sequence with substantial homology to a predetermined target region, within a target DNA, cDNA, and/or RNA molecule; b. a first double-stranded unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 2 bp and a random nucleic acid sequence; c. a second double-stranded unique molecular identifier (UMI2) disposed 3′ of, and optionally flanking, HR1 that comprises at least 2 bp and a random nucleic acid sequence; d. a first double-stranded Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; and e. a second double-stranded Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; 1M. a double-stranded DNA (dsDNA) Dual Adapter molecule configured to prepare a sample nucleic acid library for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a first homology region (HR1) that comprises at least about a 10 bp first target homology sequence that is substantially homologous to a predetermined region within a first target DNA, cDNA, and/or RNA molecule; b. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 1 bp and a random nucleic acid sequence; c. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; d. 
a second Adapter sequence (AS2), wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; e. optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; f. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and g. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1N. a double-stranded DNA (dsDNA) Dual Adapter molecule configured to prepare a sample nucleic acid library for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a 5′- or 3′-protruding homology region (HR1) that comprises at least about a 10 nucleotide first target homology sequence that is substantially homologous to a region located 3′ or 5′ to a predetermined region within a first target DNA, cDNA, and/or RNA molecule; b. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 1 bp and a random nucleic acid sequence; c. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; d. a second Adapter sequence (AS2), wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; e. 
optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; f. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and g. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1O. a double-stranded DNA (dsDNA) Dual Adapter molecule composed of a sample nucleic acid for next generation sequencing (NGS), the Dual Adapter molecule comprising: a. a target insertion region that corresponds to a region located in a predetermined region within a target DNA, cDNA, and/or RNA molecule; b. an unknown insertion region fused to the known target sequence; c. a first unique molecular identifier (UMI1) disposed 5′ of the target insertion region that comprises at least 1 bp and a random nucleic acid sequence; d. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; e. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, the unknown insertion region fused to the known target sequence, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers; f. 
optionally a first tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 5′ of, and optionally flanking, AS1; g. optionally a second tandem restriction or nicking enzyme sequence positioned such that it forms a stem-loop or hairpin structure and can be cleaved or nicked by the corresponding enzyme when a single strand is generated by a DNA polymerase during isothermal amplification or rolling circle amplification and disposed 3′ of, and optionally flanking, AS2; and h. optionally at least one of the following: (i) a gene encoding a selectable marker, optionally an antibiotic resistance marker, and (ii) an origin of replication; 1P. an adapter pair, optionally a pair of Y-shaped DNA adapters, configured to prepare a nucleic acid sample for next generation sequencing (NGS), the adapter pair comprising: a. a first homology region (HR1) that comprises at least about a 10 bp first target homology sequence that is substantially homologous to a predetermined region within a first target DNA, cDNA, and/or RNA molecule; b. a first unique molecular identifier (UMI1) disposed 5′ of, and optionally flanking, HR1 that comprises at least 1 bp and a random nucleic acid sequence; c. a second homology region (HR2) that comprises at least about a 10 bp second target homology sequence that is substantially homologous to a predetermined region within the first target DNA, cDNA, and/or RNA molecule; d. a second unique molecular identifier (UMI2) disposed 3′ of, and optionally flanking, HR2 that comprises at least 1 bp and a random nucleic acid sequence; and e. Adapter sequences that facilitate PCR amplification and sequencing on an NGS sequencer; 1Q. 
a dsDNA Adapter molecule composed of sample nucleic acid for next generation sequencing (NGS), the Adapter molecule comprising: a. a first target insertion region that corresponds to a region located at or 5′ to a predetermined region within a first target DNA, cDNA, and/or RNA molecule; b. a second target insertion region that corresponds to a region located at or 3′ to a predetermined region within the first target DNA, cDNA, and/or RNA molecule; c. the predetermined region of the first target nucleic acid, wherein the first and second target insertion regions flank the 5′ and 3′ termini of the predetermined region, respectively; d. a first unique molecular identifier (UMI1) disposed 5′ of the first target insertion region that comprises at least 1 bp and a random nucleic acid sequence; e. a second unique molecular identifier (UMI2) disposed 3′ of the second target insertion region that comprises at least 1 bp and a random nucleic acid sequence; f. a first Adapter sequence (AS1) disposed 5′ of, and optionally flanking, UMI1; and g. a second Adapter sequence (AS2) disposed 3′ of, and optionally flanking, UMI2, wherein the first and second Adapter sequences comprise different nucleotide sequences and bind different primers.
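Claim 1 fixes only the relative 5′/3′ order of the elements. One reading, assumed here for illustration, is that a linear Dual Adapter of alternatives 1A/1B carries HR2 at one end and HR1 at the other (HR2-UMI2-AS2-[backbone]-AS1-UMI1-HR1 on the top strand). Joining both ends to a target flanked by the HR1- and HR2-homologous sequences then yields the circular arrangement of alternative 1C. The helper names, toy sequences, and the string model of homology-directed joining are hypothetical; real assembly depends on enzymology that plain strings do not capture.

```python
import random

BASES = "ACGT"


def random_umi(length, rng):
    """Draw a random UMI; claim 1 requires at least 1 bp."""
    if length < 1:
        raise ValueError("UMI must be at least 1 bp")
    return "".join(rng.choice(BASES) for _ in range(length))


def linear_dual_adapter(hr1, hr2, as1, as2, umi1, umi2, backbone=""):
    """Top strand (5'->3') of a linear Dual Adapter, under the assumed layout
    HR2-UMI2-AS2-[backbone]-AS1-UMI1-HR1.

    UMI2 is 3' of HR2, AS2 is 3' of UMI2, UMI1 is 5' of HR1, AS1 is 5' of UMI1.
    `backbone` stands in for optional elements such as a marker or origin.
    """
    return hr2 + umi2 + as2 + backbone + as1 + umi1 + hr1


def expected_circle(hr1, insert, hr2, as1, as2, umi1, umi2, backbone=""):
    """Circular product read from AS1: AS1-UMI1-HR1-insert-HR2-UMI2-AS2-[backbone]."""
    return as1 + umi1 + hr1 + insert + hr2 + umi2 + as2 + backbone


def circularize(adapter, target, hr1, hr2):
    """Join the adapter ends to a target that is flanked by HR1 ... HR2 homology.

    The shared homology is kept once (supplied by the target), and the circle
    is returned as a linear string that starts at HR1.
    """
    if not (adapter.startswith(hr2) and adapter.endswith(hr1)):
        raise ValueError("adapter ends do not carry HR2 ... HR1")
    if not (target.startswith(hr1) and target.endswith(hr2)):
        raise ValueError("target is not flanked by HR1 ... HR2 homology")
    core = adapter[len(hr2):len(adapter) - len(hr1)]  # UMI2 ... UMI1
    return target + core
```

Because the product is circular, the AS1-anchored layout returned by `expected_circle` should match the string returned by `circularize` up to rotation.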
2. A library comprising a plurality of different engineered nucleic acid molecule species, each comprised of a different linear Dual Adapter molecule species of claim 1, wherein each Dual Adapter molecule species comprises a different nucleic acid, wherein optionally each Dual Adapter molecule species further comprises an HR1:HR2 pair that comprises first and second target homology sequences that differ from the first and second target homology sequences of the other HR1:HR2 pairs, wherein optionally the HR1:HR2 pair of each Dual Adapter molecule species targets a different predetermined region within a population of DNA, cDNA, and/or RNA molecules, and wherein optionally each Dual Adapter molecule species comprises unique random UMI1 and UMI2 sequences and a unique UMI1-UMI2 pair.
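Claim 2 relies on library members carrying unique random UMI1 and UMI2 sequences and UMI1-UMI2 pairs. A back-of-the-envelope way to see why a pair of UMIs helps, assuming every UMI position is drawn uniformly and independently from four bases, is to compare the size of the pair space with the birthday-problem chance that two tagged molecules receive the same label. The function names are hypothetical, and because real oligonucleotide synthesis is not perfectly uniform, these figures likely underestimate the true collision rate.

```python
import math


def umi_pair_space(len_umi1, len_umi2):
    """Number of distinct UMI1-UMI2 pairs when every position is one of 4 bases."""
    return 4 ** (len_umi1 + len_umi2)


def collision_probability(n_molecules, space):
    """Approximate chance that at least two molecules draw the same label.

    Birthday-problem approximation: 1 - exp(-n(n - 1) / (2 * space)),
    assuming labels are drawn uniformly and independently.
    """
    return 1.0 - math.exp(-n_molecules * (n_molecules - 1) / (2.0 * space))
```

With two 6-nt UMIs the pair space is 4^12 = 16,777,216, and 1,000 tagged molecules have roughly a 3% chance of any shared pair; a single 6-nt UMI (4,096 values) would almost certainly produce collisions at the same input.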
3. An engineered nucleic acid molecule of claim 1 that is a Dual Adapter molecule, wherein single-stranded regions at the ends of the Dual Adapter molecules are generated using an enzyme with exonuclease activity, optionally in the HR1 and HR2 regions.
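Claim 3 states only that an enzyme with exonuclease activity exposes single-stranded regions at the ends of the Dual Adapter molecules. The sketch below assumes a 5′→3′ exonuclease (the direction used, for example, by T5 exonuclease in Gibson-style assembly), which leaves 3′ single-stranded tails, and checks whether a resected adapter end can anneal to a resected target end. `resect_5prime` and `can_anneal` are hypothetical helpers, the sequences are toy values, and partial resection, mismatches, and melting behaviour are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def resect_5prime(top, n):
    """Model a 5'->3' exonuclease removing n nt from each 5' end of a duplex.

    `top` is the top strand written 5'->3'; the bottom strand is its reverse
    complement. Returns (left_tail, right_tail), each written 5'->3':
    the left end keeps a 3' tail of the bottom strand and the right end
    keeps a 3' tail of the top strand.
    """
    if n < 1 or 2 * n > len(top):
        raise ValueError("n must be >= 1 and leave the two tails non-overlapping")
    left_tail = revcomp(top[:n])  # bottom strand exposed at the left end
    right_tail = top[-n:]         # top strand exposed at the right end
    return left_tail, right_tail


def can_anneal(a, b):
    """Two single strands (both 5'->3') pair fully if one is the reverse complement of the other."""
    return a == revcomp(b)
```

In this model, an adapter whose right end carries HR1 anneals to a target whose left end carries the same HR1 sequence, because the exposed tails come from opposite strands and are reverse complements of each other.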
4. An engineered nucleic acid molecule of claim 1 that is a Dual Adapter molecule, wherein the HR1 and HR2 pair targets a nucleic acid region of interest selected from the group consisting of an exon, intron, cDNA, RNA, and an amplicon, optionally a PCR amplicon.
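For a region of interest such as an exon, claim 1 describes HR1 and HR2 as homologous to sequence located at or 5′, and at or 3′, of the predetermined region. Assuming a reference top-strand sequence and 0-based, end-exclusive coordinates for the region, a minimal way to pick arms that directly abut the region is shown below. `design_homology_arms` is a hypothetical helper, the 10-nt default mirrors the "at least about 10" language of claim 1, and practical design rules (GC content, repeats, off-target homology) are deliberately left out.

```python
def design_homology_arms(reference, start, end, arm_len=10):
    """Return (HR1, HR2) flanking the predetermined region reference[start:end].

    HR1 is the arm_len bases immediately 5' of the region and HR2 the
    arm_len bases immediately 3' of it, both on the reference top strand.
    """
    if not 0 <= start < end <= len(reference):
        raise ValueError("invalid region coordinates")
    if start - arm_len < 0 or end + arm_len > len(reference):
        raise ValueError("not enough flanking sequence for the requested arm length")
    hr1 = reference[start - arm_len:start]
    hr2 = reference[end:end + arm_len]
    return hr1, hr2
```

For a reference of ten `A`s, the region `GATTACA`, and ten `C`s, the call with `start=10, end=17` returns ten `A`s as HR1 and ten `C`s as HR2.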
5. A reaction mixture selected from the group consisting of: 5A. a reaction mixture comprising engineered nucleic acid molecules of claim 1 that are Dual Adapter molecules that comprise single-stranded ends, a nucleic acid sample that comprises a population of target nucleic acids (optionally DNA, cDNA and/or RNA molecules) that comprise single-stranded ends, a DNA polymerase activity, and a DNA ligase activity, wherein the reaction mixture optionally is incubated under conditions that facilitate hybridization between single-stranded complementary regions of the Dual Adapter molecules and target nucleic acids, removal of noncomplementary single-stranded ends, fill-in of internal single-stranded regions by the DNA polymerase activity, and formation of phosphodiester bonds between the 5′-phosphate and 3′-hydroxyl of two adjacent DNA strands by the DNA ligase activity, optionally Taq DNA ligase; 5B. a reaction mixture comprising engineered nucleic acid molecules of claim 2 that are Dual Adapter molecules, an exonuclease activity, a DNA polymerase activity, a DNA ligase activity, and a population of target DNA, cDNA, and/or RNA molecules, wherein the reaction mixture optionally is incubated under conditions that facilitate exonuclease binding of dsDNA and removal of nucleotides to generate single-stranded regions in the target nucleic acid molecules and Dual Adapter molecules, optionally in the HR1 and HR2 regions of the dsDNA molecules, wherein the reaction mixture is incubated under conditions that facilitate hybridization of single-stranded complementary regions between the Dual Adapter molecules and sample nucleic acid, removal of noncomplementary single-stranded ends, fill-in of internal single-stranded regions by the DNA polymerase activity, and formation of phosphodiester bonds between the 5′-phosphate and the 3′-hydroxyl of two adjacent DNA strands by the DNA ligase activity, optionally Taq DNA ligase;
5C. a reaction mixture comprising engineered nucleic acid molecules of claim 1B that are Dual Adapter molecules, a DNA polymerase activity, a DNA ligase activity, and a population of target nucleic acids (optionally DNA, cDNA, and/or RNA molecules), wherein the reaction mixture is incubated under conditions that facilitate hybridization between single-stranded complementary regions of the Dual Adapter molecules and target nucleic acids, removal of noncomplementary single-stranded ends, fill-in of internal single-stranded regions by the DNA polymerase activity, and formation of phosphodiester bonds between the 5′-phosphate and 3′-hydroxyl of two adjacent DNA strands by the DNA ligase activity, optionally Taq DNA ligase; 5D. a reaction mixture comprising engineered nucleic acid molecules of claim 1C, amplification primers, optionally PCR primers, adapted to hybridize to AS1 and AS2, and reagents to amplify the nucleic acid region bounded by AS1 and AS2 to produce amplification products adapted for next generation sequencing; 5E. a reaction mixture comprising a Dual Adapter UMI library according to claim 1 combined with fragmented double-stranded sample genomic DNA, cDNA, or amplicon, optionally a PCR amplicon, containing a T-overhang, an A-overhang, a CG overhang, a blunt end, or any other ligatable nucleic acid sequence, and a ligase activity, wherein the reaction mixture is incubated under conditions that facilitate ligation of the sample nucleic acid to the Dual Adapter UMI library;
5F. a reaction mixture comprising a Dual Adapter molecule library of claim 1, amplification primers, optionally PCR primers, adapted to hybridize to AS1 and AS2, and reagents to amplify the nucleic acid region bounded by AS1 and AS2 to produce amplification products adapted for next generation sequencing, wherein optionally PCR primers that bind partial sequences of AS1 and AS2 are used for a limited number of cycles, wherein the PCR products are split into two reactions and amplified using a first pair of primers for reaction 1 that links AS1 to sequence read 1 and AS2 to sequence read 2, and a second pair of primers for reaction 2 that links AS1 to sequence read 2 and AS2 to sequence read 1, and wherein the reaction mixture optionally contains uracil-DNA glycosylase (UDG).
6. A reaction mixture of claim 5 further comprising an exonuclease activity and a population of target nucleic acids (optionally DNA, cDNA and/or RNA molecules), wherein single-stranded regions at the ends of the molecules are generated in the target nucleic acid.
7. A library comprising a plurality of engineered nucleic acid molecules of claim 1 and different target nucleic acid molecules.
8. A recombinant microorganism that comprises an engineered nucleic acid molecule species of claim 1.
9. A population of microorganisms that have been transformed with the library of claim 7, optionally wherein the population of microorganisms transformed with the library is archived for storage, optionally as a glycerol stock stored at reduced temperature, optionally at −20 to −80 degrees Celsius or colder.
10. A nucleic acid amplification reaction, comprising exposing a reaction mixture of claim 5 to amplification conditions to produce amplification products, optionally denaturing the amplification products into sense and antisense strands, optionally separating the sense and antisense strands, and then optionally determining the nucleotide sequence of the separated sense and/or antisense strand(s), wherein optionally the sense strands are selectively captured and amplified on a flowcell, bead, or other surface coated with oligonucleotides that comprise an antisense sequence to the primer region of the sense strand, wherein sequence data optionally is generated only from molecules that are descendants of sense strand molecules, wherein optionally the antisense strands are selectively captured and amplified on a flowcell, bead, or other surface coated with oligonucleotides that comprise a sense sequence to the primer region of the antisense sequence, wherein sequence data optionally is generated only from molecules that are descendants of antisense strand molecules.
11. A sequence analysis method for a target nucleic acid cloned into an engineered nucleic acid molecule of claim 1, which method comprises generating four datasets per target sequence to identify errors, the method comprising: a. generating a forward/sense single strand consensus sequence (SSCS) dataset for each target sequence cloned in the forward direction and sense strand captured by grouping all sequences with the same UMI1 and UMI2 sequence pairs and discounting variants that are not present in at least about 50%, optionally at least about 90%, of the reads; b. generating a forward/antisense SSCS dataset for each target sequence cloned in the forward direction and antisense strand captured by grouping all sequences with the same UMI1 and UMI2 sequence pairs and discounting variants that are not present in at least about 50%, optionally at least about 90%, of the reads; c. generating a reverse/sense SSCS dataset for each target sequence cloned in the reverse direction and sense strand captured by grouping all sequences with the same UMI1 and UMI2 sequence pairs and discounting variants that are not present in at least about 50%, optionally at least about 90%, of the reads; d. generating a reverse/antisense SSCS dataset for each target sequence cloned in the reverse direction and antisense strand captured by grouping all sequences with the same UMI1 and UMI2 sequence pairs and discounting variants that are not present in at least about 50%, optionally at least about 90%, of the reads; e. generating a duplex consensus sequence (DCS) by grouping SSCS datasets according to complementing UMI pairs from the sense and antisense datasets and discounting variants that are only found in one dataset; and f. generating a quadplex consensus sequence (QCS) by comparing DCS datasets from the same target in the forward and reverse directions and discounting variants not observed in both the forward and reverse DCSs.
12. A Dual Adapter UMI library comprising a plurality of different engineered nucleic acid molecule species each according to claim 1, wherein each Dual Adapter species comprises unique UMI1 and UMI2 sequences and UMI1-UMI2 pairs, wherein optionally the nicking endonuclease recognition sequences of the Dual Adapters are nicked by a sequence-specific single strand nicking endonuclease, optionally a sequence-specific single strand nicking endonuclease selected from the group consisting of Nt.BsmAI, Nt.BspQI, Nt.CviPII, Nt.BstNBI, Nb.BsrDI, Nb.BtsI, Nt.AlwI, Nb.BbvCI, Nt.BbvCI, Nb.BsmI, and Nb.BssSI, and wherein the library optionally retains TLG1 and TLG2.
13. The library of claim 12, wherein the nicked and purified Dual Adapter library is combined with A-tailed or blunt fragmented double-stranded sample genomic DNA, cDNA, or amplicon, optionally a PCR amplicon, and a sequence-specific topoisomerase enzyme, optionally vaccinia topoisomerase I, comprising: a. a topoisomerase enzyme recognizes and covalently binds to the TOPO1 site (5′-CCCTT-3′) at the last T base, causing TLG1 to dissociate and trapping topoisomerase at the 3′ end of a Dual Adapter UMI library molecule; b. a topoisomerase enzyme recognizes and covalently binds to the TOPO2 site (5′-CCCTT-3′) at the last T base, causing TLG2 to dissociate and trapping topoisomerase at the 5′ end of a Dual Adapter UMI library molecule; and c. a bound topoisomerase enzyme covalently attaches individual A-tailed or blunt fragmented sample nucleic acid (genomic DNA, cDNA, RNA, or amplicon, optionally a PCR amplicon) to individual Dual Adapter UMI library molecules, whereby the 5′ end of the sample nucleic acid attaches to the 3′ end of the Dual Adapter UMI library molecule and the 3′ end of the sample nucleic acid attaches to the 5′ end of the Dual Adapter UMI library molecule, forming a circular dsDNA molecule; wherein the method optionally further comprises a ligase enzyme.
14. A probe-based target enrichment method that captures both strands of the Dual Adapter molecule library of claim 2, wherein the target sequences to be enriched from the library comprise a sequence that lies in a target region of interest, wherein the method comprises: a) providing: i) one or more nucleoprotein filaments, wherein the nucleoprotein filament comprises a single-stranded invasion probe, wherein the invasion probe has a region of substantial complementarity to one strand of a double-stranded target sequence; and ii) one or more recombinase enzymes, optionally RecA; b) forming a complex between the invasion probe and a complementary portion of the target sequence, wherein complex formation is mediated by the recombinase(s); and c) separating the complexes from the remaining sequencing library, thereby enriching the target sequences and providing a target-enriched sequence library.
15. A kit selected from the group consisting of: 15A. a kit for target enrichment, comprising (i) a library comprising a plurality of different engineered nucleic acid molecule species of Dual Adapter molecules of claim 1 with either single-stranded or double-stranded regions at the ends of the molecules, (ii) an enzyme with exonuclease activity, (iii) an enzyme with DNA polymerase activity, (iv) an enzyme with DNA ligase activity, (v) a crowding agent, (vi) optionally, hybridization and wash buffers, and optionally one or more of (vii) streptavidin beads and (viii) instructions for use; 15B. a kit comprising a Dual Adapter UMI library of claim 15A and one or more of the following: a. instructions for use; b. a topoisomerase enzyme; c. a DNA ligase enzyme; d. one or more nicking enzymes; e. one or more restriction enzymes; f. primers that bind to AS1 and/or AS2; g. biotinylated primers that bind to AS1 and/or AS2; h. one or more recombinase enzymes, and optionally a non-hydrolyzable co-factor for the recombinase enzyme(s); i. a plurality of different invasion probes that differ in their region of complementarity to a target region of interest; j. a solid support suitable for capturing synaptic complexes formed between the invasion probes and target sequences; k. a strand-displacing DNA polymerase; l. a buffer suitable for isothermal amplification; and m. optionally, at least one or more of the following: (i) an enzyme to introduce negative supercoiling, optionally DNA gyrase, and (ii) a chemical to introduce negative supercoiling, optionally ethidium bromide; 15C. a kit comprising: a. a single-stranded probe library; b. one or more enzymes with exonuclease activity; c. one or more enzymes with DNA polymerase activity; d. a DNA ligase, optionally Taq DNA ligase; e. a uracil-DNA glycosylase (UDG); f. a plurality of primers; g. PCR reagents; h. streptavidin beads; i. hybridization buffers; and j. wash buffer;
15D. a kit for target enrichment, comprising: a. a library of probes with HR Adapters; b. an enzyme with exonuclease activity; c. a DNA polymerase; d. a DNA ligase, optionally Taq DNA ligase; e. a buffer, optionally a PEG buffer; and f. optionally, hybridization and wash buffers, and streptavidin beads; 15E. a kit to identify gene fusions, comprising: a. a library of probes with HR domains with homology regions complementary to several gene targets, optionally fusion Dual Adapter probes; b. a reverse transcriptase; c. an RNase H; d. an enzyme with exonuclease activity; e. a polynucleotide kinase; f. a DNA polymerase; g. a DNA ligase; and h. a buffer, optionally a PEG buffer; 15F. a kit for using Dual Adapter probes or HR Adapters on a droplet-based or microfluidics device, comprising: a. individual Dual Adapter probes or HR Adapter pairs encapsulated into droplets or other formats compatible with droplet-based or microfluidics devices; and b. enzymes and buffers required for homology-based Dual Adapter and HR Adapter attachment encapsulated into droplets or other formats compatible with droplet-based or microfluidics devices; and 15G. a kit for archiving NGS libraries, comprising: a. Dual Adapter probes with an antibiotic resistance gene, origin of replication, and HR domains that are complementary to NGS adapter sequences; b. an enzyme with exonuclease activity; c. a DNA polymerase; d. a DNA ligase, optionally a Taq DNA ligase; e. a buffer, optionally a PEG buffer; f. E. coli cells suitable for transformation; and g. optionally, reagents for performing recombinase-mediated target enrichment, optionally RecA and/or UvsX protein, ATPγS, DNA gyrase, Proteinase K, streptavidin beads, and buffers.
16. A kit of claim 15 further comprising reagents to prepare DNA molecules ready for loading onto an NGS flowcell.
17. A complete system for nucleic acid analysis, comprising: a. a kit according to claim 15; b. optionally, a droplet-based or microfluidics device; c. a sequencing instrument, optionally an NGS sequencer; and d. a bioinformatics pipeline for data analysis.
Description
BRIEF DESCRIPTION OF DRAWINGS
DETAILED DESCRIPTION OF INVENTION
[0054] The present inventions describe methods to improve unique molecular identifier (UMI) synthesis and incorporation into NGS libraries, with some methods also facilitating simultaneous target enrichment. The first invention combines engineered NGS adapter attachment and target capture into one step by using a double-stranded molecule containing two probe sequences (flanked by UMIs and NGS adapter sequences) that hybridize to the 5′ and 3′ regions of a target sequence. Upon hybridization, the target sequence is covalently attached to the probe sequences, forming a circular double-stranded DNA molecule.
[0055] In one embodiment of the current invention, these seamless cloning technologies are leveraged. Rather than assembling multiple linear DNA fragments or a single target gene into a vector, however, the present invention describes the creation of engineered dsDNA Dual Adapter molecules that capture target sequences (e.g., exons, hotspot regions in genes, intronic regions) and simultaneously attach UMI and NGS adapter sequences. The captured nucleic acids are then amplified by PCR or isothermal amplification using primers that anneal to the NGS adapter sequences, and the amplified products are loaded directly onto an NGS sequencer. Dual Adapter Probe (DAP) is a novel technology that combines target enrichment and NGS library preparation into one step. It increases the efficiency of double-stranded UMI incorporation because hybridization of 20-25 bp complementary regions between DAP and target molecules is a more efficient process than ligation-based T/A attachment of adapters. DAPs also capture both strands of a target and can be designed to capture targets in both orientations. This feature is used in a novel error suppression strategy that should reduce artifacts and increase overall accuracy compared to current methods.
[0056] DAPs have several functional groups. First, DAPs contain a region of homology sequence (Homology Region 1 = HR1) located at or 5′ (upstream) to the desired target DNA and a second region of homology (Homology Region 2 = HR2) located at or 3′ (downstream) to the desired target DNA. The HR1 and HR2 regions are flanked by UMIs, which consist of at least 2 base pairs of random nucleic acid sequence. The UMIs are in turn flanked by NGS adapter sequences 1 and 2 (AS1 and AS2), which bind primers for existing NGS platforms. For example, AS1 and AS2 may contain the partial P5 and P7 sequences used on Illumina NGS platforms. The primers used to amplify the captured DNA contain the full-length P5 and P7 sequences, which allow attachment and amplification on the flowcell. DAP molecules may also include all components necessary for propagation in bacteria (e.g., E. coli) or another microorganism under antibiotic selection (e.g., ampicillin) or other selection methods.
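Because each UMI is random sequence, UMI length limits how many distinct UMI1-UMI2 pairs a Dual Adapter library can carry. The sketch below illustrates this. It counts the possible pairs and gives a birthday-problem estimate of how many pairs of captured molecules would share a UMI pair by chance. It is a rough, hypothetical illustration, not part of the disclosed method. The function names are invented here, and the estimate assumes every UMI base is drawn uniformly and independently from A/C/G/T.

```python
import math


def umi_pair_space(umi1_len: int, umi2_len: int) -> int:
    """Number of distinct UMI1-UMI2 combinations for fully random A/C/G/T bases."""
    return 4 ** (umi1_len + umi2_len)


def expected_shared_pairs(n_molecules: int, space: int) -> float:
    """Expected number of molecule pairs that draw the same UMI1-UMI2 pair.

    Each of the C(n, 2) molecule pairs collides with probability 1/space,
    assuming uniform, independent UMI draws (birthday-problem expectation).
    """
    return math.comb(n_molecules, 2) / space


if __name__ == "__main__":
    # Hypothetical per-end UMI lengths: the stated 2 bp minimum and a longer 8 bp design.
    for length in (2, 8):
        space = umi_pair_space(length, length)
        print(length, space, expected_shared_pairs(10_000, space))
```

With 2-bp UMIs at each end, only 256 pairs exist, so a sample of 10,000 molecules would reuse pairs extensively. With 8-bp UMIs at each end, there are about 4.3 billion pairs, and chance sharing becomes rare.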
[0057] The workflow for target enrichment using DAPs is similar to methods used for seamless cloning. First, the DAP molecules are treated with exonuclease or DNA polymerase (without nucleotides) to generate single-stranded regions. For example, T5 exonuclease binds to dsDNA and removes nucleotides in a 5′ to 3′ direction, leaving single-stranded 3′ regions of DNA. Likewise, the sample DNA is treated with exonuclease or DNA polymerase (without nucleotides) to generate single-stranded regions. Next, the DNA molecules with single-stranded ends are incubated under conditions that facilitate hybridization of the complementary single-stranded regions.
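The sketch below is a simplified, string-level model of the capture step. It checks whether a target fragment carries both homology regions of a linear DAP and, if so, builds the circular product. It is a hypothetical sketch, not the disclosed protocol, and it rests on three assumptions:

- The DAP top strand is laid out 5′ to 3′ as HR2, UMI2, AS2, backbone, AS1, UMI1, HR1, so that closing the circle restores the order AS1-UMI1-HR1-target-HR2-UMI2-AS2.
- Sequence outside HR1 and HR2 on the target is treated as non-complementary flaps that are removed.
- Both strands are tried, because a target can be captured in either orientation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def capture_circle(dap: str, target: str, hr_len: int):
    """Model homology-directed capture of a target fragment by a linear DAP.

    Assumed (hypothetical) DAP top-strand layout, 5'->3':
        HR2 | UMI2 | AS2 ... AS1 | UMI1 | HR1
    i.e. HR2 = first hr_len bases, HR1 = last hr_len bases.
    Returns (circular_sequence, orientation), or None if the target lacks
    HR1 followed by HR2 on either strand.
    """
    hr1, hr2 = dap[-hr_len:], dap[:hr_len]
    for orientation, strand in (("forward", target), ("reverse", revcomp(target))):
        start = strand.find(hr1)
        if start < 0:
            continue
        end = strand.find(hr2, start + hr_len)
        if end < 0:
            continue
        # Bases between the homology regions become the insert; bases outside
        # them are treated as removed non-complementary flaps.
        insert = strand[start + hr_len:end]
        # Circular product written from the DAP start: ...HR1 + insert, then
        # the sequence wraps back to HR2 at the beginning of the DAP.
        return dap + insert, orientation
    return None


if __name__ == "__main__":
    dap = "ACGTAC" + "GGGGGGGG" + "TGCATG"  # toy HR2 + placeholder + HR1
    target = "AAAA" + "TGCATG" + "CTCTCT" + "ACGTAC" + "TTTT"
    print(capture_circle(dap, target, 6))
    print(capture_circle(dap, revcomp(target), 6))
```

Trying the reverse complement of the target mirrors the point in [0055] that DAPs capture both strands and can capture targets in either orientation. The model ignores mismatch tolerance and the enzymology of gap filling and ligation.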
[0058] The next step involves either PCR or isothermal amplification. For PCR amplification, target DNA is amplified using primers that bind AS1 and AS2. The PCR products amplified from the DAP molecules contain all sequences necessary for NGS and can be loaded directly onto the sequencer. All reads will be directional, since target enrichment is mediated by two different homology regions flanking the region of interest. Probes can also be designed to capture regions of interest in both orientations. For isothermal amplification, rolling circle amplification (RCA) or recombinase polymerase amplification (RPA) may be used. For RCA, only the AS1 primer is used, and only the resulting single-stranded molecule is loaded on the sequencer. Data can be generated from the forward and reverse strands if the target is captured in both orientations. RPA is similar to PCR in that it amplifies both strands, but it does not require temperature cycling and generates product at low temperature.
[0059] In another embodiment, DAPs can be designed to target only one region of a nucleic acid molecule by including only one HR sequence in the probe. The HR1 target region is captured through hybridization, and the other end of the target molecule is ligated to the probe through blunt-end ligation. This strategy is ideal for detecting gene fusions in cases where only one of the fusion partners is known. The known fusion partner is captured by hybridization of the HR1 probe sequence, and the other end of the nucleic acid molecule (DNA, cDNA, RNA, etc.) is linked to the probe through ligation. These modified DAP molecules are referred to as Fusion DAPs and include only one HR and UMI sequence.
[0060] A key aspect of the invention is the synthesis method of the DAP molecules, which allows efficient synthesis of double-stranded UMI sequences using PCR. DAP molecules are synthesized using a PCR template molecule consisting of partial or complete NGS adapter sequences (for example, partial or complete P5 and P7 sequences) located at the 5′ and 3′ ends of a linear dsDNA molecule. The 3′ primer is designed to anneal to AS1 and includes both the UMI1 and HR1 sequences as a 5′ tag. Likewise, the 5′ primer is designed to anneal to AS2 and includes both the UMI2 and HR2 sequences as a 5′ tag. The PCR reaction amplifies the entire molecule, including the reverse complement of the HR and UMI sequences.
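The following is a minimal in-silico sketch of the primer-tag synthesis described above. Each primer carries an HR sequence plus a random UMI as a 5′ tag, followed by a 3′ region that anneals to one adapter. The product of two such primers therefore carries both UMIs on both strands, which is what makes the UMIs double-stranded. The sketch assumes the following, all labeled hypothetical:

- The template is written as AS1 + spacer + AS2 for illustration only.
- The toy sequences are invented for the example.
- The names `tailed_primer` and `pcr_product` are made up here.
- Annealing is modeled as exact matching of the primer's 3′ end.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primer(hr: str, umi_len: int, anneal: str, rng: random.Random) -> str:
    """Primer = 5' tag (HR + random UMI) followed by the 3' adapter-annealing region."""
    umi = "".join(rng.choice("ACGT") for _ in range(umi_len))
    return hr + umi + anneal


def pcr_product(template: str, fwd: str, rev: str, anneal_len: int) -> str:
    """Top strand of the product made by two 5'-tailed primers.

    The 3' `anneal_len` bases of `fwd` must match the template top strand;
    the 3' `anneal_len` bases of `rev` must match the template bottom strand.
    """
    i = template.find(fwd[-anneal_len:])
    j = template.find(revcomp(rev[-anneal_len:]))
    if i < 0 or j < i + anneal_len:
        raise ValueError("primers do not bracket the template")
    # Tails are copied into the product; the reverse primer appears as its reverse complement.
    return fwd + template[i + anneal_len:j] + revcomp(rev)


if __name__ == "__main__":
    rng = random.Random(0)
    as1, spacer, as2 = "AATGATACGG", "TTTTTTTT", "ATCTCGTATG"  # toy sequences
    template = as1 + spacer + as2
    fwd = tailed_primer("GCGCGC", 4, as1, rng)            # HR1 + UMI1 + AS1
    rev = tailed_primer("CATCAT", 4, revcomp(as2), rng)   # HR2 + UMI2 + anneals to AS2
    print(pcr_product(template, fwd, rev, 10))
```

In a real reaction, each primer molecule carries a different random UMI. Repeated cycles therefore yield a library of DAP molecules with diverse UMI1-UMI2 pairs rather than one sequence.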
[0061] The invention also includes a sequencing strategy and sequencing instrument that generate data from the sense and antisense strands independently. In current NGS platforms (Illumina and Ion Torrent), only one strand is captured on a flowcell or bead by hybridization to an oligonucleotide that is complementary to adapter sequences on the isolated DNA. This strategy instead uses a sequencing platform with two separate oligonucleotides of unique sequence. One is complementary to the sense strand of the adapter and the other is complementary to the antisense strand, so that both the sense and antisense molecules are captured and sequenced. The incorporation of UMIs, combined with sequencing both strands of the target in both orientations, allows for unprecedented error detection. First, UMI pairs are used to generate single-strand consensus sequences (SSCSs) for data from both the sense and antisense strands. This is done by grouping all sequences flanked by the same UMIs and eliminating variants not present in a majority of the sequences. For example, variants that are not present in at least 90% of all sequences with the same UMI pair are removed. This process is conducted for both the sense and antisense datasets. Next, a duplex consensus sequence (DCS) can be generated by comparing the SSCSs of the sense and antisense datasets. This is possible because the UMI sequences in the antisense dataset are complementary to those in the sense dataset. This strategy may increase error detection, because errors generated during bridge amplification and emPCR will be readily identified when each strand undergoes these processes independently. Further error reduction can be performed by comparing datasets from the forward and reverse consensus sequences for targets that were captured in both the forward and reverse orientations (quadplex consensus sequence, QCS).
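The three-level consensus described above (SSCS, then DCS, then QCS) can be sketched as follows. This is a hypothetical simplification, not the disclosed bioinformatics pipeline, and it assumes the following:

- Reads arrive as (UMI1, UMI2, strand, orientation, sequence) tuples.
- All sequences are already aligned, equal in length, and written on the reference sense strand.
- An antisense family is paired with the sense family (u1, u2) when its UMI pair is (revcomp(u2), revcomp(u1)).
- Positions below the SSCS threshold, or disagreeing between strands, become "N".
- The QCS step keeps only variants called in both the forward and reverse DCS sets.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def consensus(seqs, threshold):
    """Per-position majority base; 'N' where the top base falls below threshold."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= threshold else "N")
    return "".join(out)


def sscs(reads, threshold=0.9):
    """Group reads by (UMI1, UMI2, strand, orientation); one consensus per family."""
    families = defaultdict(list)
    for umi1, umi2, strand, orient, seq in reads:
        families[(umi1, umi2, strand, orient)].append(seq)
    return {key: consensus(seqs, threshold) for key, seqs in families.items()}


def dcs(sscs_map):
    """Pair each sense SSCS with its antisense partner (assumed complementary UMIs)."""
    duplexes = {}
    for (u1, u2, strand, orient), seq in sscs_map.items():
        if strand != "sense":
            continue
        partner = sscs_map.get((revcomp(u2), revcomp(u1), "antisense", orient))
        if partner is None:
            continue
        # Keep a base only where both strands agree.
        duplexes[(u1, u2, orient)] = "".join(
            a if a == b else "N" for a, b in zip(seq, partner))
    return duplexes


def called_variants(seq, reference):
    """(position, base) pairs where a consensus differs from the reference."""
    return {(i, b) for i, (b, r) in enumerate(zip(seq, reference)) if b not in (r, "N")}


def qcs_variants(duplexes, reference):
    """Keep only variants observed in both forward- and reverse-captured DCSs."""
    calls = {"forward": set(), "reverse": set()}
    for (_, _, orient), seq in duplexes.items():
        calls[orient] |= called_variants(seq, reference)
    return calls["forward"] & calls["reverse"]
```

In this sketch, a substitution that appears in only one strand's reads is masked at the DCS step. A variant supported by only one capture orientation is dropped at the QCS step.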
[0062] In another embodiment of the invention, which is referred to as Topoisomerase Dual Adapter technology (TOPO-DAT), topoisomerase is used to attach sample nucleic acid to the dual adapter molecule. TOPO-DAT molecules do not have HR1 and HR2 groups and can be used for capturing random double-stranded nucleic acid (e.g., DNA or cDNA), similar to standard NGS library preparation protocols. The advantages of TOPO-DAT are similar to those of standard DAP technology: 1) adapter attachment efficiency is enhanced, and 2) double-stranded UMIs are efficiently incorporated into the final NGS library.
[0063] When topoisomerase protein binds to dsDNA at the sequence (C/T)CCTT, it cleaves the adjacent phosphodiester bond in a reversible reaction, forming a stable covalent adduct between a tyrosyl residue (Tyr-274) and the 3′ phosphate of the last thymine in the consensus sequence. There is no phosphodiester bond on the non-scissile strand in close proximity to the cleaved scissile strand (and close to the end of the DNA molecule). This allows topoisomerase protein to be trapped on the DNA as the small fragment of dsDNA dissociates, removing the substrate for the reverse reaction. When exposed to sample dsDNA, topoisomerase catalyzes the joining of the two molecules if the 5′ ends of the sample DNA are hydroxylated and single-stranded overhangs (if any) are complementary [23-25].
[0064] Accordingly, TOPO-DAT molecules are engineered to contain the topoisomerase binding site CCCTT in close proximity (10-20 bp) to each end of the dsDNA molecule. In addition, a single-strand nicking enzyme recognition sequence is incorporated into the 10-20 bp region in the complementary strand. It is positioned to generate a single thymine base overhang when the TOPO-DAT molecule is exposed sequentially to a single-strand nicking enzyme and then to topoisomerase [31]. The resulting topoisomerase-charged dual adapter molecule can be linked to sample dsDNA that has an overhanging adenine base (i.e., A-tailed), yielding a library flanked with UMIs and NGS adapter sequences.
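The sketch below checks a candidate TOPO-DAT top-strand sequence against the design rule above: each end must carry one topoisomerase site whose leaving group falls within 10-20 bp. It is a hypothetical helper, not part of the disclosure. Two assumptions follow from the mechanism in the preceding paragraph:

- Topoisomerase cuts after the final T of (C/T)CCTT, so the leaving group is every base 3′ of the site on the same strand.
- The right end is served by a site on the top strand and the left end by a site on the bottom strand (searched here as the reverse complement).

The helper requires exactly one site per strand, so any internal site also fails the check. It does not model the nicking-enzyme sequence.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# (C/T)CCTT; the lookahead lets overlapping matches be reported.
TOPO_SITE = re.compile(r"(?=([CT]CCTT))")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def leaving_group_lengths(strand: str) -> list:
    """Bases 3' of the cleaved T for every (C/T)CCTT site on this strand."""
    return [len(strand) - (m.start() + 5) for m in TOPO_SITE.finditer(strand)]


def check_topo_dat(top: str, min_len: int = 10, max_len: int = 20):
    """Return (right_end, left_end) leaving-group lengths, or None.

    Requires exactly one site per strand (so internal sites fail the check),
    each with a leaving group of min_len..max_len bases.
    """
    right = leaving_group_lengths(top)            # site on the top strand
    left = leaving_group_lengths(revcomp(top))    # site on the bottom strand
    if len(right) != 1 or len(left) != 1:
        return None
    if all(min_len <= n <= max_len for n in (right[0], left[0])):
        return right[0], left[0]
    return None


if __name__ == "__main__":
    # Toy design: 12-bp left leaving group, AAGGG (bottom-strand CCCTT),
    # placeholder core, CCCTT, then a 15-bp right leaving group.
    design = "A" * 12 + "AAGGG" + "ACGTACGTAC" + "CCCTT" + "G" * 15
    print(check_topo_dat(design))
```

A scan over both strands like this one could also flag unwanted (C/T)CCTT sites inside the UMI or adapter regions.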
[0065] In some embodiments, the resulting UMI-tagged library can be subjected to probe-based capture protocols, including recombinase-mediated target enrichment such as RecA-mediated target enrichment [21-22]. The RecA protein is required for DNA repair and homologous recombination in E. coli and binds ssDNA strongly in long clusters to form a nucleoprotein filament. In the presence of ATP, it can simultaneously bind ssDNA and dsDNA. It catalyzes a DNA synapsis reaction between the dsDNA and an ssDNA molecule that has a complementary sequence, forming a triplex. These triple-stranded structures are referred to as displacement loops, or D-loops, and are an intermediate structure that undergoes further recombination in vivo [32-35]. RecA protein can mediate homologous pairing and/or strand exchange between appropriate DNA molecules in in vitro homologous recombination assays [36-37]. The strand exchange reaction can be blocked in vitro by ATPγS, a non-hydrolyzable form of ATP. The resulting triple-stranded hybrid structures are stable when RecA is removed from D-loops formed with closed supercoiled circular dsDNA [35, 38]. This feature can be exploited for target enrichment when NGS libraries are generated using Dual Adapter molecules that form circular DNA molecules.
[0066] In addition to DNA-DNA hybridization, RecA protein can promote RNA-DNA hybridization. For example, RecA protein-coated single-stranded DNA can recognize complementary sequences in naked RNA [39]. Therefore, any recombinase that can promote homologous pairing and/or strand exchange between appropriate DNA molecules, or between DNA and RNA molecules, may be used with TOPO-Dual Adapter technology. Recombinases of the RecA family include RecA in eubacteria, RadA in archaea, Rad51 and Dmc1 in eukarya, and the bacteriophage T4 UvsX protein [40]. RecA-like recombinases have been isolated from many prokaryotes and eukaryotes. Such recombinases include, but are not limited to:

- the wild type E. coli RecA protein [41];
- mutant forms of E. coli RecA protein, such as RecA 803 and RecA 441 [42-43];
- T4 UvsX recombinase [44];
- B. subtilis RecA protein [45];
- U. maydis Red. protein [46];
- T. aquaticus RecA analog protein [47-48]; and
- RecA-like proteins derived from fission yeast, mouse and human [49].

In a preferred embodiment of the present invention, the wild type RecA protein is used as the recombinase, alone or in combination with a second recombinase such as T4 UvsX.
[0067] Previous methods for NGS target enrichment using recombinase protein [22] describe capture of traditional linear dsDNA NGS libraries using 15- to 25-mer oligonucleotides complementary to the displaced sequence in the D-loop to stabilize the structure. In contrast, the current invention generates NGS libraries as closed circular dsDNA that do not require stabilizing oligonucleotides. Stabilizing oligonucleotides may be practical when attempting to capture a small number of genes; however, they are not practical when attempting a large capture such as the human exome. For example, 2.1 million 60-90-mer oligonucleotide probes are required to capture the approximately 20,000 genes (64 Mb) that make up the human exome. Based on the previously described method, a minimum of 2.1 million stabilization probes would therefore be required. This number is likely to increase once the capture is optimized and may negatively affect assay performance or commercial viability.
[0068] An important aspect of RecA-mediated capture is that it occurs rapidly (10-20 min) at low temperature (37 °C) using enzyme-guided hybridization with circular dsDNA. In contrast, traditional oligonucleotide-based target enrichment requires DNA denaturation to generate ssDNA and extended hybridization times (4-16 hrs) at elevated temperatures (65 °C). This is a significant advantage when trying to detect rare variants at low frequency. Newman et al. suggest that background errors (i.e., artifacts) may be introduced into sample DNA by extended hybridization at elevated temperature. The DNA damage is suspected to be caused by oxidation-induced 8-oxo-guanine causing G>T transversions [17-18]. Thus, RecA-mediated target enrichment may generate NGS data with fewer artifacts and increase overall sensitivity. Also, combinations of recombinase-like enzymes may be used in the capture in the event that some enzymes show a bias in sequence binding.
[0069] In another embodiment of the invention, TOPO-DAT molecules (and DAPs) may contain tandem restriction or nicking enzyme sites flanking the NAS sequences, which can form dsDNA hairpins when a single strand is amplified. This allows the option of amplifying TOPO-DAT molecules using isothermal or rolling circle amplification (RCA) following target enrichment. The dsDNA hairpins allow cleavage of the resulting ssDNA concatemers into individual units. These units can be loaded directly on the sequencer when full-length NGS adapter sequences (i.e., full-length P5 and P7 sequences) are incorporated into TOPO-DAT and DAP molecules.
[0070] A key aspect of the invention is the synthesis method of the TOPO-DAT molecules, which allows efficient synthesis of double-stranded UMI sequences using PCR. TOPO-DAT molecules are synthesized using a PCR template molecule consisting of partial or complete NGS adapter sequences (for example, partial or complete P5 and P7 sequences) located at the 5′ and 3′ ends of a linear dsDNA molecule. The 3′ primer is designed to anneal to AS1 and includes, as a 5′ tag, the UMI1 sequence, a topoisomerase binding site, and a leaving group containing a nicking enzyme sequence. Likewise, the 5′ primer is designed to anneal to AS2 and includes, as a 5′ tag, the UMI2 sequence, a topoisomerase binding site, and a leaving group containing a nicking enzyme sequence. The reverse complements of the UMI, topoisomerase binding site, and leaving group sequences are generated during the PCR reaction.
[0071] In another embodiment, the dual adapter molecule is synthesized without topoisomerase sites, and target fragments are attached to the dual adapter molecule using traditional T/A ligation. The molecules are still synthesized using the PCR method described previously to generate double-stranded UMI sequences flanking the T/A cloning site. However, the TOPO and nicking enzyme sequences are replaced by a leaving group containing the recognition site of a type IIS restriction enzyme, such as BmrI. Similar to TOPO-DAT, this molecule is used to capture random nucleic acid sequences for NGS analysis.
[0072] In another embodiment of the invention, two HR sequences (HR1 and HR2) that are homologous to the 5′ and 3′ regions of a target are included on two separate adapter molecules that also contain UMI and NGS sequences. Similar to other embodiments, DNA polymerase is used to synthesize the reverse complement of UMI sequences. Individually, each HR adapter molecule consists of two oligonucleotides of different lengths that contain both non-complementary and complementary regions, wherein the complementary regions are hybridized together.
[0073] Alternatively, UMI NGS adapters with or without HR regions may be created using RCA. First, a single-stranded synthetic oligo is generated that contains:

1) NGS adapter sequence(s), such as the Illumina adapter sequences P5 and/or P7;
2) random nucleotides that serve as UMI sequences, positioned adjacent and 3′ to the NGS adapter sequence;
3) an HR domain (optional);
4) an A base located 3′ to the UMI sequence, or 3′ to the HR domain if one is present; and
5) a recognition site for a type IIS restriction enzyme, such as BmrI, that cleaves outside of its recognition site. The site is located adjacent to the A base, such that the enzyme cleaves at the A and generates an overhanging T base when a dsDNA molecule is generated.

Adapters are generated by the following process:

1) the synthetic ssDNA is circularized using a splint ligation strategy or the enzyme circulase;
2) the splint ligation oligo (or another oligo) is used as a primer for RCA to generate a long ssDNA concatemer molecule containing repeats of the features listed above;
3) the RCA reaction is stopped and the ssDNA is purified;
4) oligonucleotides containing both a non-homologous region and a region complementary to the NGS adapter sequences hybridize to multiple sites along the ssDNA concatemer;
5) a non-strand-displacing DNA polymerase, such as T4 or T7 DNA polymerase (gap-filling enzymes), is used to generate the reverse complement of the UMI and optional HR sequences, the type IIS restriction enzyme sequence, and other sequences located between adjacent hybridized oligos. The polymerase will not displace the neighboring oligo, whose non-homologous region remains single-stranded; and
6) the type IIS enzyme is added to the reaction to cleave at the T/A position, generating individual Y-shaped adapters with double-stranded UMIs and a single T overhang, which can be used in T/A-mediated ligation.
[0074] In another embodiment of the invention, topoisomerase binding sites and UMIs are incorporated into NGS adapters to produce TOPO UMI adapters. Similar to other embodiments, DNA polymerase is used to synthesize the reverse complement of UMI sequences. Adapters are constructed by annealing four oligonucleotides (Oligo 1-4). Oligo 1 may consist of the standard sequence found in Illumina's Universal adapter (NGS Adapter Sequence 1: AS1). Oligo 2 may consist of the standard sequence found in Index Adapters (NGS Adapter Sequence 2: AS2), extended to include a UMI sequence followed by the sequence GGGA. Oligo 3 contains the topoisomerase binding site CCCTT plus additional sequence that is complementary to Oligo 4. Oligo 4 contains sequence that is complementary to Oligo 3 except for the first 4 nucleotides of the topoisomerase sequence (CCCT).
[0075] In another embodiment of the invention, long capture probes that are complementary to target regions are created with flanking sequences that consist of UMIs and NGS adapter sequences. These probes are referred to as capture and adapt (CAAD) probes.
[0076] The present invention also includes a complete system consisting of any of the target enrichment/sample preparation inventions listed above, a sequencing instrument that can sequence both strands, and a bioinformatics pipeline for data analysis. This system may be submitted to the FDA as an in vitro diagnostic (IVD) test for ctDNA, solid tumor DNA/RNA, hematological malignancy DNA/RNA and germline DNA/RNA-based assays. The system may also be used for infectious disease detection and organ transplant monitoring.
[0077] In one embodiment, all aspects of the invention (DAPs, TOPO-DAT/RecA Capture, HR Adapters and CAAD) can be used to design any gene panel for diagnostic use. Targeted gene panels have become standard tools for cancer diagnostics. These gene panels allow simultaneous detection of gene mutations across different cancer types and contain genes whose mutations are associated with a particular FDA-approved, off-label or investigational therapy. For example, a panel may be designed to capture the approximately 2800 hotspot mutations from 50 oncogenes and tumor suppressor genes in the COSMIC database [52]. In this scenario, probes would target the regions known to harbor hotspot mutations, or entire exons that contain the target mutation. Multiple overlapping probes can be designed to capture the entire region. The panel can be used to provide clinically actionable data for solid tumor specimens. For example, if BRAF V600E is detected in a melanoma tumor specimen, a physician can prescribe a BRAF inhibitor such as vemurafenib [53]. In some cases, biopsy specimens cannot be obtained, so ctDNA specimens are the only option; this is very common for lung cancer patients. Because the current invention provides increased DNA capture efficiency and high mutation detection sensitivity through advanced error suppression, the assay can provide clinically relevant data for ctDNA specimens. For example, if the common lung cancer mutation EGFR L858R is detected, the treating physician can prescribe a tyrosine kinase inhibitor such as erlotinib [54]. In another embodiment of the invention, a very large gene panel (>300 genes) targeting full-exon analysis of genes associated with FDA-approved, off-label and investigational drugs is created with the invention. Data from the large panel allow not only directed therapy options to be discovered but also total mutation burden to be measured, which serves as a positive indicator for response to immunotherapy [55].
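The overlapping-probe tiling mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the design procedure of the invention: the function name `tile_probes`, the probe length and the step size are hypothetical parameters, and coordinates are assumed to be 0-based and half-open.

```python
def tile_probes(region_start, region_end, probe_len, step):
    """Return (start, end) half-open intervals of overlapping probes that
    together cover [region_start, region_end).

    Hypothetical helper: consecutive probes overlap by probe_len - step bases,
    and the last probe is shifted so that it ends exactly at region_end.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    if step > probe_len:
        raise ValueError("step > probe_len would leave gaps between probes")
    if region_end - region_start <= probe_len:
        # A single probe covers the whole (short) region.
        return [(region_start, region_start + probe_len)]
    probes = []
    start = region_start
    while start + probe_len < region_end:
        probes.append((start, start + probe_len))
        start += step
    # Final probe aligned to the region end so no bases are left uncovered.
    probes.append((region_end - probe_len, region_end))
    return probes


# Example: a 300 bp exon tiled with hypothetical 120 bp probes every 60 bp.
print(tile_probes(0, 300, 120, 60))  # [(0, 120), (60, 180), (120, 240), (180, 300)]
```

A smaller step gives deeper overlap (more probes per base) at the cost of a larger panel; the right trade-off would have to be determined empirically.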
[0078] In another embodiment, the invention can be used to detect microsatellite instability (MSI). MSI is a surrogate marker for DNA mismatch repair (MMR) deficiency, and the FDA has approved immunotherapy for any tumor type exhibiting MSI [56]. MSI assays analyze repeat regions within the human genome that are susceptible to errors caused by DNA polymerase strand slippage. Cells harboring a germline or somatic mutation that disables an MMR gene accumulate deletions in these repeat regions. MSI assays work by capturing selected DNA repeat regions, such as BAT25, BAT26, MONO-27, NR-21 and NR-24, and comparing repeat lengths between DNA isolated from the blood and from the tumor specimen [57]. Because of their repetitive, low-complexity nature, these repeat regions are difficult to capture selectively using standard probe-based target enrichment protocols. DAPs may instead be designed to target the flanking regions located outside the repeat region, which should improve target capture efficiency. Moreover, repeat regions are more likely to produce sequencing errors on NGS systems. The unique sequencing strategy and error suppression technology incorporated into DAPs should be able to identify and remove these errors and provide physicians with more accurate results when determining eligibility for immunotherapy.
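A simplified view of why flank-anchored capture suits MSI analysis: if DAPs anchor on sequence flanking a repeat, each read spanning both flanks reports the repeat-tract length directly, so tumor and blood (normal) length distributions can be compared. The sketch below assumes exact flank matches and compares modal tract lengths; the flank sequences, the function names and the mode-based comparison are hypothetical, and this is not a validated MSI caller.

```python
from collections import Counter


def repeat_lengths(reads, left_flank, right_flank):
    """Count observed repeat-tract lengths: the number of bases between an
    exact match of left_flank and the next exact match of right_flank."""
    lengths = Counter()
    for read in reads:
        i = read.find(left_flank)
        if i < 0:
            continue
        start = i + len(left_flank)
        j = read.find(right_flank, start)
        if j < 0:
            continue  # read does not span the whole repeat
        lengths[j - start] += 1
    return lengths


def modal_length_shift(normal_reads, tumor_reads, left_flank, right_flank):
    """Tumor modal tract length minus normal (blood) modal tract length."""
    normal = repeat_lengths(normal_reads, left_flank, right_flank)
    tumor = repeat_lengths(tumor_reads, left_flank, right_flank)
    if not normal or not tumor:
        raise ValueError("no reads span both flanks")
    return tumor.most_common(1)[0][0] - normal.most_common(1)[0][0]


LEFT, RIGHT = "GTCAGC", "TGCATG"  # hypothetical flank sequences
normal_reads = [LEFT + "A" * 25 + RIGHT] * 3
tumor_reads = [LEFT + "A" * 20 + RIGHT] * 3 + [LEFT + "A" * 25 + RIGHT]
print(modal_length_shift(normal_reads, tumor_reads, LEFT, RIGHT))  # -5
```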
[0079] In another embodiment, the invention can be used to detect loss of heterozygosity (LOH). LOH is a condition common in tumors whereby the wild-type allele is lost or deleted and only the mutant/inactive copy remains. LOH is very common in individuals with hereditary colon and endometrial cancer (diseases driven by inactivation of the MMR pathway) and often serves as the second hit that leads to tumor formation and disease [59]. The LOH status of the five MMR genes (MLH1, MSH2, MSH6, PMS2 and EPCAM), together with their mutational status, is also used to determine eligibility for immunotherapy [60]. Global LOH profiles also serve as a surrogate marker for homologous recombination repair deficiency (HRD), a condition driven by inactivation of the homologous recombination DNA repair pathway. Current FDA companion diagnostics analyze tumor LOH status and the mutational status of the HRD genes BRCA1 and BRCA2 to determine eligibility for PARP inhibitors in breast and ovarian cancer patients [61-62]. NGS methods for determining LOH require capturing either intronic regions within the MMR genes, to determine the LOH status of individual MMR genes, or multiple selected introns throughout the genome, to generate a global LOH profile. Intronic regions consist of low-complexity/repetitive sequence, which can pose problems for traditional probe-based capture. DAPs can be designed to capture the regions flanking the repetitive sequence, allowing efficient capture for LOH analysis. Moreover, all aspects of the invention (DAPs, TOPO-DAT/RecA Capture and CAAD) can be used to create a panel that provides a global LOH profile and the mutational status of MMR and HRD genes for immunotherapy and PARP inhibitor eligibility determination.
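One common way to read LOH from sequencing data is allelic imbalance at SNPs that are heterozygous in the matched normal (blood) sample: in a region with LOH, the tumor allele fraction drifts away from 0.5. The sketch below illustrates only that idea; the heterozygosity window (0.3 to 0.7), the imbalance threshold of 0.2 and the function name are hypothetical, and tumor purity and copy number are ignored.

```python
def loh_summary(sites, het_low=0.3, het_high=0.7, imbalance=0.2):
    """sites: (normal_alt_fraction, tumor_alt_fraction) per SNP.

    Returns (number of informative SNPs, fraction showing allelic imbalance).
    Thresholds are illustrative assumptions, not validated cut-offs.
    """
    # Only SNPs that look heterozygous in the normal sample are informative.
    informative = [(n, t) for n, t in sites if het_low <= n <= het_high]
    if not informative:
        return 0, 0.0
    flagged = sum(1 for _, t in informative if abs(t - 0.5) > imbalance)
    return len(informative), flagged / len(informative)


# Hypothetical SNP allele fractions (normal, tumor).
snps = [(0.50, 0.95), (0.48, 0.05), (0.52, 0.50), (1.00, 1.00), (0.00, 0.00)]
print(loh_summary(snps))  # 3 informative SNPs, 2 of them imbalanced
```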
[0080] In another embodiment, all aspects of the invention (DAPs, TOPO-DAT/RecA Capture and CAAD) can be used to create gene panels for patient recruitment into clinical trials. For example, PARP inhibitors are more effective in patients who carry a mutation in BRCA1 or BRCA2 or who have an increased number of LOH regions within their genome [62]. A gene panel that targets these genes and regions can be used to enroll patients in PARP inhibitor clinical trials by analyzing DNA extracted from blood and tumor tissue. Moreover, an extended gene panel can be used for patient stratification and biomarker discovery during and after the clinical trial. Patient stratification includes grouping patients with the same diagnosis and prescription into groups such as drug-beneficial and nontoxic, drug-non-beneficial and nontoxic, drug-beneficial and toxic, and drug-non-beneficial and toxic. An example panel for biomarker discovery in non-responders to immunotherapy is one that targets the antigen presentation and interferon gamma pathways, as mutations in these pathways have been shown to facilitate resistance to immunotherapy in melanoma cell lines [63].
[0081] In another embodiment, all aspects of the invention (DAPs, TOPO-DAT/RecA Capture and CAAD) can be used to monitor patients diagnosed with ovarian cancer. The majority of patients diagnosed with ovarian cancer will have recurrent disease; in fact, 50% of patients who achieve remission after first-line chemotherapy will experience recurrence within 3 years. Current methods for detecting recurrent disease include the CA-125 blood antigen test, x-rays and CT scans [64]. Liquid biopsies have been shown to detect recurrent cancer earlier than these methods. The current invention may be used to design ovarian cancer-specific gene panels for use as a surveillance tool to monitor for cancer recurrence. The current invention has higher capture efficiency, which provides a more representative sampling of the ctDNA molecule population than current methods. Moreover, the error suppression technology allows the detection of extremely rare variants, providing a very high level of sensitivity that may translate into earlier detection of cancer recurrence.
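Why capture efficiency matters for rare-variant surveillance can be illustrated with a simple sampling model. Assuming that the number of mutant molecules converted into the library follows a binomial distribution with n = (input genome equivalents × capture efficiency) and p = variant allele fraction, and that a call requires at least k independent mutant molecules, the probability of detection is P(X ≥ k). The function name and all numbers below are hypothetical illustrations, not performance figures for the invention.

```python
def detection_probability(genome_equivalents, capture_efficiency, vaf, min_mutant_molecules):
    """P(at least min_mutant_molecules mutant molecules are captured), assuming
    the captured mutant count is Binomial(n, vaf) with
    n = round(genome_equivalents * capture_efficiency). Simplified model."""
    if min_mutant_molecules <= 0:
        return 1.0
    if not 0.0 < vaf < 1.0:
        raise ValueError("vaf must be strictly between 0 and 1")
    n = round(genome_equivalents * capture_efficiency)
    pmf = (1.0 - vaf) ** n  # P(X = 0)
    below = 0.0
    for i in range(min_mutant_molecules):
        below += pmf
        # Recurrence: P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
        pmf *= (n - i) / (i + 1) * vaf / (1.0 - vaf)
    return max(0.0, 1.0 - below)


# Hypothetical: 2,000 genome equivalents, 0.1% VAF, >= 2 mutant molecules required.
for efficiency in (0.3, 0.6, 0.9):
    print(efficiency, round(detection_probability(2000, efficiency, 0.001, 2), 3))
```

Under this model, raising capture efficiency increases the expected number of mutant molecules linearly, which raises detection probability most steeply at low allele fractions.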
[0082] In another embodiment, all aspects of the invention (DAPs, TOPO-DAT/RecA Capture and CAAD) can be used to monitor patients with hereditary cancer syndromes. There are several hereditary cancer syndromes with known associated gene mutations, including hereditary breast and ovarian cancer syndrome, Cowden syndrome, Lynch syndrome, hereditary leukemia and hematologic malignancies syndrome, and Li-Fraumeni syndrome, to name a few [65]. These populations are at high risk of developing cancer, and early detection and treatment increase the likelihood of patient survival. The current invention can be leveraged to create gene panels specific to a hereditary cancer syndrome for ctDNA surveillance to detect early disease. The invention's highly efficient target enrichment and error suppression features can provide the extremely high sensitivity required for early disease detection.
[0083] In another embodiment, DAPs can be used for T-cell and B-cell repertoire analysis [66-67]. T-cell and B-cell clonal populations can be analyzed by capturing cDNA with DAPs. In this scenario, the conserved regions flanking the variable regions in the TCR and BCR genes are targeted with the DAPs. The UMI sequences can be used to generate SSCSs and DSCSs and, by counting unique clones based on UMI data, to determine which clonal population has expanded in a patient in response to an antigen or immunotherapy. Moreover, Fusion-DAPs may also map the diversity of CAR-T integration sites and estimate both the transduction efficiency and the copy number of integration events [68]. This is accomplished by targeting the conserved region of the integration cassette with the HR1 domain and determining the neighboring region (the genomic integration site), similar to the strategy used for gene fusions.
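Counting clones by UMI rather than by raw reads can be sketched as below. Reads that share a UMI are treated as PCR copies of one starting cDNA molecule, so each clonotype is weighted by its number of distinct UMIs. The clonotype labels, UMIs and function name are hypothetical, and UMI error correction (for example, collapsing UMIs that differ by one base) is omitted for brevity.

```python
from collections import defaultdict


def clone_frequencies(reads):
    """reads: iterable of (clonotype, umi) pairs. Each clonotype is weighted by
    its number of distinct UMIs (starting molecules), not by raw read count."""
    umis_per_clone = defaultdict(set)
    for clonotype, umi in reads:
        umis_per_clone[clonotype].add(umi)
    total = sum(len(u) for u in umis_per_clone.values())
    return {clone: len(u) / total for clone, u in umis_per_clone.items()}


# Hypothetical data: clone_A has many reads but only two starting molecules.
reads = ([("clone_A", "AACCGGTT")] * 50 + [("clone_A", "TTGGCCAA")]
         + [("clone_B", "ACACACAC"), ("clone_B", "GTGTGTGT"), ("clone_B", "CAGTCAGT")])
print(clone_frequencies(reads))  # {'clone_A': 0.4, 'clone_B': 0.6}
```

Clonal expansion could then be followed by comparing these UMI-based frequencies between samples taken before and after treatment; raw read counts would instead have suggested that clone_A dominates.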
[0084] In another embodiment, the sample DNA is treated with a programmable nuclease system for targeted DNA cleavage, such as CRISPR/Cas with guide RNAs, TALENs, or a comparable system. In this scenario, samples would be treated with the enzyme (and, for CRISPR/Cas, the guide RNAs) for targeted cleavage. A reverse SPRI method may be used to separate cleaved dsDNA from uncleaved dsDNA [69]. Alternatively, a biotinylated, catalytically inactive (mutated) Cas9 protein that binds target DNA without cleaving it may be used to separate the cleaved dsDNA from uncleaved DNA [70]. The cleaved samples would then be incubated with DAPs that target the cleaved DNA fragments. Once captured, the target DNA may be amplified and sequenced.
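For the CRISPR/Cas option, the fragment released between two guide-directed cuts can be predicted in silico before DAP HR1 and HR2 are designed against its ends. The sketch below assumes Streptococcus pyogenes Cas9, with a 20-nt protospacer followed by an NGG PAM and a blunt cut 3 bp upstream of the PAM; it searches only the given (forward) strand, and the guide and target sequences are made up for illustration.

```python
import re


def cas9_cut_sites(sequence, protospacer):
    """0-based forward-strand cut positions for SpCas9 (assumed model):
    protospacer followed by an NGG PAM, blunt cut 3 bp upstream of the PAM.
    The cut lies between sequence[pos - 1] and sequence[pos]."""
    seq, guide = sequence.upper(), protospacer.upper()
    pattern = f"(?={re.escape(guide)}[ACGT]GG)"  # lookahead allows overlapping hits
    return [m.start() + len(guide) - 3 for m in re.finditer(pattern, seq)]


def excised_fragment(sequence, guide_a, guide_b):
    """Sequence between the first cut of guide_a and the first cut of guide_b."""
    cuts_a = cas9_cut_sites(sequence, guide_a)
    cuts_b = cas9_cut_sites(sequence, guide_b)
    if not cuts_a or not cuts_b:
        return None
    left, right = sorted((cuts_a[0], cuts_b[0]))
    return sequence[left:right]


# Made-up example: two hypothetical 20-nt protospacers, each followed by an NGG PAM.
G1 = "GATTACAGATTACAGATTAC"
G2 = "TCGATCGATCGATCGATCGA"
target = "TTTT" + G1 + "AGG" + "C" * 10 + G2 + "TGG" + "TTTT"
print(cas9_cut_sites(target, G1), cas9_cut_sites(target, G2))  # [21] [54]
print(excised_fragment(target, G1, G2))
```

The ends of the predicted fragment are the sequences a DAP's HR1 and HR2 would need to match; the reverse strand would have to be scanned separately with the reverse complement of the guide.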
[0085] In another embodiment, target DNA is first PCR amplified and then incubated with DAPs that target the PCR products. In one scenario, the HR1 and HR2 sequences can be universal synthetic sequences that are incorporated into the gene-specific primers. In that case, a single DAP design carrying HR sequences that match these universal synthetic sequences is sufficient to capture any PCR amplicon containing them. In another scenario, PCR primers are designed to target regions flanking the desired target, and DAPs are designed to target regions located inside (between) the PCR primer binding sites. As a result of DAP capture, the PCR primer sequences are removed. If the target is PCR amplified in both orientations and DAPs are designed to capture in both orientations, the combined datasets can yield sequence data free of both primer and probe sequence.
[0086] In another embodiment, mRNA is converted to cDNA and captured by TOPO-DAT molecules for whole transcriptome analysis. The UMIs can be used for transcript quantification, and the high-efficiency TOPO-DAT molecules should generate a more diverse mRNA transcript library than ligation-based methods [71]. Moreover, the UMIs will provide superior identification of reverse transcription errors through the generation of SSCSs.
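Single-strand consensus sequence (SSCS) generation, which the UMIs enable, can be illustrated with a per-position majority vote within each UMI family. In this sketch the family-size minimum of 3 and the 70% agreement threshold are hypothetical parameters, reads within a family are assumed to be pre-aligned and of equal length, and positions without a clear majority are masked as N.

```python
from collections import Counter, defaultdict


def single_strand_consensus(reads, min_family_size=3, min_agreement=0.7):
    """reads: iterable of (umi, sequence), sequences pre-aligned and of equal
    length within a family. Returns {umi: consensus sequence}."""
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to separate true bases from errors
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus


# Hypothetical UMI families: one clean majority, one too small, one ambiguous.
reads = [("AAAC", "ACGT"), ("AAAC", "ACGT"), ("AAAC", "ACTT"), ("AAAC", "ACGT"),
         ("GGGG", "ACGT"), ("GGGG", "ACGT"),
         ("TTTT", "ACGT"), ("TTTT", "ACCT"), ("TTTT", "ACAT")]
print(single_strand_consensus(reads))  # {'AAAC': 'ACGT', 'TTTT': 'ACNT'}
```

A base error introduced during reverse transcription would be shared by every read in a family and therefore survive SSCS; errors introduced later during PCR or sequencing are the ones this vote removes.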
[0087] In another embodiment, DAPs can be designed asymmetrically to capture difficult/low complexity regions. For example, intronic regions that will be used to determine LOH or large repetitive regions that will be used to measure MSI or elevated microsatellite alterations at selected tetranucleotide repeats (EMAST) [72] can have DAPs with HR1 at 20 bp and HR2 at 500 bp to ensure selective capture. Due to the extensive length of HR2, only Read 1 data would be used for analysis as HR2 probe size exceeds typical read lengths.
[0088] In another embodiment, the entire DAP containing the target DNA may be sequenced with a long-read technology such as the Sequel System from Pacific Biosciences. Currently, long fragments of up to 20-30 kb are isolated using long-range PCR, but long-range PCR is not practical for fragments larger than 30 kb. DAPs may be used for target enrichment of these large DNA fragments. Because DAPs form closed circular molecules upon target capture, the molecules may be transformed into E. coli if an antibiotic resistance gene and origin of replication are included in the probe backbone. Next, the DAP DNA is isolated and sequenced using a long-read sequencing technology, such as the Pacific Biosciences Sequel System, since DAPs may be denatured, hybridized with random hexamers and loaded onto the sequencer [73]. This method allows large fragments to be captured and sequenced without PCR, including fragments too large for long-range PCR and traditional target enrichment. This is accomplished by using DAPs that contain a modified nucleotide at or near the 5′ or 3′ terminal nucleotide carrying an affinity tag, such as biotin, that facilitates affinity purification. This allows DAPs to be used in a manner similar to traditional bait capture probes. In this scenario, DAPs with large HR1 and HR2 regions (e.g., 50 bp to 1000 bp) may be used to capture large DNA fragments (20 kb to 50 kb). First, an enzyme with 3′→5′ exonuclease activity (such as T4 DNA polymerase in the absence of nucleotides) is used to generate single-stranded HR1 and HR2 regions in the Dual Adapter probe. Next, the probe is incubated with sheared sample DNA in hybridization buffer. The DAP/DNA complexes are then bound to streptavidin beads and washed several times. Next, the beads are resuspended in water and heated for 5 minutes at 75 °C to disrupt the biotin/streptavidin interaction and release the DAP/DNA complexes from the streptavidin beads.
The supernatant containing the DAP/DNA complexes is treated with one or more enzymes with 5′ and 3′ exonuclease activity, such as Mung Bean Nuclease. Alternatively, the heating step can be eliminated, as the exonuclease step may itself release the DAP/DNA complexes from the streptavidin beads. Next, the DNA is purified and either transformed directly into bacteria, whose in vivo homologous recombination enzymes complete the DAP fragment insertion, or completed in vitro using DNA polymerase and Taq DNA ligase. The closed circular DAPs containing the target DNA may be transformed into bacteria for amplification or amplified using rolling circle amplification. The amplified DNA can be loaded directly onto a long-read sequencer such as the Pacific Biosciences Sequel System [74].
[0089] In another embodiment, DAPs can be used to capture entire transcripts that traditional probe-based target enrichment technologies cannot isolate. First, RNA may be isolated and converted to cDNA. Next, DAPs can be designed to capture the first and last exons of a complete transcript, or any combination of exons, such that the total molecule length does not exceed the sequencing read length (e.g., exon 1 and exon 6, or exon 2 and exon 10). Multiple probes would be designed to capture several adjacent regions to cover the entire cDNA length. This method can be used to detect deep intronic mutations that cause splicing defects, leading to inclusion of intronic sequence in the final transcript and introduction of premature stop codons [75]. These DAP libraries may be sequenced on long-read sequencers such as the Pacific Biosciences Sequel System.
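The constraint that each captured exon span must fit within the read length can be expressed as a greedy tiling over exon lengths. The sketch below assumes that the captured cDNA length equals the summed lengths of the exons from the HR1 exon to the HR2 exon, and that adjacent spans share a boundary exon; the function name, the exon lengths and the 700 bp limit are hypothetical.

```python
def exon_windows(exon_lengths, max_len):
    """Greedily tile a transcript into exon spans (i, j), 1-based and inclusive,
    whose summed exon lengths fit within max_len. Consecutive spans share their
    boundary exon so that adjacent captures overlap (assumed design rule)."""
    n = len(exon_lengths)
    windows = []
    i = 0
    while i < n - 1:
        j = i + 1
        span = exon_lengths[i] + exon_lengths[j]
        if span > max_len:
            raise ValueError(f"exons {i + 1} and {j + 1} together exceed max_len")
        # Extend the span while the next exon still fits.
        while j + 1 < n and span + exon_lengths[j + 1] <= max_len:
            j += 1
            span += exon_lengths[j]
        windows.append((i + 1, j + 1))  # HR1 targets exon i + 1, HR2 exon j + 1
        i = j
    return windows


# Hypothetical transcript with six exons and a 700 bp length limit.
print(exon_windows([200, 150, 300, 120, 400, 250], 700))  # [(1, 3), (3, 4), (4, 5), (5, 6)]
```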
[0090] In another embodiment, all aspects of the invention (DAPs, TOPO-DAT/RecA-mediated capture or CAAD probes) may be used to monitor transplant rejection. Donor-derived cell-free DNA (ddcfDNA) may be used as a biomarker for allograft rejection: if donor tissue is rejected, the host immune system destroys donor tissue cells, releasing ddcfDNA into the bloodstream. Dual Adapter or CAAD probes may be used as a surveillance tool to monitor tissue rejection, evaluate response to anti-rejection therapy and decrease the need for more invasive procedures [76].
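Donor-derived cfDNA is often quantified from SNPs at which the recipient is homozygous and the donor carries a different allele: the donor-specific allele fraction then estimates the donor fraction, doubled when the donor is heterozygous. The sketch below illustrates that arithmetic only, assuming both genotypes are known beforehand and sequencing errors are negligible; the function name and example counts are hypothetical.

```python
def donor_fraction(sites):
    """sites: (recipient_genotype, donor_genotype, alt_reads, total_reads), with
    genotypes given as the number of alt alleles (0, 1 or 2). Only sites where
    the recipient is homozygous and the donor differs are informative."""
    estimates = []
    for rec, don, alt, total in sites:
        if total == 0 or rec not in (0, 2) or don == rec:
            continue
        # Fraction of reads carrying the allele absent from the recipient.
        foreign = alt / total if rec == 0 else (total - alt) / total
        # A heterozygous donor carries the foreign allele on only half its copies.
        estimates.append(foreign * (2 if don == 1 else 1))
    if not estimates:
        raise ValueError("no informative sites")
    return sum(estimates) / len(estimates)


# Hypothetical counts consistent with 1% donor-derived cfDNA.
sites = [(0, 2, 10, 1000), (0, 1, 5, 1000), (2, 0, 990, 1000), (1, 1, 500, 1000)]
print(donor_fraction(sites))
```

Serial measurements of this fraction, rather than any single value, would be what a surveillance protocol tracks.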
[0091] In another embodiment, DAPs can be used to detect gene fusions by capturing mRNA or DNA. Gene fusions such as EML4-ALK are common drivers of cancer [77]. In some gene fusions, only one partner gene is recurrent across fusions, while the other partner varies. DAPs can be designed to target the region of the recurrent partner and thereby capture both known and unknown fusion partner sequences. Following PCR and sequencing of the captured fusion, the fusion partner may be identified. Moreover, whole transcriptome sequencing of full-length transcripts may identify gene fusions in cancer [78]. As mentioned previously, TOPO-DAT molecules combined with long-read sequencing technology can capture and sequence full-length transcripts for fusion discovery.
[0092] In another embodiment, target DNA is enriched using traditional single-stranded RNA or DNA probes. The enriched DNA is eluted off the probes as single-stranded DNA and captured using DAPs and the enzymes T5 exonuclease, DNA polymerase and Taq DNA ligase, since DAPs can hybridize to ssDNA that has been enriched using traditional oligonucleotide probes. Combining traditional target enrichment with DAPs allows double enrichment, which is beneficial when attempting to capture a small region or a small number of targets. A double capture can limit off-target sequence isolation and maximize on-target sequence data generation [79].
[0093] In another embodiment, DAPs containing an antibiotic resistance gene and origin of replication may be used to archive NGS libraries produced with any NGS library preparation kit. This is accomplished by designing DAPs with HR sequences specific to NGS adapter sequences, for example, the Illumina P5 and P7 sequences. Once the libraries are captured into the DAPs, the library can be transformed into bacteria, propagated and archived as bacterial glycerol stocks. The ability to archive NGS libraries is a significant advance over the current state of the art, as precious DNA samples isolated from tumor blocks and/or ctDNA samples can be amplified in bacteria and analyzed repeatedly over time. For example, an NGS library created from the last remaining DNA from an FFPE specimen (i.e., the FFPE specimen is now exhausted) may be analyzed using a gene panel with 400 targets in an effort to identify treatment options for a patient. By archiving the patient's NGS library, it can be analyzed at a later date with an updated panel that may contain new therapeutic targets. Similarly, archiving NGS libraries allows pharmaceutical companies with FFPE block and blood biorepositories/biobanks to run sample DNA multiple times on different gene panels without risk of depleting high-value specimens (i.e., highly characterized specimens with extensive molecular and clinical data).
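How many bacterial transformants an archive needs in order to represent a library can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the fraction of the library represented by any one unique molecule and P is the desired probability that a given molecule is present at least once. The sketch assumes a perfectly even library (f = 1 / number of unique molecules); the function name and numbers are illustrative only.

```python
import math


def clones_needed(unique_molecules, probability=0.99):
    """Clarke-Carbon estimate of transformants needed so that a given library
    molecule (assumed fraction 1/unique_molecules of an even library) is
    represented at least once with the stated probability."""
    if unique_molecules < 1 or not 0.0 < probability < 1.0:
        raise ValueError("need unique_molecules >= 1 and 0 < probability < 1")
    if unique_molecules == 1:
        return 1
    f = 1.0 / unique_molecules
    # log1p keeps precision when f is very small.
    return math.ceil(math.log1p(-probability) / math.log1p(-f))


print(clones_needed(10**6))        # roughly 4.6 million transformants
print(clones_needed(10**6, 0.95))  # roughly 3.0 million transformants
```

For large libraries the estimate approaches about 4.6 transformants per unique molecule at P = 0.99, which suggests that transformation efficiency, not glycerol-stock storage, is the practical limit on archive completeness.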
[0094] In another embodiment, DAPs may be used with a microfluidic, droplet-based, or other reaction-miniaturization device/instrument, so that sample DNA is diluted such that a limited amount of DNA is captured in each droplet, chamber, etc. and combined with Dual Adapter probes [80]. Bringing target DNA into close proximity with Dual Adapter probes, with limited competing DNA, should enhance homology-based capture. In addition, HR Adapter technology may be compatible with droplet-based fluidics instruments, whereby HR Adapter pairs are enclosed in individual droplets. These droplets are then combined with fragmented sample nucleic acid that has been treated with an enzyme with exonuclease activity to generate 5′ single-stranded ends.
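The dilution into droplets described above is usually reasoned about with Poisson loading statistics: if molecules are distributed at random with a mean of λ per droplet, a fraction e^(-λ) λ^k / k! of droplets holds exactly k molecules. The sketch assumes ideal random loading, and λ = 0.1 is a hypothetical value.

```python
import math


def poisson_occupancy(lam, max_k=3):
    """Fraction of droplets holding exactly k = 0..max_k molecules under ideal
    random (Poisson) loading with mean lam molecules per droplet."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k + 1)]


def multi_loaded_fraction(lam):
    """Among droplets holding at least one molecule, the fraction holding two or more."""
    p0 = math.exp(-lam)
    p1 = lam * p0
    return (1.0 - p0 - p1) / (1.0 - p0)


lam = 0.1  # hypothetical mean loading
print([round(p, 4) for p in poisson_occupancy(lam)])
print(round(multi_loaded_fraction(lam), 3))
```

Lower λ reduces the share of occupied droplets that contain competing molecules, at the cost of more empty droplets.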
[0095] In another embodiment of the invention, prenatal genetic testing and noninvasive prenatal testing (NIPT) can leverage DAP or TOPO-DAT technology for higher-efficiency capture, which may allow earlier detection than current technologies [81]. As with ctDNA, cell-free fetal DNA may be isolated from the blood of a pregnant mother. This DNA can be captured using either DAP or TOPO-DAT/RecA target enrichment and analyzed for mutations and chromosomal abnormalities such as trisomy 21.
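A widely used way to call trisomy 21 from cell-free DNA counting data is a z-score of the chromosome 21 read fraction against euploid reference samples. The sketch below shows that calculation only; the reference fractions and read counts are invented for illustration, the cut-off of 3 is an assumption, and real pipelines apply additional corrections (for example, for GC content) that are omitted here.

```python
import statistics


def chr21_z_score(chr21_reads, total_reads, reference_fractions):
    """z-score of the test sample's chr21 read fraction relative to euploid references."""
    fraction = chr21_reads / total_reads
    mean = statistics.mean(reference_fractions)
    sd = statistics.stdev(reference_fractions)
    return (fraction - mean) / sd


# Invented reference chr21 read fractions from euploid pregnancies.
reference_fracs = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]
z = chr21_z_score(13_600, 1_000_000, reference_fracs)
print(round(z, 1), "flagged" if z > 3 else "not flagged")  # cut-off of 3 is assumed
```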
[0096] Kits for the different variations of the described invention may consist of different components.
[0097] In one embodiment, Dual Adapter probes (DAPs), capture and adapt (CAAD) probes or HR Adapters that target different genes (e.g., specific exons) may be synthesized, pooled and sold as a gene panel, for example, a panel that includes the NCCN genes ALK, APC, BRAF, BRCA1, BRCA2, EGFR, ERBB2, KIT, KRAS, MET, NRAS, PDGFRA, RET, ROS1 and TP53. Alternatively, custom DAPs may be created. The kit comprises the gene panel, enzymes (e.g., exonuclease, DNA polymerase, Taq DNA ligase), buffers and protocol.
[0098] In another embodiment, the DAPs, CAADs, HR adapters and kit components may be manufactured for use with droplet-based fluidics systems available from companies such as Bio-Rad, RainDance and 10x Genomics. Individual DAPs and HR Adapter pairs may be encapsulated into droplets. Likewise, all enzymes and buffers required for homology-based DAP and HR adapter attachment are encapsulated into droplets. The droplet-based fluidics systems are capable of merging droplets and regulating temperature to provide the appropriate reaction conditions. Droplets can be recovered, and the DNA extracted and subjected to PCR, RCA, or RPA to generate the final target-enriched NGS library.
[0099] In another embodiment, Fusion Dual Adapter probes (Fusion DAPs) that target different gene fusions may be synthesized, pooled and sold as a gene fusion panel, for example, a panel that includes gene fusions commonly found in lung cancer involving ALK, EGFR, MET, BRAF, FGFR, NRG1, NTRK, RET and ROS1. Alternatively, custom Fusion DAPs may be created. The kit comprises the gene panel, enzymes (e.g., reverse transcriptase, RNase H, DNA polymerase, Taq DNA ligase, PNK, ligase), buffers and protocol.
[0100] In another embodiment, DAPs containing an antibiotic resistance gene, an origin of replication and HR domains that have homology to NGS adapter sequences may be synthesized and included in a kit for archiving previously generated NGS libraries, ideally pre-target enrichment libraries so that the whole genome is represented. Laboratories with precious DNA samples that have been adapted for NGS can use the kit to archive pre-target enrichment libraries by generating E. coli glycerol stocks and storing them at low temperature (e.g., -80 °C). Libraries may be accessed at a later date by amplifying them in E. coli. For example, E. coli containing the archived library may be plated on antibiotic-containing media and the plasmid library isolated. Next, the library can be subjected to recombinase-mediated target enrichment, or it can be PCR amplified using NGS primers and subjected to traditional target enrichment. The kit comprises the DAP molecule targeting the NGS sequences, enzymes (e.g., exonuclease, DNA polymerase, Taq DNA ligase), buffers, competent E. coli cells and protocol. The kit may also contain reagents for performing recombinase-mediated target enrichment, including, for example, RecA and/or UvsX protein, ATPγS, DNA Gyrase, Proteinase K, streptavidin beads and buffers.
[0101] In another embodiment, topoisomerase-adapted dual adapter technology (TOPO-DAT) molecules or topoisomerase-charged NGS adapters may be included in an NGS sample preparation kit. The kit comprises TOPO-DAT molecules or TOPO adapters with NGS sample preparation enzymes and buffers for DNA shearing, end repair, A-tailing, PCR, RCA, or RPA. The kit may also contain reagents for performing recombinase-mediated target enrichment, including, for example, RecA and/or UvsX protein, ATPγS, DNA Gyrase, Proteinase K, streptavidin beads and buffers.
[0102] In another embodiment, the TOPO-DAT molecule may be supplied uncharged (not pre-charged with topoisomerase) but pre-nicked with a nicking enzyme. The kit comprises uncharged TOPO-DAT molecules and the enzyme topoisomerase, as well as NGS sample preparation enzymes and buffers for DNA shearing, end repair, A-tailing, PCR, RCA, or RPA. The kit may also contain reagents for performing recombinase-mediated target enrichment, including, for example, RecA and/or UvsX protein, ATPγS, DNA Gyrase, Proteinase K, streptavidin beads and buffers.
[0103] In another embodiment, dual adapter technology (DAT) molecules that utilize standard T/A ligation may be included in an NGS sample preparation kit. The kit comprises DAT molecules with NGS sample preparation enzymes and buffers for DNA shearing, end repair, A-tailing, ligation, PCR, RCA, or RPA. The kit may also contain reagents for performing recombinase-mediated target enrichment, including, for example, RecA and/or UvsX protein, ATPγS, DNA Gyrase, Proteinase K, streptavidin beads and buffers.
EXAMPLES
[0104] The following examples are provided solely to illustrate the concept of the present invention and not meant to limit the present invention to the embodiments provided.
Example 1
[0105] Dual Adapter Probe (DAP) Generation
[0106] Initially, DAPs were generated using the plasmid pUC19 as the backbone. A 2091 bp fragment containing a pUC19 backbone with partial regions of the Illumina sequences P5 and P7 was PCR-amplified using Taq polymerase (from Monserate Biotechnology Group) under standard conditions with the primers P5-L4440+RC (AGATCGGAAGAGCGTCGTGTAGGCTTCCTCGCTCACTGACTCGCT (SEQ ID NO: 1)) and P7-pGEX 3′ (AGATCGGAAGAGCACACGTCTGCCGGGAGCTGCATGTGTCAGAGG (SEQ ID NO: 2)). This molecule (P5/pUC19/P7) was used as the template in all subsequent PCR reactions used to generate DAPs containing UMIs and regions homologous to target genes. This was accomplished by including the UMI1 and HR1 sequences in the first oligonucleotide primer, and the UMI2 and HR2 sequences in the second oligonucleotide primer, as single-stranded 5′ extensions. The primers anneal to the P5 and P7 sequences in the P5/pUC19/P7 molecule. The following primers were used to generate DAPs for:
TABLE-US-00001
KRAS exon 2:
KRAS_2-seq1-UMI-P5-RC: TCAGTCATTTTCAGCAGGCCTTNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAG (SEQ ID NO: 3)
KRAS_2-seq2-UMI-P7-RC: ACTGGTGCAGGACCATTCTTTGNNNNNNNNNNAGATCGGAAGAGCACACGTCTG (SEQ ID NO: 4)
PIK3CA exon 20:
PIK3CA_20-seq1-UMI-P5-RC: CATTCCAGAGCCAAGCATCATNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAG (SEQ ID NO: 5)
PIK3CA_20-seq2-UMI-P7-RC: AACAGCATGCATTGAACTGAAANNNNNNNNNNAGATCGGAAGAGCACACGTCTG (SEQ ID NO: 6)
TP53 exon 6:
TP53_6-seq1-UMI-P5-RC: TGGGCAACCAGCCCTGTCGTCTNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAG (SEQ ID NO: 7)
TP53_6-seq2-UMI-P7-RC: GAGGAGGGGTTAAGGGTGGTTGNNNNNNNNNNAGATCGGAAGAGCACACGTCTG (SEQ ID NO: 8)
[0107] PCR reactions were also performed to generate the target sequences for the DAPs. The following primers were used:
TABLE-US-00002
KRAS exon 2:
KRASex2Fwd: AAGGCCTGCTGAAAATGACTGA (SEQ ID NO: 9)
KRASex2Rev: CAAAGAATGGTCCTGCACCAGT (SEQ ID NO: 10)
PIK3CA exon 20:
PIK3CAex20Fwd: ATGATGCTTGGCTCTGGAATG (SEQ ID NO: 11)
PIK3CAex20Rev: TTTCAGTTCAATGCATGCTGTT (SEQ ID NO: 12)
TP53 exon 6:
TP53ex6Fwd: AGACGACAGGGCTGGTTGCCCA (SEQ ID NO: 13)
TP53ex6Rev: CAACCACCCTTAACCCCTCCTC (SEQ ID NO: 14)
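Comparing the two primer tables, the homology region of each DAP primer in TABLE-US-00001 is the reverse complement of the corresponding target primer in TABLE-US-00002 (for example, the first 22 bases of KRAS_2-seq1-UMI-P5-RC are the reverse complement of KRASex2Fwd). The short check below splits each DAP primer into homology region, N-run UMI and adapter tail and confirms that relationship. The helper names are hypothetical; only the sequences are taken from the tables.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_dap_primer(primer):
    """Split a DAP primer into (homology region, UMI, adapter tail), assuming
    the layout HR + run of N + tail used in TABLE-US-00001."""
    m = re.fullmatch(r"([ACGT]+)(N+)([ACGT]+)", primer)
    if m is None:
        raise ValueError(f"unexpected primer layout: {primer}")
    return m.groups()


# (DAP primer from TABLE-US-00001, matching target primer from TABLE-US-00002)
PAIRS = {
    "KRAS_2-seq1": ("TCAGTCATTTTCAGCAGGCCTTNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAG", "AAGGCCTGCTGAAAATGACTGA"),
    "KRAS_2-seq2": ("ACTGGTGCAGGACCATTCTTTGNNNNNNNNNNAGATCGGAAGAGCACACGTCTG", "CAAAGAATGGTCCTGCACCAGT"),
    "PIK3CA_20-seq1": ("CATTCCAGAGCCAAGCATCATNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAG", "ATGATGCTTGGCTCTGGAATG"),
    "PIK3CA_20-seq2": ("AACAGCATGCATTGAACTGAAANNNNNNNNNNAGATCGGAAGAGCACACGTCTG", "TTTCAGTTCAATGCATGCTGTT"),
    "TP53_6-seq1": ("TGGGCAACCAGCCCTGTCGTCTNNNNNNNNNNAGATCGGAAGAGCGTCGTGTAG", "AGACGACAGGGCTGGTTGCCCA"),
    "TP53_6-seq2": ("GAGGAGGGGTTAAGGGTGGTTGNNNNNNNNNNAGATCGGAAGAGCACACGTCTG", "CAACCACCCTTAACCCCTCCTC"),
}

for name, (dap_primer, target_primer) in PAIRS.items():
    hr, umi, tail = split_dap_primer(dap_primer)
    # Prints: primer name, HR length, UMI length, HR == revcomp(target primer)
    print(name, len(hr), len(umi), hr == revcomp(target_primer))
```

Every pair prints True with a 10-base UMI, consistent with HR1 and HR2 being designed against the ends of the corresponding PCR target.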
[0108] All DAPs and target genes were gel purified using the Zymoclean Gel DNA Recovery Kits (Zymo Research).
[0109] Recombination-based dual adapter attachment. All reactions were conducted using the NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs), which contains an exonuclease, a DNA ligase and a DNA polymerase. A ratio of 15:1 (DAP:target) was found to yield the highest number of bacterial transformants, so this ratio was used for all subsequent studies. All 3 DAPs were pooled in equal amounts to generate a probe pool. Likewise, all PCR targets were pooled in equal amounts to generate a target pool. Reactions were set up as follows: 25 µl NEB HiFi DNA Assembly Master Mix, 5 µl DAP pool (30 ng/µl), 5 µl PCR target pool (2 ng/µl) and 15 µl H₂O, and were incubated at 50 °C for 30 min. Reactions were purified using 0.5× KAPA Pure Beads (Roche) and eluted in 35 µl H₂O. Next, unreacted linear DNA was removed by adding 5 µl NEB Buffer 4, 5 µl ATP, 2 µl RecBCD (New England Biolabs) and 3 µl H₂O and incubating at 37 °C for 60 min and then at 70 °C for 30 min. The reaction was purified using 1.0× KAPA Pure Beads and eluted in 20 µl elution buffer. Finally, captured targets were PCR amplified for 22 cycles with an annealing temperature of 60 °C, using primers that target the NGS adapter sequences P5 and P7 in the DAP backbone (P5-PCR-universal: AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC T and P7-PCR-index CCGTTA: CAA GCA GAA GAC GGC ATA CGA GAT TAA CGG GTG ACT GGA GTT CAG ACG TGT GCT CTT CCG ATC T). PCR products were purified with 1.0× KAPA Pure Beads, eluted in 40 µl elution buffer and analyzed on an Agilent TapeStation D1000 assay.
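The reaction set-up above can be checked arithmetically: 25 + 5 + 5 + 15 µl gives a 50 µl reaction containing 150 ng of DAP pool and 10 ng of target pool, which is consistent with the 15:1 DAP:target ratio being a mass ratio. Because the DAPs are built on a 2091 bp backbone and are therefore much longer than the amplicons, the corresponding molar ratio would be lower. The sketch converts mass to molar ratio using lengths the user must supply; the helper names are hypothetical, as are the 2,150 bp DAP and 200 bp amplicon lengths used below.

```python
def mix_summary(components):
    """components: list of (name, volume_ul, conc_ng_per_ul or None).
    Returns (total volume in ul, {name: mass in ng} for components with a concentration)."""
    total_ul = sum(volume for _, volume, _ in components)
    mass_ng = {name: volume * conc for name, volume, conc in components if conc is not None}
    return total_ul, mass_ng


def molar_ratio(mass_a_ng, length_a_bp, mass_b_ng, length_b_bp):
    """Molar ratio of dsDNA A to dsDNA B; moles scale with mass / length."""
    return (mass_a_ng / length_a_bp) / (mass_b_ng / length_b_bp)


reaction = [
    ("NEB HiFi DNA Assembly Master Mix", 25, None),
    ("DAP pool", 5, 30),
    ("PCR target pool", 5, 2),
    ("H2O", 15, None),
]
total, mass = mix_summary(reaction)
print(total)                                       # 50
print(mass["DAP pool"] / mass["PCR target pool"])  # 15.0
# Hypothetical lengths: ~2,150 bp DAP, 200 bp amplicon.
print(round(molar_ratio(mass["DAP pool"], 2150, mass["PCR target pool"], 200), 2))  # 1.4
```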
[0110] Standard NGS library preparation. A standard ligation-based NGS library preparation was performed with custom adapters containing an 8 bp UMI located adjacent to the sample barcode. This library was created to directly compare ligation-based adapter attachment with recombination-based dual adapter attachment. The library was created using the KAPA HyperPrep kit (Roche) with custom adapters from IDT (AAT GAT ACG GCG ACC ACC GAG ATC TAC ACT CTT TCC CTA CAC GAC GCT CTT CCG ATC* T and /5Phos/GA TCG GAA GAG CAC ACG TCT GAA CTC CAG TCA CGG AAC TNN NNN NNN ATC TCG TAT GCC GTC TTC TGC TTG). First, 5 ng of pooled PCR products were diluted in 50 µl 1× KAPA Frag Buffer. Next, 7 µl End Repair & A-Tailing Buffer and 3 µl End Repair & A-Tailing Enzyme Mix were added, and the reaction was incubated at 65 °C for 30 min. Next, 5 µl of 1.5 µM custom adapter stock, 5 µl H₂O, 30 µl KAPA Ligation Buffer and DNA Ligase were added, and the reaction was incubated at 20 °C for 1 hr. Ligation reactions were purified by adding 88 µl KAPA Pure Beads and eluted in 25 µl elution buffer. Next, 20 µl of the purified ligation reaction was added to 25 µl KAPA HiFi HotStart ReadyMix (2×) and 5 µl Library Amplification Primer Mix (10×) and PCR amplified for 22 cycles with an annealing temperature of 60 °C. The final PCR was purified with 1.0× KAPA Pure Beads, eluted in 40 µl elution buffer and analyzed on an Agilent TapeStation D1000 assay.
[0111] Sequencing Results. The DAP and KAPA libraries were multiplexed at 10 nM and run on an Illumina MiSeq using a micro flow cell with 2×150 bp sequencing. For each sample, reads were first sorted based on alignment to the three reference sequences. Next, unique 8 bp UMIs within each group were identified (note: for DAP samples, the first 8 bp of the UMI1 sequence were used; the KAPA sample contained an 8 bp UMI adjacent to the sample barcode). For a UMI to be used in error correction, it must be observed multiple times, because UMI sequences observed only at low frequency may themselves be errors. We applied a cut-off of 5, retaining only UMIs observed at least 5 times, and determined the percentage of such UMIs out of the total number of UMIs per PCR target. Table 1 shows that the recombination-based DAP method had significantly more UMIs at or above 5× than the traditional ligation-based method, suggesting that recombination-based adapter attachment using DAPs is significantly more efficient than traditional ligation-based methods. This is important because NGS library conversion is a critical step when analyzing circulating tumor DNA (ctDNA) for rare variant detection, as inefficient adapter ligation can result in population sampling bias. DAPs can readily be used with PCR-based hotspot panels used for solid tumor and ctDNA analysis. Some current state-of-the-art hotspot panels include an adapter ligation step following PCR amplification. Substituting DAP technology for traditional adapter ligation will increase NGS library conversion and improve overall sample representation in the sequence data. Moreover, DAPs allow high-efficiency UMI attachment, which may offer superior error detection without sacrificing library conversion efficiency.
TABLE-US-00003
TABLE 1
NGS Library Conversion Comparison
Amplicon | Library Prep | Unique UMIs >= 5x | Total Unique UMIs | % Unique UMIs >= 5x/Total | % Increase
KRAS | DAP | 54,828 | 65,081 | 84% | 27%
KRAS | KAPA | 39,298 | 64,334 | 61% |
PIK3CA | DAP | 27,334 | 61,941 | 44% | 65%
PIK3CA | KAPA | 7,862 | 53,972 | 15% |
TP53 | DAP | 40,024 | 63,880 | 63% | 63%
TP53 | KAPA | 13,368 | 57,883 | 23% |
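The UMI cut-off described above is a simple count threshold on UMI families. The minimal sketch below illustrates it. The function names are hypothetical, and the toy UMI list is invented for illustration, not taken from the experiment. A second helper shows one way to reproduce the "% Increase" column of Table 1. The column appears to be consistent with (DAP - KAPA)/DAP, computed from the rounded percentages and truncated to an integer, rather than with a difference taken relative to the KAPA value. This reading is an inference from the table, not something the text states.

```python
from collections import Counter


def umi_family_stats(umis, umi_length=8, min_count=5):
    """Return (UMIs seen >= min_count times, total unique UMIs, percent).

    Only the first `umi_length` bases of each UMI are compared, mirroring the
    use of the first 8 bp of UMI1 for the DAP samples.
    """
    counts = Counter(u[:umi_length] for u in umis)
    kept = sum(1 for n in counts.values() if n >= min_count)
    return kept, len(counts), 100.0 * kept / len(counts)


def relative_difference(dap_pct, kapa_pct):
    """Difference between DAP and KAPA, expressed relative to the DAP value."""
    return 100.0 * (dap_pct - kapa_pct) / dap_pct


if __name__ == "__main__":
    # Toy UMI observations, for illustration only.
    toy = ["ACGTACGTAA"] * 6 + ["TTTTCCCCGG"] * 2 + ["GGGGAAAATT"] * 5
    print(umi_family_stats(toy))
    # Rounded "% Unique UMIs >= 5x/Total" values from Table 1.
    for gene, dap, kapa in [("KRAS", 84, 61), ("PIK3CA", 44, 15), ("TP53", 63, 23)]:
        print(gene, int(relative_difference(dap, kapa)))
```

Applied to the rounded percentages and truncated, `relative_difference` gives 27, 65 and 63, which match the table. Expressed relative to the KAPA values instead, the increases would be roughly 38%, 193% and 174%.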
Example 2
[0112] Comparison of TOPO T/A Mediated Ligation with the Vector pCR4-TOPO and Standard T/A Ligation with NGS Adapters from KAPA/Roche
[0113] Preliminary NGS data have been generated comparing TOPO T/A-mediated ligation with the vector pCR4-TOPO and standard T/A ligation with NGS adapters from KAPA/Roche. To measure capture efficiency, mutagenic libraries were made with 3 amplicons and then pooled. Capture efficiency was determined by counting the number of unique sequences and dividing by the total number of sequences. Wild-type amplicons were cloned and verified by Sanger sequencing. Two different sets of primers were used in the mutagenic PCR (mPCR) reactions: 1) standard gene-specific primers (see Example 1) and 2) gene-specific primers containing the identical sequence plus partial NGS adapter sequences.
TABLE-US-00004
KRAS exon 2:
KRAS_2-seq1-UMI-P5-RC: TCAGTCATTTTCAGCAGGCCTTAGATCGGAAGAGCGTCGTGTAG (SEQ ID NO: 19)
KRAS_2-seq2-UMI-P7-RC: ACTGGTGCAGGACCATTCTTTGAGATCGGAAGAGCACACGTCTG (SEQ ID NO: 20)
PIK3CA exon 20:
PIK3CA_20-seq1-UMI-P5-RC: CATTCCAGAGCCAAGCATCATAGATCGGAAGAGCGTCGTGTAG (SEQ ID NO: 21)
PIK3CA_20-seq2-UMI-P7-RC: AACAGCATGCATTGAACTGAAAAGATCGGAAGAGCACACGTCTG (SEQ ID NO: 22)
TP53 exon 6:
TP53_6-seq1-UMI-P5-RC: TGGGCAACCAGCCCTGTCGTCTAGATCGGAAGAGCGTCGTGTAG (SEQ ID NO: 23)
TP53_6-seq2-UMI-P7-RC: GAGGAGGGGTTAAGGGTGGTTGAGATCGGAAGAGCACACGTCTG (SEQ ID NO: 24)
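Each adapter-tailed primer listed above ends in one of two shared sequences, which appear to be the partial NGS adapter tails mentioned in paragraph [0113]. The P5-named primers end in AGATCGGAAGAGCGTCGTGTAG, which is the reverse complement of the 3' end of the first custom adapter in paragraph [0110]. The P7-named primers end in AGATCGGAAGAGCACACGTCTG, which overlaps the 5' end of the second custom adapter. The sketch below is a hypothetical helper, not from the source. It assumes the tail can be chosen from the "-P5-"/"-P7-" part of each primer name, and splits each listed (reverse-complement) sequence into its gene-specific part and its adapter tail.

```python
# Hypothetical helper (not from the source): split each adapter-tailed mPCR
# primer, as listed above (reverse-complement orientation), into its
# gene-specific part and the partial adapter tail it ends with.
TAILS = {
    "P5": "AGATCGGAAGAGCGTCGTGTAG",  # assumed tail for "-P5-" primers
    "P7": "AGATCGGAAGAGCACACGTCTG",  # assumed tail for "-P7-" primers
}

PRIMERS = {
    "KRAS_2-seq1-UMI-P5-RC": "TCAGTCATTTTCAGCAGGCCTTAGATCGGAAGAGCGTCGTGTAG",
    "KRAS_2-seq2-UMI-P7-RC": "ACTGGTGCAGGACCATTCTTTGAGATCGGAAGAGCACACGTCTG",
    "PIK3CA_20-seq1-UMI-P5-RC": "CATTCCAGAGCCAAGCATCATAGATCGGAAGAGCGTCGTGTAG",
    "PIK3CA_20-seq2-UMI-P7-RC": "AACAGCATGCATTGAACTGAAAAGATCGGAAGAGCACACGTCTG",
    "TP53_6-seq1-UMI-P5-RC": "TGGGCAACCAGCCCTGTCGTCTAGATCGGAAGAGCGTCGTGTAG",
    "TP53_6-seq2-UMI-P7-RC": "GAGGAGGGGTTAAGGGTGGTTGAGATCGGAAGAGCACACGTCTG",
}


def split_primer(name, seq):
    """Return (side, gene-specific part); raise if the expected tail is absent."""
    side = "P5" if "-P5-" in name else "P7"
    tail = TAILS[side]
    if not seq.endswith(tail):
        raise ValueError(f"{name}: expected {side} tail not found")
    return side, seq[: -len(tail)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        side, gene_part = split_primer(name, seq)
        print(f"{name}\t{side}\t{gene_part}\t{len(gene_part)} nt")
```

Every listed primer carries the tail expected from its name. The gene-specific portions are 21 to 22 nt long.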
[0114] Both mPCR reactions used the same wild-type clones as template. The mPCR products generated with standard primers were used as input for the KAPA HyperPrep Kit for Illumina sequencing, which consists of end repair/A-tailing, adapter ligation and PCR with Illumina NGS primers. The mPCR products generated with the second primer set were TOPO cloned into the pCR4-TOPO vector. Unreacted PCR products and TOPO vector were removed with Exonuclease V (RecBCD). Following purification, a final NGS library was created by PCR amplification with standard Illumina NGS primers. Both libraries sequenced as expected, demonstrating that the TOPO-cloned mPCR products are compatible with NGS. The TOPO library did not generate as many clusters/reads as the KAPA library, but it is not uncommon for libraries generated by different methods to cluster at different efficiencies. To measure the efficiency of the TOPO-mediated ligation, we identified all unique mPCR products (as compared to their wild-type sequence) using a Levenshtein/edit distance algorithm [82-83]. Next, we treated each unique mPCR molecule as a barcode and counted the total number of unique reads/barcodes observed 10 or more times. As a percentage of total reads, the TOPO vector captured more unique reads/barcodes than the KAPA library for two of the three amplicons (KRAS and PIK3CA) and a comparable percentage for TP53. This result indicates that TOPO vector ligation efficiency is comparable to or better than that of the standard Y-shaped adapter ligation reactions used in standard NGS library preparation.
TABLE-US-00005
TABLE 1
NGS Library Conversion Comparison
Amplicon | Library Prep (TOPO or KAPA) | Unique Reads | Total Reads Mapped to Reference | % Unique Reads
KRAS | TOPO | 742 | 214,116 | 0.35
KRAS | KAPA | 987 | 429,960 | 0.23
PIK3CA | TOPO | 785 | 186,850 | 0.42
PIK3CA | KAPA | 1,098 | 559,500 | 0.20
TP53 | TOPO | 1,503 | 548,305 | 0.27
TP53 | KAPA | 2,590 | 937,679 | 0.28
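The counting procedure of paragraph [0114] can be sketched as follows, under stated assumptions. The text does not specify how reads were grouped, so this sketch treats each distinct read sequence that differs from the wild-type amplicon as one barcode. It keeps only barcodes seen at least 10 times. It then expresses the count as a percentage of total mapped reads, which is the capture-efficiency ratio described in paragraph [0113]. The function names and toy reads are hypothetical. For a simple "differs from wild type" test, an edit distance greater than 0 is equivalent to string inequality; the distance value itself would also allow products to be binned by number of mutations.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute / match
        prev = cur
    return prev[-1]


def mutant_barcodes(reads, wild_type, min_count=10):
    """Distinct non-wild-type read sequences observed at least min_count times."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items()
            if n >= min_count and levenshtein(seq, wild_type) > 0}


def percent_unique(n_unique, n_total):
    """Unique barcodes as a percentage of total reads mapped to the reference."""
    return 100.0 * n_unique / n_total


if __name__ == "__main__":
    wt = "ACGTACGT"
    reads = ["ACGTACGA"] * 12 + ["ACGTACG"] * 4 + [wt] * 30  # toy data
    print(mutant_barcodes(reads, wt))
    print(round(percent_unique(742, 214116), 2))  # KRAS, TOPO row of the table
```

Applied to the counts in the table, `percent_unique` reproduces the "% Unique Reads" column to two decimal places.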
Example 3
[0115] TOPO-DAT and Recombinase-Mediated Target Enrichment
[0116] To demonstrate proof of principle for recombinase-mediated target enrichment with a closed circular dsDNA NGS library, a topoisomerase-charged vector was used to TOPO clone an NGS library generated using a KAPA kit. Results were compared with traditional target enrichment capture using the same NGS libraries. Target enrichment was performed using xGen hybridization probes from IDT. The probes were designed to capture the 773 bp UPP promoter.
TABLE-US-00006
>UPP-promoter (SEQ ID NO: 25)
GGGTGAAAGCCAACCATCTTTGTTTCGGGGAACCGTGCTCGCCCCGTAAA
GTTAATTTTTTTTTCCCGCGCAGCTTTAATCTTTCGGCAGAGAAGGCGTT
TTCATCGTAGCGTGGGAACAGAATAATCAGTTCATGTGCTATACAGGCAC
ATGGCAGCAGTCACTATTTTGCTTTTTAACCTTAAAGTCGTTCATCAATC
ATTAACTGACCAATCAGATTTTTTGCATTTGCCACTTATCTAAAAATACT
TTTGTATCTCGCAGATACGTTCAGTGGTTTCCAGGACAACACCCAAAAAA
AGGTATCAATGCCACTAGGCAGTCGGTTTTATTTTTGGTCACCCACGCAA
AGAAGCACCCACCTCTTTTAGGTTTTAAGTTGTGGGAACAGTAACACCGC
CTAGAGCTTCAGGAAAAACCAGTACCTGTGACCGCAATTCACCATGATGC
AGAATGTTAATTTAAACGAGTGCCAAATCAAGATTTCAACAGACAAATCA
ATCGATCCATAGTTACCCATTCCAGCCTTTTCGTCGTCGAGCCTGCTTCA
TTCCTGCCTCAGGTGCATAACTTTGCATGAAAAGTCCAGATTAGGGCAGA
TTTTGAGTTTAAAATAGGAAATATAAACAAATATACCGCGAAAAAGGTTT
GTTTATAGCTTTTCGCCTGGTGCCGTACGGTATAAATACATACTCTCCTC
CCCCCCCTGGTTCTCTTTTTCTTTTGTTACTTACATTTTACCGTTCCGTC
ACTCGCTTCACTCAACAACAAAA
[0117] NGS libraries were made with genomic DNA from 4 Pichia pastoris strains using the KAPA kit (as described in the previous example). For the traditional bait capture, each final NGS library was quantified using the Qubit broad range dsDNA assay, and 125 ng of each library was pooled for a total of 500 ng and combined with 2.5 ul salmon sperm DNA (10 mg/ml). The mixture was purified using 1.8× KAPA Beads and eluted in 9.5 ul xGen 2× Hybridization Buffer, 3 ul xGen Hybridization Buffer Enhancer, 2 ul Blocking Oligos and 4.5 ul xGen Lockdown Probes targeting the UPP promoter. The reaction was heated at 95° C. for 30 sec and then at 65° C. for 16 hrs. The hybridization reaction was bound to streptavidin beads and washed according to the IDT xGen hybridization capture of DNA libraries protocol. Following washing, 16 cycles of post-capture PCR were performed. PCR products were purified with 1.5× KAPA Pure Beads, eluted in 26 ul elution buffer and analyzed on an Agilent TapeStation D1000 assay.
[0118] For the RecA-mediated capture, the purified post-ligation reaction was PCR amplified for 30 cycles with a Tm of 60° C. The PCR reaction was diluted 1/10, and 4 ul were mixed with 1 ul pCR4-TOPO and 1 ul salt solution and incubated at room temperature for 30 min. Reactions were transformed into NEB 10-beta electrocompetent cells and plated over 30 LB/Amp plates to generate approximately 1.5M colonies. All plates were scraped, and the cells were combined and resuspended in LB/Amp medium; 9 ml was used to isolate supercoiled plasmids using the Monarch Plasmid Prep kit (New England Biolabs). The Pichia pastoris genome is approximately 9.4 Mb, so 1.5M clones with 200-300 bp fragments represent 300-450 Mb of total sequence (approximately 32-48× genome coverage).
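The coverage estimate for the plasmid library is simple arithmetic: the number of clones multiplied by the insert length, divided by the genome size. The sketch below assumes every colony carries exactly one insert and ignores duplicate and empty clones. Dividing by the stated 9.4 Mb genome gives roughly 32- to 48-fold coverage, and rounding the genome up to 10 Mb gives 30- to 45-fold. The function name is hypothetical.

```python
def fold_coverage(n_clones: float, insert_bp: float, genome_bp: float) -> float:
    """Expected fold coverage = total cloned sequence / genome size."""
    return n_clones * insert_bp / genome_bp


if __name__ == "__main__":
    for insert in (200, 300):
        total_mb = 1.5e6 * insert / 1e6
        print(f"{insert} bp inserts: {total_mb:.0f} Mb, "
              f"{fold_coverage(1.5e6, insert, 9.4e6):.1f}x (9.4 Mb genome), "
              f"{fold_coverage(1.5e6, insert, 10e6):.1f}x (10 Mb genome)")
```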
[0119] Nucleofilaments were formed by combining 1 ul (2 ug) of RecA protein (New England Biolabs), 3 ul 10× RecA Buffer (New England Biolabs), 7 ul xGen Lockdown Probes targeting the UPP promoter, 6 ul 2 mM ATPγS, 2 ul ATP (10 mM) and 6 ul H.sub.2O, and incubating at 37° C. for 15 min. Next, 5 ul supercoiled NGS plasmid library (100 ng/ul) was added (500 ng total), and the reaction was incubated at 37° C. for 20 min. The reaction was terminated by adding 0.5 ul Proteinase K (20 mg/ml) and 1 ul 5% SDS and incubating at 37° C. for 10 min. Finally, 1 ul 100 mM PMSF was added to stop Proteinase K activity.
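For orientation, the volumes above can be converted to final concentrations with C1V1 = C2V2. The sketch below assumes that volumes are additive and that the 2 ug of RecA is contained in the 1 ul added. The probe stock concentration is not stated, so the probe volume is counted in the total but no concentration is reported for it. The dictionary and function names are hypothetical.

```python
# Volumes (ul) and stock concentrations as given in paragraph [0119].
# name: (volume_ul, stock_concentration, unit); None = not stated / diluent.
NUCLEOFILAMENT_MIX = {
    "RecA protein": (1.0, 2.0, "ug/ul"),      # 1 ul contains 2 ug
    "RecA buffer": (3.0, 10.0, "x"),
    "xGen Lockdown probes": (7.0, None, ""),  # concentration not stated
    "ATPgammaS": (6.0, 2.0, "mM"),
    "ATP": (2.0, 10.0, "mM"),
    "H2O": (6.0, None, ""),
}
# Second step: 5 ul of supercoiled plasmid library at 100 ng/ul is added.
STRAND_INVASION_MIX = {**NUCLEOFILAMENT_MIX,
                       "plasmid library": (5.0, 100.0, "ng/ul")}


def final_concentrations(mix):
    """Apply C1*V1 = C2*V2 to every component with a known stock concentration.

    Assumes volumes are additive; returns (total volume, {name: (conc, unit)}).
    """
    total = sum(vol for vol, _, _ in mix.values())
    return total, {name: (stock * vol / total, unit)
                   for name, (vol, stock, unit) in mix.items()
                   if stock is not None}


if __name__ == "__main__":
    for label, mix in [("nucleofilament step", NUCLEOFILAMENT_MIX),
                       ("after plasmid addition", STRAND_INVASION_MIX)]:
        total, conc = final_concentrations(mix)
        print(f"{label}: {total:g} ul")
        for name, (value, unit) in conc.items():
            print(f"  {name}: {value:.3g} {unit}")
```

Under these assumptions, the 15 min nucleofilament step runs in 25 ul with the RecA buffer at 1.2×. Adding the 5 ul plasmid library brings the reaction to 30 ul, where the buffer reaches 1× and the ATPγS is 0.4 mM.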
[0120] The hybridization reaction was bound to streptavidin beads by incubation at room temperature for 40 min and washed twice with 100 ul 1× Bind & Washing (B&W) buffer (5 mM Tris-HCl, pH 7.5, 0.5 mM EDTA and 1 M NaCl) and once with 100 ul H.sub.2O. The beads were resuspended in 20 ul H.sub.2O, combined with 25 ul KAPA HiFi HotStart ReadyMix (2×) and 5 ul Library Amplification Primer Mix (10×), and PCR amplified for 22 cycles with a Tm of 60° C. The final PCR was purified with 1.0× KAPA Pure Beads, eluted in 40 ul elution buffer and analyzed on an Agilent TapeStation D1000 assay.
[0121] Sequencing results. Both captures were sequenced separately on an Illumina MiSeq (2×150 bp).
[0122] Prior reports suggest that the true value of this method is the ability to capture small target areas with minimal off-target capture, which is difficult to achieve with traditional bait capture. However, the real value of the invention is the ability to perform target enrichment rapidly at low temperature, which should reduce sample oxidation and reduce artifacts when analyzing specimens for rare variants. Moreover, combining recombinase-mediated target enrichment with dual adapter technology solves multiple issues associated with rare variant detection. Topoisomerase-charged dual adapter technology facilitates high-efficiency UMI attachment to sample nucleic acid, which is required for error correction, and the resulting circular dsDNA molecules allow recombinase-mediated target enrichment without stability probes. The example above used a plasmid and relied on E. coli transformation for supercoiling the plasmid DNA; however, non-plasmid dual adapter molecules may be supercoiled in vitro using DNA gyrase or ethidium bromide. Also, closed circular dsDNA dual adapter probes may not need to be supercoiled to form stable D-loops, as stable D-loops have been shown to form with relaxed plasmid DNA (ref), and the strand invasion process by the nucleofilament itself may introduce enough negative supercoiling to stabilize the structure for target enrichment.
REFERENCES
[0123] 1. Kamps R, Brandão R D, Bosch B J, et al. Next-Generation Sequencing in Oncology: Genetic Diagnosis, Risk Prediction and Cancer Classification. Int J Mol Sci. 2017; 18(2):308. Published 2017 Jan. 31.
[0124] 2. Gray P. N.; Dunlop, C.; Elliott, A. Not All Next Generation Sequencing Diagnostics are Created Equal: Understanding the Nuances of Solid Tumor Assay Design for Somatic Mutation Detection. Cancers 2015, 7, 1313-1332.
[0125] 3. Shu, Y., Wu, X., Tong, X. et al. Circulating Tumor DNA Mutation Profiling by Targeted Next Generation Sequencing Provides Guidance for Personalized Treatments in Multiple Cancer Types. Sci Rep 7, 583 (2017).
[0126] 4. Fiala, C., Diamandis, E. P. Utility of circulating tumor DNA in cancer diagnostics with emphasis on early detection. BMC Med 16, 166 (2018). https://doi.org/10.1186/s12916-018-1157-9
[0127] 5. Hahn, S.; Garvin, A. M.; Di Naro, E.; Holzgreve, W. Allele drop-out can occur in alleles differing by a single nucleotide and is not alleviated by preamplification or minor template increments. Genet. Test 1998, 2, 351-355.
[0128] 6. Barnard, R.; Futo, V.; Pecheniuk, N.; Slattery, M.; Walsh, T. PCR bias toward the wild-type k-ras and p53 sequences: Implications for PCR detection of mutations and cancer diagnosis. BioTechniques 1998, 25, 684-691.
[0129] 7. Hodges, E.; Xuan, Z.; Balija, V.; Kramer, M.; Molla, M. N.; Smith, S. W.; Middle, C. M.; Rodesch, M. J.; Albert, T. J.; Hannon, G. J.; et al. Genome-wide in situ exon capture for selective resequencing. Nat. Genet. 2007, 39, 1522-1527.
[0130] 8. Okou, D. T.; Steinberg, K. M.; Middle, C.; Cutler, D. J.; Albert, T. J.; Zwick, M. E. Microarray-based genomic selection for high-throughput resequencing. Nat. Methods 2007, 4, 907-909.
[0131] 9. Albert, T. J.; Molla, M. N.; Muzny, D. M.; Nazareth, L.; Wheeler, D.; Song, X.; Richmond, T. A.; Middle, C. M.; Rodesch, M. J.; Packard, C. J.; et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods 2007, 4, 903-905.
[0132] 10. Mamanova, L.; Coffey, A. J.; Scott, C. E.; Kozarewa, I.; Turner, E. H.; Kumar, A.; Howard, E.; Shendure, J.; Turner, D. J. Target-enrichment strategies for next-generation sequencing. Nat. Methods 2010, 7, 111-118.
[0133] 11. Pritchard, C. C.; Salipante, S. J.; Koehler, K.; Smith, C.; Scroggins, S.; Wood, B.; Wu, D.; Lee, M. K.; Dintzis, S.; Adey, A.; et al. Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens. J. Mol. Diagn. 2014, 16, 56-67.
[0134] 12. Frampton, G. M.; Fichtenholtz, A.; Otto, G. A.; Wang, K.; Downing, S. R.; He, J.; Schnall-Levin, M.; White, J.; Sanford, E. M.; An, P.; et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat. Biotechnol. 2013, 31, 1023-1031.
[0135] 13. Wagle, N.; Berger, M. F.; Davis, M. J.; Blumenstiel, B.; Defelice, M.; Pochanard, P.; Ducar, M.; van Hummelen, P.; Macconaill, L. E.; Hahn, W. C.; et al. High-throughput detection of actionable genomic alterations in clinical tumor samples by targeted, massively parallel sequencing. Cancer Discov. 2011, 2, 82-93.
[0136] 14. Lanman R B, Mortimer S A, Zill O A, Sebisanovic D, Lopez R, Blau S, et al. (2015) Analytical and Clinical Validation of a Digital Sequencing Panel for Quantitative, Highly Accurate Evaluation of Cell-Free Circulating Tumor DNA. PLoS ONE 10(10): e0140712.
[0137] 15. Schmitt M W, et al. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci USA. 2012; 109:14508-14513.
[0138] 16. Fox, E.; Reid-Bayliss, K.; Emond, M.; Loeb, L. Accuracy of Next Generation Sequencing Platforms. Next Gener Seq Appl. 2014; 1: 1000106.
[0139] 17. Newman, A, et al. Integrated digital error suppression for improved detection of circulating tumor DNA. Nat Biotechnol. 2016 May; 34(5): 547-555.
[0140] 18. Costello, M, et al. Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation. Nucleic Acids Research. 2013; 41(6):e67.
[0141] 19. https://sequencing.roche.com/en/products-solutions/by-category/assays/ctdna-analysis-kits/ctdna-targeted-kits.html
[0142] 20. Ability of RecA protein to promote a search for rare sequences in duplex DNA. Proc. Natl. Acad. Sci. USA Vol. 83, pp. 9586-9590, December 198
[0143] 21. Bakhyt Zhumabayeva, Alex Chenchik, and Paul D. Siebert. RecA-Mediated Affinity Capture: A Method for Full-Length cDNA Cloning. BioTechniques 1999 27:4, 834-845.
[0144] 22. Holger Welder, Hilden and Erika Wedler, Hilden. RECOMBINASE MEDIATED TARGETED DNA ENRICHMENT FOR NEXT GENERATION SEQUENCING. US 2015/0197787 A1 United States Patent and Trademark Office, 16 Jul. 2015.
[0145] 23. Shuman S. Recombination mediated by vaccinia virus DNA topoisomerase I in Escherichia coli is sequence specific. Proc Natl Acad Sci USA. 1991 Nov. 15; 88(22):10104-8.
[0146] 24. Shuman S. Novel approach to molecular cloning and polynucleotide synthesis using vaccinia DNA topoisomerase. J Biol Chem. 1994 Dec. 23; 269(51):32678-84.
[0147] 25. Jonathan D. Chesnut, Stewart Shuman, Knut R. Madden, John A. Heyman, Robert P. Bennett. METHODS AND REAGENTS FOR MOLECULAR CLONING. US 2003/0022179 A1 United States Patent and Trademark Office, 30 Jan. 2003.
[0148] 26. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods. 2009 May; 6(5):343-5.
[0149] 27. Gibson, D. G. et al. Chemical synthesis of the mouse mitochondrial genome. Nature Methods. 2010 November; 7(11):901-3.
[0150] 28. https://www.neb.com/products/e2621-nebuilder-hifi-dna-assembly-master-mix#Product%20Information
[0151] 29. Zhang, Y.; Werling, U.; Edelmann, W. Seamless Ligation Cloning Extract (SLiCE) Cloning Method. Methods Mol Biol. 2014; 1116: 235-244.
[0152] 30. Li, M.; Elledge S. SLIC: a method for sequence- and ligation-independent cloning. Methods Mol Biol. 2012; 852:51-9.
[0153] 31. Jon Ness and Jeremy S. Minshull. METHODS, COMPOSITIONS, AND KITS FOR ONE-STEP DNA CLONING USING DNA TOPOISOMERASE. WO 2009/017673 A2 World Intellectual Property Organization International Bureau, 5 Feb. 2009.
[0154] 32. P Hsieh, C S Camerini-Otero, R D Camerini-Otero. The synapsis event in the homologous pairing of DNAs: RecA recognizes and pairs less than one helical repeat of DNA. Proceedings of the National Academy of Sciences July 1992, 89 (14) 6492-6496.
[0155] 33. Stephen C. Kowalczykowski and Angela K. Eggleston. HOMOLOGOUS PAIRING AND DNA STRAND-EXCHANGE PROTEINS. Annual Review of Biochemistry 1994 63:1, 991-1043.
[0156] 34. B J Rao, M Dutreix, C M Radding. Stable three-stranded DNA made by RecA protein. Proceedings of the National Academy of Sciences April 1991, 88 (8) 2984-2988.
[0157] 35. Tracy R B, Kowalczykowski S C. In vitro selection of preferred DNA pairing sequences by the Escherichia coli RecA protein. Genes Dev. 1996 Aug. 1; 10(15):1890-903.
[0158] 36. S C Kowalczykowski. Biochemistry of Genetic Recombination: Energetics and Mechanism of DNA Strand Exchange. Annual Review of Biophysics and Biophysical Chemistry 1991 20:1, 539-575.
[0159] 37. Radding C M. Helical interactions in homologous pairing and strand exchange driven by RecA protein. J Biol Chem. 1991 Mar. 25; 266(9):5355-8.
[0160] 38. Honigberg S M, Rao B J, Radding C M. Ability of RecA protein to promote a search for rare sequences in duplex DNA. Proc Natl Acad Sci USA. 1986 December; 83(24):9586-90.
[0161] 39. Kirkpatrick D P, Radding C M. RecA protein promotes rapid RNA-DNA hybridization in heterogeneous RNA mixtures. Nucleic Acids Res. 1992; 20(16):4347-4353. doi:10.1093/nar/20.16.4347.
[0162] 40. Stephen L. Gasior, Heidi Olivares, Uy Ear, Danielle M. Hari, Ralph Weichselbaum, Douglas K. Bishop. Assembly of RecA-like recombinases: Distinct roles for mediator proteins in mitosis and meiosis. Proceedings of the National Academy of Sciences July 2001, 98 (15) 8411-8418.
[0163] 41. Shibata T, Osber L, Radding C M. Purification of recA protein from Escherichia coli. Methods Enzymol. 1983; 100:197-209.
[0164] 42. Madiraju M V, Templin A, Clark A J. Properties of a mutant recA-encoded protein reveal a possible role for Escherichia coli recF-encoded protein in genetic recombination. Proc Natl Acad Sci USA. 1988; 85(18):6592-6596. doi:10.1073/pnas.85.18.6592
[0165] 43. Kawashima, H., Horii, T., Ogawa, T. et al. Functional domains of Escherichia coli recA protein deduced from the mutational sites in the gene. Molec Gen Genet 193, 288-292 (1984).
[0166] 44. Yonesaki T, Minagawa T. T4 phage gene uvsX product catalyzes homologous DNA pairing. EMBO J. 1985 Dec. 1; 4(12):3321-7.
[0167] 45. Lovett C M Jr, Roberts J W. Purification of a RecA protein analogue from Bacillus subtilis. J Biol Chem. 1985 Mar. 25; 260(6):3305-13.
[0168] 46. Kmiec E, Holloman W K. Homologous pairing of DNA molecules promoted by a protein from Ustilago. Cell. 1982 June; 29(2):367-74.
[0169] 47. Angov E, Camerini-Otero R D. The recA gene from the thermophile Thermus aquaticus YT-1: cloning, expression, and characterization. J Bacteriol. 1994; 176(5):1405-1412. doi:10.1128/jb.176.5.1405-1412.1994
[0170] 48. Kato, R. and Kuramitsu, S. (1999), Characterization of thermostable RecA protein and analysis of its interaction with single-stranded DNA. European Journal of Biochemistry, 259: 592-601.
[0171] 49. Shinohara, A., Ogawa, H., Matsuda, Y. et al. Cloning of human, mouse and fission yeast recombination genes homologous to RAD51 and recA. Nat Genet 4, 239-243
[0172] 50. Potapov V, Ong J L (2017) Examining Sources of Error in PCR by Single-Molecule Sequencing. PLoS ONE 12(1): e0169774.
[0173] 51. Chong H K, Wang T, Lu H-M, Seidler S, Lu H, Keiles S, et al. (2014) The Validation and Clinical Implementation of BRCAplus: A Comprehensive High-Risk Breast Cancer Diagnostic Assay. PLoS ONE 9(5): e97408.
[0174] 52. Tate, J G. et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer, Nucleic Acids Research, Volume 47, Issue D1, 8 Jan. 2019, Pages D941-D947.
[0175] 53. Hong, Y. et al. Antitumor Activity of BRAF Inhibitor Vemurafenib in Preclinical Models of BRAF-Mutant Colorectal Cancer. Cancer Res Feb. 1, 2012 (72) (3) 779-789; DOI: 10.1158/0008-5472.CAN-11-2941.
[0176] 54. Khozin S, Blumenthal G M, Jiang X, et al. U.S. Food and Drug Administration approval summary: Erlotinib for the first-line treatment of metastatic non-small cell lung cancer with epidermal growth factor receptor exon 19 deletions or exon 21 (L858R) substitution mutations. Oncologist. 2014; 19(7):774-779.
[0177] 55. Chan, T. A. et al. Development of tumor mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Annals of Oncology, Volume 30, Issue 1, 44-56.
[0178] 56. Westdorp, H., Fennemann, F. L., Weren, R. D. A. et al. Opportunities for immunotherapy in microsatellite instable colorectal cancer. Cancer Immunol Immunother 65, 1249-1259 (2016).
[0179] 57. Bacher, Jeffery W., et al. Development of a fluorescent multiplex assay for detection of MSI-High tumors. Disease Markers 20.4-5 (2004): 237-250.
[0180] 58. Treangen T J, Salzberg S L. Repetitive DNA and next-generation sequencing: computational challenges and solutions [published correction appears in Nat Rev Genet. 2012 February; 13(2):146]. Nat Rev Genet. 2011; 13(1):36-46.
[0181] 59. Kohlmann, W., and S. B. Gruber. GeneReviews®. (1993).
[0182] 60. Havel, J. J., Chowell, D. & Chan, T. A. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nat Rev Cancer 19, 133-150 (2019).
[0183] 61. Abkevich, V., et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. British Journal of Cancer 107.10 (2012): 1776-1782.
[0184] 62. Kohn, Elise C., Jung-min Lee, and S. Percy Ivy. The HRD decision: which PARP inhibitor to use for whom and when. Clinical Cancer Research 23.23 (2017): 7155-7157.
[0185] 63. Patel S J, Sanjana N E, Kishton R J, et al. Identification of essential genes for cancer immunotherapy. Nature. 2017; 548(7669):537-542. doi:10.1038/nature23477.
[0186] 64. Giornelli G H. Management of relapsed ovarian cancer: a review. Springerplus. 2016; 5(1):1197. Published 2016 Jul. 28. doi:10.1186/s40064-016-2660-0.
[0187] 65. Rahner N, Steinke V. Hereditary cancer syndromes. Dtsch Arztebl Int. 2008; 105(41):706-714. doi:10.3238/arztebl.2008.0706.
[0188] 66. Rosati, E., Dowds, C. M., Liaskou, E. et al. Overview of methodologies for T-cell receptor repertoire analysis. BMC Biotechnol 17, 61 (2017).
[0189] 67. Chaudhary N, Wesemann D R. Analyzing Immunoglobulin Repertoires. Front Immunol. 2018; 9:462. Published 2018 Mar. 14. doi:10.3389/fimmu.2018.00462
[0190] 68. Levine, Bruce L., et al. Global manufacturing of CAR T cell therapy. Molecular Therapy-Methods & Clinical Development 4 (2017): 92-101.
[0191] 69. Nachmanson D, Lian S, Schmidt E K, et al. Targeted genome fragmentation with CRISPR/Cas9 enables fast and efficient enrichment of small genomic regions and ultra-accurate sequencing with low DNA input (CRISPR-DS). Genome Res. 2018; 28(10):1589-1599. doi:10.1101/gr.235291.118.
[0192] 70. Lee J., Lim H, Jang H, et al. CRISPR-Cap: multiplexed double-stranded DNA enrichment based on the CRISPR system. Nucleic Acids Res. 2019; 47(1):e1. doi:10.1093/nar/gky820
[0193] 71. Fu, Y., Wu, P., Beane, T. et al. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics 19, 531 (2018).
[0194] 72. Carethers, John M., and Stephanie S. Tseng-Rogenski. EMAST is a form of microsatellite instability that is initiated by inflammation and modulates colorectal cancer progression. Genes 6.2 (2015): 185-205.
[0195] 73. Coupland P, Chandra T, Quail M, Reik W, Swerdlow H. Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation. Biotechniques. 2012; 53(6):365-372. doi:10.2144/000113962
[0196] 74. van Dijk, Erwin L., et al. The third revolution in sequencing technology. Trends in Genetics 34.9 (2018): 666-681.
[0197] 75. Talerico M, Berget S M. Effect of 5′ splice site mutations on splicing of the preceding intron. Mol Cell Biol. 1990; 10(12):6299-6305. doi:10.1128/mcb.10.12.6299.
[0198] 76. Knight, Simon Robert, Adam Thorne, and Maria Letizia Lo Faro. Donor-specific cell-free DNA as a biomarker in solid organ transplantation. A systematic review. Transplantation 103.2 (2019): 273-283.
[0199] 77. Mertens, Fredrik, et al. The emerging complexity of gene fusions in cancer. Nature Reviews Cancer 15.6 (2015): 371-381.
[0200] 78. Maher, Christopher A., et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458.7234 (2009): 97-101.
[0201] 79. Holmes, Allyson, et al. Mechanistic signatures of HPV insertions in cervical carcinomas. NPJ Genomic Medicine 1.1 (2016): 1-16.
[0202] 80. Shembekar, Nachiket, et al. Droplet-based microfluidics in drug discovery, transcriptomics and high-throughput molecular genetics. Lab on a Chip 16.8 (2016): 1314-1331.
[0203] 81. Allyse, Megan, et al. Non-invasive prenatal testing: a review of international implementation and challenges. International Journal of Women's Health 7 (2015): 113.
[0204] 82. https://en.wikipedia.org/wiki/Edit_distance
[0205] 83. https://en.wikipedia.org/wiki/Levenshtein_distance
[0206] All of the molecules, compositions, articles, and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure.