BUBBLE PRIMERS

20170349926 · 2017-12-07

Abstract

A method for generating sequence ready fragments of nucleotide sequences is described, the method making use of “bubble primers” which include first and third portions which hybridise to a target, and a second partly self-complementary portion which forms an unhybridised loop. The loop contains generic sequences allowing use of sequencing primers. The first portion may be degradable so as to generate an amplicon of sequence of interest flanked by the third portion and the generic sequences of the second portion. In preferred embodiments, the second portion, or the region between the second portion and the third portion, also comprises a tetrad of nucleotides A, C, G, T, allowing calibration of the sequencing reaction.

Claims

1. A method for generating polynucleotide fragments from a starting template polynucleotide, the method comprising: a) amplifying a region of interest from the starting template using a first primer pair to form an amplicon incorporating the region of interest, b) amplifying the region of interest from the first amplicon generated in step a) using a nucleic acid amplification reaction with a second primer, to form an amplicon incorporating the second primer, wherein the second primer comprises a nucleic acid sequence having a first portion which is complementary to a first portion of the starting template, a second portion which is not complementary to the starting template, and a third portion which is complementary to a second portion of the starting template; wherein the first and second portions of the starting template are adjacent or in close proximity to one another; wherein the first, second, and third portions of the second primer are arranged in that order from 5′ to 3′, such that on hybridisation to the starting template the second portion of the primer remains unhybridised and forms a loop between the first and third portions; thereby generating an amplified product comprising a region of interest flanked by sequences of the second primer.

2. The method of claim 1, wherein the amplification reaction of step b) is carried out with a second primer pair, each of which is of the form of the second primer.

3. The method of claim 1 or claim 2 wherein the second portion of the second primer comprises a generic sequence.

4. The method of claim 3 wherein the generic sequence comprises a sequencing primer sequence.

5. The method of claim 3 or claim 4 wherein the generic sequence is adjacent the third portion of the second primer.

6. The method of claim 5 where the generic sequence is separated from the third portion by a defined sequence of bases.

7. The method of claim 6 where the generic sequence is separated from the third portion by a sequence comprising each of the four nucleotide bases A, T, G and C in any defined order.

8. The method of any preceding claim, wherein at least a part of the first portion of the or each second primer is susceptible to degradation to which at least the third portion and at least a part of the second portion of the primer are not susceptible; and the method further comprises the step of: c) degrading the susceptible part of the or each primer from the amplicon.

9. The method of any preceding claim, further comprising the step of: d) amplifying the product of b) and/or the product of c) with a third primer pair, each primer comprising a nucleic acid sequence substantially identical to at least a portion of the second portion of the or each second primer.

10. The method of claim 9, when dependent on any one of claims 3 to 7, wherein the product of b) and/or the product of c) is amplified, and at least a portion of the nucleic acid sequence of the third primer is substantially identical to the generic sequence of the second portion of the or each second primer.

11. The method of any preceding claim, wherein the template is a fragment of a genome.

12. The method of claim 11, wherein the template is a genomic locus.

13. The method of claim 2, wherein the second portion of each second primer in the pair is distinct.

14. The method of any preceding claim, wherein the first and second portions of the template are separated by 0-20 nucleotides, preferably 1-10, more preferably 1-6, and most preferably 1, 2, 3, 4, 5, or 6 nucleotides.

15. The method of any preceding claim, wherein the first portion of the second primer is up to 15, 20, 25, 30, 35, 50 nucleotides in length, preferably 20-35 nucleotides, more preferably 25 nucleotides.

16. The method of any preceding claim, wherein the second portion of the second primer comprises a self-complementary region, such that the loop formed upon hybridisation takes a stem-loop structure in which the self-complementary region forms the stem.

17. The method of claim 8, wherein the second portion of the second primer comprises a first degradable portion and a second resistant portion.

18. The method of any preceding claim, wherein the third portion of the second primer is no more than 2, 4, 5, 6, 7, 8, 9, or 10 nucleotides in length, preferably 4 to 6, most preferably 6.

19. The method of any preceding claim, wherein the second portion, or the second and third portions together, of the second primer is or are selected so as to include a tetrad of nucleotides comprising all four of the nucleotide bases (A, C, G, T).

20. The method of claim 9, wherein the third primer pair in step d) further comprises additional non-template sequences at the 5′ end.

21. The method of any preceding claim wherein the amplification of step b) is nested PCR.

22. The method of any of claims 3 to 21 wherein a sequencing primer is hybridised to the complement of the generic sequence of the second portion of the second primer.

23. The method of any preceding claim, further comprising the step of sequencing the generated amplified products.

24. The method of any preceding claim wherein the amplification of step a) and/or step b) is a multiplex amplification.

25. A primer for nucleic acid amplification, the primer comprising a nucleic acid sequence having a first portion which is complementary to a first portion of a target sequence for amplification, a second portion which is not complementary to the target sequence and comprises a generic sequence, and a third portion that is complementary to a second portion of the target sequence; wherein the first and second portions of the target sequence are adjacent or in close proximity to one another; wherein the first, second, and third portions of the primer are arranged in that order from 5′ to 3′, such that on hybridisation to a target sequence the second portion of the primer remains unhybridised and forms a loop between the first and third portions.

26. The primer of claim 25 wherein the complement of the generic sequence is hybridisable to a sequencing primer.

27. The primer of claim 25 or claim 26 wherein the generic sequence is adjacent the third portion.

28. The primer of any of claims 25 to 27, wherein the first portion of the primer is up to 15, 20, 25, 30, 35, 50 nucleotides in length, preferably 20-35 nucleotides, more preferably 25 nucleotides.

29. The primer of any of claims 25 to 28, wherein the second portion of the primer comprises a self-complementary region, such that the loop formed upon hybridisation takes a stem-loop structure in which the self-complementary region forms the stem.

30. The primer of any of claims 25 to 29, wherein the third portion of the primer is no more than 2, 4, 5, 6, 7, 8, 9, or 10 nucleotides in length, preferably 4 to 6, most preferably 6.

31. The primer of any of claims 25 to 30, wherein the second portion, or the second and third portions together, of the primer is or are selected so as to include a sequence of nucleotides comprising each of the four nucleotide bases (A, C, G, T).

32. A pair of primers in accordance with any of claims 25 to 31, wherein the second portion of each primer in the pair is distinct.

33. The primer pair of claim 32, in combination with a second primer pair, each member of the second primer pair comprising a nucleic acid sequence complementary to at least a portion of a respective member of the first primer pair.

34. A library of primer pairs comprising multiple primer pairs according to claim 33, each pair having first and second primers, comprising respective first and second second portions, wherein each first second portion is identical, and each second second portion is identical.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0053] FIG. 1 shows a schematic illustration of a primer for use in the methods described herein.

[0054] FIG. 2 illustrates the method for generating sequencing-ready polynucleotide fragments.

DETAILED DESCRIPTION OF THE INVENTION

[0055] The methods disclosed herein enable the generation of NGS (next generation sequencing) “sequence-ready” DNA fragments that are a targeted subset of the total DNA present in the original template DNA sample. Just those loci of interest are amplified by, for example, polymerase chain reaction, such that the amplicons produced have the template DNA of interest flanked by terminal ends of known sequence. These known sequences are identical or substantially identical on all the amplicons generated, and are deliberately and controllably asymmetric, with two distinct sequences applied to each of the two ends of the amplified fragments. The amplicons thus produced are functionally equivalent to adapter-ligated fragments produced in conventional NGS methods, but offer distinct advantages in terms of ease, time and cost of production, as well as quality of the sequencing data subsequently produced. The terminal ends of the amplicons are amenable to generic ‘one-size-fits-all’ biochemistry during subsequent NGS manipulations, such as clonal amplification and DNA sequencing.

[0056] Further, embodiments of the methods enable a relatively short 3′ end of a site-specific primer (the “third portion” in the summary of the invention) to hybridise in close proximity to a much larger, stably hybridised 5′ element (the ‘first portion’ in the summary of invention) of the same primer, with these two target-complementary regions separated by a non-template sequence (the second portion in the summary of the invention) that will become part of the daughter amplicon upon successful primer extension. The non-template sequence will incorporate sequences of use in next generation sequencing, such that sequencing reactions can begin from that point. This minimises the amount of known DNA sequence data that would inevitably be wastefully generated from a direct ‘adapter ligation’ strategy, avoiding sequencing through a substantial ‘region of no interest’ amplification-primer remnant.

[0057] In addition, embodiments of the methods enable the use of NGS for the targeted analysis of specific genetic loci from within a complex DNA template source. Efficient targeted-panel sequencing is possible (for example, from a specific genetic locus or loci), rather than the current massively parallel ‘whole genome shotgun sequencing’.

[0058] An illustration of a primer for use in the method is shown in FIG. 1. The primer 10 includes a first portion 12, a second portion 14, and a third portion 16. The first portion 12 is designed to be complementary to a part of the target genomic sequence to be amplified, while the third portion 16 is also designed to be complementary to an adjacent part of the target sequence. The first portion is around 25 nucleotides in length, with the third portion being around 6 nucleotides. There may be a gap of 0-4 nucleotides on the target between the sequences complementary to the first portion and those complementary to the third portion. This gap is to accommodate the non-complementary second portion 14 (the stem-loop structure) of the primer when first 12 and third 16 portions are hybridized to the target strand.
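The layout of paragraph [0058] can be sketched in code. The following is a minimal illustration only: the function name, the example sequences and the choice of a 2-nucleotide gap are assumptions made for the sketch, not taken from the disclosure. The target is written as the strand whose sequence matches the primer (the primer anneals to its complement).

```python
# Hypothetical helper; names and sequences are illustrative assumptions.

def build_bubble_primer(target_sense, start, second_portion,
                        first_len=25, third_len=6, gap=2):
    """Return the (first, second, third) portions of a primer, 5' to 3'.

    target_sense is the strand with the same sequence as the primer's
    target-complementary portions. gap is the number of target bases
    skipped between the first and third portions (0-4 in paragraph [0058]).
    """
    if not 0 <= gap <= 4:
        raise ValueError("gap on the target is expected to be 0-4 nucleotides")
    third_start = start + first_len + gap
    if third_start + third_len > len(target_sense):
        raise ValueError("target too short for the requested layout")
    first = target_sense[start:start + first_len]
    third = target_sense[third_start:third_start + third_len]
    return first, second_portion, third


# Made-up 51-nucleotide target and a made-up 16-nucleotide second portion.
target = "ATGGCCTTAGCTAGGCTAACGTTAGCCTAGGATCCGATTACGCTAGCTAGG"
first, second, third = build_bubble_primer(target, 0, "GGCACGTTTTCGTGCC")
primer = first + second + third
```

Here the first portion covers target positions 0-24, two target bases are skipped, and the third portion covers the next six bases.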

[0059] The second portion 14 is not complementary to the target, and includes a self-complementary region such that the sequence forms a stem-loop hairpin structure. The loop part and part of the stem of the second portion include sequences substantially identical to sequencing primers used in a chosen sequencing reaction. Note that the particular sequencing chemistry to be used is largely irrelevant; the method described herein is of general applicability, and is expected to be able to incorporate the relevant sequencing primer sequence into the amplicon. In certain embodiments the second portion may further comprise or be adjacent to a sequence comprising each of the four nucleotides A, C, G, T, in any order. Preferably the sequence is a tetrad (eg, ACGT), although the sequence may include multiple copies of each nucleotide, typically (but not necessarily) in equal numbers (eg, AAGGCCTT).
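As a rough illustration of the two sequence properties described in paragraph [0059], the sketch below builds a second portion whose ends are reverse complements of one another (so they can pair as a stem) and checks whether a candidate calibration sequence contains all four bases. The function names and the stem and loop sequences are assumptions chosen for the example.

```python
# Hypothetical helpers; sequences are illustrative, not from the disclosure.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_stem_loop(stem, loop):
    """Second portion laid out as stem + loop + reverse complement of stem."""
    return stem + loop + reverse_complement(stem)


def forms_stem(second_portion, stem_len):
    """True if the first and last stem_len bases can pair with each other."""
    five_prime = second_portion[:stem_len]
    three_prime = second_portion[-stem_len:]
    return five_prime == reverse_complement(three_prime)


def has_all_four_bases(seq):
    """True if A, C, G and T each appear at least once."""
    return set("ACGT") <= set(seq)


second_portion = make_stem_loop("GGCACG", "TTAGCATT")
```

Both example sequences from the text, ACGT and AAGGCCTT, pass `has_all_four_bases`, whereas a sequence such as AAGG would not.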

[0060] The primer 10 may include two types of nucleic acid. The first region, at the 5′ end of the primer, may be sensitive to degradation by a selected technique, while the second region, at the 3′ end of the primer, is insensitive to degradation by that technique. For example, the 5′ end of the primer may be formed from RNA, while the 3′ end is formed from DNA; the RNA portion may be degraded by RNAse H or alkaline hydrolysis, to which the DNA portion is resistant. Alternatively, the 5′ end of the primer may be formed from DNA incorporating uracil in place of thymine; this will be degradable by uracil-N-glycosylase. In preferred embodiments, the degradable portion is degradable by an enzyme.

[0061] In this example, the degradable portion includes all of the first portion 12 of the primer, and a first section of the second portion 14 (shown on the second portion in double dashed line). The remainder of the primer is non-degradable. The degradable section of the second portion includes that region forming one half of the stem of the stem-loop structure; the non-degradable portion (shown in single dashed line) forms the loop and the second half of the stem adjacent the third portion. The non-degradable portion comprises a sequence that is at least substantially identical to the sequence of the sequencing primer. The sequencing primers hybridise to the complement of this sequence, produced upon DNA polymerisation (typically clonal amplification) generating the other strand.
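The split between degradable and non-degradable regions in this example can be modelled as a simple data structure. This is a sketch under stated assumptions: the class name, field names and sequences are hypothetical, and writing the degradable region with U in place of T is only a notational stand-in for an RNA or uracil-containing region.

```python
from dataclasses import dataclass


# Hypothetical model of the primer of FIG. 1; all sequences are made up.
@dataclass
class BubblePrimer:
    first: str        # 5' target-complementary portion (degradable)
    stem_5prime: str  # first half of the stem (degradable)
    loop: str         # loop of the second portion (non-degradable)
    stem_3prime: str  # second half of the stem (non-degradable)
    third: str        # 3' target-complementary portion (non-degradable)

    @property
    def degradable(self):
        return self.first + self.stem_5prime

    @property
    def resistant(self):
        return self.loop + self.stem_3prime + self.third

    def full_sequence(self):
        return self.degradable + self.resistant


def as_rna(seq):
    """Write a sequence with U in place of T (notation only)."""
    return seq.replace("T", "U")


p = BubblePrimer(first="ATGGCCTTAGCTAGGCTAACGTTAG", stem_5prime="GGCACG",
                 loop="TTAGCATT", stem_3prime="CGTGCC", third="TAGGAT")
remnant_after_degradation = p.resistant
```

In this simplified model, degradation leaves only `p.resistant` at the 5′ end of the amplicon strand; real enzymatic degradation may not remove the region as cleanly.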

[0062] The primer 10 may be used in a pair, consisting of forward and reverse primers. The forward and reverse primers include distinct first and third portions (as these are selected to be complementary to the endpoints of the region of the template to be amplified), and distinct second portions (leading to distinct forward and reverse sequencing primers being used), as the aim is to allow for asymmetric integration of the second portions into the amplicon. Where multiple primer pairs are provided, however, the second portions of each pair may be identical, to allow for common sequencing primers to be used to sequence all amplicons.
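The two constraints in paragraph [0062] (distinct second portions within a pair, shared second portions across pairs) can be expressed as a small consistency check. The function name and the placeholder sequences are assumptions for illustration.

```python
# Hypothetical check; each pair is (forward second portion, reverse second portion).

def check_library(pairs):
    """True if every pair is asymmetric and all pairs share the same
    forward and the same reverse second portion."""
    forward = {f for f, _ in pairs}
    reverse = {r for _, r in pairs}
    shared = len(forward) == 1 and len(reverse) == 1
    asymmetric = all(f != r for f, r in pairs)
    return shared and asymmetric


good_library = [("GGCACGTTAGCATTCGTGCC", "CCTGACAATCGGAAGTCAGG")] * 3
bad_library = good_library + [("GGCACGTTAGCATTCGTGCC", "GGCACGTTAGCATTCGTGCC")]
```

`check_library(good_library)` passes, while `bad_library` fails because its last pair carries the same second portion on both primers.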

[0063] The method of generating amplicons using the primers is shown in FIG. 2. This figure details the sequential steps performed in order to generate generic templates for a sequencing reaction in which a minimal amount of remnant primer sequences will be interrogated.

[0064] The method allows for the conversion of multiple separate template targets to products amenable to a generic sequencing workflow quickly, and with (ultimately) high sensitivity and specificity. The sequential amplification steps can be carried out discretely in separate amplification chambers, physically separating primer species, but one skilled in the art will appreciate that it may be possible to conduct these reactions in a smaller number of chambers (ideally just one) through selection of primer binding temperatures, careful control of primer concentrations (such that certain primer species are consumed to exhaustion) and by the application of a specific thermal cycling regime that temporally separates individual stages from participation in the overall process.

[0065] In the first step [FIG. 2a], a standard PCR reaction is undertaken using conventional oligonucleotide primers, enriching the template population with the target of the amplification. This reaction can beneficially be carried out in multiplex, with distinct primer pairs delivering a relatively low specificity multiplex amplification of a number of different targets, to ensure that rare species are efficiently amplified. Low specificity primers in this initial phase may also accommodate a degree of non-complementary base pairing within the targeted primer binding sites, as may be encountered in and around target DNA from cancer associated genes, for example. This initial amplification phase 2a can sacrifice specificity for enhanced sensitivity; this is tolerable, as any inappropriately amplified species, including primer dimer artefacts, will be eliminated from further amplification during the subsequent stages. This step generates a first amplicon flanked by the primer sequences. Note that these primers may themselves be degradable (eg, formed from RNA, or from DNA incorporating U in place of T). These primers may be designed such that they will produce a limited amount of amplicon before becoming inefficient through one, or a combination, of:

[0066] high Tm, with later cycles carried out at lower annealing temperature;

[0067] low initial concentration of this primer.

[0068] In step b), the amplicon from step a) is then amplified using the “bubble primers” or loop primers as described above. In FIG. 2b, the novel Bubble Primer capitalises on the enriched pool of template generated during the first step 2a and efficiently propagates just those amplicons from 2a that were generated from the correct targets, compensating for the relatively low specificity of that initial amplification. This amplification therefore generates an amplicon pool that capitalises on the high sensitivity of the initial low specificity amplification (FIG. 2a), but as the 3′ end of the Bubble Primer will only hybridise to, and be extended on, the correct amplicons, high specificity is re-established at this second stage (FIG. 2b). The only amplicons that contain the ‘bubble sequence’ of the Bubble Primer are generated from a reaction that is now (in combination) high sensitivity (2a) and high specificity (2b). Any other off-target amplicons or artefacts that are generated will fail to be taken forward through the reaction scheme, as they will lack the necessary generic sequences defined within the non-template (artificial) bubble of the Bubble Primer.
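The selectivity argument of paragraph [0068] can be illustrated with a toy screen in which exact string matching stands in for hybridisation. This is a deliberate simplification: the function name, sequences and the 0-4 base gap window are assumptions, and real priming depends on thermodynamics rather than exact matches.

```python
# Toy model: exact matching is used as a stand-in for hybridisation.

def can_prime(amplicon_sense, first, third, max_gap=4):
    """True if the third portion lies 0..max_gap bases downstream of a
    match to the first portion on the amplicon."""
    start = amplicon_sense.find(first)
    while start >= 0:
        after = start + len(first)
        for gap in range(max_gap + 1):
            if amplicon_sense.startswith(third, after + gap):
                return True
        start = amplicon_sense.find(first, start + 1)
    return False


first = "ATGGCCTTAGCTAGGCTAACGTTAG"
third = "TAGGAT"
correct_amplicon = first + "CC" + third + "CCGATTACGCTAGCTAGG"
off_target_amplicon = first + "CCGGGGGGTTTT"  # primed in step a), wrong interior
```

Both amplicons carry the step a) primer site, but only `correct_amplicon` also carries the third-portion site, so only it would be propagated in step b) under this model.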

[0069] The sequences of the bubble primers are selected such that the amplification is nested with respect to the amplification in step a); that is, the first portion of the bubble primers is substantially identical to the primers of step a), while the third portion is 3′-wards of the 3′ end of the primers of step a). This means that the third portion contains sequences not represented in the primers of step a), and allows a selective ‘nested’ PCR of only those amplicons that were correctly generated during the initial amplification, which may therefore accommodate a degree of reduced specificity. The sequence of the second and/or third portion is also ideally selected such that it contains a tetrad including each of the four nucleotides (A, C, G, T). The tetrad of nucleotides is preferably situated immediately adjacent to the 3′ end of the region at least substantially identical to the sequencing primer. The primers may also include “Index Codes” within the stem of the stem loop structure; for example, to identify and label products. As an example, an index code may be used to identify a specific product from a specific individual template. Alternatively, or in addition, the six bases of the third portion of the bubble primer, if sequenced, would normally be sufficient to identify the specific target that was being sequenced in a reasonable size multiplex.
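A minimal sketch of the nested design in paragraph [0069] follows, assuming the first portion is taken to be identical to the step a) primer and the third portion is read from the amplicon immediately 3′-wards of it (after an optional gap). The second function checks that the third portions within a multiplex are mutually distinct, which is what would allow them to identify targets. All names and sequences are hypothetical.

```python
# Hypothetical design helpers; sequences are illustrative only.

def nested_portions(amplicon_sense, outer_primer, gap=0, third_len=6):
    """Return (first, third) portions nested relative to a step a) primer."""
    start = amplicon_sense.find(outer_primer)
    if start < 0:
        raise ValueError("outer primer not found in amplicon")
    third_start = start + len(outer_primer) + gap
    third = amplicon_sense[third_start:third_start + third_len]
    if len(third) < third_len:
        raise ValueError("amplicon too short for the third portion")
    return outer_primer, third


def third_portions_identify_targets(thirds_by_target):
    """True if no two targets share the same third portion."""
    return len(set(thirds_by_target.values())) == len(thirds_by_target)


amplicon = "ATGGCCTTAGCTAGGCTAACGTTAGCCTAGGATCCGATTACGCTAGCTAGG"
outer = amplicon[:25]
first, third = nested_portions(amplicon, outer, gap=2)
panel = {"locus_1": third, "locus_2": "GACCTA", "locus_3": "TTGCAC"}
```

With a six-base third portion there are 4^6 = 4096 possible sequences, which is consistent with the statement that six bases would normally suffice for a reasonably sized multiplex.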

[0070] Step c) shows the amplicon generated in step b). The amplification product has non-template sequences (that is, the sequences of the second portion of the bubble primer) represented in close proximity to the target DNA sequence. This product may have degradable sequence (eg, RNA) derived from the initial target-specific PCR binding sites, and the RNA-containing remnant of the non-template loop.

[0071] In step d), the product of step c) may be degraded (eg, by using RNAse H and/or RNAse A), to remove the degradable sequence if present from the amplicon. This degradation also removes any excess degradable primers which are not incorporated into the amplicon, functionally removing these from any further activity. The remaining amplicon therefore includes only the amplified target sequence incorporating the non-degradable, non-target sequence of the second portion and the third portion from the primers. Optionally at this stage, a generic PCR amplification may also be carried out with primers targeted to the non-target sequence of the bubble primers (referred to as a third primer pair in the “summary of invention” section above). These further primers may additionally carry a non-template artificial 5′ extension for use as a sequence capture tag, a region used for clonal amplification, or for post-preamp amplification of the product.

[0072] Whether or not the 5′ susceptible end of the amplicon generated is digested away, the next stage of the amplification scheme relies on the amplification of the target amplicons using a primer that is at least substantially identical to the non-template (artificial) sequence of the Bubble Primer. All amplicons that are generated within a multiplex reaction are amenable to amplification in a generic fashion using this primer, at least substantially identical to the artificial sequence provided within the non-template region of the Bubble Primer. This generic primer acts as an amplification primer, whereas a primer with identical or substantially identical sequence can be used as the ultimate ‘sequencing primer’ during the sequencing reaction, with the 3′ end of the sequencing primer placed (generically) close to the region of the target amplicons to be interrogated, separated only by the few target-specific bases (ideally a number of between 4 and 10 bases, with 6 bases, 7 bases or 8 bases being most desirable, depending on GC content of this template-defined region). The region between the 3′ end of the generic sequencing primer and the target-specific bases is designed or selected to include a tetrad of nucleotides A, T, G and C to act as a reference for the level of signal generated from each of these single base incorporation events. This nucleotide tetrad may be provided as polynucleotide representations of each of the nucleotide types (AA, TT, GG and CC or AAA, TTT, GGG and CCC, for example). The order of presentation of the bases within the tetrad is not important, and the number of representations of each base can be varied (e.g. AA, TTT, GG, CCC).
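One way the tetrad could serve as a signal reference is sketched below, assuming a sequencing chemistry that reports one signal per homopolymer run (as in ‘one base at a time’ technologies). The function names, the linear signal model and the numeric values are assumptions for illustration; the disclosure does not specify a calibration procedure.

```python
# Hypothetical calibration sketch; signal values are invented.

def unit_signals(tetrad_runs, measured):
    """Per-base signal for each nucleotide, from known tetrad run lengths.

    tetrad_runs: list of (base, run_length), e.g. [("A", 2), ("T", 2), ...]
    measured:    signal recorded for each run, in the same order
    """
    units = {}
    for (base, count), signal in zip(tetrad_runs, measured):
        units[base] = signal / count
    return units


def call_run_length(base, signal, units):
    """Estimate how many bases were incorporated, assuming linear signal."""
    return round(signal / units[base])


tetrad = [("A", 2), ("T", 2), ("G", 2), ("C", 2)]  # i.e. AATTGGCC
units = unit_signals(tetrad, [2.0, 1.6, 2.4, 1.8])
t_run = call_run_length("T", 2.4, units)
g_run = call_run_length("G", 1.3, units)
```

In this example a T signal of 2.4 is read as three bases and a G signal of 1.3 as one base, because the tetrad showed that T gives a weaker per-base signal than G.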

[0073] Step e) shows the final product. This includes the target sequence optionally flanked by a sequence available for capture/clonal amplification (introduced in the amplification in step d)); a region available for hybridisation of a generic sequencing primer (derived from within the second portion of the bubble primer) and a region (derived from within the third portion, or between the second and third portions of the bubble primer) harbouring A, T, G and C to act as a reference for the signal strength generated for each base incorporation during sequencing. The final product may then be recovered, and used in a sequencing reaction.
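Given the layout of the final product, a read that starts at the 3′ end of the generic sequencing primer would be expected to begin with the tetrad, followed by the third-portion bases, followed by the sequence of interest. The sketch below splits a read accordingly; the function name, the assumption that the tetrad sits immediately before a six-base third portion, and the example read are all hypothetical.

```python
# Hypothetical read parser; layout assumed: tetrad + third portion + insert.

def split_read(read, tetrad, third_len=6):
    """Return (third_portion, insert) from a read that begins with the tetrad."""
    if not read.startswith(tetrad):
        raise ValueError("read does not begin with the expected tetrad")
    third_end = len(tetrad) + third_len
    return read[len(tetrad):third_end], read[third_end:]


read = "ACGT" + "TAGGAT" + "CCGATTACG"
third, insert = split_read(read, "ACGT")
target_by_third = {"TAGGAT": "locus_1", "GACCTA": "locus_2"}
target_name = target_by_third.get(third)
```

The lookup in `target_by_third` reflects the idea that the third-portion bases, if sequenced, can identify which target a read came from.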

[0074] The generic amplification of the target sequences using a primer at least substantially identical to the non-template sequence of the Bubble Primer can benefit from the inclusion of generic 5′ tag tail extensions, which can be used to capture individual molecules of the multiplex amplicon pool and facilitate the clonal amplification of these individual molecules in (again) a generic fashion. One skilled in the art will recognise that the reliance on amplifications that are based on artificial sequences gives tremendous scope for the target-specific or general optimisation of these amplifications and that the overall scheme will produce a population of amplicons that are amenable to sequencing that is NGS technology agnostic.

[0075] The method described herein delivers a pool of ‘end modified’ fragments that have consistent (reliably asymmetric) adapter sequences attached to the ends, as opposed to the ~50% randomly symmetrical products achieved by adapter ligation strategies. Symmetrical products are not amenable to supporting clonal amplification for NGS sequencing, and the invention therefore effectively eliminates this loss of template useful for NGS.
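The ~50% figure can be reproduced with a simple enumeration, assuming that each fragment end independently receives one of two adapters with equal probability (an idealisation of random ligation; the function name is hypothetical).

```python
from fractions import Fraction
from itertools import product


def symmetric_fraction(adapters=("A", "B")):
    """Fraction of fragments carrying the same adapter on both ends, if each
    end receives an adapter independently and uniformly at random."""
    outcomes = list(product(adapters, repeat=2))
    symmetric = [o for o in outcomes if o[0] == o[1]]
    return Fraction(len(symmetric), len(outcomes))


fraction = symmetric_fraction()
```

Of the four equally likely outcomes (A-A, A-B, B-A, B-B), two are symmetrical, giving one half.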

[0076] The method enables the rapid generation of a pool of short fragments of DNA in which the interior of the fragments is the DNA sequence of interest, to be determined by NGS, and the ends of the fragments are substantially generic, allowing parallel processing during the generation of the clonal populations required for signal enhancement.

[0077] The method uses primer designs that, in one embodiment, employ the replacement of thymine bases with uracil bases, enabling functional removal of these sequences to the advantage of the efficient production of the desired products. In another embodiment, the invention uses primer designs that are a hybrid of RNA at the 5′ end of the primer, and DNA at the 3′ end of the primer, enabling digestion of the RNA component when hybridised to DNA, and the functional removal of this component.

[0078] The 3′ end of the bubble primers, the third portion, includes a limited number of template-specific bases, sufficient to support DNA polymerase attachment and extension, but limiting the number of bases that will be ‘wastefully’ represented and sequenced in the final product used for NGS reactions.

[0079] The methods and primers described herein have a number of advantages over the prior art. In some embodiments, the attachment of sequences of DNA to the ends of specific regions of DNA enables these different regions to be analysed in multiplex, with the same applied biochemistry effecting NGS sequencing in parallel-processing. The methods and primers provide generic regions on the end of targeted DNA regions, the generic regions being available to support capture and clonal amplification of a diversity of targeted regions on a diversity of solid and/or aqueous phases. Further, the methods and primers circumvent the need to use ligation of DNA adapters to the ends of fragments of DNA generated by DNA amplification, and provide template amenable for efficient sequencing.

[0080] The methods and primers are agnostic as to the subsequent manipulations that generate pools of clonally amplified products (amenable to the generation of clonal populations whether on a surface, on a bead or in solution). The technology is also agnostic as to the technology that is subsequently used to generate the NGS data, and could be used (for example) with Illumina SBS technology, Ion Torrent or Roche 454 ‘one base at a time’ technologies, or other NGS technologies such as nanopore sequencing. In general, the methods described herein may be advantageous where it is desirable to introduce defined sequences onto the end or ends of specific amplified products.

[0081] The methods and primers are of principal utility in the analysis of a panel of DNA targets selected from a much larger available pool of DNA sequences.