Use of aptamers in proteomics

09758811 · 2017-09-12

Abstract

The present invention is a method for measuring the amount of at least one molecule in a biological sample, the method comprising a) combining the sample, or a derivative thereof, with one or more aptamers and allowing one or more molecules in the sample to bind to the aptamer(s); b) separating bound from unbound molecules; and c) quantifying the molecule(s) bound to the or each aptamer, wherein quantification of the bound molecule(s) is carried out by sequencing at least part of the or each aptamer. Uses of and products derived from the method are also contemplated.

Claims

1. A method for measuring the amount of at least one molecule in a biological sample, the method comprising: a) combining the biological sample, or a derivative thereof, with a plurality of aptamers and allowing members of the plurality of aptamers to bind to the molecules in the sample; b) separating molecules that are bound by members of the plurality of aptamers in step a); and c) sequencing, using next generation polynucleotide sequencing techniques, at least part of each of the members of the plurality of aptamers that bound the molecules separated in step b) to quantify each of the bound aptamers, thereby measuring the amount of the at least one molecule.

2. The method of claim 1, wherein the at least one molecule comprises a protein.

3. The method of claim 1, wherein the identity of the at least one molecule is known.

4. The method of claim 1, wherein the identity of the at least one molecule is unknown.

5. The method of claim 4, wherein the method further comprises determining the identity of the at least one molecule.

6. The method of claim 1, wherein the sequence of the members of the plurality of aptamers is known.

7. The method of claim 1, wherein the sequence of each unique member of the plurality of aptamers carries a unique tag.

8. The method of claim 7, wherein the tag comprises the sequence of each member of the plurality of aptamers.

9. The method of claim 7, wherein the tag comprises part of the sequence of each member of the plurality of aptamers.

10. The method of claim 1, wherein the next generation polynucleotide sequencing is carried out on a single molecule array or a clonal single molecule array.

11. The method of claim 1, wherein the method further comprises arraying the members of the plurality of aptamers that bound the molecules separated in step b) onto a surface.

12. The method of claim 11, wherein the method further comprises amplifying the arrayed members of the plurality of aptamers.

13. The method of claim 1, wherein the plurality of aptamers comprises different member sequences that bind to the same target molecule.

14. The method of claim 1, wherein the plurality of aptamers comprises different panels of aptamer sequences that each bind to a different target molecule.

15. The method of claim 1, wherein the members of the plurality of aptamers comprises DNA, RNA, or both.

16. The method of claim 1, wherein the members of the plurality of aptamers are between about 30 and about 60 bases long.

17. The method of claim 1, wherein the members of the plurality of aptamers are about 40 bases long.

18. The method of claim 1, wherein the biological sample comprises a bodily fluid.

19. The method of claim 18, wherein the bodily fluid comprises blood or is derived from blood.

20. The method of claim 18, wherein the bodily fluid comprises serum or plasma.

21. The method of claim 1, wherein the method further comprises: d) combining a second biological sample with each of the members of the plurality of aptamers that bound the molecules separated in step b); e) separating molecules in the second biological sample that are bound by members of the plurality of aptamers in step d); f) sequencing, using next generation polynucleotide sequencing techniques, at least part of each of the members of the plurality of aptamers that bound the molecules separated in step e), thereby measuring the amount of the at least one molecule in the second biological sample; and g) comparing the amounts of the at least one molecule obtained in c) with those obtained in f).

22. The method of claim 1, wherein the method further comprises comparing the quantity of the members of the plurality of aptamers sequenced in step c) against a control or baseline quantity.

23. The method of claim 21, wherein the second biological sample is obtained from an individual known to be in a diseased state.

24. The method of claim 21, wherein the second biological sample is obtained from an individual known to be in a healthy state.

25. The method of claim 21, wherein the second biological sample is obtained from an individual after drug treatment.

26. The method of claim 22, further comprising diagnosing a disease or predicament based on the comparison.

Description

(1) The invention will now be described by way of non-limiting examples, in which:

(2) FIG. 1 illustrates the method of the present invention, in which a library of aptamer sequences is mixed with serum proteome. The bound aptamer/protein fraction is eluted before being sequenced and counted on a second generation sequencer. The output from the sequencer is the number of each sequence present in the bound fraction.

(3) FIG. 2 illustrates use of the method for biomarker discovery, in which proteomes from different patients are compared with a baseline or control.

(4) FIG. 3 illustrates an alternative embodiment of the present invention, in which a library of aptamers is mixed with a first serum proteome sample. After quantification of the bound fraction, the population of aptamers binding to proteins of interest within the serum sample is amplified and mixed with a second serum proteome sample. A comparison of the first and second respective quantification outputs is then made.

(5) FIG. 4 is a chromatogram showing the different constituents of the mixture obtained after a 30′ incubation of equal volumes of 150 nM IgE and 100 nM ProNuc1FR.

(6) FIG. 5 shows CE chromatograms of 0/20/40 . . . 2000 nM IgE incubated with a constant (100 nM) ProNuc1F concentration (from bottom to top).

(7) FIG. 6 is a graph showing the fraction of bound ProNuc1F plotted against changing protein concentration.

(8) FIG. 7 is a schematic representation of the adapted sample preparation protocol for next generation sequencing of aptamers.

(9) FIG. 8 is a plot showing absolute counts of IgE aptamer sequences retrieved using two different analysis methods.

(10) FIG. 9 is a plot showing the fraction of IgE aptamer sequences present in a mixture of sequences counted by Next Generation sequencing plotted against the number of sequences spiked in a mixture of irrelevant (PhiX) sequences.

EXAMPLE 1

(11) This example is illustrated in FIG. 1 which shows the quantification of proteins in a serum sample using a library of aptamer tags. A diverse library of known aptamers is screened against a protein mixture derived from serum. The protein mixture is immobilised on a solid support and stringent washing is performed to remove unbound and weakly binding aptamers.

(12) The remaining bound aptamers are then removed from their protein hosts and sequenced and counted on a next generation sequencer. After being arrayed, the bound aptamer sequences may be amplified by standard methods, such as PCR, to increase the clarity of the signal over background noise on the sequencer.

(13) The output from the sequencer will be an absolute quantification of proteins present in the biological sample, as represented by the library of aptamers.

EXAMPLE 2

(14) For use in biomarker discovery and as shown in FIG. 2, the library of aptamers is screened against samples derived from different individuals. A suitable number of known control, or healthy, samples will also be required to establish a baseline or healthy condition. A number of samples known to represent a diseased state will also be screened. A comparison of the two populations, healthy versus diseased, will allow the identification of aptamers that show significant alterations between the two populations. Once the sequence of the or each aptamer has been elucidated, the protein to which the aptamers bind may then be identified. Identification of the protein may be by next generation sequencing techniques, or proteomic-based methods, such as chromatography and MS.
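The healthy-versus-diseased comparison described above can be sketched as follows. This is a minimal illustration only, not the method of the invention: the aptamer identifiers, the read counts, the normalisation to counts per million and the pseudocount are all assumptions introduced for the example.

```python
import math

def counts_per_million(counts):
    """Scale raw aptamer read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {aptamer: 1e6 * n / total for aptamer, n in counts.items()}

def log2_fold_changes(healthy_samples, diseased_samples, pseudocount=1.0):
    """Log2 ratio of mean normalised counts (diseased over healthy) per aptamer."""
    healthy = [counts_per_million(s) for s in healthy_samples]
    diseased = [counts_per_million(s) for s in diseased_samples]
    aptamers = sorted(set().union(*healthy, *diseased))
    changes = {}
    for aptamer in aptamers:
        mean_h = sum(s.get(aptamer, 0.0) for s in healthy) / len(healthy)
        mean_d = sum(s.get(aptamer, 0.0) for s in diseased) / len(diseased)
        # The pseudocount (an assumption) avoids division by zero for absent aptamers.
        changes[aptamer] = math.log2((mean_d + pseudocount) / (mean_h + pseudocount))
    return changes

# Hypothetical read counts, invented purely for illustration.
healthy_samples = [{"apt_A": 500, "apt_B": 500}, {"apt_A": 480, "apt_B": 520}]
diseased_samples = [{"apt_A": 900, "apt_B": 100}, {"apt_A": 880, "apt_B": 120}]

changes = log2_fold_changes(healthy_samples, diseased_samples)
for aptamer, change in changes.items():
    print(f"{aptamer}: log2 fold change = {change:+.2f}")
```

Aptamers with a large positive or negative value would be candidates for follow-up; in practice a statistical test across a suitable number of samples would be needed before any change is called significant.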

EXAMPLE 3

(15) This Example describes an alternative method of the present invention and is illustrated in FIG. 3.

(16) An aptamer library is screened against a first biological sample. As with either Example 1 or Example 2, the non-bound aptamers are removed and the remaining bound aptamers are released from their host proteins and sequenced and counted on a next generation sequencer.

(17) The population of sequences that bind to proteins in the first sample is then screened against a second biological sample. As before, the non-bound aptamers are discarded and the remaining aptamers are sequenced and counted. If the two samples are derived from healthy subjects, any variation between the two samples may be attributed to variance within the normal population. Similarly, any variance between the output from two samples derived from patients known to have a particular disease may also be attributed to variance. However, any significant variance between healthy and diseased subjects may be attributed to a change in the protein population as a result of the disease of interest. The aptamer sequences in which significant changes are found may then be translated to identify the protein(s) to which the aptamer(s) bind.

EXAMPLE 4

(18) In this example, the aptamer library comprises at least one pool of aptamers specific for one protein. Equally, the panel may contain a number of pools of aptamers, where each pool is specific for a different protein.

(19) The protein mixture may simply be a biological sample, such as blood or serum. Alternatively, the protein mixture may be obtained by enriching the biological sample for the protein fraction. In this example, care must be taken to minimise disruption to and denaturing of the protein population.

(20) The library of known, selected aptamer sequences is screened against the protein mixture. In one example, the protein mixture is bound to a column and the panel of aptamers is allowed to flow through the column. Aptamers that do not bind are removed. Aptamers that do bind are removed from their protein hosts and sequenced and counted on a next generation sequencer.

(21) If the protein of interest is indeed a biomarker for a diseased state, aptamers will be found in the bound fraction from the diseased sample and not in a sample derived from a healthy individual. Alternatively, the role of a protein as a biomarker may be manifest by up- or down-regulation of the protein in a diseased individual when compared to a healthy subject.

EXAMPLE 5

(22) In this example, the library of aptamers contains one or more sequences known to be specific for one or more proteins. The identity of the or each protein of interest is also known. The library may comprise a single pool specific to a single protein, or it may comprise many pools specific to an array of proteins.

(23) The library of aptamers is screened against a biological sample derived from an individual, such as blood. The aptamer library may be held on a support and the biological sample passed over the aptamers. Any unbound proteins are removed and the aptamer:protein complexes retained. The retained aptamers are released from their protein hosts and sequenced and counted on a next generation sequencer.

(24) The presence and quantity, or absence, of aptamers on the next generation sequencer may be used to diagnose a particular disease or predicament.

EXAMPLE 6

(25) A novel oligonucleotide aptamer was designed so that the aptamer was compatible with the Next-Generation DNA sequencing technology platform that was used.

(26) The aptamer had an oligonucleotide sequence having three distinct regions: a) a functional aptamer region, b) an adapter region, and c) a label.

(27) The aptamer region in this study was based on a well-studied aptamer (Wiegand et al Journal of Immunology (1996) 157 221-230) having a high affinity for human Immunoglobulin-E.

(28) The adapter region made the construct compatible with the sample preparation procedures of an Illumina GA1 Next Generation sequencer and was designed according to the manufacturer's protocols.

(29) The label was fluorescent 6-carboxyfluorescein (FAM), the inclusion of which was to allow visualization of the construct.

(30) The constructs were obtained by standard solid phase DNA synthesis and purified by polyacrylamide gel electrophoresis to ensure low error rates.

(31) The construct with the FAM label will be referred to hereinafter as ProNuc1F and without the label as ProNuc1.

(32) The constructs were used to demonstrate that: i) such aptamer-derived single strand oligonucleotide sequences are compatible with sample preparation procedures used for Next Generation DNA sequence applications, ii) the sequences can be sequenced using the Illumina GA1 sequencing technology, i.e. the identity of the ProNuc1 can be retrieved (“read”); and iii) the count of whole ProNuc1 sequences, following a standard Illumina GA1 sequencing experiment, directly correlates with the number of prepared ProNuc1 sequences spiked into a sample, thereby confirming that Next Generation sequencing can be applied to proteomics in accordance with the present invention.

EXAMPLE 7

(33) Capillary electrophoresis (CE) was used to confirm that the oligonucleotide sequence exhibits a distinct protein affinity.

(34) Preparation of Reagents and Solutions

(35) A 100 nM stock solution of ProNuc1F was prepared from dry aptamer source by diluting the DNA material in TGK (tris(hydroxymethyl)aminomethane-glycine-potassium) buffer, pH 8.4. Following dilution, the aptamer stock solution was incubated at 75° C. for 10 minutes and subsequently cooled and stored in ice to ensure the absence of multimers. The heating procedure also ensures formation of the secondary structure of the aptamer.

(36) Starting from a concentrated, 5500 nM stock solution of protein a dilution series of IgE was prepared in TGK buffer (Table 1):

(37) TABLE 1 IgE dilution series

Solution    Concentration (nM)
1           20
2           40
3           60
4           100
5           150
6           200
7           300
8           1000
9           2000

(38) The samples for CE demonstrating aptamer-protein complex formation were prepared by incubating 5 μl of the ProNuc1F stock solution with 5 μl of protein solution. A total of six replicates were prepared in this way for each protein concentration level. All samples were incubated at room temperature for a minimum of 30 minutes and no longer than 40 minutes.

(39) Apparatus

(40) Separations were achieved using a Beckman P/ACE 2200 CE-LIF system (Beckman-Coulter, Fullerton, Calif., USA). Separations were performed using fused-silica capillaries (50 μm ID, 360 μm OD) with 40.2 cm total length, detector at 30 cm. Capillaries were pretreated by pumping 1 M NaOH, deionized H₂O, and buffer through the capillary for 10 min each. Between each electrophoretic separation the capillary was rinsed with base, water, and buffer again to remove any residual sample from the capillary walls. Samples were injected using pressure for 4 s at 1 psi. The LIF detector employed the 488 nm line of a 3 mW Ar-ion laser (Beckman-Coulter) for excitation and emission was collected through a 520±10 nm filter. Data were recorded and analyzed with P/ACE software (Beckman-Coulter).

(41) Protein-Aptamer Binding

(42) Samples at each protein concentration level were run on the column in normal mode using the conditions described in (Table 2), firstly to ensure that a protein-aptamer complex was indeed formed after incubation of each sample.

(43) TABLE 2 Normal mode separation conditions

Action                        Duration (s)
Rinse 20 psi - 0.1 M NaOH     120
Rinse 20 psi - Buffer         120
Inject 1 psi - Sample         4
Inject 0.1 psi - Water        1
Separate 20 kV                760
Rinse 10 psi - Buffer         60

(44) Results

(45) As an example, a chromatogram for normal mode injection for a sample containing 150 nM of IgE and 100 nM of ProNuc1F is shown in FIG. 4.

(46) In order to apply the aptamer-protein complex formation for quantification purposes, the bound-aptamer fraction has to reflect the protein dilution series. To prove this, a constant amount of ProNuc1F was incubated with different protein concentrations (see above).

(47) From FIG. 5 it is clear that, with increasing protein concentration (bottom trace “0”; upper trace 5000 nM), the peak area of the aptamer-protein complex increases. Simultaneously, the signal of the free ProNuc1F decreases to the point where no free aptamer is detected (upper trace). This shows that adding an adapter sequence specific for the Next Generation Sequencing procedure to a known aptamer sequence against IgE does not impair the aptamer capabilities of ProNuc1F; ProNuc1F retains its high and selective affinity towards IgE.

(48) FIG. 6 shows the information in FIG. 5 translated into a graph where the “Bound ProNuc1F fraction” is plotted against the protein concentration. From this graph it can be seen that the ProNuc1F aptamer can be used to “read out” quantitative protein information. From this (and other) graphs it is also possible to deduce equilibrium dissociation constants Kd, which give insight into the affinity of ProNuc1F for IgE, i.e. between 76 and 92 nM.
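How a dissociation constant can be deduced from a bound-fraction curve may be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually performed: it assumes simple 1:1 binding at equilibrium, treats the stated concentrations as the final concentrations in the incubated sample, and uses a Kd of 84 nM (the midpoint of the reported range) only to simulate a curve. No measured data are reproduced; the function names are hypothetical.

```python
import math

def bound_fraction(protein_nM, aptamer_nM, kd_nM):
    """Fraction of aptamer in complex for 1:1 binding at equilibrium."""
    total = protein_nM + aptamer_nM + kd_nM
    # Physical root of the quadratic that follows from Kd = [A][P] / [AP].
    complex_nM = (total - math.sqrt(total * total - 4.0 * protein_nM * aptamer_nM)) / 2.0
    return complex_nM / aptamer_nM

def estimate_kd(protein_series_nM, fractions, aptamer_nM, candidates_nM):
    """Return the candidate Kd with the smallest sum of squared residuals."""
    def sum_squared_error(kd):
        return sum((bound_fraction(p, aptamer_nM, kd) - f) ** 2
                   for p, f in zip(protein_series_nM, fractions))
    return min(candidates_nM, key=sum_squared_error)

IGE_SERIES_NM = [20, 40, 60, 100, 150, 200, 300, 1000, 2000]  # Table 1
APTAMER_NM = 100.0     # constant ProNuc1F concentration
ASSUMED_KD_NM = 84.0   # assumption: midpoint of the reported 76 to 92 nM range

# Simulated (not measured) bound fractions, used to show the round trip.
simulated = [bound_fraction(p, APTAMER_NM, ASSUMED_KD_NM) for p in IGE_SERIES_NM]
recovered_kd = estimate_kd(IGE_SERIES_NM, simulated, APTAMER_NM, range(1, 501))

for p, f in zip(IGE_SERIES_NM, simulated):
    print(f"{p:5d} nM IgE -> bound fraction {f:.2f}")
print("recovered Kd (nM):", recovered_kd)
```

With real data, the simulated list would be replaced by the bound fractions derived from the peak areas, and a proper non-linear fit with confidence intervals would be preferable to the coarse grid search used here.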

EXAMPLE 8

(49) Experiments were carried out to demonstrate that the binding characteristics illustrated in the previous Example are applicable in a quantitative context, i.e. differences in protein concentration correlate with the fraction of bound aptamer (aptamer-protein complex). Selection for these aptamer-protein complexes was achieved by means of nonequilibrium capillary electrophoresis of equilibrium mixtures (NECEEM) as developed by Krylov et al (Journal of the American Chemical Society (2002) 124 13674-13675; Analytical Chemistry (2003) 75 1382-1386).

(50) Method

(51) An adapted sample preparation protocol was developed to allow sequencing of the functional IgE aptamer containing the added sequencing sequence (ProNuc1).

(52) FIG. 7 outlines the different steps in the sample preparation protocol. After the annealing of primer 1 on Sequencing primer site 1, a double stranded molecule was formed by extension with Taq polymerase. An adaptor was ligated to the double stranded molecule generating the full molecule.

(53) The IgE aptamer sequence used in this study was a single stranded oligonucleotide with Sequencing Primer Site 1 at its 3′ end. In order to make the complementary strand, a tailed primer was annealed with primer 1 and extension was performed using Taq polymerase.

(54) The recipe for this reaction was:

(55)

Component                     Volume
10 μM ProNuc1                 2 μl
10 μM Primer 1                5 μl
2.5 mM dNTPs                  4 μl
10x Taq polymerase buffer     5 μl
50 mM MgCl₂                   2 μl
Taq polymerase                0.5 μl
Water                         31.5 μl
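The recipe above can be checked with a short calculation of the total reaction volume and the final concentration of each component. This is a sketch for convenience only; the variable names are hypothetical and the final concentrations are derived here by simple dilution arithmetic rather than stated in the text.

```python
# (component, stock concentration, unit, volume in microlitres), from the recipe.
# Taq polymerase and water have no stated stock concentration.
RECIPE = [
    ("ProNuc1", 10.0, "uM", 2.0),
    ("Primer 1", 10.0, "uM", 5.0),
    ("dNTPs", 2.5, "mM", 4.0),
    ("Taq polymerase buffer", 10.0, "x", 5.0),
    ("MgCl2", 50.0, "mM", 2.0),
    ("Taq polymerase", None, None, 0.5),
    ("Water", None, None, 31.5),
]

total_volume_ul = sum(volume for _, _, _, volume in RECIPE)

# Final concentration = stock concentration x (component volume / total volume).
final_concentrations = {
    name: (stock * volume / total_volume_ul, unit)
    for name, stock, unit, volume in RECIPE
    if stock is not None
}

print(f"total volume: {total_volume_ul} ul")
for name, (value, unit) in final_concentrations.items():
    print(f"{name}: {value:g} {unit}")
```

The volumes sum to 50 μl, which gives a 1x buffer concentration in the final reaction.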

(56) The reaction was mixed and heated to 94° C. for 2 minutes, followed by controlled cooling to 60° C. and incubation at 72° C. for 10 minutes. Reaction products were cleaned using a Qiagen MinElute column, following the manufacturer's instructions, and eluting in 10 μl EB buffer.

(57) 20 μl each of a 100 μM solution of ProAd1 and ProAd2 adapters were annealed together in a total volume of 50 μl 10 mM Tris-HCl pH 8.3 by heating to 94° C. for 2 minutes and controlled cooling to room temperature. ProAd1 and ProAd2 are synthesized oligonucleotides that are derived from the Illumina GA1 kit.

(58) The mixture used was:

(59)

Component                        Volume
Ds aptamer                       10 μl
2x DNA Illumina ligase buffer    25 μl
Annealed adapter oligo mix       10 μl
Illumina DNA ligase              5 μl

(60) Reactions were mixed and incubated for 15 minutes at room temperature before gel electrophoresis in 2% agarose. The appropriately-sized band was excised from the gel and the DNA was extracted and amplified following the standard Illumina enrichment PCR protocol.

(61) PCR amplicons were quantified using an Agilent Bioanalyzer 2100 and sequencing reactions were prepared by adding these amplified aptamers to a constant quantity of PhiX, with the intention of generating the Illumina GA1 cluster numbers/tile given in Table 3, to produce a 2× dilution series. Lane 4 is PhiX with no added aptamer and serves as a control lane.

(62) For the avoidance of doubt, PhiX refers to the circular genome of the double stranded DNA PhiX174 bacteriophage which consists of 5386 nucleotides. In this study, the genome was processed according to the Illumina GA1 sequencing protocol to serve as normalization for the sequence load in the different lanes.

(63) A single end flowcell was prepared and sequenced on an Illumina GA1 in accordance with the manufacturer's instructions.

(64) TABLE 3 Actual ProNuc1 and PhiX sequence numbers used per tile for the eight different lanes on the Illumina GA1 flowcell. A lane consists of around 300 tiles.

Lane    ProNuc1/tile    ProNuc1/lane    PhiX
1       5000            1500000         20,000
2       2500            750000          20,000
3       1250            375000          20,000
4       0               0               20,000
5       625             187500          20,000
6       312             93600           20,000
7       156             46800           20,000
8       78              23400           20,000
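The relationship between the columns of Table 3 can be sketched as a short calculation. It assumes exactly 300 tiles per lane (the text says "around 300") and assumes that the PhiX column is a per-tile figure; the resulting aptamer fractions are the intended values implied by the table, not measured results.

```python
TILES_PER_LANE = 300     # assumption: the text says "around 300"
PHIX_PER_TILE = 20_000   # assumption: the PhiX column is per tile

# Lane number -> intended ProNuc1 clusters per tile (Table 3).
PRONUC1_PER_TILE = {1: 5000, 2: 2500, 3: 1250, 4: 0, 5: 625, 6: 312, 7: 156, 8: 78}

def lane_summary(per_tile):
    """Return (ProNuc1 per lane, intended ProNuc1 fraction of all clusters)."""
    per_lane = per_tile * TILES_PER_LANE
    fraction = per_tile / (per_tile + PHIX_PER_TILE)
    return per_lane, fraction

summary = {lane: lane_summary(n) for lane, n in PRONUC1_PER_TILE.items()}

for lane, (per_lane, fraction) in summary.items():
    print(f"lane {lane}: {per_lane} ProNuc1/lane, intended fraction {fraction:.4f}")
```

Each step down the series halves the ProNuc1 input per tile (with rounding at the lowest levels), while the PhiX load stays constant so that it can serve as the normalisation.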

(65) Results:

(66) Sequencing data was processed according to the manufacturer's instructions. For data analysis, different strategies were applied to evaluate the sequence read outs obtained in the different lanes: a) a search for exact matches to the aptamer sequence (exact), and b) a search which allows for up to 3 amplification/sequencing errors (Agrep). It is clear that other processing methods can also be applied. FIG. 8 plots the absolute numbers of IgE aptamers counted in the different lanes by looking for exact matches and for matches with up to three errors, i.e. an exact aptamer sequence count and an error-tolerant aptamer sequence count (Agrep). The values on the X-axis correspond to the lane readouts of lanes 8 down to 1 (control lane 4 is excluded; see Table 3).
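The two counting strategies can be sketched as follows. This is not the Agrep tool itself: the error-tolerant search is a plain dynamic-programming approximation that counts substitutions, insertions and deletions, swapped in for illustration. The 20-base pattern and the reads are hypothetical placeholders, not the ProNuc1 sequence or real sequencer output.

```python
def best_match_errors(read, pattern):
    """Smallest edit distance between pattern and any substring of read."""
    column = list(range(len(pattern) + 1))  # cost of matching against an empty read
    best = column[-1]
    for base in read:
        new = [0]  # a match may start at any position in the read
        for i, p in enumerate(pattern, start=1):
            cost = 0 if p == base else 1
            new.append(min(column[i - 1] + cost,  # match or substitution
                           column[i] + 1,         # extra base in the read
                           new[i - 1] + 1))       # base missing from the read
        column = new
        best = min(best, column[-1])
    return best

def count_reads(reads, pattern, max_errors=3):
    """Return (exact count, error-tolerant count) for the pattern in the reads."""
    exact = sum(1 for read in reads if pattern in read)
    tolerant = sum(1 for read in reads if best_match_errors(read, pattern) <= max_errors)
    return exact, tolerant

PATTERN = "ACGTTGCAAGGCTTAACGGT"  # hypothetical placeholder sequence

reads = [
    "TT" + PATTERN + "GA",              # exact match with flanking bases
    PATTERN[:4] + "A" + PATTERN[5:],    # one substitution
    PATTERN[:9] + PATTERN[10:],         # one deleted base
    "G" * 24,                           # unrelated read
]

exact, tolerant = count_reads(reads, PATTERN)
print("exact:", exact, "error-tolerant:", tolerant)
```

The error-tolerant count is never smaller than the exact count, since every exact match has zero errors.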

(67) In contrast, FIG. 9 plots the fraction of IgE aptamers counted by looking for exact matches in the total sequences obtained against the number of aptamers spiked in the PhiX sequence mixture. In both cases a strong linear correlation is shown confirming the use of Next Generation sequencing to quantify aptamers.

(68) The above data show that, when following a standard Illumina GA1 sequencing protocol, the count of whole ProNuc1 sequences correlates directly with the number of prepared ProNuc1 sequences spiked into the sample, thereby confirming that Next Generation sequencing can be used for quantification in accordance with the present invention.

(69) Furthermore, the data show that the prepared ProNuc1 sequences can be sequenced using Next Generation sequencing technology. In other words the identity of the ProNuc1 can be retrieved (“read”).