METHOD AND SYSTEMS FOR IDENTIFYING A SEQUENCE OF MONOMER UNITS OF A BIOLOGICAL OR SYNTHETIC HETEROPOLYMER

20240077491 · 2024-03-07

Abstract

The present invention relates to a method for the identification of a sequence of monomer building blocks of a biological or synthetic heteropolymer. The invention also relates to the use of a nanopore for identifying a sequence of monomer building blocks of a biological or synthetic heteropolymer. The invention further relates to a computer-implemented method, computer program code, and data processing system for identifying a sequence of monomer building blocks of a biological or synthetic heteropolymer.

Claims

1. A method for identifying a sequence of monomer building blocks of a biological or synthetic heteropolymer, comprising the steps: a) perform a fragmentation method in which the heteropolymer is broken down into fragments, thereby obtaining a fragment mixture whose fragments are molecules having different sequence segments of the heteropolymer; b) perform a current measurement method in which current signals of a current through the channel of a nanopore are detected, wherein each current signal is based on the interaction of a fragment of the fragment mixture with the channel of the nanopore, wherein the current signals are characteristic of the different fragments such that a representative set of characteristic current signals representing the fragment mixture is determinable; and c) perform an evaluation method in which a sequence of monomer building blocks of the heteropolymer is determined from the representative set of characteristic current signals.

2. The method according to claim 1, wherein the fragments of the fragment mixture are obtained by enzymatic, chemical and/or physical methods and/or are obtained by successive degradation of the heteropolymer.

3. The method according to claim 2, wherein the successive degradation of the heteropolymer provides that the heteropolymer is chain-like and, starting from one end of its chain, is stepwise shortened by one monomer building block to obtain length fragments, in particular substantially all length fragments n-(n-1), n-(n-2) . . . to n-(n-n), of a heteropolymer consisting of n monomer building blocks.

4. The method according to claim 1, wherein the heteropolymer is a peptide and the fragmentation method is or includes Edman degradation.

5. A method according to claim 1, for determining the primary structure of a macromolecule formed at least from heteropolymers, in particular a protein, comprising the steps of: i) cleavage of the macromolecule, in particular by enzymatic and/or chemical and/or physical cleavage, to obtain heteropolymers, in particular peptides, as cleavage products of the macromolecule; optionally: obtaining the heteropolymers by chromatographic or electrophoretic separation of a heteropolymer mixture obtained by the cleavage; ii) use of the method according to claim 1 for determining a sequence of monomer building blocks, in particular amino acids, of at least one, in particular each, of the heteropolymers; iii) perform a macromolecule recognition method in which the primary structure of the macromolecule is determined from a sequence listing of the at least one heteropolymer.

6. The method according to claim 5, wherein the macromolecule is DNA, RNA, protein, peptide, or any synthetic polymer, and wherein, in particular, the nanopore is a biological nanopore or a toxin or pore-forming toxin.

7. The method according to claim 1, wherein the nanopore is a solid-state nanopore or a hybrid of solid-state and biological components.

8. The method according to claim 1, wherein the fragmentation of the heteropolymer is carried out by enzymes.

9. The method according to claim 1, wherein the fragmentation of the heteropolymer is carried out chemically and non-enzymatically.

10. The method according to claim 1, wherein the fragmentation of the heteropolymer is carried out physically, e.g. by exposure to heat, cold, sound waves, electromagnetic radiation, in particular infrared, ultraviolet or X-ray radiation, microwaves or visible light.

11. The method according to claim 1, wherein the nanopore is aerolysin, alpha-hemolysin, VDAC, or other protein of the beta-barrel protein family.

12. Use of a nanopore for performing the method for identifying a sequence of monomer building blocks of a biological or synthetic heteropolymer according to claim 1.

13. A computer-implemented method for determining a sequence of monomer building blocks of a heteropolymer, referred to as a heteropolymer sequence, from measurement data of a current measurement method containing information on current signals obtained upon interaction of different fragments formed from the heteropolymer with the channel of a nanopore, comprising the steps of: A) determine residual current values from the measurement data, wherein a residual current describes the interaction of one of the different fragments of the heteropolymer with the channel of a nanopore; B) statistically determine a representative set of characteristic residual current values from the residual current values, a characteristic residual current value describing in each case one fragment type, in particular fragment size, of the number n of fragment types of a fragment mixture formed from the heteropolymer, the representative set uniquely describing the heteropolymer sequence; C) sort the characteristic residual current values by their magnitude into a residual current value sequence and determine the current value differences of successive current values of the residual current value sequence; and D) assign the current value differences to monomer building block types of the heteropolymer based on previously known correlation data containing information about which monomer building block type is represented by which current value amount to make the determination of the sequence of monomer building block types.

14. A computer program code which is stored on a data carrier and which determines a sequence of monomer building blocks of a heteropolymer, referred to as heteropolymer sequence, from the measurement data of a current measurement method when executed by the central processor of a computer, the measurement data containing information on current signals which are determined upon the interaction of different fragments formed from the heteropolymer with a nanopore, comprising the respective steps implemented by program code: A) determine residual current values from the measurement data, wherein a residual current describes the interaction of one of the different fragments of the heteropolymer with a nanopore; B) statistically determine a representative set of characteristic residual current values from the residual current values, a characteristic residual current value describing in each case one fragment type, in particular fragment size, of the number n of fragment types of a fragment mixture formed from the heteropolymer, the representative set describing the heteropolymer sequence unambiguously, but in any case sufficiently for a desired structure elucidation or structure prediction; C) sort the characteristic residual current values by their magnitude into a residual current value sequence and determine the current value differences of successive current values of the residual current value sequence; and D) assign the current value differences to monomer building block types of the heteropolymer based on previously known correlation data containing information about which monomer building block type is represented by which current value amount to make the determination of the sequence of monomer building block types.

15. A data processing system for determining a sequence of monomer building blocks of a heteropolymer, referred to as heteropolymer sequence, from the measurement data of a current measurement method containing information on current signals determined upon interaction of different fragments formed from the heteropolymer with a nanopore, comprising a computer with a central processor, and a program code, in particular the program code according to claim 14, wherein the computer is programmed to perform the following computer-implemented steps: A) determine residual current values from the measured data, wherein a residual current describes the interaction of one of the different fragments of the heteropolymer with a nanopore; B) statistically determine a representative set of characteristic residual current values from the residual current values, a characteristic residual current value describing in each case one fragment type, in particular fragment size, of the number n of fragment types of a fragment mixture formed from the heteropolymer, the representative set describing the heteropolymer sequence unambiguously, but in any case sufficiently for a desired structure elucidation or structure prediction; C) sort the characteristic residual current values by their magnitude into a residual current value sequence and determine the current value differences of successive current values of the residual current value sequence; and D) assign the current value differences to monomer building block types of the heteropolymer based on pre-known correlation data containing information about which monomer building block type is represented by which current value amount to perform the determination of the sequence of monomer building block types.

Description

[0089] Further preferred embodiments of the objects according to the invention result from the following description of the embodiment examples in connection with the figures. Identical reference signs designate essentially identical components or method steps.

[0090] FIG. 1 shows a sketch of the principle of single molecule detection by nanopores, which can be used in the method 100 according to the invention.

[0091] FIG. 2 shows the two possible regimes of a polymer-nanopore interaction.

[0092] FIG. 3 shows the detection of the twenty proteinogenic amino acids (aa) using the aerolysin nanopore, in particular according to the prior art.

[0093] FIG. 4 shows measurement evidence for an exemplary process designed according to the invention.

[0094] FIGS. 5a, 5b and 5c each show embodiments of the process according to the invention and of its components.

[0095] FIG. 6a shows, with reference to an embodiment of the invention: sequences of the six hetero-decapeptides, each of which constitutes the start peptide of a ladder.

[0096] FIG. 6b shows, with reference to an embodiment of the invention: a schematic diagram of the experimental setup.

[0097] FIG. 6c shows, with reference to an embodiment of the invention: a control trace in 4 M KCl.

[0098] FIG. 6d shows, with reference to an embodiment of the invention: an exemplary measurement curve after addition of the peptide ladder L1 with all peptides in equimolar concentration.

[0099] FIG. 6e shows, referring to an embodiment of the invention: a schematic level histogram averaged over the main level for a peptide ladder sequencing experiment.

[0100] FIG. 7 shows, with reference to an embodiment of the invention: residence time scatter plots over the residual pore current I/Io (red) with superimposed level histograms averaged over the main level (black) for all six peptide ladders.

[0101] FIG. 8 shows, with reference to an embodiment of the invention: Data correlation plots for all six peptide ladders.

[0102] FIG. 9a shows, with respect to an embodiment of the invention: reproducibility of I/Io of homo-arginine peptides R3, R4, R5, R7 (blue) compared to R3-R7 of Piguet et al. 2018 (red), and ladders L1 (green, solid line, circle), L3 (green, dashed, pointing triangle), L4 (green, dotted, pointing triangle), L2 (pink, solid line, circle), L5 (pink, dashed, pointing triangle), L6 (pink, dotted, pointing triangle).

[0103] FIG. 9b shows, with reference to an embodiment of the invention: I/Io boxplot for each cleaved amino acid type with median (blue) and mean (white).

[0104] FIG. 9c shows, with reference to an embodiment of the invention: I/Io values for arginine cleavage classified by nearest neighbor aa of arginine as C-terminal aa (alanine blue, arginine red, serine green, tyrosine yellow) of homo- (dots) and hetero-peptides (circles); data for homo-peptides were taken from Piguet et al. 2018.

[0105] FIG. 9d shows, with respect to an embodiment of the invention: residence time scatter plots versus residual pore current I/Io with superimposed main level-averaged level histograms for the deca-peptides of ladder 1 (red), ladder 2 (blue), ladder 3 (green), ladder 4 (yellow), ladder 5 (pink), ladder 6 (black).

[0106] FIG. 10 shows, with reference to an embodiment of the invention: residence time scatter plots versus residual pore current I/Io (red) with superimposed level-averaged histograms (black) for sample A (left) and sample B (right). Below each graph are the sequences proposed using the first reading aid (prop) and the correct sequences (corr). The green box indicates the correct reading frame.

[0107] FIG. 11 shows in relation to an embodiment example of the invention: Data table for double-blind study.

[0108] FIG. 1a shows an illustration of the principle of single-molecule sensing through nanopores that can be used to implement the invention. A constant voltage U across an insulator draws ionic current through the nanopore. A single analyte particle, e.g., a fragment, in the nanopore partially blocks the current (resistive pulse or current signal, or residual current value). Both the depth of the blockage and the duration carry information about the analyte.

[0109] FIG. 2 shows the two possible regimes of polymer-nanopore interaction. The threading/translocation regime is favored when long polyelectrolyte chains interact with the pore in low to moderate salt concentration (0.1 to 1.0 M KCl). The binding-trapping, or collapsed, regime typically occurs under conditions of high salt concentration (e.g., 4 M KCl) and does not require charging of the analyte. Preferably, the collapsed regime is used in the invention. In a measurement arrangement 1 for nanopore size spectroscopy, which can also be used in the method according to the invention, an electrolyte-filled first compartment 11 is electrically isolated from an electrolyte-filled second compartment 12 by a membrane formed, in particular, by means of a lipid bilayer 2; current flow is possible essentially only through the nanopore 3 incorporated in the lipid bilayer, which electrically connects the compartments 11 and 12. The lipid bilayer can be stretched over the microaperture or over a microcavity of a microstructure device (not shown in FIG. 2), as described, for example, in document WO 2013/083270. In the threading/translocation regime, the analyte 4a is elongated, and in the collapsed or binding regime, the analyte 4b is collapsed and compact.

[0110] FIG. 3 shows the detection of the twenty proteinogenic amino acids (aa) using the aerolysin nanopore.

[0111] A: 1: Peptide design 2: Peptide-pore interaction. 3: Current trace in the presence of a mixture of 7R+D,K,R,E,H.

[0112] B: plot of relative current vs. aa volumes. C: >95% discrimination between structural isomers 7R+L and 7R+I by high-resolution recording on MECA (according to Ouldali et al. 2020).

[0113] Based on the prior art in Ouldali et al. 2020, the question for the inventors was how to use the high sensitivity of the nanopore to peptide size or volume for actual sequence identification in heteropolymers or for protein identification and sequencing.

[0114] To solve this problem, the inventors explored an approach, also called nanopore ladder sequencing. Peptides (or other heteropolymers) are either initially generated, preferably by enzymatic, chemical or physical cleavage of proteins, and separated, preferably by known chromatographic or electrophoretic methods, or are already present in isolation. Preferably in a second step, they are subjected either to the action of exopeptidases that cleave individual N- or C-terminal amino acids from a peptide, or to chemical methods such as the Edman reaction. This yields a mixture of peptides or heteropolymers, i.e., a mixture of fragments, in which several species or characteristic fragment types are present in a representative set, preferably representing all or most of the possible fragments formed by the removal of amino acids (or monomer building blocks) in sequence, such that for a peptide (or heteropolymer) of degree of polymerization (d.p.) n, all or most species of d.p. n-(n-1), n-(n-2) . . . to n-(n-n) are present. Each of these species, when interacting with the nanopore, will give a characteristic maximum in the histogram of relative residual currents (characteristic residual current value or amount).
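The ladder of fragments described in this paragraph can be illustrated with a minimal sketch. It assumes stepwise removal of exactly one monomer per step from one chosen end of the chain; the function name `ladder_fragments`, its `cleave_from` parameter, and the example string are illustrative assumptions and not part of the disclosed method.

```python
def ladder_fragments(sequence: str, cleave_from: str = "N") -> list[str]:
    """Return fragments of d.p. n, n-1, ..., 1 obtained by stepwise
    removal of one monomer from the chosen end of the chain."""
    n = len(sequence)
    if cleave_from == "N":
        # Remove monomers from the N-terminal (left) end, as in Edman degradation.
        return [sequence[i:] for i in range(n)]
    if cleave_from == "C":
        # Remove monomers from the C-terminal (right) end.
        return [sequence[: n - i] for i in range(n)]
    raise ValueError("cleave_from must be 'N' or 'C'")


# Hypothetical example: a seven-residue peptide in one-letter code.
fragments = ladder_fragments("SRASKYR")
print(fragments)
```

Each element of `fragments` stands for one species expected to give its own maximum in the residual current histogram.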

[0115] The measurement evidence demonstrates the ability of the invention here, for example, to correlate short, known peptide sequences with nanopore data in this manner (see FIG. 4). FIG. 4 shows:

[0116] A, B: Scatter plots with event histogram obtained from the interaction of aerolysin with two peptide ladders containing a triarginine handle. Removal of aa results in a species-specific shift in residual current characteristic of a monomer building block species (here aa).

[0117] C,D: Plot of the change in peptide volume and relative residual current for the two ladders shown above. A clear correlation between the two parameters as well as sequence dependence is evident.

[0118] FIG. 5a shows an exemplary method 100 according to the invention for identifying a sequence of monomer building blocks of a biological or synthetic heteropolymer, comprising the steps:

[0119] (a) carrying out a fragmentation method in which the heteropolymer is fragmented, in particular enzymatically, chemically and/or physically, and a fragment mixture is thereby obtained, the fragments of which are molecules having different sequence segments of the heteropolymer; (101)

[0120] (b) performing a current measurement method in which current signals of a current through a nanopore are detected, wherein each current signal is based on the interaction of a fragment with the nanopore, wherein the current signals are characteristic of the different fragments such that a representative set of characteristic current signals representing the fragment mixture is determinable; (102)

[0121] (c) performing an evaluation method in which the sequence of the monomer building blocks of the heteropolymer is determined from the representative set of the characteristic current signals. (103)

[0122] In particular, the method 100 may be used in a method (200) for determining the primary structure of a protein, comprising the steps of (see FIG. 5b):

[0123] (i) cleavage of the protein, in particular by enzymatic and/or chemical and/or physical cleavage, to obtain peptides as cleavage products of the protein; optionally: obtaining the peptides by chromatographic or electrophoretic separation of a peptide mixture obtained by the cleavage; (201)

[0124] (ii) application of the method according to the invention for determining the sequence of amino acids (monomer building blocks) of at least one, in particular each, of the peptides (heteropolymer); (202 and 100, respectively)

[0125] (iii) performing a protein recognition procedure in which the primary structure of the protein is determined from the sequence of the at least one peptide. (203) For this purpose, in particular, method 100 may be carried out for all peptides obtained by cleavage of the protein.

[0126] The evaluation method (103 or 300), in which the sequence of the monomer building blocks of the heteropolymer is determined from the representative set of the characteristic current signals, may in particular comprise the following steps (see FIG. 5c):

[0127] A) determine residual current values from the measurement data, wherein a residual current describes the interaction of one of the different fragments of the heteropolymer with a nanopore; (301)

[0128] B) statistically determine a representative set of characteristic residual current values from the residual current values, a characteristic residual current value describing in each case one fragment type, in particular fragment size, of the number n of fragment types of a fragment mixture formed from the heteropolymer, the representative set describing the heteropolymer sequence unambiguously, but in any case sufficiently for a desired structure elucidation or structure prediction; (302)

[0129] C) sort the characteristic residual current values by their magnitude to form a residual current value sequence and determine the current value differences of successive current values of the residual current value sequence; (303) and

[0130] D) assign the current value differences to monomer building block species of the heteropolymer based on pre-known correlation data containing information about which monomer building block species is represented by which current value amount to perform the determination of the sequence of monomer building block species (determination of the sequence of monomer building blocks of the heteropolymer). (304)
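Steps C) and D) of the evaluation method can be sketched as follows. This is a simplified illustration under stated assumptions: the correlation table `CORRELATION` contains invented placeholder values (not measured data), each monomer type is assumed to produce a single, position-independent step in relative residual current, and a step is assigned to the monomer type whose tabulated value is nearest. The names `CORRELATION` and `call_sequence` are hypothetical.

```python
# Hypothetical correlation data: expected step in relative residual current
# per cleaved monomer type. Placeholder values for illustration only.
CORRELATION = {"A": 0.020, "S": 0.015, "K": 0.035, "Y": 0.045, "R": 0.055}


def call_sequence(characteristic_values, correlation=CORRELATION):
    """Sort characteristic residual currents, take successive differences,
    and assign each difference to the nearest tabulated monomer type."""
    # Step C: sort by magnitude (lowest value = deepest block = longest fragment).
    levels = sorted(characteristic_values)
    steps = [b - a for a, b in zip(levels, levels[1:])]
    # Step D: nearest-value lookup in the correlation data.
    calls = [min(correlation, key=lambda m: abs(correlation[m] - d)) for d in steps]
    return "".join(calls), steps


# Synthetic ladder built from the placeholder table, supplied in arbitrary order.
levels = [0.30]
for monomer in "SRASKYR":
    levels.append(levels[-1] + CORRELATION[monomer])
called, steps = call_sequence(reversed(levels))
print(called)
```

A single lookup table is a deliberate simplification; a practical implementation would likely need position-dependent correlation data and a tolerance for ambiguous steps.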

[0131] Experimental Data and Embodiment

[0132] An embodiment of the invention is described below in which the complete sequence of synthetic peptides is elucidated, including in a double-blind experiment:

[0133] In the present embodiment, the method according to the invention is described as a method for peptide sequence recognition with respect to peptide sequencing in a derivatization-free single molecule experiment using the wt-aerolysin (wt-AeL) nanopore by a bottom-up peptide ladder strategy. In this research experiment, six peptide ladder-like sample pools were designed. Each pool consisted of the same deca-peptide but with a scrambled sequence and the respective ladder down to the polycationic tri-arginine carrier. Single molecule resistive pulse experiments (nanopore size spectroscopy) demonstrated the detection of species-dependent characteristic differences in residual current strengths for each peptide with identification of the single amino acid (aa) corresponding to each step of ladder formation, laying the foundation for peptide sequencing according to the invention. In addition, the potential of this simple approach as a benchmark technique in everyday laboratory use is described by a double-blind study in another laboratory in which two blindly selected peptides from the sample pool were identified and distinguished based on their aa sequence.

[0134] Peptide Ladder Design and Measurement

[0135] The embodiment uses the wt-AeL nanopore. A decapeptide was designed consisting of a polycationic C-terminal carrier, R.sub.3, preceded by a heterogeneous stretch of seven aa recruited from the five different aa SRAKY (e.g., SRASKYR). In a second step, the sequence of the aa portion was scrambled to obtain six different hetero-decapeptides that have the exact same mass of 1335.65 Da (FIG. 6a). Next, peptide ladders (fragment mixtures) were formed for each decapeptide down to R.sub.3 (aa.sub.7R.sub.3, aa.sub.6R.sub.3, . . . , aa.sub.1R.sub.3, R.sub.3), resulting in a total of 42 samples. By successively adding the peptides of a ladder to the measurement chamber containing the nanopore, the stepwise degradation of a peptide in a ladder generation process (e.g., Edman degradation) was simulated. This step thus corresponds to step a) of the method according to the invention.
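The following sketch checks two points of the design arithmetically: scrambling leaves the amino acid composition, and hence the mass, unchanged, and six ladders of seven carrier-containing fragments each give 42 samples. Only the two variable sequences named in this description (SRASKYR, and KSRASRY for ladder L3) are used. Counting seven fragments per ladder, with the shared R.sub.3 carrier excluded, is an assumption made here because it is consistent with the stated total.

```python
from collections import Counter

CARRIER = "RRR"  # polycationic C-terminal carrier R3
variable_parts = ["SRASKYR", "KSRASRY"]  # sequences named in the description

# Scrambling only reorders residues, so the composition must be identical.
reference = Counter(variable_parts[0] + CARRIER)
same_composition = all(Counter(v + CARRIER) == reference for v in variable_parts)

# Assumed counting: fragments with 1 to 7 variable residues per ladder,
# with the bare carrier shared between ladders and not counted.
ladders = 6
fragments_per_ladder = 7
total_samples = ladders * fragments_per_ladder

print(same_composition, total_samples)
```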

[0136] Step b) of the method according to the invention, or steps A) and B), was carried out as follows: In a typical experiment, a single wt-AeL channel was inserted into a DPhPC lipid bilayer spanning a single 50 µm aperture of the microelectrode cavity array (MECA16) used. A trans-negative bias voltage of 40 mV was used to drive an ion current (Io) through the protein channel connecting two reservoirs otherwise electrically isolated from each other by the lipid bilayer and filled with electrolyte solution (4 M KCl). Individual peptides that enter the channel defined by the protein and thereby alter the ionic current (I) are detected via the resulting resistive pulses (FIG. 6b). Ladder experiments were performed by adding all peptides of a ladder successively in equimolar amounts, starting with aa.sub.1R.sub.3 and ending with aa.sub.7R.sub.3. FIG. 6e schematically shows a result of a nanopore-based peptide ladder experiment. The peptide ladder of an aa.sub.7R.sub.3 peptide would consist of eight peptides, each leading to a single maximum in the histogram of event-averaged residual current values. The sequence of maxima of the residual current histogram represents the sorting of the measured current signal values I as fractions of the current through the unblocked pore Io (also referred to as relative residual current values (I/Io) or relative residual conductances with possible values between 0 and 1) into a sequence of characteristic residual current values (step C)). It thus defines a representative set of 8 different characteristic residual current values with an equally characteristic dispersion, each representing a fragment of the peptide ladder. It is expected that the longest peptide, aa.sub.7R.sub.3, would lead to the deepest blockage, while the shortest peptide, R.sub.3, would be represented with the highest I/Io. The sequence of maxima can then also be clearly assigned to the steps of the ladder, and it is the difference in I/Io of two adjacent maxima that corresponds to the difference that the cleavage of a single aa would produce in the ladder generation process (used in step D)). The magnitude of the I/Io difference is thereby sensitive to the identity of the cleaved aa, which facilitates the identification of the sequence of the peptide.
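Steps A) and B) as described in this paragraph can be sketched as follows: each event's main-level current I is divided by the open-pore current Io, and local maxima of the resulting I/Io histogram are taken as characteristic residual current values. The simple fixed-width binning, the `bin_width` and `min_count` parameters, the function names, and the synthetic event currents are all assumptions for illustration; the source does not specify the statistical procedure.

```python
from collections import Counter


def relative_residual_currents(event_currents, open_pore_current):
    """Step A: convert event-averaged currents I to I/Io (same sign convention)."""
    if open_pore_current == 0:
        raise ValueError("open-pore current must be non-zero")
    return [i / open_pore_current for i in event_currents]


def histogram_maxima(values, bin_width=0.01, min_count=2):
    """Step B (simplified): bin the I/Io values and return the centres of
    bins that are local maxima holding at least `min_count` events."""
    counts = Counter(round(v / bin_width) for v in values)
    peaks = []
    for b, c in counts.items():
        # Strictly higher than the left neighbour, not lower than the right one,
        # so a two-bin plateau yields a single peak.
        if c >= min_count and c > counts.get(b - 1, 0) and c >= counts.get(b + 1, 0):
            peaks.append(round(b * bin_width, 6))
    return sorted(peaks)


# Synthetic event currents in arbitrary units, with Io = 100 (illustrative only).
events = [30.1, 29.9, 30.0, 45.2, 44.9, 45.0, 45.1, 60.0, 59.8, 60.1]
ratios = relative_residual_currents(events, 100.0)
peaks = histogram_maxima(ratios)
print(peaks)
```

The sorted list of maxima is the input that steps C) and D) then operate on.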

[0137] The evaluation method, in which the sequence of monomer building blocks (here: aa) of the heteropolymer (here: peptide) is determined from the representative set of characteristic current signals, uses the I/Io differences between adjacent maxima in the representative set of characteristic residual current values. Step D), determining the aa, is performed by assigning these I/Io differences to aa of the peptide using pre-known correlation data, which contain information about which aa is represented by which I/Io difference, in order to determine the sequence of aa of the peptide.

[0138] FIGS. 6c and 6d show exemplary raw data (current traces) for the measurement of ladder L1. After addition of peptides (FIG. 6d), resistive pulses of different depth and duration were detected. Individual resistive pulses were strongly modulated; to prevent distortion of the I/Io values, these modulations were excluded and only the main level of a pulse was considered in the data analysis. Such modulations are induced by the motion of the polymer itself within the AeL nanopore.

[0139] FIG. 6a: Sequences of the six hetero-decapeptides, each representing the start peptide of a ladder. Black dashed boxes symbolize shifts of aa cassettes, black (and gray) lines symbolize inversion, while colored lines symbolize identity of aa in the different sequences; b: Schematic representation of the experimental setup. An external trans-negative voltage is applied to drive an ion current Io through the open nanopore. Peptides entering the nanopore alter the current, resulting in a resistive pulse (red curve); c: Control trace in 4 M KCl under a trans-negative voltage clamp of 40 mV, digitized at 1 MHz sampling rate, filtered with an 8-pole Bessel filter at a corner frequency of 50 kHz and digitally post-filtered at 25 kHz; d: Exemplary trace after addition of peptide ladder L1 with all peptides at equimolar concentration (H-SRASKYR-R.sub.3-OH, H-RASKYR-R.sub.3-OH, H-ASKYR-R.sub.3-OH, H-SKYR-R.sub.3-OH, H-KYR-R.sub.3-OH, H-YR-R.sub.3-OH, H-R-R.sub.3-OH); e: Schematic level histogram averaged over the main level for a peptide ladder sequencing experiment. The longest peptide (aa.sub.7R.sub.3) produces the deepest block, and the shortest peptide (aa.sub.1R.sub.3) produces the shallowest block. The differences in I/Io values (blue lines) can be correlated with the identity of the lost aa. The last aa can be determined against the polycationic C-terminal carrier peptide, R.sub.3 (black).

[0140] To ensure correct assignment of maxima to peptides, the ladders were measured sequentially, starting with the smallest peptide. The expectation expressed above of a monotonic relationship between peptide length and depth of the block was confirmed. On this basis, following this experimental pathway, each of the 42 peptides could be identified within all six ladders (FIG. 7). Differences in the spacing of two adjacent maxima in the histograms are clearly visible and already indicate a presumed relationship between I/Io and the identity of the cleaved aa. (Suppl. 1-Suppl. 6)

[0141] FIG. 7: Residence time scatter plots versus residual pore current I/Io (red) with superimposed histograms of relative residual current values averaged over the main resistive pulse current level (black) for all six peptide ladders. Peptides were added sequentially, starting with the smallest peptide aa.sub.1R.sub.3 and ending with the largest peptide aa.sub.7R.sub.3. All measurements of a ladder were performed using the same AeL nanopore. In addition, the green line indicates the location of the separately determined polycationic C-terminal carrier peptide, R.sub.3.

[0142] All recorded resistive pulses in the data sets were analyzed in terms of event duration (dwell time) and amplitude (I/Io), as well as the number of modulations. The calculated differentials, i.e. changes in these values from one maximum to the next, were then plotted together with the differentials for the volume and hydrophobicity of the peptide against the respective position in the peptide (FIG. 8). To allow a direct comparison of all experiments, all differential values were double normalized with their maximum and minimum to the interval [0,1]. It was found that the change in I/Io correlated with the change in volume (vol), indicating that the largest contribution to the blockade was caused by the volume of the analyte. Thus, the largest change in I/Io was always found for arginine, the largest aa. Unexpectedly, serine always exhibited the smallest change in blockade, with one exception in L2, although the smallest volume change was expected for alanine. Remarkably, the change in I/Io for the uncharged, hydrophilic aa tyrosine and serine was always underweighted compared to their change in volume, whereas hydrophobic alanine was found to be overweighted. The charged aa, arginine and lysine, showed a different behavior. While arginine was found to be slightly overweighted in long peptides, it was found to be underweighted in short peptides. The opposite was observed for lysine.
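The differentials and their normalization can be sketched as below. The sketch interprets "double normalized with their maximum and minimum" as ordinary min-max scaling to [0, 1]; this interpretation, the function names, and the example numbers are assumptions rather than details given in the source.

```python
def differentials(values):
    """Change from one histogram maximum (ladder step) to the next."""
    return [b - a for a, b in zip(values, values[1:])]


def min_max_normalize(values):
    """Scale values to [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to normalize.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative numbers only (not measured data).
levels = [1.0, 3.0, 7.0, 10.0]
steps = differentials(levels)
normalized = min_max_normalize(steps)
print(steps, normalized)
```

Applying the same scaling to the dwell time, I/Io, volume, and hydrophobicity differentials puts them on a common axis for direct comparison.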

[0143] FIG. 8: Data correlation plots for all six peptide ladders. Dwell time scatter plots and level histograms averaged over the main level were analyzed for their differences in dwell time (red), residual current (blue), and number of modulations (black, dotted). The corresponding peptide volumes (green) and hydrophobicity (black, dashed) were also plotted. All values were double normalized to allow direct comparability.

[0144] Double-Blind Test

[0145] To investigate the reproducibility and reliability of the results described above, a double-blind experiment was performed. Six peptide ladder samples were prepared, each consisting of aa.sub.1R.sub.3 to aa.sub.7R.sub.3 in equimolar amounts. An independent third party acting as a notary randomly selected two of the six ladder samples, labeled them A and B, and sent them along with an R.sub.3 homo-peptide sample to an outside comparison laboratory (Abdelghani Oukhaled working group, Université Cergy Pontoise, France). In addition to the ladders, only FIG. 9b was initially submitted as a reading aid for the ladders, along with the information that all ladders consisted of a triarginine (R.sub.3) C-terminus and the stoichiometric molecular formula A.sub.1K.sub.1R.sub.2S.sub.2Y.sub.1 in every possible combination. In the comparison laboratory, the samples were analyzed under identical conditions but with different apparatus. Furthermore, the evaluation of the data, in particular the determination of the I/Io values, was carried out using the comparison laboratory's own algorithms and software routines, which differed significantly from those of the inventors' laboratory.

[0146] Using FIG. 9b alone, the sequence of sample A was correctly determined in the reference laboratory (KSRASRY, L3), and for sample B (FIG. 10) the partial sequence xxSRASx (i.e., more than half of the variable sequence components) was also correctly recognized and positioned here.

[0147] FIG. 10: Residence time scatter plots over the residual pore current I/Io (red) with superimposed level-averaged histograms (black) for sample A (left) and sample B (right). Below each graph are the sequences proposed using the first reader (prop) and the correct sequences (corr). The green box indicates the correct reading frame.

SUMMARY

[0148] The embodiment shows the method of the invention for peptide identification by ladder fingerprinting, which can serve as a primary platform for further development towards peptide sequencing, in particular using the highly sensitive wt-AeL nanopore. Reliable detection of hetero-peptides consisting of a C-terminal polycationic R.sub.3 carrier and up to seven N-terminal alternating heterogeneous aa was achieved . . . . By using peptide ladder-like sample pools ranging from aa.sub.1R.sub.3 to aa.sub.7R.sub.3, the position-sensitive contribution of a specific aa species to the overall blockade depth of a peptide was investigated, and based on these findings, a reading frame for sequencing as well as for fingerprinting was postulated. Using these reading frames, the robustness and reliability of this strategy were demonstrated in a double-blind study by sequencing a randomly selected peptide and identifying a second peptide by fingerprinting.

[0149] In this exemplary embodiment, peptides synthesized on demand were used. This is a model case that can be easily adapted to unknown protein or peptide samples. More comprehensive analysis of larger heteropolymers is accomplished by an initial step of cleaving the heteropolymer by fragmentation methods into further fragmentable subcomponents, which are then used to form ladders. For example, proteins can be made available in a standardized sample preparation process. Similar to standard bottom-up MS protein sequencing experiments, an endo-peptidase can be used to fragment proteins into smaller peptides. Furthermore, an exo-peptidase can be used to dynamically generate ladders from these peptides. Individual peptides produced by the protease could be sequentially presented to the nanopore and analyzed in a dynamic exopeptidase-coupled experiment. The method of the invention is therefore of great value for everyday laboratory applications.
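
The ladder formation described above can be illustrated with a short sketch. It lists the length fragments that result when one residue at a time is removed from the N-terminus of a peptide carrying a C-terminal R.sub.3 carrier, which is the direction of shortening shown in Supplements 1 to 6. The function name `peptide_ladder` and the one-letter string representation are assumptions made for illustration; a real exopeptidase digest need not be complete or uniform.

```python
def peptide_ladder(variable_part, carrier="RRR"):
    """List all length fragments obtained by removing one N-terminal residue at a time.

    variable_part: one-letter sequence of the variable residues (N- to C-terminus)
    carrier:       C-terminal carrier kept in every fragment (R3 in the embodiment)
    """
    return [variable_part[i:] + carrier for i in range(len(variable_part) + 1)]


# Ladder L1 of Supplement 1: SRASKYR-R3 down to R3
ladder_l1 = peptide_ladder("SRASKYR")
# Residue lost in each step, read off from consecutive fragments
lost = [longer[0] for longer in ladder_l1[:-1]]
```

For a peptide with seven variable residues this gives eight fragments, from the full-length peptide down to the bare carrier, and the list `lost` reproduces the "loss of" column of the supplement tables.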

[0150] Material and Methods

[0151] Reagents

[0152] All measurements were performed in AgCl-saturated (Carl Roth GmbH, Karlsruhe, Germany) 4 M KCl (Carl Roth GmbH, Karlsruhe, Germany) buffered with 25 mM TRIS (Merck KGaA, Darmstadt, Germany) at pH 7.5. All solutions were prepared using 18.2 MΩ·cm Milli-Q water. After equilibration, the electrolyte solutions were filtered (0.22 µm) and stored protected from light. Peptides were synthesized according to the desired requirements by Intavis Peptide Services GmbH & Co KG (Tübingen, Germany). Stock solutions (750 µM) of all peptides were prepared in 10 mM HEPES, pH 7.5, and stored at -20 °C until use. Reagents were used at a final concentration of 5 µM.
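
As a worked example of the dilution implied above, the following sketch applies C1·V1 = C2·V2, reading the stock and final concentrations as 750 µM and 5 µM and assuming that the final concentration refers to the peptide stocks. The chamber volume of 150 µL is purely hypothetical and is not taken from the description.

```python
def stock_volume_ul(stock_um, final_um, final_volume_ul):
    """Volume of stock needed so that final_volume_ul has concentration final_um (C1*V1 = C2*V2)."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_um * final_volume_ul / stock_um


dilution_factor = 750.0 / 5.0                         # 150-fold dilution
needed_ul = stock_volume_ul(750.0, 5.0, 150.0)        # hypothetical 150 µL chamber
```

Under these assumptions, 1 µL of stock per 150 µL of final volume gives the stated final concentration.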

[0153] Protein and Lipid Preparation

[0154] Wild-type proaerolysin (pAeL) was prepared in-house via standard protocols from E. coli BL21 (DE3)-pLysS competent cells using the pET22b(+) vector. pAeL was purified from cell lysates via His-tag chromatography. Stocks of pAeL were prepared at 1 g·L.sup.-1, frozen in liquid nitrogen, and stored at -80 °C. Thawed pAeL was activated with trypsin (Promega GmbH, Walldorf, Germany) and used at a final pAeL concentration of 20 pmol·L.sup.-1 (or 3 pmol·L.sup.-1 AeL). The preprotein construct was designed so that the affinity tag used for purification is cleaved from the protein during trypsin activation, yielding native protein.
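
The two concentrations given above are consistent with the heptameric stoichiometry of the aerolysin pore. The following sketch shows the arithmetic; it assumes complete activation and assembly of all monomers, so the result is an upper bound, and the names used are illustrative only.

```python
SUBUNITS_PER_PORE = 7  # aerolysin assembles into heptameric pores


def pore_concentration(monomer_conc, subunits=SUBUNITS_PER_PORE):
    """Upper bound on pore concentration if every monomer ends up in a complete pore."""
    return monomer_conc / subunits


ael_pmol_per_l = pore_concentration(20.0)  # from 20 pmol/L pAeL
```

The result, about 2.9 pmol/L, rounds to the 3 pmol/L stated for AeL.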

[0155] All membranes were prepared from 1,2-diphytanoyl-sn-glycero-3-phosphocholine (DPhPC) in octane. DPhPC dissolved in chloroform was obtained from Avanti Polar Lipids Inc (Alabaster, AL, USA). The lipids were aliquoted, dried under argon, and stored as a dry film at -20 °C until used at a concentration of 1 mg·mL.sup.-1.

[0156] Nanopore Measurements Inventor Laboratory

[0157] All recordings were made using an Axopatch 200B (Molecular Devices, San Jose, CA, USA) in capacitive feedback mode with its 4-pole Bessel filter corner frequency set to 100 kHz at a digitization rate of 1 MHz. An 8-pole Bessel filter (Model 9002, Frequency Devices, Ottawa, IL, USA) with a corner frequency of 50 kHz was connected between the amplifier output and the input of the analog-to-digital converter. Digitization was performed using a National Instruments AD converter (PCI-6251, National Instruments, Austin, TX, USA). GePulse software (Michael Pusch, University of Genoa, Italy) was used for holding potential control and data recording. Single-molecule resistive pulses were collected under 40 mV trans-negative voltage. To eliminate as many parasitic capacitances as possible, MECA16 cavity arrays from Ionera GmbH (Freiburg, Germany) with 50 µm diameter cavities were used. Further digital filtering (25 kHz Bessel) and event detection were performed with self-written LabView (National Instruments)-based software; subsequent analysis was performed with Igor Pro 8 (Wavemetrics, Lake Oswego, OR, USA).
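
How dwell time and I/Io can be extracted from a digitized current trace is sketched below. This is a simple threshold detector written for illustration only; it is not the LabView-based software mentioned above, and the function name `detect_events`, the threshold of 0.9 times the open-pore current, and the synthetic trace are all assumptions. Currents are treated as positive magnitudes, and the default sampling rate corresponds to the 1 MHz digitization rate stated above.

```python
from statistics import mean


def detect_events(trace, open_current, threshold=0.9, sample_rate_hz=1_000_000):
    """Return (dwell_time_ms, mean I/Io) for each run of samples below threshold * open_current."""
    limit = threshold * open_current
    padded = list(trace) + [open_current]  # sentinel closes an event at the end of the trace
    events = []
    start = None
    for i, current in enumerate(padded):
        blocked = current < limit
        if blocked and start is None:
            start = i  # first blocked sample of a new event
        elif not blocked and start is not None:
            samples = padded[start:i]
            dwell_ms = len(samples) / sample_rate_hz * 1000.0
            events.append((dwell_ms, mean(samples) / open_current))
            start = None
    return events


# Synthetic trace (arbitrary current units): two blockades of 10 and 20 samples
trace = [100.0] * 5 + [40.0] * 10 + [100.0] * 5 + [60.0] * 20 + [100.0] * 3
events = detect_events(trace, open_current=100.0)
```

Each detected event yields one point of a dwell time versus I/Io scatter plot. A practical detector would additionally need baseline tracking and filtering, which are omitted here.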

[0158] Nanopore Measurements Comparison Lab:

[0159] All recordings were performed with an Axopatch 200B (Molecular Devices, San Jose, CA, USA) in resistive feedback mode with its 4-pole Bessel filter cutoff frequency set to 5 kHz at a digitization rate of 100 kHz. A classic vertical chamber system from Warner Instruments (Hamden, CT, USA) with apertures of 150 µm diameter was used for the measurements. Digitization was performed using the DigiData 1440A AD converter and Clampex 10 software (Molecular Devices). The analysis was performed with in-house routines implemented in Igor Pro 8.

TABLE-US-00001 Suppl. 1 (Supplement 1): determined values from peptide ladder L.sub.1

sequence         loss of  I/Io    ΔI/Io   norm ΔI/Io  dwell time/ms  Δdwell time/ms  norm Δdwell time  n_m2  dn_m2  norm dn_m2
SRASKYR-R.sub.3  n/a      0.3686  n/a     n/a         9.073          n/a             n/a               3.35  n/a    n/a
RASKYR-R.sub.3   S        0.3922  0.0235  0.0000      10.419         -1.346          0.000             3.07  0.29   0.35
ASKYR-R.sub.3    R        0.4965  0.1044  1.0000      3.909          6.510           1.000             2.55  0.52   0.645
SKYR-R.sub.3     A        0.5360  0.0395  0.1975      2.412          1.497           0.361             1.75  0.80   1.00
KYR-R.sub.3      S        0.5622  0.0262  0.0329      2.034          0.379           0.220             1.59  0.16   0.19
YR-R.sub.3       K        0.6487  0.0865  0.7782      0.690          1.344           0.342             1.14  0.46   0.57
R-R.sub.3        Y        0.7259  0.0772  0.6642      0.167          0.523           0.238             1.01  0.13   0.15
R.sub.3          R        0.8067  0.0809  0.7089      0.021          0.146           0.190             1.00  0.01   0.00

TABLE-US-00002 Suppl. 2 (Supplement 2): determined values from peptide ladder L.sub.2

sequence         loss of  I/Io    ΔI/Io   norm ΔI/Io  dwell time/ms  Δdwell time/ms  norm Δdwell time  n_m2  dn_m2  norm dn_m2
KSRYARS-R.sub.3  n/a      0.3792  n/a     n/a         4.952          n/a             n/a               4.03  n/a    n/a
SRYARS-R.sub.3   K        0.4418  0.0625  0.4837      2.120          2.832           1.000             1.90  2.14   1.00
RYARS-R.sub.3    S        0.4837  0.0419  0.0993      1.891          0.229           0.076             1.68  0.22   0.10
YARS-R.sub.3     R        0.5739  0.0902  1.0000      0.694          1.198           0.420             1.22  0.46   0.22
ARS-R.sub.3      Y        0.6481  0.0742  0.7003      0.233          0.460           0.158             1.03  0.19   0.09
RS-R.sub.3       A        0.6846  0.0366  0.0000      0.164          0.070           0.020             1.02  0.01   0.00
S-R.sub.3        R        0.7603  0.0756  0.7279      0.035          0.128           0.040             1.00  0.02   0.01
R.sub.3          S        0.8067  0.0465  0.1848      0.021          0.014           0.000             1.00  0.00   0.00

TABLE-US-00003 Suppl. 3 (Supplement 3): determined values from peptide ladder L.sub.3

sequence         loss of  I/Io    ΔI/Io   norm ΔI/Io  dwell time/ms  Δdwell time/ms  norm Δdwell time  n_m2  dn_m2  norm dn_m2
KSRASRY-R.sub.3  n/a      0.3869  n/a     n/a         4.082          n/a             n/a               3.05  n/a    n/a
SRASRY-R.sub.3   K        0.4444  0.0575  0.3533      2.695          1.387           0.72128           1.99  1.06   1.00
RASRY-R.sub.3    S        0.4749  0.0305  0.0000      2.847          -0.152          0.000             1.98  0.01   0.00
ASRY-R.sub.3     R        0.5819  0.1069  1.0000      0.865          1.982           1.000             1.39  0.60   0.56
SRY-R.sub.3      A        0.6233  0.0414  0.1424      0.479          0.385           0.252             1.13  0.25   0.23
RY-R.sub.3       S        0.6564  0.0331  0.0331      0.417          0.063           0.101             1.09  0.04   0.03
Y-R.sub.3        R        0.7442  0.0878  0.7497      0.105          0.312           0.218             1.01  0.08   0.07
R.sub.3          Y        0.8067  0.0626  0.4191      0.021          0.084           0.111             1.00  0.01   0.00

TABLE-US-00004 Suppl. 4 (Supplement 4): determined values from peptide ladder L.sub.4

sequence         loss of  I/Io    ΔI/Io   norm ΔI/Io  dwell time/ms  Δdwell time/ms  norm Δdwell time  n_m2  dn_m2  norm dn_m2
RYSRASK-R.sub.3  n/a      0.3627  n/a     n/a         4.173          n/a             n/a               1.72  n/a    n/a
YSRASK-R.sub.3   R        0.4372  0.0745  0.7394      2.608          1.565           1.000             1.52  0.20   0.59
SRASK-R.sub.3    Y        0.5226  0.0854  0.9493      1.432          1.126           0.717             1.18  0.34   1.00
RASK-R.sub.3     S        0.5585  0.0359  0.0000      1.052          0.430           0.269             1.08  0.09   0.27
ASK-R.sub.3      R        0.6465  0.0880  1.0000      0.270          0.782           0.496             1.01  0.07   0.21
SK-R.sub.3       A        0.6863  0.0398  0.0745      0.142          0.128           0.074             1.01  0.00   0.01
K-R.sub.3        S        0.7307  0.0444  0.1629      0.130          0.012           0.000             1.00  0.01   0.02
R.sub.3          K        0.8067  0.0760  0.7695      0.021          0.109           0.062             1.00  0.00   0.00

TABLE-US-00005 Suppl. 5 (Supplement 5): determined values from peptide ladder L.sub.5

sequence         loss of  I/Io    ΔI/Io   norm ΔI/Io  dwell time/ms  Δdwell time/ms  norm Δdwell time  n_m2  dn_m2  norm dn_m2
KRSSRAY-R.sub.3  n/a      0.3793  n/a     n/a         3.514          n/a             n/a               2.35  n/a    n/a
RSSRAY-R.sub.3   K        0.4404  0.0611  0.3874      2.353          1.161           0.732             1.86  0.48   0.95
SSRAY-R.sub.3    R        0.5352  0.0948  1.0000      0.783          1.570           1.000             1.35  0.51   1.00
SRAY-R.sub.3     S        0.5780  0.0428  0.0548      0.666          0.116           0.046             1.24  0.12   0.23
RAY-R.sub.3      S        0.6178  0.0398  0.0000      0.616          0.051           0.003             1.14  0.10   0.19
AY-R.sub.3       R        0.6968  0.0790  0.7127      0.147          0.468           0.277             1.02  0.13   0.24
Y-R.sub.3        A        0.7435  0.0468  0.1263      0.101          0.046           0.000             1.00  0.01   0.02
R.sub.3          Y        0.8067  0.0632  0.4262      0.021          0.080           0.023             1.00  0.00   0.00

TABLE-US-00006 Suppl. 6 (Supplement 6): determined values from peptide ladder L.sub.6

sequence         loss of  I/Io    ΔI/Io   norm ΔI/Io  dwell time/ms  Δdwell time/ms  norm Δdwell time  n_m2  dn_m2  norm dn_m2
SKRYSRA-R.sub.3  n/a      0.3937  n/a     n/a         4.738          n/a             n/a               2.28  n/a    n/a
KRYSRA-R.sub.3   S        0.4179  0.0242  0.0000      4.811          -0.073          0.000             2.11  0.17   0.32
RYSRA-R.sub.3    K        0.4901  0.0722  0.7117      2.087          2.723           1.000             1.58  0.53   1.00
YSRA-R.sub.3     R        0.5817  0.0916  1.0000      0.712          1.376           0.518             1.24  0.34   0.65
SRA-R.sub.3      Y        0.6601  0.0784  0.8047      0.268          0.443           0.185             1.02  0.22   0.42
RA-R.sub.3       S        0.6919  0.0318  0.1129      0.218          0.051           0.044             1.01  0.01   0.02
A-R.sub.3        R        0.7627  0.0708  0.6917      0.050          0.167           0.086             1.00  0.01   0.01
R.sub.3          A        0.8067  0.0441  0.2950      0.021          0.029           0.037             1.00  0.00   0.00

TABLE-US-00007 Suppl. 7 (Supplement 7): determined values for I/Io and residence time of homo-arginine peptides. Ensslen et al. refers to the embodiment according to the invention.

      Piguet et al. (50 mV)                                  Ensslen et al. (40 mV)
Rx    I/Io    ΔI/Io   dwell time/ms  Δdwell time/ms          I/Io    dwell time/ms
10    0.234   n/a     72.0           n/a                     n/a     n/a
9     0.286   0.052   31.0           41.0                    n/a     n/a
8     0.353   0.067   14.2           16.8                    n/a     n/a
7     0.435   0.082   6.2            8.0                     0.4371  7.23
6     0.530   0.095   2.3            3.9                     n/a     n/a
5     0.631   0.101   0.9            1.4                     0.6309  0.86
4     0.731   0.1     n/a            n/a                     0.7259  0.167
3     n/a     n/a     n/a            n/a                     0.8067  0.02

Cpc classification

G01N2333/96433; G01N33/6824

International classification

G01N33/68