Method for Processing Nuclear Magnetic Resonance (NMR) Spectroscopic Data

Abstract

The invention provides methods for processing nuclear magnetic resonance (NMR) spectroscopic data to assign resonance peaks in an NMR spectrum of a molecule to atomic nuclei in said molecule, based on graph-theoretical principles. In particular, the methods of the invention may be employed in the assignment of data obtained from methyl-TROSY spectroscopy of proteins.

Claims

1. A method for processing nuclear magnetic resonance (NMR) spectroscopic data to assign resonance peaks in an NMR spectrum of a molecule to atomic nuclei in said molecule, comprising: (i) obtaining from a source a nuclear magnetic resonance dataset D.sub.1 comprising a first set of data points ij representing resonance peaks arising from heteronuclear J-couplings between nuclei i and nuclei j, and a second set of data points jj representing through-space proximity correlations between individual nuclei j associated with the first set of data points ij; (ii) obtaining from a source a molecular structure dataset D.sub.2 comprising data points representing a three-dimensional spatial arrangement of atoms in the molecule; (iii) defining a magnetic resonance data graph G.sub.1=(V.sub.1, E.sub.1) and a structure graph G.sub.2=(V.sub.2, E.sub.2) wherein: the set of vertices V.sub.1 of G.sub.1 comprises individual vertices V.sub.1.sup.ij such that each individual vertex V.sub.1.sup.ij represents an individual data point ij selected from the first set of data points; the set of edges E.sub.1 of G.sub.1 comprises individual edges E.sub.1.sup.jj such that each individual edge E.sub.1.sup.jj represents a data point jj selected from the second set of data points and wherein each individual edge E.sub.1.sup.jj connects two individual vertices V.sub.1.sup.ij if nuclei j associated with the data points represented by those two vertices interact via a through-space proximity correlation; the set of vertices V.sub.2 of G.sub.2 comprises individual vertices V.sub.2.sup.i such that each individual vertex V.sub.2.sup.i represents an atom in the molecular structure represented by dataset D.sub.2, said atom having a nucleus i and said being bound to at least one atom having a nucleus j; the set of edges E.sub.2 of G.sub.2 comprises individual edges E.sub.2.sup.ii such that each individual edge E.sub.2.sup.ii connects two individual vertices V.sub.2.sup.i to one another if the atoms represented by said vertices are within a predetermined distance r relative to one another in the molecular structure represented by dataset D.sub.2; (iv) comparing G.sub.1 and G.sub.2 to identify at least one maximal common edge subgraph (MCES); (iv) generating all possible mappings of edges E.sub.1.sup.jj to edges E.sub.2.sup.ii and of vertices V.sub.1.sup.ij to vertices V.sub.2.sup.i which are consistent with said at least one MCES; (vi) for each possible mapping of a vertex V.sub.1.sup.ij to a vertex V.sub.2.sup.ii obtained in step (v), generating a peak assignment, wherein generating said peak assignment comprises attributing a resonance represented by said vertex V.sub.1.sup.ij to an atom having a nucleus i which is represented by said vertex V.sub.2.sup.i; and (vii) generating an output comprising all possible assignments from step (vi) of resonance peaks in the NMR spectrum to atomic nuclei in the molecule.

2. A method according to claim 1, wherein the through-space proximity correlations are nuclear Overhauser effect (NOE) interactions.

3. A method according to claim 1, wherein the magnetic resonance data graph G.sub.1 is a graph G.sub.1=(V.sub.1, E.sub.1,.sub.1) and wherein the structure graph G.sub.2 is a graph G.sub.2=(V.sub.2, E.sub.2,.sub.2), wherein at is a labelling function which labels the vertices V.sub.1 and .sub.2 is a labelling function which labels the vertices V.sub.2.

4. A method according to claim 1, wherein nuclei i and j are independently selected from among .sup.13C, .sup.1H and .sup.15N, with the proviso that nuclei i and nuclei j are different types of nucleus to one another.

5. A method according to claim 1 wherein nuclei i are .sup.13C nuclei and nuclei j are .sup.1H nuclei.

6. A method according to claim 1 wherein nuclei i are .sup.13C nuclei and nuclei j are .sup.1H nuclei and further wherein said nuclei j are each individually bound to a nucleus i.

7. A method according to claim 6 wherein nuclei i and j are nuclei in .sup.13C-labelled methyl groups.

8. A method according to claim 7 wherein said .sup.13C-labelled methyl groups are .sup.13C-labelled isoleucine , valine and/or leucine methyl groups.

9. A method according to claim 1 wherein datapoints ij represent 2D .sup.13C-.sup.1H HMQC resonance peaks.

10. A method according to claim 1 wherein datapoints jj represent .sup.1H-.sup.1H nuclear Overhauser effect interactions.

11. A method according to claim 1 wherein datapoints ij individually represent 2D .sup.13C-.sup.1H HMQC resonance peaks arising from separate .sup.13C-labelled methyl groups and datapoints jj represent .sup.1H-.sup.1H nuclear Overhauser effect interactions between .sup.1H nuclei associated with said separate .sup.13C-labelled methyl groups.

12. A method according to claim 1 wherein said NMR spectrum is a methyl-TROSY spectrum.

13. A method according to claim 1 wherein datapoints ij represent resonance peaks arising from 2D .sup.13C-.sup.1H HMQC interactions between .sup.13C and .sup.1H nuclei in individual .sup.13C-labelled methyl groups of alanine, valine, leucine and/or isoleucine residues in a biomolecule, and datapoints jj represent homonuclear NOE interactions between .sup.1H nuclei in separate such .sup.13C-labelled alanine, valine, leucine and/or isoleucine residues.

14. A method according to claim 1 wherein the molecule represented by D.sub.2 is a protein or enzyme.

15. A method according to claim 1 wherein D.sub.2 is derived from a crystal structure.

16. A method according to claim 15 wherein said crystal structure is a crystal structure available in the Protein Databank (PDB).

17. A method according to claim 1 wherein D.sub.2 is derived from homology modelling.

18. A method according to claim 1 wherein, prior to step (iii), D.sub.1 is filtered.

19. A method according to claim 1 wherein, in step (iii), r is 5-20 .

20. A method according to claim 1 wherein, following step (iii) and prior to step (iv), G.sub.1 is decomposed into non-intersecting connected subgraphs.

21. (canceled)

22. A method according to claim 1 wherein step (iv) comprises determining an optimal matching order of the vertices of G1 to G2 and/or an optimal order of traversal of vertices of G.sub.1.

23. A method according to claim 1 wherein step (iv) is repeated as often as necessary to generate all possible mappings of G1 to G2 consistent with subgraph isomorphism and/or MCES.

24. A method according to claim 1 wherein the output of step (vii) is displayed on an electronic display screen.

25. (canceled)

26. A computer readable storage medium storing computer software code which when executing on a data processing system performs a method as claimed in claim 1.

27. A method according to claim 1 comprising, between steps (iii) and (iv), performing a step (iiiA) of comparing G.sub.1 and G.sub.2 to identify subgraph isomorphism from G.sub.1 to G.sub.2 and generating all possible mappings of edges E.sub.1.sup.jj to edges E.sub.2.sup.ii and of vertices V.sub.1.sup.ij to vertices V.sub.2.sup.i which are consistent with said subgraph isomorphism.

28. A method according to claim 27 wherein, following step (iii) and prior to step (iiiA), G.sub.1 is decomposed into non-intersecting connected subgraphs.

29. A method according to claim 27 wherein step (iiiA) is performed using the VF2 algorithm.

30. A method according to claim 1 wherein nuclear magnetic resonance dataset D.sub.1 is obtained by conducting an NMR experiment on a sample comprising a molecule of interest and the dataset D.sub.1 thus generated is in the form of an NMR spectrum of the molecule of interest.

31. A method according to claim 1 wherein steps (iii)-(vii) are performed by executing code in a computer.

32. A method of processing nuclear magnetic resonance (NMR) spectroscopic data, wherein a molecule of interest is subject to NMR and a nuclear magnetic resonance dataset D.sub.1 is obtained in the form of an NMR spectrum of the molecule of interest, to assign resonance peaks in the NMR spectrum of the molecule of interest to atomic nuclei in said molecule of interest, comprising: (i) obtaining said nuclear magnetic resonance dataset D.sub.1 comprising a first set of data points ij representing resonance peaks arising from heteronuclear J-couplings between nuclei i and nuclei j, and a second set of data points jj representing through-space proximity correlations between individual nuclei j associated with the first set of data points ij; (ii) obtaining a molecular structure dataset D.sub.2 comprising data points representing a three-dimensional spatial arrangement of atoms in the molecule; (iii) defining a magnetic resonance data graph G.sub.1=(V.sub.1, E.sub.1) and a structure graph G.sub.2=(V.sub.2, E.sub.2) wherein: the set of vertices V.sub.1 of G.sub.1 comprises individual vertices V.sub.1.sup.ij such that each individual vertex V.sub.1.sup.ij represents an individual data point ij selected from the first set of data points; the set of edges E.sub.1 of G.sub.1 comprises individual edges E.sub.1.sup.jj such that each individual edge E.sub.1.sup.jj represents a data point jj selected from the second set of data points and wherein each individual edge E.sub.1.sup.jj connects two individual vertices V.sub.1.sup.ij if nuclei j associated with the data points represented by those two vertices interact via a through-space proximity correlation; the set of vertices V.sub.2 of G.sub.2 comprises individual vertices V.sub.2.sup.i such that each individual vertex V.sub.2.sup.i represents an atom in the molecular structure represented by dataset D.sub.2, said atom having a nucleus i and said being bound to at least one atom having a nucleus j; the set of edges E.sub.2 of G.sub.2 comprises individual edges E.sub.2.sup.ii such that each individual edge E.sub.2.sup.ii connects two individual vertices V.sub.2.sup.i to one another if the atoms represented by said vertices are within a predetermined distance r relative to one another in the molecular structure represented by dataset D.sub.2; (iv) optionally comparing G.sub.1 and G.sub.2 to identify subgraph isomorphism from G.sub.1 to G.sub.2 and generating all possible mappings of edges E.sub.1.sup.jj to edges E.sub.2.sup.ii and of vertices V.sub.1.sup.ij to vertices V.sub.2.sup.i which are consistent with said subgraph isomorphism; (v) comparing G.sub.1 and G.sub.2 to identify at least one maximal common edge subgraph (MCES); (vi) generating all possible mappings of edges E.sub.1.sup.jj to edges E.sub.2.sup.ii and of vertices V.sub.1.sup.ij to vertices V.sub.2.sup.i which are consistent with said at least one MCES; (vii) for each possible mapping of a vertex V.sub.1.sup.ij to a vertex V.sub.2.sup.ii obtained in steps (iv) and (vi), generating a peak assignment, wherein generating said peak assignment comprises attributing a resonance represented by said vertex V.sub.1.sup.ij to an atom having a nucleus i which is represented by said vertex V.sub.2.sup.i; (viii) generating an output comprising all possible assignments from step (vii) of resonance peaks in the NMR spectrum to atomic nuclei in the molecule.

33. A computer program product embodied on a computer readable storage medium for processing nuclear magnetic resonance (NMR) spectroscopic data of a molecule to assign resonance peaks in said NMR spectroscopic data to atomic nuclei in said molecule, said computer program product comprising: computer code for defining a magnetic resonance data graph G.sub.1=(V.sub.1, E.sub.1) from a nuclear magnetic resonance dataset D.sub.1, wherein nuclear magnetic resonance dataset D.sub.1 comprises a first set of data points ij representing resonance peaks arising from heteronuclear J-couplings between nuclei i and nuclei j in said molecule, and a second set of data points jj representing through-space proximity correlations between individual nuclei j in said molecule associated with the first set of data points ij, and wherein the set of vertices V.sub.1 of G.sub.1 comprises individual vertices V.sub.1.sup.ij such that each individual vertex V.sub.1.sup.ij represents an individual data point ij selected from the first set of data points; and wherein the set of edges E.sub.1 of G.sub.1 comprises individual edges E.sub.1.sup.jj such that each individual edge E.sub.1.sup.jj represents a data point jj selected from the second set of data points and wherein each individual edge E.sub.1.sup.jj connects two individual vertices V.sub.1.sup.ij if nuclei j associated with the data points represented by those two vertices interact via a through-space proximity correlation; (ii) computer code for defining a structure graph G.sub.2=(V.sub.2, E.sub.2) from a molecular structure dataset D.sub.2, wherein molecular structure dataset D.sub.2 comprises data points representing a three-dimensional spatial arrangement of atoms in said molecule, and wherein the set of vertices V.sub.2 of G.sub.2 comprises individual vertices V.sub.2.sup.i such that each individual vertex V.sub.2.sup.i represents an atom in the molecular structure represented by dataset D.sub.2, said atom having a nucleus i and said being bound to at least one atom having a nucleus j; and wherein the set of edges E.sub.2 of G.sub.2 comprises individual edges E.sub.2.sup.ii such that each individual edge E.sub.2.sup.ii connects two individual vertices V.sub.2.sup.i to one another if the atoms represented by said vertices are within a predetermined distance r relative to one another in the molecular structure represented by dataset D.sub.2; (iii) optionally, computer code for comparing G.sub.1 and G.sub.2 to identify subgraph isomorphism from G.sub.1 to G.sub.2 and generating all possible mappings of edges E.sub.1.sup.jj to edges E.sub.2.sup.ii and of vertices V.sub.1.sup.ij to vertices V.sub.2.sup.i which are consistent with said subgraph isomorphism; (iv) computer code for comparing G.sub.1 and G.sub.2 to identify at least one maximal common edge subgraph (MCES); (v) computer code for generating all possible mappings of edges E.sub.1.sup.ij to edges E.sub.2.sup.ii and of vertices V.sub.1.sup.ij to vertices V.sub.2.sup.i which are consistent with said at least one MCES; (vi) computer code for generating a peak assignment for each possible mapping of a vertex V.sub.1.sup.ij to a vertex V.sub.2.sup.ii obtained in steps (iv) and (vi), wherein generating said peak assignment comprises attributing a resonance represented by said vertex V.sub.1.sup.ij to an atom having a nucleus i which is represented by said vertex V.sub.2.sup.i; (vii) computer code for generating an output comprising all possible peak assignments of resonance peaks in the NMR spectrum to atomic nuclei in the molecule.

Description

[0234] The invention will now be described in further detail with reference to the following non-limiting Examples and the Figures, in which:

[0235] FIG. 1 is a schematic of an exemplary method of processing NMR data to assign resonance peaks in accordance with the methods of the present invention.

[0236] FIG. 2 is a block diagram of an exemplary workflow for processing NMR data to assign resonance peaks in accordance with the methods of the present invention.

[0237] FIG. 3 illustrates an example of automated methyl resonance assignment using MAGMA (MAGMA is the name given by the inventors to an algorithmic implementation of a method of processing NMR data in accordance with the invention). (a): .sup.1H-.sup.13C TROSY-HMQC spectrum and a 2D .sup.13C-.sup.13C projection of a CCH-HMQC-NOESY spectrum from ATCase, recorded from a sample with [.sup.13CH.sub.3]-labelled methyl groups of Ile, Leu and Val residues in a highly deuterated background. In the case of Leu and Val, only one methyl group of the two is labeled for any one residue. The diagonal peaks in the NOESY spectrum correspond to the methyl peaks from the TROSY spectrum. Off-diagonal peaks reveal which methyl residues undergo significant cross-relaxation during the mixing time, where shorter range distances are expected to give higher intensity correlations (NOEs). (b): interpretation of a methyl-TROSY spectrum and a 3D .sup.13C-HMQC-NOESY spectrum by MAGMA. Methyl resonances from different residue types are shown in different colours. Extraction of strips in the 3D or 4D NOESY spectrum (b, shaded) for each auto peak allows a network to be constructed showing which peaks have connection to which others. (c): The information from each resonance can be combined into a graph. (d): Similarly, methyl-methyl connections from the protein structure model can be derived, within a MAGMA-determined distance cut-off, and turned into a graph. (e): MAGMA applies the graph matching method of the invention to produce assignments.

[0238] FIG. 4 illustrates the combinatorial nature of graph matching. (a): Each assignment takes 10.sup.3 seconds to evaluate. In a problem involving only one type of residue, the combinations are given by N! and the total calculation time in seconds is given by (N!)10.sup.3. The assignment of 10 residues by brute force would take 1 hour whereas if the number of residues exceed 21, the evaluation time exceeds the age of the universe (10.sup.17 seconds). (b): graph representation of an example of the assignment problem. (i) A structure (PDB) graph. (ii) a methyl-methyl NOE graph. The NOE graph is sparser in methyl-methyl connectivity (dotted lines) and contains an erroneous NOE that does not exist in the structure graph. The correct solution is shown, with NOEs shown corresponding to specific regions of the structure. (c): there are a total of 288 solutions, which can be sorted by the number of overlapping edges. The higher the number, the more NOEs are explained by the assignment. (d): Two example solutions are shown with M=2 (i) and M=5 (ii). (e): there are two maximum common edge graphs for this problem (i, ii) where M=8. These two solutions are shown where 7 residues are unambiguously assigned (darker shading), and two positions are ambiguous as they are structurally identical (lighter shading).

[0239] FIG. 5 illustrates the performance of MAGMA on synthetic data. (a): (i) secondary structure of CDK2. (ii) locations of Ile-1, Leu-1/2, and Val;-1/2 .sup.13C atoms are shown as spheres. iii) Atoms from ii) within 8 distance are connected by edges revealing a structure graph. (b) Ala, Ile, Leu and Val synthetic inter-methyl NOE graphs were generated from structure graph by removal of 30% and 60% of inter-methyl edges for i) and ii), respectively (methods), with Ala, Leu, Val and Ile methyl groups labelled (AILV). The corresponding performance of MAGMA is shown, where the correct assignment lies on the diagonal. (c) A comparison of the performance of MAGMA on a number of randomly-simulated synthetic datasets with different labelling strategies AILV and ILV) and different sparsity. I, L, V and A denote Ile, Leu, Val and Ala labelling. (LV) denote that the L and V residues are not distinguished. Even with highly sparse data, a number of confident assignments are expected. Increasing the number of different types of labels increases the success of MAGMA. (d) Performance comparison of MAGMA (i) to FLAMEnGO2.0 (ii) and MAP-XSII (iii) on a single representative 50% sparse ILV inter-methyl NOE data graph. In general, MAGMA gives more confident assignments than the other programs and all assignments were correct.

[0240] FIG. 6 illustrates the performance of MAGMA on experimental data (a/c) Histograms of inter-carbon distances (white bars) are calculated from crystal structures of the indicated proteins together with those experimentally observed (black bars). Structures of each protein are shown with the Ile, Leu, and Val groups indicated. (b/d) Assignment results from MAGMA. Methyl resonances are on the y-axis (Peak ID), whereas methyl-containing residues (Atom ID) are on the x-axis. A heat scale indicates the confidence in assignments, where confidently assigned methyl resonances are in black. Experimentally obtained methyl assignments lay along the diagonal. The total number of confidently assigned methyl groups is indicated, alongside the number of correctly assigned resonances. MAGMA performance was faultless. Where ambiguous assignments are possible, the correct result was always one of the options.

[0241] FIG. 7 illustrates the performance of MAGMA compared to alternative protocols. (a) The number of confident assignments from MAGMA in our benchmark versus the total number of inter-methyl distance restraints used in the calculation. Performance depends on the data sparsity, the labelling scheme and the specific fold of each protein. On average over the benchmark, 3.6 restraints were required for each confident assignment. The gradient of the line and R2 correlation coefficient are indicated. (b) Performance comparison of MAGMA to alternative automated methyl assignment protocols. For each protein in the benchmark, the total number of assigned methyl resonances is shown for MAGMA (black), FLAMEnGO2.0 (dark grey), and MAP-XSII (light grey). The confident assignments that were incorrect are indicated. FLAMEnGO2.0 returned no confident assignments for EIN1 and MSG (*). MAGMA tends to both obtain more confident assignments especially on large targets (MBP, MSG, .sub.7.sub.7), and provides perfect accuracy on this benchmark.

[0242] FIG. 8 illustrates the MAGMA workflow, as explained further in Example 2.

[0243] FIG. 9 illustrates using MAGMA to target specific residues for site-directed mutagenesis, to refine an ambiguous assignment. Where ambiguity is detected in assignments, it is desirable to obtain additional experimental data to further refine the assignment. In the case of a 50% sparse ILV inter-methyl data set (FIG. 5d), a cluster of six residues, 63, 64, 65, 66, 69 and 70 were identified by MAGMA to be interchangeable. Consecutive point mutations of the three most ambiguous residues (70V, 63V, 64V) provides 5 further confident assignments.

[0244] FIG. 10 illustrates the determination of the appropriate distance threshold for the calculation. The plots in FIG. 10 showing how the fraction of restraints MAGMA uses (methyl-methyl NOEs) i increasing as a function of the distance threshold used in the calculation. This figure shows that in many cases a 10 distance threshold is suitable to explain a high fraction of NOEs. If all NOEs are explained at 10 , the user can consider lowering the threshold and checking whether all NOEs are still explained. If that is the case the lowest distance at which all NOEs are explained is used (as with ss-ubiquitin and ubiquitin). Otherwise, the calculation is simply run at 10 (as with the remaining benchmarks).

[0245] FIG. 11(a): Shown are the results of short runs of MAGMA that determine the size of the MCES acquired early in the calculation with different distance thresholds. The distance chosen by the program is the shorter of either the size required for the MCES to match 100% of the inter-methyl restraints, or 10 . The empirical rule implemented in MAGMA provides 100% accuracy in our benchmark without introducing unnecessary additional heterogeneity. (b) The total time taken for the calculation (Table 1). The circle indicates the point at which no further change was found in the final assignment outputs, including a complete ambiguity description. This took a maximum of 2 days (.sub.7.sub.7). (c) MAGMA execution times in our benchmark scaled with 3.210.sup.14N.sup.8.5 (black line), where N is the total number of inter-methyl restraints. For comparison, the theoretical worst-case scenario, N! is also indicated (grey line.)

[0246] FIG. 12: Determination of the optimal distance for calculation when using FLAMEnGO2.0 (b) and MAP-XSII (a) . . . (a) For FLAMEnGO2.0, the

[0247] score provided was followed as a function of distance threshold. The recommended distance is that where the curve starts to reach a plateau. The calculation is then repeated 100 times at this optimal distance in order to obtain a final result. The final distances for each protein are specified in the legend. (b) MAP-XSII requires that the program is run at a number of distance thresholds, and the distance that provides the most confident assignments is taken as the final value. Results are shown here for the benchmark, indicating the distance reported with an asterisk. The number of incorrect assignments at each distance threshold is indicated.

[0248] FIG. 13 The inclusion of chemical shift information to refine MAGMA assignments. Assignment results obtained by MAGMA were rescored based on .sup.1H-.sup.13C chemical shifts predicted from the structure using ShiftX2 and CH.sub.3Shift. Scoring was applied conservatively (Supplementary S.4). Where one member of the ambiguous set was found to have a chemical shift within the statistical prediction accuracy for the predictor, and the others possibilities were outside of this, the one identified assignment choice was taken to be a new confident assignment. This generally increased the number of confident assignments, extending to resonances that do not generate inter-methyl restraints, but at a significant cost of accuracy. In all cases other than ubiquitin, a protein used to train the chemical shift prediction algorithms, the new confident assignments were as likely to be correct as incorrect. This was the case if we used either .sup.1H (a) or .sup.13C (b) chemical shifts, or the two combined (c). The narrow range of chemical shifts obtained from methyl groups makes them challenging to predict accurately.

[0249] FIG. 14 Application of MAGMA to discriminate between two possible structures. For successful operation of MAGMA, the structure used must reflect that present during the NMR measurements. In the case of ATCase R2 dimer (a) MSG (b) and HSP90 (c), two classes of structure are available in the PDB (i). MAGMA gives each of these slightly different confident assignment results (ii). In a conservative case, we recommend pooling these results, which lowers the number of confident assignments (ii). Notably, 4 sites in ATCase and 2 sites in MSG are revealed with confident, but different assignments between the two structures. Performing a single point mutagenesis of one of these sites allows a user to completely distinguish the two structures. This reveals that the R state of the ATCase R2 dimer, the ligand-free form of MSG and the open conformation of Hsp90 were analysed.

[0250] FIG. 15 MAGMA can distinguish between ligand binding modes. An aminotriazine based fragment hit compound was bound to the N-terminal domain of HSP90. a) A 1D .sup.1H NMR spectrum of the compound, allows a straightforward assignment of H1, H2 and H3. b) Three structures show the compound in different arrangements (PDB: i) 3B24 chain A, ii) 3B24 chain B, NMR iii) 1YER_1). MAGMA was run on each, including both protein/protein and protein/ligand NOEs. One structure could be excluded as the MAGMA result explained few ligand/protein NOEs, and none from H1. Of the remaining two similar structures, one resulted in incorrect ligand assignments. 1YER_1 was the only structure consistent with the NMR data, following screening by MAGMA. MAGMA determines, when combined with ligand NOEs, both a change in structure of the protein, and where the ligand binds.

EXAMPLE 1

[0251] With reference to FIGS. 1 and 2, exemplary methods of processing NMR data in accordance with the invention will now be described.

[0252] FIG. 1 is a schematic of an exemplary method of processing NMR data to assign resonance peaks in accordance with the methods of the present invention.

[0253] Sample 101 is subjected to NMR testing 102 and an NMR dataset D.sub.1 103 is thereby obtained. In FIG. 1 an illustrative and non-limiting example of such a dataset 103 is represented in graphical form. It will be appreciated by one of ordinary skill in the art that this is simply a conventional visual way of presenting NMR data, employed in this Example merely for ease of understanding. The skilled person would appreciate in light of the present disclosure that the underlying data may be employed in the inventive methods described herein without the need to generate such a visual representation.

[0254] In the illustrative dataset 103 shown in FIG. 1, the dataset arises from an NMR experiment in which a .sup.1H-.sup.13C TROSY-HMQC spectrum 103a and a .sup.13C-.sup.13C CCH-HMQC-NOESY spectrum 103b were recorded for the enzyme aspartate carbamoyltransferase (ATCase).

[0255] However, other NMR experiments and pulse sequences, other molecules of interest, and other combinations of NMR-active nuclei, may be employed as described elsewhere in the present disclosure. The data presented in visual form as 103a are an example of a first set of data points ij representing resonance peaks arising from heteronuclear J-couplings between nuclei i and nuclei j. The data presented in visual form as 103b are an example of a second set of data points jj representing through-space proximity correlations (here: nuclear Overhauser interactions) between nuclei j associated with the first set of data points ij.

[0256] Molecular structure dataset D.sub.2 105 is obtained, representing the three-dimensional spatial arrangement of atoms in a molecule 104 (for illustrative purposes, a structural formula for ATCase is shown for consistency with the data presented as 103). This may be obtained by homology modeling or by acquiring a predetermined crystal structure from a database such as the PDB, for example. Where homology modeling is performed this may be done in computer 106 or separately.

[0257] Code executed in computer 106 defines magnetic resonance data graph G.sub.1 on the basis of NMR dataset 103 and structure graph G.sub.2 on the basis of molecular structure dataset 105 employing a graph-matching algorithm as described elsewhere herein which compares G.sub.1 and G.sub.2 to generate all possible mappings of edges and vertices in G.sub.1 to edges and vertices in G.sub.2 which are consistent with subgraph isomorphism and/or at least one MCES (where only one MCES exists, all such mappings for such MCES are generated; where more than one MCES exists, all such mappings for each such MCES are generated). For each possible mapping, the code then generates a peak assignment such that resonances in molecular structure dataset D.sub.1 103 are attributed to nuclei of atoms in molecular structure dataset D.sub.2 105.

[0258] Once all possible mappings and assignments have been generated, an output 107 is generated which indicates all possible assignments.

[0259] The code executed in computer 106 in order to generate mappings, assignments, and output 107, will be configured to perform a graph-matching method as described herein. An algorithm such as MAGMA, which is described in further detail in subsequent Example 2, may be employed and thus the code executed in computer 106 may be written to perform such an algorithm when executed.

[0260] FIG. 2 is a block diagram of an exemplary workflow for processing NMR data to assign resonance peaks in accordance with the methods of the present invention. This illustrates in further detail the steps of the graph-matching algorithms employed in the methods of the invention, such as those which may be executed by computer 106.

[0261] In step 201, a nuclear magnetic resonance dataset D.sub.1 (such as dataset 103 shown in FIG. 1) is obtained, corresponding to an embodiment of step (i) of the methods described herein. As described in further detail elsewhere herein, this may be obtained by performing an NMR experiment on a molecule for the express purposes of generating the assignment, or may be a pre-recorded NMR dataset. In step 202, a molecular structure dataset D.sub.2 (such as dataset 105 shown in FIG. 1) is obtained, for example from a database such as the PDB or via homology modeling as described herein. This corresponds to an embodiment of step (ii) of the methods described herein.

[0262] Steps 203 to 207 are then performed to process the NMR data and generate peak assignments. Such steps may be performed according to any of the embodiments as described herein, and may for example be performed in a computer (e.g. computer 106 in FIG. 1) by executing a suitable code. At step 203 a magnetic resonance data graph G.sub.1 and a structure graph G.sub.2 are defined, each comprising vertices and edges which correspond to features in the relevant datasets as described in further detail elsewhere in the present disclosure. This corresponds to an embodiment of step (iii) of the methods described herein. At step 204 G.sub.1 and G.sub.2 are compared to identify subgraph isomorphism (corresponding to an embodiment of optional step (iv) of the methods described herein) and/or at least one MCES (corresponding to an embodiment of step (v) of the methods described herein). At step 205, corresponding to an embodiment of step (vi) of the methods described herein, all possible mappings are generated which are consistent with the at least one MCES identified in 204 (where subgraph isomorphism was identified in step 204 all possible mappings consistent with said subgraph isomorphism are also generated, and so in this respect 205 may also correspond to an embodiment of step (iv) of the methods described herein). At step 206, corresponding to an embodiment of step (vii) of the methods described herein, the mappings generated at step 205 are used to generate peak assignments, and in step 207 (corresponding to an embodiment of step (viii) of the methods described herein) an output is generated comprising all assignments from step 206.

[0263] The steps 203 to 206 and optionally 207 may if desired be performed in a computer programmed with code which, when executed, processes the datasets accordingly. An algorithm named MAGMA has been devised by the inventors which performs such steps and therefore a computer may be employed to implement the MAGMA algorithm. This is explored in further detail in Example 2.

EXAMPLE 2: MAGMA

[0264] MAGMA is an algorithmic method of processing NMR data in accordance with the methods of the invention. The flowchart for MAGMA is shown in FIG. 8. This is the implementation of the methods of the invention which was used to generate the data in the remaining Examples.

[0265] In this illustrative and non-limiting Example, as input, MAGMA takes a set of side-chain atoms from a protein structure (corresponding to a molecular structure dataset D.sub.2 as described herein) to create an input structure graph (FIG. 3d) (corresponding to a structure graph G.sub.2 as described herein) and a set of methyl-methyl NOEs, extracted from 3D or 4D NOESY spectra (thus corresponding to a nuclear magnetic resonance dataset D.sub.1 as described herein) that have been filtered based on criteria for reciprocity, intensity, and signal-to-noise to define a magnetic resonance data (NOE) graph (FIG. 3d) (corresponding to a magnetic resonance data graph G.sub.1 as described herein).

[0266] The core of MAGMA consists of two exact and deterministic graph matching algorithms: the VF2 subgraph isomorphism algorithm (this is an illustrative implementation of optional step (iv) of the methods of the invention) and a maximal common subgraph algorithm (this is an illustrative implementation of step (v) of the methods of the invention) adapted from the backtracking algorithm described by McGregor. The implementation of the VF2 algorithm was accessed via a Python interface to the igraph high performance graph library python-igraph0.765. Using Python, we designed and implemented an adapted MCES algorithm based on that of McGregor, as described above and illustrated in FIG. 8. MAGMA determines the set of graph matching solutions that comprise the MCES. This allows MAGMA to generate all combinations of methyl assignments that satisfy the maximal number of the methyl-methyl NOE restraints (an illustrative implementation of steps (vi) and (vii) of the methods of the invention), which therefore highlights assignment ambiguities. Analyses and plotting of graphs were supported by the Python package networkx-1.966 and the Mayavi application.

[0267] MAGMA functions as follows, described with reference to the numerals in the flowchart of FIG. 8:

[0268] Structure Graph Generation (Flowchart Reference 801):

[0269] This corresponds to an illustrative implementation of defining a structure graph G.sub.2 in accordance with step (iii) of the methods of the invention.

[0270] In the following Examples herein, unless indicated differently, pseudo-atoms were created by averaging the coordinates of the two methyl carbon atoms for both Leu and Val groups to create a pseudo-atom. The matrix of all Euclidean distances between Ile-.sub.1, Leu-pseudo , Val-pseudo , and when applicable, alanine , threonine and methionine , was computed from protein coordinates (provided in a PDB file). If more than a single distance was measured for any of the methyl-methyl pairs, the matrix was reduced to the set of shortest distances between any two carbon atoms. Next, the distance heuristic was applied and only the distances smaller or equal to the defined distance threshold were kept. This final set of distances defined a set of non-weighted edges for the structure graph, in which the methyl carbon atoms were used to define a set of vertices. Each vertex was assigned a label according to its amino acid type. In calculations in which discrimination between Leu and Val was not assumed (FIG. 5c), the same label was assigned to methyls of both Leu and Val residue types.

[0271] Preprocessing of NOESY Data (Flowchart Reference 802):

[0272] This embodies one illustrative example of the optional step of filtering the magnetic resonance dataset D.sub.1 prior to defining graph G.sub.1. Such filtering steps are described in further detail elsewhere in the present disclosure.

[0273] Processed 3D- and 4D-HMQC NOESY spectra were analyzed using Sparky for the r2 dimer of ATCase and the N-terminal domain of EIN, respectively. For other proteins in the benchmark, the spectral analyses were performed by others and the resultant NOE peak lists were provided. The following rules were applied to obtain confident methyl-methyl NOE assignments:

[0274] 1) NOE reciprocity, i.e. mutual cross-peaks existed between two auto-peaks

[0275] 2) Similar normalized cross-peak intensity (90%) and signal-to-noise ratio

[0276] Normalized cross-peak intensity and signal-to-noise ratio were calculated in accordance with the detailed description of the invention.

[0277] Data heights and noise threshold were determined by Sparky software for analysis of NMR spectra. In the obtained NOE lists for which data heights (or calculated peak intensities) were missing, solely the NOE reciprocity rule was applied to filter methyl-methyl NOEs. The .sup.1H-.sup.13C frequencies from methyl-TROSY HMQC spectra and the filtered NOE lists were used to define the set of vertices and the set of edges of the methyl-methyl NOE graph.

[0278] NOE Graph Generation (Flowchart Reference 802):

[0279] This corresponds to an illustrative implementation of defining a magnetic resonance data graph G.sub.1 in accordance with step (iii) of the methods of the invention.

[0280] Within MAGMA, .sup.1H-.sup.13C frequencies from methyl-TROSY HMQC spectra and the filtered methyl-methyl NOEs were converted to an adjacency matrix of an NOE graph. The vertices of the graph were labelled according to the amino acid type. If the assignments for the two methyl groups (pro-R and pro-S) within a given Leu or Val residue were known, then a single Leu or Val residue was used to join the methyl-methyl NOEs from both methyl peaks, thereby creating a single vertex in NOE graph. The residue types were added as being known a priori, a requirement that can be accomplished through inspection of the chemical shifts (e.g. discriminating Ile resonances from Leu resonances), or by preparing specifically labeled samples (e.g. to discriminate Leu from Val resonances). When the resonances of Leu and Val types cannot be distinguished, MAGMA supports giving the same label for Leu/Val .sup.1H-.sup.13C resonances, albeit with higher ambiguity from the collapse of two labels into one.

[0281] NOE Graph Analysis (Flowchart Reference 803):

[0282] This corresponds to an illustrative implementation of the optional feature of decomposing G.sub.1 into non-intersecting connected subgraphs prior to steps (iv)/(v) of the method of the invention, as described in further detail elsewhere herein.

[0283] MAGMA analysed connectivity of every NOE graph to check if the graph was disconnected. Disconnected NOE graphs were decomposed into their connected components (connected NOE subgraphs) and their cardinality (number of vertices) was determined. The connected NOE subgraphs were ordered according to their cardinality (from the highest to the lowest) and were henceforth treated separately.

[0284] Subgraph Isomorphism Testing (Flowchart References 804, 809):

[0285] This corresponds to an illustrative implementation of optional step (iv) of the methods of the invention.

[0286] Subsequently to the steps described above (structure graph generation, preprocessing of NOESY data, NOE graph generation, and NOE graph analysis), the SI test was iteratively performed on each of the connected NOE subgraphs according to the aforementioned order.

[0287] If a subgraph passed the SI test, it was then matched to a structure graph by the VF2 algorithm, which returns all mapping solutions that result in SI. Those subgraphs that did not pass the SI test are consequently collected for graph matching by the adapted MCES algorithm (see below).

[0288] Before the exact MCES search was executed, each of the subgraphs that did not result in SI (or, if no subgraphs were present, the entire NOE graph) was passed through a protocol that determined: [0289] 1. Optimal matching order (vertices of an NOE graph to vertices of a structure graph) [0290] 2. Optimal order of the traversal of vertices in an NOE (sub)graph

[0291] Determining the optimal orders for 1. and 2. is critical for obtaining a high starting scores in an exact MCES search. Moreover, optimal ordering results in significant reduction of the search space of the exact search, as the high initial MCES scores eliminate completion of any partial assignment that would lead to lower scores. Together, the optimal matching and traversing account for efficient pruning of the search tree and faster convergence of the exact MCES search.

[0292] Sorting Matching Options Prior to MCES Search (Flowchart References 804,805) and Ordering the Visits of Nodes in NOE (Sub)Graphs and Iterative Protocol for the Improvement of the Initial MCES Score (Flowchart References 805, 806, 807, 808):

[0293] These are preliminary stages of one illustrative implementation of step (v) of the methods of the invention.

[0294] Optimal matching order refers to ordering a set of vertices of structure graph associated with each vertex of an NOE sub(graph). The set comprised all possible assignment options for a particular NOE vertex, i.e. every assignment option for a particular .sup.1H-.sup.13C resonance in a methyl-TROSY spectrum. The adapted MCES algorithm of the invention supported associating the ordered list of assignment options to each vertex of an NOE (sub)graph. The order of the vertices in the set was the order in which different assignment options should be tried. In MAGMA, sorting of the sets was determined by a variation of the Hungarian method for the assignment problem (Munkres algorithm), combined with a Jaccard similarity coefficient, as described in the detailed description of the invention.

[0295] The exact MCES algorithm constructed the search tree starting from a vertex in an NOE (sub)graph. From a given starting point (root), the order of traversal of vertices of an NOE graph during the search was found to significantly impact the starting scores of the exact MCES search. In order to determine the most optimal starting point (vertex of an NOE graph) a single iteration of the MCES search was performed from every vertex of an NOE (sub)graph, and the obtained scores (size of first found MCES) were used to determine the best starting vertex. The highest scoring, first found MCES comprised the optimal starting point for MAGMA. The order of the rest of the vertices from the root was determined through breadth-first tree search.

[0296] After determining the matching order and NOE graph traversal order, MAGMA performed an incomplete, short MCES run prior to execution of the exact MCES run. The goal of this short search was to collect a number of high scoring MCES solutions that are used to update the initially determined matching orders. The highest scoring solutions (max (|MCES|)) of the short MCES search were inserted to the front of the set of the assignment options for every NOE vertex. The initially determined matching order (given by Munkres (Hungarian) algorithm and Jaccard coefficients) came after the new order established by the short MCES runs. We determined that the optimal number of high scoring MCES solutions to collect for the update of matching orders was 10.

[0297] It should be noted that the final result of the exact MCES algorithm was not affected by the described heuristics for determining best starting point for the search and the best matching orders. However, these considerations were important determinants of the speed of convergence to the final set of solutions.

[0298] Exact MCES Search (Flowchart References 806, 807, 808):

[0299] This is a further stage in the illustrative implementation of step (v) of the methods of the invention. Once the order of the matching priorities and optimal traversal of NOE graph had been determined as described above, the exact MCES algorithm was applied.

[0300] MAGMA Scoring (Flowchart References 809, 810, 806, 807):

[0301] This corresponds to an illustrative implementation of steps (vi) and (vii) of the methods of the invention.

[0302] At the end of its protocol, MAGMA joined all assignment solutions that comprised subgraph isomorphisms (in the case of VF2) and/or MCES (adapted McGregor), and thereby associated each methyl (vertex in the methyl-methyl NOE graph) to a set of atoms (vertices in the structure graph) to which it was uniquely assigned over a set of all solutions. For each peak, the set of all possible structure graph (PDB) assignments were then found. If multiple assignment options were found, the score given to each peak was defined as the reciprocal of the total number of possible assignments for that peak (FIGS. 5, 6). Therefore, for a confident assignment, where there is only one possibility, the score was 1. The score therefore provided an indication of the degree of ambiguity or confidence of the assignment of any given peak.

EXAMPLE 3: A SIMPLIFIED PROBLEM

[0303] To illustrate how graph matching solves the assignment problem, all solutions were obtained to a relatively simple problem with nine methyl-labelled residues as a function of the number of common edges (i.e. explained NOEs, FIG. 4c). The data graph contained fewer edges than the structure graph, and a deliberate error was introduced. Due to the inclusion of the erroneous restraint, the data graph was not subgraph isomorphic to the structure graph and so the VF2 algorithm could not be applied. Out of the 288 possible assignments, only two, which comprise the MCES, had the maximum possible number of edges (NOEs) that are common to both graphs (FIGS. 4c and e). This provided 7 confident assignments and 2 that could be interchanged as they were structurally identical.

EXAMPLE 4: PROOF-OF-PRINCIPLE

[0304] To demonstrate the effectiveness of MAGMA, a series of proof-of-principle calculations using simulated data were performed to examine how different levels of inter-methyl NOE sparsity and amino acid labelling schemes influenced the accuracy and ambiguity of methyl assignments (FIG. 5).

[0305] The 34 kDa human Cyclin-dependent kinase 2 (CDK2) (PDB:2C60)(FIG. 5a) was chosen as an investigative target, and the maximum distance threshold r for simulated NOEs was set to 8 . Several common methyl labelling schemes were employed, as described in the figure caption.

[0306] Different methyl-methyl NOE sparsity levels were simulated by removal of a subset of edges from the CDK2 structure graph. Here, 0% sparsity corresponds to the case where all distances below the specified threshold were included in the calculation (i.e. the maximum number of edges). The ratio of the number of NOEs removed to the total possible number of NOEs then defines the sparsity. Edges were removed by random sampling according to the following distribution: the probability of removing an edge was given by:

[00007] $\frac{r_{i}^{6}}{.Math. r_{i}^{6}}$

[0307] where r.sub.i is the distance of any given edge (i). Hence, the edges corresponding to longer distances are removed with a higher probability. As the precise result from MAGMA depends on exactly which NOEs have been inputted, for each amino acid labelling scheme and the NOE sparsity percentage, MAGMA was run on three different subsets of edges, generated by changing the seed of the random number generator. This enabled calculation of the mean and the standard deviation of the number of confidently assigned methyls for a given NOE sparsity (FIG. 5c).

[0308] NOE data with varying sparsity levels were simulated such that shorter-distance NOEs were retained with a greater probability than long range NOEs. The results show that the correct assignment solution was always sampled within the set of solutions generated by MAGMA (FIG. 5b, d). In addition, MAGMA did not produce a confident assignment that was incorrect.

[0309] As expected, an increase in NOE sparsity resulted in an increase in the number of methyls that have multiple assignment possibilities (FIG. 5b, c, d). Different NOE restraints and amino acid labelling schemes were tested in order to determine if these parameters influenced the number of confident (non-ambiguous) assignments (FIG. 5c). The results showed that the largest percentage of confident assignments were generated when Ile, Leu, Val, and Ala residues were methyl-labelled and when the assignments of the two pro-R and pro-S methyl groups from a particular Leu or Val residue were known (FIG. 5c, ILVA). As implied above, having more restraints between as many different types of residue was beneficial for assignment.

EXAMPLE 5: COMPARISON TO OTHER APPROACHES

[0310] We compared the performance of MAGMA to two other freely available automated assignment approaches, FLAMEnGO2.014 and MAP-XSII15 (FIG. 5d). Unlike MAGMA, these additionally utilise chemical shift data, which MAGMA does not use. For a performance comparison, predicted chemical shifts were supplied using either ShiftX2 or CH3Shift, with each method yielding similar results. The results from a challenging data graph simulated at 50% sparsity with Ile-, Leu-, and Val- methyl labelling provide a representative performance comparison of the three algorithms. The number of correct/incorrect assignments was 36/0 in the case of MAGMA. FLAMEnGO2.0 and MAP-XSII provided 18/17 and 30/21 correct/incorrect assignments, respectively (FIG. 5, FIG. 12). These results demonstrate that MAGMA can provide accurate and reliable methyl assignments.

EXAMPLE 6: APPLICATION TO AMBIGUOUS ASSIGNMENTS

[0311] For a subset of the methyls that have ambiguous assignments, the ambiguity arises from structural equality of vertices in the NOE graph, where two nodes can be exchanged but the number of satisfied NOEs does not change. Nevertheless, the ability of MAGMA to reveal ambiguity in the assignment of some methyls is valuable for an NMR spectroscopist, as it highlights the protein residues/regions for which more experimental information is required, for example, by targeted SDM (site-directed mutagenesis).

[0312] Where an assignment provided by MAGMA reveals multiple possibilities, vertices can be exchanged and still explain the maximum number of inter-methyl distance restraints. This is valuable information that highlights regions within the protein where additional data should be sought. To illustrate how this information can be used, we show that in a 50% sparse ILV dataset, three single point mutations of the most ambiguous residues provides five new confident assignments (FIG. 9). MAGMA allows any known assignments to be specified, which reduces potential ambiguity of unknown assignments and can lead to a significant increase in the final number of confident assignments.

EXAMPLE 7: TESTING OF MAGMA ON EXPERIMENTAL NOE DATA

[0313] MAGMA was tested against a benchmark of data obtained from 3D or 4D NOESY spectra of methyl-labelled proteins, where methyl-TROSY assignments had been previously determined and cross-validated with other techniques (FIG. 6, Table 1). The benchmark proteins were: Ubiquitin (Ubq), methionine-R-sulfoxide reductase B protein (MsrB), the N-terminal domain of E. coli Enzyme I (EIN1), a dimer of regulatory chains of aspartate transcarbamoylase from E. coli (ATCase r2), maltose binding protein (MSB), malate synthase G (MSG) and the double-7-ring of the 20S proteasome core particle (.sub.7.sub.7).

[0314] This benchmark was, to the best of the inventors' knowledge, the largest assembled benchmark for testing automatic methyl assignment programs. The benchmark proteins display diversity in molecular mass, shape, amino acid labelling, and data sparsity.

[0315] The distribution of NOEs as a function of distance is shown with the total distances possible within the molecule (FIG. 6a), together with the results from MAGMA (FIG. 6b). As expected, the majority of short ranges NOEs were observed, while in general few NOEs were observed corresponding to distances greater than 10 .

[0316] The number of confident assignments obtained varied with the details of the specific experimental dataset, but coarsely scaled with the number of NOEs in the dataset. On average over the whole benchmark, five methyl-methyl NOEs were needed for each confident assignment (FIG. 7a). Most notably, one of the larger proteins under study, the 77 proteasome double-ring resulted in 95% confidently assigned methyls. The exceptional performance on the 77 proteasome could be attributed to the quality of the NMR data, which yielded low sparsity (80%) and high local graph connectivity (with graph vertex degrees 6 on average). The NOE data resulted in a single graph with only three methyls without any methyl-methyl NOEs for which therefore the assignment could not be obtained. Overall, the number of assignments obtained depended on the shape and size of the problem under consideration, in addition to the quality of experimental data.

[0317] An interesting case is the comparison of the isolated r2ATCase dimer (34 kDa) and the r2ATCase dimer in the context of the holoenzyme. The former was produced as it could be assigned using traditional backbone experiments. Fewer NOE peaks were observed in spectra from the dimer over the holoenzyme. Moreover, crystal structures of the holoenzyme exist, whereas to make progress with the dimer, the structure has to be assumed to be equivalent. The known assignment predicts some unexpected very long range NOEs in the isolated dimer, which might suggest that the structure of the dimer is not the same. Remarkably, 16 confident assignments were obtained from the holoenzyme despite sparser NOE data (FIG. 6 a, b ii), iii)).

[0318] Another interesting case was EIN. The assignment strategy based primarily on PRE data, with the NOE data being used only for cross validation used this protein as benchmark. Here, it was shown that 35% of the methyl assignments can be obtained using NOE data alone.

[0319] In general, MAGMA will provide more confident assignments when there are more NOEs, and the reference structure is the closest possible match with what is present in solution.

TABLE-US-00005 TABLE 1 Methyl-methyl NOE benchmark data Methyl Total Assignable M.sub.w PDB Labelling Methyl containing Data inter- Data resonances (kDa) code scheme groups residues type methyl Subgraphs (i) Ubiquitin 8.6 1ubq I, L, V 33 20 NOE 25 3 18 Ubiquitin 8.6 1ubq L, V 26 13 DREAM 18 1 10 MsrB 16.6 3e0o I, L, V 39 22 NOE 37 2 21 EIN 27 1eza I, L, V, A 146 100 NOE 145 6 84 ATCase R.sub.2 30 1d09 I, L, V 66 39 NOE 91 1 34 MBP 43.4 1ez9 I, L, V 123 73 NOE 144 4 70 MSG 81.4 1y8b I, L, V 274 159 NOE 230 12 141 .sub.7.sub.7 360 1yau I, L, V 96 56 NOE 154 1 53 Mean Distance Confident Assigned Time to Total vertex threshold Sparsity assignments fraction converge time degree (ii) (A) (iii) (iv) (v) (vi (s) (vii) (s) (viii) Ubiquitin 2.89 6.5 0.56 16 0.80 2 10.sup.2 2 10.sup.2 Ubiquitin 3.6 7.5 0.78 10 0.77 2 10.sup.2 2 10.sup.2 MsrB 3.52 10 0.62 17 0.77 4 10.sup.3 4 10.sup.3 EIN 3.45 10 0.37 39 0.35 4 10.sup.3 5 10.sup.3 ATCase R.sub.2 5.35 10 0.58 12 0.31 7 10.sup.4 2 10.sup.5 MBP 4.11 10 0.56 47 0.64 7 10.sup.4 2 10.sup.5 MSG 3.26 10 0.47 60 0.38 7 10.sup.4 4 10.sup.6 .sub.7.sub.7 5.81 10 0.80 52 0.93 2 10.sup.5

EXAMPLE 8: PERFORMANCE COMPARISON

[0320] The performance of MAGMA was compared to that of the two alternative automatic methyl assignment programs (FLAMEnGO2.0 and MAP-XSII) on the protein benchmark described in Example 6 (FIG. 7a). In the case of MAGMA, residue type and NOESY data were used as restraints, all of which are available from data on differentially labelled samples. On average, MAGMA provided more confident assignments than alternative programs but most crucially, confident assignments identified by MAGMA were always correct, which was not generally the case for the other two methods. In the case where MAGMA identified an ambiguous assignment, the experimentally determined answer was sampled as one of the possibilities.

EXAMPLE 9: USE OF MAGMA IN STRUCTURE DETERMINATION

[0321] The ATCase R2 dimer is known to exist in R and T conformations. MSG adopts a different structure when bound to certain ligands. When the different structures were provided as input for MAGMA, different sets of confident assignments are returned (see FIG. 14). If the fold of the protein is not known, we recommend pooling the assignment results from all available structures, as shown in FIG. 14. In these two cases, a single point mutant in the location where two confident assignments differed was sufficient to distinguish between the two alternative structural forms. We determined ATCase R2 dimer to be in its R state and MSG to be in its free form. In these two instances, the structural assignment was also supported by the observed .sup.1H-.sup.13C chemical shifts.

EXAMPLE 10: CHEMICAL SHIFT FILTERING

[0322] We assessed if chemical shift scoring could be used to reduce ambiguity in assignment solutions generated by MAGMA. Both SHIFTX2 and CH3Shift were used to predict methyl .sup.1H and .sup.13C chemical shifts from a structure, with each giving similar results in the final assignment. For each resonance where MAGMA returned an ambiguous assignment, we calculated a score as S=|.sub.calc.sub.data|/ where gives the chemical shift and , the prediction accuracy of the specific chemical shift under consideration as defined by the relevant programs. A combined .sup.1H-.sup.13C chemical shift score was obtained from the individual .sup.1H and .sup.13C scores from (S.sub.H.sup.2+S.sub.C.sup.2).sup.1/2. In the case where two stereospecific methyl groups are possible, chemical shift differences that correspond to all 4 possible combinations of stereo-specific assignment were computed, and the lowest score was taken as being most likely to be correct.

[0323] The score S was calculated for each possibility of an ambiguous assignment. A new assignment was considered confident if exactly one of the options had a score that was less than or equal to 1. Otherwise, the restraint was not applied (FIG. 13). Even with this conservative application of chemical shift information, new confident assignments resulted in a roughly equal probability of correct and incorrect assignments. This is likely due to a combination of challenges associated with calculating methyl chemical shifts, and variations between methyl orientations and their dynamics in solution and in crystallographic forms. Other than identifying residue types, and perhaps protons that are in immediate proximity to aromatic rings, we recommend caution when using methyl chemical shifts for assignment purposes.

EXAMPLE 11: USING MAGMA FOR LIGAND STRUCTURE DETERMINATION

[0324] In addition to using protein-protein distance restraints, NOEs from ligands can be included with the calculation. By providing MAGMA with different ligand/protein complexes, it becomes possible to determine where the ligand is bound. The example shown is Hsp90. Without ligand, the protein adopts a closed conformation. With ligand, the protein adopts an open conformation, and the binding site of the ligand is determined as shown in FIG. 15.

Method for Processing Nuclear Magnetic Resonance (NMR) Spectroscopic Data

Inventors

Cpc classification

Classification Explorer

G01R33/4608

PHYSICS

Classification Explorer

G01R33/4625

PHYSICS

Classification Explorer

G01R33/4633

PHYSICS

Classification Explorer

G01N24/087

PHYSICS

International classification

Classification Explorer

G01R33/46

PHYSICS

Classification Explorer

G01N24/08

PHYSICS

Abstract

Claims

Description