IDENTIFICATION OF LIGAND BINDING SITES IN INTRINSICALLY DISORDERED PROTEINS WITH DIFFERENTIAL BINDING SCORES
20230125652 · 2023-04-27
Inventors
Cpc classification
G16B15/00
PHYSICS
G16B35/00
PHYSICS
International classification
Abstract
Various embodiments disclosed relate to methods for identification of preferred binding sites on intrinsically disordered proteins (IDPs). The present disclosure includes methods that include generating an IDP ensemble comprising one or more of the IDPs, sampling ligand interactions with the IDP ensemble to produce sampled ligand interactions, subjecting each of the sampled ligand interactions to an IDP ensemble docking, producing a differential binding score (DIBS) based on the sampled ligand interactions with the IDP ensemble docking, and modeling the DIBS to identify binding sites on the IDP ensemble.
Claims
1. A method of screening ligands binding to intrinsically disordered proteins (IDPs), the method comprising: generating an IDP ensemble comprising one or more of the IDPs; sampling ligand interactions with the IDP ensemble to produce sampled ligand interactions; subjecting each of the sampled ligand interactions to an IDP ensemble docking; producing a differential binding score (DIBS) based on the sampled ligand interactions with the IDP ensemble docking; and modeling the DIBS to identify binding sites on the IDP ensemble.
2. The method of claim 1, wherein generating the ensemble comprises using molecular dynamics simulations combined with an ensemble of random coil conformations.
3. The method of claim 1, wherein sampling ligand interactions comprises randomly sampling a subset of populations from a set of conformations for
4. The method of claim 3, wherein the set of conformations comprises about 1,000.
5. The method of claim 3, wherein the subset of conformations comprises about 100.
6. The method of claim 1, wherein subjecting each of the sampled ligand interactions to an ensemble docking comprises 24 independent docking routines.
7. The method of claim 1, wherein subjecting each of the sampled ligand interactions to an ensemble docking is repeated three times for the ensemble IDP.
8. The method of claim 1, wherein the DIBS is based on an estimated binding affinity of each docking run and a number of times a particular amino acid is involved in a binding event.
9. The method of claim 1, wherein modeling the DIBS comprises modeling data between a plurality of produced DIBS to identify residues corresponding to the binding sites on the IDP ensemble.
10. The method of claim 1, further comprising comparing the ensemble IDP to a control ensemble comprising random coils.
11. The method of claim 10, wherein comparing the ensemble IDP to a control ensemble comprising random coils comprises: generating the control ensemble comprising the random coils; sampling ligand interactions with the control ensemble to produce sampled ligand interactions; subjecting each of the sampled ligand interactions to a control ensemble docking; producing a differential binding score (DIBS) for the control ensemble; modeling the DIBS to identify binding sites on the control ensemble.
12. The method of claim 1, wherein the IDPs comprise intrinsically disordered proteins or proteins with intrinsically disordered regions.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
DETAILED DESCRIPTION
[0012] Screening ligands that bind directly to an ensemble of IDPs to discover potential hits or leads for new drugs is an emerging but challenging area because IDPs lack well-defined, ordered 3D structures. To explore a new IDP-based rational drug discovery strategy, a differential binding score (DIBS) is defined to quantitatively estimate the binding preference of a ligand for an ensemble of conformations adopted by the IDP versus its preference for an ensemble of random coil conformations of the same protein. Ensemble docking procedures are performed on repeated samples of conformations, and the results are tested for statistical significance to determine the preferential binding sites of the IDP. The results of this approach closely reproduce the experimental data from recent literature on the binding of the ligand epigallocatechin gallate (EGCG) to the intrinsically disordered N-terminal domain of the tumor suppressor p53. Combining established approaches into a new method to screen ligands against IDPs could be valuable as a screening tool for IDP-based drug discovery.
[0013] Using computational algorithms to dock small molecules to proteins with ordered 3D structures is a helpful way to screen potential ligands in a drug discovery process. In silico screening is routinely applied in the early stages of rational structure-based approaches to select a ranked set of priority leads for experimental validation. Docking methods initially placed rigid ligands against static protein structures, as in studies of influenza virus or HIV-1 protease. Because proteins sample multiple conformational sub-states, the ensemble docking approach brought virtual screening notably closer to experimental conditions. Ensemble docking has become one of the critical elements of the drug discovery pipeline for developing potential drug molecules, as exemplified by recent applications to find drug targets for COVID-19 infection.
[0014] Intrinsically disordered proteins (IDPs) emerge as promising druggable targets due to their functional association with various diseases. Screening ligands directly binding to an ensemble of IDPs to discover potential hits or leads for new drugs is exceptionally appealing because the ligands may directly block the undesired biological interactions mediated by the IDPs. However, to take advantage of the well-established structure-based rational drug design strategies, these methods need to be repurposed for IDPs that lack well-defined or ordered 3D-structures.
[0015] Access to well-defined, folded, experimentally determined three-dimensional structures is advantageous for a ligand docking protocol. However, the conformational flexibility inherent to an IDP adds complexity to ensemble docking procedures. Identifying the binding sites of potential drug molecules on IDPs therefore remains a current challenge in structure-based rational drug discovery. When an intrinsically disordered protein is represented as an ensemble of structures for a docking protocol, two critical factors distinguish it from conventional studies involving an ensemble of structured proteins. The differential binding score (DIBS), the difference in probability between two sets of ensemble docking protocols, is proposed to address these two factors. First, given an ensemble of conformations adopted by an IDP, a potential ligand will bind to a selective subset of conformations at a given moment, but not necessarily in the same configuration, because conformations with similar affinity may not share the same binding contacts with the ligand. The dynamic interconversion between the conformations of the protein is responsible for the differential binding events, leading to a probabilistic nature of protein-ligand interaction. A binding score must therefore reflect how many such receptor-ligand interaction events occur within a reasonably large set of sampled binding events. The second factor concerns the specificity of the IDP-ligand binding events, tested by asking whether similar events would occur if an ensemble of random coil structures represented the protein of interest. In combination, the procedure can then establish whether a particular ligand prefers a conformational subset adopted by the IDP more specifically than a random distribution, and the portions of the protein responsible for the interaction can be deduced.
[0016] In the method proposed herein, the two criteria mentioned above are measured in two steps. First, ligand interactions are repeatedly sampled with an ensemble of IDP conformations generated by molecular dynamics simulations and with an ensemble of random coil conformations. From a large pool of conformations (1,000) for each ensemble, a subset (100 structures) is randomly sampled for the ensemble docking protocol. Each sampled subset is subjected to ensemble docking with 24 independent docking routines per structure, for a total of 2,400 runs per subset. The same protocol is repeated three times separately for the IDP and random coil ensembles, each time on a randomly selected subset of conformations. A differential binding score (DIBS) is defined based on the estimated binding affinity of each docking run and the number of times a particular amino acid is involved in the binding event. Second, linear modeling of the triplicate data is performed between the binding scores to identify residues that show significant differences, leading to the identification of preferred binding sites on the IDP ensemble.
[0017] To demonstrate the utility of the approach, the method was tested on the binding of the ligand epigallocatechin gallate (EGCG) to the N-terminal domain of the tumor suppressor p53 (p53-NTD, an IDP). Zhao and Blayney et al. have revealed, based on their Surface Plasmon Resonance (SPR) and Nuclear Magnetic Resonance (NMR) studies, that EGCG interacts preferentially and directly with the p53-NTD in a highly dynamic fashion with multiple binding interfaces. The DIBS supports these observations and the heterogeneous complex formation of EGCG with disordered proteins. Furthermore, the DIBS for the p53-NTD binding to EGCG shows close agreement with experimental chemical shift perturbation data from the literature.
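The sampling scheme in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the inventors' code: the function name `draw_replicates`, the seeds, and the choice to draw each replicate independently from the full pool (so that replicates may share conformations) are assumptions made for the example.

```python
import random

POOL_SIZE = 1_000          # conformations per ensemble (IDP or random coil)
SUBSET_SIZE = 100          # conformations drawn for each replicate
N_REPLICATES = 3           # independent resamplings per ensemble
DOCKS_PER_STRUCTURE = 24   # independent docking routines per conformation


def draw_replicates(pool_size, subset_size, n_replicates, seed):
    """Return n_replicates sorted lists of conformation indices.

    Each replicate is drawn without replacement from the full pool;
    different replicates may overlap (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(pool_size), subset_size))
            for _ in range(n_replicates)]


idp_replicates = draw_replicates(POOL_SIZE, SUBSET_SIZE, N_REPLICATES, seed=1)
coil_replicates = draw_replicates(POOL_SIZE, SUBSET_SIZE, N_REPLICATES, seed=2)

runs_per_replicate = SUBSET_SIZE * DOCKS_PER_STRUCTURE
total_runs = runs_per_replicate * N_REPLICATES * 2  # two ensembles

print(runs_per_replicate)  # 2400
print(total_runs)          # 14400
```

Fixing the seed makes the three subsets reproducible, which is convenient when the docking stage has to be rerun on the same structures.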
Examples
[0018] Methods
[0019] Molecular System
[0020] The 93-residue N-terminal domain of p53 was obtained from the DisProt database in FASTA format (DP00086r024). A three-dimensional structure of epigallocatechin gallate (EGCG) was downloaded from PubChem (CID 65064). The EGCG structure was energy minimized and optimized using density functional theory (DFT) with the B3LYP functional and the 6-31G(d) basis set in the computational program Gaussian.
[0021] Generation of the Ensemble of IDP Conformations
[0022] An ensemble of conformations representative of the intrinsically disordered p53-NTD was generated using molecular dynamics simulations performed with the academic implementation of Desmond, combined with Schrödinger's Maestro as the user interface for visualization. An extended conformation of the protein was generated with Maestro tools from the primary structure of p53-NTD. The protein-peptide preparation tools within Maestro were used to optimize the starting configuration of the protein structure. The molecular dynamics simulation system was built with the explicit-solvent water model TIP4PD. The TIP4PD water model is preferred because it tends to correct general deficiencies of standard water models in reproducing experimental ensembles, particularly for disordered proteins. The MD simulations were performed using the default six-step protocol under standard conditions in the isothermal-isobaric (NPT) ensemble, with a pressure of 1.013 bar (set by the Martyna-Tobias-Klein method) and a temperature of 300 K (Nosé-Hoover thermostat). Other simulation parameters were the SHAKE algorithm, time steps of 2 fs each for bonded and near interactions and 6 fs for far interactions, and the OPLS-AA 2005 force field. Starting from an extended structure built from the primary sequence (FASTA format), MD simulations were performed for 500 ns, and the performance was evaluated using the built-in tools of Desmond. A total of 1,000 conformations were collected after the first 100 ns to define the pool of IDP conformations.
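As a worked illustration of the frame selection described above, the sketch below computes snapshot times for 1,000 conformations taken from the last 400 ns of a 500 ns trajectory. The text does not state how the conformations were spaced, so even spacing is an assumption of this example, as is the helper name `frame_times_ps`.

```python
TOTAL_PS = 500_000     # 500 ns simulation, expressed in picoseconds
DISCARD_PS = 100_000   # first 100 ns are not used
N_FRAMES = 1_000       # conformations collected for the pool


def frame_times_ps(total_ps, discard_ps, n_frames):
    """Evenly spaced snapshot times (ps) after the discarded segment.

    Even spacing is an assumption; integer picoseconds avoid
    floating-point drift in the time stamps.
    """
    span = total_ps - discard_ps
    if span % n_frames:
        raise ValueError("span is not an integer multiple of n_frames")
    step = span // n_frames
    return [discard_ps + step * (i + 1) for i in range(n_frames)]


times = frame_times_ps(TOTAL_PS, DISCARD_PS, N_FRAMES)
print(times[1] - times[0])   # 400 (ps between consecutive snapshots)
print(times[0], times[-1])   # 100400 500000
```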
[0023] Generation of Random Coil Ensembles
[0024] Starting from the primary sequence of p53-NTD, a representative ensemble of 10,000 random coil structures was generated using TraDES. Then, a subset of 1,000 conformations was sampled randomly to define the random coil ensemble of p53-NTD.
[0025] Docking Protocol
[0026] Molecular docking was performed using AutoDock Vina with default parameters, as implemented within the YASARA (version 19.1.27) molecular modeling program. The optimized structure of EGCG was imported into YASARA for an ensemble docking protocol using the built-in macro (run_ensemble.mcr). For each ensemble of conformations, EGCG (ligand) was docked 24 times against each of the 100 p53-NTD conformations (receptor), giving rise to a table of 2,400 results ranked according to binding energy (kcal/mol) and dissociation constant (pM). The results are clustered such that the clusters all differ by at least 5 Å (heavy atom RMSD), and the amino acid residues involved in binding are reported with them.
[0027] The probability score ($PS_k$) for each amino acid residue ($k$) in a particular ensemble run is defined as follows:
[0028] Where $n_i$ is the number of times the residue ($k$) encounters the ligand, $N$ is the total number of runs (typically number of docking runs×number of structures in the ensemble), and $DC_i$ is the corresponding dissociation constant. Thus, equation [1] can be considered a weighted sum of the dissociation constant for each amino acid residue in the receptor. The probability score is zero for a residue that is not involved in any binding event and one when the residue is involved in all the binding runs. Each run generated a data set of 2,400 (number of docking runs)×93 (length of p53-NTD), repeated for three independent samplings of sub-structures from the pool of MD-generated or random coil structures.
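The relationship between the two ranking quantities in the results table, binding energy and dissociation constant, can be illustrated with the standard thermodynamic relation ΔG = RT ln Kd. The sketch below is an assumption-laden illustration and not the docking program's own code: the temperature of 298.15 K, the sign convention (more negative ΔG means tighter binding; some programs report the opposite sign), and the record fields `structure` and `delta_g_kcal` are all hypothetical choices made for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant_pM(delta_g_kcal, temperature_k=298.15):
    """Kd in picomolar from a binding free energy in kcal/mol.

    Uses dG = RT ln(Kd) with dG negative for favourable binding
    (sign convention and temperature are assumptions of this sketch).
    """
    kd_molar = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    return kd_molar * 1e12


def rank_results(results):
    """Sort docking records so the most favourable energy comes first."""
    return sorted(results, key=lambda r: r["delta_g_kcal"])


# Hypothetical docking records, for illustration only.
results = [
    {"structure": 17, "delta_g_kcal": -5.2},
    {"structure": 3, "delta_g_kcal": -7.0},
    {"structure": 42, "delta_g_kcal": -6.1},
]
for record in rank_results(results):
    kd = dissociation_constant_pM(record["delta_g_kcal"])
    print(record["structure"], record["delta_g_kcal"], f"{kd:.3e} pM")
```

Because Kd depends exponentially on ΔG, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in Kd, so ranking by either quantity gives the same order.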
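A per-residue score with the stated properties (zero for a residue never contacted, one for a residue contacted in every run) can be sketched as a weighted contact frequency. The exact form of equation [1] is not reproduced in this text, so the function below is only one possible form under stated assumptions: the per-run weights stand in for whatever function of the dissociation constant the method uses, and the names `probability_scores`, `contacts`, and `weights` are hypothetical. With uniform weights the score reduces to the plain contact frequency, the number of contacting runs divided by the total number of runs.

```python
def probability_scores(contacts, n_residues, weights=None):
    """Weighted fraction of docking runs in which each residue is contacted.

    contacts   : list with one set of 1-based residue indices per run
    n_residues : length of the receptor sequence
    weights    : optional non-negative weight per run (assumed to be
                 derived from the dissociation constant); uniform if None
    """
    if weights is None:
        weights = [1.0] * len(contacts)
    if len(weights) != len(contacts):
        raise ValueError("need exactly one weight per docking run")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = [0.0] * n_residues
    for residues, weight in zip(contacts, weights):
        for k in residues:
            scores[k - 1] += weight
    return [s / total for s in scores]


# Four hypothetical docking runs against a five-residue receptor.
runs = [{1, 2}, {2}, {2, 3}, {2, 5}]
print(probability_scores(runs, n_residues=5))
# [0.25, 1.0, 0.25, 0.0, 0.25]
print(probability_scores(runs, n_residues=5, weights=[2, 1, 1, 0]))
# [0.5, 1.0, 0.25, 0.0, 0.0]
```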
[0029] Statistical Analysis
[0030] The ensemble docking on the p53-NTD, sampled either as an IDP or as a random coil, generated three sets of probability scores for each ensemble. Next, a statistical test was performed between these two data sets to determine which amino acids are responsible for the differential effects between the random coil and the IDP ensembles. Although a t-test would be sufficient with three independent runs each, a linear model was employed to determine the differential binding score (DIBS), the fold change, and the significance (p-values). These statistical methods were based on established protocols that have been applied to other studies previously. Upon completing the statistical test, amino acid residues that are differentially affected between the random coil (control) and the IDP ensembles of p53-NTD are considered significant if the log2 fold change is greater than 2.0 and the p-value is less than 0.05.
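The significance criterion can be sketched for a single residue as follows. The text does not specify the linear-model software, so this example swaps in the simplest case: an ordinary least-squares fit with a single group indicator, which is equivalent to a pooled-variance two-sample t-test. With three replicates per group there are 4 degrees of freedom, and a two-sided p-value below 0.05 corresponds to |t| > 2.776. The pseudocount that guards against division by zero and the names `differential_test` and `ResidueTest` are assumptions of the sketch; a moderated linear model would give different p-values.

```python
import math
from statistics import mean, variance
from typing import NamedTuple, Sequence

T_CRIT_DF4 = 2.776  # two-sided 5% critical value, Student's t, 4 d.f.


class ResidueTest(NamedTuple):
    log2_fold_change: float
    t_statistic: float
    significant: bool


def differential_test(idp: Sequence[float], coil: Sequence[float],
                      pseudocount: float = 1e-6) -> ResidueTest:
    """Compare triplicate scores of one residue: IDP vs. random coil."""
    if len(idp) != 3 or len(coil) != 3:
        raise ValueError("critical value is hard-coded for 3 vs 3 replicates")
    m_idp, m_coil = mean(idp), mean(coil)
    # Pseudocount (an assumption) keeps the ratio finite when a score is 0.
    log2_fc = math.log2((m_idp + pseudocount) / (m_coil + pseudocount))
    # Pooled sample variance with (3 - 1) + (3 - 1) = 4 degrees of freedom.
    pooled_var = (2 * variance(idp) + 2 * variance(coil)) / 4
    se = math.sqrt(pooled_var * (1 / 3 + 1 / 3))
    diff = m_idp - m_coil
    if se > 0:
        t = diff / se
    else:
        t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    significant = log2_fc > 2.0 and abs(t) > T_CRIT_DF4
    return ResidueTest(log2_fc, t, significant)


# Hypothetical triplicate scores for one residue.
print(differential_test([0.40, 0.44, 0.42], [0.05, 0.06, 0.04]))
```

Applying the function to each of the 93 residues yields the fold-change and significance values needed for a per-residue comparison; correcting for multiple testing across residues is not addressed in this sketch.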
[0031] Results
[0032] Profiles of the Probability Scores of the Random Coil and IDP Ensembles of p53-NTD.
[0033] The ensemble of conformations of the MD simulations to represent the IDP populations and the corresponding random coil simulations are generated. Supporting information (
[0034] The affinity of EGCG is hypothesized to differ between the ensembles of conformations. Each binding run yields a measure of the binding energy (in kcal/mol) and the amino acid residues that encounter the ligand. A representative example of the first 100 docking run results is shown in
[0035] Cumulatively, the ensemble docking runs representing the p53-NTD binding to EGCG are given by the binding scores for the IDP ensemble (
[0036] The Differential Binding Scores (DIBS) Defined Between IDP Vs. Random Coil Ensembles.
[0037] Chemical shift perturbation data is plotted in
[0038] Ensemble docking performed on the sets of conformations selected for the IDP and for the random coil shows a notable difference in the profiles. Are these binding events statistically significant? To measure the differential effect, the procedure (
[0039] Considering that p53-NTD is an IDP (
[0040] Identification of the Residues of Differential Significance Due to EGCG Binding
[0041] Repeated sampling of the subset of conformations from the IDP and random coil ensembles allows testing whether the measured effects are statistically significant, that is, testing against the null hypothesis. For example, a particular amino acid residue of p53-NTD is considered significant if the binding score of that residue in the IDP ensemble is higher than that in the random coil ensemble, with a p-value of the statistical test less than 0.05. These measures are generally shown as a plot of fold change vs. p-value, known as a volcano plot. A volcano plot for the statistical measure between the random coil ensemble and the IDP ensemble for repeated measures (three times each) is shown in
[0042] Statistical analysis of the results from the ensemble docking runs to identify potential interactions between p53-NTD and EGCG is an essential element of the approach. This analysis confirms that EGCG has a higher affinity for IDP conformations than for the sampled random coil conformations.
[0048] Discussion
[0049] A differential scoring approach distinguishes the ensemble docking results between the intrinsically disordered and the random coil states of a receptor. When the method is applied to assess the binding of p53-NTD with EGCG, it provides a close match with the NMR-based chemical shift perturbation studies (literature results) and also identifies other residues. Typically, docking studies are performed only with a specific receptor or with an ensemble representation. The value of the method rests on the fact that the differential binding score (DIBS) establishes a means to identify binding sites selected explicitly by the ligand on an intrinsically disordered ensemble over a random coil ensemble. Multiple sampling of conformations and a systematic statistical test suggest that this method could serve as a screening tool to identify potential ligands that bind to IDPs.
[0050] The study has limitations that stem from the fundamental assumptions of the ensemble docking concept, which considers that the ligand selects a protein conformation based on the induced fit model and the conformational selection and population shift models. The success of this method depends on the hypothesis that the conformations generated and used for repeated sampling in the ensemble docking runs are sensitive enough for EGCG to bind preferentially to the IDP ensemble rather than to the random coil ensemble. However, in a broader sense, the number of conformations sampled by a protein of the size of p53-NTD, in either the IDP or the random coil ensemble, is much larger than the number of conformations used in the study. Thus, though the selection of a more extensive set of conformers might not by itself rectify the problem, the number of conformations sampled (1,000), the number of conformations re-sampled (three times at 1/10th of the total), and the ensemble docking runs of these conformations are limited by the computational resources available to the authors. The calculations were done on a Linux (Ubuntu 18.04.5 LTS) system with eight cores and a single GPU (GTX 1070). The two computationally expensive steps are generating the ensemble of IDP conformations using molecular dynamics simulations and running the required ensemble docking routines.
[0051] The apparent validity of DIBS is based on a broader agreement with the experimental NMR chemical shift changes (
[0052] Recently, in the case of NUPR1, a multifunctional IDP similar to the p53-NTD (82 AA), MD simulations typically in the range of 80-200 ns have been utilized. The RMSD plots of the simulations (
[0053] Molecular dynamics trajectories are often clustered to identify sub-groups of structures that share similar conformational properties. However, in an extensive study of ensemble docking of four G-protein-coupled receptors (in a membrane environment), Falcon et al. found that ensembles generated from the clustered MD trajectories do not represent the conformations selected by the ligands. These results suggest that repeated independent sampling using random number generation may avoid biasing the ensembles with conformations pre-selected by clustering.
[0054] In a comprehensive perspective, a team of experts led by Smith and co-workers discussed the broader issues related to problems, approaches, and opportunities in conformational sampling and the selection of conformations for ensemble docking studies. Although this work does not provide a solution to the question of which conformations are ‘selectable’, the introduction of a control such as the random coil ensemble may increase the sensitivity of the search over the free-energy landscape of the apo-protein ensemble by considering a differential free energy selection (between IDP and random coil). If the conformations sampled for the ensemble docking of the IDP and random coil ensembles are similar, then EGCG may not differentiate between them. The statistical test suggests that the two data sets differ significantly only at selected amino acid residues of p53-NTD. More importantly, a close match with the experimental results supports the value of the approach.
[0055] DIBS can differentiate an IDP ensemble from another (random coil) ensemble via ensemble docking protocols and can offer an alternative to conventional approaches that utilize only the IDP ensemble. The proposed differential probability increases the sensitivity of docking scores because the statistical comparison identifies regions of the receptor that may be significantly different from the control set. The DIBS approach presented here would be of value for proteins that are intrinsically disordered or enzymes that have intrinsically disordered regions (IDRs). The relatively high computational cost of DIBS compared to standalone docking protocols may be mitigated with scalable molecular dynamics on CPU and GPU architectures. Eroom's Law (Moore's law spelled backward) in drug discovery states that the cost of discovering a new drug doubles every nine years. With increasing computational power, including the expected availability of exaFLOP machines in the near future, custom-designed systems for MD simulations such as ANTON, as well as high-throughput simulations coupled with Markov State Models (MSMs), could make it viable to scale DIBS. DIBS could be used as a rational screening tool to select a subset of potential ligands that prefer an ensemble of intrinsically disordered conformations, paving an avenue to IDP-based drug discovery.
[0056] The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments in which the invention can be practiced. These embodiments are also referred to herein as “examples.” Such examples can include elements in addition to those shown or described. However, the present inventors also contemplate examples in which only those elements shown or described are provided. Moreover, the present inventors also contemplate examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof), or with respect to other examples (or one or more aspects thereof) shown or described herein.
[0057] In the event of inconsistent usages between this document and any documents so incorporated by reference, the usage in this document controls.
[0058] In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In this document, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, composition, formulation, or process that includes elements in addition to those listed after such a term in a claim are still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
[0059] Method examples described herein can be machine or computer-implemented at least in part. Some examples can include a computer-readable medium or machine-readable medium encoded with instructions operable to configure an electronic device to perform methods as described in the above examples. An implementation of such methods can include code, such as microcode, assembly language code, a higher-level language code, or the like. Such code can include computer readable instructions for performing various methods. The code may form portions of computer program products. Further, in an example, the code can be tangibly stored on one or more volatile, non-transitory, or non-volatile tangible computer-readable media, such as during execution or at other times. Examples of these tangible computer-readable media can include, but are not limited to, hard disks, removable magnetic disks, removable optical disks (e.g., compact disks and digital video disks), magnetic cassettes, memory cards or sticks, random access memories (RAMs), read only memories (ROMs), and the like.
[0060] The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments can be used, such as by one of ordinary skill in the art upon reviewing the above description. The Abstract is provided to comply with 37 C.F.R. § 1.72(b), to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description as examples or embodiments, with each claim standing on its own as a separate embodiment, and it is contemplated that such embodiments can be combined with each other in various combinations or permutations. The scope of the invention should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.