Patent classifications
G16B15/20
SYSTEM AND METHOD FOR GENERATING A PROTEIN SEQUENCE
A method and system for generating a protein sequence are implemented using a computer-implemented neural network. An empty or partially filled sequence of node elements, representing amino acid positions of the protein sequence, and an edge index, having edge elements defining physical interaction between amino acid positions, are received. The computer-implemented neural network operates to determine enhanced edge attribute values for edge elements of the edge index and enhanced amino acid values for node elements of the sequence. Amino acid values are generated for elements of the partially filled sequence having missing values.
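As a rough illustration of the inputs described in this abstract, the sketch below builds a partially filled sequence and an edge index in plain Python. The representation is an assumption made for illustration only: `None` marks a missing amino acid value, the edge index is a list of position pairs, and the helper names `missing_positions` and `neighbors` are hypothetical. The neural network itself is not shown.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical input: one node element per amino acid position.
# None marks a position whose amino acid value is still missing.
sequence: List[Optional[str]] = ["M", None, "K", None, "L"]

# Hypothetical edge index: each pair names two positions assumed to
# interact physically (for example, residues close in 3D space).
edge_index: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 4), (3, 4)]


def missing_positions(seq: List[Optional[str]]) -> List[int]:
    """Return the indices of node elements that have no amino acid value."""
    return [i for i, aa in enumerate(seq) if aa is None]


def neighbors(edges: List[Tuple[int, int]], n_nodes: int) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from the edge index."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


to_fill = missing_positions(sequence)
adjacency = neighbors(edge_index, len(sequence))
```

A generative model of the kind described would use the neighbors of each position in `to_fill` as context when proposing an amino acid value for it.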
EMBEDDING-BASED GENERATIVE MODEL FOR PROTEIN DESIGN
A system and method for designing protein sequences conditioned on a specific target fold. The system is a transformer-based generative framework for modeling a complex sequence-structure relationship. To mitigate the heterogeneity between the sequence domain and the fold domain, a Fold-to-Sequence model jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. The joint sequence-fold representation is learned through novel intra-domain and cross-domain losses: the intra-domain loss forces two semantically similar samples (proteins that should have the same fold or folds) from the same domain to be close to each other in a latent space, while the cross-domain loss forces two semantically similar samples in different domains to be closer. In an embodiment, the Fold-to-Sequence model performs design tasks that include low-resolution structures, structures with regions of missing residues, and NMR structural ensembles.
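The two losses named in this abstract can be sketched as follows. The abstract does not give their mathematical form, so this example assumes a mean squared Euclidean distance over pairs that share a fold label; the function names and toy embeddings are hypothetical.

```python
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def intra_domain_loss(embs: List[List[float]], folds: List[str]) -> float:
    """Mean squared distance over same-fold pairs within one domain."""
    dists = [
        sq_dist(embs[i], embs[j])
        for i in range(len(embs))
        for j in range(i + 1, len(embs))
        if folds[i] == folds[j]
    ]
    return sum(dists) / len(dists) if dists else 0.0


def cross_domain_loss(
    seq_embs: List[List[float]],
    fold_embs: List[List[float]],
    seq_folds: List[str],
    fold_folds: List[str],
) -> float:
    """Mean squared distance over same-fold pairs across the two domains."""
    dists = [
        sq_dist(u, v)
        for u, fu in zip(seq_embs, seq_folds)
        for v, fv in zip(fold_embs, fold_folds)
        if fu == fv
    ]
    return sum(dists) / len(dists) if dists else 0.0


# Toy embeddings: three samples per domain, two of which share fold "a".
labels = ["a", "a", "b"]
seq_embs = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
fold_embs = [[0.0, 1.0], [1.0, 1.0], [5.0, 4.0]]

intra = intra_domain_loss(seq_embs, labels)
cross = cross_domain_loss(seq_embs, fold_embs, labels, labels)
```

Minimizing both terms pulls same-fold samples together in the shared latent space, within a domain and across domains respectively.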
Optimizing Proteins Using Model Based Optimizations
Humanizing proteins can be a laborious process, often involving trial and error or other non-systematic methods. To improve humanization, neural networks can be employed to generate new protein sequences having higher probabilities of being humanized. In an embodiment, a method includes evaluating the immunogenicity of a sampling of protein sequences from a generative model. The method can include weighting the sampling of protein sequences according to an estimated probability that a particular generated protein sequence differs in immunogenicity from a particular percentile of immunogenicity of the sampling of protein sequences. The method can further include generating a protein sequence from the weighted sampling of protein sequences. The generated protein sequence represents a protein having an altered immunogenicity. Such a generated protein has a higher likelihood of being humanized.
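The weighting step in this abstract might look roughly like the sketch below. Several details are assumptions not stated in the abstract: each sampled sequence is taken to have a predicted immunogenicity with Gaussian uncertainty (a mean and a standard deviation), lower immunogenicity is treated as the desired direction, and the threshold is a nearest-rank percentile of the predicted means. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile of values for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def sample_weights(means: List[float], stds: List[float], q: float = 20.0) -> List[float]:
    """Weight each sample by the estimated probability that its
    immunogenicity falls below the q-th percentile of the sampling.

    Assumes a Gaussian predictive distribution per sample (stds > 0).
    """
    threshold = percentile(means, q)
    return [NormalDist(mu, sd).cdf(threshold) for mu, sd in zip(means, stds)]


# Hypothetical predicted immunogenicity scores for five sampled sequences.
means = [0.1, 0.5, 0.9, 0.3, 0.7]
stds = [0.1, 0.1, 0.1, 0.1, 0.1]
weights = sample_weights(means, stds, q=20.0)
```

In a model-based optimization loop, weights like these would typically be used to refit the generative model so that later samples concentrate on sequences likely to pass the threshold.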
PROTEINS FOR STABILIZATION OF BIOLOGICAL MATERIAL
Embodiments of the present disclosure generally relate to methods and compositions for stabilizing biological material using intrinsically disordered proteins. In an embodiment, a composition is provided, the composition including a first component comprising at least one intrinsically disordered protein; and a second component comprising at least one biological material of interest, at least one biologically-derived material of interest, or both, the second component being free of the at least one intrinsically disordered protein. The methods and compositions include at least one intrinsically disordered protein that can be modified to prevent, or at least mitigate, polymerization thereof and the formation of gel-like matrices, thereby, e.g., improving the ability of the intrinsically disordered proteins to protect and stabilize sensitive biological materials.
Methods of profiling mass spectral data using neural networks
Methods are provided to classify and identify features in mass spectral data using neural network algorithms. A convolutional neural network (CNN) was trained to identify amino acids from an unknown protein sample. The CNN was trained using known peptide sequences to predict amino acid presence, diversity, and frequency; peptide length; and subsequences of amino acids classified by features including aliphatic/aromatic, hydrophobic/hydrophilic, positive/negative charge, and combinations thereof. Mass spectral data of a sample unknown to the trained CNN was discretized into a one-dimensional vector and input into the CNN. The CNN models can potentially be integrated to determine the complete peptide sequence from a spectrum, thereby improving the yield of identifiable protein sequences from mass spec analysis.
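The discretization step mentioned in this abstract can be illustrated with a simple fixed-width binning of peaks along the m/z axis. The bin width, the m/z range, the normalization to the tallest bin, and the function name `discretize_spectrum` are all assumptions for this sketch; the abstract does not specify them.

```python
from typing import List, Tuple


def discretize_spectrum(
    peaks: List[Tuple[float, float]],
    mz_min: float = 0.0,
    mz_max: float = 2000.0,
    bin_width: float = 1.0,
) -> List[float]:
    """Bin (m/z, intensity) peaks into a fixed-length one-dimensional vector.

    Intensities that fall into the same bin are summed, and the vector is
    scaled so that the tallest bin equals 1.0.
    """
    n_bins = int((mz_max - mz_min) / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # min() guards against a rounding error at the upper edge.
            idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
            vec[idx] += intensity
    top = max(vec) if vec else 0.0
    return [v / top for v in vec] if top > 0 else vec


# Hypothetical spectrum with three peaks; two share the 100-101 m/z bin.
spectrum = [(100.2, 10.0), (100.7, 30.0), (250.5, 20.0)]
vector = discretize_spectrum(spectrum, mz_max=300.0)
```

A vector of this kind has a fixed length regardless of how many peaks the spectrum contains, which is what a one-dimensional CNN input layer requires.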
Method for calculating binding free energy, calculation device, and program
A method for calculating, using a computer, the binding free energy between a binding calculation target molecule and a target molecule. The method includes a plurality of steps, each of which includes adding a distance restraint potential between the binding calculation target molecule and the target molecule. Across the plurality of steps, the anchor points of the binding calculation target molecule are identical, while the anchor points of the target molecule are different.
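The anchor-point arrangement in this abstract can be sketched as below. The abstract does not state the functional form of the distance restraint potential, so a harmonic form, U = 0.5 * k * (r - r0)^2, is assumed here; the coordinates, force constant, and function name are hypothetical, and no free energy is actually computed.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def restraint_energy(anchor_a: Point, anchor_b: Point, r0: float, k: float) -> float:
    """Harmonic distance restraint between two anchor points (assumed form)."""
    r = math.dist(anchor_a, anchor_b)
    return 0.5 * k * (r - r0) ** 2


# The anchor point on the binding calculation target molecule is the same
# in every step, while the anchor point on the target molecule changes.
ligand_anchor: Point = (0.0, 0.0, 0.0)
target_anchors: List[Point] = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0), (6.0, 8.0, 0.0)]

energies = [
    restraint_energy(ligand_anchor, target_anchor, r0=5.0, k=2.0)
    for target_anchor in target_anchors
]
```

Each entry in `energies` corresponds to one step of the method, with only the target-side anchor varying between steps.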