Patent classifications
G16B15/30
ARTIFICIAL INTELLIGENCE-BASED DRUG MOLECULE PROCESSING METHOD AND APPARATUS, DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
An artificial intelligence-based (AI-based) drug molecule processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product are provided. The method includes: determining a plurality of candidate drug molecules for a target protein; performing activity prediction based on the plurality of candidate drug molecules and the target protein, to obtain activity information of each candidate drug molecule; performing homology modeling on the target protein, to obtain a reference protein having a structure homologous with that of the target protein; performing molecular docking based on the reference protein and the plurality of candidate drug molecules, to obtain molecular docking information of each candidate drug molecule; and screening the plurality of candidate drug molecules based on the activity information of each candidate drug molecule and the molecular docking information of each candidate drug molecule, to obtain target drug molecules for the target protein.
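The final screening step combines two signals per candidate, but the abstract does not say how. The sketch below is a minimal illustration under stated assumptions: activity is taken to be a probability in [0, 1] (higher is better), the docking score is taken to be an energy-like value where lower is better, and the thresholds, the min-max normalisation and the 50/50 weighting are all hypothetical choices, not the claimed method. The `Candidate` record and `screen` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    smiles: str           # molecule identifier, e.g. a SMILES string
    activity: float       # predicted activity, assumed to be a probability in [0, 1]
    docking_score: float  # docking score vs. the reference protein; assumed lower = better


def screen(candidates, min_activity=0.5, max_docking=-7.0, weight=0.5):
    """Filter on both criteria, then rank survivors by a weighted combined score."""
    kept = [c for c in candidates
            if c.activity >= min_activity and c.docking_score <= max_docking]
    if not kept:
        return []
    best = min(c.docking_score for c in kept)
    worst = max(c.docking_score for c in kept)
    span = worst - best

    def combined(c):
        # Min-max normalise docking so the best (most negative) score maps to 1.0.
        dock = 1.0 if span == 0 else (worst - c.docking_score) / span
        return weight * c.activity + (1.0 - weight) * dock

    return sorted(kept, key=combined, reverse=True)


pool = [
    Candidate("CCO", 0.9, -8.2),
    Candidate("c1ccccc1", 0.4, -9.0),  # fails the activity threshold
    Candidate("CCN", 0.7, -7.5),
    Candidate("CCC", 0.8, -6.0),       # fails the docking threshold
]
print([c.smiles for c in screen(pool)])  # ['CCO', 'CCN']
```

Filtering before ranking keeps a molecule that docks well but is predicted inactive (or vice versa) out of the target set entirely, which is one plausible reading of screening "based on" both kinds of information.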
MOLECULE DESIGN
Systems and methods of discovering compounds with biological properties are provided. A first training dataset, including chemical structures and biological properties, is obtained. Projections of compounds are obtained by projecting chemical structure information into a latent representation space using encoder weights. Compounds are classified by inputting the projections into a classifier using classifier weights. The encoder and classifier are trained by comparing the classification of each compound with its actual biological properties and updating the respective weights. A second training dataset, including chemical structures, is obtained. Projections of these compounds are likewise obtained by projecting their chemical structure information into the latent representation space using the encoder weights. Chemical structures are obtained by inputting the projections into a decoder using decoder weights. The decoder is trained by comparing the output chemical structures with the actual chemical structures and updating the respective weights. Candidate compounds not present in the first and second datasets are identified using the trained encoder, classifier, and decoder.
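The two training phases and the final search can be sketched in miniature. This is a toy under loudly stated assumptions, not the patented system: structures are hypothetical binary fingerprints, the encoder is a single linear map, the classifier is logistic regression on the latent vector, the decoder is a linear map with sigmoid outputs, the encoder is frozen while the decoder trains, and "identifying candidates" is approximated by perturbing latent points of known actives, keeping points the classifier scores as active, and decoding them. All names (`LatentModel`, `propose`, the datasets) are illustrative.

```python
import math
import random


def sigmoid(v):
    # Numerically stable logistic function (never calls exp on a large positive value).
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


class LatentModel:
    """Toy linear encoder, logistic classifier and sigmoid decoder over binary fingerprints."""

    def __init__(self, n_bits, n_latent, seed=0):
        self.rng = random.Random(seed)

        def small():
            return self.rng.gauss(0.0, 0.1)

        self.W_e = [[small() for _ in range(n_bits)] for _ in range(n_latent)]  # encoder
        self.w_c = [small() for _ in range(n_latent)]                           # classifier
        self.b_c = 0.0
        self.W_d = [[small() for _ in range(n_latent)] for _ in range(n_bits)]  # decoder
        self.b_d = [0.0] * n_bits

    def encode(self, x):
        # Project a structure into the latent representation space.
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.W_e]

    def classify(self, z):
        return sigmoid(sum(w * zi for w, zi in zip(self.w_c, z)) + self.b_c)

    def decode(self, z):
        return [sigmoid(sum(w * zi for w, zi in zip(row, z)) + b)
                for row, b in zip(self.W_d, self.b_d)]

    def train_encoder_classifier(self, labelled, epochs=200, lr=0.1):
        """Phase 1: compare predicted and actual labels, update encoder and classifier."""
        for _ in range(epochs):
            for x, y in labelled:
                z = self.encode(x)
                err = self.classify(z) - y  # gradient of binary cross-entropy w.r.t. the logit
                for i, row in enumerate(self.W_e):
                    for j, xj in enumerate(x):
                        row[j] -= lr * err * self.w_c[i] * xj
                for i, zi in enumerate(z):
                    self.w_c[i] -= lr * err * zi
                self.b_c -= lr * err

    def train_decoder(self, structures, epochs=200, lr=0.1):
        """Phase 2: compare decoded and actual bits, update decoder only (encoder frozen)."""
        for _ in range(epochs):
            for x in structures:
                z = self.encode(x)
                out = self.decode(z)
                for k, row in enumerate(self.W_d):
                    err = out[k] - x[k]
                    for i, zi in enumerate(z):
                        row[i] -= lr * err * zi
                    self.b_d[k] -= lr * err

    def propose(self, seeds, known, n_samples=50, noise=0.5):
        """Perturb latent points of seed structures; keep novel decodes scored as active."""
        found = set()
        for _ in range(n_samples):
            z = [zi + self.rng.gauss(0.0, noise) for zi in self.encode(self.rng.choice(seeds))]
            if self.classify(z) > 0.5:
                bits = tuple(int(p > 0.5) for p in self.decode(z))
                if bits not in known:
                    found.add(bits)
        return found


# Hypothetical 8-bit fingerprints; labels: 1 = active, 0 = inactive.
actives = [(1, 1, 0, 0, 1, 0, 0, 0), (1, 1, 0, 1, 1, 0, 0, 0), (1, 0, 0, 0, 1, 0, 1, 0)]
inactives = [(0, 0, 1, 1, 0, 1, 0, 1), (0, 0, 1, 0, 0, 1, 1, 1), (0, 1, 1, 1, 0, 0, 0, 1)]
first_dataset = [(x, 1) for x in actives] + [(x, 0) for x in inactives]
second_dataset = actives + inactives + [(1, 1, 1, 0, 1, 0, 0, 0)]

model = LatentModel(n_bits=8, n_latent=3)
model.train_encoder_classifier(first_dataset)
model.train_decoder(second_dataset)
known = set(second_dataset)  # covers every structure in both datasets
candidates = model.propose(actives, known)
print(sorted(candidates))
```

Freezing the encoder in the second phase keeps the latent space shaped by the property labels, so decoded points near known actives stay in a region the classifier understands; whether the original system does this is not stated in the abstract.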
Engineering and optimization of systems, methods, enzymes and guide scaffolds of CAS9 orthologs and variants for sequence manipulation
The invention provides systems, methods, and compositions for altering expression of target gene sequences and related gene products. Provided are structural information on the Cas protein of the CRISPR-Cas system; use of this information in generating modified components of the CRISPR complex; vectors and vector systems that encode one or more components or modified components of a CRISPR complex; and methods for the design and use of such vectors and components. Also provided are methods of directing CRISPR complex formation in eukaryotic cells and methods for utilizing the CRISPR-Cas system. In particular, the present invention comprehends optimized functional CRISPR-Cas enzyme systems, and engineered new guide architectures and enzymes for use in optimized Staphylococcus aureus CRISPR-Cas enzyme systems.
MEDIA, METHODS, AND SYSTEMS FOR PROTEIN DESIGN AND OPTIMIZATION
Exemplary embodiments relate to a protein engineering pipeline configured to optimize or improve proteins for specified functions. The problem space of such a task can grow quickly with the length of the protein sequence being optimized and with the functions for which the protein is being designed. The solutions described herein allow the problem space to be searched efficiently by combining a protein design pipeline with an evaluation procedure performed on a quantum computer. As a result, single or multiple amino acid substitutions at a site of interest may be predicted to generate optimized protein variants.
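How fast the problem space grows can be made concrete. With 20 standard amino acids, each substituted position has 19 alternatives, so choosing exactly k of n candidate positions gives C(n, k) · 19^k variants; for k = 3 among 100 positions that is 161,700 · 6,859 = 1,109,100,300. The sketch below counts and enumerates such variants. The quantum evaluation step is not reproduced: a caller-supplied classical `score` function stands in for it, and all names here are hypothetical.

```python
from itertools import combinations, product
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def search_space_size(n_sites, k):
    """Number of variants with exactly k substitutions among n_sites positions."""
    return comb(n_sites, k) * 19 ** k


def enumerate_variants(sequence, sites, k):
    """Yield (mutation labels, variant) for every k-site substitution at the given 0-based sites."""
    for chosen in combinations(sites, k):
        alternatives = [[aa for aa in AMINO_ACIDS if aa != sequence[i]] for i in chosen]
        for residues in product(*alternatives):
            seq = list(sequence)
            labels = []
            for i, aa in zip(chosen, residues):
                labels.append(f"{sequence[i]}{i + 1}{aa}")  # e.g. "K2A" (1-based position)
                seq[i] = aa
            yield tuple(labels), "".join(seq)


def best_variants(sequence, sites, k, score, top=3):
    """Rank variants by a classical stand-in score (higher is better); ties keep enumeration order."""
    ranked = sorted(enumerate_variants(sequence, sites, k),
                    key=lambda v: score(v[1]), reverse=True)
    return ranked[:top]


print(search_space_size(100, 3))  # 1109100300
# Toy score (hypothetical): count tryptophans in the variant.
print(best_variants("MKTAYIAK", [1, 3], 1, lambda s: s.count("W"), top=2))
```

Exhaustive enumeration is only feasible for small k and a handful of sites, which is exactly why the abstract motivates a more efficient search of this space.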
SYSTEMS AND METHODS FOR ARTIFICIAL INTELLIGENCE-BASED PREDICTION OF AMINO ACID SEQUENCES AT A BINDING INTERFACE
Presented herein are systems and methods for prediction of protein interfaces for binding to target molecules. In certain embodiments, technologies described herein utilize graph-based neural networks to predict portions of protein/peptide structures that are located at an interface of a custom biologic (e.g., a protein and/or peptide) that is being designed for binding to a target molecule, such as another protein or peptide. In certain embodiments, graph-based neural network models described herein may receive, as input, a representation (e.g., a graph representation) of a complex comprising a target and a partially-defined custom biologic. Portions of the partially-defined custom biologic may be known, while other portions, such as an amino acid sequence and/or particular amino acid types at certain locations of an interface, are unknown and/or to be customized for binding to a particular target. A graph-based neural network model as described herein may then, based on the received input, generate predictions of likely amino acid sequences and/or types of particular amino acids at the unknown portions. These predictions can then be used to determine (e.g., fill in) amino acid sequences and/or structures to complete the custom biologic.
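The input representation can be sketched as a residue graph over the whole complex: nodes are residues of the target and of the partially-defined biologic, edges join residues within a distance cutoff, known residues carry a one-hot amino acid feature and unknown ones carry zeros. The abstract does not describe the network layers, so this sketch substitutes a parameter-free message-passing step (each node keeps half its state and takes half the mean of its neighbours) for the trained graph neural network. The coordinates, the 8 Å cutoff, the averaging rule and all function names are assumptions for illustration only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_graph(coords, cutoff=8.0):
    """Adjacency lists linking residues whose coordinates lie within `cutoff` (angstroms)."""
    edges = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges[i].append(j)
                edges[j].append(i)
    return edges


def one_hot(aa):
    # Known residues get a one-hot vector; unknown (to-be-designed) ones get all zeros.
    vec = [0.0] * len(AMINO_ACIDS)
    if aa is not None:
        vec[AMINO_ACIDS.index(aa)] = 1.0
    return vec


def message_pass(h, edges):
    """One parameter-free round: each node keeps half its state, takes half the neighbour mean."""
    out = []
    for i, h_i in enumerate(h):
        nbrs = edges[i]
        if not nbrs:
            out.append(list(h_i))
            continue
        mean = [sum(h[j][k] for j in nbrs) / len(nbrs) for k in range(len(h_i))]
        out.append([0.5 * a + 0.5 * b for a, b in zip(h_i, mean)])
    return out


def predict_unknown(residues, coords, rounds=2, cutoff=8.0):
    """Map each unknown position (None) to its highest-scoring amino acid, or None."""
    edges = build_graph(coords, cutoff)
    h = [one_hot(aa) for aa in residues]
    for _ in range(rounds):
        h = message_pass(h, edges)
    preds = {}
    for i, aa in enumerate(residues):
        if aa is None:
            k = max(range(len(AMINO_ACIDS)), key=lambda idx: h[i][idx])
            preds[i] = AMINO_ACIDS[k] if h[i][k] > 0 else None  # None: no known neighbour reached
    return preds


# Three known residues plus one unknown interface position, all within the cutoff.
print(predict_unknown(["L", "K", "K", None], [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]))
# {3: 'K'}
```

In a real model the averaging would be replaced by learned message and update functions trained on known complexes, so predictions reflect binding preferences rather than simply echoing the most common neighbouring residue.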