Patent classifications
G16B40/00
SYSTEM AND METHOD FOR PREDICTING LOSS OF FUNCTION CAUSED BY GENETIC VARIANT
Disclosed herein is a system for predicting loss of function caused by genetic variants. The system includes a loss of function (LoF) prediction unit that calculates the probability that a target genetic variant will cause a loss of function (LoF) in a target gene. The calculation applies logistic regression to two inputs: a first probability that the target gene will be intolerant of the loss of function, and a second probability that the target genetic variant contained in the target gene will be intolerant.
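The abstract names only the two input probabilities, not the regression coefficients or how they are fitted. The following is a minimal sketch that assumes the two probabilities serve directly as regression inputs and that the coefficients are fitted by plain gradient descent on a hypothetical toy training set. All function names and values are illustrative and are not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_lof(weights: Sequence[float], p_gene: float, p_variant: float) -> float:
    """P(LoF) = sigmoid(b + w_gene * p_gene + w_variant * p_variant)."""
    b, w_gene, w_variant = weights
    return sigmoid(b + w_gene * p_gene + w_variant * p_variant)


def fit_logistic(samples: List[Tuple[float, float]], labels: List[int],
                 lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Full-batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (p_gene, p_variant), y in zip(samples, labels):
            err = predict_lof(w, p_gene, p_variant) - y  # d(loss)/d(logit)
            grad[0] += err
            grad[1] += err * p_gene
            grad[2] += err * p_variant
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


# Hypothetical toy training set:
# (gene intolerance probability, variant intolerance probability) -> LoF label
samples = [(0.9, 0.95), (0.8, 0.85), (0.95, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.4)]
labels = [1, 1, 1, 0, 0, 0]
w = fit_logistic(samples, labels)
print(f"P(LoF) for a new variant: {predict_lof(w, 0.85, 0.9):.3f}")
```

Because the output is a sigmoid of a weighted sum, the two upstream probabilities are fused into a single calibrated score rather than combined by a fixed rule such as multiplication.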
SYSTEM AND METHOD FOR THE CONTEXTUALIZATION OF MOLECULES
A system and method that, given one or more input molecules, produces a contextualized summary of the characteristics of related target molecules (e.g., proteins). Using a knowledge graph populated with all known molecules, input molecules are analyzed according to various similarity indexes that relate them to target proteins or other biological entities. The knowledge graph may also comprise scientific literature, governmental data (e.g., FDA clinical phase data), private research endeavors (e.g., general assays), and other related biological data. The summary produced may comprise target proteins that satisfy certain biological properties, general assay results (ADMET characteristics), related diseases, off-target molecule interactions (non-targeted molecules involved in a specific pathway or cascade), market opportunities, patents, experiments, and new hypotheses.
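As a rough illustration of the similarity-index step, the following sketch assumes bit-set molecular fingerprints compared with the Tanimoto coefficient and a toy knowledge graph stored as a dictionary of (relation, entity) edges. The Tanimoto coefficient is one common similarity index, but the abstract does not say which indexes are used. The molecule, protein, disease, and assay names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits in a structural fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def contextualize(query: Fingerprint,
                  fingerprints: Dict[str, Fingerprint],
                  graph: Dict[str, List[Tuple[str, str]]],
                  threshold: float = 0.5) -> Dict[str, List[Tuple[str, float]]]:
    """Group entities linked to sufficiently similar known molecules by relation type.

    Each linked entity is scored by the highest similarity among the molecules reaching it.
    """
    summary: Dict[str, Dict[str, float]] = defaultdict(dict)
    for mol, fp in fingerprints.items():
        sim = tanimoto(query, fp)
        if sim < threshold:
            continue  # not similar enough to contribute context
        for relation, entity in graph.get(mol, []):
            summary[relation][entity] = max(summary[relation].get(entity, 0.0), sim)
    return {rel: sorted(ents.items(), key=lambda kv: (-kv[1], kv[0]))
            for rel, ents in summary.items()}


# Hypothetical toy knowledge graph: molecule -> [(relation, entity), ...]
FINGERPRINTS = {
    "mol_A": frozenset({1, 2, 3, 4}),
    "mol_B": frozenset({1, 2, 9}),
    "mol_C": frozenset({7, 8}),
}
GRAPH = {
    "mol_A": [("binds", "PROT_X"), ("associated_disease", "DISEASE_1")],
    "mol_B": [("binds", "PROT_Y"), ("assay", "ADMET_ASSAY_7")],
    "mol_C": [("binds", "PROT_Z")],
}
print(contextualize(frozenset({1, 2, 3}), FINGERPRINTS, GRAPH))
```

Grouping by relation type mirrors the kind of summary the abstract lists (targets, assay results, diseases). A production system would query a graph database rather than an in-memory dictionary.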
SYSTEMS AND METHODS FOR ARTIFICIAL INTELLIGENCE-BASED PREDICTION OF AMINO ACID SEQUENCES AT A BINDING INTERFACE
Presented herein are systems and methods for predicting protein interfaces for binding to target molecules. In certain embodiments, technologies described herein use graph-based neural networks to predict the portions of protein/peptide structures that are located at an interface of a custom biologic (e.g., a protein and/or peptide) being designed to bind a target molecule, such as another protein or peptide. In certain embodiments, graph-based neural network models described herein may receive, as input, a representation (e.g., a graph representation) of a complex comprising a target and a partially-defined custom biologic. Some portions of the partially-defined custom biologic may be known, while other portions, such as an amino acid sequence and/or particular amino acid types at certain locations of an interface, are unknown and/or to be customized for binding to a particular target. A graph-based neural network model as described herein may then, based on the received input, generate predictions of likely amino acid sequences and/or types of particular amino acids at the unknown portions. These predictions can then be used to determine (e.g., fill in) amino acid sequences and/or structures to complete the custom biologic.
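To show what the model's input and output look like, here is a minimal message-passing sketch in plain Python. It assumes the following:

* A residue-level contact graph.
* One-hot residue-type node features.
* Mean-neighbor aggregation.
* Randomly initialized, untrained weights.

The architecture, node names, and contacts are hypothetical and do not represent the patented model. The sketch only shows how a graph representation of a partially-defined complex can yield per-position amino acid probabilities at the unknown binder positions.

```python
import math
import random
from typing import Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types
DIM = len(AMINO_ACIDS)
Matrix = List[List[float]]


def one_hot(aa: Optional[str]) -> List[float]:
    """Residue-type encoding; None marks a designable position (all zeros)."""
    vec = [0.0] * DIM
    if aa is not None:
        vec[AMINO_ACIDS.index(aa)] = 1.0
    return vec


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng: random.Random) -> Matrix:
    # Untrained placeholder weights; a real model would learn these from known complexes
    return [[rng.gauss(0.0, 0.3) for _ in range(DIM)] for _ in range(DIM)]


def predict_unknown_residues(residues: Dict[str, Optional[str]],
                             contacts: Dict[str, List[str]],
                             w_self: Matrix, w_neigh: Matrix, w_out: Matrix,
                             rounds: int = 2) -> Dict[str, Dict[str, float]]:
    """Mean-aggregation message passing, then a softmax over residue types at unknown nodes."""
    h = {node: one_hot(aa) for node, aa in residues.items()}
    for _ in range(rounds):
        updated = {}
        for node in residues:
            neighbors = contacts.get(node, [])
            mean = [0.0] * DIM
            for nb in neighbors:
                for i, v in enumerate(h[nb]):
                    mean[i] += v / len(neighbors)
            pre = [a + b for a, b in zip(matvec(w_self, h[node]), matvec(w_neigh, mean))]
            updated[node] = [max(0.0, z) for z in pre]  # ReLU
        h = updated
    return {node: dict(zip(AMINO_ACIDS, softmax(matvec(w_out, h[node]))))
            for node, aa in residues.items() if aa is None}


# Hypothetical complex: target residues T1..T3 and binder residue B1 are known;
# binder positions B2 and B3 are to be designed.
residues = {"T1": "L", "T2": "K", "T3": "E", "B1": "G", "B2": None, "B3": None}
contacts = {
    "T1": ["B2"], "T2": ["B2", "B3"], "T3": ["B3"],
    "B1": ["B2"], "B2": ["T1", "T2", "B1", "B3"], "B3": ["T2", "T3", "B2"],
}
rng = random.Random(0)
preds = predict_unknown_residues(residues, contacts,
                                 random_matrix(rng), random_matrix(rng), random_matrix(rng))
for node, probs in preds.items():
    print(node, max(probs, key=probs.get))
```

With trained weights, the most probable residue at each unknown node would be used to fill in the binder sequence. With the random weights shown here, the printed residues carry no biological meaning.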
Classification and identification of disease genes using biased feature correction
Embodiments of the present invention provide methods, computer program products, and systems for classifying and identifying cancer genes using machine learning techniques while correcting for sample bias in tumor-derived genomic features and in other biased features. Embodiments of the present invention can be used to receive a set of genes that includes a first gene and a subset of synthetic genes with features similar to those of the first gene, and to receive a set of gene labels associated with physiological characteristics. Embodiments of the present invention can use a machine learning classifier to estimate the probabilities that genes in the set of genes are associated with gene labels in the set of gene labels. They can then estimate an effective probability range for the first gene and each gene label based, at least in part, on the first gene's estimated probabilities and the estimated probabilities of one or more of the synthetic genes.
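The abstract does not specify how the synthetic genes are constructed or what form the effective probability range takes. The following sketch makes three hypothetical assumptions:

* Synthetic genes are copies of the first gene with the biased feature rescaled. The feature is named `mutation_rate` here, which is a made-up name.
* A placeholder logistic model stands in for the machine learning classifier.
* The effective range is the minimum and maximum probability per label across the gene and its synthetic variants.

```python
import math
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Scores = Dict[str, float]


def make_synthetic_genes(gene: Features, biased_feature: str,
                         scales: List[float]) -> List[Features]:
    """Copies of `gene` whose biased feature is rescaled; all other features are kept."""
    synthetic = []
    for s in scales:
        g = dict(gene)
        g[biased_feature] = gene[biased_feature] * s
        synthetic.append(g)
    return synthetic


def toy_classifier(features: Features, weights: Dict[str, Dict[str, float]]) -> Scores:
    """Placeholder per-label logistic model standing in for a trained classifier."""
    scores = {}
    for label, w in weights.items():
        z = w.get("bias", 0.0) + sum(w.get(k, 0.0) * v for k, v in features.items())
        scores[label] = 1.0 / (1.0 + math.exp(-z))
    return scores


def effective_ranges(gene: Features, synthetic: List[Features],
                     classify: Callable[[Features], Scores]) -> Dict[str, Tuple[float, float]]:
    """Per label, the (min, max) probability over the gene and its synthetic variants."""
    scored = [classify(gene)] + [classify(g) for g in synthetic]
    return {label: (min(s[label] for s in scored), max(s[label] for s in scored))
            for label in scored[0]}


# Hypothetical example values
WEIGHTS = {
    "oncogene": {"bias": -1.0, "mutation_rate": 0.8, "expression": 0.5},
    "tumor_suppressor": {"bias": -0.5, "mutation_rate": 0.3, "expression": -1.0},
}
gene = {"mutation_rate": 2.0, "expression": 0.5}
synthetic = make_synthetic_genes(gene, "mutation_rate", [0.5, 0.75, 1.25, 1.5])
ranges = effective_ranges(gene, synthetic, lambda f: toy_classifier(f, WEIGHTS))
print(ranges)
```

A wide range for a label suggests that the gene's score is sensitive to the biased feature, so a call made from the point estimate alone would deserve less trust.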
Rapid assessment of crude oil fouling propensity to prevent refinery fouling
A process for producing liquid transportation fuels in a petroleum refinery while avoiding the use of crude oil feedstock characterized by a fouling thermal resistance that has the potential to foul refinery processes and equipment. Spectral data selected from NIR, NMR, or both are obtained and converted to wavelet coefficient data. A genetic algorithm (or support vector machines) is then trained to recognize subtle features in the wavelet coefficient data, allowing crude samples to be classified into one of two groups based on fouling potential. Rapidly classifying a potential crude oil feedstock according to its fouling potential prevents feedstocks with increased fouling potential from being used in a petroleum refinery to produce liquid transportation fuels.
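Here is a minimal sketch of the wavelet-plus-classifier pipeline under two assumptions. First, each spectrum undergoes a full orthonormal Haar decomposition; the abstract does not name the wavelet family. Second, the classifier is a linear SVM without an intercept, trained by Pegasos-style stochastic subgradient descent on the hinge loss. This SVM is used here in place of the genetic algorithm option. The eight-point spectra and their labels are hypothetical toy data, not NIR or NMR measurements.

```python
import math
import random
from typing import List


def haar_dwt(signal: List[float]) -> List[float]:
    """Full orthonormal Haar decomposition: [approx, coarsest detail, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        avg = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        diff = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = diff + details  # coarser detail coefficients go in front
        approx = avg
    return approx + details


def wavelet_features(spectrum: List[float]) -> List[float]:
    # Drop the approximation coefficient (overall intensity); keep detail coefficients
    return haar_dwt(spectrum)[1:]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: List[List[float]], y: List[int],
                     lam: float = 0.1, epochs: int = 100, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic subgradient descent on the hinge loss (no intercept)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # margin check uses the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink for L2 regularization
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def classify(w: List[float], spectrum: List[float]) -> int:
    return 1 if dot(w, wavelet_features(spectrum)) >= 0 else -1


# Hypothetical toy spectra (8 points each); +1 = high fouling potential, -1 = low
TRAIN = [
    ([3.0, 3.2, 2.9, 3.1, 1.0, 1.1, 0.9, 1.0], 1),
    ([2.8, 3.0, 3.1, 3.0, 1.2, 1.0, 1.0, 0.9], 1),
    ([3.1, 2.9, 3.0, 3.2, 0.8, 1.0, 1.1, 1.0], 1),
    ([1.0, 1.1, 0.9, 1.0, 3.0, 3.2, 2.9, 3.1], -1),
    ([1.2, 1.0, 1.0, 0.9, 2.8, 3.0, 3.1, 3.0], -1),
    ([0.8, 1.0, 1.1, 1.0, 3.1, 2.9, 3.0, 3.2], -1),
]
X = [wavelet_features(s) for s, _ in TRAIN]
y = [label for _, label in TRAIN]
w = train_linear_svm(X, y)
candidate = [2.9, 3.0, 3.0, 3.1, 1.1, 1.0, 1.0, 1.0]
print("reject" if classify(w, candidate) == 1 else "accept")
```

Dropping the approximation coefficient is a design choice in this sketch: it removes the overall signal intensity, so the classifier sees only spectral shape. Real spectra would first need padding or resampling to a power-of-two length.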