Patent classifications
G16B40/20
MACHINE LEARNING SYSTEM FOR INTERPRETING HOST PHAGE RESPONSE
A computer-implemented method of generating a machine learning model for interpreting host phage response data, comprising: receiving datasets and labels for a host phage response; training a machine learning model; and using the model to estimate the efficacy of a test phage in inhibiting growth of a test bacterium.
ARTIFICIAL INTELLIGENCE-BASED DRUG MOLECULE PROCESSING METHOD AND APPARATUS, DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
An artificial intelligence-based (AI-based) drug molecule processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product are provided. The method includes: determining a plurality of candidate drug molecules for a target protein; performing activity prediction based on the plurality of candidate drug molecules and the target protein, to obtain activity information of each candidate drug molecule; performing homology modeling on the target protein, to obtain a reference protein having a structure homologous with that of the target protein; performing molecular docking based on the reference protein and the plurality of candidate drug molecules, to obtain molecular docking information of each candidate drug molecule; and screening the plurality of candidate drug molecules based on the activity information of each candidate drug molecule and the molecular docking information of each candidate drug molecule, to obtain target drug molecules for the target protein.
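The final screening step of the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the `Candidate` class, the `screen` function, the score scales, and the threshold values are all assumptions made for the example. The abstract does not say how activity and docking information are combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    activity: float       # predicted activity; assumed scale 0..1, higher is better
    docking_score: float  # assumed docking energy in kcal/mol, lower is better


def screen(candidates, min_activity=0.5, max_docking=-7.0):
    """Keep candidates that pass both (hypothetical) thresholds,
    ordered from best to worst docking score."""
    kept = [
        c for c in candidates
        if c.activity >= min_activity and c.docking_score <= max_docking
    ]
    return sorted(kept, key=lambda c: c.docking_score)


# Made-up scores purely for illustration.
candidates = [
    Candidate("mol_a", 0.9, -8.2),
    Candidate("mol_b", 0.4, -9.1),  # fails the activity threshold
    Candidate("mol_c", 0.7, -6.0),  # fails the docking threshold
    Candidate("mol_d", 0.8, -7.5),
]
selected = [c.name for c in screen(candidates)]
print(selected)
```

A simple conjunction of thresholds is used here only because it is easy to read; a weighted combined score would be an equally plausible reading of the abstract.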
SCREENING SYSTEM AND METHOD FOR ACQUIRING AND PROCESSING GENOMIC INFORMATION FOR GENERATING GENE VARIANT INTERPRETATIONS
A screening system includes control circuitry that determines gene variants present in a compiled genome representative of a subject based on a difference between a reference genome and the compiled genome representative of the subject, and acquires phenotype information from an observation of the subject. The control circuitry further generates a multi-dimensional data structure that includes the gene variants in respect of a first dimension, the phenotype information in respect of a second dimension, and a set of data samples in respect of a third dimension. The set of data samples includes the compiled genome sequence representative of the subject and corresponding historical data samples of other subjects, including their transcript information (for example, phenotype information) and their gene variants. The control circuitry executes a gene variant interpretation using a correlation function to find phenotype-gene variant relationships based on the generated multi-dimensional data structure.
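The two computational ideas in the abstract above, calling variants as differences from a reference and correlating variants with phenotypes across samples, can be sketched as follows. This is a minimal illustration under stated assumptions: sequences are taken to be pre-aligned and of equal length, Pearson correlation stands in for the unspecified "correlation function", and the names `call_variants`, `pearson`, `correlate`, and the sample layout are hypothetical.

```python
from math import sqrt


def call_variants(reference, subject):
    """Return (position, ref_base, alt_base) for every mismatch.
    Assumes the two sequences are already aligned and equal in length."""
    if len(reference) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, subject)) if r != s]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def correlate(samples, variant, phenotype):
    """Correlate presence of a variant with presence of a phenotype
    along the sample dimension."""
    has_variant = [1 if variant in s["variants"] else 0 for s in samples]
    has_phenotype = [1 if phenotype in s["phenotypes"] else 0 for s in samples]
    return pearson(has_variant, has_phenotype)


variants = call_variants("ACGTACGT", "ACGAACGT")

# Made-up cohort: each sample lists its variants and observed phenotypes.
samples = [
    {"variants": {"v1"}, "phenotypes": {"p1"}},
    {"variants": {"v1"}, "phenotypes": {"p1"}},
    {"variants": set(), "phenotypes": set()},
    {"variants": {"v2"}, "phenotypes": set()},
]
r_v1 = correlate(samples, "v1", "p1")
r_v2 = correlate(samples, "v2", "p1")
print(variants, r_v1, r_v2)
```

Here the variant and phenotype names index the first two dimensions and the `samples` list plays the role of the third.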
METHOD AND SYSTEM FOR SCREENING NEOANTIGENS, AND USES THEREOF
Provided are a method and system for screening neoantigens, and uses of the neoantigens. Specifically, provided are a method and system for screening, as a diagnostic and/or therapeutic target, neoantigens derived from a gene whose expression is essential for survival of a cancer cell and/or which is homogeneously expressed in all cells in cancer tissue, and uses of the neoantigens.
MOLECULE DESIGN
Systems and methods of discovering compounds with biological properties are provided. A first training dataset is obtained, including chemical structures and biological properties. Projections of compounds are obtained by projecting chemical structure information into a latent representation space using encoder weights. Compounds are classified by inputting projections into the classifier using classifier weights. The encoder and classifier are trained by comparing the classification of each compound to actual biological properties and updating the respective weights. A second training dataset is obtained including chemical structures. Projections of compounds are obtained by projecting chemical structure information into a latent representation space using encoder weights. Chemical structures are obtained by inputting projections into a decoder using decoder weights. The decoder is trained by comparing outputted and actual chemical structures and updating the respective weights. Candidate compounds not present in the first and second datasets are identified using the trained encoder, classifier, and decoder.
MACHINE LEARNING PREDICTION OF THERAPY RESPONSE
A method comprising receiving, for each of a plurality of subjects having a specified type of disease and receiving a specified therapy for treating the disease, a first biological signature obtained pre-treatment and a second biological signature obtained on-treatment; calculating, for each of the plurality of subjects, a set of values representing a ratio between the first and second biological signatures associated with the respective subject; at a training stage, training a machine learning model on a training set comprising: (i) the calculated sets of values, and (ii) labels associated with an outcome of the specified therapy in each of the subjects; to generate a classifier suitable for predicting a response in a target patient to said specified therapy.
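The feature construction and training stage described above can be sketched as follows. This is a minimal illustration, not the claimed method: a nearest-centroid classifier is swapped in as a stand-in for the unspecified machine learning model, and the function names, the `eps` guard against division by zero, and all numbers are assumptions.

```python
def ratio_features(pre, on, eps=1e-9):
    """Per-feature ratio of the on-treatment to the pre-treatment signature.
    eps is an assumed guard against division by zero."""
    return [o / (p + eps) for p, o in zip(pre, on)]


def fit_centroids(X, y):
    """Stand-in 'training': average the feature vectors of each outcome label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist2(centroids[label], row))


# Made-up (pre-treatment, on-treatment, outcome) triples.
subjects = [
    ([10.0, 5.0], [5.0, 5.0], "responder"),
    ([8.0, 4.0], [4.8, 4.4], "responder"),
    ([10.0, 5.0], [11.0, 5.0], "non-responder"),
    ([6.0, 3.0], [7.2, 2.7], "non-responder"),
]
X = [ratio_features(pre, on) for pre, on, _ in subjects]
y = [label for _, _, label in subjects]
centroids = fit_centroids(X, y)

# Predict the response of a hypothetical target patient.
prediction = predict(centroids, ratio_features([9.0, 6.0], [4.5, 6.6]))
print(prediction)
```

Using ratios rather than raw signatures means each subject serves as their own baseline, which is the point the abstract makes by pairing pre-treatment and on-treatment measurements.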
METHODS AND APPARATUS FOR EFFICIENT AND ACCURATE ASSEMBLY OF LONG-READ GENOMIC SEQUENCES
The present application generally relates to identifying gene clusters from long-read genomic sequencing data. The disclosure provides methods, non-transitory computer-readable media, and apparatuses for processing long-read genomic sequencing data, performing error correction, and identifying gene clusters, e.g., biosynthetic gene clusters. The methods, non-transitory computer-readable media, and apparatuses described herein can be employed in a broad range of biological applications, such as drug discovery, industrial chemical discovery and production, and basic biological research.