G06F19/28

METHODS AND SYSTEMS FOR DESIGNING GENE PANELS
20170351807 · 2017-12-07 ·

A system and method of selecting genes for a gene panel includes retrieving, from a disease association database, gene-disease associations of genes associated with diseases at a given level in a disease hierarchy. The disease association database stores disease information, gene information, phenotype information, associations between diseases in the disease hierarchy, gene-disease associations and strength parameters related to the gene-disease associations. For each gene associated with the diseases at the given level, the strength parameters are weighted and combined to determine a rank score for that gene. The genes are ranked based on the rank scores to provide ranked gene information. The ranked gene information is linked with diseases at the higher levels of the disease hierarchy based on hierarchical relationships. The ranked gene information for gene-disease associations can be used to select genes for a gene panel design.
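
The weighting-and-ranking step described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented method: the parameter names in `WEIGHTS`, the weighted-sum combination, and the `parent_of` mapping used to walk up the disease hierarchy are all assumptions introduced for the example.

```python
# Hypothetical strength parameters and weights (not taken from the source).
WEIGHTS = {"publication_count": 0.5, "evidence_score": 0.3, "variant_count": 0.2}


def rank_score(strengths, weights=WEIGHTS):
    """Weighted sum of one gene's strength parameters; missing ones count as 0."""
    return sum(w * strengths.get(name, 0.0) for name, w in weights.items())


def rank_genes(associations, weights=WEIGHTS):
    """Rank genes for one disease.

    associations: {gene: {strength parameter name: value}}
    Returns [(gene, score)] sorted by descending score, ties broken by gene name.
    """
    scored = [(gene, rank_score(s, weights)) for gene, s in associations.items()]
    return sorted(scored, key=lambda gs: (-gs[1], gs[0]))


def ancestors(disease, parent_of):
    """Diseases above `disease` in the hierarchy, nearest parent first."""
    chain = []
    while disease in parent_of:
        disease = parent_of[disease]
        chain.append(disease)
    return chain


def link_to_higher_levels(disease, ranked, parent_of):
    """Attach one disease's ranked gene list to each of its ancestors."""
    return {higher: ranked for higher in ancestors(disease, parent_of)}
```

In practice the strength parameters would probably need normalizing to a common scale before weighting; the sketch leaves that out.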

METHOD AND SYSTEM FOR MICROBIOME-DERIVED DIAGNOSTICS AND THERAPEUTICS FOR LOCOMOTOR SYSTEM CONDITIONS
20170344719 · 2017-11-30 ·

A method for at least one of characterizing, diagnosing, and treating a locomotor system condition in at least a subject, the method comprising: receiving an aggregate set of biological samples from a population of subjects; generating at least one of a microbiome composition dataset and a microbiome functional diversity dataset for the population of subjects; generating a characterization of the locomotor system condition based upon features extracted from at least one of the microbiome composition dataset and the microbiome functional diversity dataset; based upon the characterization, generating a therapy model configured to correct the locomotor system condition; and at an output device associated with the subject, promoting a therapy to the subject based upon the characterization and the therapy model.

COMPUTATIONAL METHOD FOR CLASSIFYING AND PREDICTING PROTEIN SIDE CHAIN CONFORMATIONS
20170329892 · 2017-11-16 ·

Computational methods for classifying and predicting protein side chain conformations utilizing a data driven scoring function are disclosed. According to some embodiments, the methods may include obtaining structure data representing a plurality of conformations of a compound. The methods may also include determining structural differences among the conformations. The methods may also include classifying, based on the structural differences, the conformations into one or more clusters. The methods may also include determining representative conformations of the clusters, wherein an average structural difference between a representative conformation of a cluster and conformations in the cluster is below a predetermined threshold. The methods may further include determining the representative conformations as poses of the compound.
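
One way to picture the cluster-then-pick-representatives flow is the sketch below. It is a simplified illustration under stated assumptions: conformations are flat tuples of coordinates, structural difference is a plain RMSD without superposition, clustering is a simple leader algorithm, and the representative is the cluster medoid. None of these specific choices is stated in the abstract.

```python
import math


def rmsd(a, b):
    """Root-mean-square difference between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def cluster(conformations, cutoff):
    """Leader clustering: join the first cluster whose first member is within cutoff."""
    clusters = []
    for conf in conformations:
        for members in clusters:
            if rmsd(conf, members[0]) <= cutoff:
                members.append(conf)
                break
        else:
            clusters.append([conf])
    return clusters


def representative(members):
    """Medoid of a cluster and its average difference to the other members."""
    def mean_diff(i):
        others = [m for j, m in enumerate(members) if j != i]
        if not others:
            return 0.0
        return sum(rmsd(members[i], m) for m in others) / len(others)

    best = min(range(len(members)), key=mean_diff)
    return members[best], mean_diff(best)


def poses(conformations, cutoff, threshold):
    """Representatives whose average difference to their cluster is below threshold."""
    result = []
    for members in cluster(conformations, cutoff):
        rep, avg = representative(members)
        if avg < threshold:
            result.append(rep)
    return result
```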

IDENTIFYING VARIANTS OF INTEREST BY IMPUTATION
20170329901 · 2017-11-16 ·

Processing genetic information comprises: receiving an input that includes information pertaining to a specific genetic variant; and identifying, in a database comprising genotype information of a plurality of candidate individuals, a matching individual imputed to have the specific genetic variant. The genotype information of the matching individual corresponding to the specific genetic variant is not directly assayed.
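
A minimal sketch of the lookup this abstract describes is shown below. The record layout (an `assayed` set and an `imputed` dictionary of probabilities per individual), the `min_prob` cutoff, and the function name are hypothetical and chosen only for illustration.

```python
def find_imputed_matches(variant, database, min_prob=0.9):
    """Return IDs of individuals imputed (not directly assayed) to carry `variant`.

    database: {individual_id: {"assayed": set of variant IDs genotyped directly,
                               "imputed": {variant ID: imputed carrier probability}}}
    """
    matches = []
    for individual_id, record in database.items():
        if variant in record["assayed"]:
            continue  # genotype was measured directly, so it is not an imputed match
        if record["imputed"].get(variant, 0.0) >= min_prob:
            matches.append(individual_id)
    return sorted(matches)
```

The probability cutoff is a design choice: a lower value returns more candidate individuals at the cost of more false matches.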

DATABASE AND DATA PROCESSING SYSTEM FOR USE WITH A NETWORK-BASED PERSONAL GENETICS SERVICES PLATFORM

Databases and data processing systems for use with a network-based personal genetics services platform may include member information pertaining to a plurality of members of the network-based personal genetics services platform. The member information may include genetic information, family history information, environmental information, and phenotype information of the plurality of members. A data processing system may determine, based at least in part on the member information, a model for predicting a phenotype from genetic information, family history information, and environmental information, wherein determining the model includes training the model using the member information pertaining to a set of the plurality of members. The data processing system may also receive a request from a requesting member to predict a phenotype of interest, and apply an individual's genetic information, family history information, and environmental information to the model to obtain a prediction associated with the phenotype of interest for the requesting member.

ESTIMATION OF ADMIXTURE GENERATION

Admixture generation determination includes: obtaining ancestry assignment information associated with an individual's genotype data, the ancestry assignment information at least indicating that a portion of the individual's genotype data is deemed to be associated with a specific ancestry; determining the individual's genetic ancestry summary data corresponding to the specific ancestry; estimating an admixture generation associated with the specific ancestry, the admixture generation indicating a most recent generation or a most recent generation range from which the individual has at least one non-admixed ancestor of the specific ancestry, the estimation including a maximum likelihood determination based at least in part on the individual's genetic ancestry summary data and a recombination model; and outputting the estimated admixture generation.
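
To illustrate what a maximum likelihood determination over candidate generations can look like, here is a toy sketch. It assumes a deliberately simple recombination model that is not taken from the abstract: the lengths of the individual's ancestry tracts (in Morgans) are treated as independent exponential draws with rate equal to the number of generations, and the estimate is found by grid search. The summary data, the model, and all names here are assumptions.

```python
import math


def log_likelihood(generations, tract_lengths):
    """Log-likelihood of tract lengths if each is Exponential(rate=generations)."""
    n = len(tract_lengths)
    return n * math.log(generations) - generations * sum(tract_lengths)


def estimate_generation(tract_lengths, max_generations=20):
    """Integer generation count in 1..max_generations with the highest likelihood."""
    candidates = range(1, max_generations + 1)
    return max(candidates, key=lambda g: log_likelihood(g, tract_lengths))
```

Under this toy model, shorter tracts point to more generations of recombination since the non-admixed ancestor; a generation range could be reported by keeping every candidate whose log-likelihood lies within a fixed margin of the maximum.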

Identification of Microorganisms from genome sequencing data

A diagnostic analysis method and system are provided for identifying a microorganism from a genome sequence. Partially or fully assembled microbial genomes, or short reads from whole-genome sequencing of microbial genomes, are processed into a 4 MB Boolean array while preserving 1% of the genomic information in a way that allows for rapid comparison of a query genome to a large reference database. This represents a critical saving in storage space and in the time needed to query large reference libraries.
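
The idea of reducing a genome to a fixed-size Boolean array can be sketched as follows. The abstract does not say how the array is built, so everything specific here is an assumption: k-mers are hashed with BLAKE2b, roughly 1 in 100 k-mers is kept as one possible reading of "preserving 1%", the array is bit-packed into 4 MB, and signatures are compared by Jaccard similarity of their set bits.

```python
import hashlib

FOUR_MB_IN_BITS = 4 * 1024 * 1024 * 8  # assumes the 4 MB array is bit-packed


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer string."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def signature(sequence, k=16, n_bits=FOUR_MB_IN_BITS, keep_one_in=100):
    """Bit array with one bit set for each sampled k-mer of `sequence`."""
    bits = bytearray(n_bits // 8)
    for i in range(len(sequence) - k + 1):
        h = kmer_hash(sequence[i:i + k])
        if h % keep_one_in != 0:
            continue  # subsample: keep about 1 in keep_one_in k-mers
        pos = (h // keep_one_in) % n_bits
        bits[pos >> 3] |= 1 << (pos & 7)
    return bits


def similarity(sig_a, sig_b):
    """Jaccard similarity of the set bits in two equal-length signatures."""
    shared = sum(bin(x & y).count("1") for x, y in zip(sig_a, sig_b))
    total = sum(bin(x | y).count("1") for x, y in zip(sig_a, sig_b))
    return shared / total if total else 0.0
```

Because every genome maps to an array of the same size, a query can be compared against each reference with bitwise operations alone, which is where the storage and speed savings would come from in a scheme like this.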

METHODS AND SYSTEMS FOR IDENTIFYING LIGAND-PROTEIN BINDING SITES

The invention provides a novel integrated structure and system-based approach for drug target prediction that enables the large-scale discovery of new targets for existing drugs. Novel computer-readable storage media and computer systems are also provided. Methods and systems of the invention use novel sequence order-independent structure alignment, hierarchical clustering, and probabilistic sequence similarity techniques to construct a probabilistic pocket ensemble (PPE) that captures even promiscuous structural features of different binding sites for a drug on known targets. The drug's PPE is combined with an approximation of the drug delivery profile to facilitate large-scale prediction of novel drug-protein interactions with several applications to biological research and drug development.

PARALLEL-PROCESSING SYSTEMS AND METHODS FOR HIGHLY SCALABLE ANALYSIS OF BIOLOGICAL SEQUENCE DATA
20170316154 · 2017-11-02 ·

An apparatus includes a memory configured to store a sequence that includes an estimation of a biological sequence. The sequence includes a set of elements. The apparatus also includes an assignment module implemented in a hardware processor. The assignment module is configured to receive the sequence from the memory, and assign each element to at least one segment from a set of segments, including, when an element maps to at least a first segment and a second segment, assigning the element set of segments specific to that hardware processor, and substantially simultaneous with the remaining hardware processors, remove at least a portion of duplicate elements in that segment to generate a deduplicated segment. Reorder the elements in the deduplicated segment to generate a realigned segment that has a reduced likelihood for alignment errors.
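
The assign, deduplicate, and reorder flow can be sketched as below. This is only a rough illustration: the abstract as reproduced here is truncated, so the sketch assumes that an element spanning two segments is assigned to both, which the text does not confirm. Worker threads stand in for the separate hardware processors, removing exact duplicates stands in for duplicate removal, and sorting by start position stands in for realignment. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def assign(elements, segment_len, n_segments):
    """Assign each (start, end, bases) element to every segment it overlaps.

    Intervals are half-open: [start, end).
    """
    segments = [[] for _ in range(n_segments)]
    for element in elements:
        start, end, _bases = element
        first = start // segment_len
        last = min((end - 1) // segment_len, n_segments - 1)
        for index in range(first, last + 1):
            segments[index].append(element)
    return segments


def process_segment(segment):
    """Drop exact duplicate elements, then reorder the rest by position."""
    deduplicated = set(segment)
    return sorted(deduplicated)


def process_all(segments, workers=4):
    """Process segments concurrently, one task per segment, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_segment, segments))
```

Assigning a boundary-spanning element to both segments lets each worker process its segment without consulting its neighbours, at the cost of some duplicated work.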

Methods for Genome Assembly, Haplotype Phasing, and Target Independent Nucleic Acid Detection
20170314014 · 2017-11-02 ·

The disclosure provides methods to assemble genomes of eukaryotic or prokaryotic organisms. The disclosure provides methods for haplotype phasing and meta-genomics assemblies. The disclosure provides a streamlined method for accomplishing these tasks, such that intermediates need not be labeled by an affinity label to facilitate binding to a solid surface. The disclosure also provides methods and compositions for the de novo generation of scaffold information, linkage information, and genome information for unknown organisms in heterogeneous metagenomic samples or samples obtained from multiple individuals. Practice of the methods can allow de novo sequencing of entire genomes of uncultured or unidentified organisms in heterogeneous samples, or the determination of linkage information for nucleic acid molecules in samples comprising nucleic acids obtained from multiple individuals.