G16B10/00

Bin-specific and hash-based efficient comparison of sequencing results

The technology disclosed generates a reference array of variant data for locations that are shared between the read results to be compared, and generates hashes over a selected pattern length of positions in the reference array to independently produce non-unique window hashes for base patterns in the read results. It then selects for comparison the window hashes that occur fewer than a ceiling number of times and compares the selected window hashes to identify common window hashes between the read results. Finally, it determines a similarity measure for the read results based on the common window hashes.
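
The windowed hashing and ceiling filter described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the function names, the encoding of variant calls as small integers (0 to 255), the choice of BLAKE2b as the hash, and the use of a Jaccard ratio as the similarity measure are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def window_hashes(calls, window):
    """Hash every run of `window` consecutive positions in `calls`.

    `calls` is assumed to be a list of small integers (0-255), one per
    shared location in the reference array.
    """
    return [
        hashlib.blake2b(bytes(calls[i:i + window]), digest_size=8).hexdigest()
        for i in range(len(calls) - window + 1)
    ]


def rare_hashes(hashes, ceiling):
    """Keep only the hashes that occur fewer than `ceiling` times."""
    counts = Counter(hashes)
    return {h for h, n in counts.items() if n < ceiling}


def similarity(calls_a, calls_b, window=4, ceiling=2):
    """Jaccard ratio of the rare window hashes (an assumed measure)."""
    a = rare_hashes(window_hashes(calls_a, window), ceiling)
    b = rare_hashes(window_hashes(calls_b, window), ceiling)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Each read result is hashed independently, so the per-sample hash sets could be computed once and reused across many pairwise comparisons.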

Computational systems and methods for discovering allosteric sites and allosteric modulators of proteins
11621053 · 2023-04-04

Methods and systems are described for identification and characterization of allosteric sites in proteins and enzyme molecules. The disclosed methods allow for identification of natural and true binding sites on surface regions of protein and enzyme molecules by following the pathways of energy flow from the activity center to the surface regions. Allosteric sites are identified and ranked for their effect on target activity of the protein using computational methods. Chemical libraries are then screened to find the best candidates for drug-like molecules that affect target activity of the protein or enzyme.

ESTIMATION OF PHENOTYPES USING DNA, PEDIGREE, AND HISTORICAL DATA

Disclosed are techniques for predicting a trait of an individual and identifying a set of enriched record collections of a genetic community. To predict a trait of an individual, DNA features and non-DNA features of the individual are accessed to generate a feature vector that is input into a machine learning model. The machine learning model generates a prediction of the trait. The prediction may be based on an inheritance prediction and/or a community prediction. To identify a set of enriched record collections, individuals belonging to a genetic community are identified and a set of candidate record collections is accessed. A community count and a background count are determined for each candidate record collection. The set of enriched record collections is identified based on a comparison of the community count and the background count. The genetic community may be annotated using the set of enriched record collections.
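
The comparison of community count and background count can be illustrated with a short sketch. The abstract does not say how the two counts are compared, so the rate-ratio test, the `min_ratio` threshold, and all names below are assumptions chosen for illustration.

```python
def enriched_collections(community_counts, background_counts,
                         community_size, background_size, min_ratio=2.0):
    """Return the record collections over-represented in the community.

    A collection counts as enriched when the share of community members
    appearing in it is at least `min_ratio` times the share of the
    background population appearing in it (an assumed criterion).
    """
    enriched = []
    for name, community_count in community_counts.items():
        background_count = background_counts.get(name, 0)
        community_rate = community_count / community_size
        background_rate = background_count / background_size
        if background_rate == 0:
            # No background hits: enriched if the community has any.
            if community_count > 0:
                enriched.append(name)
        elif community_rate / background_rate >= min_ratio:
            enriched.append(name)
    return sorted(enriched)
```

With hypothetical counts, a collection that holds 40 of 100 community members but only 100 of 10,000 background individuals would be flagged, while one that holds half of each population would not.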

Optimizing k-mer databases by k-mer subtraction

Methods are disclosed for reducing the size of a k-mer reference database used for queries and/or taxonomic classifications when available computer storage and/or memory are inadequate. The k-mers of the reference database have been previously classified to a taxonomy, preferably based on genetic distances. In one method, the k-mers are separated into one or more groups followed by removing k-mers common to the groups. In another method, k-mers are removed based on a selected taxonomic threshold level. A third method combines the features of the previous two methods. The methods are adaptable to machine learning.
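
The first two reduction methods can be sketched as set operations. This is a rough sketch under stated assumptions: "common to the groups" is read here as "present in more than one group", the rank ordering in `RANKS` is a simplified hypothetical taxonomy, and the function names are invented for the example.

```python
from collections import Counter

# Assumed rank order, from most specific to most general.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def subtract_common(groups):
    """Remove from every group the k-mers that occur in more than one group."""
    counts = Counter(k for kmers in groups.values() for k in set(kmers))
    return {
        name: {k for k in kmers if counts[k] == 1}
        for name, kmers in groups.items()
    }


def drop_above_rank(kmer_ranks, threshold):
    """Keep only k-mers classified at `threshold` or a more specific rank."""
    limit = RANKS.index(threshold)
    return {k: r for k, r in kmer_ranks.items() if RANKS.index(r) <= limit}
```

A combined approach in the spirit of the third method could apply `drop_above_rank` to the database first and then run `subtract_common` on the groups that remain.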

LEVERAGING GENETICS AND FEATURE ENGINEERING TO BOOST PLACEMENT PREDICTABILITY FOR SEED PRODUCT SELECTION AND RECOMMENDATION BY FIELD

An example computer-implemented method includes receiving agricultural data records comprising a first set of yield properties for a first set of seeds grown in a first set of environments, and receiving genetic feature data related to a second set of seeds. The method further includes generating a second set of yield properties for the second set of seeds associated with a second set of environments by applying a model using the genetic feature data and the agricultural data records. In addition, the method includes determining predicted yield performance for a third set of seeds associated with one or more target environments by applying the second set of yield properties, and generating seed recommendations for the one or more target environments based on the predicted yield performance for the third set of seeds. In the present example, the method also includes causing display of the seed recommendations.

Ancestry composition determination

Ancestral origin information is presented by: receiving a request to display ancestry data of an individual; obtaining ancestry composition information of the individual, the ancestry composition information including information pertaining to a proportion of the individual's genotype data that is deemed to correspond to a specific ancestry; and presenting the ancestry composition information for display.
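
The proportion referred to above can be illustrated with a small sketch. It assumes, purely for the example, that the genotype data has already been split into segments, each labeled with one ancestry and a length. The segment representation and the function name are hypothetical and do not come from the abstract.

```python
from collections import defaultdict


def ancestry_composition(segments):
    """Return the fraction of total segment length assigned to each ancestry.

    `segments` is an iterable of (ancestry_label, length) pairs.
    """
    totals = defaultdict(float)
    for ancestry, length in segments:
        totals[ancestry] += length
    genome = sum(totals.values())
    if genome == 0:
        return {}
    return {ancestry: total / genome for ancestry, total in totals.items()}
```

The resulting mapping sums to 1 and could be passed directly to whatever display layer answers the request.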