Patent classifications
G16B10/00
Estimation of phenotypes using DNA, pedigree, and historical data
Disclosed are techniques for predicting a trait of an individual and identifying a set of enriched record collections of a genetic community. To predict a trait of an individual, DNA features and non-DNA features of the individual are accessed to generate a feature vector that is input into a machine learning model. The machine learning model generates a prediction of the trait. The prediction may be based on an inheritance prediction and/or a community prediction. To identify a set of enriched record collections, individuals belonging to a genetic community are identified and a set of candidate record collections is accessed. A community count and a background count are determined for each candidate record collection. The set of enriched record collections is identified based on a comparison of the community count and the background count. The genetic community may be annotated using the set of enriched record collections.
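The comparison of community count and background count can be illustrated with a minimal sketch. The abstract does not say how the two counts are compared, so the ratio test below, the `min_ratio` parameter, the function name, and the collection names in the usage note are all assumptions made for illustration only.

```python
def enriched_collections(community_counts, background_counts,
                         community_size, background_size, min_ratio=2.0):
    """Return candidate collections that look over-represented in the community.

    Assumed criterion (not taken from the abstract): a collection is enriched
    when its rate among community members is at least `min_ratio` times its
    rate in the background population.
    """
    if community_size <= 0 or background_size <= 0:
        raise ValueError("population sizes must be positive")
    enriched = []
    for name, community_count in community_counts.items():
        background_count = background_counts.get(name, 0)
        community_rate = community_count / community_size
        background_rate = background_count / background_size
        if background_rate == 0:
            # No background hits: treat as enriched only if the community has any.
            if community_count > 0:
                enriched.append(name)
        elif community_rate / background_rate >= min_ratio:
            enriched.append(name)
    return sorted(enriched)
```

With hypothetical counts `{"collection_a": 40, "collection_b": 60}` for a community of 100 people and `{"collection_a": 50, "collection_b": 6000}` for a background of 10,000 people, only `collection_a` passes the ratio test, because `collection_b` occurs at the same rate in both groups.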
Methods for nested PCR amplification of cell-free DNA
Methods for non-invasive prenatal paternity testing are disclosed herein. The method uses genetic measurements made on plasma taken from a pregnant mother, along with genetic measurements of the alleged father and of the mother, to determine whether the alleged father is the biological father of the fetus. This is accomplished by way of an informatics-based method that compares the genetic fingerprint of the fetal DNA found in maternal plasma to the genetic fingerprint of the alleged father.
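The idea of comparing the fetal genetic fingerprint with that of the alleged father can be sketched in a highly simplified form. The abstract does not describe the actual informatics method, so everything below is an assumption: the function name, the data layout, and the rule of looking only at loci where the mother is homozygous, so that any other allele seen in plasma is taken to be paternally inherited. Real plasma data are noisy and call for a statistical treatment that this toy example omits.

```python
def paternal_consistency(mother, plasma_alleles, alleged_father):
    """Fraction of informative loci at which the alleged father carries
    the non-maternal allele observed in maternal plasma.

    mother, alleged_father: dict mapping locus -> (allele1, allele2)
    plasma_alleles: dict mapping locus -> set of alleles detected in plasma
    Returns None when no locus is informative.
    """
    informative = 0
    consistent = 0
    for locus, (m1, m2) in mother.items():
        if m1 != m2:
            continue  # mother heterozygous: fetal alleles are ambiguous here
        non_maternal = set(plasma_alleles.get(locus, ())) - {m1}
        if not non_maternal:
            continue  # nothing beyond the maternal allele was detected
        informative += 1
        if non_maternal & set(alleged_father.get(locus, ())):
            consistent += 1
    return consistent / informative if informative else None
```

A value near 1 would be consistent with paternity under these simplifying assumptions, while a low value would argue against it. No threshold is suggested here because the abstract gives none.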
Methods and systems for identifying progenies for use in plant breeding
Exemplary methods for identifying progenies for use in plant breeding are disclosed. One exemplary computer-implemented method includes accessing a data structure including data representative of a pool of progenies and determining a prediction score for at least a portion of the pool of progenies based on the data included in the data structure. The prediction score indicates a probability of selection of the progeny based on historical data. The method further includes selecting a group of progenies from the pool of progenies based on the prediction score, identifying a set of progenies, from the group of progenies, based on at least one of an expected performance of the group of progenies and at least one factor associated with the set of progenies, the pool of progenies and/or the group of progenies, and directing the set of progenies into a validation phase of a breeding pipeline.
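The two-stage selection described above can be sketched as follows. This is a minimal illustration, not the disclosed method: the `Progeny` fields, the function name, and the use of simple top-k ranking at each stage are assumptions, and the prediction scores are taken as already computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Progeny:
    progeny_id: str
    prediction_score: float      # assumed: probability of selection from historical data
    expected_performance: float  # assumed: any scalar performance estimate


def select_for_validation(pool, group_size, set_size):
    """Pick a group by prediction score, then a set by expected performance."""
    # Stage 1: keep the progenies most likely to be selected.
    group = sorted(pool, key=lambda p: p.prediction_score, reverse=True)[:group_size]
    # Stage 2: within that group, keep the best expected performers.
    chosen = sorted(group, key=lambda p: p.expected_performance, reverse=True)[:set_size]
    return [p.progeny_id for p in chosen]
```

The returned identifiers stand in for the set that would be directed into the validation phase. A progeny with a very high expected performance but a low prediction score never reaches stage 2, which is the point of filtering on the score first.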
Finding relatives in a database
Determining relative relationships of people who share a common ancestor within at least a threshold number of generations includes: receiving recombinable deoxyribonucleic acid (DNA) sequence information of a first user and recombinable DNA sequence information of a plurality of users; processing, using one or more computer processors, the recombinable DNA sequence information of the plurality of users in parallel; and determining, based at least in part on a result of processing the recombinable DNA sequence information of the plurality of users in parallel, a predicted degree of relationship between the first user and a second user among the plurality of users, the predicted degree of relationship corresponding to a number of generations within which the first user and the second user share a common ancestor.
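A rough sketch of the parallel comparison step is shown below. It assumes that the fraction of DNA each user shares with the first user has already been computed, and it converts that fraction to a generation count using the textbook expectation for full siblings and full cousins (about 1/2 for siblings, 1/8 for first cousins, 1/32 for second cousins). Neither the function names nor this mapping come from the abstract, which does not state how the degree is predicted.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def generations_to_common_ancestor(shared_fraction):
    """Estimate generations back to the shared ancestors.

    Assumes expected sharing of (1/2) ** (2 * g - 1) for a common ancestor
    pair g generations back, which gives g = (1 - log2(f)) / 2.
    """
    if shared_fraction <= 0:
        return None  # no detectable sharing
    return max(1, round((1 - math.log2(shared_fraction)) / 2))


def predict_relationships(shared_fractions, max_generations=6):
    """Process all candidate users in parallel and keep those within the threshold."""
    users = list(shared_fractions)
    with ThreadPoolExecutor() as pool:
        generations = list(pool.map(generations_to_common_ancestor,
                                    (shared_fractions[u] for u in users)))
    return {u: g for u, g in zip(users, generations)
            if g is not None and g <= max_generations}
```

The thread pool only illustrates the shape of a parallel pass over many users. For CPU-bound work in Python, a process pool or a vectorized implementation would be the more realistic choice.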
System and method for achieving high gene data resolution using training sets
Systems, methods, and computer program products for generating an enhanced set of sequences for taxonomical classification are disclosed. In various embodiments, a plurality of reference sequences are received. Each of the plurality of reference sequences corresponds to a taxonomical classification. A label corresponding to at least one of the reference sequences is assigned to each of a plurality of supplemental sequences. Each of the plurality of supplemental sequences and each of the plurality of reference sequences are truncated to a region of interest to thereby generate a truncated set of sequences. Similarity is measured between pairs of truncated sequences in the truncated set of sequences to determine whether the similarity is above a predetermined threshold. An intermediate taxonomical label is assigned to the pair of truncated sequences in the truncated set of sequences when the similarity is above the predetermined threshold to thereby generate an enhanced set of sequences.
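The truncate, compare, and relabel steps can be sketched as below. The sketch makes several assumptions that the abstract does not confirm: sequences are already aligned so a fixed slice marks the region of interest, similarity is the fraction of identical positions, and the intermediate label is simply the two original labels joined with a slash. All names are hypothetical.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def intermediate_labels(labeled_sequences, start, end, threshold=0.97):
    """Truncate each sequence to [start:end] and flag pairs above the threshold.

    labeled_sequences: dict mapping taxonomic label -> sequence string
    Returns a list of (label_a, label_b, intermediate_label) tuples.
    """
    truncated = {label: seq[start:end] for label, seq in labeled_sequences.items()}
    results = []
    for label_a, label_b in combinations(sorted(truncated), 2):
        if identity(truncated[label_a], truncated[label_b]) > threshold:
            # The pair cannot be told apart within the region of interest,
            # so it receives a shared, coarser label.
            results.append((label_a, label_b, f"{label_a}/{label_b}"))
    return results
```

Two taxa that differ only outside the region of interest end up with the same truncated sequence, so a classifier trained on that region could not separate them. Giving the pair a shared label records that limit explicitly.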