Patent classifications
G16B10/00
Finding relatives in a database
Determining relative relationships of people who share a common ancestor within at least a threshold number of generations includes: receiving recombinable deoxyribonucleic acid (DNA) sequence information of a first user and recombinable DNA sequence information of a plurality of users; processing, using one or more computer processors, the recombinable DNA sequence information of the plurality of users in parallel; and determining, based at least in part on a result of processing the recombinable DNA sequence information of the plurality of users in parallel, a predicted degree of relationship between the first user and a second user among the plurality of users, the predicted degree of relationship corresponding to a number of generations within which the first user and the second user share a common ancestor.
LINKING INDIVIDUAL DATASETS TO A DATABASE
The disclosed system links an individual dataset to a database. The system receives a target individual dataset associated with a target individual and identifies candidate individual datasets that are potentially related to the target individual dataset. The system identifies a related individual dataset that has data bits that match some data bits in the target individual dataset. The system then identifies a parent node that is a common parent node to both the target individual dataset and the related individual dataset. The system retrieves a data tree that the parent node belongs to with the data tree containing information describing inter-relationships among datasets in the data tree. A node in the data tree is identified to assign the target individual dataset based on strings of matched data bits and number of the matched strings between the target individual dataset and the datasets in the data tree.
Methods and systems for determining ancestral relatedness
The present disclosure provides methods of estimating a degree of ancestral relatedness between individuals. In an aspect, a method comprises receiving haplotype data comprising genetic markers shared among a population of individuals; dividing the haplotype data into segments based on the genetic markers; for each individual of the population: (i) based on the genetic markers, matching segments of the haplotype data that are identical-by-descent between two individuals, (ii) for each of the matched segments, dividing the matched segment into discrete genomic intervals, scoring each of the discrete genomic intervals based on a degree of matching within or between the individuals, and correcting the scores for consistency, and (iii) calculating a weighted sum over the discrete genomic intervals of the matched segment, based on the corrected scores and assigned weights; and estimating the degree of ancestral relatedness between the individuals based on the weighted sums of the matched segments.
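The weighted-sum step of this abstract can be illustrated with a minimal sketch. It assumes the per-interval scores have already been corrected and that the per-segment sums are combined by plain addition. The abstract specifies neither the correction nor the final estimator, so the function names and the combination rule here are hypothetical.

```python
def segment_weighted_sum(corrected_scores, weights):
    """Weighted sum over the discrete genomic intervals of one matched segment."""
    if len(corrected_scores) != len(weights):
        raise ValueError("one weight is required per interval")
    return sum(score * weight for score, weight in zip(corrected_scores, weights))


def relatedness_statistic(segments):
    """Combine per-segment weighted sums.

    Plain addition is an assumption made for illustration; the abstract only
    says the estimate is 'based on' the weighted sums.
    """
    return sum(segment_weighted_sum(scores, weights) for scores, weights in segments)


# Toy input: each matched segment is (corrected interval scores, interval weights).
segments = [
    ([1.0, 0.5], [2.0, 4.0]),
    ([0.5, 0.5, 1.0], [2.0, 2.0, 1.0]),
]
print(relatedness_statistic(segments))
```

Keeping the per-segment sum separate from the final combination makes it straightforward to swap in a different estimator once the actual weighting scheme is known.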
SYSTEMS AND METHODS FOR INFERRING GENETIC ANCESTRY FROM LOW-COVERAGE GENOMIC DATA
A computer-implemented method for inferring genetic ancestry from low-coverage genomic data may include (i) generating a reference matrix representing a genetic reference panel in terms of dosages for given reference samples at given loci, (ii) decomposing the reference matrix via non-negative matrix factorization into an ancestral genotype matrix and an ancestral attribution matrix, (iii) resampling the reference matrix, (iv) deriving an ancestral alternate reads matrix that, when multiplied with the ancestral attribution matrix, approximates the resampled reference matrix, (v) deriving an ancestral attribution vector that, when multiplied with the ancestral alternate reads matrix, approximates a vector representing the test sample, and (vi) determining the genetic ancestry of the subject based on the ancestral attribution vector. Various other methods, systems, and computer-readable media are also disclosed.
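Step (ii), the non-negative matrix factorization, can be sketched as follows. This is an illustration only: it uses the standard Lee-Seung multiplicative updates for the Frobenius objective, which the abstract does not name. The matrix orientation (samples by loci), the toy dosage values, and all function names are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(reference, rank, iterations=200, seed=0, eps=1e-9):
    """Factor reference (samples x loci) into attribution (samples x rank)
    and genotypes (rank x loci), both non-negative."""
    rng = random.Random(seed)
    n_samples, n_loci = len(reference), len(reference[0])
    attribution = [[rng.random() + eps for _ in range(rank)] for _ in range(n_samples)]
    genotypes = [[rng.random() + eps for _ in range(n_loci)] for _ in range(rank)]
    for _ in range(iterations):
        # genotypes <- genotypes * (A^T V) / (A^T A G)
        a_t = transpose(attribution)
        num = matmul(a_t, reference)
        den = matmul(matmul(a_t, attribution), genotypes)
        genotypes = [[genotypes[i][j] * num[i][j] / (den[i][j] + eps)
                      for j in range(n_loci)] for i in range(rank)]
        # attribution <- attribution * (V G^T) / (A G G^T)
        g_t = transpose(genotypes)
        num = matmul(reference, g_t)
        den = matmul(attribution, matmul(genotypes, g_t))
        attribution = [[attribution[i][j] * num[i][j] / (den[i][j] + eps)
                        for j in range(rank)] for i in range(n_samples)]
    return attribution, genotypes


def frobenius_error(reference, attribution, genotypes):
    approx = matmul(attribution, genotypes)
    return sum((v - a) ** 2 for rv, ra in zip(reference, approx)
               for v, a in zip(rv, ra)) ** 0.5


# Toy dosage matrix (values 0..2): four reference samples at four loci.
reference = [
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
]
attribution, genotypes = nmf(reference, rank=2)
print(frobenius_error(reference, attribution, genotypes))
```

The later steps of the method (resampling the reference matrix and deriving the alternate reads matrix and attribution vector) are not covered by this sketch.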
OVARIAN CARCINOMA DETECTION AND PROPHYLAXIS
The evolutionary origin of high-grade serous ovarian carcinoma remains largely unknown. The vast majority of tumor-specific genomic alterations from ovarian cancers are present in fallopian tube STIC lesions (average of 55 sequence alterations per tumor), including those affecting TP53, BRCA1, BRCA2 or PTEN genes. A quantitative evolutionary analysis indicated that tumors of the fallopian tube were the likely precursors of ovarian cancer and could directly give rise to metastatic lesions. These analyses suggest that there may be less than two years between the development of a STIC and the initiation of fallopian tube tumors, ovarian tumors or other metastases. Thus there may be a short window between the development of a STIC and the initiation of ovarian tumors or other metastases, highlighting the importance of the prevention, early detection and therapeutic intervention of this disease.
ANCESTRY PAINTING
Displaying an indication of ancestral data is disclosed. An indication that a genetic interval corresponds to a reference interval that has a likelihood of having one or more ancestral origins is received. One or more graphic display parameters are determined based at least in part on the indication. An indication of the one or more ancestral origins is visually displayed using the one or more graphic display parameters.
DETERMINING FAMILY CONNECTIONS OF INDIVIDUALS IN A DATABASE
Determining relative connections between individuals includes: obtaining identification information of a first individual and identification information of a second individual; determining, based at least in part on a relative connections graph, a relative connections path connecting the first individual, the second individual, and at least one additional individual; and outputting information pertaining to the relative connections path.
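One way to picture the path determination is a breadth-first search over the relative connections graph. The sketch below assumes the graph is an undirected adjacency mapping between individuals and that the shortest path is the one wanted. Neither detail comes from the abstract, and the names are hypothetical.

```python
from collections import deque


def relative_connections_path(graph, first, second):
    """Return the shortest chain of individuals linking first to second, or None."""
    if first not in graph or second not in graph:
        return None
    previous = {first: None}  # maps each visited individual to its predecessor
    queue = deque([first])
    while queue:
        current = queue.popleft()
        if current == second:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(current, ()):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


# Toy graph: an edge means two individuals are known relatives.
graph = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}
print(relative_connections_path(graph, "alice", "dave"))
```

In this toy graph the path from "alice" to "dave" passes through two additional individuals, matching the abstract's requirement that the path include at least one.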
ESTIMATION OF ADMIXTURE GENERATION
Admixture generation determination includes: obtaining ancestry assignment information associated with an individual's genotype data, the ancestry assignment information at least indicating that a portion of the individual's genotype data is deemed to be associated with a specific ancestry; determining the individual's genetic ancestry summary data corresponding to the specific ancestry; estimating an admixture generation associated with the specific ancestry, the admixture generation indicating a most recent generation or a most recent generation range from which the individual has at least one non-admixed ancestor of the specific ancestry, the estimation including a maximum likelihood determination based at least in part on the individual's genetic ancestry summary data and a recombination model; and outputting the estimated admixture generation.
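The maximum likelihood determination can be illustrated under a deliberately simple recombination model. The sketch assumes that ancestry tract lengths, measured in Morgans, are independent and exponentially distributed with a rate equal to the number of generations since admixture. The abstract does not specify its recombination model, so this model, the choice of summary data (tract count and total tract length), and the function names are all assumptions.

```python
import math


def log_likelihood(generations, tract_count, total_length_morgans):
    """Log-likelihood of i.i.d. exponential tract lengths with rate = generations."""
    return tract_count * math.log(generations) - generations * total_length_morgans


def estimate_admixture_generation(tract_count, total_length_morgans, max_generations=50):
    """Grid search over integer generations for the maximum likelihood value."""
    if tract_count <= 0 or total_length_morgans <= 0:
        raise ValueError("summary data must be positive")
    return max(
        range(1, max_generations + 1),
        key=lambda g: log_likelihood(g, tract_count, total_length_morgans),
    )


# Toy summary data: 4 tracts of the specific ancestry totalling 1.0 Morgan.
print(estimate_admixture_generation(tract_count=4, total_length_morgans=1.0))
```

Under this model the continuous maximum sits at the tract count divided by the total tract length, so the grid search only rounds that value to a whole generation within the allowed range.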