G16B10/00

DISPLAY OF ESTIMATED PARENTAL CONTRIBUTION TO ANCESTRY

Estimating parental contribution to ancestry includes: obtaining a set of ancestry assignment data associated with an individual's genotype data, at least some of the ancestry assignment data indicating that one or more segments of the individual's genotype data are deemed to be associated with a specific ancestry; determining whether in the individual's genotype data there is at least one confirmed region of overlapping ancestry assignment associated with the specific ancestry; in the event that it is determined that there is at least one confirmed region of overlapping ancestry assignment associated with the specific ancestry: specifying that parental contribution to the specific ancestry is made by both parents of the individual; in the event that it is determined that there is no confirmed region of overlapping ancestry assignment associated with the specific ancestry: statistically determining whether the parental contribution to the specific ancestry is made by only one parent of the individual or by both parents of the individual, the determination being based at least in part on one or more lengths of the one or more segments deemed to be associated with the specific ancestry; and outputting information pertaining to the parental contribution to the specific ancestry.
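
The two-branch decision in this abstract can be sketched as follows. This is a minimal illustration, not the patented method: it assumes that a "confirmed region of overlapping ancestry assignment" means the same ancestry is assigned to both chromosome copies at overlapping positions, and it leaves the statistical step to a caller-supplied function. The names Segment, has_overlap, parental_contribution, and length_rule are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Segment(NamedTuple):
    """One segment assigned to the ancestry of interest (hypothetical layout)."""
    chrom: str
    start: int  # inclusive start position
    end: int    # exclusive end position
    copy: int   # 0 or 1: which of the two chromosome copies carries the segment


def has_overlap(segments: List[Segment]) -> bool:
    """Return True if two segments on different copies overlap in position."""
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if (a.chrom == b.chrom and a.copy != b.copy
                    and a.start < b.end and b.start < a.end):
                return True
    return False


def parental_contribution(segments: List[Segment],
                          length_rule: Callable[[List[int]], str]) -> str:
    """Return "both" when an overlap is found; otherwise defer to length_rule.

    length_rule is an assumed placeholder for the statistical determination,
    which the abstract says depends on segment lengths but does not specify.
    """
    if has_overlap(segments):
        return "both"
    lengths = [s.end - s.start for s in segments]
    return length_rule(lengths)
```

The overlap check is quadratic in the number of segments, which is adequate for a sketch; sorting segments by chromosome and start position would be the usual way to scale it.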

Generating configurable text strings based on raw genomic data

A genomic data translation system can be configured to process next-generation sequencing information. The system can receive an output file including raw genomic data. The system can parse the output file to determine segments corresponding to individual chromosomes. The system can identify ranges of nucleotides and determine a first set of genes included in a human reference genome listing that fall within the ranges. The system can also maintain a gene list, and determine a matched set of genes that are included in both the gene list and the first set of genes. The system can generate a configurable text string including non-configurable regions and configurable regions. The configurable regions can be populated with text based on the raw genomic data, a set of translation rules, and a set of translation text strings.
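
The gene matching and text generation steps can be illustrated with a short sketch. Everything here is assumed for illustration: the gene names and coordinates are invented, "falls within" is interpreted as full containment of the gene in the range, and Python's standard string.Template stands in for whatever templating the system actually uses.

```python
from string import Template

# Hypothetical reference listing: gene -> (chromosome, start, end).
REFERENCE_GENES = {
    "GENE_A": ("chr1", 1000, 5000),
    "GENE_B": ("chr1", 8000, 12000),
    "GENE_C": ("chr2", 300, 900),
}

# Hypothetical maintained gene list (genes the report cares about).
GENE_LIST = {"GENE_A", "GENE_C"}


def genes_in_ranges(ranges, reference=REFERENCE_GENES):
    """Return the genes that lie entirely inside any (chrom, start, end) range."""
    found = set()
    for chrom, start, end in ranges:
        for gene, (g_chrom, g_start, g_end) in reference.items():
            if g_chrom == chrom and start <= g_start and g_end <= end:
                found.add(gene)
    return found


def build_report(ranges, template):
    """Fill the $genes placeholder with the matched set of genes."""
    first_set = genes_in_ranges(ranges)
    matched = sorted(first_set & GENE_LIST)  # genes in both sets
    text = ", ".join(matched) if matched else "none"
    return Template(template).substitute(genes=text)


# The literal text is the non-configurable region; $genes is configurable.
report = build_report(
    [("chr1", 0, 20000)],
    "Genes of interest in the reported ranges: $genes.",
)
```

In this toy input the range covers GENE_A and GENE_B, but only GENE_A is on the maintained list, so only GENE_A is written into the configurable region.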

Method To Identify Genes Under Positive Selection

A method and computer system for identifying genes associated with a phenotype includes obtaining data representing mutations in a cohort of subjects exhibiting a phenotype. An evolutionary action (EA) score is calculated for each mutation using the data obtained. For each gene in the cohort, respective distributions of the calculated EA scores are determined for mutations found in the gene. The determined distributions of EA scores are quantitatively compared within the cohort and with random distributions to establish comparison data. Based on the comparison data, distributions of EA scores are identified that are non-random, and linkage of each gene in the cohort to the phenotype is assessed based on the identified non-random distributions to identify genes associated with the phenotype. The phenotype can be a disease, such as cancer, and linkage of each gene in the cohort to the disease can be assessed to identify disease-causing genes.
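
One way to compare a gene's EA score distribution with a background distribution is a two-sample Kolmogorov-Smirnov statistic. The abstract does not say which comparison is used, so the statistic is a stand-in chosen for illustration, and the function names and scores below are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two non-empty samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def rank_genes(scores_by_gene, background):
    """Order genes by how far their score distribution is from the background."""
    stats = {
        gene: ks_statistic(scores, background)
        for gene, scores in scores_by_gene.items()
    }
    return sorted(stats.items(), key=lambda item: item[1], reverse=True)


# Toy data: GENE_X is skewed toward high scores, GENE_Y tracks the background.
background = [10, 20, 30, 40, 50, 60, 70, 80, 90]
ranked = rank_genes(
    {"GENE_X": [85, 90, 95], "GENE_Y": [20, 50, 80]},
    background,
)
```

A real analysis would also need a significance estimate and multiple-testing correction before calling a distribution non-random; the statistic alone only ranks genes.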

DISCOVERY OF BIOLOGICAL SIGNATURES OF OPTIMIZED SENSITIVITY AND SPECIFICITY
20220042114 · 2022-02-10

A bioinformatics method is provided for identifying candidate biological sequences, such as DNA, RNA, and proteins, with high sensitivity and specificity for application in procedures such as PCR and gene and protein sequencing. The method involves categorizing a collection of biological sequences within an out-group and an in-group, identifying the intersection between the in-group and the out-group, the union of the out-group, and a relative complement of sequences that are members of the in-group, but not the out-group. A biological signature for a species of interest with high sensitivity and specificity will be a member of the relative complement that has an out-group frequency of zero.
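
The set operations named in this abstract map directly onto Python sets. The sketch below assumes that candidate signatures are fixed-length subsequences (k-mers) and that high sensitivity means presence in every in-group sequence; both are illustrative assumptions, as are the function names and the toy sequences.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(in_group, out_group, k):
    """K-mers shared by every in-group sequence and absent from the out-group.

    in_group must contain at least one sequence.
    """
    in_sets = [kmers(s, k) for s in in_group]
    out_union = set().union(*(kmers(s, k) for s in out_group))
    shared_in = set.intersection(*in_sets)  # present in all in-group members
    return shared_in - out_union            # relative complement


signatures = candidate_signatures(
    in_group=["ACGTAC", "TACGTA"],
    out_group=["ACGGGT"],
    k=3,
)
```

Here ACG occurs in both in-group sequences but also in the out-group sequence, so it is excluded; CGT, GTA, and TAC remain because their out-group frequency is zero.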

Discovering Population Structure from Patterns of Identity-By-Descent
20170242957 · 2017-08-24

Described are techniques for determining population structure from identity-by-descent (IBD) of individuals. The techniques may be used to predict that an individual belongs to zero, one, or more of a number of communities identified within an IBD network. Additional data may be used to annotate the communities with birth location, surname, and ethnicity information. In turn, these data may be used to provide an individual with a prediction of membership in zero, one, or more communities, accompanied by a summary of the information annotated to those communities.

SYSTEMS AND METHODS FOR TRACKING AND IDENTIFYING INFECTION TRANSMISSION

The present disclosure describes systems and methods for determining sources of infection transmission. Phylogenetic methods are used for determining the evolutionary history and replication rates of infection isolates. The evolutionary distance and/or replication rate of an infection isolate may be compared to other isolates. Based on a comparison of the evolutionary distance and/or replication rate, a determination of the source of infection transmission is made.
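
The comparison of isolates by evolutionary distance can be sketched with a deliberately simple distance. The code below uses the Hamming distance between pre-aligned, equal-length sequences as a crude stand-in for a phylogenetic distance; the disclosure's actual methods are not specified in the abstract, and the function names and sequences are hypothetical.

```python
def hamming(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_isolate(query, candidates):
    """Return (name, distance) for the candidate isolate nearest to the query.

    candidates maps an isolate name to its aligned sequence.
    """
    distances = {name: hamming(query, seq) for name, seq in candidates.items()}
    name = min(distances, key=distances.get)
    return name, distances[name]


# Toy aligned sequences for a patient isolate and two possible sources.
source, distance = closest_isolate(
    "ACGTACGT",
    {"ward_A": "ACGTACGA", "ward_B": "TCGAACGA"},
)
```

The nearest isolate is only a candidate source. A small distance is consistent with recent transmission but does not establish its direction, which is one reason the disclosure also considers replication rate.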

Linking individual datasets to a database

The disclosed system links an individual dataset to a database. The system receives a target individual dataset associated with a target individual and identifies candidate individual datasets that are potentially related to the target individual dataset. The system identifies a related individual dataset that has data bits that match some data bits in the target individual dataset. The system then identifies a parent node that is a common parent node to both the target individual dataset and the related individual dataset. The system retrieves a data tree that the parent node belongs to with the data tree containing information describing inter-relationships among datasets in the data tree. A node in the data tree is identified to assign the target individual dataset based on strings of matched data bits and number of the matched strings between the target individual dataset and the datasets in the data tree.

FINDING RELATIVES IN A DATABASE

Determining relative relationships of people who share a common ancestor within at least a threshold number of generations includes: receiving recombinable deoxyribonucleic acid (DNA) sequence information of a first user and recombinable DNA sequence information of a plurality of users; processing, using one or more computer processors, the recombinable DNA sequence information of the plurality of users in parallel; determining, based at least in part on a result of processing the recombinable DNA sequence information of the plurality of users in parallel, a predicted degree of relative relationship between the first user and a second user among the plurality of users, the predicted degree of relative relationship corresponding to a number of generations within which the first user and the second user share a common ancestor.