Patent classifications
G16B40/30
SYSTEM AND METHOD FOR MELTING CURVE CLUSTERING
The present invention relates to methods and systems for analyzing nucleic acids present in biological samples. More specifically, it relates to clustering melt curves derived from high resolution thermal melt analysis performed on a sample of nucleic acids. In one embodiment, the resulting clusters can be used to analyze nucleic acid sequences and classify their genotypes, which is useful for determining the genotype of a nucleic acid present in a biological sample.
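The abstract does not say which clustering algorithm is used, so the following is only a minimal sketch of the general idea, using a simple distance-threshold ("leader") grouping as a stand-in. The function name `cluster_melt_curves`, the `threshold` parameter, and the assumption that every curve is sampled on the same temperature grid are all hypothetical.

```python
import math


def cluster_melt_curves(curves, threshold):
    """Group melt curves with a simple leader clustering (an assumed stand-in).

    Each curve is a list of fluorescence values sampled at the same
    temperatures. A curve joins the first cluster whose leader lies within
    `threshold` (Euclidean distance); otherwise it starts a new cluster.
    Returns one integer cluster label per curve.
    """
    leaders = []  # first curve seen in each cluster
    labels = []
    for curve in curves:
        for label, leader in enumerate(leaders):
            if math.dist(curve, leader) <= threshold:
                labels.append(label)
                break
        else:
            # No existing leader is close enough: open a new cluster.
            leaders.append(curve)
            labels.append(len(leaders) - 1)
    return labels


# Hypothetical, made-up curves: the first two are nearly identical.
example_labels = cluster_melt_curves(
    [[1.0, 0.50, 0.0], [1.0, 0.52, 0.0], [1.0, 0.90, 0.1]],
    threshold=0.1,
)
```

In a genotyping setting, each resulting label would be a candidate genotype group. The result of leader clustering depends on input order, so a real system would likely prefer a more robust method.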
TRACE RECONSTRUCTION FROM READS WITH INDETERMINANT ERRORS
Polynucleotide sequencing generates multiple reads of a polynucleotide molecule. Many or all of the reads contain errors. Trace reconstruction takes multiple reads generated by a polynucleotide sequencer and uses them to accurately reconstruct the nucleotide sequence of the polynucleotide molecule. Some reads may contain errors that cannot be corrected. Thus, some reads can be used along their entire length, while others have indeterminant errors that cannot be corrected. Rather than discarding the entire read when an indeterminant error is found, the portion of the read with the error is skipped and the sequence of the read following the error is used to reconstruct the trace. The amount of the read skipped is determined by the location of a subsequence after the error that matches a consensus sequence of the other reads. Analysis resumes at a location determined by the location of the match.
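The skip-and-resume step can be illustrated with a short sketch. It assumes the consensus of the other reads is already available as a short string, and that the resume point is the first exact match of that string after the error. The names `find_resume_position`, `consensus_window`, and `max_skip` are hypothetical and do not come from the abstract.

```python
def find_resume_position(read, error_pos, consensus_window, max_skip=10):
    """Return the index in `read` at which analysis could resume.

    `error_pos` is the index of the indeterminant error. The read is scanned
    from the next position for the first exact occurrence of
    `consensus_window` (the consensus of the other reads). At most
    `max_skip` positions past the error are considered. Returns None when
    no match is found, in which case the rest of the read is unusable.
    """
    width = len(consensus_window)
    start = error_pos + 1
    stop = min(len(read) - width, error_pos + max_skip)
    for i in range(start, stop + 1):
        if read[i:i + width] == consensus_window:
            return i
    return None


# Hypothetical read with an uncorrectable stretch starting at index 4.
read = "ACGTAAGGCATTGCA"
resume_at = find_resume_position(read, error_pos=4, consensus_window="TTGC")
skipped = read[4:resume_at] if resume_at is not None else read[4:]
```

Here the skipped portion is everything between the error and the match. An actual implementation might tolerate mismatches in the window rather than require an exact match.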
GENERATING PROTEIN SEQUENCES USING MACHINE LEARNING TECHNIQUES BASED ON TEMPLATE PROTEIN SEQUENCES
Systems and techniques are described to generate amino acid sequences of target proteins based on amino acid sequences of template proteins using machine learning techniques. The amino acid sequences of the target proteins can be generated based on data that constrains the modifications that can be made to the amino acid sequences of the template proteins. In illustrative examples, the template proteins can include antibodies produced by a non-human mammal that bind to an antigen and the target proteins can correspond to human antibodies with a region having at least a threshold amount of identity with the binding region of the template antibody. Generative adversarial networks can be used to produce the amino acid sequences of the target proteins.
Methods and systems for copy number variant detection
Methods and systems for determining copy number variants are disclosed. An example method can comprise applying a sample grouping technique to select reference coverage data, normalizing sample coverage data comprising a plurality of genomic regions, and fitting a mixture model to the normalized sample coverage data based on the selected reference coverage data. An example method can comprise identifying one or more copy number variants (CNVs) according to a Hidden Markov Model (HMM) based on the normalized sample coverage data and the fitted mixture model. An example method can comprise outputting the one or more copy number variants.
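The HMM step can be sketched as a Viterbi decode over normalized coverage values. This is a toy illustration rather than the disclosed method. The three states, the emission means, the shared standard deviation, and the transition probability are all assumed values; in the abstract the emission parameters would come from the fitted mixture model.

```python
import math

# All parameters below are illustrative assumptions, not values from the source.
STATES = ("DEL", "NORMAL", "DUP")
MEANS = {"DEL": 0.5, "NORMAL": 1.0, "DUP": 1.5}  # expected normalized coverage
SD = 0.15    # shared emission standard deviation
STAY = 0.9   # probability of remaining in the same state


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_trans(prev, cur):
    """Log transition probability between two states."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(coverage):
    """Return the most likely state for each genomic region."""
    if not coverage:
        return []
    score = {
        s: math.log(1 / len(STATES)) + log_gauss(coverage[0], MEANS[s], SD)
        for s in STATES
    }
    back = []  # back[i][s] = best predecessor of state s at region i + 1
    for x in coverage[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[prev] + log_trans(prev, s) + log_gauss(x, MEANS[s], SD)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up normalized coverage with a dip across three consecutive regions.
calls = viterbi([1.0, 1.02, 0.98, 0.5, 0.48, 0.52, 1.0, 1.01])
```

Runs of consecutive `DEL` or `DUP` states in the decoded path would be reported as candidate CNVs.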
Discovery systems for identifying entities that have a target property
Systems and methods for assaying a test entity for a property, without measuring the property, are provided. Exemplary test entities include proteins, protein mixtures, and protein fragments. Measurements of first features in a respective subset of an N-dimensional space and of second features in a respective subset of an M-dimensional space are obtained as training data for each reference entity in a plurality of reference entities. One or more of the second features is a metric for the target property. A subset of first features, or combinations thereof, is identified using feature selection. A model is trained on the subset of first features using the training data. Measurement values for the subset of first features for the test entity are applied to the model, thereby obtaining a model value that is compared to model values obtained using measured values of the subset of first features from reference entities exhibiting the property.
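The feature selection step is not specified in the abstract. One simple possibility, shown here only as an assumed illustration, is to rank each first feature by the absolute Pearson correlation between its values and the target-property metric across the reference entities, then keep the top `k`. The names `pearson` and `select_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(first_features, target_metric, k):
    """Return the names of the k features most correlated with the target metric.

    `first_features` maps a feature name to its measured values, one value
    per reference entity; `target_metric` holds the second-feature metric
    for the same entities in the same order.
    """
    ranked = sorted(
        first_features,
        key=lambda name: abs(pearson(first_features[name], target_metric)),
        reverse=True,
    )
    return ranked[:k]


# Made-up measurements for four reference entities.
selected = select_features(
    {"f1": [1, 2, 3, 4], "f2": [2, 1, 2, 1], "f3": [4, 3, 2, 1.5]},
    target_metric=[1.1, 2.0, 2.9, 4.2],
    k=2,
)
```

A model would then be trained only on the selected features, so the test entity needs measurements for those features alone.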
STORYTELLING VISUALIZATION OF GENEALOGY DATA IN A LARGE-SCALE DATABASE
A storytelling interface comprising a map panel and a genealogy panel, and methods for using the same, are described. The storytelling interface facilitates dynamic and automatic scaling and relocation of the map panel based on a user's location within the genealogy panel, which facilitates a continuous scrolling operation to navigate between different sections of the genealogy panel. The storytelling interface facilitates a user receiving, viewing, and interacting with DNA and ethnic communities results determined from DNA testing, and allows a user to navigate through pertinent communities in time and/or space.
Systems and methods for identifying cancer treatments from normalized biomarker scores
Techniques for generating therapy biomarker scores and visualizing the same are described. The techniques include: determining, using a patient's sequence data and distributions of biomarker values across one or more reference populations, a first set of normalized scores for a first set of biomarkers associated with a first therapy, and a second set of normalized scores for a second set of biomarkers associated with a second therapy; generating a graphical user interface (GUI) including a first portion associated with the first therapy and having at least one visual characteristic determined based on a normalized score of the respective biomarker in the first set of normalized scores, and a second portion associated with the second therapy and having at least one visual characteristic determined based on a normalized score of the respective biomarker in the second set of normalized scores; and displaying the generated GUI.
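The abstract does not define how scores are normalized. A common choice, used here purely as an assumption, is a z-score of the patient's biomarker value against the reference population distribution. The names `normalized_score` and `therapy_scores` and the biomarker labels are hypothetical.

```python
from statistics import mean, stdev


def normalized_score(patient_value, reference_values):
    """Z-score of a patient's value against a reference population.

    Assumes at least two reference values. Returns 0.0 when the reference
    values do not vary, to avoid dividing by zero.
    """
    sd = stdev(reference_values)
    if sd == 0:
        return 0.0
    return (patient_value - mean(reference_values)) / sd


def therapy_scores(patient, reference, therapy_biomarkers):
    """Compute a set of normalized biomarker scores for each therapy."""
    return {
        therapy: {b: normalized_score(patient[b], reference[b]) for b in biomarkers}
        for therapy, biomarkers in therapy_biomarkers.items()
    }


# Made-up biomarker values for illustration only.
scores = therapy_scores(
    patient={"bm_a": 12.0, "bm_b": 3.0},
    reference={"bm_a": [8.0, 10.0, 12.0], "bm_b": [1.0, 3.0, 5.0]},
    therapy_biomarkers={"therapy_1": ["bm_a"], "therapy_2": ["bm_b"]},
)
```

Each score could then drive a visual characteristic of its GUI portion, such as a color or bar length. That mapping is left out here.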