Patent classifications
G16B50/00
DEEP LEARNING-BASED USE OF PROTEIN CONTACT MAPS FOR VARIANT PATHOGENICITY PREDICTION
The technology disclosed relates to a variant pathogenicity classifier. The variant pathogenicity classifier comprises memory and runtime logic. The memory stores (i) a reference amino acid sequence of a protein, (ii) an alternative amino acid sequence of the protein that contains a variant amino acid caused by a variant nucleotide, and (iii) a protein contact map of the protein. The runtime logic has access to the memory, and is configured to provide (i) the reference amino acid sequence, (ii) the alternative amino acid sequence, and (iii) the protein contact map as input to a first neural network, and to cause the first neural network to generate a pathogenicity indication of the variant amino acid as output in response to processing (i) the reference amino acid sequence, (ii) the alternative amino acid sequence, and (iii) the protein contact map.
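The abstract names the three inputs but not how they are encoded. The following is a minimal sketch of input preparation only. It assumes one-hot residue encoding and a binary contact map in which two residues are in contact when their C-alpha atoms lie within 8 Å, a common convention that the source does not state. The neural network itself is not shown, and every function name is hypothetical.

```python
import math

# Sketch only: encodings and the 8 Å contact cutoff are assumptions, not the patent's.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary order


def one_hot(sequence):
    """Encode an amino acid sequence as a list of 20-dimensional one-hot rows."""
    return [[1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS] for residue in sequence]


def contact_map(ca_coords, threshold=8.0):
    """Binary L x L map: 1 where two distinct residues' C-alpha atoms are closer than `threshold`."""
    n = len(ca_coords)
    return [
        [1 if i != j and math.dist(ca_coords[i], ca_coords[j]) < threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def build_classifier_input(reference, alternative, ca_coords):
    """Bundle the reference sequence, alternative sequence and contact map after consistency checks."""
    if len(reference) != len(alternative):
        raise ValueError("reference and alternative sequences must have equal length")
    variant_sites = [i for i, (r, a) in enumerate(zip(reference, alternative)) if r != a]
    if len(variant_sites) != 1:
        raise ValueError("expected exactly one variant amino acid")
    cmap = contact_map(ca_coords)
    if len(cmap) != len(reference):
        raise ValueError("contact map size must match sequence length")
    site = variant_sites[0]
    return {
        "reference": one_hot(reference),
        "alternative": one_hot(alternative),
        "contact_map": cmap,
        "variant_position": site,
        # Residues spatially close to the variant, possibly far apart in sequence:
        # the structural context a network could use beyond sequence neighbours.
        "variant_contacts": [j for j, c in enumerate(cmap[site]) if c],
    }
```

For example, `build_classifier_input("ACD", "AWD", coords)` flags position 1 as the variant and lists the residues in contact with it.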
Computing system for normalizing computer-readable genetic test results from numerous different sources
A computer-executable application receives genetic test results for a genetic test a patient has undergone, an identifier for the genetic test, and an identifier for the genetic laboratory that performed the genetic test. The application identifies a format type of the genetic test results. When the format type is unstructured, the application performs an optical character recognition process on the genetic test results so that their format type becomes semi-structured. When the format type is semi-structured, the application identifies a set of lexing and parsing rules assigned to the genetic test. The application generates processed genetic test results by applying the set of lexing and parsing rules to the genetic test results and stores the processed genetic test results in a data store. When the format type is structured, the application stores the genetic test results in the data store as the processed genetic test results.
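The three-way routing by format type can be sketched as below. This sketch assumes that lexing and parsing rules are regular expressions keyed by the genetic test identifier, that the OCR step is a caller-supplied function, and that the data store is a plain list. The rule registry, field names and `TEST-001` identifier are hypothetical.

```python
import re
from enum import Enum


class FormatType(Enum):
    UNSTRUCTURED = "unstructured"        # e.g. a scanned or faxed report image
    SEMI_STRUCTURED = "semi-structured"  # text with a lab- and test-specific layout
    STRUCTURED = "structured"            # already machine-readable records


# Hypothetical rule registry: genetic test identifier -> (field name, regex) pairs.
LEXING_RULES = {
    "TEST-001": [
        ("gene", r"Gene:\s*(\w+)"),
        ("variant", r"Variant:\s*(\S+)"),
        ("classification", r"Classification:\s*([\w ]+)"),
    ],
}


def apply_rules(text, rules):
    """Extract each field whose pattern matches; unmatched fields are omitted."""
    fields = {}
    for name, pattern in rules:
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1).strip()
    return fields


def normalize(results, test_id, lab_id, format_type, data_store, ocr):
    """Route results by format type and store the processed form.

    `ocr` is a hypothetical caller-supplied function that turns an
    unstructured document into semi-structured text.
    """
    if format_type is FormatType.UNSTRUCTURED:
        results = ocr(results)
        format_type = FormatType.SEMI_STRUCTURED  # OCR output is handled as semi-structured
    if format_type is FormatType.SEMI_STRUCTURED:
        processed = apply_rules(results, LEXING_RULES[test_id])
    else:
        processed = results  # structured results are stored as-is
    data_store.append({"lab_id": lab_id, "test_id": test_id, "results": processed})
    return processed
```

Falling through from the unstructured branch into the semi-structured branch mirrors the abstract: OCR output is then treated exactly like any other semi-structured report.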
DNA alignment using a hierarchical inverted index table
System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries for an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of that subsequence in the reference data. The hierarchical index table may be constructed iteratively, wherein entries for each lengthened subsequence are constructed selectively, based on whether the number of matches exceeds the respective threshold in a set of thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization on each candidate match.
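A minimal sketch of the iterative construction, assuming that each level lengthens the subsequence by a fixed `step` and that a subsequence is extended only when its match count exceeds that level's threshold. "Misfit" is taken here to mean a substitution mismatch at a candidate position. All names and parameters are hypothetical.

```python
from collections import defaultdict


def build_level(reference, k, starts=None):
    """Map each length-k subsequence to its start positions (only `starts`, if given)."""
    index = defaultdict(list)
    positions = range(len(reference) - k + 1) if starts is None else starts
    for i in positions:
        if i + k <= len(reference):
            index[reference[i:i + k]].append(i)
    return dict(index)


def build_hierarchical_index(reference, base_k, thresholds, step):
    """Level d indexes subsequences of length base_k + d * step.

    Only positions whose shorter subsequence had more matches than the
    level's threshold are re-indexed at the next, longer length.
    """
    levels = [build_level(reference, base_k)]
    k = base_k
    for threshold in thresholds:
        frequent = sorted(p for locs in levels[-1].values() if len(locs) > threshold for p in locs)
        if not frequent:
            break
        k += step
        levels.append(build_level(reference, k, frequent))
    return levels


def candidate_matches(levels, base_k, step, query):
    """Return candidate positions from the longest indexed prefix of `query`."""
    for depth in range(len(levels) - 1, -1, -1):
        k = base_k + depth * step
        if len(query) >= k and query[:k] in levels[depth]:
            return levels[depth][query[:k]]
    return []


def misfits(reference, query, position):
    """List (offset, reference_char, query_char) mismatches at a candidate.

    Note: zip truncates, so a candidate running past the reference end is only partly checked.
    """
    window = reference[position:position + len(query)]
    return [(i, r, q) for i, (r, q) in enumerate(zip(window, query)) if r != q]
```

Deeper entries stay exhaustive for the subsequences they cover: every occurrence of a longer subsequence begins with the same shorter subsequence, which was frequent, so all of its positions were re-indexed.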
SYSTEMS, METHODS, AND APPARATUSES FOR SEQUENCE ALIGNMENT
Systems, methods, and apparatuses are disclosed for reducing the computational time of assigning a species to an infection isolate. A method for dividing a search index into one or more sub-indices based on a phylogenetic tree of reference sequences is disclosed. A method for dividing reads into test sets and aligning to sub-indices for assigning a species to an infection isolate is disclosed. A system for aligning sequence reads to a database of reference sequences using sub-indices is disclosed.
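The sketch below descends a phylogenetic tree, building one sub-index per clade and scoring a different test set of reads at each level. It assumes a nested-dictionary tree format and uses k-mer containment counts as a stand-in for real read alignment. It is not the disclosed aligner, and all names are hypothetical.

```python
def kmers(sequence, k):
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def leaf_sequences(node):
    """All reference sequences in the clade rooted at `node`."""
    if not node.get("children"):
        return [node["sequence"]]
    return [s for child in node["children"] for s in leaf_sequences(child)]


def build_sub_indices(node, k, indices=None):
    """One k-mer set per tree node, covering every reference sequence in its clade."""
    if indices is None:
        indices = {}
    indices[node["name"]] = {km for seq in leaf_sequences(node) for km in kmers(seq, k)}
    for child in node.get("children", []):
        build_sub_indices(child, k, indices)
    return indices


def assign_species(reads, tree, k, test_set_size):
    """Descend the tree, scoring one test set of reads against each child's sub-index."""
    indices = build_sub_indices(tree, k)
    test_sets = [reads[i:i + test_set_size] for i in range(0, len(reads), test_set_size)]
    node, depth = tree, 0
    while node.get("children"):
        test_set = test_sets[depth % len(test_sets)]  # a different subset of reads per level

        def score(child):
            # Stand-in for alignment: count read k-mers present in the child's sub-index.
            index = indices[child["name"]]
            return sum(km in index for read in test_set for km in kmers(read, k))

        node = max(node["children"], key=score)  # ties go to the first child
        depth += 1
    return node["name"]
```

Only the sub-indices along one root-to-leaf path are consulted, and each uses a small test set, which is where the time savings described in the abstract would come from.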
METHOD AND SYSTEM FOR COMPRESSING GENOME SEQUENCES USING GRAPHIC PROCESSING UNITS
The present invention provides a method for compressing genome sequence reads using a graphics processing unit (GPU). The method comprises the steps of: identifying the position of each read's character string in a reference genome sequence; determining the alignment of each read string within the reference genome; comparing each read's character string to the corresponding reference genome sequence based on the determined alignment; filtering the characters in each read on the GPU by eliminating characters that match the reference and extracting only the differing characters together with their positions in the genome sequence; and recording the filtered data of each read, in association with its alignment to the reference genome, in the compressed genome database.
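The filtering step can be illustrated with a minimal CPU-only sketch that stores only each read's alignment position, its length and the characters that differ from the reference. It assumes substitution-only differences (no insertions or deletions) and replaces a real aligner with a brute-force mismatch scan. No GPU code is shown, and all names are hypothetical.

```python
def best_alignment(read, reference):
    """Brute-force, substitution-only alignment: the offset with the fewest mismatches.

    Stand-in for a real read aligner; ties go to the leftmost offset.
    """
    best_pos, best_mismatches = None, None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if best_mismatches is None or mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def compress_read(read, reference):
    """Keep only the alignment position, the read length and the differing characters."""
    pos = best_alignment(read, reference)
    window = reference[pos:pos + len(read)]
    # Each character comparison is independent of the others (and of other reads),
    # which is the data-parallel step a GPU implementation would distribute.
    diffs = [(i, c) for i, (c, r) in enumerate(zip(read, window)) if c != r]
    return pos, len(read), diffs


def decompress_read(record, reference):
    """Rebuild the read from the reference window plus the stored differences."""
    pos, length, diffs = record
    chars = list(reference[pos:pos + length])
    for i, c in diffs:
        chars[i] = c
    return "".join(chars)
```

A read identical to the reference compresses to just its position and length, so compression improves as reads become more similar to the reference.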
Efficient polymer synthesis
The efficiency of polymer synthesis is increased by reducing the number of monomer addition cycles needed to create a set of polymer strands. The number of cycles depends on the sequences of the polymer strands and the order in which each type of monomer is made available for addition to the growing strands. Efficiencies are created by grouping the polymer strands into batches such that all the strands in a batch require a similar number of cycles to synthesize. Efficiencies are also created by selecting an order in which the monomers are made available for addition to the growing polymer strands in a batch. Both techniques can be used together. With these techniques, the number of cycles of monomer addition and commensurate reagent use may be reduced by over 10% as compared to naïve synthesis techniques.
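Both techniques can be sketched as follows. This assumes that monomers are offered in a repeating order, one type per cycle, and that a strand gains at most one monomer per cycle (as with reversibly terminated monomers). It also assumes that a batch is finished only when its slowest strand is finished. The exhaustive search over orders is a simple stand-in for whatever selection method the disclosure uses, and all names are hypothetical.

```python
from itertools import permutations


def cycles_needed(strand, order):
    """Cycles to synthesize `strand` when monomers are offered in repeating `order`.

    Assumes each cycle offers one monomer type and adds at most one monomer to a strand.
    """
    n = len(order)
    cycle = 0
    for monomer in strand:
        while order[cycle % n] != monomer:  # wait until this monomer is offered
            cycle += 1
        cycle += 1  # the monomer is added in this cycle
    return cycle


def batch_by_cycles(strands, order, batch_size):
    """Group strands needing similar cycle counts so no batch waits on one long outlier."""
    ranked = sorted(strands, key=lambda s: cycles_needed(s, order))
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]


def best_order(batch, monomers):
    """Exhaustively pick the repeating order that minimizes the batch's cycle count."""
    candidates = ("".join(p) for p in permutations(monomers))
    return min(candidates, key=lambda o: max(cycles_needed(s, o) for s in batch))


def total_cycles(batches, orders):
    """Sum over batches of the slowest strand's cycle count under that batch's order."""
    return sum(max(cycles_needed(s, o) for s in batch) for batch, o in zip(batches, orders))
```

For four monomers there are only 24 repeating orders, so an exhaustive search per batch is cheap. For larger alphabets a heuristic search would be needed.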
Finding Relatives in a Database
Determining relative relationships of people who share a common ancestor within at least a threshold number of generations includes: receiving recombinable deoxyribonucleic acid (DNA) sequence information of a first user and recombinable DNA sequence information of a plurality of users; processing, using one or more computer processors, the recombinable DNA sequence information of the plurality of users in parallel; and determining, based at least in part on a result of processing the recombinable DNA sequence information of the plurality of users in parallel, a predicted degree of relationship between the first user and a second user among the plurality of users, the predicted degree of relationship corresponding to a number of generations within which the first user and the second user share a common ancestor.
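The sketch below compares one user against many users in parallel and maps shared DNA to a predicted degree of relationship. It rests on several simplifying assumptions, none of them from the source. Genotypes are coded 0/1/2 at sites with known genetic-map positions in centimorgans (cM). A shared segment is a run with no opposite homozygotes (0 vs 2) of at least 7 cM. The degree is estimated from the heuristic that half-identical coverage roughly halves with each degree (1.0 for parent-child, 0.5 for second degree). Threads keep the example simple; for CPU-bound Python a process pool would give true parallelism. All names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def shared_segments(g1, g2, positions_cm, min_cm):
    """Runs with no opposite homozygotes (0 vs 2), i.e. candidate half-identical segments."""
    segments, start = [], None
    for i, (a, b) in enumerate(zip(g1, g2)):
        opposite = abs(a - b) == 2
        if not opposite and start is None:
            start = i
        elif opposite and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(g1) - 1))
    return [(s, e) for s, e in segments if positions_cm[e] - positions_cm[s] >= min_cm]


def predicted_degree(shared_fraction):
    """Heuristic degree: coverage ~ 2 ** -(degree - 1); None when nothing is shared."""
    if shared_fraction <= 0:
        return None
    return max(1, 1 + round(-math.log2(shared_fraction)))


def find_relatives(first, others, positions_cm, max_degree, min_cm=7.0):
    """Return {user: predicted degree} for users within `max_degree` of `first`."""
    total_cm = positions_cm[-1] - positions_cm[0]

    def compare(item):
        name, genotypes = item
        segs = shared_segments(first, genotypes, positions_cm, min_cm)
        shared = sum(positions_cm[e] - positions_cm[s] for s, e in segs)
        return name, predicted_degree(shared / total_cm)

    # Each pairwise comparison is independent, so users are processed in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(compare, others.items()))
    return {name: d for name, d in results if d is not None and d <= max_degree}
```

The `max_degree` cut-off plays the role of the threshold number of generations in the abstract. Real relative-finding would also need phasing, error tolerance and population-specific calibration.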