Patent classifications
G16B30/10
DEEP NEURAL NETWORK-BASED VARIANT PATHOGENICITY PREDICTION
The technology disclosed describes determination of which elements of a sequence are nearest to uniformly spaced cells in a grid, where the elements have element coordinates, and the cells have dimension-wise cell indices and cell coordinates. The determination includes generating an element-to-cells mapping that maps, to each of the elements, a subset of the cells. The subset of the cells mapped to a particular element in the sequence includes a nearest cell in the grid and one or more neighborhood cells in the grid, and the nearest cell is selected based on matching element coordinates of the particular element to the cell coordinates. The determination further includes generating a cell-to-elements mapping that maps, to each of the cells, a subset of the elements, and using the cell-to-elements mapping to determine, for each of the cells, a nearest element in the sequence.
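The two mappings can be sketched in Python under stated assumptions. The sketch uses 2-D element coordinates and a grid whose cell coordinates are `index * spacing`, with the origin at zero. The neighborhood is the set of cells within one index step of the nearest cell. The function names, the `radius` parameter, and the example coordinates are hypothetical illustrations, not taken from the disclosure.

```python
import itertools
import math
from collections import defaultdict


def nearest_cell(coord, spacing):
    """Dimension-wise index of the grid cell whose coordinate (index * spacing) is closest."""
    return tuple(round(c / spacing) for c in coord)


def element_to_cells(elements, spacing, shape, radius=1):
    """Map each element index to its nearest cell plus the in-grid cells within `radius` of it."""
    mapping = {}
    for i, coord in enumerate(elements):
        center = nearest_cell(coord, spacing)
        cells = []
        for offset in itertools.product(range(-radius, radius + 1), repeat=len(center)):
            cell = tuple(c + o for c, o in zip(center, offset))
            if all(0 <= x < n for x, n in zip(cell, shape)):  # keep only cells inside the grid
                cells.append(cell)
        mapping[i] = cells
    return mapping


def cell_to_elements(e2c):
    """Invert the element-to-cells mapping into a cell-to-elements mapping."""
    c2e = defaultdict(list)
    for i, cells in e2c.items():
        for cell in cells:
            c2e[cell].append(i)
    return dict(c2e)


def nearest_element_per_cell(elements, c2e, spacing):
    """For every cell that received candidates, pick the closest element (ties: lowest index)."""
    nearest = {}
    for cell, candidates in c2e.items():
        cell_coord = tuple(k * spacing for k in cell)
        nearest[cell] = min(candidates, key=lambda i: math.dist(elements[i], cell_coord))
    return nearest


# Usage: three 2-D elements on a 4x4 grid with unit spacing
elements = [(0.2, 0.1), (2.9, 3.1), (1.6, 0.4)]
e2c = element_to_cells(elements, spacing=1.0, shape=(4, 4))
c2e = cell_to_elements(e2c)
nearest = nearest_element_per_cell(elements, c2e, spacing=1.0)
```

Each cell considers only the elements that listed it as a neighbor. As a result, a cell outside every element's neighborhood, such as `(0, 3)` here, gets no nearest element. Widening `radius` trades extra candidates for coverage.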
METHODS FOR IDENTIFYING MICROBES IN A CLINICAL AND NON-CLINICAL SETTING
The present invention relates to a method for identifying a microorganism in a biological sample by polymerase chain reaction (PCR), comprising the steps of a) providing a biological sample suspected of comprising microbes, and optionally isolating nucleic acid sequences from said biological sample; b) PCR amplifying at least one microbial rRNA internal transcribed spacer (ITS) region comprised in said optionally isolated nucleic acid sequences using a set of broad-taxonomic range amplification primers to thereby generate PCR amplicons from nucleic acid sequences of microbial origin; c) recording a high resolution melting curve for the PCR amplicons, and recording the length of the PCR amplicons; d) comparing the high resolution melting curve with a database comprising high resolution melting curves of reference amplicons of known microbial species or strains, to thereby obtain a first identity indicator; e) comparing the length of each PCR amplicon having a distinct length with a database comprising PCR amplicon lengths of reference amplicons of known microbial species or strains, to thereby obtain a second identity indicator; and f) identifying the microorganism present in said sample to the species or strain level if the first and second identity indicators match.
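Steps d) to f) amount to two independent database lookups whose answers must agree. The sketch below makes two assumptions. Melting curves are sampled at the same temperatures and compared by root-mean-square difference. Amplicon lengths are compared within a tolerance in base pairs. The scoring rule, the `tol` value, and the reference entries are hypothetical placeholders, not values from the disclosure.

```python
import math


def curve_rmsd(a, b):
    """Root-mean-square difference between two melting curves sampled at the same temperatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def first_identity_indicator(curve, ref_curves):
    """Step d): reference species whose melting curve is closest to the sample curve."""
    return min(ref_curves, key=lambda sp: curve_rmsd(curve, ref_curves[sp]))


def second_identity_indicator(length, ref_lengths, tol=2):
    """Step e): reference species whose amplicon length lies within `tol` bp (closest wins)."""
    candidates = {sp: abs(length - ref_len) for sp, ref_len in ref_lengths.items()
                  if abs(length - ref_len) <= tol}
    return min(candidates, key=candidates.get) if candidates else None


def identify(curve, length, ref_curves, ref_lengths, tol=2):
    """Step f): report a species only when both indicators point to it."""
    first = first_identity_indicator(curve, ref_curves)
    second = second_identity_indicator(length, ref_lengths, tol)
    return first if first == second else None


# Usage with hypothetical reference entries
ref_curves = {"species_A": [1.0, 0.9, 0.5, 0.1], "species_B": [1.0, 0.7, 0.2, 0.0]}
ref_lengths = {"species_A": 412, "species_B": 530}
sample_curve = [1.0, 0.88, 0.48, 0.12]
```

`identify(sample_curve, 413, ref_curves, ref_lengths)` returns `"species_A"` because both indicators agree. With a length of 530, the curve still points to `species_A` while the length points to `species_B`, so it returns `None`.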
METHOD FOR DETERMINING A MEASURE CORRELATED TO THE PROBABILITY THAT TWO MUTATED SEQUENCE READS DERIVE FROM THE SAME SEQUENCE COMPRISING MUTATIONS
Disclosed is a computer-implemented method for determining a measure correlated to the probability that two mutated sequence reads derive from the same sequence comprising mutations. The method comprises receiving mutated sequence reads each corresponding to a subsequence of a sequence comprising mutations compared to a sequence not comprising mutations, applying a common minimizer function to each mutated sequence read to determine minimizers for each mutated sequence read, determining positions of the one or more minimizers in each mutated sequence read, determining positions of mutations in each mutated sequence read, and for at least two mutated sequence reads with a common minimizer, counting the number of mutations with matching position and/or mismatching position when the respective minimizers are aligned. Also disclosed is a corresponding method for determining at least a portion of a sequence of at least one target template nucleic acid molecule.
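A minimal sketch of the read comparison, under three assumptions. The common minimizer function is taken to be lexicographic (k, w) minimizers. Mutation offsets are treated as already determined for each read. Reads are aligned on a single shared minimizer. The parameters `k=3` and `w=2`, the anchor choice, and the example reads are hypothetical. Practical minimizer schemes often order k-mers by a hash rather than lexicographically.

```python
def minimizers(read, k, w):
    """Lexicographically smallest k-mer in every window of w consecutive k-mers -> {k-mer: first offset}."""
    found = {}
    n_kmers = len(read) - k + 1
    for start in range(max(n_kmers - w + 1, 0)):
        kmer, pos = min((read[i:i + k], i) for i in range(start, start + w))
        found.setdefault(kmer, pos)
    return found


def compare_reads(read_a, muts_a, read_b, muts_b, k=3, w=2):
    """Align two reads on a shared minimizer and count (matching, mismatching) mutation positions
    inside their overlap; returns None if the reads share no minimizer."""
    mins_a, mins_b = minimizers(read_a, k, w), minimizers(read_b, k, w)
    shared = sorted(set(mins_a) & set(mins_b))
    if not shared:
        return None
    anchor = shared[0]  # hypothetical choice: the lexicographically smallest shared minimizer
    offset = mins_a[anchor] - mins_b[anchor]  # shifts read_b offsets into read_a coordinates
    lo, hi = max(0, offset), min(len(read_a), len(read_b) + offset)  # overlap in read_a coordinates
    in_a = {p for p in muts_a if lo <= p < hi}
    in_b = {p + offset for p in muts_b if lo <= p + offset < hi}
    return len(in_a & in_b), len(in_a ^ in_b)


# Usage: read_b starts three bases into read_a; mutation offsets are hypothetical inputs
read_a = "GGTACATTGG"
read_b = "ACATTGGCC"
result = compare_reads(read_a, [1, 4, 8], read_b, [1, 3, 6])
```

In the example, `read_a`'s mutation at offset 1 lies outside the overlap and is ignored. Offset 4 is shared, and offsets 6, 8, and 9 (in `read_a` coordinates) appear in only one read, so `result` is `(1, 3)`.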
Automated database updating and curation
Systems and methods for retrieval of information from read-only databases that hold taxonomic-related and sequence-related data. A method may include receiving organism names from a taxonomy database and detecting new organism names. The method may also include retrieving hierarchical data and assigning the new organism names to buckets based on the hierarchical data. The method may further include receiving sequence data elements from a nucleotide database, identifying particular buckets to correspond to a screener data set, querying organism names assigned to the particular buckets with names of reference sequences of the sequence data elements, generating a mapping between the sequence data elements and organism names returned as a result of the queries, and storing the mapping.
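The curation flow can be sketched with in-memory dictionaries standing in for the taxonomy and nucleotide databases. An in-memory `sqlite3` table holds the stored mapping. Several choices are assumptions made for illustration: bucketing by family, case-insensitive substring matching as the query, and the table name and record names.

```python
import sqlite3


def detect_new_names(taxonomy_names, known_names):
    """Organism names present in the taxonomy snapshot but not yet curated."""
    return sorted(set(taxonomy_names) - set(known_names))


def assign_buckets(new_names, hierarchy, rank="family"):
    """Group new names by their ancestor at `rank` in the hierarchical data."""
    buckets = {}
    for name in new_names:
        bucket = hierarchy.get(name, {}).get(rank, "unassigned")
        buckets.setdefault(bucket, []).append(name)
    return buckets


def map_sequences(sequences, buckets, screener_buckets):
    """Query organism names from the screener's buckets against reference-sequence names."""
    candidates = [name for b in screener_buckets for name in buckets.get(b, [])]
    mapping = {}
    for accession, ref_name in sequences.items():
        hits = [name for name in candidates if name.lower() in ref_name.lower()]
        if hits:
            mapping[accession] = hits
    return mapping


def store_mapping(conn, mapping):
    """Persist the sequence-to-organism mapping (hypothetical table name)."""
    conn.execute("CREATE TABLE IF NOT EXISTS seq_org_map (accession TEXT, organism TEXT)")
    conn.executemany(
        "INSERT INTO seq_org_map VALUES (?, ?)",
        [(acc, org) for acc, orgs in mapping.items() for org in orgs],
    )
    conn.commit()


# Usage with placeholder records (not real database contents)
known = ["Escherichia coli"]
taxonomy = ["Escherichia coli", "Bacillus subtilis", "Listeria monocytogenes"]
hierarchy = {
    "Bacillus subtilis": {"family": "Bacillaceae"},
    "Listeria monocytogenes": {"family": "Listeriaceae"},
}
sequences = {
    "SEQ_0001": "Bacillus subtilis chromosome, complete genome",
    "SEQ_0002": "Listeria monocytogenes chromosome, complete genome",
}
buckets = assign_buckets(detect_new_names(taxonomy, known), hierarchy)
mapping = map_sequences(sequences, buckets, screener_buckets=["Bacillaceae"])
conn = sqlite3.connect(":memory:")
store_mapping(conn, mapping)
```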
MICROSIMULATION OF MULTI-CANCER EARLY DETECTION EFFECTS USING PARALLEL PROCESSING AND INTEGRATION OF FUTURE INTERCEPTED INCIDENCES OVER TIME
A simulation system performs microsimulations to model the impact of one or more early cancer detection screenings for a plurality of participants to simulate a randomized controlled trial (RCT). In one instance, the microsimulations are performed using parallel processing techniques. The microsimulation simulates the impact of early detection screenings on individual trajectories of the participants. In particular, while most screening modalities are for single cancer types, the microsimulation herein simulates the effect of a detection model on individual trajectories for participant populations having multiple types of cancer using, for example, multi-cancer early detection (MCED) screenings that are capable of detecting multiple types of cancer.
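A toy sketch of a parallel two-arm microsimulation. It assumes a single cancer per participant, a uniform preclinical onset, a fixed screening schedule, and per-type detection probabilities. Every rate and constant below is hypothetical. Each participant draws from its own seeded random stream, so the results do not depend on how work is scheduled across workers.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Hypothetical per-cancer-type detection probabilities for a single MCED screen
SENSITIVITY = {"colorectal": 0.7, "lung": 0.6, "pancreas": 0.4}
HORIZON_YEARS = 10
SCREEN_YEARS = (0, 1, 2)


def simulate_participant(pid, seed, screened):
    """One individual trajectory; returns None (no cancer) or (type, route, year of diagnosis)."""
    rng = random.Random(seed * 1_000_003 + pid)  # per-participant stream, independent of scheduling
    if rng.random() >= 0.05:  # hypothetical 5% incidence within the horizon
        return None
    ctype = rng.choice(sorted(SENSITIVITY))
    onset = rng.uniform(0, HORIZON_YEARS)  # start of the screen-detectable preclinical phase
    clinical = onset + rng.uniform(1, 4)  # hypothetical sojourn time before symptomatic diagnosis
    if screened:
        for year in SCREEN_YEARS:
            if onset <= year < clinical and rng.random() < SENSITIVITY[ctype]:
                return (ctype, "intercepted", year)
    return (ctype, "clinical", clinical)


def run_arm(pids, seed, screened, workers=4):
    """Simulate one trial arm in parallel and summarize cancers and screen interceptions."""
    task = partial(simulate_participant, seed=seed, screened=screened)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = [o for o in pool.map(task, pids) if o is not None]
    return {
        "cancers": len(outcomes),
        "intercepted": sum(1 for o in outcomes if o[1] == "intercepted"),
    }


# Usage: participants 0..4999 form the control arm, 5000..9999 the screened arm
control_arm = run_arm(range(0, 5000), seed=42, screened=False)
screened_arm = run_arm(range(5000, 10000), seed=42, screened=True)
```

`ThreadPoolExecutor` keeps the example portable. For CPU-bound runs, `ProcessPoolExecutor` can be swapped in, because a `partial` of a module-level function is picklable. On spawn-based platforms the entry point then needs an `if __name__ == "__main__":` guard.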
DEEP LEARNING-BASED USE OF PROTEIN CONTACT MAPS FOR VARIANT PATHOGENICITY PREDICTION
The technology disclosed relates to a variant pathogenicity classifier. The variant pathogenicity classifier comprises memory and runtime logic. The memory stores (i) a reference amino acid sequence of a protein, (ii) an alternative amino acid sequence of the protein that contains a variant amino acid caused by a variant nucleotide, and (iii) a protein contact map of the protein. The runtime logic has access to the memory, and is configured to provide (i) the reference amino acid sequence, (ii) the alternative amino acid sequence, and (iii) the protein contact map as input to a first neural network, and to cause the first neural network to generate a pathogenicity indication of the variant amino acid as output in response to processing (i) the reference amino acid sequence, (ii) the alternative amino acid sequence, and (iii) the protein contact map.
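The three-input interface can be illustrated with a deliberately tiny, untrained network in pure Python. Each residue's one-hot vector is summed with those of its contacting residues. The reference and alternative features at the variant position are then concatenated, and a ReLU layer followed by a sigmoid yields a score. This is a hypothetical stand-in for the "first neural network". The architecture, the weights, and the example sequence and contacts are not from the disclosure.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(aa):
    return [1.0 if aa == x else 0.0 for x in AMINO_ACIDS]


def contact_features(seq, contact_map):
    """One graph-convolution-style step: residue vector plus the sum of its contacting residues' vectors."""
    base = [one_hot(aa) for aa in seq]
    feats = []
    for i in range(len(seq)):
        agg = list(base[i])
        for j, in_contact in enumerate(contact_map[i]):
            if in_contact and j != i:
                agg = [a + b for a, b in zip(agg, base[j])]
        feats.append(agg)
    return feats


class ToyPathogenicityNet:
    """Untrained two-layer network with fixed random weights; a stand-in, not a trained model."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        d = 2 * len(AMINO_ACIDS)  # reference features + alternative features at the variant
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]

    def __call__(self, ref_seq, alt_seq, contact_map):
        pos = next((i for i, (r, a) in enumerate(zip(ref_seq, alt_seq)) if r != a), None)
        if pos is None:
            raise ValueError("reference and alternative sequences are identical")
        x = contact_features(ref_seq, contact_map)[pos] + contact_features(alt_seq, contact_map)[pos]
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU hidden layer
        logit = sum(w * hi for w, hi in zip(self.w2, h))
        return 1.0 / (1.0 + math.exp(-logit))  # pathogenicity indication in (0, 1)


# Usage: hypothetical 8-residue protein with a Y -> W substitution at position 4
ref = "MKTAYIAK"
alt = "MKTAWIAK"
contacts = {(1, 4), (4, 7)}  # hypothetical residue pairs in contact
n = len(ref)
contact_map = [[(i, j) in contacts or (j, i) in contacts for j in range(n)] for i in range(n)]
net = ToyPathogenicityNet()
score = net(ref, alt, contact_map)
```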
DNA alignment using a hierarchical inverted index table
System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constructed in an iterative manner, wherein entries for each lengthened subsequence are selectively and iteratively constructed based on the number of matches being greater than each of a set of respective thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization upon each respective candidate match.
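A minimal sketch of building and querying such a table. It assumes an uppercase ACGT reference, a base subsequence length `k`, a lengthening step of one base per level, and one threshold per level. These parameters, the function names, and the example reference are hypothetical. Misfits are reported as mismatch offsets only, with no insertions or deletions.

```python
import itertools


def build_index(reference, k=2, step=1, thresholds=(2, 2)):
    """Level k lists every ACGT subsequence of length k (zero-match ones included); each further
    level lengthens only the entries whose match count exceeds that level's threshold."""
    base = {"".join(p): [] for p in itertools.product("ACGT", repeat=k)}
    for i in range(len(reference) - k + 1):
        base[reference[i:i + k]].append(i)  # assumes an uppercase ACGT-only reference
    table, length = {k: base}, k
    for threshold in thresholds:
        parent = table[length]
        length += step
        level = {}
        for locs in parent.values():
            if len(locs) <= threshold:
                continue  # already specific enough; not lengthened
            for i in locs:
                if i + length <= len(reference):
                    level.setdefault(reference[i:i + length], []).append(i)
        table[length] = level
    return table


def candidate_locations(table, query):
    """Descend the levels with longer prefixes of the query; keep the most specific non-empty hit list."""
    best = []
    for length in sorted(table):
        if length > len(query):
            break
        locs = table[length].get(query[:length])
        if not locs:
            break
        best = locs
    return best


def characterize_misfits(reference, query, locations):
    """Mismatch offsets of the query at each candidate location (empty list = exact match)."""
    misfits = {}
    for i in locations:
        window = reference[i:i + len(query)]
        if len(window) == len(query):
            misfits[i] = [j for j, (q, r) in enumerate(zip(query, window)) if q != r]
    return misfits


# Usage
reference = "ACGTACGTTACGAACG"
table = build_index(reference)
hits = candidate_locations(table, "ACGTT")
misfits = characterize_misfits(reference, "ACGTT", hits)
```

In the example, only the over-represented entries `AC` and `CG` (four matches each) are lengthened to three bases. Only `ACG` is lengthened again to four bases. The query therefore narrows to candidates 0 and 4, giving `misfits == {0: [4], 4: []}`: one mismatch at offset 4 of the query, and one exact match.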