Patent classifications
G16B30/10
DNA alignment using a hierarchical inverted index table
System and method for constructing a hierarchical index table usable for matching a search sequence to reference data. The index table may be constructed to contain entries associated with an exhaustive list of all subsequences of a given length, wherein each entry contains the number and locations of matches of each subsequence in the reference data. The hierarchical index table may be constructed in an iterative manner, wherein entries for each lengthened subsequence are selectively and iteratively constructed based on the number of matches being greater than each of a set of respective thresholds. The hierarchical index table may be used to search for matches between a search sequence and reference data, and to perform misfit identification and characterization upon each respective candidate match.
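The iterative construction described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the one-base-per-level lengthening rule, and the single threshold per level are assumptions made for illustration, and the base level here stores only subsequences that actually occur in the reference rather than an exhaustive list.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map each length-k subsequence found in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)

def build_hierarchical_index(reference, base_k, thresholds):
    """Build one extra level per threshold (hypothetical scheme).

    An entry is lengthened by one base only when its number of matches
    exceeds the threshold for that level.
    """
    levels = [build_index(reference, base_k)]
    for depth, threshold in enumerate(thresholds):
        k = base_k + depth  # subsequence length at the current level
        next_level = defaultdict(list)
        for positions in levels[-1].values():
            if len(positions) <= threshold:
                continue  # few enough matches, so no refinement is needed
            for pos in positions:
                if pos + k + 1 <= len(reference):
                    next_level[reference[pos:pos + k + 1]].append(pos)
        levels.append(dict(next_level))
    return levels

levels = build_hierarchical_index("ACGTACGTAC", base_k=2, thresholds=[2])
print(levels[0]["AC"])  # start positions of the 2-mer "AC"
print(levels[1])        # only the over-threshold entry is lengthened
```

Each entry keeps both the number of matches (the list length) and their locations, so a lookup can stop at the first level whose match count is small enough to examine directly.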
Methods for Sequence-Directed Molecular Breeding
The present invention provides breeding methods and compositions to enhance the germplasm of a plant by the use of direct nucleic acid sequence information. The methods describe the identification and accumulation of preferred nucleic acid sequences in the germplasm of a breeding population of plants.
SYSTEMS, METHODS, AND APPARATUSES FOR SEQUENCE ALIGNMENT
Systems, methods, and apparatuses are disclosed for reducing the computational time of assigning a species to an infection isolate. A method for dividing a search index into one or more sub-indices based on a phylogenetic tree of reference sequences is disclosed. A method for dividing reads into test sets and aligning to sub-indices for assigning a species to an infection isolate is disclosed. A system for aligning sequence reads to a database of reference sequences using sub-indices is disclosed.
BAMBAM: PARALLEL COMPARATIVE ANALYSIS OF HIGH-THROUGHPUT SEQUENCING DATA
The present invention relates to methods for evaluating and/or predicting the outcome of a clinical condition, such as cancer, metastasis, AIDS, autism, Alzheimer's, and/or Parkinson's disorder. The methods can also be used to monitor and track changes in a patient's DNA and/or RNA during and following a clinical treatment regime. The methods may also be used to evaluate protein and/or metabolite levels that correlate with such clinical conditions. The methods are also of use to ascertain the probability outcome for a patient's particular prognosis.
CRYSTAL STRUCTURE OF THE LARGE RIBOSOMAL SUBUNIT FROM S. AUREUS
A composition-of-matter comprising a crystallized form of a large ribosomal (50S) subunit of a pathogenic bacterium, and the atomic coordinates of the three-dimensional structure thereof are provided herein, as well as methods for crystallizing the same, and using the atomic coordinates of the same to design de novo ligands with high specificity thereto.
METHOD AND SYSTEM FOR COMPRESSING GENOME SEQUENCES USING GRAPHIC PROCESSING UNITS
The present invention provides a method for compressing genome sequence reads using a graphics processing unit (GPU). The method comprises the steps of: identifying the position of each read's character string in the sequence of a reference genome; determining the alignment of each read string within the reference genome; comparing each read string to the corresponding reference genome sequence based on the determined alignment; filtering the characters of each read on the GPU by eliminating the characters that match the reference and extracting only the differing characters together with their positions in the genome sequence; and recording the filtered data of each read, together with its alignment in the reference genome, in the compressed genome database.
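The filtering step can be illustrated with a short CPU-only Python sketch. It does not use a GPU and is not the patented method: it assumes the alignment position is already known, that each read string lies fully inside the reference, and that differences are substitutions only. The function names `compress_read` and `decompress_read` are hypothetical.

```python
def compress_read(read, reference, start):
    """Keep only the characters that differ from the reference.

    Returns (start, length, diffs), where diffs is a list of
    (offset_in_read, base) pairs for mismatching positions.
    """
    diffs = [(i, base) for i, base in enumerate(read)
             if reference[start + i] != base]
    return start, len(read), diffs

def decompress_read(record, reference):
    """Rebuild the read from the reference plus the recorded differences."""
    start, length, diffs = record
    bases = list(reference[start:start + length])
    for offset, base in diffs:
        bases[offset] = base
    return "".join(bases)

reference = "ACGTACGTAC"
record = compress_read("GTTC", reference, start=2)
print(record)
print(decompress_read(record, reference))
```

Because each read is processed independently of the others, the comparison is the kind of work that could be spread across many GPU threads, which is presumably why the abstract assigns it to the GPU.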
TRACE RECONSTRUCTION FROM READS WITH INDETERMINANT ERRORS
Polynucleotide sequencing generates multiple reads of a polynucleotide molecule. Many or all of the reads contain errors. Trace reconstruction takes multiple reads generated by a polynucleotide sequencer and uses those multiple reads to reconstruct accurately the nucleotide sequence of the polynucleotide molecule. Some reads may contain errors that cannot be corrected. Thus, there may be reads that can be used throughout their entire length and other reads that have indeterminant errors which cannot be corrected. Rather than discarding the entire read when an indeterminant error is found, the portion of the read with the error is skipped and the sequence of the read following the error is used to reconstruct the trace. The amount of the read skipped is determined by the location of a subsequence after the error that matches a consensus sequence of the other reads. Analysis resumes at a location determined by the location of the match.
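The skip-and-resume idea can be sketched in a few lines of Python. This is an illustration under strong assumptions, not the patented procedure: the consensus of the other reads is taken as already computed, the match must be exact, and the function name `resume_position` and its parameters are hypothetical.

```python
def resume_position(read, error_pos, consensus_window):
    """Find where analysis of a read can resume after an uncorrectable error.

    Searches the read, starting just after error_pos, for the first exact
    occurrence of consensus_window (the consensus of the other reads).
    Returns the index of the match, or -1 if the read never re-synchronizes.
    """
    if not consensus_window:
        return -1
    return read.find(consensus_window, error_pos + 1)

read = "ACGTNNGGCATT"   # positions 4-5 hold the uncorrectable error
pos = resume_position(read, error_pos=4, consensus_window="GGCA")
skipped = pos - 4 if pos != -1 else None
print(pos, skipped)
```

The number of skipped bases falls out of the match location, so the rest of the read can still contribute to the reconstruction instead of being discarded.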
SEQUENCE ASSEMBLY
The invention relates to assembly of sequence reads. The invention provides a method for identifying a mutation in a nucleic acid involving sequencing nucleic acid to generate a plurality of sequence reads. Reads are assembled to form a contig, which is aligned to a reference. Individual reads are aligned to the contig. Mutations are identified based on the alignments to the reference and to the contig.
ALIGNMENT FREE FILTERING FOR IDENTIFYING FUSIONS
Cell free nucleic acids from a test sample obtained from an individual are analyzed to identify possible fusion events. Cell free nucleic acids are sequenced and processed to generate fragments. Fragments are decomposed into kmers and the kmers are either analyzed de novo or compared to targeted nucleic acid sequences that are known to be associated with fusion gene pairs of interest. Thus, kmers that may have originated from a fusion event can be identified. These kmers are consolidated to generate gene ranges from various genes that match sequences in the fragment. A candidate fusion event can be called given the spanning of one or more gene ranges across the fragment.
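The targeted branch of this approach (decompose fragments into kmers, match them against known gene sequences, consolidate the hits into gene ranges) can be sketched in Python as follows. This is a toy illustration, not the patented pipeline: the gene names, sequences, exact-match lookup, and the rule "two or more genes with ranges on one fragment" are assumptions chosen for brevity.

```python
def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_kmer_lookup(targets, k):
    """Map each kmer of the targeted sequences to the genes containing it."""
    lookup = {}
    for gene, seq in targets.items():
        for km in kmers(seq, k):
            lookup.setdefault(km, set()).add(gene)
    return lookup

def gene_ranges(fragment, lookup, k):
    """Consolidate kmer hits into per-gene [start, end) ranges on the fragment."""
    ranges = {}
    for i, km in enumerate(kmers(fragment, k)):
        for gene in lookup.get(km, ()):
            spans = ranges.setdefault(gene, [])
            if spans and i <= spans[-1][1]:
                spans[-1][1] = i + k      # extend the current range
            else:
                spans.append([i, i + k])  # start a new range
    return ranges

def is_candidate_fusion(fragment, lookup, k):
    """Flag a fragment whose ranges come from at least two different genes."""
    return len(gene_ranges(fragment, lookup, k)) >= 2

targets = {"GENE_A": "AAAACCCC", "GENE_B": "GGGGTTTT"}  # hypothetical genes
lookup = build_kmer_lookup(targets, k=4)
print(gene_ranges("AACCCCGGGGTT", lookup, k=4))
print(is_candidate_fusion("AACCCCGGGGTT", lookup, k=4))
```

No alignment is performed at any point: the fragment is only ever compared to the targets through exact kmer lookups, which is what makes the filter alignment-free.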