Patent classifications
G06F19/22
PARALLEL-PROCESSING SYSTEMS AND METHODS FOR HIGHLY SCALABLE ANALYSIS OF BIOLOGICAL SEQUENCE DATA
An apparatus includes a memory configured to store a sequence that includes an estimation of a biological sequence. The sequence includes a set of elements. The apparatus also includes an assignment module implemented in a hardware processor. The assignment module is configured to receive the sequence from the memory, and assign each element to at least one segment from a set of segments, including, when an element maps to at least a first segment and a second segment, assigning the element set of segments specific to that hardware processor, and substantially simultaneous with the remaining hardware processors, remove at least a portion of duplicate elements in that segment to generate a deduplicated segment, and reorder the elements in the deduplicated segment to generate a realigned segment that has a reduced likelihood for alignment errors.
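The segment-wise processing described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract text is incomplete where it describes elements that map to two segments, so assigning such an element to every segment it overlaps is an assumption here. The function names, the `(start, end, sequence)` read layout, and the thread pool standing in for separate hardware processors are also hypothetical. The realignment step is omitted.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def assign_to_segments(reads, segment_length):
    """Map each (start, end, sequence) read to every segment it overlaps.

    Coordinates are half-open: a read covers positions start..end-1.
    Assumption: a read that spans a boundary is placed in both segments.
    """
    segments = defaultdict(list)
    for start, end, seq in reads:
        first = start // segment_length
        last = (end - 1) // segment_length
        for index in range(first, last + 1):
            segments[index].append((start, end, seq))
    return dict(segments)


def deduplicate(segment_reads):
    """Drop exact repeats of a read while keeping first-seen order."""
    return list(dict.fromkeys(segment_reads))


def process_segments(segments, workers=4):
    """Deduplicate all segments concurrently, one task per segment."""
    indices = sorted(segments)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(deduplicate, (segments[i] for i in indices))
        return dict(zip(indices, results))


if __name__ == "__main__":
    reads = [(0, 5, "ACGTA"), (8, 12, "GGCC"), (8, 12, "GGCC"), (12, 15, "TTT")]
    print(process_segments(assign_to_segments(reads, segment_length=10)))
```

Because each segment carries its own copy of any boundary-spanning read, the segments can be processed independently with no communication between workers, which is what makes this kind of per-segment parallelism possible.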
Methods for Genome Assembly, Haplotype Phasing, and Target Independent Nucleic Acid Detection
The disclosure provides methods to assemble genomes of eukaryotic or prokaryotic organisms. The disclosure provides methods for haplotype phasing and meta-genomics assemblies. The disclosure provides a streamlined method for accomplishing these tasks, such that intermediates need not be labeled by an affinity label to facilitate binding to a solid surface. The disclosure also provides methods and compositions for the de novo generation of scaffold information, linkage information, and genome information for unknown organisms in heterogeneous metagenomic samples or samples obtained from multiple individuals. Practice of the methods can allow de novo sequencing of entire genomes of uncultured or unidentified organisms in heterogeneous samples, or the determination of linkage information for nucleic acid molecules in samples comprising nucleic acids obtained from multiple individuals.
METHOD AND SYSTEM FOR REPRESENTING COMPOSITIONAL PROPERTIES OF A BIOLOGICAL SEQUENCE FRAGMENT AND APPLICATIONS THEREOF
A method and system are provided for representing compositional properties of a biological sequence fragment, and applications thereof. The present application provides a method and system for representing compositional properties of a biological sequence fragment using a unidimensional compositional metric, comprising: collecting a plurality of biological sequence fragments; sequencing the collected plurality of biological sequence fragments; generating a first set of reference vectors; computing a unidimensional compositional metric for each sequenced biological sequence fragment out of the plurality of sequenced biological sequence fragments as a cumulative function of the distance of the tetra-nucleotide frequency vector (v) from three or more reference vectors selected out of the generated first set of reference vectors; and segregating each sequenced biological sequence fragment out of the plurality of sequenced biological sequence fragments into a plurality of groups based on its respective unidimensional compositional metric.
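A minimal Python sketch of the metric described in this abstract follows. The abstract does not name a distance measure or the cumulative function, so Euclidean distance and a plain sum are assumptions. The fixed-width binning used for grouping and all function names are also hypothetical.

```python
import math
from itertools import product

# All 256 tetranucleotides in a fixed order, with a lookup from word to index.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {word: i for i, word in enumerate(TETRAMERS)}


def tetra_frequency(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of seq."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])  # windows with N or other symbols are skipped
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def compositional_metric(seq, references):
    """Sum of distances from the fragment's vector to three or more references."""
    if len(references) < 3:
        raise ValueError("at least three reference vectors are required")
    v = tetra_frequency(seq)
    return sum(euclidean(v, r) for r in references)


def group_by_metric(metrics, bin_width):
    """Group fragment names by which fixed-width bin their metric falls into."""
    groups = {}
    for name, value in metrics.items():
        groups.setdefault(int(value // bin_width), []).append(name)
    return groups
```

Collapsing the 256-dimensional vector to a single number makes grouping a one-dimensional binning problem, at the cost of losing information that a full vector comparison would retain.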
Platform for the identification of tumor-associated cancer/testes antigens
Methods of identifying cancer/testes antigens (CTAs) useful as cancer treatment targets are disclosed and claimed herein. The methods include identifying human sperm proteins to which patients diagnosed with solid or hematological malignancies have established a humoral immune response.
Nucleic acid sequencing system and method
A technique for sequencing nucleic acids in an automated or semi-automated manner is disclosed. Sample arrays of a multitude of nucleic acid sites are processed in multiple cycles to add nucleotides to the material to be sequenced, detect the nucleotides added to sites, and to de-block the added nucleotides of blocking agents and tags used to identify the last added nucleotide. Multiple parameters of the system are monitored to enable diagnosis and correction of problems as they occur during sequencing of the samples. Quality control routines are run during sequencing to determine quality of samples, and quality of the data collected.
Method, computer-accessible medium, and systems for generating a genome wide haplotype sequence
Methods, computer-accessible medium, and systems for generating a genome wide probe map and/or a genome wide haplotype sequence are provided. In particular, a genome wide probe map can be generated by obtaining a plurality of detectable oligonucleotide probes hybridized to at least one double stranded nucleic acid molecule cleaved with at least one restriction enzyme, and detecting the location of the detectable oligonucleotide probes. For example, genome wide haplotype sequence can be generated by analyzing at least one genome wide restriction map in conjunction with at least one genome wide probe map to determine distances between restriction sites of the genome wide restriction map(s) and locations of detectable oligonucleotide probes of the genome wide probe map(s) and defining a consensus map indicating restriction sites based on the genome wide restriction map(s) and/or locations of detectable oligonucleotide probes based on each of the genome wide probe map(s).
METHODS AND SYSTEMS FOR ASSEMBLY OF PROTEIN SEQUENCES
Methods and systems for determining the amino acid sequence of a polypeptide or protein from mass spectrometry data are provided, using a weighted de Bruijn graph. Extracted and purified protein is cleaved into a mixture of peptides and then analyzed using mass spectrometry. A list of peptide sequences is derived from mass spectrometry fragment data by de novo sequencing, and amino acid confidence scores are determined from peak fragment ion intensity. A weighted de Bruijn graph is constructed for the list of peptide sequences having node weights defined by (k−1)-mer confidence scores. At least one contig is assembled from the de Bruijn graph by identifying node weights having the highest (k−1)-mer confidence scores.
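The graph construction and contig assembly described in this abstract can be sketched in Python as below. The abstract does not say how a (k−1)-mer confidence score is derived from residue scores or how the graph is traversed. The sketch assumes the node weight is the highest mean residue confidence seen for that (k−1)-mer, and that a contig is grown greedily in both directions from the heaviest node. Function names and the input layout are hypothetical.

```python
from collections import defaultdict


def build_graph(peptides, k):
    """Build a weighted de Bruijn graph from (sequence, residue_confidences) pairs.

    Nodes are (k-1)-mers; an edge joins the prefix and suffix of every k-mer.
    """
    weights = defaultdict(float)
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for seq, conf in peptides:
        for i in range(len(seq) - k + 2):
            node = seq[i:i + k - 1]
            score = sum(conf[i:i + k - 1]) / (k - 1)
            weights[node] = max(weights[node], score)
        for i in range(len(seq) - k + 1):
            left, right = seq[i:i + k - 1], seq[i + 1:i + k]
            successors[left].add(right)
            predecessors[right].add(left)
    return weights, successors, predecessors


def assemble_contig(weights, successors, predecessors):
    """Grow one contig greedily from the highest-weight node."""
    start = max(weights, key=weights.get)
    visited = {start}
    contig = start

    def best(options):
        # Break weight ties alphabetically so the result is deterministic.
        return max(options, key=lambda n: (weights[n], n))

    node = start
    while True:  # extend to the right
        options = [n for n in successors.get(node, ()) if n not in visited]
        if not options:
            break
        node = best(options)
        visited.add(node)
        contig += node[-1]

    node = start
    while True:  # extend to the left
        options = [n for n in predecessors.get(node, ()) if n not in visited]
        if not options:
            break
        node = best(options)
        visited.add(node)
        contig = node[0] + contig
    return contig
```

For two overlapping peptides such as `PEPTID` and `TIDEK` with k = 3, the shared nodes `TI` and `ID` link the two paths, so the traversal can recover a contig longer than either peptide.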
HEAT DIFFUSION BASED GENETIC NETWORK ANALYSIS
Methods and devices are provided for performing heat diffusion based genetic analysis. A network comprising a plurality of genes is defined, and an initial heat score is assigned to each of the plurality of genes. A threshold value for evaluating whether heat will be diffused from each of the plurality of genes within the network is assigned. Heat from at least one of the plurality of genes is diffused across the network, and after reaching equilibrium, the network is partitioned into a hierarchy of subnetworks according to an amount and a direction of heat exchange amongst each of the plurality of genes, and a statistical significance of the partitioned network and/or hierarchy of partitioned networks is assessed.
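A minimal Python sketch of thresholded heat diffusion and flow-based partitioning follows. The abstract gives no diffusion rule, so several choices here are assumptions: a gene passes on a fixed fraction of its heat above the threshold, split equally among its neighbours; equilibrium means the per-step net change falls below a tolerance; and genes are grouped when the net heat exchanged along an edge reaches a cutoff. Sweeping that cutoff would yield nested groupings, one possible reading of the hierarchy. Significance testing is not shown, and all names are hypothetical.

```python
def diffuse(adjacency, heat, threshold, rate=0.5, tol=1e-9, max_steps=10_000):
    """Diffuse heat until net change per step is below tol.

    adjacency maps each gene to a list of neighbours; heat maps gene to score.
    Returns the final heat and the total heat sent along each directed edge.
    """
    heat = dict(heat)
    flow = {}
    for _ in range(max_steps):
        sends = {}
        delta = {gene: 0.0 for gene in heat}
        for gene, neighbours in adjacency.items():
            if heat[gene] <= threshold or not neighbours:
                continue  # genes at or below the threshold do not diffuse
            share = rate * (heat[gene] - threshold) / len(neighbours)
            for other in neighbours:
                sends[(gene, other)] = share
                delta[gene] -= share
                delta[other] += share
        if max(abs(d) for d in delta.values()) < tol:
            break
        for gene in heat:
            heat[gene] += delta[gene]
        for edge, amount in sends.items():
            flow[edge] = flow.get(edge, 0.0) + amount
    return heat, flow


def partition(genes, flow, min_flow):
    """Group genes joined by a net heat exchange of at least min_flow."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (src, dst), sent in flow.items():
        net = sent - flow.get((dst, src), 0.0)  # direction: src to dst if positive
        if net >= min_flow:
            parent[find(src)] = find(dst)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(groups.values(), key=sorted)
```

Total heat is conserved at every step because each amount subtracted from a sender is added to a receiver.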
Distributed automation apparatus for laboratory diagnostics
A distributed automation apparatus for laboratory diagnostics is described, comprising modules for processing biological products transported on an automatic conveyor and modules for interfacing with analysis devices, both said modules being connected to said automatic conveyor. Each of said modules is independent of the other modules, being provided with its own control board, which allows it to work autonomously and independently of a central control unit which provides a worklist to each node, said worklist being dynamically read and updated by said control unit and by said module.
Diagnosis of lymphoid malignancies and minimal residual disease detection
Methods are described for diagnosis of a lymphoid hematological malignancy in a subject prior to treatment, and for detecting minimal residual disease (MRD) in the subject after treatment for the malignancy, by high throughput quantitative sequencing (HTS) of multiple unique adaptive immune receptor (TCR or Ig) encoding DNA molecules that have been amplified from DNA isolated from blood samples or other lymphoid cell-containing samples. Amplification employs oligonucleotide primer sets designed to amplify CDR3-encoding sequences within substantially all possible human VDJ or VJ combinations. Disease-characteristic adaptive immune receptor clonotypes occur, prior to treatment, at a relative frequency of at least 15-30% of rearranged receptor CDR3-encoding gene regions. Following treatment, persistence of at least one such clonotype at a detectable frequency of at least 10^−6 or at least 10^−5 receptor CDR3-encoding regions indicates MRD. Improved quantitative embodiments are provided by inclusion of a template composition for amplification factor determination and related methods.
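The two frequency thresholds in this abstract can be illustrated with a short Python sketch. It assumes clonotype abundances are already available as read counts per CDR3 sequence. The function names and the default cutoffs (15% before treatment, 10^−6 after treatment) are illustrative choices taken from the lower bounds quoted above, and the amplification-factor correction mentioned in the abstract is not modelled.

```python
def clonotype_frequencies(counts):
    """Convert per-clonotype read counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {clone: n / total for clone, n in counts.items()}


def disease_clonotypes(pre_counts, min_fraction=0.15):
    """Clonotypes at or above min_fraction of the pre-treatment repertoire."""
    freqs = clonotype_frequencies(pre_counts)
    return {clone for clone, f in freqs.items() if f >= min_fraction}


def has_mrd(post_counts, tracked, detection_limit=1e-6):
    """True if any tracked clonotype persists at or above detection_limit."""
    freqs = clonotype_frequencies(post_counts)
    return any(freqs.get(clone, 0.0) >= detection_limit for clone in tracked)


if __name__ == "__main__":
    pre = {"A": 200, "B": 100, "C": 100, "D": 100, "E": 100,
           "F": 100, "G": 100, "H": 100, "I": 100}
    tracked = disease_clonotypes(pre)
    print(tracked, has_mrd({"A": 3, "B": 999_997}, tracked))
```

A frequency of 10^−6 can only be resolved if on the order of a million or more receptor templates are sampled after treatment, so the detection limit is tied to sequencing depth.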