G16B30/20

Variant Calling For Multi-Sample Variation Graph
20230223110 · 2023-07-13

A method for calling variants in genetic data includes sorting nodes in a graph-based reference genome, assigning identification information to the sorted nodes, assigning depth values to respective ones of the sorted nodes, determining a reference genome path and one or more variation paths, and determining one or more variants in the graph-based reference genome based on the depth values assigned to nodes on the one or more variation paths.
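The claimed steps can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. The graph is a toy directed acyclic graph with string node names. "Sorting" is taken to mean a topological sort (Kahn's algorithm), and each node's rank in that order serves as its identifier. Depth is the number of read paths that pass through a node. Every node off the reference path is treated as lying on a variation path. The function names, the `min_depth` threshold and the input format are all hypothetical.

```python
from collections import defaultdict, deque


def topological_sort(nodes, edges):
    """Kahn's algorithm: order nodes so that every edge points forward."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order


def call_variants(nodes, edges, reference_path, read_paths, min_depth=2):
    """Hypothetical depth-based caller on a variation graph (illustrative only)."""
    # Steps 1-2: sort the nodes and assign each one an ID equal to its sorted rank.
    order = topological_sort(nodes, edges)
    node_id = {n: i for i, n in enumerate(order)}
    # Step 3: depth = number of read paths traversing the node (once per read).
    depth = defaultdict(int)
    for path in read_paths:
        for n in set(path):
            depth[n] += 1
    # Step 4 (assumption): any node off the reference path lies on a variation path.
    reference_nodes = set(reference_path)
    # Step 5: report variation-path nodes whose depth reaches the threshold.
    return [
        (node_id[n], n, depth[n])
        for n in order
        if n not in reference_nodes and depth[n] >= min_depth
    ]
```

Consider a single-nucleotide bubble in which `n1` branches to a reference node `n2a` and an alternative node `n2b`, and two of three reads take the alternative branch. Here `call_variants` reports `(2, "n2b", 2)`, which is the alternative node's ID, name and depth. A real caller would also handle cycles, indels spanning several nodes, and genotype likelihoods. This sketch does none of that.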

Systems and methods for de novo assembly of nucleotide sequence reads using a modified string graph

Systems and methods are presented for automatically assembling, de novo, a set of unordered read sequences into one or more larger nucleotide sequences. The method first creates two identical sets of the reads and divides each read in both sets into smaller, sorted mer sequences. It then compares the mers of each read in set 1 with the mers of each read in set 2 to exhaustively identify overlapping segments. The overlap information is used to construct a modified assembly string graph. Traversing this graph produces a sorted string graph layout file listing all the reads ordered left to right, together with their approximate starting offsets. The sorted layout file is then processed by a novel multiple sequence alignment system. This system uses mer matches among all reads overlapping a given position to place matching bases from each read into columns, from which an overall consensus sequence is determined.
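The overlap, layout and consensus steps can be sketched in Python. The following assumptions apply, and the functions are hypothetical rather than the patented system:

- Reads are error-free, forward-strand strings.
- Comparing the two identical read sets is modeled as a merge-join of each pair's sorted mer lists.
- The offset of an overlap is the most frequent position difference among shared mers.
- The layout is obtained by propagating offsets from read 0 instead of traversing a string graph.
- Each consensus column is resolved by simple majority vote.

```python
from collections import Counter, defaultdict


def sorted_mers(read, k):
    """Split a read into (mer, position) pairs sorted by mer sequence."""
    return sorted((read[i:i + k], i) for i in range(len(read) - k + 1))


def diagonals(a, b):
    """Merge-join two sorted mer lists; yield pos_a - pos_b for every shared mer."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1
        elif a[i][0] > b[j][0]:
            j += 1
        else:
            mer, i_end, j_end = a[i][0], i, j
            while i_end < len(a) and a[i_end][0] == mer:
                i_end += 1
            while j_end < len(b) and b[j_end][0] == mer:
                j_end += 1
            for _, pa in a[i:i_end]:
                for _, pb in b[j:j_end]:
                    yield pa - pb
            i, j = i_end, j_end


def find_overlaps(reads, k=4, min_shared=2):
    """Compare every read in 'set 1' with every other read in an identical 'set 2'.

    Returns {(i, j): offset}, meaning read j starts `offset` bases after read i.
    """
    mers = [sorted_mers(r, k) for r in reads]
    overlaps = {}
    for i in range(len(reads)):
        for j in range(len(reads)):
            if i == j:
                continue
            votes = Counter(diagonals(mers[i], mers[j]))
            if votes:
                offset, count = votes.most_common(1)[0]
                if count >= min_shared and offset > 0:
                    overlaps[(i, j)] = offset
    return overlaps


def layout(n_reads, overlaps):
    """Assign start offsets by propagating overlaps from read 0 (assumes one connected layout)."""
    starts = {0: 0}
    changed = True
    while changed:
        changed = False
        for (i, j), offset in overlaps.items():
            if i in starts and j not in starts:
                starts[j] = starts[i] + offset
                changed = True
            elif j in starts and i not in starts:
                starts[i] = starts[j] - offset
                changed = True
    return [starts[r] for r in range(n_reads)]


def consensus(reads, starts):
    """Place each base in column start + position and take a majority vote per column."""
    columns = defaultdict(Counter)
    for read, start in zip(reads, starts):
        for p, base in enumerate(read):
            columns[start + p][base] += 1
    return "".join(columns[c].most_common(1)[0][0] for c in sorted(columns))
```

Take the three overlapping reads `"ATGCGTACCT"`, `"GTACCTGAAG"` and `"CTGAAGTC"`. For these, `find_overlaps` returns `{(0, 1): 4, (1, 2): 4}`, the layout places the reads at offsets 0, 4 and 8, and the consensus is `"ATGCGTACCTGAAGTC"`. Because the mer lists are sorted, each pairwise comparison is a linear merge rather than a nested scan.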

Methods for non-invasive assessment of fetal genetic variations that factor experimental conditions

Provided herein are methods, processes and apparatuses for non-invasive assessment of genetic variations.

Method for large scale scaffolding of genome assemblies

Computational methods for large scale scaffolding of a genome assembly are provided. Such methods may include three steps. First, a location clustering model is applied to a test set of contigs to form two or more location cluster groups, each comprising one or more location-clustered contigs. Second, an ordering model is applied to each location cluster group to form an ordered set of one or more location-clustered contigs within that group. Third, an orienting model is applied to each ordered set to assign a relative orientation to each location-clustered contig within its group. In some aspects, the test set of contigs is generated by aligning a set of reads produced by a chromosome conformation analysis technique (e.g., Hi-C) to a draft assembly, a reference assembly, or both.
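The abstract does not specify the clustering, ordering or orienting models. The Python sketch below therefore substitutes three simple heuristics, one for each step, purely for illustration:

- **Clustering:** union-find over contig pairs whose Hi-C link count reaches a threshold.
- **Ordering:** a greedy path that always extends along the strongest remaining link.
- **Orienting:** choose, for each contig, the end that has the most links to the facing end of its predecessor.

The input formats (`links`, `end_links`), the threshold and all function names are hypothetical.

```python
from collections import defaultdict


def link_weight(links, a, b):
    """Total Hi-C link count between two contigs, regardless of key order."""
    return links.get((a, b), 0) + links.get((b, a), 0)


def cluster_contigs(contigs, links, min_links=3):
    """Clustering heuristic: union-find over contig pairs with >= min_links links."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), n in links.items():
        if n >= min_links:
            parent[find(a)] = find(b)
    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return list(groups.values())


def order_cluster(group, links):
    """Ordering heuristic: greedily extend a path along the strongest remaining link."""
    def total(c):
        return sum(link_weight(links, c, o) for o in group if o != c)

    # Start at the most weakly linked contig, taken here as a likely end of the chain.
    order = [min(group, key=total)]
    remaining = [c for c in group if c != order[0]]
    while remaining:
        nxt = max(remaining, key=lambda c: link_weight(links, order[-1], c))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def orient(order, end_links):
    """Orienting heuristic: '+' means the contig's left end faces its predecessor.

    end_links maps (contig_a, end_a, contig_b, end_b) -> count, with ends "L"/"R".
    """
    def ends(a, ea, b, eb):
        return end_links.get((a, ea, b, eb), 0) + end_links.get((b, eb, a, ea), 0)

    if len(order) == 1:
        return [(order[0], "+")]
    first, second = order[0], order[1]
    # First contig: '+' if its right end links most strongly to the second contig.
    best = max("RL", key=lambda e: max(ends(first, e, second, "L"), ends(first, e, second, "R")))
    result = [(first, "+" if best == "R" else "-")]
    for prev, cur in zip(order, order[1:]):
        out_end = "R" if result[-1][1] == "+" else "L"  # end of prev facing cur
        facing = max("LR", key=lambda e: ends(prev, out_end, cur, e))
        result.append((cur, "+" if facing == "L" else "-"))
    return result
```

Each step consumes only the previous step's output, which mirrors the cluster, order and orient pipeline in the abstract. Any of the heuristics could be swapped for a statistical model without changing the other two.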

Systems and methods for de novo peptide sequencing from data-independent acquisition using deep learning

The present systems and methods apply deep learning to de novo peptide sequencing from tandem mass spectrometry data, in particular data obtained by data-independent acquisition. They improve sequencing accuracy over existing systems and methods and enable complete assembly of novel protein sequences without the aid of databases. To sequence peptides from data-independent acquisition data, two inputs are fed into a neural network. The first is a set of precursor profiles, which represent the intensities of one or more precursor ion signals associated with a precursor retention time. The second is a set of fragment ion spectra, which represent signals from fragment ions and their fragment retention times.

Method for the Compression of Genome Sequence Data
20220415441 · 2022-12-29

The invention relates to a reference-based method for compressing genome sequence data produced by a sequencing machine. Nucleotide (base) sequences that have previously been aligned to a reference sequence are classified as perfectly mapped, imperfectly mapped, or unmapped with respect to that reference, and are then encoded according to this classification. For each imperfectly mapped sequence, the determining step compares the number of mismatches between the sequence and the reference sequence with a reference threshold value. The imperfectly mapped sequences are then encoded by distinct encoding processes, depending on the result of that comparison.
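A minimal Python sketch of this classify-then-encode scheme follows. It makes three assumptions. First, alignments are substitution-only, with no indels. Second, an unmapped read is given position `None`. Third, the four record layouts and the function names are hypothetical; the abstract does not describe the actual encoding processes. The records are:

- `"U"`: an unmapped sequence, stored as raw bases.
- `"P"`: a perfectly mapped sequence, stored as position and length only.
- `"M"`: an imperfectly mapped sequence with at most `threshold` mismatches, stored as its substitutions.
- `"R"`: an imperfectly mapped sequence with more mismatches than `threshold`, stored as position plus raw bases.

```python
def encode_read(read, reference, pos, threshold):
    """Classify an aligned read and encode it (hypothetical record formats)."""
    if pos is None:
        return ("U", read)  # unmapped: keep the bases verbatim
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("alignment runs past the end of the reference")
    mismatches = [(i, b) for i, (b, g) in enumerate(zip(read, window)) if b != g]
    if not mismatches:
        return ("P", pos, len(read))  # perfectly mapped
    if len(mismatches) <= threshold:
        return ("M", pos, len(read), mismatches)  # imperfect, at or below threshold
    return ("R", pos, read)  # imperfect, above threshold


def decode_read(record, reference):
    """Invert encode_read, rebuilding the original bases from the reference."""
    kind = record[0]
    if kind == "U":
        return record[1]
    if kind == "P":
        _, pos, length = record
        return reference[pos:pos + length]
    if kind == "M":
        _, pos, length, mismatches = record
        bases = list(reference[pos:pos + length])
        for i, b in mismatches:
            bases[i] = b
        return "".join(bases)
    if kind == "R":
        return record[2]
    raise ValueError(f"unknown record type {kind!r}")
```

With the reference `"ACGTACGTAC"` and a threshold of 2, the four records behave as follows:

- `"CGTA"` at position 1 encodes as `("P", 1, 4)`.
- `"CGAA"` at position 1 encodes as `("M", 1, 4, [(2, "A")])`.
- `"TTTT"` at position 0 has three mismatches, so it falls back to `("R", 0, "TTTT")`.

Every record decodes back to its original read.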

System and methods for indel identification using short read sequencing
11538557 · 2022-12-27

Systems, methods, and analytical approaches are provided for short read sequence assembly and for detecting insertions and deletions (indels) relative to a reference genome. A method suitable for software implementation is presented in which indels can be readily identified in a computationally efficient manner.