Patent classifications
G16B30/20
ARTIFICIAL INTELLIGENCE-BASED CHROMOSOMAL ABNORMALITY DETECTION METHOD
The present invention relates to an artificial intelligence-based chromosomal abnormality detection method, and more specifically, to a method that involves extracting nucleic acids from a biological sample, acquiring sequence information, generating vectorized data on the basis of the aligned DNA fragments, and then comparing a reference value with a value calculated by inputting the vectorized data into a trained artificial intelligence model. Existing schemes either determine the amount of a chromosome on the basis of read counts, treating each read-related value as an individually normalized value, or rely on the distance between aligned reads. In contrast, the method according to the present invention generates vectorized data and analyzes it with an AI algorithm, and is therefore useful in that comparable results can be obtained even when read coverage is low.
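The abstract does not say how the vectorized data are built or which model is used. As a minimal sketch under those gaps, the Python below assumes the vector is the fraction of aligned fragments falling into fixed-size genomic bins, and substitutes a hypothetical logistic scorer (`logistic_score`, with caller-supplied weights) for the trained AI model; the score is then compared with a reference cutoff. The bin size, weights, and cutoff are illustrative assumptions, not values from the invention.

```python
import math

def vectorize_fragments(fragments, chrom_lengths, bin_size):
    """Fraction of aligned fragments per fixed-size bin, chromosomes concatenated.

    fragments: iterable of (chromosome, start_position) for aligned fragments.
    chrom_lengths: dict of chromosome -> length; insertion order fixes the layout.
    """
    offsets = {}
    total_bins = 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = total_bins
        total_bins += math.ceil(length / bin_size)
    vector = [0.0] * total_bins
    counted = 0
    for chrom, pos in fragments:
        # Skip fragments on unlisted chromosomes or outside the chromosome bounds.
        if chrom in offsets and 0 <= pos < chrom_lengths[chrom]:
            vector[offsets[chrom] + pos // bin_size] += 1
            counted += 1
    # Normalize by the total so samples sequenced at different depths are comparable.
    return [v / counted for v in vector] if counted else vector

def logistic_score(vector, weights, bias):
    """Hypothetical stand-in for the trained model: a logistic score in (0, 1)."""
    z = bias + sum(w * v for w, v in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))

def is_abnormal(score, reference_cutoff):
    """Flag the sample when the model output exceeds the reference value."""
    return score > reference_cutoff

vec = vectorize_fragments(
    [("chr1", 10), ("chr1", 60), ("chr21", 5), ("chr21", 49)],
    {"chr1": 100, "chr21": 50},
    bin_size=50,
)
print(vec)  # [0.25, 0.25, 0.5]
print(is_abnormal(logistic_score(vec, [0.0, 0.0, 4.0], -1.0), 0.5))  # True
```

Replacing `logistic_score` with any trained classifier that maps the vector to a score leaves the comparison step unchanged.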
BIOLOGICAL SEQUENCE COMPRESSION USING SEQUENCE ALIGNMENT
Compressing files is disclosed. A DNA sequence to be compressed is first aligned. Aligning the DNA sequence includes splitting it into smaller sequences or portions that can be aligned. After the DNA sequence is split one or more times and aligned, a compression matrix is generated. Each row of the compression matrix corresponds to part of the DNA sequence. A consensus sequence is determined from the compression matrix. Using the consensus sequence, pointer pairs are generated. Each pointer pair identifies a subsequence of the consensus sequence. The compressed file includes the pointer pairs and the consensus sequence.
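The consensus and pointer-pair steps can be illustrated with a short sketch. It assumes the compression matrix is a list of equal-length aligned rows padded with `-`, takes a per-column majority vote as the consensus (the abstract does not say how the consensus is chosen), and encodes a sequence greedily as half-open `(start, end)` pointer pairs into the consensus. The function names are hypothetical, and the sketch does not handle symbols that never occur in the consensus.

```python
from collections import Counter

def consensus_from_matrix(rows, gap="-"):
    """Majority-vote consensus over the columns of an alignment (compression) matrix."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    consensus = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != gap)
        if counts:  # an all-gap column contributes nothing
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)

def encode(seq, consensus):
    """Greedily cover seq with (start, end) pointer pairs into the consensus."""
    pairs = []
    i = 0
    while i < len(seq):
        best_start, best_len = -1, 0
        length = 1
        # Extend the match while seq[i:i+length] still occurs somewhere in the consensus.
        while i + length <= len(seq):
            pos = consensus.find(seq[i:i + length])
            if pos < 0:
                break
            best_start, best_len = pos, length
            length += 1
        if best_len == 0:
            raise ValueError(f"symbol {seq[i]!r} does not occur in the consensus")
        pairs.append((best_start, best_start + best_len))
        i += best_len
    return pairs

def decode(pairs, consensus):
    """Rebuild the sequence by concatenating the referenced consensus slices."""
    return "".join(consensus[start:end] for start, end in pairs)

rows = ["ACGT-ACGT", "ACGTTACGA", "ACCTTACGT"]
cons = consensus_from_matrix(rows)
pairs = encode("ACGTACGA", cons)
print(cons, pairs)  # ACGTTACGT [(0, 4), (0, 3), (0, 1)]
print(decode(pairs, cons))  # ACGTACGA
```

Pairs that cover a single base save nothing, so a practical encoder would likely fall back to literals for very short matches; that refinement is omitted here.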
Compositions and methods for high fidelity assembly of nucleic acids
Aspects of the invention relate to methods, compositions and algorithms for designing and producing a target nucleic acid. The method can include: (1) providing a plurality of blunt-end double-stranded nucleic acid fragments having a restriction enzyme recognition sequence at both ends thereof; (2) producing via enzymatic digestion a plurality of cohesive-end double-stranded nucleic acid fragments each having two different and non-complementary overhangs; (3) ligating the plurality of cohesive-end double-stranded nucleic acid fragments with a ligase; and (4) forming a unique linear arrangement of the plurality of cohesive-end double-stranded nucleic acid fragments, wherein the unique linear arrangement comprises the target nucleic acid. In certain embodiments, the plurality of blunt-end double-stranded nucleic acid fragments can be provided by: releasing a plurality of oligonucleotides synthesized on a solid support; and synthesizing complementary strands of the plurality of oligonucleotides using a polymerase-based reaction.
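Distinct overhangs let the cohesive ends dictate the order in which fragments ligate. The Python sketch below illustrates only that ordering logic, not the enzymatic steps. It assumes each fragment is described by the top-strand sequences of its left and right overhangs, and that two fragments join when one fragment's right overhang equals the next fragment's left overhang; the fragment names and overhang sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def assembly_order(fragments):
    """Order fragments so each right overhang meets the next fragment's left overhang.

    fragments: dict of name -> (left_overhang, right_overhang), top-strand sequences.
    """
    by_left, rights = {}, set()
    for name, (left, right) in fragments.items():
        # Each fragment needs two different, non-complementary overhangs.
        if left == right or left == revcomp(right):
            raise ValueError(f"{name}: overhangs are identical or complementary")
        if left in by_left or right in rights:
            raise ValueError(f"{name}: overhang reused, order would be ambiguous")
        by_left[left] = name
        rights.add(right)
    # The first fragment is the one whose left overhang no other fragment supplies.
    starts = [n for n, (left, _) in fragments.items() if left not in rights]
    if len(starts) != 1:
        raise ValueError("expected exactly one terminal fragment to start from")
    order = [starts[0]]
    while (nxt := by_left.get(fragments[order[-1]][1])) is not None:
        if nxt in order:
            raise ValueError("overhangs form a cycle")
        order.append(nxt)
    if len(order) != len(fragments):
        raise ValueError("not every fragment joins the linear product")
    return order

fragments = {
    "f2": ("GCTT", "TACA"),
    "f1": ("ACTG", "GCTT"),
    "f3": ("TACA", "CGAA"),
}
print(assembly_order(fragments))  # ['f1', 'f2', 'f3']
```

Because every overhang is used at most once as a left end and once as a right end, the chain can be followed without backtracking; any reuse is rejected rather than guessed.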
Methods and compositions for addressing inefficiencies in amplification reactions
Methods and systems are provided for decreasing amplification bias and primer-dimer formation in amplification reactions, for amplifying a plurality of target polynucleotides from a sample in a single reaction, and for sequencing the target polynucleotides. Samples can include forensic samples, and target polynucleotides can include identity- or ancestry-informative markers, short tandem repeats (STRs), and single nucleotide polymorphisms (SNPs). Methods of determining a nucleotide spacer sequence for disrupting primer-dimer formation can include: receiving a set of primer sequences; determining a plurality of candidate spacers between an adapter sequence and a gene-specific portion of the primer sequence, the candidate spacers comprising sequences that disrupt stable interactions between sequences of the set of primer sequences; ranking candidate spacers that meet a predetermined threshold value of stable interactions in the extension sequences; and outputting a set of the ranked spacers that meet the predetermined threshold.
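A minimal ranking sketch follows. The abstract does not define how stable interactions are scored, so this Python example substitutes a crude proxy: the longest run of contiguous Watson-Crick complementarity between any two primers in the set (including a primer paired with itself), with no thermodynamic model. It also assumes a single candidate spacer is inserted between the adapter and every gene-specific portion, and `max_run` is a hypothetical stand-in for the predetermined threshold.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def longest_complementary_run(a, b):
    """Length of the longest stretch of a that can pair antiparallel with b.

    A substring of a pairs with a substring of b when it equals that substring's
    reverse complement, so this is the longest common substring of a and revcomp(b).
    """
    target = revcomp(b)
    best = 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if a[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def rank_spacers(adapter, gene_specific, candidates, max_run):
    """Score each spacer by its worst pairwise run; keep and rank those at or below max_run.

    gene_specific must contain at least one sequence.
    """
    ranked = []
    for spacer in candidates:
        primers = [adapter + spacer + gsp for gsp in gene_specific]
        worst = max(
            longest_complementary_run(p, q)
            for i, p in enumerate(primers)
            for q in primers[i:]  # includes self-dimers (p paired with itself)
        )
        if worst <= max_run:
            ranked.append((worst, spacer))
    ranked.sort()  # fewest complementary bases first, then alphabetical by spacer
    return ranked

print(rank_spacers("ACAC", ["GGTTAC", "CCATGA"], ["T", "GA", "CTAG", ""], max_run=6))
```

A production tool would score interactions with nearest-neighbor free energies and weight 3'-end complementarity more heavily; the run-length proxy only shows where such a scorer plugs into the ranking.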
Proteogenomic-based method for identifying tumor-specific antigens
T cells, notably CD8 T cells, are known to be essential players in tumor eradication, as the presence of tumor-infiltrating lymphocytes (TILs) positively correlates with a good prognosis in several cancers. To eliminate tumor cells, CD8 T cells recognize tumor antigens, which are MHC I-associated peptides present at the surface of tumor cells with no or very low expression on normal cells. Described herein is a proteogenomic approach that uses RNA-sequencing data from cancer samples and matched normal mTEC^hi samples to identify non-tolerogenic tumor-specific antigens derived from (i) coding and non-coding regions of the genome, (ii) non-synonymous single-base mutations, short insertions/deletions, and more complex rearrangements, and (iii) endogenous retroelements; the approach works regardless of the sample's mutational load or complexity.
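One way to carry out the tumor-versus-normal comparison on RNA-seq reads is k-mer subtraction: keep the sequence k-mers seen in tumor reads but never in the normal (mTEC) reads. The sketch below shows only that filtering step, under assumed parameters (`k`, `min_tumor_count`); it ignores reverse complements, does not translate k-mers into peptides or predict MHC I binding, and is not claimed to be the exact procedure described.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every overlapping k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def tumor_specific_kmers(tumor_reads, normal_reads, k, min_tumor_count=2):
    """k-mers seen at least min_tumor_count times in tumor reads and never in normal reads."""
    tumor = count_kmers(tumor_reads, k)
    normal = count_kmers(normal_reads, k)
    return {kmer: n for kmer, n in tumor.items()
            if n >= min_tumor_count and kmer not in normal}

print(tumor_specific_kmers(["ACGTAC", "ACGTAC"], ["ACGTTT"], k=4))
# {'CGTA': 2, 'GTAC': 2}
```

Requiring a minimum tumor count discards k-mers that appear only once, which are more likely to come from sequencing errors than from tumor transcripts.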
Variant Calling For Multi-Sample Variation Graph
A method for calling variants in genetic data includes sorting nodes in a graph-based reference genome, assigning identification information to the sorted nodes, assigning depth values to respective ones of the sorted nodes, determining a reference genome path and one or more variation paths, and determining one or more variants in the graph-based reference genome based on the depth values assigned to nodes on the one or more variation paths.
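The claimed steps can be sketched on a toy graph. The Python below assumes the graph is a directed acyclic graph given as adjacency lists, sorts nodes with Kahn's algorithm, uses each node's position in that order as its identifier, takes per-node depth values as input (for example, counts of reads aligned to each node), and treats every off-reference node with enough depth as a variation path of length one. Only simple bubbles whose alternative node connects directly to reference nodes on both sides are reported; all names and thresholds are hypothetical.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm; successors are visited in sorted order so IDs are deterministic."""
    indegree = {n: 0 for n in nodes}
    for dsts in edges.values():
        for d in dsts:
            indegree[d] += 1
    queue = deque(sorted(n for n, deg in indegree.items() if deg == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for d in sorted(edges.get(node, ())):
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order

def call_variants(nodes, edges, reference_path, depth, min_depth):
    """Return (node_id, ref_allele, alt_allele, depth) for supported off-reference nodes."""
    order = topological_order(nodes, edges)
    node_id = {name: i for i, name in enumerate(order)}  # IDs follow the sorted order
    preds = {n: [] for n in nodes}
    for src, dsts in edges.items():
        for d in dsts:
            preds[d].append(src)
    ref_index = {name: i for i, name in enumerate(reference_path)}
    calls = []
    for name in order:
        if name in ref_index or depth.get(name, 0) < min_depth:
            continue
        left = [ref_index[p] for p in preds[name] if p in ref_index]
        right = [ref_index[s] for s in edges.get(name, ()) if s in ref_index]
        if not left or not right:
            continue  # not a simple one-node bubble; skipped in this sketch
        start, end = max(left), min(right)
        # Reference allele: reference nodes bypassed by the alternative node.
        ref_allele = "".join(nodes[n] for n in reference_path[start + 1:end])
        calls.append((node_id[name], ref_allele, nodes[name], depth[name]))
    return calls

nodes = {"n1": "ACG", "n2": "T", "n3": "C", "n4": "GGA"}
edges = {"n1": ["n2", "n3"], "n2": ["n4"], "n3": ["n4"]}
depths = {"n1": 30, "n2": 14, "n3": 12, "n4": 29}
print(call_variants(nodes, edges, ["n1", "n2", "n4"], depths, min_depth=5))
# [(2, 'T', 'C', 12)]
```

An alternative node that sits between two adjacent reference nodes yields an empty reference allele, which reads naturally as an insertion.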