G16B10/00

Methods and systems of tracking disease carrying arthropods

The present invention comprises the capture and display of arthropod, human, and arthropod-based metadata. Because the metadata is time- and location-based, the system can track and display it to show the migration paths of arthropods and/or the diseases they can potentially carry. This real-time view can help predict future arthropod activity and disease under various scenarios, such as, but not limited to, increased exposure based on a user's geolocation, the date and/or time of year, or the carrier type. These variables can then support education, awareness, and potential prevention of disease.
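
The abstract describes time- and location-stamped sightings that are turned into migration paths and per-user exposure estimates. The sketch below illustrates one way to do both; the Sighting record, its field names, the 50 km radius, and the 30-day window are all hypothetical choices, and the haversine great-circle distance is a swapped-in technique the abstract does not name.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Sighting:
    # Hypothetical record layout; the source does not define one.
    species: str               # arthropod carrier type
    pathogen: Optional[str]    # disease agent detected, if any
    lat: float
    lon: float
    time: datetime


def migration_paths(sightings):
    """Group sightings by species and order them in time to trace a migration path."""
    paths = defaultdict(list)
    for s in sorted(sightings, key=lambda s: s.time):
        paths[s.species].append((s.time, s.lat, s.lon))
    return dict(paths)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def exposure_count(sightings, lat, lon, when, radius_km=50.0, window_days=30):
    """Count pathogen-positive sightings near a user's location and date (hypothetical defaults)."""
    return sum(
        1
        for s in sightings
        if s.pathogen
        and abs(s.time - when) <= timedelta(days=window_days)
        and haversine_km(lat, lon, s.lat, s.lon) <= radius_km
    )


sightings = [
    Sighting("Aedes aegypti", "dengue", 0.0, 0.0, datetime(2023, 1, 1)),
    Sighting("Aedes aegypti", None, 0.0, 1.0, datetime(2023, 1, 10)),
    Sighting("Culex pipiens", "West Nile virus", 10.0, 10.0, datetime(2023, 1, 5)),
]
paths = migration_paths(sightings)
risk = exposure_count(sightings, 0.0, 0.1, datetime(2023, 1, 5))
print(paths["Aedes aegypti"], risk)
```

Only the first Aedes sighting counts toward the user's exposure here: the second carries no pathogen and the Culex sighting is roughly 1,500 km away.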

Systems and methods for inferring genetic ancestry from low-coverage genomic data

A computer-implemented method for inferring genetic ancestry from low-coverage genomic data may include (i) generating a reference matrix representing a genetic reference panel in terms of dosages for given reference samples at given loci, (ii) decomposing the reference matrix via non-negative matrix factorization into an ancestral genotype matrix and an ancestral attribution matrix, (iii) resampling the reference matrix, (iv) deriving an ancestral alternate reads matrix that, when multiplied with the ancestral attribution matrix, approximates the resampled reference matrix, (v) deriving an ancestral attribution vector that, when multiplied with the ancestral alternate reads matrix, approximates a vector representing the test sample, and (vi) determining the genetic ancestry of the subject based on the ancestral attribution vector. Various other methods, systems, and computer-readable media are also disclosed.
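
Steps (i) to (vi) can be sketched with NumPy as below. This is an illustration under stated assumptions, not the disclosed implementation: the factorization and both fitting steps use Lee-Seung multiplicative updates (the abstract names NMF but no solver), "resampling" is modelled as drawing alternate reads from Binomial(depth, dosage / 2), and the helper names, iteration counts, and synthetic data are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def nmf(R, k, iters=500, seed=0):
    """(ii) Multiplicative-update NMF: R (samples x loci) ~= A (samples x k) @ G (k x loci)."""
    rng = np.random.default_rng(seed)
    A = rng.random((R.shape[0], k)) + EPS
    G = rng.random((k, R.shape[1])) + EPS
    for _ in range(iters):
        G *= (A.T @ R) / (A.T @ A @ G + EPS)
        A *= (R @ G.T) / (A @ G @ G.T + EPS)
    return A, G


def fit_right(A, Y, iters=500, seed=0):
    """(iv) Non-negative B with A @ B ~= Y, holding the attribution matrix A fixed."""
    rng = np.random.default_rng(seed)
    B = rng.random((A.shape[1], Y.shape[1])) + EPS
    for _ in range(iters):
        B *= (A.T @ Y) / (A.T @ A @ B + EPS)
    return B


def fit_left(B, y, iters=500, seed=0):
    """(v) Non-negative vector q with q @ B ~= y, holding B fixed."""
    rng = np.random.default_rng(seed)
    q = rng.random(B.shape[0]) + EPS
    for _ in range(iters):
        q *= (B @ y) / (B @ B.T @ q + EPS)
    return q


def resample(R, depth, rng):
    """(iii) Hypothetical low-coverage model: alternate reads ~ Binomial(depth, dosage / 2)."""
    return rng.binomial(depth, R / 2.0).astype(float)


def infer_ancestry(R, test_alt_reads, k, depth, seed=0):
    rng = np.random.default_rng(seed)
    A, _G = nmf(R, k, seed=seed)                # ancestral attribution matrix
    R_low = resample(R, depth, rng)             # resampled reference matrix
    B = fit_right(A, R_low, seed=seed)          # ancestral alternate-reads matrix
    q = fit_left(B, test_alt_reads, seed=seed)  # ancestral attribution vector
    return q / q.sum()                          # (vi) normalized ancestry proportions


# (i) Synthetic reference panel: 2 hypothetical populations, 20 samples each, 200 loci.
rng = np.random.default_rng(1)
freqs = rng.random((2, 200))
pops = np.repeat([0, 1], 20)
R = rng.binomial(2, freqs[pops]).astype(float)
test_dosage = rng.binomial(2, freqs[0])
test_reads = rng.binomial(2, test_dosage / 2.0).astype(float)  # about 2x coverage
proportions = infer_ancestry(R, test_reads, k=2, depth=2)
print(proportions)
```

The order of the k ancestral components returned by NMF is arbitrary, so in practice each component would still need to be labelled, for example by inspecting which reference samples load on it in A.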

METHOD AND A SYSTEM FOR PROFILING OF METAGENOME

This disclosure relates generally to a method and a system for profiling metagenome samples. Most state-of-the-art techniques for metagenomic profiling use homology-based, curated databases of identified marker sequences that are generated through complex and costly pre-processing. The disclosed method and system are non-homology-based, non-marker-based, alignment-free tools for strain-level microbial profiling. The disclosure uses several k-mer-based indexing techniques to construct a compact and comprehensive multi-level index comprising an L1-Index and an L2-Index. The multi-level index is used to profile metagenomes by abundance estimation, where the abundance estimation includes both relative abundance and absolute abundance.
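
The abstract does not say what the L1-Index and L2-Index contain, so the sketch below assumes one plausible layout: L1 maps each k-mer to the species containing it, and L2 maps each species' k-mers to its strains. Reads are assigned by k-mer voting without alignment. The index layout, the voting rule, and treating "absolute abundance" as the count of assigned reads are all hypothetical simplifications.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(genomes, k):
    """genomes: {species: {strain: sequence}}.

    Hypothetical two-level layout: l1 maps k-mer -> species containing it,
    l2 maps species -> (k-mer -> strains of that species containing it).
    """
    l1 = defaultdict(set)
    l2 = defaultdict(lambda: defaultdict(set))
    for species, strains in genomes.items():
        for strain, seq in strains.items():
            for km in kmers(seq, k):
                l1[km].add(species)
                l2[species][km].add(strain)
    return l1, l2


def profile(reads, l1, l2, k):
    """Alignment-free profiling: vote for a species via L1, then a strain via L2."""
    counts = Counter()
    for read in reads:
        species_votes = Counter(sp for km in kmers(read, k) for sp in l1.get(km, ()))
        if not species_votes:
            continue  # no indexed k-mer: the read stays unassigned
        species = species_votes.most_common(1)[0][0]
        strain_votes = Counter(st for km in kmers(read, k) for st in l2[species].get(km, ()))
        counts[(species, strain_votes.most_common(1)[0][0])] += 1
    total = sum(counts.values())
    absolute = dict(counts)  # assigned read counts per (species, strain)
    relative = {key: n / total for key, n in counts.items()} if total else {}
    return absolute, relative


genomes = {
    "spA": {"s1": "ACGTACGTGG", "s2": "ACGTACGTCC"},
    "spB": {"s3": "TTTTGGGGAA"},
}
l1, l2 = build_index(genomes, k=4)
absolute, relative = profile(["ACGTGG", "TTGGGG", "TTTTGG", "CCCCCC"], l1, l2, k=4)
print(absolute, relative)
```

The read "ACGTGG" shares k-mers with both strains of spA but gets three strain votes for s1 against one for s2, which shows how the second index level separates closely related strains.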

METHOD FOR DETECTION AND IDENTIFICATION OF KNOWN AND EMERGENT PATHOGENS

A method of detecting and identifying pathogens in a sample comprising a plurality of genetic sequences. A plurality of electronic sequence reads corresponding to the plurality of genetic sequences is received and sampled to form a sample set. The sample set is iteratively and electronically compared to a plurality of pathogen sequences to create a detection group, which populates a putative genome data structure. A distance score is measured between each electronic sequence read of the sample set and each pathogen sequence of the putative genome data structure. A hit score is calculated by comparing the distance score to a threshold value. The electronic sequence reads of the sample set are then grouped into a plurality of clusters so as to maximize the cluster hit score and minimize the difference in distance scores within each cluster. Finally, the taxonomic group assigned to the electronic sequence reads of the sample set after clustering is displayed.
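
The sampling, distance, threshold, and clustering steps can be sketched as follows. The abstract does not define the distance score, so this sketch assumes a k-mer containment distance (the fraction of a read's k-mers missing from a pathogen sequence). It also replaces the cluster optimization with a simple rule: each read joins the cluster of its closest pathogen when that distance is within the threshold. All names and sequences are hypothetical.

```python
import random
from collections import defaultdict


def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment_distance(read_kmers, ref_kmers):
    """1 - fraction of the read's k-mers found in the reference (0 = fully contained)."""
    if not read_kmers:
        return 1.0
    return 1.0 - len(read_kmers & ref_kmers) / len(read_kmers)


def detect(reads, pathogens, threshold, k=4, sample_size=None, seed=0):
    """Sample reads, score each against every pathogen, and cluster reads by best hit."""
    if sample_size is not None:
        reads = random.Random(seed).sample(reads, min(sample_size, len(reads)))
    ref_sets = {name: kmer_set(seq, k) for name, seq in pathogens.items()}
    clusters = defaultdict(list)
    for read in reads:
        rk = kmer_set(read, k)
        dists = {name: containment_distance(rk, rs) for name, rs in ref_sets.items()}
        best = min(dists, key=dists.get)
        is_hit = dists[best] <= threshold  # hit: distance within the threshold
        clusters[best if is_hit else "unassigned"].append(read)
    return dict(clusters)


pathogens = {"virusA": "ACGTACGTTTGACC", "virusB": "GGGCCCAAATTTGG"}
clusters = detect(["ACGTACGT", "CCCAAATT", "TATATATA"], pathogens, threshold=0.2)
print(clusters)
```

Reads that are far from every reference land in an "unassigned" bucket instead of being forced into a taxonomic group. In the disclosed method, reads like these would be candidates for emergent pathogens.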

IDENTIFICATION OF MATCHED SEGMENTS IN PAIRED DATASETS
20220382730 · 2022-12-01

Disclosed herein are processes that identify segments of a target dataset that match segments of other datasets in a database. A computing server may encode the target dataset, which comprises a pair of data value sequences, into a pair of encoded target bitmap sequences according to an encoding scheme. The encoding scheme defines encoding values based on the homogeneity between the pair of data value sequences. The computing server may compare the pair of encoded target bitmap sequences with other pairs of encoded bitmap sequences to identify homogeneous mismatched locations. A homogeneous mismatched location is a location where the target dataset and the dataset it is compared with are both homogeneous but have different types of homogeneity. The computing server may identify a matched segment between the target dataset and one of the other datasets based on the identified homogeneous mismatched locations; the matched segment lies between two homogeneous mismatched locations.
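
The encoding and comparison can be sketched with Python integers used as bitmaps. This sketch assumes binary data values (for example, two haplotypes coded 0/1) and a two-bitmap encoding: one bitmap marks sites where both sequences are 0 and the other marks sites where both are 1. A homogeneous mismatch is then a single bitwise expression. Treating the sequence ends as segment boundaries and the min_len filter are additional assumptions.

```python
def encode(hap_a, hap_b):
    """Encode a pair of 0/1 sequences as two bitmaps (Python ints).

    Bit i of hom0 is set when both sequences hold 0 at site i; bit i of hom1
    when both hold 1. Heterogeneous sites set neither bit.
    """
    hom0 = hom1 = 0
    for i, (x, y) in enumerate(zip(hap_a, hap_b)):
        if x == y == 0:
            hom0 |= 1 << i
        elif x == y == 1:
            hom1 |= 1 << i
    return hom0, hom1


def homogeneous_mismatches(target, other):
    """Bitmap of sites where both datasets are homogeneous but of different types."""
    t0, t1 = target
    o0, o1 = other
    return (t0 & o1) | (t1 & o0)


def matched_segments(target, other, n_sites, min_len=1):
    """Half-open [start, end) segments bounded by mismatches (or, by assumption, sequence ends)."""
    mism = homogeneous_mismatches(target, other)
    cuts = [-1] + [i for i in range(n_sites) if mism >> i & 1] + [n_sites]
    return [(a + 1, b) for a, b in zip(cuts, cuts[1:]) if b - (a + 1) >= min_len]


target = encode([0, 0, 1, 1, 0, 1], [0, 1, 1, 0, 0, 1])
other = encode([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0])
print(matched_segments(target, other, n_sites=6, min_len=2))  # [(0, 2)]
```

Because the comparison is a single AND/OR over whole bitmaps, the cost of comparing one target against many stored datasets is dominated by word-level bit operations rather than per-site branching.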

COMPARATIVELY-REFINED POLYGENIC RISK SCORE GENERATION MACHINE LEARNING FRAMEWORKS
20220383982 · 2022-12-01

Various embodiments of the present invention describe techniques for generating a polygenic risk score generation machine learning framework that integrates an optimal genetic variant refinement model without requiring brute-force traversal of the potential parameter spaces defined by various distinct genetic variant sets. Instead, these embodiments use holistic Bayesian sampling routines to efficiently generate numerical estimates of the Bayesian evidence for various genetic variant refinement models, and then select the optimal genetic variant refinement model accordingly. This improves the accuracy of polygenic risk score generation machine learning frameworks without resorting to computationally resource-intensive traversals of those parameter spaces. As a result, these embodiments generate such a framework more efficiently than techniques that rely on brute-force traversal of the parameter spaces.
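
The core idea is to score each candidate variant set by its Bayesian evidence and keep the best one. The sketch below shows that idea on a toy problem. It is not the disclosed routine: the "holistic Bayesian sampling routines" are not specified, so naive Monte Carlo over prior draws stands in for them. The Gaussian linear model, noise level, prior, and least-squares PRS weights are likewise hypothetical.

```python
import numpy as np


def log_evidence(X, y, sigma=1.0, prior_sd=1.0, n_samples=20000, seed=0):
    """Naive Monte Carlo estimate of log p(y | model): average the likelihood over prior draws."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    thetas = rng.normal(0.0, prior_sd, size=(n_samples, d))  # effect sizes drawn from the prior
    resid = y[:, None] - X @ thetas.T                         # (n, n_samples)
    ll = -0.5 * n * np.log(2 * np.pi * sigma**2) - (resid**2).sum(axis=0) / (2 * sigma**2)
    m = ll.max()                                              # log-mean-exp for stability
    return m + np.log(np.exp(ll - m).mean())


def select_variant_set(G, y, candidates, **kw):
    """Score each candidate variant subset by estimated evidence and keep the best."""
    scores = {c: log_evidence(G[:, list(c)], y, **kw) for c in candidates}
    return max(scores, key=scores.get), scores


def polygenic_risk_score(G, y, subset):
    """Hypothetical PRS: dosages of the selected variants weighted by least-squares effects."""
    Xs = G[:, list(subset)]
    w, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return Xs @ w, w


rng = np.random.default_rng(0)
G = rng.binomial(2, 0.4, size=(50, 2)).astype(float)  # 50 individuals x 2 variants (dosages)
y = 1.0 * G[:, 0] + rng.normal(0.0, 0.5, size=50)      # only variant 0 is causal here
best, scores = select_variant_set(G, y, [(0,), (1,), (0, 1)], sigma=0.5)
prs, weights = polygenic_risk_score(G, y, best)
print(best, weights)
```

Evidence-based selection penalizes variant sets that add parameters without improving fit, so only the candidate sets are scored. The joint parameter space of every possible variant subset is never traversed.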

Copy number alteration and reference genome mapping

Technology provided herein relates in part to methods, processes, machines, and apparatuses for non-invasive assessment of genomic nucleic acid instability and genomic nucleic acid stability. The method comprises providing a set of genomic portions, each coupled to a copy number alteration quantification for a test sample, wherein the genomic portions comprise portions of a reference genome to which sequence reads, obtained for nucleic acid from a test sample obtained from the subject, have been mapped, and the copy number alteration quantification coupled to each genomic portion has been determined from a quantification of the sequence reads mapped to that portion; and determining, by a computing device, the presence or absence of genomic instability for the subject according to the copy number alteration quantifications coupled to the genomic portions.
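
A simple way to picture the two steps is shown below. The abstract does not define the quantification or the decision rule, so this sketch assumes a per-portion log2 ratio of depth-normalized test versus reference read counts. It calls instability when more than a set fraction of portions deviate beyond a cutoff. The pseudocount, both cutoffs, and the function names are hypothetical.

```python
import math


def cna_quantification(test_counts, ref_counts, pseudocount=0.5):
    """Per-portion log2 ratio of depth-normalized test vs reference read counts."""
    t = [c + pseudocount for c in test_counts]  # pseudocount avoids log2(0)
    r = [c + pseudocount for c in ref_counts]
    t_total, r_total = sum(t), sum(r)
    return [math.log2((ti / t_total) / (ri / r_total)) for ti, ri in zip(t, r)]


def genomic_instability(quants, ratio_cutoff=0.3, fraction_cutoff=0.1):
    """Call instability when the fraction of altered portions exceeds fraction_cutoff."""
    altered = sum(1 for q in quants if abs(q) > ratio_cutoff)
    return altered / len(quants) > fraction_cutoff


ref = [100] * 10
stable = cna_quantification([100] * 10, ref)
unstable = cna_quantification([100] * 8 + [200, 200], ref)
print(genomic_instability(stable), genomic_instability(unstable))  # False True
```

Normalizing by total counts before taking the ratio keeps sequencing depth from masquerading as a genome-wide copy number change. In the second sample, the eight unchanged portions shift only to about -0.26, while the two doubled portions reach about +0.74.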