G16B10/00

Ancestry composition determination

A method of presenting ancestral origin information, comprising: receiving a request to display ancestry data of an individual; obtaining ancestry composition information of the individual, the ancestry composition information including the proportion of the individual's genotype data that is deemed to correspond to a specific ancestry; and presenting the ancestry composition information for display.
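
As an illustration only, the proportion step can be sketched in Python. The sketch assumes that upstream analysis has already assigned an ancestry label to each genomic segment of the individual's genotype data. The `(label, length_in_bp)` input format and the functions `ancestry_composition` and `format_composition` are hypothetical, not part of the described method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def ancestry_composition(segments: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Fraction of the genotyped length assigned to each ancestry label.

    `segments` is a hypothetical input: (ancestry_label, length_in_bp) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    for label, length in segments:
        totals[label] += length
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing genotyped, nothing to present
    return {label: n / grand_total for label, n in totals.items()}


def format_composition(composition: Dict[str, float]) -> str:
    """Render one line per ancestry, largest proportion first, for display."""
    rows = sorted(composition.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{label:<20}{fraction:6.1%}" for label, fraction in rows)
```

For example, segments totalling 600, 300 and 100 bp for three labels yield proportions of 0.6, 0.3 and 0.1, printed from largest to smallest.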

SPECIES PROXIMITY-AWARE EVOLUTIONARY CONSERVATION PROFILES

The technology disclosed relates to generating species-differentiable evolutionary profiles using a weighting logic. In particular, the technology disclosed relates to determining a weighted summary statistic for a given residue category at a given position in a multiple sequence alignment based on one or more weights of one or more sequences in the multiple sequence alignment that have a residue of the given residue category at the given position.
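
A minimal sketch of such a weighted summary statistic follows. It assumes the statistic is a weighted frequency: the summed weights of the sequences whose residue at the given position falls in the given category, divided by the summed weights of all sequences. How the weights are derived from species proximity is not specified here, so the example weights are made up. `weighted_category_frequency` and `weighted_profile` are hypothetical names.

```python
from typing import Dict, Sequence


def weighted_category_frequency(msa: Sequence[str], weights: Sequence[float],
                                position: int, category: str) -> float:
    """Weighted share of sequences whose residue at `position` is in `category`.

    `category` is a string of residue symbols, e.g. "G" or "ILV" (assumed encoding).
    `weights` holds one non-negative weight per aligned sequence (assumed to
    reflect species proximity; the weighting scheme itself is not shown).
    """
    if len(msa) != len(weights):
        raise ValueError("need exactly one weight per aligned sequence")
    total = sum(weights)
    if total == 0:
        return 0.0
    hits = sum(w for seq, w in zip(msa, weights) if seq[position] in category)
    return hits / total


def weighted_profile(msa: Sequence[str], weights: Sequence[float],
                     position: int, alphabet: str = "ACGT-") -> Dict[str, float]:
    """Per-residue weighted frequencies at one alignment column."""
    return {r: weighted_category_frequency(msa, weights, position, r) for r in alphabet}
```

Treating a category as a set of symbols lets the same function score either a single residue or a grouped residue class.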

DEEP LEARNING NETWORK FOR EVOLUTIONARY CONSERVATION
20230207054 · 2023-06-29

The technology disclosed relates to a deep learning network system for evolutionary conservation prediction. In one implementation, the system includes a first model for processing a first multiple sequence alignment that aligns a query sequence with a masked base at a target position to N non-query sequences and predicting a first identity of the masked base at the target position. The system also includes a second model for processing a second multiple sequence alignment that aligns the query sequence to M non-query sequences, where M>N, and predicting a second identity of the masked base at the target position. The system further includes an evolutionary conservation determination logic configured to measure an evolutionary conservation of the masked base at the target position based on the first and second identities of the masked base.
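
The comparison of the two predicted identities can be sketched as follows. The trained deep networks are replaced by a hypothetical stand-in, `column_frequency_model`, which only counts bases in the non-query rows. The scoring rule is also an assumption for illustration and is not the disclosed determination logic: a base counts as conserved when both models recover the observed base, and the score is the change in that base's probability from the N-sequence to the M-sequence alignment.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
Model = Callable[[List[str], int], Dict[str, float]]


def column_frequency_model(msa: List[str], target: int) -> Dict[str, float]:
    """Hypothetical stand-in for a trained network: base frequencies at the
    target column among the non-query rows (row 0 is the masked query)."""
    counts = Counter(row[target] for row in msa[1:] if row[target] in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total if total else 0.0 for b in BASES}


def predicted_identity(model: Model, msa: List[str], target: int) -> Tuple[str, Dict[str, float]]:
    """Most probable base at the target position, plus the full distribution."""
    probs = model(msa, target)
    return max(probs, key=probs.get), probs


def conservation(model_n: Model, model_m: Model, msa_n: List[str], msa_m: List[str],
                 target: int, observed_base: str) -> Tuple[bool, float]:
    """Assumed rule: conserved if both predicted identities equal the observed
    base; score = probability change for the observed base from N to M sequences."""
    if len(msa_m) <= len(msa_n):
        raise ValueError("expected M > N non-query sequences")
    id_n, p_n = predicted_identity(model_n, msa_n, target)
    id_m, p_m = predicted_identity(model_m, msa_m, target)
    conserved = id_n == observed_base and id_m == observed_base
    return conserved, p_m.get(observed_base, 0.0) - p_n.get(observed_base, 0.0)
```

In a real system, the two models would be separately trained networks. Here the same stand-in is passed twice only to keep the example self-contained.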

UNIQUE MAPPER TOOL FOR EXCLUDING REGIONS WITHOUT ONE-TO-ONE MAPPING BETWEEN A SET OF TWO REFERENCE GENOMES

A first reference genome is segmented into a plurality of bins, and high-quality sequenced reads are mapped on a bin-by-bin basis to the bins of the first reference genome. A second reference genome is likewise segmented into a plurality of bins, and high-quality sequenced reads are mapped on a bin-by-bin basis to the bins of the second reference genome. A best-mapped bin in the second reference genome is then identified as the bin with the greatest degree of match to a corresponding bin in the first reference genome.
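
A toy sketch of bin-by-bin mapping and best-mapped-bin selection follows. It rests on several assumptions. Reads "map" to a bin only by exact substring match. The degree of match between two bins is the Jaccard overlap of the reads mapped to them. A bin pair counts as one-to-one only when each bin is the other's unique best match, and all other bins are excluded. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def make_bins(genome: str, bin_size: int) -> List[str]:
    """Segment a reference genome into consecutive fixed-size bins."""
    return [genome[i:i + bin_size] for i in range(0, len(genome), bin_size)]


def map_reads_to_bins(reads: Iterable[str], bins: List[str]) -> List[Set[str]]:
    """Toy exact-match 'aligner': a read maps to every bin that contains it."""
    read_set = set(reads)
    return [{r for r in read_set if r in b} for b in bins]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_mapped_bin(read_set: Set[str], other_sets: List[Set[str]]) -> Optional[int]:
    """Index of the unique best-matching bin, or None if absent or tied."""
    scores = [jaccard(read_set, s) for s in other_sets]
    top = max(scores)
    winners = [k for k, s in enumerate(scores) if s == top]
    if top == 0 or len(winners) > 1:
        return None  # no unique best-mapped bin
    return winners[0]


def one_to_one_bins(genome1: str, genome2: str, reads: List[str],
                    bin_size: int) -> Dict[int, int]:
    """Keep bin pairs that are reciprocal unique best matches; exclude the rest."""
    sets1 = map_reads_to_bins(reads, make_bins(genome1, bin_size))
    sets2 = map_reads_to_bins(reads, make_bins(genome2, bin_size))
    kept: Dict[int, int] = {}
    for i, s1 in enumerate(sets1):
        j = best_mapped_bin(s1, sets2)
        if j is not None and best_mapped_bin(sets2[j], sets1) == i:
            kept[i] = j
    return kept
```

The tie check does the exclusion work. If a region is duplicated in the second genome, its reads match two bins equally well, so the region has no one-to-one mapping and is dropped.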

VARIANT CALLING WITHOUT A TARGET REFERENCE GENOME

The technology disclosed relates to determining the feasibility of using a reference genome of a non-target species for variant calling on a sample of a target species. In particular, the technology disclosed relates to mapping sequenced reads of a sample of a target species to a reference genome of a non-target species to detect a first set of variants in those reads, and mapping the same sequenced reads to a reference genome of a pseudo-target species to detect a second set of variants in those reads.
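
The two mapping passes can be illustrated with a deliberately naive pileup caller. This is a sketch with two assumptions: reads are already placed without gaps, and the two reference genomes share one coordinate system, so the same read placements serve both passes. `call_variants` is a hypothetical helper. A real pipeline would use a proper aligner and variant caller.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

Variant = Tuple[int, str, str]  # (position, reference base, alternate base)


def call_variants(reference: str, aligned_reads: List[Tuple[int, str]],
                  min_depth: int = 2) -> Set[Variant]:
    """Naive pileup caller: report the majority base wherever it differs
    from the reference and the position has at least `min_depth` reads.

    `aligned_reads` holds (start, sequence) pairs assumed to be placed on
    this reference already, without gaps (hypothetical input format).
    """
    pileup: DefaultDict[int, Counter] = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1
    variants: Set[Variant] = set()
    for pos, counts in pileup.items():
        alt, _ = counts.most_common(1)[0]
        if sum(counts.values()) >= min_depth and alt != reference[pos]:
            variants.add((pos, reference[pos], alt))
    return variants
```

Calling it once against the non-target reference and once against the pseudo-target reference produces the first and second variant sets.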

QUALITY DETECTION OF VARIANT CALLING USING A MACHINE LEARNING CLASSIFIER

The technology disclosed relates to variant calling of sequenced reads of a sample of a target species against a reference genome of a pseudo-target species. Low-quality variants are identified as false-positive variants, that is, variants present in the second set of variants (those called against the pseudo-target reference genome) but absent from the first set of variants (those called against the reference genome of a non-target species).
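
Under this rule, labelling reduces to a set difference, and the resulting labels can then train a classifier. The sketch below assumes variants are hashable tuples. It uses plain gradient-descent logistic regression as a stand-in classifier, and its only feature (alternate-allele fraction) is hypothetical and invented for the example. `label_low_quality`, `train_logistic` and `predict` are illustrative names.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def label_low_quality(first_set: Set[Hashable], second_set: Set[Hashable]) -> Dict[Hashable, int]:
    """1 = in the second set but not the first (treated as false positive), 0 otherwise."""
    return {v: 1 if v not in first_set else 0 for v in second_set}


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features: List[Sequence[float]], labels: List[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Stochastic gradient descent on the logistic loss (stand-in classifier)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            g = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Probability that a variant with features `x` is low quality."""
    return _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
```

In practice, the feature vectors would come from per-variant evidence such as read depth or allele balance. Which features the disclosed classifier uses is not stated here.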

MASK PATTERN FOR PROTEIN LANGUAGE MODELS

The technology disclosed relates to accessing a multiple sequence alignment that aligns a query residue sequence to a plurality of non-query residue sequences; applying a set of periodically-spaced masks to a first set of residues at a first set of positions in the multiple sequence alignment; and cropping a portion of the multiple sequence alignment that includes both the periodically-spaced masks at the first set of positions and a second set of residues at a second set of positions to which the masks are not applied. The first set of residues includes a residue-of-interest at a position-of-interest in the query residue sequence.
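
A minimal sketch of periodic masking followed by cropping, under three assumptions. Masks are applied only to the query row. Mask positions fall every `period` columns, phased so that the position-of-interest is among them. The crop is a fixed-width window centred on the position-of-interest and clamped to the alignment. The `#` mask token and all function names are hypothetical.

```python
from typing import List, Tuple

MASK = "#"  # hypothetical mask token


def periodic_positions(length: int, position_of_interest: int, period: int) -> List[int]:
    """Columns spaced `period` apart, phased so the position of interest is included."""
    return list(range(position_of_interest % period, length, period))


def mask_and_crop(msa: List[str], position_of_interest: int, period: int,
                  crop_width: int) -> Tuple[List[str], List[int]]:
    """Return the cropped, masked MSA and the mask columns relative to the crop."""
    length = len(msa[0])
    masked = set(periodic_positions(length, position_of_interest, period))
    # Assumption: masks are applied to the query row (row 0) only.
    query = "".join(MASK if i in masked else c for i, c in enumerate(msa[0]))
    masked_msa = [query] + list(msa[1:])
    # Fixed-width window centred on the position of interest, clamped to the alignment.
    start = max(0, min(position_of_interest - crop_width // 2, length - crop_width))
    end = min(length, start + crop_width)
    cropped = [row[start:end] for row in masked_msa]
    # Every column in the crop that is not listed here stays unmasked.
    return cropped, sorted(p - start for p in masked if start <= p < end)
```

With a 10-column alignment, a position-of-interest of 4, a period of 3 and a crop width of 6, the masks land at columns 1, 4 and 7. The crop keeps columns 1 to 6, so it contains two masks alongside unmasked residues.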

PATHOGENICITY LANGUAGE MODEL

A system comprises: chunking logic that splits a multiple sequence alignment (MSA) containing a plurality of masked residues into chunks; first attention logic that attends to a representation of the chunks and produces a first attention output; first aggregation logic that produces a first aggregated output containing those features of the first attention output that correspond to the masked residues; mask revelation logic that produces an informed output based on the first aggregated output and a Boolean mask; second attention logic that attends to the informed output and produces a second attention output based on the masked residues revealed by the Boolean mask; second aggregation logic that produces a second aggregated output containing those features of the second attention output that correspond to the masked residues concealed by the Boolean mask; and output logic that produces identifications of the masked residues based on the second aggregated output.
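
The data flow through these components can be sketched with NumPy. Everything here is an assumption for illustration:

- single-head attention with random, untrained weights;
- one chunk processed at a time;
- mask revelation implemented by adding the embedding of the true residue to each revealed feature;
- a `#` mask token.

The predicted letters are therefore arbitrary, and only the shapes and the order of operations are meaningful. All names are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
MASK = "#"  # hypothetical mask token
VOCAB = {c: i for i, c in enumerate(ALPHABET + MASK)}
D = 16  # hypothetical embedding width
rng = np.random.default_rng(0)
# Random, untrained parameters: the sketch shows data flow, not learned behaviour.
EMBED = rng.normal(size=(len(VOCAB), D))
W1 = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3)]  # first attention q, k, v
W2 = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3)]  # second attention q, k, v
W_OUT = rng.normal(size=(D, len(ALPHABET)))


def chunk_msa(msa, width):
    """Chunking logic: split the MSA column-wise into chunks of `width` columns."""
    return [[row[i:i + width] for row in msa] for i in range(0, len(msa[0]), width)]


def attend(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def predict_chunk(chunk, true_residues, reveal):
    """Predict the concealed masked residues of one chunk.

    true_residues: true residue for each MASK token, in row-major order.
    reveal: Boolean flags, True where a masked residue is revealed.
    """
    reveal = np.asarray(reveal, dtype=bool)
    tokens = [VOCAB[c] for row in chunk for c in row]
    masked_idx = [i for i, t in enumerate(tokens) if t == VOCAB[MASK]]
    if len(true_residues) != len(masked_idx) or len(reveal) != len(masked_idx):
        raise ValueError("need one true residue and one reveal flag per masked token")
    x = EMBED[tokens]                                # representation of the chunk
    h1 = attend(x, *W1)                              # first attention output
    agg1 = h1[masked_idx]                            # first aggregated output
    truth = EMBED[[VOCAB[c] for c in true_residues]]
    informed = agg1 + reveal[:, None] * truth        # mask revelation (assumed form)
    h2 = attend(informed, *W2)                       # second attention output
    agg2 = h2[~reveal]                               # concealed masked residues only
    logits = agg2 @ W_OUT                            # output logic
    return [ALPHABET[int(i)] for i in logits.argmax(axis=-1)]
```

Only the concealed residues reach the output logic. The revealed ones contribute context solely through the second attention step.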