METHOD OF HAPLOTYPING

20220267836 · 2022-08-25

    Abstract

    The present invention relates to detecting aberrant expression of genes which may be associated with a disease or disorder using haplotype phasing. In particular, the invention relates to a method of obtaining an indication of dysregulation between the expression levels of at least two alleles of a gene in a target eukaryotic cell. The method comprises the steps of: for a plurality of genes from one or more target eukaryotic cells, (a) obtaining pre-mRNAs of at least two alleles of the same gene; and (b) determining the ratios (R_{i,j}) between the amounts of the pre-mRNAs of one or more pairs of alleles (i,j) of the same gene.

    Claims

    1. A method of identifying mutations in alleles of a gene which may be causative of dysregulation of the expression levels of the alleles of the gene in a target eukaryotic cell, the method comprising the steps of: for a plurality of genes from one or more target eukaryotic cells, (a) obtaining pre-mRNAs of at least two alleles of the genes; and (b) determining the ratios (R_{i,j}) between the amounts of pre-mRNAs of one or more pairs of alleles (i,j) of the genes; wherein when R_{i,j}≠1 for a pair of alleles (i,j) of a gene, or in response to determining that R_{i,j}≠1 for a pair of alleles (i,j) of a gene, the method additionally comprises the steps: (c) determining the nucleotide sequences of that pair of alleles; and (d) comparing the nucleotide sequences of that pair of alleles in order to identify differences between the nucleotide sequences of that pair of alleles; wherein one or more of the differences between the nucleotide sequences of the pair of alleles of the gene may be mutations which are causative of the dysregulation of the expression levels of the two alleles of that gene in the target eukaryotic cell, wherein the method is performed in a phased genome sequence, and wherein all sequence differences are then attributed to a specific allele to determine allelic skew across the whole gene and these sequence differences are linked with sequence variation outside the body of the gene.

    2. The method as claimed in claim 1, wherein when R_{i,j} is <0.9 or if R_{i,j}>1.1 for a pair of alleles (i,j) of a gene, or in response to determining that R_{i,j} is <0.9 or if R_{i,j}>1.1 for a pair of alleles (i,j) of a gene, the method comprises the steps: (c) determining the nucleotide sequences of that pair of alleles; and (d) comparing the nucleotide sequences of that pair of alleles in order to identify differences between the nucleotide sequences of that pair of alleles; wherein one or more of the differences between the nucleotide sequences of the pair of alleles of the gene may be mutations which are causative of the dysregulation of the expression levels of the two alleles of that gene in the target eukaryotic cell.

    3. The method as claimed in claim 1, wherein Step (c) is carried out using RNA-Seq.

    4. The method as claimed in claim 3, wherein sequences deriving from specific alleles are identified and counted by identifying, within the RNA-Seq data, sequence changes in the introns, exons and downstream transcribed regions, known to be specific to that allele.

    5. The method as claimed in claim 1, wherein if R_{i,j}<0.9 or R_{i,j}>1.1, then this provides an indication that there exists a change in the regulatory elements on one allele that controls the expression of the gene.

    6. The method as claimed in claim 3, wherein the method additionally comprises the further step of carrying out a sequence-based assay that measures the activity of regulatory elements to detect skew on the same allele of the genes found to be skewed using RNA-seq.

    7-12. (canceled)

    13. The method as claimed in claim 1, wherein the eukaryotic cells are human primary lymphoid cells or primary neuronal cells.

    14. The method as claimed in claim 1, wherein the plurality of genes is 2-10, 10-100, 100-500, 500-1000, 1000-5000, 5000-10000, or 10000 or more genes.

    15. The method as claimed in claim 1, wherein there are 2 alleles of the same gene in each target eukaryotic cell.

    16. The method as claimed in claim 1, wherein the pre-mRNA is polyA+ mRNA.

    17. The method as claimed in claim 1, wherein R_{i,j}≠1 means that R_{i,j} is less than 0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05 or 0.01; or R_{i,j} is more than 1.05, 1.1, 1.2, 1.4, 1.6, 2, 2.5, 3, 5, 10, 20 or 100.

    18. The method as claimed in claim 5, wherein the regulatory elements are ones which exist within the introns of the gene or outside of the body of the gene.

    19. The method as claimed in claim 18, wherein the regulatory elements are ones which exist outside of the coding region of the gene.

    20. The method as claimed in claim 6, wherein the sequence-based assay is ATAC-seq, DNase-seq or ChIP-seq.

    21. The method as claimed in claim 16, wherein the pre-mRNA is polyA-mRNA obtained from total cellular RNA.

    Description

    BRIEF DESCRIPTION OF THE FIGURES

    [0103] FIG. 1 shows the use of pre-mRNA and phased haplotypes in identifying gene dysregulation.

    [0104] FIG. 2 shows the use of pre-mRNA in phased genomes to detect dysregulation of the IKZF1 gene associated with a specific sequence variant in a regulatory element.

    [0105] FIG. 3 shows the frequency with which informative heterozygotes are found in the general population for risk alleles associated with common disease.

    EXAMPLES

    [0106] The present invention is further illustrated by the following Examples, in which parts and percentages are by weight and degrees are Celsius, unless otherwise stated. It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions.

    [0107] Thus, various modifications of the invention in addition to those shown and described herein will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.

    Example 1: Use of pre-mRNA in Phased Genomes to Detect Gene Dysregulation Associated with a Specific Haplotype

    [0108] FIG. 1A shows a schematic representation of two genome alleles, each of which contains two genes and one regulatory element. Sequence changes which distinguish the two alleles are shown as Xs (for example, single nucleotide polymorphisms (SNPs), small insertions or deletions). The exons of the two genes are shown as boxes and the promoter elements of the two genes are shown as a vertical line and horizontal arrow associated with the first exon of the genes. The position of the regulatory element is shown as a triangle between the two genes.

    [0109] In Haplotype A, this regulatory element contains a sequence change which alters its activity (shown as a lighter shade). The regulatory interactions between the regulatory element and genes, mapped by 3C methods such as Capture-C, are shown as arced lines with arrows. The sequence changes which distinguish the source allele of the pre-mRNAs from both genes lie within the transcribed portions of the genes (for example introns, exons and downstream regions). In this example the gene dysregulation is caused by a damaged regulatory element, but the same use of phased sequence changes combined with the sequencing of pre-mRNA can be used for any other mechanism (for example, gain-of-function caused by sequence variation or larger scale structural variation).
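
    By way of illustration only, the attribution of phased sequence changes to one haplotype or the other can be sketched as follows. The sketch assumes that the phased genome is available as VCF-style genotype strings in which "|" separates the alleles carried by haplotype A and haplotype B; the `PhasedVariant` record, the `haplotype_alleles` helper and the coordinates are hypothetical and are not taken from the disclosure.

```python
from typing import NamedTuple, Optional, Tuple


class PhasedVariant(NamedTuple):
    """Hypothetical record for one biallelic sequence change."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str  # VCF-style string, e.g. "0|1" (phased) or "0/1" (unphased)


def haplotype_alleles(variant: PhasedVariant) -> Optional[Tuple[str, str]]:
    """Return (allele on haplotype A, allele on haplotype B).

    Returns None when the variant cannot distinguish the two haplotypes,
    i.e. when it is unphased, homozygous or not a simple 0/1 genotype.
    """
    if "|" not in variant.genotype:
        return None  # unphased: cannot be attributed to a haplotype
    left, _, right = variant.genotype.partition("|")
    if {left, right} != {"0", "1"}:
        return None  # homozygous or multi-allelic: not informative here
    alleles = {"0": variant.ref, "1": variant.alt}
    return alleles[left], alleles[right]


# Illustrative (made-up) sequence changes within one gene.
variants = [
    PhasedVariant("chrN", 100, "A", "G", "0|1"),
    PhasedVariant("chrN", 250, "C", "T", "1|0"),
    PhasedVariant("chrN", 400, "G", "A", "1|1"),  # homozygous
    PhasedVariant("chrN", 520, "T", "C", "0/1"),  # unphased
]
informative = [v for v in variants if haplotype_alleles(v) is not None]
```

    Only the first two illustrative variants are retained, since only phased heterozygous changes can identify the source allele of a sequenced pre-mRNA fragment.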

    [0110] FIG. 1B shows, in an exemplar gene, the increased coverage obtained with pre-mRNA forms of RNA-Seq, which retain the transcribed intronic and downstream regions of the gene and thereby increase the amount of sequence variation available for detecting gene dysregulation. The exons of the gene are shown as vertical lines of varying thickness while the introns are shown as a horizontal hatched line.
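
    The gain in informative sequence variation can be illustrated with a minimal sketch that counts heterozygous positions falling within the exons only, as opposed to the whole transcribed region retained in pre-mRNA. The gene model, the positions and the `count_informative` helper are hypothetical and serve only to show the counting.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end)


def count_informative(het_positions: Iterable[int],
                      intervals: List[Interval]) -> int:
    """Count heterozygous positions lying inside any of the intervals."""
    return sum(
        any(start <= pos < end for start, end in intervals)
        for pos in het_positions
    )


# Hypothetical gene model: three exons within one transcribed region.
exons = [(1000, 1200), (5000, 5150), (9000, 9400)]
transcribed = [(1000, 10000)]  # exons, introns and downstream region

# Hypothetical positions of phased heterozygous sequence changes.
het_positions = [1100, 2500, 3300, 5100, 6100, 7800, 9900, 12000]

exonic = count_informative(het_positions, exons)
pre_mrna = count_informative(het_positions, transcribed)
print(f"informative in exons only: {exonic}")
print(f"informative in pre-mRNA:   {pre_mrna}")
```

    In this made-up example, two sequence changes are usable when only exonic sequence is covered, whereas seven are usable when the introns and downstream transcribed region are also covered.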

    [0111] FIG. 1C shows the loss of binding of the GATA1 transcription factor (ChIP qPCR) at an erythroid regulatory element containing a single base pair change (rs10758656), homozygously edited into Hudep2 cells.

    [0112] FIG. 1D shows the loss of open chromatin signal at the same regulatory element using ATAC-seq in the homozygous presence of the sequence change rs10758656.

    [0113] FIG. 1E shows the erythroid-specific interaction, in wild type cells, between this regulatory element and the promoters of the RCL1 and JAK2 genes.

    [0114] FIG. 1F shows the loss of expression of only the JAK2 gene in cells homozygously edited for rs10758656.

    [0115] FIG. 1G shows the allelic skew in primary erythroid cells toward the wild type allele for both ATAC-seq at the regulatory element (squares) and in pre-mRNA expression (circles) of the JAK2 gene only in individuals heterozygous for rs10758656.

    Example 2: Use of pre-mRNA in Phased Genomes to Detect Dysregulation of the IKZF1 Gene Associated with a Specific Sequence Variant in a Regulatory Element

    [0116] FIG. 2A shows the identification of a sequence change in a regulatory element which damages the binding potential of a transcription factor using the Sasquatch algorithm (Schwessinger R. et al., 2017).

    [0117] FIG. 2B shows that this regulatory element interacts erythroid-specifically with the promoter of the IKZF1 gene using NG Capture-C.

    [0118] FIG. 2C shows the allelic skew of open chromatin signal towards the wild type (Hap-B) in primary erythroid cells in 3 individuals heterozygous for the damaging sequence change (Hap-A) as determined by ATAC-seq. The corresponding signal within the same sample in haplotype B is linked to the signal in haplotype A with a dotted line.

    [0119] FIG. 2D shows the cumulative decrease in pre-mRNA of the IKZF1 gene haplotype A, summed up from all transcribed sequence changes which distinguish the two haplotypes. The corresponding signal within the same sample in haplotype B is linked to the signal in haplotype A with a dotted line.
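
    A minimal sketch of such a cumulative measurement is given below. It assumes that, for each transcribed sequence change distinguishing the two haplotypes, the number of pre-mRNA reads carrying the haplotype A allele and the haplotype B allele has already been counted; the read counts and the function names are hypothetical. The thresholds of 0.9 and 1.1 are those recited in claim 2.

```python
from typing import Sequence, Tuple


def allelic_ratio(site_counts: Sequence[Tuple[int, int]]) -> float:
    """Cumulative ratio R_{A,B} of haplotype A reads to haplotype B reads.

    site_counts holds one (reads_hap_a, reads_hap_b) pair per heterozygous
    sequence change in the transcribed region of the gene.
    """
    total_a = sum(a for a, _ in site_counts)
    total_b = sum(b for _, b in site_counts)
    if total_b == 0:
        raise ValueError("no haplotype B reads: ratio is undefined")
    return total_a / total_b


def is_skewed(ratio: float, lower: float = 0.9, upper: float = 1.1) -> bool:
    """True when the ratio lies outside the interval [lower, upper]."""
    return ratio < lower or ratio > upper


# Hypothetical read counts at four heterozygous sites of one gene.
site_counts = [(12, 20), (8, 15), (30, 45), (10, 20)]

ratio = allelic_ratio(site_counts)  # 60 reads on A, 100 reads on B
print(f"R_A,B = {ratio:.2f}, skewed: {is_skewed(ratio)}")
```

    Summing over all informative sites before taking the ratio, rather than averaging per-site ratios, gives more weight to well-covered sites; this is a design choice of the sketch and other weightings are possible.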

    Example 3: The Use of Allelic Skew in pre-mRNA in Phased Genomes Allows for Regulatory Variation to be Analysed at Unprecedented Scale in Primary Cells

    [0120] FIG. 3 shows the number of informative individuals expected for a given minor allele frequency (MAF) of the sequence variant in a random sampling of the general population. Grey bars represent the number of expected individuals that are heterozygous at a given MAF. The black line shows the average distribution of minor allele frequencies in a typical genome-wide association (GWA) study for human disease (Type 1 diabetes, ankylosing spondylitis, erythroid traits and multiple sclerosis, combined). This shows that a MAF of 0.3 would provide greater than 20 independent observations of gene dysregulation and would cover greater than half of a typical GWA study. Similarly, a MAF of 0.1 would provide 5 or more independent observations of gene dysregulation and would cover greater than 90% of a typical GWA study.
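
    The relationship between MAF and the expected number of informative heterozygotes can be sketched as follows, assuming Hardy-Weinberg equilibrium, under which the expected heterozygote frequency for a biallelic variant is 2 x MAF x (1 - MAF). The cohort size of 50 used here is purely illustrative and is an assumption of the sketch; the size of the sample underlying FIG. 3 is not stated above.

```python
def expected_heterozygotes(maf: float, cohort_size: int) -> float:
    """Expected number of heterozygous individuals in a random sample.

    Assumes a biallelic variant in Hardy-Weinberg equilibrium, so the
    heterozygote frequency is 2 * maf * (1 - maf).
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return cohort_size * 2 * maf * (1 - maf)


COHORT_SIZE = 50  # hypothetical cohort size, chosen only for illustration

for maf in (0.1, 0.3, 0.5):
    n_het = expected_heterozygotes(maf, COHORT_SIZE)
    print(f"MAF {maf:.1f}: about {n_het:.0f} expected heterozygotes")
```

    Under these assumptions a cohort of 50 individuals would be expected to contain about 9 heterozygotes at a MAF of 0.1 and about 21 at a MAF of 0.3.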

    REFERENCES

    [0121] Lee J C et al., Cell, vol. 155, 2013, “Human SNP Links Differential Outcomes in Inflammatory and Infectious Disease to a FOXO3-Regulated Pathway”, pages 57-69

    [0122] Kowalczyk, M. S. et al. Intragenic enhancers act as alternative promoters. Mol Cell 45, 447-58 (2012).

    [0123] Quinn E M, et al. (2013) Development of Strategies for SNP Detection in RNA-Seq Data: Application to Lymphoblastoid Cell Lines and Evaluation Using 1000 Genomes Data. PLoS ONE 8(3): e58815. https://doi.org/10.1371/journal.pone.0058815

    [0124] Rainbow et al., BIOCHEMICAL SOCIETY TRANSACTIONS, vol. 36, 2008, “Commonality in the genetic control of Type 1 diabetes in humans and NOD mice: variants of genes in the IL-2 pathway are associated with autoimmune diabetes in both species”, page 312

    [0125] Schwessinger R, et al. (2017) Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints. Genome Res. 2017 Oct;27(10):1730-1742. PMCID: PMC5630036.

    [0126] Sigurdsson et al., HUMAN MOLECULAR GENETICS, vol. 17, 2008, “A risk haplotype of STAT4 for systemic lupus erythematosus is over-expressed, correlates with anti-dsDNA and shows additive effects with two risk alleles of IRF5”, pages 2868-2876

    [0127] Thomas et al., EPIGENETICS & CHROMATIN, vol. 4, 2011, “Allele-specific transcriptional elongation regulates monoallelic expression of the IGF2BP1 gene”, page 14