METHOD FOR SCREENING PATHOGENIC UNIPARENTAL DISOMY AND USE THEREOF

20220328131 · 2022-10-13

    Abstract

    A method of screening a pathogenic uniparental disomy and a use thereof is provided. The method includes the steps as follows: obtaining data: obtaining whole exome sequencing data; screening for sites: screening and obtaining mutations under pre-determined conditions; judging LOH: performing LOH judgement according to the mutations obtained above; and judging UPD: judging UPD according to the LOH judgement, wherein when an amount of chromosomes with LOH exceeds 2, a sample is judged as a consanguineous marriage; when there is a single copy of a region with LOH, a sample is judged as a fragment deletion; and other samples are judged as UPD when there are regions with LOH. In the method, specific mutated sites are screened out to perform LOH judgment, to finally obtain the results for UPD judgment. The method is based on the whole exome sequencing data, indicating the risk of pathogenic UPD alongside conventional screening of pathogenic mutations, without additional experiments and labor cost.

    Claims

    1. A method of screening a pathogenic uniparental disomy, comprising: obtaining data: obtaining whole exome sequencing data; screening for sites: screening and obtaining mutations under pre-determined conditions; judging LOH: performing LOH judgement according to the mutations obtained in the step of screening for sites; and a region is judged to be LOH when a product of an amount of contiguous sites and a coverage range thereof is greater than a pre-set value; and judging UPD: judging UPD according to the LOH judgement, wherein when an amount of chromosomes with LOH exceeds 2, a sample is judged as a consanguineous marriage; when there is a single copy of a region with LOH, a sample is judged as a fragment deletion; and other samples are judged as UPD when there are regions with LOH.

    2. The method of screening a pathogenic uniparental disomy according to claim 1, wherein the mutations under the pre-determined conditions are screened and obtained through the following approaches: screening for high-quality mutation sites: screening for high-quality mutation sites from the whole exome sequencing data; removing Y chromosome mutations: removing Y chromosome mutations from the above mutation sites; screening for point mutations: screening for point mutations from the mutations obtained in the step of removing Y chromosome mutations; screening for allele frequency: screening for sites which are located in the point mutations in the previous step to obtain sites which have a population allele frequency of less than 0.7 in each race in a population database; and screening for mutation frequency: removing sites which have a mutation frequency of heterozygous sites of higher than 70%, and removing sites which have a mutation frequency of homozygous sites of less than 85% from the sites which are located in point mutations in the previous step, thereby obtaining the mutations under predetermined conditions.

    3. The method of screening a pathogenic uniparental disomy according to claim 2, wherein in the step of screening for high-quality mutation sites, the high-quality mutation sites are the mutation sites passed through a quality control of GATK-VQSR, and having a total coverage range of more than 40X and a mutation frequency of greater than 30%.

    4. The method of screening a pathogenic uniparental disomy according to claim 2, wherein a step of excluding false positive sites is further included between the step of screening for allele frequency and the step of screening for mutation frequency, wherein the step of excluding false positive sites is performed according to the Hardy-Weinberg balance, by excluding false positive sites from a frequency database in a regional population to be evaluated.

    5. The method of screening a pathogenic uniparental disomy according to claim 1, wherein the step of screening for high-quality mutation sites further includes a step of quality control, wherein the step of quality control is used to detect the amount of mutations obtained by the screening; when the amount of mutations is greater than or equal to 10,000, the step of quality control indicates PASS; when the amount of mutations is less than 10,000, the step of quality control indicates FAIL.

    6. The method of screening a pathogenic uniparental disomy according to claim 1, wherein in the step of judging LOH, the amount of contiguous homozygous sites is greater than or equal to 20, and their coverage range is greater than or equal to 3 Mbp.

    7. The method of screening a pathogenic uniparental disomy according to claim 6, wherein in the step of judging LOH, when the product of the amount of contiguous homozygous sites and their coverage range is greater than 200 Mbp, a region is judged to be LOH.

    8. The method of screening a pathogenic uniparental disomy according to claim 1, wherein the step of judging UPD further includes a step of judging a pathogenic risk; in the step of judging a pathogenic risk, the LOH region which is judged to be UPD is further compared with imprinted genes; when the LOH region does not cover an imprinted gene or a corresponding band, a sample is indicated as a benign UPD; when the LOH region covers the imprinted gene or the corresponding band, a sample is indicated as being at risk of pathogenic UPD.

    9. A method of preparing a device for screening a pathogenic uniparental disomy, comprising applying the method of claim 1 to screen a pathogenic uniparental disomy.

    10. A device for screening a pathogenic uniparental disomy, comprising: a module of data acquisition, configured for obtaining whole exome sequencing data; a module of site screening, configured for screening for mutations under pre-determined conditions; a module of LOH judgment, configured for performing LOH judgment according to the mutations obtained in the module of site screening, and a region is judged to be LOH when a product of an amount of contiguous homozygous sites and their coverage range is greater than a pre-set value; and a module of UPD judgment, configured for performing UPD judgement according to the LOH judgment, wherein when an amount of chromosomes with LOH exceeds 2, a sample is judged as a consanguineous marriage; when there is a single copy of a region with LOH, a sample is judged as a fragment deletion; and other samples are judged as UPD when there are regions with LOH.

    11. The device for screening a pathogenic uniparental disomy according to claim 10, wherein the mutations under pre-determined conditions are screened and obtained through the following approaches: screening for high-quality mutation sites: screening for high-quality mutation sites from whole exome sequencing data; removing Y chromosome mutations: removing Y chromosome mutations from the above mutation sites; screening for point mutations: screening for point mutations from the mutations obtained in the step of removing Y chromosome mutations; screening for allele frequency: screening for sites which are located in the point mutations in the previous step to obtain sites which have a population allele frequency of less than 0.7 in each race in a population database; and screening for mutation frequency: removing sites which have a mutation frequency of heterozygous sites of higher than 70%, and removing sites which have a mutation frequency of homozygous sites of less than 85% from the sites which are located in point mutations in the previous step, thereby obtaining the mutations under predetermined conditions.

    12. The device for screening a pathogenic uniparental disomy according to claim 11, wherein in the module of screening high-quality mutation sites, the high-quality mutation sites are the mutation sites passed through a quality control of GATK-VQSR, and having a total coverage range of more than 40X and a mutation frequency of greater than 30%.

    13. The device for screening a pathogenic uniparental disomy according to claim 11, wherein a module of excluding false positive site is further included between the module of allele frequency screening and the module of mutation frequency screening, wherein the module of excluding false positive site is performed according to the Hardy-Weinberg balance, by excluding false positive sites from a frequency database in a regional population to be evaluated.

    14. The device for screening a pathogenic uniparental disomy according to claim 10, wherein the module of sites screening further includes a quality control unit, wherein the quality control unit is used to detect the amount of mutations obtained by the screening; when the amount of mutations is greater than or equal to 10,000, the quality control unit indicates PASS; when the amount of mutations is less than 10,000, the quality control unit indicates FAIL.

    15. The device for screening a pathogenic uniparental disomy according to claim 10, wherein in the module of LOH judgment, the amount of contiguous homozygous sites is greater than or equal to 20, and the coverage range is greater than or equal to 3 Mbp.

    16. The device for screening a pathogenic uniparental disomy according to claim 15, wherein in the module of LOH judgment, when a product of the amount of contiguous homozygous sites and the coverage range thereof is greater than 200 Mbp, a region is judged to be LOH.

    17. The device for screening a pathogenic uniparental disomy according to claim 10, wherein the module of UPD judgment further includes a unit of judging a pathogenic risk, wherein in the module of judging a pathogenic risk, the LOH region which is judged to be UPD is further compared with an imprinted gene; and when the LOH region does not cover an imprinted gene or a corresponding band, this region is indicated as a benign UPD; when the LOH region covers the imprint gene or the corresponding band, the region is indicated as being at risk of pathogenic UPD.

    18. A computer program product, comprising a computer readable storage medium storing a computer readable program code, the computer readable program code comprising an algorithm that when executed by a computer processor of a computing system implements the method according to claim 1.

    19. A system comprising a processor configured to perform the method according to claim 1.

    Description

    BRIEF DESCRIPTION

    [0056] Some of the examples will be described in detail with reference to the following figures, wherein like designations denote like members, and wherein:

    [0057] FIG. 1 shows a diagram of LOH distribution on chromosomes in Example 1;

    [0058] FIG. 2 shows an enlarged diagram of LOH distributed on chromosomes 5 and 7 of FIG. 1;

    [0059] FIG. 3 shows an enlarged diagram of LOH distributed on chromosomes 14, 16 and 19 of FIG. 1;

    [0060] FIG. 4 shows a distribution diagram of LOH on chromosomes in Example 2;

    [0061] FIG. 5 shows an enlarged diagram of LOH distributed on chromosome 15 of FIG. 4;

    [0062] FIG. 6 shows a distribution diagram of LOH on chromosomes in Example 3;

    [0063] FIG. 7 shows an enlarged diagram of the 12.57 M LOH region distributed on chromosome 5 of FIG. 6;

    [0064] FIG. 8 shows a distribution diagram of LOH on chromosomes of sample NP19E1405 in Example 5;

    [0065] FIG. 9 shows an enlarged diagram of LOH distributed on chromosome 15 of FIG. 8;

    [0066] FIG. 10 shows a schematic diagram indicating the verified results of sample NP19E1405 in the methylation test;

    [0067] FIG. 11 shows a distribution diagram of LOH on chromosomes of sample NP19F0095 in Example 5;

    [0068] FIG. 12 shows an enlarged diagram of LOH distributed on chromosome 15 of FIG. 11;

    [0069] FIG. 13 shows a schematic diagram indicating the verified results of sample NP19F0095 in the methylation test;

    [0070] FIG. 14 shows a distribution diagram of LOH on chromosomes of sample NP19E0517 in Example 5;

    [0071] FIG. 15 shows an enlarged diagram of LOH distributed on chromosome 15 of FIG. 14;

    [0072] FIG. 16 shows a schematic diagram indicating the verified results of sample NP19E0517 in the methylation test;

    [0073] FIG. 17 shows a distribution diagram of LOH on chromosomes of sample NP16S0255 in Example 5;

    [0074] FIG. 18 shows an enlarged diagram of LOH distributed on chromosome 15 of FIG. 17;

    [0075] FIG. 19 shows a schematic diagram indicating the verified results of sample NP16S0255 in the methylation test;

    [0076] FIG. 20 shows a distribution diagram of LOH on chromosomes of sample NP16S0320 in Example 5;

    [0077] FIG. 21 shows an enlarged diagram of LOH distributed on chromosome 15 of FIG. 20; and

    [0078] FIG. 22 shows a schematic diagram indicating the verified results of sample NP16S0320 in the methylation test;

    [0079] wherein, in FIGS. 1, 4, 6, 8, 11, 14, 17, and 20, the abscissa indicates a serial number of each chromosome, the lower half of the figures shows proportions of lengths of contiguous homozygous sites to the entire chromosome, while the upper half of the figures shows the distribution of mutation sites on each chromosome; and

    [0080] in FIGS. 2, 3, 5, 7, 9, 12, 15, 18, and 21, which show enlarged diagrams of LOHs, the black line in the middle is the exome bed, the diamond points on the left are detected heterozygous (Het) mutations, the five-pointed star points on the right are detected homozygous (Hom) mutations, the dotted line on the right is an imprinted location, and the cross points on the imprinted location are imprinted genes.

    DETAILED DESCRIPTION

    [0081] For better understanding of the present disclosure, the present disclosure will be fully described below with reference to the relevant accompanying figures. The preferred embodiments are shown in the figures. However, the present disclosure can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided for the purpose of making the disclosed contents of the present disclosure more thorough and complete.

    [0082] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as normally understood by one skilled in the technical field to which the present disclosure belongs. The terms used in the description of the present disclosure herein are only for the purpose of describing embodiments, and are not intended to limit the present disclosure. The term “and/or” used herein comprises any one or all combinations of one or more corresponding items listed herein.

    EXAMPLE 1

    [0083] A method for screening a pathogenic uniparental disomy comprises the steps as follows:

    [0084] 1. Obtaining Data

    [0085] The whole exome sequencing data of one sample was obtained, wherein there were 59312 mutations.

    [0086] 2. Screening for Sites

    [0087] 2.1 Screening for High-quality Mutation Sites

    [0088] The high-quality mutation sites were screened in the whole exome sequencing data, specifically, the high-quality mutation sites were those passed through a quality control of GATK-VQSR, and having a total coverage range of more than 40X and a mutation frequency of greater than 30%. In this sample, there were 45260 mutations.

    [0089] 2.2 Removing Y Chromosome Mutations

    [0090] The mutations on Y chromosome were removed from the above mutation sites, to obtain 45256 mutations.

    [0091] 2.3 Screening for Point Mutations

    [0092] The point mutations were screened out from the mutations obtained in the step of removing Y chromosome to obtain 41273 mutations.

    [0093] 2.4 Screening for Allele Frequency

    [0094] Sites which had a population allele frequency of less than 0.7 in each race (East Asians, South Asians, African/African American, American, Finnish, non-Finnish European) in the population database (1000 Genomes, ESP6500, ExAC, gnomAD) were screened out from the point mutations obtained in the previous step, thereby obtaining 22,231 mutations.

    [0095] 2.5 Excluding False Positive Sites

    [0096] According to the Hardy-Weinberg equilibrium, the false positive sites were excluded from a frequency database in a regional population to be evaluated, thereby obtaining 21,705 mutations.

    [0097] 2.6 Screening for Mutation Frequency

    [0098] Sites which had a mutation frequency of heterozygous sites of higher than 70% and sites which had a mutation frequency of homozygous sites of less than 85% were removed from the above-mentioned point mutations in the previous step, thereby obtaining 21644 mutations under pre-determined conditions.
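
    The screening steps 2.1 to 2.6 above can be sketched as a chain of filters. This is only a minimal illustration under stated assumptions, not the implementation of the present disclosure: the `Variant` record and its field names are hypothetical, zygosity is assumed to come from the genotype call, and the Hardy-Weinberg exclusion of step 2.5 is omitted because its exact criterion is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    """Hypothetical record for one called mutation (field names are assumptions)."""
    chrom: str                 # e.g. "chr5"
    pos: int
    ref: str
    alt: str
    vqsr_pass: bool            # passed GATK-VQSR quality control
    depth: int                 # total coverage at the site (X)
    vaf: float                 # mutation frequency of the site, 0..1
    is_hom: bool               # genotype called homozygous
    pop_af: Dict[str, float]   # population allele frequency per race


def screen_sites(variants: List[Variant]) -> List[Variant]:
    """Apply steps 2.1-2.4 and 2.6; step 2.5 (Hardy-Weinberg) is not modelled."""
    kept = []
    for v in variants:
        # 2.1 high quality: VQSR pass, coverage > 40X, mutation frequency > 30%
        if not (v.vqsr_pass and v.depth > 40 and v.vaf > 0.30):
            continue
        # 2.2 remove Y chromosome mutations
        if v.chrom in ("chrY", "Y"):
            continue
        # 2.3 keep point mutations only (single-base substitutions)
        if len(v.ref) != 1 or len(v.alt) != 1:
            continue
        # 2.4 population allele frequency must be < 0.7 in every race
        if any(af >= 0.7 for af in v.pop_af.values()):
            continue
        # 2.6 remove heterozygous sites above 70% and homozygous sites below 85%
        if not v.is_hom and v.vaf > 0.70:
            continue
        if v.is_hom and v.vaf < 0.85:
            continue
        kept.append(v)
    return kept
```

    Ordering the filters as in the text keeps each intermediate count comparable with the per-step counts reported in this example.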

    [0099] 3. Judging LOH

    [0100] For the above-mentioned sites, a region was judged to be LOH, if a product of an amount of contiguous homozygous sites and the coverage range thereof was greater than 200 Mbp, wherein the amount of contiguous homozygous sites was greater than or equal to 20, and the coverage range was greater than or equal to 3 Mbp.
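
    The rule above can be sketched as follows. This is a minimal illustration, assuming that a contiguous run is ended by any heterozygous site and that the coverage range is the distance in Mbp between the first and last homozygous site of the run; the function names and the input layout are hypothetical.

```python
from typing import List, Tuple

MIN_SITES = 20        # amount of contiguous homozygous sites
MIN_SPAN_MBP = 3.0    # coverage range in Mbp
MIN_PRODUCT = 200.0   # pre-set value for (amount of sites) x (coverage range)


def is_loh(n_sites: int, span_mbp: float) -> bool:
    """Return True when a run of homozygous sites meets all three conditions."""
    return (
        n_sites >= MIN_SITES
        and span_mbp >= MIN_SPAN_MBP
        and n_sites * span_mbp > MIN_PRODUCT
    )


def find_loh_runs(sites: List[Tuple[int, bool]]) -> List[Tuple[int, int, int]]:
    """Scan one chromosome.

    sites: (position, is_homozygous) pairs sorted by position.
    Returns (start, end, n_sites) for every run judged to be LOH.
    """
    regions = []
    run: List[int] = []
    # A trailing heterozygous sentinel closes the final run.
    for pos, is_hom in sites + [(-1, False)]:
        if is_hom:
            run.append(pos)
            continue
        if run:
            span_mbp = (run[-1] - run[0]) / 1e6
            if is_loh(len(run), span_mbp):
                regions.append((run[0], run[-1], len(run)))
        run = []
    return regions
```

    For instance, a run of 50 homozygous sites covering 18.07 Mbp gives a product of 903.5, which exceeds 200, so the run is judged to be LOH.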

    [0101] According to the above rule, there were 5 LOH regions detected among the sample of Example 1, as shown in TABLE 1.

    TABLE 1 LOH Regions

    | Chromosome | Start Position | End Position | Amount of homozygous mutations | Coverage range of contiguous homozygous sites (M) | Imprinted gene | Imprinted band |
    |---|---|---|---|---|---|---|
    | chr5 | 94860194 | 112927900 | 50 | 18.07 | ERAP2 | 5q15 |
    | chr7 | 105752555 | 135329690 | 99 | 29.58 | MEST, KLF14, CPA4, MESTIT1 | 7q21-q22, 7q22, 7q32, 7q32.2, 7q32.3 |
    | chr14 | 58563694 | 75747258 | 88 | 17.18 | SMOC1 | 14q24.2 |
    | chr16 | 5132636 | 13003248 | 45 | 7.87 | | 16p13.3 |
    | chr19 | 49096065 | 53345414 | 98 | 4.25 | | 19q13.4 |

    [0102] It can be seen from the above results that the five LOH regions are located on five different chromosomes. FIG. 1 shows a distribution diagram of the five LOH regions on the chromosomes, wherein the ellipses represent LOH regions. FIGS. 2 and 3 show enlarged diagrams of the LOH distribution on chromosomes 5 and 7, and on chromosomes 14, 16, and 19, respectively.

    [0103] 4. Judging UPD

    [0104] As the five LOH regions were located on five different chromosomes, the sample was judged as consanguineous marriage, rather than UPD.

    [0105] The sample was later confirmed to be the offspring of a consanguineous marriage.
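
    The judgment in step 4 can be sketched as a small decision function. This is a minimal sketch under assumptions: the `LohRegion` record and the `copy_number` field are hypothetical, and how the copy number of a region is determined is not modelled here.

```python
from typing import List, NamedTuple


class LohRegion(NamedTuple):
    """Hypothetical record for one LOH region."""
    chrom: str
    start: int
    end: int
    copy_number: int = 2   # assumed to be supplied by a separate analysis


def judge_sample(regions: List[LohRegion]) -> str:
    """Apply the judgment rules in the order in which they are stated."""
    if not regions:
        return "no LOH"
    # More than 2 chromosomes with LOH: consanguineous marriage.
    if len({r.chrom for r in regions}) > 2:
        return "consanguineous marriage"
    # A single copy of a region with LOH: fragment deletion.
    if any(r.copy_number == 1 for r in regions):
        return "fragment deletion"
    # Any other sample with LOH regions: UPD.
    return "UPD"
```

    Applied to a sample with LOH regions on chr5, chr7, chr14, chr16, and chr19, the function returns "consanguineous marriage", in line with the judgment made in this example.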

    EXAMPLE 2

    [0106] A screening of a pathogenic UPD was performed on a sample by using the method of Example 1, wherein:

    [0107] 1. Obtaining Data

    [0108] It was performed with reference to Example 1.

    [0109] 2. Screening for Sites

    [0110] It was performed with reference to Example 1, and 22210 mutations meeting the pre-determined conditions were obtained.

    [0111] 3. Judging LOH

    [0112] For the above obtained sites, a region was judged to be LOH if a product of an amount of contiguous homozygous sites and the coverage range thereof was greater than 200 Mbp, wherein the amount of contiguous homozygous sites was greater than or equal to 20, and the coverage range was greater than or equal to 3 Mbp.

    [0113] According to the above rule, there was 1 LOH region detected in the sample of this example, as shown in TABLE 2.

    TABLE 2 LOH Regions

    | Chromosome | Start Position | End Position | Amount of homozygous mutations | Coverage range of contiguous homozygous sites (M) | Imprinted gene | Imprinted band |
    |---|---|---|---|---|---|---|
    | chr15 | 22369343 | 34649247 | 19 | 12.28 | SNRPN, MAGEL2, NDN, SNORD107, SNORD108, SNORD109A, SNORD115-48, ATP10A, UBE3A, MKRN3, SNURF, SNORD64, NPAP1 | 15q11-q12, 15q11-q13, 15q11.2, 15q11.2-q12, 15q12 |

    [0114] It can be seen from the above results that the above-mentioned LOH region is located on chromosome 15, with a length of 12.28 M. FIG. 4 shows the distribution of the 12.28 M LOH region on the chromosomes, wherein the ellipse represents the LOH region. FIG. 5 shows an enlarged diagram of the LOH distribution on chromosome 15.

    [0115] 4. Judging UPD

    [0116] 4.1 Principle Judgment

    [0117] As this LOH region did not meet the rules for judging consanguineous marriage or fragment deletion, the sample was judged as UPD.

    [0118] 4.2 Judging Pathogenic Risk

    [0119] The above 12.28 M LOH region covers imprinted genes associated with Prader-Willi syndrome, so the sample was indicated as being at risk of pathogenic UPD.

    [0120] The sample was later confirmed to have Prader-Willi syndrome.
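
    The comparison of a UPD region with imprinted genes can be sketched as an interval overlap test. This is a minimal sketch, assuming that "covers" means any overlap between the LOH region and the gene interval; the catalogue layout, the function names, and the placeholder gene names and coordinates in the usage note are hypothetical and are not taken from any database.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]   # (chromosome, start, end)


def covered_imprinted_genes(loh: Interval, catalogue: Dict[str, Interval]) -> List[str]:
    """Return the names of catalogue entries that overlap the LOH region."""
    chrom, start, end = loh
    return sorted(
        name
        for name, (g_chrom, g_start, g_end) in catalogue.items()
        if g_chrom == chrom and g_start <= end and g_end >= start
    )


def judge_risk(loh: Interval, catalogue: Dict[str, Interval]) -> str:
    """Indicate benign UPD or risk of pathogenic UPD for one UPD region."""
    if covered_imprinted_genes(loh, catalogue):
        return "at risk of pathogenic UPD"
    return "benign UPD"
```

    With a placeholder catalogue such as `{"GENE_A": ("chr15", 25_000_000, 25_100_000)}`, the region chr15:22369343-34649247 is indicated as being at risk, whereas a region on another chromosome is indicated as benign.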

    EXAMPLE 3

    [0121] A screening of a pathogenic UPD was performed on a sample by using the method of Example 1, wherein:

    [0122] 1. Obtaining Data

    [0123] It was performed with reference to Example 1.

    [0124] 2. Screening for Sites

    [0125] It was performed with reference to Example 1, and 22947 mutations meeting the pre-determined conditions were obtained.

    [0126] 3. Judging LOH

    [0127] For the above obtained sites, a region was judged to be LOH if a product of an amount of contiguous homozygous sites and the coverage range thereof was greater than 200 Mbp, wherein the amount of contiguous homozygous sites was greater than or equal to 20, and the coverage range was greater than or equal to 3 Mbp.

    [0128] According to the above rule, there were 2 LOH regions detected in the sample of this example, as shown in TABLE 3.

    TABLE 3 LOH Regions

    | Chromosome | Start Position | End Position | Amount of homozygous mutations | Coverage range of contiguous homozygous sites (M) | Imprinted gene | Imprinted band |
    |---|---|---|---|---|---|---|
    | chr5 | 2748427 | 96350710 | 262 | 93.6 | ERAP2, RNU5D-1 | 5q15 |
    | chr5 | 167645888 | 180219304 | 109 | 12.57 | | |

    [0129] It can be seen from the above results that both LOH regions are located on chromosome 5, with lengths of 93.6 M and 12.57 M, respectively. FIG. 6 shows the distribution of both LOH regions on the chromosomes, wherein the ellipses represent the LOH regions. FIG. 7 shows an enlarged diagram of the 12.57 M LOH region of FIG. 6.

    [0130] Note: CMA gene chip detection (chip type: CytoScan HD) was also performed on the sample, and the test results show two LOH regions, i.e., chr5:2667631-99572420 and chr5:166974594-180520810, which are almost the same as those detected through the method of the present disclosure.

    [0131] 4. Judging UPD

    [0132] 4.1 Principle Judgment

    [0133] As these LOH regions did not meet the rules for judging consanguineous marriage or fragment deletion, the sample was judged as UPD.

    [0134] 4.2 Pathogenic Risk Judgment

    [0135] The above 93.6 M LOH region covered the imprinted genes ERAP2 and RNU5D-1. However, few studies on these genes are available at present, so they cannot be clearly identified as a cause of disease; the finding can only suggest a relevant risk.

    EXAMPLE 4

    [0136] A screening of a pathogenic UPD was performed by using a device comprising:

    [0137] a module of data acquisition, configured for obtaining whole exome sequencing data;

    [0138] a module of site screening, configured for screening for mutations under pre-determined conditions;

    [0139] a module of LOH judgment, configured for performing LOH judgment according to the mutations obtained above; and a region is judged to be LOH when a product of an amount of contiguous homozygous sites and their coverage range is greater than a pre-set value; and

    [0140] a module of UPD judgment, configured for performing UPD judgement according to the LOH judgment, wherein when an amount of chromosomes with LOH exceeds 2, a sample is judged as a consanguineous marriage; when there is a single copy of a region with LOH, a sample is judged as a fragment deletion; and other samples are judged as UPD when there are regions with LOH.

    [0141] The above device runs a program according to the method of Example 1.

    EXAMPLE 5

    [0142] A screening of a pathogenic UPD was performed by using the device of Example 4.

    [0143] In this example, whole exome sequencing data obtained in routine examinations were analyzed, and five clinical samples were judged to be positive for UPD.

    [0144] Following the routine whole exome sequencing, the above samples were analyzed with conventional methods and tested by MLPA. No clinically relevant, clearly pathogenic variation was detected in 3 of the samples, but methylation experiments confirmed that all 5 samples were PWS-AS, as shown in the following table.

    TABLE 4 VERIFICATION RESULTS OF THE JUDGEMENT METHOD OF THE PRESENT DISCLOSURE

    | Sample number | LOH Analysis Results | Verification Results-Methylation Level (%) | Results Reported in a Conventional Method |
    |---|---|---|---|
    | NP19E1405 | 15q11q13 hmz | 4 | negative |
    | NP19F0095 | chr15 hmz | 4 | negative |
    | NP19E0517 | 15q14q21, 15q26 hmz | 96 | chr15 UPD |
    | NP16S0255 | chr15 hmz | 98 | negative |
    | NP16S0320 | 15q11q14 hmz | 86 | 15q11q14 del |

    Note 1: hmz is short for homozygous, indicating the region is homozygous, i.e., loss of heterozygosity (LOH).

    Note 2: For this region, the level of maternal-origin methylation is above 80% and the level of paternal-origin methylation is below 10%, so the methylation level in a normal individual is about 45%. If the maternal-origin UPD occurs, the overall methylation level is above 80%, and the clinical manifestation is PWS (Prader-Willi syndrome); if the paternal-origin UPD occurs, the overall methylation level is below 10%, and the clinical manifestation is AS (Angelman syndrome).

    Note 3: The original reported results of sample NP16S0320 showed a large heterozygous region deletion of 15q11-q14, i.e., loss of one copy, which would also be indicated as LOH.
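
    The thresholds in Note 2 can be written as a small lookup. This is a minimal sketch that only restates those thresholds; the function name and the returned labels are hypothetical, and values between 10% and 80% are simply reported as not matching either pattern.

```python
def interpret_methylation(level_percent: float) -> str:
    """Map an overall methylation level (%) to the pattern described in Note 2."""
    if level_percent > 80:
        # Above 80%: pattern described for maternal-origin UPD.
        return "PWS pattern"
    if level_percent < 10:
        # Below 10%: pattern described for paternal-origin UPD.
        return "AS pattern"
    # A normal individual is described as about 45%.
    return "neither pattern"
```

    Under this rule, the levels of 4% in TABLE 4 correspond to the AS pattern and the levels of 86%, 96%, and 98% correspond to the PWS pattern.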

    [0145] In the above samples, LOH results of sample NP19E1405 are shown in FIGS. 8-9, and verification results of methylation experiments are shown in FIG. 10. The results of sample NP19F0095 are shown in FIGS. 11-12, and the verification results of methylation experiment are shown in FIG. 13. The results of sample NP19E0517 are shown in FIGS. 14-15, and the verification results of methylation experiment are shown in FIG. 16. The results of sample NP16S0255 are shown in FIGS. 17-18, and the verification results of methylation experiment are shown in FIG. 19. The results of sample NP16S0320 are shown in FIGS. 20-21, and the verification results of methylation experiment are shown in FIG. 22.

    EXAMPLE 6

    [0146] A screening of pathogenic UPD was performed based on the whole exome sequencing data of 12444 samples submitted for pathogenic UPD screening. The screening was carried out according to the method in Example 1. LOH was detected in 1018 samples, of which 800 remained after the samples judged as consanguineous marriage were excluded. Analysis showed that imprinted genes were covered in 142 samples; for those of the 142 samples that were followed up by return visit, the screening results were confirmed at a coincidence rate of more than 95%.

    [0147] Although the present invention has been disclosed in the form of preferred embodiments and variations thereon, it will be understood that numerous additional modifications and variations could be made thereto without departing from the scope of the invention.

    [0148] For the sake of clarity, it is to be understood that the use of ‘a’ or ‘an’ throughout this application does not exclude a plurality, and ‘comprising’ does not exclude other steps or elements.