Method for the Study of Embryo Mutations in In Vitro Reproduction Processes
20210343365 · 2021-11-04
Inventors
Cpc classification
G16B20/20
PHYSICS
C12N15/1072
CHEMISTRY; METALLURGY
C12Q1/6883
CHEMISTRY; METALLURGY
International classification
G16B20/20
PHYSICS
C12N15/10
CHEMISTRY; METALLURGY
Abstract
The invention relates to a method for the study of embryo mutations in in vitro reproduction processes that combines aneuploidy detection (PGD-A) with the study of monogenic diseases in embryos (PGD-M). The method comprises: a first SNP selection process wherein the values of n candidate SNPs ($t_{1} \ldots t_{k}$) of each subject x, in a chromosomal region of interest and specifically extracted for a study population, are taken as an input; a second SNP selection process wherein all the SNP combinations are evaluated to obtain a minimum set t of tagSNPs from the matrix M obtained in the first SNP selection process; and an in-silico validation process of the tagSNP panel obtained in the second process.
Claims
1. A method for the study of embryo mutations in in vitro reproduction processes, with the particular feature that it combines the detection of aneuploidy (PGD-A) and the study of monogenic embryonic diseases (PGD-M), characterised in that it comprises the processes of: a first SNP selection process wherein the values of n candidate SNPs ($t_{1} \ldots t_{k}$) of each subject x, in a chromosomal region of interest and specifically extracted for a study population, are taken as an input, and wherein this process is configured to maximise the situation in which one of the parents has the value of a SNP in a heterozygous state while the other parent has the value of the SNP in a homozygous state, and to obtain a panel of z optimised SNPs for both maximised values in the form of a matrix M whose columns correspond to the subjects of the population and whose rows correspond to the values of each SNP for each subject; a second SNP selection process wherein all the SNP combinations are evaluated to obtain a minimum set t of tagSNPs from the matrix M obtained in the first SNP selection process; and an in-silico validation process of the tagSNP panel obtained in the second process.
2. The method according to claim 1, wherein the first SNP selection process comprises the selection of those SNPs that are biallelic, wherein subjects can be represented as haplotypes of length m formed by binary strings {1,0}, wherein 1|0 and 0|1 are the values for heterozygous SNPs and 0|0 and 1|1 are the values for homozygous SNPs; and wherein this selection is made throughout the chromosomal region of interest.
3. The method according to claim 2, wherein the chromosomal region of interest is defined as any position that is located two megabases above and two megabases below the gene or mutation under study.
4. The method according to any one of claims 1 to 3, wherein the first process comprises a stage of analysing the n candidate SNPs in the region and excluding the SNPs that meet any of the following conditions: SNPs with more than one alternative allele (non-biallelic SNPs); variants whose alleles are not single-nucleotide changes; SNPs that are homozygous in at least 99% of the population of interest; and uncommon SNPs, wherein the minor allele frequency is less than 1%.
5. The method according to any one of claims 1 to 4, wherein the first process comprises a stage of maximising the situation in which one of the parents has the value of a SNP in a heterozygous state, while the other parent has the value of the SNP in a homozygous state, that is, the situation in which the SNP is informative, through the maximisation of the value of two functions above a certain threshold value:
MaxP: $p - 3p^{2} + 4p^{3} - 2p^{4}$
HET rate: $2pq$
wherein p and q are, respectively, the allele frequencies of the reference and alternative alleles for each SNP.
6. The method according to any one of claims 1 to 5, wherein the second SNP selection process comprises, firstly, that the SNPs of the matrix M of the block-region are organised in groups of high correlation based on the pairwise $r^{2}$ criterion; wherein the pairwise $r^{2}$ value is calculated from the allele frequency calculated for the matrix M.
7. The method according to claim 6, wherein the SNPs of different groups will present low correlation, wherein two SNPs will belong to the same group only when the pairwise $r^{2}$ therebetween exceeds a certain threshold value set by the user.
8. The method according to any one of claims 1 to 7, wherein the selection of tagSNPs within each group is made based on the linkage disequilibrium (LD) criterion, starting with k=1 SNPs and studying all possible k-combinations, organising the SNPs within each group.
9. The method according to any one of claims 6 to 8, wherein if a SNP does not exceed the $r^{2}$ or LD thresholds it will be considered in one group only and taken as tagSNP by itself.
10. The method according to any one of claims 1 to 9, wherein in the third validation process a genomic database is used where subjects are randomly chosen to perform 300 crosses, after which the number of tagSNPs that were informative in each cross is counted and the average is reported as a measure of the informative power of the panel.
11. A kit for the study of embryo mutations in in vitro reproduction processes, characterised in that it comprises at least one electronic device with a processor or processors and a memory, wherein the memory stores instructions that, when executed by the processor or processors, cause the electronic device to execute the method according to any one of claims 1 to 10.
12. A computer program product with instructions configured to be executed by one or more processors that make the electronic device of the kit of claim 11 carry out the method according to any one of claims 1 to 10.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0050] What follows is a very brief description of a series of drawings that aid in better understanding the invention and that relate expressly to an embodiment of said invention, presented by way of a non-limiting example.
DESCRIPTION OF A DETAILED EMBODIMENT OF THE INVENTION
[0057] The method for the study of embryo mutations in in vitro reproduction processes that is the object of the present invention can be divided into three sequential and differentiated processes. The method combines several tagSNP selection techniques: it calculates the linkage disequilibrium correlations for the block of interest (by default, 4 Mb around the mutation, where 1 Mb equals one million nucleotides), and the SNPs present in this region are in turn treated as in a block-free approach, such that all the correlations between SNPs are calculated and taken into account in the selection of the tagSNPs. In this way, the present invention selects polymorphisms that are highly likely to be informative, considering their allele frequencies within the target population and whether or not they are part of the same haploblock (a set of SNPs that are inherited together). A tagSNP is the SNP considered representative of the entire haploblock; that is to say, if the tagSNP is heterozygous, for example, all the SNPs belonging to that same haploblock will be heterozygous. A tagSNP avoids having to analyse all the polymorphisms because, by knowing how it behaves, it is possible to deduce how the rest of the polymorphisms of the haploblock behave.
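By way of illustration only, the pairwise correlation between two biallelic SNPs and a simple grouping by threshold could be sketched as follows. This is not the implementation of the invention: the greedy grouping rule, the threshold of 0.8, the function names and the toy haplotypes are all assumptions made for the example, and the standard definition $r^{2} = D^{2} / (p_A q_A p_B q_B)$ with $D = p_{AB} - p_A p_B$ is used.

```python
def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs, each given as a list of
    0/1 alleles observed on the same phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP: r^2 is undefined, treated as 0 here
    d = p_ab - p_a * p_b
    return d * d / denom


def group_by_r2(snps, threshold=0.8):
    """Greedy grouping (an assumption): a SNP joins the first group in which
    its r^2 with every member exceeds the threshold, else starts a new group."""
    groups = []
    for name, alleles in snps.items():
        for group in groups:
            if all(r_squared(alleles, snps[m]) > threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Toy haplotypes (hypothetical): one entry per phased chromosome.
haplotypes = {
    "snp1": [0, 0, 1, 1, 0, 1, 0, 1],
    "snp2": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to snp1
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly correlated with snp1
}
groups = group_by_r2(haplotypes)
print(groups)
```

In this toy data snp1 and snp2 are perfectly correlated and fall into one group, while snp3 forms a group of its own and would therefore act as its own tagSNP.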
[0058] Therefore, the objective of the method of the invention (
[0059] More specifically, the method of the invention is divided into two basic processes and a third validation process. The first two processes do not necessarily have to be carried out in this order:
[0060] (i) A first SNP selection process:
[0061] a. In this process, the values of the n candidate SNPs ($t_{1} \ldots t_{k}$) of each subject x, in a chromosomal region of interest and specifically extracted for the studied population, are taken as an input. SNPs that are biallelic are selected, such that subjects can be represented as haplotypes of length m formed by binary strings {1,0}, with 1|0 and 0|1 being the values for heterozygous SNPs and 0|0 and 1|1 the values for homozygous SNPs. This is done over the entire chromosomal region of interest, defined as any position within 2 Mb upstream and 2 Mb downstream of the gene or mutation to be studied (2 Mb is a default value that can be modified).
[0062] b. Subsequently, the n candidate SNPs of the region are analysed and those that meet any of the following conditions are excluded:
[0063] i. SNPs with more than one alternative allele (non-biallelic SNPs)
[0064] ii. Variants whose alleles are not single-nucleotide changes (indels, changes of polynucleotide pattern, among others)
[0065] iii. SNPs that are homozygous in at least 99% of the population of interest
[0066] iv. Uncommon SNPs, that is, those whose minor allele frequency is less than 1%
[0067] c. The next step is to maximise the situation in which one of the parents is heterozygous for a SNP while the other parent is homozygous for it, that is, the situation in which the SNP is informative. This is achieved through the maximisation of the value of two functions above a certain threshold value:
MaxP: $p - 3p^{2} + 4p^{3} - 2p^{4}$
HET rate: $2pq$
wherein p and q are, respectively, the allele frequencies of the reference and alternative alleles for each SNP. These expressions are the Hardy-Weinberg equilibrium equation and an expression derived from it.
[0068] d. The output of this algorithm will be a panel of z optimised SNPs for both values in the form of a matrix M whose columns correspond to the subjects of the population and whose rows correspond to the values of each SNP for each subject.
[0069] (ii) A second SNP selection process:
[0070] a. Through an exhaustive search, all the SNP combinations are evaluated to obtain a minimum set t of tagSNPs from the matrix M obtained in point (i).
[0071] b. First, the SNPs of the matrix M of the block-region are organised in groups of high correlation based on the pairwise $r^{2}$ criterion. To do this, the pairwise $r^{2}$ value is calculated from the allele frequency calculated for the matrix M. In this way, SNPs from different groups will present low correlation, such that two SNPs will belong to the same group only when the pairwise $r^{2}$ therebetween exceeds a certain threshold value (set by the user).
[0072] c. After this, the selection of tagSNPs within each group is made based on the LD criterion, starting with k=1 SNPs and studying all possible k-combinations, organising the SNPs within each group.
[0073] d. Assuming two SNPs whose frequencies are p (most frequent allele) and q=1−p, the following equations are used:
[0075] As indicated,
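As a minimal sketch of steps b and c of the first SNP selection process, the following Python fragment computes the two functions from the reference-allele frequency p and keeps the SNPs whose values reach both thresholds. The SNP identifiers, the threshold values (0.3 and 0.1) and the function names are hypothetical; the text only states that a threshold is applied. Only the minor-allele-frequency exclusion is shown, since the other exclusions need genotype-level data. Expanding with q = 1 − p shows that MaxP equals $pq(p^{2} + q^{2})$, which under Hardy-Weinberg proportions is one quarter of the probability that exactly one of two randomly chosen parents is heterozygous while the other is homozygous.

```python
def het_rate(p):
    """Expected heterozygote frequency 2pq under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return 2.0 * p * q


def max_p(p):
    """MaxP polynomial from the text: p - 3p^2 + 4p^3 - 2p^4."""
    return p - 3 * p**2 + 4 * p**3 - 2 * p**4


def select_candidates(ref_freqs, min_maf=0.01, het_threshold=0.3, maxp_threshold=0.1):
    """Keep SNPs that are common enough and reach both thresholds.

    ref_freqs maps a SNP identifier to its reference-allele frequency p.
    The two threshold defaults are illustrative assumptions.
    """
    selected = []
    for snp_id, p in ref_freqs.items():
        if min(p, 1.0 - p) < min_maf:
            continue  # uncommon SNP: minor allele frequency below 1%
        if het_rate(p) >= het_threshold and max_p(p) >= maxp_threshold:
            selected.append(snp_id)
    return selected


# Hypothetical reference-allele frequencies for four candidate SNPs.
frequencies = {"rsA": 0.5, "rsB": 0.9, "rsC": 0.995, "rsD": 0.7}
selected = select_candidates(frequencies)
print(selected)
```

Both functions peak at p = 0.5, where the HET rate is 0.5 and MaxP is 0.125, so SNPs with balanced allele frequencies are favoured.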
EXAMPLE 1
Diagnosis of Aneuploidies
[0076] In
[0077] The input of the software will be the 69473 SNPs contained in the block-region chr13:9181319-11681319. The output of this algorithm will be a matrix M of 1625 candidate SNPs, which will act as input for the SNP selection algorithm, whose output will be a panel of 283 tagSNPs. In the validation phase, it was found that on average 49% of tagSNPs in the panel were informative.
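The validation figure quoted above (the average share of informative tagSNPs) could be estimated along the following lines. This is a sketch under stated assumptions, not the software used in the example: genotypes are held in a plain dictionary, the two parents of each cross are drawn at random without regard to sex, the 300 crosses follow the number given in claim 10, and the subject names and genotypes are invented toy data.

```python
import random


def is_informative(g1, g2):
    """True when exactly one of the two genotypes is heterozygous."""
    het1 = g1[0] != g1[1]
    het2 = g2[0] != g2[1]
    return het1 != het2


def mean_informative_fraction(genotypes, n_crosses=300, seed=0):
    """Average fraction of tagSNPs that are informative over random crosses.

    genotypes maps a subject to a list of (allele, allele) tuples, one per
    tagSNP, in the same SNP order for every subject.
    """
    rng = random.Random(seed)
    subjects = list(genotypes)
    n_snps = len(genotypes[subjects[0]])
    total = 0.0
    for _ in range(n_crosses):
        a, b = rng.sample(subjects, 2)  # two distinct subjects per cross
        informative = sum(
            is_informative(x, y) for x, y in zip(genotypes[a], genotypes[b])
        )
        total += informative / n_snps
    return total / n_crosses


# Toy panel of three tagSNPs typed in four hypothetical subjects.
panel = {
    "s1": [(0, 1), (0, 0), (1, 1)],
    "s2": [(0, 0), (0, 1), (1, 1)],
    "s3": [(1, 1), (0, 0), (0, 1)],
    "s4": [(0, 1), (0, 1), (0, 0)],
}
print(mean_informative_fraction(panel))
```

Fixing the seed makes the estimate reproducible; a real validation would draw the subjects from the genomic database used for the study population.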
[0078] Wet laboratory protocol. Once the polymorphisms to be sequenced have been selected, their positions are entered into the corresponding enrichment platform, preferably Ion Ampliseq. This platform designs the primers needed to capture the regions. In the IVF laboratory, the embryos produced are biopsied when they reach the blastocyst stage. The biopsy is placed in a PCR tube and sent to the laboratory. At the laboratory, the DNA is amplified using, for example, Ion Reproseq, so that, in addition to the amplification, the library for PGD-A is prepared. Following the appropriate protocol, all the regions designed with Ampliseq are then amplified from the previously amplified material, and the library for PGD-M is produced. Subsequently, massively parallel sequencing is carried out. To sequence multiple samples simultaneously, each sample must be marked with a molecular barcode, and special care must be taken that no two samples share the same barcode.
[0079] Data analysis. Once the sequencing is finished, a series of files with reads for the whole embryo (PGD-A) and other files with the detected polymorphisms (PGD-M) are obtained. A bioinformatic analysis must be carried out on these files. With the first files, the aneuploidies are determined using the most appropriate software for the platform. With the second files, the segregation pattern of each polymorphism is determined for the PGD-M analysis. First, the reads obtained are aligned to the reference genome, and polymorphisms are identified in every sample, including patients, relatives used as a reference and embryos. In the case of relatives, the simplest situation is that in which the couple and an affected child are available. For SNP phasing, it is necessary to determine which polymorphisms are shared in the trio and, in this way, to find out which ones segregate with the healthy allele and which ones segregate with the pathogenic allele. Since biallelic SNPs have been selected, each sample can be 0/0, 0/1 or 1/1 if it is homozygous for the reference allele, heterozygous, or homozygous for the alternative allele, respectively. The number 0 denotes the reference allele (whatever it is), while 1 denotes the alternative allele. This is true for all chromosomes except the sex chromosomes, in which women can be homozygous or heterozygous, while men are always hemizygous (0 or 1). With this data, the SNP phasing is then carried out. To do so, those SNPs that are informative in the couple are analysed. Informative SNPs are those in which one member of the couple is heterozygous (0/1) and the other homozygous (0/0 or 1/1). The polymorphism used for phasing is the one that the heterozygous subject carries and the homozygous subject does not. For example, if one subject is 0/1 and the other 1/1, the polymorphism used for phasing is 0. By comparing one or more subjects in the family, it is arbitrarily determined which parental allele each polymorphism belongs to. An example of SNP phasing would be the following:
[0080] Consider a couple with a child, each with their own alleles. The father has the alleles P1 and P2, and the mother M1 and M2. Logically, the child will have inherited one allele from each, for example, P1 from the father and M1 from the mother. If, when analysing polymorphisms, the result is that the father is 0/1, the mother 1/1 and the child 0/1, it will be determined that the polymorphism 0 belongs to the P1 allele, which is shared between father and child, since both have said polymorphism. In another case, the father may be 0/1, the mother 1/1 and the child 1/1. In this case, the polymorphism 0 must necessarily belong to the P2 allele of the father, since it is not shared between father and child. All the polymorphisms that are informative for the mother are processed in a similar manner.
[0081] Once the haplotypes of the parents are determined, the analysis of the embryos is carried out. To do so, the informative SNPs in each of the embryos are identified. For example, if polymorphism 0 belongs to the P1 allele and an embryo has said polymorphism, the embryo carries said haplotype. In this way, the informative SNPs in the different embryos are identified, thus determining the haplotype of each of them.
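The trio logic described above can be sketched for a single SNP as follows. The sketch rests on assumptions that are not in the text: genotypes are unordered pairs of 0/1 alleles, "hap1" stands for the parental allele inherited by the reference child (P1 or M1 in the example) and "hap2" for the other one, and the function names are hypothetical. A genotype combination that is impossible under Mendelian inheritance is simply skipped here rather than reported.

```python
def phase_snp(het_parent, hom_parent, child):
    """Assign the informative polymorphism of the heterozygous parent to
    'hap1' (shared with the child) or 'hap2' (not shared).

    Returns (informative_polymorphism, haplotype), or None when the SNP is
    not informative or the trio is inconsistent.
    """
    if het_parent[0] == het_parent[1] or hom_parent[0] != hom_parent[1]:
        return None  # not informative: need one heterozygous, one homozygous
    hom_allele = hom_parent[0]
    informative = het_parent[0] if het_parent[0] != hom_allele else het_parent[1]
    if hom_allele not in child:
        return None  # child lacks the homozygous parent's allele: inconsistent
    haplotype = "hap1" if informative in child else "hap2"
    return informative, haplotype


def embryo_haplotype(phased, embryo):
    """Haplotype of the heterozygous parent that the embryo inherited."""
    informative, haplotype = phased
    if informative in embryo:
        return haplotype
    return "hap2" if haplotype == "hap1" else "hap1"


# Father 0/1, mother 1/1, child 0/1: polymorphism 0 lies on the shared allele.
shared = phase_snp((0, 1), (1, 1), (0, 1))
# Father 0/1, mother 1/1, child 1/1: polymorphism 0 lies on the other allele.
not_shared = phase_snp((0, 1), (1, 1), (1, 1))
print(shared, not_shared)
print(embryo_haplotype(shared, (0, 1)), embryo_haplotype(shared, (1, 1)))
```

Repeating this over all informative SNPs in the region, and requiring the calls to agree, is what gives the haplotype of each embryo its support.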
[0082] An example of the SNP phasing process is shown in
[0083] Once SNP phasing has been carried out with relatives, the pattern of polymorphisms must be compared with the sequenced embryos, and in this way, it will be determined which embryos are carriers and which are normal. In
[0084] The SNP phasing algorithm is, moreover, able to identify the different possible sources of error and alert the analyst so that he/she can weigh and analyse them. There are different sources of error. In
EXAMPLE 2
Identification of Triploid Embryos
[0090] Triploid embryos are a major problem in any IVF cycle. They account for 15% of miscarriages due to chromosomal abnormalities. Triploid embryos should always be discarded from any in vitro fertilisation cycle, but they are difficult to identify because they do not differ in embryo quality from normal embryos. Sometimes they can be distinguished because three pronuclei are observed at D+1, but this is not always possible. Triploid embryos may be of dispermic origin (in cases of IVF) or may originate from an oocyte failure in which the second polar body is not extruded.
[0091] Triploid embryos cannot be identified by ordinary PGD-A techniques, despite being a numerical anomaly. Sometimes, through visual inspection, it is possible to detect 69,XXY embryos by observing an abnormal distribution of the reads of the sex chromosomes, but this is not always possible and requires trained personnel.
[0092] The method described herein can be used to identify this type of embryo. Informative polymorphisms can be selected along the genome, and whether an embryo is triploid can be determined by analysing the polymorphisms present and their frequencies. Normally, a heterozygous polymorphism should be found at a proportion of around 0.5, since half of the reads will correspond to one allele and half to the other. A triploid embryo has three alleles, so this proportion will be shifted. Thus, the result can be three alleles at the same position (if the polymorphism is multiallelic) or two alleles, one at a frequency of about 33% and the other at about 66%. If all polymorphisms with sufficient reads follow this pattern throughout the entire genome, the embryo is triploid.
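A rough sketch of this allele-proportion check for biallelic positions is given below. The minimum depth of 30 reads, the 10% cut-off used to discard apparently homozygous positions, the 90% share of positions required to flag the embryo and the function name are all illustrative assumptions; the text itself asks for the pattern at all polymorphisms with sufficient reads, and real data would also be affected by amplification bias.

```python
def looks_triploid(read_counts, min_depth=30, min_minor_fraction=0.1,
                   required_share=0.9):
    """Flag an embryo whose heterozygous positions sit near 1/3 : 2/3
    rather than 1/2 : 1/2.

    read_counts is a list of (reference_reads, alternative_reads) tuples.
    Returns True or False, or None when no position has enough reads.
    """
    usable = 0
    skewed = 0
    for ref, alt in read_counts:
        depth = ref + alt
        if depth < min_depth:
            continue  # too few reads to estimate the proportion
        minor = min(ref, alt) / depth
        if minor < min_minor_fraction:
            continue  # looks homozygous: says nothing about ploidy
        usable += 1
        # Closer to 1/3 than to 1/2 counts as the triploid pattern.
        if abs(minor - 1 / 3) < abs(minor - 1 / 2):
            skewed += 1
    if usable == 0:
        return None
    return skewed / usable >= required_share


# Hypothetical read counts at three heterozygous positions per embryo.
diploid_like = [(50, 50), (48, 52), (55, 45)]
triploid_like = [(66, 34), (30, 60), (70, 35)]
print(looks_triploid(diploid_like), looks_triploid(triploid_like))
```

The midpoint between the two expected proportions (5/12) acts as the decision boundary for each position, which is a simplification chosen for the example.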
EXAMPLE 3
Identification of Embryos with Balanced Translocations
[0093] Some couples decide to undergo in vitro fertilisation cycles because one of them is a carrier of a balanced translocation. These parents have a high reproductive risk, since 50% of their embryos will have an unbalanced translocation as a result of inheriting one of the altered chromosomes. Furthermore, there will be a 25% chance of producing completely normal embryos and a 25% chance of producing embryos with the balanced alteration.