METHOD FOR THE QUALITY CONTROL OF SEED LOTS

Abstract

The invention relates to a method for the quality control of the varietal purity of seed lots by analysing sub-lots of the seeds, said control being carried out by sequencing the genes of interest.

Claims

1. A method for determining the quantity of contaminants at at least one locus of interest, present in a seed lot of a variety of interest comprising: a) grouping seeds from a seed lot into sub-lots of at least 10 seeds, the number of sub-lots so obtained being greater than or equal to 10, b) performing targeted sequencing of at least the region of the seed genome containing the locus of interest for each sub-lot, c) qualitatively determining the presence of a contaminant for each sub-lot by detection of an allele alternative to the expected allele(s) for each sequenced genomic region (presence/absence of the expected allele(s)), and d) determining the quantity of contaminants in the overall lot by compiling the qualitative results obtained for all sub-lots.

2. The method according to claim 1, wherein the sequencing of step b) is performed on the DNA extracted from the seeds present in a sub-lot, the region of the seed genome containing the locus of interest being optionally amplified.

3. The method according to claim 1, wherein steps b), c) and d) are carried out for several regions of the genome corresponding to several loci of interest.

4. The method according to claim 3, wherein a subset of these loci of interest is sufficient to identify the variety of interest.

5. The method according to claim 4, wherein a lot is declared as containing a contaminant if an allele alternative to the expected allele(s) is observed for a single locus of interest.

6. The method according to claim 4, wherein a lot is declared as containing a contaminant if an allele alternative to the expected allele(s) is observed for more than one locus of interest.

7. The method according to claim 1, wherein at least one locus of interest is linked to a trait of interest.

8. The method according to claim 3, wherein a combination of loci is linked to characters of interest (trait).

9. The method according to claim 3, wherein a combination of loci is linked to a character of interest (trait).

10. The method according to claim 1, wherein at least one locus of interest is linked to a specific trait a priori not present in the seeds of the batch, in order to detect the fortuitous presence of this trait.

11. The method according to claim 10, wherein the lot is considered to be non-compliant if the frequency of the trait is greater than 10% in the seed lot.

12. The method according to claim 1, wherein i) RNA is extracted from the seeds of the sub-lot and reverse transcribed into cDNA prior to step b), ii) sequencing of this cDNA is performed using primers specific to genes related to an agronomic property of the seeds, at the same time as the sequencing of step b) is performed, iii) the presence of seeds with the agronomic property is qualitatively determined for each sub-lot, in case of detection of cDNA relating to the specific genes of the agronomic property of the seeds in the sequencing step (ii) (presence/absence of cDNA), and iv) the quantity of seeds with this agronomic characteristic in the overall lot is determined by compiling the qualitative results obtained for all sub-lots in (iii).

13. The method according to claim 12, wherein the agronomic property of the seeds is selected from state of dormancy, priming quality, germination ability, vigor, and viability of the seeds.

14. The method according to claim 1, wherein i) DNA sequencing of the sub-lots is carried out using primers specific to one or more species different from those of the seeds present in the sub-lot, at the same time as the sequencing of step b) is performed, ii) the presence of seeds of different species is determined qualitatively for each sub-lot, in case of detection of genes belonging to the said species (presence/absence of genes specific to other species), and iii) the quantity of exogenous seeds in the overall lot is determined by compiling the qualitative results obtained for all sub-lots in ii).

15. The method according to claim 14, wherein at least one different species is a weed.

16. The method according to claim 1, wherein i) sequencing of DNA or cDNA contained in the sub-lots using pathogen species-specific primers is carried out at the same time as the sequencing of step b) is performed, ii) the presence or absence of DNA of the pathogenic species is determined for each sub-lot if sequences belonging to those pathogenic species are detected, and iii) the conclusion as to the contamination of the lot is based on the presence of sequences belonging to the said pathogenic species.

17. The method according to claim 16, wherein the pathogenic species is a bacterium, a fungus, a virus, or an insect.

18. The method according to claim 1, wherein before step b) i) DNA is extracted from each sub-lot of seeds, ii) RNA is extracted from each seed sub-lot and reverse transcribed into cDNA, iii) the DNA extracted in i) and the cDNA obtained in ii) are mixed, iv) optionally, an amplification is performed on the DNA obtained in iii), specific to certain loci, or non-specific, and v) the DNA obtained in iii) or the amplification products obtained in iv) are used as a template for the sequencing step.

19. The method according to claim 18, wherein step iv) is carried out by amplifying specific sequences of other organisms whose absence or presence is to be verified.

20. The method according to claim 18, wherein step iv) is carried out by amplifying specific sequences making it possible to determine certain agronomic properties of the seeds of the sub-lot.

21. The method according to claim 20, wherein at least one agronomic property of the seeds is selected from state of dormancy, priming quality, germination ability, vigor, and viability of the seeds.

22. The method according to claim 1, wherein the quantity of seeds in each sub-lot prepared in step a) is between 80 and 120.

23. The method according to claim 1, wherein the quantity of seeds in each sub-lot prepared in step a) is between 15 and 25.

24. The method according to claim 1, wherein the identification of the contaminant for each contaminated sub-lot is also carried out by i) inferring the molecular profile of the contaminant in a contaminated sub-lot by comparing the profile observed in that sub-lot with the profile expected in the absence of the contaminant, and by ii) comparing the profile obtained in i) with those of a reference database.

Description

DESCRIPTION OF THE FIGURES

[0135] FIG. 1: Taqman analysis result for a SNP, comprising two allelic forms detected respectively by the fluorochromes FAM and VIC, in maize samples homozygous (A, B) or heterozygous for the SNP (C). A: homozygous sample for the allelic form detected in FAM. B: homozygous sample for the allelic form detected in VIC. C: heterozygous sample for the allelic forms detected in FAM and VIC.

[0136] FIG. 2: Relative frequency, in each sub-lot, of the alternative allele for SNP10. Sub-lots 3, 14 and 16 show a significant frequency of the alternative allele.

[0137] FIG. 3: Qualitative profile (presence/absence of a contaminating allele). Profile of presence of an alternative allele for the 17 markers (row) (16 discriminatory markers and one marker associated with a trait) within the 16 sub-lots (column). The presence of an alternative allele is detected for at least 3 SNPs in sub-lots 3, 14 and 16. These sub-lots are declared contaminated. The remaining 13 sub-lots are declared uncontaminated.

[0138] FIG. 4: Molecular profiles obtained on the 17 SNPs (16 discriminatory markers and one marker associated with a trait) from the 16 sub-lots analyzed. The profile in the first row corresponds to the main profile, and the subsequent profiles to the contaminated profiles observed for sub-lots 3, 14 and 16 respectively.

EXAMPLES

Example 1: Contaminant Detection by Taqman

[0139] This example evaluates the possibility of detecting a contaminating seed in a sub-lot of maize seed by genotyping using the Taqman (Applied Biosystems) technology.

[0140] FIG. 1 shows the result of the Taqman analysis for a SNP comprising two allelic forms, detected respectively by the fluorochromes FAM and VIC, in maize samples that are homozygous or heterozygous for the SNP. It highlights the presence of a signal with the FAM probe in a sample that is homozygous for the VIC allele (B). Because of this non-specific signal, a false positive cannot be distinguished from a signal caused by real contamination in a sample.

[0141] These results show that the Taqman method does not reliably detect contaminants.

Example 2: Detection of Contaminants by Genotyping on a Chip

[0142] In this example, batches of 200 seeds from a line A containing 10%, 20%, 30%, 40%, and up to 90% contaminants from a line B were prepared, and a sample of 15 seeds from each batch was analyzed by genotyping on an Infinium (Illumina) chip in order to assess the feasibility of identifying a contamination. Contamination levels above 10% can be detected, but mixtures containing 10% contamination are not distinguishable from uncontaminated controls. A fortiori, lower levels of contamination will not be detectable.

Example 3: Implementation of the Method According to the Invention on a Set of Markers

[0143] In this example, a set of 16 discriminating markers (SNPs) was used, allowing the unambiguous identification of the presence of a variety other than the expected one. This set of 16 markers was defined from reference genotyping data on several thousand markers for the varieties of interest, and allows each variety to be differentiated from the others by at least 3 discriminatory markers. In this case, it is the overall molecular profile of the 16 markers that determines the identity of each variety. Each marker is specific to a locus of interest.

[0144] In an experiment under controlled contamination conditions, 24 seeds of a pure L1 line were introduced into a batch of 2376 seeds of a pure L2 line, giving a batch of 2400 seeds with a purity level of 99%. The seeds were randomly distributed into twenty-four sub-lots of 100 grains (i.e. 2400 grains analyzed). Each sub-lot was crushed independently and DNA was extracted from the crushed seeds. There is thus an average of 1 contaminant per sub-lot, since the number of sub-lots is equal to the number of contaminants present in the complete seed batch. Because the seeds are distributed at random when the sub-lots are formed, however, some sub-lots will contain no contaminant and others will contain several.
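The effect of the random distribution described above can be quantified. The following is a minimal sketch, assuming that seeds are allocated to sub-lots uniformly at random without replacement (a hypergeometric model, which the text does not state explicitly). It computes the probability that a given sub-lot of 100 seeds receives none of the 24 contaminants, and the resulting expected number of contaminated sub-lots. The function names are illustrative only.

```python
from math import comb


def prob_sublot_clean(total_seeds: int, contaminants: int, sublot_size: int) -> float:
    """Probability that one randomly formed sub-lot contains no contaminant.

    Assumption: the sub-lot is a uniform random sample, without replacement,
    from the whole batch (hypergeometric model).
    """
    clean = total_seeds - contaminants
    return comb(clean, sublot_size) / comb(total_seeds, sublot_size)


def expected_contaminated_sublots(total_seeds: int, contaminants: int, sublot_size: int) -> float:
    """Expected number of sub-lots that contain at least one contaminant."""
    n_sublots = total_seeds // sublot_size
    p_clean = prob_sublot_clean(total_seeds, contaminants, sublot_size)
    return n_sublots * (1.0 - p_clean)


# Figures of Example 3: 24 contaminants among 2400 seeds, sub-lots of 100 seeds.
p_clean = prob_sublot_clean(2400, 24, 100)
expected = expected_contaminated_sublots(2400, 24, 100)
print(f"P(sub-lot without contaminant) = {p_clean:.3f}")
print(f"Expected contaminated sub-lots = {expected:.1f}")
```

Under this model a little over a third of the sub-lots are expected to be free of contaminants, so the number of contaminated sub-lots is expected to be well below 24 even though there is on average one contaminant per sub-lot.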

[0145] For each of the 16 markers, an amplicon of 70 to 120 bp was defined, and the 16 markers were co-amplified by multiplex PCR. A unique index (TAG) is used for each DNA sample, allowing sequencing of all the amplicons and attribution of the sequences obtained to their original batch.

[0146] The amplicons were sequenced by the Illumina technology on a Miniseq sequencer. Paired sequences of 75 bases were generated and assigned to the original DNA by a demultiplexing step. After removal of adaptor sequences and of poor quality bases (Q30 threshold), each pair of sequences was reassembled into a single sequence and aligned to the reference maize genome (RefGenV4). For each SNP, the relative allele frequencies of the main and alternative allele were calculated, and correspond to the number of readings containing the allele of interest relative to the sum of the readings of each allele.
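The relative allele frequency defined above is a simple ratio of read counts. A minimal sketch, assuming that the reads carrying each allele have already been counted for a given SNP and sub-lot (the function name and the read counts are hypothetical):

```python
def relative_allele_frequencies(main_reads: int, alt_reads: int) -> tuple[float, float]:
    """Return (main allele frequency, alternative allele frequency) for one SNP.

    Each frequency is the number of reads carrying the allele divided by
    the sum of the reads of both alleles.
    """
    total = main_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return main_reads / total, alt_reads / total


# Hypothetical read counts for one SNP in one sub-lot.
main_freq, alt_freq = relative_allele_frequencies(main_reads=990, alt_reads=10)
print(f"main: {main_freq:.3f}, alternative: {alt_freq:.3f}")
```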

[0147] Contamination is considered to occur for an SNP marker if, in a sub-lot, the sequence of an allelic form, which is not that of the allele expected for the variety tested, appears to be greater than the background.

[0148] A sample is declared contaminated when it contains at least 3 SNPs for which an alternative allele is detected. Thus, it is concluded that, among these 24 sub-lots, 13 are considered contaminated and 11 are considered pure.
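The two-stage decision rule (a per-SNP call against the background, then a sub-lot call requiring at least 3 SNPs) can be sketched as follows. The background threshold and the frequencies used here are hypothetical, since the text does not give numerical values for the background level.

```python
def snps_above_background(alt_freqs: dict[str, float], background: float) -> list[str]:
    """SNPs whose alternative allele frequency exceeds the background level."""
    return [snp for snp, freq in alt_freqs.items() if freq > background]


def is_sublot_contaminated(alt_freqs: dict[str, float], background: float, min_snps: int = 3) -> bool:
    """A sub-lot is declared contaminated if at least `min_snps` SNPs show an alternative allele."""
    return len(snps_above_background(alt_freqs, background)) >= min_snps


# Hypothetical alternative allele frequencies for two sub-lots (values are illustrative).
BACKGROUND = 0.002  # assumed background level, not taken from the text
sublot_a = {"SNP1": 0.0101, "SNP2": 0.0004, "SNP3": 0.0097, "SNP4": 0.0110}
sublot_b = {"SNP1": 0.0003, "SNP2": 0.0005, "SNP3": 0.0090, "SNP4": 0.0001}

print(is_sublot_contaminated(sublot_a, BACKGROUND))  # three SNPs above background
print(is_sublot_contaminated(sublot_b, BACKGROUND))  # only one SNP above background
```

Requiring several concordant SNPs, rather than one, makes the call robust to an isolated sequencing artefact on a single marker.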

[0149] The number of contaminated sub-lots is used to estimate the varietal purity of the lot analyzed. This calculation is performed using the SeedCalc software, which uses the formulas of Remund (2001). In this example, the estimated purity is 99.22% (98.64%-99.6%), for a controlled true purity of 99%.

[0150] The estimation of the impurity $\hat{p}$ of the batch is obtained according to the formula:

$\hat{p} = 1 - \left(1 - \frac{d}{n}\right)^{\frac{1}{m}}$

[0151] In which n is the number of pools; m is the number of grains in a pool; d is the number of pools in which a contaminant has been identified.

[0152] In the above case: $\hat{p} = 1 - \left(1 - \frac{13}{24}\right)^{\frac{1}{100}} = 1 - 0.9922 = 0.0078$, i.e. a purity of 99.22%. The confidence interval is also calculated according to the procedures described in Remund (2001).
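The point estimate can be reproduced with a few lines of code. This is a minimal sketch of the formula of paragraph [0150] only; it is not the SeedCalc software and does not compute the confidence interval. The function name is illustrative.

```python
def estimate_impurity(d: int, n: int, m: int) -> float:
    """Estimated impurity of a lot from pooled qualitative results.

    d: number of pools in which a contaminant has been identified
    n: number of pools
    m: number of grains in a pool
    """
    if n <= 0 or m <= 0 or not 0 <= d <= n:
        raise ValueError("expected n > 0, m > 0 and 0 <= d <= n")
    return 1.0 - (1.0 - d / n) ** (1.0 / m)


# Example 3: 13 contaminated sub-lots out of 24 sub-lots of 100 grains.
impurity = estimate_impurity(d=13, n=24, m=100)
purity_pct = 100.0 * (1.0 - impurity)
print(f"impurity = {impurity:.4f}, purity = {purity_pct:.2f}%")
```

The same function gives a purity of 99.79% for 3 contaminated sub-lots out of 16 and 99.87% for 2 out of 16 (sub-lots of 100 seeds), in agreement with the point estimates reported in Example 4.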

Example 4: Identification of the Contaminant

[0153] In this example, basic seed lots of maize were analyzed using the same approach as in Example 3. For one lot, 16 sub-lots of 100 seeds were formed.

[0154] The seeds from each sub-lot were crushed and the DNA extracted. A set of 17 markers was identified, including 16 discriminating SNPs (allowing unambiguous identification of the presence of a variety other than the expected one) and one marker associated with a trait. For each marker, a 70-120 bp amplicon was defined, and the 17 markers were co-amplified by multiplex PCR. A unique index (Tag) is used for each DNA sample, allowing the sequencing of all the amplicons and the attribution of the sequences obtained to their original batch.

[0155] The amplicons were sequenced using Illumina technology on a Miniseq sequencer. Paired sequences of 75 bases were generated and assigned to the original DNA by a demultiplexing step. After removal of adaptor sequences and of poor quality bases (Q30 threshold), each pair of sequences was reassembled into a single sequence and aligned to the reference maize genome (RefGenV4). For each SNP, the relative allele frequencies of the main and alternative allele were calculated, and correspond to the number of readings containing the allele of interest relative to the sum of the readings of each allele.

[0156] FIG. 2 shows, for an SNP (SNP10), the frequency of the alternative allele in each of the sub-lots (i.e. the frequency of occurrence of the alternative allele sequence). In this example, sub-lots 3, 14 and 16 show a significant presence of the alternative allele (above the background noise represented by the horizontal line). This analysis is performed for each SNP, and FIG. 3 shows the qualitative profile (presence/absence of the alternative allele) obtained for each SNP in each sub-lot. The presence of an alternative allele is confirmed for at least 3 SNPs in sub-lots 3, 14 and 16. These 3 sub-lots are declared contaminated. The remaining 13 sub-lots are declared uncontaminated. The varietal purity estimated with SeedCalc is 99.79% (95% confidence interval: 99.39%-99.96%).

[0157] In parallel, the same batch was analyzed on 558 individual seeds. For each seed, a fragment of the embryo was taken with a punch, DNA was then extracted, and genotyping was performed using KASP technology (LGC Genomics) on 16 discriminatory markers. This analysis estimates a purity of 99.46% (95% confidence interval: 98.42%-99.89%).

[0158] The marker SNP17 was analyzed separately and makes it possible to estimate the purity of the associated trait.

[0159] FIG. 3 shows that, for this marker, sub-lots 3 and 16 show a significant frequency of the alternative allele. These 2 sub-lots are declared contaminated, leading to a line purity estimate of 99.87% (95% confidence interval: 99.52%-99.98%).

[0160] The molecular profile identified on the non-contaminated sub-lots is first checked for conformity with the expected profile for the analyzed variety (the previous step verifies the varietal purity of the batch; this step verifies that the identified variety is indeed the expected one). Then, for sub-lots 3, 14 and 16, which show contamination, a contaminant molecular profile is deduced from the observed molecular profile by subtraction of the expected profile. For each SNP marker showing contamination, the 2 observed alleles are reported (FIG. 4). The contaminant can thus be homozygous for the minority allele, or heterozygous.
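The subtraction step can be illustrated with a short sketch. It assumes biallelic SNPs and an expected variety that is homozygous at every marker (as for a pure line); the data structures, function name and allele values are hypothetical and are not taken from the text.

```python
def infer_contaminant_genotypes(
    expected: dict[str, str], observed: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Candidate contaminant genotypes per SNP, obtained by subtracting the expected profile.

    expected: the allele expected at each SNP (homozygous line assumed)
    observed: the set of alleles detected at each SNP in a contaminated sub-lot
    """
    candidates = {}
    for snp, exp_allele in expected.items():
        extra = sorted(observed[snp] - {exp_allele})  # alleles not explained by the expected variety
        if not extra:
            # No alternative allele: the contaminant carries the expected allele here.
            candidates[snp] = [exp_allele + exp_allele]
        else:
            alt = extra[0]  # biallelic SNP assumed
            # The contaminant is either homozygous for the minority allele or heterozygous.
            candidates[snp] = [alt + alt, exp_allele + alt]
    return candidates


# Hypothetical profiles for three SNPs in one contaminated sub-lot.
expected_profile = {"SNP1": "A", "SNP2": "C", "SNP3": "T"}
observed_profile = {"SNP1": {"A", "G"}, "SNP2": {"C"}, "SNP3": {"T", "C"}}
contaminant = infer_contaminant_genotypes(expected_profile, observed_profile)
print(contaminant)
```

Because the sub-lot is a pool, the zygosity of the contaminant at a contaminated marker cannot be resolved from presence/absence data alone, which is why two candidate genotypes are kept for those markers.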

[0161] Each contaminant molecular profile is then compared with a reference database in order to identify it. If this genotype corresponds to a known accession, that accession is proposed as a potential contaminant; otherwise the contaminant genotype is declared non-identifiable.
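The database comparison can be sketched as a compatibility search. This is only an illustration under stated assumptions: the contaminant profile is represented as a list of candidate genotypes per SNP, the reference database as a dictionary of accession profiles, and a match requires compatibility at every SNP. The accession names and genotypes are hypothetical.

```python
def normalise(genotype: str) -> str:
    """Order-independent representation of a genotype ("GA" and "AG" are the same)."""
    return "".join(sorted(genotype))


def matching_accessions(
    candidates: dict[str, list[str]], reference_db: dict[str, dict[str, str]]
) -> list[str]:
    """Accessions whose genotype is among the candidate genotypes at every SNP."""
    hits = []
    for name, profile in reference_db.items():
        compatible = all(
            snp in profile and normalise(profile[snp]) in {normalise(g) for g in allowed}
            for snp, allowed in candidates.items()
        )
        if compatible:
            hits.append(name)
    return hits


# Hypothetical contaminant profile and reference database.
contaminant_candidates = {"SNP1": ["GG", "AG"], "SNP2": ["CC"]}
reference = {
    "line_X": {"SNP1": "GA", "SNP2": "CC"},
    "line_Y": {"SNP1": "AA", "SNP2": "CC"},
}
hits = matching_accessions(contaminant_candidates, reference)
print(hits if hits else "non-identifiable")
```

An empty result corresponds to the case in which the contaminant genotype is declared non-identifiable.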

[0162] This reference database can be refined according to the production plan: it will then contain, as a priority, all the varieties grown in the production sector of the line. In this context, a contaminant that does not appear in the reference database will be classified as a contaminant related to the post-harvest process.

Example 5: Implementation of the Method for Simultaneous Assessment of Varietal Purity and Germinative Quality of a Seed Lot

[0163] In this example, 16 sub-lots of 100 seeds are formed, so that the seed lot is evaluated on a sample of 1600 seeds. From each sub-lot, DNA and RNA are co-extracted.

[0164] For this purpose, each sub-lot is mechanically ground in a tube with stainless steel beads. The tubes and the grinding support are cooled beforehand in liquid nitrogen in order to preserve the integrity of the nucleic acids, in particular RNA. Co-extraction of DNA and RNA is performed using Macherey-Nagel's total DNA, RNA and protein isolation NucleoSpin® TriPrep kit. In a first step, a lysis buffer is added to the milled material, allowing the destruction of cell structures and the simultaneous inactivation of enzymes such as RNases. The lysates are then deposited on columns containing a silica membrane to which the DNA and RNA molecules bind. A first elution in a specific buffer elutes the DNA while keeping the RNA attached to the silica membrane. After a DNase treatment to degrade residual DNA, the RNA is washed and then eluted in RNase-free water.

[0165] For each sub-lot, a reverse transcription is performed, primed with oligo-dT oligonucleotides to synthesize double-stranded DNA complementary to the messenger RNAs present in each sample. A DNA mixture is then constituted for each sub-lot, composed of the extracted genomic DNA and the cDNAs synthesized from the RNA fraction.

[0166] A multiplex PCR is performed on each DNA sample in order to specifically amplify the targets of interest in the form of 70 to 120 bp amplicons. These amplicons correspond to the genomic regions of interest for the determination of the varietal identification molecular profile on the one hand (set of discriminant SNPs), and to the DOG1 gene, marker of the seed dormancy state on the other hand. A unique index (TAG) is used for each DNA sample, allowing sequencing of all the amplicons and attribution of the sequences obtained to their original sub-lot. Amplicons are sequenced using Illumina technology, generating paired sequences of 75 bases each. These sequences are then assigned to the original DNA by a demultiplexing step, and then undergo various treatments consisting of the removal of adaptor sequences and of poor quality bases (Q30 threshold). Each pair of sequences is finally assembled into a single sequence and aligned with the reference genome sequence.

[0167] For each SNP, the relative allele frequencies of the main and alternative alleles are calculated, and correspond to the number of readings containing the allele of interest relative to the sum of the readings of each allele. Contamination is considered to occur for an SNP marker if, in a sub-lot, the sequence of an allelic form, which is not that of the allele expected for the variety tested, appears to be greater than the background. A sample is declared contaminated when it contains at least 3 SNPs for which an alternative allele is detected. The number of contaminated sub-lots is used to estimate the varietal purity of the lot tested. This calculation is performed using the SeedCalc software, which uses the formulas of Remund (2001).

[0168] With regard to the DOG1 gene, a sub-lot is considered to contain a dormant seed if specific transcript sequences of this gene are detected in an amount significantly different from the background, the expression of this gene being negligible in non-dormant seeds. This threshold of significance is previously determined using a standard range. The dormancy rate is then estimated by counting the number of sub-lots for which DOG1 gene expression is detected, using the calculation method previously used.
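The dormancy estimate follows the same pooled logic as the purity estimate. A minimal sketch, assuming that the number of DOG1 transcript reads is available for each sub-lot and that a significance threshold has been derived from the standard range; the threshold and read counts below are hypothetical.

```python
def dormancy_rate(dog1_reads: list[int], threshold: int, seeds_per_sublot: int) -> float:
    """Estimated proportion of dormant seeds from per-sub-lot DOG1 detection.

    A sub-lot is positive if its DOG1 read count exceeds the threshold; the rate is
    then estimated as 1 - (1 - d/n)**(1/m), with d positive sub-lots out of n
    sub-lots of m seeds each.
    """
    n = len(dog1_reads)
    if n == 0 or seeds_per_sublot <= 0:
        raise ValueError("at least one sub-lot and a positive sub-lot size are required")
    d = sum(1 for reads in dog1_reads if reads > threshold)
    return 1.0 - (1.0 - d / n) ** (1.0 / seeds_per_sublot)


# Hypothetical DOG1 read counts for 16 sub-lots of 100 seeds; the threshold is an assumed value.
reads = [2, 0, 310, 1, 3, 0, 275, 2, 1, 0, 4, 290, 0, 1, 2, 260]
rate = dormancy_rate(reads, threshold=50, seeds_per_sublot=100)
print(f"estimated dormancy rate = {100 * rate:.2f}%")
```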
