Application of Zm00001d012005 gene in regulating starch content of maize kernels
12433214 · 2025-10-07
Assignee
Inventors
- Xingming Fan (Yunnan, CN)
- Xiaoping Yang (Yunnan, CN)
- Fuyan Jiang (Yunnan, CN)
- Xingfu Yin (Yunnan, CN)
- Yaqi Bi (Yunnan, CN)
Cpc classification
International classification
A01H1/04
HUMAN NECESSITIES
A01H1/00
HUMAN NECESSITIES
Abstract
The disclosure relates to the field of molecular marker-assisted breeding of maize, and specifically an application of a Zm00001d012005 gene in regulating starch content of maize kernels. Specifically, an application of a gene related to starch content of maize kernels in molecular marker-assisted breeding of maize is provided in the disclosure. A sequence of the Zm00001d012005 gene is as shown in SEQ ID NO: 1. In the disclosure, genome-wide association study (GWAS) analysis and genetic linkage analysis are utilized to co-localize SNP_166371888 which is on chromosome 8 and significantly associated with kernel starch content, and a functional gene Zm00001d012005 that regulates the kernel starch content is further identified. The gene Zm00001d012005 can explain 10.19% of phenotypic variation in the kernel starch content.
Claims
1. A method for identifying a starch content of maize kernels, comprising extracting genomic DNA from maize mature seeds; subjecting the genomic DNA to genotyping; and determining a genotype at a 2724 bp locus from the 5′ terminal of a Zm00001d012005 gene sequence as shown in SEQ ID NO: 1 in maize, wherein an AA or AG genotype at the 2724 bp locus has a higher starch content of the maize kernels than a GG genotype.
Description
DETAILED DESCRIPTION
(22) To make the objectives, technical solutions and advantages of the disclosure clearer, the technical solutions of the embodiment are described clearly and completely below with reference to the accompanying drawings. The embodiment described represents only some, rather than all, embodiments of the disclosure. All other embodiments obtained by those of ordinary skill in the art on the basis of this embodiment, without creative effort, fall within the scope of protection of the disclosure.
Embodiment 1
1. Experiment
(23) 1.1 Plant Materials and Trial Design
(24) Six elite lines, Ye107, CML384, CML395, YML46, YML32 and CML171, were used as parents (Table 1). Ye107 came from a temperate region, CML384 from a subtropical region, and CML395, YML46, YML32 and CML171 from tropical regions.
(25) YS (23°19′-23°59′N, 103°35′-104°45′E) and JH (21°27′-22°36′N, 100°25′-101°31′E) in Yunnan Province, China, were selected as trial sites.
(26) Ye107 served as the common parent in this study and was crossed with each of the other five lines. A single-seed descent method was employed from F1 to F9 to construct a multi-parent population (MPP) consisting of five subpopulations (pop1: Ye107 × CML384, pop2: Ye107 × CML395, pop3: Ye107 × YML46, pop4: Ye107 × YML32, and pop5: Ye107 × CML171).
(27) TABLE 1 Parental information
Parents | Pedigree | Ecological type | Kernel starch content (%)
Ye107 | Derived from US hybrid DeKalb XL80 | Temperate | 73.8
CML384 | P502c1#-771-2-2-1-3-B-1-1-3-1(DH) | Subtropical | 75.0
CML395 | 90323B-1-B-1-B*4-1-1-2-1(DH) | Tropical | 74.3
YML46 | SW1-1-1-2-1-2-1 | Tropical | 68.0
YML32 | Suwan 1(S)C9-S8-346-2 (Kei 8902)-3-4-4-6 | Tropical | 69.8
CML171 | G25QS4B-MH13-5-B-1-1-2-B-1-B-B-B-1-1-6-1(DH) | Tropical | 68.1
(28) A randomized complete block design was employed in the experiment, with three replicates at each site. Each field plot was 3 meters long and consisted of two rows, with a row spacing of 0.70 meters and 14 plants per row. The trials were conducted at YS and JH in 2022 and 2023.
(29) 1.2 Phenotypic Statistical Analysis
(30) Near-infrared reflectance spectroscopy (NIRS, No. S-14105 Kungens Kurva, Sweden) was employed to quantify the kernel starch content of 601 RILs. Thirty seeds were randomly selected from each line and measured three times, and the mean of the three measurements was taken as the final value. Additionally, best linear unbiased prediction (BLUP) values for the phenotypic data from three replicates in two environments were calculated using a mixed linear model (MLM). The mean, standard deviation, and coefficient of variation for 521 RILs were calculated using Excel 2019. A Shapiro-Wilk test was performed on the phenotypic data to assess whether they followed a normal distribution. The correlation of the phenotypic data among the three environments was visualized at https://hiplot.com.cn/home/index.html.
(31) 1.3 Genotyping-by-Sequencing (GBS)
(32) First, genomic DNA was extracted from mature seeds using a PureLink DNA kit (Thermo Fisher Scientific, USA). The genomic DNA was digested with the restriction enzymes PstI and MspI (New England BioLabs, Ipswich, USA), and adapters were ligated to the ends of the DNA fragments using T4 ligase (New England BioLabs, Ipswich, USA). Before polymerase chain reaction (PCR) amplification, the ligation products were pooled and purified using a QIAquick PCR purification kit (QIAGEN, Valencia, USA). The final PCR products were also purified using the QIAquick PCR purification kit, and the library concentration was measured using a Qubit 2.0 fluorometer and a Qubit dsDNA HS assay kit (Life Technologies). The library was sequenced on an Illumina NovaSeq 6000 platform (Illumina Inc., San Diego, USA) in 150 bp paired-end mode. After sequencing, the raw data were filtered to remove adapters and low-quality sequences.
(33) 1.4 SNP Identification, Filtration and Annotation
(34) Clean reads were aligned to the maize B73 v4 reference genome using BWA v0.7.17 to generate BAM files. SNPs were called using GATK v4.1.4.0 and then filtered using Plink v1.9 to remove loci with a missing rate higher than 20% and SNPs with a minor allele frequency (MAF) lower than 5%, with parameters set to --geno 0.2 and --maf 0.05. The SNPs were annotated using ANNOVAR v2021-7-16 to determine the genomic regions in which the variant loci were located and the types of mutations.
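The two filtering criteria can be illustrated with a minimal sketch. It is not the Plink implementation: the genotype coding (0, 1 or 2 copies of the alternate allele, `None` for a missing call), the function names and the toy loci are all assumptions made for illustration; only the 20% missing-rate and 5% MAF cutoffs come from the text.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 copies of the alternate allele; None = missing call


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples without a genotype call at one locus."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed over the non-missing calls at one locus."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_locus(calls: Sequence[Genotype], max_missing: float = 0.2, min_maf: float = 0.05) -> bool:
    """Drop loci with a missing rate above 20% or a MAF below 5%."""
    return missing_rate(calls) <= max_missing and minor_allele_frequency(calls) >= min_maf


# Hypothetical toy loci (10 samples each), for illustration only.
toy_loci = {
    "snp_ok": [0, 0, 2, 2, 0, 2, 0, 2, None, 0],
    "snp_missing": [0, None, None, None, 2, 0, 2, 0, 2, 0],
    "snp_rare": [0] * 10,
}
kept = [name for name, calls in toy_loci.items() if keep_locus(calls)]
print(kept)
```

In this toy set only the first locus passes: the second has 30% missing calls and the third is monomorphic.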
(35) 1.5 Population Structure Analysis and LD Analysis
(36) TreeBeST v1.9.2 was used to calculate a distance matrix and construct a phylogenetic tree, with bootstrap values obtained from 1000 replicates. Principal component analysis (PCA) was performed using genome-wide complex trait analysis (GCTA), and scatterplot3d was used to visualize the results. Population structure analysis was performed using Admixture v1.3.0 with preset K values, and the results were visualized using ggplot2.
(37) PopLDdecay was used to calculate the degree of linkage disequilibrium (LD, r²) between any two markers, and the script Plot_OnePop.pl of PopLDdecay was used to draw the LD decay plot.
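For reference, the r² statistic between two biallelic markers can be sketched from its standard definition, r² = D² / (pA(1 − pA)pB(1 − pB)), where D = pAB − pA·pB. The snippet below is an illustrative sketch, not the PopLDdecay implementation; it assumes phased 0/1 alleles scored on the same set of haplotypes, and the marker vectors are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic markers from phased 0/1 alleles on the same haplotypes."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("markers must be scored on the same, non-empty set of haplotypes")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at marker A
    p_b = sum(hap_b) / n  # frequency of allele 1 at marker B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    return d * d / denom


# Hypothetical haplotypes, for illustration only.
marker_1 = [1, 1, 1, 1, 0, 0, 0, 0]
marker_2 = [1, 1, 1, 0, 0, 0, 0, 1]
print(ld_r2(marker_1, marker_1))  # a marker is in complete LD with itself
print(ld_r2(marker_1, marker_2))
```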
(38) 1.6 Linkage Mapping and QTL Location
(39) First, progeny genotypes were filtered on the basis of a completeness threshold of 0.8 and a segregation distortion threshold of 0.001 to obtain population markers. Subsequently, bins were created from the population markers (with a bin created every 15 unlinked markers) to obtain the final population markers. JoinMap 4.0 was used to order the bin markers for each population and to calculate the genetic distances between markers using the Kosambi function.
(40) A logarithm of the odds (LOD) threshold was determined to be 2.5 through 1000 random permutation tests (P<0.05). The QTL locations for starch content were determined using the composite interval mapping (CIM) method. If the genetic distance between two intervals exceeding the threshold line was less than 10 cM, they were considered a single interval.
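As a brief illustration of the Kosambi function mentioned above, the following sketch converts a recombination fraction r into a map distance using d = 25 · ln((1 + 2r) / (1 − 2r)) cM. The function name and the example values of r are assumptions for illustration; this is not JoinMap code.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance in cM for a recombination fraction r (0 <= r < 0.5), Kosambi function."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance is close to 100·r cM; it grows faster than r as r approaches 0.5 because the function accounts for undetected double crossovers under interference.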
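The rule for treating nearby intervals as a single interval can be sketched as follows. This is a hypothetical helper written for illustration, not the procedure of the mapping software; the interval coordinates are invented, and only the 10 cM cutoff comes from the text.

```python
def merge_close_intervals(intervals, max_gap_cm=10.0):
    """Merge (start_cM, end_cM) intervals whose gap is smaller than max_gap_cm."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap_cm:
            # Gap to the previous interval is below the cutoff: extend that interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical above-threshold intervals on one chromosome, in cM.
peaks = [(12.0, 18.5), (24.0, 30.0), (55.0, 61.0)]
print(merge_close_intervals(peaks))
```

Here the first two intervals are 5.5 cM apart and are merged, while the third lies 25 cM further on and is kept separate.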
(41) 1.7 GWAS and Haplotype Analysis of Candidate Genes
(42) In this study, GWAS was performed on the mean starch content of the 521 RILs in the three environments and on their BLUP values, using the MLM in genome-wide efficient mixed model association (GEMMA). Population structure and genetic relationship were introduced as covariates to reduce errors, with the parameter set to -lmm 1. SNP loci meeting or exceeding the significance threshold were extracted using bedtools v1.7. The results were visualized using CMplot v3.6.2.
(43) With reference to the maize B73 v4 reference genome sequence in the MaizeGDB genome browser (www.maizegdb.org/), candidate genes were predicted within a 10 kb region upstream and downstream of the significant SNPs. Functional annotations of the candidate genes were obtained by browsing the MaizeGDB and NCBI (www.ncbi.nlm.nih.gov/) databases. Finally, haplotype analysis for the candidate genes was performed using Haploview v4.2 software.
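The 10 kb window search can be illustrated with a small sketch. The function name and the in-memory gene list are assumptions; the coordinates are those reported for chromosome 8 in Table 5, and the overlap test is a generic interval check rather than the MaizeGDB browser workflow.

```python
def genes_near_snp(snp_pos, genes, flank_bp=10_000):
    """Return IDs of genes overlapping [snp_pos - flank_bp, snp_pos + flank_bp]."""
    lo, hi = snp_pos - flank_bp, snp_pos + flank_bp
    return [gene_id for gene_id, start, end in genes if start <= hi and end >= lo]


# (gene ID, start, end) on chromosome 8, B73 v4, as listed in Table 5.
chr8_genes = [
    ("Zm00001d012005", 166369165, 166375273),
    ("Zm00001d012685", 178635302, 178641627),
]
print(genes_near_snp(166371888, chr8_genes))
```

With SNP 8_166371888 as the query, only Zm00001d012005 falls inside the window, since the SNP lies within that gene.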
2. Results
(44) 2.1 Analysis of Kernel Starch Content
(45) The starch phenotype data of the five subpopulations are statistically analyzed, as shown in Table 2 below:
(46) TABLE 2 Statistical analysis of kernel starch content
Population | Environment | Mean | Standard deviation | Range of variation | Coefficient of variation (%) | Heritability (%)
pop1 | 21YS | 69.50 | 1.593 | 65.3-74.7 | 2.29 | 50.33
pop1 | 22YS | 69.22 | 1.811 | 64.4-76.5 | 2.62 |
pop1 | 23JH | 69.73 | 2.455 | 63.6-75.3 | 3.52 |
pop2 | 21YS | 71.33 | 1.499 | 67.9-75.9 | 2.10 | 47.96
pop2 | 22YS | 71.76 | 1.717 | 67.9-77.0 | 2.39 |
pop2 | 23JH | 69.85 | 2.108 | 65.2-74.7 | 3.02 |
pop3 | 21YS | 70.31 | 1.674 | 65.8-75.1 | 2.38 | 67.91
pop3 | 22YS | 70.12 | 2.107 | 64.9-75.5 | 3.00 |
pop3 | 23JH | 70.66 | 2.166 | 64.6-75.5 | 3.07 |
pop4 | 21YS | 70.66 | 1.729 | 64.3-76.1 | 2.45 | 55.66
pop4 | 22YS | 70.49 | 2.139 | 65.1-76.2 | 3.03 |
pop4 | 23JH | 71.37 | 2.222 | 65.1-76.7 | 3.11 |
pop5 | 21YS | 69.89 | 1.448 | 65.1-73.7 | 2.07 | 58.79
pop5 | 22YS | 69.22 | 1.986 | 61.4-74.2 | 2.87 |
pop5 | 23JH | 69.87 | 2.302 | 62.9-79.5 | 3.30 |
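The coefficient of variation in Table 2 is consistent with the usual definition, CV = 100 × standard deviation / mean. A minimal check using the pop1 values from the table is sketched below; the function name and dictionary layout are illustrative assumptions.

```python
def coefficient_of_variation(mean: float, sd: float) -> float:
    """Coefficient of variation in percent: 100 * SD / mean."""
    return 100.0 * sd / mean


# (mean, SD, reported CV %) for pop1, as listed in Table 2.
pop1 = {
    "21YS": (69.50, 1.593, 2.29),
    "22YS": (69.22, 1.811, 2.62),
    "23JH": (69.73, 2.455, 3.52),
}
for env, (mean, sd, reported) in pop1.items():
    print(env, round(coefficient_of_variation(mean, sd), 2), reported)
```

For all three environments the recomputed value, rounded to two decimals, matches the reported one.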
2.2 Population Structure Analysis and LD Analysis
(47) Population structure analysis results are shown in
(48) As r² gradually decreases, the degree of association between loci tends to stabilize at a distance between loci of about 10 kb. Regions within this distance may contain genetic variations associated with a target trait; therefore, in this study, each significant SNP together with the 10 kb range upstream and downstream of it was used as the criterion for screening candidate genes (
(49) 2.3 QTL Location of Kernel Starch Content
(50) In this study, on the basis of high-density genetic linkage maps of the five subpopulations, significant QTLs associated with starch content of maize kernel are screened. For pop1, three significant QTLs in the 22YS environment are detected, including two significant QTLs, qSC2-1 and qSC4-1 (
(51) Many of the QTLs identified in this study, which are closely related to starch content of maize kernels, exhibit overlaps in different subpopulations. The overlapping QTLs are important for further investigation. It is found that QTL qSC1-2 identified in pop4 in the 21YS environment has the same interval with QTL qSC1-3 identified in the 22YS environment. Additionally, partial overlaps are observed between QTL intervals. QTL qSC2-1 identified in pop1 in the 22YS environment partially overlaps with the intervals of QTL qSC2-3 and qSC2-4 identified in pop4 in the 21YS and 22YS environments, respectively. QTL qSC4-1 identified in pop1 in the 22YS environment partially overlaps with the interval of QTL qSC4-3 identified in pop2 in the 23JH environment. QTL qSC7-1 identified in pop2 in the 21YS environment partially overlaps with the intervals of QTL qSC7-5 identified in pop4 in the 21YS environment and QTL qSC7-4 identified in pop3 in the 21YS environment. QTLs qSC1-2 and qSC1-3 identified in pop4 in the 21YS and 22YS environments, respectively, partially overlap with the interval of QTL qSC1-1 identified in pop3 in the 23JH environment. QTL qSC7-5 identified in pop4 in the 21YS environment completely overlaps with the interval of QTL qSC7-4 identified in pop3 in the 21YS environment. QTL qSC1-4 identified in pop4 in the 22YS environment partially overlaps with the interval of QTL qSC1-1 identified in pop3 in the 23JH environment. QTL qSC1-7 identified in pop5 in the 23JH environment partially overlaps with the intervals of QTLs qSC1-2 and qSC1-3 identified in pop4 in the 21YS and 22YS environments, respectively. QTL qSC1-6 identified in pop5 in the 22YS environment partially overlaps with the intervals of QTL qSC1-4 identified in pop4 in the 22YS environment and the QTL qSC1-1 identified in pop3 in the 23JH environment.
(52) TABLE 3 Significant QTL of kernel starch content
Population | Environment | QTL | Chromosome | Threshold | Interval mapping (bp) | Phenotypic variation explained (PVE, %)
pop1 | 22YS | qSC2-1 | 2 | 3.10 | 36005382-95882685 | 11.42
pop1 | 22YS | qSC4-1 | 4 | 2.58 | 30910194-38257069 | 9.37
pop1 | 23JH | qSC5-1 | 5 | 2.85 | 176622131-189983598 | 10.84
pop2 | 21YS | qSC4-2 | 4 | 4.62 | 82345983-93625301 | 15.34
pop2 | 21YS | qSC7-1 | 7 | 4.05 | 86272448-166108588 | 16.91
pop2 | 22YS | qSC5-2 | 5 | 2.92 | 135785667-138001307 | 10.49
pop2 | 22YS | qSC7-2 | 7 | 2.61 | 81486438-83974624 | 10.00
pop2 | 23JH | qSC2-2 | 2 | 3.42 | 144034211-147190981 | 10.83
pop2 | 23JH | qSC3-1 | 3 | 3.31 | 28461090-106275349 | 10.45
pop2 | 23JH | qSC4-3 | 4 | 2.78 | 34962202-36255362 | 10.33
pop2 | 23JH | qSC4-4 | 4 | 3.14 | 143931516-148426883 | 14.45
pop2 | 23JH | qSC7-3 | 7 | 2.84 | 177128447-179180904 | 9.51
pop3 | 21YS | qSC4-5 | 4 | 3.27 | 201498247-203568347 | 14.08
pop3 | 21YS | qSC7-4 | 7 | 5.62 | 133815548-168488752 | 26.15
pop3 | 23JH | qSC1-1 | 1 | 6.25 | 91733706-176336688 | 25.04
pop3 | 23JH | qSC8-1 | 8 | 3.70 | 150392218-181122637 | 12.17
pop4 | 21YS | qSC1-2 | 1 | 4.72 | 161962446-190891876 | 17.61
pop4 | 21YS | qSC2-3 | 2 | 3.18 | 30664818-36423523 | 11.39
pop4 | 21YS | qSC7-5 | 7 | 2.74 | 147480730-150363074 | 9.67
pop4 | 22YS | qSC1-3 | 1 | 3.83 | 161962446-190891876 | 14.44
pop4 | 22YS | qSC1-4 | 1 | 3.44 | 82695100-101538733 | 12.36
pop4 | 22YS | qSC2-4 | 2 | 2.98 | 52009025-56430250 | 10.94
pop4 | 23JH | qSC1-5 | 1 | 2.65 | 21007004-47132971 | 9.92
pop5 | 22YS | qSC1-6 | 1 | 4.41 | 82499263-92972017 | 17.00
pop5 | 22YS | qSC9-1 | 9 | 3.09 | 86974932-92313888 | 10.77
pop5 | 23JH | qSC1-7 | 1 | 2.72 | 178169065-179379188 | 24.28
2.4 GWAS of Kernel Starch Content
(53) GWAS is conducted using 582663 high-quality SNPs in combination with the mean starch content value of 521 RILs of the MPP in three environments. Additionally, GWAS is performed using the BLUP values of starch content in all subpopulations. The MLM model in GEMMA is employed to identify loci associated with kernel starch content. In the GWAS, population structure and genetic relationship matrices are used as covariates to mitigate false positives. In the 21YS environment, two significant SNPs are identified on chromosomes 5 and 8, explaining 11.23% and 10.19% of phenotypic variance, respectively (
(54) Given that LD decay analysis shows that LD decays at a physical distance of about 10 kb between loci, candidate genes within 10 kb upstream and downstream of each significant SNP are screened, ultimately identifying 14 candidate genes potentially associated with the starch content of maize kernels (Table 5).
(55) TABLE 4 Significant SNP of kernel starch content
Environment | SNP | Chromosome | Position (bp) | Mutation | PVE (%) | Threshold
21YS | 5_97046470 | 5 | 97046470 | G/C | 11.23 | 5.33
21YS | 8_166371888 | 8 | 166371888 | G/A | 10.19 | 5.24
22YS | 5_96705777 | 5 | 96705777 | A/G | 10.41 | 5.23
22YS | 5_97026470 | 5 | 97026470 | G/C | 10.29 | 5.36
22YS | 5_98879482 | 5 | 98879482 | T/A | 10.49 | 5.08
22YS | 5_129613503 | 5 | 129613503 | C/A | 8.23 | 5.20
22YS | 5_138562866 | 5 | 138562866 | G/C | 6.55 | 5.03
22YS | 5_147335276 | 5 | 147335276 | G/A | 10.22 | 5.50
22YS | 8_178656036 | 8 | 178656036 | T/A | 5.72 | 5.33
23JH | 1_54575694 | 1 | 54575694 | G/A | 4.86 | 5.30
23JH | 2_11478963 | 2 | 11478963 | A/T | 4.38 | 5.74
BLUP | 6_137604184 | 6 | 137604184 | C/T | 8.89 | 5.48
(56) TABLE 5 Candidate genes for kernel starch content on the basis of GWAS
SNP | Candidate genes | Chromosome | Start & End | Functional annotation
5_97046470 | Zm00001d015551 | 5 | 97049470-97050003 | /
8_166371888 | Zm00001d012005 | 8 | 166369165-166375273 | Histidine kinase
5_96705777 | Zm00001d015545 | 5 | 96698291-96698985 | Protein phosphatase 2C
5_96705777 | Zm00001d015546 | 5 | 96701931-96707760 | /
5_97046470 | Zm00001d015551 | 5 | 97049470-97050003 | /
5_98879482 | Zm00001d015571 | 5 | 98866461-98875915 | /
5_98879482 | Zm00001d015572 | 5 | 98876242-98877021 | /
5_129613503 | Zm00001d015891 | 5 | 129629958-129632006 | Protein LRKS7
5_138562866 | Zm00001d016000 | 5 | 138562425-138590869 | Myb-related protein 3R-1
5_147335276 | Zm00001d016152 | 5 | 147337518-147343809 | /
8_178656036 | Zm00001d012685 | 8 | 178635302-178641627 | Mitochondrial import inner membrane translocase subunit TIM50
8_178656036 | Zm00001d012686 | 8 | 178642508-178644235 | /
8_178656036 | Zm00001d012687 | 8 | 178645606-178650028 | Triglyceride lipase
1_54575694 | Zm00001d029008 | 1 | 54577150-54580933 | O-fucosyltransferase family protein
2_11478963 | Zm00001d002378 | 2 | 11470204-11474115 | Cationic transporter HKT7
2.5 Integration of QTL Location and GWAS to Reveal Candidate Genes
(57) In this study, QTL location and GWAS analysis are employed to identify loci associated with kernel starch content. Comparison of the two sets of results shows that the candidate SNP 8_166371888, located on chromosome 8 and identified by GWAS in the 21YS environment, falls within the interval of QTL qSC8-1 mapped in pop3 in the 23JH environment (Table 6). Similarly, another important SNP, 8_178636036, located on chromosome 8 and identified by GWAS in the 22YS environment, falls within the interval of QTL qSC8-1 identified in pop3 in the 23JH environment (Table 6). On the basis of the co-localization analysis, four candidate genes (Zm00001d012005, Zm00001d012685, Zm00001d012686, and Zm00001d012687) are identified as potentially related to the starch content of maize kernels (Table 6). SNP 8_166371888 is located within Zm00001d012005, and SNP 8_178636036 is located within Zm00001d012685. Zm00001d012686 and Zm00001d012687 are located near SNP 8_178636036. Functional annotation of the candidate genes is performed using the NCBI and MaizeGDB databases, and the results show that Zm00001d012005 encodes a histidine kinase, Zm00001d012685 encodes the mitochondrial import inner membrane translocase subunit TIM50, and Zm00001d012687 encodes a triacylglycerol lipase.
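The co-localization test described above amounts to checking whether a GWAS SNP position lies inside a QTL interval on the same chromosome. A minimal sketch follows; the function name is hypothetical, the qSC8-1 bounds are as read from Table 3, and the second query position is an invented counter-example.

```python
def snp_in_interval(snp_pos: int, start: int, end: int) -> bool:
    """True if the SNP position lies inside the closed interval [start, end]."""
    return start <= snp_pos <= end


# qSC8-1 bounds on chromosome 8 as read from Table 3 (pop3, 23JH).
QSC8_1 = (150392218, 181122637)

print(snp_in_interval(166371888, *QSC8_1))  # SNP 8_166371888
print(snp_in_interval(100_000_000, *QSC8_1))  # hypothetical position outside the interval
```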
(58) TABLE 6 Candidate genes for kernel starch content co-localized by QTL and GWAS
Candidate genes | Chromosome | QTL | SNP | Start & End | Functional annotation
Zm00001d012005 | 8 | qSC8-1 | 8_166371888 | 166369165-166375273 | Histidine kinase
Zm00001d012685 | 8 | qSC8-1 | 8_178656036 | 178635302-178641627 | Mitochondrial import inner membrane translocase subunit TIM50
Zm00001d012686 | 8 | qSC8-1 | 8_178656036 | 178642508-178644235 | /
Zm00001d012687 | 8 | qSC8-1 | 8_178656036 | 178645606-178650028 | Triglyceride lipase
2.6 Haplotype Analysis
(59) Haplotype analysis shows that in 521 RILs, Zm00001d012005 (a gene sequence of Zm00001d012005 is as shown in SEQ ID NO: 1) has two haplotypes: Hap1(G) and Hap2(A). In 521 RILs, the distribution frequency of Hap1 is 254, and the distribution frequency of Hap2 is 29 (
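The genotype rule stated in the claim (AA or AG at the 2724 bp locus is associated with higher kernel starch content than GG) can be written as a small lookup. This is an illustrative sketch only: the function name, the returned labels and the normalisation of "GA" to "AG" are assumptions, and it says nothing about how the genotype call itself is obtained.

```python
HIGH_STARCH_GENOTYPES = {"AA", "AG"}  # per the claim; "GA" is normalised to "AG" below


def starch_class(genotype: str) -> str:
    """Classify a genotype call at the 2724 bp locus of Zm00001d012005."""
    call = "".join(sorted(genotype.strip().upper()))
    if call in HIGH_STARCH_GENOTYPES:
        return "higher"
    if call == "GG":
        return "lower"
    raise ValueError(f"unexpected genotype call: {genotype!r}")


for g in ("AA", "GA", "GG"):
    print(g, starch_class(g))
```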
(60) The embodiment described above is merely used to illustrate the technical solutions of the disclosure, rather than to limit the disclosure. Although the disclosure is described in detail with reference to the foregoing embodiment, those of ordinary skill in the art should understand that the technical solutions in each embodiment can still be modified, or some technical features can be replaced equivalently, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions in each embodiment of the disclosure.