Whole seed specific promoter

Abstract

The present invention is concerned with the provision of means and methods for gene expression. Specifically, it relates to a polynucleotide comprising an expression control sequence which allows for seed-specific expression of a nucleic acid of interest operatively linked thereto in plants. Furthermore, vectors, host cells, transgenic plants and methods for expressing nucleic acids of interest are provided which are based on said polynucleotide.

Claims

1. A polynucleotide comprising: (i) an expression control sequence which allows for seed specific expression in a plant of a nucleic acid sequence of interest being operatively linked thereto, said expression control sequence being selected from the group consisting of: (a) an expression control sequence comprising the nucleic acid sequence of SEQ ID NO: 123 or 124, (b) an expression control sequence comprising a nucleic acid sequence that has at least 97% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO: 124, and (c) an expression control sequence comprising a nucleic acid sequence that has at least 97% nucleic acid sequence identity to the nucleic acid sequence of SEQ ID NO: 123; and (ii) at least one nucleic acid sequence of interest operatively linked to and heterologous to the expression control sequence, and wherein the expression control sequence allows for specific expression of the at least one nucleic acid sequence of interest in the whole seed of a monocotyledonous plant.

2. The polynucleotide of claim 1, wherein said polynucleotide further comprises a first intron of a plant gene encoding a Metallothionein 1 polypeptide.

3. A vector comprising the polynucleotide of claim 1.

4. The vector of claim 3, wherein said vector is a T-DNA vector.

5. A host cell comprising: (a) the polynucleotide of claim 1; or (b) a vector comprising the polynucleotide of claim 1.

6. The host cell of claim 5, wherein said host cell is a monocotyledonous plant cell.

7. A transgenic monocotyledonous plant or monocotyledonous plant seed thereof comprising: (a) the polynucleotide of claim 1; or (b) a vector comprising the polynucleotide of claim 1.

8. A method for expressing a nucleic acid sequence of interest in a host cell comprising (a) introducing the polynucleotide of claim 1 or a vector comprising said polynucleotide into the host cell; and (b) expressing the at least one nucleic acid sequence of interest in said host cell.

9. The method of claim 8, wherein said host cell is a monocotyledonous plant cell.

10. A method for expressing a nucleic acid sequence of interest in a monocotyledonous plant or monocotyledonous seed thereof comprising: (a) introducing the polynucleotide of claim 1 or a vector comprising said polynucleotide into a monocotyledonous plant or monocotyledonous seed thereof; and (b) expressing at least one nucleic acid sequence of interest in said monocotyledonous plant or monocotyledonous seed thereof.

11. The polynucleotide of claim 2, wherein said first intron comprises the nucleic acid sequence of SEQ ID NO: 119.

12. The polynucleotide of claim 1, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 123.

13. The polynucleotide of claim 1, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 124.

14. The vector of claim 3, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 123 or SEQ ID NO: 124.

15. The host cell of claim 5, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 123 or SEQ ID NO: 124.

16. The transgenic monocotyledonous plant or monocotyledonous plant seed of claim 7, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 123 or SEQ ID NO: 124.

17. The method of claim 8, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 123 or SEQ ID NO: 124.

18. The method of claim 10, wherein the expression control sequence comprises the nucleic acid sequence of SEQ ID NO: 123 or SEQ ID NO: 124.

Description

FIGURES

(1) FIG. 1. Sequence of KG_Fragment 86 (SEQ ID NO: 10)

(2) FIG. 2. Sequence of 62260557.f_o13_1 Maize (SEQ ID NO: 11)

(3) FIG. 3. q-RT-PCR results showing whole seed-specific expression of 62260557.f_o13_1 Maize. [Root_dv: a mixture of roots at 5, 15, 30 days after pollination (DAP); Leaf_dv: a mixture of leaves at 5, 15, 30 DAP; Ear: a mixture of ear at 5 and 10 DAP; whole seeds: a mixture of whole seeds at 15, 20, 30 DAP; Endosperm: a mixture of endosperm at 15, 20, 30 DAP; Embryo: a mixture of embryo at 15, 20, 30 DAP; Root_V2+V4: a mixture of root at V2 and V4 stages; Shoot/leaf_V2+V4: a mixture of V2 shoot and V4 leaves; Flower_GS: a mixture of flower and germinating seeds.]

(4) FIG. 4. The corresponding CDS sequence of the KG_Fragment 86 (SEQ ID NO:4)

(5) FIG. 5. Amino acid sequence of the deduced protein of the corresponding gene of KG_Fragment 86 (SEQ ID NO: 5)

(6) FIGS. 6A, 6B and 6C, combined. The sequence of AZM5_7833 (SEQ ID NO: 128) containing the predicted CDS sequence and the upstream promoter region. The 5′ UTR (127 bp) was determined by comparing the genomic sequence to the maize EST sequence and is indicated in italic, the predicted open reading frame is underlined, and the primers used to isolate the promoter region are in bold.

(7) FIG. 7. Sequence of Promoter KG86 (p-KG86) (SEQ ID NO: 1)

FIG. 8. Diagram of vector RKF126

(8) FIGS. 9A, 9B, 9C, 9D, 9E, and 9F, combined. Sequence of RKF126 (SEQ ID NO: 56)

(9) FIG. 10. GUS expression in different tissues at different developmental stages driven by p-KG86 in transgenic maize with RKF126

(10) FIGS. 11A, 11B, and 11C. 11A) Sequences of ZM1s61973481 (SEQ ID NO: 57), 11B) ZM1s61221800 (SEQ ID NO: 58) and 11C) ZM1s62042561 (SEQ ID NO: 59)

(11) FIG. 12. q-RT-PCR results showing whole seed-specific expression of MAWS42 [Root_dv: a mixture of roots at 5, 15, 30 days after pollination (DAP); Leaf_dv: a mixture of leaves at 5, 15, 30 DAP; Ear: a mixture of ear at 5 and 10 DAP; whole seeds: a mixture of whole seeds at 15, 20, 30 DAP; Endosperm: a mixture of endosperm at 15, 20, 30 DAP; Embryo: a mixture of embryo at 15, 20, 30 DAP; Root_V2+V4: a mixture of root at V2 and V4 stages; Shoot/leaf_V2+V4: a mixture of V2 shoot and V4 leaves; Flower_GS: a mixture of flower and germinating seeds.]

(12) FIG. 13. q-RT-PCR results showing whole seed-specific expression of MAWS45 [Root_dv: a mixture of roots at 5, 15, 30 days after pollination (DAP); Leaf_dv: a mixture of leaves at 5, 15, 30 DAP; Ear: a mixture of ear at 5 and 10 DAP; whole seeds: a mixture of whole seeds at 15, 20, 30 DAP; Endosperm: a mixture of endosperm at 15, 20, 30 DAP; Embryo: a mixture of embryo at 15, 20, 30 DAP; Root_V2+V4: a mixture of root at V2 and V4 stages; Shoot/leaf_V2+V4: a mixture of V2 shoot and V4 leaves; Flower_GS: a mixture of flower and germinating seeds.]

(13) FIG. 14. The corresponding CDS sequence of MAWS42 (SEQ ID NO: 6)

(14) FIG. 15. Amino acid sequence of the ZmTIP3-1 of the corresponding gene to MAWS42 (SEQ ID NO: 7)

(15) FIG. 16. The corresponding CDS sequence of MAWS45 (SEQ ID NO: 8)

(16) FIG. 17. Amino acid sequence of the corresponding gene to MAWS45 (SEQ ID NO: 9)

(17) FIGS. 18A, 18B, 18C, 18D and 18E. The sequences of AZM5_17960 (SEQ ID NO: 70; FIGS. 18A and 18B, combined) and AZM5_6324 (SEQ ID NO: 71; FIGS. 18C, 18D, and 18E, combined) containing the predicted CDS sequence (ATG bold underlined), the predicted 5′-UTR (italics), and the additional putative promoter sequence. The 5′ UTR sequences were determined by comparing the genomic sequence to the maize EST.

(18) FIGS. 19A and 19B. Sequences of Promoter MAWS42 (p-MAWS42), SEQ ID NO: 2 (FIG. 19A) and promoter MAWS45 (p-MAWS45), SEQ ID NO: 3 (FIG. 19B).

(19) FIG. 20. Diagram of RTP1052

(20) FIGS. 21A, 21B, 21C, 21D, 21E, 21F, and 21G, combined. Sequence of RTP1052 (SEQ ID NO: 116)

(21) FIG. 22. Diagram of RTP1057

(22) FIGS. 23A, 23B, 23C, 23D, 23E, 23F, and 23G, combined. Sequence of RTP1057 (SEQ ID NO: 117)

(23) FIG. 24. GUS expression in different tissues at different developmental stages driven by p-MAWS42 in transgenic maize with RTP1052

(24) FIG. 25. GUS expression in different tissues at different developmental stages driven by p-MAWS45 in transgenic maize with RTP1057

EXAMPLES

(25) The invention will now be illustrated by the following Examples which are not intended, whatsoever, to limit the scope of this application.

Example 1: Identification and Validation of Maize Whole Seed Promoter KG86

(26) Identification of Transcript of KG86

(27) A maize gene expression profiling analysis was carried out using a commercial supplier of AFLP comparative expression technology (Keygene N.V., P.O. Box 216, 6700 AE Wageningen, The Netherlands) using a battery of RNA samples from 23 maize tissues generated by BASF (Table 1). Among the AFLP bands that were identified as having whole seed specific expression was a 231 bp fragment designated “KG_Fragment 86”. The sequence of KG_Fragment 86 is shown in FIG. 1.

(28) TABLE 1. Corn tissues used for mRNA expression profiling experiment

| Sample No. | Tissue | Timing and number of plants | Days after Pollination |
|---|---|---|---|
| 1 | Root | 9 am (4 plants) | 5 |
| 2 | | 9 am (4 plants) | 15 |
| 3 | | 9 am (4 plants) | 30 |
| 4 | leaf above the ear | 9 am (6 plants) | 5 |
| 5 | | 9 am (6 plants) | 15 |
| 6 | | 9 am (6 plants) | 30 |
| 7 | ear complete | 9 am (6 plants) | 5 |
| 8 | | 9 am (6 plants) | 10 |
| 9 | Whole seed | 9 am (6 plants) | 15 |
| 10 | | 9 am (6 plants) | 20 |
| 11 | | 9 am (6 plants) | 30 |
| 12 | Endosperm | 9 am (6 plants) | 15 |
| 13 | | 9 am (6 plants) | 20 |
| 14 | | 9 am (6 plants) | 30 |
| 15 | Embryo | 9 am (6 plants) | 15 |
| 16 | | 9 am (6 plants) | 20 |
| 17 | | 9 am (6 plants) | 30 |
| 18 | Female pistillate flower | 6 plants | before pollination |
| 19 | germinating seed | 20 seeds | imbibition for 3 days |
| 20 | root, veg. state V2 | | |
| 21 | root, veg. state V4 | | |
| 22 | leaf, veg. state V2 | | |
| 23 | leaf, veg. state V4 | | |

(29) Identification of the Gene Corresponding to KG_Fragment 86

(30) The sequence of KG_Fragment 86 was used as a query for a BLASTN search against BASF's in-house database, HySeq AII EST. The accession 62260557.f_o13_1 Maize, showing 97% identity to KG_Fragment 86, was identified as having the highest homology with KG_Fragment 86. The sequence of 62260557.f_o13_1 Maize is shown in FIG. 2.
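
The identity figure reported by such a search can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned (gaps written as "-") and that identity is counted as matching columns divided by the alignment length, which is how BLAST reports it. The function name and the toy sequences are hypothetical and are not taken from the EST or from KG_Fragment 86.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over all alignment columns (gap columns count as mismatches)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy 100-column alignment with three mismatches (illustrative data only).
query = "ACGT" * 25
subject_bases = list(query)
for position, base in ((0, "C"), (10, "T"), (50, "A")):
    subject_bases[position] = base
subject = "".join(subject_bases)
print(percent_identity(query, subject))  # 97.0
```

The same calculation applies to any identity threshold expressed as a percentage of aligned positions, provided the alignment method is stated.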

(31) Confirmation of Expression Pattern of 62260557.f_o13_1 Maize Using Quantitative Reverse Transcriptase-Polymerase Chain Reaction (q-RT-PCR)

(32) In order to confirm the native expression pattern of 62260557.f_o13_1 Maize, quantitative reverse transcription PCR (q-RT-PCR) was performed using total RNA isolated from the same materials as were used for the AFLP expression profiling (Table 1).

(33) Primers for q-RT-PCR were designed based on the sequences of 62260557.f_o13_1 Maize and of KG_Fragment 86 using the Vector NTI software package (Invitrogen, Carlsbad, Calif., USA). Two sets of primers were used for PCR amplification of 62260557.f_o13_1 Maize (Table 2). The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene served as a control for normalization purposes.

(34) TABLE 2. Primer sequences for q-RT-PCR

| Primer | Sequence (SEQ ID NO) |
|---|---|
| 62260557_Forward_1 | CAGCTAGCGGCTTAGTCT (12) |
| 62260557_Reverse_1 | CTCTTCGCCTGGAGGTTC (13) |
| 62260557_Forward_2 | TGGTTTCATTGGATGCAGC (14) |
| 62260557_Reverse_2 | TGCAGTGCGAGTCAGAGA (15) |
| GAPDH_Forward | GTAAAGTTCTTCCTGATCTGAAT (16) |
| GAPDH_Reverse | TCGGAAGCAGCCTTAATA (17) |

(35) q-RT-PCR was performed using SuperScript III Reverse Transcriptase (Invitrogen, Carlsbad, Calif., USA) and SYBR Green QPCR Master Mix (Eurogentec, San Diego, Calif., USA) in an ABI Prism 7000 sequence detection system. cDNA was synthesized using 2-3 μg of total RNA and 1 μL reverse transcriptase in a 20 μL volume. The cDNA was diluted to a range of concentrations (15-20 ng/μL). Thirty to forty ng of cDNA was used for quantitative PCR (qPCR) in a 30 μL volume with SYBR Green QPCR Master Mix following the manufacturer's instructions. The thermocycling conditions were as follows: incubation at 50° C. for 2 minutes, denaturation at 95° C. for 10 minutes, and 40 amplification cycles of 95° C. for 15 seconds and 60° C. for 1 minute. After the final amplification cycle, a dissociation curve analysis was carried out to verify that amplification was specific and that no primer-dimer product was generated. The housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH, primer sequences in Table 2) was used as an endogenous reference gene for normalization using the comparative Ct (threshold cycle) method. The ΔCt value was obtained by subtracting the Ct value of the GAPDH gene from the Ct value of the candidate gene (62260557.f_o13_1 Maize), and the relative transcription quantity (expression level) of the candidate gene was expressed as 2^(−ΔCt). The q-RT-PCR results are summarized in FIG. 3. Both primer sets gave similar expression patterns, which were equivalent to the expression patterns obtained from the AFLP data.
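
The normalization described above can be written out as a short sketch. The function name is hypothetical and the Ct values are invented for illustration; they are not measured data from FIG. 3.

```python
def relative_expression(ct_candidate: float, ct_reference: float) -> float:
    """Comparative Ct method: expression = 2 ** -(Ct_candidate - Ct_reference)."""
    delta_ct = ct_candidate - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for two tissues (not measured data).
example_ct = {
    "whole_seed": {"candidate": 22.0, "gapdh": 20.0},
    "leaf": {"candidate": 30.0, "gapdh": 20.0},
}
for tissue, ct in example_ct.items():
    print(tissue, relative_expression(ct["candidate"], ct["gapdh"]))
```

Because each one-cycle increase in ΔCt halves the computed level, a candidate that crosses the threshold eight cycles later than in another tissue (at equal GAPDH Ct) is estimated to be 256-fold less abundant there. The calculation assumes roughly 100% amplification efficiency for both primer pairs.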

(36) Annotation of the KG_Fragment 86

(37) The coding sequence of KG_Fragment 86 was annotated based on the in silico results obtained from both a BLASTX search of the EST 62260557.f_o13_1 Maize sequence against the GenBank protein database (nr) and in silico translation of the sequence using the Vector NTI software package. The EST 62260557.f_o13_1 Maize sequence encodes a partial protein with the highest homology to the rice gene annotated as hypothetical protein OsI_025737 (GenBank Accession: EAZ04505.1). The top 15 homologous sequences identified in the BLASTX query are presented in Table 3.

(38) TABLE 3. BLASTX search results of the maize EST 62260557.f_o13_1

| Accession | Description | Score | E-value |
|---|---|---|---|
| EAZ04505.1 | hypothetical protein OsI_025737 [Oryza sativa (indica cultivar-group)] | 152 | 8e−36 |
| BAC222SG.1 | hypothetical protein [Oryza sativa (japonica)] | 152 | 8e−36 |
| EAZ40462.1 | hypothetical protein OsJ_023945 [Oryza sativa (japonica)] | 146 | 5e−34 |
| CAO61483.1 | unnamed protein product [Vitis vinifera] | 114 | 2e−24 |
| ABK28018.1 | unknown [Arabidopsis thaliana] | 100 | 6e−20 |
| NP_001117365.1 | unknown [Arabidopsis thaliana] | 100 | 6e−20 |
| AAF99742.1 | F17L21.26 [Arabidopsis thaliana] | 100 | 6e−20 |
| XP_001751813.1 | predicted protein [Physcomitrella patens subsp. patens] | 75 | 1e−12 |
| XP_001778474.1 | predicted protein [Physcomitrella patens subsp. patens] | 74 | 5e−12 |
| CAN72846.1 | hypothetical protein [Vitis vinifera] | 69 | 2e−10 |
| XP_0001763429.1 | predicted protein [Physcomitrella patens subsp. patens] | 67 | 6e−10 |
| CAO14607.1 | unnamed protein product [Vitis vinifera] | 55 | 2e−06 |
| NP_001067585.1 | Os11g0241200 [Oryza sativa (japonica)] | 52 | 1e−05 |
| ABK28287.1 | unknown [Arabidopsis thaliana] | 51 | 3e−05 |
| NP_198895.1 | unknown protein [Arabidopsis thaliana] | 51 | 3e−05 |

(39) The CDS sequence of KG_Fragment 86 is shown in FIG. 4 and the deduced amino acid sequence is shown in FIG. 5.

(40) Identification of the Promoter Region

(41) For promoter identification purposes, the sequence upstream of the start codon of the predicted KG_Fragment 86 gene was defined as the promoter p-KG86. To identify this promoter region, the sequence of 62260557.f_o13_1 was mapped to the BASF Plant Science proprietary genomic DNA sequence database, PUB_tigr_maize_genomic_partial_5.0.nt. One maize genomic DNA sequence, AZM5_7833 (5084 bp), was identified. This 5084 bp sequence harboured the CDS of KG_Fragment 86 and more than 2 kb of sequence upstream of the ATG start codon of this gene (FIGS. 6A, 6B, and 6C, combined).
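
Defining a promoter candidate as the sequence upstream of the start codon amounts to a simple coordinate operation, sketched below. The function name, the 1-based coordinate convention and the toy contig are assumptions made for illustration; the actual positions in AZM5_7833 are not reproduced here.

```python
def upstream_region(genomic: str, atg_position: int, length: int) -> str:
    """Return up to `length` bases immediately 5' of the start codon.

    `atg_position` is the 1-based position of the A of the ATG start codon.
    """
    atg_index = atg_position - 1
    if genomic[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("no ATG start codon at the given position")
    begin = max(0, atg_index - length)
    return genomic[begin:atg_index]


# Toy contig: 11 bases of upstream sequence followed by a short ORF.
toy_upstream = "CCCCGGGGAAA"
toy_contig = toy_upstream + "ATGCATTGA"
print(upstream_region(toy_contig, len(toy_upstream) + 1, 8))  # CGGGGAAA
```

When the contig holds less upstream sequence than requested, the sketch returns whatever is available rather than failing.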

(42) Isolation of the Promoter Region by PCR Amplification

(43) The putative promoter region was isolated via genomic PCR using the following sequence-specific primers:

(44) Forward primer (SEQ ID NO: 18): tcccgtgtccgtcaatgtgata
Reverse primer (SEQ ID NO: 19): ggactcacgagctgaggctcgg

(45) The expected 1198 bp fragment was amplified from maize genomic DNA and annotated as promoter KG86 (p-KG86). The sequence of p-KG86 is shown in FIG. 7.
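
How an expected amplicon length follows from a primer pair can be sketched as an in silico PCR. The primer sequences are those given for SEQ ID NO: 18 and 19; the template is a toy construct made up for illustration, and the function names are hypothetical. The sketch assumes exact primer matches and that the reverse primer anneals to the strand complementary to the template string.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the predicted PCR product, or None if either primer has no exact site."""
    top = template.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer).upper()
    start = top.find(fwd)
    if start == -1:
        return None
    site = top.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    return template[start:site + len(rev_site)]


forward = "tcccgtgtccgtcaatgtgata"   # SEQ ID NO: 18
reverse = "ggactcacgagctgaggctcgg"   # SEQ ID NO: 19
filler = "ACGT" * 10                  # toy internal sequence, illustrative only
toy_template = "GG" + forward + filler + reverse_complement(reverse) + "TT"
product = amplicon(toy_template, forward, reverse)
print(len(product))  # 84
```

On real genomic sequence the same search would be run against the contig, and the reported length would then be compared with the fragment size observed on a gel.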

(46) PLACE Analysis of the Promoter KG86

(47) Cis-acting motifs in the 1198 bp KG86 promoter region were identified with PLACE (a database of Plant Cis-acting Regulatory DNA elements) via the Genomatix database suite. The results are listed in Table 4. Although no putative consensus TATA box was identified in the forward strand, a CAAT box motif was found at nucleotide positions 701-705 in the forward strand.

(48) TABLE 4. PLACE analysis results of the 1198 bp promoter, p-KG86

| IUPAC | Start pos. | End pos. | Strand | Mismatches | Score | Sequence (SEQ ID NO) |
|---|---|---|---|---|---|---|
| WBOXATNPR1 | 2 | 16 | − | 0 | 1 | ATTGACGGACACGGG (20) |
| DPBFCOREDCDC3 | 2 | 8 | − | 0 | 1 | ACACGGG |
| ASF1MOTIFCAMV | 7 | 19 | − | 0 | 1 | CACATTGACGGAC (21) |
| S1FBOXSORPS1L21 | 41 | 46 | − | 0 | 1 | ATGGTA |
| RYREPEATGMGY2 | 42 | 52 | + | 0 | 1 | ACCATGCATAC (22) |
| DRECRTCOREAT | 61 | 67 | − | 0 | 1 | GCCGACC |
| GCCCORE | 65 | 71 | + | 0 | 1 | GGCCGCC |
| BIHD1OS | 103 | 107 | + | 0 | 1 | TGTCA |
| SORLIP1AT | 131 | 143 | − | 0 | 1 | TAGCTAGCCACGC (23) |
| GT1GMSCAM4 | 159 | 164 | − | 0 | 1 | GAAAAA |
| IBOXCORE | 171 | 177 | + | 0 | 1 | GATAATA |
| TBOXATGAPB | 180 | 185 | + | 0 | 1 | ACTTTG |
| BIHD1OS | 184 | 188 | + | 0 | 1 | TGTCA |
| S1FBOXSORPS1L21 | 188 | 193 | + | 0 | 1 | ATGGTA |
| MYB1AT | 208 | 213 | − | 0 | 1 | TAACCA |
| TATABOX4 | 211 | 217 | − | 0 | 1 | TATATAA |
| MYBST1 | 244 | 250 | + | 0 | 1 | AGGATAG |
| IBOXCORE | 275 | 281 | + | 0 | 1 | GATAAAA |
| BIHD1OS | 300 | 304 | − | 0 | 1 | TGTCA |
| MYBCOREATCYCB1 | 306 | 310 | + | 0 | 1 | AACGG |
| RYREPEATGMGY2 | 315 | 325 | + | 0 | 1 | CGCATGCATTG (24) |
| CCAATBOX1 | 322 | 326 | − | 0 | 1 | CCAAT |
| CGACGOSAMY3 | 328 | 332 | + | 0 | 1 | CGACG |
| CGCGBOXAT | 345 | 350 | + | 0 | 1 | GCGCGT |
| CGCGBOXAT | 345 | 350 | − | 0 | 1 | ACGCGC |
| SURECOREATSULTR11 | 347 | 353 | − | 0 | 1 | GAGACGC |
| DPBFCOREDCDC3 | 351 | 357 | − | 0 | 1 | ACACGAG |
| PALBOXAPC | 362 | 368 | + | 0 | 1 | CCGTCCA |
| CMSRE1IBSPOA | 362 | 368 | − | 0 | 1 | TGGACGG |
| SORLIP1AT | 379 | 391 | + | 0 | 1 | TCTCACGCCACGT (25) |
| ABREATRD2 | 383 | 395 | − | 0 | 1 | GAGCACGTGGCGT (26) |
| CACGTGMOTIF | 384 | 396 | + | 0 | 1 | CGCCACGTGCTCA (27) |
| RAV1AAT | 395 | 399 | + | 0 | 1 | CAACA |
| ASF1MOTIFCAMV | 411 | 423 | − | 0 | 1 | GCTGGTGACGAAC (28) |
| ASF1MOTIFCAMV | 438 | 450 | + | 0 | 1 | AGGGATGACGCAT (29) |
| LTRE1HVBLT49 | 450 | 455 | − | 0 | 1 | CCGAAA |
| BIHD1OS | 460 | 464 | + | 0 | 1 | TGTCA |
| MYBST1 | 485 | 491 | − | 0 | 1 | TGGATAT |
| TATCCAOSAMY | 486 | 492 | + | 0 | 1 | TATCCAA |
| RAV1AAT | 490 | 494 | + | 0 | 1 | CAACA |
| EMHVCHORD | 524 | 532 | + | 0 | 1 | TGTAAAGTC |
| 300ELEMENT | 524 | 532 | + | 0 | 1 | TGTAAAGTC |
| TAAAGSTKST1 | 524 | 530 | + | 0 | 1 | TGTAAAG |
| NTBBF1ARROLB | 525 | 531 | − | 0 | 1 | ACTTTAC |
| CACGTGMOTIF | 544 | 556 | − | 0 | 1 | CTGCACGTGCTGT (30) |
| CACGTGMOTIF | 545 | 557 | + | 0 | 1 | CAGCACGTGCAGA (31) |
| HEXMOTIFTAH3H4 | 561 | 573 | + | 0 | 1 | ATTAACGTCATTA (32) |
| TGACGTVMAMY | 563 | 575 | − | 0 | 1 | AATAATGACGTTA (33) |
| CPBCSPOR | 572 | 577 | + | 0 | 1 | TATTAG |
| RYREPEATGMGY2 | 588 | 598 | − | 0 | 1 | ATCATGCATCT (34) |
| DPBFCOREDCDC3 | 618 | 624 | + | 0 | 1 | ACACAAG |
| OSE2ROOTNODULE | 622 | 626 | − | 0 | 1 | CTCTT |
| MYBPLANT | 667 | 677 | − | 0 | 1 | CACCAACCAGC (35) |
| BOXLCOREDCPAL | 670 | 676 | − | 0 | 1 | ACCAACC |
| CGCGBOXAT | 684 | 689 | + | 0 | 1 | GCGCGC |
| CGCGBOXAT | 684 | 689 | − | 0 | 1 | GCGCGC |
| CCAATBOX1 | 696 | 700 | − | 0 | 1 | CCAAT |
| CCAATBOX1 | 701 | 705 | + | 0 | 1 | CCAAT |
| SORLIP1AT | 721 | 733 | + | 0 | 1 | CCACTCGCCACGC (36) |
| SORLIP2AT | 738 | 748 | − | 0 | 1 | GGGGCCATTCA (37) |
| CGCGBOXAT | 774 | 779 | + | 0 | 1 | CCGCGC |
| CGCGBOXAT | 774 | 779 | − | 0 | 1 | GCGCGG |
| CGCGBOXAT | 776 | 781 | + | 0 | 1 | GCGCGC |
| CGCGBOXAT | 776 | 781 | − | 0 | 1 | GCGCGC |
| SITEIIATCYTC | 777 | 787 | − | 0 | 1 | TGGGCCGCGCG (38) |
| CGCGBOXAT | 778 | 783 | + | 0 | 1 | GCGCGG |
| CGCGBOXAT | 778 | 783 | − | 0 | 1 | CCGCGC |
| DRECRTCOREAT | 793 | 799 | − | 0 | 1 | GCCGACT |
| SORLIP1AT | 801 | 813 | + | 0 | 1 | GAACGCGCCACGG (39) |
| CGCGBOXAT | 803 | 808 | + | 0 | 1 | ACGCGC |
| CGCGBOXAT | 803 | 808 | − | 0 | 1 | GCGCGT |
| SORLIP2AT | 829 | 839 | + | 0 | 1 | AGGGCCGAGGC (40) |
| CGCGBOXAT | 841 | 846 | + | 0 | 1 | GCGCGG |
| CGCGBOXAT | 841 | 846 | − | 0 | 1 | CCGCGC |
| OCTAMOTIF2 | 842 | 849 | + | 0 | 1 | CGCGGCAT |
| BS1EGCCR | 864 | 869 | + | 0 | 1 | AGCGGG |
| RYREPEATBNNAPA | 876 | 886 | + | 0 | 1 | TGCATGCAGGT (41) |
| INTRONLOWER | 877 | 882 | − | 0 | 1 | TGCAGG |
| RYREPEATBNNAPA | 879 | 889 | + | 0 | 1 | TGCATGCAGCC (42) |
| ASF1MOTIFCAMV | 902 | 914 | − | 0 | 1 | ACGACTGACGAGG (43) |
| BOXCPSAS1 | 921 | 927 | + | 0 | 1 | CTCCCAC |
| MYBPZM | 937 | 943 | + | 0 | 1 | CCCAACC |
| CGCGBOXAT | 963 | 968 | + | 0 | 1 | ACGCGC |
| CGCGBOXAT | 963 | 968 | − | 0 | 1 | GCGCGT |
| ABREMOTIFAOSOSEM | 985 | 997 | + | 0 | 1 | GCCTACGTGTCGG (44) |
| DRECRTCOREAT | 992 | 998 | − | 0 | 1 | GCCGACA |
| ABREOSRAB21 | 1014 | 1026 | − | 0 | 1 | GGGTACGTGGGCG (45) |
| UPRMOTIFIIAT | 1025 | 1043 | + | 0 | 1 | CCCGCCCCGTTCTCCCACG (46) |
| MYBCOREATCYCB1 | 1031 | 1035 | − | 0 | 1 | AACGG |
| IRO2OS | 1036 | 1048 | − | 0 | 1 | GGGCACGTGGGAG (47) |
| BOXCPSAS1 | 1036 | 1042 | + | 0 | 1 | CTCCCAC |
| ABREOSRAB21 | 1037 | 1049 | + | 0 | 1 | TCCCACGTGCCCC (48) |
| CGCGBOXAT | 1057 | 1062 | + | 0 | 1 | GCGCGC |
| CGCGBOXAT | 1057 | 1062 | − | 0 | 1 | GCGCGC |
| CGCGBOXAT | 1059 | 1064 | + | 0 | 1 | GCGCGT |
| CGCGBOXAT | 1059 | 1064 | − | 0 | 1 | ACGCGC |
| CCAATBOX1 | 1068 | 1072 | − | 0 | 1 | CCAAT |
| WBOXNTCHN48 | 1072 | 1086 | + | 0 | 1 | GCTGACCCGCCCTTC (49) |
| CGCGBOXAT | 1092 | 1097 | + | 0 | 1 | CCGCGC |
| CGCGBOXAT | 1092 | 1097 | − | 0 | 1 | GCGCGG |
| SORLIP2AT | 1107 | 1117 | − | 0 | 1 | GGGGCCCGGAC (50) |
| SORLIP2AT | 1110 | 1120 | + | 0 | 1 | CGGGCCCCAAC (51) |
| HEXAMERATH4 | 1129 | 1134 | + | 0 | 1 | CCGTCG |
| CGACGOSAMY3 | 1130 | 1134 | − | 0 | 1 | CGACG |
| CGACGOSAMY3 | 1133 | 1137 | − | 0 | 1 | CGACG |
| SURECOREATSULTR11 | 1135 | 1141 | − | 0 | 1 | GAGACGA |
| SITEIIATCYTC | 1154 | 1164 | − | 0 | 1 | TGGGCTCGATC (52) |
| QELEMENTZMZM13 | 1159 | 1173 | − | 0 | 1 | CCAGGTCAGTGGGCT (53) |
| WBOXNTCHN48 | 1164 | 1178 | + | 0 | 1 | ACTGACCTGGCCCCC (54) |
| SORLIP2AT | 1167 | 1177 | − | 0 | 1 | GGGGCCAGGTC (55) |
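
The kind of two-strand motif search summarized in the PLACE results can be sketched in a few lines. This is not the PLACE or Genomatix implementation: it handles only exact motif strings (no IUPAC degenerate codes, no mismatch scoring), and the function names and the toy sequence are hypothetical. Positions are 1-based on the forward strand, and a palindromic motif such as GCGCGC is reported on both strands, as in the CGCGBOXAT rows.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_motif(sequence: str, motif: str):
    """Return sorted (start, end, strand) hits with 1-based forward-strand positions."""
    seq = sequence.upper()
    core = motif.upper()
    hits = []
    for strand, pattern in (("+", core), ("-", reverse_complement(core))):
        index = seq.find(pattern)
        while index != -1:
            hits.append((index + 1, index + len(pattern), strand))
            index = seq.find(pattern, index + 1)  # allow overlapping hits
    return sorted(hits)


toy_sequence = "TTCCAATGGATTGGAA"  # illustrative only, not p-KG86
print(scan_motif(toy_sequence, "CCAAT"))  # [(3, 7, '+'), (10, 14, '-')]
```

A minus-strand hit means that the motif reads 5′ to 3′ on the complementary strand, which is why the sequence reported for such a hit is the reverse complement of the forward-strand bases at those positions.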

(49) Binary Vector Construction for Maize Transformation to Evaluate the Function of p-KG86

(50) To facilitate subcloning, the 1198 bp promoter fragment was modified by the addition of a PacI restriction enzyme site at its 5′ end and a BsiWI site at its 3′ end. The PacI-pKG86-BsiWI promoter fragment was digested and ligated into the PacI- and BsiWI-digested BPS basic binary vector HF84. HF84 comprises a plant selectable marker expression cassette (p-Ubi::c-EcEsdA::t-NOS) as well as a promoter evaluation cassette that consists of a multiple cloning site for insertion of putative promoters via PacI and BsiWI, the rice MET1-1 intron to supply intron-mediated enhancement in monocot cells, the GUS reporter gene, and the NOS terminator. The resulting binary vector comprising the p-KG86::i-MET1::GUS::t-NOS expression cassette was named RKF126 and was used to evaluate the expression pattern driven by the p-KG86 promoter. FIG. 8 is a diagram of RKF126. The sequence of the binary vector RKF126 is shown in FIGS. 9A, 9B, 9C, 9D, 9E, and 9F, combined.
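
The addition of flanking cloning sites can be sketched as follows. The recognition sequences used are the standard ones for PacI (TTAATTAA) and BsiWI (CGTACG); the function name, the check for internal sites and the toy fragment are assumptions for illustration and are not described in the text. Both sites are palindromic, so searching one strand is sufficient.

```python
PACI_SITE = "TTAATTAA"   # PacI recognition sequence
BSIWI_SITE = "CGTACG"    # BsiWI recognition sequence


def add_cloning_sites(fragment: str) -> str:
    """Flank a fragment with a 5' PacI site and a 3' BsiWI site.

    Raises ValueError if either site already occurs inside the fragment,
    because digestion would then cut the insert itself.
    """
    insert = fragment.upper()
    for name, site in (("PacI", PACI_SITE), ("BsiWI", BSIWI_SITE)):
        if site in insert:
            raise ValueError(f"{name} site occurs inside the fragment")
    return PACI_SITE + insert + BSIWI_SITE


print(add_cloning_sites("gattaca"))  # TTAATTAAGATTACACGTACG
```

In practice such sites are commonly introduced as 5′ tails on the amplification primers, usually with a few extra bases outside the site so that the enzyme cuts efficiently near the fragment end.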

(51) Promoter Evaluation in Transgenic Maize with RKF126

(52) Expression patterns and levels driven by the p-KG86 promoter were measured using GUS histochemical analysis following an established protocol (Jefferson 1987). Maize transformation was conducted using an Agrobacterium-mediated transformation system. Ten single-copy events for T0 plants and five single-copy events for T1 plants were chosen for the promoter analysis. GUS expression was measured at various developmental stages:

(53) 1) Roots and leaves at 5-leaf stage

(54) 2) Stem at V-7 stage

(55) 3) Leaves, husk and silk at flowering stage (first emergence of silk)

(56) 4) Spikelets/Tassel (at pollination)

(57) 5) Ear or Kernels at 5, 10, 15, 20, and 25 days after pollination (DAP)

(58) The results indicated that promoter p-KG86 in RKF126 drove expression specifically in pollen and in whole seeds (FIG. 10).

(59) TABLE 4A. Summary of tested tissues and relative expression intensities for pKG86

Tissues: Leaf, Root, Stem, husk, silk, Spikelets/Tassel/pollen, un-pollinated cob, pollinated cob, embryo, endosperm

Stages tested:
seedling (5-leaf): − −
V-7
Flowering (emergence of silk): − − − −
pollination: ++
5 DAP
10 DAP: + ++ ++
15 DAP: ++ ++
20 DAP: +++ +++
25 DAP: +++ +++
48 hrs after imbibition: ++++ ++++
72 hrs after imbibition: ++++ ++++
1 week germination: − −

− = no expression, + = weak expression, ++ = medium expression, +++ = strong expression, ++++ = very strong expression

Example 2: Identification and Validation of Maize Whole Seed Promoter MAWS42 and MAWS45

(60) Identification of Transcript of MAWS42 and MAWS45

(61) A microarray study was conducted to identify transcripts with whole seed-specific expression in maize using the same panel of maize RNA samples shown in Table 1. The twenty-three labeled RNAs of these maize tissues were hybridized separately to 23 of our custom designed BPS maize Affymetrix chips, labeled with fluorescent streptavidin antibody, washed, stained and scanned as instructed in the Affymetrix Expression Analysis Technical Manual.

(62) The chip hybridization data were analyzed using Genedata Specialist software and relative expression level was determined based on the hybridization signal intensity of each tissue.

(63) Three of the BPS maize chip probe sets were selected as candidate transcripts showing 3-8 fold higher expression in whole seeds as compared to other tissues: ZM1s61973481_at, ZM1s61221800_s_at and ZM1s62042561_at. Consensus sequences of ZM1s61973481_at, ZM1s61221800_s_at and ZM1s62042561_at are shown in FIGS. 11A, 11B, and 11C, respectively.
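
A fold-change filter of this kind can be sketched as below. The text does not state whether whole-seed signal was compared with the highest, the mean or each individual non-seed tissue; the sketch assumes the highest non-seed signal as the comparator. The probe names, tissue labels and signal values are hypothetical.

```python
def seed_enriched(signals, seed_tissue="whole_seed", min_fold=3.0):
    """Return {probe: fold} for probes whose seed signal is at least
    `min_fold` times the highest signal in any other tissue (assumed comparator)."""
    selected = {}
    for probe, by_tissue in signals.items():
        seed_signal = by_tissue[seed_tissue]
        other_signals = [v for tissue, v in by_tissue.items() if tissue != seed_tissue]
        highest_other = max(other_signals)
        if highest_other > 0 and seed_signal / highest_other >= min_fold:
            selected[probe] = seed_signal / highest_other
    return selected


# Hypothetical hybridization intensities (arbitrary units).
example_signals = {
    "probe_A": {"whole_seed": 800.0, "root": 100.0, "leaf": 90.0},
    "probe_B": {"whole_seed": 200.0, "root": 150.0, "leaf": 100.0},
}
print(seed_enriched(example_signals))  # {'probe_A': 8.0}
```

Using the highest non-seed signal is the most conservative of the three comparators, since a probe set passes only if no single other tissue approaches the seed level.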

(64) Preliminary sequence analysis indicated that ZM1s61221800 is contained within ZM1s62042561. We therefore considered ZM1s61221800 and ZM1s62042561 to represent the same gene, and further studies of this gene were based on ZM1s62042561. For convenience, ZM1s61973481 was named candidate MAWS42 and ZM1s62042561 was named candidate MAWS45.

(65) Confirmation of Expression Pattern of MAWS42 and MAWS45 Using Quantitative Reverse Transcriptase-Polymerase Chain Reaction (q-RT-PCR)

(66) Confirmation of the native expression patterns of MAWS42 and MAWS45 was carried out via quantitative reverse transcription PCR (q-RT-PCR) using total RNA isolated from the same materials as were used for the chip study (Table 1).

(67) Primers for q-RT-PCR were designed based on the sequences of ZM1s61973481 for MAWS42 and ZM1s62042561 for MAWS45 using the Vector NTI software package. Two sets of primers were used for PCR amplification of each gene. The sequences of the primers are given in Table 5. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene served as a control for normalization.

(68) TABLE 5. Primer sequences for q-RT-PCR

| Primer | Sequence (SEQ ID NO) |
|---|---|
| MAWS42_Forward_1 | CTGGCCGTGGGCTTCCTGCT (60) |
| MAWS42_Reverse_1 | AAGGGCCCAGCCAGTACACCCA (61) |
| MAWS42_Forward_2 | TGGAGGCACCACTGGGTGTACTGG (62) |
| MAWS42_Reverse_2 | GCTAGTAGTCCTCTGGCGCGAGCG (63) |
| MAWS45_Forward_1 | GCCAACTCTTCCATTTCGCCAAGG (64) |
| MAWS45_Reverse_1 | GGAGGATTGGCGGTGACAGTCTCA (65) |
| MAWS45_Forward_2 | AGGAAAAAATGGCGGCTCGCTGG (66) |
| MAWS45_Reverse_2 | CCATGCAAATGGAGGATTGGCGG (67) |
| GAPDH_Forward | GTAAAGTTCTTCCTGATCTGAAT (68) |
| GAPDH_Reverse | TCGGAAGCAGCCTTAATA (69) |

q-RT-PCR was performed using SuperScript III Reverse Transcriptase (Invitrogen, Carlsbad, Calif., USA) and SYBR Green QPCR Master Mix (Eurogentec, San Diego, Calif., USA) in an ABI Prism 7000 sequence detection system. cDNA was synthesized using 2-3 μg of total RNA and 1 μL reverse transcriptase in a 20 μL volume. The cDNA was diluted to a range of concentrations (15-20 ng/μL). Thirty to forty ng of cDNA was used for quantitative PCR (qPCR) in a 30 μL volume with SYBR Green QPCR Master Mix following the manufacturer's instructions. The thermocycling conditions were as follows: incubation at 50° C. for 2 minutes, denaturation at 95° C. for 10 minutes, and 40 amplification cycles of 95° C. for 15 seconds and 60° C. for 1 minute. After the final amplification cycle, a dissociation curve analysis was carried out to verify that amplification was specific and that no primer-dimer product was generated. The housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH, primer sequences in Table 2) was used as an endogenous reference gene for normalization using the comparative Ct (threshold cycle) method. The ΔCt value was obtained by subtracting the Ct value of the GAPDH gene from the Ct value of the candidate gene. The relative transcription quantity (expression level) of the candidate gene was expressed as 2^(−ΔCt). The q-RT-PCR results are summarized in FIG. 12 and FIG. 13. Both primer sets gave expression patterns similar to those obtained in the microarray study.

(69) Annotation of MAWS42 and MAWS45

(70) The coding sequences corresponding to the MAWS42 and MAWS45 genes were annotated based on the in silico results obtained from both a BLASTX search of the chip consensus sequences of ZM1s61973481 and ZM1s62042561 against the GenBank protein database (nr) and translation of the sequences with the Vector NTI software package. ZM1s61973481 encodes part of a maize tonoplast intrinsic protein 3-1 (ZmTIP3-1). The CDS of ZmTIP3-1 (GenBank Accession: NP_001105032.1) is shown in FIG. 14, the translated amino acid sequence is shown in FIG. 15, and the top 15 homologous sequences from the BLASTX query are presented in Table 6.

(71) TABLE 6. BLASTX search results of the maize ZM1s61973481 (MAWS42)

| Accession | Description | Score | E-value |
|---|---|---|---|
| NP_001105032.1 | TIP31_MAIZE Aquaporin TIP3-1 (Tonoplast intrinsic protein 3-1) | 150 | 8e−73 |
| NP_001064933.1 | Os10g0492600 [Oryza sativa (japonica)] | 147 | 4e−64 |
| NP_001105045.1 | TIP32_MAIZE Aquaporin TIP3-2 (Tonoplast intrinsic protein 3-2) (ZmTIP3-2) | 139 | 5e−63 |
| BAA08107.1 | membrane protein MP23 precursor [Cucurbita cv. Kurokawa Amakuri] | 98 | 5e−42 |
| CAA44669.1 | tonoplast intrinsic protein [Phaseolus vulgaris] | 98 | 4e−40 |
| BAA08108.1 | T10253 membrane protein MP28 [Cucurbita cv. Kurokawa Amakuri] | 92 | 6e−39 |
| ABK22410.1 | unknown [Picea sitchensis] | 98 | 6e−33 |
| ABK22242.1 | unknown [Picea sitchensis] | 94 | 5e−32 |
| NP_001053371.1 | Os04g0527900 [Oryza sativa (japonica cultivar-group)] | 85 | 2e−24 |
| CAA64952.1 | tonoplast intrinsic protein [Tulipa gesneriana] | 96 | 2e−24 |
| EAY94920.1 | hypothetical protein OsI_016153 [Oryza sativa (indica cultivar-group)] | 86 | 2e−24 |
| CAB39758.1 | major intrinsic protein [Picea abies] | 111 | 4e−24 |
| AAC39480.1 | aquaporin [Vernicia fordii] | 87 | 8e−24 |
| CAO62035.1 | unnamed protein product [Vitis vinifera] | 110 | 5e−22 |
| BAD04010.1 | tonoplast intrinsic protein [Prunus persica] | 109 | 6e−22 |

(72) ZM1s62042561 (MAWS45) encodes a partial protein that has the highest homology to a maize unknown protein (GenBank Accession: ACF84237.1). The CDS of this gene is shown in FIG. 16, the translated amino acid sequence is shown in FIG. 17, and the top 15 homologous sequences from the BLASTX query are presented in Table 7.

(73) TABLE 7. BLASTX search results of the maize ZM1s62042561 (MAWS45)

| Accession | Description | Score | E-value |
|---|---|---|---|
| ACF84237.1 | unknown [Zea mays] | 536 | e−152 |
| ACG56678.1 | tryptophan aminotransferase [Zea mays] | 534 | e−151 |
| NP_001054781.1 | Os05g01639300 [Oryza sativa (japonica cultivar-group)] | 239 | e−100 |
| EAY96895.1 | hypothetical protein OsI_017928 [Oryza sativa (indica cultivar-group)] | 239 | e−100 |
| EAY96696.1 | hypothetical protein OsI_017929 [Oryza sativa (indica cultivar-group)] | 233 | 4e−98 |
| EAY72702.1 | hypothetical protein OsI_000549 [Oryza sativa (indica cultivar-group)] | 167 | 9e−85 |
| BAD68317.1 | putative alliinase precursor [Oryza sativa Japonica Group] | 167 | 9e−85 |
| EAZ10701.1 | hypothetical protein OsJ_000526 [Oryza sativa (japonica cultivar-group)] | 167 | 9e−85 |
| ACF80703.1 | unknown [Zea mays] | 204 | 2e−79 |
| EAZ33023.1 | hypothetical protein OsJ_016506 [Oryza sativa (japonica cultivar-group)] | 158 | 3e−7 |
| AAM69848.1 | putative alliin lyase [Aegilops tauschii] | 265 | 1e−73 |
| NP_001042135.1 | Os01g0169800 [Oryza sativa (japonica cultivar-group)] | 167 | 7e−73 |
| CAO64270.1 | unnamed protein product [Vitis vinifera] | 221 | 5e−71 |
| CAN80923.1 | hypothetical protein [Vitis vinifera] | 221 | 7e−71 |
| CAO16122.1 | unnamed protein product [Vitis vinifera] | 157 | 1e−61 |

(74) Identification of the Promoter Region

(75) The sequences upstream of the start codons of the genes corresponding to MAWS42 and MAWS45 were defined as the putative promoters p-MAWS42 and p-MAWS45. To identify these putative promoter regions, the sequences of ZM1s61973481 and ZM1s62042561 were mapped to the BASF Plant Science proprietary genomic DNA sequence database, PUB_tigr_maize_genomic_partial_5.0.nt. Two maize genomic DNA sequences, AZM5_17960 (3985 bp) and AZM5_6324 (4565 bp), were identified, respectively. AZM5_17960 has about 1 kb of sequence upstream of the predicted CDS of the gene corresponding to MAWS42, and AZM5_6324 has about 1.5 kb of sequence upstream of the predicted CDS of the gene corresponding to MAWS45. These upstream sequences were considered as putative promoter MAWS42 (p-MAWS42) and putative promoter MAWS45 (p-MAWS45). FIGS. 18A and 18B, combined, and FIGS. 18C, 18D, and 18E, combined, show the sequences of AZM5_17960 and AZM5_6324, respectively.

(76) Isolation of the Promoter Region by PCR Amplification

(77) The putative promoter sequences were isolated by genomic PCR using the sequence-specific primers indicated in Table 8. A 1008 bp fragment of AZM5_17960 and a 1492 bp fragment of AZM5_6324 were amplified from maize genomic DNA. These fragments were named promoter MAWS42 (p-MAWS42) and promoter MAWS45 (p-MAWS45), respectively. The sequences of p-MAWS42 and p-MAWS45 are shown in FIGS. 19A and 19B, respectively.

(78) TABLE-US-00010 TABLE 8 Primers for PCR cloning of p-MAWS42 and p-MAWS45
Primer  Sequence  (SEQ ID NO)
p-MAWS42_forward  taactcatatccggttagata  (72)
p-MAWS42_reverse  gtcgtcgccaaataaaaacctacc  (73)
p-MAWS45_forward  atttaaatgtgttggataatct  (74)
p-MAWS45_reverse  ctcctcctcctcctcctcctcct  (75)
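
As a rough aid for checking the Table 8 primers, the following sketch computes the length, GC fraction and reverse complement of each primer. The primer sequences are copied from Table 8; the helper names `reverse_complement` and `gc_fraction` are illustrative assumptions and are not part of the original disclosure.

```python
# Primer sequences copied from Table 8 (SEQ ID NOs 72-75).
PRIMERS = {
    "p-MAWS42_forward": "taactcatatccggttagata",
    "p-MAWS42_reverse": "gtcgtcgccaaataaaaacctacc",
    "p-MAWS45_forward": "atttaaatgtgttggataatct",
    "p-MAWS45_reverse": "ctcctcctcctcctcctcctcct",
}

# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    s = seq.lower()
    return (s.count("g") + s.count("c")) / len(s)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
              f"reverse complement {reverse_complement(seq)}")
```

The reverse complement of each reverse primer gives the sequence as it would read on the top strand at the 3′ end of the amplified fragment.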

(79) PLACE Analysis of the Promoters MAWS42 and MAWS45

(80) Cis-acting motifs in the 1008 bp p-MAWS42 and 1492 bp p-MAWS45 promoter regions were identified with PLACE (a database of Plant Cis-acting Regulatory DNA Elements) via the Genomatix database suite. The results are listed in Table 9 and Table 10.

(81) TABLE-US-00011 TABLE 9 PLACE analysis results of the 1008 bp promoter p-MAWS42
IUPAC  Start pos.  End pos.  Strand  Mismatches  Score  Sequence (SEQ ID NO)
PREATPRODH  3  8  +  0  1  ACTCAT
REBETALGLHCB21  7  13  −  0  1  CGGATAT
NAPINMOTIFBN  27  33  +  0  1  TACACAT
CPBCSPOR  50  55  −  0  1  TATTAG
SEF1MOTIF  52  60  −  0  1  ATATTTATT
SP8BFIBSP8BIB  74  80  −  0  1  TACTATT
SEF1MOTIF  85  93  −  0  1  ATATTTAAT
TATABOXOSPAL  86  92  −  0  1  TATTTAA
PREATPRODH  92  97  −  0  1  ACTCAT
BIHD1OS  109  113  −  0  1  TGTCA
CCAATBOX1  126  130  −  0  1  CCAAT
ELRECOREPCRP1  140  154  +  0  1  ATTGACCCTATTTTG (76)
CPBCSPOR  155  160  −  0  1  TATTAG
D3GMAUX28  172  182  +  0  1  TATTTGCTTAA (77)
MYBPZM  186  192  −  0  1  TCCTACC
TATABOX2  214  220  +  0  1  TATAAAT
IBOXCORE  218  224  −  0  1  GATAATT
SREATMSD  219  225  +  0  1  ATTATCC
MYBST1  220  226  −  0  1  TGGATAA
AMYBOX2  221  227  +  0  1  TATCCAT
TATCCAOSAMY  221  227  +  0  1  TATCCAT
TATABOX2  239  245  +  0  1  TATAAAT
PREATPRODH  265  270  +  0  1  ACTCAT
LTRECOREATCOR15  274  280  +  0  1  CCCGACG
CGACGOSAMY3  276  280  +  0  1  CGACG
HEXAMERATH4  276  281  −  0  1  CCGTCG
PREATPRODH  321  326  +  0  1  ACTCAT
TATABOX4  326  332  −  0  1  TATATAA
RAV1AAT  354  358  −  0  1  CAACA
DPBFCOREDCDC3  360  366  +  0  1  ACACTAG
S1FBOXSORPS1L21  375  380  −  0  1  ATGGTA
HDZIP2ATATHB2  382  390  −  0  1  TAATAATTA
TATABOX3  386  392  +  0  1  TATTAAT
TGTCACACMCUCUMISIN  448  454  +  0  1  TGTCACA
BIHD1OS  448  452  +  0  1  TGTCA
MYBPLANT  454  464  −  0  1  CACCAAACATT (78)
CANBNNAPA  460  468  −  0  1  CTAACACCA
MYB1LEPR  464  470  +  0  1  GTTAGTT
GT1CORE  485  495  +  0  1  AGGTTAATTAC (79)
OSE1ROOTNODULE  502  508  +  0  1  AAAGATG
LTRE1HVBLT49  525  530  +  0  1  CCGAAA
MYBCOREATCYCB1  533  537  +  0  1  AACGG
2SSEEDPROTBANAPA  541  549  +  0  1  CAAACACAC
RAV1AAT  554  558  +  0  1  CAACA
BOXIINTPATPB  603  608  +  0  1  ATAGAA
NTBBF1ARROLB  618  624  +  0  1  ACTTTAG
TAAAGSTKST1  619  625  −  0  1  CCTAAAG
PALBOXAPC  623  629  −  0  1  CCGTCCT
CATATGGMSAUR  637  642  +  0  1  CATATG
CATATGGMSAUR  637  642  −  0  1  CATATG
CCAATBOX1  647  651  −  0  1  CCAAT
LTRE1HVBLT49  657  662  +  0  1  CCGAAA
WBOXHVISO1  690  704  +  0  1  GGTGACTTGGCAGTT (80)
REBETALGLHCB21  718  724  +  0  1  CGGATAA
SREATMSD  719  725  −  0  1  TTTATCC
IBOXCORE  720  726  +  0  1  GATAAAG
TAAAGSTKST1  720  726  +  0  1  GATAAAG
OSE1ROOTNODULE  723  729  +  0  1  AAAGATG
PALBOXAPC  784  790  −  0  1  CCGTCCA
CMSRE1IBSPOA  784  790  +  0  1  TGGACGG
SORLIP2AT  788  798  −  0  1  GGGGCCGCCCG (81)
GCCCORE  790  796  −  0  1  GGCCGCC
ABRELATERD  799  811  +  0  1  TGAGACGTGCCGC (82)
SURECOREATSULTR11  800  806  +  0  1  GAGACGT
GCCCORE  806  812  +  0  1  TGCCGCC
SORLIP2AT  813  823  −  0  1  CGGGCCAGCTG (83)
BS1EGCCR  820  825  −  0  1  AGCGGG
CACGTGMOTIF  829  841  −  0  1  CGCCACGTGTGGG (84)
ABREATRD2  830  842  +  0  1  CCACACGTGGCGC (85)
DPBFCOREDCDC3  832  838  +  0  1  ACACGTG
SORLIP1AT  834  846  −  0  1  CTCCGCGCCACGT (86)
CGCGBOXAT  839  844  +  0  1  GCGCGG
CGCGBOXAT  839  844  −  0  1  CCGCGC
CGCGBOXAT  849  854  +  0  1  GCGCGC
CGCGBOXAT  849  854  −  0  1  GCGCGC
CGCGBOXAT  851  856  +  0  1  GCGCGG
CGCGBOXAT  851  856  −  0  1  CCGCGC
SORLIP1AT  855  867  +  0  1  GGCTCGGCCACGT (87)
ABREOSRAB21  859  871  −  0  1  TATAACGTGGCCG (88)
SORLIP1AT  867  879  +  0  1  TTATAAGCCACGC (89)
CGCGBOXAT  876  881  +  0  1  ACGCGC
CGCGBOXAT  876  881  −  0  1  GCGCGT
CGCGBOXAT  878  883  +  0  1  GCGCGC
CGCGBOXAT  878  883  −  0  1  GCGCGC
HEXAMERATH4  887  892  +  0  1  CCGTCG
CGACGOSAMY3  888  892  −  0  1  CGACG
WBOXNTCHN48  901  915  +  0  1  CCTGACTACTGCACA (90)
DPBFCOREDCDC3  913  919  +  0  1  ACACTCG
SURECOREATSULTR11  917  923  −  0  1  GAGACGA
CGCGBOXAT  942  947  +  0  1  CCGCGG
CGCGBOXAT  942  947  −  0  1  CCGCGG
SURECOREATSULTR11  963  969  −  0  1  GAGACGG
TAAAGSTKST1  974  980  +  0  1  GCTAAAG
MYBPLANT  982  992  −  0  1  AACCTACCTCT (91)
BOXLCOREDCPAL  985  991  −  0  1  ACCTACC
CGACGOSAMY3  1002  1006  +  0  1  CGACG

(82) TABLE-US-00012 TABLE 10 PLACE analysis results of the 1492 bp promoter p-MAWS45
IUPAC  Start pos.  End pos.  Strand  Mismatches  Score  Sequence (SEQ ID NO)
RAV1AAT  2  6  −  0  1  CAACA
TATCCAOSAMY  4  10  −  0  1  TATCCAA
MYBST1  5  11  +  0  1  TGGATAA
SREATMSD  6  12  −  0  1  ATTATCC
IBOXCORE  7  13  +  0  1  GATAATC
OSE1ROOTNODULE  10  16  −  0  1  AAAGATT
-300ELEMENT  12  20  −  0  1  TGCAAAAGA
RYREPEATBNNAPA  14  24  −  0  1  TCCATGCAAAA (92)
AMYBOX2  20  26  −  0  1  TATCCAT
TATCCAOSAMY  20  26  −  0  1  TATCCAT
MYBST1  21  27  +  0  1  TGGATAT
RAV1AAT  29  33  −  0  1  CAACA
MYCATRD2  44  50  −  0  1  CACATGG
MYCATERD  45  51  +  0  1  CATGTGC
ANAERO2CONSENSUS  59  64  +  0  1  AGCAGC
CCAATBOX1  80  84  +  0  1  CCAAT
RYREPEATBNNAPA  117  127  +  0  1  AACATGCAAAT (93)
BIHD1OS  133  137  +  0  1  TGTCA
DPBFCOREDCDC3  142  148  +  0  1  ACACCAG
BOXLCOREDCPAL  157  163  −  0  1  ACCATCC
S1FBOXSORPS1L21  159  164  +  0  1  ATGGTA
AMYBOX2  218  224  −  0  1  TATCCAT
TATCCAOSAMY  218  224  −  0  1  TATCCAT
MYBST1  219  225  +  0  1  TGGATAT
WBOXATNPR1  230  244  +  0  1  ATTGACAATAAAACA (94)
BIHD1OS  232  236  −  0  1  TGTCA
MYB1AT  248  253  +  0  1  TAACCA
SEF3MOTIFGM  255  260  −  0  1  AACCCA
MYB1AT  275  280  −  0  1  AAACCA
-10PEHVPSBD  291  296  −  0  1  TATTCT
P1BS  312  319  +  0  1  GTATATAC
P1BS  312  319  −  0  1  GTATATAC
RAV1AAT  321  325  +  0  1  CAACA
CIACADIANLELHC  341  350  +  0  1  CAAAGCCATC (95)
MYBPZM  351  357  +  0  1  TCCAACC
RYREPEATGMGY2  372  382  −  0  1  ACCATGCATAT (96)
RAV1AAT  384  388  +  0  1  CAACA
WBOXATNPR1  398  412  +  0  1  ATTGACATGCATATA (97)
BIHD1OS  400  404  −  0  1  TGTCA
RYREPEATGMGY2  401  411  +  0  1  GACATGCATAT (98)
SORLREP3AT  426  434  −  0  1  TGTATATAT
SP8BFIBSP8BIB  443  449  +  0  1  TACTATT
CATATGGMSAUR  451  456  +  0  1  CATATG
CATATGGMSAUR  451  456  −  0  1  CATATG
TATABOX4  457  463  −  0  1  TATATAA
SEF1MOTIF  461  469  +  0  1  ATATTTATA
TATABOX2  463  469  −  0  1  TATAAAT
ANAERO1CONSENSUS  481  487  −  0  1  AAACAAA
BIHD1OS  492  496  +  0  1  TGTCA
DPBFCOREDCDC3  507  513  −  0  1  ACACACG
GT1GMSCAM4  521  526  −  0  1  GAAAAA
MYB1AT  543  548  +  0  1  TAACCA
DPBFCOREDCDC3  563  569  +  0  1  ACACGCG
CGCGBOXAT  565  570  +  0  1  ACGCGT
CGCGBOXAT  565  570  −  0  1  ACGCGT
RAV1AAT  589  593  +  0  1  CAACA
MYCATERD  591  597  −  0  1  CATGTGT
DPBFCOREDCDC3  591  597  +  0  1  ACACATG
MYCATRD2  592  598  +  0  1  CACATGG
S1FBOXSORPS1L21  595  600  +  0  1  ATGGTA
CCA1ATLHCB1  603  610  −  0  1  AAAAATCT
-300ELEMENT  604  612  −  0  1  TGAAAAATC
GT1GMSCAM4  606  611  −  0  1  GAAAAA
WBOXATNPR1  607  621  −  0  1  TTTGACACATGAAAA (99)
MYCATRD2  610  616  −  0  1  CACATGA
MYCATERD  611  617  +  0  1  CATGTGT
DPBFCOREDCDC3  611  617  −  0  1  ACACATG
BIHD1OS  615  619  +  0  1  TGTCA
PREATPRODH  655  660  +  0  1  ACTCAT
SURECOREATSULTR11  671  677  +  0  1  GAGACGA
PALBOXAPC  703  709  −  0  1  CCGTCCG
GT1GMSCAM4  718  723  −  0  1  GAAAAA
CPBCSPOR  733  738  −  0  1  TATTAG
SEF1MOTIF  740  748  −  0  1  ATATTTATT
RAV1BAT  771  783  +  0  1  TACCACCTGTTGC (100)
RAV1AAT  778  782  −  0  1  CAACA
INTRONLOWER  792  797  +  0  1  TGCAGG
MYBPLANT  794  804  −  0  1  CACCAAACCTG (101)
SEBFCONSSTPR10A  802  808  −  0  1  CTGTCAC
BIHD1OS  803  807  −  0  1  TGTCA
RYREPEATGMGY2  814  824  +  0  1  AACATGCATTT (102)
L1BOXATPDF1  818  825  −  0  1  TAAATGCA
RAV1AAT  828  832  −  0  1  CAACA
MYB2AT  847  857  −  0  1  CGATTAACTGC (103)
RAV1AAT  867  871  −  0  1  CAACA
2SSEEDPROTBANAPA  875  883  +  0  1  CAAACACGA
DPBFCOREDCDC3  878  884  +  0  1  ACACGAG
SORLIP1AT  931  943  −  0  1  ACGACGGCCACCG (104)
HEXAMERATH4  937  942  +  0  1  CCGTCG
CGACGOSAMY3  938  942  −  0  1  CGACG
DPBFCOREDCDC3  959  965  +  0  1  ACACCAG
CCAATBOX1  967  971  +  0  1  CCAAT
SV40COREENHAN  968  975  −  0  1  GTGGATTG
RAV1AAT  980  984  +  0  1  CAACA
CGCGBOXAT  986  991  +  0  1  CCGCGC
CGCGBOXAT  986  991  −  0  1  GCGCGG
WBOXNTCHN48  987  1001  −  0  1  ACTGACCGAGGCGCG (105)
MYB2AT  997  1007  −  0  1  TCTATAACTGA (106)
SORLIP1AT  1009  1021  −  0  1  CAGAAGGCCACGC (107)
ANAERO1CONSENSUS  1022  1028  +  0  1  AAACAAA
AACACOREOSGLUB1  1023  1029  +  0  1  AACAAAC
CATATGGMSAUR  1033  1038  +  0  1  CATATG
CATATGGMSAUR  1033  1038  −  0  1  CATATG
MYCATERD  1055  1061  −  0  1  CATGTGT
DPBFCOREDCDC3  1055  1061  +  0  1  ACACATG
RYREPEATGMGY2  1056  1066  +  0  1  CACATGCATCC (108)
MYCATRD2  1056  1062  +  0  1  CACATGC
DPBFCOREDCDC3  1085  1091  −  0  1  ACACAAG
IBOXCORE  1106  1112  +  0  1  GATAACC
SEF3MOTIFGM  1109  1114  +  0  1  AACCCA
SORLIP1AT  1110  1122  +  0  1  ACCCAGGCCACAT (109)
CGCGBOXAT  1130  1135  +  0  1  CCGCGC
CGCGBOXAT  1130  1135  −  0  1  GCGCGG
CGCGBOXAT  1135  1140  +  0  1  CCGCGC
CGCGBOXAT  1135  1140  −  0  1  GCGCGG
GCCCORE  1138  1144  +  0  1  CGCCGCC
SEF3MOTIFGM  1156  1161  +  0  1  AACCCA
ACGTOSGLUB1  1181  1193  −  0  1  ACGTACGTGCAAG (110)
CGCGBOXAT  1198  1203  +  0  1  GCGCGC
CGCGBOXAT  1198  1203  −  0  1  GCGCGC
MYBCOREATCYCB1  1207  1211  −  0  1  AACGG
MYBCOREATCYCB1  1244  1248  −  0  1  AACGG
SORLIP1AT  1256  1268  +  0  1  GAGTGCGCCACGC (111)
LTRE1HVBLT49  1268  1273  +  0  1  CCGAAA
ASF1MOTIFCAMV  1280  1292  +  0  1  CGAGCTGACGAGC (112)
SORLIP1AT  1294  1306  +  0  1  CTAGACGCCACCG (113)
CGCGBOXAT  1311  1316  +  0  1  GCGCGG
CGCGBOXAT  1311  1316  −  0  1  CCGCGC
SORLIP1AT  1316  1328  −  0  1  TGCCTTGCCACGC (114)
SURECOREATSULTR11  1340  1346  −  0  1  GAGACCC
ASF1MOTIFCAMV  1349  1361  +  0  1  ATAGCTGACGAGG (115)
PALBOXAPC  1429  1435  +  0  1  CCGTCCC
INTRONLOWER  1434  1439  −  0  1  TGCAGG
INTRONLOWER  1441  1446  +  0  1  TGCAGG
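
The coordinate and strand conventions of Tables 9 and 10 can be illustrated with a minimal sketch that scans a sequence for motifs on both strands. It is not the PLACE or Genomatix algorithm: the motifs are literal strings copied from the first rows of Table 10 rather than PLACE consensus patterns, and the function names are hypothetical. The test sequence is the 3′ portion of the p-MAWS45_forward primer from Table 8, on the assumption that the leading ATTTAAAT is an added SwaI site and that the remaining 14 nt are the first bases of p-MAWS45.

```python
# Literal motif strings copied from the first rows of Table 10.
# These are the matched sequences as reported, not PLACE consensus patterns.
MOTIFS = {
    "RAV1AAT": "CAACA",
    "TATCCAOSAMY": "TATCCAA",
    "MYBST1": "TGGATAA",
    "SREATMSD": "ATTATCC",
    "IBOXCORE": "GATAATC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(haystack, needle):
    """Yield 0-based start indices of every (possibly overlapping) match."""
    i = haystack.find(needle)
    while i != -1:
        yield i
        i = haystack.find(needle, i + 1)


def scan(seq, motifs):
    """Return (name, start, end, strand, motif) hits with 1-based inclusive
    coordinates on the plus strand, sorted by position."""
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        for i in find_all(seq, motif):
            hits.append((name, i + 1, i + len(motif), "+", motif))
        # A minus-strand hit appears on the plus strand as the reverse complement.
        for i in find_all(seq, reverse_complement(motif)):
            hits.append((name, i + 1, i + len(motif), "-", motif))
    return sorted(hits, key=lambda h: (h[1], h[2]))


# ASSUMPTION: the first 8 nt of the primer (ATTTAAAT) are a SwaI tail,
# so the promoter-specific part starts at index 8.
FIVE_PRIME_END = "atttaaatgtgttggataatct"[8:]

if __name__ == "__main__":
    for hit in scan(FIVE_PRIME_END, MOTIFS):
        print(*hit)
```

Under that assumption the five hits fall at the same positions and strands as the first five rows of Table 10. A palindromic motif such as CATATG is reported once per strand, which matches the paired "+" and "−" rows in the tables.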

(83) Binary Vector Construction for Maize Transformation to Evaluate the Function of p-MAWS42 and p-MAWS45

(84) The 1008 bp promoter fragment of p-MAWS42 was amplified by PCR, incorporating a SwaI restriction enzyme site at its 5′ end and a BsiWI site at its 3′ end. The resulting fragment was digested and ligated into the SwaI- and BsiWI-digested BPS basic binary vector CB1006. Plasmid CB1006 is a plant transformation vector that comprises a plant selectable marker expression cassette (p-Ubi::c-ZmAHASL2::t-NOS) as well as a promoter evaluation cassette that consists of a multiple cloning site for insertion of putative promoters via SwaI and BsiWI sites, the rice MET1-1 intron to supply intron-mediated enhancement in monocot cells, the GUS reporter gene, and the NOS terminator. The resulting binary vector comprising the p-MAWS42::i-MET1::GUS::t-NOS expression cassette was named RTP1052 and was used to evaluate the expression pattern driven by the p-MAWS42 promoter. FIG. 20 is a diagram of RTP1052. The sequence of the binary vector RTP1052 is shown in FIGS. 21A, 21B, 21C, 21D, 21E, 21F, and 21G, combined.

(85) The 1492 bp promoter fragment of p-MAWS45 was amplified by PCR, incorporating a SwaI restriction enzyme site at its 5′ end and a BsiWI site at its 3′ end. The resulting fragment was digested and ligated into the SwaI- and BsiWI-digested BPS basic binary vector CB1006. Plasmid CB1006 is a plant transformation vector that comprises a plant selectable marker expression cassette (p-Ubi::c-ZmAHASL2::t-NOS) as well as a promoter evaluation cassette that consists of a multiple cloning site for insertion of putative promoters via SwaI and BsiWI sites, the rice MET1-1 intron to supply intron-mediated enhancement in monocot cells, the GUS reporter gene, and the NOS terminator. The resulting binary vector comprising the p-MAWS45::i-MET1::GUS::t-NOS expression cassette was named RTP1057 and was used to evaluate the expression pattern driven by the p-MAWS45 promoter. FIG. 22 is a diagram of RTP1057. The sequence of the binary vector RTP1057 is shown in FIGS. 23A, 23B, 23C, 23D, 23E, 23F, and 23G, combined.
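
The cloning strategy of paragraphs (84) and (85), a SwaI site at the 5′ end and a BsiWI site at the 3′ end of each promoter fragment, can be sketched as follows. The recognition sequences ATTTAAAT (SwaI) and CGTACG (BsiWI) are the standard ones. The tailed primers actually used for RTP1052 and RTP1057 are not given in the source, so the functions `add_site` and `tailed_pair` and their output are hypothetical illustrations only; any extra 5′ bases that might be added to aid digestion are not modelled.

```python
SWAI_SITE = "ATTTAAAT"   # SwaI recognition sequence
BSIWI_SITE = "CGTACG"    # BsiWI recognition sequence


def add_site(primer, site):
    """Prepend a restriction site to a primer (5' end) unless the primer
    already starts with that site. Hypothetical helper for illustration."""
    primer = primer.upper()
    return primer if primer.startswith(site) else site + primer


def tailed_pair(forward, reverse):
    """Return (forward, reverse) primers carrying SwaI and BsiWI tails.

    The reverse primer anneals to the top strand, so a tail on its 5' end
    ends up at the 3' end of the amplified promoter fragment.
    """
    return add_site(forward, SWAI_SITE), add_site(reverse, BSIWI_SITE)


if __name__ == "__main__":
    # Gene-specific primer sequences copied from Table 8.
    fwd, rev = tailed_pair("taactcatatccggttagata", "gtcgtcgccaaataaaaacctacc")
    print("p-MAWS42 forward with SwaI tail: ", fwd)
    print("p-MAWS42 reverse with BsiWI tail:", rev)
```

Both recognition sequences are palindromic, so the same string can be used regardless of which strand the primer represents. The p-MAWS45_forward primer in Table 8 already begins with ATTTAAAT, so `add_site` leaves it unchanged.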

(86) Promoter Evaluation in Transgenic Maize with RTP1052 or RTP1057

(87) The expression patterns and levels driven by promoters p-MAWS42 and p-MAWS45 were measured by GUS histochemical analysis following an established protocol (Jefferson 1987). Maize transformation was conducted using an Agrobacterium-mediated transformation system. Ten and five single-copy events were chosen for the promoter analysis in T0 and T1 plants, respectively. GUS expression was measured at various developmental stages:

(88) 1) Roots and leaves at 5-leaf stage

(89) 2) Stem at V-7 stage

(90) 3) Leaves, husk and silk at flowering stage (first emergence of silk)

(91) 4) Spikelets/Tassel (at pollination)

(92) 5) Ear or Kernels at 5, 10, 15, 20, and 25 days after pollination (DAP)

(93) The results indicated that both promoter p-MAWS42 in RTP1052 and promoter p-MAWS45 in RTP1057 drove expression specifically in pollen and in whole seeds (FIGS. 24 and 25).

(94) TABLE-US-00013 TABLE 11 Summary of tested tissues and relative expression intensities for p-MAWS42
Tissues: Leaf, Root, Stem, husk, silk, Spikelets/Tassel/pollen, un-pollinated cob, pollinated cob, embryo, endosperm
Stages tested:
seedling (5-leaf): − −
V-7:
Flowering (emergence of silk): − − − −
pollination: +
5 DAP: +
10 DAP: ++ +
15 DAP: +++ ++
20 DAP: +++ ++
25 DAP: +++ ++
48 hrs after imbibition: ++++ +++
72 hrs after imbibition: ++++ +++
1 week germination: − −
− = no expression, + = weak expression, ++ = medium expression, +++ = strong expression, ++++ = very strong expression

(95) TABLE-US-00014 TABLE 12 Summary of tested tissues and relative expression intensities for p-MAWS45
Tissues: Leaf, Root, Stem, husk, silk, Spikelets/Tassel/pollen, un-pollinated cob, pollinated cob, embryo, endosperm
Stages tested:
seedling (5-leaf): − −
V-7:
Flowering (emergence of silk): − − − −
pollination: +
5 DAP: +
10 DAP: + ++++
15 DAP: ++ ++++
20 DAP: ++ ++++
25 DAP: ++ ++++
48 hrs after imbibition: ++ +++
72 hrs after imbibition: ++ +++
1 week germination: − −
− = no expression, + = weak expression, ++ = medium expression, +++ = strong expression, ++++ = very strong expression

Example 3

(96) The sequence of the pKG86 promoter (SEQ ID NO: 1) was searched for short open reading frames which may confer allergenicity or toxicity, using a database comprising allergenic and toxic peptides and polypeptides. Short open reading frames were identified showing homology to peptides or polypeptides comprised by said database. To avoid expression of peptides which may be toxic or allergenic, the sequence of pKG86 was modified. The resulting promoters pKG86_12A (SEQ ID NO: 129), pKG86_14A (SEQ ID NO: 130) and pKG86_15A (SEQ ID NO: 131) were operably linked to a reporter gene and transformed into Zea mays for expression analysis.

CPC classification: C12N15/8234; C12N15/113 (CHEMISTRY; METALLURGY)

International classification: C12N15/82; C12N15/113 (CHEMISTRY; METALLURGY)