CRISPR-Cas complex

11760985 · 2023-09-19

US classification

  • 435/197

Abstract

The invention relates to the field of genetic engineering tools, methods and techniques for gene or genome editing. Specifically, the invention concerns isolated polypeptides having nuclease activity, host cells and expression vectors comprising nucleic acids encoding said polypeptides, as well as methods of cleaving and editing target nucleic acids in a sequence-specific manner. The polypeptides, nucleic acids, expression vectors, host cells and methods of the present invention have application in many fields of biotechnology, including, for example, synthetic biology and gene therapy.

Claims

1. A CRISPR-Cas complex comprising: (a) a polypeptide comprising SEQ ID NO: 1 or a variant thereof which has at least 98% identity to the polypeptide of SEQ ID NO: 1, wherein the polypeptide comprises a RuvC-like domain and does not comprise an HNH domain and has nuclease activity; and (b) an engineered guide RNA comprising a sequence substantially complementary to a target nucleic acid sequence.

2. The complex of claim 1, wherein an additional protein domain is fused to the N- or C-terminus of the polypeptide.

3. The complex of claim 2, wherein the additional protein domain has nucleic acid or chromatin modifying, transcription activating or transcription repressing activity.

4. The complex of claim 1, wherein the polypeptide comprises a zinc finger-domain having a metal-binding site.

5. The complex of claim 1, wherein the polypeptide comprises an amino acid sequence motif comprising residues 783-794, 784-794, 785-794, 786-794, 787-794, 788-794, 789-794, 783-793, 783-792, 783-791, 783-790, 783-789, 783-788, 784-793, 785-792, or 786-790 of SEQ ID NO: 1.

Description

DETAILED DESCRIPTION

(1) The protein of amino acid sequence SEQ ID NO: 1 is a large protein (about 1300 amino acids) that contains an RuvC-like nuclease domain homologous to the respective domains of Cas9 and transposable element ORF-B, along with an arginine-rich region similar to that in Cas9 and a zinc finger (absent in Cas9 but shared with ORF-B), but lacks the HNH nuclease domain that is present in all Cas9 proteins.

(2) The invention will now be described in detail with reference to the examples and to the drawings in which:

(3) FIG. 1 shows the domain structure of the novel CRISPR-Cas nuclease, Cpf1. Three RuvC nuclease domains, a zinc finger and an arginine-rich domain that allows for interaction with RNA guide and DNA target are shown.

(4) FIG. 2 shows the results of an in silico analysis of the conserved Protospacer Adjacent Motif (PAM). Panel A shows a Weblogo based on 5′ flanks of protospacers depicted in Table 1. Panel B shows a Weblogo based on 3′ flanks of protospacers depicted in Table 1.

(5) FIGS. 3A-3C show the results of a multiple alignment of the Cpf1 protein family (SEQ ID NOS 32, 33, 36, 34 and 35, respectively, in order of appearance). Each sequence is labelled with GenBank Identifier (GI) number and systematic name of an organism. Predicted secondary structure (SS) is shown by shading. Active site residues of RuvC-like domain(s) are shown as bold and double underlined. Potential bridge helix is shown by shading and with single underline. The amino acid sequence FQIYN (SEQ ID NO: 2) is also indicated in bold, by shading and dotted underline.

EXAMPLE 1—NOVEL NUCLEASES FOR GENE EDITING

(6) Specific examples are (1) CRISPR-associated Cpf1 from the marine bacterium Francisella novicida (Fn-Cpf1), and (2) CRISPR-associated Cpf1 from the archaeon Methanomethylophilus alvus strain Mx1201 (Mal-Cpf1) that resides in the human gut.

(7) Without the inventors wishing to be bound by any particular theory, Cpf1 recognises the crRNA in a sequence-specific manner, after which the double-stranded RNA segment is cleaved, eventually forming an effector complex consisting of Cpf1 and a single crRNA guide. Cpf1 may operate as a dimer, with the RuvC-like domains of each of the two subunits cleaving individual DNA strands. Alternatively, Cpf1 may contain more than one nuclease domain, which permits cleavage of both DNA strands. Alternatively, one or more RuvC domains of Cpf1 may exhibit unusual flexibility that allows for cleavage of both strands.

(8) The following examples were performed in parallel for the bacterial Fn-Cpf1 and archaeal Mal-Cpf1 protein variants:

(9) The entire CRISPR locus, including the cas operon (cpf1-cas4-cas1-cas2), leader region, CRISPR array, and flanking regions (approximately 10 kb), is cloned in a low-copy vector (e.g. pACYC184) in an E. coli K12 strain. No details are known about the maturation of the guide, which may be similar to that of Cas9 (tracrRNA/RNaseIII), or may be similar to that of Cascade (Cas6-like ribonuclease, although that is not part of cpf1 operons), or may be unique. Further detailed materials and methods are provided in Sapranauskas et al., 2011, Nucleic Acids Res. 39:9275-9282.

(10) Standard procedures were used to optimize the chances of functional production of the selected Cpf1 proteins in E. coli: (i) performing codon harmonization design to adjust the cpf1 nucleotide sequences (see Angov et al., 2008, PLoS One 3, e2189); (ii) including an N-terminal or C-terminal strepII tag to allow for affinity purification; (iii) cloning the synthetic gene in a T7 expression vector (e.g. pET24d) and transforming the plasmid to a non-production strain of E. coli (e.g. JM109, lacking the T7 RNA polymerase gene); (iv) transferring the plasmid via a second transformation to a production strain of E. coli (e.g. BL21(DE3), containing the T7 RNA polymerase gene under control of a rhamnose promoter, which allows for accurate tuning of expression); (v) varying expression conditions (medium, inducer concentration, induction time); (vi) using the optimal conditions for liter-scale cultivation, after which cells are harvested and mechanically disrupted to obtain cell-free extract (small volumes by sonication; large volumes by French press); (vii) separating membrane and soluble fractions and performing affinity purification using streptactin resin; (viii) testing relevant fractions by SDS-PAGE and storing the pure protein for subsequent analyses.

(11) In addition, the predicted crRNA gene is sequenced, or a single-guide RNA (sgRNA) gene is made, e.g. by adding 4-nucleotide synthetic loops (Jinek et al., 2012, Science 337: 816-821); the RNA genes reside either on the same plasmid as the cpf1 gene or on a separate plasmid.

(12) Additionally, a catalytically inactive Cpf1 mutant is made (RuvC active site contains conserved glutamate (E) as well as GID motif).

(13) Additionally, a catalytically inactive Cpf1 mutant is made (RuvC active site contains conserved glutamate (E) as well as SID motif).

(14) Also, N-terminal or C-terminal fusions are made of the Cpf1 mutant with the FokI nuclease domain, using different connecting linkers (as described for Cas9; see Guilinger et al., 2014, Nat. Biotechnol. 32: 577-82).

EXAMPLE 2—BIOCHEMICAL CHARACTERIZATION OF CPF1 NUCLEASES

(15) These experiments characterize guide surveillance and target cleavage. The CRISPR system is an adaptive immunity system in bacteria and archaea. The CRISPR arrays consist of identical repeats (e.g. 30 bp) and variable spacers (e.g. 35 bp). The adaptive nature of the CRISPR system relies on regular acquisition of new spacers, often corresponding to fragments (protospacers) derived from viruses. Acquisition generally depends on the selection of a protospacer based on the presence of a protospacer adjacent motif (PAM). The presence of this motif is crucial for the eventual interference by the CRISPR-associated effector complex (e.g. Cas9) with its crRNA guide. The PAM allows for self versus non-self discrimination: the potential target sequences (i.e. complementary to the crRNA guide sequence) reside both on the host's genome (the self CRISPR array) and on the invader's genome (the non-self protospacer). The presence of the protospacer in the invader DNA triggers the effector complex to bind it in a step-wise manner: when perfect base pairing occurs with the sequence of the protospacer immediately adjacent to the PAM (the so-called seed sequence), base pairing proceeds like a zipper, eventually bringing Cas9 into a state in which it catalyses cleavage of the target DNA strands (see Jinek et al., 2012, Science 337: 816-821; also Gasiunas et al., 2012, PNAS 109: E2579-E2586).
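The repeat-spacer architecture of a CRISPR array described above can be illustrated with a minimal Python sketch. The repeat and spacer sequences below are hypothetical placeholders (they are not taken from any cpf1 locus), and the function name `extract_spacers` is introduced here for illustration only; the sketch assumes perfectly identical repeats.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be a non-empty sequence")
    parts = array_seq.upper().split(repeat.upper())
    # parts[0] is the leader-side flank and parts[-1] the trailing flank;
    # everything in between is a spacer.
    return [p for p in parts[1:-1] if p]


# Hypothetical toy array: leader flank, three spacers, trailing flank.
REPEAT = "GGCCGGCC"
SPACERS = ["ATTACGATTA", "TCATGACTGA", "CAATGTACAT"]
ARRAY = "TTTT" + "".join(REPEAT + s for s in SPACERS) + REPEAT + "AAAA"

print(extract_spacers(ARRAY, REPEAT))
```

Real arrays may carry degenerate terminal repeats, in which case an exact string split would need to be replaced by an approximate match.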

(16) In silico analysis of the Cpf1-associated PAM was performed by BLAST analysis of the CRISPR spacers of the cpf1 loci. BLAST analysis of some spacers shows several homologous sequences (90-100% identity) (Table 1). The most promising hits concern identical sequences of virus genes in general, and genes of prophages in particular. Prophages are derived from lysogenic viruses, the genomes of which have integrated in the genome of bacteria. As is the case with eukaryotic viruses, the host range of prokaryotic viruses is often rather limited; hence, when the matching prophage is found in a bacterium that is closely related to the bacterium that has the corresponding spacer sequence in its CRISPR array, this gives some confidence that it is a real hit. In other words, it may well be that the prophage resembles a virus that has attempted to infect the CRISPR-containing bacterium, but that the invasion has resulted in spacer acquisition and virus immunity of the latter bacterium.

(17) TABLE-US-00001 TABLE 1 BLAST results with FnU112 cpf1-associated CRISPR spacers as query sequences. The nucleotide sequence of both spacer (top) and protospacer are shaded; the 5′ and 3′ flanks of the protospacers are unshaded.

Tool: CRISPR Target (bioanalysis.otago.ac.nz/CRISPRTarget/). Query: Entire CRISPR array from Francisella novicida sub species. Target database: Genbank-NT. Gap open -10, Extend -2; Nucleotide match 1, mismatch -1; E-value 1; Word size 7; Cutoff score 20; 3′ end flanking protospacer 8 bp; 5′ end flanking protospacer 8 bp.

Columns: Fn sub species, Spacer# | Host of prophage, target gene accession number | Alignment of Fn sub species spacer with protospacer (plus 8 nt flanks on both sides) | SEQ ID NO:

Francisella novicida U112 #1 | Francisella novicida 3523, hypo prot AEE26301.1
  spacer       5′ AGATTAAAAGGTAATTCTATCTTGTTGAG                      SEQ ID NO: 21
  match           ||||||||||||||||||||o||||||||
  protospacer  5′ ATAATTTAAGATTAAAAGGTAATTCTATTTTGTTGAGATCTGAGC      SEQ ID NO: 22

Francisella novicida U112 #2 | Francisella novicida 3523, intergenic sequence in prophage
  spacer       5′ TAGCGATTTATGAAGGTCATTTTTTT                         SEQ ID NO: 23
  match           ||||||||||||||||||||
  protospacer  5′ CTAAATTATAGCGATTTATGAAGGTCATTTTTTTAAAAAGTT         SEQ ID NO: 24

Francisella novicida Fx1 #1 | Francisella novicida 3523, hypo prot AEE26295.1, “phage major tail tube protein”
  spacer       5′ ATGGATTATTACTTAACTGGAGTGTTTAC                      SEQ ID NO: 25
  match           ||||||||||||||||||||o|||||||o|||
  protospacer  5′ AATGTTCAATGGATTATTACTTAATTGGAGTGTCTACGTCGATGG      SEQ ID NO: 26

Francisella novicida FTG #1 | Francisella novicida 3523, hypo prot YP_005824059.1
  spacer       5′ GCCACAAATACTACAAAAAATAACTTAA                       SEQ ID NO: 27
  match           ||oo||||||||||||||||||
  protospacer  5′ ATTTTTTGGCTCCAAATACTACAAAAAATAACTTAAACTTTGAA       SEQ ID NO: 28

Francisella novicida GA99-3549 #1 | Francisella novicida 3523, hypoprot FN3523_1009, “baseplate_J”
  spacer       5′ ATTGTCAAAACATAAGCAGCTGCTTCAAATAT                   SEQ ID NO: 29
  match           |o|||o|oo||||||||||o|||||||||||
  protospacer  5′ GGTCTTTTACTGTTATTACATAAGCAGCCGCTTCAAATATCTTAGCAA   SEQ ID NO: 30
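The flank analysis underlying a sequence logo such as that of FIG. 2, Panel A amounts to counting bases position by position across the 5′ flanks. A minimal Python sketch is given below. The five 8-nt flanks are transcribed from the protospacer entries of Table 1 (the first 8 nucleotides of SEQ ID NOs: 22, 24, 26, 28 and 30); the function name `position_counts` is illustrative only, and this simple count is not the Weblogo procedure itself.

```python
from collections import Counter

# 5' flanks (8 nt immediately upstream of each protospacer), transcribed
# from Table 1; any transcription error here would carry into the counts.
FIVE_PRIME_FLANKS = [
    "ATAATTTA",  # U112 #1
    "CTAAATTA",  # U112 #2
    "AATGTTCA",  # Fx1 #1
    "ATTTTTTG",  # FTG #1
    "GGTCTTTT",  # GA99-3549 #1
]


def position_counts(flanks: list[str]) -> list[Counter]:
    """Count bases at each flank position (index 0 is furthest from the protospacer)."""
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("all flanks must have the same length")
    return [Counter(column) for column in zip(*flanks)]


counts = position_counts(FIVE_PRIME_FLANKS)
# Report positions relative to the protospacer: -8 ... -1.
for offset, column in zip(range(-len(counts), 0), counts):
    print(offset, dict(column))
```

With only five flanks the counts are indicative at best; a larger set of protospacer hits would be needed for a statistically meaningful motif.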

(18) Analysis of the sequences flanking the protospacers in the prophage genes resulted in a T-rich conserved motif; interestingly, this motif does not reside downstream of the protospacer (as in the Cas9 system), but rather upstream. Though not wishing to be bound by particular theory, the inventors find that Cpf1 of the invention requires a PAM-like motif (3-4 nucleotides) for binding a target DNA molecule that is complementary to the guide, has a seed sequence (8-10 nucleotides) in which no mismatches are allowed, and has a single nuclease site that allows for nicking of the base paired target DNA strand.
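The targeting requirements stated above (an upstream PAM-like motif and a mismatch-free seed adjacent to it) can be expressed as a simple check, sketched below in Python. Several points are assumptions made for illustration and are not specified in the text: the PAM pattern `TTN` (the text only states a T-rich motif of 3-4 nucleotides), a seed length of 8 taken from the lower end of the stated 8-10 range, and the convention of Table 1 that spacer and protospacer are written in the same orientation, so that identity rather than complementarity is compared. The function names are hypothetical.

```python
def matches_pam(upstream_flank: str, pam: str) -> bool:
    """Compare `pam` with the 3' end of the upstream flank; 'N' matches any base."""
    if len(upstream_flank) < len(pam):
        return False
    tail = upstream_flank[-len(pam):]
    return all(p == "N" or p == b for p, b in zip(pam, tail))


def assess_target(upstream_flank, protospacer, spacer, pam="TTN", seed_len=8):
    """Report PAM match, seed integrity and total mismatches for one candidate."""
    if len(protospacer) != len(spacer):
        raise ValueError("spacer and protospacer must have equal length")
    mismatches = [i for i, (s, p) in enumerate(zip(spacer, protospacer)) if s != p]
    return {
        "pam_ok": matches_pam(upstream_flank, pam),
        # Seed = the seed_len positions adjacent to the upstream PAM (assumption).
        "seed_ok": all(i >= seed_len for i in mismatches),
        "mismatches": len(mismatches),
    }


# Entries transcribed from Table 1 (U112 #1 and GA99-3549 #1).
u112 = assess_target("ATAATTTA",
                     "AGATTAAAAGGTAATTCTATTTTGTTGAG",
                     "AGATTAAAAGGTAATTCTATCTTGTTGAG")
ga99 = assess_target("GGTCTTTT",
                     "ACTGTTATTACATAAGCAGCCGCTTCAAATAT",
                     "ATTGTCAAAACATAAGCAGCTGCTTCAAATAT")
print(u112)
print(ga99)
```

Under these assumptions the U112 #1 hit passes both checks, whereas the GA99-3549 #1 hit carries mismatches inside the assumed seed window.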

(19) PAM motifs of Cpf1 and variants of the invention were also characterized using the approach of Jiang et al., 2013, Nat. Biotechnol. 31: 233-239. Two derivatives of E. coli BL21(DE3) were used, initially transformed either with a target plasmid or with a non-target plasmid; the two variant target plasmids have a common part (GFP marker, KmR marker, origin of replication) and a variable part with the target sequence (protospacer) and an associated degenerate PAM (5-8 variable nucleotides) either upstream or downstream of the protospacer. Next, this strain was transformed with a Cpf1-expression plasmid (which includes a design CRISPR with single-guide RNA (sgRNA) and a CmR marker); transformants were selected on plates with chloramphenicol (Cm) (not kanamycin (Km)) and screened for non-fluorescent colonies, indicating loss of the target plasmid. As the plasmids with the correct PAMs will be lost, deep sequencing was performed on appropriate PCR products of the entire pool of target plasmid, before and after transformation. The differences reveal the PAM (Bikard et al., 2013, Nucleic Acids Res. 41: 7429-7437).
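The comparison of the target-plasmid pool before and after transformation can be sketched as a depletion calculation, shown below in Python. The read counts are invented toy numbers, the threshold `max_ratio` is an arbitrary illustrative choice, and the function name `depleted_pams` is hypothetical; the sketch only shows the arithmetic, not the sequencing analysis actually used.

```python
def depleted_pams(before: dict[str, int], after: dict[str, int],
                  max_ratio: float = 0.1) -> dict[str, float]:
    """Return PAMs whose relative frequency dropped to <= max_ratio of its initial value."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    if total_before == 0 or total_after == 0:
        raise ValueError("both pools must contain reads")
    ratios = {}
    for pam, n_before in before.items():
        if n_before == 0:
            continue  # cannot judge depletion of a PAM absent from the input pool
        freq_before = n_before / total_before
        freq_after = after.get(pam, 0) / total_after
        ratio = freq_after / freq_before
        if ratio <= max_ratio:
            ratios[pam] = ratio
    # Most strongly depleted first.
    return dict(sorted(ratios.items(), key=lambda item: item[1]))


# Toy read counts per degenerate-PAM variant (hypothetical data).
reads_before = {"TTTA": 1000, "TTTC": 1000, "GGCA": 1000, "ACGT": 1000}
reads_after = {"TTTA": 10, "TTTC": 20, "GGCA": 980, "ACGT": 990}

hits = depleted_pams(reads_before, reads_after)
print(hits)
```

PAM variants that permit cleavage are lost together with their plasmid, so they appear as strongly depleted entries in the output.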

(20) PAM signatures were confirmed by in vitro characterization of cleavage activity of BsCas9/sgRNA; assays reveal optimal conditions (temperature, buffer/pH, salt, metals).

(21) Presence of a seed sequence in the PAM was established according to methods described by Jinek et al., 2012, Science 337: 816-821.

EXAMPLE 3—BACTERIAL ENGINEERING

(22) High-throughput engineering of the bacterial genome is performed with nuclease variants. Without wishing to be bound by particular theory, the inventors expect that Cpf1/guide complexes of the invention allow for specific targeting of genomic DNA. Multiplex targeting can be established by using a design CRISPR together with a matching crRNA.

(23) The experiments provide application of Cpf1 and variants of the invention. Cas9 is tested in parallel as a reference.

(24) Gene knock-in/knock-out (insertion/disruption of any sequence) is performed. The host strain E. coli K12 (LacZ+, GFP−) was engineered as follows: the gene encoding a variant of the Green Fluorescent Protein (GFPuv) is inserted in the lacZ gene, resulting in a clear phenotype (LacZ−, GFP+). The cpf1 gene was introduced on a plasmid (or derivatives of those plasmids), together with a fragment that allows for homologous recombination of the target sequence. A target (protospacer) sequence was selected, with an appropriate adjacently located PAM sequence; a corresponding guide was designed, consisting of the crRNA (with spacer complementary to target protospacer) and the crRNA gene (as adapted from the method described for Cas9 by Jiang et al. (2013a) RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31: 233-239).

(25) Gene expression silencing (using catalytically inactivated Cas9) was as described for the dCas9 derivative of Spy-Cas9 (Bikard et al., 2013, Nucleic Acids Res. 41: 7429-7437; Qi et al., 2013, Cell 152: 1173-1183), by binding at the promoter (RNA polymerase binding site) of the target gene, or of several target genes using a multiplex approach (using a design CRISPR).

(26) Gene expression activation was as above (silencing), by binding upstream of the binding site of RNA polymerase, with Cas9 fused to an activation domain (as has been described for Spy-Cas9) (Bikard et al., 2013, Nucleic Acids Res. 41: 7429-7437).

(27) Fusions of inactivated Cpf1 and the FokI nuclease domain (described in Example 1) were compared with an active Cpf1 in different experimental set-ups. This required two simultaneous interactions of guides and targets, which results in a major improvement of cleavage at the desired site.

EXAMPLE 4—HUMAN STEM CELL ENGINEERING

(28) Targeted editing of disease-causing genetic mutations would be an elegant and effective treatment for genetic disease. Recently discovered gene editing systems, such as Cas9, allow the specific targeting of disease-causing mutations in the genome, and can be used to functionally repair or permanently disable mutated genes. The efficiency of gene editing systems has been demonstrated in a laboratory setting, and these systems are now routinely used in genome editing of a wide variety of cell types from many different species, including human. However, despite the success of these systems in the research setting, clinical application of gene editing systems is hampered by the lack of a suitable delivery system to introduce gene-editing technologies into patient cells in a safe, transient and efficient manner. Several labs are working on the development of recombinant viral vectors which can be used to deliver gene editing systems into patient cells, but prolonged expression of, for example, CRISPR/Cas9 from such vectors will increase the likelihood of off-target effects and is therefore not ideal. Intracellular delivery of recombinant gene editing protein and synthetic CRISPR RNA would be an effective, non-integrating and transient method for the application of gene editing technology in patient cells.

(29) Recently a novel method has been developed that allows the transduction of native proteins into virtually any cell type (D'Astolfo et al., 2015, Cell, 161: 674-690). This technology, termed iTOP, for induced Transduction by Osmocytosis and Propanebetaine, is based on a combination of small molecule compounds, which trigger the uptake and intracellular release of native protein. iTOP is highly efficient, routinely achieving transduction efficiencies of >90% of cells, and works on a wide variety of primary cell types. It has been demonstrated that iTOP-mediated transduction of recombinant Cas9 protein and in vitro transcribed sgRNA allows for highly efficient gene editing in difficult-to-transfect cell types including human stem cells. Upon iTOP-CRISPR/Cas9 transduction, >70% bi-allelic gene targeting has been reported in human ES cells without the need for drug-selection of transduced cells.

(30) Key advantages of iTOP over existing technologies are: (i) the ability to transduce primary (stem) cells with native protein at very high efficiency, (ii) the non-integrating, transient nature of protein mediated gene editing, ensuring safety and minimizing off-target effects, and (iii) the tight control of dosage and timing of the delivered protein. We have demonstrated that iTOP-CRISPR/Cas9 is an effective tool to modify a large variety of primary (patient) cell types. However, due to size and protein solubility issues, production of recombinant Cas9 is hampering broad-scale (clinical) adoption of this system. Cpf1 could solve these problems and pave the way for the development of novel therapies to treat genetic disease.

(31) The iTOP technology will be used to allow efficient intracellular delivery of Cpf1 into human stem cells. The advantage of iTOP is its highly flexible approach. First, NaCl-mediated hypertonicity induces intracellular uptake of protein via a process called macropinocytosis (D'Astolfo op. cit.). Second, a propanebetaine transduction compound (NDSB-201 or gamma-aminobutyric acid (GABA) or others) triggers the intracellular release of protein from the macropinosome vesicles. In addition to these compounds, osmoprotectants such as glycerol and glycine are added to help cells to cope with the NaCl-induced hypertonic stress. By varying the concentration of NaCl, the concentration and type of transduction compound and/or the concentration and type of osmoprotectants, the iTOP system can be adapted and optimised to meet the specific requirements of the cargo protein and/or the target cells. iTOP parameters were optimized to allow efficient gene editing of human embryonic stem cells (hESCs), targeting the endogenous WDR85 gene by Cpf1 (equipped with an N- or C-terminal nuclear localization signal (NLS)), as recently shown for Cas9.

(32) In the following sequence listing, the amino acid residues Glu Xaa Asp (single underlined) are the GID motif of an RuvC domain. Therefore in the SEQ ID NO: 1, the Xaa residue may be I.

(33) The amino acid residues Ile Asp Arg Gly Glu Arg (double underlined) include the IDR residues of an RuvC domain.

(34) The amino acid residues Phe Glu Asp (triple underlined) include the E residue making up part of the active site residues of an RuvC domain.
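Because the underlining referred to above does not survive in plain text, the annotated residues can instead be located programmatically. The Python sketch below converts a three-letter fragment of the sequence listing to one-letter code and reports the position of a motif. The fragment shown (residues 913-928 of SEQ ID NO: 1) and its start position are read off the numbering of the sequence listing; the helper names are hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",  # any naturally occurring amino acid
}


def to_one_letter(residues: str) -> str:
    """Convert space-separated three-letter codes; letter case is normalised first."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues.split())


def motif_position(fragment: str, start: int, motif: str):
    """Return the 1-based position of `motif` in the full sequence, or None."""
    index = to_one_letter(fragment).find(motif)
    return None if index < 0 else start + index


# Residues 913-928 of SEQ ID NO: 1, as given in the sequence listing.
FRAGMENT_START = 913
FRAGMENT = "Ile Leu Ser Ile Asp Arg Gly Glu Arg His Leu Ala Tyr Tyr Thr Leu"

print(to_one_letter(FRAGMENT))
print(motif_position(FRAGMENT, FRAGMENT_START, "IDRGER"))
```

The same helper can be applied to the full listing (start position 1) to locate the Glu Xaa Asp and Phe Glu Asp residues.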

EXAMPLE 5 MULTIPLE ALIGNMENT OF CPF1 PROTEINS

(35) FIGS. 3A-3C show the results of a multiple alignment of Cpf1 proteins. The alignment was built using the MUSCLE program and modified manually on the basis of local PSI-BLAST pairwise alignments and HHpred output. Each sequence is labelled with GenBank Identifier (GI) number and systematic name of an organism. The five sequences analysed in this work are marked by the respective numbers. Secondary structure (SS) was predicted by Jpred and is shown by shading. CONSENSUS was calculated for each alignment column by scaling the sum-of-pairs score within the column between those of a homogeneous column (the same residue in all aligned sequences) and a random column, with homogeneity cutoff 0.8. Active site residues of RuvC-like domain(s) are shown as bold and double underlined. Potential bridge helix is shown by shading and with single underline. The amino acid sequence FQIYN (SEQ ID NO: 2) is also indicated in bold, by shading and dotted underline.
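The column-wise consensus calculation described above can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the procedure actually used: pairs are scored by identity (1 for identical residues, 0 otherwise) rather than with a substitution matrix, the random column is modelled with a uniform background over 20 amino acids, gaps are ignored, and the toy alignment is invented. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def column_homogeneity(column, p_random_match: float = 1 / 20) -> float:
    """Scale the sum-of-pairs score between a random (0) and a homogeneous (1) column."""
    residues = [r for r in column if r != "-"]
    if len(residues) < 2:
        return 0.0
    pairs = list(combinations(residues, 2))
    observed = sum(1 for a, b in pairs if a == b)  # identity scoring (assumption)
    homogeneous = len(pairs)                       # every pair identical
    random_score = len(pairs) * p_random_match     # expected score of a random column
    return (observed - random_score) / (homogeneous - random_score)


def consensus(alignment: list[str], cutoff: float = 0.8) -> str:
    """Emit the majority residue where homogeneity >= cutoff, otherwise '.'."""
    line = []
    for column in zip(*alignment):
        if column_homogeneity(column) >= cutoff:
            residues = [r for r in column if r != "-"]
            line.append(Counter(residues).most_common(1)[0][0])
        else:
            line.append(".")
    return "".join(line)


# Invented toy alignment of five sequences around an FQIYN-like motif.
TOY_ALIGNMENT = ["FQIYN", "FQIYN", "FQLYN", "FQIYN", "FQIFN"]
print(consensus(TOY_ALIGNMENT))
```

With five sequences and identity scoring, a single deviating residue already lowers a column below the 0.8 cutoff; a substitution matrix would treat conservative replacements more leniently.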

(36) TABLE-US-00002 SEQUENCE LISTING
<110> Wageningen Universiteit
<120> Cpf1 Nuclease
<130> RAW/P223284GB
<160> 1
<170> PatentIn version 3.5
<210> 1
<211> 1304
<212> PRT
<213> Artificial sequence
<220>
<223> Cpf1
<220>
<221> misc_feature
<222> (439)..(439)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (504)..(504)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (521)..(521)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (539)..(539)
<223> Xaa can be any naturally occurring amino acid
<220>
<221> misc_feature
<222> (800)..(800)
<223> Xaa can be any naturally occurring amino acid
<400> 1
Met Ser Ile Tyr Gln Glu Phe Val Asn Lys Tyr Ser Leu Ser Lys Thr
1               5                   10                  15
Leu Arg Phe Glu Leu Ile Pro Gln Gly Lys Thr Leu Glu Asn Ile Lys
            20                  25                  30
Ala Arg Gly Leu Ile Leu Asp Asp Glu Lys Arg Ala Lys Asp Tyr Lys
        35                  40                  45
Lys Ala Lys Gln Ile Ile Asp Lys Tyr His Gln Phe Phe Ile Glu Glu
    50                  55                  60
Ile Leu Ser Ser Val Cys Ile Ser Glu Asp Leu Leu Gln Asn Tyr Ser
65                  70                  75                  80
Asp Val Tyr Phe Lys Leu Lys Lys Ser Asp Asp Asp Asn Leu Gln Lys
                85                  90                  95
Asp Phe Lys Ser Ala Lys Asp Thr Ile Lys Lys Gln Ile Ser Glu Tyr
            100                 105                 110
Ile Lys Asp Ser Glu Lys Phe Lys Asn Leu Phe Asn Gln Asn Leu Ile
        115                 120                 125
Asp Ala Lys Lys Gly Gln Glu Ser Asp Leu Ile Leu Trp Leu Lys Gln
    130                 135                 140
Ser Lys Asp Asn Gly Ile Glu Leu Phe Lys Ala Asn Ser Asp Ile Thr
145                 150                 155                 160
Asp Ile Asp Glu Ala Leu Glu Ile Ile Lys Ser Phe Lys Gly Trp Thr
                165                 170                 175
Thr Tyr Phe Lys Gly Phe His Glu Asn Arg Lys Asn Val Tyr Ser Ser
            180                 185                 190
Asp Asp Ile Pro Thr Ser Ile Ile Tyr Arg Ile Val Asp Asp Asn Leu
        195                 200                 205
Pro Lys Phe Leu Glu Asn Lys Ala Lys Tyr Glu Ser Leu Lys Asp Lys
    210                 215                 220
Ala Pro Glu Ala Ile Asn Tyr Glu Gln Ile Lys Lys Asp Leu Ala Glu
225                 230                 235                 240
Glu Leu Thr Phe Asp Ile Asp Tyr Lys Thr Ser Glu Val Asn Gln Arg
                245                 250                 255
Val Phe Ser Leu Asp Glu Val Phe Glu Ile Ala Asn Phe Asn Asn Tyr
            260                 265                 270
Leu Asn Gln Ser Gly Ile Thr Lys Phe Asn Thr Ile Ile Gly Gly Lys
        275                 280                 285
Phe Val Asn Gly Glu Asn Thr Lys Arg Lys Gly Ile Asn Glu Tyr Ile
    290                 295                 300
Asn Leu Tyr Ser Gln Gln Ile Asn Asp Lys Thr Leu Lys Lys Tyr Lys
305                 310                 315                 320
Met Ser Val Leu Phe Lys Gln Ile Leu Ser Asp Thr Glu Ser Lys Ser
                325                 330                 335
Phe Val Ile Asp Lys Leu Glu Asp Asp Ser Asp Val Val Thr Thr Met
            340                 345                 350
Gln Ser Phe Tyr Glu Gln Ile Ala Ala Phe Lys Thr Val Glu Glu Lys
        355                 360                 365
Ser Ile Lys Glu Thr Leu Ser Leu Leu Phe Asp Asp Leu Lys Ala Gln
    370                 375                 380
Lys Leu Asp Leu Ser Lys Ile Tyr Phe Lys Asn Asp Lys Ser Leu Thr
385                 390                 395                 400
Asp Leu Ser Gln Gln Val Phe Asp Asp Tyr Ser Val Ile Gly Thr Ala
                405                 410                 415
Val Leu Glu Tyr Ile Thr Gln Gln Val Ala Pro Lys Asn Leu Asp Asn
            420                 425                 430
Pro Ser Lys Lys Glu Gln Xaa Leu Ile Ala Lys Lys Thr Glu Lys Ala
        435                 440                 445
Lys Tyr Leu Ser Leu Glu Thr Ile Lys Leu Ala Leu Glu Glu Phe Asn
    450                 455                 460
Lys His Arg Asp Ile Asp Lys Gln Cys Arg Phe Glu Glu Ile Leu Ala
465                 470                 475                 480
Asn Phe Ala Ala Ile Pro Met Ile Phe Asp Glu Ile Ala Gln Asn Lys
                485                 490                 495
Asp Asn Leu Ala Gln Ile Ser Xaa Lys Tyr Gln Asn Gln Gly Lys Lys
            500                 505                 510
Asp Leu Leu Gln Ala Ser Ala Glu Xaa Asp Val Lys Ala Ile Lys Asp
        515                 520                 525
Leu Leu Asp Gln Thr Asn Asn Leu Leu His Xaa Leu Lys Ile Phe His
    530                 535                 540
Ile Ser Gln Ser Glu Asp Lys Ala Asn Ile Leu Asp Lys Asp Glu His
545                 550                 555                 560
Phe Tyr Leu Val Phe Glu Glu Cys Tyr Phe Glu Leu Ala Asn Ile Val
                565                 570                 575
Pro Leu Tyr Asn Lys Ile Arg Asn Tyr Ile Thr Gln Lys Pro Tyr Ser
            580                 585                 590
Asp Glu Lys Phe Lys Leu Asn Phe Glu Asn Ser Thr Leu Ala Asn Gly
        595                 600                 605
Trp Asp Lys Asn Lys Glu Pro Asp Asn Thr Ala Ile Leu Phe Ile Lys
    610                 615                 620
Asp Asp Lys Tyr Tyr Leu Gly Val Met Asn Lys Lys Asn Asn Lys Ile
625                 630                 635                 640
Phe Asp Asp Lys Ala Ile Lys Glu Asn Lys Gly Glu Gly Tyr Lys Lys
                645                 650                 655
Ile Val Tyr Lys Leu Leu Pro Gly Ala Asn Lys Met Leu Pro Lys Val
            660                 665                 670
Phe Phe Ser Ala Lys Ser Ile Lys Phe Tyr Asn Pro Ser Glu Asp Ile
        675                 680                 685
Leu Arg Ile Arg Asn His Ser Thr His Thr Lys Asn Gly Asn Pro Gln
    690                 695                 700
Lys Gly Tyr Glu Lys Phe Glu Phe Asn Ile Glu Asp Cys Arg Lys Phe
705                 710                 715                 720
Ile Asp Phe Tyr Lys Glu Ser Ile Ser Lys His Pro Glu Trp Lys Asp
                725                 730                 735
Phe Gly Phe Arg Phe Ser Asp Thr Gln Arg Tyr Asn Ser Ile Asp Glu
            740                 745                 750
Phe Tyr Arg Glu Val Glu Asn Gln Gly Tyr Lys Leu Thr Phe Glu Asn
        755                 760                 765
Ile Ser Glu Ser Tyr Ile Asp Ser Val Val Asn Gln Gly Lys Leu Tyr
    770                 775                 780
Leu Phe Gln Ile Tyr Asn Lys Asp Phe Ser Ala Tyr Ser Lys Gly Xaa
785                 790                 795                 800
Pro Asn Leu His Thr Leu Tyr Trp Lys Ala Leu Phe Asp Glu Arg Asn
                805                 810                 815
Leu Gln Asp Val Val Tyr Lys Leu Asn Gly Glu Ala Glu Leu Phe Tyr
            820                 825                 830
Arg Lys Gln Ser Ile Pro Lys Lys Ile Thr His Pro Ala Lys Glu Ala
        835                 840                 845
Ile Ala Asn Lys Asn Lys Asp Asn Pro Lys Lys Glu Ser Val Phe Glu
    850                 855                 860
Tyr Asp Leu Ile Lys Asp Lys Arg Phe Thr Glu Asp Lys Phe Phe Phe
865                 870                 875                 880
His Cys Pro Ile Thr Ile Asn Phe Lys Ser Ser Gly Ala Asn Lys Phe
                885                 890                 895
Asn Asp Glu Ile Asn Leu Leu Leu Lys Glu Lys Ala Asn Asp Val His
            900                 905                 910
Ile Leu Ser Ile Asp Arg Gly Glu Arg His Leu Ala Tyr Tyr Thr Leu
        915                 920                 925
Val Asp Gly Lys Gly Asn Ile Ile Lys Gln Asp Thr Phe Asn Ile Ile
    930                 935                 940
Gly Asn Asp Arg Met Lys Thr Asn Tyr His Asp Lys Leu Ala Ala Ile
945                 950                 955                 960
Glu Lys Asp Arg Asp Ser Ala Arg Lys Asp Trp Lys Lys Ile Asn Asn
                965                 970                 975
Ile Lys Glu Met Lys Glu Gly Tyr Leu Ser Gln Val Val His Glu Ile
            980                 985                 990
Ala Lys Leu Val Ile Glu Tyr Asn Ala Ile Val Val Phe Glu Asp Leu
        995                 1000                1005
Asn Phe Gly Phe Lys Arg Gly Arg Phe Lys Val Glu Lys Gln Val
    1010                1015                1020
Tyr Gln Lys Leu Glu Lys Met Leu Ile Glu Lys Leu Asn Tyr Leu
    1025                1030                1035
Val Phe Lys Asp Asn Glu Phe Asp Lys Thr Gly Gly Val Leu Arg
    1040                1045                1050
Ala Tyr Gln Leu Thr Ala Pro Phe Glu Thr Phe Lys Lys Met Gly
    1055                1060                1065
Lys Gln Thr Gly Ile Ile Tyr Tyr Val Pro Ala Gly Phe Thr Ser
    1070                1075                1080
Lys Ile Cys Pro Val Thr Gly Phe Val Asn Gln Leu Tyr Pro Lys
    1085                1090                1095
Tyr Glu Ser Val Ser Lys Ser Gln Glu Phe Phe Ser Lys Phe Asp
    1100                1105                1110
Lys Ile Cys Tyr Asn Leu Asp Lys Gly Tyr Phe Glu Phe Ser Phe
    1115                1120                1125
Asp Tyr Lys Asn Phe Gly Asp Lys Ala Ala Lys Gly Lys Trp Thr
    1130                1135                1140
Ile Ala Ser Phe Gly Ser Arg Leu Ile Asn Phe Arg Asn Ser Asp
    1145                1150                1155
Lys Asn His Asn Trp Asp Thr Arg Glu Val Tyr Pro Thr Lys Glu
    1160                1165                1170
Leu Glu Lys Leu Leu Lys Asp Tyr Ser Ile Glu Tyr Gly His Gly
    1175                1180                1185
Glu Cys Ile Lys Ala Ala Ile Cys Gly Glu Ser Asp Lys Lys Phe
    1190                1195                1200
Phe Ala Lys Leu Thr Ser Val Leu Asn Thr Ile Leu Gln Met Arg
    1205                1210                1215
Asn Ser Lys Thr Gly Thr Glu Leu Asp Tyr Leu Ile Ser Pro Val
    1220                1225                1230
Ala Asp Val Asn Gly Asn Phe Phe Asp Ser Arg Gln Ala Pro Lys
    1235                1240                1245
Asn Met Pro Gln Asp Ala Asp Ala Asn Gly Ala Tyr His Ile Gly
    1250                1255                1260
Leu Lys Gly Leu Met Leu Leu Asp Arg Ile Lys Asn Asn Gln Glu
    1265                1270                1275
Gly Lys Lys Leu Asn Leu Val Ile Lys Asn Glu Glu Tyr Phe Glu
    1280                1285                1290
Phe Val Gln Asn Arg Asn Asn Ser Ser Lys Ile
    1295                1300