ATTP MV4-DERIVED SITE-SPECIFIC RECOMBINATION AND ITS USE FOR INTEGRATION OF SEQUENCE OF INTEREST

20250354175 · 2025-11-20

    Abstract

    The present disclosure relates to a method for preparing a site-specific recombination polynucleotide molecule derived from the attP site of the bacteriophage mv4, and to a kit for such site-specific recombination. The kit can be used to transform prokaryotic hosts in order to integrate any polynucleotide sequence of interest.

    Claims

    1. A method for preparing a site-specific recombination polynucleotide molecule comprising the steps of: (a) selecting a DNA target site in a genome of a bacterial host cell having a sequence of BOB' wherein: B is 5'-X1-X1-X2-X3-X3-X3-X4-3' wherein at most 1 of the nucleic acids of B may be N; O is 5'-NNNNNNN-3'; and B' is 5'-X1-X5-X5-X5-X6-X7-X2-3' wherein at most 1 of the nucleic acids of B' may be N; wherein X1 to X7 and N have independently the following definitions: X1 is A or G or T; X2 is C or G or T; X3 is A or G; X4 is A or T; X5 is C or T; X6 is A or C or G; X7 is A or C or T; and N is A or C or G or T; (b) providing the site-specific recombination polynucleotide molecule having a sequence of COC' wherein: C is 5'-X1-X1-X2-X1-X3-X1-X4-3' wherein at most 1 of the nucleic acids of C may be N; O is 5'-NNNNNNN-3'; and C' is 5'-X1-X5-X5-X5-X6-X7-X5-3' wherein at most 1 of the nucleic acids of C' may be N; and wherein X1, X2, X3, X4, X5, X6, X7 and N are as defined previously; and wherein O of COC' is identical to O of BOB' of the bacterial host cell.
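Read as a pattern-matching rule, claim 1 can be checked mechanically. The Python sketch below is illustrative only and rests on stated assumptions: a 21-nt site is read 5'→3' as three consecutive 7-nt blocks (left arm, overlap O, right arm), and "at most 1 of the nucleic acids ... may be N" is interpreted as "at most one position of an arm may fall outside its degenerate symbol". The function names and the COC' sequence are hypothetical; only the attB sequence (SEQ ID No 6) comes from the document.

```python
# Degenerate symbols X1-X7 of claim 1; N (any base) is handled as a one-mismatch allowance.
SYMBOLS = {
    "X1": frozenset("AGT"),
    "X2": frozenset("CGT"),
    "X3": frozenset("AG"),
    "X4": frozenset("AT"),
    "X5": frozenset("CT"),
    "X6": frozenset("ACG"),
    "X7": frozenset("ACT"),
}

# Arm consensus sequences of claim 1, each written 5'->3'.
B_ARM = ("X1", "X1", "X2", "X3", "X3", "X3", "X4")
B_PRIME_ARM = ("X1", "X5", "X5", "X5", "X6", "X7", "X2")
C_ARM = ("X1", "X1", "X2", "X1", "X3", "X1", "X4")
C_PRIME_ARM = ("X1", "X5", "X5", "X5", "X6", "X7", "X5")

ARM_LEN = 7
OVERLAP_LEN = 7


def split_site(site: str) -> tuple[str, str, str]:
    """Split a 21-nt site into left arm, overlap O and right arm (assumed 7/7/7 layout)."""
    site = site.upper()
    if len(site) != 2 * ARM_LEN + OVERLAP_LEN or set(site) - set("ACGT"):
        raise ValueError("expected a 21-nt A/C/G/T sequence")
    return site[:ARM_LEN], site[ARM_LEN:ARM_LEN + OVERLAP_LEN], site[ARM_LEN + OVERLAP_LEN:]


def arm_mismatches(arm: str, consensus: tuple[str, ...]) -> int:
    """Number of arm positions whose base is not allowed by the consensus symbol."""
    return sum(base not in SYMBOLS[sym] for base, sym in zip(arm, consensus))


def fits(site: str, left: tuple[str, ...], right: tuple[str, ...]) -> bool:
    """True if each arm deviates from its consensus at no more than one position."""
    l_arm, _overlap, r_arm = split_site(site)
    return arm_mismatches(l_arm, left) <= 1 and arm_mismatches(r_arm, right) <= 1


def is_bob(site: str) -> bool:
    """Step (a): does a genomic 21-nt site fit the BOB' consensus?"""
    return fits(site, B_ARM, B_PRIME_ARM)


def is_matching_coc(coc: str, bob: str) -> bool:
    """Step (b): COC' fits its consensus and shares the overlap O of the chosen BOB'."""
    return (
        is_bob(bob)
        and fits(coc, C_ARM, C_PRIME_ARM)
        and split_site(coc)[1] == split_site(bob)[1]
    )


ATTB_WT = "TTCAAATCCTGTACTCTCCTT"  # SEQ ID No 6
HYPOTHETICAL_COC = "AATGAAA" + "CCTGTAC" + "TCTCCTC"  # illustrative only, not a disclosed sequence

print(is_bob(ATTB_WT))                             # True
print(is_matching_coc(HYPOTHETICAL_COC, ATTB_WT))  # True
```

Because the overlap O must be copied from the selected genomic site, the sketch treats O as unconstrained in the consensus check and enforces identity only between the two sites.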

    2. The method for preparing a site-specific recombination polynucleotide molecule according to claim 1, wherein B' is 5'-X4-X5-X5-X5-X6-X7-X2-3' wherein at most 1 of the nucleic acids of B' may be N; C is 5'-X1-X1-X8-X1-X3-X1-X4-3' wherein at most 1 of the nucleic acids of C may be N; and C' is 5'-X4-X5-X5-X5-X9-X7-X5-3' wherein at most 1 of the nucleic acids of C' may be N; and wherein X8 is T or G and X9 is A or C.
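Claim 2 narrows some symbols of claim 1 (the left arm B is unchanged). The self-contained Python sketch below compares both claims arm by arm; it assumes that "at most 1 of the nucleic acids ... may be N" means at most one out-of-consensus position per arm, and its names (`CLAIM1`, `CLAIM2`, `mismatches`, `arm_allowed`) are hypothetical. It shows, for example, that a G at the first position of the right arm of the target site is allowed by claim 1 (X1) but counts as a deviation under claim 2 (X4).

```python
# Degenerate symbols: X1-X7 as in claim 1, plus X8 and X9 introduced in claim 2.
SYMBOLS = {
    "X1": frozenset("AGT"), "X2": frozenset("CGT"), "X3": frozenset("AG"),
    "X4": frozenset("AT"), "X5": frozenset("CT"), "X6": frozenset("ACG"),
    "X7": frozenset("ACT"), "X8": frozenset("GT"), "X9": frozenset("AC"),
}

# 5'->3' consensus of each arm under claim 1 and under the narrower claim 2.
CLAIM1 = {
    "B'": ("X1", "X5", "X5", "X5", "X6", "X7", "X2"),
    "C": ("X1", "X1", "X2", "X1", "X3", "X1", "X4"),
    "C'": ("X1", "X5", "X5", "X5", "X6", "X7", "X5"),
}
CLAIM2 = {
    "B'": ("X4", "X5", "X5", "X5", "X6", "X7", "X2"),
    "C": ("X1", "X1", "X8", "X1", "X3", "X1", "X4"),
    "C'": ("X4", "X5", "X5", "X5", "X9", "X7", "X5"),
}


def mismatches(arm: str, consensus: tuple[str, ...]) -> int:
    """Count positions of a 7-nt arm that fall outside the allowed base set."""
    if len(arm) != len(consensus):
        raise ValueError("arm and consensus lengths differ")
    return sum(b not in SYMBOLS[s] for b, s in zip(arm.upper(), consensus))


def arm_allowed(arm: str, name: str, claims: dict) -> bool:
    """At most one position of the arm may deviate from the consensus."""
    return mismatches(arm, claims[name]) <= 1


# Right arm of attB_WT (last 7 nt of SEQ ID No 6) fits both claims exactly.
print(mismatches("TCTCCTT", CLAIM1["B'"]), mismatches("TCTCCTT", CLAIM2["B'"]))  # 0 0
# A G at the first position is X1 (claim 1) but not X4 (claim 2).
print(mismatches("GCTCCTT", CLAIM1["B'"]), mismatches("GCTCCTT", CLAIM2["B'"]))  # 0 1
```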

    3. A kit for site-specific recombination of at least one polynucleotide sequence of interest into a genome of a bacterial host cell comprising: (A) a polynucleotide molecule A comprising: (i) a sequence of between 220 and 250 bp comprising polynucleotide fragments P1-P2, COC' and P'1-P'2 wherein: P1-P2 is (SEQ ID No 2) 5'-ATCAACTAGATTTTTAACTAGAA-3'; COC' is the site-specific recombination polynucleotide molecule according to claim 1 or 2; and P'1-P'2 is (SEQ ID No 3) 5'-TTTAACTAGAAAATAACTAGAA-3'; the sequence interacting with the DNA target site according to claim 1 or 2 in the bacterial host cell for integrating the polynucleotide sequence of interest; and (ii) at least one polynucleotide sequence of interest; and (B) a polynucleotide molecule int having at least 80%, preferably at least 85%, 90%, 95% or 100% identity with the sequence of SEQ ID No 4 coding for mv4Int, or coding for the mv4Int of SEQ ID No 5.
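Claim 3 defines the integrase-coding molecule by its percent identity to SEQ ID No 4, but the document does not state how identity is computed. The Python sketch below assumes the simplest ungapped definition on two already-aligned sequences of equal length; a real comparison would normally rely on a gapped alignment. The function names and the short sequences are hypothetical toy examples, not SEQ ID No 4.

```python
def percent_identity(query: str, reference: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * identical / len(reference)


def meets_identity(query: str, reference: str, threshold: float = 80.0) -> bool:
    """True when identity reaches the claimed threshold (80% by default)."""
    return percent_identity(query, reference) >= threshold


reference = "ATGGCTAAAC"  # toy stand-in for a reference coding sequence
print(percent_identity("ATGGCTAAGC", reference))  # 90.0
print(meets_identity("ATGCCTTAGC", reference))    # False (70% identity)
```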

    4. The kit of claim 3, wherein the polynucleotide molecule A is inserted in a first vector.

    5. The kit of claim 4, wherein the polynucleotide molecule int coding for mv4Int is inserted in the first vector or in a second vector.

    6. A method for integrating a polynucleotide sequence of interest into a genome of a genetically modified bacterial host cell comprising: (a) preparing a vector comprising a polynucleotide molecule A comprising: (i) a sequence of between 220 and 250 bp comprising the following polynucleotide fragments P1-P2, COC' and P'1-P'2 wherein: P1-P2 is (SEQ ID No 2) 5'-ATCAACTAGATTTTTAACTAGAA-3'; COC' is the site-specific recombination polynucleotide molecule according to claim 1 or 2; and P'1-P'2 is (SEQ ID No 3) 5'-TTTAACTAGAAAATAACTAGAA-3'; and (ii) at least one polynucleotide sequence of interest; (b) transforming the bacterial host cell with the vector obtained at step (a) and the polynucleotide molecule int of SEQ ID No 4 coding for mv4Int; and (c) maintaining the transformed host cell under conditions that allow integration of the polynucleotide sequence of interest into the genome of the host cell.

    7. A genetically modified bacterial host cell obtained by the method of claim 6, wherein the genetically modified bacterial host cell comprises a vector comprising a polynucleotide molecule A comprising: (i) a sequence of between 220 and 250 bp comprising the following polynucleotide fragments P1-P2, COC' and P'1-P'2 wherein: P1-P2 is (SEQ ID No 2) 5'-ATCAACTAGATTTTTAACTAGAA-3'; COC' is the site-specific recombination polynucleotide molecule according to claim 1 or 2; and P'1-P'2 is (SEQ ID No 3) 5'-TTTAACTAGAAAATAACTAGAA-3'; and (ii) at least one polynucleotide sequence of interest; and the polynucleotide molecule int of SEQ ID No 4 coding for mv4Int.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0100] FIG. 1. Redefinition of the mv4Int sequence. (A) λInt and mv4Int structures. The λInt structure is from Biswas et al. (Biswas et al., 2005) and the mv4Int structure was modelled with AlphaFold (Jumper et al., 2021a) using the sequence described in 1995 (Dupont et al., 1995b) and the corrected sequence presented in this paper. The mv4Int-1995 structure possesses an unstructured arm-binding domain and lacks the canonical antiparallel β-sheet (red arrows in λInt) that positions the lysine (purple residue) into the catalytic domain. The new mv4Int structure presents a canonical three-stranded β-sheet arm-binding domain and a canonical antiparallel β-sheet (red arrows). (B) Schematic representation of mv4Int and consequences of sequencing errors on the mv4Int sequence. Deletions (black crosses) of a C, a T and three A, and the inversions GT/TG and CG/GC (two-way arrows) are found. Black boxes indicate protein regions that differ from the published sequence. (C) Alignment of the catalytic domains of λInt, Cre, XerC, XerD, HP1Int, mv4Int-1995 (the original sequence published in 1995) and mv4Int-2022 (the resequenced protein). The 7 conserved residues of the YR catalytic domain are indicated in bold and with an asterisk. The lysine residue (K) was manually adjusted based on the alignment performed by Nunes-Düby et al. (Nunes-Düby et al., 1998). Letters in bold grey in the mv4Int-1995 sequence correspond to the amino acid sequence deduced from the incorrect DNA sequencing performed in 1995.

    [0101] FIG. 2. Principle of the use of randomized libraries for the characterization of core regions of mv4Int/attP/attB. (A) Localization of libraries in the attP (green) or attB (orange) core regions. The published minimal attB site (Auvray et al., 1999b) is framed by the black box. The atypical P2 arm-type binding site is indicated by a black arrow and the dark vertical arrows indicate the strand exchange positions on the top and bottom strands of the 8-bp overlap sequence. The randomized region of each library is represented by N and nucleotides identical to the native sequences are symbolized by dashes. (B) Global strategy for the use of randomized libraries. The chromatograms and the sequence logos show the proportion of each nucleotide at each position. The consensus sequence indicated is arbitrary. A colour is associated with each nucleotide: blue, C; green, A; red, T; dark and yellow, G.

    [0102] FIG. 3. In vitro characterization of the minimal attB size. (A) Sequences used in this study for the in vitro recombination assay. Nucleotides differing from the native sequence are indicated by red lower-case letters. (B) Effect of the size of attB on the recombination reaction. The recombination reaction contains 7.2 pmol of mv4Int and 40 µg of E. coli crude extract heated at 95 °C, and was incubated for 1 h 30 min at 42 °C. The size of the attB used is indicated above each lane. The fluorescent attB fragment and the recombination product (I) are indicated on the gel. Lane -, reaction without mv4Int. (C) Results of attB Lib9 × attP_WT in vitro recombination. The bases described outside of attB_min are represented in grey. The dark vertical arrows indicate the two cleavage sites surrounding the published 8-bp overlap region (Coddeville et al., 2014b). O indicates the overlap region and N represents the randomized positions. Each nucleotide is associated with a colour on the chromatogram: blue, C; black, G; green, A; red, T. (D) Results of attB Lib6 × attP_WT in vitro recombination. (E) Results of attB Lib5 × attP_WT in vitro recombination.

    [0103] FIG. 4. Characterization of the attB and attP overlap regions. (A) Representation of the two expected results of attP × attB_WT recombination with a randomized nucleotide at the last position of the overlap region. If the first position is included in the strand exchange region (8-bp overlap), only the nucleotide complementary to attB_WT (i.e. T) will be recovered. If the first position is excluded from the strand exchange region (7-bp overlap), the permissive nucleotides will be observed in one of the two attL or attR hybrid sites. (B) Results of attB_WT × attP Lib1 in vitro recombination. The red circle highlights the randomized position before and after recombination. (C) Results of attP_WT × attB Lib8 in vitro recombination.

    [0104] FIG. 5. Characterization of the constraints exerted at the overlap region. (A) Results of attB Lib7 × attP_WT in vitro recombination. The dark vertical arrows indicate the two mv4Int cleavage sites surrounding the overlap region O. (B) Results of attB_WT × attP Lib2 in vitro recombination. (C) Results of attB Lib7 recombination against 3 different attP overlap sequences. (D) Results of attB Lib7 × attP Lib2 in vitro recombination. (E) Effect of the overlap sequence on recombination activity. The four different pairs of attB/attP sharing the same overlap sequence are represented. Nucleotides differing from the WT sequence are indicated in bold. Lanes: WT, attB_WT/attP_WT pair; 1, attB1/attP1 pair; 2, attB2/attP2 pair; 3, attB3/attP3 pair; (-), no mv4Int.

    [0105] FIG. 6. NGS characterization of the nucleotide constraints surrounding the overlap region in attB and attP sites. (A) Results of attB Lib6 × attP_WT in vitro recombination. The upper Sequence Logo shows the nucleotide distribution in the attB Lib6 library before the in vitro recombination. The two attL Sequence Logos represent the nucleotide distributions observed in two independent recombination experiments. Numbering above the nucleotides gives the position in the randomized sequence. For attB Lib6, only the seven nucleotides that belong to the attB site are indicated (positions 4 to 10). The number of NGS reads used to build each sequence logo is indicated (M = 10^6). (B) Results of attB Lib8 × attP_WT in vitro recombination. (C) Results of attP Lib3 × attB_WT in vitro recombination. (D) Results of attP Lib1 × attB_WT in vitro recombination. (E) Organization of the 21-bp attB_mv4 site and the attP_mv4 core region.

    [0106] FIG. 7. DNA sequence representativeness after in vitro recombination of randomized libraries. (A) Occurrence (read count) of each motif relative to its rank in the raw NGS data. Each in vitro recombination experiment was performed twice. Abbreviations: rep1, experimental repetition 1; rep2, experimental repetition 2. B corresponds to attP_WT × attB Lib6 recombination, B' to attP_WT × attB Lib8 recombination, C to attB_WT × attP Lib3 recombination, and C' to attB_WT × attP Lib1 recombination. The horizontal dashed line represents a cut-off at 1000 occurrences. The vertical dashed line corresponds to a cut-off for the 100 most represented motifs. The region up to 2000 occurrences and the 2000 most represented motifs is enlarged for better visibility. (B) Sequence Logos determined from the nucleotide sequences of the 100 most enriched motifs (enrichment determined as the ratio of occurrence after recombination to occurrence before recombination).

    [0107] FIG. 8. mv4Int binding characteristics by EMSA experiments. (A) Effect of the P12 arm-type oligonucleotide on mv4Int binding to the core-attP region. A fluorescent fragment containing the 21-bp COC' sequence (0.87 pmol) was incubated in the presence or absence of mv4Int (25 pmol) and in the presence or absence of unlabelled DNA containing P12 arm-binding sites (4.48 pmol). Reactions were analysed by native 7.5% PAGE and fluorescence was visualized on the Chemidoc MP Imaging system (Biorad). The presence or absence of mv4Int and of arm-type sites (28 bp or 40 bp) is indicated above the gel. The different complexes are indicated on the gel: ss DNA, single-stranded DNA; ds DNA, double-stranded DNA; I, one monomer of mv4Int bound to the core region; II, one monomer of mv4Int bound to the core and arm region of 28 bp (II) or 40 bp (II*); III, dimer of mv4Int bound to the core and arm region of 28 bp (III) or 40 bp (III*). (B) mv4Int binding to the COC' region of attP. Legend is identical to A. The reaction contains 0 pmol (lane 1), 3.6 pmol (lane 3), 10.7 pmol (lane 4), 18 pmol (lane 5), 25.1 pmol (lanes 2 and 6) and 35.8 pmol (lane 7) of mv4Int. (C) mv4Int binding to the BOB' sequence. Legend is identical to B. (D) mv4Int binding to the B/C core-binding site. Legend is identical to B. (E) mv4Int binding to the B core-binding site. Legend is identical to B. (F) mv4Int binding to the C core-binding site. Legend is identical to B.

    [0108] FIG. 9. mv4Int arm-binding site characterization. (A) Characterization of the mv4Int arm-binding sites by stabilization of mv4Int binding to the COC' core sequence. A fluorescent fragment including the COC' core sequence (0.87 pmol) was incubated in the presence of 4.48 pmol of unlabelled arm-binding sites (the arm-binding region is indicated above the lanes) and 25.1 pmol of mv4Int. Lanes: 0, no mv4Int; -, no arm-binding oligonucleotide. The three complexes are indicated and are identical to those determined previously. (B) Effect of the size and relative orientation of the arm-binding sites on the stabilization of mv4Int binding to the COC' core region. The legend is identical to A. (C) Schematic representation of the published attP_mv4 and attB_mv4 sites and comparison with the structure described in this study. Dark grey boxes indicate arm-binding sites, and their orientation is represented by arrows. Light grey triangles represent the excisionase binding sites (Coddeville and Ritzenthaler, 2010) and grey boxes represent the core region (region of identity between attB and attP).

    [0109] FIG. 10. mv4Int mediates recombination with other bacterial tRNA-Ser sequences. (A) tRNA-Ser sequences and the adapted attP core sequences used for the in vitro assay. The two mv4Int cleavage sites surrounding the overlap region (O) are indicated by the vertical black arrows. Nucleotides that differ from the attB_WT sequence are shown in bold. (B) Heterologous tRNA-Ser sites able to recombine with both attP_WT and the adapted attP. The recombination reaction contains 7.2 pmol of mv4Int and was incubated for 16 h at 42 °C. Lanes: a, attP_WT × attB_X recombination; b, attP_X × attB_X recombination; c, b without mv4Int; +, attP_WT × attB_WT recombination; −, attP_WT × attB_WT recombination without mv4Int. The bacterium from which each attB_X (tRNA-Ser) originates is indicated above the gels. Lanes: a, attP_WT × attB_X recombination; b, attP_X × attB_X recombination; c, b without mv4Int. (C) Sequences for which recombination is only possible with the adapted attP. (D) Sequences for which recombination is impaired. Lanes: a, attP_WT × attB_X recombination; b, attP_X × attB_X recombination; c, b without mv4Int. Nucleotides absent from the attB consensus sequence are shown in red.

    [0110] FIG. 11. mv4Int-mediated site-specific integration into the chromosomal tRNA-Ser of E. coli and L. lactis. (A) Nucleotide sequences of the 21-bp attB consensus, mv4 attB_WT, and the E. coli and L. lactis tRNA-Ser. Nucleotides differing from attB_WT are indicated in bold. The nucleotide excluded from the consensus sequence is indicated in red. (B) Theoretical outcomes of the chromosomal integration into the E. coli or L. lactis tRNA-Ser. Primers used for the PCR are indicated with black arrows and the size of each amplicon is indicated. (C) PCR amplification of E. coli or L. lactis attB sites, before (I) and after (II) integration (five integrants). Lane M, 1 kb DNA ladder (New England Biolabs). (D) PCR amplification of E. coli or L. lactis attL sites (I) from the five integrants used in (C). Lane M, 100 bp DNA ladder (New England Biolabs).

    [0111] FIG. 12. In vitro reprogramming of the recombination to 3 putative sites present in the E. coli lacZ gene. (A) Artificial attB sites and adapted attP sites used. The nucleotides different from the native attB_WT are indicated in bold. (B) Fluorescent in vitro assay against the 3 lacZ sites. The recombination reaction contains 7.2 pmol of mv4Int and 40 µg of heated E. coli crude extract, and was incubated for 1 h 30 min at 42 °C. The attB/attP pair used is indicated above each lane. Fluorescent linear attB and recombination product (I) are indicated on the gel. Lane T, reaction without mv4Int. (C) PCR amplification of attL and attR for each attB/attP pair tested. Lane M, 100 bp DNA ladder (New England Biolabs). (D) Sanger sequencing of attL and attR recombination products. Nucleotides in grey belong to attP_WT and nucleotides that differ from attB_WT are indicated in bold.

    [0112] FIG. 13. Effect of nucleotides excluded from the BOB' consensus sequence on in vitro recombination. (A) Sequences of attB variants. Nucleotides not included in the consensus sequence are indicated in bold. (B) Fluorescent in vitro assay of recombination between attP_WT and each attB variant. The recombination reaction contains 7.2 pmol of mv4Int and 40 µg of heated E. coli crude extract, and was incubated for 1 h 30 min at 42 °C. Lanes: 1, attP_WT × attB_WT; 2, attP_WT × attB_A5C5; 3, attP_WT × attB_A5T5; 4, attP_WT × attB_T7C7; 5, attP_WT × attB_T7G7. Fluorescent linear attB and recombination product (I) are indicated on the gel. Lane T, reaction without mv4Int.

    DETAILED DESCRIPTION

    Examples

    Material and Methods

    Strains, Plasmids, Primers, and Media

    [0113] The different strains and plasmids used in this study are listed in Tables 5 and 6. All primer sequences used are available in Tables 7A, 7B, 7C and 7D. The E. coli strain NEB5α repA+ was built using the protocol of Datsenko and Wanner (Datsenko and Wanner, 2000). It was constructed by replacing the glgB gene with the glgB::Kan-repA region from E. coli strain EC1000. E. coli strains were grown in Lysogenic Broth (LB) at 37 °C. L. lactis strains were grown in GM17 at 28 °C. Antibiotics were used at the following concentrations: carbenicillin, 100 µg/ml; chloramphenicol, 12.5 µg/ml; erythromycin, 150 µg/ml (1 µg/ml for L. lactis); kanamycin, 50 µg/ml.

    DNA Procedures

    [0114] Standard techniques were used for DNA manipulation and cloning. Polymerase chain reaction (PCR) was performed with Q5-HF polymerase (New England Biolabs) or with CloneAmp HiFi polymerase (Takara Bio), according to the manufacturer's instructions. PCR products were purified using the QIAquick PCR purification kit (Qiagen). Plasmids were constructed by Gibson assembly (Gibson et al., 2009) with NEBuilder HiFi DNA Assembly (New England Biolabs), or by blunt-end cloning with T4 PNK (New England Biolabs) and T4 DNA ligase (New England Biolabs), according to the manufacturer's instructions. Plasmid DNA was extracted using the QIAprep Spin Miniprep kit (Qiagen) or NucleoBond Xtra Midi (Macherey-Nagel), and plasmid sequences were verified by Sanger sequencing (Mix2seq, Eurofins).

    Constructing a Randomized attB Library and Core-attP Library

    [0115] The randomized oligonucleotides (109 bp, attB library; 184 bp, core-attP library) were obtained by chemical synthesis (IDT, USA). PCR was used to create double-stranded DNA using primers attBlibrary-F and attBlibrary-R for attB, and attPlibrary-F and attPlibrary-R for attP (Table 7B). Each PCR product was separately cloned by DNA assembly (Gibson et al., 2009) either into pCC1Fos (Lucigen, USA) for attB libraries, or into plasmid pMET359 (Table 6) for attP libraries. Clones were propagated in E. coli EPI300 (Lucigen, USA) under chloramphenicol selection for attB libraries, and in NEB5α repA+ (Table 5) under carbenicillin selection for attP libraries.
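The libraries described in the Results carry 7 to 10 randomized positions. As a rough guide to the library sizes involved, the Python sketch below computes the theoretical diversity (4^k sequences for k fully randomized positions) and the number of clones that would give a chosen probability of sampling any given variant. The sampling model (uniform representation, Poisson sampling, P = 1 - exp(-n/N)) and the function names are assumptions added for illustration; the document does not report clone numbers.

```python
import math


def library_diversity(randomized_positions: int) -> int:
    """Number of distinct sequences when each position can be A, C, G or T."""
    return 4 ** randomized_positions


def clones_for_coverage(diversity: int, probability: float) -> int:
    """Clones needed so that a given variant is sampled with the given probability.

    Assumes every variant is equally represented and clones are sampled at
    random (Poisson approximation): probability = 1 - exp(-clones / diversity).
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return math.ceil(-diversity * math.log(1.0 - probability))


for k in (7, 10):
    n = library_diversity(k)
    print(f"{k} positions: {n} variants, ~{clones_for_coverage(n, 0.95)} clones for 95%")
```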

    Purification of mv4Int

    [0116] For mv4Int purification, the pET-Int plasmid (Table 6) was transferred into E. coli strain BL21(DE3) (New England Biolabs). The resulting strain was grown in LB at 42 °C up to an OD600 of 0.6. Integrase gene expression was induced by addition of 0.1 mM IPTG, and the culture was incubated at 22 °C for 3 h. Cells were recovered by centrifugation, resuspended in buffer A (50 mM Tris pH 8, 500 mM NaCl, 20 mM imidazole, 10% glycerol, 1 mg/ml lysozyme, and one tablet of SIGMAFAST Protease Inhibitor Cocktail Tablets EDTA-Free [Merck, Germany]), and disrupted by sonication (10 cycles of 30 s at 40% intensity on ice, with 45 s of rest between cycles). The lysate was cleared by centrifugation (20,000 × g, 4 °C, 20 min). mv4Int was first purified on nickel-nitrilotriacetic acid affinity resin (1 ml His-trap HP, GE Healthcare). The column was equilibrated with 10 column volumes of buffer B (50 mM Tris pH 8, 500 mM NaCl, 20 mM imidazole, 10% glycerol). After equilibration, the lysate was injected and unbound proteins were washed away with 10 column volumes of buffer B. mv4Int was eluted using a 0 to 30% gradient of buffer C (50 mM Tris pH 8, 500 mM NaCl, 500 mM imidazole, 10% glycerol). Eluted fractions were then injected onto a gel filtration column (HiLoad 16/60 Superdex 200, GE Healthcare, USA). This column was equilibrated with 2 column volumes of buffer D (50 mM Tris pH 8, 500 mM NaCl, 10% glycerol, 1 mM DTT, 1 mM EDTA), and the fractions containing mv4Int were injected and eluted with the same buffer. Eluted fractions containing mv4Int were then diluted 2-fold in buffer E (50 mM Tris pH 8, 10% glycerol, 1 mM DTT, 1 mM EDTA). A heparin column (1 ml HiTrap Heparin HP, GE Healthcare, USA) was equilibrated with 10 column volumes of buffer F (50 mM Tris pH 8, 250 mM NaCl, 20% glycerol, 1 mM DTT, 1 mM EDTA). The diluted fractions containing mv4Int were then injected and unbound proteins were removed with 10 column volumes of buffer F. mv4Int was eluted using a 0 to 100% gradient of buffer G (50 mM Tris pH 8, 1 M NaCl, 20% glycerol, 1 mM DTT, 1 mM EDTA). Purified integrase was aliquoted, snap-frozen in liquid N2 and stored at −80 °C in buffer containing 50 mM Tris pH 8, 500 mM NaCl, 20% glycerol, 1 mM DTT, 1 mM EDTA.
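The buffer compositions above fix the concentrations reached at each chromatography step. The Python sketch below computes them, assuming that each gradient is a linear two-buffer mix (the percentage being the fraction of the second buffer) and that the 2-fold dilution is a 1:1 mix with buffer E; the helper name `mixed_concentration` is hypothetical. It shows, for instance, that diluting buffer D 2-fold in buffer E brings NaCl to 250 mM, the concentration of buffer F used to equilibrate the heparin column.

```python
def mixed_concentration(c_start: float, c_end: float, fraction_end: float) -> float:
    """Concentration when a fraction `fraction_end` of the second buffer is mixed in.

    Both concentrations must use the same unit (mM here).
    """
    if not 0.0 <= fraction_end <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return (1.0 - fraction_end) * c_start + fraction_end * c_end


# Ni-NTA elution: buffer B (20 mM imidazole) up to 30% buffer C (500 mM imidazole).
imidazole_end = mixed_concentration(20.0, 500.0, 0.30)
# 2-fold dilution of gel-filtration fractions (buffer D, 500 mM NaCl) in buffer E (no NaCl).
nacl_after_dilution = mixed_concentration(500.0, 0.0, 0.5)
# Heparin elution: buffer F (250 mM NaCl) up to 100% buffer G (1 M NaCl).
nacl_heparin_end = mixed_concentration(250.0, 1000.0, 1.0)

print(f"imidazole at end of Ni-NTA gradient: {imidazole_end:.0f} mM")  # 164 mM
print(f"NaCl after 2-fold dilution: {nacl_after_dilution:.0f} mM")       # 250 mM
print(f"NaCl at end of heparin gradient: {nacl_heparin_end:.0f} mM")     # 1000 mM
```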

    In Vitro Fluorescent Assay

    [0117] Reaction mixtures (20 µl) contained 0.08 pmol (300 ng) of supercoiled plasmid carrying the attP site, 0.08 pmol (15 ng) of a linear fluorescent (Cy3) 308-bp attB fragment, 7.2 pmol (300 ng) of mv4Int and 40 µg of a crude extract from E. coli BL21(DE3) heated at 95 °C for 10 min, in 25 mM Tris pH 7.5, 1 mM EDTA, 150 mM NaCl, 1 mM DTT and 10% PEG8000 (TENDP 1× buffer). The reaction was incubated at 42 °C for either 1 h 30 min or 16 h and was stopped by addition of 0.1% SDS. Samples were analysed by electrophoresis in 0.8% agarose gels. Fluorescence was revealed using the Chemidoc MP imaging system (Biorad).
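The masses and molar amounts given for this assay can be cross-checked with the usual conversions. The Python sketch below assumes an average of 650 g/mol per base pair of double-stranded DNA (a common approximation; 660 g/mol is also used) and the identity 1 ng/pmol = 1 kDa; the function names are hypothetical. Under these assumptions, 15 ng of the 308-bp fragment is about 0.075 pmol, 300 ng of plasmid at 0.08 pmol implies a plasmid of roughly 5.8 kb, and 300 ng of mv4Int at 7.2 pmol corresponds to about 41.7 kDa, roughly in line with a 369-residue protein at about 110 Da per residue.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair of dsDNA (assumed average value)


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Picomoles of double-stranded DNA from its mass (ng) and length (bp)."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def implied_length_bp(mass_ng: float, amount_pmol: float) -> float:
    """dsDNA length consistent with a given mass (ng) and molar amount (pmol)."""
    return mass_ng * 1000.0 / (amount_pmol * AVG_BP_MASS)


def protein_kda(mass_ng: float, amount_pmol: float) -> float:
    """Apparent molecular mass in kDa (1 ng/pmol equals 1 kDa)."""
    return mass_ng / amount_pmol


attb_pmol = dsdna_pmol(15, 308)             # linear Cy3 attB fragment
plasmid_bp = implied_length_bp(300, 0.08)   # supercoiled attP plasmid
int_kda = protein_kda(300, 7.2)             # mv4Int
ratio = 7.2 / 0.08                          # integrase : attB molar ratio
print(f"{attb_pmol:.3f} pmol, {plasmid_bp:.0f} bp, {int_kda:.1f} kDa, {ratio:.0f}:1")
```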

    In Vitro Recombination Assay Using Libraries

    [0118] Reactions (20 µl) containing 0.08 pmol (450 ng) of attB plasmid, 0.08 pmol of attP plasmid, 7.2 pmol of mv4Int and 40 µg of crude extract from E. coli BL21(DE3) heated at 95 °C for 10 min, in TENDP 1× buffer (25 mM Tris pH 7.5, 1 mM EDTA, 150 mM NaCl, 1 mM DTT and 10% PEG8000), were incubated for 1 h 30 min at 42 °C. The attB, attP, attL and attR sites were amplified by PCR using primers SeqbanqueattB-F/SeqbanqueattB-R (Table 7B) for attB, SeqbanqueattP-F/SeqbanqueattP-R (Table 7B) for attP, SeqbanqueattB-F/SeqbanqueattL-R (Table 7B) for attL, and SeqbanqueattR-F/SeqbanqueattP-R (Table 7B) for attR. PCR products were purified and analysed by Sanger sequencing (Mix2seq, Eurofins).

    NGS Sequencing

    [0119] The PCR products used for Sanger sequencing (attL from the attB Lib6 × attP_WT and attB_WT × attP Lib1 recombinations, and attR from the attB_WT × attP Lib3 and attB Lib8 × attP_WT recombinations) were also used for NGS sequencing (Eurofins). Data were uploaded to the public server at usegalaxy.org (Afgan et al., 2018) for analysis. Sequence Logos were generated using WebLogo 3 (Crooks et al., 2004) and the occurrence of each word was determined using the Wordcount program (Rice et al., 2000).
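The motif analysis outlined here (counting occurrences of each word, ranking motifs by enrichment, and building the per-position base frequencies behind a sequence logo) can be sketched in a few lines of Python. This is not the Galaxy, Wordcount and WebLogo workflow used in the study but a minimal illustrative re-implementation on toy reads. Enrichment follows the document's definition (occurrence after recombination divided by occurrence before recombination); the pseudocount that avoids division by zero is an added assumption, and no correction for sequencing depth is made.

```python
from collections import Counter


def enrichment(before: list[str], after: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """Ratio of occurrences after/before recombination for every motif seen."""
    n_before, n_after = Counter(before), Counter(after)
    motifs = set(n_before) | set(n_after)
    return {m: (n_after[m] + pseudocount) / (n_before[m] + pseudocount) for m in motifs}


def top_enriched(before: list[str], after: list[str], k: int = 100) -> list[tuple[str, float]]:
    """The k motifs with the highest enrichment (ties broken alphabetically)."""
    scores = enrichment(before, after)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def position_frequencies(motifs: list[str]) -> list[dict[str, float]]:
    """Per-position base frequencies, i.e. the input of a sequence logo."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    table = []
    for i in range(length):
        counts = Counter(m[i] for m in motifs)
        table.append({base: counts[base] / len(motifs) for base in "ACGT"})
    return table


# Toy reads (hypothetical), not data from the study.
before = ["ACG", "ACG", "TTT", "GGA"]
after = ["ACG", "ACG", "ACG", "TTT"]
print(top_enriched(before, after, k=2))
print(position_frequencies(after)[0])
```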

    Electrophoretic Mobility Shift Assay

    [0120] 5'-Cy3 end-labelled synthetic oligonucleotides (HPLC purified) were obtained from Eurofins. Labelled double-stranded DNA substrates were prepared by hybridization of complementary oligonucleotides (Table 7C) in 10 mM Tris pH 7.5, 50 mM NaCl, by incubating the samples for 5 min at 95 °C in a thermal cycler (Biorad) and then decreasing the temperature by 1.5 °C/min until it reached 25 °C. Binding reactions (20 µl) were performed with 0.87 pmol of labelled core- or arm-type DNA and 4.48 pmol of unlabelled arm- or core-type DNA in buffer containing 25 mM Tris pH 8, 75 mM NaCl, 10% glycerol, 0.5 mM DTT, 0.5 mM EDTA, 1 µg poly(dI-dC) (Sigma) and 0.1 mg/ml BSA. The protein was added, the reaction was performed at room temperature for 20 min, and samples were loaded onto a non-denaturing 7.5% polyacrylamide gel (Mini-PROTEAN TGX, Biorad). The gels were run at 4 °C and 75 V for 2 h. Fluorescence was revealed using the Chemidoc MP imaging system (Biorad).
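Two quantities follow directly from the EMSA protocol: the duration of the annealing program and the molar excess of unlabelled competitor DNA. The Python sketch below computes both from the stated values; it ignores the cycler's heating time to 95 °C, and the helper name `ramp_minutes` is hypothetical.

```python
def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Duration (min) of a linear temperature ramp between two temperatures (°C)."""
    if rate_c_per_min <= 0:
        raise ValueError("rate must be positive")
    return abs(start_c - end_c) / rate_c_per_min


hold_min = 5.0                         # 5 min at 95 °C
ramp = ramp_minutes(95.0, 25.0, 1.5)   # 70 °C drop at 1.5 °C/min
excess = 4.48 / 0.87                   # unlabelled : labelled DNA (pmol/pmol)
print(f"annealing: {hold_min + ramp:.1f} min; unlabelled excess: {excess:.1f}-fold")
```

With these values the annealing program lasts about 52 minutes and the unlabelled DNA is in roughly 5-fold molar excess over the labelled fragment.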

    In Vivo Recombination

    [0121] L. lactis strain MG1363 was transformed as described by Le Bourgeois et al. (Le Bourgeois et al., 2000) using 1 µg of plasmid pMET306 (Table 6). Cells were incubated for 3 h at 28 °C and selected for erythromycin resistance on M17 plates supplemented with 5 g/L glucose. For E. coli, commercial electrocompetent EPI300 cells (Lucigen) were transformed with 300 ng of plasmid pMET376 (Table 6). Cells were incubated for 5 h at 37 °C and selected for carbenicillin resistance on LB agar plates. Genomic DNA of antibiotic-resistant cells was extracted using the DNeasy Blood and Tissue kit (Qiagen). Site-specific recombination into the targeted tRNA-Ser(CGA) was verified by amplifying the attB and attL sites by PCR. PCR amplification was performed using 1 ng of genomic DNA in 25 µL of 1× Q5 buffer (New England Biolabs) containing 800 µM dNTPs, 0.5 U of Q5 polymerase (New England Biolabs) and 0.5 µM of each primer. For attB amplification, the thermal program consisted of a 5 min denaturation at 98 °C, followed by 30 cycles of a three-step profile (10 s at 98 °C, 30 s at 60 °C, and 3 min at 72 °C), and a final step at 72 °C for 2 min. For attL amplification, the thermal program consisted of a 5 min denaturation at 98 °C, followed by 30 cycles of a three-step profile (10 s at 98 °C, 30 s at 60 °C, and 30 s at 72 °C), and a final step at 72 °C for 2 min. PCR products were analysed by electrophoresis in 0.8% agarose gels.
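The two PCR programs differ only in the extension time of each cycle. The Python sketch below represents each program as a small data structure and computes its nominal duration, ignoring temperature ramp times; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PcrProgram:
    """Thermal program; all durations in seconds (ramp times ignored)."""
    initial_denaturation_s: int
    cycles: int
    cycle_steps_s: tuple[int, ...]
    final_extension_s: int

    def total_minutes(self) -> float:
        per_cycle = sum(self.cycle_steps_s)
        total = self.initial_denaturation_s + self.cycles * per_cycle + self.final_extension_s
        return total / 60.0


ATTB_PCR = PcrProgram(300, 30, (10, 30, 180), 120)  # 3 min extension per cycle
ATTL_PCR = PcrProgram(300, 30, (10, 30, 30), 120)   # 30 s extension per cycle
print(ATTB_PCR.total_minutes(), ATTL_PCR.total_minutes())  # 117.0 42.0
```

The 75-minute difference between the two programs comes entirely from the 150 s of additional extension per cycle in the attB program.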

    Results

    mv4Int Is a 369-Amino-Acid Tyrosine Integrase

    [0122] The original analysis of the integration region of the mv4 bacteriophage described mv4Int as a 427-amino-acid (AA) protein with significant similarity to the λInt integrase (Dupont et al., 1995a). This result was confirmed through comparison with other Y recombinases (Nunes-Düby et al., 1998), although mv4Int contains only six of the seven conserved residues defining the Int family of SSR, the structurally important D215 residue of λInt (E176 in P1 Cre) being missing (FIG. 1C). Moreover, when mv4Int-427-AA was submitted to a protein structure prediction program (Jumper et al., 2021b), discrepancies were observed compared to λInt, such as the lack of the two β-strands surrounding the catalytic K235 residue and the absence of a structured arm-binding (AB) domain (FIG. 1A). The mv4int gene from the pMC1 plasmid (Dupont et al., 1995a) was therefore resequenced, revealing five single-nucleotide deletions and two inversions (FIG. 1B) compared to the published sequence. These nucleotide modifications were also observed in the different mv4int gene-containing plasmids published previously, p3Aint and pET-Int (Auvray et al., 1999a), as well as in a PCR amplicon of the int region of bacteriophage mv4 (data not shown). The mv4Int amino acid sequence is deeply affected by these variations (FIG. 1B): mv4Int is a 369-amino-acid protein that differs from the published sequence by fifty-three residues. Comparison of its catalytic domain (residues 172 to 369) with the λInt, Cre, XerC, XerD and HP1Int recombinases (FIG. 1C) indicates that mv4Int-369-AA contains the 7 conserved residues R-D/E-K-H-R-H/W-Y of the tyrosine recombinase (YR) catalytic pocket (Gibb et al., 2010). In addition, the new predicted mv4Int structure (FIG. 1A) shows greater similarity to the three domains of the λInt monomer (Wojciak et al., 2002; Aihara et al., 2003).
The native form of the mv4Int protein was overproduced, purified, and used for in vitro recombination assays (see Material and Methods) between an attP site located on the supercoiled plasmid pMC1 (Dupont et al., 1995a) and a 308-bp PCR amplicon of the L. bulgaricus attB region. The Inventors demonstrated that the 369-AA mv4Int alone, i.e. without any accessory protein, was sufficient to catalyse site-specific recombination, and that the reaction was abolished with variants of either of two important residues of the YR catalytic site (Gibb et al., 2010), Y349F or K248A (data not shown; FIG. 1C).

    Global Strategy of the Use of Randomized Libraries

    [0123] Because of the unusual features of the attP core region and of the attB site, these regions were reanalysed using an approach based on randomized DNA libraries. These DNA libraries, corresponding either to the attB site or to the core-attP region, contained 7 to 10 randomized positions (FIG. 2A). Oligonucleotides containing the 4 possible nucleotides at defined positions were synthesized, amplified by PCR, and cloned in E. coli into pCC1Fos or pMET359 (see Material and Methods). Each plasmid library was recovered, verified by Sanger sequencing, and used for in vitro recombination experiments (FIG. 2B) with either the native partner site (attP_WT or attB_WT) or the cognate partner library (same randomized region in both sites). After recombination, the attL and attR sites were amplified by PCR and sequenced. At the randomized positions, only the nucleotides allowing recombination should be recovered in these hybrid sites, which makes it possible to determine the constraints on the nucleotide at each position tested (FIG. 2B).
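The logic of this library strategy can be illustrated with an in silico model of the exchange: attB (BOB') and the attP core (COC') swap arms at the shared overlap, giving attL (BOC') and attR (COB'). The Python sketch below is a simplified model under stated assumptions, not the Inventors' analysis: it assumes a 7-nt arm / 7-nt overlap / 7-nt arm layout, models strand exchange as a single swap at the overlap (ignoring the staggered cleavage on the two strands), and treats every arm variant as permissive, so only overlap identity is enforced. The attP core sequence and the function names are hypothetical; attB_WT is SEQ ID No 6.

```python
from typing import Optional

ARM, OVERLAP = 7, 7  # assumed layout: 7-nt arm, 7-nt overlap, 7-nt arm


def split(site: str) -> tuple[str, str, str]:
    """Return (left arm, overlap, right arm) of a 21-nt core site."""
    if len(site) != 2 * ARM + OVERLAP:
        raise ValueError("expected a 21-nt site")
    return site[:ARM], site[ARM:ARM + OVERLAP], site[ARM + OVERLAP:]


def recombine(attb: str, attp_core: str) -> Optional[tuple[str, str]]:
    """Exchange the arms of BOB' and COC'; None if the overlaps differ."""
    b, o_b, b_prime = split(attb)
    c, o_p, c_prime = split(attp_core)
    if o_b != o_p:
        return None  # no productive strand exchange between non-identical overlaps
    attl = b + o_b + c_prime  # BOC'
    attr = c + o_p + b_prime  # COB'
    return attl, attr


def recovered_bases(attb: str, attp_core: str, pos: int) -> tuple[set[str], set[str]]:
    """Randomize position `pos` of attB; collect the bases found there in attL and attR."""
    in_attl: set[str] = set()
    in_attr: set[str] = set()
    for base in "ACGT":
        variant = attb[:pos] + base + attb[pos + 1:]
        products = recombine(variant, attp_core)
        if products is None:
            continue
        attl, attr = products
        in_attl.add(attl[pos])
        in_attr.add(attr[pos])
    return in_attl, in_attr


ATTB_WT = "TTCAAATCCTGTACTCTCCTT"              # SEQ ID No 6
ATTP_CORE = "AATGAAA" + "CCTGTAC" + "TCTCCTC"  # hypothetical COC'

# Position 3 (left arm): variability only in attL; position 7 (overlap): one base in
# both products; position 17 (right arm): variability only in attR.
for pos in (3, 7, 17):
    print(pos, *(sorted(s) for s in recovered_bases(ATTB_WT, ATTP_CORE, pos)))
```

In this model, a randomized position inside the overlap yields a single base in both hybrid sites, whereas a position outside it shows variability in only one of them, which is the distinction exploited with the libraries.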

    Redefining the Minimal attB_mv4 Site

    [0124] The attB minimal site has been previously characterized by Auvray et al., and resulted in a 16-bp sequence, the shortest attB described in the literature (Auvray et al., 1999b). In order to validate the particular size of this site, different sizes of attB.sub.WT were amplified by PCR (FIG. 3A) to obtain fluorescent attB fragments, which were tested in our in vitro recombination assay against a plasmid-borne attP.sub.WT site (FIG. 3B). A recombination product was obtained with the attB.sub.WT region and with the 23 bp attB site but not with the 16 bp site indicating that the published site is not functional and that attB requires a longer sequence on its left side. To precisely characterise the boundaries of attB site, a library composed of five randomized positions overlapping each end of the published attB site was constructed (attB Lib9, FIG. 2A) and tested by in vitro recombination against the attP.sub.WT site. After recombination, attL and attR sites were amplified by PCR and analysed by Sanger sequencing (FIG. 3C). On the right side of attB, the attR site displays several nucleotides at every randomized position, except for positions 6 and 7 that lack G and A, respectively, indicating that the attB site ends with the sequence 5-CTCCTT-3, in agreement with the previous study (Auvray et al., 1999b). On the left side, strong constraints are observed on attL since only one C is recovered at positions 4 and 5 after recombination. These positions correspond to the left end of the overlap region (Coddeville et al., 2014a) where nucleotides must be identical between the attB and attP sites, as observed for most recombination systems mediated by tyrosine-recombinases. Constraints are also observed at positions 1, 2, and 3 of the randomized region since only purines were detected, thought it was a position previously described outside of the minimal attB site (Auvray et al., 1999b). 
To precisely locate the left end of the attB site, two additional random libraries (attB Lib5 and Lib6, FIG. 2A) were constructed and tested in vitro against the attP_WT site (FIG. 3D-E). Sequencing of the attL site from the attB Lib6 × attP_WT recombination revealed no constraint at positions 1 to 3 (FIG. 3D), suggesting that these positions lie outside attB. This left border was also validated by sequencing the attL site from the attB Lib5 × attP_WT recombination (FIG. 3E), which confirmed the absence of constraints at the same positions. Altogether, these results strongly suggest that the attB site of the mv4 recombination system is a 21-bp site of sequence 5'-TTCAAATCCTGTACTCTCCTT-3' (SEQ ID No 6).

    Redefining the Size of the attB and attP Overlap Regions

    [0125] A previous study based on DNA suicide substrates set the length of the attB_mv4 and attP_mv4 overlap regions at 8 bp (Coddeville et al., 2014a), a size typical of YR recombination systems (Grindley et al., 2006). However, the Inventors noticed that all heterobivalent recombination systems characterised to date, such as those of phages λ (Craig and Nash, 1983), HK022 (Kolot and Yagil, 1994), HP1 (Hauser and Scocca, 1992), L5 (Peña et al., 1996) and P22 (Smith-Mungo et al., 1994) or of the ICE CTnDOT (Malanowska et al., 2006), display a 7-bp overlap region. Because knowledge of the strand-exchange mechanism used by tyrosine recombinases makes it possible to determine the length of the overlap region indirectly (Grindley et al., 2006), this property was exploited to reanalyse the overlap region of the mv4Int/attP/attB system using random DNA libraries. Indeed, because the two overlap regions of attP and attB in phage systems must contain identical DNA sequences to promote a full recombination reaction, only one of the four nucleotides present in the attP or attB random library should be recovered on both the attL and attR sites if the position lies within the overlap region (FIG. 4A). In contrast, if the position lies outside the overlap region, all permissive nucleotides should be recovered at one of the recombined sites, attL or attR, depending on whether the random library corresponds to the attP or the attB site. As the left border of the overlap region had already been verified with the random library attB Lib9 (FIG. 3C), its right border was determined using two different libraries (attB Lib8 and attP Lib1, FIG. 2A) tested by in vitro recombination against their cognate site, attP_WT or attB_WT, respectively. In both cases, one of the two recombined sites (attL or attR) contained three nucleotides at position 1 of the random library (FIG. 4B-C), demonstrating that the overlap region of attP and attB is indeed 7 bp long rather than the 8 bp determined previously (Coddeville et al., 2014a).
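The inference rule used in this paragraph can be written as a small decision function. This is a sketch that assumes the set of nucleotides detected at one randomized position has already been read from the attL and attR sequencing data; the function name and the returned labels are illustrative. The label "consistent with overlap" is deliberately cautious: a flanking position that tolerates only the partner site's native base would give the same single-base signature on both hybrid sites.

```python
def classify_position(attL_bases: set, attR_bases: set) -> str:
    """Apply the overlap-mapping rule to one randomized position.

    attL_bases and attR_bases are the nucleotides detected at this position
    in the recombined attL and attR sites (assumed already called from the
    chromatograms or NGS reads).
    """
    if not attL_bases or not attR_bases:
        raise ValueError("each hybrid site needs at least one detected base")
    if len(attL_bases) > 1 or len(attR_bases) > 1:
        # Outside the overlap, the randomized base is carried to one hybrid
        # site only, where every permissive nucleotide shows up.
        return "outside overlap"
    if attL_bases == attR_bases:
        # Inside the overlap, attB and attP must be identical, so the same
        # single base is recovered on both hybrid sites.
        return "consistent with overlap"
    # Two different single bases: the randomized position and the partner's
    # native base ended up on different hybrid sites.
    return "outside overlap"


if __name__ == "__main__":
    print(classify_position({"C"}, {"C"}))
    print(classify_position({"A", "G", "T"}, {"C"}))
```

The second call mirrors the situation reported for position 1 of attB Lib8 and attP Lib1, where three nucleotides were recovered on one hybrid site.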

    Characterization of the Nucleotide Constraints Existing on attB and attP Overlap Regions

    [0126] For the model integrase λInt, it was observed early on that the nature of the bases in the overlap region was not important for recombination but that sequence identity between the attB and attP overlaps was mandatory (Weisberg et al., 1983; Bauer et al., 1985). A similar feature has been observed for the recombination module of phage HK022 (Kolot et al., 2015) and for Flp (McLeod et al., 1986) and Cre (Hoess et al., 1986), although heteroduplexes in the overlap of loxP sites have been shown to be functional in some cases (Lee and Saito, 1998; Sheren et al., 2007). To characterize the constraints exerted on the overlap region, two libraries with 7 randomized positions in the overlap region were constructed (attB Lib7 and attP Lib2, FIG. 2A) and tested in vitro against attP_WT (FIG. 5A) or attB_WT (FIG. 5B), respectively. After recombination, out of the 16,384 (4^7) starting sequences, only the WT overlap sequence (5'-CCTGTAC-3') was recovered on the attL and attR sites, whichever library was used (attP Lib2 or attB Lib7). In addition, the attB Lib7 library was tested against three different attP overlap sequences (FIG. 5C). After recombination, only the sequence corresponding to the attP overlap region tested was recovered, although an adenine could be found in addition to the expected guanine at the 6th position of the 5'-TTTCGGC-3' overlap sequence (FIG. 5C, right), suggesting that heteroduplexes may be tolerated to some extent. Lastly, in vitro recombination between the attB Lib7 and attP Lib2 libraries revealed that any nucleotide can be recovered, except perhaps at the seventh position, where no C peak could be observed in the chromatogram (FIG. 5D). Altogether, these results indicate that the attB and attP overlap regions are subject to almost no constraint on nucleotide composition, as long as the two regions remain identical. 
However, in vitro recombination efficiency seems to depend on the overlap sequence: compared with the WT overlap region, the fluorescence intensity of the recombined product decreased from 2-fold (FIG. 5E, lane 2) to 15-fold (FIG. 5E, lane 3) in our in vitro assays.
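The identity requirement can be illustrated by modelling strand exchange on the λ-style site architecture used throughout this section, BOB' for attB and COC' for the core-attP region: the hybrid sites are attL = B O C' and attR = C O B', and exchange is only modelled when the two overlaps are identical. This is a simplified sketch (the function name `recombine` is illustrative); it ignores the occasional heteroduplex tolerance and the efficiency differences described above, and the 7-bp arm sequences in the example are hypothetical.

```python
def recombine(attB: str, attP_core: str, arm: int = 7) -> tuple[str, str]:
    """Return the (attL, attR) cores produced by BOB' x COC' recombination.

    Both inputs are laid out as left site (7 bp), overlap O (7 bp) and
    right site (7 bp), i.e. 21 bp in total.
    """
    if len(attB) != 3 * arm or len(attP_core) != 3 * arm:
        raise ValueError("expected 21-bp BOB' and COC' sequences")
    b_left, b_overlap, b_right = attB[:arm], attB[arm:2 * arm], attB[2 * arm:]
    c_left, c_overlap, c_right = (attP_core[:arm], attP_core[arm:2 * arm],
                                  attP_core[2 * arm:])
    if b_overlap != c_overlap:
        # Simplification: productive exchange is modelled as requiring
        # strictly identical overlaps (rare heteroduplexes are ignored).
        raise ValueError(f"overlaps differ: {b_overlap} vs {c_overlap}")
    attL = b_left + b_overlap + c_right   # B O C'
    attR = c_left + c_overlap + b_right   # C O B'
    return attL, attR


if __name__ == "__main__":
    # Hypothetical arms, chosen only to make the exchange of halves visible.
    attB = "A" * 7 + "CCTGTAC" + "G" * 7
    attP_core = "T" * 7 + "CCTGTAC" + "C" * 7
    attL, attR = recombine(attB, attP_core)
    print(attL)
    print(attR)
```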

    Characterization of the Nucleotide Constraints Exerted on the DNA Regions Surrounding the Overlap Sequence of attB Site and Core-attP Region

    [0127] The attB Lib6 library (FIG. 2A), which allowed us to define a 21-bp attB site, also allowed the analysis of the nucleotide constraints exerted on the left side of the attB overlap sequence. After recombination against the attP_WT site, Sanger sequencing revealed a 7-bp degenerate pattern in the attL site (FIG. 3D), with no C at positions 4 and 5, no A at position 6, only purines at positions 7, 8 and 9, and only A or T at position 10. Similar constraints were observed at the corresponding positions when analysing the attL site from the attP_WT × attB Lib9 (FIG. 3C) and attP_WT × attB Lib5 (FIG. 3E) recombinations. These results strongly indicate that mv4Int-mediated recombination tolerates nucleotide variation to some extent in the 7-bp left region of the attB site when recombined against the native attP site, defining the 7-bp degenerate pattern 5'-DDBRRRW-3'. However, when applied to a DNA population such as a randomized library, Sanger sequencing only resolves the prevalent nucleotides at each position. The nucleotide constraints of the attB site and core-attP region were therefore analysed by NGS (Illumina) of the recombined attL or attR sites (depending on the randomized region), which sequences each molecule individually among millions of DNA fragments (Table 8) and allows the construction of Sequence Logos (Schneider and Stephens, 1990). To account for any bias in nucleotide distribution introduced during construction of the four randomized DNA libraries used in these experiments (attB Lib6, attB Lib8, attP Lib3 and attP Lib1, FIG. 2A), the PCR-amplified attB and attP sites were sequenced by NGS and their respective Sequence Logos constructed (FIG. 6).
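Each letter of such a pattern follows the IUPAC nucleotide code (D = A/G/T, B = C/G/T, R = A/G, W = A/T), so 5'-DDBRRRW-3' restates the Sanger observations listed above position by position. A minimal sketch of how a candidate 7-bp sequence can be checked against such a pattern follows; the function names are illustrative, and the example assumes that the native B site is the first 7 bp of the 21-bp attB site (SEQ ID No 6), 5'-TTCAAAT-3'.

```python
# IUPAC nucleotide codes: each letter stands for the set of bases listed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(seq: str, pattern: str) -> bool:
    """True if every base of seq is allowed at its position by the pattern."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(seq.upper(), pattern.upper())
    )


def allowed_bases(pattern: str) -> list[str]:
    """Spell out the bases permitted at each position of a pattern."""
    return [IUPAC[code] for code in pattern.upper()]


if __name__ == "__main__":
    print(allowed_bases("DDBRRRW"))
    # ['AGT', 'AGT', 'CGT', 'AG', 'AG', 'AG', 'AT']
    print(matches("TTCAAAT", "DDBRRRW"))  # assumed native B site: True
```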

    [0128] In each library, A and C tend to be slightly underrepresented relative to T and G, with minima of 15% C and 20% A in the attP Lib3 library (FIG. 6D). The randomized region of the attL site from the attB Lib6 × attP_WT in vitro recombination was then reanalysed by NGS, and its Sequence Logo (FIG. 6A) strongly confirmed the 7-bp consensus pattern 5'-DDBRRRW-3' determined by Sanger sequencing, although additional nucleotides appeared at low frequencies (see, for instance, the T observed at two of the three purine positions, FIG. 6A). In a similar manner, the right side of the attB overlap region was analysed by Sanger sequencing (data not shown) and NGS (FIG. 6B) of the attR site from the attB Lib8 × attP_WT recombination. Interestingly, the Sequence Logo revealed a 7-bp consensus pattern, 5'-DYYYVHB-3', complementary to the left side of the attB overlap region, though more degenerate (see, for instance, the C/G at position 6 or the G/A at position 7). The symmetry in nucleotide composition between the left and right sides of the attB site suggests the presence of an imperfect inverted-repeat pattern surrounding its overlap region, an organization characteristic of recombination sites associated with tyrosine recombinases. Similar analyses were performed on the core-attP region using randomized DNA libraries surrounding the overlap region: attP Lib3 (FIG. 2A) for the left side of the O region and attP Lib1 (FIG. 2A) for its right side. After recombination against attB_WT, NGS of the randomized attR or attL regions, respectively, revealed two symmetrical 7-bp degenerate patterns, 5'-DDBDRDW-3' on the left side of attP (FIG. 6C) and 5'-DYYYVHY-3' on its right side (FIG. 6D), almost identical to their attB counterparts. 
In conclusion, the use of appropriate randomized DNA libraries covering the 21 bp of the core-attP region and of the attB site highlighted the classical organization of the recombination sites of the mv4 system, with two inverted-repeat sequences (B and C for the left sides of attB and attP, and B' and C' for the right sides of attB and attP, following the terminology of the λ recombination sites) corresponding to putative integrase core-binding sites surrounding the 7-bp strand-exchange region (the overlap region) (FIG. 6E).
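The inverted-repeat reading can be checked by reverse-complementing the left-side pattern in IUPAC code and comparing it letter by letter with the right-side pattern. In the sketch below (function names illustrative; the standard IUPAC complement table is assumed), the mirror image of 5'-DDBRRRW-3' is 5'-WYYYVHH-3', which differs from the observed B' pattern 5'-DYYYVHB-3' only at the two outermost positions.

```python
# IUPAC complements: a code's complement covers the complementary bases,
# e.g. R (A/G) pairs with Y (C/T) and D (A/G/T) with H (A/C/T).
COMPLEMENT = str.maketrans("ACGTRYKMBDHVWSN", "TGCAYRMKVHDBWSN")


def revcomp(pattern: str) -> str:
    """Reverse complement of a (possibly degenerate) IUPAC sequence."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def differing_positions(a: str, b: str) -> list[int]:
    """1-based positions where two equal-length patterns use different codes."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    left_b = "DDBRRRW"    # B, left of the attB overlap
    right_b = "DYYYVHB"   # B', right of the attB overlap
    mirrored = revcomp(left_b)
    print(mirrored)                                 # WYYYVHH
    print(differing_positions(mirrored, right_b))   # [1, 7]
```

Comparing codes letter by letter is a coarse measure; a finer comparison would intersect the base sets allowed at each position.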

    Characterization of the B, B', C and C' Motifs Most Permissive for mv4Int-Mediated In Vitro Recombination

    [0129] Because it determines the sequence of each molecule in a DNA population individually, NGS not only resolves the nucleotide composition of the randomized part of the libraries at high resolution, but also makes it possible to determine the occurrence of each 7-bp motif in the recombined DNA population. As it is highly unlikely that all of the thousands of motifs in the libraries are functional, owing to uncharacterized constraints, the Inventors assumed that comparing motif occurrences in the randomized libraries before and after recombination by NGS would give relevant information on their ability to form a productive recombination complex with the mv4 integrase. Once constructed, each attB or attP library was sequenced and found to contain from 16,312 to 16,384 motifs (Table 8), corresponding to 99.5 to 100% of the theoretical number for a 7-bp random library (4^7). Depending on the library and the experimental replicate, one-third (33.32%) to two-thirds (71.91%) of the motifs were recovered after mv4Int-mediated in vitro recombination (Table 8). In addition, their occurrence was highly biased, with a factor of 2,000 to 70,000, depending on the experiment, between the least and the most represented motif (Table 8), and read counts dropped rapidly with rank (FIG. 7A). For example, of the several thousand motifs recovered after recombination, only 250 to 850 have an occurrence of at least 1,000 (FIG. 6A), and half of the read counts are accounted for by only 167 (2.3%) to 784 (7.5%) of the recovered motifs (Table 8), depending on the library. To determine the B, B', C and C' sites most permissive for recombination, the occurrence of each motif in the attL (B and C' sites, FIG. 6A and 6D) or attR (B' and C sites, FIG. 6B-C) hybrid site was divided by its occurrence before recombination (in the attB or attP library), yielding an enrichment factor, under the hypothesis that the more a motif is enriched, the more permissive it is for recombination. However, as most motifs have a very low occurrence (for example, 30% to 50% of the motifs have an occurrence lower than 10) and are thus not representative of the NGS read population, enrichment was only determined for the 250 to 850 motifs with an occurrence of at least 1,000. Finally, only the 100 most frequent motifs were considered for the ranking. These motifs account for 10% to 35% of the total counts of the recovered motifs (FIG. 7A). Altogether, these results strongly indicate that, among the thousands of functional sequences, only a small percentage allows recombination at significant efficiency. Sequence Logos built from these 100 most enriched motifs not only confirmed the 7-bp consensus patterns previously obtained (FIG. 6E), as seen for the B site (5'-DDBRRRW-3'), but also reduced the degeneracy to some extent for sites B' (5'-WYYYVHB-3' instead of 5'-DYYYVHB-3'), C (5'-DDKDRDW-3' instead of 5'-DDBDRDW-3') and C' (5'-WYYYMHY-3' instead of 5'-DYYYVHY-3') (FIG. 7B). Note that all natural sites match these less degenerate patterns, except the C site, for which the A at the third position is excluded from the consensus. Remarkably, the natural motifs are not necessarily the most enriched sequences after recombination (Tables 1-4), since the natural B, B', C and C' sites are ranked 40th, 411th, 253rd and 16th, respectively. This suggests either that other motifs can recombine more efficiently than the natural sites, or that the natural sites are best adapted to in vivo rather than in vitro mv4Int-mediated recombination.
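The enrichment procedure described above (ratio of post- to pre-recombination occurrence, computed only for motifs read at least 1,000 times after recombination, then ranked) can be sketched as below. The function name is hypothetical, the motif counts in the example are invented for illustration, and keeping the 100 most enriched motifs is one reading of the selection step described in this paragraph.

```python
def rank_by_enrichment(before: dict[str, int], after: dict[str, int],
                       min_count: int = 1000, top: int = 100):
    """Rank motifs by enrichment = count after / count before recombination.

    Motifs read fewer than `min_count` times after recombination are
    discarded as poorly sampled, as are motifs absent from the library.
    Returns at most `top` (motif, enrichment) pairs, most enriched first.
    """
    scored = [
        (motif, n_after / before[motif])
        for motif, n_after in after.items()
        if n_after >= min_count and before.get(motif, 0) > 0
    ]
    # Highest enrichment first; ties broken alphabetically for stability.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


if __name__ == "__main__":
    # Invented counts, for illustration only.
    library = {"TTCAAAT": 10, "TTGAAAT": 10, "AAGAAAT": 100}
    recombined = {"TTCAAAT": 5000, "TTGAAAT": 2000, "AAGAAAT": 3000,
                  "CCCCCCC": 50}
    for motif, factor in rank_by_enrichment(library, recombined):
        print(motif, factor)
```

Normalizing each count by the total number of reads in its dataset would multiply every ratio by the same constant, so it would not change the ranking.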

    B, B', C and C' Sites are the Core-Binding Sites for the Mv4 Integrase

    [0130] To demonstrate experimentally that the imperfect inverted repeats B, B', C and C' correspond to the mv4Int core-binding sites, the Inventors used an electrophoretic mobility shift assay (EMSA) based on the protocol developed for the COC' region of the λInt recombination system (Sarkar et al., 2001). In that study, Sarkar et al. demonstrated that the N-terminal domain exerts an inhibitory effect on the C-terminal domain when not bound to the P arm-binding sites. However, when DNA containing the P1-P2 arm-binding sites was added to the reaction, the inhibition was relieved and λInt was able to bind stably to the core-binding sites. To determine whether the mv4Int N-terminal domain also exerts such an inhibitory effect on its C-terminal domain, they compared mv4Int binding to a labelled 35-bp dsDNA containing the 21-bp core-attP region (Table 7C) in the presence or absence of unlabelled dsDNA containing the P1 and P2 arm-type binding sites (Table 7C). mv4Int binding to the COC' sequence appeared quite unstable, as only faint bands and a strong background smear could be observed (lane 3, FIG. 8A). Addition of a 28-bp dsDNA containing the P12 sequences (Table 7C) greatly stimulated and stabilized mv4Int binding to the core-attP sequence and led to the formation of three complexes (lane 4, FIG. 8A). In accordance with results from other YR systems, our data suggest that complex I consists of a single mv4Int monomer bound to one core-binding site, complex II of a single mv4Int monomer bound to one arm-binding site and one core-binding site, and complex III of an mv4Int dimer bound to arm- and core-binding sites. To ensure that the stabilization effect was due to binding of the mv4Int N-terminal domain to the arm-type binding sites, the 28-bp P12 dsDNA was replaced by a 40-bp DNA duplex containing the P1 and P2 sites (Table 7C). A stabilization effect on binding to the COC' sequence was observed (lane 5, FIG. 8A), with the formation of two complexes (II* and III*) of lower mobility than the complexes observed with the 28-bp P12 fragment. EMSA experiments clearly showed that mv4Int binds stably to both the attB and core-attP sites, with complex III band intensity increasing with mv4Int concentration (FIG. 8B-C). As the left and right half-sites of attB and core-attP are not perfect inverted repeats, mv4Int binding to each half-site was studied using 35-bp dsDNAs mutated in either the B', C or C' site (Table 7C). In all cases, the fluorescence signal was weaker than for DNA fragments containing two binding sites (BOB' or COC') (FIG. 8D-F), suggesting strong cooperativity between mv4Int molecules for binding to the core-type sites. In addition, mv4Int binding was asymmetric: the integrase was able to form three complexes (I, II and III) with the right arm (B'/C', FIG. 8D), though at much lower efficiency than with the WT sites, but not with the left arms (B/C, FIG. 8E-F). This behaviour is typical of YR systems and strongly suggests that one mv4Int monomer binds preferentially to the B' or C' core-binding site and promotes the binding of the second mv4Int monomer to the B or C site.

    attP_mv4 Contains Two Pairs of Direct Repeats of mv4Int Arm-Binding Sites

    [0131] As the EMSA experiments allowed the Inventors to observe mv4Int binding to the arm-binding sites indirectly, through the stabilization of mv4Int binding to the core-type sites, they reanalysed the number and location of the five arm-binding sites previously described (Auvray et al., 1999a) by studying eleven sets of mv4Int arm-binding sites (Table 7C). Among all the combinations used, only those containing either the P1-P2 pair or the P'1-P'2 pair stabilized mv4Int binding to COC' (FIG. 9A). This suggests that the P3 sequence is not an arm-binding site, in contrast to previous results (Auvray et al., 1999a). To characterize the binding of the N-terminal domain to the arm-type sites, different orientations of the P1-P2 pair were tested (i.e. direct repeats and inverted repeats), and the results indicate that mv4Int binds cooperatively to the arm-binding sites only if the P1 and P2 sites are in direct repeat (FIG. 9B). The Inventors also used EMSA to characterize the size of the arm-binding sequence and found that two direct repeats of 11 bp were necessary for mv4Int binding (FIG. 9B), suggesting that the mv4Int binding sequence is an 11-bp motif rather than the 9-bp motif previously proposed (Auvray et al., 1999a). In conclusion, the characterization of the DNA-binding properties of mv4Int showed that the attP_mv4 site has a more classical organization (FIG. 9C) than previously suggested (Auvray et al., 1999a; Coddeville et al., 2014a), with two pairs of adjacent arm-binding sites, one (P1P2) on the left arm (the P arm) and the other (P'1P'2) on the right arm (the P' arm), and a core region consisting of two mv4Int core-binding sites surrounding the 7-bp overlap sequence.

    The mv4Int/attP System Can Be Reprogrammed to Target tRNA-Ser of Other Bacterial Species

    [0132] The high degeneracy of the mv4 attB site and core-attP region led us to postulate that mv4Int may be able to recombine DNA targets other than its cognate site if the core-attP region is reprogrammed, as long as these targets match the consensus pattern defined in this study (FIG. 6E). To test this hypothesis, the Inventors attempted to redirect the specificity of in vitro recombination towards different bacterial tRNA-Ser sites, from the most to the least conserved genes (FIG. 10A, left). To do this, they used different attP variants in which the overlap and C' sequences were modified to make them identical to the overlap and B' sequences of the tRNA-Ser tested (adapted attP, FIG. 10A, right). Seven native tRNA-Ser sequences and two artificial sequences were tested by in vitro recombination, with three different outcomes. First, some sites were permissive to site-specific recombination against either the native attP_mv4 site or its adapted counterpart (FIG. 10B), such as the tRNA-Ser of L. sakei and L. acidophilus. These results were expected, since these tRNAs are either identical to the native attB site (L. sakei) or contain the tolerated A at the first position of the B' site (L. acidophilus). Second, several sites were permissive to recombination only against the adapted mv4 attP sites (lanes b, FIG. 10C), such as the tRNA-Ser of L. lactis, Streptococcus mutans and Lactobacillus reuteri, artificial sequence 1 and the tRNA-Ser of Leuconostoc mesenteroides, although the latter site displays a lower recombination efficiency. These attB sites contain up to four nucleotide differences in their overlap region compared with the native attB site, which explains why no recombination signal was observed against the native mv4 attP site (lanes a, FIG. 10C). Each of these sites contains one nucleotide modification in its B sequences, but all still match the consensus pattern tolerated by mv4Int. 
The last outcome concerns two attB sequences, one artificial and the tRNA-Ser of E. faecalis, that were refractory to mv4Int-mediated recombination, even with the adapted attP site (FIG. 10D). As both sequences contain two nucleotides outside the consensus pattern, it is tempting to postulate that mv4Int cannot tolerate more than one nucleotide that departs from the nucleotide constraints of the degenerate pattern. Altogether, these in vitro recombination results demonstrate that mv4Int-mediated site-specific recombination can be reprogrammed towards new tRNA-Ser target sequences by adapting the 21-bp core region of the attP_mv4 site.
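The adaptation step and the "at most one out-of-consensus nucleotide" observation can be sketched as follows. This sketch rests on several assumptions: each target is split into 7-bp B, O and B' segments; the native left core-binding site C is 5'-GAAAGAA-3', read here from the 7 bp immediately 5' of the overlap in the fixed flank of attP Lib1 (Table 7); the target O and B' segments are read from primers SDM-LlactisattB-F and SDM-EfaecattB-F (Table 7) under that 7+7 split; and B' is scored against the FIG. 6E pattern 5'-DYYYVHB-3'. The function names are illustrative.

```python
# IUPAC nucleotide codes: each letter stands for the set of bases listed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def adapt_core_attP(native_c: str, target_overlap: str, target_b_prime: str) -> str:
    """Adapted 21-bp core-attP: native C + target O + target B' used as C'."""
    if not (len(native_c) == len(target_overlap) == len(target_b_prime) == 7):
        raise ValueError("each element must be 7 bp")
    return native_c + target_overlap + target_b_prime


def violations(site: str, pattern: str) -> list[int]:
    """1-based positions where site carries a base excluded by the pattern."""
    return [i + 1
            for i, (base, code) in enumerate(zip(site.upper(), pattern.upper()))
            if base not in IUPAC[code]]


if __name__ == "__main__":
    # Assumed native C ("GAAAGAA") and L. lactis O / B' segments (see lead-in).
    print(adapt_core_attP("GAAAGAA", "CCCCTTG", "ACTCCTT"))
    print(violations("ACTCCTT", "DYYYVHB"))  # assumed L. lactis B': []
    print(violations("CCTCCGT", "DYYYVHB"))  # assumed E. faecalis B': [1, 6]
```

Under this parsing, the L. lactis B' segment fits the pattern, whereas the E. faecalis B' segment carries two excluded bases, in line with the two out-of-consensus nucleotides reported above. A screen built on this idea would flag targets with more than one violation as likely refractory, a rule the Inventors present only as a tentative postulate.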

    Reprogramming the Core-attP Site Allows In Vivo Recombination at the tRNA-Ser of E. coli and L. lactis

    [0133] As mv4Int promotes site-specific recombination in vitro into the L. lactis tRNA-Ser when the adapted attP site is used, the Inventors tested whether this retargeted recombination could also occur in vivo. In addition, as the E. coli tRNA-Ser contains only one nucleotide that departs from the consensus pattern (FIG. 11A), located at a less constrained position (position 6, FIG. 6B), they hypothesised that reprogramming the mv4Int/attP system could lead to in vivo integration into the E. coli tRNA-Ser. To do so, the overlap and C' sequences of the mv4 core-attP region of plasmid pMET359 (Table 6), which can only replicate in the E. coli strain NEB5-repA (Table 5), were replaced by the corresponding overlap and C' sequences adapted to the E. coli tRNA-Ser (FIG. 11A), generating pMET376 (Table 6). For integration into the L. lactis genome, the overlap and C' sequences adapted to the L. lactis tRNA-Ser (FIG. 11A) were used to replace the WT region of plasmid pMC1, creating pMET306 (Table 6). After transformation of these two suicide plasmids into their corresponding bacterial species and selection of integrants based on antibiotic resistance, several clones were randomly picked and the structure of their genome at the tRNA-Ser locus was analysed by PCR (FIG. 11B-D). In every case, all clones had integrated the plasmid at the expected target site, suggesting that the tRNA-Ser locus can be used as a landing pad to integrate foreign DNA by mv4Int/attP site-specific recombination in these two phylogenetically unrelated bacterial species.

    Extending the mv4Int/attP System Reprogramming to Sequences Other than tRNA-Ser

    [0134] The Inventors performed an in silico analysis of the E. coli MG1655 genome to identify putative recombination sites matching the consensus pattern of attB_mv4. This analysis identified 7959 putative sites, three of which are located in the lacZ gene. To test whether the reaction could be reprogrammed to target these three sites in vitro, the overlap region of the mv4 core-attP was replaced by the overlap sequence of lacZ1, lacZ2 or lacZ3 (FIG. 12A). Note that, in contrast to the experiments performed with the tRNAs, the natural mv4 core-binding sites C and C' were conserved. No recombined product could be observed for any attB/attP pair in our in vitro recombination assay (FIG. 12B), indicating either that no recombination occurred or that recombination efficiency was below the detection threshold (estimated at less than 5% of the recombination observed with the WT attB/attP pair, data not shown). However, recombined product could be detected for each target after PCR amplification of the hybrid attL and attR sites, albeit with a very low intensity for the attB_lacZ3/attP_lacZ3 pair (FIG. 12C). The latter result was expected, because replacing the attB_mv4 overlap with the sequence 5'-AAGGTTT-3' had already been observed to cause a 15-fold decrease in recombination activity (FIG. 5E). Nevertheless, sequencing of each PCR-amplified hybrid site (FIG. 12D) confirmed that all three putative lacZ sites were permissive to mv4Int-mediated site-specific recombination. These results show that the recombination reaction can be specifically reprogrammed toward target sites that meet the requirements of the degenerate consensus sequence and differ from the attB_mv4 site at more than 50% of positions, even if recombination efficiency remains very low and is detectable only by PCR amplification. 
It seems reasonable to consider that such low recombination efficiency could be explained by the poor ranking of the B and B' target sites in the enrichment experiment (FIG. 7A): for example, although they match the less degenerate patterns, the B and B' motifs of the lacZ2 site were ranked only 165th and 477th, respectively (FIG. 7B).
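A minimal sketch of such an in silico search is shown below. It assumes that a candidate site is a 21-bp BOB' word whose left 7 bp match the attB left pattern 5'-DDBRRRW-3' and whose right 7 bp match 5'-DYYYVHB-3' (FIG. 6E), with any 7-bp overlap, searched on both strands. The Inventors' actual search criteria and software are not described here, so this sketch is not expected to reproduce the 7959 sites reported; the function names are illustrative.

```python
import re

# Regular-expression classes for the IUPAC codes used in the patterns.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
    "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
    "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def to_regex(pattern: str) -> str:
    return "".join(IUPAC_RE[code] for code in pattern.upper())


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(genome: str, left: str = "DDBRRRW", right: str = "DYYYVHB",
         overlap_len: int = 7) -> list[tuple[int, str, str]]:
    """Find BOB'-like sites on both strands (assumed search criteria).

    Returns sorted (0-based forward-strand start, strand, site as read on
    that strand) tuples. The zero-width lookahead reports overlapping sites.
    """
    site_len = len(left) + overlap_len + len(right)
    site_re = re.compile(
        "(?=(" + to_regex(left) + "[ACGT]{%d}" % overlap_len
        + to_regex(right) + "))")
    fwd = genome.upper()
    rev = revcomp(fwd)
    hits = [(m.start(), "+", m.group(1)) for m in site_re.finditer(fwd)]
    # A hit at index i of the reverse complement covers forward positions
    # [len - i - site_len, len - i).
    hits += [(len(fwd) - m.start() - site_len, "-", m.group(1))
             for m in site_re.finditer(rev)]
    return sorted(hits)


if __name__ == "__main__":
    attB_wt = "TTCAAATCCTGTACTCTCCTT"  # SEQ ID No 6
    print(scan("GGGGG" + attB_wt + "GGGGG"))
```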

    [0135] Finally, as two nucleotides outside the consensus patterns appeared detrimental (see above), the Inventors determined whether target sequences containing a single nucleotide outside the consensus patterns would still be productive in recombination. For that purpose, two nucleotide positions in the B site were independently replaced by forbidden nucleotides (FIG. 13A) and tested for in vitro recombination (FIG. 13B). In all cases, recombined product was observed at a significant level, indicating that a single nucleotide outside the consensus pattern is not sufficient to impair recombination, probably through compensatory effects that are difficult to predict.
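Designing such substrates amounts to enumerating, for each position, the bases that the consensus excludes. The sketch below assumes that the B site is the native 5'-TTCAAAT-3' (first 7 bp of SEQ ID No 6) scored against 5'-DDBRRRW-3'. It lists every single-substitution candidate; which two positions and bases were actually tested (FIG. 13A) is not specified here, and the function name is illustrative.

```python
# IUPAC nucleotide codes: each letter stands for the set of bases listed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def single_forbidden_variants(site: str, pattern: str) -> list[tuple[int, str, str]]:
    """Every variant of `site` carrying exactly one base excluded by `pattern`.

    Returns (1-based position, forbidden base, variant sequence) tuples;
    `site` itself is expected to match the pattern.
    """
    if len(site) != len(pattern):
        raise ValueError("site and pattern must have the same length")
    variants = []
    for i, code in enumerate(pattern.upper()):
        for base in "ACGT":
            if base not in IUPAC[code]:
                variants.append((i + 1, base, site[:i] + base + site[i + 1:]))
    return variants


if __name__ == "__main__":
    candidates = single_forbidden_variants("TTCAAAT", "DDBRRRW")
    print(len(candidates))   # 11
    print(candidates[0])     # (1, 'C', 'CTCAAAT')
```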

    TABLE 5 Strains
    Strain | Relevant characteristics | Reference
    E. coli BL21(DE3) | F− ompT gal dcm lon hsdSB(rB− mB−) λ(DE3 [lacI lacUV5-T7p07 ind1 sam7 nin5]) [malB+]K-12(λS) | (Studier et al., 1990)
    E. coli EC1000 | E. coli EC101, repA(pWV01) glgB::repA(pWV01), KanR | (Leenhouts et al., 1996)
    E. coli NEB5 | F− endA1 glnV44 thi-1 recA1 relA1 gyrA96 deoR nupG purB20 φ80dlacZΔM15 Δ(lacZYA-argF)U169, hsdR17(rK− mK+) | New England Biolabs
    E. coli EPI300 | F− mcrA Δ(mrr-hsdRMS-mcrBC) φ80dlacZΔM15 ΔlacX74 recA1 endA1 araD139 Δ(ara, leu)7697 galU galK rpsL (StrR) nupG trfA dhfr | Epicentre
    E. coli NEB5-RepA | Strain NEB5 glgB::repA(pWV01), KanR | This study
    L. lactis MG1363 | Wild-type strain; plasmid-free derivative of strain NCDO712 | (Gasson, 1983)

    TABLE 6 Plasmids
    Name | Relevant properties | Reference
    pBSattB | pBS::attB_mv4; AmpR | (Dupont et al., 1995)
    pMC1 | pRC1::mv4Int-attP_mv4; ErmR | (Dupont et al., 1995)
    pET-Int | pET15b::mv4Int; AmpR | (Auvray et al., 1999a)
    pCC1Fos | ori F-factor and oriV; CmR | Epicentre
    pMET302 | pMC1::core attP tRNA-Ser L. acidophilus; ErmR | This study
    pMET303 | pMC1::core attP artificial 2; ErmR | This study
    pMET304 | pMC1::core attP tRNA-Ser L. mesenteroides; ErmR | This study
    pMET305 | pMC1::core attP tRNA-Ser E. faecalis; ErmR | This study
    pMET306 | pMC1::core attP tRNA-Ser L. lactis; ErmR | This study
    pMET307 | pMC1::core attP tRNA-Ser S. mutans; ErmR | This study
    pMET308 | pMC1::core attP tRNA-Ser L. reuteri; ErmR | This study
    pMET309 | pMC1::core attP artificial 1; ErmR | This study
    pMET310 | pMC1::core attP L. sakei; ErmR | This study
    pMET311 | pMC1 IS10::mv4Int; ErmR | This study
    pMET320 | pBSattB::tRNA-Ser L. acidophilus; AmpR | This study
    pMET321 | pBSattB::artificial 2; AmpR | This study
    pMET322 | pBSattB::tRNA-Ser E. faecalis; AmpR | This study
    pMET323 | pBSattB::tRNA-Ser L. lactis; AmpR | This study
    pMET324 | pBSattB::tRNA-Ser S. mutans; AmpR | This study
    pMET325 | pBSattB::tRNA-Ser L. mesenteroides; AmpR | This study
    pMET326 | pBSattB::tRNA-Ser L. reuteri; AmpR | This study
    pMET327 | pBSattB::tRNA-Ser L. sakei; AmpR | This study
    pMET328 | pBSattB::artificial 1; AmpR | This study
    pMET340 | pMC1 colE1::ori pWV01; ErmR | This study
    pMET348 | pCC1Fos::attB_mv4; CmR | This study
    pMET349 | pCC1Fos::attB_mv4 23 bp; CmR | This study
    pMET357 | pCC1Fos::attB_mv4 16 bp; CmR | This study
    pMET359 | pMET340 + bla; ErmR; AmpR | This study
    pMET364 | pMET348::overlap lacZ1; CmR | This study
    pMET365 | pMET348::overlap lacZ3; CmR | This study
    pMET366 | pMET348::attB lacZ1; CmR | This study
    pMET367 | pMET348::attB lacZ2; CmR | This study
    pMET368 | pMET348::attB lacZ3; CmR | This study
    pMET370 | pMET348::overlap lacZ2; CmR | This study
    pMET371 | pMET359::overlap lacZ1; ErmR, AmpR | This study
    pMET372 | pMET359::overlap lacZ2; ErmR, AmpR | This study
    pMET373 | pMET359::overlap lacZ3; ErmR, AmpR | This study
    pMET376 | pMET359::core attP tRNA-Ser E. coli; ErmR, AmpR | This study

    TABLE 7 List of oligonucleotides
    A. Oligonucleotides used for cloning.
    Primer name | Sequence (5'-3')
    SDM-attB4-F | GACCTGTACTCTCCTTAATAAGGTCAAATG (SEQ ID No 7)
    SDM-attB4-R | GACCTTATTAAGGAGAGTACAGGTCTTGAAC (SEQ ID No 8)
    SDM-attB5-F | CCTGTACTCTCCTGAATAAGGTCAAATGGTATC (SEQ ID No 9)
    SDM-attB5-R | GACCTTATTAAGGAGAGTACAGGTCTTGAAC (SEQ ID No 10)
    SDM-LbsakattB-F | CCTGTACTCTCCTTTTAAAGGTCAAATGGTATC (SEQ ID No 11)
    SDM-LbsakattB-R | CATTTGACCTTTAAAAGGAGAGTACAGGATTTG (SEQ ID No 12)
    SDM-LbacidoattB-F | CCTGTACACTCCTTTTTAAGGTCAAATGGTATC (SEQ ID No 13)
    SDM-LbacidoattB-R | CATTTGACCTTAAAAAGGAGTGTACAGGATTTG (SEQ ID No 14)
    SDM-LbreutattB-F | CCCCTTGTCTCCTTAGAAAGGTCAAATGGTATC (SEQ ID No 15)
    SDM-LbreutattB-R | CTTTCTAAGGAGACAAGGGGATTTGAACCTGCG (SEQ ID No 16)
    SDM-LlactisattB-F | CCCCTTGACTCCTTTTAAAGGTCAAATGGTATCC (SEQ ID No 17)
    SDM-LlactisattB-R | CTTTAAAAGGAGTCAAGGGGATTTGAACCTGCG (SEQ ID No 18)
    SDM-SmutansattB-F | CCCGTCCTTTCCTTAACAAGGTCAAATGGTATC (SEQ ID No 19)
    SDM-SmutansattB-R | CTTGTTAAGGAAAGGACGGGATTTGAACCTGCG (SEQ ID No 20)
    SDM-EfaecattB-F | CCCTTATCCTCCGTACCAAGGTCAAATGGTATCC (SEQ ID No 21)
    SDM-EfaecattB-R | CTTGGTACGGAGGATAAGGGATTTGAACCTGCG (SEQ ID No 22)
    SDM-LnmesentattB-F | CCCGCTAGCTCCTTTATAAGGTCAAATGGTATC (SEQ ID No 23)
    SDM-LnmesentattB-R | CTTATAAAGGAGCTAGCGGGATTTGAACCTGCG (SEQ ID No 24)
    SDM-EcoliattB-F | CCTCTCTCCGCCACTTTAAGGTCAAATGGTATCC (SEQ ID No 25)
    SDM-EcoliattB-R | CTTAAAGTGGCGGAGAGAGGATTTGAACCTGCG (SEQ ID No 26)
    SDMgibsonermAM-F | GAGAATATCGTCAACTGTTTACTAAAAATC (SEQ ID No 27)
    SDMgibsonermAM-R | GTAAACAGTTGACGATATTCTCGATTG (SEQ ID No 28)
    SDMgibsonattP5-F | AAAGAACCTGTACTCTCCTGAATCAAAGCAATAATC (SEQ ID No 29)
    SDMgibsonattP5-R | TTCAGGAGAGTACAGGTTCTTTCAACCATGTTTC (SEQ ID No 30)
    SDMgibsonattPsakei-F | AAAGAACCTGTACTCTCCTTTTACAAAGCAATAATC (SEQ ID No 31)
    SDMgibsonattPsakei-R | AAAAGGAGAGTACAGGTTCTTTCAACCATGTTTC (SEQ ID No 32)
    SDMgibsonattPacido-F | AAAGAACCTGTACACTCCTTTTTCAAAGCAATAATC (SEQ ID No 33)
    SDMgibsonattPacido-R | AAAAGGAGTGTACAGGTTCTTTCAACCATGTTTC (SEQ ID No 34)
    SDMgibsonattPmutans-F | AAAGAACCCGTCCTTTCCTTAACCAAAGCAATAATCCC (SEQ ID No 35)
    SDMgibsonattPmutans-R | TTAAGGAAAGGACGGGTTCTTTCAACCATGTTTC (SEQ ID No 36)
    SDMgibsonattPreuteri-F | AAAGAACCCCTTGTCTCCTTAGACAAAGCAATAATCCC (SEQ ID No 37)
    SDMgibsonattPreuteri-R | CTAAGGAGACAAGGGGTTCTTTCAACCATGTTTC (SEQ ID No 38)
    SDMgibsonattPlactis-F | AAAGAACCCCTTGACTCCTTTTACAAAGCAATAATCCC (SEQ ID No 39)
    SDMgibsonattPlactis-R | AAAAGGAGTCAAGGGGTTCTTTCAACCATGTTTC (SEQ ID No 40)
    SDMgibsonattPfaeca-F | AAAGAACCCTTATCCTCCGTACCCAAAGCAATAATCCC (SEQ ID No 41)
    SDMgibsonattPfaeca-R | GTACGGAGGATAAGGGTTCTTTCAACCATGTTTC (SEQ ID No 42)
    SDMgibsonattPcoli-F | AAAGAACCTCTCTCCGCCACTTTCAAAGCAATAATCCC (SEQ ID No 43)
    SDMgibsonattPcoli-R | AAGTGGCGGAGAGAGGTTCTTTCAACCATGTTTC (SEQ ID No 44)
    SDMgibsonattPmesent-F | AAAGAACCCGCTAGCTCCTTTATCAAAGCAATAATCCC (SEQ ID No 45)
    SDMgibsonattPmesent-R | TAAAGGAGCTAGCGGGTTCTTTCAACCATGTTTC (SEQ ID No 46)
    pBS-antiattB-F | GCTTGGGCTGCAGGA (SEQ ID No 47)
    pBS-antiattB-R | GGAAAGGACATCTAAATCAAATGG (SEQ ID No 48)
    SDM-Lbreuteri2attB-F | TCCATAGAAAGGTCAAATGGTATC (SEQ ID No 49)
    SDM-Lbreuteri2attB-R | GACAAGGGGATTTGAACCTG (SEQ ID No 50)
    oripGh9-F | TTAAATTTATACTGCAATCGGATGC (SEQ ID No 51)
    oripGh9-R | GTATTTTTAATAGCCATGATATAATTACCTTATC (SEQ ID No 52)
    pMC1-sans-colE1-F | TGGAACGAAAACTCACGTTAAG (SEQ ID No 53)
    pMC1-sans-colE1-R | TGATTCTGTGGATAACCGTATTAC (SEQ ID No 54)
    pCC1FOS-F | GTGGGATCCCCGGGTAC (SEQ ID No 55)
    pCC1FOS-R | GTGGGATCCTCTAGAGTCGAC (SEQ ID No 56)
    attB-gibsonpCC1Fos-F | GCAGGTCGACTCTAGAGGATCCCACGAATTCCTGCAGCCCAAGC (SEQ ID No 57)
    attB-gibsonpCC1Fos-R | GAGCTCGGTACCCGGGGATCCCACCCCCATTTGATTTAGATGTCCTTTC (SEQ ID No 58)
    attBWT-F | ATCCTGTACTCTCCTTAAT (SEQ ID No 59)
    attBWT-R | ATTAAGGAGAGTACAGGAT (SEQ ID No 60)
    attB-WTGGTT-F | GGTTCAAATCCTGTACTCTCCTTAAT (SEQ ID No 61)
    attB-WTGGTT-R | ATTAAGGAGAGTACAGGATTTGAACC (SEQ ID No 62)
    pMC1delTOPO-F | AAATCGAAACAGCAAAGAATGG (SEQ ID No 63)
    Bla-F | TGGCACTTTTCGGGGAAATG (SEQ ID No 64)
    Bla-R | GTGCTACAGAGTTCTTGAAGTG (SEQ ID No 65)
    Site1LacZ-Gib-F | AAAGAATTTCGGCTCTCCTTAATCAAAGCAATAATCCC (SEQ ID No 66)
    Site2LacZ-Gib-F | AAAGAATGCCAACTCTCCTTAATCAAAGCAATAATCCC (SEQ ID No 67)
    Site3LacZ-Gib-F | AAAGAAAAGGTTTTCTCCTTAATCAAAGCAATAATCCC (SEQ ID No 68)
    Site1LacZ-Gib-R | TTAAGGAGAGCCGAAATTCTTTCAACCATGTTTCTGGAG (SEQ ID No 69)
    Site2LacZ-Gib-R | TTAAGGAGAGTTGGCATTCTTTCAACCATGTTTCTGGAG (SEQ ID No 70)
    Site3LacZ-Gib-R | TTAAGGAGAAAACCTTTTCTTTCAACCATGTTTCTGGAG (SEQ ID No 71)
    AmpGib-F | GAATTATGCAGTGCTGCCATAAC (SEQ ID No 72)
    AmpGib-R | GTTATGGCAGCACTGCATAATTC (SEQ ID No 73)
    OverlapSite1lacZ-R | AAGGAGAGCCGAAAATTTGAACCTGCGCACC (SEQ ID No 74)
    OverlapSite2lacZ-R | AAGGAGAGTTGGCAATTTGAACCTGCGCACC (SEQ ID No 75)
    OverlapSite3lacZ-R | AAGGAGAAAACCTTATTTGAACCTGCGCACC (SEQ ID No 76)
    OverlapSite1lacZ-F | TCAAATTTTCGGCTCTCCTTAATAAGGTCAAATGGTATC (SEQ ID No 77)
    OverlapSite2LacZ-F | TCAAATTGCCAACTCTCCTTAATAAGGTCAAATGGTATC (SEQ ID No 78)
    OverlapSite3lacZ-F | TCAAATAAGGTTTTCTCCTTAATAAGGTCAAATGGTATC (SEQ ID No 79)
    attBsite1LacZ-F | GAAATCCCGAACCTGCGCACCAATTCAACATTG (SEQ ID No 80)
    attBsite1LacZ-R | GGCGCTCCACAATAAGGTCAAATGGTATCCCTATAGG (SEQ ID No 81)
    attBsite2LacZ-F | GGCAATTTAACCCTGCGCACCAATTCAACATTG (SEQ ID No 82)
    attBsite2LacZ-R | AACGCTTATTAATAAGGTCAAATGGTATCCCTATAGG (SEQ ID No 83)
    attBsite3LacZ-F | CCTTATTTATCCCTGCGCACCAATTCAACATTG (SEQ ID No 84)
    attBsite3LacZ-R | TTTTCCCCTGAATAAGGTCAAATGGTATCCCTATAGG (SEQ ID No 85)
    B. Oligonucleotides used for randomized DNA library construction and sequencing.
Primername Sequence(5-3) SeqbanqueattB-F TGCCTGCAGGTCGACTCTAG(SEQIDNo86) SeqbanqueattL-R ACGCTAATGCCATCTATTAACTAGC(SEQIDNo87) SeqbanqueattR-F GAAACAACCAGAAACGCTTTTTAG(SEQIDNo88) SeqbanqueattB-R GGCGAATTCGAGCTCGGTAC(SEQIDNo89) SeqbanqueattP-R GGTCGACGGTATCGATAAGC(SEQIDNo90) SeqbanqueattP-F GCAGGCGGAATGTTGAAAGAG(SEQIDNo91) attP-N10-N19-R GTTCTGGAGGTTTCGAATCTTG(SEQIDNo92) attP-N10-N19-Vf-F GTGATAATCGCCTGCCCGTTTGAC(SEQIDNo93) attPlibrary-F CAAGATTCGAAACCTCCAGAAC(SEQIDNo94) attPlibrary-R GTCAAACGGGCAGGCGATTATCAC(SEQIDNo95) attBlibrary-F GCAGGTCGACTCTAGAGGATCCCACAAGCTTCGAAATCCGCCGA ACCAATG(SEQIDNo96) attBlibrary-R GAGCTCGGTACCCGGGGATCCCACGAATTCACTACTGGCTACTT TGAAATACTTCC(SEQIDNo97) attB-gibsonpCC1Fos-F GCAGGTCGACTCTAGAGGATCCCACGAATTCCTGCAGCCCAAGC (SEQIDNo57) attB-gibsonpCC1Fos-R GAGCTCGGTACCCGGGGATCCCACCCCCATTTGATTTAGATGTC CTTTC(SEQIDNo58) attBLib9 CGAAATCCGCCGAACCAATGTTGANNNNNNNNGCAGGTTCANN NNNTGTACTCTCCNNNNNAAGGTCAAATNNNNNNNNTATAGGA AGTATTTCAAAGTAGCCAGTAGT(SEQIDNo98) attBLib7 CGAAATCCGCCGAACCAATGTTGANNNNNNNCGCAGGTTCAAA TNNNNNNNTCTCCTTAATAAGGTCAAATGNNNNNNNTATAGGA AGTATTTCAAAGTAGCCAGTAGT(SEQIDNo99) attBLib8 CGAAATCCGCCGAACCAATGTTGANNNNNNNCGCAGGTTCAAA TCCTGTACNNNNNNNAATAAGGTCAAATGNNNNNNNTATAGGA AGTATTTCAAAGTAGCCAGTAGT(SEQIDNo100) attPLib2 CAAGATTCGAAACCTCCAGAACCCTCCAGAAACATGGTTGAAAG AANNNNNNNTCTCCTTAATCAAAGCAATAATCCCCGAGAAATCA ACATTCTCGGGGATTATTTTTGTTTTTAACTAGAAAATAACTAGA AAGAGCTAGTTAATAGATGGCATTAGCGTGATAATCGCCTGCCC GTTTGAC(SEQIDNo101) attPLib1 CAAGATTCGAAACCTCCAGAACCCTCCAGAAACATGGTTGAAAG AACCTGTACNNNNNNNAATCAAAGCAATAATCCCCGAGAAATC AACATTCTCGGGGATTATTTTTGTTTTTAACTAGAAAATAACTAG AAAGAGCTAGTTAATAGATGGCATTAGCGTGATAATCGCCTGCC CGTTTGAC(SEQIDNo102) attBLib5 CGAAATCCGCCGAACCAATGTTGAATTGGTGNNNNNNNNNNAA TCCTGTACTCTCCTTAATAAGGTCAAATGGTATCCCTATAGGAAG TATTTCAAAGTAGCCAGTAGT(SEQIDNo103) attPLib4 CAAGATTCGAAACCTCCAGAACCCTCCAGAAACNNNNNNNNNN GAACCTGTACTCTCCTTAATCAAAGCAATAATCCCCGAGAAATC AACATTCTCGGGGATTATTTTTGTTTTTAACTAGAAAATAACTAG AAAGAGCTAGTTAATAGATGGCATTAGCGTGATAATCGCCTGCC 
CGTTTGAC(SEQIDNo104) attBLib6 CGAAATCCGCCGAACCAATGTTGAATTGGTGCGCNNNNNNNNN NCCTGTACTCTCCTTAATAAGGTCAAATGGTATCCCTATAGGAA GTATTTCAAAGTAGCCAGTAGT(SEQIDNo105) attPLib3 CAAGATTCGAAACCTCCAGAACCCTCCAGAAACATGGTTNNNNN NNCCTGTACTCTCCTTAATCAAAGCAATAATCCCCGAGAAATCA ACATTCTCGGGGATTATTTTTGTTTTTAACTAGAAAATAACTAGA AAGAGCTAGTTAATAGATGGCATTAGCGTGATAATCGCCTGCCC GTTTGAC(SEQIDNo106) C.OligonucleotidesusedEMSA. Primername Sequence(5-3) COCWT-F GGTATTGGAAAGAACCTGTACTCTCCTTGCGTAAC(SEQID No107) COCWT-Cy3-R GTTACGCAAGGAGAGTACAGGTTCTTTCCAATACC(SEQID No108) P12WT-28bp-F GTTTTTAACTAGAAAATAACTAGAATTC(SEQIDNo109) P12WT-28bp-R GAATTCTAGTTATTTTCTAGTTAAAAAC(SEQIDNo110) P12WT-40bp-F CACGTCGTTTTTAACTAGAAAATAACTAGAATTCCACGTC (SEQIDNo111) P12WT-40bp-R GACGTGGAATTCTAGTTATTTTCTAGTTAAAAACGACGTG (SEQIDNo112) P12WTdelP3-F TTTTTAACTAGAAAATAACTAGAAAGCGACCGAGCGGTCG (SEQIDNo113) P12WTdelP3-R CGACCGCTCGGTCGCTTTCTAGTTATTTTCTAGTTAAAAA (SEQIDNo114) P123WT-F TTTTTAACTAGAAAATAACTAGAAAGAGCTAGTTAATAGA (SEQIDNo115) P123WT-R TCTATTAACTAGCTCTTTCTAGTTATTTTCTAGTTAAAAA (SEQIDNo116) P1WT-F TTTTTAACTAGAACGACCGAGCGGAGGCGGCACGCTGTCG (SEQIDNo117) P1WT-R CGACAGCGTGCCGCCTCCGCTCGGTCGTTCTAGTTAAAAA (SEQIDNo118) P2WT-F GCCGACCGAGCGGAATAACTAGAAAGGCGGCACGCTGTCG (SEQIDNo119) P2WT-R CGACAGCGTGCCGCCTTTCTAGTTATTCCGCTCGGTCGGC (SEQIDNo120) P3WT-F GCCGACCGAGCGGGCGGCACGCTGAGAGCTAGTTAATAGA (SEQIDNo121) P3WT-R TCTATTAACTAGCTCTCAGCGTGCCGCCCGCTCGGTCGGC (SEQIDNo122) P23WT-F GCCGACCGAGCGGAATAACTAGAAAGAGCTAGTTAATAGA (SEQIDNo123) P23WT-R TCTATTAACTAGCTCTTTCTAGTTATTCCGCTCGGTCGGC (SEQIDNo124) P13WT-F TTTTTAACTAGAACGACCGAGCGGAGAGCTAGTTAATAGA (SEQIDNo125) P13WT-R TCTATTAACTAGCTCTCCGCTCGGTCGTTCTAGTTAAAAA (SEQIDNo126) P12WT-F CACGTCGTGATCAACTAGATTTTTAACTAGAAACCACGTC (SEQIDNo127) P12WT-R GACGTGGTTTCTAGTTAAAAATCTAGTTGATCACGACGTG (SEQIDNo128) P1WT-F CACGTCGTGATCAACTAGATTCGACCGAGCGGACCACGTC (SEQIDNo129) P1WT-R GACGTGGTCCGCTCGGTCGAATCTAGTTGATCACGACGTG (SEQIDNo130) P2WT-F CACGTCGTGCGACCGAGCGGTTTTAACTAGAAACCACGTC (SEQIDNo131) P2WT-R 
GACGTGGTTTCTAGTTAAAACCGCTCGGTCGCACGACGTG (SEQIDNo132) Nositebras-F CACGTCGTGCGACCGAGCGGTGCGGCACGCTGACCACGTC (SEQIDNo133) Nositebras-R GACGTGGTCAGCGTGCCGCACCGCTCGGTCGCACGACGTG (SEQIDNo134) P12WT-40bp-R- CACGTCGTTTTTAACTAGAAAATAACTAGAATTCCACGTC Cy3 (SEQIDNo135) COCWT-R GTTACGCAAGGAGAGTACAGGTTCTTTCCAATACC(SEQID No136) BOBWT-F GGTATTGTTCAAATCCTGTACTCTCCTTGCGTAAC(SEQID No137) BOBWT-R GTTACGCAAGGAGAGTACAGGATTTGAACAATACC(SEQ IDNo138) BOBWT-R-Cy3 GTTACGCAAGGAGAGTACAGGATTTGAACAATACC(SEQ IDNo139) XOC-F GGTATTGGAACCCGCCTGTACTCTCCTTGCGTAAC(SEQID No140) XOC-R-Cy3 GTTACGCAAGGAGAGTACAGGCGGGTTCCAATACC(SEQID No141) COX-F GGTATTGGAAAGAACCTGTACCGGGCTTGCGTAAC(SEQID No142) COX-R-Cy3 GTTACGCAAGCCCGGTACAGGTTCTTTCCAATACC(SEQID No143) BOX-F GGTATTGTTCAAATCCTGTACCGGGCTTGCGTAAC(SEQID No144) BOX-R-Cy3 GTTACGCAAGCCCGGTACAGGATTTGAACAATACC(SEQID No145) P12-9pb-F CACGTCGTGCTCAACTAGAGTCTTAACTAGAGACCACGTC (SEQIDNo146) P12-9pb-R GACGTGGTCTCTAGTTAAGACTCTAGTTGAGCACGACGTG (SEQIDNo147) P12-6pb-F CACGTCGTGCCGAACTAGGGTCCGAACTAGGGACCACGTC (SEQIDNo148) P12-6pb-R GACGTGGTCCCTAGTTCGGACCCTAGTTCGGCACGACGTG (SEQIDNo149) P23forward-F GCCGACCGAGCGGAATAACTAGAAAGATTAACTAGCTAGA (SEQIDNo150) P23forward-R TCTAGCTAGTTAATCTTTCTAGTTATTCCGCTCGGTCGGC (SEQIDNo151) P12reverse-F TTTTTAACTAGAATTCTAGTTATTAGCGACCGAGCGGTCG (SEQIDNo152) P12reverse-R CGACCGCTCGGTCGCTAATAACTAGAATTCTAGTTAAAAA (SEQIDNo153) D.Oligonucleotidesusedfor invitrofluorescentrecombination. Primername Sequence LbbulgattB-F GAATTCCTGCAGCCCAAGC(SEQIDNo154) Cy3-New-attB-R GATGTAGATAATTTTTGGGCCAAGG(SEQIDNo155) E.Oligonucleotidesusedtovalidate invivointegrationintotRNASER Primername Sequence(5-3) ARNtSERcoli-F ACAGTGACGATCTAACCCTTC(SEQIDNo156) ARNtSERcoli-R TGACTAATTTGCTTTGTTCCTG(SEQIDNo157) ARNtSERlactis-F CATCATTTTTCTTCTTTCAAATTAATATAAATGC(SEQID No158) ARNtSERlactis-R CAGGAGGAAAAGGAGTAAGC(SEQIDNo159) attL-R ACGCTAATGCCATCTATTAACTAGC(SEQIDNo160) Bold nucleotides indicate the localization of arm or core-binding sites as well as the overlap sequence. 
Underlined nucleotides are nucleotides that differ from the WT sites.
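The paired -F/-R EMSA oligonucleotides in part C (for example BOBWT-F/BOBWT-R or COCWT-F/COCWT-Cy3-R) appear to be designed to anneal into fully matched double-stranded probes. Assuming that, a short Python sketch can confirm that each reverse oligonucleotide is the exact reverse complement of its forward partner. The sequences are copied from Table 7; the helper names `reverse_complement` and `is_perfect_duplex` are hypothetical and not part of the described protocol.

```python
# Minimal sketch: check that paired EMSA oligonucleotides from Table 7 (part C)
# are exact reverse complements, i.e. can anneal into a fully matched duplex.
# Assumption: each "-F"/"-R" pair is annealed; helper names are illustrative.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_perfect_duplex(forward: str, reverse: str) -> bool:
    """True if `reverse` is the exact reverse complement of `forward`."""
    return reverse_complement(forward) == reverse.upper()


# Forward and reverse strands as listed in Table 7 (SEQ ID numbers in comments).
PAIRS = {
    "BOBWT": ("GGTATTGTTCAAATCCTGTACTCTCCTTGCGTAAC",   # SEQ ID No 137
              "GTTACGCAAGGAGAGTACAGGATTTGAACAATACC"),  # SEQ ID No 138
    "COCWT": ("GGTATTGGAAAGAACCTGTACTCTCCTTGCGTAAC",   # SEQ ID No 107
              "GTTACGCAAGGAGAGTACAGGTTCTTTCCAATACC"),  # SEQ ID No 108
    "XOC": ("GGTATTGGAACCCGCCTGTACTCTCCTTGCGTAAC",     # SEQ ID No 140
            "GTTACGCAAGGAGAGTACAGGCGGGTTCCAATACC"),    # SEQ ID No 141
    "COX": ("GGTATTGGAAAGAACCTGTACCGGGCTTGCGTAAC",     # SEQ ID No 142
            "GTTACGCAAGCCCGGTACAGGTTCTTTCCAATACC"),    # SEQ ID No 143
}

if __name__ == "__main__":
    for name, (fwd, rev) in PAIRS.items():
        print(name, is_perfect_duplex(fwd, rev))
```

Pairing a forward strand with the reverse strand of a different probe (for instance BOBWT-F with COCWT-R) returns `False`, which makes the check a quick guard against mixing up tubes of near-identical oligonucleotides.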

    TABLE 8 NGS data of the randomized DNA libraries
    DNA library (core-binding region) | Motif number before recombination (attB or attP) | Motif number after recombination (attL or attR) Rep1 | Rep2 | Motif occurrence after recombination (min-max) Rep1 | Rep2 | Motif number representing 50% of the read count Rep1 | Rep2
    attB Lib6 (B) | 16384 (100%) | 5459 (33.32%) | 6955 (42.45%) | 1-14405 | 1-11737 | 196 (3.59%) | 199 (2.86%)
    attB Lib8 (B) | 16312 (99.56%) | 9497 (57.97%) | 10085 (61.55%) | 1-49310 | 1-69525 | 371 (3.91%) | 460 (4.56%)
    attP Lib3 (C) | 16340 (99.73%) | 10416 (63.57%) | 11781 (71.91%) | 1-2139 | 1-3925 | 784 (7.53%) | 799 (6.78%)
    attP Lib1 (C) | 16315 (99.58%) | 7549 (46.08%) | 7247 (44.23%) | 1-16157 | 1-46672 | 254 (3.36%) | 167 (2.30%)
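The percentages in Table 8 can be re-derived from the counts: 16,384 equals 4^7, the number of distinct 7-nt motifs, and the before- and after-recombination percentages are relative to that total (5,459 / 16,384 ≈ 33.32%), while the 50%-read-count percentages are relative to the motif number after recombination (196 / 5,459 ≈ 3.59%). The Python sketch below reproduces these figures and shows one way the per-motif summary could be computed from read counts. The definition used for "motifs representing 50% of the read count" (the smallest set of most abundant motifs whose reads reach half the total) is an assumption, and `summarize_library` is a hypothetical helper.

```python
# Minimal sketch: recompute Table 8-style statistics from per-motif read counts.
# Assumptions: motifs are 7 nt long (4**7 = 16384 possibilities) and the
# "50% of the read count" column counts the most abundant motifs needed to
# reach half of all reads. Function names are illustrative only.
from collections.abc import Mapping

MOTIF_LENGTH = 7
MOTIF_SPACE = 4 ** MOTIF_LENGTH  # 16384 possible 7-nt motifs


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in Table 8."""
    return round(100.0 * part / whole, 2)


def summarize_library(counts: Mapping[str, int]) -> dict:
    """Summarize read counts per motif observed after recombination."""
    observed = sorted((c for c in counts.values() if c > 0), reverse=True)
    if not observed:
        raise ValueError("no motif has a positive read count")
    half = sum(observed) / 2
    running = 0
    n_half = 0
    for c in observed:  # walk from the most to the least abundant motif
        running += c
        n_half += 1
        if running >= half:
            break
    return {
        "motifs": len(observed),
        "motifs_pct_of_space": pct(len(observed), MOTIF_SPACE),
        "min_occurrence": observed[-1],
        "max_occurrence": observed[0],
        "motifs_for_half_reads": n_half,
        "half_reads_pct": pct(n_half, len(observed)),
    }


if __name__ == "__main__":
    # Re-derive two published percentages (attB Lib6, Rep1) from Table 8 counts.
    print(pct(5459, MOTIF_SPACE))  # 33.32
    print(pct(196, 5459))          # 3.59
```

Because only the motif counts are needed, the same function applies unchanged to the attB (B) and attP (C) libraries and to each replicate.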

    REFERENCES

    [0136] Afgan, E., Baker, D., Batut, B., van den Beek, M., Bouvier, D., Cech, M., et al. (2018) The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 46: W537-W544 https://doi.org/10.1093/nar/gky379. Accessed Feb. 2, 2022.
    [0137] Aihara, H., Kwon, H. J., Nunes-Duby, S. E., Landy, A., and Ellenberger, T. (2003) A conformational switch controls the DNA cleavage activity of lambda integrase. Mol Cell 12: 187-198.
    [0138] Auvray, F., Coddeville, M., Espagno, G., and Ritzenthaler, P. (1999a) Integrative recombination of Lactobacillus delbrueckii bacteriophage mv4: functional analysis of the reaction and structure of the attP site. Mol Gen Genet 262: 355-366.
    [0139] Auvray, F., Coddeville, M., Ordonez, R. C., and Ritzenthaler, P. (1999b) Unusual structure of the attB site of the site-specific recombination system of Lactobacillus delbrueckii bacteriophage mv4. J Bacteriol 181: 7385-7389.
    [0140] Auvray, F., Coddeville, M., Ritzenthaler, P., and Dupont, L. (1997) Plasmid integration in a wide range of bacteria mediated by the integrase of Lactobacillus delbrueckii bacteriophage mv4. J Bacteriol 179: 1837-1845.
    [0141] Bauer, C. E., Gardner, J. F., and Gumport, R. I. (1985) Extent of sequence homology required for bacteriophage lambda site-specific recombination. J Mol Biol 181: 187-197.
    [0142] Biswas, T., Aihara, H., Radman-Livaja, M., Filman, D., Landy, A., and Ellenberger, T. (2005) A structural basis for allosteric control of DNA recombination by lambda integrase. Nature 435: 1059-1066 https://www.nature.com/articles/nature03657. Accessed Aug. 10, 2021.
    [0143] Cluzel, P.-J., Veaux, M., Rousseau, M., and Accolas, J.-P. (1987) Evidence for temperate bacteriophages in two strains of Lactobacillus bulgaricus. J Dairy Res 54: 397-405.
    [0144] Coddeville, M., and Ritzenthaler, P. (2010) Control of directionality in bacteriophage mv4 site-specific recombination: functional analysis of the Xis factor. J Bacteriol 192: 624-635.
    [0145] Coddeville, M., Spinella, J.-F., Cassart, P., Girault, G., Daveran-Mingot, M.-L., Le Bourgeois, P., and Ritzenthaler, P. (2014a) Bacteriophage mv4 site-specific recombination: the central role of the P2 mv4Int-binding site. J Virol 88: 1839-1842.
    [0146] Coddeville, M., Spinella, J.-F., Cassart, P., Girault, G., Daveran-Mingot, M.-L., Le Bourgeois, P., and Ritzenthaler, P. (2014b) Bacteriophage mv4 site-specific recombination: the central role of the P2 mv4Int-binding site. J Virol 88: 1839-1842.
    [0147] Craig, N. L., and Nash, H. A. (1983) The mechanism of phage lambda site-specific recombination: site-specific breakage of DNA by Int topoisomerase. Cell 35: 795-803.
    [0148] Crooks, G. E., Hon, G., Chandonia, J.-M., and Brenner, S. E. (2004) WebLogo: a sequence logo generator. Genome Res 14: 1188-1190.
    [0149] Datsenko, K. A., and Wanner, B. L. (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97: 6640-6645.
    [0150] Dupont, L., Boizet-Bonhoure, B., Coddeville, M., Auvray, F., and Ritzenthaler, P. (1995a) Characterization of genetic elements required for site-specific integration of Lactobacillus delbrueckii subsp. bulgaricus bacteriophage mv4 and construction of an integration-proficient vector for Lactobacillus plantarum. J Bacteriol 177: 586-595.
    [0151] Dupont, L., Boizet-Bonhoure, B., Coddeville, M., Auvray, F., and Ritzenthaler, P. (1995b) Characterization of genetic elements required for site-specific integration of Lactobacillus delbrueckii subsp. bulgaricus bacteriophage mv4 and construction of an integration-proficient vector for Lactobacillus plantarum. J Bacteriol 177: 586-595.
    [0152] Gasson, M. J. (1983) Plasmid complements of Streptococcus lactis NCDO 712 and other lactic streptococci after protoplast-induced curing. J Bacteriol 154: 1-9.
    [0153] Gibb, B., Gupta, K., Ghosh, K., Sharp, R., Chen, J., and Van Duyne, G. D. (2010) Requirements for catalysis in the Cre recombinase active site. Nucleic Acids Res 38: 5817-5832.
    [0154] Gibson, D. G., Young, L., Chuang, R.-Y., Venter, J. C., Hutchison, C. A., and Smith, H. O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6: 343-345.
    [0155] Grindley, N. D. F., Whiteson, K. L., and Rice, P. A. (2006) Mechanisms of site-specific recombination. Annu Rev Biochem 75: 567-605.
    [0156] Hauser, M. A., and Scocca, J. J. (1992) Site-specific integration of the Haemophilus influenzae bacteriophage HP1. Identification of the points of recombinational strand exchange and the limits of the host attachment site. J Biol Chem 267: 6859-6864.
    [0157] Hoess, R. H., Wierzbicki, A., and Abremski, K. (1986) The role of the loxP spacer region in P1 site-specific recombination. Nucleic Acids Res 14: 2287-2300.
    [0158] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021a) Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589 http://www.nature.com/articles/s41586-021-03819-2. Accessed Jan. 16, 2022.
    [0159] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021b) Highly accurate protein structure prediction with AlphaFold. Nature 596: 583-589.
    [0160] Kolot, M., Malchin, N., Elias, A., Gritsenko, N., and Yagil, E. (2015) Site promiscuity of coliphage HK022 integrase as a tool for gene therapy. Gene Ther 22: 521-527.
    [0161] Kolot, M., and Yagil, E. (1994) Position and direction of strand exchange in bacteriophage HK022 integration. Mol Gen Genet 245: 623-627.
    [0162] Le Bourgeois, P., Langella, P., and Ritzenthaler, P. (2000) Electrotransformation of Lactococcus lactis. In Electrotransformation of Bacteria. Eynard, N., and Teissié, J. (eds). Springer, Berlin, Heidelberg. pp. 56-65 https://doi.org/10.1007/978-3-662-04305-9_6. Accessed Sep. 7, 2021.
    [0163] Lee, G., and Saito, I. (1998) Role of nucleotide sequences of loxP spacer region in Cre-mediated recombination. Gene 216: 55-65.
    [0164] Leenhouts, K., Buist, G., Bolhuis, A., Berge, A. ten, Kiel, J., Mierau, I., et al. (1996) A general system for generating unlabelled gene replacements in bacterial chromosomes. Mol Gen Genet 253: 217-224 https://doi.org/10.1007/s004380050315. Accessed Sep. 7, 2021.
    [0165] Malanowska, K., Salyers, A. A., and Gardner, J. F. (2006) Characterization of a conjugative transposon integrase, IntDOT. Mol Microbiol 60: 1228-1240.
    [0166] Mata, M., Trautwetter, A., Luthaud, G., and Ritzenthaler, P. (1986) Thirteen virulent and temperate bacteriophages of Lactobacillus bulgaricus and Lactobacillus lactis belong to a single DNA homology group. Appl Environ Microbiol 52: 812-818.
    [0167] McLeod, M., Craft, S., and Broach, J. R. (1986) Identification of the crossover site during FLP-mediated recombination in the Saccharomyces cerevisiae plasmid 2 microns circle. Mol Cell Biol 6: 3357-3367.
    [0168] Nunes-Duby, S. E., Kwon, H. J., Tirumalai, R. S., Ellenberger, T., and Landy, A. (1998) Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res 26: 391-406.
    [0169] Nunes-Duby, S. E., Kwon, H. J., Tirumalai, R. S., Ellenberger, T., and Landy, A. (1998) Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res 26: 391-406.
    [0170] Peña, C. E., Stoner, J. E., and Hatfull, G. F. (1996) Positions of strand exchange in mycobacteriophage L5 integration and characterization of the attB site. J Bacteriol 178: 5533-5536.
    [0171] Rice, P., Longden, I., and Bleasby, A. (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet 16: 276-277 http://www.cell.com/trends/genetics/abstract/S0168-9525(00)02024-2. Accessed Feb. 2, 2022.
    [0172] Sarkar, D., Radman-Livaja, M., and Landy, A. (2001) The small DNA binding domain of lambda integrase is a context-sensitive modulator of recombinase functions. EMBO J 20: 1203-1212.
    [0173] Schneider, T. D., and Stephens, R. M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res 18: 6097-6100.
    [0174] Sheren, J., Langer, S. J., and Leinwand, L. A. (2007) A randomized library approach to identifying functional lox site domains for the Cre recombinase. Nucleic Acids Res 35: 5464-5473.
    [0175] Smith-Mungo, L., Chan, I. T., and Landy, A. (1994) Structure of the P22 att site. Conservation and divergence in the lambda motif of recombinogenic complexes. J Biol Chem 269: 20798-20805.
    [0176] Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. In Methods in Enzymology. Academic Press, pp. 60-89 https://www.sciencedirect.com/science/article/pii/007668799085008C. Accessed May 30, 2022.
    [0177] Weisberg, R. A., Enquist, L. W., Foeller, C., and Landy, A. (1983) Role for DNA homology in site-specific recombination. The isolation and characterization of a site affinity mutant of coliphage lambda. J Mol Biol 170: 319-342.
    [0178] Wojciak, J. M., Sarkar, D., Landy, A., and Clubb, R. T. (2002) Arm-site binding by lambda-integrase: solution structure and functional characterization of its amino-terminal domain. Proc Natl Acad Sci USA 99: 3434-3439.
    [0179] Zhu, B., Cai, G., Hall, E. O., and Freeman, G. J. (2007) In-Fusion assembly: seamless engineering of multidomain fusion proteins, modular vectors, and mutations. BioTechniques 43: 354-359 https://www.future-science.com/doi/10.2144/000112536. Accessed Jun. 3, 2022.

    Sequences

    SEQ ID No 1: nucleotide sequence of tRNASER (CGA) of Lactobacillus delbrueckii subsp. bulgaricus
    GGAGAGTTGGCAGAGCGGTAATGCAGCGGACTCGAAATCCGCCGA
    ACCAATGTTGAATTGGTGCGCAGGTTCAAATCCTGTACTCTCCTT
    AAT
    SEQ ID No 2: polynucleotide fragment P1-P2
    5'-ATCAACTAGATTTTTAACTAGAA-3'
    SEQ ID No 3: polynucleotide fragment P1-P2
    5'-TTTAACTAGAAAATAACTAGAA-3'
    SEQ ID No 4: nucleotide sequence of mv4Int
    ATGCCAAAGCGTAATCCTGCAATCAAAAAATACACCAGCCGGGGC
    CAAACAAAATACAAATTCCAGATTTACCTGGGCCAGGACGAAAGC
    GGAAAATCAATCAACACGACCCGGAGTGGTTTTAAATCTTACTCC
    CAAGCATCAGCAGCTTACAACAAGCTTAAGGCCCAAGGATTGGCC
    GCCAAAGCACCCAAAAAAGCGACCACCGATGAGGTGTGGTCGCTT
    TGGTTTGATAGCTATAAAGGCGGAGTTAAAGAGTCGTCAGCAAAC
    AAAACGCTGACTAGTTACAGAGTCCACATCAAGCCTGCTTTTGGT
    GATAAAATGATCAGCTCGATCAAGACGGCCACCGTACAACTCTGG
    GCAAACAATTTGGCCACCAAGCTGGTCAACTACAAGGTGGTTGTG
    CGCCTGCTAGGGACTCTTTTTGAATTTGCCAAGCGCCTGGACTAT
    TGCAAGGACAACCCGGTCAAGCAGATCATCATGCCAAAAGCTACC
    TCCAGGCCTCGCAGAGACATCAGCACCAACTACTATAACCGTGAT
    GAGCTTCAGCAGTTCCTGCAGGCCGCTAAAGAAGTAGGATCCCGG
    ACTTATGTCTTCTTTCTACTCCTTGCTACCACGGGCCTCCGAAAA
    GGCGAAGCACTAGCCCTGGATTGGTCGGACATCGACTACGATCAA
    GGAAAAATCTCCGTCACTAAGACTCTTGCCTATGGCCTGGGTGGC
    AAGTACGGGATCCAGCCACCTAAGACTAAGGCAGGGATCCGCACG
    GTGCCACTGACTGATCAGATGGCAGCCGTTTTAAAAGACTACCAT
    AGTGATCTCTGCCCGCACCTTTTTCACACGCTTGATGGTGATTAT
    CTCCGTCTTAGTAAGCCAGATCAGTGGCTTCAGGCTGTTTATAAA
    CACGACCCAGACCTCCGACAAATTAGAATCCATGGCTTCCGTCAT
    ACTTTTGCGTCCCTGCTCATCACTGCGGATCCGTCAATCAAGCCA
    ACAGACGTGCAAGCAATCCTGGGTCATGAATCAATCGATATTACC
    ATGGAGATTTACATGCACGCCACTCAAGAAGGCAGGCGGAATGTT
    GAAAGAGTTCTAAATCAACTAGATTTTTAA
    SEQ ID No 5: peptide sequence of mv4Int
    MPKRNPAIKKYTSRGQTKYKFQIYLGQDESGKSINTTRSGFKSYS
    QASAAYNKLKAQGLAAKAPKKATTDEVWSLWFDSYKGGVKESSAN
    KTLTSYRVHIKPAFGDKMISSIKTATVQLWANNLATKLVNYKVVV
    RLLGTLFEFAKRLDYCKDNPVKQIIMPKATSRPRRDISTNYYNRD
    ELQQFLQAAKEVGSRTYVFFLLLATTGLRKGEALALDWSDIDYDQ
    GKISVTKTLAYGLGGKYGIQPPKTKAGIRTVPLTDQMAAVLKDYH
    SDLCPHLFHTLDGDYLRLSKPDQWLQAVYKHDPDLRQIRIHGFRH
    TFASLLITADPSIKPTDVQAILGHESIDITMEIYMHATQEGRRNV
    ERVLNQLDF
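As a consistency check between SEQ ID No 4 and SEQ ID No 5, the mv4Int coding sequence can be translated with the standard genetic code (NCBI translation table 1). The minimal Python sketch below assumes the open reading frame is read from its first nucleotide in frame 1; `translate` and the constant names are illustrative, and only the first 90 nt and the last 30 nt of SEQ ID No 4 are included to keep the example short.

```python
# Minimal sketch: translate part of the mv4Int coding sequence (SEQ ID No 4)
# with the standard genetic code and compare it with SEQ ID No 5.
# Assumption: the ORF starts at nucleotide 1 and is read in frame 1.

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG x TCAG x TCAG order; '*' = stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First two 45-nt blocks and the final 30-nt block of SEQ ID No 4.
MV4INT_START = ("ATGCCAAAGCGTAATCCTGCAATCAAAAAATACACCAGCCGGGGC"
                "CAAACAAAATACAAATTCCAGATTTACCTGGGCCAGGACGAAAGC")
MV4INT_END = "GAAAGAGTTCTAAATCAACTAGATTTTTAA"


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate DNA codon by codon in frame 1.

    Whitespace is ignored, so blocks pasted from the listing can be used as-is.
    With to_stop=True, translation ends before the first stop codon;
    codons containing non-ACGT characters are translated as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate(MV4INT_START))  # first 30 residues of SEQ ID No 5
    print(translate(MV4INT_END))    # C-terminal residues, ending at the TAA stop
```

The first call yields the N-terminal residues MPKRNPAIKKYTSRGQTKYKFQIYLGQDES and the second yields ERVLNQLDF, matching the start and end of SEQ ID No 5 with the TAA stop codon excluded.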