PIF1-LIKE HELICASE AND USE THEREOF

20220290223 · 2022-09-15

Abstract

The present application provides a Pif1-like helicase and the use thereof, specifically a modified Pif1-like helicase, a construct comprising the Pif1-like helicase, and its use in characterising a target polynucleotide or controlling the movement of a target polynucleotide through a pore. The present application also provides a method for characterising the target polynucleotide or controlling the movement of the target polynucleotide through the pore. The Pif1-like helicase of the present application can effectively control the movement of the polynucleotide through the pore.

Claims

1. A Pif1-like helicase in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into a tower domain, a pin domain and/or a 1A domain of the Pif1-like helicase, wherein the Pif1-like helicase retains its ability to control the movement of a polynucleotide.

2. The Pif1-like helicase according to claim 1, wherein the helicase comprises: (a) a variant of SEQ ID NO: 1 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E264-P278 and N296-A394 of the tower domain, and/or residues K89-E105 of the pin domain, and/or residues M1-L88 and M106-V181 of the 1A domain; (b) a variant of SEQ ID NO: 2 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E265-P279 and N297-A392 of the tower domain, and/or residues K89-D105 of the pin domain, and/or residues M1-L88 and I106-M180 of the 1A domain; (c) a variant of SEQ ID NO: 3 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues T266-P280 and N298-S403 of the tower domain, and/or residues K89-A109 of the pin domain, and/or residues M1-L88 and K110-V182 of the 1A domain; (d) a variant of SEQ ID NO: 4 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues T266-P280 and N298-S404 of the tower domain, and/or residues K89-A109 of the pin domain, and/or residues M1-L88 and K110-V182 of the 1A domain; (e) a variant of SEQ ID NO: 5 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E260-P274 and N292-A391 of the tower domain, and/or residues K86-E102 of the pin domain, and/or residues M1-L84 and M103-K177 of the 1A domain; (f) a variant of SEQ ID NO: 6 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E266-P280 and N298-A396 of the tower domain, and/or residues K91-E107 of the pin domain, and/or residues M1-L90 and M108-M183 of the 1A domain; (g) a variant of SEQ ID NO: 7 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues T276-P290 and N308-P402 of
the tower domain, and/or residues K100-D116 of the pin domain, and/or residues M1-L99 and D117-M191 of the 1A domain; (h) a variant of SEQ ID NO: 8 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues D274-P288 and N306-A404 of the tower domain, and/or residues K95-E112 of the pin domain, and/or residues M1-L95 and I113-K187 of the 1A domain; (i) a variant of SEQ ID NO: 9 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E260-P274 and N292-A391 of the tower domain, and/or residues K86-E102 of the pin domain, and/or residues M1-L85 and M103-K177 of the 1A domain; (j) a variant of SEQ ID NO: 10 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E265-P279 and H297-A393 of the tower domain, and/or residues K88-E104 of the pin domain, and/or residues M1-L87 and I105-K180 of the 1A domain; or (k) a variant of SEQ ID NO: 11 in which at least one cysteine residue and/or at least one non-natural amino acid have been introduced into residues E264-P278 and N296-P389 of the tower domain, and/or residues K97-A113 of the pin domain, and/or residues M1-L96 and P114-K184 of the 1A domain; preferably, the helicase comprises: (a) a variant of SEQ ID NO: 11, which comprises (i) E105C and/or A362C; (ii) E104C and/or K360C; (iii) E104C and/or A362C; (iv) E104C and/or Q363C; (v) E104C and/or K366C; (vi) E105C and/or M356C; (vii) E105C and/or K360C; (viii) E104C and/or M356C; (ix) E105C and/or Q363C; (x) E105C and/or K366C; (xi) F108C and/or M356C; (xii) F108C and/or K360C; (xiii) F108C and/or A362C; (xiv) F108C and/or Q363C; (xv) F108C and/or K366C; (xvi) K134C and/or M356C; (xvii) K134C and/or K360C; (xviii) K134C and/or A362C; (xix) K134C and/or Q363C; (xx) K134C and/or K366C; (xxi) any of (i) to (xx) and G359C; (xxii) any of (i) to (xx) and Q111C; (xxiii) any of (i) to (xx) and I138C; 
(xxiv) any of (i) to (xx) and Q111C and I138C; (xxv) E105C and/or F377C; (xxvi) Y103L, E105Y, N352N, A362C and Y365N; (xxvii) E105Y and A362C; (xxviii) A362C; (xxix) Y103L, E105C, N352N, A362Y and Y365N; (xxx) Y103L, E105C and A362Y; (xxxi) E105C and/or A362C, and I280A; (xxxii) E105C and/or L358C; (xxxiii) E104C and/or G359C; (xxxiv) E104C and/or A362C; (xxxv) K106C and/or W378C; (xxxvi) T102C and/or N382C; (xxxvii) T102C and/or W378C; (xxxviii) E104C and/or Y355C; (xxxix) E104C and/or N382C; (xl) E104C and/or K381C; (xli) E104C and/or K379C; (xlii) E104C and/or D376C; (xliii) E104C and/or W378C; (xliv) E104C and/or W374C; (xlv) E105C and/or Y355C; (xlvi) E105C and/or N382C; (xlvii) E105C and/or K381C; (xlviii) E105C and/or K379C; (xlix) E105C and/or D376C; (l) E105C and/or W378C; (li) E105C and/or W374C; (lii) E105C and A362Y; (liii) E105C, G359C and A362C; or (liv) I2C, E105C and A362C; or, (b) a variant of any one of SEQ ID NOs: 1 to 10, which comprises a cysteine residue at the positions which correspond to those in SEQ ID NO: 11 as defined in any of (i) to (liv).

3. The Pif1-like helicase according to claim 1, wherein the non-natural amino acid is selected from 4-Azido-L-phenylalanine (Faz), 4-Acetyl-L-phenylalanine, 3-Acetyl-L-phenylalanine, 4-Acetoacetyl-L-phenylalanine, O-Allyl-L-tyrosine, 3-(Phenylselanyl)-L-alanine, O-2-Propyn-1-yl-L-tyrosine, 4-(Dihydroxyboryl)-L-phenylalanine, 4-[(Ethylsulfanyl)carbonyl]-L-phenylalanine, (2S)-2-amino-3-{4-[(propan-2-ylsulfanyl)carbonyl]phenyl}propanoic acid, (2S)-2-amino-3-{4-[(2-amino-3-sulfanylpropanoyl)amino]phenyl}propanoic acid, O-Methyl-L-tyrosine, 4-Amino-L-phenylalanine, 4-Cyano-L-phenylalanine, 3-Cyano-L-phenylalanine, 4-Fluoro-L-phenylalanine, 4-Iodo-L-phenylalanine, 4-Bromo-L-phenylalanine, O-(Trifluoromethyl)tyrosine, 4-Nitro-L-phenylalanine, 3-Hydroxy-L-tyrosine, 3-Amino-L-tyrosine, 3-Iodo-L-tyrosine, 4-Isopropyl-L-phenylalanine, 3-(2-Naphthyl)-L-alanine, 4-Phenyl-L-phenylalanine, (2S)-2-amino-3-(naphthalen-2-ylamino)propanoic acid, 6-(Methylsulfanyl)norleucine, 6-Oxo-L-lysine, D-tyrosine, (2R)-2-Hydroxy-3-(4-hydroxyphenyl)propanoic acid, (2R)-2-Ammoniooctanoate, 3-(2,2′-Bipyridin-5-yl)-D-alanine, 2-amino-3-(8-hydroxy-3-quinolyl)propanoic acid, 4-Benzoyl-L-phenylalanine, S-(2-Nitrobenzyl)cysteine, (2R)-2-amino-3-[(2-nitrobenzyl)sulfanyl]propanoic acid, (2S)-2-amino-3-[(2-nitrobenzyl)oxy]propanoic acid, O-(4,5-Dimethoxy-2-nitrobenzyl)-L-serine, (2S)-2-amino-6-({[(2-nitrobenzyl)oxy]carbonyl}amino)hexanoic acid, O-(2-Nitrobenzyl)-L-tyrosine, 2-Nitrophenylalanine, 4-[(E)-Phenyldiazenyl]-L-phenylalanine, 4-[3-(Trifluoromethyl)-3H-diaziren-3-yl]-D-phenylalanine, 2-amino-3-[[5-(dimethylamino)-1-naphthyl]sulfonylamino]propanoic acid, (2S)-2-amino-4-(7-hydroxy-2-oxo-2H-chromen-4-yl)butanoic acid, (2S)-3-[(6-acetylnaphthalen-2-yl)amino]-2-aminopropanoic acid, 4-(Carboxymethyl)phenylalanine, 3-Nitro-L-tyrosine, O-Sulfo-L-tyrosine, (2R)-6-Acetamido-2-ammoniohexanoate, 1-Methylhistidine, 2-Aminononanoic acid, 2-Aminodecanoic acid, L-Homocysteine, 5-Sulfanylnorvaline,
6-Sulfanyl-L-norleucine, 5-(Methylsulfanyl)-L-norvaline, N⁶-{[(2R,3R)-3-Methyl-3,4-dihydro-2H-pyrrol-2-yl]carbonyl}-L-lysine, N⁶-[(Benzyloxy)carbonyl]lysine, (2S)-2-amino-6-[(cyclopentylcarbonyl)amino]hexanoic acid, N⁶-[(Cyclopentyloxy)carbonyl]-L-lysine, (2S)-2-amino-6-{[(2R)-tetrahydrofuran-2-ylcarbonyl]amino}hexanoic acid, (2S)-2-amino-8-[(2R,3S)-3-ethynyltetrahydrofuran-2-yl]-8-oxooctanoic acid, N⁶-(tert-Butoxycarbonyl)-L-lysine, (2S)-2-Hydroxy-6-({[(2-methyl-2-propanyl)oxy]carbonyl}amino)hexanoic acid, N⁶-[(Allyloxy)carbonyl]lysine, (2S)-2-amino-6-({[(2-azidobenzyl)oxy]carbonyl}amino)hexanoic acid, N⁶-L-Prolyl-L-lysine, (2S)-2-amino-6-{[(prop-2-yn-1-yloxy)carbonyl]amino}hexanoic acid and N⁶-[(2-Azidoethoxy)carbonyl]-L-lysine.

4. The Pif1-like helicase according to claim 2, wherein the amino acid sequence of the Pif1-like helicase is any of the amino acid sequences shown in SEQ ID NOs: 1 to 11 or has at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.9% homology with any of the amino acid sequences shown in SEQ ID NOs: 1 to 11, and has the ability to control the movement of a polynucleotide.

5. The Pif1-like helicase according to claim 1, wherein the introduced cysteines are connected to one another, the introduced non-natural amino acids are connected to one another, the introduced cysteine and the introduced non-natural amino acid are connected to one another, the introduced cysteine and the native amino acid are connected to one another, or the introduced non-natural amino acid and the native amino acid are connected to one another.

6. The Pif1-like helicase according to claim 1, wherein the helicase further comprises: (A) substitution or deletion of at least one or more native amino acids; and/or (B) substitution of at least one amino acid that interacts with one or more nucleotides in single-stranded DNA or double-stranded DNA, wherein the Pif1-like helicase retains its ability to control the movement of a polynucleotide.

7. The Pif1-like helicase according to claim 6, wherein in (A), the native amino acid is substituted with a non-polar amino acid, a polar amino acid or a charged amino acid; the non-polar amino acids include but are not limited to glycine (G), alanine (A) or valine (V); the polar amino acids include but are not limited to serine (S), threonine (T), tyrosine (Y), asparagine (N) or glutamine (Q); the charged amino acids include but are not limited to aspartic acid (D), glutamic acid (E) or histidine (H); and/or the one or more cysteines (C) are substituted with alanine (A), serine (S), threonine (T), aspartic acid (D) or valine (V); preferably, the Pif1-like helicase comprises: (a) a variant of SEQ ID NO: 6, and the one or more substituted or deleted native amino acids are a combination of (i) C308 and/or C419 and (ii) one or more of C114, C119, and C141; (b) a variant of SEQ ID NO: 1, and the one or more substituted or deleted native amino acids are one or more of C39, C112, C117, and C417; (c) a variant of SEQ ID NO: 2, and the one or more substituted or deleted native amino acids are one or more of C112, C117, C139, C202, C307, C327, C344, C395, and C415; (d) a variant of SEQ ID NO: 3, and the one or more substituted or deleted native amino acids are one or more of C15, C62, C142, C420, and C422; (e) a variant of SEQ ID NO: 4, and the one or more substituted or deleted native amino acids are one or more of C16, C63, C143, C422, and C444; (f) a variant of SEQ ID NO: 5, and the one or more substituted or deleted native amino acids are one or more of C109, C114, C136, C199, C302, C414, and C394; (g) a variant of SEQ ID NO: 7, and the one or more substituted or deleted native amino acids are one or more of C23, C123, C128, C150, C318, C405, and C425; (h) a variant of SEQ ID NO: 8, and the one or more substituted or deleted native amino acids are one or more of C119, C124, C146, C209, C407, and C424; (i) a variant of SEQ ID NO: 9, and the one or more substituted or
deleted native amino acids are one or more of C109, C114, C136, C199, C302, C414, and C394; (j) a variant of SEQ ID NO: 10, and the one or more substituted or deleted native amino acids are one or more of C14, C111, C138, C116, C243, C396, C410, C416, and C347; or, (k) a variant of SEQ ID NO: 11, and the one or more substituted or deleted native amino acids are one or more of C125, C128, C412, and C315.

8. The Pif1-like helicase according to claim 6, wherein in (B), at least one amino acid that interacts with a sugar ring and/or base of one or more nucleotides in single-stranded DNA or double-stranded DNA is substituted with an amino acid containing a larger side chain; preferably, the Pif1-like helicase comprises: (a) a variant of SEQ ID NO: 11, wherein the at least one amino acid that interacts with a sugar ring and/or base of one or more nucleotides in single-stranded DNA or double-stranded DNA is at least one of P73, H93, N99, F109, I280, A161, F130, D132, D162, D163, E277, K415, Q291, H396, Y244 or P100; (b) a variant of SEQ ID NO: 6, wherein the at least one amino acid that interacts with a sugar ring and/or base of one or more nucleotides in single-stranded DNA or double-stranded DNA is a combination of (i) H87, V422 and/or I282 and (ii) at least one of P94, F103, V155, P67, M124, D126, E156, P157, E279, S293, N93, H403 or F246; or, (c) a variant of any one of SEQ ID NOs: 1 to 5 and 7 to 10, wherein the at least one amino acid that interacts with a sugar ring and/or base of one or more nucleotides in single-stranded DNA or double-stranded DNA corresponds to at least one of P73, H93, N99, F109, I280, A161, F130, D132, D162, D163, E277, K415, Q291, H396, Y244 or P100 in SEQ ID NO: 11, or a combination of (i) H87, V422 and/or I282 and (ii) at least one of P94, F103, V155, P67, M124, D126, E156, P157, E279, S293, N93, H403 or F246 in SEQ ID NO: 6.

9. The Pif1-like helicase according to claim 8, wherein the larger side chain comprises an increased number of carbon atoms, has an increased length, an increased molecular volume, and/or has an increased van der Waals volume; preferably, the larger side chain increases (i) electrostatic interaction, (ii) hydrogen bonding, (iii) cation-π interaction and/or (iv) π-π interaction between the at least one amino acid and one or more nucleotides in the single-stranded DNA or double-stranded DNA; preferably, the amino acid containing the larger side chain is not alanine (A), cysteine (C), glycine (G), selenocysteine (U), methionine (M), aspartic acid (D) or glutamic acid (E).

10. The Pif1-like helicase according to claim 8, wherein: A) histidine (H) is substituted with (i) arginine (R) or lysine (K); (ii) glutamine (Q) or asparagine (N); (iii) phenylalanine (F), tyrosine (Y) or tryptophan (W); (iv) tyrosine (Y), arginine (R) or glutamine (Q); or (v) arginine (R), tyrosine (Y), asparagine (N) or glutamine (Q); B) asparagine (N) is substituted with (i) arginine (R) or lysine (K); (ii) glutamine (Q) or histidine (H); (iii) phenylalanine (F), tyrosine (Y) or tryptophan (W); or (iv) glutamine (Q), arginine (R), histidine (H) or lysine (K); C) proline (P) is substituted with (i) arginine (R) or lysine (K); (ii) glutamine (Q), asparagine (N), threonine (T) or histidine (H); (iii) tyrosine (Y), phenylalanine (F) or tryptophan (W); (iv) leucine (L), valine (V) or isoleucine (I); or (v) tryptophan (W), phenylalanine (F) or leucine (L); D) phenylalanine (F) is substituted with (i) arginine (R) or lysine (K); (ii) histidine (H); (iii) tyrosine (Y) or tryptophan (W); or (iv) arginine (R), tyrosine (Y), glutamine (Q) or histidine (H); E) aspartic acid (D) is substituted with (i) arginine (R) or lysine (K); (ii) glutamine (Q), asparagine (N) or histidine (H); or (iii) phenylalanine (F), tyrosine (Y) or tryptophan (W); F) valine (V) is substituted with (i) arginine (R) or lysine (K); (ii) glutamine (Q), asparagine (N) or histidine (H); (iii) phenylalanine (F), tyrosine (Y) or tryptophan (W); (iv) isoleucine (I) or leucine (L); or (v) tyrosine (Y), arginine (R), histidine (H) or tryptophan (W); G) serine (S) is substituted with (i) arginine (R) or lysine (K); (ii) glutamine (Q), asparagine (N) or histidine (H); (iii) phenylalanine (F), tyrosine (Y) or tryptophan (W); (iv) isoleucine (I) or leucine (L); or (v) arginine (R), asparagine (N) or glutamine (Q); H) tyrosine (Y) is substituted with (i) arginine (R) or lysine (K); or (ii) tryptophan (W); and/or, I) isoleucine (I) is substituted with (i) phenylalanine (F) or tryptophan (W); (ii) valine (V) or leucine (L); or (iii) histidine (H), lysine (K) or arginine (R).

11. The Pif1-like helicase according to claim 8, wherein the Pif1-like helicase is a variant of SEQ ID NO: 11 comprising: P73L; P73V; P73I; P73E; P73T; P73F; H93N; H93Q; H93W; N99R; N99H; N99W; N99Y; P100L; P100V; P100I; P100E; P100T; P100F; F130W; F130Y; F130H; D132H; D132Y; D132K; A161I; A161L; A161N; A161W; A161H; D162H; D162Y; D162K; D163W; D163F; D163Y; D163H; D163I; D163L; D163V; Y244W; Y244H; E277G; I280H; I280K; I280W; Q291K; Q291R; Q291W; Q291F; H396N; H396Q; H396W; K415W; K415R; K415H; K415Y; F109W/P73L; F109W/P73V; F109W/P73I; F109W/P73E; F109W/P73T; F109W/P73F; F109W/H93N; F109W/H93Q; F109W/H93W; F109W/N99R; F109W/N99H; F109W/N99W; F109W/N99Y; F109W/P100L; F109W/P100V; F109W/P100I; F109W/P100E; F109W/P100T; F109W/P100F; F109W/F130W; F109W/F130Y; F109W/F130H; F109W/D132H; F109W/D132Y; F109W/D132K; F109W/A161I; F109W/A161L; F109W/A161N; F109W/A161W; F109W/A161H; F109W/D162H; F109W/D162Y; F109W/D162K; F109W/D163W; F109W/D163F; F109W/D163Y; F109W/D163H; F109W/D163I; F109W/D163L; F109W/D163V; F109W/Y244W; F109W/Y244H; F109W/E277G; F109W/I280H; F109W/I280K; F109W/I280W; F109W/Q291K; F109W/Q291R; F109W/Q291W; F109W/Q291F; F109W/H396N; F109W/H396Q; F109W/H396W; F109W/K415W; F109W/K415R; F109W/K415H; or F109W/K415Y; and/or the Pif1-like helicase is a variant of SEQ ID NO: 6 comprising: P94W; P94F; F103W; I282F; V155L; V155I; D126H; D126N; D126Q; P157W; P157F; P157L; V422Y; V422R; V422H; V422W; S293R; S293N; S293Q; H87R; H87Y; H87N; H87Q; N93Q; N93R; N93K; N93H; H403Y; H403R; H403Q; F246R; F246Y; or F246Q.

12. The Pif1-like helicase according to claim 6, wherein, in (B), at least one amino acid that interacts with one or more phosphate groups of one or more nucleotides in single-stranded DNA or double-stranded DNA is substituted; preferably, the Pif1-like helicase comprises: (a) a variant of SEQ ID NO: 11, in which the at least one amino acid that interacts with one or more phosphate groups of one or more nucleotides in ssDNA or dsDNA is at least one of H75, T91, S94, K97, N246, N247, N284, K288, N297, T394 or K397; (b) a variant of SEQ ID NO: 6, in which the at least one amino acid that interacts with one or more phosphate groups of one or more nucleotides in ssDNA or dsDNA is at least one of K91, T85, R88, H69, K404, T401, N299, N248, E280, K290 or K249; or, (c) a variant of any one of SEQ ID NOs: 1 to 5 and 7 to 10, in which the at least one amino acid that interacts with one or more phosphate groups of one or more nucleotides in ssDNA or dsDNA is at least one amino acid corresponding to H75, T91, S94, K97, N246, N247, N284, K288, N297, T394 or K397 in SEQ ID NO: 11, or at least one amino acid corresponding to K91, T85, R88, H69, K404, T401, N299, N248, E280, K290 or K249 in SEQ ID NO: 6.

13. The Pif1-like helicase according to claim 12, wherein: a) histidine (H) is substituted with (i) arginine (R) or lysine (K); (ii) asparagine (N), serine (S), glutamine (Q) or threonine (T); (iii) phenylalanine (F), tryptophan (W) or tyrosine (Y); or (iv) asparagine (N), glutamine (Q) or arginine (R); b) threonine (T) is substituted with (i) arginine (R), histidine (H) or lysine (K); (ii) asparagine (N), serine (S), glutamine (Q) or histidine (H); (iii) phenylalanine (F), tryptophan (W), tyrosine (Y) or histidine (H); (iv) asparagine (N), glutamine (Q) or arginine (R); or (v) asparagine (N), histidine (H), lysine (K) or arginine (R); c) serine (S) is substituted with (i) arginine (R), histidine (H) or lysine (K); (ii) asparagine (N), glutamine (Q), threonine (T) or histidine (H); or (iii) phenylalanine (F), tryptophan (W), tyrosine (Y) or histidine (H); d) asparagine (N) is substituted with (i) arginine (R), histidine (H) or lysine (K); (ii) serine (S), glutamine (Q), threonine (T) or histidine (H); (iii) phenylalanine (F), tryptophan (W), tyrosine (Y) or histidine (H); or (iv) glutamine (Q), arginine (R), histidine (H) or lysine (K); e) lysine (K) is substituted with (i) arginine (R) or histidine (H); (ii) asparagine (N), serine (S), glutamine (Q), threonine (T) or histidine (H); (iii) phenylalanine (F), tryptophan (W), tyrosine (Y) or histidine (H); or (iv) arginine (R), glutamine (Q) or asparagine (N); and/or, f) arginine (R) is substituted with asparagine (N), serine (S) or glutamine (Q).

14. The Pif1-like helicase according to claim 12, wherein the Pif1-like helicase is a variant of SEQ ID NO: 11 comprising one or more of (a) to (k), (a) H75N, H75Q, H75K or H75F; (b) T91K, T91Q or T91N; (c) S94H, S94N, S94K, S94T, S94R or S94Q; (d) K97Q, K97H or K97Y; (e) N246H or N246Q; (f) N247Q or N247H; (g) N284H or N284Q; (h) K288Q or K288H; (i) N297Q, N297K or N297H; (j) T394K, T394H or T394N; or (k) K397R, K397H or K397Y; or the Pif1-like helicase is a variant of SEQ ID NO: 6 comprising one or more of (a) to (i), (a) K91R; (b) T85N, T85Q or T85R; (c) R88N or R88Q; (d) H69N, H69Q or H69R; (e) K404R; (f) T401N, T401H, T401K or T401R; (g) N299Q, N299R, N299H or N299K; (h) N248Q, N248R, N248H or N248K; or (i) K249R, K249Q or K249N.

15. The Pif1-like helicase according to claim 6, wherein, in (B), at least one amino acid that interacts with the double strand of one or more nucleotides in the double-stranded DNA is substituted; preferably, the Pif1-like helicase comprises: (a) a variant of SEQ ID NO: 6, in which the at least one amino acid that interacts with the double strand of one or more nucleotides in the dsDNA is at least one of M81, V59, Q52, E286 or K290; or, (b) a variant of any one of SEQ ID NOs: 1 to 5 and 7 to 11, in which the at least one amino acid that interacts with the double strand of one or more nucleotides in the dsDNA is at least one amino acid corresponding to M81, V59, Q52, E286 or K290 in SEQ ID NO: 6; preferably, the Pif1-like helicase is a variant of SEQ ID NO: 6 comprising one or more of (a) to (e), (a) M81K, M81R or M81H; (b) V59K, V59R or V59H; (c) Q52K, Q52R or Q52H; (d) E286K, E286R or E286H; or (e) K290R.

16. The Pif1-like helicase according to claim 6, wherein the Pif1-like helicase is a variant of SEQ ID NO: 6, which comprises one or more of the following amino acid substitutions: T85N, H87Y, H87Q, H87N, R88Q, R88N, V155I, V155L, K91R, F103W, S239N, F246R, F246Y, K249R, I282F, E286K or V422H.

17. The Pif1-like helicase according to claim 1, wherein the Pif1-like helicase further comprises substitution or modification of surface negatively-charged amino acids, polar or non-polar amino acids; more preferably, the substitution comprises substitution of negatively charged amino acids, uncharged amino acids, aromatic amino acids, polar or non-polar amino acids with positively charged amino acids, or uncharged amino acids; preferably, the Pif1-like helicase comprises: (a) a variant of SEQ ID NO: 6 and the one or more negatively charged or uncharged amino acids are one or more of S9, S173, D208 or T218; (b) a variant of SEQ ID NO: 7 and the one or more negatively charged or uncharged amino acids are one or more of D8, E11, T26, S186, D216 or S226; or, (c) a variant of any one of SEQ ID NOs: 1 to 5 and 8 to 11, wherein the one or more negatively charged or uncharged amino acids correspond to one or more of S9, S173, D208 or T218 in SEQ ID NO: 6; or one or more of D8, E11, T26, S186, D216 or S226 in SEQ ID NO: 7.

18. The Pif1-like helicase according to claim 1, wherein the helicase comprises substitution of at least one amino acid that interacts with the transmembrane pore: (a) a variant of SEQ ID NO: 11, in which the Pif1-like helicase comprises one or more substitutions at (a) E196 (b) W202 (c) N199 or (d) G201; or, (b) a variant of any one of SEQ ID NOs: 1 to 10, in which the Pif1-like helicase comprises at least one substitution at an amino acid corresponding to (a) E196 (b) W202 (c) N199 or (d) G201; preferably, the Pif1-like helicase is a variant of SEQ ID NO: 11 comprising substitutions at the following positions: F109/E196/H75, such as, F109W/E196L/H75N, F109W/E196L/H75Q, F109W/E196L/H75K or F109W/E196L/H75F; F109/E196/T91, such as, F109W/E196L/T91K, F109W/E196L/T91Q or F109W/E196L/T91N; F109/S94/E196, such as, F109W/S94H/E196L, F109W/S94T/E196L, F109W/S94R/E196L, F109W/S94Q/E196L, F109W/S94N/E196L or F109W/S94K/E196L; F109/N99/E196, such as, F109W/N99R/E196L, F109W/N99H/E196L, F109W/N99W/E196L or F109W/N99Y/E196L; F109/S94/E196/I280, such as, F109W/S94H/E196L/I280K; F109/P100/E196, such as, F109W/P100L/E196L, F109W/P100V/E196L, F109W/P100I/E196L or F109W/P100T/E196L; F109/D132/E196, such as, F109W/D132H/E196L, F109W/D132Y/E196L or F109W/D132K/E196L; F109/A161/E196, such as, F109W/A161I/E196L, F109W/A161L/E196L, F109W/A161N/E196L, F109W/A161W/E196L or F109W/A161H/E196L; F109/D163/E196, such as, F109W/D163W/E196L, F109W/D163F/E196L, F109W/D163Y/E196L, F109W/D163H/E196L, F109W/D163I/E196L, F109W/D163L/E196L or F109W/D163V/D163L/E196L; F109/Y244/E196, such as, F109W/Y244W/E196L, F109W/Y244Y/E196L or F109W/Y244H/E196L; F109/N246/E196, for example, F109W/N246H/E196L or F109W/N246Q/E196L; F109/E196/I280, such as, F109W/E196L/I280K, F109W/E196L/I280H, F109W/E196L/I280W or F109W/E196L/I280R; F109/E196/Q291, such as, F109W/E196L/Q291K, F109W/E196L/Q291R, F109W/E196L/Q291W or F109W/E196L/Q291F; F109/N297/E196, such as, F109W/N297Q/E196L, F109W/N297K/E196L or F109W/N297H/E196L;
F109/T394/E196, such as, F109W/T394K/E196L, F109W/T394H/E196L or F109W/T394N/E196L; F109/H396/E196, such as, F109W/H396Y/E196L, F109W/H396F/E196L, F109W/H396Q/E196L or F109W/H396K/E196L; F109/K397/E196, such as, F109W/K397R/E196L, F109W/K397H/E196L or F109W/K397Y/E196L; or, F109/Y416/E196, such as, F109W/Y416W/E196L or F109W/Y416R/E196L.

19. A construct comprising at least one Pif1-like helicase according to claim 1; preferably, the construct further comprises a polynucleotide binding moiety.

20. A nucleic acid encoding the Pif1-like helicase according to claim 1.

21. An expression vector comprising the nucleic acid according to claim 20.

22. A host cell comprising the nucleic acid according to claim 20.

23. A host cell comprising the expression vector according to claim 21.

24. A method of controlling the movement of a polynucleotide, comprising contacting the Pif1-like helicase according to claim 1 with the polynucleotide.

25. A method of characterising a target polynucleotide, comprising: I) contacting the Pif1-like helicase according to claim 1 with the target polynucleotide and a pore, such that the Pif1-like helicase controls the movement of the target polynucleotide through the pore; and II) taking one or more characteristics of the target polynucleotide when the nucleotide of the target polynucleotide interacts with the pore, and thereby characterising the target polynucleotide; preferably, the method further comprises the step of applying a potential difference across the pore contacting the helicase and the target polynucleotide; more preferably, the pore is selected from a biological pore, a solid state pore, or a biological and solid state hybrid pore; preferably, the target polynucleotide is single-stranded, double-stranded, or at least partially double-stranded; preferably, the one or more characteristics are selected from the source, length, identity, sequence, secondary structure of the target polynucleotide, or whether or not the target polynucleotide is modified; preferably, the one or more characteristics are measured by electrical measurement and/or optical measurement.

26. A product for characterising a target polynucleotide comprising the Pif1-like helicase according to claim 1, and a pore; preferably, the product is selected from a kit, a device or a sensor.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0253] Hereinafter, the embodiments of the present application will be described in detail with reference to the accompanying drawings, in which:

[0254] FIG. 1 shows a schematic diagram of the fluorescence assay used to test the enzyme activity of the Pif1-like helicase, in which the fluorescent substrate strand has a 5′ ssDNA overhang and a 50-base section of hybridized dsDNA. As shown in a), the major upper strand (B) has a black-hole quencher (BHQ-1) base (E) at the 3′ end, and the hybridized complement (D) has a carboxyfluorescein base (C) at the 5′ end. 0.5 μM of a capture strand (F) that is complementary to the shorter strand of the fluorescent substrate (D) is also included. As shown in b), in the presence of ATP (5 mM) and MgCl₂ (5 mM), helicase (200 nM) added to the substrate binds to the 5′ part of the fluorescent substrate, moves along the major strand, and unwinds the complementary strand. The excess capture strand then preferentially anneals to the complementary DNA, preventing re-annealing of the initial substrate and loss of fluorescence. As shown in c), after an excess of the capture strand (A) that is completely complementary to the major strand is added, any dsDNA that remains hybridized is unwound by the excess of A, so that eventually all of the dsDNA is unwound and the fluorescence reaches its maximum.

[0255] FIG. 2 shows a measurement of the hybridized dsDNA unwinding capabilities of Pif1-like helicase using a fluorescence assay, specifically a graph of the time-dependent dsDNA unwinding ratio in a buffer containing 400 mM NaCl.
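The time-dependent unwinding ratio plotted in FIG. 2 can be illustrated with a minimal sketch. The normalization used here is an assumption and is not stated in the source: it takes the fluorescence at the start of the assay as 0% unwound and the maximum fluorescence reached after adding the excess capture strand (A) as 100% unwound. The function name `unwinding_ratio` and its parameters are hypothetical.

```python
from typing import List, Sequence


def unwinding_ratio(trace: Sequence[float], f_start: float, f_max: float) -> List[float]:
    """Normalize a fluorescence trace to a 0..1 unwinding ratio.

    Assumption (not from the source): f_start is the fluorescence of the fully
    hybridized substrate and f_max is the plateau reached after the excess
    capture strand has displaced all remaining duplex.
    """
    span = f_max - f_start
    if span <= 0:
        raise ValueError("f_max must be greater than f_start")
    # Clamp to [0, 1] so that noise around the baselines does not give
    # ratios below 0% or above 100%.
    return [min(max((f - f_start) / span, 0.0), 1.0) for f in trace]


if __name__ == "__main__":
    # Illustrative readings in arbitrary fluorescence units, not measured data.
    print(unwinding_ratio([100.0, 150.0, 200.0], f_start=100.0, f_max=200.0))
```

Plotting the returned values against the sampling times would give a curve of the kind described for FIG. 2.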

[0256] FIG. 3 shows gel measurements of the DNA-binding capabilities of different Pif1-like helicases. Lane 1 represents the pre-constructed dsDNA (SEQ ID NO: 18 hybridized with SEQ ID NO: 19 modified with 5′ FAM). Lanes 2-6 comprise Aph Acj61, Aph PX29, Sph CBH8, Pph PspYZU05, and Mph MP1 pre-linked to dsDNA, respectively. Lanes 7-11 comprise Aph Acj61-D97C/A363C, Aph PX29-D96C/A371C, Sph CBH8-A94C/A361C/C136A, Pph PspYZU05-D104C/A375C/C146A and Mph MP1-E105C/A362C pre-linked to dsDNA, respectively. Band A corresponds to SEQ ID NO: 19 to be hybridized with SEQ ID NO: 18. The regions labeled 1A, 2A, 3A, 4A, and 5A correspond to 1, 2, 3, 4, and 5 helicases, respectively, bound to the hybrid of SEQ ID NO: 18 and SEQ ID NO: 19.

[0257] FIG. 4 shows DNA construct which is used in examples, in which SEQ ID NO: 13 (labelled B) is attached at its 5′ end to twenty iSpC3 spacers (labelled A) and at its 3′ end to four iSpC3 spacers (labelled C) which are attached to the 5′ end of SEQ ID NO: 14 (labelled D) which is attached at its 3′ end to SEQ ID NO: 17 or 24 (labelled E), and the SEQ ID NO: 15 region (labelled F) of this construct is hybridized to SEQ ID NO: 16 (labelled G, which has a 3′ cholesterol tether).

[0258] FIG. 5 shows an illustrative current trace (y-axis=Current (pA, 0 to 250), x-axis=Sampling frequency (Hz, 0 to 3.5×10⁵)) when Pif1-like helicase (Mph MP1-E105C/A362C (SEQ ID NO: 11 with mutations E105C/A362C)) controlled the translocation of DNA construct A through the nanopore MspA (SEQ ID NO: 12).

[0259] FIG. 6 shows an enlarged current trace in regions of the Pif1-like helicase-controlled DNA movement shown in FIG. 5 (y-axis=Current (pA, 30 to 100), x-axis=Sampling frequency (Hz, 2.346 to 2.366×10⁵)).

[0260] FIG. 7 shows an illustrative current trace (y-axis=Current (pA), x-axis=Time (s)) when Pif1-like helicase (Sph CBH8-A94C/A361C/C136A (SEQ ID NO: 5 with A94C/A361C/C136A mutation)) controlled the translocation of DNA construct B through the MspA nanopore.

[0261] FIG. 8 shows an illustrative current trace (y-axis=Current (pA), x-axis=Time (s)) when helicase (Eph Pei26-D99C/A366C/C141A (SEQ ID NO: 6 with D99C/A366C/C141A mutation)) controlled the translocation of DNA construct B through the MspA nanopore.

[0262] FIG. 9 shows an illustrative current trace (y-axis=Current (pA), x-axis=Time (s)) when helicase (Pph PspYZU05-D104C/A375C/C146A (SEQ ID NO: 8 with D104C/A375C and C146A mutations)) controlled the translocation of DNA construct B through the MspA nanopore.

[0263] FIG. 10 shows the SDS-PAGE gel electrophoresis image of purified Mph MP1 (SEQ ID NO: 11), in which M represents Marker (Kd), and lane 1 represents the electrophoresis result of the helicase Mph MP1.

[0264] FIGS. 11A and 11B show the correspondence among the amino acid sequences of SEQ ID NOs: 1 to 11.

[0265] FIG. 12 shows the speed distribution of a nucleic acid library passing through a nanopore controlled by the mutants Eph-Pei26-D99C/A366C/F103W/E286K (A) and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K (B).

[0266] FIG. 13 shows movements of a nucleic acid through a nanopore controlled by the mutants Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D (SEQ ID NO: 6 with mutation D99C/A366C/C114V/C119V/C141S/C308S/C419D) and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W (SEQ ID NO: 6 with mutation D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W). Figure A is a simulated current signal characteristic of the nucleic acid sequence SEQ ID NO: 24 moving through the nanopore one by one; Figure B is a current signal generated by the movement of the nucleic acid sequence SEQ ID NO: 24 through the nanopore controlled by the mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W; and Figure C is a current signal generated by the movement of the nucleic acid sequence SEQ ID NO: 24 through the nanopore controlled by the mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D.

[0267] FIG. 14 shows an accuracy rate of the validation set of randomly digested E. coli and human genomes library sequenced by using mutants Eph-Pei26-24 and Eph-Pei26-25.

DETAILED DESCRIPTION

[0268] The following examples further illustrate the content of the present application, but should not be construed as limiting the present application. Without departing from the spirit and essence of the present application, modifications or substitutions made to the methods, steps or conditions of the present application fall within the scope of claims.

[0269] Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art. The equipment and reagents used in each example are conventionally commercially available. The sequencer used for sequencing is the QNome-9604 gene sequencer, and the sequencing algorithm is the open-source sequencing model bonito-ctc of ONT (Oxford Nanopore Technologies), which may be found at https://github.com/nanoporetech/bonito. This is an open source CTC-based end-to-end seq2seq model, based on NVIDIA's open source speech recognition network QuartzNet that is described in paper titled “QUARTZNET: DEEP AUTOMATIC SPEECH RECOGNITION WITH 1D TIME-CHANNEL SEPARABLE CONVOLUTIONS”.
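By way of illustration only, the decoding step of a CTC-based base recognition model can be sketched as follows. This is a generic greedy CTC collapse (merge repeated frame labels, then drop blanks) and is not taken from bonito; the blank symbol `-` and the function name are assumptions chosen for the example.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; real models typically use an integer index


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame best-label path into a base sequence."""
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank symbol that separates genuine repeats.
    return "".join(label for label in merged if label != blank)


# The blank between the two runs of "A" keeps them as two separate bases.
print(ctc_greedy_collapse("AA-A-CC--G"))
```

The blank symbol is what allows a homopolymer such as "AA" to survive the merge step, which is why it is removed only after repeated labels have been merged.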

EXAMPLE 1

Preparation of Pif1-like Helicase

[0270] 1. Materials and Methods

[0271] A recombinant plasmid containing the Pif1-like helicase sequence (a variant of amino acid sequence SEQ ID NOs:1-11, corresponding to nucleotide sequence SEQ ID NOs:25-35) was transformed into BL21(DE3) competent cells by heat shock. The resuscitated bacteria solution was spread on an ampicillin-resistant solid LB plate, and then cultured overnight at 37° C. A single colony was picked and inoculated into 100 ml of ampicillin-resistant liquid LB medium and cultivated at 37° C. 1% of the inoculum was transferred to ampicillin-resistant LB liquid medium for expansion culture at 37° C. and 200 rpm, and its OD600 value was measured continuously. When OD600=0.6-0.8, the culture solution in LB medium was cooled to 18° C., and isopropyl β-D-thiogalactoside (IPTG) was added to a final concentration of 1 mM to induce expression. After 12-16 h, the bacteria were collected at 18° C. The bacteria were crushed under high pressure, purified by FPLC, and samples were collected.

[0272] 2. Results

[0273] FIG. 10 shows an SDS-PAGE gel electrophoresis image of purified Mph MP1 (variant of SEQ ID NO: 11).

EXAMPLE 2

[0274] This Example illustrates the capabilities of Pif1-like helicase to unwind hybridized dsDNA by using a fluorescence assay to detect enzyme activity.

[0275] 1. Materials and Methods

[0276] As shown in a) of FIG. 1, the fluorescent substrate strand (100 nM final) has a 5′ ssDNA overhang and a 50-base section of hybridized dsDNA. The major upper strand has a black-hole quencher (BHQ-1) base (SEQ ID NO:20-BHQ-3′) at the 3′ end, and the hybridized complement has a carboxyfluorescein base (5′FAM-SEQ ID NO:21) at the 5′ end. When the strands are hybridized, the fluorescence from the fluorescein is quenched by the local BHQ-1, and the substrate is essentially non-fluorescent. 0.5 μM of a capture strand (SEQ ID NO:22) that is complementary to the shorter strand of the fluorescent substrate is also included. As shown in b), in the presence of ATP (5 mM) and MgCl.sub.2 (5 mM), helicase (200 nM) added to the substrate binds to the 5′ part of the fluorescent substrate, moves along the major strand, and unwinds the complementary strand. The excess capture strand then preferentially anneals to the complementary DNA, preventing re-annealing of the initial substrate and loss of fluorescence. A certain amount of hybridized dsDNA that is not unwound by the Pif1-like helicase remains in the system. As shown in c), after adding an excess of capture strand A (SEQ ID NO:23), which is completely complementary to the major strand, the remaining hybridized dsDNA is displaced by the excess capture strand A, so that eventually all the dsDNA is unwound and the fluorescence reaches its maximum.

[0277] 2. Results

[0278] FIG. 2 shows a graph of the time-dependent dsDNA unwinding ratio in a buffer containing 400 mM NaCl (10 mM Hepes pH 8.0, 5 mM ATP, 5 mM MgCl.sub.2, 100 nM fluorescent substrate DNA, 0.5 μM capture DNA).
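For illustration, one plausible way to turn raw fluorescence readings into an unwinding ratio is sketched below. The normalization is an assumption, since the formula is not stated in this example: the reading before helicase addition is taken as 0% unwound, and the plateau reached after adding excess capture strand A is taken as 100% unwound. The function name and the sample readings are hypothetical.

```python
def unwinding_ratio(f_t, f_baseline, f_max):
    """Fraction of dsDNA unwound at one time point.

    f_t        -- fluorescence at the time point of interest
    f_baseline -- fluorescence of the quenched substrate before helicase is added
    f_max      -- fluorescence after excess capture strand A has displaced all dsDNA
    """
    if f_max <= f_baseline:
        raise ValueError("f_max must exceed f_baseline")
    return (f_t - f_baseline) / (f_max - f_baseline)


# Hypothetical readings in arbitrary fluorescence units.
trace = [100.0, 250.0, 400.0, 475.0]
ratios = [unwinding_ratio(f, f_baseline=100.0, f_max=600.0) for f in trace]
print(ratios)
```

Normalizing against the plateau obtained with capture strand A makes curves from different helicases comparable even if the absolute fluorescence differs between wells.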

EXAMPLE 3

[0279] This example shows exemplary DNA-binding capabilities of modified Pif1-like helicases of the present application by using gel measurements.

[0280] Specifically, DNA binding capabilities of Aph Acj61-D97C/A363C (SEQ ID NO: 2 with D97C/A363C mutation), Aph PX29-D96C/A371C (SEQ ID NO: 3 with D96C/A371C mutation), Sph CBH8-A94C/A361C/C136A (SEQ ID NO: 5 with A94C/A361C and C136A mutations), Pph PspYZU05-D104C/A375C/C146A (SEQ ID NO: 8 with D104C/A375C and C146A mutations) and Mph MP1-E105C/A362C (SEQ ID NO: 11 with E105C/A362C mutations) were measured.

[0281] 1. Materials and Methods

[0282] The annealed DNA complex (hybridization of SEQ ID NO: 18 with SEQ ID NO: 19 modified with 5′FAM) was mixed with Aph Acj61, Aph PX29, Sph CBH8, Pph PspYZU05, Mph MP1, Aph Acj61-D97C/A363C, Aph PX29-D96C/A371C, Sph CBH8-A94C/A361C/C136A, Pph PspYZU05-D104C/A375C/C146A or Mph MP1-E105C/A362C, respectively, in a ratio of 1:1 (volume/volume) in 10 mM HEPES, pH 8.0, and 400 mM potassium chloride to reach a final concentration of Pif1-like helicase of 600 nM and a final concentration of DNA of 30 nM. The Pif1-like helicase was allowed to bind to DNA for 1 hour at room temperature. TMAD was added to each sample to a final concentration of 5 μM, and the sample was incubated at room temperature for 1 hour. The sample was loaded on a 4-20% TBE gel and run at 160 V for 1.5 hours. The gel was placed under blue fluorescence to observe the DNA bands.

[0283] 2. Results

[0284] FIG. 3 shows the effect of modifications on the DNA-binding capabilities of Pif1-like helicases. Lanes 2 to 6 show that a large amount of DNA was not bound by the Pif1-like helicase during electrophoresis and no obvious binding bands are observed. Lanes 7-11 show the binding bands of different numbers of enzymes bound to DNA, and lanes 9 and 10 show that up to 5 Pif1-like helicases could bind to the single-stranded moiety of SEQ ID NO:18. This indicates that the modified Pif1-like helicase significantly enhanced its binding strength to DNA.

EXAMPLE 4

[0285] This example describes how a Mph MP1-E105C/A362C (SEQ ID NO: 11 with mutation E105C/A362C) controlled the movement of intact DNA strands through a single MspA nanopore (SEQ ID NO: 12).

[0286] 1. Materials and Methods

[0287] DNA construct A as shown in FIG. 4 was prepared. SEQ ID NO: 13 (labelled B) was attached at its 5′ end to twenty iSpC3 spacers (labelled A) and at its 3′ end to four iSpC3 spacers (labelled C) which were attached to the 5′ end of SEQ ID NO: 14 (labelled D) which was attached at its 3′ end to SEQ ID NO: 17 (labelled E), and the SEQ ID NO: 15 region (labelled F) of this construct was hybridized to SEQ ID NO: 16 (labelled G, having a 5′ cholesterol tether).

[0288] The prepared DNA construct and Mph MP1-E105C/A362C were pre-incubated together for 30 minutes at 25° C. in a buffer (10 mM HEPES, pH 8.0, 400 mM NaCl, 5% glycerol, 2 mM DTT).

[0289] Electrical measurements were acquired from MspA nanopores inserted in 1,2-diphytanoyl-glycero-3-phosphocholine lipid bilayers. Bilayers were formed across about 25 μm diameter apertures on a PTFE film via the Montal-Mueller technique, separating two 100 μL buffered solutions. All experiments were carried out in the stated buffered solution. Single-channel currents were measured on an amplifier equipped with digitizers. Ag/AgCl electrodes were connected to the buffered solutions so that the cis compartment is connected to the ground of the amplifier, and the trans compartment is connected to the active electrode of the amplifier.

[0290] After achieving a single pore in the bilayer, DNA polynucleotide and the Pif1-like helicase were added to 70 μL of buffer in the cis compartment of the electrophysiology chamber to initiate capture of the helicase-DNA complexes in the nanopore. Helicase ATPase activity was initiated as required by the addition of divalent metal (5 mM MgCl.sub.2) and NTP (2.86 μM ATP) to the cis compartment. Experiments were carried out at a constant potential of +180 mV.

[0291] Results and Discussion

[0292] DNA movement controlled by the Pif1-like helicase was observed for the DNA construct. Observation of Pif1-like helicase-controlled DNA movement is shown in FIG. 5. The Pif1-like helicase-controlled DNA movement was 50 seconds long and corresponded to approximately 10 kb of the DNA construct's translocation through the nanopore. FIG. 6 shows an enlarged graph of a partial region of the Pif1-like helicase-controlled DNA movement.

EXAMPLE 5

[0293] This example describes how Sph CBH8-A94C/A361C/C136A (SEQ ID NO: 5 with A94C/A361C/C136A mutation), Eph Pei26-D99C/A366C/C141A (SEQ ID NO: 6 with D99C/A366C/C141A mutation) and Pph PspYZU05-D104C/A375C/C146A (SEQ ID NO: 8 with D104C/A375C and C146A mutations) controlled the movement of intact DNA strands through a single MspA nanopore.

[0294] 1. Materials and Methods

[0295] DNA construct B as shown in FIG. 4 was prepared. SEQ ID NO: 13 was attached at its 5′ end to twenty iSpC3 spacers and at its 3′ end to four iSpC3 spacers which were attached to the 5′ end of SEQ ID NO: 14 which was attached at its 3′ end to SEQ ID NO: 24, and the SEQ ID NO: 15 region of this construct was hybridized to SEQ ID NO: 16 (having a 3′ cholesterol tether). This DNA construct B was similar to the construct used in Example 4, except that the region labeled E corresponded to SEQ ID NO: 24.

[0296] DNA construct B in a buffer (50 mM NaCl, 10 mM Tris pH 7.5) and Sph CBH8-A94C/A361C/C136A, Eph Pei26-D99C/A366C/C141A or Pph PspYZU05-D104C/A375C/C146A in a buffer (50 mM KCl, 10 mM HEPES, pH 8.0) were pre-incubated together for 30 minutes at room temperature. TMAD was then added to the DNA/enzyme pre-mix and incubated for a further 30 minutes. Finally, a buffer (10 mM HEPES, 600 mM KCl, pH 8.0, 3 mM MgCl.sub.2) and ATP were added to the pre-mix.

[0297] Electrical measurements were acquired at room temperature from single MspA nanopore inserted in block co-polymer in a buffer (10 mM HEPES, 400 mM KCl, pH 8.0). After achieving a single pore inserted in the block co-polymer, the pre-mix of the Pif1-like helicase (Sph CBH8-A94C/A361C/C136A, Eph Pei26-D99C/A366C/C141A or Pph PspYZU05-D104C/A375C/C146A (1 nM final)), DNA (0.3 nM final), fuel (ATP, 3 mM final) was then added to the single nanopore experimental system. Each experiment was carried out for 2 hours at a holding potential of 180mV and helicase-controlled DNA movement was monitored.

[0298] 2. Results

[0299] DNA movement controlled by Pif1-like helicase was observed for the DNA construct B. Observations of Sph CBH8-A94C/A361C/C136A, Eph Pei26-D99C/A366C/C141A and Pph PspYZU05-D104C/A375C/C146A-controlled DNA movements are shown in FIGS. 7-9, respectively.

EXAMPLE 6

[0300] In this example Eph-Pei26-D99C/A366C/F103W/E286K (SEQ ID NO: 6 with D99C/A366C/F103W/E286K mutations) and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K (SEQ ID NO: 6 with D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K mutation) are taken as an example, to illustrate that by replacing or deleting certain types of natural amino acids of helicase, non-specific modifications of amino acids at one or more sites of this type can be avoided during chemical modification or treatment of helicase, the heterogeneity among enzymes can be reduced, and the speed uniformity of different enzyme molecules controlling different nucleic acid molecules passing through the MspA nanopore (SEQ ID NO: 12) can be improved.

[0301] 1. Materials and Methods

[0302] DNA construct C as shown in FIG. 4 was prepared. SEQ ID NO: 13 (labelled B) was attached at its 5′ end to twenty iSpC3 spacers (labelled A) and at its 3′ end to four iSpC3 spacers (labelled C) which were attached to the 5′ end of SEQ ID NO: 14 (labelled D) which was attached at its 3′ end to SEQ ID NO: 17 (labelled E), and the SEQ ID NO: 15 region (labelled F) of this construct was hybridized to SEQ ID NO: 16 (labelled G, having a 5′ cholesterol tether).

[0303] A ligated segment was synthesized from the A, B, C, and D segments at a concentration of 10 μM. The ligated segment, and the F segment and G segment in a ratio of 1:1:1 were added to an annealing buffer (10 mM Tris, pH7.0, 50 mM NaCl). The annealing process was carried out in accordance with 98° C. 10 min, −0.1° C./0.6s, 300 cycles, 65° C. 5min, −0.1° C./0.6s, 400 cycles (wherein segments A, B, C, D, F, and G were provided by Sangon Biotech).
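As a worked check of the annealing program above, the following sketch computes the end temperature and duration of each ramp. It assumes that each "cycle" is a single −0.1° C. step lasting 0.6 s, which is one reading of the program as written; the helper name is hypothetical.

```python
def ramp(start_c, step_c, seconds_per_step, cycles):
    """Return (end temperature in deg C, duration in s) of a stepped ramp."""
    end_c = round(start_c + step_c * cycles, 1)  # round away float noise
    return end_c, seconds_per_step * cycles


hold1_s = 10 * 60                                 # 98 deg C for 10 min
end1_c, ramp1_s = ramp(98.0, -0.1, 0.6, 300)      # first ramp, 300 cycles
hold2_s = 5 * 60                                  # 65 deg C for 5 min
end2_c, ramp2_s = ramp(65.0, -0.1, 0.6, 400)      # second ramp, 400 cycles

total_min = (hold1_s + ramp1_s + hold2_s + ramp2_s) / 60
print(end1_c, end2_c, total_min)
```

Under this reading, the first ramp brings the sample from 98° C. down to 68° C. over 3 minutes, the second ramp brings it from 65° C. down to 25° C. over 4 minutes, and the whole program takes about 22 minutes.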

[0304] The prepared DNA construct C and Eph-Pei26-D99C/A366C/F103W/E286K or Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K were pre-incubated together for 30 minutes at 25° C. in a buffer (10 mM HEPES, 50 mM KCl, pH 7.0). Then, TMAD with a final concentration of 1 mM was added thereto for catalytic treatment for 30 minutes. Finally, the DNA construct/enzyme mixture purified by magnetic beads and end-repaired nucleic acid library SEQ ID NO: 17 with a fixed-length of 10 kb (E region) were added to a T4 rapid ligation reaction system (as shown in Table 8 below, the T4 rapid ligation kit was provided by Enzymatics) at a molar concentration ratio of 1.5:1, and the rapid TA ligation was performed at room temperature for 10 minutes. The ligation product was purified by magnetic beads, and then put into the subsequent sequencing system.

TABLE 8 T4 rapid ligation reaction system

Reaction component | Final concentration
2× rapid ligation Buffer | 1×
DNA construct/enzyme complex | 10 uM
End-repaired nucleic acid library SEQ ID NO: 17 with a fixed-length of 10 kb | 10 uM
Nuclease-free water | N/A
T4 DNA Ligase (600 U/ul) | 30 U/ul
Total volume | 20 ul
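The pipetting volumes implied by Table 8 follow from the usual dilution relation C1·V1 = C2·V2. The sketch below works this out for the two components whose stock concentrations are given in the table (the 2× buffer and the 600 U/ul ligase); the stock concentrations of the DNA components are not stated, so their volumes are left uncomputed. The function name is hypothetical.

```python
def volume_needed_ul(stock_conc, final_conc, total_volume_ul):
    """Volume of stock to add so that final_conc is reached in total_volume_ul.

    stock_conc and final_conc must share the same unit (e.g. "x" or U/ul).
    """
    return final_conc * total_volume_ul / stock_conc


TOTAL_UL = 20.0
buffer_ul = volume_needed_ul(2.0, 1.0, TOTAL_UL)     # 2x buffer down to 1x
ligase_ul = volume_needed_ul(600.0, 30.0, TOTAL_UL)  # 600 U/ul down to 30 U/ul
remaining_ul = TOTAL_UL - buffer_ul - ligase_ul      # left for DNA and water

print(buffer_ul, ligase_ul, remaining_ul)
```

With a 20 ul reaction this gives 10 ul of buffer and 1 ul of ligase, leaving 9 ul for the DNA construct/enzyme complex, the library and nuclease-free water.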

[0305] Electrical measurements were acquired from MspA nanopores inserted in 1,2-diphytanoyl-glycero-3-phosphocholine lipid bilayers. Bilayers were formed across about 25 μm diameter apertures on a PTFE film via the Montal-Mueller technique, separating two 100 μL buffered solutions. All experiments were carried out in the stated buffered solution. Single-channel currents were measured on an amplifier equipped with digitizers. Ag/AgCl electrodes were connected to the buffered solutions so that the cis compartment was connected to the ground of the amplifier, and the trans compartment was connected to the active electrode of the amplifier. After achieving a single pore in the bilayer, the incubated DNA/enzyme mixture was added to 70 μL of buffer (10 mM HEPES, 500 mM KCl, 50 mM MgCl.sub.2, 50 mM ATP, pH 7.0) in the cis compartment of the electrophysiology chamber and a constant potential of +180 mV was applied at 35° C. to initiate capture of the DNA construct/enzyme complex in the MspA nanopore (SEQ ID NO: 12) and the movement of nucleic acid through the nanopore controlled by the helicase, and the DNA movement controlled by helicase was monitored and recorded.

[0306] 2. Results

[0307] Certain types of native amino acids in the helicase sequence have an effect on the uniformity of the speeds at which the helicase controls the movement of nucleic acids through a nanopore. These types of native amino acids are present in or introduced into a nucleic acid binding pocket of the helicase, which was shrunk after chemical or biological modification. In this example, in order to identify and analyze the improvement of the speed uniformity of nucleic acid moving through nanopores controlled by the helicase by replacing or deleting these types of native amino acids in the helicase, the time of each original current signal obtained by sequencing was extracted, and the speed of each original current signal was calculated according to the base length of the fixed-length library. Current signals indicating passage of the nucleic acid library through the nanopore controlled by Eph-Pei26-D99C/A366C/F103W/E286K and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K were identified and processed, and the speeds of complete passage of the different nucleic acid libraries through the nanopore were calculated and statistically analyzed.
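The speed calculation described above can be sketched as follows: because every molecule in the library has the same length (10 kb in this example), the speed of a read is simply that length divided by the duration of its current signal. The durations below are hypothetical, and the function name is an assumption made for the example.

```python
from statistics import median


def read_speeds(durations_s, library_length_bases):
    """Speed (bases per second) of each complete read of a fixed-length library."""
    return [library_length_bases / d for d in durations_s if d > 0]


LIBRARY_LENGTH = 10_000            # fixed-length 10 kb library
durations = [50.0, 40.0, 25.0]     # hypothetical signal durations in seconds
speeds = read_speeds(durations, LIBRARY_LENGTH)

print(speeds, median(speeds))
```

For comparison, a 50-second event spanning about 10 kb, as reported in Example 4, corresponds to roughly 200 bases per second.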

[0308] As shown in FIG. 12A, the speed at which the mutant Eph-Pei26-D99C/A366C/F103W/E286K controlled the nucleic acid passing through the nanopore shows a multimodal distribution, while, as shown in FIG. 12B, the speed at which the mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K controlled the nucleic acid passing through the nanopore shows a unimodal distribution. In mutants Eph-Pei26-D99C/A366C/F103W/E286K and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K, the mutation sites D99C/A366C are cysteine mutations introduced in domain 1B and domain 2B related to the nucleic acid binding pocket of Eph-Pei26 helicase, respectively. The mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W/E286K is a mutant in which the native cysteines (C) at positions other than the cysteines introduced into the 1B domain and 2B domain of the Eph-Pei26 helicase are replaced by the C114V/C119V/C141S/C308S/C419D mutations. The C114V/C119V/C141S/C308S/C419D mutations may also avoid the introduction of modifications at other cysteine positions such as C114, C119, C141, C308 or C419 during the formation of a disulfide bond between D99C/A366C catalyzed by TMAD, which would otherwise affect the function of the enzyme and cause heterogeneity among enzyme molecules, leading to a multimodal distribution of the speed. A multimodal distribution of speed makes it more difficult for the algorithm to recognize signals and is not conducive to the identification of bases. By replacing some types of native amino acids in the helicase sequence, the uniformity of speed of the helicase can be effectively improved, thus improving the throughput and accuracy of effective sequencing data.
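As a rough illustration of how a unimodal speed distribution might be told apart from a multimodal one, the sketch below counts peaks in a histogram of read speeds. This is a crude heuristic offered as an assumption, not the statistical analysis used in this example; the bin width, the minimum peak size and the speed values are all hypothetical, and noisy data would need smoothing first.

```python
def count_modes(speeds, bin_width, min_fraction=0.1):
    """Count histogram peaks that hold at least min_fraction of all reads."""
    if not speeds:
        return 0
    lowest = min(speeds)
    n_bins = int((max(speeds) - lowest) // bin_width) + 1
    counts = [0] * n_bins
    for s in speeds:
        counts[int((s - lowest) // bin_width)] += 1
    padded = [0] + counts + [0]  # zero padding lets edge bins qualify as peaks
    modes = 0
    for i in range(1, len(padded) - 1):
        # Strictly higher than the left bin, not lower than the right bin,
        # so a flat-topped peak is counted once.
        is_peak = padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]
        if is_peak and padded[i] >= min_fraction * len(speeds):
            modes += 1
    return modes


# Hypothetical speeds in bases per second.
one_population = [400, 410, 410, 420, 420, 420, 430, 430, 440]
two_populations = [200, 210, 210, 220, 400, 410, 410, 420]
print(count_modes(one_population, 10), count_modes(two_populations, 10))
```

A helicase preparation in which some molecules carry unintended modifications would be expected to behave like the second list, with a separate cluster of speeds for each enzyme sub-population.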

EXAMPLE 7

[0309] This example provides a comparative analysis of the current signal generated by the movement of a nucleic acid through the MspA nanopore (SEQ ID NO: 12) controlled by Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D (SEQ ID NO: 6 with D99C/A366C/C114V/C119V/C141S/C308S/C419D mutation) and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W (SEQ ID NO: 6 with D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W mutations) respectively, to illustrate the influence of the amino acid side chain interacting with the phosphate backbone or sugar ring or base of the nucleic acid on the current signal, which in turn affects the accuracy of sequencing.

[0310] 1. Materials and Methods

[0311] DNA construct D as shown in FIG. 4 was prepared. SEQ ID NO: 13 (labelled B) was attached at its 5′ end to twenty iSpC3 spacers (labelled A) and at its 3′ end to four iSpC3 spacers (labelled C) which were attached to the 5′ end of SEQ ID NO: 14 (labelled D) which was attached at its 3′ end to SEQ ID NO: 17 (labelled E), and the SEQ ID NO: 15 region (labelled F) of this construct was hybridized to SEQ ID NO: 16 (labelled G, having a 5′ cholesterol tether).

[0312] A ligated segment was synthesized from the A, B, C, and D segments at a concentration of 10 μM. The ligated segment, and the F segment and G segment in a ratio of 1:1:1 were added to an annealing buffer (10 mM Tris, pH7.0, 50 mM NaCl). The annealing process was carried out in accordance with 98° C. 10 min, −0.1° C./0.6s, 300 cycles, 65° C. 5min, −0.1° C./0.6s, 400 cycles (wherein segments A, B, C, D, F, and G were provided by Sangon Biotech).

[0313] The prepared DNA construct D and Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D or Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W were pre-incubated together for 30 minutes at 25° C. in a buffer (10 mM HEPES, 50 mM KCl, pH 7.0). Then, TMAD with a final concentration of 1 mM was added thereto for catalytic treatment for 30 minutes. Finally, the DNA construct/enzyme mixture purified by magnetic beads and a double-stranded DNA target sequence formed by annealing of SEQ ID NO: 24 and its complementary sequence (E region) were added to the T4 rapid ligation reaction system (as shown in Table 9 below, the T4 rapid ligation kit was provided by Enzymatics) at a molar concentration ratio of 1.5:1, and the rapid ligation was performed at room temperature for 10 minutes. The ligation product was purified by magnetic beads, and then put into the subsequent sequencing system.

TABLE 9 T4 rapid ligation reaction system

Reaction component | Final concentration
2× rapid ligation Buffer | 1×
DNA construct/enzyme complex | 10 uM
dsDNA formed by annealing of SEQ ID NO: 24 and its complementary sequence | 10 uM
Nuclease-free water | N/A
T4 DNA Ligase (600 U/ul) | 30 U/ul
Total volume | 20 ul

[0314] Electrical measurements were acquired from MspA nanopores inserted in 1,2-diphytanoyl-glycero-3-phosphocholine lipid bilayers. Bilayers were formed across about 25 μm diameter apertures on a PTFE film via the Montal-Mueller technique, separating two 100 μL buffered solutions. All experiments were carried out in the stated buffered solution. Single-channel currents were measured on an amplifier equipped with digitizers. Ag/AgCl electrodes were connected to the buffered solutions so that the cis compartment was connected to the ground of the amplifier, and the trans compartment was connected to the active electrode of the amplifier. After achieving a single pore in the bilayer, the incubated DNA/enzyme mixture was added to 70 μL of buffer (10 mM HEPES, 600 mM KCl, 3 mM MgCl.sub.2, 3 mM ATP, pH 7.0) in the cis compartment of the electrophysiology chamber and a constant potential of +180 mV was applied at 30° C. to initiate capture of the DNA construct/enzyme complex in the MspA nanopore (SEQ ID NO: 12) and the movement of nucleic acid through the nanopore controlled by the helicase, and the DNA movement controlled by helicase was monitored and recorded.

[0315] 2. Results

[0316] DNA movement controlled by the Eph-Pei26 helicase mutant was observed for the DNA construct. FIG. 13 shows the current signal characteristics of the target nucleic acid sequence SEQ ID NO: 24 passing through the nanopore controlled by the Eph-Pei26 helicase mutant. Figure A is a simulated current signal characteristic of the target nucleic acid sequence SEQ ID NO: 24 moving through the nanopore one by one; Figure B is an actual current signal generated by the movement of the target nucleic acid sequence SEQ ID NO: 24 through the nanopore controlled by the mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W; and Figure C is an actual current signal generated by the movement of the target nucleic acid sequence SEQ ID NO: 24 through the nanopore controlled by the mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D. As can be seen from FIG. 13, there are more current step signals (labelled with arrows in Figure B) when the mutant Eph-Pei26-D99C/A366C/C114V/C119V/C141S/C308S/C419D/F103W is used to control the target nucleic acid SEQ ID NO: 24 to pass through the nanopore, resulting in an increase in insertion errors. The results suggest that the accuracy of sequencing is affected by the amino acid side chain that interacts with the nucleic acid base sequence or the amino acid side chain that interacts with the nucleic acid base at a specific domain site.

EXAMPLE 8

[0317] This example uses different Eph-Pei26 helicase mutants to control the movement of DNA constructs through a nanopore. The tested helicases have a substitution of at least one amino acid that interacts with one or more nucleotides of a template single-stranded DNA or complementary single-stranded DNA. By analyzing the different parameters of different Eph-Pei26 helicase mutants that control the movement of nucleic acid through the MspA nanopore (SEQ ID NO: 12), this example shows the effects of the substitution of amino acids in helicases that interact with one or more nucleotides of a template or complementary single-stranded DNA on the speed and accuracy of nucleic acid sequencing and provides several mutants that improve the accuracy and/or speed of sequencing.

[0318] 1. Materials and Methods

[0319] DNA construct E as shown in FIG. 4 was prepared. SEQ ID NO: 13 (labelled B) was attached at its 5′ end to twenty iSpC3 spacers (labelled A) and at its 3′ end to four iSpC3 spacers (labelled C) which were attached to the 5′ end of SEQ ID NO: 14 (labelled D) which was attached at its 3′ end to SEQ ID NO: 17 (labelled E), and the SEQ ID NO: 15 region (labelled F) of this construct was hybridized to SEQ ID NO: 16 (labelled G, having a 5′ cholesterol tether).

[0320] A ligated segment was synthesized from the A, B, C, and D segments at a concentration of 10 μM. The ligated segment, and the F segment and G segment in a ratio of 1:1:1 were added to an annealing buffer (10 mM Tris, pH7.0, 50 mM NaCl). The annealing process was carried out in accordance with 98° C. 10 min, −0.1° C./0.6s, 300 cycles, 65° C. 5min, −0.1° C./0.6s, 400 cycles (wherein segments A, B, C, D, F, and G were provided by Sangon Biotech).

[0321] The prepared DNA construct and the different mutants of the helicase Eph-Pei26 shown in Table 10 below were pre-incubated together for 30 minutes at 25° C. in a buffer (10 mM HEPES, 50 mM KCl, pH 7.0). Then, TMAD with a final concentration of 1 mM was added thereto for catalytic treatment for 30 minutes. Finally, the DNA construct/enzyme mixture purified by magnetic beads and the double-stranded DNA sequence of target nucleic acid of SEQ ID NO: 17 (E region) were added to a T4 rapid ligation reaction system (as shown in Table 11 below, the T4 rapid ligation kit was provided by Enzymatics) at a molar concentration ratio of 1.5:1, and the rapid ligation was performed at room temperature for 10 minutes. The ligation product was purified by magnetic beads, and then put into the subsequent sequencing system.

TABLE 10 Different mutants of helicase Eph-Pei26 used in this example

Mutant No. | Mutant region 1 | Mutant region 2 | Mutant region 3
Eph-Pei26-1 | D99C/A366C | |
Eph-Pei26-2 | D99C/A366C | | R88Q
Eph-Pei26-3 | D99C/A366C | | H87Y
Eph-Pei26-4 | D99C/A366C | | R88N
Eph-Pei26-5 | D99C/A366C | | K249R
Eph-Pei26-6 | D99C/A366C | | H87Q
Eph-Pei26-7 | D99C/A366C | | V422H
Eph-Pei26-8 | D99C/A366C | | S293R
Eph-Pei26-9 | D99C/A366C | | E286K
Eph-Pei26-10 | D99C/A366C | | V155I
Eph-Pei26-11 | D99C/A366C | | K91R
Eph-Pei26-12 | D99C/A366C | | V155L
Eph-Pei26-13 | D99C/A366C | | S293N
Eph-Pei26-14 | D99C/A366C | | F246Y
Eph-Pei26-15 | D99C/A366C | | F246R
Eph-Pei26-16 | D99C/A366C | C308T/C419D |
Eph-Pei26-17 | D99C/A366C | C308T/C419D | E286K
Eph-Pei26-18 | D99C/A366C | C308T/C419D | E286K/V422H
Eph-Pei26-19 | D99C/A366C | C308T/C419D | E286K/F246R
Eph-Pei26-20 | D99C/A366C | C308T/C419D | E286K/F246R/V422H
Eph-Pei26-21 | D99C/A366C | C308T/C419D | H87Q/E286K/F246R
Eph-Pei26-22 | D99C/A366C | C308T/C419D | H87Q/E286K/F246R/V422H
Eph-Pei26-23 | D99C/A366C | C308T/C419D | E286K/F246Y/S293N
Eph-Pei26-24 | D99C/A366C | C308T/C419D | E286K/F246Y/S293N/V422H
Eph-Pei26-25 | D99C/A366C | C114V/C119V/C141S/C308S/C419D | F103W/E286K
Eph-Pei26-26 | D99C/A366C | C114V/C119V/C141S/C308S/C419D |
Eph-Pei26-27 | D99C/A366C | C114V/C141S/C308T/C419D | E286K/F246Y
Eph-Pei26-28 | D99C/A366C | C114V/C141S/C308T/C419D | I282F/E286K/F246Y

TABLE 11 T4 rapid ligation reaction system

Reaction component | Final concentration
2× rapid ligation Buffer | 1×
DNA construct/enzyme complex | 10 uM
End-repaired nucleic acid library SEQ ID NO: 17 with a fixed-length of 10 kb | 10 uM
Nuclease-free water | N/A
T4 DNA Ligase (600 U/ul) | 30 U/ul
Total volume | 20 ul

[0322] Electrical measurements were acquired from MspA nanopores inserted in 1,2-diphytanoyl-glycero-3-phosphocholine lipid bilayers. Bilayers were formed across about 25 μm diameter apertures on a PTFE film via the Montal-Mueller technique, separating two 100 μL buffered solutions. All experiments were carried out in the stated buffered solution. Single-channel currents were measured on an amplifier equipped with digitizers. Ag/AgCl electrodes were connected to the buffered solutions so that the cis compartment was connected to the ground of the amplifier, and the trans compartment was connected to the active electrode of the amplifier. After achieving a single pore in the bilayer, the incubated DNA/enzyme mixture was added to 70 μL of buffer (10 mM HEPES, 500 mM KCl, 50 mM MgCl.sub.2, 50 mM ATP, pH 7.0) in the cis compartment of the electrophysiology chamber and a constant potential of +180 mV was applied at 35° C. to initiate capture of the DNA construct/enzyme complex in the MspA nanopore (SEQ ID NO: 12) and the movement of nucleic acid through the nanopore controlled by the helicase, and the DNA movement controlled by helicase was monitored and recorded.

[0323] 2. Results

[0324] This example analyzes characteristic parameters of the current signals generated when the nucleic acid sequence of SEQ ID NO: 17 moves through the nanopore under the control of different mutants of helicase Eph-Pei26. The parameters include the median speed, deletion ratio, insertion ratio, uneven ratio and flick ratio. They illustrate how substitutions of amino acids that interact with the template strand or the complementary single-stranded nucleic acid influence the ability of the enzyme to control the movement of nucleic acids, so that helicase mutants can be screened out that effectively control the movement of nucleic acids through a nanopore and improve the accuracy and throughput of sequencing. The parameters are obtained as follows. The sequencing current signal of an original fixed-length library is segmented and compared with that of a reference base sequence by a common sequence alignment algorithm (such as dynamic time warping), giving a processed current signal and the corresponding base sequence information; each parameter is then calculated separately from the segmentation and comparison results. The parameters are defined or calculated as follows: (1) The median speed is calculated in the same way as described in Example 6. (2) The deletion ratio is the number of missing bases divided by the total number of bases in the reference sequence, wherein the missing bases are bases that are absent from the original current signal compared to the reference sequence. (3) The insertion ratio is the proportion of insertion steps in the original current signal; one insertion is present if two or more steps correspond to one base according to the comparison result. (4) The uneven ratio is the proportion of uneven steps. Ideally, each base would appear as a flat continuous signal in the original current signal. In practice, however, the continuous signal is only relatively flat, and a signal block is considered an uneven step if the standard deviation of its current values is greater than a certain threshold. The uneven ratio is the number of uneven steps divided by the total number of steps. (5) The flick ratio is the proportion of signal blocks with large signal disturbances. The signal is high-pass filtered to obtain its relatively high-frequency component. If a high-pass filtered signal block has a standard deviation greater than a certain threshold, the block is considered to have high interference. The flick ratio is the number of high-interference blocks divided by the total number of blocks. The lower the values of parameters (2)-(5), the better the sequencing results.
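The definitions of parameters (2)-(5) can be illustrated with a minimal Python sketch. It is not the implementation used in this example. The function names, the thresholds `uneven_sd` and `flick_sd`, the moving-average high-pass filter, and the use of the total number of steps as the denominator of the insertion ratio are all assumptions made for illustration, since the text does not specify them.

```python
from collections import Counter
from statistics import pstdev


def high_pass(samples, window=5):
    """Assumed high-pass filter: subtract a centered moving average.

    The source only says the signal is high-pass filtered; this particular
    filter and its window length are illustrative choices.
    """
    half = window // 2
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    return [x - sum(padded[i:i + window]) / window for i, x in enumerate(samples)]


def signal_metrics(steps, step_to_base, n_ref_bases, uneven_sd=1.0, flick_sd=0.5):
    """Compute the four ratios from a segmented, aligned current signal.

    steps        -- list of steps, each a list of current samples
    step_to_base -- index of the reference base each step was aligned to
    n_ref_bases  -- number of bases in the reference sequence
    uneven_sd, flick_sd -- hypothetical thresholds (the source gives none)
    """
    n_steps = len(steps)
    steps_per_base = Counter(step_to_base)
    # A base with no aligned step is missing from the signal.
    missing = sum(1 for b in range(n_ref_bases) if steps_per_base[b] == 0)
    # One insertion per base that has two or more steps aligned to it.
    insertions = sum(1 for c in steps_per_base.values() if c >= 2)
    # Uneven step: standard deviation of the raw samples above the threshold.
    uneven = sum(1 for s in steps if pstdev(s) > uneven_sd)
    # Flick: standard deviation of the high-pass filtered samples above the threshold.
    flick = sum(1 for s in steps if pstdev(high_pass(s)) > flick_sd)
    return {
        "deletion_ratio": missing / n_ref_bases,
        "insertion_ratio": insertions / n_steps,  # denominator is an assumption
        "uneven_ratio": uneven / n_steps,
        "flick_ratio": flick / n_steps,
    }


# Synthetic toy signal: three flat steps, one slow ramp, one rapidly alternating step.
flat = [50.0] * 20
ramp = [48.0 + 4.0 * i / 19 for i in range(20)]
noisy = [53.0 if i % 2 == 0 else 47.0 for i in range(20)]
steps = [flat, flat, ramp, noisy, flat]
metrics = signal_metrics(steps, step_to_base=[0, 1, 1, 3, 4], n_ref_bases=5)
print(metrics)
```

In the toy data, the ramp counts as uneven but not as a flick, because its variation is slow and is removed by the high-pass filter, whereas the alternating step exceeds both thresholds.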

[0325] The analysis results for the different mutants are shown in Table 12. The results show that substitutions at different amino acid positions have different effects on enzyme-controlled nucleic acid movement, including the speed and especially the uneven ratio, deletion ratio and insertion ratio. This may be caused by amino acid side chains interfering with base shift or slippage. The smaller the above ratios, the better the quality of the current signals generated by the enzyme-controlled movement of nucleic acids. In evaluating the different mutants, the effects on the various parameters were considered together. Mutants with large differences in the above parameters were selected for subsequent base recognition model training, in order to verify how modifications of amino acid sites affect both the parameters of enzyme-controlled nucleic acid movement and the accuracy of sequencing.

TABLE 12 Current signal characteristic parameters of different mutants

| Mutant No. | Insertion ratio | Uneven ratio | Deletion ratio | Flick ratio | Median speed (bp/s) |
|---|---|---|---|---|---|
| Eph-Pei26-1 (control) | 7.26% | 57.10% | 11.47% | 14.39% | 541 |
| Eph-Pei26-2 | 4.38% | 49.10% | 13.19% | 12.70% | 640 |
| Eph-Pei26-3 | 5.49% | 51.48% | 11.52% | 10.71% | 453 |
| Eph-Pei26-4 | 5.55% | 51.33% | 11.89% | 12.82% | 590 |
| Eph-Pei26-5 | 5.95% | 55.25% | 12.47% | 14.53% | 588 |
| Eph-Pei26-6 | 6.58% | 54.65% | 13.28% | 13.92% | 566 |
| Eph-Pei26-7 | 6.72% | 56.13% | 12.23% | 14.60% | 583 |
| Eph-Pei26-8 | 6.94% | 55.05% | 10.44% | 13.47% | 513 |
| Eph-Pei26-9 | 7.27% | 59.38% | 10.98% | 15.26% | 502 |
| Eph-Pei26-10 | 7.46% | 56.22% | 8.23% | 11.19% | 416 |
| Eph-Pei26-11 | 7.53% | 54.77% | 9.64% | 12.12% | 477 |
| Eph-Pei26-12 | 7.57% | 56.39% | 9.01% | 11.89% | 477 |
| Eph-Pei26-13 | 7.65% | 55.39% | 8.66% | 12.57% | 471 |
| Eph-Pei26-14 | 9.16% | 63.49% | 6.16% | 15.06% | 405 |
| Eph-Pei26-15 | 10.39% | 67.01% | 9.24% | 17.07% | 366 |
| Eph-Pei26-16 | 5.39% | 53.48% | 12.97% | 13.77% | 587 |
| Eph-Pei26-17 | 5.41% | 45.64% | 13.19% | 10.45% | 583 |
| Eph-Pei26-18 | 5.94% | 50.28% | 13.23% | 12.95% | 608 |
| Eph-Pei26-19 | 7.79% | 52.13% | 12.73% | 12.78% | 471 |
| Eph-Pei26-20 | 7.00% | 51.80% | 13.10% | 12.93% | 505 |
| Eph-Pei26-21 | 7.95% | 63.02% | 12.44% | 19.58% | 427 |
| Eph-Pei26-22 | 6.80% | 51.42% | 14.07% | 11.81% | 471 |
| Eph-Pei26-23 | 5.27% | 47.07% | 13.27% | 10.50% | 587 |
| Eph-Pei26-24 | 4.56% | 45.18% | 12.52% | 10.28% | 598 |
| Eph-Pei26-25 | 8.14% | 60.40% | 11.11% | 16.65% | 445 |
| Eph-Pei26-26 | 6.26% | 56.24% | 10.50% | 14.33% | 443 |
| Eph-Pei26-27 | 11.79% | 51.93% | 8.84% | 15.2% | 494 |
| Eph-Pei26-28 | 12.9% | 53.49% | 9.1% | 16.1% | 497 |
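As a worked illustration of how the Table 12 values can be compared, the short Python sketch below computes, for mutants Eph-Pei26-24 and Eph-Pei26-25, the change in each of the four ratios relative to the control Eph-Pei26-1. The percentages are copied from Table 12. The dictionary layout and the helper name `delta_vs_control` are hypothetical, and a per-parameter difference is only one possible way of comparing mutants; the text does not state how the parameters were combined.

```python
# Percentages copied from Table 12 for the control and two mutants.
table12 = {
    "Eph-Pei26-1": {"insertion": 7.26, "uneven": 57.10, "deletion": 11.47, "flick": 14.39},
    "Eph-Pei26-24": {"insertion": 4.56, "uneven": 45.18, "deletion": 12.52, "flick": 10.28},
    "Eph-Pei26-25": {"insertion": 8.14, "uneven": 60.40, "deletion": 11.11, "flick": 16.65},
}


def delta_vs_control(mutant, control="Eph-Pei26-1"):
    """Percentage-point change of each ratio relative to the control.

    Negative values mean the ratio is lower (better) than in the control.
    """
    return {
        name: round(table12[mutant][name] - table12[control][name], 2)
        for name in table12[control]
    }


delta_24 = delta_vs_control("Eph-Pei26-24")
delta_25 = delta_vs_control("Eph-Pei26-25")
print("Eph-Pei26-24:", delta_24)
print("Eph-Pei26-25:", delta_25)
```

On these numbers, Eph-Pei26-24 lowers the insertion, uneven and flick ratios relative to the control while its deletion ratio is slightly higher, and Eph-Pei26-25 shows the opposite pattern.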

EXAMPLE 9

[0326] In this example, mutants Eph-Pei26-24 and Eph-Pei26-25 were selected because they differed significantly in their overall performance on the four parameters deletion ratio, insertion ratio, uneven ratio and flick ratio, as shown in Table 12 of Example 8. Sequencing data were collected for randomly digested libraries from Escherichia coli and the human genome moving through the MspA nanopore (SEQ ID NO: 12) under the control of each mutant, and the same algorithm model was used to train a base recognition model for each mutant. By evaluating the performance of the models on validation-set chip data and on test chip data, the differences in the ability of the mutants to control nucleic acid movement were assessed against the parameters in Table 12 of Example 8, enabling selection of suitable mutants for nanopore nucleic acid sequencing.

[0327] 1. Methods

[0328] The E. coli and human genome nucleic acids were randomly digested and then used to replace fragment E of the DNA construct shown in FIG. 4. Fragment E and fragments A/B/C/D/F were annealed to form a complex. The ligation product was purified with magnetic beads, incubated with the Eph-Pei26-24 and Eph-Pei26-25 helicase mutants, and catalyzed with TMAD. The processed products were purified and loaded onto the sequencer chip for sequencing and signal collection.

[0329] 2. Results

[0330] FIG. 14 shows the accuracy of the models over multiple rounds of training, evaluated on the validation-set data of the randomly digested E. coli and human genome libraries sequenced with Eph-Pei26-24 and Eph-Pei26-25. Table 13 shows the test results of the base recognition models trained for mutants Eph-Pei26-24 and Eph-Pei26-25 on their respective chip data. The accuracy rate of the validation set refers to the proportion of the sequence in a single library signal that aligns completely with the reference sequence when the trained model performs base recognition on the chip data of the validation set. Correspondingly, the accuracy rate of the test chip refers to the same proportion when the trained model performs base recognition on the data of the test chip. The accuracy rates reported in this example are median values. It can be seen that the accuracy of mutant Eph-Pei26-24 on the E. coli and human genome validation-set data is higher than that of Eph-Pei26-25, and this difference was also confirmed on the test chip. The differences in accuracy between test chips, and between the test chip and the validation set, are related to the batch-to-batch repeatability of the experiments; this applies especially to the differences between chips. The results demonstrate that the parameters in Example 8 describing how different mutants control the movement of nucleic acid are effective for assessing differences in the ability of the enzymes to control that movement.

TABLE 13 Test results of mutants Eph-Pei26-24 and Eph-Pei26-25 on their respective test chip data

| Library type | Test parameters | Eph-Pei26-25 | Eph-Pei26-24 |
|---|---|---|---|
| E. coli genome | Number of signals | 9676 | 47537 |
| E. coli genome | Total throughput | 43.07M | 256.75M |
| E. coli genome | Aligned throughput | 40.04M | 247.71M |
| E. coli genome | Median accuracy | 83.29% | 86.50% |
| E. coli genome | Alignment rate | 92.96% | 96.48% |
| Human genome | Number of signals | 10285 | 68274 |
| Human genome | Total throughput | 51.06M | 390.97M |
| Human genome | Aligned throughput | 48.23M | 374.2M |
| Human genome | Median accuracy | 86.72% | 87.54% |
| Human genome | Alignment rate | 94.46% | 95.78% |
| Phage genome | Number of signals | 5745 | 11939 |
| Phage genome | Total throughput | 40.17M | 88.84M |
| Phage genome | Aligned throughput | 38.25M | 85.51M |
| Phage genome | Median accuracy | 82.88% | 85.20% |
| Phage genome | Alignment rate | 95.22% | 96.25% |
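The text does not define the alignment rate in Table 13. The Python sketch below checks one plausible reading, namely that the alignment rate is the aligned throughput divided by the total throughput. This reading is an assumption. The throughput and rate values are copied from Table 13, and the variable names are hypothetical.

```python
# (library, mutant): (total throughput, aligned throughput, reported alignment rate in %)
# Throughput values are in the "M" units used in Table 13.
table13 = {
    ("E. coli", "Eph-Pei26-25"): (43.07, 40.04, 92.96),
    ("E. coli", "Eph-Pei26-24"): (256.75, 247.71, 96.48),
    ("Human", "Eph-Pei26-25"): (51.06, 48.23, 94.46),
    ("Human", "Eph-Pei26-24"): (390.97, 374.2, 95.78),
    ("Phage", "Eph-Pei26-25"): (40.17, 38.25, 95.22),
    ("Phage", "Eph-Pei26-24"): (88.84, 85.51, 96.25),
}

# Assumed definition: alignment rate = aligned throughput / total throughput.
computed = {
    key: 100.0 * aligned / total
    for key, (total, aligned, _reported) in table13.items()
}

for key, (_total, _aligned, reported) in table13.items():
    diff = computed[key] - reported
    print(f"{key[0]:8s} {key[1]}: computed {computed[key]:.2f}%, "
          f"reported {reported:.2f}%, difference {diff:+.2f}")
```

Under this assumed definition the computed rates agree with the reported ones to within 0.1 percentage point for every entry, with the largest gap for the human genome library sequenced with Eph-Pei26-24, whose aligned throughput is given to only one decimal place.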

[0331] The preferred embodiments of the present application are described in detail above. However, the present application is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present application, various simple variations can be made to the technical solution of the present application, and these simple variations all fall within the scope of protection of the claims.

[0332] In addition, it should be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided that there is no contradiction. To avoid unnecessary repetition, the present application does not separately describe every possible combination.