NGS LIBRARY PREPARATION USING COVALENTLY CLOSED NUCLEIC ACID MOLECULE ENDS

Abstract

The current invention pertains to adapters comprising a protelomerase recognition sequence, preferably a TelN protelomerase recognition sequence. The adapters of the invention can be used for the preparation of a nucleic acid molecule library. The invention also relates to a method for producing a nucleic acid molecule library using one or more adapters comprising a protelomerase recognition sequence. The adapters may be contacted with a protelomerase to cleave and close the ends of the adapters. Said closed adapters are e.g. protected against exonuclease treatment. The method of the invention further concerns an amplification method and a sequencing method using adapters having a protelomerase recognition sequence.

Claims

1. An adapter comprising at least partly a double-stranded nucleic acid and a protelomerase recognition sequence.

2. The adapter according to claim 1, wherein the protelomerase recognition sequence is a TelN protelomerase recognition sequence.

3. The adapter according to claim 1, further comprising an identifier sequence.

4. The adapter according to claim 1, wherein the adapter comprises at least one staggered end.

5. A method for preparing a nucleic acid molecule library, the method comprising: (a) providing a sample comprising at least a first and a second nucleic acid molecule, wherein the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule and wherein optionally the second nucleic acid molecule comprises a second target sequence; (b) ligating an adapter according to claim 1 to the ends of the first and second nucleic acid molecules to provide adapter ligated nucleic acid molecules; (c) contacting the adapter ligated nucleic acid molecules with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first and second nucleic acid molecule comprising closed ends; and (d) cleaving the first nucleic acid molecule comprising the closed ends at the first target sequence, to provide a first nucleic acid comprising one open end and one closed end.

6. The method according to claim 5, wherein the protelomerase recognition sequence is a TelN protelomerase recognition sequence.

7. The method according to claim 5, wherein the sample further comprises a plurality of additional nucleic acid molecules.

8. The method according to claim 5, wherein the first nucleic acid molecule is cleaved by a programmable nuclease or a restriction endonuclease.

9. The method according to claim 8, wherein the programmable nuclease is an RNA-guided CRISPR nuclease.

10. The method according to claim 5, wherein the first and second nucleic acid molecules are provided by fragmentation of a genomic nucleic acid molecule.

11. The method according to claim 5, wherein the adapter is ligated by tagmentation.

12. The method according to claim 5, wherein the method further comprises (c1) exposing the sample to an exonuclease, after obtaining the nucleic acid molecules comprising closed ends and prior to cleaving the first nucleic acid molecule comprising the closed ends.

13. The method according to claim 5, wherein the method further comprises (e) exposing the sample to an exonuclease after obtaining the first nucleic acid molecule comprising one open end and one closed end.

14. The method according to claim 13, wherein the method further comprises (f) cleaving the second nucleic acid molecule comprising the closed ends at the second target sequence, resulting in a second nucleic acid comprising one open end and one closed end.

15. The method according to claim 5, wherein the method further comprises (g) linking a further adapter to the open end of the first, or optionally second, nucleic acid molecule comprising one open and one closed end, wherein the further adapter comprises at least one of an amplification primer binding site and sequence primer binding site and optionally an identifier sequence.

16. The method according to claim 5, wherein a nucleic acid molecule library is prepared from a plurality of samples and pooled.

17. The method according to claim 6, wherein the adapter ligated nucleic acid molecules are repaired to remove single-stranded breaks prior to contacting the molecules with a TelN protelomerase.

18. A method for amplification of a nucleic acid molecule library, the method comprising: (a) preparing a nucleic acid molecule library according to claim 5; and (b) amplifying the nucleic acid molecule library using at least one of: (i) a first, and optionally second, primer annealing to the first nucleic acid molecule comprising one open and one closed end; (ii) a first, and optionally a second, primer annealing to the second nucleic acid molecule comprising one open and one closed end; (iii) a first, and optionally a second, primer annealing to the further adapter; and (iv) a combination of a first primer as defined in (i) or (ii), and a second primer as defined in (iii).

19. A method for analysing a sequence of interest in a sample comprising a first and a second nucleic acid molecule, comprising: (a) preparing a nucleic acid molecule library according to claim 5; (b) optionally amplifying the prepared nucleic acid molecule library; and (c) sequencing the nucleic acid molecule library.

20. The method according to claim 19, wherein the sequencing comprises deep-sequencing.

21. A kit of parts comprising: (a) one or more adapters according to claim 1; and (b) optionally a protelomerase.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0086] The inventors discovered that adapters comprising a Protelomerase recognition site can be used for library preparation. In particular, adapters containing a recognition site for the Protelomerase enzyme can be ligated to nucleic acid molecules, wherein these nucleic acid molecules are either double stranded or made double stranded after adapter ligation. These adapters are subsequently cut by the Protelomerase enzyme and simultaneously the ends of the nucleic acid molecules are covalently closed. In case both ends of a nucleic acid molecule are closed this way, the molecule is protected against exonuclease degradation as it lacks free “end” nucleotides.

[0087] A terminus of a double stranded nucleic acid, wherein the 3′-end terminal nucleotide of the respective upper strand is covalently linked to the 5′-end terminal nucleotide of the respective bottom strand, is annotated herein as a “closed end”. Likewise, a terminus of a double stranded nucleic acid, wherein the 5′-end terminal nucleotide of the respective upper strand is covalently linked to the 3′-end terminal nucleotide of the respective bottom strand, is also annotated herein as a “closed end”. A “closed end” is thus understood herein as a terminus of a double stranded nucleic acid wherein said terminal nucleotides from opposite strands are covalently linked to each other, as opposed to an “open end” which is understood herein as a terminus of a double stranded nucleic acid wherein said terminal nucleotides from opposite strands are not covalently linked to each other.

[0088] For the novel library preparation methods detailed herein, preferably all nucleic acid molecules that are present in a particular nucleic acid sample are tagged on both sides with a Protelomerase adapter and are thus cut upon Protelomerase treatment, rendering covalently closed nucleic acid molecules that are insensitive to 5′ or 3′ modifying enzymes. An optional step of exonuclease treatment of the Protelomerase-treated sample can be added to remove any possible nucleic acid molecules that are not covalently closed on both ends. Subsequently, the (covalently closed) nucleic acid molecules can be selectively opened by using for instance targeted or programmable endonucleases. Although all nucleic acid molecules are still present in the reaction mixture, only those cleaved in the last opening reaction can be used in a subsequent (sequencing) process, for instance by ligating sequencing adapters to the open ends, thereby selectively rendering these opened fragments ready for sequencing. Alternatively, the opened fragments may be degraded using exonuclease treatment, thereby enriching for the non-opened nucleic acid molecules for further processing. For instance, these non-opened molecules may be opened in a second round of selective opening using for instance programmable endonucleases targeted to these non-opened molecules.

The above mentioned approach has at least the following advantages.

[0089] The addition of barcodes to the ends of nucleic acid molecules after which samples can be pooled before further sample preparation steps are performed.

[0090] The need for a single CRISPR enzyme/guide complex for targeting a locus, in contrast to the usual use of two gRNAs per target locus.

[0091] The approach is in principle sequencing platform agnostic.

[0092] The approach can be used to target nucleic acid molecules without an amplification step, which enables the detection of native base modifications.

[0093] The mentioned approach can be applied to nucleic acid molecules of any length, i.e. short molecules (<1 Kbp) or long molecules (>5 Kbp).

[0094] Therefore in a first aspect, the invention pertains to an adapter comprising a protelomerase recognition sequence. Preferably, the adapter comprises a TelN protelomerase recognition sequence. Preferably, the adapter is for use in a method of the invention. Preferably, the adapter can be linked to a nucleic acid molecule used in the method of the invention.

[0095] The adapter may be single-stranded. A single-stranded adapter preferably comprises a section, preferably at its 3′ end, that is capable of hybridizing to a nucleic acid molecule used in the method of the invention. The single-stranded adapter preferably can hybridize to a single-stranded overhang of the nucleic acid molecule, preferably a 3′ overhang of the nucleic acid molecule. The single-stranded part of the annealed single-stranded adapter may subsequently be filled in, i.e. is made double-stranded, using a polymerase, such as, but not limited to, Klenow (known by the skilled person to have 5′→3′ polymerase activity and 3′→5′ exonuclease activity but lacking 5′→3′ exonuclease activity) or a Bst-polymerase (known by the skilled person to be a DNA polymerase from Bacillus stearothermophilus having polymerase activity and strand displacement activity, but lacking 3′→5′ exonuclease activity). The filling-in step optionally results in the generation of a double-stranded protelomerase recognition sequence.

[0096] Preferably, the adapter is at least partly double-stranded. The at least partly double-stranded adapter may be ligated to a nucleic acid molecule in the method of the invention as defined herein. Preferably, at least 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100% of the nucleotides in the adapter are double-stranded. Preferably, the protelomerase recognition sequence is double-stranded. The adapter may be 100% or “fully” double-stranded. The adapter may become fully double-stranded after ligation of the adapter to the nucleic acid molecule, e.g. by filling in the single-stranded part of the adapter using a DNA polymerase.

[0097] Preferably, the at least partly double-stranded adapter comprises two single-stranded molecules that may at least partly anneal to each other, i.e. the double-stranded adapter preferably comprises two open ends prior to ligating the adapter to the nucleic acid molecules as defined herein.

[0098] One end of the at least partly double-stranded adapter can be ligated to the nucleic acid molecule. Hence preferably at least the one end that is ligated to the nucleic acid molecule is double-stranded. The at least one double-stranded end of the adapter can be a blunt or a staggered or “sticky” end. Preferably, the adapter comprises at least one staggered end. Preferably, the end of the adapter that is ligated to the nucleic acid molecule has an end that is compatible with an end of the nucleic acid molecule. For example, in case the nucleic acid molecule comprises an end having an A-overhang, the adapter preferably comprises an end having a T-overhang. Similarly, in case the nucleic acid molecule is obtained by enzyme digestion leaving an overhang of 1, 2, 3, 4, 5 or more nucleotides, the adapter preferably comprises an overhang of respectively 1, 2, 3, 4, 5 or more nucleotides that are complementary to the overhang of the nucleic acid molecule.
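
By way of non-limiting illustration only, the compatibility of an adapter overhang with the overhang of a nucleic acid molecule may be checked as sketched below. The function names are hypothetical, and the sketch assumes that both overhangs are written 5′ to 3′ and have the same polarity (both 3′ overhangs or both 5′ overhangs).

```python
# Illustrative sketch only; the names below are hypothetical helpers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def overhangs_compatible(fragment_overhang: str, adapter_overhang: str) -> bool:
    """True if the adapter overhang can base-pair over its full length
    with the fragment overhang (same length, fully complementary)."""
    if not fragment_overhang:
        return False  # blunt ends are not handled in this sketch
    return adapter_overhang.upper() == reverse_complement(fragment_overhang)


# An A-overhang pairs with a T-overhang; a 4-nt AATT overhang is
# self-complementary and therefore pairs with another AATT overhang.
a_tailed_ok = overhangs_compatible("A", "T")
sticky_ok = overhangs_compatible("AATT", "AATT")
```

The length check is implicit: a reverse complement always has the same length as its input, so overhangs of different lengths never compare equal.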

[0099] The other end of the adapter preferably cannot be ligated to a nucleic acid molecule or an adapter. Any means to block ligation of an adapter end is suitable for use in the method of the invention. As a non-limiting example, the other end of the adapter may be single-stranded or comprise an incompatible overhang.

[0100] The adapter of the invention comprises a protelomerase recognition sequence, preferably a TelN protelomerase recognition sequence. A protelomerase recognition sequence is any DNA sequence whose presence in a DNA template allows for its conversion into a closed linear DNA by the enzymatic activity of protelomerase. In other words, the protelomerase recognition sequence is required for the cleavage and religation of double stranded DNA by protelomerase to form covalently closed linear DNA. Typically, a protelomerase recognition sequence comprises a perfect palindromic sequence, i.e. a double-stranded DNA sequence having two-fold rotational symmetry.

[0101] The length of the perfect inverted repeat differs depending on the specific organism. In Borrelia burgdorferi, the perfect inverted repeat is 14 base pairs in length. In various mesophilic bacteriophages, the perfect inverted repeat is 22 base pairs or greater in length. Also, in some cases, e.g. E. coli N15, the central perfect inverted palindrome is flanked by inverted repeat sequences, i.e. forming part of a larger imperfect inverted palindrome.

[0102] A protelomerase recognition sequence as used in the invention preferably comprises a double stranded palindromic (perfect inverted repeat) sequence of at least 14 base pairs in length. Preferred perfect inverted repeat sequences include the sequences of SEQ ID NOs: 1-9 and variants thereof. SEQ ID NO: 1 (NCATNNTANNCGNNTANNATGN) is a 22 base consensus sequence. As e.g. disclosed in WO2010/086626, base pairs of the perfect inverted repeat are conserved at certain positions, while flexibility in sequence is possible at other positions. Thus preferably, SEQ ID NO: 1 is a minimum consensus sequence for a perfect inverted repeat sequence for use with a protelomerase in the process of the present invention. The protelomerase recognition sequence may have a sequence as described in WO2010/086626, which is incorporated herein by reference.
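
By way of non-limiting illustration only, the two properties described above (a perfect inverted repeat, and conformity to the 22 base consensus of SEQ ID NO: 1) may be checked as sketched below. The function names are hypothetical; the sketch assumes that “N” in the consensus stands for any base and that a perfect inverted repeat is a sequence equal to its own reverse complement.

```python
# Illustrative sketch only; the names below are hypothetical helpers.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

SEQ_ID_NO_1 = "NCATNNTANNCGNNTANNATGN"  # 22 base consensus from the text


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(seq: str) -> bool:
    """A double-stranded palindrome reads the same on both strands (5'->3')."""
    seq = seq.upper()
    return len(seq) % 2 == 0 and seq == reverse_complement(seq)


def matches_consensus(seq: str, consensus: str = SEQ_ID_NO_1) -> bool:
    """Every non-N consensus position must be matched exactly."""
    seq = seq.upper()
    return len(seq) == len(consensus) and all(
        c == "N" or c == s for s, c in zip(seq, consensus)
    )


# Candidate taken from the central 22 positions of SEQ ID NO: 10.
candidate = "CCATTATACGCGCGTATAATGG"
candidate_ok = matches_consensus(candidate) and is_perfect_inverted_repeat(candidate)
```

Matching the consensus alone is not sufficient: a sequence may satisfy every fixed position and still fail the inverted repeat test if its degenerate positions are not filled symmetrically.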

[0103] Preferably, the protelomerase recognition sequence has at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity with SEQ ID NO: 10. The sequence of SEQ ID NO: 10 is:

5′-TATCAGCACACAATTGCCCATTATACGCGCGTATAATGGACTATTGTGTGCTGATA-3′.

[0104] Preferably, the protelomerase cleaves the adapter sequence between positions 28-29 in the recognition sequence and closes the cleaved ends.
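
By way of non-limiting illustration only, the symmetry of SEQ ID NO: 10 around the stated cleavage point may be examined as sketched below. The function names are hypothetical, and the sketch assumes that “positions 28-29” are 1-based positions on the strand shown; it examines sequence symmetry only and does not model the enzymatic mechanism.

```python
# Illustrative sketch only; the names below are hypothetical helpers.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

SEQ_ID_NO_10 = "TATCAGCACACAATTGCCCATTATACGCGCGTATAATGGACTATTGTGTGCTGATA"
CUT_AFTER = 28  # cleavage between positions 28 and 29 (1-based)


def symmetric_arm_length(seq: str, cut_after: int) -> int:
    """Number of consecutive base pairs, counted outward from the cut,
    that mirror each other as complements."""
    arm = 0
    while (
        cut_after - arm - 1 >= 0
        and cut_after + arm < len(seq)
        and seq[cut_after - arm - 1] == COMPLEMENT[seq[cut_after + arm]]
    ):
        arm += 1
    return arm


def mismatched_pairs(seq: str, cut_after: int) -> list:
    """1-based position pairs around the cut that are not complementary."""
    half = min(cut_after, len(seq) - cut_after)
    return [
        (cut_after - i, cut_after + i + 1)
        for i in range(half)
        if seq[cut_after - i - 1] != COMPLEMENT[seq[cut_after + i]]
    ]


arm = symmetric_arm_length(SEQ_ID_NO_10, CUT_AFTER)
core = SEQ_ID_NO_10[CUT_AFTER - arm:CUT_AFTER + arm]  # central perfect repeat
left_half, right_half = SEQ_ID_NO_10[:CUT_AFTER], SEQ_ID_NO_10[CUT_AFTER:]
```

Under these assumptions the cut lies at the centre of the 56 base sequence, so each half carries 28 bases, and the central perfect inverted repeat is flanked by a longer, imperfect one.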

[0105] The adapter may consist of the protelomerase recognition sequence. Alternatively, the adapter may comprise additional nucleotides. The adapter may comprise an identifier sequence or “barcode” or “tag”. The identifier is preferably at least one of a sample identifier and a UMI. Preferably, the recognition sequence remains part of the nucleic acid molecule after cleaving and closing the cleaved ends.

[0106] The UMI may be a separate sequence within the adapter or, in case the protelomerase recognition sequence comprises degenerate nucleotides, these degenerate nucleotides may be used to introduce an identifier. For instance, in case of degenerate nucleotides in the protelomerase recognition sequence, for one sample an adapter may be used with one or more specific nucleotides within this recognition sequence, whereas for a second or further sample, other specific nucleotides are used at this position, thereby creating an identifier sequence within the protelomerase recognition sequence. The adapter may comprise a sample identifier as well as a UMI.

[0107] A sample identifier may connect the sequence of a nucleic acid molecule to a specific sample. For example, the adapters used in the method of the invention may comprise an identifier sequence that is specific for a certain sample. Each additional sample can be processed using adapters having an identifier sequence specific for said additional sample. The processed samples can subsequently be pooled and the obtained sequences can be assigned to a specific sample using the sample identifier sequence.
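
By way of non-limiting illustration only, the assignment of pooled sequences to samples may be sketched as below. The function name, the identifier sequences and the reads are hypothetical, and the sketch assumes that the sample identifier is a fixed-length prefix of each read and is matched exactly.

```python
# Illustrative sketch only; identifiers and reads are made-up examples.
from collections import defaultdict


def demultiplex(reads, sample_by_identifier, identifier_length):
    """Group reads by the sample identifier found at the start of each read.
    Reads with an unknown identifier are kept, untrimmed, as 'unassigned'."""
    assigned = defaultdict(list)
    for read in reads:
        sample = sample_by_identifier.get(read[:identifier_length])
        if sample is None:
            assigned["unassigned"].append(read)
        else:
            assigned[sample].append(read[identifier_length:])
    return dict(assigned)


identifiers = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled_reads = ["ACGTGGGA", "TGCACCCT", "ACGTTTTT", "GGGGAAAA"]
by_sample = demultiplex(pooled_reads, identifiers, identifier_length=4)
```

Exact matching is the simplest choice; a tolerance for sequencing errors in the identifier would require identifiers that differ from one another at several positions.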

[0108] A UMI is a substantially unique sequence or barcode, preferably fully unique, that is specific for a nucleic acid molecule, i.e. unique for each nucleic acid molecule used in the method of the invention. The UMI may have random, pseudo-random or partially random, or non-random nucleotide sequences. A UMI can be used to uniquely identify the originating molecule from which a sequencing read is derived. For example, reads of amplified nucleic acid molecules can be collapsed into a single consensus sequence from each originating nucleic acid molecule. As indicated above, the UMI may be fully or substantially unique. Fully unique is to be understood herein as that every adapter-ligated nucleic acid molecule provided in the method of the invention comprises a unique tag that differs from all the other tags comprised in further adapter-ligated nucleic acid molecules used in the method of the invention. Substantially unique is to be understood herein in that each adapter-ligated nucleic acid molecule provided in the method of the invention comprises a random UMI, but a low percentage of these adapter-ligated nucleic acid molecules may comprise the same UMI. Preferably, substantially unique molecular identifiers are used in case the chances of tagging the exact same molecule comprising the same sequence with the same UMI is negligible. Preferably, the UMI is fully unique in relation to a specific sequence of the nucleic acid molecule. The UMI preferably has a sufficient length to ensure this uniqueness. In some implementations, a less unique molecular identifier (i.e. a substantially unique identifier, as indicated above) can be used in conjunction with other identification techniques to ensure that each nucleic acid molecule is uniquely identified during the sequencing process.
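
By way of non-limiting illustration only, the collapsing of reads into one consensus sequence per originating molecule may be sketched as below. The function name and the example data are hypothetical; the sketch assumes that reads sharing a UMI have equal length and uses a simple per-position majority vote, which is only one of several possible consensus rules.

```python
# Illustrative sketch only; UMIs and reads are made-up examples.
from collections import Counter, defaultdict


def collapse_by_umi(tagged_reads):
    """tagged_reads: iterable of (umi, read) pairs.
    Returns one majority-vote consensus sequence per UMI."""
    groups = defaultdict(list)
    for umi, read in tagged_reads:
        groups[umi].append(read)
    consensus = {}
    for umi, reads in groups.items():
        # Vote column by column; on a tie the base seen first wins.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*reads)
        )
    return consensus


tagged = [
    ("ACGA", "ACGTAC"),
    ("ACGA", "ACGTAC"),
    ("ACGA", "ACCTAC"),  # one read carrying an amplification or read error
    ("CTGA", "TTTTGG"),
]
consensus_by_umi = collapse_by_umi(tagged)
```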

[0109] An identifier sequence may range in length from about 2 to 100 nucleotide bases or more, and preferably has a length between about 4-16 nucleotide bases. The identifier sequence can be a consecutive sequence or may be split into several subunits. These subunits may be present in a single adapter or may be present in separate adapters. For instance, if the nucleic acid molecule is flanked by two adapters, each of these two adapters may comprise a subunit of the identifier sequence. In order to obtain consensus sequences, the sequence reads obtained in the method of the invention may be grouped based on the information in each of the two subunits.

[0110] Preferably the identifier sequence does not contain two or more consecutive identical bases. Furthermore, there is preferably a difference between identifier sequences of at least two, preferably at least three bases.
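
By way of non-limiting illustration only, a set of identifier sequences may be screened against the two preferences above as sketched below. The function names are hypothetical, and the sketch assumes identifiers of equal length, so that the difference between two identifiers can be counted position by position.

```python
# Illustrative sketch only; the names below are hypothetical helpers.
from itertools import combinations


def has_repeated_base(identifier: str) -> bool:
    """True if two or more consecutive bases are identical."""
    return any(a == b for a, b in zip(identifier, identifier[1:]))


def differing_positions(a: str, b: str) -> int:
    """Number of positions at which two equal-length identifiers differ."""
    if len(a) != len(b):
        raise ValueError("identifiers must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_identifier_set(identifiers, min_difference=2):
    """Return a list of human-readable problems; empty if the set passes."""
    problems = []
    for ident in identifiers:
        if has_repeated_base(ident):
            problems.append(f"{ident}: consecutive identical bases")
    for a, b in combinations(identifiers, 2):
        if differing_positions(a, b) < min_difference:
            problems.append(
                f"{a}/{b}: differ at fewer than {min_difference} positions"
            )
    return problems


good_set = check_identifier_set(["ACGT", "TGCA", "CATG"])
bad_set = check_identifier_set(["ACGT", "ACGA", "AACG"], min_difference=3)
```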

[0111] Means for designing and constructing an adapter for use in the invention are well known to the skilled person and the invention is not limited to any particular adapter design and/or construction. As a non-limiting example, two oligonucleotides can be constructed and annealed to one another under controlled conditions, resulting in an at least partly double-stranded adapter for use in the invention. As a further non-limiting example, a long and a short oligonucleotide can be constructed, wherein the short oligonucleotide can anneal to the end of the long oligonucleotide. Preferably at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or 100% of the nucleotides of the short oligonucleotide can anneal to the long oligonucleotide. Preferably the short oligonucleotide is 100% complementary to a section of the long oligonucleotide. Preferably this complementary section is located 3′ of the protelomerase recognition sequence, e.g. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more nucleotides 3′ of the recognition sequence. The complementary section may be located in between the protelomerase recognition sequence and the 3′ end of the long oligonucleotide. The complementary section may be located at the 3′ end of the long oligonucleotide. Alternatively, the complementary section may be located upstream of the 3′ end of the long oligonucleotide, e.g. at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15 or more nucleotides upstream of the 3′ end of the long oligonucleotide. After annealing the short and long oligonucleotide, the part of the long oligonucleotide located 5′ of the complementary section may be filled in, thus producing a double-stranded adapter, wherein the double-stranded adapter may have a 3′ overhang, wherein the 3′ overhang is the 3′ end of the long oligonucleotide. Filling in the single-stranded sequence, i.e. to generate a double-stranded sequence, can be done using any conventional polymerase, such as, but not limited to, Klenow or Bst-polymerase. A preferred polymerase is a Bst-polymerase.

[0112] Optionally, the adapter of the invention further comprises a restriction enzyme recognition site between the protelomerase recognition sequence and the part of the adapter for ligation to the nucleic acid molecule.

[0113] In a further aspect, the invention pertains to a method for preparing a nucleic acid molecule library. Preferably, the method comprises one or more of the following steps:

[0114] a) providing a sample comprising at least a first and a second nucleic acid molecule, wherein the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule;

[0115] b) ligating an adapter as defined herein, i.e. comprising a protelomerase recognition sequence, to the ends of the first nucleic acid molecule to provide an adapter ligated nucleic acid molecule;

[0116] c) contacting the adapter ligated nucleic acid molecule with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first nucleic acid molecule comprising closed ends; and

[0117] d) cleaving the first nucleic acid molecule comprising the closed ends to provide a first nucleic acid comprising one open end and one closed end.

[0118] Optionally, no adapters comprising protelomerase recognition sequences are ligated to the ends of the second nucleic acid molecule, or amplicons thereof. Within such embodiment, the second nucleic acid molecules are eliminated, e.g. by exonuclease treatment between step c and d. Selective adapter ligation to a specific nucleic acid molecule may be achieved by creating specific ends, suitable for selective adapter ligation in step b, at the first nucleic acid molecule, which specific ends are not created at the ends of the second nucleic acid molecule. For instance, specific staggered ends may be created by a specific endonuclease capable of creating such staggered ends, such as, but not limited to, a type V CRISPR endonuclease such as Cpf1 in combination with a first crRNA targeted to a sequence upstream of the first target sequence and a second crRNA targeted to a sequence downstream of the first target sequence. Within such embodiment, the adapters used in step b should, at their side for ligation to the first nucleic acid molecule, comprise an overhang compatible for ligation to the staggered ends so created. Within this embodiment, the closed first nucleic acid molecule may be opened in step d by cleavage at a specific sequence within the adapter. This is possible, for instance, if adapters are used that comprise a particular restriction enzyme recognition site between the side for ligation and the protelomerase recognition sequence. Alternatively, the closed first nucleic acid molecule may be opened in step d by cleavage at a sequence within the first nucleic acid molecule, such as the first target sequence.

[0119] Alternatively, in step b of the method of the invention, adapters are ligated to both the first and second nucleic acid molecules. Within such an embodiment, the closed second nucleic acid molecule obtained in step c may be eliminated specifically from the reaction mixture comprising the closed first nucleic acid molecule prior to step d. This may be achieved by cleaving the closed second nucleic acid molecule at a specific sequence, i.e. a second target sequence, not present in the closed first nucleic acid molecule. Within such embodiment the second nucleic acid molecule of the method as defined herein comprises a second target sequence that is not present in the first nucleic acid molecule. The subsequently opened second nucleic acid molecule can be eliminated by exonuclease treatment. As the second nucleic acid molecule is now absent, the closed first nucleic acid may be opened in a specific or aspecific manner, for instance by cleaving at a sequence within the adapter as indicated herein above or at a sequence present in the first nucleic acid molecule. In case the method is such that the closed second nucleic acid molecule is not eliminated prior to step d, this closed second nucleic acid molecule is still present in the reaction mixture comprising the closed first nucleic acid molecule in step d. In such a design, the first nucleic acid is preferably selectively opened by cleaving at the first target sequence not present in the second nucleic acid molecule. Such a method preferably comprises the following steps:

[0120] a) providing a sample comprising at least a first and a second nucleic acid molecule, wherein the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule and wherein optionally the second nucleic acid molecule comprises a second target sequence;

[0121] b) ligating an adapter as defined herein, i.e. comprising a protelomerase recognition sequence, to the ends of the first and second nucleic acid molecules to provide adapter ligated nucleic acid molecules;

[0122] c) contacting the adapter ligated nucleic acid molecules with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first and second nucleic acid molecule comprising closed ends; and

[0123] d) cleaving the first nucleic acid molecule comprising the closed ends at the first target sequence, to provide a first nucleic acid comprising one open end and one closed end.

A preferred protelomerase is a TelN protelomerase.
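
By way of non-limiting illustration only, the bookkeeping behind steps a) to d) may be modelled as sketched below. All names, sequences and the cut position within the target are hypothetical; the sketch tracks only whether each end of a molecule is open or closed and does not model any enzymology.

```python
# Illustrative sketch only; sequences and names are made-up examples.
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    sequence: str
    left_closed: bool = False
    right_closed: bool = False

    @property
    def exonuclease_resistant(self) -> bool:
        # Only a molecule closed at both ends lacks a free terminus.
        return self.left_closed and self.right_closed


def close_ends(mol: Molecule) -> Molecule:
    """Steps b) and c): adapter ligation, then cleavage and closure of both ends."""
    return Molecule(mol.name, mol.sequence, True, True)


def cleave_at_target(mol: Molecule, target: str) -> list:
    """Step d): cut at the target; each product keeps the outer end it had."""
    pos = mol.sequence.find(target)
    if pos < 0:
        return [mol]  # no target sequence: the molecule is left untouched
    cut = pos + len(target) // 2  # arbitrary cut position inside the target
    return [
        Molecule(mol.name + "_L", mol.sequence[:cut], mol.left_closed, False),
        Molecule(mol.name + "_R", mol.sequence[cut:], False, mol.right_closed),
    ]


def exonuclease(mols: list) -> list:
    """Optional exonuclease step: only fully closed molecules survive."""
    return [m for m in mols if m.exonuclease_resistant]


first = Molecule("first", "AAAACCCGGGTTTT")   # contains the first target
second = Molecule("second", "AAAATTTTAAAATT")  # lacks the first target
closed = [close_ends(m) for m in (first, second)]
library = [p for m in closed for p in cleave_at_target(m, "CCCGGG")]
open_ended = [m.name for m in library if not m.exonuclease_resistant]
survivors = [m.name for m in exonuclease(library)]
```

The two lists at the end correspond to the two uses of selective opening: the open-ended products are the ones available for linking a further adapter, whereas an exonuclease step applied at this point would instead retain only the molecules that were not opened.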

[0124] It is to be understood herein that an effective number of components is used in the method of the invention. A nucleic acid molecule library prepared by the method of the invention is preferably suitable for further processing of the nucleic acid molecule such as, but not limited to, cloning, amplification, sequencing and the like. Hence in additional aspects, the invention also concerns a method for cloning a nucleic acid molecule library, a method for amplifying a nucleic acid molecule library or a method for sequencing a nucleic acid molecule library, using the steps as described herein.

[0125] Preferably, the prepared nucleic acid molecule library is enriched for a nucleic acid molecule comprising a sequence of interest. “Enriched” is understood herein to mean a reduction or elimination of nucleic acid molecules not having a sequence of interest, either by (i) selective exclusion of nucleic acid molecules not having a sequence of interest from further processing steps, or by (ii) selective inclusion of nucleic acid molecules having a sequence of interest for further processing steps. The selectively excluded nucleic acid molecules may be degraded, e.g. by exonuclease treatment. The selectively included nucleic acid molecules may e.g. be cloned, amplified and/or sequenced.

The prepared nucleic acid library preferably comprises nucleic acid molecules having one closed end and one open end.

[0126] In an embodiment, the method as defined herein comprises a step a) of providing a sample comprising at least a first and a second nucleic acid molecule. Preferably the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule. Preferably, the second nucleic acid molecule comprises a second target sequence. Optionally, the second target sequence is also present in the first nucleic acid molecule. Alternatively, the second target sequence is not present in the first nucleic acid molecule.

[0127] Preferably, the first nucleic acid molecule comprises a sequence of interest and the second nucleic acid molecule does not comprise said sequence of interest. In this embodiment, the first nucleic acid molecule will be present in the prepared nucleic acid molecule library and will preferably be processed further.

[0128] In an alternative embodiment, the first nucleic acid molecule does not comprise a sequence of interest, but the second nucleic acid molecule comprises said sequence of interest. In this embodiment, the second nucleic acid molecule will be present in the prepared nucleic acid molecule library and will preferably be processed further.

[0129] The sample comprising at least a first and a second nucleic acid molecule may be from any source, e.g. human, animal, plant, microorganism, and may be of any kind, e.g. endogenous or exogenous to the cell, for example genomic DNA, chromosomal DNA, artificial chromosomes, plasmid DNA, or episomal DNA, cDNA, RNA, mitochondrial, or of an artificial library such as a BAC or YAC or the like. The DNA may be nuclear or organellar DNA. Preferably, the DNA is chromosomal DNA, preferably endogenous to the cell. Preferably, the first, second and optionally further nucleic acid molecules present in the sample that is used as starting material for the method of the invention are any one of DNA, such as genomic DNA, chromosomal DNA, organellar DNA, mitochondrial DNA, artificial chromosomes, plasmid DNA, episomal DNA, cDNA and RNA.

[0130] The first and second nucleic acid molecules may be long nucleic acid molecules, provided e.g. by cell lysis and optionally lysis of an organelle. The nucleic acid molecules used in the method of the invention may have a size of at least about 50 kb, 100 kb, 150 kb, 200 kb, 300 kb, 400 kb, 500 kb, 600 kb, 700 kb, 800 kb, 900 kb or at least about 1000 kb (1 Mb). The first and/or second nucleic acid for use in the invention may be high molecular weight (HMW) nucleic acids or ultra-high molecular weight (uHMW) nucleic acids. uHMW nucleic acids may have a length of at least 1 Mb. The nucleic acid molecules used in the method of the invention may have a size of at least 1.1 Mb, 1.3 Mb, 1.5 Mb, 1.7 Mb, 2 Mb, 2.5 Mb, 3 Mb, 4 Mb, 5 Mb, 6 Mb, 7 Mb, 8 Mb, 9 Mb or at least about 10 Mb.

[0131] Alternatively long nucleic acid molecules may first be fragmented, resulting in a first and second nucleic acid molecule. Therefore in an embodiment, the first and second nucleic acid molecules in step a) are provided by fragmentation. The fragmentation is preferably the fragmentation of a genomic nucleic acid molecule.

[0132] The skilled person is familiar with means to fragment longer nucleic acid molecules and the invention is not limited to any specific means for fragmenting the longer nucleic acid molecule. The fragmented nucleic acids are preferably fragmented genomic DNA. DNA, and in particular genomic DNA, can be fragmented using any suitable method known in the art. Methods for DNA fragmentation include, but are not limited to, enzymatic digestion and mechanical force.

[0133] Non-limiting examples of fragmenting the nucleic acid molecule using mechanical force include the use of acoustic shearing, nebulization, sonication, point-sink shearing, needle shearing and French pressure cells.

[0134] Enzymatic digestion for fragmenting a nucleic acid molecule, which molecule comprises at least one of the first and second nucleic acid molecule as defined herein, includes, but is not limited to, endonuclease restriction. Enzymatic digestion, such as e.g. used in AFLP® technology, may further result in a complexity reduction of the nucleic acid sample. The skilled person knows which enzymes to select for the DNA fragmentation. As a non-limiting example, at least one frequent cutter and at least one rare cutter can be used for the fragmentation of the nucleic acid sample. A frequent cutter preferably has a recognition site of about 3-5 bp, such as, but not limited to MseI. A rare cutter preferably has a recognition site of >5 bp, such as but not limited to EcoRI.
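As a non-limiting illustration, the combined use of a frequent cutter and a rare cutter can be sketched in silico. The sketch below assumes the commonly reported recognition sites and top-strand cut positions for EcoRI (G^AATTC) and MseI (T^TAA); the function names `cut_positions` and `digest` and the demonstration sequence are hypothetical and serve only to show how a double digest partitions a linear molecule into restriction fragments.

```python
import re

# Recognition sites and top-strand cut offsets. These values are assumptions
# based on the commonly reported specificities of the two enzymes.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC, rare cutter (6 bp site)
    "MseI": ("TTAA", 1),     # T^TAA, frequent cutter (4 bp site)
}


def cut_positions(seq, enzymes):
    """Return the sorted top-strand cut coordinates for the given enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def digest(seq, enzymes):
    """Split a linear sequence at every cut position (top strand only)."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    demo = "CCGAATTCGGTTAACCGGTTAAGG"  # hypothetical demonstration sequence
    for fragment in digest(demo, ["EcoRI", "MseI"]):
        print(len(fragment), fragment)
```

Both sites are palindromic, so scanning one strand is sufficient in this sketch. Only the top strand is modelled, so the staggered overhangs left by each enzyme are not represented.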

[0135] In certain embodiments, in particular when the sample contains or is derived from a relatively large genome, it may be preferred to use a third enzyme, rare or frequent cutter, to obtain a larger set of restriction fragments of shorter size.

[0136] The method of the invention is not limited to any specific restriction endonuclease. The endonuclease may be a type II endonuclease, such as EcoRI, MseI, PstI etc. In certain embodiments a type IIS or type III endonuclease may be used, i.e. an endonuclease of which the recognition sequence is located distant from the restriction site, such as, but not limited to, AceIII, AlwI, AlwXI, Alw26I, BbvI, BbvII, BbsI, BccI, Bce83I, BcefI, BcgI, BinI, BsaI, BsgI, BsmAI, BsmFI, BspMI, EarI, EciI, Eco31I, Eco57I, Esp3I, FauI, FokI, GsuI, HgaI, HinGUII, HphI, Ksp632I, MboII, MmeI, MnlI, NgoVIII, PleI, RleAI, SapI, SfaNI, TaqJI and Zthll III. Restriction fragments can be blunt-ended or have protruding ends, depending on the endonuclease used.

[0137] In a preferred embodiment, the recognition site of at least one of the frequent cutter and the rare cutter is within or in close proximity of the sequence of interest, e.g. the recognition site of the frequent cutter or the rare cutter is located about 0-10000, 10-5000, 50-1000 or about 100-500 bases from the sequence of interest.

[0138] The current method as disclosed herein can also be used in AFLP® technology, e.g. for polyploid cells. The AFLP® technology is e.g. described in more detail in WO2007/114693, WO2006/137733 and WO2007/073165, which are incorporated herein by reference. The AFLP® technology as described in the art can be modified by attaching an adapter comprising a protelomerase recognition sequence as described herein, to the restricted nucleic acid sample.

[0139] In addition or alternatively, the nucleic acid sample may be digested using a programmable nuclease, preferably using at least one of a CRISPR nuclease, a zinc finger nuclease, TALENs and meganucleases.

[0140] Optionally, the first and/or second nucleic acid molecule may be modified to comprise an A-tail, preferably to facilitate ligation to the partly, or fully, double-stranded adapter comprising a protelomerase recognition sequence and further comprising a T-overhang. Hence prior to annealing an adapter to the fragmented nucleic acid, the method of the invention may optionally comprise a step of A-tailing the fragmented nucleic acid sample. A-tailing reactions are well-known in the art and the skilled person straightforwardly understands how to perform an A-tailing reaction, such as e.g. using a Klenow fragment (exo-).

[0141] The nucleic acid sample comprising at least one of a first and a second nucleic acid molecule, may comprise a plurality of further nucleic acid molecules. Hence in some embodiments, the nucleic acid sample comprises only a first nucleic acid molecule and only a second nucleic acid molecule. In other embodiments, the nucleic acid sample comprises a first nucleic acid molecule, a second nucleic acid molecule, in addition to a plurality of other nucleic acid molecules. Preferably, said further nucleic acid molecules do not comprise a first target sequence. Optionally, the further nucleic acid molecules do not comprise a second target sequence. This plurality of other nucleic acid molecules may be derived from at least one of the same organism, the same tissue, the same cell, the same organelle and/or the same molecule from which the first and second nucleic acid molecules are derived.

[0142] It is understood herein that a nucleic acid sample comprising a first nucleic acid molecule may also include a nucleic acid sample comprising a plurality of first nucleic acid molecules. Similarly, it is understood herein that a nucleic acid sample comprising a second nucleic acid molecule may also include a nucleic acid sample comprising a plurality of second nucleic acid molecules. Preferably, the first nucleic acid molecule is derived from the same organism, the same tissue, the same cell, the same organelle and/or the same molecule from which the second nucleic acid molecule is derived. The first and second nucleic acid molecule may have essentially the same sequence, with the exception of one or more nucleotides. As a non-limiting example, the first and second nucleic acid molecule may be allelic variants. Alternatively, the first and second nucleic acid molecules may be very dissimilar, e.g. have less than 40%, 30%, 20%, 10% or 5% sequence identity. A predominant difference between the first and second nucleic acid molecule used in the invention is that the first nucleic acid molecule comprises a target sequence that is not present in the second nucleic acid molecule.

[0143] Optionally, the second nucleic acid molecule may comprise a second target sequence. This second target sequence may or may not also be present in the first nucleic acid molecule.

[0144] In an embodiment, the method comprises a step b) of ligating an adapter to the ends of the first and second nucleic acid molecule to provide adapter ligated nucleic acid molecules. The adapter is preferably an adapter as defined herein, i.e. an adapter comprising a protelomerase recognition sequence. The adapter is preferably ligated to both ends of the first nucleic acid molecule and both ends of the second nucleic acid molecule. Preferably, the adapter is ligated to both ends of at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% of the nucleic acids present in the sample. Preferably after the ligation step, all nucleic acid molecules in the sample comprise an adapter on both ends. Put differently, preferably all, or substantially all, nucleic acids in the sample are flanked on both sides by a covalently linked adapter. Ligation of an adapter can be performed using any conventional method known to the skilled person and the invention is not limited to any specific ligation method or ligation enzyme (ligase). Preferably to facilitate the ligation, the adapter comprises an end that is compatible with the end of the nucleic acid molecules, e.g. by using nucleic acid molecules obtained through the use of restriction endonucleases and compatible staggered ends on the adapters.

[0145] In an embodiment, the fragmented nucleic acid molecules may be polished to create blunt ends, followed by the addition of a 3′-A staggered overhang. The polishing step may be performed using any conventional means known in the art. Similarly, the addition of a 3′-A overhang may be achieved using any conventional method known to the skilled person. The nucleic acid molecules comprising a 3′-A-overhang may subsequently be ligated to compatible adapters comprising a 5′-T-overhang.

[0146] In an embodiment, the step of fragmentation and adapter ligation may be combined in a single step, e.g. by means of tagmentation. In this embodiment, the adapter in step b) is ligated by tagmentation, preferably using a Tn5 transposase. Transposases randomly cut the long DNA molecules in shorter nucleic acid molecules and adapters can be ligated on either side of the cleaved points. Tagmentation or “transposase mediated fragmentation and tagging” is a process that is well known to the person skilled in the art, for example as exemplified in the workflow for Nextera™. The adapters may comprise sequences that make them compatible for use in a tagmentation reaction. Preferably, the adapters used in a tagmentation reaction further comprise a transposase sequence. The transposase sequence is preferably compatible with the transposase used in the tagmentation reaction. The tagmentation reaction may be followed by a repair step to ensure that all, or substantially all, generated nucleic acid molecules comprise an adapter on both sides. Hence the nucleic acid molecules comprising ligated adapters, optionally obtained by tagmentation, may be repaired to remove any single-stranded breaks. Preferably, the repair step takes place prior to contacting the molecules with a TelN protelomerase in step c). Such repair step can be performed using any conventional means known in the art.

[0147] Optionally, the protelomerase recognition sequence is attached to the nucleic acid molecules via a primer instead of an adapter. Preferably said primer comprises:

[0148] i) a 3′-end for annealing to a primer binding site present in at least the first and/or second nucleic acid molecule, or to an, optionally universal, primer binding site in an adapter that has been ligated to said at least first and/or second nucleic acid molecule; and

[0149] ii) a protelomerase recognition site in a 5′-tail of such primer.

Optionally, the primer binding site is a unique sequence, i.e. a sequence that is only present in the first and/or second nucleic acid molecule. Using one or more of these primers, the protelomerase sequence may be introduced in amplicons produced via PCR using the first and/or second nucleic acid molecule as template. Within such embodiment, in step b) instead of ligating an adapter as defined herein to the ends of the first nucleic acid molecule and optional second nucleic acid molecule to provide adapter ligated nucleic acid molecules, the first and optional second nucleic acid molecules are amplified using at least one primer comprising a protelomerase recognition site; the subsequent steps are then performed on the resulting amplicons, which can be closed upon protelomerase treatment. Alternatively, the protelomerase sequence may be introduced via a single step of denaturation, annealing of the primer and filling in the single strand overhang.

[0150] In those embodiments wherein the adapter is attached via a primer or tagmentation, instead of ligation via a (partly) double-stranded adapter, the terms “ligating” or “ligation” as used herein may thus be replaced by the terms “attaching” or “attachment”.

[0151] In an embodiment, the method of the invention comprises a step c) of contacting the adapter ligated nucleic acid molecules with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first and a second nucleic acid molecule comprising closed ends. Preferably, the protelomerase is a TelN protelomerase.

[0152] Preferably, the first nucleic acid molecule comprises an adapter on both ends (i.e. at the 5′ and 3′ end) of the molecule and the second nucleic acid molecule comprises an adapter on both ends of the molecule, wherein said adapters have a protelomerase recognition sequence. Contacting the first and the second molecule comprising the adapters with a protelomerase under suitable conditions results in the cleavage, or “restriction”, of the adapters. Simultaneously, the protelomerase can covalently close the nucleic acid molecules, resulting in a closed first nucleic acid and a closed second nucleic acid. Closed linear DNA molecules typically comprise covalently closed ends resulting in protection of terminal nucleotides against loss or damage.

[0153] A preferred protelomerase for use in the invention is a bacteriophage protelomerase. A protelomerase can be selected from the group consisting of: phiHAP-1 from Halomonas aquamarina, PY54 from Yersinia enterocolitica, phiKO2 from Klebsiella oxytoca, VP882 from Vibrio sp. and N15 from Escherichia coli, or variants of any thereof. The protelomerase may have an amino acid sequence as disclosed in WO2010/086626, which is incorporated herein by reference. The use of bacteriophage N15 (TelN) protelomerase or a variant thereof is particularly preferred. A preferred protelomerase has a sequence of at least 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity with SEQ ID NO: 11. Variants include homologues or mutants thereof. Mutants include truncations, substitutions or deletions with respect to the native sequence. A variant preferably produces closed linear DNA from a template comprising a protelomerase recognition sequence as described herein above.

[0154] The method may optionally comprise a step c1) of exposing the sample to an exonuclease after obtaining the nucleic acid molecules comprising closed ends in step c) and prior to cleaving the first nucleic acid molecule comprising the closed ends in step d). Hence in an embodiment, the method of the invention comprises the steps of:

[0155] a) providing a sample comprising at least a first and a second nucleic acid molecule, wherein the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule and wherein optionally the second nucleic acid molecule comprises a second target sequence;

[0156] b) ligating an adapter as defined herein, i.e. comprising a protelomerase recognition sequence, to the ends of the first and second nucleic acid molecule to provide adapter ligated nucleic acid molecules;

[0157] c) contacting the adapter ligated nucleic acid molecules with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first and second nucleic acid molecule comprising closed ends;

[0158] c1) exposing the sample comprising the first and second nucleic acid molecule comprising closed ends to an exonuclease; and

[0159] d) cleaving the first nucleic acid molecule comprising the closed ends at the first target sequence, to provide a first nucleic acid comprising one open end and one closed end.
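As a non-limiting illustration, the logic of steps a) to d), including the optional step c1), can be sketched as a small state model in which each molecule is tracked only by the number of its ends that carry an adapter and the number of its ends that are covalently closed. The class `Molecule` and the function names are hypothetical and do not model reaction efficiency, kinetics or sequence content; the sketch only shows which molecules survive exonuclease exposure and which acquire an open end.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Molecule:
    name: str
    has_target: bool      # carries the first target sequence
    adapters: int = 0     # number of ends carrying an adapter (0 to 2)
    closed_ends: int = 0  # number of covalently closed ends (0 to 2)


def ligate(molecule, ends=2):
    """Step b): attach adapters to `ends` ends of the molecule."""
    return replace(molecule, adapters=ends)


def protelomerase(molecule):
    """Step c): every adapter-bearing end is cleaved and covalently closed."""
    return replace(molecule, closed_ends=molecule.adapters)


def exonuclease(sample):
    """Step c1): only molecules having two closed ends are protected."""
    return [m for m in sample if m.closed_ends == 2]


def cleave_at_target(sample):
    """Step d): a double-stranded break at the target sequence yields two
    pieces, each having one open end and one closed end."""
    out = []
    for m in sample:
        if m.has_target and m.closed_ends == 2:
            out.append(replace(m, name=m.name + "_left", adapters=1, closed_ends=1))
            out.append(replace(m, name=m.name + "_right", adapters=1, closed_ends=1))
        else:
            out.append(m)
    return out


if __name__ == "__main__":
    # Step a): a first molecule with the target, a second without it, and a
    # molecule that received an adapter on one end only.
    sample = [
        ligate(Molecule("first", has_target=True)),
        ligate(Molecule("second", has_target=False)),
        ligate(Molecule("partial", has_target=False), ends=1),
    ]
    closed = [protelomerase(m) for m in sample]
    enriched = exonuclease(closed)
    for m in cleave_at_target(enriched):
        print(m.name, "closed ends:", m.closed_ends)
```

In this model the incompletely ligated molecule is removed in step c1), and after step d) only the pieces derived from the first nucleic acid molecule expose an open end. Applying `exonuclease` once more to the output of `cleave_at_target` would leave only the second nucleic acid molecule.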

[0160] The exonuclease may digest any nucleic acid molecule not comprising two closed ends, i.e. comprising one or two open ends. Such nucleic acid molecules are for example, but not limited to, nucleic acid molecules without adapters, nucleic acid molecules with one or two adapters having an open end, and/or cleaved nucleic acid molecules having one open end and one closed end.

[0161] The nucleic acid molecules having two closed ends are protected from degradation, while the non-protected fragments are degraded, resulting in enrichment or complexity reduction of the nucleic acid molecules comprising the sequence of interest, i.e. the first or optionally the second nucleic acid molecule. Therefore in an embodiment, the method of the invention takes the approach of removal of an undesired (non-target) part of the nucleic acid sample. As a non-limiting example, the adapters in step b) may be ligated to nucleic acid molecules having a selective staggered overhang, for example created by enzymatic digestion. The molecules comprising the adapters are subsequently closed in step c), and the exonuclease treatment in step c1) may digest any nucleic acid molecule not having two closed ends. The exonuclease treatment in step c1) may thus result in an enrichment of nucleic acid molecules comprising closed ends.

[0162] The exonuclease may be exonuclease I, III, V, VII, VIII, or a related enzyme, or any combination thereof. Exonuclease III recognizes nicks and extends the nick to a gap until a piece of ssDNA is formed. Exonuclease VII can degrade this ssDNA. Exonuclease I also degrades ssDNA. ExoIII and ExoVII are a preferred combination of exonucleases for use in step c1) of the method of the invention.

[0163] Exonuclease V is capable of degrading ssDNA and dsDNA in both 3′ to 5′ and in 5′ to 3′ direction. Therefore in a preferred embodiment, the exonuclease in step c1) of the method of the invention is an exonuclease that is capable of degrading ssDNA and dsDNA in both 3′ to 5′ and in 5′ to 3′ direction, preferably an exonuclease V.

[0164] Further information on methods for degrading non-target sequences is provided in U.S. Patent Publication No. 2014/0134610, which is incorporated herein by reference in its entirety for all purposes.

[0165] Step c1) is preferably performed at conditions (e.g. time, temperature, enzyme concentration) sufficient for the exonucleases to degrade substantially all non-protected fragments. Preferably, step c1) is performed at conditions and time sufficient for the exonucleases to degrade all non-protected fragments. Step c1) is preferably performed for about 1 minute to about 12 hours, preferably about 30 minutes, at about 10-90° C., preferably about 37° C.

[0166] After step c1), the exonuclease may be inactivated by, for example, but not limited to, at least one of a proteinase treatment, e.g. with Proteinase K, and heat inactivation. Such techniques are standard in the art and the skilled person straightforwardly understands how to inactivate an exonuclease. A preferred inactivation step is heating the sample at a temperature of about 50-90° C., preferably about 75° C., for about 1-120 minutes, preferably about 10 minutes.

[0167] In an embodiment, the method of the invention comprises a step d) of cleaving the first nucleic acid molecule comprising the closed ends at the first target sequence, to provide a first nucleic acid comprising one open end and one closed end. “Cleaving” is understood herein as the generation of a double-stranded break. The double-stranded break may be created by the use of a nuclease or by the use of two nickases that cleave opposite strands. The double-stranded break may create a blunt open end of the first, and optionally second, nucleic acid molecule. After cleavage the cleaved nucleic acid molecule may thus have one open blunt end and one closed end. Alternatively, the double-stranded break may create a staggered open end of the cleaved nucleic acid molecule. After cleavage, the cleaved nucleic acid molecule may thus have one open staggered end and one closed end.

[0168] Preferably, the first nucleic acid molecule in step d) is cleaved by a programmable nuclease or a restriction endonuclease. The first nucleic acid molecule thus comprises a target sequence that is not present in the second nucleic acid molecule. The first nucleic acid molecule may comprise the target sequence more than once, e.g. the first nucleic acid molecule may comprise the target sequence 1, 2, 3, 4, 5, 6 or more times. In an embodiment, the second nucleic acid molecule may comprise a target sequence that is not present in the first nucleic acid molecule. The second nucleic acid molecule may comprise the target sequence more than once, e.g. the second nucleic acid molecule may comprise the target sequence 1, 2, 3, 4, 5, 6 or more times.

[0169] The skilled person easily understands that this step can be extended to additional nucleic acid molecules, e.g. at least a third, fourth or fifth or further nucleic acid molecule. Each nucleic acid molecule may optionally comprise a target sequence that is absent in any of the other nucleic acid molecules.

[0170] It is thus understood herein that the nucleic acid sample comprises at least one nucleic acid molecule comprising a sequence of interest, i.e. the first nucleic acid molecule as defined herein or optionally the second nucleic acid molecule as defined herein. Put differently, the nucleic acid sample thus may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more sequences of interest, such as at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 750, 1000 or more sequences of interest, wherein preferably each sequence of interest within the sample has a distinct target sequence. The method of the invention may provide for a simultaneous enrichment of these sequences of interest from a nucleic acid sample. Therefore optionally, in step d) of the method of the invention, multiple gRNA-CAS complexes are added for enrichment of nucleic acid molecules from a nucleic acid sample. Preferably, these multiple gRNA-CAS complexes may comprise the same CRISPR-nuclease, but may differ in their gRNA. For example, for each nucleic acid molecule comprising a sequence of interest, a distinct gRNA molecule may be used. For example, for at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 750, 1000 or more nucleic acid molecules, preferably at least about 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 750, 1000 or more gRNA molecules may be used in the method of the invention.

[0171] The first, and optionally second, nucleic acid molecule comprising closed ends may be cleaved by a restriction endonuclease. In an embodiment wherein the first and second nucleic acid molecule are cleaved, the first and second nucleic acid molecule are cleaved by a different endonuclease. Any sequence-specific endonuclease may be suitable for use in the invention. The endonuclease may be a so-called “restriction endonuclease” or “restriction enzyme”, e.g. a Type I, Type II, Type III, Type IV or Type V restriction endonuclease. A preferred restriction endonuclease is a Type II restriction endonuclease, preferably Type IIP or Type IIS. In case a fragmentation in step a) is performed by cleaving the DNA with a restriction enzyme, the enzyme used in step d) is preferably a different endonuclease.

[0172] The first nucleic acid molecule, and optionally the second nucleic acid molecule, may be cleaved by a programmable nuclease. In an embodiment wherein the first and second nucleic acid molecule are cleaved, the first and second nucleic acid molecule are cleaved by a different programmable nuclease, i.e. programmable nucleases that recognize different target sequences. A programmable nuclease may be selected from the group consisting of a zinc finger nuclease, a meganuclease, a TAL-effector nuclease and an RNA-guided CRISPR nuclease. Preferably, the programmable nuclease is an RNA-guided CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) nuclease.

[0173] The RNA-guided CRISPR nuclease is preferably part of a gRNA-Cas complex. A gRNA-CAS complex is to be understood herein as a CRISPR associated (CAS) protein, or CRISPR-nucleases, complexed with a guide RNA. A CRISPR-nuclease comprises a nuclease domain and at least one domain that interacts with a guide RNA. When complexed with a guide RNA, the CRISPR-nuclease is directed to the target sequence by a guide RNA. The guide RNA interacts with the CRISPR-nuclease as well as with the target sequence, such that, once directed to the site comprising the specific target sequence via the guide sequence, the CRISPR-nuclease is able to introduce a break at the target sequence. Preferably, the CRISPR-nuclease is able to introduce a single or double strand break at the target sequence, in case one or both domains of the nuclease are catalytically active, respectively. The skilled person is well aware of how to design a guide RNA in a manner that it, when combined with a CRISPR-nuclease, effects the introduction of a single- or double-stranded break at a predefined target site in the first, and/or optionally second, nucleic acid molecule.

[0174] CRISPR-nucleases can generally be categorized into six major types (Type I-VI), which are further subdivided into subtypes, based on core element content and sequences (Makarova et al, 2011, Nat Rev Microbiol 9:467-77 and Wright et al, 2016, Cell 164(1-2):29-44). In general, the two key elements of a CRISPR-CAS system complex are a CRISPR-nuclease and a crRNA. CrRNA consists of short repeat sequences interspersed with spacer sequences derived from invader DNA. CAS proteins have various activities, e.g., nuclease activity. Thus, gRNA-CAS complexes provide mechanisms for targeting a specific sequence as well as certain enzyme activities upon the sequence.

[0175] Type I CRISPR-CAS systems typically comprise a Cas3 protein having separate helicase and DNase activities. For example, in the Type I-E system, crRNAs are incorporated into a multi-subunit effector complex called Cascade (CRISPR-associated complex for antiviral defense) (Brouns et al, 2008, Science 321: 960-4), which specifically binds to duplex DNA and triggers degradation by the Cas3 protein (Sinkunas et al., 2011, EMBO J 30: 1335-1342; Beloglazova et al., 2011, EMBO J 30:616-627).

[0176] Type II CRISPR-CAS systems include a signature Cas9 protein, a single protein (about 160 kDa) capable of specifically cleaving duplex DNA. The Cas9 protein typically contains two nuclease domains, a RuvC-like nuclease domain near the amino terminus and the HNH (or McrA-like) nuclease domain near the middle of the protein. Each nuclease domain of the Cas9 protein is specialized for cutting one strand of the double helix (Jinek et al, 2012, Science 337 (6096): 816-821). The Cas9 protein is an example of a CAS protein of the type II CRISPR-CAS system and forms an endonuclease, when combined with the crRNA and a second RNA termed the trans-activating crRNA (tracrRNA), which targets the invading pathogen DNA for degradation by the introduction of DNA double strand breaks (DSBs) at the position in the pathogen genome defined by the crRNA. Jinek et al. (2012, Science 337: 816-821) demonstrated that a single chain chimeric guide RNA (a sgRNA) produced by fusing an essential portion of the crRNA and tracrRNA was able to form a functional endonuclease in combination with the Cas9 protein.

[0177] Type III CRISPR-CAS systems contain polymerase and RAMP modules. Type III systems can be further divided into sub-types III-A and III-B. Type III-A CRISPR-CAS systems have been shown to target plasmids, and the polymerase-like proteins of Type III-A systems are involved in the specific cleavage of DNA (Marraffini and Sontheimer, 2008, Science 322: 1843-1845). Type III-B CRISPR-CAS systems have also been shown to target RNA (Hale et al, 2009, Cell 139:945-956).

[0178] Type IV CRISPR-CAS systems include Csf1, an uncharacterized protein proposed to form part of a Cascade-like complex, though these systems are often found as isolated cas genes without an associated CRISPR array.

[0179] A Type V CRISPR-CAS system has recently been described, the Clustered Regularly Interspaced Short Palindromic Repeats from Prevotella and Francisella 1 or CRISPR/Cpf1. Cpf1 genes are associated with the CRISPR locus and code for an endonuclease that uses a crRNA to target DNA. Cpf1 is a smaller and simpler endonuclease than Cas9, which may overcome some of the CRISPR-Cas9 system limitations. Cpf1 is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer-adjacent motif. Cpf1 cleaves DNA via a staggered DNA double-stranded break (Zetsche et al (2015) Cell 163 (3): 759-771). The type V CRISPR-CAS system preferably includes at least one of Cpf1, C2c1 and C2c3.

[0180] A Type VI CRISPR-CAS system may comprise a Cas13a protein, which comprises RNase activity. In case the target nucleic acid fragment is RNA, the at least first and second gRNA-CAS complex of the method of the invention may comprise Cas13a, such as, but not limited to, Cas13a from Leptotrichia wadei (LwCas13a) or from Leptotrichia shahii (LshCas13a) such as described in Gootenberg et al., Science. 2017 Apr 28; 356(6336):438-442.

[0181] The gRNA-CAS complex of the method of the invention may comprise any CRISPR-nuclease as defined herein above. Preferably, the gRNA-CAS complex used in the method of the invention comprises a Type II CRISPR-nuclease, e.g., Cas9 (e.g., the protein of SEQ ID NO: 12, encoded by SEQ ID NO: 13, or the protein of SEQ ID NO: 14) or a Type V CRISPR-nuclease, e.g. Cpf1 (e.g., the protein of SEQ ID NO: 15, encoded by SEQ ID NO: 16) or Mad7 (e.g. the protein of SEQ ID NO: 17 or 18), or protein derived thereof, having preferably at least about 70%, 80%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to said protein over its whole length.

[0182] Preferably, the gRNA-CAS complex of the method of the invention comprises a Type II CRISPR-nuclease, preferably a Cas9 nuclease.

[0183] The skilled person knows how to prepare the different components of the CRISPR-CAS system, including CRISPR-nuclease. In the prior art, numerous reports are available on its design and use. See for example the review by Haeussler et al (J Genet Genomics. (2016)43(5):239-50. doi: 10.1016/j.jgg.2016.04.008.) on the design of guide RNA and its combined use with a CAS-protein (originally obtained from S. pyogenes), or the review by Lee et al. (Plant Biotechnology Journal (2016) 14(2) 448-462).

[0184] In general, a CRISPR-nuclease, such as Cas9, comprises two catalytically active nuclease domains. For example, a Cas9 protein can comprise a RuvC-like nuclease domain and an HNH-like nuclease domain. The RuvC and HNH domains work together, both cutting a single strand, to make a double-stranded break in DNA. (Jinek et al., Science, 337: 816-821). A dead CRISPR-nuclease comprises modifications such that none of the nuclease domains shows cleavage activity. The CRISPR-nuclease of the gRNA-CAS complex used in the method of the invention may be a variant of a CRISPR-nuclease wherein one of the nuclease domains is mutated such that it is no longer functional (i.e., the nuclease activity is absent), thereby creating a nickase. An example is a SpCas9 variant having either the D10A or H840A mutation. Preferably, the nuclease of the gRNA-CAS complex is not a dead nuclease. Preferably, the CRISPR-nuclease of the gRNA-CAS complex is either a nickase or an (endo)nuclease.

[0185] The gRNA-CAS complex that may be used in the method of the invention may comprise or consist of a whole Cas9 protein or variant or may comprise a fragment thereof. Preferably such a fragment does bind crRNA and tracrRNA or sgRNA, and maintains at least one of nuclease or nickase activity.

[0186] Preferably, the gRNA-CAS complex comprises a Cas9 protein. The Cas9 protein may be derived from the bacteria Streptococcus pyogenes (SpCas9; NCBI Reference Sequence NC_017053.1; UniProtKB: Q99ZW2), Geobacillus thermodenitrificans (UniProtKB: A0A178TEJ9), Corynebacterium ulcerans (NCBI Refs: NC_015683.1, NC_017317.1); Corynebacterium diphtheriae (NCBI Refs: NC_016782.1, NC_016786.1); Spiroplasma syrphidicola (NCBI Ref: NC_021284.1); Prevotella intermedia (NCBI Ref: NC_017861.1); Spiroplasma taiwanense (NCBI Ref: NC_021846.1); Streptococcus iniae (NCBI Ref: NC_021314.1); Belliella baltica (NCBI Ref: NC_018010.1); Psychroflexus torquis (NCBI Ref: NC_018721.1); Streptococcus thermophilus (NCBI Ref: YP_820832.1); Listeria innocua (NCBI Ref: NP_472073.1); Campylobacter jejuni (NCBI Ref: YP_002344900.1); or Neisseria meningitidis (NCBI Ref: YP_002342100.1). Encompassed are Cas9 variants from these having an inactivated HNH or RuvC domain homologous to that of SpCas9, e.g. the SpCas9_D10A or SpCas9_H840A, or a Cas9 having equivalent substitutions at positions corresponding to D10 or H840 in the SpCas9 protein, rendering a nickase.

[0187] The programmable nuclease may be derived from Cpf1, e.g., Cpf1 from Acidaminococcus sp; UniProtKB—U2UMQ6. The variant may be a Cpf1-nickase having an inactivated RuvC or NUC domain, wherein the RuvC or NUC domain no longer has nuclease activity. The skilled person is well aware of techniques available in the art such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis that allow for inactivated nucleases such as inactivated RuvC or NUC domains. An example of a Cpf1 nickase with an inactive NUC domain is Cpf1 R1226A (see Gao et al. Cell Research (2016) 26:901-913, Yamano et al. Cell (2016) 165(4): 949-962). In this variant, there is an arginine to alanine (R1226A) conversion in the NUC-domain, which inactivates the NUC-domain.

[0188] The gRNA-CAS complex further comprises a CRISPR-nuclease associated guide RNA that directs the complex to the target sequence or “target site” in the nucleic acid molecule, also annotated as the protospacer sequence. A guide RNA comprises a guide sequence for targeting the gRNA-CAS complex to the protospacer sequence that is preferably near, at or within the sequence of interest in the nucleic acid molecule, and may be a sgRNA or the combination of a crRNA and a tracrRNA (e.g. for Cas9) or a crRNA only (e.g. in case of Cpf1). Optionally, more than one type of guide RNA may be used in the same experiment, for example aimed at two or more different nucleic acid molecules of interest, or even aimed at the same nucleic acid molecule of interest.

[0189] In an optional embodiment, the method of the invention is for polymorphism detection and/or detecting genetic variation by using an enzyme that recognizes and cuts heteroduplexes at the site of a mismatch. Within such embodiment, one or more nucleotide samples are fragmented and subsequently undergo at least one round of denaturation and annealing prior to or after step b) of the method of the invention. Then, after step c) of the method of the invention, the closed nucleic acids can be treated with an enzyme that recognizes and cuts heteroduplexes, such as CEL I or an enzyme as described in Langhans MT and Palladino MJ (Curr Issues Mol Biol. 2009; 11(1): 1-12), which is incorporated herein by reference. This results in the opening of only those double-stranded DNA molecules that contain heteroduplexes, which can then be selectively included for further processing (e.g. by ligating sequencing adapters to these open ends and subsequent sequencing) or selectively excluded from further processing (e.g. by degrading these fragments by exonuclease treatment).

[0190] In an embodiment, the method may comprise a step e) of exposing the sample to an exonuclease after obtaining the first nucleic acid molecule comprising one open end and one closed end in step d). In this embodiment, the first nucleic acid thus comprises an open end and the second nucleic acid comprises two closed ends. Hence, the second nucleic acid molecule, but not the first nucleic acid molecule, will be protected against exonuclease degradation. Exposure to the exonuclease thus results in digestion of the first nucleic acid, but not the second nucleic acid. In this embodiment, the second nucleic acid preferably comprises a sequence of interest.

[0191] The exonuclease may be an exonuclease as defined herein under step c1), optionally under the same or similar conditions as defined herein under step c1). Preferably, exonuclease digestion results in the digestion of all, or substantially all nucleic acid molecules comprising at least one open end. In this embodiment, the method of the invention may therefore comprise the steps of:
[0192] a) providing a sample comprising at least a first and a second nucleic acid molecule, wherein the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule and wherein optionally the second nucleic acid molecule comprises a second target sequence;
[0193] b) ligating an adapter as defined herein, i.e. comprising a protelomerase recognition sequence, to the ends of the first and second nucleic acid molecule to provide adapter ligated nucleic acid molecules;
[0194] c) contacting the adapter ligated nucleic acid molecules with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first and second nucleic acid molecule comprising closed ends;
[0195] d) cleaving the first nucleic acid molecule comprising the closed ends at the first target sequence, to provide a first nucleic acid comprising one open end and one closed end; and
[0196] e) exposing the sample to an exonuclease.
Optionally in between step c) and d), the method may further comprise a step c1) as described herein above.
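The selection logic of steps c) to e) can be illustrated with a small toy model. The sketch below is only an idealised illustration, not part of the described method: the class `Molecule`, the helper names and the toy sequences (including the hypothetical target `GGATCC`) are assumptions introduced for this example, and real exonuclease digestion is not all-or-nothing.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Toy linear nucleic acid molecule; names and fields are hypothetical."""
    name: str
    sequence: str
    left_closed: bool = False
    right_closed: bool = False

    @property
    def has_open_end(self) -> bool:
        return not (self.left_closed and self.right_closed)


def close_ends(mol: Molecule) -> Molecule:
    """Steps b) and c): adapter ligation plus protelomerase closes both ends."""
    return Molecule(mol.name, mol.sequence, True, True)


def cleave_at_target(mol: Molecule, target: str) -> list:
    """Step d): cut inside the target; each product keeps one closed end."""
    pos = mol.sequence.find(target)
    if pos < 0:
        return [mol]  # target absent, molecule stays fully closed
    cut = pos + len(target) // 2
    left = Molecule(mol.name + "_L", mol.sequence[:cut], mol.left_closed, False)
    right = Molecule(mol.name + "_R", mol.sequence[cut:], False, mol.right_closed)
    return [left, right]


def exonuclease(mols: list) -> list:
    """Step e): idealised digestion removes every molecule with an open end."""
    return [m for m in mols if not m.has_open_end]


# Hypothetical sample: only the first molecule carries the target sequence.
first = Molecule("first", "AAAAGGATCCAAAA")
second = Molecule("second", "CCCCTTTTCCCC")
closed = [close_ends(m) for m in (first, second)]
cleaved = [frag for m in closed for frag in cleave_at_target(m, "GGATCC")]
survivors = exonuclease(cleaved)
print([m.name for m in survivors])  # ['second']
```

In this model only the second molecule, which retains two closed ends, survives the exonuclease step, mirroring the embodiment in which the second nucleic acid molecule comprises the sequence of interest.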

[0197] Optionally, step e) may comprise a step e1) of removing and/or inactivating the restriction endonuclease and/or programmable nuclease, followed by a step e2) of exposing the sample to an exonuclease.

[0198] Step e1) may comprise heating the sample to a suitable temperature to remove and/or inactivate the restriction endonuclease and/or programmable nuclease. As a non-limiting example, the temperature may be increased to at least 40° C., 45° C., 50° C., 55° C., 60° C., 65° C., 70° C., 75° C., 80° C. or more. The temperature may be increased for a period of at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55 or 60 minutes, or longer.

[0199] Alternatively or in addition, step e1) may comprise the purification of the cleaved first nucleic acid molecule. Purification of the cleaved first nucleic acid molecule may be performed using any conventional means, such as, but not limited to an AMPure bead-based purification process and/or partial or complete digestion of the restriction endonuclease and/or programmable nuclease with a proteinase, such as, but not limited to, digestion with proteinase K.

The second nucleic acid molecule comprising two closed ends may subsequently be cleaved at a target sequence. The method of the invention may therefore further comprise a step f) of cleaving the second nucleic acid molecule comprising the closed ends at the second target sequence, resulting in a second nucleic acid comprising one open end and one closed end. The target sequence in the second nucleic acid molecule is preferably not present in the first nucleic acid molecule. However, as within this embodiment the first nucleic acid molecule is already removed at the time the cleaving of the second nucleic acid molecule takes place, optionally the target sequence in the second nucleic acid molecule is also present in the first nucleic acid molecule. Preferably, the method of the invention may comprise the steps of:
[0200] a) providing a sample comprising at least a first and a second nucleic acid molecule, wherein the first nucleic acid molecule comprises a first target sequence not present in the second nucleic acid molecule and wherein optionally the second nucleic acid molecule comprises a second target sequence;
[0201] b) ligating an adapter as defined herein, i.e. comprising a protelomerase recognition sequence, to the ends of the first and second nucleic acid molecule to provide adapter ligated nucleic acid molecules;
[0202] c) contacting the adapter ligated nucleic acid molecules with a protelomerase to cleave and covalently close the cleaved ends, resulting in a first and second nucleic acid molecule comprising closed ends;
[0203] d) cleaving the first nucleic acid molecule comprising the closed ends at the first target sequence, to provide a first nucleic acid comprising one open end and one closed end;
[0204] e) exposing the sample to an exonuclease; and
[0205] f) cleaving the second nucleic acid molecule comprising the closed ends at the second target sequence, resulting in a second nucleic acid comprising one open end and one closed end.
Optionally in between step c) and d), the method may further comprise a step c1) as described herein above.

[0206] Preferably, the second nucleic acid molecule in step f) is cleaved by a programmable nuclease or a restriction endonuclease, preferably a restriction endonuclease as defined in step d) or a programmable nuclease as defined in step d). Preferably, the second nucleic acid molecule in step f) may be digested using a programmable nuclease, preferably using at least one of a CRISPR nuclease, a zinc finger nuclease, TALENs and meganucleases.

[0207] Preferably, the second nucleic acid molecule is digested by an RNA-guided CRISPR nuclease. The CRISPR nuclease used for cleaving the first and second nucleic acid molecule may be the same or different. In the case that the CRISPR nucleases used for cleaving the first and second nucleic acid molecules are the same, the guide RNA sequence bound to the CRISPR nuclease is not the same. Put differently, in case CRISPR nucleases are used to cleave the first and second nucleic acid molecule, it is understood herein that the gRNA-CAS complex that recognizes and cleaves the first nucleic acid molecule is different from the gRNA-CAS complex that recognizes and cleaves the second nucleic acid molecule.

The method may further comprise a step g) of linking an additional (or “further”) adapter to the open end of at least one of the first and second nucleic acid molecule comprising one open and one closed end.

[0208] Hence in an embodiment, the method may comprise step a), step b), step c), step d) and step g). Optionally, the method may comprise step a), step b), step c), step c1), step d), and step g). In this embodiment, the additional adapter is linked to the open end of the first nucleic acid molecule. The first nucleic acid molecule preferably comprises a sequence of interest.

[0209] In another embodiment, the method may comprise step a), step b), step c), step d), step e), step f) and step g). Optionally, the method may comprise step a), step b), step c), step c1), step d), step e), step f) and step g). In this embodiment, the additional adapter is linked to the open end of the second nucleic acid molecule. The second nucleic acid molecule preferably comprises a sequence of interest.

[0210] The additional adapter may be an adapter suitable for amplification and/or sequencing. The additional adapter may be a sequencing adapter, e.g. an adapter comprising a functional domain that allows for Roche 454A and 454B sequencing, ILLUMINA™ SOLEXA™ sequencing, Applied Biosystems' SOLiD™ sequencing, Pacific Biosciences' SMRT™ sequencing, Polonator polony sequencing, Oxford Nanopore Technologies (ONT) sequencing, Ontera sequencing or Complete Genomics sequencing.

[0211] Hence preferably, the additional adapter comprises at least one sequencing primer binding site and/or at least one amplification primer binding site. The additional adapter may comprise at least two sequencing primer binding sites and/or at least two amplification primer binding sites. The additional adapter may be a single-stranded, double-stranded, partly double-stranded, Y-shaped or hairpin nucleic acid molecule. Preferably, the additional adapter is a hairpin adapter or a Y-shaped adapter.

[0212] Stem-loop or hairpin adapters are single-stranded, but their termini are complementary such that the adapter folds back on itself to generate a double-stranded portion and a single-stranded loop. A stem-loop adapter can be linked to an end of a linear, double-stranded nucleic acid molecule. For example, where a stem-loop adapter is joined in step g) to the open end of the first or second nucleic acid molecule, respectively, the resulting molecule lacks terminal nucleotides.

[0213] The first or second nucleic acid molecule in step g) may be linked to circularizable adapters. In this respect, nucleic acid molecules comprising an open end may be circularized by self-circularization of compatible structures on either side of the fragment (which may result from adapter ligation or as a result of restriction enzyme digestion of ligated adapters) or circularized by hybridization to a selector probe that is complementary to the ends of the desired fragment. Extension and a final step of ligation creates a covalently closed circular, optionally double-stranded, polynucleotide.

[0214] The additional adapter may be a protective adapter. In this context, a protective adapter is to be understood herein as an adapter that is specifically designed to protect the nucleic acid molecule captured by the adapter from exonuclease digestion. Such an adapter preferably protects against exonuclease degradation either by the inclusion of chemical moieties or blocking groups (e.g. phosphorothioate) or by a lack of terminal nucleotides (hairpin or stem-loop adapters, or circularizable adapters).

[0215] Optionally the additional adapter comprises an identifier sequence, preferably an identifier sequence as defined herein.

[0216] Preferably, a nucleic acid molecule library is prepared from a plurality of samples. Optionally, the method of the invention is multiplexed, i.e. applied simultaneously for multiple nucleic acid samples, such as for at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 500, 1000 or more nucleic acid samples. The method may thus be performed in parallel on a plurality of samples, wherein “in parallel” is to be understood herein as substantially simultaneously but each sample being processed in a separate reaction tube or vessel.

[0217] In addition or alternatively, one or more steps of the method of the invention may be performed on pooled samples. In order to trace back the first and/or second nucleic acid molecule to the originating sample, the first and/or second nucleic acid molecule may be tagged with an identifier prior to pooling the samples. Such an identifier can be any detectable entity, such as, but not limited to, a radioactive or fluorescent label, but preferably is a particular nucleotide sequence or combination of nucleotide sequences, preferably of defined length. In addition or alternatively, the samples can be pooled using a multidimensional pooling strategy, such as, but not limited to, a 2D or 3D pooling strategy, such that after pooling each sample is encompassed in at least two or three pools, respectively. A particular nucleic acid molecule can be traced back to the originating sample by using the coordinates of the respective pools comprising the first and/or second nucleic acid molecule. The plurality of samples may be pooled prior to step b), step c), step d), step e), step f) or step g), or after step g).
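The trace-back by pool coordinates can be sketched for the 2D case as follows. This is a minimal illustration under stated assumptions: the grid layout, the function names `build_2d_pools` and `trace_back`, and the sample labels are hypothetical and not taken from the description.

```python
from itertools import product


def build_2d_pools(samples, n_cols):
    """Place samples on a grid; each sample enters one row pool and one column pool."""
    pools = {}
    for idx, sample in enumerate(samples):
        row, col = divmod(idx, n_cols)
        pools.setdefault(("row", row), set()).add(sample)
        pools.setdefault(("col", col), set()).add(sample)
    return pools


def trace_back(pools, positive_pools):
    """Return the samples that lie at an intersection of a positive row and column pool."""
    rows = [p for p in positive_pools if p[0] == "row"]
    cols = [p for p in positive_pools if p[0] == "col"]
    candidates = set()
    for r, c in product(rows, cols):
        candidates |= pools[r] & pools[c]
    return candidates


# Six hypothetical samples on a 2 x 3 grid, giving 2 row pools and 3 column pools.
samples = ["S0", "S1", "S2", "S3", "S4", "S5"]
pools = build_2d_pools(samples, n_cols=3)

# A molecule detected in row pool 1 and column pool 1 points to sample S4.
origin = trace_back(pools, [("row", 1), ("col", 1)])
print(origin)  # {'S4'}
```

With this layout six samples require only five pools. If the same molecule occurs in several samples, more than one row and column pool become positive and the intersections can be ambiguous, which is one reason a sequence identifier may be preferred.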

In between step a) and b), between step b) and c), between step c) and d) and/or after step d) as described herein, the nucleic acid sample may be purified and/or the reaction enzyme may be inactivated.

[0218] In an embodiment of the invention, between step c) and c1) and/or between step c1) and d) as described herein, the nucleic acid sample may be purified and/or the reaction enzyme may be inactivated.

[0219] In an embodiment of the invention, between step d) and e), between e) and f), between step f) and g), between step d) and g), and/or after step g) as described herein, the nucleic acid sample may be purified and/or the reaction enzyme may be inactivated.

[0220] A purification step, e.g., an AMPure bead-based purification process, may be included to remove complexes, enzymes, free nucleotides, possible free adapters, and possible small, non-relevant, nucleic acid molecules. The first, and/or optionally second, nucleic acid molecule may be recovered after purification and subjected to further processing and/or analysis, such as single-molecule sequencing.

[0221] An optional purification step is a proteinase K treatment. Alternatively or in addition, said purification may comprise the following steps:
[0222] I. exposing the nucleic acid sample to one or more solid supports that specifically and effectively bind the first, and/or optionally second, nucleic acid molecule; and optionally,
[0223] II. washing the one or more solid supports and eluting the first, and/or optionally second, nucleic acid molecule from the one or more solid supports.
The one or more solid supports may be, but are not limited to, AMPure beads. As at least one isolated nucleic acid molecule is obtained after purification, the method as defined herein may also be regarded as a method for isolation of one or more nucleic acid molecules from a nucleic acid sample.

[0224] The method of the invention may further comprise a size-selection step. Optionally, the size-selection step is performed prior to step b), between step b) and c), between step c) and d), and/or after step d) of the method of the invention.

[0225] In an embodiment, the size selection step is performed in between step c) and c1) and/or between step c1) and d) of the invention.

[0226] In an embodiment, the size selection step is performed in between step d) and e), between step e) and f), between step f) and g), or after step g) of the invention.

[0227] Alternatively, there is no further purification, inactivation and/or size selection step. Hence in an embodiment, the method of the invention does not require any purification steps between steps a), b), c), d), e), f) and g), or after step g). In addition or alternatively, the method of the invention does not require any inactivation step between steps a), b), c), d), e), f) and g), or after step g). In addition or alternatively, the method of the invention does not require any size selection step between steps a), b), c), d), e), f) and g), or after step g).

[0228] The method of the invention may be followed by a step of sequencing one or more target nucleic acid molecules. The method as defined herein may therefore also be regarded as a method for sequencing one or more target nucleic acid molecules from a nucleic acid sample.

[0229] Preferably, the sequencing step is performed after the addition of an adapter comprising a protelomerase recognition sequence. Preferably, the sequencing step is performed after step c), i.e. the sequencing of a circular nucleic acid molecule. Preferably, the sequencing step is performed after the addition of a further adapter. Preferably, the sequencing step is performed after step g). Sequencing of at least one of the first and second nucleic acid molecule may be performed after step b), after step c1), after step d), after step e) or after step f).

[0230] Optionally, the method of the invention further comprises an amplification step. The amplification step may be performed after closing the nucleic acid molecules comprising an adapter, wherein the adapter comprises a protelomerase recognition sequence. Preferably, the amplification step is performed after step c), i.e. the amplification of a circular nucleic acid molecule. Optionally, the amplification step is performed after annealing a further adapter to the first or second nucleic acid molecule. Preferably, the amplification step is performed after step g). Amplification of at least one of the first and second nucleic acid molecule may be performed after step a), after step b), after step c1), after step d), after step e) and/or after step f). Amplification can be done by PCR or by any amplification method known in the art.

[0231] In an embodiment, the method of the invention is a sequencing method that is free of amplification and/or cloning steps. Reduction of amplification steps is beneficial, as epigenetic information (e.g., 5-mC, 6-mA, etc.) is lost in amplicons. Furthermore, amplification can introduce variations in the amplicons (e.g., via errors during amplification) such that their nucleotide sequence is not reflective of the original sample. Similarly, cloning of a target region into another organism often does not maintain modifications present in the original sample nucleic acid, so in some embodiments, target sequences to be enriched for further analysis are not amplified and/or cloned in the methods herein.

[0232] In an aspect, the method of the invention pertains to a method for amplification of a nucleic acid molecule library. The method preferably comprises a step of preparing a nucleic acid molecule library as defined herein. The nucleic acid molecule library is preferably prepared using at least one of:
[0233] steps a), b) and c) as defined herein;
[0234] steps a), b), c) and c1) as defined herein;
[0235] steps a), b), c) and d) as defined herein;
[0236] steps a), b), c), c1) and d) as defined herein;
[0237] steps a), b), c), d) and g) as defined herein;
[0238] steps a), b), c), c1), d) and g) as defined herein;
[0239] steps a), b), c), d) and e) as defined herein;
[0240] steps a), b), c), c1), d) and e) as defined herein;
[0241] steps a), b), c), d), e) and f) as defined herein;
[0242] steps a), b), c), c1), d), e) and f) as defined herein;
[0243] steps a), b), c), d), e), f) and g) as defined herein; and
[0244] steps a), b), c), c1), d), e), f) and g) as defined herein.

[0245] The method further comprises a step of amplifying the nucleic acid molecule library. Amplification may be performed using a single primer, e.g. by means of “rolling circle” amplification. The single primer is preferably at least one of:
[0246] i) a primer annealing to the first nucleic acid molecule comprising one open and one closed end as obtained in step d);
[0247] ii) a primer annealing to the second nucleic acid molecule comprising one open and one closed end as obtained in step f); and
[0248] iii) a primer annealing to the further adapter as defined in step g).
Alternatively or in addition, amplification may be performed using a primer pair, i.e. using a first and a second primer, wherein preferably the first and the second primer can anneal to the first nucleic acid molecule and/or wherein the first and second primer can anneal to the second nucleic acid molecule in a manner that allows for amplification of respectively the first and/or the second nucleic acid molecule.

[0249] Preferably, the primer pair comprises a first primer and a second primer that can anneal to the first nucleic acid molecule, preferably the first nucleic acid molecule obtained in step a), b), c), c1), d) or step g) as defined herein. Preferably, the primer pair comprises a first primer and a second primer that can anneal to the first nucleic acid molecule comprising one open and one closed end, as obtained in step d) or step g) as defined herein.

[0250] Alternatively or in addition, the primer pair may comprise a first primer and a second primer that can anneal to the second nucleic acid molecule, preferably the second nucleic acid molecule obtained in step a), b), c), c1), d), e), f) or step g) as defined herein. Preferably, the primer pair comprises a first primer and a second primer that can anneal to the second nucleic acid molecule comprising one open and one closed end as obtained in step f) or step g) as defined herein.

[0251] Preferably the first primer in the primer pair is not, or not substantially, complementary to the second primer in the primer pair.

[0252] In an embodiment, at least one of the first and the second primer may anneal to a sequence present in an adapter, preferably an adapter comprising a protelomerase recognition sequence as defined herein and/or a further adapter as defined in step g).

[0253] The first and second primer may anneal to a first sequence and second sequence present in the same adapter, preferably an adapter of step g) as defined herein. As a non-limiting example, the adapter may be a Y-shaped adapter and the first primer binding site may be present in the first single stranded arm of the Y-shaped adapter and the second primer binding site may be present in the other single-stranded arm of the Y-shaped adapter.

[0254] Alternatively or in addition, the first amplification primer may anneal to a sequence present in the first nucleic acid molecule and the second amplification primer may anneal to a sequence present in an adapter, preferably an adapter comprising a protelomerase recognition sequence or a further adapter of step g) as defined herein.

[0255] Alternatively or in addition, the first amplification primer may anneal to a sequence present in the second nucleic acid molecule and the second amplification primer may anneal to a sequence present in an adapter, preferably an adapter comprising a protelomerase recognition sequence or a further adapter of step g) as defined herein.

[0256] Alternatively or in addition, the first amplification primer may anneal to a sequence present in the adapter comprising a protelomerase recognition sequence and the second amplification primer may anneal to a sequence present in the further adapter of step g) as defined herein.

[0257] In a further aspect, the invention concerns a method for analysing a sequence of interest in a sample comprising a first and a second nucleic acid molecule. The method preferably comprises a step of preparing a nucleic acid molecule library as defined herein.

[0258] The sample may comprise at least a first and a second nucleic acid molecule. The first and/or second nucleic acid molecule may be part of a longer nucleic acid molecule. The nucleic acid sample may comprise a plurality of nucleic acid molecules, including a first and a second nucleic acid molecule.

[0259] As detailed herein, the prepared nucleic acid library preferably comprises at least one of a first and a second nucleic acid molecule. In an embodiment, the prepared nucleic acid library comprises a first nucleic acid molecule, but does not comprise the second nucleic acid molecule. In an alternative embodiment, the prepared nucleic acid library comprises a second nucleic acid molecule, but does not comprise the first nucleic acid molecule.

[0260] Said first or second nucleic acid molecule preferably comprises a sequence of interest. The nucleic acid molecule library is preferably prepared using at least one of:
[0261] steps a), b) and c) as defined herein;
[0262] steps a), b), c) and c1) as defined herein;
[0263] steps a), b), c), d) and g) as defined herein;
[0264] steps a), b), c), c1), d) and g) as defined herein;
[0265] steps a), b), c), d), e), f) and g) as defined herein; and
[0266] steps a), b), c), c1), d), e), f) and g) as defined herein.

[0267] The method preferably further comprises a step of analysing the prepared nucleic acid molecule library. Analysis can be performed using any conventional means known in the art. The analysis may include at least one of:
[0268] detection of a sequence using a label, e.g. a radioactive or fluorescent label;
[0269] analysis of the size of the prepared nucleic acid molecule library;
[0270] cloning, optionally part of, the library into a vector, optionally followed by gene expression and/or restriction analysis; and
[0271] sequencing the nucleic acid molecule library.
Preferably, the prepared nucleic acid molecule library is sequenced, preferably deep-sequenced. Sequencing may include at least one of ILLUMINA™ SOLEXA™ sequencing, Ion Torrent sequencing, Pacific Biosciences' SMRT™ sequencing, Sanger sequencing, Genapsys sequencing, Polonator polony sequencing, Oxford Nanopore Technologies (ONT) sequencing, Ontera sequencing and Complete Genomics sequencing.

[0272] In a preferred embodiment, the prepared nucleic acid molecule library is sequenced by nanopore selective sequencing. In nanopore selective sequencing, the data generated during real-time sequencing (either direct current signals or base calls translated from these current signals) are compared to one or more reference sequence(s). In case a set number of nucleotides or amount of signal of the target sequence aligns with the reference sequence, sequencing will proceed; if not, the current is reversed, thereby removing the nucleic acid from the pore and making the pore available for sequencing of a new nucleic acid. The set number of nucleotides may be at least the first 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, or 500 nucleotides of the nucleic acid read. The one or more reference sequences may be a multitude of different sequences. Preferably, each of these reference sequences is at least 50, 60, 70, 80, 90, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identical to the sequence of a target nucleic acid fragment of the nucleic acid molecule library obtained by the method of the invention. In an embodiment, each of the reference sequences is at least 50, 60, 70, 80, 90, 92, 93, 94, 95, 96, 97, 98, 99 or 100% identical to a particular subset of the one or more sequences of target nucleic acid fragments of the nucleic acid molecule library obtained by the method of the invention. One of the benefits of selectively sequencing a particular subset by nanopore selective sequencing is that in different sequencing runs, different subsets may be sequenced using the prepared nucleic acid molecule library.
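The accept-or-eject decision described for nanopore selective sequencing can be sketched as below. This is a simplified illustration only: a real implementation would use a proper aligner or signal-level matching, whereas this sketch uses ungapped comparison of base calls. The function names, the `min_identity` threshold of 0.9 and the toy reference are assumptions made for the example; only the idea of inspecting roughly the first 50 nucleotides comes from the text.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter of the two strings (ungapped)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def best_identity(prefix: str, reference: str) -> float:
    """Best ungapped identity of the prefix at any offset within the reference."""
    k = len(prefix)
    if k == 0 or len(reference) < k:
        return 0.0
    return max(identity(prefix, reference[i:i + k])
               for i in range(len(reference) - k + 1))


def decide(read_prefix: str, references, min_bases: int = 50,
           min_identity: float = 0.9) -> str:
    """Return 'continue', 'sequence' or 'eject' for a partially sequenced read."""
    if len(read_prefix) < min_bases:
        return "continue"  # too few bases so far to make a decision
    window = read_prefix[:min_bases]
    if any(best_identity(window, ref) >= min_identity for ref in references):
        return "sequence"  # read matches a reference, keep sequencing
    return "eject"  # no match, reverse the current and free the pore


# Hypothetical 120-nt reference and three partial reads.
reference = "ACGT" * 30
on_target = decide(reference[10:70], [reference])
off_target = decide("T" * 60, [reference])
too_short = decide("ACGT", [reference])
print(on_target, off_target, too_short)  # sequence eject continue
```

Swapping the list of references between runs corresponds to sequencing different subsets of the same prepared library, as noted at the end of the paragraph above.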

[0273] In an embodiment, the adapter comprising a protelomerase recognition sequence comprises at least one binding site for a sequencing primer.

[0274] Alternatively or in addition, the further adapter in step g) comprises at least one binding site for a sequencing primer. The further adapter in step g) may comprise two different binding sites for two sequencing primers. As a non-limiting example, the adapter in step g) may be a Y-shaped adapter and the first sequencing primer binding site may be present in the first single stranded arm of the Y-shaped adapter and the second sequencing primer binding site may be present in the other single-stranded arm of the Y-shaped adapter.

[0275] In an aspect, the invention pertains to a method for enriching a nucleic acid sample for a nucleic acid molecule comprising a sequence of interest. The method preferably uses at least method steps a)-d) as detailed herein above, but may use any of the additional steps as detailed herein, such as step c1), step e), step f) and/or step g).

[0276] In an aspect, the invention concerns a kit of parts for performing the method of the invention as described herein. Preferably, the kit of parts is for use in a method as defined herein. Preferably, the kit of parts comprises at least one or more adapters comprising a protelomerase recognition sequence as defined herein.

[0277] The adapters for use in a method as defined herein preferably do not comprise a recognition site for the restriction endonuclease or the programmable nuclease that is used in step d) and/or step f) of the method of the invention. More preferably, the part of the adapter that is located in between the protelomerase recognition sequence and the end ligated to the first and/or second nucleic acid molecule does not comprise a recognition site for a restriction endonuclease or a programmable nuclease that is used in step d) and/or step f) of the method of the invention.

[0278] The one or more adapters may be combined in one vial or may be present in separate vials, e.g. wherein the adapters of one vial comprise the same identifier sequence, preferably the same sample identifier sequence. The kit of parts may further comprise a vial comprising a protelomerase as defined herein.

[0279] The kit of parts may comprise one or more reagents for performing a method as described herein. Hence the kit of parts may comprise at least one of:
[0280] one or more vials comprising adapters comprising a protelomerase recognition sequence as defined herein;
[0281] one or more vials comprising a further adapter as defined herein for step g);
[0282] one or more vials comprising a protelomerase as defined herein;
[0283] one or more vials comprising a gRNA-CAS complex as defined herein;
[0284] one or more vials comprising a gRNA for complexing with a CRISPR-CAS protein to form a gRNA-CAS complex, and a further vial comprising said CRISPR-CAS protein; and
[0285] a further vial comprising one or more exonucleases.

[0286] Preferably, the kit comprises at least 2, 4, 10, 20, 30, or 50 vials comprising one or more gRNAs as defined herein. Preferably, the volume of any of the vials within the kit does not exceed 100 mL, 50 mL, 20 mL, 10 mL, 5 mL, 4 mL, 3 mL, 2 mL or 1 mL.

[0287] The reagents may be present in lyophilized form, or in an appropriate buffer. The kit may also contain any other component necessary for carrying out the present invention, such as buffers, pipettes, microtiter plates and written instructions. Such other components for the kits of the invention are known to the skilled person.

[0288] In an aspect, the invention pertains to the use of an adapter comprising a protelomerase recognition sequence as defined herein for at least one of:
[0289] i) preparation of a nucleic acid molecule library;
[0290] ii) amplification of a nucleic acid molecule library; and
[0291] iii) analysing a sequence of interest in a sample.

TABLE 1 Description of sequence identifiers
SEQ ID NO   Description
1           Protelomerase consensus recognition sequence
2           recognition sequence for E. coli phage N15 and Klebsiella phage phiKO2 protelomerase
3           recognition sequence for Yersinia phage PY54 protelomerase
4           recognition sequence for Halomonas phage phiHAP-1 protelomerase
5           recognition sequence for Vibrio phage VP882
6           recognition sequence for Borrelia burgdorferi plasmid lpB31.16 protelomerase
7           recognition sequence for Vibrio phage VP882
8           recognition sequence for Yersinia phage PY54 protelomerase
9           recognition sequence for Halomonas phage phiHAP-1 protelomerase
10          TelN protelomerase recognition sequence
11          Amino acid sequence protelomerase
12          Cas9 amino acid sequence
13          Cas9 nucleotide sequence
14          Cas9 amino acid sequence
15          Cpf1 amino acid sequence
16          Cpf1 nucleotide sequence
17          MAD7 protein sequence
18          MAD7 protein sequence

Figure Legend

[0292] FIG. 1: Agilent 2100 Bioanalyzer DNA analysis using the DNA 12000 kit. On the left side of the boxes the DNA size marker, including the fragment lengths, is indicated. The left box shows the amplicon (˜1050 bp) which is used as input in the experiment, without (left) and with (right) exonuclease (ExoV) treatment. The middle box shows the amplicon to which a TelN adapter is ligated, without (left) and with (right) ExoV treatment. The right box shows the amplicon with ligated TelN adapters treated with TelN protelomerase, with (left) and without (right) ExoV treatment. The results show that TelN treatment of the adapter-ligated amplicon results in protection from ExoV degradation.

EXAMPLES

Material and Methods

[0293] An adapter containing the TelN recognition site was prepared by combining:

Oligo 19_04626 (100 μM): 2 μl

Oligo 19_03053 (100 μM): 2 μl

[0294] Sequences of the oligos:

19_04626 (SEQ ID NO: 19)
5′-AGGACCGGATCAACTTATCAGCACACAATTGCCCATTATACGCGCGTATAATGGACTATTGTGTGCTGATAAAGAAAGTTGTCGGTGTCTTTGTGAGATGTGTATAAGAGACAGT-3′
19_03053 (SEQ ID NO: 20)
5′-CTGTCTCTTATACACATCTCACAAAGACACCGACAACTTTCTTTATCAGCACACAATAGTCCATTATACGCGCGTATAATGGGCAATTGTGTGCTGATAAGTTGATCCGGTCCT-3′
The 5′-end is preferably phosphorylated.
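The relationship between the two oligos can be checked computationally. The sketch below, which is an illustration and not part of the described method, tests whether oligo 19_03053 pairs with oligo 19_04626 and reports any unpaired 3′ tail on 19_04626. The sequences are transcribed from SEQ ID NO: 19 and SEQ ID NO: 20; the function names are assumptions made for this sketch.

```python
# Sequences transcribed from SEQ ID NO: 19 and SEQ ID NO: 20 above.
TOP = ("AGGACCGGATCAACTTATCAGCACACAATTGCCCATTATACGCGC"
       "GTATAATGGACTATTGTGTGCTGATAAAGAAAGTTGTCGGTGTCTTTG"
       "TGAGATGTGTATAAGAGACAGT")       # oligo 19_04626
BOTTOM = ("CTGTCTCTTATACACATCTCACAAAGACACCGACAACTTTCTTTA"
          "TCAGCACACAATAGTCCATTATACGCGCGTATAATGGGCAATTGTGTG"
          "CTGATAAGTTGATCCGGTCCT")      # oligo 19_03053


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def three_prime_overhang(top: str, bottom: str):
    """Return the unpaired 3' tail of `top` if `bottom` pairs with the rest.

    Returns None when the bottom strand is not fully complementary to the
    5' portion of the top strand.
    """
    paired = reverse_complement(bottom)
    if top.startswith(paired):
        return top[len(paired):]
    return None


overhang = three_prime_overhang(TOP, BOTTOM)
print(len(TOP), len(BOTTOM), overhang)
```

With the sequences as transcribed here, the annealed duplex should carry a single unpaired T at the 3′ end of oligo 19_04626. The specification does not state this explicitly, but such an overhang would be compatible with ligation to the A-tailed amplicon described below.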
To enable hybridization of the oligos the following thermal profile was used:

95° C. 10 min

90° C. for 1 min

[0295] Reduce temp by 1° C./cycle 60 times
4° C. hold
The resulting adapter solution (50 μM) was diluted to 15 μM concentration.
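The concentrations and the annealing ramp stated above follow from simple arithmetic, sketched below. The 10 μl final volume used for the dilution is a hypothetical value, since the specification does not give one, and the reading of the ramp as 60 successive 1° C. decrements starting from 90° C. is an assumption.

```python
def strand_concentration_um(stock_um: float, vol_a_ul: float, vol_b_ul: float) -> float:
    """Concentration of the limiting strand after mixing two oligo stocks
    that share the same stock concentration."""
    return stock_um * min(vol_a_ul, vol_b_ul) / (vol_a_ul + vol_b_ul)


def dilution_volumes(stock_um: float, target_um: float, final_ul: float):
    """Return (stock volume, diluent volume) in microlitres using C1*V1 = C2*V2."""
    stock_ul = target_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul


def ramp_end_temperature(start_c: float, step_c: float, n_steps: int) -> float:
    """Temperature reached after n_steps equal decrements from start_c."""
    return start_c - step_c * n_steps


# 2 ul + 2 ul of 100 uM oligos gives each strand at 50 uM.
adapter_um = strand_concentration_um(100.0, 2.0, 2.0)

# Hypothetical 10 ul final volume for the 50 uM -> 15 uM dilution.
stock_ul, water_ul = dilution_volumes(adapter_um, 15.0, 10.0)

# 90 C lowered by 1 C per cycle, 60 times.
end_c = ramp_end_temperature(90.0, 1.0, 60)

print(adapter_um, stock_ul, water_ul, end_c)
```

Under these assumptions the slow ramp ends at 30° C. before the 4° C. hold.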
Input material for the example was a 1 kbp amplicon derived from Lambda DNA.
The amplification was performed using the following setup:

Component                  Volume
Lambda DNA 5 ng/μl         5 μl
MilliQ water               9.3 μl
PCR buffer                 4 μl
25 mM dNTP (each)          0.2 μl
Herculase Polymerase       0.5 μl
Forward primer (10 μM)     0.5 μl
Reverse primer (10 μM)     0.5 μl
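As a consistency check on the setup above, the following sketch sums the listed volumes and derives the final primer and dNTP concentrations and the template mass. It only restates the listed values; the variable and function names are assumptions made for this sketch.

```python
# Volumes in microlitres, as listed in the amplification setup.
PCR_MIX_UL = {
    "Lambda DNA (5 ng/ul)": 5.0,
    "MilliQ water": 9.3,
    "PCR buffer": 4.0,
    "dNTP (25 mM each)": 0.2,
    "Herculase polymerase": 0.5,
    "Forward primer (10 uM)": 0.5,
    "Reverse primer (10 uM)": 0.5,
}

# Rounded to absorb floating-point noise from summing decimal volumes.
total_ul = round(sum(PCR_MIX_UL.values()), 2)


def final_concentration(stock: float, volume_ul: float, total: float) -> float:
    """Final concentration after dilution into the full reaction volume."""
    return stock * volume_ul / total


primer_um = final_concentration(10.0, PCR_MIX_UL["Forward primer (10 uM)"], total_ul)
dntp_mm = final_concentration(25.0, PCR_MIX_UL["dNTP (25 mM each)"], total_ul)
template_ng = 5.0 * PCR_MIX_UL["Lambda DNA (5 ng/ul)"]

print(total_ul, primer_um, dntp_mm, template_ng)
```

The 4 μl of PCR buffer in the resulting reaction volume would correspond to a 5× buffer stock, although the specification does not state the stock strength.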

Forward primer 18_03029 (SEQ ID NO: 21): 5′-TCACGCTGATTTACAGCGGCA-3′
Reverse primer 18_03032 (SEQ ID NO: 22): 5′-CGATGCTGATTGCCGTTCCG-3′
For amplification the thermoprofile used was:

95° C. 2 min

95° C. for 30 sec

[0296] 65° C. for 30 sec -> reducing temp by 0.7° C./cycle

72° C. for 4 min

[0297] 13 cycles

95° C. for 30 sec

56° C. for 30 sec

72° C. for 5 min

[0298] 25 cycles

72° C. for 2 min

[0299] 12° C. hold
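The first cycling block above is a touchdown phase in which the annealing temperature drops by 0.7° C. per cycle. The sketch below lists the annealing temperature for each of the 13 touchdown cycles, assuming that the first cycle anneals at 65° C. and that the decrement applies from the second cycle onward; the specification does not spell out this detail.

```python
def touchdown_annealing_temps(start_c: float, decrement_c: float, cycles: int):
    """Annealing temperature for each touchdown cycle.

    Assumes cycle 1 runs at start_c and each later cycle is decrement_c lower.
    """
    return [round(start_c - decrement_c * i, 1) for i in range(cycles)]


touchdown = touchdown_annealing_temps(65.0, 0.7, 13)
total_cycles = len(touchdown) + 25   # 13 touchdown cycles plus 25 cycles at 56 C

print(touchdown)
print(total_cycles)
```

Under this assumption the last touchdown cycle anneals at 56.6° C., which leads smoothly into the fixed 56° C. annealing step of the following 25 cycles.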
The resulting amplicon was purified (0.8×) and eluted in 20 μl MilliQ water. The concentration measured using the Qubit BR assay was 554 ng/μl.
The purified amplicon was end repaired and A-tailed.
End repair (two reactions performed):
2 μl of purified amplicon

7 μl NEBNext Ultra II End Prep Reaction Buffer (New England Biolabs Inc.)

3 μl NEBNext Ultra II End Prep Enzyme Mix (New England Biolabs Inc.)

[0300] 48 μl MilliQ water
Total volume=60 μl -> incubated for 30 min at 20° C. and 30 min at 65° C., then held at 4° C. until further use.
Adapter ligation:

60 μl NEBNext Ultra II End Prep Reaction Mixture (New England Biolabs Inc.)

30 μl NEBNext Ultra II Ligation Master Mix (New England Biolabs Inc.)

1 μl NEBNext Ligation Enhancer (New England Biolabs Inc.)

2.5 μl Adapter (50 μM)

[0301] Total volume=93.5 μl -> incubated for 20 min at 15° C.
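The molar excess of adapter over fragment ends in this ligation can be estimated as sketched below. The estimate assumes an average mass of 660 g/mol per base pair (a common approximation), an amplicon length of about 1050 bp as given in the legend of FIG. 1, no loss of DNA during end repair, and the adapter concentration listed in the ligation mixture. It is a rough estimate, not a value reported in the specification.

```python
AVG_BP_MASS_G_PER_MOL = 660.0   # assumed average mass of one base pair


def dsdna_pmol(mass_ng: float, length_bp: float) -> float:
    """Convert a mass of double-stranded DNA (ng) into picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_G_PER_MOL)


# 2 ul of purified amplicon at 554 ng/ul went into each end-repair reaction.
amplicon_ng = 2.0 * 554.0
amplicon_pmol = dsdna_pmol(amplicon_ng, 1050.0)
end_pmol = 2.0 * amplicon_pmol          # each fragment has two ligatable ends

# 2.5 ul of adapter at 50 uM; uM multiplied by ul gives pmol.
adapter_pmol = 2.5 * 50.0

ratio_per_end = adapter_pmol / end_pmol
print(round(amplicon_pmol, 2), adapter_pmol, round(ratio_per_end, 1))
```

Under these assumptions the adapter is present in a several-tens-fold molar excess over fragment ends, which is consistent with the additional purification step used afterwards to remove remaining adapters.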
The resulting ligated sample was purified using 1:1 Ampure beads and eluted in 20 μl MilliQ water.
To remove remaining adapters, an additional Ampure purification (0.75×) was performed.
The concentration of the adapter-ligated product was 40 ng/μl.
The adapter-ligated product was treated with TelN to covalently close the ends.

Component                                                      Volume
Adapter ligated product                                        4 μl
ThermoPol Reaction Buffer (10x) (New England Biolabs Inc.)     2 μl
TelN Protelomerase (New England Biolabs Inc.)                  2 μl
MilliQ water                                                   12 μl
The reaction mixture was gently mixed by pipetting up and down, briefly centrifuged and incubated at 30° C. for 30 min. The enzyme was inactivated by incubating at 75° C. for 5 min.
The resulting sample was purified using 1:1 Ampure beads and eluted in 15 μl MilliQ water.
To verify exonuclease protection, the TelN-treated sample was incubated with Exonuclease V.

Component                    Volume
Sample                       10 μl
NEB Buffer 3.1 (10x)         2.0 μl
ATP (100 mM)                 1.0 μl
Exonuclease V (10 units)     1.0 μl
MilliQ water                 6.0 μl
The reaction mixture was incubated at 37° C. for 60 min. and the Exonuclease was inactivated at 70° C. for 30 min.
The sample was purified using Ampure (1×) and eluted in 10 μl MilliQ water.

Results

[0302] Results of the Bioanalyzer analysis are shown in FIG. 1. In brief:
Amplicons and adapter-ligated amplicons are readily degraded by Exonuclease V.
Adapter-ligated and TelN-treated amplicons are resistant to exonuclease degradation.

Conclusion

[0303] Covalently closing the ends of DNA fragments using TelN renders the fragments resistant to Exonuclease V.
