METHODS FOR TAGGING DNA-ENCODED LIBRARIES

Abstract

The present invention relates to methods for producing encoded chemical entities. In particular, the oligonucleotides and methods can include encoded chemical entities having wild-type linkages formed through chemical ligation techniques. One strategy that simultaneously takes advantage of chemical ligation as a means to encode chemical history, while also retaining the ability of polymerases to directly recover tag sequence and association information, is to perform chemical ligation in a manner that generates wild-type phosphodiester linkages. Such methods generally utilize condensing agents such as cyanogen bromide or similar reagents along with 5′-phosphate and 3′-hydroxyl oligonucleotides in a double-stranded or templated context. Similarly, cyanogen bromide has also been shown to chemically ligate pairs of substrate oligonucleotides that are 5′-hydroxyl and 3′-phosphate. However, these methods suffer from poor efficiency, making them ill-suited for use in an iterative process such as tagging DNA-encoded libraries.

Claims

1. A method of producing an encoded chemical entity, said method comprising: (a) providing a headpiece comprising a first functional group and a second functional group; (b) binding said first functional group of said headpiece to a component of said chemical entity, wherein said headpiece is directly connected to said component or said headpiece is indirectly connected to said component by a bifunctional spacer; (c) ligating said second functional group of said headpiece to a first oligonucleotide tag via chemical ligation to form an encoded chemical entity, wherein said chemical ligation generates a phosphodiester, phosphonate, or phosphorothioate linkage; wherein steps (b) and (c) can be performed in any order and wherein said first oligonucleotide tag encodes for the binding reaction of said step (b), thereby producing an encoded chemical entity.

2. The method of claim 1, wherein said chemical ligation generates a phosphodiester linkage.

3. The method of claim 1 or 2, wherein said headpiece comprises a double-stranded oligonucleotide, a single-stranded oligonucleotide, or a hairpin oligonucleotide.

4. The method of claim 3, wherein said headpiece comprises a double stranded oligonucleotide or a hairpin oligonucleotide.

5. The method of claim 4, wherein said headpiece comprises a third functional group.

6. The method of claim 5, wherein said method further comprises (d) ligating said third functional group of said headpiece to a second oligonucleotide tag via chemical ligation, wherein said chemical ligation generates a phosphodiester, phosphonate, or phosphorothioate linkage.

7. The method of claim 5, wherein said method further comprises (d) ligating said third functional group of said headpiece to a second oligonucleotide tag, wherein said ligation is not via chemical ligation that generates a phosphodiester, phosphonate, or phosphorothioate linkage.

8. The method of claim 2, wherein said headpiece comprises a phosphate at the 5′-terminus and/or the 3′-terminus.

9. The method of claim 2, wherein said chemical ligation comprises the ligation of a 5′- or 3′-phosphate on said headpiece to a 5′- or 3′-hydroxyl oligonucleotide.

10. The method of claim 9, wherein said chemical ligation comprises the ligation of a 5′-phosphate on said headpiece to a 3′-hydroxyl oligonucleotide and/or a 3′-phosphate on said headpiece to a 5′-hydroxyl oligonucleotide.

11. The method of claim 10, wherein said chemical ligation comprises the simultaneous ligation of a 5′-phosphate on said headpiece to a 3′-hydroxyl oligonucleotide and a 3′-phosphate on said headpiece to a 5′-hydroxyl oligonucleotide.

12. The method of claim 8, wherein said chemical ligation comprises the use of cyanoimidazole.

13. The method of claim 12, wherein said chemical ligation further comprises the use of a divalent metal source.

14. The method of claim 13, wherein said divalent metal source is a soluble Zn.sup.2+ source.

15. The method of claim 14, wherein said soluble Zn.sup.2+ source is ZnCl.sub.2.

16. The method of claim 1, wherein said headpiece is indirectly connected to said component by a bifunctional spacer.

17. The method of claim 1, wherein said headpiece is directly connected to said component.

18. A library comprising one or more chemical entities produced by the method of claim 1.

19. The library of claim 18, wherein said library comprises a plurality of headpieces.

20. The library of claim 18, wherein each chemical entity is different.

21. A method of screening a plurality of chemical entities, said method comprising: (a) contacting a target with an encoded chemical entity prepared by a method of claim 1; and (b) selecting one or more encoded chemical entities having a predetermined characteristic for said target, as compared to a control, thereby screening a plurality of said chemical entities.

22. The method of claim 21, where said predetermined characteristic comprises increased binding for said target, as compared to a control.

Description

BRIEF DESCRIPTION OF THE FIGURES

[0090] FIG. 1 is an image that illustrates a double-stranded hairpin structure utilized as a headpiece oligonucleotide that offers sites for both chemical ligation of encoding oligonucleotide tags and a protected primary amine for the synthesis of a covalently attached encoded small-molecule.

[0091] FIG. 2 is an image of a gel illustrating the progress of an exemplary ligation reaction.

[0092] FIG. 3 is an image of two LCMS traces illustrating the progress of an exemplary ligation reaction.

[0093] FIG. 4A is an image illustrating a deprotection reaction of a protected amine.

[0094] FIG. 4B is an image of a gel illustrating the progress of the deprotection reaction.

[0095] FIG. 4C is an image of a LCMS trace illustrating the progress of the deprotection reaction.

[0096] FIG. 5A is an image of a mass spectrum of the product of the reaction of HP006 with 1-cyanoimidazole.

[0097] FIG. 5B is an image illustrating the reaction of HP006 with 1-cyanoimidazole.

DETAILED DESCRIPTION

Encoded Chemical Entities

[0098] This invention features methods of producing encoded chemical entities including a chemical entity, one or more tags, and a headpiece operatively associated with the first chemical entity and one or more tags. The chemical entities, headpieces, tags, linkages, and bifunctional spacers are further described below.

Chemical Entities

[0099] The chemical entities or members (e.g., small molecules or peptides) of the invention can include one or more building blocks and optionally include one or more scaffolds.

[0100] The scaffold S can be a single atom or a molecular scaffold. Exemplary single atom scaffolds include a carbon atom, a boron atom, a nitrogen atom, or a phosphorus atom, etc. Exemplary polyatomic scaffolds include a cycloalkyl group, a cycloalkenyl group, a heterocycloalkyl group, a heterocycloalkenyl group, an aryl group, or a heteroaryl group. Particular embodiments of a heteroaryl scaffold include a triazine, such as 1,3,5-triazine, 1,2,3-triazine, or 1,2,4-triazine; a pyrimidine; a pyrazine; a pyridazine; a furan; a pyrrole; a pyrroline; a pyrrolidine; an oxazole; a pyrazole; an isoxazole; a pyran; a pyridine; an indole; an indazole; or a purine.

[0101] The scaffold S can be operatively linked to the tag by any useful method. In one example, S is a triazine that is linked directly to the headpiece. To obtain this exemplary scaffold, trichlorotriazine (i.e., a chlorinated precursor of triazine having three chlorines) is reacted with a nucleophilic group of the headpiece. Using this method, S has three positions having chlorine that are available for substitution, where two positions are available diversity nodes and one position is attached to the headpiece. Next, building block A.sub.n is added to a diversity node of the scaffold, and tag A.sub.n encoding for building block A.sub.n (“tag A.sub.n”) is ligated to the headpiece, where these two steps can be performed in any order. Then, building block B.sub.n is added to the remaining diversity node, and tag B.sub.n encoding for building block B.sub.n is ligated to the end of tag A.sub.n. In another example, S is a triazine that is operatively linked to a tag, where trichlorotriazine is reacted with a nucleophilic group (e.g., an amino group) of a PEG, aliphatic, or aromatic linker of a tag. Building blocks and associated tags can be added, as described above.

[0102] In yet another example, S is a triazine that is operatively linked to building block A.sub.n. To obtain this scaffold, building block A.sub.n having two diversity nodes (e.g., an electrophilic group and a nucleophilic group, such as an Fmoc-amino acid) is reacted with the nucleophilic group of a linker (e.g., the terminal group of a PEG, aliphatic, or aromatic linker, which is attached to a headpiece). Then, trichlorotriazine is reacted with a nucleophilic group of building block A.sub.n. Using this method, all three chlorine positions of S are used as diversity nodes for building blocks. As described herein, additional building blocks and tags can be added, and additional scaffolds S.sub.n can be added.
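The two-cycle scheme described above, in which building block A.sub.n is paired with tag A.sub.n and building block B.sub.n with tag B.sub.n, can be illustrated with a minimal sketch. The building-block names, tag sequences, and headpiece sequence below are hypothetical placeholders chosen only for illustration and are not taken from this disclosure; the sketch simply shows how the concatenated tags record which combination of building blocks was installed on the scaffold.

```python
from itertools import product

# Hypothetical building blocks and their encoding tags (illustrative only).
cycle_a = {"A1": "ACGTAC", "A2": "TGCATG"}
cycle_b = {"B1": "GGATCC", "B2": "CCTAGG", "B3": "AATTGG"}
headpiece_seq = "GCGC"  # hypothetical headpiece sequence


def enumerate_library(headpiece, *cycles):
    """Map each encoding sequence to the building blocks it records.

    Each cycle is a dict of building-block name -> tag sequence. Tags are
    appended to the headpiece in cycle order, mirroring ligation of tag A
    to the headpiece and tag B to the end of tag A.
    """
    members = {}
    for combo in product(*(cycle.items() for cycle in cycles)):
        blocks = tuple(name for name, _ in combo)
        encoding = headpiece + "".join(tag for _, tag in combo)
        members[encoding] = blocks
    return members


library = enumerate_library(headpiece_seq, cycle_a, cycle_b)
```

Under these assumptions the library contains one member per combination of building blocks, and reading the tag region of any member recovers its synthesis history.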

[0103] Exemplary building block A.sub.n's include, e.g., amino acids (e.g., alpha-, beta-, gamma-, delta-, and epsilon-amino acids, as well as derivatives of natural and unnatural amino acids), chemical-reactive reactants (e.g., azide or alkyne chains) with an amine, or a thiol reactant, or combinations thereof. The choice of building block A.sub.n depends on, for example, the nature of the reactive group used in the linker, the nature of a scaffold moiety, and the solvent used for the chemical synthesis.

[0104] Exemplary building block B.sub.n's and C.sub.n's include any useful structural unit of a chemical entity, such as optionally substituted aromatic groups (e.g., optionally substituted phenyl or benzyl), optionally substituted heterocyclyl groups (e.g., optionally substituted quinolinyl, isoquinolinyl, indolyl, isoindolyl, azaindolyl, benzimidazolyl, azabenzimidazolyl, benzisoxazolyl, pyridinyl, piperidyl, or pyrrolidinyl), optionally substituted alkyl groups (e.g., optionally substituted linear or branched C.sub.1-6 alkyl groups or optionally substituted C.sub.1-6 aminoalkyl groups), or optionally substituted carbocyclyl groups (e.g., optionally substituted cyclopropyl, cyclohexyl, or cyclohexenyl). Particularly useful building block B.sub.n's and C.sub.n's include those with one or more reactive groups, such as an optionally substituted group (e.g., any described herein) having one or more optional substituents that are reactive groups or can be chemically modified to form reactive groups. Exemplary reactive groups include one or more of amine (—NR.sub.2, where each R is, independently, H or an optionally substituted C.sub.1-6 alkyl), hydroxy, alkoxy (—OR, where R is an optionally substituted C.sub.1-6 alkyl, such as methoxy), carboxy (—COOH), amide, or chemical-reactive substituents. A restriction site may be introduced, for example, in tag B.sub.n or C.sub.n, where a complex can be identified by performing PCR and restriction digest with one of the corresponding restriction enzymes.

Site for Reversible Immobilization

[0105] In some embodiments, the encoded chemical entities optionally include a site for reversible immobilization. Reversible immobilization can be utilized to facilitate buffer-exchange and reagent/contaminant removal during the split-and-mix synthesis of encoded libraries. For example, after a chemical reaction to add a building block to the first chemical entity, the complex may be reversibly immobilized. The excess reagents and solvents may then be removed, the reagents and solvents for the ligation reaction added, and then the complex may be detached from the support. This method incorporates the benefits of solid supported synthesis, such as ease of purification and/or removal of solvents and reagents incompatible with subsequent steps, while allowing the steps that are used to construct the library and oligonucleotide tags to be performed in solution or alternatively while the nascent library is reversibly immobilized.

[0106] Exemplary reversible immobilization strategies include: oligonucleotide hybridization including substituted oligonucleotides (2′-modified, PNA, LNA, etc.), including double and triple-stranded; oligonucleotide-ion exchange interactions (e.g., with DEAE-Cellulose); small-molecule-small molecule interactions (e.g., adamantane-cyclodextrin); reversible chemistry (e.g., disulfide bond formation); reversible photochemistry (e.g., cyanovinyl uridine photo-cross-linking); reversible chemical cross-linking (e.g., with an exogenously added reactive entity); immobilized metal affinity chromatography (e.g., immobilized Ni-NTA with His.sub.6); antibody-epitope interaction (e.g., immobilized anti-FLAG antibody and FLAG peptide); protein-protein interaction; protein-small-molecule interaction (e.g., immobilized streptavidin with iminobiotin or immobilized maltose-binding protein and maltose); reversible oligonucleotide ligation (e.g., the ligation of restricted dsDNA followed by restriction); and hydrophobic interaction (e.g., a fluorous tag and a hydrophobic surface). In some embodiments, the site for reversible immobilization comprises one member of a binding pair of any of the reversible immobilization strategies described herein, e.g., a nucleic acid, peptide, or small molecule.

[0107] Headpiece

[0108] In an encoded chemical entity, the headpiece operatively links each chemical entity to its encoding oligonucleotide tag. Generally, the headpiece is a starting oligonucleotide having at least two functional groups that can be further derivatized, where the first functional group operatively links the first chemical entity (or a component thereof) to the headpiece and the second functional group operatively links one or more tags to the headpiece. A bifunctional spacer can optionally be used as a spacing moiety between the headpiece and a chemical entity.

[0109] The functional groups of the headpiece can be used to form a covalent bond with a component of a chemical entity and another covalent bond with a tag. The component can be any part of the small molecule, such as a scaffold having diversity nodes or a building block. Alternatively, the headpiece can be derivatized to provide a spacer (e.g., a spacing moiety separating the headpiece from the small molecule to be formed in the library) terminating in a functional group (e.g., a hydroxyl, amine, carboxyl, sulfhydryl, alkynyl, azido, or phosphate group), which is used to form the covalent linkage with a component of the chemical entity. The spacer can be attached to the 5′-terminus, at one of the internal positions, or to the 3′-terminus of the headpiece. When the spacer is attached to one of the internal positions, the spacer can be operatively linked to a derivatized base (e.g., the C5 position of uridine) or placed internally within the oligonucleotide using standard techniques known in the art. Exemplary spacers are described herein.

[0110] The headpiece can have any useful structure. The headpiece can be, e.g., 1 to 100 nucleotides in length, preferably 5 to 20 nucleotides in length, and most preferably 5 to 15 nucleotides in length. The headpiece can be single-stranded or double-stranded and can consist of natural or modified nucleotides, as described herein. For example, the chemical moiety can be operatively linked to the 3′-terminus or 5′-terminus of the headpiece. In particular embodiments, the headpiece includes a hairpin structure formed by complementary bases within the sequence. For example, the chemical moiety can be operatively linked to the internal position, the 3′-terminus, or the 5′-terminus of the headpiece.

[0111] Generally, the headpiece includes a non-self-complementary sequence on the 5′- or 3′-terminus that allows for binding an oligonucleotide tag by polymerization, enzymatic ligation, or chemical reaction. The headpiece can allow for ligation of oligonucleotide tags and optional purification and phosphorylation steps. After the addition of the last tag, an additional adapter sequence can be added to the 5′-terminus of the last tag. Exemplary adapter sequences include a primer-binding sequence or a sequence having a label (e.g., biotin). In cases where many building blocks and corresponding tags are used (e.g., 100), a mix-and-split strategy may be employed during the oligonucleotide synthesis step to create the necessary number of tags. Such mix-and-split strategies for DNA synthesis are known in the art. The resultant library members can be amplified by PCR following selection for binding entities versus a target(s) of interest.

[0112] The headpiece or the complex can optionally include one or more primer-binding sequences. For example, the headpiece has a sequence in the loop region of the hairpin that serves as a primer-binding region for amplification, where the primer-binding region has a higher melting temperature for its complementary primer (e.g., which can include flanking identifier regions) than for a sequence in the headpiece. In other embodiments, the complex includes two primer-binding sequences (e.g., to enable a PCR reaction) on either side of one or more tags that encode one or more building blocks. Alternatively, the headpiece may contain one primer-binding sequence on the 5′- or 3′-terminus. In other embodiments, the headpiece is a hairpin, and the loop region forms a primer-binding site or the primer-binding site is introduced through hybridization of an oligonucleotide to the headpiece on the 3′ side of the loop. A primer oligonucleotide, containing a region homologous to the 3′-terminus of the headpiece and carrying a primer-binding region on its 5′-terminus (e.g., to enable a PCR reaction) may be hybridized to the headpiece and may contain a tag that encodes a building block or the addition of a building block. The primer oligonucleotide may contain additional information, such as a region of randomized nucleotides, e.g., 2 to 16 nucleotides in length, which is included for bioinformatics analysis.

[0113] The headpiece can optionally include a hairpin structure, where this structure can be achieved by any useful method. For example, the headpiece can include complementary bases that form intramolecular base pairing partners, such as by Watson-Crick DNA base pairing (e.g., adenine-thymine and guanine-cytosine) and/or by wobble base pairing (e.g., guanine-uracil, inosine-uracil, inosine-adenine, and inosine-cytosine). In another example, the headpiece can include modified or substituted nucleotides that can form higher affinity duplex formations compared to unmodified nucleotides, such modified or substituted nucleotides being known in the art. In yet another example, the headpiece includes one or more cross-linked bases to form the hairpin structure. For example, bases within a single strand or bases in different double strands can be cross-linked, e.g., by using psoralen.
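As a rough illustration of a hairpin formed by complementary bases within a single sequence, the following sketch checks whether the two ends of a candidate headpiece can pair to form a stem. It assumes Watson-Crick pairing only (wobble pairs, modified nucleotides, and cross-links are not modeled), and the example sequence and stem length are hypothetical.

```python
# Watson-Crick DNA complements only; wobble pairing is not modeled here.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_form_stem(seq, stem_length):
    """True if the 3' arm is the reverse complement of the 5' arm."""
    five_prime_arm = seq[:stem_length]
    three_prime_arm = seq[-stem_length:]
    return three_prime_arm == reverse_complement(five_prime_arm)


# Hypothetical headpiece: 6-nt stem arms separated by a 4-nt loop.
hairpin = "GCGATC" + "TTTT" + "GATCGC"
stem_ok = can_form_stem(hairpin, 6)
```

A check of this kind says nothing about thermodynamic stability; it only confirms that the two arms are perfectly complementary in the antiparallel orientation.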

[0114] The headpiece or complex can optionally include one or more labels that allow for detection. For example, the headpiece, one or more oligonucleotide tags, and/or one or more primer sequences can include an isotope, a radioimaging agent, a marker, a tracer, a fluorescent label (e.g., rhodamine or fluorescein), a chemiluminescent label, a quantum dot, or a reporter molecule (e.g., biotin or a his-tag).

[0115] In other embodiments, the headpiece or tag may be modified to support solubility in semi-, reduced-, or non-aqueous (e.g., organic) conditions. Nucleotide bases of the headpiece or tag can be rendered more hydrophobic by modifying, for example, the C5 positions of T or C bases with aliphatic chains without significantly disrupting their ability to hydrogen bond to their complementary bases. Exemplary modified or substituted nucleotides are 5′-dimethoxytrityl-N4-diisobutylaminomethylidene-5-(1-propynyl)-2′-deoxycytidine,3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 5′-dimethoxytrityl-5-(1-propynyl)-2′-deoxyuridine,3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 5′-dimethoxytrityl-5-fluoro-2′-deoxyuridine,3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; and 5′-dimethoxytrityl-5-(pyren-1-yl-ethynyl)-2′-deoxyuridine,3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite.

[0116] In addition, the headpiece oligonucleotide can be interspersed with modifications that promote solubility in organic solvents. For example, azobenzene phosphoramidite can introduce a hydrophobic moiety into the headpiece design. Such insertions of hydrophobic amidites into the headpiece can occur anywhere in the molecule. However, the insertion cannot interfere with subsequent tagging using additional DNA tags during the library synthesis or ensuing PCR once a selection is complete or microarray analysis, if used for tag deconvolution. Such additions to the headpiece design described herein would render the headpiece soluble in, for example, 15%, 25%, 30%, 50%, 75%, 90%, 95%, 98%, 99%, or 100% organic solvent. Thus, addition of hydrophobic residues into the headpiece design allows for improved solubility in semi- or non-aqueous (e.g., organic) conditions, while rendering the headpiece competent for oligonucleotide tagging. Furthermore, DNA tags that are subsequently introduced into the library can also be modified at the C5 position of T or C bases such that they also render the library more hydrophobic and soluble in organic solvents for subsequent steps of library synthesis.

[0117] In particular embodiments, the headpiece and the first tag can be the same entity, i.e., a plurality of headpiece-tag entities can be constructed that all share common parts (e.g., a primer-binding region) and all differ in another part (e.g., encoding region). These may be utilized in the “split” step and pooled after the event they are encoding has occurred.

[0118] In particular embodiments, the headpiece can encode information, e.g., by including a sequence that encodes the first split(s) step or a sequence that encodes the identity of the library, such as by using a particular sequence related to a specific library.

[0119] Oligonucleotide Tags

[0120] The oligonucleotide tags described herein (e.g., a tag or a portion of a headpiece or a portion of a tailpiece) can be used to encode any useful information, such as a molecule, a portion of a chemical entity, the addition of a component (e.g., a scaffold or a building block), a headpiece in the library, the identity of the library, the use of one or more library members (e.g., use of the members in an aliquot of a library), and/or the origin of a library member (e.g., by use of an origin sequence).

[0121] Any sequence in an oligonucleotide can be used to encode any information. Thus, one oligonucleotide sequence can serve more than one purpose, such as to encode two or more types of information or to provide a starting oligonucleotide that also encodes for one or more types of information. For example, the first tag can encode for the addition of a first building block, as well as for the identification of the library. In another example, a headpiece can be used to provide a starting oligonucleotide that operatively links a chemical entity to a tag, where the headpiece additionally includes a sequence that encodes for the identity of the library (i.e., the library-identifying sequence). Accordingly, any of the information described herein can be encoded in separate oligonucleotide tags or can be combined and encoded in the same oligonucleotide sequence (e.g., an oligonucleotide tag, such as a tag, or a headpiece).

[0122] A building block sequence encodes for the identity of a building block and/or the type of binding reaction conducted with a building block. This building block sequence is included in a tag, where the tag can optionally include one or more types of sequence described below (e.g., a library-identifying sequence, a use sequence, and/or an origin sequence).

[0123] A library-identifying sequence encodes for the identity of a particular library. In order to permit mixing of two or more libraries, a library member may contain one or more library-identifying sequences, such as in a library-identifying tag (i.e., an oligonucleotide including a library-identifying sequence), in a ligated tag, in a part of the headpiece sequence, or in a tailpiece sequence. These library-identifying sequences can be used to deduce encoding relationships, where the sequence of the tag is translated and correlated with chemical (synthesis) history information. Accordingly, these library-identifying sequences permit the mixing of two or more libraries together for selection, amplification, purification, sequencing, etc.
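A minimal sketch of how library-identifying sequences could be used to sort sequencing reads from mixed libraries is given below. The identifier sequences, their length, and their placement at the start of each read are assumptions made for illustration; as noted above, the library-identifying sequence may instead reside in a ligated tag, in the headpiece, or in a tailpiece.

```python
# Hypothetical library-identifying sequences (illustrative only).
LIBRARY_IDS = {"AAGG": "library_1", "CCTT": "library_2"}


def assign_library(read, id_length=4):
    """Assign a read to a library from its leading identifier sequence.

    Assumes the identifier occupies the first `id_length` bases of the read.
    Reads with an unrecognized identifier are labeled "unknown".
    """
    return LIBRARY_IDS.get(read[:id_length], "unknown")


reads = ["AAGGACGTAC", "CCTTGGATCC", "GGGGTTTTAA"]
assignments = [assign_library(read) for read in reads]
```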

[0124] A use sequence encodes the history (i.e., use) of one or more library members in an individual aliquot of a library. For example, separate aliquots may be treated with different reaction conditions, building blocks, and/or selection steps. In particular, this sequence may be used to identify such aliquots and deduce their history (use), thereby permitting aliquots of the same library with different histories (uses) (e.g., distinct selection experiments) to be mixed together for selection, amplification, purification, sequencing, etc. These use sequences can be included in a headpiece, a tailpiece, a tag, a use tag (i.e., an oligonucleotide including a use sequence), or any other tag described herein (e.g., a library-identifying tag or an origin tag).

[0125] An origin sequence is a degenerate (random, stochastically-generated) oligonucleotide sequence of any useful length (e.g., about six nucleotides) that encodes for the origin of the library member. This sequence serves to stochastically subdivide library members that are otherwise identical in all respects into entities distinguishable by sequence information, such that observations of amplification products derived from unique progenitor templates (e.g., selected library members) can be distinguished from observations of multiple amplification products derived from the same progenitor template (e.g., a selected library member). For example, after library formation and prior to the selection step, each library member can include a different origin sequence, such as in an origin tag. After selection, selected library members can be amplified to produce amplification products, and the portion of the library member expected to include the origin sequence (e.g., in the origin tag) can be observed and compared with the origin sequence in each of the other library members. As the origin sequences are degenerate, each amplification product of each library member should have a different origin sequence. However, an observation of the same origin sequence in the amplification product could indicate multiple amplicons derived from the same template molecule. When it is desired to determine the statistics and demographics of the population of encoding tags prior to amplification, as opposed to post-amplification, the origin tag may be used. These origin sequences can be included in a headpiece, a tailpiece, a tag, an origin tag (i.e., an oligonucleotide including an origin sequence), or any other tag described herein (e.g., a library-identifying tag or a use tag).
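The use of origin sequences to distinguish unique progenitor templates from repeated amplification products can be sketched as follows. The read representation (a pair of an encoding-tag string and an origin sequence) and the example values are hypothetical; the sketch only illustrates the counting logic, in which reads sharing both the encoding tags and the origin sequence are treated as amplicons of a single template.

```python
from collections import defaultdict


def count_unique_templates(reads):
    """Count distinct origin sequences observed for each set of encoding tags.

    `reads` is an iterable of (encoding_tags, origin_sequence) pairs. Reads
    with identical tags and identical origin sequence are collapsed, since
    they likely derive from the same progenitor template.
    """
    origins_by_tags = defaultdict(set)
    for encoding_tags, origin_sequence in reads:
        origins_by_tags[encoding_tags].add(origin_sequence)
    return {tags: len(origins) for tags, origins in origins_by_tags.items()}


# Hypothetical post-amplification observations.
reads = [
    ("TAG_A1-TAG_B7", "ACGTCA"),
    ("TAG_A1-TAG_B7", "ACGTCA"),  # same origin: likely an amplification duplicate
    ("TAG_A1-TAG_B7", "TTGACG"),
    ("TAG_A2-TAG_B3", "GGATCC"),
]
unique_counts = count_unique_templates(reads)
```

In this example three reads carry the tags TAG_A1-TAG_B7, but only two distinct origin sequences are seen, so the pre-amplification count for that library member is estimated as two.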

[0126] Any of the types of sequences described herein can be included in the headpiece. For example, the headpiece can include one or more of a building block sequence, a library-identifying sequence, a use sequence, or an origin sequence.

[0127] Any of these sequences described herein can be included in a tailpiece. For example, the tailpiece can include one or more of a library-identifying sequence, a use sequence, or an origin sequence.

[0128] Any of the tags described herein can include a connector at or in proximity to the 5′- or 3′-terminus having a fixed sequence. Connectors facilitate the formation of linkages (e.g., chemical linkages) by providing a reactive group (e.g., a chemical-reactive group or a photo-reactive group) or by providing a site for an agent that allows for a linkage (e.g., an agent of an intercalating moiety or a reversible reactive group in the connector(s) or cross-linking oligonucleotide). Each 5′-connector may be the same or different, and each 3′-connector may be the same or different. In an exemplary, non-limiting complex having more than one tag, each tag can include a 5′-connector and a 3′-connector, where each 5′-connector has the same sequence and each 3′-connector has the same sequence (e.g., where the sequence of the 5′-connector can be the same or different from the sequence of the 3′-connector). The connector provides a sequence that can be used for one or more linkages. To allow for binding of a relay primer or for hybridizing a cross-linking oligonucleotide, the connector can include one or more functional groups allowing for a linkage (e.g., a linkage for which a polymerase has reduced ability to read or translocate through, such as a chemical linkage).

[0129] These sequences can include any modification described herein for oligonucleotides, such as one or more modifications that promote solubility in organic solvents (e.g., any described herein, such as for the headpiece), that provide an analog of the natural phosphodiester linkage (e.g., a phosphorothioate analog), or that provide one or more non-natural oligonucleotides (e.g., 2′-substituted nucleotides, such as 2′-O-methylated nucleotides and 2′-fluoro nucleotides, or any described herein).

[0130] These sequences can include any characteristics described herein for oligonucleotides. For example, these sequences can be included in a tag that is less than 20 nucleotides (e.g., as described herein). In other examples, the tags including one or more of these sequences have about the same mass (e.g., each tag has a mass that is about +/−10% from the average mass within a specific set of tags that encode a specific variable); lack a primer-binding (e.g., constant) region; lack a constant region; or have a constant region of reduced length (e.g., a length less than 30 nucleotides, less than 25 nucleotides, less than 20 nucleotides, less than 19 nucleotides, less than 18 nucleotides, less than 17 nucleotides, less than 16 nucleotides, less than 15 nucleotides, less than 14 nucleotides, less than 13 nucleotides, less than 12 nucleotides, less than 11 nucleotides, less than 10 nucleotides, less than 9 nucleotides, less than 8 nucleotides, or less than 7 nucleotides).
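The "about the same mass" criterion can be expressed as a simple check that each tag mass lies within a fractional tolerance of the average mass of its set. The sketch below takes tag masses as inputs rather than computing them from sequence; the example masses are hypothetical and the 10% tolerance follows the exemplary value given above.

```python
def tags_within_mass_tolerance(masses, tolerance=0.10):
    """Flag, per tag, whether its mass is within `tolerance` of the set average.

    `masses` is a list of tag masses (any consistent unit) for one set of
    tags that encode the same variable.
    """
    mean_mass = sum(masses) / len(masses)
    return [abs(mass - mean_mass) <= tolerance * mean_mass for mass in masses]


# Hypothetical tag masses in daltons.
tag_masses = [3000.0, 3100.0, 2950.0, 3600.0]
within_tolerance = tags_within_mass_tolerance(tag_masses)
```

Here the average is 3162.5, so the allowed deviation is 316.25, and only the last tag falls outside the window.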

[0131] Sequencing strategies for libraries and oligonucleotides of this length may optionally include concatenation or catenation strategies to increase read fidelity or sequencing depth, respectively. In particular, the selection of encoded libraries that lack primer-binding regions has been described in the literature for SELEX, such as described in Jarosch et al., Nucleic Acids Res. 34: e86 (2006), which is incorporated herein by reference. For example, a library member can be modified (e.g., after a selection step) to include a first adapter sequence on the 5′-terminus of the complex and a second adapter sequence on the 3′-terminus of the complex, where the first sequence is substantially complementary to the second sequence and results in forming a duplex. To further improve yield, two fixed dangling nucleotides (e.g., CC) are added to the 5′-terminus.

[0132] Linkages

[0133] The linkages of the invention are present between oligonucleotides that encode information (e.g., between the headpiece and a tag, between two tags, or between a tag and a tailpiece). Exemplary linkages include phosphodiesters, phosphonates, and phosphorothioates. In some embodiments, a polymerase has reduced ability to read or translocate through one or more linkages. In certain embodiments, chemical linkages include one or more of a chemical-reactive group such as a monophosphate and/or a hydroxyl group, a photo-reactive group, an intercalating moiety, a cross-linking oligonucleotide, or a reversible co-reactive group.

[0134] A linkage may be tested to determine whether a polymerase has reduced ability to read or translocate through that linkage. This ability can be tested by any useful method, such as liquid chromatography-mass spectrometry, RT-PCR analysis, sequence demographics, and/or PCR analysis.

[0135] In some embodiments, chemical ligation includes the use of one or more chemical-reactive pairs to provide a linkage, such as a monophosphate and a hydroxyl. As described herein, readable linkages may be synthesized by chemical ligation, for example, by reaction of a monophosphate, a monophosphorothioate, or a monophosphonate on a 5′- or 3′-terminus with a hydroxyl group on a 5′- or 3′-terminus in the presence of cyanoimidazole and a divalent metal source (e.g., ZnCl.sub.2).

[0136] Other exemplary chemical-reactive pairs are a pair including an optionally substituted alkynyl group and an optionally substituted azido group to form a triazole via a Huisgen 1,3-dipolar cycloaddition reaction; an optionally substituted diene having a 4π-electron system (e.g., an optionally substituted 1,3-unsaturated compound, such as optionally substituted 1,3-butadiene, 1-methoxy-3-trimethylsilyloxy-1,3-butadiene, cyclopentadiene, cyclohexadiene, or furan) and an optionally substituted dienophile or an optionally substituted heterodienophile having a 2π-electron system (e.g., an optionally substituted alkenyl group or an optionally substituted alkynyl group) to form a cycloalkenyl via a Diels-Alder reaction; a nucleophile (e.g., an optionally substituted amine or an optionally substituted thiol) with a strained heterocyclyl electrophile (e.g., optionally substituted epoxide, aziridine, aziridinium ion, or episulfonium ion) to form a heteroalkyl via a ring opening reaction; a phosphorothioate group with an iodo group, such as in a splinted ligation of an oligonucleotide containing 5′-iodo dT with a 3′-phosphorothioate oligonucleotide; an optionally substituted amino group with an aldehyde group or a ketone group, such as a reaction of a 3′-aldehyde-modified oligonucleotide, which can optionally be obtained by oxidizing a commercially available 3′-glyceryl-modified oligonucleotide, with a 5′-amino oligonucleotide (i.e., in a reductive amination reaction) or a 5′-hydrazido oligonucleotide; a pair of an optionally substituted amino group and a carboxylic acid group or a thiol group (e.g., with or without the use of succinimidyl trans-4-(maleimidylmethyl)cyclohexane-1-carboxylate (SMCC) or 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide (EDAC)); a pair of an optionally substituted hydrazine and an aldehyde or a ketone group; a pair of an optionally substituted hydroxylamine and an aldehyde or a ketone group; or a pair of a nucleophile and an optionally substituted alkyl halide.

[0137] Platinum complexes, alkylating agents, or furan-modified nucleotides can also be used as a chemical-reactive group to form inter- or intra-strand linkages. Such agents can be used between two oligonucleotides and can optionally be present in the cross-linking oligonucleotide.

[0138] Exemplary, non-limiting platinum complexes include cisplatin (cis-diamminedichloroplatinum (II), e.g., to form GG intra-strand linkages), transplatin (trans-diamminedichloroplatinum (II), e.g., to form GXG inter-strand linkages, where X can be any nucleotide), carboplatin, picoplatin (ZD0473), ormaplatin, or oxaliplatin to form, e.g., GC, CG, AG, or GG linkages. Any of these linkages can be inter- or intra-strand linkages.

[0139] Exemplary, non-limiting alkylating agents include nitrogen mustard (mechlorethamine, e.g., to form GG linkages), chlorambucil, melphalan, cyclophosphamide, prodrug forms of cyclophosphamide (e.g., 4-hydroperoxycyclophosphamide and ifosfamide), 1,3-bis(2-chloroethyl)-1-nitrosourea (BCNU, carmustine), an aziridine (e.g., mitomycin C, triethylenemelamine, or triethylenethiophosphoramide (thio-tepa) to form GG or AG linkages), hexamethylmelamine, an alkyl sulfonate (e.g., busulphan to form GG linkages), or a nitrosourea (e.g., 2-chloroethylnitrosourea to form GG or CG linkages, such as carmustine (BCNU), chlorozotocin, lomustine (CCNU), and semustine (methyl-CCNU)). Any of these linkages can be inter- or intra-strand linkages.

[0140] Furan-modified nucleotides can also be used to form linkages. Upon in situ oxidation (e.g., with N-bromosuccinimide (NBS)), the furan moiety forms a reactive oxo-enal derivative that reacts with a complementary base to form an inter-strand linkage. In some embodiments, the furan-modified nucleotides form linkages with a complementary A or C nucleotide. Exemplary, non-limiting furan-modified nucleotides include any 2′-(furan-2-yl)propanoylamino-modified nucleotide or an acyclic, modified nucleotide of 2-(furan-2-yl)ethyl glycol nucleic acid.

[0141] Photo-reactive groups can also be used as a reactive group. Exemplary, non-limiting photo-reactive groups include an intercalating moiety, a psoralen derivative (e.g., psoralen, HMT-psoralen, or 8-methoxypsoralen), an optionally substituted cyanovinylcarbazole group, an optionally substituted vinylcarbazole group, an optionally substituted cyanovinyl group, an optionally substituted acrylamide group, an optionally substituted diazirine group, an optionally substituted benzophenone (e.g., succinimidyl ester of 4-benzoylbenzoic acid or benzophenone isothiocyanate), an optionally substituted 5-(carboxy)vinyl-uridine group (e.g., 5-(carboxy)vinyl-2′-deoxyuridine), or an optionally substituted azide group (e.g., an aryl azide or a halogenated aryl azide, such as succinimidyl ester of 4-azido-2,3,5,6-tetrafluorobenzoic acid (ATFB)).

[0142] Intercalating moieties can also be used as a reactive group. Exemplary, non-limiting intercalating moieties include a psoralen derivative, an alkaloid derivative (e.g., berberine, palmatine, coralyne, sanguinarine (e.g., iminium or alkanolamine forms thereof), or aristololactam-β-D-glucoside), an ethidium cation (e.g., ethidium bromide), an acridine derivative (e.g., proflavine, acriflavine, or amsacrine), an anthracycline derivative (e.g., doxorubicin, epirubicin, daunorubicin (daunomycin), idarubicin, and aclarubicin), or thalidomide.

[0143] For a cross-linking oligonucleotide, any useful reactive group (e.g., described herein) can be used to form inter- or intra-strand linkages. Exemplary reactive groups include a chemical-reactive group, a photo-reactive group, an intercalating moiety, and a reversible co-reactive group. Cross-linking agents for use with cross-linking oligonucleotides include, without limitation, alkylating agents (e.g., as described herein), cisplatin (cis-diamminedichloroplatinum(II)), trans-diamminedichloroplatinum(II), psoralen, HMT-psoralen, 8-methoxypsoralen, furan-modified nucleotides, 2-fluoro-deoxyinosine (2-F-dI), 5-bromo-deoxycytosine (5-Br-dC), 5-bromo deoxyuridine (5-Br-dU), 5-iodo-deoxycytosine (5-I-dC), 5-iodo-deoxyuridine (5-I-dU), succinimidyl trans-4-(maleimidylmethyl)cyclohexane-1-carboxylate (SMCC), EDAC, or succinimidyl acetylthioacetate (SATA).

[0144] Oligonucleotides can also be modified to contain thiol moieties that can be reacted with a variety of thiol reactive groups such as maleimides, halogens, and iodoacetamides and thus can be used for cross-linking two oligonucleotides. The thiol groups can be linked to the 5′- or the 3′-terminus of an oligonucleotide.

[0145] For inter-strand cross-linking between duplex oligonucleotides at a pyrimidine (e.g., thymidine) position, the intercalating, photo-reactive moiety psoralen can be chosen. Psoralen intercalates into the duplex and forms covalent inter-strand cross-links with pyrimidines, preferentially at 5′-TpA sites, upon irradiation with ultraviolet light (about 254 nm). The psoralen moiety can be covalently attached to a modified oligonucleotide (e.g., by an alkane chain, such as a C.sub.1-10 alkyl, or a polyethylene glycol group, such as —(CH.sub.2CH.sub.2O).sub.nCH.sub.2CH.sub.2—, where n is an integer from 1 to 50). Psoralen derivatives can also be used, where exemplary, non-limiting derivatives include 4′-(hydroxyethoxymethyl)-4,5′,8-trimethylpsoralen (HMT-psoralen) and 8-methoxypsoralen.

[0146] Various portions of the cross-linking oligonucleotide can be modified to introduce a linkage. For example, terminal phosphorothioates in oligonucleotides can also be used for linking two adjacent oligonucleotides. Halogenated uracils/cytosines can also be used as cross-linker modifications in the oligonucleotide. For example, 2-fluoro-deoxyinosine (2-F-dI) modified oligonucleotides can be reacted with disulfide-containing diamines or thiopropylamines to form disulfide linkages.

[0147] As described below, reversible co-reactive groups include those selected from a cyanovinylcarbazole group, a cyanovinyl group, an acrylamide group, a thiol group, or a sulfonylethyl thioether. An optionally substituted cyanovinylcarbazole (CNV) group can also be used in oligonucleotides to cross-link to a pyrimidine base (e.g., cytosine, thymine, and uracil, as well as modified bases thereof) in complementary strands. CNV groups promote [2+2] cycloaddition with the adjacent pyrimidine base upon irradiation at 366 nm, which results in an inter-strand cross-link. Irradiation at 312 nm reverses the cross-link and thus provides a method for reversible cross-linking of oligonucleotide strands. A non-limiting CNV group is 3-cyanovinylcarbazole, which can be included as a carboxyvinylcarbazole nucleotide (e.g., as 3-carboxyvinylcarbazole-1′-β-deoxyriboside-5′-triphosphate).

[0148] The CNV group can be modified to replace the reactive cyano group with another reactive group to provide an optionally substituted vinylcarbazole group. Exemplary non-limiting reactive groups for a vinylcarbazole group include an amide group of —CONR.sub.N1R.sub.N2, where each R.sub.N1 and R.sub.N2 can be the same or different and is independently H or C.sub.1-6 alkyl, e.g., —CONH.sub.2; a carboxyl group of —CO.sub.2H; or a C.sub.2-7 alkoxycarbonyl group (e.g., methoxycarbonyl). Furthermore, the reactive group can be located on the alpha or beta carbon of the vinyl group. Exemplary vinylcarbazole groups include a cyanovinylcarbazole group, as described herein; an amidovinylcarbazole group (e.g., an amidovinylcarbazole nucleotide, such as 3-amidovinylcarbazole-1′-β-deoxyriboside-5′-triphosphate); a carboxyvinylcarbazole group (e.g., a carboxyvinylcarbazole nucleotide, such as 3-carboxyvinylcarbazole-1′-β-deoxyriboside-5′-triphosphate); and a C.sub.2-7 alkoxycarbonylvinylcarbazole group (e.g., an alkoxycarbonylvinylcarbazole nucleotide, such as 3-methoxycarbonylvinylcarbazole-1′-β-deoxyriboside-5′-triphosphate). Additional optionally substituted vinylcarbazole groups and nucleotides having such groups are provided in the chemical formulas of U.S. Pat. No. 7,972,792 and Yoshimura and Fujimoto, Org. Lett. 10:3227-3230 (2008), which are both hereby incorporated by reference in their entirety.

[0149] Other reversible reactive groups include a thiol group and another thiol group to form a disulfide, as well as a thiol group and a vinyl sulfone group to form a sulfonylethyl thioether. Thiol-thiol groups can optionally include a linkage formed by a reaction with bis-((N-iodoacetyl)piperazinyl)sulfonerhodamine. Other reversible reactive groups (e.g., such as some photo-reactive groups) include optionally substituted benzophenone groups. A non-limiting example is benzophenone uracil (BPU), which can be used for site- and sequence-selective formation of an interstrand cross-link of BPU-containing oligonucleotide duplexes. This cross-link can be reversed upon heating, providing a method for the reversible cross-linking of two oligonucleotide strands.

[0150] In other embodiments, chemical ligation includes introducing an analog of the phosphodiester bond, e.g., for post-selection PCR analysis and sequencing. Exemplary analogs of a phosphodiester include a phosphorothioate linkage (e.g., as introduced by use of a phosphorothioate group and a leaving group, such as an iodo group), a phosphoramide linkage, or a phosphorodithioate linkage (e.g., as introduced by use of a phosphorodithioate group and a leaving group, such as an iodo group). For any of the groups described herein (e.g., a chemical-reactive group, a photo-reactive group, an intercalating moiety, a cross-linking oligonucleotide, or a reversible co-reactive group), the group can be incorporated at or in proximity to the terminus of an oligonucleotide or between the 5′- and 3′-termini. Furthermore, one or more groups can be present in each oligonucleotide. When pairs of reactive groups are required, the oligonucleotides can be designed to facilitate a reaction between the pair of groups. In the non-limiting example of a cyanovinylcarbazole group that co-reacts with a pyrimidine base, the first oligonucleotide can be designed to include the cyanovinylcarbazole group at or in proximity to the 5′-terminus. In this example, a second oligonucleotide can be designed to be complementary to the first oligonucleotide and to include the co-reactive pyrimidine base at a position that aligns with the cyanovinylcarbazole group when the first and second oligonucleotides hybridize. Any of the groups herein and any of the oligonucleotides having one or more groups can be designed to facilitate reaction between the groups to form one or more linkages.
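As a non-limiting illustration of the design consideration above, the following Python sketch locates the position in a second, antiparallel oligonucleotide that aligns with a reactive group in the first oligonucleotide and checks whether a pyrimidine occupies that position. The marker character "X" standing in for the cyanovinylcarbazole group, the function names, and the assumption of a blunt, equal-length duplex are all hypothetical simplifications.

```python
PYRIMIDINES = set("CTU")


def aligned_index(duplex_len, position):
    """Index (5'->3') in the antiparallel strand that faces `position` in the first strand.

    Assumes a blunt duplex in which both strands have length `duplex_len`.
    """
    return duplex_len - 1 - position


def faces_pyrimidine(first, second, marker="X"):
    """Return True if the base in `second` aligned with `marker` in `first` is a pyrimidine."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes strands of equal length")
    position = first.index(marker)
    return second[aligned_index(len(first), position)] in PYRIMIDINES


# Hypothetical first strand with the reactive group at its 5'-terminus.
first_strand = "XGCATGC"
second_with_t = "GCATGCT"  # 3'-terminal T faces the marker
second_with_a = "GCATGCA"  # 3'-terminal A faces the marker
```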

[0151] Bifunctional Spacers

[0152] The bifunctional spacer between the headpiece and a chemical entity can be varied to provide an appropriate spacing moiety and/or to increase the solubility of the headpiece in organic solvent. A wide variety of spacers are commercially available that can couple the headpiece with the small molecule library. The spacer typically consists of linear or branched chains and may include a C.sub.1-10 alkyl, a heteroalkyl of 1 to 10 atoms, a C.sub.2-10 alkenyl, a C.sub.2-10 alkynyl, C.sub.5-10 aryl, a cyclic or polycyclic system of 3 to 20 atoms, a phosphodiester, a peptide, an oligosaccharide, an oligonucleotide, an oligomer, a polymer, or a poly alkyl glycol (e.g., a poly ethylene glycol, such as —(CH.sub.2CH.sub.2O).sub.nCH.sub.2CH.sub.2—, where n is an integer from 1 to 50), or combinations thereof.

[0153] The bifunctional spacer may provide an appropriate spacing moiety between the headpiece and a chemical entity of the library. In certain embodiments, the bifunctional spacer includes three parts. Part 1 may be a reactive group, which forms a covalent bond with DNA, such as, e.g., a carboxylic acid, preferably activated by an N-hydroxy succinimide (NHS) ester to react with an amino group on the DNA (e.g., amino-modified dT), an amidite to modify the 5′ or 3′-terminus of a single-stranded headpiece (achieved by means of standard oligonucleotide chemistry), chemical-reactive pairs (e.g., azido-alkyne cycloaddition in the presence of Cu(I) catalyst, or any described herein), or thiol reactive groups. Part 2 may also be a reactive group, which forms a covalent bond with the chemical entity, either building block A.sub.n or a scaffold. Such a reactive group could be, e.g., an amine, a thiol, an azide, or an alkyne. Part 3 may be a chemically inert spacing moiety of variable length, introduced between Parts 1 and 2. Such a spacing moiety can be a chain of ethylene glycol units (e.g., PEGs of different lengths), an alkane, an alkene, a polyene chain, or a peptide chain. The spacer can contain branches or inserts with hydrophobic moieties (such as, e.g., benzene rings) to improve solubility of the headpiece in organic solvents, as well as fluorescent moieties (e.g., fluorescein or Cy-3) used for library detection purposes. Hydrophobic residues in the headpiece design may be varied with the spacer design to facilitate library synthesis in organic solvents. For example, the headpiece and spacer combination is designed to have appropriate residues wherein the octanol:water coefficient (P.sub.oct) is from, e.g., 1.0 to 2.5.

[0154] Spacers can be empirically selected for a given small molecule library design, such that the library can be synthesized in organic solvent, for example, in 15%, 25%, 30%, 50%, 75%, 90%, 95%, 98%, 99%, or 100% organic solvent. The spacer can be varied using model reactions prior to library synthesis to select the appropriate chain length that solubilizes the headpiece in an organic solvent. Exemplary spacers include those having increased alkyl chain length, increased poly ethylene glycol units, branched species with positive charges (to neutralize the negative phosphate charges on the headpiece), or increased amounts of hydrophobicity (for example, addition of benzene ring structures).

[0155] Examples of commercially available spacers include amino-carboxylic spacers, such as those being peptides (e.g., Z-Gly-Gly-Gly-OSu (N-alpha-benzyloxycarbonyl-(Glycine).sub.3-N-succinimidyl ester) or Z-Gly-Gly-Gly-Gly-Gly-Gly-OSu (N-alpha-benzyloxycarbonyl-(Glycine).sub.6-N-succinimidyl ester, SEQ ID NO: 1)), PEG (e.g., Fmoc-aminoPEG2000-NHS or amino-PEG (12-24)-NHS), or alkane acid chains (e.g., Boc-ε-aminocaproic acid-OSu); chemical-reactive pair spacers, such as those chemical-reactive pairs described herein in combination with a peptide moiety (e.g., azidohomoalanine-Gly-Gly-Gly-OSu (SEQ ID NO: 2) or propargylglycine-Gly-Gly-Gly-OSu (SEQ ID NO: 3)), PEG (e.g., azido-PEG-NHS), or an alkane acid chain moiety (e.g., 5-azidopentanoic acid, (S)-2-(azidomethyl)-1-Boc-pyrrolidine, 4-azidoaniline, or 4-azido-butan-1-oic acid N-hydroxysuccinimide ester); thiol-reactive spacers, such as those being PEG (e.g., SM(PEG)n NHS-PEG-maleimide) or alkane chains (e.g., 3-(pyridin-2-yldisulfanyl)-propionic acid-OSu or sulfosuccinimidyl 6-(3′-[2-pyridyldithio]-propionamido)hexanoate); and amidites for oligonucleotide synthesis, such as amino modifiers (e.g., 6-(trifluoroacetylamino)-hexyl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite), thiol modifiers (e.g., S-trityl-6-mercaptohexyl-1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite), or chemical-reactive pair modifiers (e.g., 6-hexyn-1-yl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite, 3-dimethoxytrityloxy-2-(3-(3-propargyloxypropanamido)propanamido)propyl-1-O-succinoyl, long chain alkylamino CPG, or 4-azido-butan-1-oic acid N-hydroxysuccinimide ester). Additional spacers are known in the art, and those that can be used during library synthesis include, but are not limited to, 5′-O-dimethoxytrityl-1′,2′-dideoxyribose-3′-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 9-O-dimethoxytrityl-triethylene glycol,1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; 3-(4,4′-dimethoxytrityloxy)propyl-1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite; and 18-O-dimethoxytrityl hexaethyleneglycol,1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite. Any of the spacers herein can be added in tandem to one another in different combinations to generate spacers of different desired lengths.

[0156] Spacers may also be branched, where branched spacers are well known in the art and examples can consist of symmetric or asymmetric doublers or a symmetric trebler. See, for example, Newkome et al., Dendritic Molecules: Concepts, Syntheses, Perspectives, VCH Publishers (1996); Boussif et al., Proc. Natl. Acad. Sci. USA 92:7297-7301 (1995); and Jansen et al., Science 266:1226 (1994).

[0157] Enzymatic Ligation and Chemical Ligation Techniques

[0158] Various ligation techniques can be used to add tags to the headpiece to produce a complex. Accordingly, any of the binding steps described herein can include any useful ligation techniques, such as enzymatic ligation and/or chemical ligation. These binding steps can include the addition of one or more tags to the headpiece or complex. In particular embodiments, the ligation techniques used for any oligonucleotide provide a resultant product that can be transcribed and/or reverse transcribed to allow for decoding of the library or for template-dependent polymerization with one or more DNA or RNA polymerases.

[0159] Generally, enzymatic ligation produces an oligonucleotide having a native phosphodiester bond that can be transcribed and/or reverse transcribed. Exemplary methods of enzymatic ligation are provided herein and include the use of one or more RNA or DNA ligases, such as T4 RNA ligase 1 or 2, T4 DNA ligase, CircLigase™ ssDNA ligase, CircLigase™ II ssDNA ligase, and ThermoPhage™ ssDNA ligase (Prokazyme Ltd., Reykjavik, Iceland).

[0160] Chemical ligation can also be used to produce oligonucleotides capable of being transcribed or reverse transcribed or otherwise used as a template for a template-dependent polymerase. The efficacy of a chemical ligation technique to provide oligonucleotides capable of being transcribed or reverse transcribed may need to be tested. This efficacy can be tested by any useful method, such as liquid chromatography-mass spectrometry, RT-PCR analysis, PCR analysis, electrophoresis, and/or sequencing. In particular embodiments, chemical ligation includes the use of one or more chemical-reactive pairs to provide a spacing moiety that can be transcribed or reverse transcribed. An example of the methods of the present invention is outlined in FIG. 1, in which a double-stranded hairpin structure is utilized as a bifunctional headpiece oligonucleotide that offers both sites for chemical ligation of encoding oligonucleotide tags and a protected primary amine for the synthesis of a covalently attached encoded small molecule. The headpiece bears both 3′- and 5′-phosphate groups, each of which may be ligated to a corresponding complementary unphosphorylated oligonucleotide using cyanoimidazole and a divalent metal ion such as Zn.sup.2+. The same construct may only be hemi-ligated using enzymatic ligation with T4 DNA ligase since this enzyme only supports the ligation of 5′-phosphate to 3′-hydroxyl oligonucleotides, not of 3′-phosphate to 5′-hydroxyl oligonucleotides, as indicated in FIG. 1. It was observed that unprotected primary amines reacted with the cyanoimidazole to give a guanidine adduct; however, Fmoc protection of the amine can prevent this from occurring, and the protected amine does not deprotect under the chemical ligation reaction conditions. Fmoc is readily removed with piperidine.
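As a non-limiting illustration of the hemi-ligation point above, the following Python sketch encodes the terminus compatibility stated in this paragraph: T4 DNA ligase joins only a 5′-phosphate to a 3′-hydroxyl, whereas the chemical ligation described herein can join either phosphate orientation. The class name, function name, and method labels are hypothetical and are introduced solely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    phosphate_end: str  # terminus bearing the phosphate: "5'" or "3'"
    hydroxyl_end: str   # terminus bearing the hydroxyl on the partner strand


def can_ligate(junction, method):
    """Return True if `method` can join the two termini of `junction`."""
    if {junction.phosphate_end, junction.hydroxyl_end} != {"5'", "3'"}:
        return False  # a junction needs one 5'-terminus and one 3'-terminus
    if method == "T4_DNA_ligase":
        # Per the text: only 5'-phosphate to 3'-hydroxyl is supported.
        return junction.phosphate_end == "5'"
    if method == "chemical":
        # Per the text: cyanoimidazole with Zn2+ joins either orientation.
        return True
    raise ValueError(f"unknown ligation method: {method}")


# The headpiece of FIG. 1 presents both a 5'-phosphate and a 3'-phosphate.
headpiece_junctions = [Junction("5'", "3'"), Junction("3'", "5'")]
enzymatic = [can_ligate(j, "T4_DNA_ligase") for j in headpiece_junctions]
chemical = [can_ligate(j, "chemical") for j in headpiece_junctions]
```

Under these assumptions, the enzymatic result contains one joinable and one non-joinable junction (hemi-ligation), while the chemical result contains two joinable junctions.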

[0161] Reaction Conditions to Promote Enzymatic Ligation or Chemical Ligation

[0162] The methods described herein can include one or more reaction conditions that promote enzymatic or chemical ligation between the headpiece and a tag or between two tags. These reaction conditions include using modified nucleotides within the tag, as described herein; using donor tags and acceptor tags having different lengths and varying the concentration of the tags; using different types of ligases, as well as combinations thereof (e.g., CircLigase™ DNA ligase and/or T4 RNA ligase), and varying their concentration; using poly ethylene glycols (PEGs) having different molecular weights and varying their concentration; using non-PEG crowding agents (e.g., betaine or bovine serum albumin); varying the temperature and duration for ligation; varying the concentration of various agents, including ATP, Co(NH.sub.3).sub.6Cl.sub.3, and yeast inorganic pyrophosphatase; using enzymatically or chemically phosphorylated oligonucleotide tags; using 3′-protected tags; and using preadenylated tags. These reaction conditions also include chemical ligations.

[0163] The headpiece and/or tags can include one or more modified or substituted nucleotides. In preferred embodiments, the headpiece and/or tags include one or more modified or substituted nucleotides that promote enzymatic ligation, such as 2′-O-methyl nucleotides (e.g., 2′-O-methyl guanine or 2′-O-methyl uracil), 2′-fluoro nucleotides, or any other modified nucleotides that are utilized as a substrate for ligation. Alternatively, the headpiece and/or tags are modified to include one or more chemically reactive groups to support chemical ligation (e.g., an optionally substituted alkynyl group and an optionally substituted azido group). Optionally, the tag oligonucleotides are functionalized at both termini with chemically reactive groups, and, optionally, one of these termini is protected, such that the groups may be addressed independently and side-reactions may be reduced (e.g., reduced polymerization side-reactions).

[0164] As described herein, chemical ligation that results in phosphodiester, phosphonate, or phosphorothioate linkages may be performed by reaction of a 5′- or 3′-phosphate, phosphonate, or phosphorothioate with a 5′- or 3′-hydroxyl group in the presence of cyanoimidazole and a divalent metal ion such as Zn.sup.2+.

[0165] Enzymatic ligation can include one or more ligases. Exemplary ligases include CircLigase™ ssDNA ligase (EPICENTRE Biotechnologies, Madison, Wis.), CircLigase™ II ssDNA ligase (also from EPICENTRE Biotechnologies), ThermoPhage™ ssDNA ligase (Prokazyme Ltd., Reykjavik, Iceland), T4 RNA ligase, and T4 DNA ligase. In preferred embodiments, ligation includes the use of an RNA ligase or a combination of an RNA ligase and a DNA ligase. Ligation can further include one or more soluble multivalent cations, such as Co(NH.sub.3).sub.6Cl.sub.3, in combination with one or more ligases.

[0166] Before or after the ligation step, a complex or encoded chemical entity can be purified. In some embodiments, the complex or encoded chemical entity can be purified to remove unreacted headpiece or tags that may result in cross-reactions and introduce “noise” into the encoding process. In some embodiments, the complex or encoded chemical entity can be purified to remove any reagents or unreacted starting material that can inhibit or lower the ligation activity of a ligase. For example, orthophosphate may result in lowered ligation activity. In certain embodiments, entities that are introduced into a chemical or ligation step may need to be removed to enable the subsequent chemical or ligation step. Methods of purifying the complex or encoded chemical entity are described herein. Purification of the complex may be carried out by reversible immobilization of the complex followed by purification and release prior to a subsequent step.

[0167] Enzymatic and chemical ligation can include poly ethylene glycol having an average molecular weight of more than 300 Daltons (e.g., more than 600 Daltons, 3,000 Daltons, 4,000 Daltons, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000, or 45,000 Daltons). In particular embodiments, the poly ethylene glycol has an average molecular weight from about 3,000 Daltons to 9,000 Daltons (e.g., from 3,000 Daltons to 8,000 Daltons, from 3,000 Daltons to 7,000 Daltons, from 3,000 Daltons to 6,000 Daltons, and from 3,000 Daltons to 5,000 Daltons). In preferred embodiments, the poly ethylene glycol has an average molecular weight from about 3,000 Daltons to about 6,000 Daltons (e.g., from 3,300 Daltons to 4,500 Daltons, from 3,300 Daltons to 5,000 Daltons, from 3,300 Daltons to 5,500 Daltons, from 3,300 Daltons to 6,000 Daltons, from 3,500 Daltons to 4,500 Daltons, from 3,500 Daltons to 5,000 Daltons, from 3,500 Daltons to 5,500 Daltons, and from 3,500 Daltons to 6,000 Daltons, such as 4,600 Daltons). Poly ethylene glycol can be present in any useful amount, such as from about 25% (w/v) to about 35% (w/v), such as 30% (w/v).
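As a non-limiting worked example of the amounts above, the following Python sketch converts a target percentage (w/v) into the mass of poly ethylene glycol needed for a given reaction volume, using the standard definition that 1% (w/v) equals 1 g per 100 mL. The function names and the 50 microliter volume are hypothetical, and the hard numeric bounds in the window check are a simplification of the "about" ranges recited above.

```python
def peg_mass_mg(reaction_volume_ul, percent_wv=30.0):
    """Mass of PEG (mg) giving `percent_wv` % (w/v) in `reaction_volume_ul` microliters."""
    grams_per_ml = percent_wv / 100.0            # 1% (w/v) = 1 g per 100 mL
    volume_ml = reaction_volume_ul / 1000.0
    return grams_per_ml * volume_ml * 1000.0     # convert g to mg


def in_preferred_window(avg_mw_da, percent_wv):
    """Check a PEG choice against the preferred ranges recited above (treated as hard bounds)."""
    return 3000 <= avg_mw_da <= 6000 and 25 <= percent_wv <= 35


# Hypothetical 50 microliter ligation at 30% (w/v) PEG with an average MW of 4,600 Daltons.
mass_needed = peg_mass_mg(50, 30.0)
acceptable = in_preferred_window(4600, 30.0)
```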

Methods for Determining the Nucleotide Sequence of a Complex

[0168] This invention features a method for determining the nucleotide sequence of a complex, such that encoding relationships may be established between the sequence of the assembled tag sequence and the structural units (or building blocks) of the chemical entity. In particular, the identity and/or history of a chemical entity can be inferred from the sequence of bases in the oligonucleotide. Using this method, a library including diverse chemical entities or members (e.g., small molecules or peptides) can be addressed with a particular tag sequence.

[0169] Any of the linkages described herein can be reversible or irreversible. Reversible linkages include photo-reactive linkages (e.g., a cyanovinylcarbazole group and thymidine) and redox linkages. Additional linkages are described herein.

[0170] In an alternative embodiment, an “unreadable” linkage can be enzymatically repaired in order to generate a readable or at least translocatable linkage. Enzymatic repair processes are well known to those skilled in the art and include, but are not limited to, pyrimidine (e.g., thymidine) dimer repair mechanisms (e.g., using a photolyase or a glycosylase (e.g., T4 pyrimidine dimer glycosylase (PDG))), base excision repair mechanisms (e.g., using a glycosylase, an apurinic/apyrimidinic (AP) endonuclease, a Flap endonuclease, or a poly ADP ribose polymerase (e.g., human apurinic/apyrimidinic (AP) endonuclease, APE 1; endonuclease III (Nth) protein; endonuclease IV; endonuclease V; formamidopyrimidine [fapy]-DNA glycosylase (Fpg); human 8-oxoguanine glycosylase 1 (α isoform) (hOGG1); human endonuclease VIII-like 1 (hNEIL1); uracil-DNA glycosylase (UDG); human single-strand selective monofunctional uracil DNA glycosylase (SMUG1); and human alkyladenine DNA glycosylase (hAAG)), which can be optionally combined with one or more endonucleases, DNA or RNA polymerases, and/or ligases for the repair), methylation repair mechanisms (e.g., using a methyl guanine methyltransferase), AP repair mechanisms (e.g., using an apurinic/apyrimidinic (AP) endonuclease (e.g., APE 1; endonuclease III; endonuclease IV; endonuclease V; Fpg; hOGG1; and hNEIL1), which can be optionally combined with one or more endonucleases, DNA or RNA polymerases, and/or ligases for the repair), nucleotide excision repair mechanisms (e.g., using excision repair cross-complementing proteins or excision nucleases, which can be optionally combined with one or more endonucleases, DNA or RNA polymerases, and/or ligases for the repair), and mismatch repair mechanisms (e.g., using an endonuclease (e.g., T7 endonuclease I; MutS, MutH, and/or MutL), which can be optionally combined with one or more exonucleases, endonucleases, helicases, DNA or RNA polymerases, and/or ligases for the repair). Commercial enzyme mixtures are available to readily provide these kinds of repair mechanisms, e.g., PreCR® Repair Mix (New England Biolabs Inc., Ipswich MA), which includes Taq DNA Ligase, Endonuclease IV, Bst DNA Polymerase, Fpg, Uracil-DNA Glycosylase (UDG), T4 PDG (T4 Endonuclease V), and Endonuclease VIII.

Methods for Tagging Encoded Libraries

[0171] This invention features a method for operatively associating oligonucleotide tags with chemical entities, such that encoding relationships may be established between the sequence of the tag and the structural units (or building blocks) of the chemical entity. In particular, the identity and/or history of a chemical entity can be inferred from the sequence of bases in the oligonucleotide. Using this method, a library including diverse chemical entities or members (e.g., small molecules or peptides) can be encoded with a particular tag sequence.

[0172] Generally, these methods include the use of a headpiece, which has at least one functional group that may be elaborated chemically and at least one functional group to which an oligonucleotide tag may be bound (or ligated). Binding can be effectuated by any useful means, such as by enzymatic binding (e.g., ligation with one or more of an RNA ligase and/or a DNA ligase) or by chemical binding (e.g., by a substitution reaction between two functional groups, such as a nucleophile and a leaving group).

[0173] To create numerous chemical entities within the library, a solution containing the headpiece can be divided into multiple aliquots and then placed into a multiplicity of physically separate compartments, such as the wells of a multiwell plate. Generally, this is the “split” step. Within each compartment or well, successive chemical reaction and ligation steps are performed with an oligonucleotide tag within each aliquot. The chemical reaction conditions and the sequence of the tag are thereby associated. The reaction and ligation steps may be performed in any order. Then, the reacted and ligated aliquots are combined or “pooled,” and optionally purification may be performed at this point. Purification may be performed by reversible immobilization of the complex, removal of the solvent and any reagents/contaminants, followed by release of the complex prior to a subsequent step. These split and pool steps can be optionally repeated.
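As a non-limiting illustration of the split and pool steps described above, the following Python sketch models each library member as a pair of an encoding sequence and a chemical history. In each cycle, every compartment receives an aliquot of the current pool, applies one building block, and appends the tag that encodes it; the compartments are then pooled. The function name, building block labels, and tag sequences are hypothetical, and the sketch ignores reaction yield, purification, and ligation chemistry.

```python
def split_and_pool(headpiece, cycles):
    """Simulate split-and-pool encoding.

    `cycles` is a list of dicts, one per cycle, mapping a building block label
    to the tag sequence that encodes it. Returns (sequence, history) pairs.
    """
    pool = [(headpiece, ())]
    for blocks in cycles:
        next_pool = []
        for block, tag in blocks.items():        # "split": one compartment per building block
            for sequence, history in pool:       # each compartment holds an aliquot of the pool
                # React with the building block and ligate the tag that encodes it.
                next_pool.append((sequence + tag, history + (block,)))
        pool = next_pool                         # "pool": recombine all compartments
    return pool


# Hypothetical two-cycle library: 2 building blocks, then 3 building blocks.
cycles = [
    {"A1": "AAAA", "A2": "CCCC"},
    {"B1": "GGGG", "B2": "TTTT", "B3": "ACGT"},
]
library = split_and_pool("GGCC", cycles)
sequence_by_history = {history: sequence for sequence, history in library}
```

Under these assumptions, the number of library members is the product of the number of building blocks used in each cycle, and each member carries a distinct encoding sequence.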

[0174] Next, the library can be tested and/or selected for a particular characteristic or function, as described herein. For example, the mixture of tagged chemical entities can be separated into at least two populations, where the first population is enriched for members that bind to a particular biological target and the second population is less enriched (e.g., by negative selection or positive selection). The first population can then be selectively captured (e.g., by eluting on a column providing the target of interest or by incubating the aliquot with the target of interest) and, optionally, further analyzed or tested, such as with optional washing, purification, negative selection, positive selection, or separation steps. Finally, the chemical histories of one or more members (or chemical entities) within the selected population can be determined by the sequence of the operatively linked oligonucleotide. Upon correlating the sequence with the encoded library member's chemical history, this method can identify the individual members of the library with the selected characteristic (e.g., an increased tendency to bind to the target protein and thereby elicit a therapeutic effect). For further testing and optimization, candidate therapeutic compounds may then be prepared by synthesizing the identified library members with or without their associated oligonucleotide tags.
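The comparison between a selected population and a reference population can be illustrated with a short sketch. The specification does not prescribe any particular statistic; the fold-enrichment ratio, the pseudocount, and the function name `fold_enrichment` are assumptions made purely for illustration, and the three-base tag sequences are hypothetical.

```python
from collections import Counter


def fold_enrichment(selected_reads, reference_reads, pseudocount=1.0):
    """Return {tag: ratio of selected frequency to reference frequency}.

    A pseudocount is added to every tag so that tags absent from one
    population do not cause a division by zero (an illustrative choice).
    """
    selected = Counter(selected_reads)
    reference = Counter(reference_reads)
    tags = set(selected) | set(reference)
    selected_total = sum(selected.values()) + pseudocount * len(tags)
    reference_total = sum(reference.values()) + pseudocount * len(tags)
    return {
        tag: ((selected[tag] + pseudocount) / selected_total)
        / ((reference[tag] + pseudocount) / reference_total)
        for tag in tags
    }


# Hypothetical sequencing reads (one tag sequence per read).
selected_reads = ["AAC"] * 8 + ["GGT"] * 2
reference_reads = ["AAC"] * 5 + ["GGT"] * 5
ratios = fold_enrichment(selected_reads, reference_reads)
```

A ratio above 1 marks a tag that is over-represented after selection; in practice, the threshold for calling a library member enriched would depend on sequencing depth and replicate experiments.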

[0175] The methods described herein can include any number of optional steps to diversify the library or to interrogate the members of the library. For any tagging method described herein, successive “n” number of tags can be added with additional “n” number of ligation, separation, and/or phosphorylation steps. Exemplary optional steps include restriction of library member-associated encoding oligonucleotides using one or more restriction endonucleases; repair of the associated encoding oligonucleotides, e.g., with any repair enzyme, such as those described herein; ligation of one or more adapter sequences to one or both of the termini for library member-associated encoding oligonucleotides, e.g., such as one or more adapter sequences to provide a priming sequence for amplification and sequencing or to provide a label, such as biotin, for immobilization of the sequence; reverse-transcription or transcription, optionally followed by reverse-transcription, of the assembled tags in the complex using a reverse transcriptase, transcriptase, or another template-dependent polymerase; amplification of the assembled tags in the complex using, e.g., PCR; generation of clonal isolates of one or more populations of assembled tags in the complex, e.g., by use of bacterial transformation, emulsion formation, dilution, surface capture techniques, etc.; amplification of clonal isolates of one or more populations of assembled tag in the complex, e.g., by using clonal isolates as templates for template-dependent polymerization of nucleotides; and sequence determination of clonal isolates of one or more populations of assembled tags in the complex, e.g., by using clonal isolates as templates for template-dependent polymerization with fluorescently labeled nucleotides with reversible terminator chemistry. Additional methods for amplifying and sequencing the oligonucleotide tags are described herein.

[0176] These methods can be used to identify and discover any number of chemical entities with a particular characteristic or function, e.g., in a selection step. The desired characteristic or function may be used as the basis for partitioning the library into at least two parts with the concomitant enrichment of at least one of the members or related members in the library with the desired function. In particular embodiments, the method comprises identifying a small drug-like library member that binds or inactivates a protein of therapeutic interest. In another embodiment, a sequence of chemical reactions is designed, and a set of building blocks is chosen so that the reaction of the chosen building blocks under the defined chemical conditions will generate a combinatorial plurality of molecules (or a library of molecules), where one or more molecules may have utility as a therapeutic agent for a particular protein. For example, the chemical reactions and building blocks are chosen to create a library having structural groups commonly present in kinase inhibitors. In any of these instances, the oligonucleotide tags encode the chemical history of the library member and in each case a collection of chemical possibilities may be represented by any particular tag combination.

[0177] In one embodiment, the library of chemical entities, or a portion thereof, is contacted with a biological target under conditions suitable for at least one member of the library to bind to the target, followed by removal of library members that do not bind to the target, and analyzing the one or more oligonucleotide tags associated with the target. This method can optionally include amplifying the tags by methods known in the art. Exemplary biological targets include enzymes (e.g., kinases, phosphatases, methylases, demethylases, proteases, and DNA repair enzymes), proteins involved in protein:protein interactions (e.g., ligands for receptors), receptor targets (e.g., GPCRs and RTKs), ion channels, bacteria, viruses, parasites, DNA, RNA, prions, and carbohydrates.

[0178] In another embodiment, the chemical entities that bind to a target are not subjected to amplification but are analyzed directly. Exemplary methods of analysis include microarray analysis, including evanescent resonance photonic crystal analysis; bead-based methods for deconvoluting tags (e.g., by using his-tags); label-free photonic crystal biosensor analysis (e.g., a BIND® Reader from SRU Biosystems, Inc., Woburn, Mass.); or hybridization-based approaches (e.g. by using arrays of immobilized oligonucleotides complementary to sequences present in the library of tags).

[0179] In addition, chemical-reactive pairs (or functional groups) can be readily included in solid-phase oligonucleotide synthesis schemes and will support the efficient chemical ligation of oligonucleotides. Moreover, the resultant ligated oligonucleotides can act as templates for template-dependent polymerization with one or more polymerases. Accordingly, any of the binding steps described herein for tagging encoded libraries can be modified to include one or more of enzymatic ligation and/or chemical ligation techniques. Exemplary ligation techniques include enzymatic ligation, such as use of one or more RNA ligases and/or DNA ligases; and chemical ligation, such as use of chemical-reactive pairs (e.g., a pair including optionally substituted alkynyl and azido functional groups).

[0180] Furthermore, one or more libraries can be combined in a split-and-mix step. In order to permit mixing of two or more libraries, the library member may contain one or more library-identifying sequences, such as in a library-identifying tag, in a ligated tag, or as part of the headpiece sequence, as described herein.

Methods for Encoding Chemical Entities within a Library

[0181] The methods of the invention can be used to synthesize a library having a diverse number of chemical entities that are encoded by oligonucleotide tags. Examples of building blocks and encoding DNA tags are found in U.S. Patent Application Publication No. 2007/0224607, the building blocks and tags of which are hereby incorporated by reference.

[0182] Each chemical entity is formed from one or more building blocks and optionally a scaffold. The scaffold serves to provide one or more diversity nodes in a particular geometry (e.g., a triazine to provide three nodes spatially arranged around a heteroaryl ring or a linear geometry).

[0183] The building blocks and their encoding tags can be added directly or indirectly (e.g., via a spacer) to the headpiece to form a complex. When the headpiece includes a spacer, the building block or scaffold is added to the end of the spacer. When the spacer is absent, the building block can be added directly to the headpiece or the building block itself can include a spacer that reacts with a functional group of the headpiece. Exemplary spacers and headpieces are described herein.

[0184] The scaffold can be added in any useful way. For example, the scaffold can be added to the end of the spacer or the headpiece, and successive building blocks can be added to the available diversity nodes of the scaffold. In another example, building block A.sub.n is first added to the spacer or the headpiece, and then the diversity node of scaffold S is reacted with a functional group in building block A.sub.n. Oligonucleotide tags encoding a particular scaffold can optionally be added to the headpiece or the complex. For example, scaffold S.sub.n is added to the complex in n reaction vessels, where n is an integer more than one, and tag S.sub.n (i.e., tag S.sub.1, S.sub.2, . . . S.sub.n-1, S.sub.n) is bound to the functional group of the complex.

[0185] Building blocks can be added in multiple synthetic steps. For example, an aliquot of the headpiece, optionally having an attached spacer, is separated into n reaction vessels, where n is an integer of two or greater. In the first step, building block A.sub.n is added to each n reaction vessel (i.e., building block A.sub.1, A.sub.2, . . . A.sub.n-1, A.sub.n is added to reaction vessel 1, 2, . . . n-1, n), where n is an integer and each building block A.sub.n is unique. In the second step, scaffold S is added to each reaction vessel to form an A.sub.n-S complex. Optionally, scaffold S.sub.n can be added to each reaction vessel to form an A.sub.n-S.sub.n complex, where n is an integer of more than two, and each scaffold S.sub.n can be unique. In the third step, building block B.sub.n is added to each n reaction vessel containing the A.sub.n-S complex (i.e., building block B.sub.1, B.sub.2, . . . B.sub.n-1, B.sub.n is added to reaction vessel 1, 2, . . . n-1, n containing the A.sub.1-S, A.sub.2-S, . . . A.sub.n-1-S, A.sub.n-S complex), where each building block B.sub.n is unique. In further steps, building block C.sub.n can be added to each n reaction vessel containing the B.sub.n-A.sub.n-S complex (i.e., building block C.sub.1, C.sub.2, . . . C.sub.n-1, C.sub.n is added to reaction vessel 1, 2, . . . n-1, n containing the B.sub.1-A.sub.1-S . . . B.sub.n-A.sub.n-S complex), where each building block C.sub.n is unique. The resulting library will have n.sup.3 number of complexes having n.sup.3 tags. In this manner, additional synthetic steps can be used to bind additional building blocks to further diversify the library.
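The n.sup.3 count for three cycles of n building blocks can be checked by enumerating every combination. The sketch below is only an illustration: it assumes a single shared scaffold S (so the scaffold adds no diversity), pooling and re-splitting between cycles, and placeholder labels such as `A1`; the helper name `enumerate_library` is hypothetical.

```python
from itertools import product


def enumerate_library(cycles):
    """Return every combination of one building block per synthetic cycle.

    `cycles` is a list of lists, one list of building-block labels per cycle.
    """
    return [tuple(combination) for combination in product(*cycles)]


n = 3  # building blocks per cycle (small value chosen for readability)
cycles = [[f"{step}{i}" for i in range(1, n + 1)] for step in "ABC"]
library = enumerate_library(cycles)

print(len(library))  # 27, i.e. n ** 3
print(library[0])    # ('A1', 'B1', 'C1')
```

Adding a fourth cycle D with n members would simply add another list to `cycles`, giving n.sup.4 combinations.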

[0186] After forming the library, the resultant complexes can optionally be purified and subjected to a polymerization or ligation reaction, e.g., to a tailpiece. This general strategy can be expanded to include additional diversity nodes and building blocks (e.g., D, E, F, etc.). For example, the first diversity node is reacted with building blocks and/or S and encoded by an oligonucleotide tag. Then, additional building blocks are reacted with the resultant complex, and the subsequent diversity node is derivatized by additional building blocks, which is encoded by the primer used for the polymerization or ligation reaction.

[0187] To form an encoded library, oligonucleotide tags are added to the complex after or before each synthetic step. For example, before or after the addition of building block A.sub.n to each reaction vessel, tag A.sub.n is bound to the functional group of the headpiece (i.e., tag A.sub.1, A.sub.2, . . . A.sub.n-1, A.sub.n is added to reaction vessel 1, 2, . . . n-1, n containing the headpiece). Each tag A.sub.n has a distinct sequence that correlates with each unique building block A.sub.n, and determining the sequence of tag A.sub.n provides the chemical structure of building block A.sub.n. In this manner, additional tags are used to encode for additional building blocks or additional scaffolds.
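The correspondence between tag sequence and building block amounts to a lookup table per synthetic cycle. The following sketch assumes fixed-length tags read in the order in which they were ligated; the four-base tag sequences, the `CODEBOOK` structure, and the `decode` helper are all hypothetical and are not taken from the specification.

```python
TAG_LENGTH = 4  # hypothetical fixed tag length, in bases

# One lookup table per synthetic cycle (hypothetical sequences).
CODEBOOK = [
    {"ACGT": "A1", "TGCA": "A2"},  # cycle 1: tags A.sub.n
    {"GGAA": "B1", "CCTT": "B2"},  # cycle 2: tags B.sub.n
]


def decode(read):
    """Split a concatenated tag read into per-cycle tags and look each one up."""
    if len(read) != TAG_LENGTH * len(CODEBOOK):
        raise ValueError("read length does not match the expected tag layout")
    blocks = []
    for cycle, table in enumerate(CODEBOOK):
        tag = read[cycle * TAG_LENGTH:(cycle + 1) * TAG_LENGTH]
        if tag not in table:
            raise KeyError(f"unknown tag {tag!r} in cycle {cycle + 1}")
        blocks.append(table[tag])
    return blocks


print(decode("TGCAGGAA"))  # ['A2', 'B1']
```

Real tag sets would normally also carry constant or overhang regions between the coding regions; those are omitted here to keep the lookup idea visible.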

[0188] Furthermore, the last tag added to the complex can either include a primer-binding sequence or provide a functional group to allow for binding (e.g., by ligation) of a primer-binding sequence. The primer-binding sequence can be used for amplifying and/or sequencing the oligonucleotide tags of the complex. Exemplary methods for amplifying and for sequencing include polymerase chain reaction (PCR), ligase chain reaction (LCR), rolling circle amplification (RCA), or any other method known in the art to amplify or determine nucleic acid sequences.

[0189] Using these methods, large libraries can be formed having a large number of encoded chemical entities. For example, a headpiece is reacted with a spacer and building block A.sub.n, which includes 1,000 different variants (i.e., n=1,000). For each building block A.sub.n, a DNA tag A.sub.n is ligated or primer extended to the headpiece. These reactions may be performed in a 1,000-well plate or 10×100-well plates. All reactions may be pooled, optionally purified, and split into a second set of plates. Next, the same procedure may be performed with building block B.sub.n, which also includes 1,000 different variants. A DNA tag B.sub.n may be ligated to the A.sub.n-headpiece complex, and all reactions may be pooled. The resultant library includes 1,000×1,000 combinations of A.sub.n×B.sub.n (i.e., 1,000,000 compounds) tagged by 1,000,000 different combinations of tags. The same approach may be extended to add building blocks C.sub.n, D.sub.n, E.sub.n, etc. The generated library may then be used to identify compounds that bind to the target. The structure of the chemical entities that bind to the target can optionally be assessed by PCR and sequencing of the DNA tags to identify the compounds that were enriched.

[0190] This method can be modified to avoid tagging after the addition of each building block or to avoid pooling (or mixing). For example, the method can be modified by adding building block A.sub.n to n reaction vessels, where n is an integer of more than one, and adding the identical building block B.sub.1 to each reaction well. Here, B.sub.1 is identical for each chemical entity, and, therefore, an oligonucleotide tag encoding this building block is not needed. After adding a building block, the complexes may be pooled or not pooled. For example, the library is not pooled following the final step of building block addition, and the pools are screened individually to identify compound(s) that bind to a target. To avoid pooling all of the reactions after synthesis, a binding assay (e.g., ELISA, SPR, ITC, Tm shift, SEC, or similar) may be used to monitor binding on a sensor surface in high-throughput format (e.g., 384-well plates and 1,536-well plates). For example, building block A.sub.n may be encoded with DNA tag A.sub.n, and building block B.sub.n may be encoded by its position within the well plate. Candidate compounds can then be identified by using a binding assay (e.g., ELISA, SPR, ITC, Tm shift, SEC, or similar) and by analyzing the A.sub.n tags by sequencing, microarray analysis, and/or restriction digest analysis. This analysis allows for the identification of combinations of building blocks A.sub.n and B.sub.n that produce the desired molecules.
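The mixed encoding scheme, in which A.sub.n is read from its DNA tag and B.sub.n from the well position, can be sketched as two independent lookups. The tag sequences, well names, and the `identify` helper below are hypothetical illustrations rather than anything specified in the text.

```python
def identify(well, a_tag, a_codebook, b_by_well):
    """Combine the sequenced A tag with the well position that encodes B."""
    return a_codebook[a_tag], b_by_well[well]


a_codebook = {"ACGT": "A1", "TGCA": "A2"}  # hypothetical tag sequences
b_by_well = {"A01": "B1", "A02": "B2"}     # hypothetical plate layout

# A hit observed in well A02 whose sequenced tag reads ACGT:
hit = identify("A02", "ACGT", a_codebook, b_by_well)
print(hit)  # ('A1', 'B2')
```

Because B is never written into the oligonucleotide, this scheme only works while the wells remain unpooled, which is the trade-off the paragraph above describes.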

[0191] The method of amplifying can optionally include forming a water-in-oil emulsion to create a plurality of aqueous microreactors. The reaction conditions (e.g., concentration of complex and size of microreactors) can be adjusted to provide, on average, a microreactor having at least one member of a library of compounds. Each microreactor can also contain the target, a single bead capable of binding to a complex or a portion of the complex (e.g., one or more tags) and/or binding the target, and an amplification reaction solution having one or more necessary reagents to perform nucleic acid amplification. After amplifying the tag in the microreactors, the amplified copies of the tag will bind to the beads in the microreactors, and the coated beads can be identified by any useful method.
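How the concentration of complex and the microreactor size translate into occupancy can be estimated with a simple statistical model. The sketch below assumes that library members partition randomly and independently into microreactors, so that the number per microreactor follows a Poisson distribution; that model and the function name `occupancy` are assumptions for illustration and are not stated in the specification.

```python
import math


def occupancy(mean_per_microreactor):
    """Poisson estimate of empty, singly and multiply occupied microreactors."""
    lam = mean_per_microreactor
    if lam < 0:
        raise ValueError("mean occupancy must be non-negative")
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }


for mean in (0.1, 1.0, 3.0):
    fractions = occupancy(mean)
    print(mean, {key: round(value, 3) for key, value in fractions.items()})
```

Under this model, lowering the mean occupancy increases the share of occupied microreactors that hold exactly one library member, at the cost of leaving more microreactors empty.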

[0192] Once the building blocks from the first library that bind to the target of interest have been identified, a second library may be prepared in an iterative fashion. For example, one or two additional nodes of diversity can be added, and the second library is created and sampled, as described herein. This process can be repeated as many times as necessary to create molecules with desired molecular and pharmaceutical properties.

[0193] Various ligation techniques can be used to add the scaffold, building blocks, spacers, linkages, and tags. Accordingly, any of the binding steps described herein can include any useful ligation technique or techniques. Exemplary ligation techniques include enzymatic ligation, such as use of one or more RNA ligases and/or DNA ligases, as described herein; and chemical ligation, such as use of chemical-reactive pairs, as described herein.

EXAMPLES

Example 1

Preparation of the Components for the Chemical Ligation (Double-Stranded Headpiece and Double-Stranded Tag)

[0194] Headpiece HP006, chemically phosphorylated at its 5′ end SEQ ID NO: 1 -(p)CCTGTGTTZTTCACGGCCT, where Z stands for C6-amino dT modification, was acquired from Biosearch Inc. HP006 was then modified by DMT-MM acylation using Fmoc-NH-PEG4-CH2CH2COOH (Chem Pep Inc) using the following procedure.

[0195] 50 equivalents of Fmoc-NH-PEG4-CH2CH2COOH (Chem Pep Inc) were dissolved in DMA (Dimethyl acetamide, Acros) and added to 1 equivalent of HP006 dissolved in 0.5 M Borate buffer pH 9.5 together with 50 equivalents of DMT-MM (4-(4,6-Dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium Chloride, Acros), freshly dissolved in water. The reaction was allowed to proceed for 2-4 hrs, followed by a second addition of 50 equivalents of Fmoc-NH-PEG4-CH2CH2COOH and of 50 equivalents of DMT-MM and the reaction was then allowed to proceed overnight. Completion of the reaction was monitored by LCMS.

[0196] The product was ethanol precipitated and desalted by size exclusion spin filtration using 3,000 MW cut-off centrifugal spin filters (Millipore). LCMS of the product confirmed the MW as 6,803.3 (Calcd 6,802.5).

[0197] Oligonucleotides TagZA1+_deltaC_5OH (SEQ ID NO: 2; 5′-CATCAAGACCCAGAAAG-3′), TagZB_CNIm_bot3OH (SEQ ID NO: 3; 5′-(p)TCTGGGTCTTGATGGCTATCC-3′, chemically phosphorylated at the 5′ terminus), PrA_CNIm_bot5P (SEQ ID NO: 4; 5′-(p)TGGCTGAGG-3′, chemically phosphorylated at the 5′ terminus), and PrA_top_extraC_3P (SEQ ID NO: 5; 5′-(p)CAGCCAGGATAGC(p)-3′, chemically phosphorylated at both the 5′ and 3′ termini) were acquired from IDT DNA.

[0198] Oligos TagZA1+_deltaC_5OH and TagZB_CNIm_bot3OH were then dissolved to a 2 mM final concentration in water and combined at an equimolar ratio to make a 1 mM solution of double stranded TagZA.

[0199] Oligos PrA_CNIm_bot5P and PrA_top_extraC_3P were also dissolved to a 2 mM final concentration in water and combined at an equimolar ratio to make a 1 mM solution of double stranded “CNIm-PrA”.

[0200] Fmoc-amino-PEG4-HP006 was then enzymatically ligated to one equivalent of double stranded CNIm-PrA using T4 DNA ligase and a standard ligation protocol. The resulting oligo (Fmoc-amino-PEG4-HP013) was ethanol precipitated and desalted using Illustra NAP-5 columns (GE Healthcare Life Science). LCMS confirmed MW 13,772 (calcd 13,770.7).
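The agreement between observed and calculated masses reported in this example can be expressed as an absolute and a relative deviation. The sketch below uses only the two mass pairs quoted above; the helper name `mass_error` and the choice of parts per million as the relative unit are illustrative assumptions.

```python
def mass_error(observed, calculated):
    """Return (difference in Da, difference in ppm) for an observed mass."""
    delta = observed - calculated
    ppm = delta / calculated * 1e6
    return delta, ppm


# Observed and calculated molecular weights quoted in Example 1.
reported = {
    "Fmoc-amino-PEG4-HP006": (6803.3, 6802.5),
    "Fmoc-amino-PEG4-HP013": (13772.0, 13770.7),
}

for name, (observed, calculated) in reported.items():
    delta, ppm = mass_error(observed, calculated)
    print(f"{name}: {delta:+.1f} Da ({ppm:+.0f} ppm)")
```

Both deviations are on the order of 1 Da, which for oligonucleotides of this size corresponds to roughly 100 ppm.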

Example 2

The Chemical Ligation of a Double-Stranded Headpiece to a Double-Stranded Tag

[0201] Fmoc-amino-PEG4-HP013 and double-stranded TagZA oligonucleotides were dissolved to a final concentration of 0.33 mM in 80 mM MES buffer pH 6.0, containing 800 mM NaCl, and 8 mM ZnCl.sub.2. 1-Cyanoimidazole was freshly dissolved to 1 M in DMF and 1-2 additions were made to the reaction over a 12-hour period to a final concentration of 1-cyanoimidazole of 150 mM. The reaction was then incubated at 4° C. overnight.
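The volume of 1 M 1-cyanoimidazole stock needed to reach 150 mM can be worked out from a mass balance that accounts for the volume the stock itself adds. The sketch below is a generic dilution calculation, not the authors' protocol: the 100 µL starting volume is hypothetical, volumes are assumed to be additive, and the function name `stock_volume_to_add` is invented for illustration.

```python
def stock_volume_to_add(initial_volume_ul, stock_molar, target_molar):
    """Volume of stock to add so the final mixture reaches target_molar.

    Solves stock * v / (initial + v) = target for v, assuming additive volumes.
    """
    if not 0 < target_molar < stock_molar:
        raise ValueError("target must be positive and below the stock concentration")
    return initial_volume_ul * target_molar / (stock_molar - target_molar)


initial_volume_ul = 100.0  # hypothetical reaction volume before any addition
added_ul = stock_volume_to_add(initial_volume_ul, stock_molar=1.0, target_molar=0.150)
dilution_factor = initial_volume_ul / (initial_volume_ul + added_ul)

print(f"add {added_ul:.1f} uL of stock in total")
print(f"other components are diluted to {dilution_factor:.0%} of their starting concentration")
```

When the stock is added in more than one portion, as in the paragraph above, the total added volume is what determines the final concentration, provided the reagent is not consumed between additions.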

[0202] The completed reaction was analyzed by denaturing gel electrophoresis as well as by LCMS. The samples were then resolved on a 15% denaturing analytical TBE-8M Urea gel and visualized by UV shadowing over a TLC plate with a fluorescent dye (254 nm). LCMS confirmed formation of the double-stranded ligated product with MW 25,417.3 (calc 25,415.3) with ˜70% conversion. Additional products with MW 20,254.7 and 18,935.4 were observed, which corresponded to either the (hemi-ligated) top or bottom strand ligation products.

[0203] Analytical gel electrophoresis of the chemical ligation products with a 15% TBE-8M Urea denaturing gel is shown in FIG. 2:

[0204] 1—Starting material—Fmoc-amino-PEG4-HP013

[0205] 2—dsTag ZA, which is an equimolar mixture of tagZA1_deltaC_5OH and TagZA1+_CNIm_bot3OH

[0206] 3, 4, 5—Cyanoimidazole ligation reactions

[0207] 6—Enzymatic ligation control (T4 DNA ligase), which ligates only the bottom strand (the junction between 3′ OH and 5′ phosphate); the junction between 3′ phosphate and 5′ OH is not ligated by this enzyme.

[0208] LCMS of the chemical ligation products is shown in FIG. 3. (In each panel: top, UV (260 nm) LC trace; middle, TIC; bottom, mass spectrum.)

[0209] A. Starting materials: mixture of double stranded TagZA (MW 5,182 and 6,500.2 Da) as well as Fmoc-amino-PEG4-HP013 (13,772).

[0210] B. Products of the chemical ligation reaction: doubly ligated: MW 25,417.3 (calc 25,415.3). Hemi-ligated (either top or bottom strand) products: MW 20,254.7 and 18,935.4.

Example 3

Fmoc Deprotection of the Chemical Ligation Reaction Products

[0211] Products of the 1-cyanoimidazole ligation reaction were ethanol precipitated, dissolved in water and deprotected by incubation in 10% piperidine for 2 hours at room temperature. Following this deprotection step, the material was purified on a 15% TBE-8M urea gel. The LC-MS performed on the purified sample confirmed the presence of the deprotected amino-PEG4-HP013-TagZA (MW 25,192.4, calc 25,193.2) as well as the two hemi-ligated deprotected products (MW 18,738.6 and 20,029.3).

[0212] Integration of the LC trace gives a relative yield of 64% for the full-length product and roughly 18% for each of the hemi-ligated products. The efficiency of the ligation per strand can be estimated at 83%.
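One way to arrive at a per-strand figure from these relative yields is to count, for each strand, every product in which that strand is ligated. The specification does not state how its estimate was derived, so the sketch below is only one plausible reading; it considers the three quantified products only (any unligated material is ignored), and the function name `per_strand_efficiency` is hypothetical.

```python
def per_strand_efficiency(full, hemi_top, hemi_bottom):
    """Fraction of quantified product in which each strand is ligated.

    The top strand is ligated in the full-length and top-only products;
    the bottom strand is ligated in the full-length and bottom-only products.
    """
    total = full + hemi_top + hemi_bottom
    top = (full + hemi_top) / total
    bottom = (full + hemi_bottom) / total
    return top, bottom


top, bottom = per_strand_efficiency(full=64.0, hemi_top=18.0, hemi_bottom=18.0)
print(f"top strand: {top:.0%}, bottom strand: {bottom:.0%}")
```

With the rounded inputs quoted above this reading gives about 82% per strand, close to the stated estimate; the small difference is within what "roughly 18%" allows.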

[0213] Schematics of the amino deprotection by piperidine are shown in FIG. 4A. Gel purification of the ligation reaction products: 15% TBE-Urea gel, UV shadowing is shown in FIG. 4B. LCMS analysis of the purified material is shown in FIG. 4C. Full-length ligation product at MW 25,192.4 Da, hemi-ligated products at MW 18,738.6 and 20,029.3 Da.

Example 4

Illustration of the Necessity for Amino-Group Protection with Fmoc

[0214] HP006, which features an amino-C6 linker at T in the loop, as described above, was incubated in the reaction mixture with 1-cyanoimidazole for 12 hours at 4° C. Following the incubation, HP006 was ethanol-precipitated, incubated for 2 hrs at room temperature in 10% piperidine and ethanol precipitated again.

[0215] LCMS analysis of this material demonstrated that there are two products in the mixture, the MW 6,333.4 Da HP006 and a MW 6,426.4 Da reaction product (30-40% conversion). The addition of 93 Da corresponds to the formation of an N-imidazole guanidine derivative of HP006. Fmoc protection of the amino group completely eliminates this unwanted reaction.
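The mass shift can be cross-checked against the reagent. The sketch below assumes that the guanidine derivative arises from net addition of one molecule of 1-cyanoimidazole (C4H3N3) to the amine with no loss of atoms; that mechanistic reading and the helper name `average_mass` are assumptions for illustration, and standard average atomic masses are used.

```python
# Standard average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007}

# 1-Cyanoimidazole, C4H3N3 (assumed to add intact to the amine).
CYANOIMIDAZOLE = {"C": 4, "H": 3, "N": 3}


def average_mass(formula):
    """Average molecular mass of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


observed_shift = round(6426.4 - 6333.4, 1)      # from the two reported masses
expected_shift = average_mass(CYANOIMIDAZOLE)   # about 93.09 Da

print(f"observed shift: {observed_shift:.1f} Da")
print(f"expected shift: {expected_shift:.2f} Da")
```

The observed difference between the two reported masses agrees with the mass of one 1-cyanoimidazole unit to within about 0.1 Da.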

[0216] A deconvoluted mass spectrum of the product of the reaction of HP006 with 1-cyanoimidazole is shown in FIG. 5A. MW 6,333.4 Da corresponds to unmodified HP006, MW 6,426.4 corresponds to an N-imidazole guanidine derivative of HP006.

[0217] A schematic of the generation of the N-imidazole guanidine derivative of HP006 is shown in FIG. 5B.

Example 5

Chemical Ligation with Alternative Divalent Metal Ions

[0218] Cyanoimidazole-mediated chemical ligation was performed as described above, substituting 8 mM of alternative divalent metal salts. Significant ligation yields were observed with CoCl.sub.2 (30% full-length product, 70% of hemi-ligated products), MnCl.sub.2 (75% full-length product, 25% of hemi-ligated products), and ZnCl.sub.2 (60% of full-length product with 30% of hemi-ligated products). Soluble divalent salts of lead, magnesium, tin and copper produced no significant ligation.

Example 6

Chemical Ligation with Alternative Flanking Nucleotides

[0219] The following chemically phosphorylated oligonucleotides were acquired from IDT DNA:

TABLE-US-00001
Top strand, pair 1:
PrA_top: SEQ ID NO: 6- 5′-(p)CAGCCAGGATAG-3′;
Tag_ZA1+: 5′-(p)CCATCAAGACCCAGAAAG-3′;
Top strand, pair 2:
PrA_top_extraC_3P: 5′-(p)CAGCCAGGATAGCp-3′;
tagZA1_deltaC_5OH: 5′-CATCAAGACCCAGAAAG-3′
(overlap sequences in bold)
Bottom strand, pair A:
PrA_CNIm_bot5P: 5′-pTGGCTGAGG-3′;
TagZB_CNIm_bot3OH: 5′-pTCTGGGTCTTGATGGCTATCC-3′
Bottom strand, pair B:
PrA_CNIm_bot5OH: 5′-TGGCTGAGG-3′;
TagZB_CNIm_bot3P: 5′-pTCTGGGTCTTGATGGCTATCCp-3′

[0220] Four combinations of oligonucleotides were tested for the efficiency of 1-cyanoimidazole ligation, as shown in Table 2. While bottom strands demonstrated consistently high yields of ligation with both 6- and 7-nucleotide overlaps (greater than 80%), and in both tested combinations of flanking nucleotides (C to C and C to T), the top strand ligations were apparently dependent on the identity of the flanking nucleotides, e.g. ligation of C to G was inefficient, whereas C to C junctions ligated at high yield.

TABLE-US-00002
TABLE 2. Summary of ligation junction designs and yields of chemical ligation

Reaction   Overlap length (nts)   Ligation junction (top strand)   Bottom strand ligation junction   Relative ligation conversion (top strand)
1-A        6                      C-3′ + 5′pG                      C-3′ + 5′pT                       20%
1-B        6                      C-3′ + 5′pG                      Cp-3′ + 5′-T                      25%
2-A        7                      Cp-3′ + 5′-C                     C-3′ + 5′pT                       90%
2-B        7                      Cp-3′ + 5′-C                     Cp-3′ + 5′-T                      95%

Other Embodiments

[0221] Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific desired embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the fields of medicine, pharmacology, or related fields are intended to be within the scope of the invention.

CPC classification: C12N15/1065; C40B50/10; C07H1/00; C12N15/1068; C07H21/02 (CHEMISTRY; METALLURGY)

International classification: C12N15/10 (CHEMISTRY; METALLURGY)
