METHOD FOR PROVIDING A DNA-ENCODED LIBRARY, DNA-ENCODED LIBRARY AND METHOD OF DECODING A DNA-ENCODED LIBRARY

Abstract

Disclosed are a method for providing a DNA-encoded library, the DNA-encoded library and a method of decoding a DNA-encoded library. Many different DNA molecules are synthesized which differ from each other in their DNA barcode sequences. Each DNA molecule is bonded to a specific substance, forming different DNA-substance conjugates. The DNA-encoded library has the advantage that, for example after an enrichment experiment performed with the library, the library may be decoded in a faster and less expensive manner than known DNA-encoded libraries.

Claims

1-15. (canceled)

16. A method for providing a DNA-encoded library, comprising a) synthesizing many different DNA molecules which differ from each other by comprising different DNA barcode sequences, wherein each DNA barcode sequence comprises at least a first coding region DNA sequence comprising at least a first part, a second part and a third part, wherein the second part is located between the first and third part and the second part differs between all the DNA molecules by at least two nucleotides; and b) bonding each of the many different DNA molecules to at least a specific substance forming different DNA-substance conjugates, wherein the DNA-substance conjugates differ from each other by the specific substance and by their DNA molecules; wherein the first part and the third part encode information regarding the second part of the first coding region, wherein a certain first part and/or a certain third part uniquely codes for a certain group of DNA-substance conjugates which is smaller than the group of all DNA-substance conjugates in the DNA-encoded library.

17. The method according to claim 16, wherein i) the first coding region DNA sequence comprises at least a fourth part, wherein the second part is located between the fourth and third part and wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the first coding region encode information about the second part of the first coding region; and ii) each barcode sequence comprises at least a second coding region DNA sequence comprising at least a first part, a second part, a third part, and a fourth part, wherein the second part is located between the fourth and third part and the second part differs between all the DNA molecules by at least two nucleotides, wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the second coding region encode information about the second part of the second coding region; wherein a certain combination of a first part and fourth part in a certain coding region uniquely codes for a certain group of DNA-substance conjugates which is smaller than the group of all DNA-substance conjugates which is encoded by the first part alone.

18. The method according to claim 16, wherein i) each barcode sequence comprises at least a second coding region DNA sequence comprising at least a first part, a second part, a third part, and a fourth part, wherein the second part is located between the fourth and third part and the second part differs between all the DNA molecules by at least two nucleotides, wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the second coding region encode information about the second part of the second coding region; and ii) each barcode sequence comprises at least a third coding region DNA sequence comprising at least a first part, a second part, a third part, and a fourth part, wherein the second part is located between the fourth and third part and the second part differs between all the DNA molecules by at least two nucleotides, wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the third coding region encode information about the second part of the third coding region; wherein a certain combination of a first part and fourth part in a certain coding region uniquely codes for a certain group of DNA-substance conjugates which is smaller than the group of DNA-substance conjugates which is encoded by the first part.

19. The method according to claim 16, wherein at least one coding region DNA sequence comprises at least a first part, a second part, a third part, a fourth part and a fifth part, wherein the second part is located between the fourth and fifth part and the second part differs between all the DNA molecules by at least two nucleotides, wherein the combination of the first part and the fourth part and the combination of the fifth part and the third part of the coding region encode information about the second part of the coding region, preferably of all coding regions, wherein a certain combination of a first part and fourth part uniquely codes for a certain group of DNA-substance conjugates which is smaller than the group of DNA-substance conjugates which is encoded by the first part alone, and wherein a certain combination of a fifth part and third part uniquely codes for a certain group of DNA-substance conjugates which is smaller than the group of DNA-substance conjugates which is encoded by the third part alone.

20. A DNA-encoded library, comprising many different DNA-ligand conjugates, wherein the DNA-ligand conjugates differ from each other by their ligand and by their DNA molecules, wherein the DNA molecules of the DNA-ligand conjugates differ from each other by comprising different DNA barcode sequences, wherein each DNA barcode sequence comprises at least a first coding region DNA sequence comprising at least a first part, a second part and a third part, wherein the second part is located between the first and third part and the second part differs between all the DNA molecules by at least two nucleotides; wherein the first part and the third part encode information regarding the second part of the first coding region, wherein a certain first part and/or a certain third part uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of all DNA-ligand conjugates in the DNA-encoded library.

21. The DNA-encoded library according to claim 20, wherein i) the first coding region DNA sequence comprises at least a fourth part, wherein the second part is located between the fourth and third part and wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the first coding region encode information about the second part of the first coding region; and ii) each barcode sequence comprises at least a second coding region DNA sequence comprising at least a first part, a second part, a third part, and a fourth part, wherein the second part is located between the fourth and third part and the second part differs between all the DNA molecules by at least two nucleotides, wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the second coding region encode information about the second part of the second coding region; wherein a certain combination of a first part and fourth part in a certain coding region uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of all DNA-ligand conjugates which is encoded by the first part alone.

22. The DNA-encoded library according to claim 21, wherein each barcode sequence comprises at least a third coding region DNA sequence, which is on the same DNA strand as the second coding region, comprising at least a first part, a second part, a third part, and a fourth part, wherein the second part is located between the fourth and third part and the second part differs between all the DNA molecules by at least two nucleotides, wherein both the combination of the first part and the fourth part and the combination of the first part and the third part of the third coding region encode information about the second part of the third coding region, wherein a certain combination of a first part and fourth part in the second coding region and in the third coding region uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of DNA-ligand conjugates which is encoded by the first part alone.

23. The DNA-encoded library according to claim 20, wherein at least one coding region DNA sequence comprises at least a first part, a second part, a third part, a fourth part and a fifth part, wherein the second part is located between the fourth and fifth part and the second part differs between all the DNA molecules by at least two nucleotides, wherein the combination of the first part and the fourth part and the combination of the fifth part and the third part of the coding region encode information about the second part of the coding region, and wherein a certain combination of a first part and fourth part uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of DNA-ligand conjugates which is encoded by the first part alone and wherein a certain combination of a fifth part and third part uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of DNA-ligand conjugates which is encoded by the third part alone.

24. A method of decoding a DNA-encoded library according to claim 20, comprising a) performing a qPCR with the DNA-encoded library, wherein the following primers are utilized: a primer A and a primer B for amplifying the first coding region of every DNA-ligand conjugate; and many different primers A-xN which anneal to the different first parts of the first coding region and many different primers B-yN which anneal to the different third parts of the first coding region, wherein primer A-xN has the same length as the coding region primer A by shortening x nucleotides at its 5'-end, primer B-yN has the same length as the coding region primer B by shortening y nucleotides at its 5'-end, N represents an A, T, G or C and x and y represent the total number of any one of A, T, G or C at the 3'-end of the primers, wherein x is an integer from 2 to 6; b) calculating a mathematical product of the signal value of each primer A-xN and each primer B-yN by the following equations:
Value(A-xN)_i = signal value[(A-xN)_i + B] · signal value[(A-xN)_i + (B-yN)_i];
and
Value(B-yN)_i = signal value[(B-yN)_i + A] · signal value[(B-yN)_i + (A-xN)_i], wherein i is an integer and defines a specific primer, and the +-sign indicates a combination of two primers; wherein signal value is the percentage of abundance related to the whole set of qPCR quantification using different primers annealed to the same region; and c) comparing the obtained mathematical products for each of the primers (A-xN)_i and (B-yN)_i, wherein those primers with high values code for DNA-ligand conjugates which are present at a high concentration in the DNA-encoded library.

25. The method according to claim 24, wherein the method comprises i) calculating a mathematical product of the value obtained for each primer A-xN and each primer B-yN by the following equation
Value(AB)_i = Value(A-xN)_i · Value(B-yN)_i;
ii) comparing the obtained mathematical products for each of the combinations of primers (A-xN)_i and (B-yN)_i, wherein those primer combinations with high values code for DNA-ligand conjugates which are present at a high concentration in the DNA-encoded library.

26. The method according to claim 24, wherein the qPCR is performed with a DNA-encoded library as a template, wherein the DNA-encoded library comprises many different DNA-ligand conjugates, wherein the DNA-ligand conjugates differ from each other by their ligand and by their DNA molecules, wherein the DNA molecules of the DNA-ligand conjugates differ from each other by comprising different DNA barcode sequences, wherein each DNA barcode sequence comprises at least a first coding region DNA sequence comprising at least a first part, a second part and a third part, wherein the second part is located between the first and third part and the second part differs between all the DNA molecules by at least two nucleotides; wherein the first part and the third part encode information regarding the second part of the first coding region, wherein a certain first part and/or a certain third part uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of all DNA-ligand conjugates in the DNA-encoded library; the method comprising: i) performing a qPCR with the following primers: a first coding region primer A and a first coding region primer B for amplifying the first coding region of every DNA-ligand conjugate; and many different primers A-xN which anneal to the different first parts, or first and fourth parts of the first coding region and many different primers B-yN which anneal to the different third parts of the first coding region, wherein A-xN has the same length as the coding region primer A by shortening x nucleotides at its 5'-end, B-yN has the same length as the coding region primer B by shortening y nucleotides at its 5'-end, N represents an A, T, G or C and x and y represent the total number of any one of A, T, G or C at the 3'-end of the primers, wherein x is an integer from 6 to 10, and y is an integer from 2 to 6; and a second coding region primer C and a second coding region primer D for amplifying the second coding region of every DNA-ligand conjugate; and many different primers D-yN which anneal to the different first parts, or first and fourth parts of the second coding region and many different primers C-xN which anneal to the different third parts of the second coding region, wherein primer C-xN has the same length as the coding region primer C by shortening x nucleotides at its 5'-end, primer D-yN has the same length as the coding region primer D by shortening y nucleotides at its 5'-end, N represents an A, T, G or C and x and y represent the total number of any one of A, T, G or C at the 3'-end of the primers, wherein x is an integer from 6 to 10, and y is an integer from 2 to 6; ii) calculating a mathematical product of the signal value of each primer A-xN, each primer B-yN, each primer C-xN and each primer D-yN by the following equations:
Value(A-xN)_i = signal value[(A-xN)_i + B] · signal value[(A-xN)_i + (B-yN)_i];
Value(B-yN)_i = signal value[(B-yN)_i + A] · signal value[(B-yN)_i + (A-xN)_i];
Value(C-xN)_i = signal value[(C-xN)_i + D] · signal value[(C-xN)_i + (D-yN)_i];
Value(D-yN)_i = signal value[(D-yN)_i + C] · signal value[(D-yN)_i + (C-xN)_i], wherein i is an integer and defines a specific primer, and the +-sign indicates a combination of two primers; wherein signal value is the percentage of abundance related to the whole set of qPCR quantification using different primers annealed to the same region; and iii) comparing the obtained mathematical products for each of the primers (A-xN)_i, (B-yN)_i, (C-xN)_i and (D-yN)_i, wherein those primers with high values code for DNA-ligand conjugates which are present at a high concentration in the DNA-encoded library.

27. The method according to claim 26, wherein the method comprises i) calculating a mathematical product of the value obtained for each primer A-xN and each primer B-yN, for each primer A-xN and each primer D-yN and for each primer C-xN and D-yN by the following equations
Value(AB)_i = Value(A-xN)_i · Value(B-yN)_i;
Value(AD)_i = Value(A-xN)_i · Value(D-yN)_i;
Value(CD)_i = Value(C-xN)_i · Value(D-yN)_i; ii) calculating the mathematical product of the values Value(AB)_i, Value(AD)_i and Value(CD)_i for each primer combination i by the following equation
Value^i = Value(AB)_i · Value(AD)_i · Value(CD)_i; iii) comparing the obtained mathematical products Value^i, wherein those primer combinations i with high values code for DNA-ligand conjugates which are present at a high concentration in the DNA-encoded library.

28. The method according to claim 24, wherein the qPCR is performed with a DNA-encoded library, wherein the DNA-encoded library comprises many different DNA-ligand conjugates, wherein the DNA-ligand conjugates differ from each other by their ligand and by their DNA molecules, wherein the DNA molecules of the DNA-ligand conjugates differ from each other by comprising different DNA barcode sequences, wherein each DNA barcode sequence comprises at least a first coding region DNA sequence comprising at least a first part, a second part and a third part, wherein the second part is located between the first and third part and the second part differs between all the DNA molecules by at least two nucleotides; wherein the first part and the third part encode information regarding the second part of the first coding region, wherein a certain first part and/or a certain third part uniquely codes for a certain group of DNA-ligand conjugates which is smaller than the group of all DNA-ligand conjugates in the DNA-encoded library; the method comprising: i) performing a qPCR with the following primers: a first coding region primer A and a first coding region primer B for amplifying the first coding region of every DNA-ligand conjugate; and many different primers A-xN which anneal to the different first parts, or first and fourth parts of the first coding region, and many different primers B-yN which anneal to the different third parts of the first coding region, wherein A-xN has the same length as the coding region primer A by shortening x nucleotides at its 5'-end, B-yN has the same length as the coding region primer B by shortening y nucleotides at its 5'-end, N represents an A, T, G or C and x and y represent the total number of any one of A, T, G or C at the 3'-end of the primers, wherein x is an integer from 6 to 10, preferably 8, and y is an integer from 2 to 6, preferably 4; and a second coding region primer C and a second coding region primer D for amplifying the second coding region of every DNA-ligand conjugate; and many different primers D-yN which anneal to the different first parts, or first and fourth parts of the second coding region and many different primers C-xN which anneal to the different third parts of the second coding region, wherein primer C-xN has the same length as the coding region primer C by shortening x nucleotides at its 5'-end, primer D-yN has the same length as the coding region primer D by shortening y nucleotides at its 5'-end, N represents an A, T, G or C and x and y represent the total number of any one of A, T, G or C at the 3'-end of the primers, wherein x is an integer from 6 to 10, and y is an integer from 2 to 6; a third coding region primer E and a third coding region primer F for amplifying the third coding region of every DNA-ligand conjugate; and many different primers E-xN which anneal to the different first parts of the third coding region and many different primers F-yN which anneal to the different third parts of the third coding region, wherein primer E-xN has the same length as the coding region primer E by shortening x nucleotides at its 5'-end, primer F-yN has the same length as the coding region primer F by shortening y nucleotides at its 5'-end, N represents an A, T, G or C and x and y represent the total number of any one of A, T, G or C at the 3'-end of the primers, wherein x is an integer from 6 to 10, and y is an integer from 2 to 6; ii) calculating a mathematical product of the signal value of each primer A-xN, each primer B-yN, each primer C-xN, each primer D-yN, each primer E-xN and each primer F-yN by the following equations:
Value(A-xN)_i = signal value[(A-xN)_i + B] · signal value[(A-xN)_i + (B-yN)_i];
Value(B-yN)_i = signal value[(B-yN)_i + A] · signal value[(B-yN)_i + (A-xN)_i];
Value(C-xN)_i = signal value[(C-xN)_i + D] · signal value[(C-xN)_i + (D-yN)_i];
Value(D-yN)_i = signal value[(D-yN)_i + C] · signal value[(D-yN)_i + (C-xN)_i];
Value(E-xN)_i = signal value[(E-xN)_i + F] · signal value[(E-xN)_i + (F-yN)_i];
Value(F-yN)_i = signal value[(F-yN)_i + E] · signal value[(F-yN)_i + (E-xN)_i], wherein i is an integer and defines a specific primer, and the +-sign indicates a combination of two primers; wherein signal value is the percentage of abundance related to the whole set of qPCR quantification using different primers annealed to the same region; and iii) comparing the obtained mathematical products for each of the primers (A-xN)_i, (B-yN)_i, (C-xN)_i, (D-yN)_i, (E-xN)_i and (F-yN)_i, wherein those primers with high values code for DNA-ligand conjugates which are present at a high concentration in the DNA-encoded library.

29. The method according to claim 28, wherein the method comprises i) calculating a mathematical product of the value obtained for each primer A-xN and each primer B-yN, for each primer A-xN and each primer D-yN, for each primer C-xN and D-yN, for each primer A-xN and F-yN, for each primer E-xN and D-yN and for each primer E-xN and F-yN by the following equations
Value(AB)_i = Value(A-xN)_i · Value(B-yN)_i;
Value(AD)_i = Value(A-xN)_i · Value(D-yN)_i;
Value(CD)_i = Value(C-xN)_i · Value(D-yN)_i;
Value(AF)_i = Value(A-xN)_i · Value(F-yN)_i;
Value(ED)_i = Value(E-xN)_i · Value(D-yN)_i;
Value(EF)_i = Value(E-xN)_i · Value(F-yN)_i; ii) calculating the mathematical product of the values Value(AB)_i, Value(AD)_i, Value(CD)_i, Value(AF)_i, Value(ED)_i and Value(EF)_i for each primer combination i by the following equation
Value^i = Value(AB)_i · Value(AD)_i · Value(CD)_i · Value(AF)_i · Value(ED)_i · Value(EF)_i; iii) comparing the obtained mathematical products Value^i, wherein those primer combinations i with high values code for DNA-ligand conjugates which are present at a high concentration in the DNA-encoded library.

30. The method according to claim 29, wherein the method further comprises calculating a Value^i by the following calculation:
Value^i = log_10[Value(AB)_i · Value(AD)_i · Value(CD)_i · Value(AF)_i · Value(ED)_i · Value(EF)_i].

Description

[0087] With reference to the following Figures and Examples, the subject according to the invention is intended to be explained in more detail without wishing to restrict said subject to the special embodiments shown here.

[0088] FIG. 1A shows how the coding algorithm works for generating a qPCR matrix for DNA codes 1 having one single coding region 2 (coding region III). E and F are primary primers and E_xe and F_xf are secondary primers. Primary primer E binds upstream (i.e. towards the 5'-end) of the first part #1 and primary primer F binds upstream (i.e. towards the 5'-end) of the third part #3. A qPCR with only the two primary primers E, F amplifies the DNA barcodes of all DNA-substance conjugates of the DNA-encoded library having coding region III. A qPCR with at least one primary primer E, F and at least one secondary primer E_xe, F_xf is called a primary qPCR. FIG. 1A illustrates a qPCR template containing one single coding region III having the three code parts (sub-codes) #1, #2, #3. The sequence of the second part #2 of the coding region III is a unique sub-code. Each combination of the first part #1 and the third part #3 can also represent a unique code. Therefore, a sequence of the second part #2 corresponds to a combination of the first part #1 and the third part #3. For each code part (sub-code) #1, #2, #3, there is a minimal difference number n between any pair of sequences (e.g. between two different xe sequences), where n should be 2. This means that any two sequences of the same code part #1, #2, #3 differ from each other by at least two nucleotides.
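The minimal-difference requirement (n = 2) can be checked mechanically for a candidate set of sub-code sequences. The following is a minimal sketch only: the function names and the four example sequences are hypothetical and are not taken from the disclosure, and it assumes that all sequences of one code part have equal length, so that the difference between two sequences is the number of mismatching positions.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(seqs):
    """Smallest difference found between any pair of sequences."""
    return min(hamming(a, b) for a, b in combinations(seqs, 2))


# Hypothetical sequences for one code part (illustration only).
sub_codes = ["AACC", "AAGG", "CCAA", "GGTT"]

# The set is acceptable if every pair differs by at least n = 2 nucleotides.
n = 2
is_valid = min_pairwise_distance(sub_codes) >= n
```

A set that fails this check would contain two sequences which a primer with a single mismatch could not discriminate reliably.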

[0089] FIG. 1B shows how the coding algorithm works for generating a qPCR matrix for DNA codes having two coding regions, namely coding region I and coding region II. A, B, C and D are primary primers, A_xa, B_xb, C_xc and D_xd are secondary primers and A_xaya and D_xdyd are tertiary primers. A qPCR comprising the use of at least two tertiary primers is called a tertiary PCR.

[0090] FIG. 1B illustrates a qPCR template containing two different coding regions I, II, wherein the first coding region I has four code parts (sub-codes) #1, #2, #3, #4 and the second coding region II also has four code parts (sub-codes) #1, #2, #3, #4. The sequence of the second part #2 of each coding region I, II represents a unique sub-code. Each combination of the first part #1 and the third part #3 of each coding region I, II can also represent a unique sub-code of each coding region I, II. In this case, a sequence of the second part #2 of each coding region I, II corresponds to a combination of the sequence of the first part #1 and the sequence of the third part #3 of each coding region I, II. Each combination of the first part #1 and the fourth part #4 can also represent one unique building block. In this case, a sequence of the second part #2 of each coding region I, II also corresponds to a combination of the sequence of the first part #1 and the sequence of the fourth part #4 of each coding region I, II. For each code part (sub-code) #1, #2, #3, #4, there is a minimal difference number n between any pair of sequences (e.g. between two different #2 sequences), where n should be 2. This means that any two sequences of the same code part #1, #2, #3, #4 differ from each other by at least two nucleotides.

[0091] FIG. 2 shows how the coding algorithm works for generating a qPCR matrix for DNA codes having three coding regions I, II, III. A, B, C and D are each a primary primer. A_xa, B_xb, C_xc, D_xd, M_xm and N_xn are each a secondary primer. A_xaya, D_xdyd, M_xmym and N_xnyn are each a tertiary primer. A qPCR using at least two tertiary primers is called a tertiary PCR. FIG. 2 illustrates a qPCR template containing three different coding regions I, II, III, wherein the first coding region I has four code parts (sub-codes) #1, #2, #3, #4, the second coding region II also has four code parts (sub-codes) #1, #2, #3, #4 and the third coding region III has five code parts (sub-codes) #1, #2, #3, #4, #5. The sequence of each second code part #2 of each coding region I, II, III is a unique sub-code. Each combination of code part #1 with code part #3, code part #1 with code part #4 and code part #1 with code part #5 can also represent a unique sub-code. For example, a sequence of code part #2 corresponds to a combination of code part #1 and code part #3. For each code part (sub-code), there is a minimal difference number n between any pair of sequences (e.g. between two different ab sequences), where n should be 2. This means that any two sequences of the same code part #1, #2, #3, #4, #5 differ from each other by at least two nucleotides.

[0092] FIGS. 3A, 3B and 3C show three different qPCR matrices which were obtained after a qPCR using a DNA barcode having three coding regions I, II, III as template and 20 different primary primers A and 20 different primary primers B for binding to coding region I (see matrix Q-PCR with A+B in FIG. 3A, columns=different primers A, rows=different primers B), 20 different primary primers C and 20 different primary primers D for coding region II (see matrix Q-PCR with C+D in FIG. 3B, columns=different primers C, rows=different primers D) and 20 different primary primers E and 20 different primary primers F for coding region III (see matrix Q-PCR with E+F in FIG. 3C, columns=different primers E, rows=different primers F). An exemplary result of the matrix is illustrated in the table E+F in FIG. 3D, in which the primer pairs with the strongest amplification signal are listed together with their obtained (normalized) amplification signal. It can be derived from said table that strong amplification signals have been obtained with the primer pairs E3 and F3 (25%), E18 and F11 (20%), E3 and F11 (15%), E11 and F3 (15%) and E11 and F17 (15%) and a medium amplification signal has been obtained with the primer pair E3 and F17 (10%). Below the table E+F in FIG. 3D, the obtained result is also shown in a column diagram. It can be derived from the obtained result that DNA-substance conjugates to which e.g. the primer pair E3 and F3 binds had a high concentration in the DNA-encoded library (after the enrichment experiment) and DNA-substance conjugates to which e.g. the primer pair E3 and F17 binds had a lower concentration in the DNA-encoded library (after the enrichment experiment). It may also be concluded that DNA-substance conjugates to which primer pairs with no signal (e.g. E1 and F1) bind were not present in the DNA-encoded library (after the enrichment experiment). Since the substance connected to each specific DNA coding region is known, this approach allows a fast and sensitive identification of substances being present at a high concentration after a (DEL) selection experiment.
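The normalization and ranking of such a matrix can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used to produce FIG. 3D: the raw signal values are hypothetical arbitrary units chosen so that the normalized percentages match those quoted above, and the function names are invented for this sketch.

```python
def normalize(signals):
    """Express each primer-pair signal as a percentage of the total signal."""
    total = sum(signals.values())
    if total == 0:
        raise ValueError("no amplification signal in the matrix")
    return {pair: 100.0 * value / total for pair, value in signals.items()}


def rank(signals):
    """Primer pairs sorted by normalized signal, strongest first."""
    return sorted(normalize(signals).items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw signals (arbitrary units) for matrix E+F.
raw_ef = {
    ("E3", "F3"): 50,
    ("E18", "F11"): 40,
    ("E3", "F11"): 30,
    ("E11", "F3"): 30,
    ("E11", "F17"): 30,
    ("E3", "F17"): 20,
    ("E1", "F1"): 0,  # no signal: conjugate not detected
}

ranked_ef = rank(raw_ef)
for (e, f), percent in ranked_ef:
    print(f"{e} and {f}: {percent:.0f}%")
```

Pairs with equal signal keep the order in which they were entered, because Python's sort is stable.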

[0093] FIG. 4 shows the results of the qPCR matrices Q-PCR with A+B (amplification of coding region I) and Q-PCR with C+D (amplification of coding region II) from FIG. 3 and also the results of a secondary PCR with primer pairs A and D (coding region between I and II; see FIG. 2). qPCR with the primer pair A+D gave strong amplification signals for the specific primer pairs A11 and D15 (20%), A11 and D2 (15%), A2 and D18 (15%) and medium amplification signals for the primer pairs A2 and D8 (10%), A11 and D18 (10%), A17 and D18 (10%), A17 and D15 (10%) and A17 and D8 (10%). Each of said identified A primers binds to a specific coding region I and each of said identified D primers binds to a specific coding region II. This means that the primers A and D which gave strong signals code for coding regions I and II which must have been enriched in the DNA-encoded library after the enrichment experiment. It can also be concluded that the two coding regions I and II must be located on one single DNA strand because otherwise, no amplification signal would have been obtained. In order to combine the result obtained with the primer pair A and D with the result of the other primer pairs A and B and C and D, the mathematical product of the value obtained for each specific primer pair is calculated by the equation Value_{coding region I-II} = Value_{matrix-A+D} · Value_{matrix-A+B} · Value_{matrix-C+D}. Specific primers A, B, C and D which resulted in a high amplification signal consequently have a high Value_{coding region I-II}. Thus, the obtained Value_{coding region I-II} allows the identification of primers which must have bound to abundant DNA barcodes and thus allows the identification of substances (bound to the DNA barcode) which were abundantly present after the (DEL) selection experiment.
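The product of the three matrix values can be sketched as follows. This is a minimal illustration, not the implementation used for the Figures: the three A+D entries are the percentages quoted above, whereas the A+B and C+D entries are hypothetical placeholders, since their numerical values are not stated in this paragraph. The function name and the dictionary layout are likewise assumptions.

```python
def score_all(ab, cd, ad):
    """Value for coding region I-II = Value(A+D) * Value(A+B) * Value(C+D).

    Each argument maps a primer pair to its normalized signal (percent).
    A pair missing from a matrix is treated as signal 0, so any
    combination lacking an A+D signal drops out of the result.
    """
    scores = {}
    for (a, b), v_ab in ab.items():
        for (c, d), v_cd in cd.items():
            value = ad.get((a, d), 0.0) * v_ab * v_cd
            if value > 0:
                scores[f"{a}{b}-{c}{d}"] = value
    return scores


# A+D values as quoted in the text (percent).
matrix_ad = {("A11", "D15"): 20.0, ("A11", "D2"): 15.0, ("A2", "D18"): 15.0}
# Hypothetical A+B and C+D values (percent), for illustration only.
matrix_ab = {("A11", "B4"): 30.0, ("A2", "B4"): 10.0}
matrix_cd = {("C19", "D15"): 25.0, ("C10", "D18"): 10.0}

scores = score_all(matrix_ab, matrix_cd, matrix_ad)
best = max(scores, key=scores.get)
```

Because the three factors are multiplied, a combination scores highly only if all three qPCR matrices support it; a single missing signal sets the product to zero.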

[0094] FIG. 5 shows a plot of 36 different combinations of coding regions I (A+B) and II (C+D) which gave the highest mathematical product according to the equation Value_{coding region I-II} = Value_{matrix-A+D} · Value_{matrix-A+B} · Value_{matrix-C+D} (see absolute value in arbitrary units on the y-axis). In said plot, it can be visually identified that the combinations of coding region I and II with the numbers 2, 4, 23, 29 and 32 on the x-axis achieved the highest score. These numbers refer to the following five different combinations of coding regions I and II: A11B4-C19D2 (no. 2), A11B17-C19D2 (no. 4), A2B4-C10D18 (no. 23), A11B4-C19D15 (no. 29) and A11B17-C19D15 (no. 32). Naturally, it is known for which substances (or plurality of substances) these five different combinations encode. Thus, it is possible to identify five different (groups of) substances which have been strongly enriched in a (DEL) selection experiment.

[0095] FIG. 6 shows a part of the qPCR matrices obtained after a qPCR using a DNA barcode having three coding regions I, II, III as template, 20 different primary primers A and B for coding region I, 20 different primary primers C and D for coding region II, the (same) 20 different primary primers A and D for coding region I to II and 20 different primary primers E and F for coding region III. After calculating $\mathrm{Value}_{\text{coding region I-II}} = \mathrm{Value}_{\text{matrix-A+D}} \cdot \mathrm{Value}_{\text{matrix-A+B}} \cdot \mathrm{Value}_{\text{matrix-C+D}}$, it became clear that significant values are obtained for coding regions I-II coded by the nine primer pairings A17B17-C1D15, A2B4-C10D18, A11B4-C1D15, A11B4-C19D15, A11B4-C19D2, A11B17-C1D15, A11B17-C19D15, A11B17-C10D18 and A11B17-C19D2. The highest value for coding region III was determined as well, by the equation $\mathrm{Value}_{\text{coding region III}} = \mathrm{Value}_{\text{matrix-F+E}}$, and it was found that high values for coding region III are obtained by the five primer pairings F3E3, F11E18, F3E11, F11E3 and F17E11. If the nine coding regions I to II identified above encode a first group of nine different substances and the five coding regions III identified above encode a second group of five different substances, it follows that the combination of the substances of the first group and second group must have been present at a high concentration before the qPCR experiment, i.e. must have been strongly enriched by the (DEL) selection experiment.

[0096] FIG. 7 shows a decoding process for a medium-sized DEL having 306 compounds, each tagged with a DNA barcode. Before and after a DEL selection experiment, a primary qPCR was conducted with the primer pair E and F, with the primer pairs $E_{xe}$ and F, with the primer pairs E and $F_{xf}$, with the primer pair $E_{xe1}$ and $F_{xf11}$ and with the primer pair $E_{xe3}$ and $F_{xf17}$. The $C_q$ values obtained before selection are shown in the left matrix, the $C_q$ values obtained after selection are shown in the middle matrix and the $\Delta C_q$ values are shown in the right matrix in FIG. 7. A $\Delta C_q$ value of a primer pair which is below the $\Delta C_q$ value of the primer pair E and F indicates an enrichment of the DNA-substance conjugate. As can be seen in the $\Delta C_q$ matrix, subcoding region E-$F_{xf11}$ has a $\Delta C_q$ value of 10.0, which is below the $\Delta C_q$ value of 13.1 for subcoding region E-F (i.e. below the control). This means that the subcoding region E-$F_{xf11}$ has been enriched. The same is true for the subcoding region $E_{xe1}$-F, with its $\Delta C_q$ value of 10.0 being below the $\Delta C_q$ value of 13.1 for subcoding region E-F (i.e. below the control). Thus, the results of the primary qPCR indicate that after the DEL selection experiment, the substance(s) encoded by the subcoding regions to which primers $E_{xe1}$ and $F_{xf11}$ bind were enriched more strongly than the substances encoded by the subcoding regions to which primers $E_{xe3}$ and $F_{xf17}$ bind. Additionally, to confirm said data, a secondary qPCR was conducted with the primer pairs $E_{xe}$ and $F_{xf}$. Said secondary qPCR confirmed that the subcoding region $E_{xe1}$-$F_{xf11}$ is enriched more strongly than the subcoding region $E_{xe3}$-$F_{xf17}$ (see the $\Delta C_q$ matrix in FIG. 7: the value in the field of column $E_{xe1}$ and row $F_{xf11}$ is much lower than the value in the field of column $E_{xe3}$ and row $F_{xf17}$ and much lower than the value in the field of column E and row F, which is the control). In summary, both the primary and the secondary qPCR demonstrate that the substance(s) connected to the $E_{xe1}$-$F_{xf11}$ subcoding region must have been strongly enriched after the DEL selection experiment.
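The comparison against the E+F control described for FIG. 7 can be sketched in a few lines. This is only an illustration: the function name `enriched_pairs` and the dictionary layout are hypothetical, and the only numbers used are the two values quoted in the text (10.0 for the sub-library pairs, 13.1 for the control).

```python
# Hypothetical helper (not part of the disclosure): flags primer pairs whose
# value lies below the value of the control pair, as described for FIG. 7.
def enriched_pairs(values, control_pair):
    """Return {pair: margin below control} for every pair below the control."""
    control = values[control_pair]
    return {
        pair: control - value
        for pair, value in values.items()
        if pair != control_pair and value < control
    }


# Values quoted in the text for FIG. 7; primer names are written as plain strings.
values = {
    ("E", "F"): 13.1,        # control: amplifies the whole library
    ("E", "F_xf11"): 10.0,   # sub-library covered by F_xf11
    ("E_xe1", "F"): 10.0,    # sub-library covered by E_xe1
}

hits = enriched_pairs(values, ("E", "F"))
for pair, margin in hits.items():
    print(pair, "is below the control by", round(margin, 1), "cycles")
```

Both sub-library pairs fall below the control, which is the condition the text treats as evidence of enrichment.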

[0097] FIG. 8 shows an example of a large DEL having $4^{10}$ compounds, each tagged with a DNA barcode. The libraries were generated by partially degenerate synthesis of DNA. FIG. 8 illustrates the setup for conducting a primary PCR with one (constant) primer E and various different primers $F_n$, wherein each primer $F_n$ codes for a certain subgroup of the library, specifically $(1/4)^n$ of the compounds of the library (having $4^{10}$ compounds in total), wherein n is an integer from 0 to 5. This means that if six primers F are used, the first primer $F_0$ codes for $(1/4)^0$ of all compounds of the library, i.e. all compounds of the library ($4^{10}$ = 1048576 compounds), the primer $F_1$ codes for only 1/4 of all compounds of the library (= 262144 compounds), the primer $F_2$ codes for only 1/16 of all compounds of the library (= 65536 compounds), the primer $F_3$ codes for only 1/64 of all compounds of the library (= 16384 compounds), the primer $F_4$ codes for only 1/256 of all compounds of the library (= 4096 compounds) and the primer $F_5$ codes for only 1/1024 of all compounds of the library (= 1024 compounds). This means that after the qPCR has been performed, the group of encoded substances which have been selected in the DEL experiment can be significantly narrowed, because encoded substances which are not amplifiable with a primer $F_n$, wherein n is 1 to 5, give no signal in qPCR. For example, if the combination of primer E and primer $F_1$ fails to provide a signal in the primary qPCR, it is clear that 3/4 of all compounds, i.e. 786432 of 1048576 compounds, are not amplifiable by said qPCR and thus were not enriched by the DEL selection experiment preceding the primary qPCR. Thus, only 1/4 of all compounds (= 262144 compounds) of the DEL remain as candidates for having been enriched in the (DEL) selection experiment.
FIG. 9 shows another example of a very large DEL having $4^{20}$ compounds, each tagged with a DNA barcode. The principal procedure is the same as the one disclosed in FIG. 8 for a DEL having $4^{10}$ compounds. However, due to the larger size of the DEL, it is beneficial if the primary PCR is carried out with more than five different primers F. Specifically, it is beneficial if n is an integer from 0 to 10 in this case. This means that if eleven primers F are used, the first primer $F_0$ codes for all compounds of the library and the last primer $F_{10}$ codes for only $(1/4)^{10}$ of all compounds of the library (= 1048576 compounds). This means that after the qPCR has been performed, the group of encoded substances which have been selected in the DEL experiment has been significantly narrowed. For example, a DNA barcode of a DNA-substance conjugate which gives no signal with the primer $F_1$ in qPCR belongs to a group of 3/4 of the $4^{20}$ compounds ($8.2 \cdot 10^{11}$ compounds) which have not been enriched in the DEL enrichment experiment. Thus, the group of relevant enriched DNA-substance conjugates has been narrowed to 1/4 of the $4^{20}$ compounds ($2.7 \cdot 10^{11}$ compounds). With each primer $F_n$, as n increases from 1 to 10, the group of relevant compounds is further narrowed. An amplification signal appearing with primer $F_{10}$ means that the DNA-substance conjugate is within a subgroup of $4^{10}$ compounds ($1 \cdot 10^{6}$ compounds) out of $4^{20}$ compounds ($1.1 \cdot 10^{12}$ compounds) in total.
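The subgroup sizes quoted for FIG. 8 and FIG. 9 follow from dividing the library by four for each step in n. A short sketch that reproduces the arithmetic, assuming (as the quoted numbers suggest) that primer $F_n$ covers $(1/4)^n$ of a library of $4^k$ compounds; the function name `subgroup_size` is hypothetical.

```python
# Hypothetical helper: number of compounds covered by primer F_n in a
# library of 4**library_exponent compounds, assuming each step in n
# divides the covered subgroup by four.
def subgroup_size(library_exponent, n):
    if not 0 <= n <= library_exponent:
        raise ValueError("n must lie between 0 and the library exponent")
    return 4 ** (library_exponent - n)


# FIG. 8: library of 4**10 compounds, primers F_0 .. F_5
sizes_fig8 = [subgroup_size(10, n) for n in range(6)]

# Compounds ruled out when primer F_1 is the deciding primer (3/4 of the library)
excluded_fig8 = 4 ** 10 - subgroup_size(10, 1)

# FIG. 9: library of 4**20 compounds, last primer F_10
last_subgroup_fig9 = subgroup_size(20, 10)

print(sizes_fig8)
print(excluded_fig8, last_subgroup_fig9)
```

The printed values can be compared directly against the compound counts given in the paragraph above.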

[0098] FIG. 10 shows the identification of the substance 4-carboxybenzenesulfonamide (in the following: CBS) after a DNA-CBS conjugate within a DNA-encoded library (DEL) has been enriched by selection with the enzyme carbonic anhydrase II. A small qPCR matrix was built using three primers 1a, 2a, 3a paired with the three primers 1b, 2b, 3b. The primer pair 1a, 1b anneals to all DNA barcodes of the DNA-substance conjugates of the library and thus has the potential to amplify the DNA barcodes of the entire library. Primer 2a covers a sub-library containing the DNA-CBS conjugate and primer 2b covers another sub-library containing the DNA-CBS conjugate. The combination of primers 2a and 2b can be assigned exclusively to CBS. Primer 3a covers a sub-library containing theobromine (in the following: Theo) conjugated to a DNA barcode (= DNA-Theo conjugate), and primer 3b covers another sub-library containing the DNA-Theo conjugate. The combination of primers 3a and 3b can be assigned exclusively to Theo. ΔCq is the difference in qPCR cycle number before and after selection; a small value reflects a large enrichment. ΔCq(1a-1b) > ΔCq(2a-1b) ≈ ΔCq(1a-2b) > ΔCq(2a-2b) indicates that CBS is markedly enriched. ΔCq(1a-1b) < ΔCq(3a-1b) ≈ ΔCq(1a-3b) = ΔCq(3a-3b) indicates that Theo is not enriched.

[0099] FIG. 11 shows a DNA-substance conjugate in which a first substance S is covalently conjugated to a first coding region DNA sequence I and a second coding region DNA sequence II, and in which a second substance S is covalently conjugated to a third coding region DNA sequence III. Each coding region DNA sequence I, II, III has a first part #1 and a third part #3 to which certain primers can bind (i.e. anneal during qPCR). Primer P2 (5'-gctgttccca cattgcgt-3', SEQ-ID Nr. 1) binds to the first part #1 of first coding region DNA sequence I, primer P2Y (5'-ccttctggat tcggtcggag caccatc-3', SEQ-ID Nr. 2) binds to the third part #3 of first coding region DNA sequence I, primer P2Y (5'-gatggtgctc cgaccgaatc cagaagg-3', SEQ-ID Nr. 3) binds to the first part #1 of second coding region DNA sequence II, primer P1Y (5'-ggaggtgtag acgacagagt atttgactgt cagg-3', SEQ-ID Nr. 4) binds to the third part #3 of second coding region DNA sequence II, primer P4 (5'-cagatcgagc aactccac-3', SEQ-ID Nr. 5) binds to the first part #1 of third coding region DNA sequence III and primer P5 (5'-tggtctcagc cgccctat-3', SEQ-ID Nr. 6) binds to the third part #3 of third coding region DNA sequence III. If substance S has been enriched after a selection experiment with the DNA-encoded library, amplification with primer pair P2 and P2Y, primer pair P2Y and P1Y and primer pair P4 and P5 each gives a strong amplification signal in qPCR, which allows identification of substance S.
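The two primers at the junction between coding regions I and II (SEQ-ID Nr. 2 and SEQ-ID Nr. 3) can be compared directly. The sketch below checks whether one sequence is the reverse complement of the other, which would be consistent with both primers addressing the same stretch of the barcode on opposite strands; that interpretation is an assumption and is not stated in the text. The helper name `reverse_complement` is hypothetical, and the sequences are copied from the paragraph above with the spacing removed.

```python
# Translation table for lower-case DNA bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a lower-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences as listed for FIG. 11 (grouping spaces removed).
SEQ_ID_2 = "ccttctggattcggtcggagcaccatc"  # third part #3 of coding region I
SEQ_ID_3 = "gatggtgctccgaccgaatccagaagg"  # first part #1 of coding region II

junction_primers_match = reverse_complement(SEQ_ID_2) == SEQ_ID_3
print("SEQ-ID Nr. 3 is the reverse complement of SEQ-ID Nr. 2:", junction_primers_match)
```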

EXAMPLE 1 - DEL COMPRISING DNA BARCODES WITH ONE SINGLE CODING REGION

[0100] For DNA codes containing only one single coding region, each code has 3 parts: #1 (first part), #2 (second part) and #3 (third part). Each #2 sequence is a unique code, and each combination of #1 and #3 can also represent a unique code (see e.g. FIG. 1A). Therefore, each #2 sequence corresponds to one combination of #1 and #3.

[0101] For each part, there is a minimal difference number n between any pair of sequences (e.g. between two different #1 sequences), where n should be at least 2.
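The minimal-difference requirement can be checked programmatically for a candidate set of part sequences. A minimal sketch, assuming all sequences of a given part have the same length and that differences are counted position by position (Hamming distance); the function names and the three example sequences are invented for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(sequences):
    """Smallest Hamming distance over all pairs in the set."""
    return min(hamming(a, b) for a, b in combinations(sequences, 2))


def satisfies_min_difference(sequences, n=2):
    """True if every pair of sequences differs in at least n positions."""
    return min_pairwise_distance(sequences) >= n


# Invented example sequences for one part (e.g. three different #1 sequences).
part_sequences = ["acgt", "aagg", "ccgg"]
print(min_pairwise_distance(part_sequences))
print(satisfies_min_difference(part_sequences, n=2))
```

A set that fails the check contains at least one pair of sequences that a single-nucleotide error could turn into each other.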

EXAMPLE 2 - DEL COMPRISING DNA BARCODES WITH TWO CODING REGIONS

[0102] For DNA codes containing two coding regions, each sub-code has 4 parts: the first coding region has #1 (first part), #2 (second part), #3 (third part) and #4 (fourth part), and the second coding region likewise has its own #1 (first part), #2 (second part), #3 (third part) and #4 (fourth part) (see e.g. FIG. 1B). Each #2 (second part) sequence is a unique sub-code, and within each coding region each combination of #1, #3 and #4 corresponds to the #2 of that coding region.

[0103] For each part, there is a minimal difference number n between any pair of sequences (e.g. between two different #1 sequences), where n should be at least 2.

EXAMPLE 3 - DEL COMPRISING DNA BARCODES WITH MORE THAN TWO CODING REGIONS

[0104] For DNA codes containing more than two coding regions (see e.g. FIG. 1B or FIG. 2), the two sub-codes at both ends are designed according to Example 2, and the sub-code(s) in between is/are designed according to either Example 1, Example 2 or Example 4. It is very unlikely that a high-quality DEL with many sub-codes can be synthesized using the split-and-pool method. Therefore, a DEL containing fewer than 4 sub-codes is favorable.

EXAMPLE 4 - DEL COMPRISING DNA BARCODES WITH 5 SUB-CODES

[0105] The DNA barcodes of this DEL have 5 parts, #1 (first part), #2 (second part), #3 (third part), #4 (fourth part) and #5 (fifth part).

[0106] Each #2 sequence (second part) is a unique sub-code, and each combination of #1 and #3 can also represent a unique sub-code; a #2 sequence therefore corresponds to a combination of #1 and #3. Likewise, each combination of #1 and #4 can represent a unique sub-code, so that a #2 sequence corresponds to a combination of #1 and #4, and each combination of #1 and #5 can represent a unique sub-code, so that a #2 sequence corresponds to a combination of #1 and #5.

[0107] For each part, there is a minimal difference number n between any pair of sequences (e.g. between two different #1 sequences), where n should be at least 2 in all designs.

EXAMPLE 5 - DESCRIPTION OF DECODING PROCESS: DECODING ONE-(SUB)-CODE

[0108] A primary qPCR matrix is built for the first coding region I using primer A with u different primers B-xb, and primer B with v different primers A-xa. The resulting matrix therefore has size $u \cdot v$ (see e.g. FIG. 3). The same kind of matrix can also be built for the second coding region II and the third coding region III.

[0109] A secondary qPCR matrix is built for the first coding region I using pairs of B-xb and A-xa, where B-xb and A-xa are chosen according to the signal intensity in the primary matrix. The same kind of secondary matrix can be built for the second coding region II and the third coding region III. The ranking of each building block can thus be derived.

[0110] For a sequence containing two sub-codes, an additional secondary qPCR matrix can be built using A-xa and D-xd, where A-xa and D-xd are chosen according to the signal intensity in the primary matrices.

[0111] In combination with the two sub-code matrices (A-xa+B-xb and C-xc+D-xd), the ranking of the combinations can be derived using a suitable algorithm, for example:

$$\mathrm{Value}^{i} = \mathrm{Value}^{i}_{\text{matrix-A+D}} \cdot \mathrm{Value}^{i}_{\text{matrix-A+B}} \cdot \mathrm{Value}^{i}_{\text{matrix-C+D}},$$

[0112] wherein $\mathrm{Value}^{i}$ is a value that relates to, and is proportional to, the amount of a certain DNA barcode in the DEL. In other words, $\mathrm{Value}^{i}$ relates to an individual DNA sequence (barcode structure) which resulted from the combinatorial synthesis through joining two building blocks and two sub-codes.
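The product rule above can be turned into a ranking over all combinations of coding regions I and II. The following is a minimal sketch only: the function name, the dictionary layout and all matrix values are invented for illustration, and treating a missing A+D entry as zero signal is an assumption rather than something stated in the text.

```python
# Hypothetical sketch of the product-based ranking: each barcode i is a
# combination (A, B, C, D); its score is the product of the three matrix values.
def rank_combinations(matrix_ab, matrix_cd, matrix_ad):
    """Return [((a, b, c, d), score), ...] sorted from highest to lowest score."""
    scores = {}
    for (a, b), value_ab in matrix_ab.items():
        for (c, d), value_cd in matrix_cd.items():
            # Assumption: an A+D pair with no measured signal contributes zero.
            value_ad = matrix_ad.get((a, d), 0.0)
            scores[(a, b, c, d)] = value_ad * value_ab * value_cd
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only (arbitrary units, not taken from the figures).
matrix_ab = {("A11", "B4"): 0.5, ("A2", "B4"): 0.2}
matrix_cd = {("C19", "D15"): 0.4, ("C10", "D18"): 0.3}
matrix_ad = {("A11", "D15"): 0.20, ("A2", "D18"): 0.15}

ranking = rank_combinations(matrix_ab, matrix_cd, matrix_ad)
for combination, score in ranking:
    print("-".join(combination), round(score, 4))
```

Because the three factors are multiplied, a combination only scores highly if its A+B pair, its C+D pair and the bridging A+D pair all gave a signal; a combination whose A and D primers never amplify together drops to zero.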

[0113] To further validate the $\mathrm{Value}^{i}$ ranking, an additional tertiary qPCR matrix can be built using A-xa-ya and D-xd-yd, where A-xa-ya and D-xd-yd are chosen according to the signal intensity in the primary and secondary matrices and the resulting $\mathrm{Value}^{i}$ ranking.

[0114] A full matrix can also be built using A, D and all A-xa-ya and D-xd-yd, though it will be significantly more expensive than the method described before.

[0115] The method cannot provide a fully quantitative decoding solution for a DEL containing more than two sub-codes. However, combining various primary, secondary and tertiary qPCR matrices can provide a $\mathrm{Value}^{i}$ for a certain compound i, which corresponds to a DNA code containing several sub-codes. All forward and backward primers can be combined to build a matrix.

[0116] For example, any primer A, A-xa, A-xa-ya can be combined with any primer B, B-xb, N, N-yn, N-xn-yn, D, D-xd, D-xd-yd to build qPCR matrices. A value for a particular compound can be calculated using a suitable algorithm, for example:

$$\mathrm{Value}^{i} = \log_{10}\left(\mathrm{Value}^{i}_{\text{matrix-A+D}} \cdot \mathrm{Value}^{i}_{\text{matrix-A+N}} \cdot \mathrm{Value}^{i}_{\text{matrix-M+D}} \cdot \mathrm{Value}^{i}_{\text{matrix-A+B}} \cdot \mathrm{Value}^{i}_{\text{matrix-C+D}} \cdot \mathrm{Value}^{i}_{\text{matrix-M+N}}\right)$$

[0117] in which $\mathrm{Value}^{i}_{\text{matrix-A+D}}$, $\mathrm{Value}^{i}_{\text{matrix-A+N}}$ and $\mathrm{Value}^{i}_{\text{matrix-M+D}}$ can be taken from the secondary matrices, from the tertiary matrices, or from a combination of them, and in which $\mathrm{Value}^{i}_{\text{matrix-A+B}}$, $\mathrm{Value}^{i}_{\text{matrix-C+D}}$ and $\mathrm{Value}^{i}_{\text{matrix-M+N}}$ are taken from the secondary matrices.
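The logarithmic score for a barcode with three sub-codes can be computed as sketched below. The function name and the example numbers are hypothetical, and returning negative infinity when any factor is zero or negative is an assumption made here so that the logarithm stays defined; the text does not say how such cases are handled.

```python
import math


# Hypothetical sketch: log10 of the product of the six matrix values
# (A+D, A+N, M+D, A+B, C+D, M+N) measured for one compound i.
def log_score(matrix_values):
    values = list(matrix_values)
    # Assumption: a missing or zero signal makes the whole product zero,
    # which is reported as -inf instead of raising a math domain error.
    if any(value <= 0 for value in values):
        return float("-inf")
    return math.log10(math.prod(values))


# Illustrative values only (arbitrary units), in the order
# A+D, A+N, M+D, A+B, C+D, M+N.
example_values = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
score = log_score(example_values)

# The same score written as a sum of logarithms of the individual factors.
score_as_sum = sum(math.log10(value) for value in example_values)
print(score, score_as_sum)
```

Taking the logarithm turns the six-factor product into a sum, which keeps scores in a manageable numeric range when the individual matrix values span several orders of magnitude.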

LIST OF REFERENCE SIGNS

[0118] DBC: DNA barcode sequence;
[0119] S: substance;
[0120] I: first coding region DNA sequence;
[0121] II: second coding region DNA sequence;
[0122] III: third coding region DNA sequence;
[0123] #1: first part of a coding region DNA sequence;
[0124] #2: second part of a coding region DNA sequence;
[0125] #3: third part of a coding region DNA sequence;
[0126] #4: fourth part of a coding region DNA sequence;
[0127] #5: fifth part of a coding region DNA sequence;
[0128] A, B, C, D, E, F, M, N: primary primer;
[0129] $A_{xa}$, $B_{xb}$, $C_{xc}$, $D_{xd}$, $E_{xe}$, $F_{xf}$, $M_{xm}$, $N_{yn}$: secondary primer;
[0130] $A_{xaya}$, $D_{xdyd}$, $M_{xmym}$, $N_{xnyn}$: tertiary primer;
[0131] 1a, 1b: primary primer binding to all DBC;
[0132] 2a, 2b: secondary primer binding to DBC of CBS only;
[0133] 3a, 3b: secondary primer binding to DBC of Theo only;
[0134] P2: primer annealing to first part #1 of coding region I;
[0135] P2Y: primer annealing to third part #3 of coding region I;
[0136] P2Y: primer annealing to first part #1 of coding region II;
[0137] P1Y: primer annealing to third part #3 of coding region II;
[0138] P4: primer annealing to first part #1 of coding region III;
[0139] P5: primer annealing to third part #3 of coding region III.

CPC classification (section C, CHEMISTRY; METALLURGY): C12N15/1065; C12Q2563/179; C40B50/10; C12N15/1093; C40B70/00; C12N15/1089; C40B40/08

International classification (section C, CHEMISTRY; METALLURGY): C12N15/10; C40B50/10; C40B70/00; C40B40/08
