Method for Constructing Antibody Complementarity Determining Region Library
20230027822 · 2023-01-26
Inventors
- Xia Sheng (Nanjing, Jiangsu, CN)
- Cheng-Hsien Wu (Nanjing, Jiangsu, CN)
- Yikai Qiu (Nanjing, Jiangsu, CN)
Cpc classification
- C40B40/10 (CHEMISTRY; METALLURGY)
- C07K16/22 (CHEMISTRY; METALLURGY)
- G01N33/53 (PHYSICS)
- G01N33/15 (PHYSICS)
- C12N15/1068 (CHEMISTRY; METALLURGY)
- C12N15/70 (CHEMISTRY; METALLURGY)
- C12N15/1089 (CHEMISTRY; METALLURGY)
- C07K16/00 (CHEMISTRY; METALLURGY)
International classification
- C12N15/10 (CHEMISTRY; METALLURGY)
- C07K16/22 (CHEMISTRY; METALLURGY)
- C12N15/70 (CHEMISTRY; METALLURGY)
Abstract
Disclosed are a method and a device for constructing an antibody complementarity determining region (CDR) library. Also disclosed are a method, a device and a computer program product for determining the occurrence frequency of member sequences of an antibody CDR library, by means of which an antibody CDR library with a specific amino acid distribution at one or more positions can be obtained.
Claims
1. A method for generating a primary CDR amino acid sequence library, comprising the steps of: (1) determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences; (2) determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence; and (3) randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein (3.1) when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein (3.1.1) when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or (3.1.2) when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or (3.2) when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences.
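The selection loop of step (3) can be illustrated with a minimal Python sketch. The function name `build_primary_library` and its parameters are hypothetical. The sketch assumes that an accepted sequence stays in the working set and may be drawn again, and that a sequence drawn under (3.2) is added without a quota check; the claim does not state either point explicitly.

```python
import random
from collections import Counter


def build_primary_library(candidates, allowed_counts, capacity, seed=0):
    """Fill a library of `capacity` sequences under per-position residue quotas.

    candidates:     initial set of alternative sequences (equal-length strings)
    allowed_counts: one dict per position mapping residue -> allowable number
    """
    rng = random.Random(seed)
    pool = list(candidates)                      # working set, shrinks in (3.1.2)
    used = [Counter() for _ in allowed_counts]   # cumulative residue counts per position
    library = []
    while len(library) < capacity:
        if pool:                                 # (3.1) working set not yet emptied
            seq = rng.choice(pool)
            fits = all(
                used[i][aa] + 1 <= allowed_counts[i].get(aa, 0)
                for i, aa in enumerate(seq)
            )
            if fits:                             # (3.1.1) no quota exceeded: add it
                library.append(seq)
                for i, aa in enumerate(seq):
                    used[i][aa] += 1
            else:                                # (3.1.2) quota exceeded: drop from set
                pool.remove(seq)
        else:                                    # (3.2) set emptied: draw from initial set
            # Assumption: the drawn sequence is added without a quota check.
            library.append(rng.choice(candidates))
    return library
```

Each pass either adds one sequence to the library or removes one from the working set, so the loop always terminates.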
2. The method of claim 1, further comprising generating a CDR nucleotide sequence library from the primary CDR amino acid sequence library by: reverse-translating all the CDR amino acid sequences in the primary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library.
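As an illustration of the reverse-translation step, the sketch below maps each residue to a single codon from the standard genetic code. The table `CODON_FOR` and the function `reverse_translate` are hypothetical names, and choosing one fixed codon per residue is an assumption; the claim does not specify how codons are chosen.

```python
# One valid standard-genetic-code codon per amino acid (an arbitrary choice).
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def reverse_translate(peptide):
    """Return a nucleotide sequence encoding `peptide` (one-letter codes)."""
    return "".join(CODON_FOR[aa] for aa in peptide)


def reverse_translate_library(library):
    """Reverse-translate every sequence in an amino acid sequence library."""
    return [reverse_translate(seq) for seq in library]
```

For example, `reverse_translate("GYS")` gives `"GGCTATAGC"`.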
3. (canceled)
4. The method of claim 2, further comprising generating a CDR peptide library from the CDR nucleotide sequence library by: (5) synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating a CDR nucleic acid library; and (6) expressing all the CDR nucleic acids in the CDR nucleic acid library in an expression system, thereby generating the CDR peptide library.
5. The method of claim 1, further comprising generating a CDR peptide library from the CDR amino acid sequence library by: (4) synthesizing CDR peptides according to all the CDR amino acid sequences in the CDR amino acid sequence library, thereby generating the CDR peptide library.
6. (canceled)
7. The method of claim 1, further comprising generating a large CDR amino acid sequence library from a primary CDR amino acid sequence library, comprising the steps of: (4) determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library; and (5) according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library, and thereby generating the large CDR amino acid sequence library.
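Steps (4) and (5) can be sketched as weighted sampling. The function name `build_secondary_library` is hypothetical. Reading "according to the occurrence frequency" as sampling with probabilities proportional to each sequence's frequency in the primary library is an assumption.

```python
import random
from collections import Counter


def build_secondary_library(primary, candidates, capacity, seed=0):
    """Draw `capacity` sequences from the initial candidate set, weighted by
    how often each sequence occurs in the primary library."""
    rng = random.Random(seed)
    occurrences = Counter(primary)               # step (4): count each sequence
    total = len(primary)
    # Candidates absent from the primary library get weight 0 and are never drawn.
    weights = [occurrences[s] / total for s in candidates]
    return rng.choices(candidates, weights=weights, k=capacity)   # step (5)
```

With this reading, the secondary library can be much larger than the primary library while keeping roughly the same sequence distribution.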
8-11. (canceled)
12. The method of claim 1, wherein the predetermined capacity of the primary CDR amino acid sequence library is 1,000 to 10,000 amino acid sequences.
13. The method of claim 7, wherein the predetermined capacity of the secondary CDR amino acid sequence library is 10 to 1000 times the predetermined capacity of the primary CDR amino acid sequence library.
14. The method of claim 1, wherein the length of the CDR amino acid sequence is 3 to 10 amino acid residues, and wherein the CDR amino acid sequence comprises 3 to 10 variable positions.
15. (canceled)
16. The method of claim 1, wherein the initial set of alternative CDR amino acid sequences comprises 10 to 1000 allowable CDR amino acid sequences.
17. (canceled)
18. A device for generating a CDR nucleotide sequence library comprising the following apparatuses: an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, and/or a predetermined capacity of the CDR amino acid sequence library; a processing apparatus, which is configured to be used for performing the operations of: (1) determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences; (2) determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence; (3) randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein (3.1) when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein (3.1.1) when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or (3.1.2) when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or (3.2) when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and (4) reverse-translating all the CDR amino acid sequences in the primary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; and an output apparatus, which is configured for outputting the CDR nucleotide sequence library.
19. (canceled)
20. The device of claim 18, wherein the device is further configured for generating a CDR peptide library from the CDR nucleotide sequence library, wherein the device further comprises the following apparatuses: a nucleic acid synthesis apparatus, which is configured for synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating a CDR nucleic acid library; and a peptide expression apparatus, which is configured for expressing all the CDR nucleic acids in the CDR nucleic acid library in an expression system, thereby generating the CDR peptide library.
21. The device of claim 18, wherein the device is further configured for generating a CDR peptide library from the primary CDR amino acid sequence library, wherein the device further comprises the following apparatus: a peptide synthesis apparatus, which is configured for synthesizing CDR peptides according to all the CDR amino acid sequences in the primary CDR amino acid sequence library, thereby generating the CDR peptide library.
22. (canceled)
23. The device of claim 18, wherein the device is further configured for generating a large CDR amino acid sequence library from the primary CDR amino acid sequence library, wherein: the input apparatus is further configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, a predetermined capacity of the primary CDR amino acid sequence library, and/or a predetermined capacity of a secondary CDR amino acid sequence library; the processing apparatus is further configured to (4) determine an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library; and (5) according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly select a sequence from the initial set of alternative CDR amino acid sequences and add the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library, and thereby generating the large CDR amino acid sequence library; and the output apparatus is further configured for outputting the large CDR amino acid sequence library.
24-27. (canceled)
28. The device of claim 18, wherein the predetermined capacity of the primary CDR amino acid sequence library is 1,000 to 10,000 amino acid sequences.
29. The device of claim 23, wherein the predetermined capacity of the secondary CDR amino acid sequence library is 10 to 1000 times the predetermined capacity of the primary CDR amino acid sequence library.
30. The device of claim 18, wherein the length of the CDR amino acid sequence is 3 to 10 amino acid residues, and wherein the CDR amino acid sequence comprises 3 to 10 variable positions.
31. (canceled)
32. The device of claim 18, wherein the initial set of alternative CDR amino acid sequences comprises 10 to 1000 allowable CDR amino acid sequences.
33. The device of claim 18, further comprising a storage apparatus, which is configured to store an algorithm for performing the operations.
34. A computer program product, comprising a computer program instruction for operating the device of claim 18.
35. A storage apparatus for storing the computer program product of claim 34.
Description
DETAILED DESCRIPTION OF EMBODIMENTS
[0014] In a first aspect, the present invention relates to a method for generating a CDR amino acid sequence library, comprising the steps of:
[0015] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0016] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence; and
[0017] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0018] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0019] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0020] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0021] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences.
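Steps 1 and 2 of the first aspect can be sketched in Python as follows. The function names `enumerate_candidates` and `allowable_numbers` are hypothetical. Two assumptions are made: an excluded sequence is treated as a motif that must not appear anywhere in a candidate, and the allowable number is the library capacity multiplied by the occurrence frequency, rounded to the nearest integer.

```python
from itertools import product


def enumerate_candidates(allowed, excluded=()):
    """Step 1: list every sequence that can be built from the allowed residues
    at each position, skipping any that contain an excluded motif.

    allowed:  one string (or list) of allowed residues per position
    excluded: motifs that must not occur in a candidate (assumption)
    """
    sequences = ("".join(combo) for combo in product(*allowed))
    return [s for s in sequences if not any(motif in s for motif in excluded)]


def allowable_numbers(frequencies, capacity):
    """Step 2: convert per-position residue frequencies into integer quotas.

    frequencies: one dict per position mapping residue -> target frequency
    capacity:    predetermined number of sequences in the library
    """
    # Assumption: quota = capacity * frequency, rounded to the nearest integer.
    return [
        {aa: round(capacity * freq) for aa, freq in position.items()}
        for position in frequencies
    ]
```

For instance, `enumerate_candidates(["AT", "GS", "Y"], excluded=["TS"])` yields `["AGY", "ASY", "TGY"]`. Rounding can make the quotas at a position sum to slightly more or less than the capacity, so a real implementation would likely need a rule for distributing the remainder.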
[0022] In a second aspect, the present invention relates to a method for generating a CDR nucleotide sequence library, comprising the steps of:
[0023] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0024] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0025] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0026] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0027] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0028] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0029] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and
[0030] 4. reverse-translating all the CDR amino acid sequences in the CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library.
[0031] In a third aspect, the present invention relates to a method for generating a CDR nucleic acid library, comprising the steps of:
[0032] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0033] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0034] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0035] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0036] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0037] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0038] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0039] 4. reverse-translating all the CDR amino acid sequences in the CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; and
[0040] 5. synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the CDR nucleic acid library.
[0041] In a fourth aspect, the present invention relates to a method for generating a CDR peptide library, comprising the steps of:
[0042] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0043] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0044] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0045] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0046] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0047] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0048] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0049] 4. reverse-translating all the CDR amino acid sequences in the CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library;
[0050] 5. synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the CDR nucleic acid library; and
[0051] 6. expressing all the CDR nucleic acids in the CDR nucleic acid library in an expression system, thereby generating the CDR peptide library.
[0052] In a fifth aspect, the present invention relates to a method for generating a CDR peptide library, comprising the steps of:
[0053] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0054] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0055] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0056] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0057] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0058] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0059] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and
[0060] 4. synthesizing CDR peptides according to all the CDR amino acid sequences in the CDR amino acid sequence library, thereby generating the CDR peptide library.
[0061] In a sixth aspect, the present invention relates to a device for generating a CDR amino acid sequence library, comprising the following apparatus:
[0062] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, and/or a predetermined capacity of the CDR amino acid sequence library;
[0063] a processing apparatus, which is configured to be used for performing the operations of:
[0064] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0065] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0066] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0067] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0068] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0069] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0070] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and
[0071] an output apparatus, which is configured for outputting the CDR amino acid sequence library.
[0072] In a seventh aspect, the present invention relates to a device for generating a CDR nucleotide sequence library, comprising the following apparatus:
[0073] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, and/or a predetermined capacity of the CDR amino acid sequence library;
[0074] a processing apparatus, which is configured to be used for performing the operations of:
[0075] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0076] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0077] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library, and repeating this cycle until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein
[0078] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein
[0079] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not cause the cumulative number of corresponding amino acid residues at any position to exceed the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or
[0080] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library causes the cumulative number of corresponding amino acid residues at at least one position to exceed the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or
[0081] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and
[0082] 4. reverse-translating all the CDR amino acid sequences in the CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; and [0083] an output apparatus, which is configured for outputting the CDR nucleotide sequence library.
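The reverse-translation operation (operation 4 above) can be illustrated with a minimal sketch. The codon table below is a small hypothetical subset of the standard genetic code, and the names `CODONS`, `reverse_translate` and `reverse_translate_library` are illustrative assumptions; the text does not specify how a codon is chosen when several encode the same residue.

```python
import random

# Hypothetical codon table (assumption): a small subset of the standard
# genetic code, with two codons listed per residue for illustration.
CODONS = {
    "A": ["GCT", "GCC"],
    "D": ["GAT", "GAC"],
    "G": ["GGT", "GGC"],
    "S": ["TCT", "AGC"],
    "Y": ["TAT", "TAC"],
}

def reverse_translate(peptide, rng=None):
    """Return one nucleotide sequence encoding `peptide`.

    With rng=None the first listed codon is used for every residue;
    otherwise a codon is drawn at random from the listed synonyms.
    """
    if rng is None:
        return "".join(CODONS[aa][0] for aa in peptide)
    return "".join(rng.choice(CODONS[aa]) for aa in peptide)

def reverse_translate_library(library, rng=None):
    """Reverse-translate every amino acid sequence in the library."""
    return [reverse_translate(peptide, rng) for peptide in library]

nucleotide_library = reverse_translate_library(["GYS", "ADG"])
```

Passing a `random.Random` instance as `rng` would vary codon usage between library members; whether that is desirable depends on the synthesis and expression system, which this sketch does not model.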
[0084] In an eighth aspect, the present invention relates to a device for generating a CDR nucleic acid library, comprising the following apparatus: [0085] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, and/or a predetermined capacity of the CDR amino acid sequence library; [0086] a processing apparatus, which is configured to be used for performing the operations of:
[0087] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0088] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0089] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library and cycling same, until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein [0090] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0091] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or [0092] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0093] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and
[0094] 4. reverse-translating all the CDR amino acid sequences in the CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; [0095] an output apparatus, which is configured for outputting the CDR nucleotide sequence library; and [0096] a nucleic acid synthesis apparatus, which is configured for synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the CDR nucleic acid library.
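Operation 1 above (producing the set of alternative CDR amino acid sequences) can be sketched as a Cartesian product over the allowable residues at each position. This is a minimal sketch under stated assumptions: the function name `enumerate_candidates` is hypothetical, and an excluded sequence is treated here as a substring to be avoided, which also covers exclusion of a full-length sequence. The text does not state which interpretation is intended.

```python
from itertools import product

def enumerate_candidates(allowed_per_position, excluded=()):
    """List every sequence whose residue at each position is allowable.

    allowed_per_position: one iterable of residues per CDR position.
    excluded: sequences or motifs that must not occur (assumption:
              matched as substrings of the candidate).
    """
    candidates = []
    for residues in product(*allowed_per_position):
        seq = "".join(residues)
        if not any(motif in seq for motif in excluded):
            candidates.append(seq)
    return candidates

# Two positions: A or G at position 1, S or Y at position 2; "GY" excluded.
candidates = enumerate_candidates(["AG", "SY"], excluded=["GY"])
```

The number of candidates grows as the product of the allowable residue counts per position, so for long CDRs with many allowable residues a full enumeration in memory may not be practical.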
[0097] In a ninth aspect, the present invention relates to a device for generating a CDR peptide library, comprising the following apparatus: [0098] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, and/or a predetermined capacity of the CDR amino acid sequence library; [0099] a processing apparatus, which is configured to be used for performing the operations of:
[0100] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0101] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0102] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library and cycling same, until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein [0103] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0104] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or [0105] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0106] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; and
[0107] 4. reverse-translating all the CDR amino acid sequences in the CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; [0108] an output apparatus, which is configured for outputting the CDR nucleotide sequence library; [0109] a nucleic acid synthesis apparatus, which is configured for synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the CDR nucleic acid library; and [0110] a peptide expression apparatus, which is configured for expressing all the CDR nucleic acids in the CDR nucleic acid library in an expression system, thereby generating the CDR peptide library.
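Operation 2 above (the allowable number of each residue at each position) follows from the library capacity and the predetermined occurrence frequencies. In the sketch below, the function name `allowable_counts` and the use of rounding to the nearest integer are assumptions; the text does not say how fractional counts are handled.

```python
def allowable_counts(frequencies_per_position, capacity):
    """Convert per-position residue frequencies into maximum counts.

    frequencies_per_position: list of {residue: frequency} dicts, one per
        CDR position, with the frequencies at a position summing to 1.
    capacity: predetermined number of sequences in the library.
    Returns a list of {residue: allowable count} dicts.
    """
    quotas = []
    for freqs in frequencies_per_position:
        # Rounding is an assumption; the rounded counts at a position may
        # not sum exactly to `capacity`.
        quotas.append({aa: round(f * capacity) for aa, f in freqs.items()})
    return quotas

quotas = allowable_counts(
    [{"A": 0.5, "G": 0.3, "S": 0.2}, {"Y": 0.75, "D": 0.25}],
    capacity=1000,
)
```

For a capacity of 1,000 and a frequency of 0.5 for alanine at the first position, at most 500 library members may carry alanine there.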
[0111] In a tenth aspect, the present invention relates to a device for generating a CDR peptide library, comprising the following apparatus: [0112] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, and/or a predetermined capacity of the CDR amino acid sequence library; [0113] a processing apparatus, which is configured to be used for performing the operations of:
[0114] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0115] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in the CDR amino acid sequence library according to a predetermined capacity of the CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence; and
[0116] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the CDR amino acid sequence library and cycling same, until the cumulative number of CDR amino acid sequences added to the CDR amino acid sequence library reaches the predetermined capacity of the CDR amino acid sequence library, thereby generating the CDR amino acid sequence library, wherein [0117] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0118] 3.1.1 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the CDR amino acid sequence library, or [0119] 3.1.2 when adding the selected CDR amino acid sequence to the CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0120] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences; [0121] an output apparatus, which is configured for outputting the CDR amino acid sequence library; and [0122] a peptide synthesis apparatus, which is configured for synthesizing CDR peptides according to all the CDR amino acid sequences in the CDR amino acid sequence library, thereby generating the CDR peptide library.
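The selection loop of operation 3 (3.1, 3.1.1, 3.1.2 and 3.2) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the name `build_primary_library` is hypothetical, an accepted sequence is assumed to remain in the working set (so it can be selected again), and a sequence drawn under 3.2 is assumed to be added without a quota check. The text does not settle either point.

```python
import random
from collections import Counter

def build_primary_library(candidates, quotas, capacity, seed=0):
    """Fill a library of `capacity` sequences subject to per-position quotas.

    candidates: initial set of alternative sequences (equal length).
    quotas: list of {residue: allowable count}, one dict per position.
    """
    rng = random.Random(seed)
    initial = list(candidates)
    pool = list(initial)                   # working set of alternatives
    counts = [Counter() for _ in quotas]   # residues used so far, per position
    library = []
    while len(library) < capacity:
        if pool:
            # 3.1: draw from the working set while it is not empty.
            i = rng.randrange(len(pool))
            seq = pool[i]
            fits = all(counts[p][aa] < quotas[p].get(aa, 0)
                       for p, aa in enumerate(seq))
            if not fits:
                # 3.1.2: a quota would be exceeded, so drop the sequence.
                pool[i] = pool[-1]
                pool.pop()
                continue
        else:
            # 3.2: working set emptied, draw from the initial set instead
            # (assumption: added without a quota check).
            seq = rng.choice(initial)
        # 3.1.1 (or 3.2): add the sequence and update the residue counts.
        for p, aa in enumerate(seq):
            counts[p][aa] += 1
        library.append(seq)
    return library

library = build_primary_library(
    ["AS", "AY", "GS", "GY"],
    quotas=[{"A": 2, "G": 2}, {"S": 2, "Y": 2}],
    capacity=4,
)
```

Each pass either adds one sequence or removes one from the working set, so the loop terminates. In this small example the quotas sum exactly to the capacity, so the finished library carries each residue exactly twice at each position regardless of the seed.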
[0123] In one embodiment, the predetermined capacity of the CDR amino acid sequence library is 1,000 to 100,000 amino acid sequences, for example 1,000 to 90,000, 1,000 to 80,000, 1,000 to 75,000, 1,000 to 70,000, 1,000 to 60,000, 1,000 to 50,000, 1,000 to 40,000, 1,000 to 30,000, 1,000 to 25,000, 1,000 to 20,000, 1,000 to 10,000, 2,000 to 100,000, 2,500 to 100,000, 3,000 to 100,000, 4,000 to 100,000, 5,000 to 100,000, 6,000 to 100,000, 7,000 to 100,000, 7,500 to 100,000, 8,000 to 100,000, 9,000 to 100,000, or 10,000 to 100,000 amino acid sequences, for example 1,000, 2,000, 2,500, 3,000, 4,000, 5,000, 6,000, 7,000, 7,500, 8,000, 9,000, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 60,000, 70,000, 75,000, 80,000, 90,000, 100,000 amino acid sequences.
[0124] In an eleventh aspect, the present invention relates to a method for generating a large CDR amino acid sequence library, comprising the steps of:
[0125] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0126] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0127] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0128] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0129] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0130] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0131] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0132] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library; and
[0133] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library, and thereby generating the large CDR amino acid sequence library.
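Steps 4 and 5 above can be read as weighted sampling: the occurrence frequency of each sequence in the primary library becomes its sampling weight when drawing from the initial set to fill the secondary library. The sketch below follows that reading; the function names are hypothetical, and sequences absent from the primary library are assumed to receive weight zero.

```python
import random
from collections import Counter

def sequence_frequencies(primary_library):
    """Step 4: fraction of the primary library occupied by each sequence."""
    total = len(primary_library)
    return {seq: n / total for seq, n in Counter(primary_library).items()}

def build_secondary_library(initial_set, primary_library, capacity, seed=0):
    """Step 5: draw `capacity` sequences from the initial set, weighted by
    their occurrence frequency in the primary library."""
    freqs = sequence_frequencies(primary_library)
    population = list(initial_set)
    # Assumption: sequences not seen in the primary library get weight 0.
    weights = [freqs.get(seq, 0.0) for seq in population]
    rng = random.Random(seed)
    return rng.choices(population, weights=weights, k=capacity)

primary = ["AS", "AS", "GY", "AY"]
secondary = build_secondary_library(["AS", "AY", "GS", "GY"], primary, 1000)
```

Because the draw is made with replacement, the secondary library can be far larger than the primary library while keeping approximately the same sequence distribution, and hence approximately the same per-position residue distribution.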
[0134] In a twelfth aspect, the present invention relates to a method for generating a large CDR nucleotide sequence library, comprising the steps of:
[0135] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0136] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0137] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0138] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0139] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0140] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0141] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0142] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0143] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library; and
[0144] 6. reverse-translating all the CDR amino acid sequences in the secondary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the large CDR nucleotide sequence library.
[0145] In a thirteenth aspect, the present invention relates to a method for generating a large CDR nucleic acid library, comprising the steps of:
[0146] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0147] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0148] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0149] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0150] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0151] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0152] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0153] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0154] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library;
[0155] 6. reverse-translating all the CDR amino acid sequences in the secondary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; and
[0156] 7. synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the large CDR nucleic acid library.
[0157] In a fourteenth aspect, the present invention relates to a method for generating a large CDR peptide library, comprising the steps of:
[0158] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0159] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0160] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0161] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0162] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0163] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0164] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0165] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0166] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library;
[0167] 6. reverse-translating all the CDR amino acid sequences in the secondary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library;
[0168] 7. synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the CDR nucleic acid library; and
[0169] 8. expressing all the CDR nucleic acids in the CDR nucleic acid library in an expression system, thereby generating the large CDR peptide library.
[0170] In a fifteenth aspect, the present invention relates to a method for generating a large CDR peptide library, comprising the steps of:
[0171] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0172] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0173] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0174] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0175] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0176] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0177] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0178] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0179] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library; and
[0180] 6. synthesizing CDR peptides according to all the CDR amino acid sequences in the secondary CDR amino acid sequence library, thereby generating the large CDR peptide library.
[0181] In a sixteenth aspect, the present invention relates to a device for generating a large CDR amino acid sequence library, comprising the following apparatus: [0182] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, a predetermined capacity of a primary CDR amino acid sequence library, and/or a predetermined capacity of a secondary CDR amino acid sequence library; [0183] a processing apparatus, which is configured to be used for performing the operations of:
[0184] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0185] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0186] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0187] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0188] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0189] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0190] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0191] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library; and
[0192] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library, and thereby generating the large CDR amino acid sequence library; and [0193] an output apparatus, which is configured for outputting the large CDR amino acid sequence library.
[0194] In a seventeenth aspect, the present invention relates to a device for generating a large CDR nucleotide sequence library, comprising the following apparatus: [0195] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, a predetermined capacity of a primary CDR amino acid sequence library, and/or a predetermined capacity of a secondary CDR amino acid sequence library; [0196] a processing apparatus, which is configured to be used for performing the operations of:
[0197] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0198] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0199] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0200] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0201] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0202] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0203] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0204] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0205] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library; and
[0206] 6. reverse-translating all the CDR amino acid sequences in the secondary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the large CDR nucleotide sequence library; and [0207] an output apparatus, which is configured for outputting the large CDR nucleotide sequence library.
[0208] In an eighteenth aspect, the present invention relates to a device for generating a large CDR nucleic acid library, comprising the following apparatus: [0209] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, a predetermined capacity of a primary CDR amino acid sequence library, and/or a predetermined capacity of a secondary CDR amino acid sequence library; [0210] a processing apparatus, which is configured to be used for performing the operations of:
[0211] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0212] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0213] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0214] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0215] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0216] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0217] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0218] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0219] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library; and
[0220] 6. reverse-translating all the CDR amino acid sequences in the secondary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; [0221] an output apparatus, which is configured for outputting the CDR nucleotide sequence library; and [0222] a nucleic acid synthesis apparatus, which is configured for synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the large CDR nucleic acid library.
[0223] In a nineteenth aspect, the present invention relates to a device for generating a large CDR peptide library, comprising the following apparatus: [0224] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, a predetermined capacity of a primary CDR amino acid sequence library, and/or a predetermined capacity of a secondary CDR amino acid sequence library; [0225] a processing apparatus, which is configured to be used for performing the operations of:
[0226] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0227] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0228] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0229] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0230] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0231] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0232] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0233] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library;
[0234] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library; and
[0235] 6. reverse-translating all the CDR amino acid sequences in the secondary CDR amino acid sequence library into CDR nucleotide sequences, thereby generating the CDR nucleotide sequence library; [0236] an output apparatus, which is configured for outputting the CDR nucleotide sequence library; [0237] a nucleic acid synthesis apparatus, which is configured for synthesizing CDR nucleic acids according to all the CDR nucleotide sequences in the CDR nucleotide sequence library, thereby generating the CDR nucleic acid library; and [0238] a peptide expression apparatus, which is configured for expressing all the CDR nucleic acids in the CDR nucleic acid library in an expression system, thereby generating the large CDR peptide library.
[0239] In a twentieth aspect, the present invention relates to a device for generating a large CDR peptide library, comprising the following apparatus: [0240] an input apparatus, which is configured for inputting a predetermined length of a CDR amino acid sequence, a predetermined type of an allowable amino acid residue and a predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, a predetermined capacity of a primary CDR amino acid sequence library, and/or a predetermined capacity of a secondary CDR amino acid sequence library; [0241] a processing apparatus, which is configured to be used for performing the operations of:
[0242] 1. determining all allowable CDR amino acid sequences according to a predetermined type of an allowable amino acid residue at each position of a CDR amino acid sequence, and optionally according to a specified excluded sequence, to produce a set of alternative CDR amino acid sequences;
[0243] 2. determining an allowable number for each amino acid residue at each position of the CDR amino acid sequence in a primary CDR amino acid sequence library according to a predetermined capacity of the primary CDR amino acid sequence library and a predetermined occurrence frequency of each allowable amino acid residue at each position of the CDR amino acid sequence;
[0244] 3. randomly selecting a CDR amino acid sequence from the set of alternative CDR amino acid sequences and adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, and cycling same until the cumulative number of CDR amino acid sequences added to the primary CDR amino acid sequence library reaches the predetermined capacity of the primary CDR amino acid sequence library, thereby generating the primary CDR amino acid sequence library, wherein [0245] 3.1 when the set of alternative CDR amino acid sequences has not been emptied, a CDR amino acid sequence is randomly selected from the set of alternative CDR amino acid sequences, wherein [0246] 3.1.1 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library does not result in that the cumulative number of corresponding amino acid residues at any position exceeds the allowable number of the corresponding amino acid residues at the position, adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library, or [0247] 3.1.2 when adding the selected CDR amino acid sequence to the primary CDR amino acid sequence library results in that the cumulative number of corresponding amino acid residues at at least one position exceeds the allowable number of the corresponding amino acid residues at the position, removing the selected CDR amino acid sequence from the set of alternative CDR amino acid sequences, or [0248] 3.2 when the set of alternative CDR amino acid sequences has been emptied, a CDR amino acid sequence is randomly selected from the initial set of alternative CDR amino acid sequences;
[0249] 4. determining an occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library; and
[0250] 5. according to the occurrence frequency of each CDR amino acid sequence in the primary CDR amino acid sequence library, randomly selecting a sequence from the initial set of alternative CDR amino acid sequences and adding the selected sequence to a secondary CDR amino acid sequence library, until the cumulative number of CDR amino acid sequences added to the secondary CDR amino acid sequence library reaches the predetermined capacity of the secondary CDR amino acid sequence library, thereby generating the secondary CDR amino acid sequence library; [0251] an output apparatus, which is configured for outputting the secondary CDR amino acid sequence library; and [0252] a peptide expression apparatus, which is configured for synthesizing CDR peptides according to all the CDR amino acid sequences in the secondary CDR amino acid sequence library, thereby generating the large CDR peptide library.
[0253] In one embodiment, the predetermined capacity of the primary CDR amino acid sequence library is about 1,000 to 100,000 amino acid sequences, for example about 1,000 to 90,000, 1,000 to 80,000, 1,000 to 75,000, 1,000 to 70,000, 1,000 to 60,000, 1,000 to 50,000, 1,000 to 40,000, 1,000 to 30,000, 1,000 to 25,000, 1,000 to 20,000, 1,000 to 10,000, 2,000 to 100,000, 2,500 to 100,000, 3,000 to 100,000, 4,000 to 100,000, 5,000 to 100,000, 6,000 to 100,000, 7,000 to 100,000, 7,500 to 100,000, 8,000 to 100,000, 9,000 to 100,000, or 10,000 to 100,000 amino acid sequences, for example about 1,000, 2,000, 2,500, 3,000, 4,000, 5,000, 6,000, 7,000, 7,500, 8,000, 9,000, 10,000, 20,000, 25,000, 30,000, 40,000, 50,000, 60,000, 70,000, 75,000, 80,000, 90,000, 100,000 amino acid sequences.
[0254] In one embodiment, the predetermined capacity of the secondary CDR amino acid sequence library is about 1 to 10000 times or even more, for example, about 10 to 1000 times, 10 to 900 times, 10 to 800 times, 10 to 700 times, 10 to 600 times, 10 to 500 times, 10 to 400 times, 10 to 300 times, 10 to 200 times, 10 to 100 times, 10 to 90 times, 10 to 80 times, 10 to 70 times, 10 to 60 times, 10 to 50 times, 10 to 40 times, 10 to 30 times, 10 to 20 times, 20 to 1000 times, 30 to 1000 times, 40 to 1000 times, 50 to 1000 times, 60 to 1000 times, 70 to 1000 times, 80 to 1000 times, 90 to 1000 times, 100 to 1000 times, 200 to 1000 times, 300 to 1000 times, 400 to 1000 times, 500 to 1000 times, 600 to 1000 times, 700 to 1000 times, 800 to 1000 times, or 900 to 1000 times, for example, about 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100, 200, 250, 300, 400, 500, 600, 700, 750, 800, 900, or 1000 times the predetermined capacity of the primary CDR amino acid sequence library.
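The relationship between the two capacities, together with operations 4 and 5 above, can be illustrated with a minimal Python sketch. It is only one possible reading, not the implementation of the invention: the function name `expand_to_secondary` and the parameters `multiple` and `seed` are hypothetical, and the standard-library `random.choices` is used for the proportional draw. Because the draw is random, the secondary library reproduces the primary frequencies only approximately.

```python
import random
from collections import Counter

def expand_to_secondary(primary_library, multiple, seed=None):
    """Draw a secondary library whose capacity is `multiple` times the primary one.

    Each sequence is drawn with probability equal to its occurrence
    frequency in the primary library (operations 4 and 5).
    """
    rng = random.Random(seed)
    counts = Counter(primary_library)            # operation 4: occurrence counts
    sequences = list(counts)
    weights = [counts[s] for s in sequences]     # proportional to frequency
    secondary_capacity = multiple * len(primary_library)
    # operation 5: proportional random selection until capacity is reached
    return rng.choices(sequences, weights=weights, k=secondary_capacity)

# Toy primary library of 4 sequences expanded 100-fold (illustrative data only).
primary = ["SYAIN"] * 3 + ["SYGMH"]
secondary = expand_to_secondary(primary, multiple=100, seed=1)
print(len(secondary))
```

Sequences of the initial alternative set that are absent from the primary library have frequency zero and are therefore never drawn in this sketch.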
[0255] The device of the present invention can further comprise a storage apparatus, which is configured to store an algorithm for performing the operations.
[0256] In a twenty-first aspect, the present invention relates to a computer program product, comprising a computer program instruction, wherein when the instruction is executed by a computer, the above-mentioned method is implemented and/or the above-mentioned device is operated.
[0257] In a twenty-second aspect, the present invention relates to a storage apparatus, which stores the above-mentioned computer program product.
[0258] In one embodiment, the CDR is antibody heavy chain CDR1, CDR2 and/or CDR3, and/or light chain CDR1, CDR2 and/or CDR3. In one embodiment, the antibody is a mammalian antibody, e.g., a rodent antibody (e.g., a mouse, rat or rabbit antibody) or a primate antibody (e.g., a cynomolgus or human antibody). In one embodiment, the antibody is a human antibody, a humanized antibody, or a chimeric antibody.
[0259] The present invention can be used for, but is not limited to, antibody CDRs. In fact, the present invention can be used for any peptide (alternatively referred to as an oligopeptide, polypeptide, protein, amino acid polymer, etc.) for which diversity is of interest. For example, the present invention can be used to diversify the acting site of one or both of two molecules that interact (e.g., recognize, bind, modify, cleave, etc.), e.g., antibody-antigen, receptor-ligand and enzyme-substrate. Moreover, the present invention can also be used for other polymer molecules for which diversity is of interest, such as polysaccharides or nucleic acids, especially functional, non-coding nucleic acids, such as functional RNA.
[0260] In one embodiment, the predetermined length of the CDR amino acid sequence is about 3 to 20 or more amino acid residues, for example about 3 to 15, 3 to 10, 3 to 5, 5 to 20, 5 to 15, 5 to 10, 5 to 7, 10 to 20, 10 to 15, or 15 to 20 amino acid residues, for example about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 amino acid residues. The lengths of the CDR amino acid sequences in a CDR library are generally the same. However, the lengths of the CDR amino acid sequences in a CDR library can differ, in which case “deletion” is provided as an option for the amino acid at one or more positions.
[0261] As mentioned above, the present invention can be used for, but is not limited to, CDRs, or even peptides. Therefore, the above content also applies to other sequences, such as nucleotide sequences. Furthermore, the present invention is not limited to the above-mentioned sequence lengths. A person skilled in the art would appreciate that the sequence length has a small effect on the implementation of the present invention, whereas the sequence complexity (i.e., the number of variable positions and the number of types of alternative amino acid/nucleotide residues at each variable position) has a great effect. In other words, the sequence of the present invention can comprise 3 to 20 or more, for example about 3 to 15, 3 to 10, 3 to 5, 5 to 20, 5 to 15, 5 to 10, 5 to 7, 10 to 20, 10 to 15, or 15 to 20, for example 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 variable positions. In this case, the full length of the sequence can be longer. The full length of a sequence is mainly limited by the efficiency with which a synthesizer can synthesize the sequence. Variable positions can be completely contiguous (i.e., all variable positions are connected into one segment), completely discontinuous (i.e., no two variable positions are connected), or neither (i.e., some but not all variable positions are connected into one or more segments, and there may also be one or more isolated variable positions).
[0262] In one embodiment, each position allows selection of about 1 to 20 common amino acid residues, e.g., about 2 to 10, 3 to 10, or 5 to 10 common amino acid residues, e.g., about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 common amino acid residues. In one embodiment, the number of types of amino acid residues allowed to be selected at each position is identical. In one embodiment, the number of types of amino acid residues allowed to be selected at each position is different. In one embodiment, the number of types of amino acid residues allowed to be selected at some positions is identical. In one embodiment, the number of types of amino acid residues allowed to be selected at some positions is different. In one embodiment, the number of types of amino acid residues allowed to be selected at all positions is identical. In one embodiment, the number of types of amino acid residues allowed to be selected at all positions is different. The present invention can be used with, but is not limited to, the 20 common amino acids; it can also be used with all known amino acids, especially in chemically synthesized peptide libraries.
[0263] As mentioned above, the present invention can be used for, but is not limited to, peptides; it can also be used for other polymer molecules. Therefore, the above content also applies to other building blocks such as nucleotides and monosaccharides.
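As a hedged illustration of the “deletion” option, the following Python sketch enumerates sequences of differing length. Representing a deletion by the empty string (`DELETION`) and the function name `enumerate_with_deletion` are assumptions made for this sketch only; the source does not specify how a deletion is encoded.

```python
from itertools import product

DELETION = ""  # hypothetical marker: the position contributes no residue

def enumerate_with_deletion(allowed):
    """List all sequences, letting any position that allows DELETION be skipped.

    A set is used because different deletion patterns can yield the same string.
    """
    sequences = {"".join(choice) for choice in product(*allowed)}
    return sorted(sequences, key=lambda s: (len(s), s))

# Illustrative input: the fourth position is either Ile or deleted.
allowed = [["S"], ["Y"], ["A", "G"], ["I", DELETION]]
print(enumerate_with_deletion(allowed))
```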
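The effect of sequence complexity can be made concrete with a short Python sketch. Assuming no excluded sequences, the number of alternative sequences is the product of the number of residue types allowed at each variable position. The function name `candidate_count` is hypothetical, and the first input is the allowed-residue layout used in Table 1 of Example 1.

```python
import math

def candidate_count(allowed_per_position):
    """Number of distinct sequences when every combination is allowed."""
    return math.prod(len(residues) for residues in allowed_per_position)

# Five positions with 1, 1, 2, 2 and 3 allowed residue types (as in Example 1).
example_1 = [["S"], ["Y"], ["A", "G"], ["I", "M"], ["N", "H", "S"]]
print(candidate_count(example_1))

# Ten fully variable positions with 20 residue types each, for comparison.
fully_variable = [list(range(20))] * 10
print(candidate_count(fully_variable))
```

The count grows multiplicatively with each added variable position, which is why complexity rather than raw length dominates the cost of enumeration.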
[0264] In one embodiment, the (initial) set of alternative CDR amino acid sequences comprises about 10 to 1000 allowable CDR amino acid sequences, for example about 10 to 900, 10 to 800, 10 to 750, 10 to 700, 10 to 600, 10 to 500, 10 to 400, 10 to 300, 10 to 250, 10 to 200, 10 to 100, 10 to 90, 10 to 80, 10 to 75, 10 to 70, 10 to 60, 10 to 50, 10 to 40, 10 to 30, 10 to 25, 10 to 20, 20 to 1000, 25 to 1000, 30 to 1000, 40 to 1000, 50 to 1000, 60 to 1000, 70 to 1000, 75 to 1000, 80 to 1000, 90 to 1000, 100 to 1000, 200 to 1000, 250 to 1000, 300 to 1000, 400 to 1000, 500 to 1000, 600 to 1000, 700 to 1000, 750 to 1000, 800 to 1000, or 900 to 1000, for example about 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 100, 200, 250, 300, 400, 500, 600, 700, 750, 800, 900, or 1000 allowable CDR amino acid sequences.
[0265] Generally, each amino acid sequence in a library is encoded by one nucleotide sequence (a DNA sequence in the case of expression using intracellular translation, or an RNA sequence in the case of expression using extracellular translation). In this case, reverse translation is usually performed using codons that are unique, or that are preferred by (or occur most frequently in nature in) the host cell or expression system. Alternatively, an amino acid sequence in a library can be encoded by multiple nucleotide sequences (e.g., due to codon redundancy). In this case, the capacity of the nucleotide sequence library may be larger than the capacity of the amino acid sequence library.
[0266] Methods for randomly selecting a sequence from an alternative sequence set are well known in the art. For example, the interval [0, 1] can be divided into n equal subintervals, where n is the number of sequences in the alternative sequence set, with each subinterval corresponding to one sequence. A random number generator is then used to generate a number x {x∈R|0≤x≤1} from the uniform distribution, and the sequence corresponding to the subinterval to which x belongs is selected. As another example, the choice function in the random submodule of the numpy module for Python can be used, wherein the parameter a is set to the set of alternative sequences.
[0267] Methods for randomly selecting a sequence from an (initial) set of alternative sequences in proportion (e.g., according to the occurrence frequency of each sequence in a primary library) are also well known in the art. For example, the interval [0, 1] can be divided into n subintervals, where n is the number of sequences in the alternative sequence set, with each subinterval corresponding to one sequence. The size of each subinterval is proportional to the selection probability of its corresponding sequence (i.e., the above-mentioned occurrence frequency). A random number generator is then used to generate a number x {x∈R|0≤x≤1} from the uniform distribution, and the sequence corresponding to the subinterval to which x belongs is selected. As another example, the choice function in the random submodule of the numpy module for Python can be used, wherein the parameter a is set to the set of alternative sequences, and the parameter p is set to the selection probability of each sequence in the alternative sequence set (e.g., the occurrence frequency of each sequence in the primary library).
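The one-codon-per-residue case can be sketched in Python as follows. This is a minimal illustration only: the name `CODON_CHOICE` and the particular codons are assumptions chosen for the example (each is a valid standard-code codon for its residue), not host preferences taken from the source.

```python
# Hypothetical single-codon table; the codon picked for each residue is an
# illustrative assumption, not a preference stated in the source.
CODON_CHOICE = {
    "A": "GCG", "G": "GGC", "H": "CAT", "I": "ATT",
    "M": "ATG", "N": "AAC", "S": "AGC", "Y": "TAT",
}

def reverse_translate(peptide, codon_table=CODON_CHOICE):
    """Replace each residue by its single chosen codon and join the codons."""
    return "".join(codon_table[residue] for residue in peptide)

print(reverse_translate("SYAIN"))
```

With a single codon per residue, the nucleotide sequence library has the same capacity as the amino acid sequence library; allowing several codons per residue would enlarge it, as noted above.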
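The equal-subinterval selection can be sketched in Python with the standard library alone. The function name `pick_uniform` and its optional argument `x` (which lets a caller supply the random number, for example when checking the mapping) are hypothetical.

```python
import random

def pick_uniform(candidates, x=None):
    """Select one candidate by locating x among n equal subintervals of [0, 1]."""
    n = len(candidates)
    if x is None:
        x = random.random()            # uniform draw from [0.0, 1.0)
    index = min(int(x * n), n - 1)     # clamp so that x == 1.0 maps to the last item
    return candidates[index]

candidates = ["SYAIN", "SYAIH", "SYAIS", "SYAMN"]
print(pick_uniform(candidates))        # one randomly chosen sequence
```

In this sketch each subinterval is treated as half-open, so a value falling exactly on a boundary is assigned to the subinterval that starts there.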
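The proportional-subinterval selection can be sketched similarly. The function name `pick_weighted` and the argument `x` are hypothetical; the weights are normalized by their sum, so raw occurrence counts can be passed directly. When numpy's choice function is used instead, the values passed as p are expected to sum to 1.

```python
import bisect
import itertools
import random

def pick_weighted(candidates, weights, x=None):
    """Select a candidate whose subinterval of [0, 1] is proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))   # right edges of subintervals
    total = cumulative[-1]
    if x is None:
        x = random.random()                            # uniform draw from [0.0, 1.0)
    index = bisect.bisect_right(cumulative, x * total) # subinterval containing x
    return candidates[min(index, len(candidates) - 1)] # clamp for x == 1.0

sequences = ["SYAIN", "SYAIH", "SYAIS"]
counts = [610, 943, 962]               # occurrence counts used as weights
print(pick_weighted(sequences, counts))
```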
[0268] Methods, reagents and apparatus for the synthesis (including high-throughput synthesis) of a nucleic acid are well known in the art, such as the phosphoramidite method and B3 Synthesizer from CustomArray. Methods, reagents and apparatus for the synthesis (including high-throughput synthesis) of a peptide are well known in the art, such as the carbodiimide method and SOPHAS of Zinsser Analytic. Methods, reagents and apparatus for the expression (including high-throughput expression) of a peptide are well known in the art. The expression system may be a cell expression system or a cell-free expression system (e.g., a ribosomal expression system). The cell can be a prokaryotic or a eukaryotic cell, and can be a bacterial, fungal, plant or animal (especially mammalian) cell.
[0269] In one embodiment, the predetermined length of a CDR amino acid sequence, the predetermined type of an allowable amino acid residue and the predetermined occurrence frequency of each amino acid residue at each position of the CDR amino acid sequence, the predetermined capacity of a primary CDR amino acid sequence library, and/or the predetermined capacity of a secondary CDR amino acid sequence library can be input based on an input file (e.g., an EXCEL file). In one embodiment, a CDR amino acid sequence library, a primary and/or secondary CDR amino acid sequence library, and/or a CDR nucleotide sequence library can be output based on an output file (e.g., an EXCEL file). In one embodiment, the output file is transmitted to a nucleic acid synthesis apparatus and/or a peptide expression apparatus to generate a corresponding nucleic acid and/or peptide library.
[0270] In this context, “about” means the error range well recognized in the art, or ±10%, ±5%, ±3% or ±1% of the indicated value.
EXAMPLES
Example 1
[0271] In the example, a (small, simple) antibody heavy chain CDR1 library was generated, with requirements as follows: the final library comprises 10000 amino acid sequences, the length of each sequence is 5 amino acid residues, and the allowable types of amino acids at each position and the ratio thereof are as shown in Table 1.
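TABLE 1 Amino acid distribution set in heavy chain CDR1 library

| Amino acid | H31 | H32 | H33 | H34 | H35 |
|---|---|---|---|---|---|
| Ala (A) | | | 45% | | |
| Asn (N) | | | | | 25% |
| Gly (G) | | | 55% | | |
| His (H) | | | | | 40% |
| Ile (I) | | | | 60% | |
| Met (M) | | | | 40% | |
| Ser (S) | 100% | | | | 35% |
| Tyr (Y) | | 100% | | | |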
[0272] Step 1. All possible amino acid sequences were listed as an alternative sequence set. In this example, other than the amino acid distribution shown in Table 1, there are no additional limitations. The alternative sequence set consists of 12 sequences, as shown in Table 2.
TABLE 2 Alternative sequence set of heavy chain CDR1 library

| No. | Sequence |
|---|---|
| 1 | SYAIN |
| 2 | SYAIH |
| 3 | SYAIS |
| 4 | SYAMN |
| 5 | SYAMH |
| 6 | SYAMS |
| 7 | SYGIN |
| 8 | SYGIH |
| 9 | SYGIS |
| 10 | SYGMN |
| 11 | SYGMH |
| 12 | SYGMS |
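Step 1 can be sketched in Python as a Cartesian product over the residues allowed at each position. The names `ALLOWED` and `enumerate_candidates`, and the optional `excluded` argument (standing in for a specified excluded sequence), are hypothetical; the allowed residues are those of Table 1.

```python
from itertools import product

# Residue types allowed at positions H31 to H35, taken from Table 1.
ALLOWED = [["S"], ["Y"], ["A", "G"], ["I", "M"], ["N", "H", "S"]]

def enumerate_candidates(allowed, excluded=()):
    """List every allowable sequence, skipping any explicitly excluded ones."""
    excluded = set(excluded)
    all_sequences = ("".join(combo) for combo in product(*allowed))
    return [seq for seq in all_sequences if seq not in excluded]

candidates = enumerate_candidates(ALLOWED)
print(len(candidates), candidates[0], candidates[-1])
```

Because the product varies the last position fastest, the sequences come out in the same order as in Table 2.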
[0273] Step 2. For the library comprising 10000 sequences and having the amino acid distribution shown in Table 1, the given number of various amino acids at each position thereof was calculated, as shown in Table 3.
TABLE 3 Given number for each amino acid at each position in heavy chain CDR1 library

| Amino acid | H31 | H32 | H33 | H34 | H35 |
|---|---|---|---|---|---|
| Ala (A) | | | 4500 | | |
| Asn (N) | | | | | 2500 |
| Gly (G) | | | 5500 | | |
| His (H) | | | | | 4000 |
| Ile (I) | | | | 6000 | |
| Met (M) | | | | 4000 | |
| Ser (S) | 10000 | | | | 3500 |
| Tyr (Y) | | 10000 | | | |

[0274] Step 3. A sequence was randomly selected from the alternative sequence set, and it was judged whether adding the sequence to the library would cause the number of any amino acid at any position to exceed the given number of that amino acid at that position shown in Table 3. If no given number would be exceeded, the sequence was added to the library; if the given number would be exceeded at at least one position, the sequence was not added to the library and was removed from the alternative sequence set.
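Step 2 amounts to multiplying each occurrence frequency by the library capacity. A minimal Python sketch follows; the names `FREQUENCIES` and `allowable_numbers` are hypothetical, and rounding to the nearest integer is an assumption for cases where the product is not a whole number (the source does not say how such cases are handled, and rounded values need not sum to the capacity).

```python
# Occurrence frequency of each allowed residue at positions H31 to H35 (Table 1).
FREQUENCIES = [
    {"S": 1.00},
    {"Y": 1.00},
    {"A": 0.45, "G": 0.55},
    {"I": 0.60, "M": 0.40},
    {"N": 0.25, "H": 0.40, "S": 0.35},
]

def allowable_numbers(frequencies, capacity):
    """Convert per-position frequencies into per-position residue quotas."""
    return [
        {residue: round(freq * capacity) for residue, freq in position.items()}
        for position in frequencies
    ]

quotas = allowable_numbers(FREQUENCIES, capacity=10000)
print(quotas[2], quotas[4])
```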
[0275] The total number of sequences in the library was checked, wherein if the total number reached 10000, then the selection and storage tasks were completed, whereas if the total number did not reach 10000, then the selection and storage tasks were continued.
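The select, check and store loop of Step 3 can be sketched in Python as below. This is one possible reading rather than the implementation used in the example. The function name `build_primary_library` and the `seed` parameter are hypothetical. The fallback branch follows operation 3.2 of the general method (drawing from the initial set once the working set is empty); adding such draws without a quota check is an assumption, since the source does not state it.

```python
import random
from collections import Counter

def build_primary_library(candidates, limits, capacity, seed=None):
    """Fill a library by random draws that respect per-position residue quotas."""
    rng = random.Random(seed)
    pool = list(candidates)                      # working alternative sequence set
    counts = [Counter() for _ in limits]         # residues used so far, per position
    library = []
    while len(library) < capacity:
        if not pool:
            # Assumed reading of operation 3.2: draw from the initial set, unchecked.
            library.append(rng.choice(candidates))
            continue
        seq = rng.choice(pool)
        fits = all(counts[i][aa] < limits[i].get(aa, 0) for i, aa in enumerate(seq))
        if fits:
            library.append(seq)                  # quota respected: store the sequence
            for i, aa in enumerate(seq):
                counts[i][aa] += 1
        else:
            pool.remove(seq)                     # quota exceeded: drop from the set
    return library

candidates = ["SYAIN", "SYAIH", "SYAIS", "SYAMN", "SYAMH", "SYAMS",
              "SYGIN", "SYGIH", "SYGIS", "SYGMN", "SYGMH", "SYGMS"]
limits = [{"S": 10000}, {"Y": 10000}, {"A": 4500, "G": 5500},
          {"I": 6000, "M": 4000}, {"N": 2500, "H": 4000, "S": 3500}]
library = build_primary_library(candidates, limits, capacity=10000, seed=0)
print(len(library))
```

With no excluded sequences and quotas that sum to the capacity at every position, as in this example, at least one sequence always remains admissible, so the fallback branch is never reached and every quota is met exactly.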
[0276] Step 4. After the above-mentioned operations, the actual number of various sequences in the generated library is as shown in Table 4. The library size of the example is 10000 sequences, and no expansion operation is required.
TABLE 4 Actual number for each sequence in heavy chain CDR1 library

| No. | Sequence | Number |
|---|---|---|
| 1 | SYAIN | 610 |
| 2 | SYAIH | 943 |
| 3 | SYAIS | 962 |
| 4 | SYAMN | 607 |
| 5 | SYAMH | 708 |
| 6 | SYAMS | 670 |
| 7 | SYGIN | 661 |
| 8 | SYGIH | 1658 |
| 9 | SYGIS | 1166 |
| 10 | SYGMN | 622 |
| 11 | SYGMH | 691 |
| 12 | SYGMS | 702 |

[0277] Statistical analysis showed that the distribution ratio of each amino acid at each position in the generated library, as shown in Table 5, is exactly identical to the expected amino acid distribution in Table 1.

TABLE 5 Actual amino acid distribution in heavy chain CDR1 library

| Amino acid | H31 | H32 | H33 | H34 | H35 |
|---|---|---|---|---|---|
| Ala (A) | | | 45.0% | | |
| Asn (N) | | | | | 25.0% |
| Gly (G) | | | 55.0% | | |
| His (H) | | | | | 40.0% |
| Ile (I) | | | | 60.0% | |
| Met (M) | | | | 40.0% | |
| Ser (S) | 100.0% | | | | 35.0% |
| Tyr (Y) | | 100.0% | | | |
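The statistics behind Table 5 can be reproduced from the sequence counts in Table 4 with a short Python sketch. The names `TABLE_4` and `position_distribution` are hypothetical; the counts are copied from Table 4.

```python
from collections import Counter

# Number of copies of each sequence in the generated library (Table 4).
TABLE_4 = {
    "SYAIN": 610, "SYAIH": 943, "SYAIS": 962, "SYAMN": 607,
    "SYAMH": 708, "SYAMS": 670, "SYGIN": 661, "SYGIH": 1658,
    "SYGIS": 1166, "SYGMN": 622, "SYGMH": 691, "SYGMS": 702,
}

def position_distribution(sequence_counts):
    """Return, for each position, the fraction of the library using each residue."""
    total = sum(sequence_counts.values())
    length = len(next(iter(sequence_counts)))
    tallies = [Counter() for _ in range(length)]
    for sequence, copies in sequence_counts.items():
        for position, residue in enumerate(sequence):
            tallies[position][residue] += copies
    return [{aa: n / total for aa, n in tally.items()} for tally in tallies]

for position, dist in zip(["H31", "H32", "H33", "H34", "H35"],
                          position_distribution(TABLE_4)):
    print(position, dist)
```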
[0278] Step 5. The amino acid sequences in the library were reverse-translated into DNA sequences.
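Reverse translation in Step 5 amounts to replacing each residue with a codon. The example does not state which codons were used, so the single-codon table below is purely an assumption chosen for illustration (each entry is a valid codon for its residue in the standard genetic code), and `reverse_translate` is a hypothetical helper name.

```python
# One codon per residue; this particular choice of codons is an assumption.
CODONS = {
    "A": "GCG", "D": "GAT", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "M": "ATG", "N": "AAC", "S": "AGC", "Y": "TAT",
}

def reverse_translate(protein, codons=CODONS):
    """Map an amino acid sequence to a DNA sequence, one fixed codon per residue."""
    try:
        return "".join(codons[aa] for aa in protein)
    except KeyError as err:
        raise ValueError(f"no codon assigned for residue {err.args[0]!r}") from None

dna = reverse_translate("SYAIN")
```

A production workflow would typically choose codons according to the expression host and synthesis constraints; that choice is outside what this example specifies.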
[0279] Step 6. A high-throughput gene synthesis was performed with chips.
Example 2
[0280] In this example, a (large, simple) antibody heavy chain CDR1 library was generated, with requirements as follows: the final library comprises 1,000,000 amino acid sequences, the length of each sequence is 5 amino acid residues, and the allowable types of amino acids at each position and the ratio thereof are as shown in Table 6. The sequence distribution was first determined in a primary library of 10000 sequences, and the primary library was then expanded to a secondary library of 1,000,000 sequences.
TABLE-US-00006 TABLE 6 Amino acid distribution set in heavy chain CDR1 library
Amino acid  H31   H32   H33  H34  H35
Ala (A)     -     -     45%  -    -
Asn (N)     -     -     -    -    25%
Gly (G)     -     -     55%  -    -
His (H)     -     -     -    -    40%
Ile (I)     -     -     -    60%  -
Met (M)     -     -     -    40%  -
Ser (S)     100%  -     -    -    35%
Tyr (Y)     -     100%  -    -    -
[0281] Step 1. All possible amino acid sequences were listed as an alternative sequence set. In this example, other than the amino acid distribution shown in Table 6, there are no additional limitations. The alternative sequence set consists of 12 sequences, as shown in Table 7.
TABLE-US-00007 TABLE 7 Alternative sequence set of heavy chain CDR1 library
No.  Sequence
1    SYAIN
2    SYAIH
3    SYAIS
4    SYAMN
5    SYAMH
6    SYAMS
7    SYGIN
8    SYGIH
9    SYGIS
10   SYGMN
11   SYGMH
12   SYGMS
[0282] Step 2. For the primary library comprising 10000 sequences and having the amino acid distribution shown in Table 6, the given number of various amino acids at each position thereof was calculated, and the results are as shown in Table 8.
TABLE-US-00008 TABLE 8 Given number for each amino acid at each position in primary library
Amino acid  H31    H32    H33   H34   H35
Ala (A)     -      -      4500  -     -
Asn (N)     -      -      -     -     2500
Gly (G)     -      -      5500  -     -
His (H)     -      -      -     -     4000
Ile (I)     -      -      -     6000  -
Met (M)     -      -      -     4000  -
Ser (S)     10000  -      -     -     3500
Tyr (Y)     -      10000  -     -     -
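The given numbers in Table 8 are the library capacity multiplied by the set frequency of each residue (for example, 10000 × 45% = 4500 for Ala at H33). A minimal sketch follows; the function name `given_numbers` is hypothetical, and rounding to the nearest integer with a consistency check is an assumption, since this example does not say how non-integer products would be handled.

```python
# Set frequencies from Table 6, one dictionary per position H31..H35.
TABLE_6 = [
    {"S": 1.00},
    {"Y": 1.00},
    {"A": 0.45, "G": 0.55},
    {"I": 0.60, "M": 0.40},
    {"N": 0.25, "H": 0.40, "S": 0.35},
]

def given_numbers(distribution, capacity):
    """Convert per-position frequencies into per-position residue counts."""
    quotas = []
    for freqs in distribution:
        # Rounding is an assumption; here every product is a whole number.
        q = {aa: round(capacity * f) for aa, f in freqs.items()}
        if sum(q.values()) != capacity:
            raise ValueError("counts at a position do not sum to the capacity")
        quotas.append(q)
    return quotas

quotas = given_numbers(TABLE_6, capacity=10000)
```

The check that each position sums to the capacity matters for the later selection step: every sequence added to the library consumes exactly one count at every position.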
[0283] Step 3. A sequence was randomly selected from the alternative sequence set, and it was judged whether adding the sequence to the primary library would cause the number of any amino acid at any position to exceed the given number for that amino acid at that position shown in Table 8. If no given number would be exceeded, the sequence was added to the primary library; if any given number would be exceeded, the sequence was not added to the primary library and was removed from the alternative sequence set.
[0284] The total number of sequences in the primary library was then checked: if the total number had reached 10000, the selection and storage tasks were completed; if it had not reached 10000, the selection and storage tasks were continued.
[0285] Step 4. After the above-mentioned operations, the actual number and proportion of each sequence in the generated primary library are as shown in Table 9. The target library size in this example is 1000000 sequences, so an expansion operation is required. The actual proportion of each sequence in the primary library was used as its sampling probability for the secondary library.
Table 9: Actual number and proportion of each sequence in primary library
[0286] The proportion shown in Table 9 was used as the probability distribution, and 1000000 sequences were re-selected from the alternative sequence set to generate a secondary library. The actual number of each sequence in the generated secondary library is as shown in Table 10.
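The expansion step is weighted sampling with replacement. The sketch below uses `random.choices` from the Python standard library. The Table 9 values are not reproduced in this text, so the primary counts used here are borrowed from Table 4 of Example 1 purely as a stand-in; the function name `expand_library` and the seed are likewise assumptions.

```python
import random
from collections import Counter

# Stand-in primary library counts (taken from Table 4, NOT from Table 9).
PRIMARY = {
    "SYAIN": 610, "SYAIH": 943, "SYAIS": 962,
    "SYAMN": 607, "SYAMH": 708, "SYAMS": 670,
    "SYGIN": 661, "SYGIH": 1658, "SYGIS": 1166,
    "SYGMN": 622, "SYGMH": 691, "SYGMS": 702,
}

def expand_library(primary_counts, size, seed=0):
    """Draw `size` sequences, each with probability equal to its primary proportion."""
    rng = random.Random(seed)
    seqs = list(primary_counts)
    weights = [primary_counts[s] for s in seqs]  # proportional weights suffice
    return Counter(rng.choices(seqs, weights=weights, k=size))

secondary = expand_library(PRIMARY, size=1_000_000)
ala_share = sum(n for s, n in secondary.items() if s[2] == "A") / 1_000_000
```

Because the secondary library is sampled rather than filled against quotas, its per-position distribution only approximates the set distribution, which is consistent with the small deviations reported for the secondary library.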
TABLE-US-00009 TABLE 10 Actual number for each sequence in heavy chain CDR1 library
No.  Sequence  Number
1    SYAIN     61407
2    SYAIH     94304
3    SYAIS     96356
4    SYAMN     60931
5    SYAMH     70800
6    SYAMS     67183
7    SYGIN     65934
8    SYGIH     164791
9    SYGIS     116449
10   SYGMN     62122
11   SYGMH     68925
12   SYGMS     70798
[0287] Counting the residues in the generated secondary library gives the distribution of amino acids at each position shown in Table 11, which is essentially identical to the expected amino acid distribution in Table 6.
TABLE-US-00010 TABLE 11 Actual amino acid distribution in heavy chain CDR1 library
Amino acid  H31      H32      H33     H34     H35
Ala (A)     -        -        45.10%  -       -
Asn (N)     -        -        -       -       25.04%
Gly (G)     -        -        54.90%  -       -
His (H)     -        -        -       -       39.88%
Ile (I)     -        -        -       59.92%  -
Met (M)     -        -        -       40.08%  -
Ser (S)     100.00%  -        -       -       35.08%
Tyr (Y)     -        100.00%  -       -       -
[0288] Step 5. The amino acid sequences in the secondary library were reverse-translated into DNA sequences.
[0289] Step 6. A high-throughput gene synthesis was performed with chips.
[0290] The coefficient of determination, R², was used to quantify the degree of agreement between the actual amino acid distribution at the multiple-choice positions H33, H34 and H35 and the expected amino acid distribution at those positions. The calculated R² values for the positions are respectively: 0.9996 for H33; 0.9999 for H34; and 0.9998 for H35.
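The text does not state which formula was used for the coefficient of determination. One common definition is 1 minus the ratio of the residual sum of squares to the total sum of squares; the sketch below assumes that definition, with the total sum of squares taken about the mean of the expected values. Applied to the percentages in Tables 6 and 11, it gives values that round to the figures quoted above. The function name `r_squared` is hypothetical.

```python
def r_squared(expected, actual):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(expected) / len(expected)
    ss_res = sum((e - a) ** 2 for e, a in zip(expected, actual))
    ss_tot = sum((e - mean) ** 2 for e in expected)
    return 1 - ss_res / ss_tot

# Expected (Table 6) and actual (Table 11) percentages per position.
EXPECTED = {"H33": [45, 55], "H34": [60, 40], "H35": [25, 40, 35]}
ACTUAL = {"H33": [45.10, 54.90], "H34": [59.92, 40.08], "H35": [25.04, 39.88, 35.08]}

r2 = {pos: r_squared(EXPECTED[pos], ACTUAL[pos]) for pos in EXPECTED}
```

This definition needs at least two distinct expected values at a position, so it does not apply to H31 and H32, where a single residue is fixed at 100%.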
Example 3
[0291] In the example, a (small, complex) antibody heavy chain CDR1 library was generated, with requirements as follows: the final library comprises 10000 amino acid sequences, the length of each sequence is 5 amino acid residues, and the allowable types of amino acids at each position and the ratio thereof are as shown in Table 12.
TABLE-US-00011 TABLE 12 Amino acid distribution set in heavy chain CDR1 library
Amino acid  H31     H32    H33    H34    H35
Ala (A)     -       5.5%   45.0%  -      -
Asn (N)     -       6.5%   -      -      25.0%
Asp (D)     -       7.5%   -      -      -
Gly (G)     -       8.5%   55.0%  -      -
His (H)     -       9.5%   -      -      40.0%
Ile (I)     -       10.5%  -      60.0%  -
Leu (L)     -       11.5%  -      -      -
Met (M)     -       12.5%  -      40.0%  -
Ser (S)     100.0%  13.5%  -      -      35.0%
Tyr (Y)     -       14.5%  -      -      -
[0292] Step 1. All the possible amino acid sequences were listed as alternative sequences. In this example, other than the amino acid distribution shown in Table 12, there are no additional limitations. The alternative sequence set consists of 120 sequences, as shown in Table 13.
TABLE-US-00012 TABLE 13 Alternative sequence set of heavy chain CDR1 library
No.  Sequence
1    SAAIS
2    SAAIN
3    SAAIH
4    SAAMS
5    SAAMN
6    SAAMH
7    SAGIS
8    SAGIN
9    SAGIH
10   SAGMS
11   SAGMN
12   SAGMH
13   SNAIS
14   SNAIN
15   SNAIH
16   SNAMS
17   SNAMN
18   SNAMH
19   SNGIS
20   SNGIN
21   SNGIH
22   SNGMS
23   SNGMN
24   SNGMH
25   SDAIS
26   SDAIN
27   SDAIH
28   SDAMS
29   SDAMN
30   SDAMH
31   SDGIS
32   SDGIN
33   SDGIH
34   SDGMS
35   SDGMN
36   SDGMH
37   SGAIS
38   SGAIN
39   SGAIH
40   SGAMS
41   SGAMN
42   SGAMH
43   SGGIS
44   SGGIN
45   SGGIH
46   SGGMS
47   SGGMN
48   SGGMH
49   SHAIS
50   SHAIN
51   SHAIH
52   SHAMS
53   SHAMN
54   SHAMH
55   SHGIS
56   SHGIN
57   SHGIH
58   SHGMS
59   SHGMN
60   SHGMH
61   SIAIS
62   SIAIN
63   SIAIH
64   SIAMS
65   SIAMN
66   SIAMH
67   SIGIS
68   SIGIN
69   SIGIH
70   SIGMS
71   SIGMN
72   SIGMH
73   SLAIS
74   SLAIN
75   SLAIH
76   SLAMS
77   SLAMN
78   SLAMH
79   SLGIS
80   SLGIN
81   SLGIH
82   SLGMS
83   SLGMN
84   SLGMH
85   SMAIS
86   SMAIN
87   SMAIH
88   SMAMS
89   SMAMN
90   SMAMH
91   SMGIS
92   SMGIN
93   SMGIH
94   SMGMS
95   SMGMN
96   SMGMH
97   SSAIS
98   SSAIN
99   SSAIH
100  SSAMS
101  SSAMN
102  SSAMH
103  SSGIS
104  SSGIN
105  SSGIH
106  SSGMS
107  SSGMN
108  SSGMH
109  SYAIS
110  SYAIN
111  SYAIH
112  SYAMS
113  SYAMN
114  SYAMH
115  SYGIS
116  SYGIN
117  SYGIH
118  SYGMS
119  SYGMN
120  SYGMH
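The alternative sequence set of Step 1 is the Cartesian product of the residues allowed at each position, here 1 × 10 × 2 × 2 × 3 = 120 sequences. The sketch below enumerates it with `itertools.product`; the function name `alternative_sequences` and the optional `excluded` argument (for cases where specific sequences are to be excluded, which does not apply in this example) are illustrative assumptions.

```python
from itertools import product

# Allowed residues at H31..H35 from Table 12, listed in the order used by Table 13.
ALLOWED_TABLE_12 = ["S", "ANDGHILMSY", "AG", "IM", "SNH"]

def alternative_sequences(allowed, excluded=()):
    """List every combination of allowed residues, skipping any excluded sequences."""
    banned = set(excluded)
    # product() varies the last position fastest, matching the order of Table 13.
    return [s for s in ("".join(p) for p in product(*allowed)) if s not in banned]

candidates = alternative_sequences(ALLOWED_TABLE_12)
```

With the residue orders given above, the enumeration reproduces the numbering of Table 13, starting at SAAIS and ending at SYGMH.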
[0293] Step 2. For the library comprising 10000 sequences and having the amino acid distribution shown in Table 12, the given number of each amino acid at each position thereof was calculated, and the results are as shown in Table 14.
TABLE-US-00013 TABLE 14 Given number for each amino acid at each position in heavy chain CDR1 library
Amino acid  H31    H32   H33   H34   H35
Ala (A)     -      550   4500  -     -
Asn (N)     -      650   -     -     2500
Asp (D)     -      750   -     -     -
Gly (G)     -      850   5500  -     -
His (H)     -      950   -     -     4000
Ile (I)     -      1050  -     6000  -
Leu (L)     -      1150  -     -     -
Met (M)     -      1250  -     4000  -
Ser (S)     10000  1350  -     -     3500
Tyr (Y)     -      1450  -     -     -
[0294] Step 3. A sequence was randomly selected from the alternative sequence set, and it was judged whether adding the sequence to the library would cause the number of any amino acid at any position to exceed the given number for that amino acid at that position shown in Table 14. If no given number would be exceeded, the sequence was added to the library; if any given number would be exceeded, the sequence was not added to the library and was removed from the alternative sequence set.
[0295] The total number of sequences in the library was then checked: if the total number had reached 10000, the selection and storage tasks were completed; if it had not reached 10000, the selection and storage tasks were continued.
[0296] Step 4. After the above-mentioned operations, the actual number of various sequences in the generated library is shown in Table 15. The library size of the example is 10000 sequences, and no expansion operation is required.
TABLE-US-00014 TABLE 15 Actual number for each sequence in heavy chain CDR1 library
No.  Sequence  Number
1    SAAIS     38
2    SAAIN     45
3    SAAIH     45
4    SAAMS     54
5    SAAMN     65
6    SAAMH     45
7    SAGIS     41
8    SAGIN     52
9    SAGIH     49
10   SAGMS     35
11   SAGMN     43
12   SAGMH     38
13   SNAIS     48
14   SNAIN     49
15   SNAIH     70
16   SNAMS     57
17   SNAMN     53
18   SNAMH     48
19   SNGIS     56
20   SNGIN     42
21   SNGIH     59
22   SNGMS     63
23   SNGMN     56
24   SNGMH     49
25   SDAIS     61
26   SDAIN     61
27   SDAIH     70
28   SDAMS     72
29   SDAMN     58
30   SDAMH     60
31   SDGIS     66
32   SDGIN     61
33   SDGIH     72
34   SDGMS     58
35   SDGMN     56
36   SDGMH     55
37   SGAIS     90
38   SGAIN     74
39   SGAIH     76
40   SGAMS     72
41   SGAMN     86
42   SGAMH     77
43   SGGIS     69
44   SGGIN     58
45   SGGIH     61
46   SGGMS     70
47   SGGMN     51
48   SGGMH     67
49   SHAIS     102
50   SHAIN     70
51   SHAIH     92
52   SHAMS     97
53   SHAMN     64
54   SHAMH     81
55   SHGIS     90
56   SHGIN     63
57   SHGIH     79
58   SHGMS     76
59   SHGMN     58
60   SHGMH     78
61   SIAIS     100
62   SIAIN     75
63   SIAIH     91
64   SIAMS     87
65   SIAMN     65
66   SIAMH     73
67   SIGIS     141
68   SIGIN     67
69   SIGIH     155
70   SIGMS     73
71   SIGMN     54
72   SIGMH     69
73   SLAIS     107
74   SLAIN     70
75   SLAIH     127
76   SLAMS     56
77   SLAMN     59
78   SLAMH     83
79   SLGIS     186
80   SLGIN     61
81   SLGIH     164
82   SLGMS     76
83   SLGMN     65
84   SLGMH     96
85   SMAIS     128
86   SMAIN     54
87   SMAIH     104
88   SMAMS     88
89   SMAMN     70
90   SMAMH     62
91   SMGIS     205
92   SMGIN     56
93   SMGIH     277
94   SMGMS     64
95   SMGMN     67
96   SMGMH     75
97   SSAIS     115
98   SSAIN     80
99   SSAIH     105
100  SSAMS     77
101  SSAMN     69
102  SSAMH     71
103  SSGIS     173
104  SSGIN     76
105  SSGIH     374
106  SSGMS     81
107  SSGMN     71
108  SSGMH     58
109  SYAIS     103
110  SYAIN     72
111  SYAIH     113
112  SYAMS     72
113  SYAMN     77
114  SYAMH     67
115  SYGIS     180
116  SYGIN     51
117  SYGIH     481
118  SYGMS     73
119  SYGMN     76
120  SYGMH     84
[0297] Counting the residues in the generated library gives the distribution of amino acids at each position shown in Table 16, which is almost identical to the expected amino acid distribution in Table 12.
TABLE-US-00015 TABLE 16 Actual amino acid distribution in heavy chain CDR1 library
Amino acid  H31      H32     H33     H34     H35
Ala (A)     -        5.50%   45.00%  -       -
Asn (N)     -        6.50%   -       -       25.00%
Asp (D)     -        7.50%   -       -       -
Gly (G)     -        8.51%   55.00%  -       -
His (H)     -        9.50%   -       -       40.00%
Ile (I)     -        10.50%  -       60.00%  -
Leu (L)     -        11.50%  -       -       -
Met (M)     -        12.50%  -       40.00%  -
Ser (S)     100.00%  13.50%  -       -       35.00%
Tyr (Y)     -        14.49%  -       -       -
[0298] The above results demonstrate that the method of the present invention also gives good results when there are many alternative sequences and the amino acid distribution is relatively complex.
[0299] Step 5. The amino acid sequences in the library were reverse-translated into DNA sequences.
[0300] Step 6. A high-throughput gene synthesis was performed with chips.
[0301] The coefficient of determination, R², was used to quantify the degree of agreement between the actual amino acid distribution at the multiple-choice positions H32, H33, H34 and H35 and the expected amino acid distribution at those positions. The calculated R² values for the positions are respectively: 0.999998 for H32; 1.000000 for H33; 1.000000 for H34; and 1.000000 for H35.