METHOD AND DEVICE FOR FIXED-POINT EDITING OF NUCLEOTIDE SEQUENCE WITH STORED DATA

Abstract

Disclosed are a method and device for fixed-point editing of a nucleotide sequence stored with data.

Claims

1. A method for fixed-point editing of a nucleic acid sequence with stored data, which comprises the following steps: (1) splitting a nucleic acid sequence in which data is stored into a plurality of sequence fragments, and dividing all the sequence fragments into i partitions, wherein i is a positive integer; (2) adding a partition adapter at one or both ends of the sequence fragments in each partition, wherein the partition adapter sequences of the partitions are different from each other; (3) synthesizing the sequence fragments of each partition obtained in step (2) to obtain nucleic acid fragments; (4) determining the partition where a sequence fragment to be edited is located, and recording it as the nth partition; (5) amplifying the sequence fragments of all partitions except for the sequence fragments of the nth partition by using a partition primer library, wherein the partition primer library comprises primers that are at least partially complementary to the partition adapter sequences of the 1st partition, the 2nd partition, . . . , the (n−1)th partition, the (n+1)th partition, . . . , and the ith partition, respectively, so as to obtain a library comprising the sequence fragments of the 1st partition, the 2nd partition, . . . , the (n−1)th partition, the (n+1)th partition, . . . , and the ith partition; and (6) correcting a wrong sequence in the sequence fragment to be edited in the nth partition to obtain a correct sequence, then synthesizing all sequence fragments in the nth partition according to the correct sequence, and adding them into the library of step (5) so as to obtain a library with the correct sequence.

2. The method according to claim 1, characterized by further comprising one or more of the following items: (a) in step (1), the data is text information, image information, or sound information; (b) before step (1), the data is encoded into binary data according to a first encoding rule, preferably the first encoding rule is a binary encoding rule; and/or the binary data is encoded into a nucleic acid sequence through a second encoding rule, so as to obtain the nucleic acid sequence in which the data is stored, preferably the second encoding rule is Huffman Encoding Rule, Fountain Code Encoding Rule, XOR Encoding Rule, or Grass Encoding Rule; (c) in step (1), the nucleic acid sequence in which the data is stored is split into a plurality of sequence fragments with a length not exceeding 200 nt, in which each fragment has the same length.

3. The method according to claim 1, wherein in step (2), the partition adapter is added at one or both ends of the sequence fragments in each partition according to any one of the following rules: a partition adapter A1 is added at one or both ends of each sequence fragment in the 1st partition, a partition adapter A2 is added at one or both ends of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at one or both ends of all sequence fragments in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; a partition adapter A1 is added at the 5′ end of each sequence fragment in the 1st partition, a partition adapter A1′ is added at the 3′ end of each sequence fragment in the 1st partition, a partition adapter A2 is added at the 5′ end of each sequence fragment in the 2nd partition, a partition adapter A2′ is added at the 3′ end of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at the 5′ end of each sequence fragment in the ith partition, and a partition adapter Ai′ is added at the 3′ end of each sequence fragment in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; a universal adapter A is added at the 5′ end of the sequence fragments of each partition, a partition adapter A1 is added at the 3′ end of each sequence fragment in the 1st partition, a partition adapter A2 is added at the 3′ end of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at the 3′ end of each sequence fragment in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; a universal adapter A is added at the 3′ end of the sequence fragments in each partition, a partition adapter A1 is added at the 5′ end of each sequence fragment in the 1st partition, a partition adapter A2 is added at the 5′ end of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at the 5′ end of each sequence fragment in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt.

4. The method according to claim 1, wherein the sequence fragments in the library in step (6) are stored in a medium, or the sequence fragments in the library in step (6) are connected to a vector, and the vector is stored in a medium, or the sequence fragments in the library in step (6) are assembled, and the assembled sequence fragments are stored in a medium, preferably, the medium is selected from liquid phase, dry powder, living cells, or a combination thereof.

5. The method according to claim 1, wherein after a sequence fragment added with a partition adapter is obtained in step (2), the sequence fragment is added with an index number, wherein the index number is adjacent to the partition adapter.

6. The method according to claim 1, wherein the partition adapter has a length of 18 nt, and the index number sequence has a length of 5 nt to 10 nt, preferably 6 nt.

7. The method according to claim 1, wherein the partition n where the sequence fragment to be edited is located is determined by the following method: the partition n where the sequence fragment to be edited is located is determined according to the encoding rules used when the data is stored, or the partition n where the sequence fragment to be edited is located is determined by sequencing the nucleic acid sequence fragment synthesized in step (3) and performing sequence alignment.

8. The method according to claim 1, wherein in step (5), multiplex PCR is used to amplify the sequence fragments; preferably, the multiplex PCR is touch-up PCR or touchdown PCR; preferably, the polymerase used is selected from Taq, Phusion, Q5, Vent, KlenTaq, or a combination thereof.

9. A decoding method, comprising: sequencing the library obtained by using the method according to claim 1 to obtain each sequence fragment; obtaining the position sequence information of each sequence fragment according to the index number of each sequence fragment; and splicing the sequence fragments according to the position sequence information into a nucleic acid sequence in which the data is stored; optionally, the obtained nucleic acid sequence in which the data is stored is transcoded into a corresponding binary code, and then the binary code is transcoded into corresponding data information.

10. A device for fixed-point editing of a nucleic acid sequence with stored data, comprising: a module for splitting sequence and dividing partitions, which is configured to split the nucleic acid sequence in which data is stored into a plurality of sequence fragments, and to divide all the sequence fragments into i partitions, wherein i is a positive integer; a module for adding partition adapter, which is configured to add a partition adapter at one or both ends of the sequence fragments in each partition, wherein the partition adapter sequences of the partitions are different from each other; a module for synthesizing nucleic acid, which is configured to synthesize nucleic acid fragments for the sequence fragments with the added partition adapters; a positioning module, which is configured to determine the partition where a sequence fragment to be edited is located, and record it as the nth partition; an amplification module, which is configured to amplify the sequence fragments of all partitions except for the sequence fragments of the nth partition by using a partition primer library, wherein the partition primer library comprises primers that are at least partially complementary to the partition adapter sequences of the 1st partition, the 2nd partition, . . . , the (n−1)th partition, the (n+1)th partition, . . . , and the ith partition, respectively, so as to obtain a library comprising the sequence fragments of the 1st partition, the 2nd partition, . . . , the (n−1)th partition, the (n+1)th partition, . . . , and the ith partition; and a correction module, which is configured to correct a wrong sequence in a sequence fragment to be edited in the nth partition to obtain a correct sequence, then synthesize all the sequence fragments in the nth partition according to the correct sequence and add them to the library obtained by the amplification module, so as to obtain a library with the correct sequence; optionally, the device further comprises a module for adding index number, which is configured to add an index number to the sequence fragments added with partition adapter, wherein the index number is adjacent to the partition adapter.

11. The device according to claim 10, wherein the partition adapter is added at one or both ends of the sequence fragments in each partition according to any one of the following rules: a partition adapter A1 is added at one or both ends of each sequence fragment in the 1st partition, a partition adapter A2 is added at one or both ends of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at one or both ends of all sequence fragments in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; a partition adapter A1 is added at the 5′ end of each sequence fragment in the 1st partition, a partition adapter A1′ is added at the 3′ end of each sequence fragment in the 1st partition, a partition adapter A2 is added at the 5′ end of each sequence fragment in the 2nd partition, a partition adapter A2′ is added at the 3′ end of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at the 5′ end of each sequence fragment in the ith partition, and a partition adapter Ai′ is added at the 3′ end of each sequence fragment in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; a universal adapter A is added at the 5′ end of the sequence fragments of each partition, a partition adapter A1 is added at the 3′ end of each sequence fragment in the 1st partition, a partition adapter A2 is added at the 3′ end of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at the 3′ end of each sequence fragment in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; or a universal adapter A is added at the 3′ end of the sequence fragments in each partition, a partition adapter A1 is added at the 5′ end of each sequence fragment in the 1st partition, a partition adapter A2 is added at the 5′ end of each sequence fragment in the 2nd partition, . . . , a partition adapter Ai is added at the 5′ end of each sequence fragment in the ith partition, wherein the partition adapter sequences are different from each other but have the same length, which is preferably 16-20 nt; or the partition adapter has a length of 18 nt, and the index number sequence has a length of 5 nt to 10 nt, preferably 6 nt.

12. The device according to claim 10, further comprising an assembly module, which is configured to assemble each sequence fragment in the library.

13. The device according to claim 10, further comprising a module for ligating vector, which is configured to ligate each sequence fragment in the library to a vector.

14. The device according to claim 10, further comprising a medium storage module, which is configured to store each sequence fragment in the library in a medium, or store the vector ligated with sequence fragment in a medium, or store the assembled sequence fragments in a medium, preferably, the medium is selected from liquid phase, dry powder, living cells, or a combination thereof.

15. A decoding device, comprising: a sequencing module, which is configured to sequence a library obtained by using the method according to claim 1 to obtain each sequence fragment; a module for acquiring position information, which is configured to obtain the position sequence information of each sequence fragment according to the index number of each sequence fragment; and a splicing module, which is configured to splice the sequence fragments according to the position sequence information to form a nucleic acid sequence in which the data is stored.

16. The decoding device according to claim 15, further comprising a transcoding module, which is configured to transcode the nucleic acid sequence in which the data is stored into a corresponding binary code, and then transcode the binary code into corresponding data information.

17. A computer-readable storage medium, comprising a computer program stored thereon, wherein when the program is executed by a processor, the method according to claim 1 is implemented.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0051] The drawings described here are used to provide a further understanding of the present disclosure and constitute a part of the application. The exemplary examples of the present disclosure and the description thereof are used to explain the present disclosure, and do not constitute an improper limitation of the present disclosure. In the attached drawings:

[0052] FIG. 1 shows a flowchart of DNA storage.

[0053] FIG. 2 shows a schematic diagram of sequence fragments after splitting according to some examples of the present disclosure.

[0054] FIG. 3 shows a flowchart of DNA storage sequence fixed-point editing process according to some examples of the present disclosure.

SPECIFIC MODES FOR CARRYING OUT THE INVENTION

[0055] The following will clearly and completely describe the technical solutions in the examples of the present disclosure with reference to the accompanying drawings in the examples of the present disclosure. Obviously, the described examples are only a part of the examples of the present disclosure, rather than all the examples. The following description of at least one exemplary example is actually only illustrative, and in no way serves as any limitation to the present disclosure and its application or use. Based on the examples of the present disclosure, all other examples obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.

[0056] Unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions and numerical values set forth in these examples do not limit the scope of the present disclosure. At the same time, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn in accordance with actual proportional relationships. The technologies, methods and equipment known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, the technologies, methods and equipment should be regarded as part of the description of the granted patent. In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary, rather than as a limitation. Therefore, other examples of the exemplary examples may have different values. It should be noted that similar reference numerals and letters indicate similar items in the following drawings, so once an item is defined in one drawing, it does not need to be further discussed in the subsequent drawings.

EXAMPLE 1

Fixed-Point Editing of Nucleic Acid Sequence with Stored Data

[0057] Original document: Two sonnets by Shakespeare (English)

[0058] Simulation scenario: After the DNA sequences were synthesized, the stored original file was found to contain errors, so the synthesized sequences needed to undergo modification and addition operations.

[0059] Experiment Process:

[0060] 1. The wrong version of the original file was encoded on a computer by the Church simple code (George M. Church, Yuan Gao and Sriram Kosuri, "Next-Generation Digital Information Storage in DNA", Science 337 (6102), 1628 (Aug. 16, 2012), doi: 10.1126/science.1226355) in combination with a Reed-Solomon error correction code to obtain 176 sequences. In the wrong version, line 11 read “Like feeble old man” where the original text has “Like feeble age”, and line 17 read “Lord of my” where the original text has “Lord of my love”.
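The bit-to-base mapping of the Church simple code can be illustrated with a minimal sketch. In that code each bit is stored as one base (0 as A or C, 1 as G or T). The rule used here for choosing between the two candidate bases (never repeat the previous base) is a simplifying assumption for illustration, the function names are hypothetical, and the Reed-Solomon error correction used in the example is omitted.

```python
# Sketch of a one-bit-per-base mapping in the style of the Church simple code.
# Assumptions: ASCII text, and a deterministic tie-break that avoids repeating
# the previous base. Reed-Solomon error correction is NOT included.

ZERO_BASES = "AC"  # either base encodes bit 0
ONE_BASES = "GT"   # either base encodes bit 1


def text_to_bits(text: str) -> str:
    """Convert ASCII text to a string of '0'/'1' characters, 8 bits per character."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Inverse of text_to_bits."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")


def encode_bits(bits: str) -> str:
    """Map each bit to a base, switching candidates to avoid a repeated base."""
    out = []
    for bit in bits:
        pair = ZERO_BASES if bit == "0" else ONE_BASES
        base = pair[1] if out and out[-1] == pair[0] else pair[0]
        out.append(base)
    return "".join(out)


def decode_bases(seq: str) -> str:
    """Map each base back to its bit: A/C -> 0, G/T -> 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in seq)


dna = encode_bits(text_to_bits("Like feeble age"))
restored = bits_to_text(decode_bases(dna))
```

Because two bases are available for every bit, the encoder has freedom to avoid homopolymer runs, which is the property this sketch exploits.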

[0061] 2. After encoding, all sequences were divided into 8 partitions (A to H). An index number, a partition adapter (8 in total, one per partition) and the universal adapter ATGGTCAGATCGTGCATC were added to each sequence, giving 176 DNA sequences of 114 nt each, with 22 DNA sequences per partition. Partition A comprised sequences 1 to 22, in which the 5′ end of each sequence was added with the universal adapter, and the 3′ end was added with the partition adapter of Partition A; Partition B comprised sequences 23 to 44, in which the 5′ end of each sequence was added with the universal adapter, and the 3′ end was added with the partition adapter of Partition B; . . . ; Partition H comprised sequences 155 to 176, in which the 5′ end of each sequence was added with the universal adapter, and the 3′ end was added with the partition adapter of Partition H. The sequences of the partition adapters of Partitions A to H were different from each other, and all had a length of 18 nt.

[0062] The structure of each sequence from 5′ to 3′ was: universal adapter-sequence in which information was to be stored-index number-partition adapter.
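The layout described in step 2 can be sketched in a few lines of Python. The universal adapter, the partition size of 22, the 18-nt adapter length and the 6-nt index region follow the example; the partition adapter sequences and the base-4 index encoding are not given in the text, so the values below are hypothetical placeholders chosen only to be distinct and of the right length.

```python
# Sketch of the oligo layout: universal adapter - payload - index - partition adapter.
# Assumptions: base-4 index encoding and placeholder partition adapters (hypothetical).

UNIVERSAL_ADAPTER = "ATGGTCAGATCGTGCATC"  # given in the example, 18 nt
PARTITION_NAMES = "ABCDEFGH"
PARTITION_SIZE = 22   # sequences per partition in the example
INDEX_LEN = 6         # index region length in the example
BASES = "ACGT"


def index_to_dna(number: int, length: int = INDEX_LEN) -> str:
    """Encode a non-negative integer as a fixed-length base-4 string (assumed rule)."""
    if not 0 <= number < 4 ** length:
        raise ValueError("index does not fit in the index region")
    digits = []
    for _ in range(length):
        number, remainder = divmod(number, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


# Hypothetical 18-nt placeholder adapters, pairwise different; not real primer sites.
PARTITION_ADAPTERS = {
    name: index_to_dna(k + 1, 3) * 6 for k, name in enumerate(PARTITION_NAMES)
}


def partition_of(seq_number: int) -> str:
    """Return the partition letter for a 1-based sequence number."""
    if not 1 <= seq_number <= PARTITION_SIZE * len(PARTITION_NAMES):
        raise ValueError("sequence number out of range")
    return PARTITION_NAMES[(seq_number - 1) // PARTITION_SIZE]


def build_oligo(payload: str, seq_number: int) -> str:
    """Assemble one oligo from 5' to 3' in the order stated in the example."""
    return (UNIVERSAL_ADAPTER + payload + index_to_dna(seq_number)
            + PARTITION_ADAPTERS[partition_of(seq_number)])


oligo_58 = build_oligo("ACGT" * 18, 58)  # 72-nt dummy payload
```

With 18 + 72 + 6 + 18 nt the assembled oligo has the 114-nt length reported in the example, and `partition_of(58)` places sequence 58 in Partition C, consistent with step 4.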

[0063] 3. The 176 sequences obtained in step 2 were synthesized.

[0064] 4. After sequence alignment, it was found that the content to be modified in line 11 was in the 58th sequence, located in Partition C, and its wrong version sequence was:

ATGGTCAGATCGTGCATCAGCTGGCGACGAGGTAAGGATGATTAGATAAA [embedded image 00001 not reproduced]

[0065] wherein, the single underline indicated the universal adapter sequence, the double underline indicated the partition adapter sequence of Partition C, and the framed sequence indicated the index number region.

[0066] 5. Primers complementary to the partition adapters of Partitions A, B, D, E, F, G and H and to the universal adapter sequence were added to the primer library, which was used to perform multiplex PCR, so that all 154 sequences in Partitions A, B, D, E, F, G and H were amplified.

[0067] The multiplex PCR was performed as touchdown PCR using the Q5® Reaction Buffer Pack, with an enzyme ratio of Q5:Ex Taq = 8:1. The reaction program was: 98° C., 5 min; 25 cycles of (98° C., 20 s; annealing at 60° C. decreasing by 0.2° C. per cycle to 55.2° C., 30 s; 72° C., 10 s); 72° C., 5 min; 12° C., hold.
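The annealing temperatures implied by this touchdown program can be tabulated with a short sketch. It assumes that the first cycle anneals at 60° C. and that each following cycle is 0.2° C. lower, which reproduces the 55.2° C. endpoint after 25 cycles; the function name is hypothetical.

```python
# Sketch: per-cycle annealing temperatures for the touchdown program.
# Assumption: cycle 1 anneals at the start temperature, then drops by a fixed step.

def touchdown_schedule(start_c: float = 60.0, step_c: float = 0.2, cycles: int = 25) -> list:
    """Return the annealing temperature (degrees C) for each cycle, rounded to 0.1."""
    return [round(start_c - step_c * k, 1) for k in range(cycles)]


schedule = touchdown_schedule()
first, last = schedule[0], schedule[-1]  # 60.0 and 55.2
```

The last cycle anneals at 60 − 24 × 0.2 = 55.2° C., so the two temperature limits and the cycle count stated in the program are mutually consistent.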

[0068] 6. Through the multiplex PCR amplification and dilution in step 5, an Oligo library containing only Partitions A, B, D, E, F, G and H was obtained.
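The selection performed in steps 5 and 6 can be mimicked in silico as a sanity check on the primer library design. The sketch below assumes that the forward primer matches the universal adapter and that each reverse primer is the reverse complement of one 3′ partition adapter; the partition adapter sequences and payloads are hypothetical placeholders, since the text does not list them.

```python
# Sketch: build a partition primer library that skips one partition and check
# which oligos it would prime. Adapter sequences and payloads are placeholders.

UNIVERSAL_ADAPTER = "ATGGTCAGATCGTGCATC"  # given in the example
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical 18-nt partition adapters, pairwise different.
ADAPTERS = {
    name: (BASES[k // 4] + BASES[k % 4] + "G") * 6
    for k, name in enumerate("ABCDEFGH")
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_primer_library(adapters: dict, excluded: str) -> dict:
    """Reverse primers for every partition except the one being edited."""
    return {name: reverse_complement(adapter)
            for name, adapter in adapters.items() if name != excluded}


def in_silico_amplify(oligos: list, primers: dict, universal: str) -> list:
    """Keep oligos carrying the universal adapter and a primed 3' partition adapter."""
    targets = {reverse_complement(primer) for primer in primers.values()}
    return [o for o in oligos
            if o.startswith(universal) and any(o.endswith(t) for t in targets)]


# 22 dummy oligos per partition, 176 in total, as in the example.
library = [UNIVERSAL_ADAPTER + "ACGT" * 18 + adapter
           for adapter in ADAPTERS.values() for _ in range(22)]
primers = build_primer_library(ADAPTERS, excluded="C")
amplified = in_silico_amplify(library, primers, UNIVERSAL_ADAPTER)
```

Under these assumptions 154 of the 176 oligos are retained and none of them carries the Partition C adapter, matching the counts given in step 5.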

[0069] 7. By re-encoding the information of Partition C, 22 new sequences of Partition C were obtained, in which the corrected 58th sequence was as follows (the remaining 21 sequences of Partition C remained unchanged):

ATGGTCAGATCGTGCATCACGTATTCACGAAGGGACGAAGACAACTCCTA [embedded image 00002 not reproduced]

[0070] wherein, the single underline indicated the universal adapter sequence, the double underline indicated the partition adapter sequence of Partition C, and the framed sequence indicated the index number region.

[0071] At the same time, the content to be added in line 17 was designed. The original index number region was AGCCTA; two new sequences were added whose index number regions were A-AGCCTA and T-AGCCTA. The newly added sequences, 89-A and 89-B, were respectively:

Sequence 89-A: ATGGTCAGATCGTGCATCATGAAATTTGGACCACAGGGCTACAAGTTATT [embedded image 00003 not reproduced]
Sequence 89-B: ATGGTCAGATCGTGCATCAGGGTCCTACGATGTGTTGTGCATCATGCTGA [embedded image 00004 not reproduced]

[0072] wherein, the single underline indicated the universal adapter sequence, the double underline indicated the partition adapter sequences, and the framed sequence indicated the index number regions.

[0073] 8. The newly synthesized sequences in step 7 were mixed with the Oligo library obtained in step 6 to obtain a new mixture library.

[0074] 9. The newly obtained Oligo library in step 8 was subjected to Sanger sequencing.

[0075] 10. The sequencing result was returned to the computer for decoding, and the correct original file was obtained.

[0076] 11. The newly obtained Oligo library in step 8 was freeze-dried into a dry powder and stored at −20° C.

EXAMPLE 2

Decoding

[0077] The correct Oligo library edited in Example 1 was sequenced, and 18 nt were removed from each end of the sequences in the resulting sequence group A (the universal adapter and the partition adapter, respectively) to obtain sequence group A′. First, the index number of each sequence was read and decoded to obtain the numerical position of that sequence.

[0078] Then, the sequences of group A′ were sorted in ascending order of their index numbers, and the index numbers were removed to obtain sequence group A″.

[0079] According to the encoding rules used in Example 1, the nucleic acid sequences of the sequence group A″ were transcoded into the corresponding binary codes, the binary codes of all the sequences were connected according to the previous index order, and then the binary codes were read according to the computer language to restore the original file.
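The trimming, sorting and splicing steps of this example can be sketched as follows. The 18-nt adapter length and the 6-nt index region come from Example 1; reading the index as a base-4 number and the function names are assumptions made for illustration, and the final transcoding into binary is left out.

```python
# Sketch of the decoding pipeline: group A -> A' (adapters trimmed) -> sorted by
# index -> A'' (index removed) -> spliced payload.
# Assumption: the 6-nt index is a base-4 number (A=0, C=1, G=2, T=3).

ADAPTER_LEN = 18
INDEX_LEN = 6
BASES = "ACGT"


def dna_to_index(index_region: str) -> int:
    """Decode an index region as a base-4 integer (assumed rule)."""
    value = 0
    for base in index_region:
        value = value * 4 + BASES.index(base)
    return value


def reassemble(reads: list) -> str:
    """Trim both adapters, order by index, strip the index and splice payloads."""
    trimmed = [read[ADAPTER_LEN:-ADAPTER_LEN] for read in reads]              # group A'
    ordered = sorted(trimmed, key=lambda s: dna_to_index(s[-INDEX_LEN:]))
    payloads = [s[:-INDEX_LEN] for s in ordered]                              # group A''
    return "".join(payloads)


# Three dummy reads, deliberately out of order (indices 3, 1, 2).
reads = [
    "T" * 18 + "CACA" + "AAAAAT" + "G" * 18,
    "T" * 18 + "ACAC" + "AAAAAC" + "G" * 18,
    "T" * 18 + "GTGT" + "AAAAAG" + "G" * 18,
]
spliced = reassemble(reads)
```

This sketch handles only fixed-length indices. The extended index regions used for the added sequences 89-A and 89-B (A-AGCCTA and T-AGCCTA) would need an ordering rule that the text does not spell out.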

CPC classification

C12Q2531/113 CHEMISTRY; METALLURGY
C12Q2537/143 CHEMISTRY; METALLURGY
C12Q1/6806 CHEMISTRY; METALLURGY
G11C13/0019 PHYSICS
C12Q2525/197 CHEMISTRY; METALLURGY
G06N3/123 PHYSICS

International classification

C12Q1/6806 CHEMISTRY; METALLURGY
G11C13/00 PHYSICS
