Optical Methods For DNA Assembly For Computer Data Storage
20240229017 · 2024-07-11
CPC classification
G16B50/00
PHYSICS
C12N15/1068
CHEMISTRY; METALLURGY
Abstract
An array-based system of assembled DNA for computer data storage is described. An array surface contains immobilized seed DNA that initially has blunt (or blocked) ends, with a photocleavable optical linker at a forward end thereof holding the last few base pairs. Light from a light source is applied to break the linker, generating a sticky end that allows for hybridization. Data-bearing DNA cassettes are introduced to the array and attach via their sticky ends to the unblocked sites on the array surface. The attachment is made permanent with ligase.
Claims
1. A process for DNA-based data storage comprising: immobilizing multiple seed DNA strands on a surface of a chip, each strand of seed DNA being a double-stranded DNA oligomer including a first attached end and a second exposed end, wherein the second exposed end of each immobilized double-stranded DNA oligomer is blunt such that both a forward and a reverse strand thereof are of equal lengths at the second end, and further wherein each immobilized double-stranded DNA oligomer contains a photocleavable optical linker at a first location of a forward strand of each of the multiple strands of immobilized double-stranded DNA oligomers; directing light a first time to the array to break multiple of the photocleavable optical linkers at the first location of the forward strand of multiple immobilized double-stranded DNA oligomers in a first predetermined pattern, thereby exposing a sticky end of each of the multiple immobilized double-stranded DNA oligomers exposed to the light, while the first attached ends of the multiple strands remain attached to the surface of the chip; and introducing first data-encoded DNA cassettes to the surface of the chip, wherein each of the first data-encoded DNA cassettes attaches at a first end thereof to an exposed sticky end of a seed DNA strand, and further wherein each of the first data-encoded DNA cassettes includes a photocleavable optical linker at a second end thereof.
2. The process according to claim 1, wherein the light is from a UV scanner source.
3. The process according to claim 1, wherein the light is from a digital light processor (DLP).
4. The process according to claim 1, further comprising: securing the attachments using ligase.
5. The process according to claim 1, wherein immobilizing multiple seed DNA strands on the surface of the chip includes immobilizing millions of individual strands of double-stranded DNA oligomers.
6. The process according to claim 1, wherein introducing the first data-encoded DNA cassettes to the surface of the chip includes pumping a liquid containing the first data-encoded DNA cassettes across the surface of the chip.
7. The process according to claim 1, wherein the surface of the chip contains individually addressable pixels, each pixel including at least one strand of double-stranded DNA oligomer and each pixel, including the at least one strand, being individually addressable by the light.
8. The process according to claim 7, wherein the surface of the chip contains millions of pixels.
9. The process according to claim 8, wherein each pixel is approximately 30 to 100 μm in size.
10. The process according to claim 1, further comprising: directing light a second time to the array to break at least one of the group consisting of multiple of the photocleavable optical linkers at the second ends of multiple immobilized double-stranded DNA oligomers and multiple of the photocleavable optical linkers at the second ends of the attached first data-encoded DNA cassettes in a second predetermined pattern, thereby exposing sticky ends of each of the multiple immobilized double-stranded DNA oligomers and each of the attached first data-encoded DNA cassettes exposed to the light; and introducing second data-encoded DNA cassettes to the surface of the chip, wherein each of at least a portion of the second data-encoded DNA cassettes attaches to one of an exposed sticky end of an immobilized double-stranded DNA oligomer or an exposed sticky end of an attached first data-encoded DNA cassette.
11. The process according to claim 10, further comprising: securing the attachments using ligase.
Description
BRIEF DESCRIPTION OF FIGURES
[0012] The patent or application file contains at least one figure executed in color and/or a photograph. Copies of this patent or patent application publication with color drawing(s) and/or photographs will be provided by the Office upon request and payment of the necessary fee.
DETAILED DESCRIPTION
[0021] The purpose of the embodiments described herein is the production and use of an array-based system of DNA assembly for computer data storage. In this application, assembled DNA sequences are used to represent computer binary data. The motivation for this approach is several-fold. DNA can store huge amounts of information in a small physical footprint: over 200 petabytes of data could be stored in a gram of DNA, which occupies a volume of less than a teaspoon. This is approximately 10 times all the data, both digital and printed, currently held by the Library of Congress. Additionally, DNA has long-term stability, with a half-life of over 500 years, which also makes it attractive as a long-term data storage medium. The current challenge in using DNA for data storage is that a very large variety of DNA oligomers must be made in a highly parallel process to be practical. Current methods of DNA synthesis are not suitable for this application.
[0022] Accordingly, the present embodiments use light-based chemistry that allows for the controlled addition of DNA cassettes across an array of base DNA oligomers. This method offers simple fluidics and controlled growth over millions of reaction sites.
[0023] In this process, DNA is synthesized to represent digital data. The first step is to take the digital data, consisting of zeros and ones, and convert it into DNA sequence data. In the present embodiments, binary data are represented by a predetermined set of DNA oligomers (see Table 1), not by individual base pairs. For an example of known methods for storing data in synthesized DNA, see Organick L, Ang S D, Chen Y et al. Random access in large-scale DNA data storage. Nat Biotechnol 2018; 36: 242-8, which is incorporated herein by reference in its entirety. Next, the DNA is assembled on an array of reaction sites, with each DNA addition representing a set of digital data. Short sequences of several base pairs are used to represent bit values. In the specific embodiment discussed herein, we used 3 sets of 4 oligomers to allow 2 bits per addition cycle, but this process could be used to add more bits per addition cycle. A pool consisting of 3 sets of 8 oligomers each can encode 3 bits per addition, and a larger reagent pool consisting of 3 sets of 16 oligomers each allows each addition to add 4 bits.
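The relationship between set size and bits per addition can be checked with a short calculation. This is a minimal sketch assuming that each addition cycle selects exactly one oligomer from a set, so a set of N distinct oligomers carries log2(N) bits. The text does not state how the three sets are used, so the sketch simply multiplies them out to a total pool size; `pool_layout` and `cycles_needed` are hypothetical helper names.

```python
def pool_layout(num_sets: int, oligos_per_set: int) -> tuple[int, int]:
    """Return (bits encoded per addition, total oligomers in the reagent pool).

    A set of N distinct oligomers can distinguish N values, i.e. log2(N) bits,
    so N is required to be a power of two.
    """
    if oligos_per_set < 2 or oligos_per_set & (oligos_per_set - 1):
        raise ValueError("oligos_per_set must be a power of two >= 2")
    bits_per_addition = oligos_per_set.bit_length() - 1  # exact log2 for powers of two
    return bits_per_addition, num_sets * oligos_per_set


def cycles_needed(total_bits: int, bits_per_addition: int) -> int:
    """Number of addition cycles needed to write total_bits (ceiling division)."""
    return -(-total_bits // bits_per_addition)


for per_set in (4, 8, 16):
    bits, pool = pool_layout(3, per_set)
    print(f"3 sets x {per_set}: {bits} bits/addition, {pool} oligomers in the pool")
print(cycles_needed(24, 3))  # 8 cycles for a 24-bit stream at 3 bits per addition
```

Under this assumption, the 3 × 4, 3 × 8 and 3 × 16 pools hold 12, 24 and 48 oligomers and encode 2, 3 and 4 bits per addition, respectively.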
[0024] An example of how binary values of 3 bits in length can be represented as a DNA sequence is shown in Table 1. This is just an example; other DNA sequences can be used in practice.
TABLE 1
Binary data    DNA sequence used to represent binary data (forward strand)
000            CAC
001            CAT
010            CTA
011            CTG
100            TCG
101            TCA
110            TGC
111            TGT
[0025] For example, a binary stream of 24 bits is first broken into 8 sets of 3 bits. Then, using a lookup table such as Table 1, each set is converted to a DNA sequence, as shown in the example in Table 2.
TABLE 2
24-bit data:      011 110 111 100 111 101 101 111
DNA equivalent:   CTG TGC TGT TCG TGT TCA TCA TGT
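The lookup from Table 1 and the conversion in Table 2 can be expressed as a small encoder and decoder. This is a sketch only: it uses the example mapping of Table 1, which the text notes is just one possible choice, and the names `TABLE_1`, `encode` and `decode` are hypothetical.

```python
# Table 1: 3-bit binary values -> forward-strand DNA triplets (example mapping)
TABLE_1 = {
    "000": "CAC", "001": "CAT", "010": "CTA", "011": "CTG",
    "100": "TCG", "101": "TCA", "110": "TGC", "111": "TGT",
}
# Reverse lookup for turning read-back triplets into bits
DECODE = {dna: bits for bits, dna in TABLE_1.items()}


def encode(bitstream: str, width: int = 3) -> list[str]:
    """Split a bit string into `width`-bit groups and map each group via Table 1."""
    if len(bitstream) % width:
        raise ValueError(f"bit length must be a multiple of {width}")
    return [TABLE_1[bitstream[i:i + width]] for i in range(0, len(bitstream), width)]


def decode(triplets: list[str]) -> str:
    """Map DNA triplets back to their bit values and concatenate them."""
    return "".join(DECODE[t] for t in triplets)


data = "011110111100111101101111"  # the 24-bit example of Table 2
dna = encode(data)
print(" ".join(dna))  # CTG TGC TGT TCG TGT TCA TCA TGT
assert decode(dna) == data
```

Because all eight triplets in Table 1 are distinct, the mapping is invertible, which is what allows the round trip back to the original bits.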
[0026] To allow attachment to the immobilized DNA array, the oligomer also has a binding sequence at its front and back ends to allow for hybridization. For example, the front binding sequence for one set could be AAGG and the back ATTG. The complete 3′ to 5′ sequence would consist of the leading DNA for attachment to the immobilized DNA on the array, the data DNA, and a second set of attachment DNA for adding a subsequent round. Between the data sequence and the second binding sequence is an optical linker, designated by the symbol ˇ as shown in
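The cassette layout described above can be sketched as a string of segments. This minimal sketch tracks only the forward strand, uses the example binding sequences AAGG and ATTG from the text, and represents the photocleavable linker with a placeholder character `^` (an assumption; it is not a base). Light exposure is modeled as releasing everything on the forward strand after the linker; the function names are hypothetical.

```python
LINKER = "^"  # text placeholder for the photocleavable optical linker (not a real base)


def build_cassette(data_dna: str, front: str = "AAGG", back: str = "ATTG") -> str:
    """Lay out one cassette: front binding sequence, data, optical linker, back binding sequence."""
    return front + data_dna + LINKER + back


def photocleave(forward_strand: str) -> str:
    """Model light exposure on the forward strand: the part after the linker is released.

    In the real duplex the complementary strand keeps its bases, which is what
    leaves a single-stranded sticky end; this sketch does not model that strand.
    """
    head, sep, _released = forward_strand.partition(LINKER)
    if not sep:
        raise ValueError("no optical linker present")
    return head


cassette = build_cassette("CTG")  # "CTG" encodes 011 in Table 1
print(cassette)                   # AAGGCTG^ATTG
print(photocleave(cassette))      # AAGGCTG
```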
[0027] For the present example, the full reagent pool shown in
[0028] The DNA is synthesized as large pools of oligomers, with each oligomer type representing a small portion of the binary data. This requires synthesizing millions of specific DNA oligomers, which is practical only with a parallel process. A critical feature of the present process is the use of light-based chemistry and equipment that allows the DNA synthesis process to be performed quickly and in parallel over millions of reaction sites, with no segregation of reaction sites required. These attributes make this method well suited to DNA-based computer data storage.
[0029] A primary step in the embodied process is controlling growth on an array in a manner that facilitates parallel processing. Initially, the process starts with an array of immobilized DNA with blunt ends. DNA is considered to have a blunt end when its forward and reverse strands are of equal length. It is very difficult to ligate a second dsDNA oligomer to a blunt end. To address this difficulty, the present embodiments hold the last few base pairs using an optical linker. Where a reaction is desired, the optical linker is broken with light and the last few base pairs are removed, leaving an open stretch, or sticky end, that allows for hybridization. The dsDNA representing digital data is then free to add only where the array was treated with light, i.e., at the sticky ends, and is prevented from adding at locations with blunt ends. The addition is then made permanent with ligase, which repairs the two nicks.
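The life cycle of a single reaction site described above can be sketched as a small state machine. This is an illustrative model, not the patent's procedure: it assumes each step either succeeds completely or has no effect, and the state and function names are hypothetical.

```python
from enum import Enum, auto


class Site(Enum):
    BLUNT = auto()       # linker intact: strands of equal length, addition blocked
    STICKY = auto()      # linker broken by light: overhang available for hybridization
    HYBRIDIZED = auto()  # cassette annealed via sticky ends; two nicks remain
    LIGATED = auto()     # ligase has sealed both nicks; addition is permanent


def expose(site: Site) -> Site:
    """Light breaks the optical linker only on a blunt (blocked) end."""
    return Site.STICKY if site is Site.BLUNT else site


def flood_with_cassettes(site: Site) -> Site:
    """Cassettes in solution can anneal only where a sticky end exists."""
    return Site.HYBRIDIZED if site is Site.STICKY else site


def add_ligase(site: Site) -> Site:
    """Ligase repairs the two nicks of a hybridized cassette."""
    return Site.LIGATED if site is Site.HYBRIDIZED else site


lit = add_ligase(flood_with_cassettes(expose(Site.BLUNT)))
dark = add_ligase(flood_with_cassettes(Site.BLUNT))  # never exposed to light
print(lit, dark)  # Site.LIGATED Site.BLUNT
```

The point the model captures is that the flood step is global, yet only the exposed site changes state, because the blunt end gives the cassettes nothing to hybridize to.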
[0030] A summary of this process is shown and illustrated in
[0031] To make this chemistry practical, a light source that can direct light to millions of pixels is necessary. As shown in
[0032] Additionally, each incoming data-bearing DNA cassette also has an optical linker, so the process can be repeated in cycles until a long dsDNA oligomer is formed. This series of data-bearing DNA additions is used to encode digital data into each oligomer strand. At the end of the process, millions of unique DNA strands that represent the digital data have been generated across the array.
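The text states that additions repeat in cycles but does not spell out how different pixels receive different data in the same cycle. One plausible schedule, which is our assumption and not the method stated here, is one light pattern and one flood per cassette type per cycle: only the pixels that need that cassette are lit, and every newly attached cassette arrives with an intact linker, so it stays blocked until a later exposure. The sketch below plans such rounds and replays them, assuming every lit site opens and ligates with 100% efficiency; pixel ids and function names are hypothetical.

```python
def exposure_schedule(targets: dict[str, list[str]]) -> list[tuple[int, str, set[str]]]:
    """Plan (cycle, cassette, lit_pixels) rounds for a hypothetical per-cassette schedule.

    targets maps a pixel id to the cassette it should receive in each cycle.
    """
    n_cycles = max(len(seq) for seq in targets.values())
    rounds = []
    for cycle in range(n_cycles):
        # Group pixels by the cassette they need in this cycle
        by_cassette: dict[str, set[str]] = {}
        for pixel, seq in targets.items():
            if cycle < len(seq):
                by_cassette.setdefault(seq[cycle], set()).add(pixel)
        # One light pattern plus one flood per cassette type
        for cassette in sorted(by_cassette):
            rounds.append((cycle, cassette, by_cassette[cassette]))
    return rounds


def simulate(targets: dict[str, list[str]],
             rounds: list[tuple[int, str, set[str]]]) -> dict[str, list[str]]:
    """Replay the rounds: only lit pixels have an open end, so only they grow."""
    grown: dict[str, list[str]] = {pixel: [] for pixel in targets}
    for _cycle, cassette, lit in rounds:
        for pixel in lit:
            grown[pixel].append(cassette)  # attached cassette's own linker re-blocks the end
    return grown


targets = {"p1": ["CTG", "TGC"], "p2": ["CAC", "TGC"], "p3": ["CTG", "TGT"]}
rounds = exposure_schedule(targets)
print(len(rounds), "light/flood rounds")
assert simulate(targets, rounds) == targets
```

The design trade-off in this assumed schedule is that the number of rounds per cycle grows with the number of distinct cassettes in use, while the fluidics stay the same whole-array flood throughout.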
[0033] Referring to
[0035] The method described herein for the controlled addition of DNA cassettes across an array of base DNA oligomers has been demonstrated experimentally. In
/iCy3/CTCACAACCCCAGAAA/iSpPC/CAGACATGCTTCCTGACATACGATATCTGTGAGCTTAATGTCCTTATGT/3Bio/
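The experimental oligonucleotide above appears to be written with IDT-style modification codes (/iCy3/ for an internal Cy3 dye, /iSpPC/ for an internal photocleavable spacer, /3Bio/ for a 3′ biotin); that reading, and treating the space in the extracted sequence as a line-wrap artifact, are assumptions. The sketch splits the strand at the photocleavable spacer to show how many bases lie on each side of the cleavage point; `split_at_photocleavable` is a hypothetical helper.

```python
import re

OLIGO = ("/iCy3/CTCACAACCCCAGAAA/iSpPC/"
         "CAGACATGCTTCCTGACATACGATATCTGTGAGCTTAATGTCCTTATGT/3Bio/")


def split_at_photocleavable(oligo: str, pc_tag: str = "iSpPC") -> tuple[str, str]:
    """Return the bases on each side of the photocleavable spacer, with /tags/ stripped."""
    # Each match is either a /tag/ modification (group 1) or a run of bases (group 2)
    tokens = re.findall(r"/([^/]+)/|([ACGT]+)", oligo)
    left, right, seen_pc = [], [], False
    for tag, bases in tokens:
        if tag == pc_tag:
            seen_pc = True
        elif bases:
            (right if seen_pc else left).append(bases)
    if not seen_pc:
        raise ValueError("no photocleavable spacer found")
    return "".join(left), "".join(right)


left, right = split_at_photocleavable(OLIGO)
print(len(left), len(right))  # 16 49
```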
[0036] Next, the slide containing the hybridized base DNA is exposed to UV light (250-310 nm) using a DLP in a 5 × 5 pixel pattern (5 × 5 pixels is equivalent to 1334 × 1334 μm) for 15 minutes.
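Assuming the 1334 μm figure is the side length of the whole 5 × 5 pixel block on the slide, the implied footprint per DLP pixel and the exposed area of the block work out as follows.

```latex
\frac{1334\ \mu\mathrm{m}}{5\ \text{pixels}} = 266.8\ \mu\mathrm{m}\ \text{per pixel},
\qquad
(1334\ \mu\mathrm{m})^2 = 1{,}779{,}556\ \mu\mathrm{m}^2 \approx 1.78\ \mathrm{mm}^2
```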
[0037] In
[0038] The outcomes illustrated in
[0052] Referring to
[0053] The embodiments described here have two major advantages. First, the DNA is synthesized as large pools of oligomers, with each oligomer type representing a small portion of the binary data. This attribute is critical if DNA is to be used for data storage. In the exemplary embodiments described herein, the DLP controls the light at the pixel level and can direct light to millions of reaction sites simultaneously. This allows millions of specific DNA oligomers to be synthesized in a parallel process.
[0054] The second advantage of the embodiments described herein is the use of light-based chemistry that allows deprotection to occur on the array surface, not in solution. Because the reaction is controlled at the immobilized surface, the array can be flooded with the DNA cassettes while the reactions remain limited to the desired array locations. In other processes, the reaction must be isolated to individual wells, which requires dispensing the reagents to millions of sites, a process that is very difficult to scale up.
[0055] One skilled in the art would appreciate variations and substitutes, including temperature ranges, timing ranges, light ranges and the like, which would fall within the ordinary course of experimentation and thus are considered to be within the scope of the present embodiments. Additionally, one skilled in the art would appreciate component substitutes that, though not explicitly listed, would be known to one skilled in the art and are thus considered to be within the scope of the present embodiments.