Oligonucleotide data storage on solid supports

11931713 · 2024-03-19

Abstract

A data storage medium is disclosed comprising a solid support matrix, including an optional stabilising reagent or reagents in dry form, for use as a support for artificially synthesised oligonucleotide sequences encoded with data. Preferably the matrix is fibrous (for example, cellulose or glass fibres) and formed into a support of sufficient strength to hold the oligonucleotide sequences. The stabilising reagents are preferably a combination of a weak base and a chelating agent, optionally with uric acid or a urate salt, and optionally an anionic surfactant.

Claims

1. A data storage medium comprising a solid support matrix adapted for use as a dry storage support for non-naturally occurring oligonucleotide sequences encoded with ternary or quaternary base system DNA data, wherein the matrix is fibrous, wherein said matrix includes a reagent mix in a dry form comprising a combination of a weak base, a chelating agent, a stabilising reagent, and an anionic surfactant, wherein the stabilising reagent comprises a chaotropic salt, and wherein the storage medium includes a visual delineation around an area for holding the DNA data comprising: (i) a printed boundary on the fibrous matrix configured to show where the DNA data is stored to facilitate sampling therefrom or (ii) an indicating dye of chlorophenol red or phenol red which is configured to discolour, thereby showing where the DNA data is stored on the storage medium to facilitate sampling therefrom.

2. The data storage medium as claimed in claim 1, wherein the matrix comprises cellulose or glass fibres formed into a support of sufficient strength to hold the oligonucleotide sequences.

3. The data storage medium as claimed in claim 1, wherein the weak base is a Lewis base which has a pH of about 6 to 10.

4. The data storage medium as claimed in claim 3, wherein the weak base comprises an alkali metal carbonate, bicarbonate, phosphate or borate.

5. The data storage medium as claimed in claim 3, wherein the weak base comprises tris-hydroxymethyl amino methane (Tris), ethanolamine, triethanolamine, or glycine and alkaline salts of organic acids.

6. The data storage medium as claimed in claim 5, wherein the chelating agent comprises EDTA.

7. The data storage medium as claimed in claim 6, wherein the anionic surfactant includes a hydrocarbon moiety, aliphatic or aromatic, containing one or more anionic groups.

8. The data storage medium as claimed in claim 7, wherein the anionic surfactant comprises sodium dodecyl sulphate (SDS) and/or sodium lauryl sarcosinate (SLS).

9. A method of storing data, said method comprising the following steps: a) producing synthesised oligonucleotides having a non-naturally occurring nucleotide sequence representative of ternary or quaternary base system data to be stored; b) optionally replicating said oligonucleotides; and c) depositing said oligonucleotides, optionally replicated, onto a storage medium according to claim 1.

10. The method of storing data as claimed in claim 9, wherein at least steps b) and c) are performed in a liquid, and said liquid is allowed or caused to evaporate from the storage medium.

11. The method of storing data as claimed in claim 9, wherein step a) includes converting binary data into a quaternary code, and sequencing DNA nucleotides, according to the quaternary code to provide said oligonucleotides.

12. The method of storing data as claimed in claim 9, wherein step a) includes incorporating a reporter into said sequence, to provide an indication of data integrity, said reporter providing an increase or decrease in detectability in response to a change in said integrity.

13. The method of storing data as claimed in claim 12, wherein said reporter is a fluorescent dye.

14. The method of storing data as claimed in claim 13, wherein said nucleotide sequence is at least 250 base pairs in length.

15. The method of storing data as claimed in claim 14, wherein said replication is by means of a polymerase chain reaction (PCR), isothermal amplification, or rolling circle amplification.

16. The method of storing data as claimed in claim 15, wherein one or more sequences which aid replication are added to the end or ends of the sequenced oligonucleotides.

17. A method for recovering stored data from the storage medium of claim 1 when stored on the storage medium, said recovery method comprising the steps of: i) extracting at least a portion of the replicated oligonucleotides from the storage medium; ii) optionally performing further replication of said oligonucleotides; and iii) performing an analysis of the oligonucleotides to determine their nucleotide sequence, and decoding said sequence to thereby recover said stored data.

18. The method for recovering stored data as claimed in claim 17, wherein said extracting step includes removing at least a portion of the storage medium or the DNA stored thereon, and introducing into a vessel the at least a portion, or DNA, and then performing said further replication step.

19. The method as claimed in claim 18, wherein said further replication step includes: adding a liquid to said at least a portion.

20. The method for recovering stored data from the storage medium, as claimed in claim 17, wherein the analysis step includes determining the nucleic composition of the oligonucleotide by known techniques, using: an electropherogram; radiolabelling; single-molecule real-time sequencing; pyrosequencing; sequencing by ligation; or dideoxy chain termination.

21. A method for determining integrity of non-naturally occurring oligonucleotide sequences encoded with ternary or quaternary base system DNA data stored on a storage medium comprising a solid support matrix, wherein the matrix is fibrous, wherein said matrix includes a reagent mix in a dry form, wherein the reagent mix comprises a combination of a weak base, a chelating agent, a stabilising reagent comprising a chaotropic salt, and an anionic surfactant, and wherein the storage medium includes a visual delineation around an area for holding the DNA data comprising: (i) a printed boundary on the fibrous matrix configured to show where the DNA data is stored or (ii) an indicating dye such as chlorophenol red or phenol red, which is configured to discolor, thereby showing where the DNA data is stored on the storage medium, said method including the steps of: sampling medium from the visual delineated area; and performing an STR (short tandem repeat) profile analysis on a portion of the DNA data stored on the medium sampled from the visual delineated area.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows an oligonucleotide sequence as deposited on an FTA card (sequence listing 1);

(2) FIG. 2 shows a portion of the sequence of FIG. 1 recovered from an FTA card (sequence listing 2);

(3) FIG. 3 shows another portion of the sequence of FIG. 1 recovered from an FTA card (sequence listing 3);

(4) FIG. 4 shows a comparison between recovery using FTA cards and plain paper;

(5) FIG. 5 shows the results of a DNA integrity experiment using STR profiling; and

(6) FIG. 6 shows a micrographic image of a data storage medium according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

(7) In one embodiment, digital information is encoded into synthesised DNA following the approach of Goldman et al. 2013. That group used a number of common computer file formats. In this case, the five files included all 154 of Shakespeare's sonnets (ASCII text), a classic scientific paper, a medium-resolution colour photograph of the European Bioinformatics Institute (JPEG), and a 26-second excerpt from Martin Luther King's 1963 "I have a dream" speech (MP3 format). The data in each file were represented as DNA information sequences (see FIG. 1) built from the known constituents of DNA, namely adenine (A), cytosine (C), guanine (G) and thymine (T). Each DNA sequence was split into overlapping segments, generating fourfold redundancy, and alternate segments were converted to their reverse complement. These measures reduce the probability of data loss. Each segment was augmented with indexing information identifying the file from which it originated and its location within that file. In all, the five files were represented by a total of 153,335 segments of DNA information, each 117 nucleotides (nt) long. Oligonucleotides corresponding to the designed DNA segments were synthesised, creating 1.2×10^7 copies of each oligonucleotide. Synthesis errors (about 1 error per 500 bases) occur independently in the different copies of each string, so comparing copies reduces the method's overall error rate. After resuspension, amplification and purification, a sample of the resulting library was sequenced to recover the information. The remainder was sub-aliquoted and re-lyophilised for long-term storage, typically at −20 °C. Sequencing yielded 79.6×10^6 read-pairs of 104 bases in length, from which the authors reconstructed full-length (117-nt) oligonucleotides in silico.
Full-length DNA sequences representing the original encoded files were then reconstructed digitally using software that locates the indexing information. The resulting DNA sequences could be fully decoded without intervention, and inspection confirmed that the original computer files had been reconstructed with 100% accuracy.
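
The segmentation step can be illustrated with a short Python sketch. It assumes, as in Goldman et al. 2013 but not stated in this document, 100-nt segments starting every 25 nt so that interior bases are covered four times; the indexing information and any end padding are omitted, and the function names are hypothetical.

```python
# Sketch only: segment length 100 and step 25 are assumptions, not values from this text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def segment(seq: str, length: int = 100, step: int = 25) -> list[str]:
    """Split seq into overlapping segments, reverse-complementing every other one."""
    segments = []
    for i, start in enumerate(range(0, len(seq) - length + 1, step)):
        chunk = seq[start:start + length]
        # Alternate segments are stored as their reverse complement.
        segments.append(reverse_complement(chunk) if i % 2 else chunk)
    return segments


def coverage(seq_len: int, length: int = 100, step: int = 25) -> list[int]:
    """Count how many segments cover each position of the original sequence."""
    counts = [0] * seq_len
    for start in range(0, seq_len - length + 1, step):
        for pos in range(start, start + length):
            counts[pos] += 1
    return counts
```

With these parameters, every position away from the ends of a 200-nt sequence is held in four different segments, which is the fourfold redundancy described above.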

(8) Goldman et al. converted binary computer data into ternary (base 3) digits, which require just three characters, and then encoded those digits into DNA. Because only three of the four nucleotide bases (A, C, G and T) are needed to represent the digits 0, 1 and 2 at any position, one base is always spare, and the code can avoid repeating a base, which can cause data recovery errors with present-day sequencing machines. The mapping used at each position depends on the preceding base: for example, where one base uses A=0, C=1 and G=2, the next base might use C=0, G=1 and T=2, and so on. This ensures that a run of identical digits in the data is not represented by a run of identical bases in the DNA, helping to avoid reading mistakes.
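
The rotating ternary code can be sketched in Python as follows. This follows the rule of Goldman et al. 2013 as we understand it: each digit selects one of the three bases that differ from the previous base, taken in cyclic A, C, G, T order starting after the previous base. The default starting "previous" base is a hypothetical choice, and the binary-to-ternary conversion step is omitted.

```python
BASES = "ACGT"


def encode_trits(trits: list[int], prev: str = "A") -> str:
    """Encode base-3 digits so that no base repeats the one before it.

    prev is the base assumed to precede the sequence (hypothetical default).
    """
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError(f"not a ternary digit: {t}")
        # Skip over the previous base, then count t steps around A, C, G, T.
        prev = BASES[(BASES.index(prev) + 1 + t) % 4]
        out.append(prev)
    return "".join(out)


def decode_trits(dna: str, prev: str = "A") -> list[int]:
    """Invert encode_trits, given the same starting base."""
    trits = []
    for base in dna:
        t = (BASES.index(base) - BASES.index(prev) - 1) % 4
        if t == 3:  # base repeats the previous one: not a valid codeword
            raise ValueError("repeated base in encoded sequence")
        trits.append(t)
        prev = base
    return trits
```

With `prev="T"`, the digits 0, 0, 0 encode as `ACG`: the first base uses A=0, C=1, G=2 and the second uses C=0, G=1, T=2, matching the example in the paragraph above.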

(9) Alternatively, a quaternary (base 4) system is proposed here, with or without the shifting base representation described immediately above. Conveniently, each base then carries two bits, so a string of four quaternary characters (A, C, G and T) represents one binary 8-bit byte (octet). ASCII code requires 7 bits, so a parity bit can be added to each character to fill the octet and provide error detection.
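
The character-to-octet step can be sketched in Python. The sketch assumes odd parity with the parity bit appended after the 7 ASCII bits, which is an inference from the worked example for "GE" in this section rather than a stated rule; the function names are hypothetical.

```python
def char_to_octet(ch: str) -> int:
    """Return the 7-bit ASCII code of ch followed by an odd-parity bit (assumed)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"not a 7-bit ASCII character: {ch!r}")
    ones = bin(code).count("1")
    parity = 0 if ones % 2 else 1  # make the total number of 1s odd
    return (code << 1) | parity


def octet_to_quaternary(octet: int) -> list[int]:
    """Split an 8-bit value into four base-4 digits, most significant first."""
    return [(octet >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def parity_ok(octet: int) -> bool:
    """Check odd parity on a received octet (detects any single-bit error)."""
    return bin(octet).count("1") % 2 == 1
```

For example, `char_to_octet("G")` gives `0b10001111`, whose quaternary digits are 2, 0, 3, 3.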

(10) In a quaternary based system, the following encoding may be used:

(11) Encoding of the English language characters "GE":

English language characters                         G           E
ASCII 7-bit code plus parity bit                    10001111    10001010
Quaternary code                                     2033        2022
DNA sequence [where A=0, C=1, G=2 and T=3 always]   GATT        GAGG
DNA sequence [where n+1 is used, per rule below]    GTCA        GTAT

n+1 rule:
Previous base mapping           Next base mapping
A=0, C=1, G=2, T=3              A=1, C=2, G=3, T=0
A=1, C=2, G=3, T=0              A=2, C=3, G=0, T=1
A=2, C=3, G=0, T=1              A=3, C=0, G=1, T=2
A=3, C=0, G=1, T=2              A=0, C=1, G=2, T=3

(12) The n+1 quaternary system used above increases data density over the ternary system used by Goldman (2 bits per base rather than about 1.58 bits per base), while still reducing the likelihood of runs of repeated bases, which are difficult to read. The quaternary system is not tied to ASCII characters, so any binary code can be converted into bases according to the steps in the table above.

(13) Recovering the data is possible by reversing the encoding shown in the table.
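
Both DNA mappings in the table, and their reversal, can be sketched in Python. The n+1 rule is read from the table as a per-position shift that cycles every four bases (at position i, base A encodes i mod 4); this reading reproduces the table's GTCAGTAT example but is an interpretation, and the function names are hypothetical.

```python
BASES = "ACGT"


def to_dna_fixed(digits: list[int]) -> str:
    """Fixed mapping: A=0, C=1, G=2 and T=3 at every position."""
    return "".join(BASES[d] for d in digits)


def to_dna_shifted(digits: list[int]) -> str:
    """n+1 mapping: each base's value rises by 1 (mod 4) at every position.

    At position i, base A encodes i mod 4, so digit d is written as
    BASES[(d - i) % 4].
    """
    return "".join(BASES[(d - i) % 4] for i, d in enumerate(digits))


def from_dna_shifted(dna: str) -> list[int]:
    """Reverse the n+1 mapping to recover the quaternary digits."""
    return [(BASES.index(b) + i) % 4 for i, b in enumerate(dna)]
```

Applied to the digits 2, 0, 3, 3, 2, 0, 2, 2 this gives `GATTGAGG` (fixed) and `GTCAGTAT` (n+1), as in the table. As the text says, the shift reduces rather than removes repeats: the digits 0, 1 still map to `AA`.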

(14) The DNA data storage medium of the present invention is based on a product known as FTA, supplied by GE Healthcare Inc, and on variants of that product such as FTA Elute, whose compositions are given in more detail below; these are referred to generally herein as FTA, FTA cards or FTA paper.

(15) FTA includes a stabilising reagent mix comprising a combination of a weak base and a chelating agent, optionally with uric acid or a urate salt, and optionally an anionic surfactant.

(16) The weak base of the combination may be a Lewis base that provides a pH of about 6 to 10, preferably about pH 8.0 to 9.5. One function of the weak base is to act as a buffer that, in conjunction with the other components of the composition, maintains a composition pH of about 6 to 10, preferably about pH 8.0 to 9.5, for example pH 8.6. Suitable weak bases according to the invention include organic and inorganic bases. Suitable inorganic weak bases include, for example, an alkali metal carbonate, bicarbonate, phosphate or borate (e.g. sodium, lithium or potassium carbonate). Suitable organic weak bases include, for example, tris-hydroxymethyl amino methane (Tris), ethanolamine, triethanolamine and glycine, and alkaline salts of organic acids (e.g. trisodium citrate). A preferred organic weak base is a weak monovalent organic base, for example Tris. The Tris may be either a free base or a salt, for example a carbonate salt.
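
The buffering role of a weak base such as Tris can be illustrated with the Henderson-Hasselbalch relation, pH = pKa + log10([base]/[acid]). The sketch below assumes a Tris pKa of about 8.07 at 25 °C (an approximate literature value, not given in this document; it also drifts with temperature) and shows what fraction of the Tris would be in the free-base form at the example pH of 8.6.

```python
TRIS_PKA_25C = 8.07  # approximate literature value at 25 °C (assumption)


def base_to_acid_ratio(ph: float, pka: float = TRIS_PKA_25C) -> float:
    """Henderson-Hasselbalch: [base]/[conjugate acid] = 10 ** (pH - pKa)."""
    return 10 ** (ph - pka)


def fraction_unprotonated(ph: float, pka: float = TRIS_PKA_25C) -> float:
    """Fraction of the buffer present as the free (unprotonated) base."""
    r = base_to_acid_ratio(ph, pka)
    return r / (1 + r)
```

At pH 8.6 the ratio is about 3.4, so roughly three quarters of the Tris is unprotonated; a buffer resists pH change most strongly within about one unit of its pKa.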

(17) A preferred chelating agent is a strong chelating agent, meaning an agent that binds multivalent metal ions with an affinity comparable to or better than that of ethylenediaminetetraacetic acid (EDTA). A preferred chelating agent according to the invention is EDTA.

(18) Anionic surfactants are examples of surfactants that are useful in the present invention. A preferred anionic surfactant is a strong anionic detergent. As used herein, a strong anionic detergent includes a hydrocarbon moiety, aliphatic or aromatic, containing one or more anionic groups. Particularly preferred anionic detergents suitable for the invention include sodium dodecyl sulphate (SDS) and sodium lauryl sarcosinate (SLS). In a preferred embodiment, the anionic detergent inactivates most microorganisms that have protein or lipids in their outer membranes or capsids, for example fungi, bacteria or viruses, including microorganisms that may be pathogenic to humans and present in a biological sample. The detergent also denatures proteins such as DNases, thereby affording additional protection to the applied DNA.
The storage medium has a visual delineation (visible to the human eye or to an automated device such as a camera and controller) around the area holding the DNA data, so that it is possible in future to see where the DNA was deposited. In this case the delineation is a printed circle into which the DNA is deposited, but an indicating dye can also provide a delineation.

(19) Alternatively, a product called FTA Elute could be used as a storage medium, in which case the stabilising reagent may comprise a single reagent, for example a chaotropic substance such as a chaotropic salt, e.g. guanidinium thiocyanate.

(20) To demonstrate that a data storage medium in the form of a solid support matrix including a stabilising reagent or reagents in a dry form, for use as a support for oligonucleotide sequences encoded with data, could provide a viable long-term storage solution, two experiments were performed:

Example 1

Sequencing of DNA Fragments Applied to FTA Cards

(21) Samples of plasmid DNA designated pUC19-p53 (stock concentration 0.5 µg/ml in TE buffer) were spotted at a mass of 1 ng onto FTA sample collection cards (GE Healthcare). These cards provide stabilising reagents according to the invention. The plasmid contains a 1038 base pair fragment of the p53 gene corresponding to nucleotides 217 to 1255 of the p53 cDNA sequence described in GenBank accession number BC003596 (IMAGE: 3544714, MGC: 646). This fragment was cloned into the SmaI site of the plasmid pUC19. The DNA sequence of the pUC-p53 DNA prior to application to FTA is shown in FIG. 1. After application of the plasmid, the FTA cards were left to air dry for 60 min before being placed into multi-barrier pouches and stored at room temperature for at least a week (as recommended by the manufacturer).

(22) DNA was extracted from the sample collection matrices using an Illustra tissue and cells genomic prep Mini Spin Kit (GE Healthcare) according to the manufacturer's recommendations, with the modifications outlined below.

(23) 1. The 3.0 mm marked disk containing the applied DNA sample was removed using a Uni-Core punch and placed into a 1.5 ml sterile tube.

(24) 2. Sterile phosphate buffered saline (PBS) (1 ml) was added to each tube.

(25) 3. Tubes were centrifuged at 12,000 rpm for 2 min (all centrifugations were conducted in a Heraeus Biofuge).

(26) 4. PBS was discarded and 50 µl of fresh sterile PBS was added to each tube.

(27) 5. The 3.0 mm punches were homogenized using a 20-gauge sterile needle.

(28) 6. Tubes were centrifuged at 5000 rpm for 10 sec.

(29) 7. DNA was isolated from the supernatant according to the manufacturer's instructions for the Illustra tissue and cells genomic prep Mini Spin Kit.

(30) To generate DNA for sequencing, two PCR products were produced by end-point PCR (see FIGS. 2 and 3 for details), as follows: rTaq2 DNA Polymerase (GE Healthcare) was used together with the following primer sequences to target the p53 insert in the plasmid DNA sample extracted from the respective paper matrices.

(31) The 963 base pair PCR product 1 (FIG. 2) was generated using the following Primers (Sigma Genosys):

(32) Primer 1 (M13 forward [−20]): 5'-GTAAAACGAACGGCCAGT-3'
Primer 2 (M13 reverse): 5'-CAGGAAACAGCTATGAC-3'

(33) The 250 base pair PCR product 2 (FIG. 3) was generated using the following Primers (Sigma Genosys):

(34) Primer 3 (forward): 5'-GCGCACAGAGGAAGAGAATC-3'
Primer 4 (reverse): 5'-CCAAGGCCTCATTCAGCTCT-3'

(35) As an alternative procedure, a direct (punch-in) PCR reaction was also performed, in which a 1.2 mm diameter punch was used to excise an FTA disc from the card and the disc was placed directly into a PCR reaction supplemented with α-cyclodextrin, as described in UK patent application GB 1219137.5. The two PCR products were generated using the primers described above.

(36) PCR amplicons were generated by adding PCR master mix (95 µl; 1 unit of enzyme, 20 pmol forward and reverse primers) and 5 µl of sample (plasmid DNA extracted from the FTA cards using the Illustra kit) to appropriate wells of a 96-well plate and amplifying under the following conditions: denaturation at 94 °C for 3 min; 30 cycles of 94 °C for 30 s, 55 °C for 1 min and 72 °C for 2 min; and a final soak at 72 °C for 10 min. For direct amplification from the FTA card, the 5 µl DNA sample was replaced with the appropriate volume of buffer and an FTA disc.

(37) To check the integrity of the amplified DNA, a 10 µl aliquot of each PCR was loaded on a 1% agarose gel in TAE buffer (Tris, acetic acid and ethylenediaminetetraacetic acid (EDTA)), together with DNA size standards and positive and negative (no-template) PCR controls. DNA sequencing reactions were set up with the PCR products and primers described above, following the BigDye Terminator v3.1 Cycle Sequencing Kit manufacturer's recommendations.

(38) To remove excess PCR primers, 2 µl of ExoproSTAR (GE Healthcare) was added to 5 µl of amplified DNA in a test tube. The tube was sealed and spun briefly to bring its contents to the bottom, then incubated on a PTC200 thermocycler for 15 minutes at 37 °C, followed by 15 minutes at 80 °C and a hold at 4 °C. The final volume of each reaction was made up to 10 µl with sterile water prior to the cycle sequencing reaction. Stock cycle sequencing mix was made up as follows:

(39) TABLE 1 Sequencing mix

Constituent              1x      10x     15x
Big Dye Mix              4 µl    40 µl   60 µl
Primer (1.6 pmol/µl)     2 µl    20 µl   30 µl
5x Buffer                4 µl    40 µl   60 µl
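
Scaling the per-reaction recipe of Table 1 to a batch of reactions is simple multiplication, sketched here in Python. The dictionary keys and function name are hypothetical; the per-reaction volumes are those in Table 1.

```python
# Per-reaction volumes in µl, taken from the 1x column of Table 1.
SEQ_MIX_1X = {
    "Big Dye Mix": 4.0,
    "Primer": 2.0,       # primer stock at 1.6 pmol/µl
    "5x Buffer": 4.0,
}


def scale_mix(per_reaction: dict[str, float], n: int) -> dict[str, float]:
    """Volume (µl) of each constituent needed for n reactions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {name: vol * n for name, vol in per_reaction.items()}
```

The 1x volumes sum to 10 µl, the amount of sequencing mix added to each 10 µl PCR product, and `scale_mix(SEQ_MIX_1X, 15)` reproduces the 15x column.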

(40) Ten microliters of sequencing mix was added to each 10 µl PCR product from above. These samples were then cycled with both forward and reverse sequencing primers using the following protocol on an MJ Research PTC200 thermocycler: 25 cycles of 96 °C for 30 seconds, 50 °C for 15 seconds and 60 °C for 4 minutes, followed by a 4 °C hold on completion of cycling. The sequencing products were then processed using Auto Seq G-50 columns: the columns were vortexed briefly, the caps loosened and the seals removed. The columns were then placed into collection tubes, spun for 1 minute at 2000 × g in an Eppendorf 5417C centrifuge, and the flow-through discarded. The columns were then transferred to fresh tubes and the sequencing sample was slowly applied to the top of each column. The tubes were spun for 1 minute at 2000 × g, and the eluates were collected and stored at 2-8 °C in labelled 0.2 ml PCR tubes prior to transfer to a sequencing facility for analysis.

(41) Each product was sequenced six times, and the consensus sequences of both PCR products (products 1 and 2) were compiled from all sequencing reactions performed. The sequencing results of the amplified pUC-p53 PCR products are shown in FIGS. 2 and 3. Both consensus sequences are identical to the corresponding regions of the sequence shown in FIG. 1.

(42) FIG. 1 shows the sequence of pUC19 containing a fragment of the p53 cDNA as it was deposited on the FTA support matrix, which includes the stabilising reagents. Sequences in italics and in bold refer to pUC19- and p53-specific sequences respectively. Underlined sequences indicate the PCR and DNA sequencing primers used in this study.

(43) FIG. 2 shows the recovered PCR product 1 sequence, demonstrating that DNA sequences of at least 963 base pairs, equivalent to about 1,900 bits (roughly 240 bytes) of digital information at 2 bits per base, can be recovered from FTA cards. Relatively long DNA sequences are therefore stable in this storage format.

(44) FIG. 3 shows the recovered PCR product 2 sequence, demonstrating that DNA sequences of 250 base pairs, equivalent to 500 bits (roughly 62 bytes) of digital information at 2 bits per base, can be recovered from FTA cards.
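
The information capacity of a recovered sequence can be estimated as bases × log2(alphabet size), sketched below in Python. This is a raw upper bound under stated assumptions: it ignores primer sites, indexing, redundancy and parity bits, all of which reduce the usable payload.

```python
import math


def capacity_bits(n_bases: int, alphabet: int = 4) -> float:
    """Upper bound on the bits held by n_bases symbols from an alphabet.

    Each base carries log2(alphabet) bits: 2 bits for a quaternary code,
    about 1.585 bits for a ternary code such as Goldman et al.'s.
    """
    return n_bases * math.log2(alphabet)
```

At 2 bits per base, 963 bases hold 1,926 bits (about 240 bytes) and 250 bases hold 500 bits (about 62 bytes).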

Example 2

Quantitative PCR (qPCR) Measurement of DNA Recovered from FTA Cards

(45) qPCR was employed to quantify the amount of amplifiable DNA in each 1.2 mm diameter disk punched from FTA cards, compared with a plain paper support control (Whatman product code: 31-ETF). The real-time PCR method targeted a 500 bp region of Lambda DNA previously applied to the cards and dried. The amplified product was detected with the intercalating dye GelStar Nucleic Acid Stain (Lonza Group Ltd). The method provided threshold cycle (Ct) values, from which the mass of amplifiable DNA was read off a standard curve. The PCR master mix contained the following components: FideliTaq DNA Polymerase (GE Healthcare); Lambda DNA (Promega Corp); GelStar Nucleic Acid Stain, used at 1:1000 dilution in Tris/EDTA (TE) buffer; and primers (Sigma Genosys):

(46) Primer 1 (forward): 5'-GGTTATCGAAATCAGCCACAGCGCC-3'
Primer 2 (reverse): 5'-GATGAGTTCGTGTCCGTACAACTGG-3'

(47) PCR master mix (20 µl, containing 2.5 U enzyme, 5 pmol forward and reverse primers and 1.5 µl GelStar Nucleic Acid Stain) and 5 µl of sample or of a standard DNA dilution series were added to wells of a 96-well plate and amplified on an Applied Biosystems 7900HT Fast Real-Time PCR System under the specified cycling conditions: denaturation at 95 °C for 3 min, then 40 cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 1 min.

(48) qPCR data were generated to reveal the amounts of amplifiable DNA that could be recovered from DNA-spotted FTA cards, and the standard curve was used to determine these amounts accurately. The amounts of DNA recovered were compared with those recovered from the uncoated paper matrix 31-ETF (FIG. 4).

(49) FIG. 4 shows DNA recovery from FTA cards as determined by quantitative PCR. DNA was applied to the matrices and recovered, and the quantity recovered from FTA was compared with that recovered from uncoated paper. The graph demonstrates that recovery of DNA data from FTA is superior to recovery from a support with no stabilising reagents added.

(50) To enhance the usefulness of the invention, the inventors propose means for monitoring the integrity of the DNA, for example visually on the solid support, or by minor processing of a portion of the storage medium (leaving the bulk of the replicated data on the storage medium).

(51) In one embodiment, the data integrity monitoring technique relies on specific reporter groups attached to or incorporated into the synthesised DNA, either during or after synthesis. The reporters generate a signal that indicates DNA integrity, e.g. whether the DNA remains intact or has become degraded. Such reporter groups could potentially be based on a range of molecules, including fluorescent Cy dyes, quenchers and emitters, enzymatic reporters, etc. If fluorescent dyes are used, simple illumination of the storage medium will indicate the integrity of the data stored on it. Alternatively, integrity may be monitored by removing a portion of the storage medium and performing a short tandem repeat (STR) profiling reaction that interrogates around 20 different STR loci included as control DNA sequences within the synthesised DNA data. If the DNA data are intact, this interrogation should report the correct profile. Either system could provide a relatively simple way to monitor the integrity of the DNA during and after storage.
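
The "correct profile" check can be sketched as a comparison of observed alleles with the expected alleles at each control locus. The locus names and allele values below are hypothetical placeholders, not sequences or loci from this document.

```python
# Hypothetical expected profile: control locus name -> set of allele repeat numbers.
EXPECTED = {
    "CTRL01": {12, 14},
    "CTRL02": {9},
    "CTRL03": {7, 11},
}


def profile_mismatches(expected, observed):
    """Return the loci whose observed alleles differ from the expected ones.

    A locus absent from the observed profile (complete drop-out) is
    reported with an empty set. Loci not in `expected` are ignored.
    """
    return {
        locus: observed.get(locus, set())
        for locus, alleles in expected.items()
        if observed.get(locus, set()) != alleles
    }


def integrity_intact(expected, observed):
    """The stored DNA passes the check only if every control locus matches."""
    return not profile_mismatches(expected, observed)
```

A single missing allele, for example only allele 7 observed at `CTRL03`, is enough to flag the sample for closer inspection.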

Example 3

Short Tandem Repeat Profiling as a Data Integrity Check

(52) To further demonstrate the integrity of DNA applied to FTA following elution, short tandem repeat (STR) profiling was carried out using punches taken from dried blood spots on FTA Micro Cards. Dried blood spots were prepared as follows. Human whole blood (75 µl) from two donors was spotted onto FTA Micro Cards, allowed to dry at ambient temperature for 3 h, and stored in a desiccator before use. The cards were sampled within one week of spotting and were punched at the centre of the sample spot to reduce variability.

(53) The resulting punches were added directly to the STR profiling reagents, and direct STR profiling was carried out on the punches using a PowerPlex 18D System (Promega, Southampton, UK) over 26 amplification cycles, following the kit manufacturer's instructions. Thermal cycling conditions were: 96 °C for 2 min; 26 cycles of 94 °C for 10 s and 60 °C for 1 min; then 60 °C for 20 min, followed by a 4 °C hold.

(54) The resulting PCR products were analysed on an ABI 3130xl Genetic Analyzer capillary electrophoresis system. The electrophoresis run cocktail was prepared using 1.0 µl of CC5 (Promega) internal lane standard plus 10.0 µl Hi-Di formamide per sample, and 11 µl of run cocktail was added to 1.0 µl of sample. Instrument settings were as follows: injection time, 5 seconds; injection and run voltages, 3 kV; run time, 1,500 seconds; G5 dye set. Results were evaluated with GeneMapper v3.2 software (Life Technologies, Paisley, UK).

(55) FIG. 5 shows an electropherogram prepared in GeneMapper software following DNA amplification from dried blood spots on FTA Micro Cards. Punches were prepared using a 1.2 mm manual micro-punch. STR analysis was carried out using the PowerPlex 18D kit over 26 amplification cycles, and the resulting PCR amplicons were analysed using an Applied Biosystems 3130xl Genetic Analyzer and GeneMapper v3.2 software. The ordinate shows relative fluorescence units (RFU) ranging from 2000 to 8000, with 8000 RFU representing full-scale deflection; the abscissa shows size in base pairs. The figures below the peaks indicate the number of short tandem repeats associated with each allele. STR profiling reactions can be performed periodically on the synthesised DNA applied to FTA cards, and a quality measure such as peak height can be used to determine the quality and integrity of the applied synthetic DNA. Because small 1.2 mm punches consume only a small part of the sample, they help preserve the DNA stored on FTA for long periods.
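
Periodic peak-height monitoring might be sketched as follows: compare each later profile with a baseline profile and flag drop-out or an overall loss of signal. The analytical threshold, the 50% signal criterion and the allele labels are hypothetical choices for illustration, not values from this document or the kit instructions.

```python
from statistics import mean

ANALYTICAL_THRESHOLD_RFU = 50  # hypothetical allele-calling threshold
MIN_RELATIVE_SIGNAL = 0.5      # hypothetical: flag if mean peak height halves


def assess(baseline, current):
    """Compare allele peak heights (RFU) in a later profile with a baseline.

    Both arguments map allele labels to peak heights. Returns the alleles
    that fell below the analytical threshold, and the current mean peak
    height as a fraction of the baseline mean.
    """
    dropped = sorted(a for a in baseline
                     if current.get(a, 0) < ANALYTICAL_THRESHOLD_RFU)
    relative = mean(current.get(a, 0) for a in baseline) / mean(baseline.values())
    return dropped, relative


def degraded(baseline, current):
    """Flag the stored DNA if any allele dropped out or the signal fell too far."""
    dropped, relative = assess(baseline, current)
    return bool(dropped) or relative < MIN_RELATIVE_SIGNAL
```

Tracking the relative signal over successive time points, rather than a single pass/fail result, would show gradual degradation before alleles begin to drop out.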

(56) It can be concluded that STR profiling is a viable way to determine the integrity of DNA data stored on FTA cards.

(57) Where reporter groups are used, they may allow visual (manual or automated) inspection of the solid support and generate a signal indicating that the DNA remains intact, facilitating sequencing and hence data recovery. Commercially available fluorescent dyes that are known to interact with DNA and have been used in fluorescence microscopy include PicoGreen, SYBR Green, acridine orange, bisbenzimide trihydrochloride, HOE-33342, Hoechst 33258, DAPI, Hoechst and propidium iodide. Visible DNA-binding dyes include methylene blue and thionine. Chen et al. 2009 (Nano Today, 4, 125-134) describe a FRET assay to monitor DNA stability, in which plasmid DNA was double-labelled with a quantum dot (QD; 525 nm emission) and specific nucleic acid dyes complexed with Cy5. The QD donor drives energy transfer stepwise through the intermediate nucleic acid dye to the final acceptor Cy5. A reporter reagent could also be incorporated into the DNA polymer itself.

(58) The combination of data in the form of DNA on a storage medium according to the invention, coupled with a simple system to measure its integrity is a viable solution to archiving complex digital information.

(59) FIG. 6 shows FTA paper having relatively large fibres of cellulose material (the thick elements running generally from top to bottom of the image) intertwined with DNA strands, akin to a spider's web. The image illustrates how the DNA is held by the FTA in an intertwined manner and thereby stabilised.

(60) The experiments above show that FTA is a preferred data storage medium. However, previous experiments have concluded that a solid matrix in the form of a cellulose fibre paper matrix with no stabilising reagent added is capable of storing human DNA for many years (27 years in the reported case). So, although desirable, it is not essential to include a stabilising reagent in or on the solid matrix.

(61) Although a fibrous solid matrix has been used in the experiments (in this case a cellulose-based paper product called FTA), it will be apparent to the skilled addressee that any man-made material or surface having a surface or subsurface able to hold oligonucleotides in a similar manner to that shown in FIG. 6 could be employed as a solid matrix support for this invention. Examples include woven or non-woven fibrous materials, including man-made or naturally occurring polymer fibres; mineral fibre based materials such as glass fibre materials; surface-treated solid materials, for example chemically or mechanically treated materials including laser-etched surfaces, provided with a surface micro-roughness sufficient to hold DNA; and porous materials, such as porous polymers, with pores large enough to accommodate and hold the DNA strands in a similar manner to that shown in FIG. 6. In each case the surface or subsurface of the material accommodates DNA but will yield the DNA when required, and is also capable of accepting a stabilising reagent, if used, either in dry form or in wet form for subsequent drying.

(62) In the experiments described above, DNA was recovered from the storage medium using known techniques, all of which involve liquid-based replication of DNA by PCR, although other replication methods are possible, for example isothermal amplification (e.g. helicase-dependent amplification) or rolling circle amplification. However, provided that sufficient DNA can be recovered from the storage medium, liquid-based replication is not essential.

(63) Additionally, removing a portion of the storage medium to recover data is not essential. A portion of the DNA data can be eluted from the storage medium without removing any of the medium itself. In a further example, DNA can be removed from the storage medium by abrading it, since the medium holds multiple copies of the DNA data, and the abraded material can then be analysed directly. The DNA data could even be analysed while remaining on the storage medium, for example by spectroscopic or molecular analysis techniques.

(64) Although embodiments only have been illustrated, it will be apparent to the skilled addressee that further modifications, variants, additions and omissions are possible within the scope and spirit of the invention defined herein.