G06N3/123

Recovering timing information from DNA encoded data
11580413 · 2023-02-14 · ·

Systems and methods for timing recovery in DNA storage systems are described. In one embodiment, the present systems and methods include generating a unique pattern of DNA bases and using the unique pattern for a phase-locked loop (PLL) field of a data layout, generating a multidimensional mapping, configuring the multidimensional mapping to include one or more prohibited sequences of DNA bases, identifying a prohibited sequence from the multidimensional mapping and using the prohibited sequence for one or more synch-mark (SM) fields of the data layout, prohibiting a User Data field from using any of the prohibited sequences of DNA bases when converting binary data to DNA bases, identifying random insertion and/or deletion of DNA bases in the User Data field, and repairing the random insertion and/or deletion of DNA bases in the User Data field.
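
The layout idea in this abstract can be illustrated with a minimal sketch. It assumes a single prohibited sequence used as the synch mark, fixed-length User Data blocks, and detection only (no repair). The function names, the example PLL pattern, the synch mark `GGTT`, and the block contents are hypothetical and are not taken from the patent.

```python
def contains_prohibited(payload, prohibited):
    """Return True if the payload uses any sequence reserved for synch marks."""
    return any(p in payload for p in prohibited)


def build_strand(pll, sync_mark, blocks):
    """Lay out a strand as: PLL field, then (synch mark + user data block) repeated."""
    return pll + "".join(sync_mark + block for block in blocks)


def indel_offsets(read, sync_mark, block_len):
    """Compare the spacing of consecutive synch marks with the expected spacing.

    A positive offset suggests inserted bases in that block, a negative
    offset suggests deleted bases, and zero means the spacing is as expected.
    """
    positions = []
    start = read.find(sync_mark)
    while start != -1:
        positions.append(start)
        start = read.find(sync_mark, start + len(sync_mark))
    expected = len(sync_mark) + block_len
    return [(b - a) - expected for a, b in zip(positions, positions[1:])]


# Hypothetical example values.
pll = "ACACACAC"
sync = "GGTT"
blocks = ["ACGTAC", "CATCAG", "TCAGCA"]
assert not any(contains_prohibited(b, [sync]) for b in blocks)

strand = build_strand(pll, sync, blocks)
deleted = strand[:13] + strand[14:]         # drop one base from the first block
inserted = strand[:23] + "A" + strand[23:]  # add one base to the second block
```

Because the user data never contains the synch mark, every occurrence of the mark in a read can be treated as a timing reference, and a change in spacing localizes the insertion or deletion to one block. The last block has no trailing synch mark in this sketch, so it is not checked.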

MOLECULAR DATA STORAGE SYSTEMS AND METHODS
20230040158 · 2023-02-09 ·

A molecular data storage system is presented for encoding data-block(s). The system includes one or more populations of molecular sequences, each population encoding a respective one of the data-blocks. Each molecular sequence comprises a data encoding section comprising a sequence of similar predetermined length N of short k-mers, whereby in each population the data encoding sections of all molecular sequences have the similar predetermined length N. The short k-mers serve as data encoding building blocks of the data encoding sections, whereby valid short k-mers serving as data encoding building blocks form a subset of a building-block-set consisting of a number Z of different preselected short k-mers each presenting a unique combination of a number k of bases of a preselected set of bases, characterized in that all the Z types of short k-mers in said building-block-set have a similar predetermined size k≥2 (plurality) of bases. The data encoding sections collectively encode a sequence of encoded alphabet letters $S = (\pi^{1}, \pi^{2}, \ldots, \pi^{n}, \ldots, \pi^{N-1}, \pi^{N})$. Each valid encoded alphabet letter $\pi^{n}$ at location n of the sequence S of alphabet letters is characterized by occurrence of a predetermined plurality of different types of short k-mers of the building-block-set in a corresponding location n along the data encoding sections of the plurality of molecular sequences of said population.

Molecular State Machines
20180004537 · 2018-01-04 ·

A molecular state machine is implemented in a cell by designing the cell to use specific homology directed repair (“HDR”) templates for repairing double strand breaks in polynucleotides based on a current “state” of the cell. The state may be established by the presence of a molecule in the cell or by the availability of specific cut sites in the polynucleotides of the cell. Different HDR templates or different nucleases may be available for performing HDR based on the state. When the state is changed, the same signal or event will result in a different HDR template being incorporated into the existing polynucleotides of the cell. Signals that are internal or external to the cell may be used to change the state of the cell. The cell may create a log of molecular events, store binary data, or perform other synthetic biology/molecular computing functions based on state.

Neural networks implemented with DSD circuits

Neural networks can be implemented with DNA strand displacement (DSD) circuits. The neural networks are designed and trained in silico taking into account the behavior of DSD circuits. Oligonucleotides comprising DSD circuits are synthesized and combined to form a neural network. In an implementation, the neural network may be a binary neural network in which the output from each neuron is a binary value and the weight of each neuron either maintains the incoming binary value or flips the binary value. Inputs to the neural network are one or more oligonucleotides such as synthetic oligonucleotides containing digital data or natural oligonucleotides such as mRNA. Outputs from the neural networks may be oligonucleotides that are read by direct sequencing or oligonucleotides that generate signals such as by release of fluorescent reporters.
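
A purely in silico sketch of the binary-network behavior described above: each weight either keeps or flips an incoming bit, and each neuron emits a binary value. The majority-vote rule used to combine the weighted inputs is an assumption made for illustration, since the abstract does not state how a neuron aggregates its inputs. All names are hypothetical.

```python
def apply_weight(bit, flip):
    """A binary weight either maintains the incoming bit (flip=False) or flips it."""
    return bit ^ int(flip)


def neuron(inputs, flips):
    """Emit 1 if a strict majority of the weighted inputs are 1, else 0.

    The majority rule is an assumed aggregation, not taken from the source.
    """
    votes = sum(apply_weight(b, f) for b, f in zip(inputs, flips))
    return 1 if 2 * votes > len(inputs) else 0


def layer(inputs, flip_matrix):
    """Apply one neuron per row of keep/flip weights."""
    return [neuron(inputs, row) for row in flip_matrix]


outputs = layer([1, 0, 1], [[False, False, False], [True, True, True]])
```

Restricting weights to keep or flip means training searches over a discrete set of choices, which is what makes it plausible to map each weight onto one of two strand displacement reactions.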

METHOD AND DATA PROCESSING DEVICE FOR PROCESSING GENETIC DATA
20230021229 · 2023-01-19 ·

A method for processing genetic data, which comprise a series of sequence elements each representing a biomolecule, comprises the steps of forming sequence fragments (S2), wherein each sequence fragment comprises a section of the series of sequence elements having a fragment length of at least two sequence elements, applying a coding function to each of the sequence fragments in order to generate a multiplicity of encrypted fragment data items (S3) which are each assigned to one of the sequence fragments, and storing the encrypted fragment data (S4), wherein the sequence fragments are formed in such a manner that the sections of the series of sequence elements overlap and each sequence element is included in at least two sequence fragments. A description is also given of a data processing device for processing genetic data and a method for querying a database containing encrypted fragment data which were generated and stored using the method for processing genetic data.
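
The fragment-then-encode steps (S2 to S4) can be sketched as follows. Two choices here are assumptions and not taken from the abstract: a keyed hash (HMAC-SHA256) stands in for the unspecified coding function, and fragments wrap around the end of the sequence so that the first and last elements are also covered at least twice. Function names and the example key are hypothetical.

```python
import hashlib
import hmac
from collections import Counter


def form_fragments(sequence, length=4):
    """Form one fragment per start position, wrapping around the end (assumed)."""
    n = len(sequence)
    if length < 2 or n < length:
        raise ValueError("need a fragment length of at least 2 and no more than the sequence length")
    return ["".join(sequence[(i + j) % n] for j in range(length)) for i in range(n)]


def encode_fragments(fragments, key):
    """Apply the stand-in coding function (HMAC-SHA256) to every fragment."""
    return [hmac.new(key, f.encode("ascii"), hashlib.sha256).hexdigest() for f in fragments]


def coverage(n, length):
    """Count how many fragments contain each sequence position."""
    counts = Counter()
    for i in range(n):
        for j in range(length):
            counts[(i + j) % n] += 1
    return counts


fragments = form_fragments("ACGTAC", length=4)
stored = encode_fragments(fragments, key=b"example-key")  # hypothetical key
```

With one fragment per start position, every element appears in exactly `length` fragments, so the requirement that each element belong to at least two fragments holds for any fragment length of two or more.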

AUTHENTICATION DEVICE USING DNA BASE SEQUENCE INFORMATION
20230015381 · 2023-01-19 ·

The present disclosure relates to an authentication device using DNA base sequence information, the authentication device including an authentication means, in which a plurality of items of DNA base sequence information are included in authentication information derived by reading out the authentication means, which is composed of a plurality of DNAs.

GENERATING MACHINE LEARNING MODELS USING GENETIC DATA
20230222311 · 2023-07-13 ·

Systems, methods, and apparatuses are described for generating and using machine learning models using genetic data. A set of input features for training the machine learning model can be identified and used to train the model based on training samples, e.g., for which one or more labels are known. As examples, the input features can include aligned variables (e.g., derived from sequences aligned to a population level or individual references) and/or non-aligned variables (e.g., sequence content). The features can be classified into different groups based on the underlying genetic data or intermediate values resulting from a processing of the underlying genetic data. Features can be selected from a feature space for creating a feature vector for training a model. The selection and creation of feature vectors can be performed iteratively to train many models as part of a search for optimal features and an optimal model.

High-Capacity Storage of Digital Information in DNA

A method for storage of an item of information (210) is disclosed. The method comprises encoding bytes (720) in the item of information (210), and representing the encoded bytes, using a schema, by DNA nucleotides to produce a DNA sequence (230). The DNA sequence (230) is broken into a plurality of overlapping DNA segments (240), and indexing information (250) is added to the plurality of DNA segments. Finally, the plurality of DNA segments (240) is synthesized (790) and stored (795).
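
The pipeline in this abstract (bytes to bases, overlapping segments, indexing) can be sketched as below. The fixed two-bits-per-base mapping is a simple stand-in and is not the schema of the disclosed method. The segment length, the step, and the use of the start offset as the index are likewise hypothetical choices.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def bytes_to_dna(data):
    """Represent each byte as four bases, most significant bit pair first."""
    return "".join(BASES[(b >> shift) & 0b11] for b in data for shift in (6, 4, 2, 0))


def overlapping_segments(dna, length=8, step=4):
    """Split a DNA string into overlapping segments tagged with their start offset."""
    if not 0 < step <= length:
        raise ValueError("step must be positive and no larger than the segment length")
    last_start = max(len(dna) - length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the sequence is covered
    return [(s, dna[s:s + length]) for s in starts]


dna = bytes_to_dna(b"Hi")
segments = overlapping_segments(bytes_to_dna(b"\x1b" * 4))
```

With a step of half the segment length, every base away from the ends appears in two segments, so losing one segment does not lose data, and the stored offset lets segments be reassembled in order.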