Reagents and Methods for the Analysis of Microparticles

Abstract

Reagents and methods for the analysis of cell free biomolecules (e.g. cell free nucleic acid molecules and cell free polypeptides) of microparticles (e.g. cell-free microparticles originating from blood, or cell-free microparticles originating from an embryo generated by in vitro fertilisation) are provided. Also provided are reagents and methods for the analysis of biomolecules (e.g. nucleic acid molecules and polypeptides) of cells (e.g. cells originating from blood, or cells originating from an embryo generated by in vitro fertilisation). The methods comprise analysing a sample that comprises a microparticle (or cell) or a sample derived from a microparticle (or cell). The methods include methods of measuring at least two linked signals, each signal corresponding to the presence, absence and/or level of a biomolecule of a microparticle (or cell). The methods also include methods of determining the presence, absence and/or level of a biomolecule of a microparticle (or cell) using a barcoded affinity probe. In certain methods both nucleic acid biomolecules and non-nucleic acid biomolecules of a microparticle (or cell) are analysed together. Reagents for use in the methods are also provided.

Claims

1. A method of analysing a sample comprising a cell-free microparticle or a sample derived from a cell-free microparticle, wherein the cell-free microparticle contains at least two fragments of genomic DNA, wherein the cell-free microparticle is derived from a pre-implantation embryo generated by in vitro fertilisation, and wherein the method comprises: (a) preparing the sample for sequencing comprising linking at least two of the at least two fragments of genomic DNA to produce a set of at least two linked fragments of genomic DNA; and (b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads.

2. The method of claim 1, wherein at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000, or at least 1,000,000 fragments of genomic DNA of the cell-free microparticle are linked and then sequenced to produce at least 3, at least 4, at least 5, at least 10, at least 50, at least 100, at least 500, at least 1000, at least 5000, at least 10,000, at least 100,000, or at least 1,000,000 linked sequence reads.

3. The method of claim 1 or claim 2, wherein the sample comprises first and second cell-free microparticles, wherein each cell-free microparticle contains at least two fragments of genomic DNA, and wherein the method comprises performing step (a) to produce a first set of linked fragments of genomic DNA for the first cell-free microparticle and a second set of linked fragments of genomic DNA for the second cell-free microparticle, and performing step (b) to produce a first set of linked sequence reads for the first cell-free microparticle and a second set of linked sequence reads for the second cell-free microparticle.

4. The method of claim 1 or claim 2, wherein the sample comprises n cell-free microparticles, wherein each cell-free microparticle contains at least two fragments of genomic DNA, and wherein the method comprises performing step (a) to produce n sets of linked fragments of genomic DNA, one set for each of the n cell-free microparticles, and performing step (b) to produce n sets of linked sequence reads, one for each of the n cell-free microparticles.

5. The method of claim 4, wherein n is at least 3, at least 5, at least 10, at least 50, at least 100, at least 1000, at least 10,000, at least 100,000, at least 1,000,000, at least 10,000,000, or at least 100,000,000 cell-free microparticles.

6. The method of any one of claims 3-5, wherein prior to step (a), the method further comprises the step of partitioning the sample into at least two different reaction volumes.

7. A method of preparing a sample for sequencing, wherein the sample comprises a cell-free microparticle or a sample derived from a cell-free microparticle, wherein the cell-free microparticle contains at least two fragments of genomic DNA, wherein the cell-free microparticle is derived from a pre-implantation embryo generated by in vitro fertilisation, and wherein the method comprises appending the at least two fragments of genomic DNA of the cell-free microparticle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, to produce a set of linked fragments of genomic DNA.

8. The method of claim 7, wherein prior to the step of appending the at least two fragments of genomic DNA of the cell-free microparticle to a barcode sequence, or to different barcode sequences of a set of barcode sequences, the method comprises appending a coupling sequence to each of the fragments of genomic DNA of the cell-free microparticle, wherein the coupling sequences are then appended to the barcode sequence, or to the different barcode sequences of a set of barcode sequences, to produce the set of linked fragments of genomic DNA.

9. The method of claim 7 or claim 8, wherein the sample comprises first and second cell-free microparticles, wherein each cell-free microparticle contains at least two fragments of genomic DNA, and wherein the method comprises appending the at least two fragments of genomic DNA of the first cell-free microparticle to a first barcode sequence, or to different barcode sequences of a first set of barcode sequences, to produce a first set of linked fragments of genomic DNA and appending the at least two fragments of genomic DNA of the second cell-free microparticle to a second barcode sequence, or to different barcode sequences of a second set of barcode sequences, to produce a second set of linked fragments of genomic DNA.

10. The method of any one of claims 1-6, wherein the method comprises: (a) preparing the sample for sequencing comprising appending the at least two fragments of genomic DNA of the cell-free microparticle to a barcode sequence to produce a set of linked fragments of genomic DNA; and (b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads, wherein the at least two linked sequence reads are linked by the barcode sequence.

11. The method of claim 10, wherein prior to the step of appending the at least two fragments of genomic DNA of the cell-free microparticle to a barcode sequence, the method comprises appending a coupling sequence to each of the fragments of genomic DNA of the cell-free microparticle, wherein the coupling sequences are then appended to the barcode sequence to produce the set of linked fragments of genomic DNA.

12. The method of claim 10 or claim 11, wherein the sample comprises first and second cell-free microparticles, wherein each cell-free microparticle contains at least two fragments of genomic DNA, and wherein the method comprises performing step (a) to produce a first set of linked fragments of genomic DNA for the first cell-free microparticle and a second set of linked fragments of genomic DNA for the second cell-free microparticle, and performing step (b) to produce a first set of linked sequence reads for the first cell-free microparticle and a second set of linked sequence reads for the second cell-free microparticle, wherein the at least two linked sequence reads for the first cell-free microparticle are linked by a different barcode sequence to the at least two linked sequence reads of the second cell-free microparticle.

13. The method of any one of claims 1-6, wherein the method comprises: (a) preparing the sample for sequencing comprising appending each of the at least two fragments of genomic DNA of the cell-free microparticle to a different barcode sequence of a set of barcode sequences to produce a set of linked fragments of genomic DNA; and (b) sequencing each of the linked fragments in the set to produce at least two linked sequence reads, wherein the at least two linked sequence reads are linked by the set of barcode sequences.

14. The method of claim 13, wherein prior to the step of appending each of the at least two fragments of genomic DNA of the cell-free microparticle to a different barcode sequence, the method comprises appending a coupling sequence to each of the fragments of genomic DNA of the cell-free microparticle, wherein each of the at least two fragments of genomic DNA of the cell-free microparticle is appended to a different barcode sequence of the set of barcode sequences by its coupling sequence.

15. The method of claim 13 or claim 14, wherein the sample comprises first and second cell-free microparticles, wherein each cell-free microparticle contains at least two fragments of genomic DNA, and wherein the method comprises performing step (a) to produce a first set of linked fragments of genomic DNA for the first cell-free microparticle and a second set of linked fragments of genomic DNA for the second cell-free microparticle, and performing step (b) to produce a first set of linked sequence reads for the first cell-free microparticle and a second set of linked sequence reads for the second cell-free microparticle, wherein the first set of linked sequence reads are linked by a different set of barcode sequences to the second set of linked sequence reads.

16. The method of any one of claims 7-15, wherein prior to the step of appending, the method further comprises the step of partitioning the sample into at least two different reaction volumes.

17. A method of preparing a sample for sequencing, wherein the sample comprises first and second cell-free microparticles or wherein the sample is derived from first and second cell-free microparticles, and wherein each cell-free microparticle contains at least two fragments of a target nucleic acid, wherein the cell-free microparticle is derived from a pre-implantation embryo generated by in vitro fertilisation, and wherein the method comprises the steps of: (a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence and wherein the first and second barcode regions of a first multimeric barcoding reagent are different to the first and second barcode regions of a second multimeric barcoding reagent of the library; and (b) appending barcode sequences to each of first and second fragments of the target nucleic acid of the first cell-free microparticle to produce first and second barcoded target nucleic acid molecules for the first cell-free microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the first multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the first multimeric barcoding reagent, and appending barcode sequences to each of first and second fragments of the target nucleic acid of the second cell-free microparticle to produce first and second barcoded target nucleic acid molecules for the second cell-free microparticle, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region of the second multimeric barcoding reagent and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region of the second multimeric barcoding reagent.

18. The method of claim 17, wherein the method comprises the steps of: (a) contacting the sample with a library comprising at least two multimeric barcoding reagents, wherein each multimeric barcoding reagent comprises first and second barcoded oligonucleotides linked together, wherein the barcoded oligonucleotides each comprise a barcode region and wherein the barcode regions of the first and second barcoded oligonucleotides of a first multimeric barcoding reagent of the library are different to the barcode regions of the first and second barcoded oligonucleotides of a second multimeric barcoding reagent of the library; and (b) annealing or ligating the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to first and second fragments of the target nucleic acid of the first cell-free microparticle to produce first and second barcoded target nucleic acid molecules, and annealing or ligating the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to first and second fragments of the target nucleic acid of the second cell-free microparticle to produce first and second barcoded target nucleic acid molecules; and optionally wherein prior to the step of annealing or ligating the first and second barcoded oligonucleotides to first and second fragments of genomic DNA, the method comprises appending a coupling sequence to each of the fragments of genomic DNA, wherein the first and second barcoded oligonucleotides are then annealed or ligated to the coupling sequences of the first and second fragments of genomic DNA.

19. The method of claim 17 or claim 18, wherein step (b) comprises: (i) annealing the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to first and second fragments of genomic DNA of the first cell-free microparticle, and annealing the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to first and second fragments of genomic DNA of the second cell-free microparticle; and (ii) extending the first and second barcoded oligonucleotides of the first multimeric barcoding reagent to produce first and second different barcoded target nucleic acid molecules and extending the first and second barcoded oligonucleotides of the second multimeric barcoding reagent to produce first and second different barcoded target nucleic acid molecules, wherein each of the barcoded target nucleic acid molecules comprises at least one nucleotide synthesised from the fragments of genomic DNA as a template.

20. The method of any one of claims 17-19, wherein prior to step (b), the method further comprises the step of partitioning the sample into at least two different reaction volumes.

21. The method of any one of claims 1-16, wherein the method comprises: (a) preparing the sample for sequencing comprising: (i) contacting the sample with a multimeric barcoding reagent comprising first and second barcode regions linked together, wherein each barcode region comprises a nucleic acid sequence, and (ii) appending barcode sequences to each of the at least two fragments of genomic DNA of the cell-free microparticle to produce first and second different barcoded target nucleic acid molecules, wherein the first barcoded target nucleic acid molecule comprises the nucleic acid sequence of the first barcode region and the second barcoded target nucleic acid molecule comprises the nucleic acid sequence of the second barcode region; and (b) sequencing each of the barcoded target nucleic acid molecules to produce at least two linked sequence reads; and optionally wherein prior to the step of appending barcode sequences to each of the at least two fragments of genomic DNA of the cell-free microparticle, the method comprises appending a coupling sequence to each of the fragments of genomic DNA of the cell-free microparticle, wherein a barcode sequence is then appended to the coupling sequence of each of the at least two fragments of genomic DNA of the cell-free microparticle to produce the first and second different barcoded target nucleic acid molecules.

22. The method of claim 21, wherein the method comprises analysing a sample comprising at least two cell-free microparticles, wherein each cell-free microparticle contains at least two fragments of genomic DNA, and wherein the method comprises the steps of: (a) preparing the sample for sequencing comprising: (i) contacting the sample with a library of multimeric barcoding reagents comprising a multimeric barcoding reagent for each of the two or more cell-free microparticles, wherein each multimeric barcoding reagent is as defined in claim 21; and (ii) appending barcode sequences to each of the at least two fragments of genomic DNA of each cell-free microparticle, wherein at least two barcoded target nucleic acid molecules are produced from each of the at least two cell-free microparticles, and wherein the at least two barcoded target nucleic acid molecules produced from a single cell-free microparticle each comprise the nucleic acid sequence of a barcode region from the same multimeric barcoding reagent; and (b) sequencing each of the barcoded target nucleic acid molecules to produce at least two linked sequence reads for each cell-free microparticle.

23. The method of claim 22, wherein prior to the step of appending, the method further comprises the step of partitioning the sample into at least two different reaction volumes.

24. The method of any one of claims 1-23, wherein the cell-free microparticle is selected from the group consisting of: an exosome, an apoptotic body, and an extracellular microvesicle.

25. The method of any one of claims 1-24, wherein the sample comprises in vitro embryo culture medium and/or embryonic blastocoel fluid.

26. A method of analysing a pre-implantation embryo generated by in vitro fertilisation, wherein the method comprises analysing a sample comprising a cell-free microparticle or a sample derived from a cell-free microparticle by the method of any one of claims 1-25, and wherein the cell-free microparticle is derived from the pre-implantation embryo.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[1197] The invention, together with further objects and advantages thereof, may best be understood by making reference to the description taken together with the accompanying drawings, in which:

[1198] FIG. 1 illustrates a multimeric barcoding reagent that may be used in the method illustrated in FIG. 3 or FIG. 4.

[1199] FIG. 2 illustrates a kit comprising a multimeric barcoding reagent and adapter oligonucleotides for labelling a target nucleic acid.

[1200] FIG. 3 illustrates a first method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent.

[1201] FIG. 4 illustrates a second method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent.

[1202] FIG. 5 illustrates a method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent and adapter oligonucleotides.

[1203] FIG. 6 illustrates a method of preparing a nucleic acid sample for sequencing using a multimeric barcoding reagent, adapter oligonucleotides and target oligonucleotides.

[1204] FIG. 7 illustrates a method of assembling a multimeric barcode molecule using a rolling circle amplification process.

[1205] FIG. 8 illustrates a method of synthesising multimeric barcoding reagents for labelling a target nucleic acid that may be used in the methods illustrated in FIG. 3, FIG. 4 and/or FIG. 5.

[1206] FIG. 9 illustrates an alternative method of synthesising multimeric barcoding reagents (as illustrated in FIG. 1) for labelling a target nucleic acid that may be used in the method illustrated in FIG. 3 and/or FIG. 4.

[1207] FIG. 10 is a graph showing the total number of nucleotides within each barcode sequence.

[1208] FIG. 11 is a graph showing the total number of unique barcode molecules in each sequenced multimeric barcode molecule.

[1209] FIG. 12 shows representative multimeric barcode molecules that were detected by the analysis script.

[1210] FIG. 13 is a graph showing the number of unique barcodes per molecular sequence identifier against the number of molecular sequence identifiers following the barcoding of synthetic DNA templates of known sequence with multimeric barcoding reagents containing barcoded oligonucleotides.

[1211] FIG. 14 is a graph showing the number of unique barcodes per molecular sequence identifier against the number of molecular sequence identifiers following the barcoding of synthetic DNA templates of known sequence with multimeric barcoding reagents and separate adapter oligonucleotides.

[1212] FIG. 15 is a table showing the results of barcoding genomic DNA loci of three human genes (BRCA1, HLA-A and DQB1) with multimeric barcoding reagents containing barcoded oligonucleotides.

[1213] FIG. 16 is a schematic illustration of a sequence read obtained from barcoding genomic DNA loci with multimeric barcoding reagents containing barcoded oligonucleotides.

[1214] FIG. 17 is a graph showing the number of barcodes from the same multimeric barcoding reagent that labelled sequences on the same synthetic template molecule against the number of synthetic template molecules.

[1215] FIG. 18 illustrates a method in which two or more sequences from a microparticle are determined and linked informatically.

[1216] FIG. 19 illustrates a method in which sequences from a particular microparticle are linked by a shared identifier.

[1217] FIG. 20 illustrates a method in which molecular barcodes are appended to fragments of genomic DNA within microparticles that have been partitioned, and wherein said barcodes provide a linkage between sequences derived from the same microparticle.

[1218] FIG. 21 illustrates a specific method in which molecular barcodes are appended to fragments of genomic DNA within microparticles by multimeric barcoding reagents, and wherein said barcodes provide a linkage between sequences derived from the same microparticle.

[1219] FIG. 22 illustrates a method in which fragments of genomic DNA within individual microparticles are appended to each other, and wherein the resulting molecules are sequenced, such that sequences from two or more fragments of genomic DNA from the same microparticle are determined from the same sequenced molecule, thereby establishing a linkage between fragments within the same microparticle.

[1220] FIG. 23 illustrates a method in which individual microparticles (and/or small groups of microparticles) from a large sample of microparticles are sequenced in two or more separate, individual sequencing reactions, and the sequences determined from each such sequencing reaction are thereby linked informatically and predicted to derive from the same individual microparticle (and/or small group of microparticles).

[1221] FIG. 24 illustrates a specific method in which fragments of genomic DNA within individual microparticles are appended to a discrete region of a sequencing flow cell prior to sequencing, and wherein the proximity of fragments sequenced on said flow cell provides a linkage between sequences derived from the same microparticle.

[1222] FIG. 25 illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant A’ version of the example protocol). Shown is the density of sequence reads across all chromosomes in the human genome, with clear clustering of reads within singular chromosomal segments.

[1223] FIG. 26 illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant B’ version of the example protocol). Shown is the density of sequence reads across all chromosomes in the human genome, with clear clustering of reads within singular chromosomal segments.

[1224] FIG. 27 illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant B’ version of the example protocol). Shown is the density of sequence reads zoomed in within a specific chromosomal segment, to show the focal, high-density nature of these linked reads.

[1225] FIG. 28 illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant C’ version of the example protocol). Shown is the density of sequence reads across all chromosomes in the human genome, with clear clustering of reads within singular chromosomal segments, though with such segments being larger in chromosomal span than in the other Variant methods (due to the larger microparticles being pelleted within Variant C compared with Variants A or B).

[1226] FIG. 29 illustrates a negative-control experiment, wherein fragments of genomic DNA are purified (and are therefore unlinked) before being appended to barcoded oligonucleotides. No clustering of reads is observed, validating that microparticles comprise fragments of genomic DNA from focal, contiguous genomic regions.

[1227] FIG. 30 illustrates the concept of multi-parametric measurement of target molecules of a single microparticle.

[1228] FIG. 31 illustrates a method in which target biomolecules are measured using barcoded affinity probes and a step of partitioning.

[1229] FIG. 32 illustrates a method in which target biomolecules are measured using barcoded affinity probes and multimeric barcoding reagents.

[1230] FIG. 33 illustrates a method (and associated experimental results) of analysing a sample comprising a microparticle, wherein the microparticle comprises fragments of genomic DNA and a protein, and wherein the method comprises measurement of the protein using an antibody-conjugated bead-based approach, and subsequent barcoding and sequencing of fragments of genomic DNA.

[1231] FIG. 34 illustrates a method (and associated experimental results) of analysing a sample comprising a microparticle, wherein the microparticle comprises fragments of genomic DNA and a protein, and wherein the method comprises measurement of the protein using an antibody-conjugated bead-based approach, and wherein a step of measuring modified nucleobases is also performed, followed by barcoding and sequencing of fragments of genomic DNA.

[1232] FIG. 35 illustrates a method (and associated experimental results) of analysing a sample comprising a microparticle, wherein the microparticle comprises fragments of genomic DNA and a first protein and a second protein, and wherein the method comprises measurement of the first protein using an antibody-conjugated bead-based approach, and measurement of the second protein with a barcoded affinity probe, and subsequent barcoding and sequencing of fragments of genomic DNA and sequences from barcoded affinity probes.

[1233] A detailed description of each of FIGS. 18-35 is provided below.

[1234] FIG. 18 illustrates a method in which two or more sequences from a microparticle are determined and linked informatically. In the method, a microparticle, comprised within or derived from a blood, plasma, or serum sample, comprises two or more fragments of genomic DNA. The sequences of at least parts of these fragments of genomic DNA are determined; and furthermore, through one or more methods, an informatic linkage is established such that the first and second sequences from a microparticle are linked.

[1235] This linkage may take any form. It may be a shared identifier (which could, for example, derive from a shared barcode appended to said first and second genomic DNA sequences during a molecular barcoding process), or any other shared property that links the two sequences; alternatively, the data comprising the sequences themselves may be comprised within a shared electronic storage medium or partition thereof. Furthermore, the linkage may comprise a non-binary or relative value, for example representing the physical proximity of the two fragments within a spatially-metered sequencing reaction, or representing an estimated likelihood or probability that the two sequences derive from fragments of genomic DNA comprised within the same microparticle.
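By way of illustration only, a non-binary linkage value of the kind described above might be computed from the positions at which two fragments were sequenced within a spatially-metered reaction. The following Python sketch is not taken from the described methods: the function name `proximity_linkage`, the exponential decay, and the `length_scale` parameter are all hypothetical choices made for the example.

```python
import math


def proximity_linkage(pos_a, pos_b, length_scale=10.0):
    """Return a relative linkage value in (0, 1] for two sequenced fragments.

    pos_a and pos_b are (x, y) coordinates of the two fragments within a
    spatially-metered sequencing reaction. The exponential decay and the
    length_scale value are illustrative assumptions, not part of the source.
    """
    distance = math.dist(pos_a, pos_b)
    return math.exp(-distance / length_scale)


# Fragments sequenced close together receive a higher linkage value than
# fragments sequenced far apart.
near = proximity_linkage((0.0, 0.0), (3.0, 4.0))
far = proximity_linkage((0.0, 0.0), (30.0, 40.0))
```

A value close to 1 would indicate that the two sequences are likely to derive from the same microparticle, and a value close to 0 that they are unlikely to; any threshold applied to such a value would have to be chosen empirically.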

[1236] FIG. 19 illustrates a method in which sequences from a particular microparticle are linked by a shared identifier. In the method, a number of sequences from fragments of genomic DNA comprised within two different microparticles (e.g. two different microparticles derived from a single blood, plasma, or serum sample) are determined, e.g. by a nucleic acid sequencing reaction. Sequences corresponding to fragments of genomic DNA from the first microparticle are each assigned to the same informatic identifier (here, the identifier ‘0001’), and sequences corresponding to fragments of genomic DNA from the second microparticle are each assigned to a second, different informatic identifier (here, the identifier ‘0002’). This information of sequences and corresponding identifiers thus comprises informatic linkages between sequences derived from the same microparticle, with the set of different identifiers serving the function of informatic linkage.
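The shared-identifier linkage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `group_by_identifier`, the record layout, and the example sequences are hypothetical and are not experimental data; only the identifiers ‘0001’ and ‘0002’ come from the description.

```python
from collections import defaultdict


def group_by_identifier(records):
    """Collect sequences into linked sets keyed by informatic identifier.

    records is an iterable of (identifier, sequence) pairs. Sequences that
    share an identifier are treated as deriving from the same microparticle.
    """
    linked = defaultdict(list)
    for identifier, sequence in records:
        linked[identifier].append(sequence)
    return dict(linked)


# Hypothetical sequences for illustration only.
records = [
    ("0001", "ACGTTGCA"),
    ("0002", "GGCATTAC"),
    ("0001", "TTGACCGT"),
    ("0002", "CATGGTCA"),
]
linked_sets = group_by_identifier(records)
```

Each key of `linked_sets` then stands for one microparticle, and the list stored under it holds the linked sequences for that microparticle.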

[1237] FIG. 20 illustrates a method in which molecular barcodes are appended to fragments of genomic DNA within microparticles that have been partitioned, and wherein said barcodes provide a linkage between sequences derived from the same microparticle. In the method, microparticles from a sample of microparticles are partitioned into two or more partitions, and then the fragments of genomic DNA within the microparticles are barcoded within the partitions, and then sequences are determined in such a way that the barcodes identify from which partition the sequence was derived, and thereby link the different sequences from individual microparticles.

[1238] In the first step, microparticles are partitioned into two or more partitions (which could comprise, for example, different physical reaction vessels, or different droplets within an emulsion). The fragments of genomic DNA are then released from the microparticles within each partition (i.e., the fragments are made physically accessible such that they can then be barcoded). This release step may be performed with a high-temperature incubation step, and/or via incubation with a molecular solvent or chemical surfactant. Optionally (but not shown here), an amplification step may be performed at this point, prior to appending barcode sequences, such that all or part of a fragment of genomic DNA is replicated at least once (e.g. in a PCR reaction), and then barcode sequences may be subsequently appended to the resulting replication products.

[1239] Barcode sequences are then appended to the fragments of genomic DNA. The barcode sequences may take any form, such as primers which comprise a barcode region, or barcoded oligonucleotides within multimeric barcoding reagents, or barcode molecules within multimeric barcode molecules. The barcode sequences may also be appended by any means, for example by a primer-extension and/or PCR reaction, or a single-stranded or double-stranded ligation reaction, or by in vitro transposition. In any case, the process of appending barcode sequences produces a solution of molecules within each partition wherein each such molecule comprises a barcode sequence together with all or part of a sequence corresponding to a fragment of genomic DNA from a microparticle that was partitioned into said partition.

[1240] The barcode-containing molecules from different partitions are then merged together into a single reaction, and then a sequencing reaction is performed on the resulting molecules to determine sequences of genomic DNA and the barcode sequences to which they have been appended. The associated barcode sequences are then used to identify the partitions from which each sequence was derived, and thereby link sequences determined in the sequencing reaction that were derived from fragments of genomic DNA comprised within the same microparticle or group of microparticles.
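The informatic step of the partition-based method, in which merged sequence reads are assigned back to their partitions by barcode, might look like the following Python sketch. Several details are assumptions made for the example and are not specified in the description: a fixed-length barcode at the start of each read, the barcode length of four nucleotides, the barcode-to-partition table, and the example reads.

```python
from collections import defaultdict

# Hypothetical values for illustration only.
BARCODE_LENGTH = 4
BARCODE_TO_PARTITION = {"AAAC": "partition_1", "GGTT": "partition_2"}


def link_reads_by_partition(reads, barcode_to_partition, barcode_length):
    """Group the genomic portion of each read by its partition of origin.

    Each read is assumed to start with a barcode of barcode_length
    nucleotides, followed by the genomic DNA sequence. Reads carrying an
    unrecognised barcode are skipped.
    """
    linked = defaultdict(list)
    for read in reads:
        barcode, genomic = read[:barcode_length], read[barcode_length:]
        partition = barcode_to_partition.get(barcode)
        if partition is None:
            continue  # barcode not in the table, so no linkage can be made
        linked[partition].append(genomic)
    return dict(linked)


# Reads from all partitions merged into a single sequencing reaction.
merged_reads = ["AAACTTGACC", "GGTTCATGGA", "AAACGGCATT", "CCCCAAAAAA"]
linked = link_reads_by_partition(
    merged_reads, BARCODE_TO_PARTITION, BARCODE_LENGTH
)
```

Sequences grouped under the same partition are linked as deriving from the microparticle, or group of microparticles, that was partitioned into it. A practical implementation would also need to tolerate sequencing errors within the barcode, which this sketch does not attempt.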

[1241] FIG. 21 illustrates a specific method in which molecular barcodes are appended to fragments of genomic DNA within microparticles by multimeric barcoding reagents, and wherein said barcodes provide a linkage between sequences derived from the same microparticle. In the method, microparticles from a sample of microparticles are crosslinked and then permeabilised, and then the fragments of genomic DNA comprised within the microparticles are barcoded by multimeric barcoding reagents, and then sequences are determined in such a way that the barcodes identify by which multimeric barcoding reagent each sequence was barcoded, and thereby link the different sequences from individual microparticles.

[1242] In the first step, microparticles from a sample of microparticles are crosslinked by a chemical crosslinking agent. This step serves the purpose of holding fragments of genomic DNA within each microparticle in physical proximity to each other, such that the sample may be manipulated and processed whilst retaining the basic structural nature of the microparticles (i.e., whilst retaining physical proximity of genomic DNA fragments derived from the same microparticle). In a second step, the crosslinked microparticles are permeabilised (i.e., the fragments of genomic DNA are made physically accessible such that they can then be barcoded in a barcoding step); this permeabilisation may for example be performed by incubation with a chemical surfactant such as a non-ionic detergent.

[1243] Barcode sequences are then appended to fragments of genomic DNA, wherein barcode sequences comprised within a multimeric barcoding reagent (and/or multimeric barcode molecule) are appended to fragments within the same crosslinked microparticle. The barcode sequences may be appended by any means, for example by a primer-extension reaction, or by a single-stranded or double-stranded ligation reaction. The process of appending barcode sequences is conducted such that a library of many multimeric barcoding reagents (and/or multimeric barcode molecules) is used to append sequences to a sample comprising many crosslinked microparticles, under dilution conditions such that each multimeric barcoding reagent (and/or multimeric barcode molecule) typically will only barcode sequences comprised within a single microparticle.

[1244] A sequencing reaction is then performed on the resulting molecules to determine sequences of genomic DNA and the barcode sequences to which they have been appended. The associated barcode sequences are then used to identify by which multimeric barcoding reagent (and/or multimeric barcode molecule) each sequence was barcoded, and thereby link sequences determined in the sequencing reaction that were derived from fragments of genomic DNA comprised within the same microparticle.

[1245] FIG. 22 illustrates a method in which fragments of genomic DNA within individual microparticles are appended to each other, and wherein the resulting molecules are sequenced, such that sequences from two or more fragments of genomic DNA from the same microparticle are determined from the same sequenced molecule, thereby establishing a linkage between fragments within the same microparticle. In the method, fragments of genomic DNA within individual microparticles are crosslinked to each other, and then blunted, and then the resulting blunted fragments of genomic DNA are ligated to each other into contiguous, multi-part sequences. The resulting molecules are then sequenced, such that sequences from two or more fragments of genomic DNA comprised within the same sequenced molecule are thus determined to be linked as deriving from the same microparticle.

[1246] In the first step, microparticles from a sample of microparticles are crosslinked by a chemical crosslinking agent. This step serves the purpose of holding fragments of genomic DNA within each microparticle in physical proximity to each other, such that the sample may be manipulated and processed whilst retaining the basic structural nature of the microparticles (i.e., whilst retaining physical proximity of genomic DNA fragments derived from the same microparticle). In a second step, the crosslinked microparticles are permeabilised (i.e., the fragments of genomic DNA are made physically accessible such that they can then be blunted and ligated in the subsequent steps); this permeabilisation may for example be performed by incubation with a chemical surfactant such as a non-ionic detergent.

[1247] In a next step, the ends of fragments of genomic DNA within each microparticle are blunted (i.e. any overhangs are removed and/or ends are filled-in) such that the ends are able to be appended to each other in a double-stranded ligation reaction. A double-stranded ligation reaction is then performed (e.g. with T4 DNA Ligase), wherein the blunted ends of molecules comprised within the same microparticles are ligated to each other into contiguous, multi-part double-stranded sequences. This ligation reaction (or any other step) may be performed under dilution conditions such that spurious ligation products between sequences comprised within two or more different microparticles are minimised.

[1248] A sequencing reaction is then performed on the resulting molecules to determine sequences of genomic DNA within each multi-part molecule. The resulting molecules are then evaluated, such that sequences from two or more fragments of genomic DNA comprised within the same sequenced molecule are thus determined to be linked as deriving from the same microparticle.
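By way of illustration only, the evaluation of multi-part molecules described above may be sketched as follows. The sketch assumes that each sequenced molecule has already been split into parts and that each part has been mapped to a genomic locus; the read identifiers and locus labels are hypothetical.

```python
from itertools import combinations


def links_from_multipart_reads(multipart_reads):
    """Return the set of locus pairs observed within the same sequenced molecule.

    `multipart_reads` maps a read identifier to the list of loci to which the
    parts of that multi-part read were mapped.
    """
    links = set()
    for loci in multipart_reads.values():
        # Every pair of distinct loci in one molecule is recorded as linked.
        for first, second in combinations(sorted(set(loci)), 2):
            links.add((first, second))
    return links


example_molecules = {
    "read_1": ["chr1:1000", "chr1:9000", "chr4:200"],
    "read_2": ["chr2:50"],  # single-part molecule: contributes no link
}
links = links_from_multipart_reads(example_molecules)
```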

[1249] FIG. 23 illustrates a method in which individual microparticles (and/or small groups of microparticles) from a large sample of microparticles are sequenced in two or more separate, individual sequencing reactions, and the sequences determined from each such sequencing reaction are thus determined to be linked informatically and thus predicted to derive from the same individual microparticle (and/or small group of microparticles). In the method, microparticles from a sample of microparticles are divided into two or more separate sub-samples of microparticles. Each sub-sample may comprise one or more individual microparticles, but in any case will comprise only a fraction of the original sample of microparticles.

[1250] The fragments of genomic DNA within each sub-sample are then released and processed into a form such that they may be sequenced (e.g., they may be appended to sequencing adapters such as Illumina sequencing adapters, and optionally amplified and purified for sequencing). This method may or may not include a step of appending barcode sequences; optionally the sequenced molecules do not comprise any barcode sequences.

[1251] Fragments of genomic DNA (and/or replicated copies thereof) from each individual sub-sample are then sequenced in separate, independent sequencing reactions. For example, molecules from each sub-sample may be sequenced on a separate sequencing flowcell, or may be sequenced within a different lane of a flowcell, or may be sequenced within a different port or flowcell of a nanopore sequencer.

[1252] The resulting sequenced molecules are then evaluated, such that sequences from the same individual sequencing reaction are thus determined to be linked as deriving from the same microparticle (and/or from the same small group of microparticles).

[1253] FIG. 24 illustrates a specific method in which fragments of genomic DNA within individual microparticles are appended to a discrete region of a sequencing flowcell prior to sequencing, and wherein the proximity of fragments sequenced on said flowcell comprises a linkage between sequences derived from the same microparticle. In the method, microparticles from a sample of microparticles are crosslinked and then permeabilised, and then fragments of genomic DNA comprised within individual microparticles are appended to a sequencing flowcell, such that two or more fragments from the same individual microparticle are appended to the same region of the flowcell. The appended molecules are then sequenced, and the proximity of the resulting sequences on the flowcell comprises a linking value, wherein sequences within close proximity on the flowcell may be predicted to derive from the same individual microparticle within the original sample.

[1254] In the first step, microparticles from a sample of microparticles are crosslinked by a chemical crosslinking agent. This step serves the purpose of holding fragments of genomic DNA within each microparticle in physical proximity to each other, such that the sample may be manipulated and processed whilst retaining the basic structural nature of the microparticles (i.e., whilst retaining physical proximity of genomic DNA fragments derived from the same microparticle). In a second step, the crosslinked microparticles are permeabilised (i.e., the fragments of genomic DNA are made physically accessible such that they can then be appended to a flowcell); this permeabilisation may for example be performed by incubation with a chemical surfactant such as a non-ionic detergent.

[1255] In a next step, fragments of genomic DNA from microparticles are then appended to the flowcell of a sequencing apparatus, such that two or more fragments crosslinked within the same microparticle are appended to the same discrete region of the flowcell. This may be performed in a multi-part reaction involving adapter molecules; for example, an adapter molecule may be appended to fragments of genomic DNA within microparticles, and said adapter molecule may comprise a single-stranded portion that is complementary to single-stranded primers on the flowcell. Sequences from a crosslinked microparticle may then be allowed to diffuse and anneal to different primers within the same region of the flowcell.

[1256] The appended molecules are then sequenced, such that the proximity of the resulting sequences on the flowcell provides a linking value, wherein sequences within close proximity on the flowcell (e.g. within a certain discrete region and/or proximity value) may be predicted to derive from the same individual microparticle within the original sample.
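By way of illustration only, one possible way of deriving linked sets from flowcell proximity is sketched below. The use of (x, y) coordinates, a Euclidean distance threshold and single-linkage grouping are all assumptions made for this sketch; the specification does not prescribe a particular proximity measure or grouping rule.

```python
import math


def link_by_flowcell_proximity(clusters, max_distance):
    """Group reads whose flowcell positions lie within `max_distance`.

    `clusters` is a list of (read_id, x, y) tuples. Grouping is single-linkage:
    reads joined by a chain of close neighbours end up in the same set.
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison; quadratic in the number of reads, for clarity only.
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            _, xi, yi = clusters[i]
            _, xj, yj = clusters[j]
            if math.hypot(xi - xj, yi - yj) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i, (read_id, _, _) in enumerate(clusters):
        groups.setdefault(find(i), []).append(read_id)
    return sorted(sorted(group) for group in groups.values())


positions = [("r1", 0, 0), ("r2", 3, 4), ("r3", 100, 100), ("r4", 102, 100)]
proximity_sets = link_by_flowcell_proximity(positions, max_distance=5)
```

A practical implementation would likely use a spatial index rather than the pairwise comparison shown here.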

[1257] The advantages of the invention may be illustrated, by way of example only, by reference to possible applications in NIPT and cancer detection:

[1258] By way of example, in the field of oncology, the invention may enable a powerful new framework to screen for the early detection of cancer. Several groups are seeking to develop cfDNA assays which can detect low levels of circulating DNA from early tumours (so-called ‘circulating tumour DNA’ or ctDNA) prior to metastatic conversion. One of the chief approaches taken to delineate cancerous from non-cancerous specimens is by detecting ‘structural variants’ (genetic amplifications, deletions, or translocations) that are a near-universal hallmark of malignancies; however, detection of such large-scale genetic events through the current ‘molecular counting’ framework requires ultra-deep sequencing of cfDNA to achieve statistically meaningful detection, and even then requires that a sufficient amount of ctDNA be present in the plasma to generate a sufficient absolute molecular signal even with hypothetically unlimited sequencing depth.

[1259] By contrast, the current invention may enable direct molecular assessment of structural variation, with potential single-molecule sensitivity: any structural variation that includes a ‘rearrangement site’ (for example, a point on one chromosome that has been translocated with and thus attached to another chromosome, or a point where a gene or other chromosomal segment has been amplified or deleted within a single chromosome) may be detectable directly by this method, since circulating microparticles containing DNA of the rearrangement may include a population of DNA fragments flanking both sides of the rearrangement site itself, which by this method can then be linked with each other to informatically deduce both the location of the rearrangement itself, and the bounds of the two participating genomic loci on each end thereof.

[1260] To conceptualise how this may improve both the cost-effectiveness and the absolute analytic sensitivity of a universal cancer screen, the example can be given of a hypothetical single circulating microparticle, which contains a chromosomal translocation from an early cancer cell, and which contains a total of 1 megabase of DNA spanning the left and right halves of this translocation, with this DNA being fragmented as 10,000 different, 100-nucleotide-long individual fragments that cumulatively span the entire 1 megabase segment. To detect the presence of this translocation event using the current, unlinked-fragment-only approach, the single, 100-base-pair fragment that itself contains the exact site of translocation would need to be sequenced, and sequenced across its entire length to detect the actual translocation site itself. This test method would thus need to both: 1) efficiently convert all of the 10,000 fragments into a format that can be read on a sequencer (i.e., the majority of the 10,000 fragments must be successfully processed and retained throughout the entire DNA purification and sequencing sample-preparation process), and then 2) sequence all of the 10,000 fragments at least once by a DNA sequencing process to reliably sequence the one that includes the translocation site (i.e., at least 1 megabase of sequencing must be performed, even assuming a theoretical uniform sampling of all input molecules into the sequencing step). Thus, 1 megabase of sequencing would need to be performed to detect the translocation event.

[1261] By contrast, to detect the presence of the translocation with a high degree of statistical confidence but using the linked-fragment approach, only a small number of input fragments from each side of the translocation site itself would need to be sequenced (to distinguish a ‘confident’ translocation event from e.g. statistical noise or mis-mapping errors). To provide a high degree of statistical confidence, on the order of 10 fragments from each side of the translocation could be sequenced; and since they need only be mapped to a location in the genome and not sequenced across their entire length to observe the actual translocation itself, on the order of only 50 base pairs from each fragment need be sequenced. Taken together, this generates a total sequencing requirement of 1000 base pairs to detect the presence of the translocation—a 1000-fold reduction from the 1,000,000 base pairs required by current state-of-the-art.
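By way of illustration only, the arithmetic of the hypothetical example above may be set out as follows. The figures are those given in the example; the variable names are introduced here for illustration.

```python
# Unlinked approach: every fragment of the microparticle is read in full.
fragments_in_microparticle = 10_000
fragment_length_bp = 100
unlinked_requirement_bp = fragments_in_microparticle * fragment_length_bp

# Linked approach: a few fragments per side, each read only far enough to map.
fragments_per_side = 10
sides_of_translocation = 2
bases_read_per_fragment = 50
linked_requirement_bp = (
    fragments_per_side * sides_of_translocation * bases_read_per_fragment
)

fold_reduction = unlinked_requirement_bp // linked_requirement_bp
```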

[1262] In addition to this considerable benefit in terms of relative sequencing throughput and cost, a linked-read approach may also increase the absolute achievable sensitivity of these cancer-screening tests. Since, for early-stage (and thus potentially curable) cancers, the absolute amount of tumour DNA in the circulation is low, the loss of sample DNA during the sample processing and preparation process for sequencing could significantly impede test efficacy, even with theoretically limitless sequencing depth. In keeping with the above example, using current approaches, the single DNA fragment containing the translocation site itself would need to be retained and successfully processed throughout the entire sample collection, processing, and sequencing-preparation protocol and then be successfully sequenced. However, all of these steps result in a certain fraction of ‘input’ molecules thereto being either physically lost from the processed sample (e.g. during a centrifugation or cleanup step), or otherwise simply not successfully processed/modified for subsequent steps (e.g., not successfully amplified prior to placement on a DNA sequencer). In contrast, since the linked-read approach of the invention need only involve sequencing of a small proportion of actual ‘input’ molecules, this type of sample loss may have a considerably reduced impact upon the ultimate sensitivity of the final assay.

[1263] In addition to its applications in oncology and cancer screening, this invention may also enable considerable new tools in the domain of noninvasive prenatal testing (NIPT). A developing foetus (and the placenta in which it is contained) shed fragmented DNA into the maternal circulation, a proportion of which is contained within circulating microparticles. Analogous to the problem of cancer screening from ctDNA, circulating foetal DNA only represents a minor fraction of the overall circulating DNA in pregnant individuals (the majority of circulating DNA being normal maternal DNA). A considerable technical challenge for NIPT revolves around differentiating actual foetal DNA from maternal DNA fragments (which will share the same nucleotide sequence since they are the source of inheritance for half of the foetal genome). An additional technical challenge for NIPT involves the detection of long-range genomic sequences (or mutations) from the short fragments of foetal DNA present in the circulation.

[1264] Analysis of linked fragments originating from the same individual microparticle presents a powerful framework for substantially addressing both of these technical challenges for NIPT. Since (approximately) half of the foetal genome will be identical in sequence to the (approximately) half of the maternal genome which the developing foetus has inherited, it is difficult to distinguish whether a given sequenced fragment with a maternal sequence may have been generated by normal maternal tissues, or rather by developing foetal tissues. By contrast, for the (approximately) half of the foetal genome which has been paternally inherited (inherited from the father), the presence of sequence variants (e.g. single nucleotide variants or other variants) present in the paternal genome but not in the maternal genome serves as a molecular marker to identify these paternally-inherited foetal fragments (since the only paternal DNA sequences in circulation will be those from the pregnancy itself).

[1265] The ability to sequence multiple fragments from single foetal microparticles that happen to contain both maternal and paternal sequences (e.g. sequences from one particular maternally-inherited foetal chromosome, together with sequences from a second foetal chromosome that has been paternally inherited) thus presents a method for direct recognition of which maternal sequences have been inherited by the developing foetus: maternal sequences that are found co-localised within microparticles that also contain paternal sequences can be predicted to be foetally-inherited maternal sequences, and, in contrast, maternal sequences that are not found co-localised with paternal sequences can be predicted to represent the maternal sequences which were not inherited by the foetus. By this technique, the large majority of circulating DNA that is comprised of normal maternal DNA may be specifically filtered out of the processed sequence dataset, and only sequences evidenced as being true foetal sequences may be isolated informatically for further analysis.
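By way of illustration only, the informatic filtering described above may be sketched as a selection of those linked sets in which at least one sequence carries a paternal-specific variant. The data layout, the variant identifiers and the function name are assumptions made for this sketch; a practical implementation would also need to account for mis-mapping and sequencing errors.

```python
def gate_foetal_sets(linked_sets, paternal_specific_variants):
    """Keep linked sets containing at least one paternal-specific variant.

    `linked_sets` maps a set identifier to a list of reads, where each read is
    a dict holding a `locus` label and a `variants` set of variant identifiers.
    """
    foetal = {}
    for set_id, reads in linked_sets.items():
        # A non-empty intersection marks the whole linked set as foetal.
        if any(read["variants"] & paternal_specific_variants for read in reads):
            foetal[set_id] = reads
    return foetal


# Hypothetical linked sets and paternal-specific variant identifiers.
example_sets = {
    "set_1": [
        {"locus": "chr3:100", "variants": {"var_p1"}},
        {"locus": "chr8:900", "variants": set()},
    ],
    "set_2": [{"locus": "chr3:120", "variants": set()}],
}
paternal_specific = {"var_p1", "var_p2"}
foetal_sets = gate_foetal_sets(example_sets, paternal_specific)
```

In this sketch the sequence at `chr8:900` in `set_1` carries no paternal marker itself, but is retained as a predicted foetal sequence because it is linked to one that does.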

[1266] Since ‘foetal fractions’ (the fraction of all circulating DNA which has been generated by the foetus itself) for NIPT assays are frequently below 10%, and for some clinical specimens between 1% and 5%, and since this paternal-sequence-derived ‘informatic-gating’ step produces an ‘effective foetal fraction’ of 100% (assuming minimal mis-mapping errors), this linked-fragment approach has the potential to improve the signal-to-noise ratio for NIPT tests by one to two orders of magnitude. Therefore, the invention has the potential to improve the overall analytic sensitivity and specificity of NIPT tests, as well as considerably reduce the amount of sequencing required for the process, and also enable NIPT tests to be performed earlier in pregnancy (time points at which foetal fractions are sufficiently low that current tests have unacceptable false-positive and false-negative rates).
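By way of illustration only, the stated improvement of one to two orders of magnitude may be reproduced from the figures above, under the simplifying assumption (made for this sketch) that the improvement scales as the ratio of the effective foetal fraction to the original foetal fraction.

```python
import math


def orders_of_magnitude_gain(foetal_fraction_percent, gated_fraction_percent=100):
    """Return log10 of the ratio of gated to original foetal fraction."""
    return math.log10(gated_fraction_percent / foetal_fraction_percent)


gain_at_10_percent = orders_of_magnitude_gain(10)  # 10-fold enrichment
gain_at_1_percent = orders_of_magnitude_gain(1)    # 100-fold enrichment
```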

[1267] Importantly, the present invention provides a novel, orthogonal dimensionality within sequence data from circulating DNA in the form of informatically linked sequences, upon which analysis algorithms, computations, and/or statistical tests may be performed directly to generate considerably more sensitive and specific genetic measurements. For example, rather than evaluating overall amounts of sequence between two chromosomes across an entire sample to measure a foetal chromosomal aneuploidy, linked sequences (and/or sets or subsets thereof) can be assessed directly to examine, for example, the number of sequences per informatically-linked set that map to a particular chromosome or chromosome portion. Comparisons and/or statistical tests may be performed to compare linked sets of sequences of different presumed cellular origin (for example, comparison between foetal sequences and maternal sequences, or between presumed healthy tissues and presumed cancerous or malignant tissues), or to evaluate sequence features or numeric features which only exist at the level of linked sets of sequences (and which do not exist at the level of individual, unlinked sequences), such as specific chromosomal distribution patterns, or cumulative enrichments of particular sequences or sequence sets.
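By way of illustration only, one of the set-level features mentioned above (the number of sequences per informatically-linked set that map to a particular chromosome) may be computed as sketched below. The input layout and example values are assumptions made for this sketch.

```python
from collections import Counter


def chromosome_counts_per_set(linked_sets):
    """Count, for each linked set, how many sequences map to each chromosome.

    `linked_sets` maps a set identifier to a list of (chromosome, position)
    tuples for the sequences in that set.
    """
    return {
        set_id: Counter(chromosome for chromosome, _position in reads)
        for set_id, reads in linked_sets.items()
    }


example_linked_sets = {
    "set_1": [("chr21", 100), ("chr21", 5000), ("chr1", 42)],
    "set_2": [("chr1", 7)],
}
per_set_counts = chromosome_counts_per_set(example_linked_sets)
```

Statistical tests of the kind described above could then be applied to these per-set counts rather than to sample-wide totals.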

[1268] In addition to its application for detection of foetal microparticle sequences, this method has the potential to detect long-range genetic sequences or sequence mutations present in the foetal genome. Much in the same manner as described for cancer genome rearrangements, if several DNA fragments from a foetal microparticle are sequenced that span and/or flank a genomic rearrangement site (e.g. a translocation or amplification or deletion), then these classes of rearrangements may be informatically detected even without directly sequencing rearrangement sites themselves. In addition, outside of genomic rearrangement events, this method has the potential to detect ‘phasing’ information within individual genomic regions. For example, if two single-nucleotide variants are found at different points within a specific gene but separated by several kilobases of genomic distance, this method may enable assessment of whether these two single nucleotide variants are located on the same, single copy of the gene in the foetal genome, or whether they are each located on a different one of the two copies of the gene present in the foetal genome (i.e. whether they are located within the same haplotype). This function may have particular clinical utility for the genetic assessment and prognosis of de novo single nucleotide mutations in foetal genomes, which comprise a large fraction of major developmental disorders with genetic etiology.
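By way of illustration only, a simple phasing heuristic based on linked sets is sketched below. It assumes (for the purposes of this sketch) that the sequences in a linked set predominantly sample a single copy of the gene, so that two variants on the same copy tend to be observed together; the majority rule and the variant labels are hypothetical.

```python
def phase_two_variants(linked_sets, variant_a, variant_b):
    """Classify two variants as 'cis', 'trans' or 'undetermined'.

    `linked_sets` is a list of sets of variant identifiers, one per linked set.
    """
    both = 0
    only_one = 0
    for variants in linked_sets:
        has_a = variant_a in variants
        has_b = variant_b in variants
        if has_a and has_b:
            both += 1
        elif has_a or has_b:
            only_one += 1
    if both > only_one:
        return "cis"    # variants mostly co-occur: same copy of the gene
    if only_one > both:
        return "trans"  # variants mostly occur apart: different copies
    return "undetermined"


observed = [{"snv_A", "snv_B"}, {"snv_A", "snv_B"}, {"snv_A"}]
phase_call = phase_two_variants(observed, "snv_A", "snv_B")
```

Because incomplete sampling of a linked set can also produce sets carrying only one of the two variants, a practical implementation would replace the majority rule with a statistical test.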

[1269] In addition to its applications in cancer detection and NIPT, this invention has significant related applications in the noninvasive genetic profiling and/or screening of embryos generated from in vitro fertilisation. Notably, and in strong parallel to the case of ‘foetal fraction’ effects noted in NIPT, samples of cell-free DNA from in vitro embryo culture medium and/or blastocoel fluid can comprise a substantive fraction of DNA from maternal (i.e. non-embryonic) cells, such as cumulus cells which are present during embryo development within some in vitro fertilisation procedures. Relatedly, embryos themselves can exhibit genetic mosaicism, wherein only a fraction of embryo cells comprise a particular genetic sequence or event (such as a particular chromosomal aneuploidy). These various biologic factors can significantly impede the sensitivity and specificity of noninvasive genetic measurement of embryos.

[1270] The present invention provides a categorically new technical approach for noninvasive genetic measurement of embryos, wherein multiple sets of linked signals (e.g. linked fragments of genomic DNA, and/or linked polypeptide levels) from a heterogeneous sample of diverse microparticles are used to specifically measure, evaluate, detect, and process biomolecules (e.g. genomic DNA sequences) from embryo cells themselves, and even further, to specifically measure and process biomolecules from specific parts of the embryo (e.g. from the inner cell mass). This method allows for the measurement of relevant genomic events (e.g. aneuploidy) specifically within the embryo, thus reducing the statistical noise introduced by, for example, the presence of maternal cumulus cells within in vitro embryo culture medium; the method thus provides for superior analytic detection of such genomic events by boosting the genomic ‘signal-to-noise’ ratio via linked-signal detection and analysis. This technology will thus provide a superior means of performing noninvasive genetic profiling and/or screening of embryos generated from in vitro fertilisation.

[1271] Furthermore, by a process of empirically evaluating specific reference sequences and/or target (bio)molecules in the context of linked signals from microparticles associated with various medical outcomes and/or clinical syndromes (such as successful live birth following implantation of an in vitro fertilised embryo into the uterus, and/or developmental and/or structural disorders of the foetus and/or child during foetal development and/or after birth (such as growth deficits and/or physical/anatomic abnormalities), and/or spontaneous abortion (i.e. miscarriage, at any stage of pregnancy)), this technology is positioned to predict the likelihood of successful pregnancy and live birth, and associated healthy clinical outcomes for the foetus and child.

[1272] FIG. 31 illustrates a method in which target biomolecules are measured using barcoded affinity probes and a step of partitioning. In the method, barcoded affinity probes are incubated with microparticles from a sample of microparticles and allowed to bind to target polypeptides (i.e. target biomolecules) within or upon said microparticles. The barcoded affinity probes comprise an affinity moiety capable of binding to the target polypeptides and a barcoded oligonucleotide that identifies the barcoded affinity probe. The microparticles are then partitioned into two or more partitions, and then the fragments of genomic DNA within the microparticles and the barcoded oligonucleotides from bound barcoded affinity probes are barcoded within the partitions, and then sequences are determined in such a way that the barcodes identify from which partition the sequence was derived, and thereby link the different sequences from individual microparticles.

[1273] Following a step of binding barcoded affinity probes to target polypeptides, microparticles are partitioned into two or more partitions (which could comprise, for example, different physical reaction vessels, or different droplets within an emulsion). The fragments of genomic DNA and barcoded oligonucleotides from barcoded affinity probes are then released from the microparticles within each partition (i.e., the fragments are made physically accessible such that they can then be barcoded). This release step may be performed with a high-temperature incubation step, and/or via incubation with a molecular solvent or chemical surfactant. Optionally (but not shown here), an amplification step may be performed at this point, prior to appending barcode sequences, such that all or part of a fragment of genomic DNA is replicated at least once (e.g. in a PCR reaction), and then barcode sequences may be subsequently appended to the resulting replication products.

[1274] Barcode sequences are then appended to the fragments of genomic DNA (or amplified products thereof) and barcoded oligonucleotides (or amplicons thereof) from barcoded affinity probes (i.e. barcode sequences are appended to the “target nucleic acid molecules”). The barcode sequences may take any form, such as primers which comprise a barcode region, or barcoded oligonucleotides within multimeric barcoding reagents, or barcode molecules within multimeric barcode molecules. The barcode sequences may also be appended by any means, for example by a primer-extension and/or PCR reaction, or a single-stranded or double-stranded ligation reaction, or by in vitro transposition. In any case, the process of appending barcode sequences produces a solution of molecules within each partition wherein each such molecule comprises a barcode sequence, and then all or part of a sequence corresponding to a fragment of genomic DNA or barcoded oligonucleotide from a barcoded affinity probe from a microparticle that was partitioned into said partition.

[1275] The barcode-containing molecules from different partitions are then merged together into a single reaction, and then a sequencing reaction is performed on the resulting molecules to determine sequences of genomic DNA and/or sequences from barcoded affinity probes and the barcode sequences to which they have been appended. The associated barcode sequences are then used to identify the partitions from which each sequence was derived, and thereby link sequences determined in the sequencing reaction that were derived from target biomolecules comprised within the same microparticle or group of microparticles. The sequence of the barcoded oligonucleotide identifies the linked affinity moiety and thereby the target polypeptide to which the affinity moiety binds. Therefore, the sequencing data identifies genomic DNA fragments and one or more target polypeptides likely to have been co-localised within the same microparticle.
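By way of illustration only, the combined analysis described above may be sketched as follows. The sketch assumes that each read has been resolved into a partition barcode, a read type and a payload, and that a lookup table relates the barcoded oligonucleotide sequence of each affinity probe to its target polypeptide; the table contents, read layout and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup: barcoded oligonucleotide sequence -> target polypeptide.
PROBE_TARGETS = {"AAGG": "polypeptide_A", "CCTT": "polypeptide_B"}


def summarise_partitions(reads):
    """Link genomic DNA reads and probe-derived reads by partition barcode.

    `reads` is an iterable of (partition_barcode, kind, payload) tuples, where
    `kind` is "gdna" (payload: mapped locus) or "probe" (payload: the barcoded
    oligonucleotide sequence of the affinity probe).
    """
    summary = defaultdict(lambda: {"gdna": [], "polypeptides": set()})
    for partition_barcode, kind, payload in reads:
        if kind == "gdna":
            summary[partition_barcode]["gdna"].append(payload)
        elif kind == "probe" and payload in PROBE_TARGETS:
            summary[partition_barcode]["polypeptides"].add(PROBE_TARGETS[payload])
    return dict(summary)


example_probe_reads = [
    ("ACGT", "gdna", "chr1:1000"),
    ("ACGT", "probe", "AAGG"),
    ("TTAG", "gdna", "chr7:5500"),
]
partition_summary = summarise_partitions(example_probe_reads)
```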

[1276] FIG. 32 illustrates a method in which target biomolecules are measured using barcoded affinity probes and multimeric barcoding reagents. In the method, barcoded affinity probes are incubated with microparticles from a sample of microparticles and allowed to bind to target polypeptides (i.e. target biomolecules) within or upon said microparticles. The barcoded affinity probes comprise an affinity moiety capable of binding to the target polypeptides and a barcoded oligonucleotide that identifies the barcoded affinity probe. Microparticles from a sample of microparticles are then crosslinked and then permeabilised, and then target nucleic acid molecules (i.e. the fragments of genomic DNA and the barcoded oligonucleotides from barcoded affinity probes comprised within the microparticles) are barcoded by multimeric barcoding reagents, and then sequences are determined in such a way that the barcodes identify by which multimeric barcoding reagent each sequence was barcoded, and thereby link the different sequences from individual microparticles.

[1277] Following a step of binding barcoded affinity probes to target polypeptides, microparticles from a sample of microparticles are crosslinked by a chemical crosslinking agent. This step serves the purpose of holding fragments of genomic DNA and barcoded oligonucleotides from barcoded affinity probes within each microparticle in physical proximity to each other, such that the sample may be manipulated and processed whilst retaining the basic structural nature of the microparticles (i.e., whilst retaining physical proximity of genomic DNA fragments and barcoded oligonucleotides from barcoded affinity probes derived from the same microparticle). In a second step, the crosslinked microparticles are permeabilised (i.e., the fragments of genomic DNA are made physically accessible such that they can then be barcoded in a barcoding step); this permeabilisation may for example be performed by incubation with a chemical surfactant such as a non-ionic detergent. Optionally, a (first or second) step of binding barcoded affinity probes to target polypeptides may be performed following any such step of crosslinking, and/or following any such step of permeabilisation.

[1278] Barcode sequences are then appended to fragments of genomic DNA and to barcoded oligonucleotides comprised within barcoded affinity probes, wherein barcode sequences comprised within a multimeric barcoding reagent (and/or multimeric barcode molecule) are appended to fragments within or bound to the same crosslinked microparticle. The barcode sequences may be appended by any means, for example by a primer-extension reaction, or by a single-stranded or double-stranded ligation reaction. The process of appending barcode sequences is conducted such that a library of many multimeric barcoding reagents (and/or multimeric barcode molecules) is used to append sequences to a sample comprising many crosslinked microparticles, under dilution conditions such that each multimeric barcoding reagent (and/or multimeric barcode molecule) typically will only barcode target nucleic acid molecules comprised within a single microparticle. Optionally, any method of appending one or more coupling molecules to target nucleic acid molecules (e.g. to fragments of genomic DNA and/or to barcoded oligonucleotides from barcoded affinity probes) may be performed prior to and/or during any step of appending barcode sequences, and then (optionally) barcode sequences from multimeric barcoding reagents may be linked to said coupling molecules, optionally with a subsequent barcode-connecting step wherein said barcode sequences are appended to said target nucleic acid molecules.

[1279] A sequencing reaction is then performed on the resulting molecules to determine sequences of genomic DNA and barcoded oligonucleotides from barcoded affinity probes and the barcode sequences to which they have been appended. The associated barcode sequences are then used to identify by which multimeric barcoding reagent (and/or multimeric barcode molecule) each sequence was barcoded, and thereby link sequences determined in the sequencing reaction that were derived from fragments of genomic DNA and barcoded oligonucleotides from barcoded affinity probes comprised within or bound to the same microparticle. The sequence of the barcoded oligonucleotide identifies the linked affinity moiety and thereby the target polypeptide to which the affinity moiety binds. Therefore, the sequencing data identifies genomic DNA fragments and one or more target polypeptides likely to have been co-localised within the same microparticle.
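The linking of sequences by barcode described above may be illustrated informally as follows. This is a minimal sketch only and is not the analysis software used with the method: the read records, the field names (`reagent_id`, `kind`, `payload`) and the example values are all hypothetical, and it is assumed that each sequence read has already been assigned to a multimeric barcoding reagent on the basis of its appended barcode sequence.

```python
from collections import defaultdict

def link_reads(reads):
    """Group reads by the (hypothetical) ID of the multimeric barcoding reagent.

    Reads sharing a reagent ID are treated as likely to derive from the same
    microparticle; genomic DNA fragments and affinity-probe oligonucleotides
    are kept in separate lists within each group.
    """
    groups = defaultdict(lambda: {"genomic": [], "probe": []})
    for read in reads:
        groups[read["reagent_id"]][read["kind"]].append(read["payload"])
    return dict(groups)

# Hypothetical example records (not data from the source).
reads = [
    {"reagent_id": "MBR_001", "kind": "genomic", "payload": "chr1:1000-1150"},
    {"reagent_id": "MBR_001", "kind": "probe", "payload": "polypeptide_A"},
    {"reagent_id": "MBR_002", "kind": "genomic", "payload": "chr7:5000-5140"},
    {"reagent_id": "MBR_001", "kind": "genomic", "payload": "chr1:2000-2160"},
]
linked = link_reads(reads)
```

In this sketch, the two genomic fragments and the one probe oligonucleotide grouped under `MBR_001` would be interpreted as likely to have been co-localised within the same microparticle.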

EXAMPLES

Example 1

[1280] Materials and Methods

[1281] Method 1— Synthesis of a Library of Nucleic Acid Barcode Molecules

[1282] Synthesis of Double-Stranded Sub-Barcode Molecule Library

[1283] In a PCR tube, 10 microliters of 10 micromolar BC_MX3 (an equimolar mixture of all sequences in SEQ ID NO: 18 to 269) were added to 10 microliters of 10 micromolar BC_ADD_TP1 (SEQ ID NO: 1), plus 10 microliters of 10× CutSmart Buffer (New England Biolabs) plus 1.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 68 microliters H.sub.2O, to final volume of 99 microliters. The PCR tube was placed on a thermal cycler and incubated at 75° C. for 5 minutes, then slowly annealed to 4° C., then held at 4° C., then placed on ice. 1.0 microliter of Klenow polymerase fragment (New England Biolabs; at 5 U/uL) was added to the solution and mixed. The PCR tube was again placed on a thermal cycler and incubated at 25° C. for 15 minutes, then held at 4° C. The solution was then purified with a purification column (Nucleotide Removal Kit; Qiagen), eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.

[1284] Synthesis of Double-Stranded Downstream Adapter Molecule

[1285] In a PCR tube, 0.5 microliters of 100 micromolar BC_ANC_TP1 (SEQ ID NO: 2) were added to 0.5 microliters of 100 micromolar BC_ANC_BT1 (SEQ ID NO: 3), plus 20 microliters of 10× CutSmart Buffer (New England Biolabs) plus 178 microliters H.sub.2O, to final volume of 200 microliters. The PCR tube was placed on a thermal cycler and incubated at 95° C. for 5 minutes, then slowly annealed to 4° C., then held at 4° C., then placed on ice, then stored at −20° C.

[1286] Ligation of Double-Stranded Sub-Barcode Molecule Library to Double-Stranded Downstream Adapter Molecule

[1287] In a 1.5 milliliter Eppendorf tube, 1.0 microliter of Double-Stranded Downstream Adapter Molecule solution was added to 2.5 microliters of Double-Stranded Sub-Barcode Molecule Library, plus 2.0 microliters of 10×T4 DNA Ligase buffer, and 13.5 microliters H.sub.2O to final volume of 19 microliters. 1.0 microliter of T4 DNA Ligase (New England Biolabs; high concentration) was added to the solution and mixed. The tube was incubated at room temperature for 60 minutes, then purified with 1.8× volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.

[1288] PCR Amplification of Ligated Library

[1289] In a PCR tube, 2.0 microliters of Ligated Library were added to 2.0 microliters of 50 micromolar BC_FWD_PR1 (SEQ ID NO: 4), plus 2.0 microliters of 50 micromolar BC_REV_PR1 (SEQ ID NO: 5), plus 10 microliters of 10×Taq PCR Buffer (Qiagen) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 81.5 microliters H.sub.2O, plus 0.5 microliters Qiagen Taq Polymerase (at 5 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 59° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C. The solution was then purified with 1.8× volume (180 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 50 microliters H.sub.2O.
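As an informal check of the reaction set-up above, the listed component volumes can be summed and the final working concentrations derived by simple dilution arithmetic. The following is a sketch only; the function name and dictionary keys are illustrative and do not form part of the protocol.

```python
def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * added_volume_ul / total_volume_ul

# Component volumes (microliters) as listed for the PCR amplification
# of the ligated library.
components_ul = {
    "ligated_library": 2.0,
    "BC_FWD_PR1": 2.0,
    "BC_REV_PR1": 2.0,
    "10x_Taq_buffer": 10.0,
    "dNTP_mix": 2.0,
    "water": 81.5,
    "Taq_polymerase": 0.5,
}

total_ul = sum(components_ul.values())                  # stated final volume: 100 uL
primer_uM = final_concentration(50.0, 2.0, total_ul)    # each primer, 50 uM stock
dntp_mM = final_concentration(10.0, 2.0, total_ul)      # dNTP mix, 10 mM stock
buffer_x = final_concentration(10.0, 10.0, total_ul)    # 10x buffer
```

The same arithmetic may be applied to the other reaction mixtures of this Example to confirm that the listed components account for the stated final volume.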

[1290] Uracil Glycosylase Enzyme Digestion

[1291] To an Eppendorf tube were added 15 microliters of the eluted PCR amplification product, plus 1.0 microliter H.sub.2O, plus 2.0 microliters 10× CutSmart Buffer (New England Biolabs), plus 2.0 microliters of USER enzyme solution (New England Biolabs), and the solution was mixed. The tube was incubated at 37° C. for 60 minutes, then the solution was purified with 1.8× volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 34 microliters H.sub.2O.

[1292] MlyI Restriction Enzyme Cleavage

[1293] To the eluate from the previous (glycosylase digestion) step, 4.0 microliters 10× CutSmart Buffer (New England Biolabs), plus 2.0 microliter of MlyI enzyme (New England Biolabs, at 5 U/uL) was added and mixed. The tube was incubated at 37° C. for 60 minutes, then the solution was purified with 1.8× volume (72 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.
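The bead volumes quoted for the clean-up steps follow from multiplying the reaction volume by the stated bead-to-sample ratio. A minimal sketch of that calculation is given below; the helper name `bead_volume_ul` is illustrative only.

```python
def bead_volume_ul(sample_volume_ul, ratio):
    """Bead suspension volume for a given bead-to-sample volume ratio."""
    return sample_volume_ul * ratio

# MlyI cleavage reaction: 34 uL eluate + 4 uL buffer + 2 uL enzyme.
mlyi_reaction_ul = 34.0 + 4.0 + 2.0
mlyi_beads_ul = bead_volume_ul(mlyi_reaction_ul, 1.8)

# A 100 uL PCR reaction cleaned up at the same 1.8x ratio.
pcr_beads_ul = bead_volume_ul(100.0, 1.8)
```

For the 40 microliter MlyI reaction this gives the 72 microliters stated above, and for a 100 microliter PCR reaction it gives 180 microliters.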

[1294] Ligation of Sub-Barcode Library to MlyI-Cleaved Solution

[1295] In a 1.5 milliliter Eppendorf tube, 10 microliters of MlyI-Cleaved Solution were added to 2.5 microliters of Double-Stranded Sub-Barcode Molecule Library, plus 2.0 microliters of 10×T4 DNA Ligase buffer, and 4.5 microliters H.sub.2O to final volume of 19 microliters. 1.0 microliter of T4 DNA Ligase (New England Biolabs; high concentration) was added to the solution and mixed. The tube was incubated at room temperature for 60 minutes, then purified with 1.8× volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.

[1296] Repeating Cycles of Sub-Barcode Addition

[1297] The experimental steps of: 1) Ligation of Sub-Barcode Library to MlyI-Cleaved Solution, 2) PCR Amplification of Ligated Library, 3) Uracil Glycosylase Enzyme Digestion, and 4) MlyI Restriction Enzyme Cleavage were repeated, in sequence, for a total of five cycles.
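The effect of repeated sub-barcode addition on the theoretical size of the barcode library can be sketched as follows. The calculation assumes that each of SEQ ID NO: 18 to 269 is a distinct sub-barcode sequence and that every ligation adds one sub-barcode independently and uniformly at random; both are simplifying assumptions, and the number of sub-barcode positions per molecule depends on how the initial ligation and the repeated cycles are counted, so a range of values is tabulated rather than a single figure.

```python
FIRST_SEQ_ID = 18
LAST_SEQ_ID = 269

# Number of sub-barcode sequences in the BC_MX3 mixture (inclusive range).
n_sub_barcodes = LAST_SEQ_ID - FIRST_SEQ_ID + 1

def max_diversity(n_positions, n_options=n_sub_barcodes):
    """Upper bound on distinct concatenated barcodes with n_positions sub-barcodes."""
    return n_options ** n_positions

# Theoretical upper bounds for 1 to 6 concatenated sub-barcodes.
diversity_table = {n: max_diversity(n) for n in range(1, 7)}

for n, value in diversity_table.items():
    print(f"{n} sub-barcode position(s): up to {value:,} distinct barcodes")
```

These figures are upper bounds; the diversity actually obtained would be expected to be lower owing to ligation efficiency, amplification bias and sampling at each purification step.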

[1298] Synthesis of Double-Stranded Upstream Adapter Molecule

[1299] In a PCR tube, 1.0 microliters of 100 micromolar BC_USO_TP1 (SEQ ID NO: 6) were added to 1.0 microliters of 100 micromolar BC_USO_BT1 (SEQ ID NO: 7), plus 20 microliters of 10× CutSmart Buffer (New England Biolabs) plus 178 microliters H.sub.2O, to final volume of 200 microliters. The PCR tube was placed on a thermal cycler and incubated at 95° C. for 60 seconds, then slowly annealed to 4° C., then held at 4° C., then placed on ice, then stored at −20° C.

[1300] Ligation of Double-Stranded Upstream Adapter Molecule

[1301] In a 1.5 milliliter Eppendorf tube, 3.0 microliters of Upstream Adapter solution were added to 10.0 microliters of final (after the fifth cycle) MlyI-Cleaved solution, plus 2.0 microliters of 10×T4 DNA Ligase buffer, and 5.0 microliters H.sub.2O to final volume of 19 microliters. 1.0 microliter of T4 DNA Ligase (New England Biolabs; high concentration) was added to the solution and mixed. The tube was incubated at room temperature for 60 minutes, then purified with 1.8× volume (34 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.

[1302] PCR Amplification of Upstream Adapter-Ligated Library

[1303] In a PCR tube, 6.0 microliters of Upstream Adapter-Ligated Library were added to 1.0 microliters of 100 micromolar BC_CS_PCR_FWD1 (SEQ ID NO: 8), plus 1.0 microliters of 100 micromolar BC_CS_PCR_REV1 (SEQ ID NO: 9), plus 10 microliters of 10×Taq PCR Buffer (Qiagen) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 73.5 microliters H.sub.2O, plus 0.5 microliters Qiagen Taq Polymerase (at 5 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 61° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C. The solution, containing a library of amplified nucleic acid barcode molecules, was then purified with 1.8× volume (180 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions). The library of amplified nucleic acid barcode molecules was then eluted in 40 microliters H.sub.2O.

[1304] The library of amplified nucleic acid barcode molecules synthesised by the method described above was then used to assemble a library of multimeric barcode molecules as described below.

[1305] Method 2— Assembly of a Library of Multimeric Barcode Molecules

[1306] A library of multimeric barcode molecules was assembled using the library of nucleic acid barcode molecules synthesised according to the methods of Method 1.

[1307] Primer-Extension with Forward Termination Primer and Forward Splinting Primer

[1308] In a PCR tube, 5.0 microliters of the library of amplified nucleic acid barcode molecules were added to 1.0 microliters of 100 micromolar CS_SPLT_FWD1 (SEQ ID NO: 10), plus 1.0 microliters of 5 micromolar CS_TERM_FWD1 (SEQ ID NO: 11), plus 10 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen) plus 80.0 microliters H.sub.2O, plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 1 cycle of: 95° C. for 30 seconds, then 53° C. for 30 seconds, then 72° C. for 60 seconds, then 1 cycle of: 95° C. for 30 seconds, then 50° C. for 30 seconds, then 72° C. for 60 seconds, then held at 4° C. The solution was then purified with a PCR purification column (Qiagen), and eluted in 85.0 microliters H.sub.2O.

[1309] Primer-Extension with Reverse Termination Primer and Reverse Splinting Primer

[1310] In a PCR tube, the 85.0 microliters of forward-extension primer-extension products were added to 1.0 microliters of 100 micromolar CS_SPLT_REV1 (SEQ ID NO: 12), plus 1.0 microliters of 5 micromolar CS_TERM_REV1 (SEQ ID NO: 13), plus 10 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 1 cycle of: 95° C. for 30 seconds, then 53° C. for 30 seconds, then 72° C. for 60 seconds, then 1 cycle of: 95° C. for 30 seconds, then 50° C. for 30 seconds, then 72° C. for 60 seconds, then held at 4° C. The solution was then purified with a PCR purification column (Qiagen), and eluted in 43.0 microliters H.sub.2O.

[1311] Linking Primer-Extension Products with Overlap-Extension PCR

[1312] In a PCR tube were added the 43.0 microliters of reverse-extension primer-extension products, plus 5.0 microliters of 10× Thermopol Buffer (NEB) plus 1.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 50 microliters. The PCR tube was placed on a thermal cycler and amplified for 5 cycles of: 95° C. for 30 seconds, then 60° C. for 60 seconds, then 72° C. for 2 minutes; then 5 cycles of: 95° C. for 30 seconds, then 60° C. for 60 seconds, then 72° C. for 5 minutes; then 5 cycles of: 95° C. for 30 seconds, then 60° C. for 60 seconds, then 72° C. for 10 minutes; then held at 4° C. The solution was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.
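The staged cycling programme of the overlap-extension PCR, in which the extension time is lengthened in successive blocks of cycles, can be represented as a simple data structure from which the total programmed hold time is derived. This is an illustrative sketch only; the data layout and names are assumptions, and ramp times between temperatures and the final 4° C. hold are not included.

```python
# Each stage: (number of cycles, [(temperature in deg C, hold time in seconds), ...])
overlap_extension_program = [
    (5, [(95, 30), (60, 60), (72, 2 * 60)]),
    (5, [(95, 30), (60, 60), (72, 5 * 60)]),
    (5, [(95, 30), (60, 60), (72, 10 * 60)]),
]

def total_hold_seconds(program):
    """Sum of all programmed hold times, ignoring ramping between steps."""
    return sum(
        cycles * sum(seconds for _temp, seconds in steps)
        for cycles, steps in program
    )

total_seconds = total_hold_seconds(overlap_extension_program)
total_minutes = total_seconds / 60
```

Under these assumptions the programme amounts to 6450 seconds (107.5 minutes) of hold time, most of which is spent in the progressively longer 72° C. extension steps.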

[1313] Amplification of Overlap-Extension Products

[1314] In a PCR tube were added 2.0 microliters of Overlap-Extension PCR solution, plus 1.0 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 1.0 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 10 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL), plus 83.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 58° C. for 30 seconds, then 72° C. for 10 minutes; then held at 4° C. The solution was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.

[1315] Gel-Based Size Selection of Amplified Overlap-Extension Products

[1316] Approximately 250 nanograms of Amplified Overlap-Extension Products were loaded and run on a 0.9% agarose gel, and then stained and visualised with ethidium bromide. A band corresponding to 1000 nucleotide size (plus and minus 100 nucleotides) was excised and purified with a gel extraction column (Gel Extraction Kit, Qiagen) and eluted in 50 microliters H.sub.2O.

[1317] Amplification of Overlap-Extension Products

[1318] In a PCR tube were added 10.0 microliters of Gel-Size-Selected solution, plus 1.0 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 1.0 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 10 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 75.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 15 cycles of: 95° C. for 30 seconds, then 58° C. for 30 seconds, then 72° C. for 4 minutes; then held at 4° C. The solution was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.

[1319] Selection and Amplification of Quantitatively Known Number of Multimeric Barcode Molecules

[1320] Amplified gel-extracted solution was diluted to a concentration of 1 picogram per microliter, and then to a PCR tube was added 2.0 microliters of this diluted solution (approximately 2 million individual molecules), plus 0.1 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 0.1 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 1.0 microliter 10× Thermopol Buffer (NEB) plus 0.2 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 0.1 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 6.5 microliters H.sub.2O to final volume of 10 microliters. The PCR tube was placed on a thermal cycler and amplified for 11 cycles of: 95° C. for 30 seconds, then 57° C. for 30 seconds, then 72° C. for 4 minutes; then held at 4° C.

[1321] To the PCR tube was added 1.0 microliters of 100 micromolar CS_PCR_FWD1 (SEQ ID NO: 14), plus 1.0 microliters of 100 micromolar CS_PCR_REV1 (SEQ ID NO: 15), plus 9.0 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 76.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 10 cycles of: 95° C. for 30 seconds, then 57° C. for 30 seconds, then 72° C. for 4 minutes; then held at 4° C. The solution was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 50 microliters H.sub.2O, and quantitated spectrophotometrically.
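The figure of approximately 2 million molecules in 2.0 microliters of a 1 picogram per microliter solution can be checked with a standard mass-to-copy-number conversion. The sketch below assumes double-stranded molecules of about 1000 base pairs (the size selected on the gel) and an average mass of 650 grams per mole per base pair, which is a commonly used approximation and not a value stated in this Example.

```python
AVOGADRO = 6.02214076e23          # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0      # assumed average mass of one dsDNA base pair

def dsdna_molecules(mass_pg, length_bp):
    """Approximate number of dsDNA molecules in a given mass (picograms)."""
    mass_g = mass_pg * 1e-12
    moles = mass_g / (length_bp * GRAMS_PER_MOL_PER_BP)
    return moles * AVOGADRO

# 2.0 microliters at 1 picogram per microliter, ~1000 bp molecules.
input_mass_pg = 2.0 * 1.0
n_molecules = dsdna_molecules(input_mass_pg, 1000)
print(f"approximately {n_molecules:.2e} molecules")
```

Under these assumptions the estimate is a little under 2 million molecules, consistent with the approximate figure given for the diluted solution.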

[1322] Method 3: Production of Single-Stranded Multimeric Barcode Molecules by In Vitro Transcription and cDNA Synthesis

[1323] This method describes a series of steps to produce single-stranded DNA strands, to which oligonucleotides may be annealed and then barcoded along. This method begins with four identical reactions performed in parallel, in which a promoter site for the T7 RNA Polymerase is appended to the 5′ end of a library of multimeric barcode molecules using an overlap-extension PCR amplification reaction. Four identical reactions are performed in parallel and then merged to increase the quantitative amount and concentration of this product available. In each of four identical PCR tubes, approximately 500 picograms of size-selected and PCR-amplified multimeric barcode molecules (as produced in the ‘Selection and Amplification of Quantitatively Known Number of Multimeric Barcode Molecules’ step of Method 2) were mixed with 2.0 microliters of 100 micromolar CS_PCR_FWD1_T7 (SEQ ID NO. 270) and 2.0 microliters of 100 micromolar CS_PCR_REV4 (SEQ ID NO. 271), plus 20.0 microliters of 10× Thermopol PCR buffer, plus 4.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, and 2.0 microliters Vent Exo Minus polymerase (at 5 units per microliter) plus water to a total volume of 200 microliters. The PCR tube was placed on a thermal cycler and amplified for 22 cycles of: 95° C. for 60 seconds, then 60° C. for 30 seconds, then 72° C. for 3 minutes; then held at 4° C. The solution from all four reactions was then purified with a gel extraction column (Gel Extraction Kit, Qiagen) and eluted in 52 microliters H.sub.2O.

[1324] Fifty (50) microliters of the eluate was mixed with 10 microliters 10× NEBuffer 2 (NEB), plus 0.5 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, and 1.0 microliters Vent Exo Minus polymerase (at 5 units per microliter) plus water to a total volume of 100 microliters. The reaction was incubated for 15 minutes at room temperature, then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O, and quantitated spectrophotometrically.

[1325] A transcription step is then performed, in which the library of PCR-amplified templates containing T7 RNA Polymerase promoter site (as produced in the preceding step) is used as a template for T7 RNA polymerase. This comprises an amplification step to produce a large amount of RNA-based nucleic acid corresponding to the library of multimeric barcode molecules (since each input PCR molecule can serve as a template to produce a large number of cognate RNA molecules). In the subsequent step, these RNA molecules are then reverse transcribed to create the desired, single-stranded multimeric barcode molecules. Ten (10) microliters of the eluate was mixed with 20 microliters 5× Transcription Buffer (Promega), plus 2.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 10 microliters of 0.1 millimolar DTT, plus 4.0 microliters SuperAseIn (Ambion), and 4.0 microliters Promega T7 RNA Polymerase (at 20 units per microliter) plus water to a total volume of 100 microliters. The reaction was incubated 4 hours at 37° C., then purified with an RNeasy Mini Kit (Qiagen), and eluted in 50 microliters H.sub.2O, and added to 6.0 microliters SuperAseIn (Ambion).

[1326] The RNA solution produced in the preceding in vitro transcription step is then reverse transcribed (using a primer specific to the 3′ ends of the RNA molecules) and then digested with RNAse H to create single-stranded DNA molecules corresponding to multimeric barcode molecules, to which oligonucleotides may be annealed and then barcoded along. In two identical replicate tubes, 23.5 microliters of the eluate was mixed with 5.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 3.0 microliters SuperAseIn (Ambion), and 10.0 microliters of 2.0 micromolar CS_PCR_REV1 (SEQ ID NO. 272) plus water to final volume of 73.5 microliters. The reaction was incubated on a thermal cycler at 65° C. for 5 minutes, then 50° C. for 60 seconds; then held at 4° C. To the tube was added 20 microliters 5× Reverse Transcription buffer (Invitrogen), plus 5.0 microliters of 0.1 millimolar DTT, and 1.75 microliters Superscript III Reverse Transcriptase (Invitrogen). The reaction was incubated at 55° C. for 45 minutes, then 60° C. for 5 minutes; then 70° C. for 15 minutes, then held at 4° C., then purified with a PCR Cleanup column (Qiagen) and eluted in 40 microliters H.sub.2O.

[1327] Sixty (60) microliters of the eluate was mixed with 7.0 microliters 10×RNAse H Buffer (Promega), plus 4.0 microliters RNAse H (Promega). The reaction was incubated 12 hours at 37° C., then 95° C. for 10 minutes, then held at 4° C., then purified with 0.7× volume (49 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1328] Method 4: Production of Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides

[1329] This method describes steps to produce multimeric barcoding reagents from single-stranded multimeric barcode molecules (as produced in Method 3) and appropriate extension primers and adapter oligonucleotides.

[1330] In a PCR tube, approximately 45 nanograms of single-stranded RNAse H-digested multimeric barcode molecules (as produced in the last step of Method 3) were mixed with 0.25 microliters of 10 micromolar DS_ST_05 (SEQ ID NO. 273, an adapter oligonucleotide) and 0.25 microliters of 10 micromolar US_PCR_Prm_Only_03 (SEQ ID NO. 274, an extension primer), plus 5.0 microliters of 5× Isothermal extension/ligation buffer, plus water to final volume of 19.7 microliters. In order to anneal the adapter oligonucleotides and extension primers to the multimeric barcode molecules, in a thermal cycler, the tube was incubated at 98° C. for 60 seconds, then slowly annealed to 55° C., then held at 55° C. for 60 seconds, then slowly annealed to 50° C., then held at 50° C. for 60 seconds, then slowly annealed to 20° C. at 0.1° C./sec, then held at 4° C. To the tube were added 0.3 microliters (0.625 U) Phusion Polymerase (NEB; 2 U/uL); 2.5 microliters (100 U) Taq DNA Ligase (NEB; 40 U/uL); and 2.5 microliters 100 millimolar DTT. In order to extend the extension primer(s) across the adjacent barcode region(s) of each multimeric barcode molecule, and then to ligate this extension product to the phosphorylated 5′ end of the adapter oligonucleotide annealed downstream thereof, the tube was then incubated at 50° C. for 3 minutes, then held at 4° C. The reaction was then purified with a PCR Cleanup column (Qiagen) and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1331] Method 5: Production of Synthetic DNA Templates of Known Sequence

[1332] This method describes a technique to produce synthetic DNA templates with a large number of tandemly-repeated, co-linear molecular sequence identifiers, by circularizing and then tandemly amplifying (with a processive, strand-displacing polymerase) oligonucleotides containing said molecular sequence identifiers. This reagent may then be used to evaluate and measure the multimeric barcoding reagents described herein.

[1333] In a PCR tube were added 0.4 microliters of 1.0 micromolar Syn_Temp_01 (SEQ ID NO. 275) and 0.4 microliters of 1.0 micromolar ST_Splint_02 (SEQ ID NO. 276) and 10.0 microliters of 10×NEB CutSmart buffer. On a thermal cycler, the tube was incubated at 95° C. for 60 seconds, then held at 75° C. for 5 minutes, then slowly annealed to 20° C., then held at 20° C. for 60 seconds, then held at 4° C. To circularize the molecules through an intramolecular ligation reaction, 10.0 microliters ribo-ATP and 5.0 microliters T4 DNA Ligase (NEB; High Concentration) were then added to the tube. The tube was then incubated at room temperature for 30 minutes, then at 65° C. for 10 minutes, then slowly annealed to 20° C., then held at 20° C. for 60 seconds, then held at 4° C. To each tube was then added 10×NEB CutSmart buffer, 4.0 microliters of 10 millimolar deoxynucleotide triphosphate nucleotide mix, and 1.5 microliters of diluted phi29 DNA Polymerase (NEB; Diluted 1:20 in 1× CutSmart buffer) plus water to a total volume of 200 microliters. The reaction was incubated at 30° C. for 5 minutes, then held at 4° C., then purified with 0.7× volume (140 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1334] Method 6: Barcoding Synthetic DNA Templates of Known Sequence with Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides

[1335] In a PCR tube were added 10.0 microliters 5× Phusion HF buffer (NEB), plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 2.0 microliters (10 nanograms) 5.0 nanogram/microliters Synthetic DNA Templates of Known Sequence (as produced by Method 5), plus water to final volume of 42.5 microliters. The tube was then incubated at 98° C. for 60 seconds, then held at 20° C. To the tube was added 5.0 microliters of 5.0 picogram/microliter Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides (as produced by Method 4). The reaction was then incubated at 70° C. for 60 seconds, then slowly annealed to 60° C., then 60° C. for five minutes, then slowly annealed to 55° C., then 55° C. for five minutes, then slowly annealed to 50° C., then 50° C. for five minutes, then held at 4° C. To the reaction was added 0.5 microliters of Phusion Polymerase (NEB), plus 2.0 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO. 277, a primer that is complementary to part of the extension products produced by annealing and extending the multimeric barcoding reagents created by Method 4 along the synthetic DNA templates created by Method 5, serves as a primer for the primer-extension and then PCR reactions described in this method). Of this reaction, a volume of 5.0 microliters was added to a new PCR tube, which was then incubated for 30 seconds at 55° C., 30 seconds 60° C., and 30 seconds 72° C., then followed by 10 cycles of: 98° C. then 65° C. then 72° C. for 30 seconds each, then held at 4° C. To each tube was then added 9.0 microliters 5× Phusion buffer, plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 1.75 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO. 277), plus 1.75 microliters 10 uM US_PCR_Prm_Only_02 (SEQ ID NO. 278, a primer partially complementary to the extension primer employed to generate the multimeric barcoding reagents as per Method 4, and serving as the ‘forward’ primer in this PCR amplification reaction), plus 0.5 microliters Phusion Polymerase (NEB), plus water to final volume of 50 microliters. The PCR tube was placed on a thermal cycler and amplified for 24 cycles of: 98° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C., then purified with 1.2× volume (60 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1336] The resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis.

[1337] Method 7: Barcoding Synthetic DNA Templates of Known Sequence with Multimeric Barcoding Reagents and Separate Adapter Oligonucleotides

[1338] To anneal and extend adapter oligonucleotides along the synthetic DNA templates, in a PCR tube were added 10.0 microliters 5× Phusion HF buffer (NEB), plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 5.0 microliters (25 nanograms) 5.0 nanogram/microliters Synthetic DNA Templates of Known Sequence (as produced by Method 5), plus 0.25 microliters of 10 micromolar DS_ST_05 (SEQ ID NO. 273, an adapter oligonucleotide), plus water to final volume of 49.7 microliters. On a thermal cycler, the tube was incubated at 98° C. for 2 minutes, then 63° C. for 1 minute, then slowly annealed to 60° C. then held at 60° C. for 1 minute, then slowly annealed to 57° C. then held at 57° C. for 1 minute, then slowly annealed to 54° C. then held at 54° C. for 1 minute, then slowly annealed to 50° C. then held at 50° C. for 1 minute, then slowly annealed to 45° C. then held at 45° C. for 1 minute, then slowly annealed to 40° C. then held at 40° C. for 1 minute, then held at 4° C. To the tube was added 0.3 microliters Phusion Polymerase (NEB), and the reaction was incubated at 45° C. for 20 seconds, then 50° C. for 20 seconds, then 55° C. for 20 seconds, 60° C. for 20 seconds, then 72° C. for 20 seconds, then held at 4° C.; the reaction was then purified with 0.8× volume (40 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1339] In order to anneal adapter oligonucleotides (annealed and extended along the synthetic DNA templates as in the previous step) to multimeric barcode molecules, and then to anneal and then extend extension primer(s) across the adjacent barcode region(s) of each multimeric barcode molecule, and then to ligate this extension product to the phosphorylated 5′ end of the adapter oligonucleotide annealed downstream thereof, to a PCR tube was added 10 microliters of the eluate from the previous step (containing the synthetic DNA templates along which the adapter oligonucleotides have been annealed and extended), plus 3.0 microliters of a 50.0 nanomolar solution of RNAse H-digested multimeric barcode molecules (as produced in the last step of Method 3), plus 6.0 microliters of 5× Isothermal extension/ligation buffer, plus water to final volume of 26.6 microliters. On a thermal cycler, the tube was incubated at 70° C. for 60 seconds, then slowly annealed to 60° C., then held at 60° C. for 5 minutes, then slowly annealed to 55° C., then held at 55° C. for 5 minutes, then slowly annealed to 50° C. at 0.1° C./sec, then held at 50° C. for 30 minutes, then held at 4° C. To the tube was added 0.6 microliters 10 uM US_PCR_Prm_Only_02 (SEQ ID NO: 278, an extension primer), and the reaction was incubated at 50° C. for 10 minutes, then held at 4° C. To the tube were added 0.3 microliters (0.625 U) Phusion Polymerase (NEB; 2 U/uL); 2.5 microliters (100 U) Taq DNA Ligase (NEB; 40 U/uL); and 2.5 microliters 100 millimolar DTT. The tube was then incubated at 50° C. for 5 minutes, then held at 4° C. The reaction was then purified with 0.7× volume (21 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1340] To a new PCR tube were added 25.0 microliters of the eluate, plus 10.0 microliters 5× Phusion HF buffer (NEB), plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 2.0 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO: 277; a primer that is complementary to part of the extension products produced by the above steps; serves as a primer for the primer-extension and then PCR reactions described here), plus 0.5 uL Phusion Polymerase (NEB), plus water to final volume of 49.7 microliters. Of this reaction, a volume of 5.0 microliters was added to a new PCR tube, which was then incubated for 30 seconds at 55° C., 30 seconds 60° C., and 30 seconds 72° C., then followed by 10 cycles of: 98° C. then 65° C. then 72° C. for 30 seconds each, then held at 4° C. To each tube was then added 9.0 microliters 5× Phusion buffer, plus 1.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 1.75 microliters 10 uM SynTemp_PE2_B1_Short1 (SEQ ID NO: 277), plus 1.75 microliters 10 uM US_PCR_Prm_Only_02 (SEQ ID NO: 278), plus 0.5 microliters Phusion Polymerase (NEB), plus water to final volume of 50 microliters. The PCR tube was placed on a thermal cycler and amplified for 24 cycles of: 98° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C., then purified with 1.2× volume (60 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1341] The resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis.

[1342] Method 9: Barcoding Genomic DNA Loci with Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides

[1343] This method describes a framework for barcoding targets within specific genomic loci (e.g. barcoding a number of exons within a specific gene) using multimeric barcoding reagents that contain barcoded oligonucleotides. First, a solution of Multimeric Barcode Molecules was produced by In Vitro Transcription and cDNA Synthesis (as described in Method 3). Then, solutions of multimeric barcoding reagents containing barcoded oligonucleotides were produced as described in Method 4, with a modification made such that instead of using an adapter oligonucleotide targeting a synthetic DNA template (i.e. DS_ST_05, SEQ ID NO: 273, as used in Method 4), adapter oligonucleotides targeting the specific genomic loci were included at that step. Specifically, a solution of multimeric barcoding reagents containing appropriate barcoded oligonucleotides was produced individually for each of three different human genes: BRCA1 (containing 7 adapter oligonucleotides, SEQ ID NOs 279-285), HLA-A (containing 3 adapter oligonucleotides, SEQ ID NOs 286-288), and DQB1 (containing 2 adapter oligonucleotides, SEQ ID NOs 289-290). The process of Method 4 was conducted for each of these three solutions as described above. These three solutions were then merged together, in equal volume, and diluted to a final, total concentration of all barcoded oligonucleotides of approximately 50 nanomolar.

[1344] In a PCR tube were combined 2.0 microliters 5× Phusion HF buffer (NEB) plus 1.0 microliter of 100 nanogram/microliter human genomic DNA (NA12878 from Coriell Institute), to a final volume of 9.0 microliters. In certain variant versions of this protocol, the multimeric barcoding reagents (containing barcoded oligonucleotides) were also added at this step, prior to the high-temperature 98° C. incubation. The reaction was incubated at 98° C. for 120 seconds, then held at 4° C. To the tube was added 1.0 microliters of the above 50 nanomolar solution of multimeric barcode reagents, and then the reaction was incubated for 1 hour at 55° C., then 1 hour at 50° C., then 1 hour at 45° C., then held at 4° C. (Note that for certain samples, this last annealing process was extended to occur overnight, for a total of approximately 4 hours per temperature step).

[1345] In order to add a reverse universal priming sequence to each amplicon sequence (and thus to enable subsequent amplification of the entire library at once, using just one forward and one reverse amplification primer), the reaction was diluted 1:100, and 1.0 microliter of the resulting solution was added in a new PCR tube to 20.0 microliters 5× Phusion HF buffer (NEB), plus 2.0 microliters 10 millimolar deoxynucleotide triphosphate nucleotide mix, plus 1.0 microliters of a reverse-primer mixture (equimolar concentration of SEQ ID NOs 291-303, each primer at 5 micromolar concentration), plus 1.0 uL Phusion Polymerase (NEB), plus water to final volume of 100 microliters. The reaction was incubated at 53° C. for 30 seconds, 72° C. for 45 seconds, 98° C. for 90 seconds, then 68° C. for 30 seconds, then 64° C. for 30 seconds, then 72° C. for 30 seconds; then held at 4° C. The reaction was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 30 microliters H.sub.2O, and quantitated spectrophotometrically.

[1346] The resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis.

[1347] Method 10— Sequencing the Library of Multimeric Barcode Molecules

[1348] Preparing Amplified Selected Molecules for Assessment with High-Throughput Sequencing

[1349] To a PCR tube was added 1.0 microliters of the amplified selected molecule solution, plus 1.0 microliters of 100 micromolar CS_SQ_AMP_REV1 (SEQ ID NO: 16), plus 1.0 microliters of 100 micromolar US_PCR_Prm_Only_02 (SEQ ID NO: 17), plus 10 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) plus 84.0 microliters H.sub.2O to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 3 cycles of: 95° C. for 30 seconds, then 56° C. for 30 seconds, then 72° C. for 3 minutes; then held at 4° C. The solution was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 85 microliters H.sub.2O.
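The reaction set-up above can be checked arithmetically: the listed component volumes should sum to the stated 100 microliters, and the final concentrations follow from simple dilution. The following is a minimal sketch for illustration only; the dictionary keys and the helper name `final_conc` are hypothetical and are not part of the protocol itself.

```python
# Component volumes (microliters) for the first PCR of Method 10, as listed above.
volumes_ul = {
    "amplified selected molecule solution": 1.0,
    "CS_SQ_AMP_REV1 (100 uM stock)": 1.0,
    "US_PCR_Prm_Only_02 (100 uM stock)": 1.0,
    "10x Thermopol Buffer": 10.0,
    "dNTP mix (10 mM stock)": 2.0,
    "Vent Exo-Minus Polymerase (2 U/uL)": 1.0,
    "H2O": 84.0,
}

total_ul = sum(volumes_ul.values())


def final_conc(stock, added_ul, total_ul):
    """Dilution formula C1*V1 = C2*V2; the result is in the units of `stock`."""
    return stock * added_ul / total_ul


primer_um = final_conc(100.0, 1.0, total_ul)   # each primer, micromolar
dntp_mm = final_conc(10.0, 2.0, total_ul)      # dNTP mix, millimolar
buffer_x = final_conc(10.0, 10.0, total_ul)    # buffer strength, "x"
bead_ul = 0.8 * total_ul                       # 0.8x bead volume for clean-up

print(total_ul, primer_um, dntp_mm, buffer_x, bead_ul)
```

Under these figures each primer is present at 1 micromolar, the buffer at 1×, and the 0.8× bead volume agrees with the 80 microliters stated in the text.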

[1350] This solution was then added to a new PCR tube, plus 1.0 microliters of 100 micromolar Illumina_PE1, plus 1.0 microliters of 100 micromolar Illumina_PE2, plus 10 microliters of 10× Thermopol Buffer (NEB) plus 2.0 microliter of 10 millimolar deoxynucleotide triphosphate nucleotide mix (Invitrogen), plus 1.0 microliters Vent Exo-Minus Polymerase (New England Biolabs, at 2 U/uL) to final volume of 100 microliters. The PCR tube was placed on a thermal cycler and amplified for 4 cycles of: 95° C. for 30 seconds, then 64° C. for 30 seconds, then 72° C. for 3 minutes; then 18 cycles of: 95° C. for 30 seconds, then 67° C. for 30 seconds, then 72° C. for 3 minutes; then held at 4° C. The solution was then purified with 0.8× volume (80 microliters) Ampure XP Beads (Agencourt; as per manufacturer's instructions), and eluted in 40 microliters H.sub.2O.

[1351] High-throughput Illumina sequencing was then performed on this sample using a MiSeq sequencer with paired-end, 250-cycle V2 sequencing chemistry.

[1352] Method 11— Assessment of Multimeric Nature of Barcodes Annealed and Extended Along Single Synthetic Template DNA Molecules

[1353] A library of barcoded synthetic DNA templates was created using a solution of multimeric barcoding reagents produced according to a protocol as described generally in Method 3 and Method 4, and using a solution of synthetic DNA templates as described in Method 5, and using a laboratory protocol as described in Method 6; the resulting library was then barcoded for sample identification by a PCR-based method, amplified, and sequenced by standard methods using a 150-cycle, mid-output NextSeq flowcell (Illumina), and demultiplexed informatically for further analysis. The DNA sequencing results from this method were then compared informatically with data produced from Method 10 to assess the degree of overlap between the multimeric barcoding of synthetic DNA templates and the arrangement of said barcodes on individual multimeric barcoding reagents (the results are shown in FIG. 17).

[1354] Method 12— Acquisition of a Sample Comprising Microparticles from In Vitro Embryo Culture Medium

[1355] An in vitro fertilised human embryo (for example, generated by intra-cytoplasmic sperm injection, or any other IVF technique(s) known in the art) is cultured to the blastocyst stage. At least 10 microlitres of a microdrop of culture medium (i.e. in vitro embryo culture medium, such as global total (LifeGlobal)) in which said embryo has incubated for at least 24 hours is carefully aspirated (for example, by gently moving the blastocyst to the edge of its microdrop with a pipette tip so as not to disturb the embryo during aspiration, and then carefully aspirating said at least 10 microlitres of culture medium) and then transferred to an RNAse-free and DNAse-free PCR tube. This sample comprising microparticles from said in vitro embryo culture medium may then be further processed by any method of analyzing microparticles described herein.

[1356] Results

[1357] Structure and Expected Sequence Content of Each Sequence Multimeric Barcoding

[1358] Reagent Molecule

[1359] The library of multimeric barcode molecules synthesised as described in Methods 1 to 3 was prepared for high-throughput sequencing, wherein each molecule sequenced includes a contiguous span of a specific multimeric barcode molecule (including one or more barcode sequences, and one or more associate upstream adapter sequences and/or downstream adapter sequences), all co-linear within the sequenced molecule. This library was then sequenced with paired-end 250 nucleotide reads on a MiSeq sequencer (Illumina) as described. This yielded approximately 13.5 million total molecules sequenced from the library, sequenced once from each end, for a total of approximately 27 million sequence reads.

[1360] Each forward read is expected to start with a six nucleotide sequence, corresponding to the 3′ end of the upstream adapter: TGACCT

[1361] This forward read is followed by the first barcode sequence within the molecule (expected to be 20 nt long).

[1362] This barcode is then followed by an ‘intra-barcode sequence’ (in this case being sequenced in the ‘forward’ direction), which is 82 nucleotides long, including both the downstream adapter sequence and upstream adapter sequence in series:

TABLE-US-00001 ATACCTGACTGCTCGTCAGTTGAGCGAATTCCGTATGGTGGTACACACCT ACACTACTCGGACGCTCTTCCGATCTTGACCT

[1363] Within the 250 nucleotide forward read, this will then be followed by a second barcode, another intra-barcode sequence, and then a third barcode, and then a fraction of another intra-barcode sequence.

[1364] Each reverse read is expected to start with a sequence corresponding to the downstream adapter sequence:

TABLE-US-00002 GCTCAACTGACGAGCAGTCAGGTAT

[1365] This reverse read is then followed by the first barcode coming in from the opposite end of the molecule (also 20 nucleotides long, but sequenced from the opposite strand of the molecule and thus of the inverse orientation to those sequenced by the forward read).

[1366] This barcode is then followed by the ‘intra-barcode sequence’ but in the inverse orientation (as it is on the opposite strand):

TABLE-US-00003 AGGTCAAGATCGGAAGAGCGTCCGAGTAGTGTAGGTGTGTACCACCATAC GGAATTCGCTCAACTGACGAGCAGTCAGGTAT

[1367] Likewise this 250 nucleotide reverse read will then be followed by a second barcode, another intra-barcode sequence, and then a third barcode, and then a fraction of another intra-barcode sequence.
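The expected read structure described above can be expressed as a short Python sketch. It checks that the reverse-orientation sequences are the reverse complements of the forward-orientation ones, and extracts barcodes lying between the 3′ end of the upstream adapter and the start of the downstream adapter. The adapter sequences are taken from the text; the function names, the 12-nucleotide right-hand anchor, the exact-match regular expression and the three example barcodes are assumptions made for illustration, and are not the original analysis script.

```python
import re

UPSTREAM_3P = "TGACCT"  # 3' end of the upstream adapter (from the text)
INTRA = (
    "ATACCTGACTGCTCGTCAGTTGAGCGAATTCCGTATGGTGGTACACACCT"
    "ACACTACTCGGACGCTCTTCCGATCTTGACCT"
)  # intra-barcode sequence, forward orientation
DOWNSTREAM_REV = "GCTCAACTGACGAGCAGTCAGGTAT"  # expected start of each reverse read
INTRA_REV = (
    "AGGTCAAGATCGGAAGAGCGTCCGAGTAGTGTAGGTGTGTACCACCATAC"
    "GGAATTCGCTCAACTGACGAGCAGTCAGGTAT"
)  # intra-barcode sequence, inverse orientation

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Assumption: the first 12 nt of the downstream adapter are used as the
# right-hand anchor, so that a barcode followed by only a fraction of an
# intra-barcode sequence at the end of a read can still be recovered.
ANCHOR = INTRA[:12]
BARCODE_RE = re.compile(UPSTREAM_3P + "([ACGT]+?)" + ANCHOR)


def extract_barcodes(read):
    """Return barcode sequences found between the adapter anchors, in order."""
    return [m.group(1) for m in BARCODE_RE.finditer(read)]


# Hypothetical 20-nt barcodes used only to build a mock forward read.
bc1, bc2, bc3 = "AAAACCCCGGGGTTTTACGT", "ACGTACGTACGTACGTACGT", "GGGGAAAATTTTCCCCAGCA"
mock_read = UPSTREAM_3P + bc1 + INTRA + bc2 + INTRA + bc3 + INTRA[:20]

print(len(mock_read), extract_barcodes(mock_read))
```

The mock read is 250 nucleotides long (6 + 20 + 82 + 20 + 82 + 20 + 20), which is consistent with the statement that a forward read contains three barcodes and a fraction of a third intra-barcode sequence. Exact matching is used here for simplicity; a practical script would probably need to tolerate sequencing errors in the adapter anchors.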

[1368] Sequence Extraction and Analysis

[1369] With scripting in Python, each associated pair of barcode and flanking upstream-adapter and downstream-adapter sequence was isolated, with each individual barcode sequence of each barcode molecule then isolated, and each barcode sequence that was sequenced within the same molecule being annotated as belonging to the same multimeric barcode molecule in the library of multimeric barcode molecules. A simple analysis script (Networkx; Python) was employed to determine overall multimeric barcode molecule barcode groups, by examining overlap of barcode-barcode pairs across different sequenced molecules. Several metrics were computed from these data, including barcode length, sequence content, and the size and complexity of the multimeric barcode molecules across the library of multimeric barcode molecules.

[1370] Number of Nucleotides within Each Barcode Sequence

[1371] Each individual barcode sequence from each barcode molecule, contained within each Illumina-sequenced molecule was isolated, and the total length of each such barcode was determined by counting the number of nucleotides between the upstream adapter molecule sequence, and the downstream adapter molecule sequence. The results are shown in FIG. 10.

[1372] The overwhelming majority of barcodes are 20 nucleotides long, which corresponds to five additions of our four-nucleotide-long sub-barcode molecules from our double-stranded sub-barcode library. This is thus the expected and desired result, and indicates that each ‘cycle’ of: Ligation of Sub-Barcode Library to MlyI-Cleaved Solution, PCR Amplification of the Ligated Library, Uracil Glycosylase Enzyme Digestion, and MlyI Restriction Enzyme Cleavage, was successful and able to efficiently add new four-nucleotide sub-barcode molecules at each cycle, and then was successfully able to amplify and carry these molecules forward through the protocol for continued further processing, including through the five total cycles of sub-barcode addition, to make the final, upstream-adapter-ligated libraries.

[1373] We also used this sequence analysis method to quantitate the total number of unique barcodes in total, across all sequenced multimeric barcode molecules: this amounted to 19,953,626 total unique barcodes, which is essentially identical to the 20 million barcodes that would be expected, given that we synthesised 2 million multimeric barcode molecules, each with approximately 10 individual barcode molecules.

[1374] Together, these data and this analysis thus show that the method of creating complex, combinatoric barcodes from sub-barcode sequences is effective and useful for the purpose of synthesising multimeric barcode molecules.

[1375] Total Number of Unique Barcode Molecules in Each Multimeric Barcode Molecule

[1376] FIG. 11 shows the results of the quantification of the total number of unique barcode molecules (as determined by their respective barcode sequences) in each sequenced multimeric barcode molecule. As described above, to do this we examined, in the first case, barcode sequences which were present and detected within the same individual molecules sequenced on the sequencer. We then employed an additional step of clustering barcode sequences further, wherein we employed a simple network analysis script (Networkx) which can determine links between individual barcode sequences based both upon explicit knowledge of links (wherein the barcodes are found within the same, contiguous sequenced molecule), and can also determine ‘implicit’ links, wherein two or more barcodes, which are not sequenced within the same sequenced molecule, instead both share a direct link to a common, third barcode sequence (this shared, common link thus dictating that the two first barcode sequences are in fact located on the same multimeric barcode molecule).
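The clustering step described above, in which explicit links (barcodes seen within the same sequenced molecule) and implicit links (barcodes sharing a link to a common third barcode) are combined, amounts to finding connected components of a graph whose nodes are barcode sequences. The following is a minimal sketch of that idea. It uses a small union-find structure in plain Python rather than Networkx so that it has no dependencies; the function name `cluster_barcodes` and the placeholder barcode names are hypothetical.

```python
from collections import defaultdict


def cluster_barcodes(sequenced_molecules):
    """Group barcodes into putative multimeric barcode molecules.

    `sequenced_molecules` is an iterable of lists, each list holding the
    barcode sequences observed within one sequenced molecule.
    """
    parent = {}

    def find(x):
        # Locate the representative of x's group, shortening paths as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for barcodes in sequenced_molecules:
        if not barcodes:
            continue
        first = barcodes[0]
        find(first)  # register barcodes that occur alone
        for other in barcodes[1:]:
            union(first, other)  # explicit link: same sequenced molecule

    groups = defaultdict(set)
    for barcode in list(parent):
        groups[find(barcode)].add(barcode)
    # Sort only to make the output order reproducible.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Placeholder barcode names stand in for 20-nt barcode sequences.
molecules = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
clusters = cluster_barcodes(molecules)
print(clusters)
```

In this example barcodes A and C are never observed in the same sequenced molecule, but both are linked to B, so all three are assigned to one group. With Networkx, the same grouping would be expected from building a graph with one edge per co-observed barcode pair and taking its connected components.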

[1377] This figure shows that the majority of multimeric barcode molecules sequenced within our reaction have two or more unique barcodes contained therein, thus showing that, through our Overlap-Extension PCR linking process, we are able to link together multiple barcode molecules into multimeric barcode molecules. Whilst we would expect to see more multimeric barcode molecules exhibiting closer to the expected number of barcode molecules (10), we expect that this observed effect is due to insufficiently high sequencing depth, and that with a greater number of sequenced molecules, we would be able to observe a greater fraction of the true links between individual barcode molecules. These data nonetheless suggest that the fundamental synthesis procedure we describe here is efficacious for the intended purpose.

[1378] Representative Multimeric Barcode Molecules

[1379] FIG. 12 shows representative multimeric barcode molecules that have been detected by our analysis script. In this figure, each ‘node’ is a single barcode molecule (from its associated barcode sequence), each line is a ‘direct link’ between two barcode molecules that have been sequenced at least once in the same sequenced molecule, and each cluster of nodes is an individual multimeric barcode molecule, containing both barcodes with direct links and those within implicit, indirect links as determined by our analysis script. The inset figure includes a single multimeric barcode molecule, and the sequences of its constituent barcode molecules contained therein.

[1380] This figure illustrates our multimeric barcode molecule synthesis procedure: that we are able to construct barcode molecules from sub-barcode molecule libraries, that we are able to link multiple barcode molecules with an overlap-extension PCR reaction, that we are able to isolate a quantitatively known number of individual multimeric barcode molecules, and that we are able to amplify these and subject them to downstream analysis and use.

[1381] Barcoding Synthetic DNA Templates of Known Sequence with (i) Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides, and (ii) Multimeric Barcoding Reagents and Separate Adapter Oligonucleotides

[1382] Sequence Extraction and Analysis

[1383] With scripting in Python and implemented in an Amazon Web Services (AWS) framework, for each sequence read following sample-demultiplexing, each barcode region from the given multimeric barcode reagent was isolated from its flanking upstream-adapter and downstream-adapter sequence. Likewise, each molecular sequence identifier region from the given synthetic DNA template molecule was isolated from its flanking upstream and downstream sequences. This process was repeated for each molecule in the sample library; a single filtering step was performed in which individual barcodes and molecular sequence identifiers that were present in only a single read (thus likely to represent either sequencing error or error from the enzymatic sample-preparation process) were censored from the data. For each molecular sequence identifier, the total number of unique (i.e. with different sequences) barcode regions found associated therewith within single sequence reads was quantitated. A histogram plot was then created to visualize the distribution of this number across all molecular sequence identifiers found in the library.
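The filtering and counting steps described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original script: reads are assumed to have already been reduced to (barcode, molecular sequence identifier) pairs, the filter is applied in a single pass, and the function name and example values are hypothetical.

```python
from collections import Counter


def barcodes_per_identifier(reads):
    """Count unique barcodes per molecular sequence identifier (MSI).

    `reads` is a list of (barcode, msi) tuples, one per sequence read.
    """
    barcode_reads = Counter(bc for bc, _ in reads)
    msi_reads = Counter(msi for _, msi in reads)

    # Single filtering step: censor barcodes and MSIs seen in only one read.
    kept = [
        (bc, msi)
        for bc, msi in reads
        if barcode_reads[bc] > 1 and msi_reads[msi] > 1
    ]

    per_msi = {}
    for bc, msi in kept:
        per_msi.setdefault(msi, set()).add(bc)

    n_unique = {msi: len(bcs) for msi, bcs in per_msi.items()}
    # Histogram: number of unique barcodes -> number of MSIs with that count.
    histogram = Counter(n_unique.values())
    return n_unique, histogram


example_reads = [
    ("b1", "m1"), ("b1", "m1"), ("b2", "m1"), ("b2", "m1"),
    ("b3", "m2"), ("b3", "m2"), ("b4", "m2"),
    ("b9", "m3"),
]
n_unique, histogram = barcodes_per_identifier(example_reads)
print(n_unique, dict(histogram))
```

In the example, barcode b4 and identifier m3 each occur in a single read and are censored, leaving identifier m1 with two unique barcodes and m2 with one.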

[1384] Discussion

[1385] FIG. 13 shows the results of this analysis for Method 6 (Barcoding Synthetic DNA Templates of Known Sequence with Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides). This figure makes clear that the majority of multimeric barcoding reagents are able to successfully label two or more of the tandemly-repeated copies of each molecular sequence identifier with which they are associated. A distribution from 1 to approximately 5 or 6 ‘labelling events’ is observed, indicating that there may be a degree of stochastic interactions that occur with this system, perhaps due to incomplete enzymatic reactions, or steric hindrance at barcode reagent/synthetic template interface, or other factors.

[1386] FIG. 14 shows the results of this same analysis conducted using Method 7 (Barcoding Oligonucleotides Synthetic DNA Templates of Known Sequence with Multimeric Barcode Molecules and Separate Adapter Oligonucleotides). This figure also clearly shows that the majority of multimeric barcoding reagents are able to successfully label two or more of the tandemly-repeated copies of each molecular sequence identifier with which they are associated, with a similar distribution to that observed for the previous analysis.

[1387] Together, these two figures show that this framework for multimeric molecular barcoding is an effective one, and furthermore that the framework can be configured in different methodologic ways. FIG. 13 shows results based on a method in which the framework is configured such that the multimeric barcode reagents already contain barcoded oligonucleotides, prior to their being contacted with a target (synthetic) DNA template. In contrast, FIG. 14 shows results based on an alternative method in which the adapter oligonucleotides first contact the synthetic DNA template, and then in a subsequent step the adapter oligonucleotides are barcoded through contact with a multimeric barcode reagent. Together these figures demonstrate both the multimeric barcoding ability of these reagents, and their versatility in different key laboratory protocols.

[1388] To analyse whether, and the extent to which, individual multimeric barcoding reagents successfully label two or more sub-sequences of the same synthetic DNA template, the groups of different barcodes on each individual multimeric barcoding reagent in the library (as predicted from the Networkx analysis described in the preceding paragraph and as illustrated in FIG. 12) were compared with the barcodes annealed and extended along single synthetic DNA templates (as described in Method 11). Each group of barcodes found on individual multimeric barcoding reagents was given a numeric ‘reagent identifier label’. For each synthetic DNA template molecular sequence identifier (i.e., for each individual synthetic DNA template molecule) that was represented in the sequencing data of Method 11 by two or more barcodes (i.e., wherein two or more sub-sequences of the synthetic template molecule were annealed and extended by a barcoded oligonucleotide), the corresponding ‘reagent identifier label’ was determined. For each such synthetic template molecule, the total number of multimeric barcodes coming from the same, single multimeric barcoding reagent was then calculated (i.e., the number of different sub-sequences in the synthetic template molecule that were labeled by a different barcoded oligonucleotide but from the same, single multimeric barcoding reagent was calculated). This analysis was then repeated and compared with a ‘negative control’ condition, in which the barcodes assigned to each ‘reagent identifier label’ were randomized (i.e. the same barcode sequences remain present in the data, but they no longer correspond to the actual molecular linkage of different barcode sequences across the library of multimeric barcoding reagents).
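The comparison against a randomized negative control described above can be sketched in Python as below. The sketch makes several assumptions that are not stated in the text: where a template carries barcodes from more than one reagent, the largest same-reagent count is reported; randomization is done by shuffling the reagent identifier labels among the barcodes; and all names and example values are hypothetical.

```python
import random
from collections import Counter


def same_reagent_counts(template_barcodes, barcode_to_reagent):
    """For each template, the largest number of its distinct barcodes that
    share a single 'reagent identifier label' (assumption: the maximum is used)."""
    result = {}
    for template, barcodes in template_barcodes.items():
        labels = Counter(
            barcode_to_reagent[bc]
            for bc in set(barcodes)
            if bc in barcode_to_reagent
        )
        result[template] = max(labels.values(), default=0)
    return result


def randomize_labels(barcode_to_reagent, seed=0):
    """Negative control: the same barcodes and labels remain in the data,
    but the labels are shuffled so they no longer reflect true linkage."""
    rng = random.Random(seed)
    barcodes = sorted(barcode_to_reagent)
    labels = [barcode_to_reagent[bc] for bc in barcodes]
    rng.shuffle(labels)
    return dict(zip(barcodes, labels))


barcode_to_reagent = {"b1": 1, "b2": 1, "b3": 1, "b4": 2, "b5": 2, "b6": 3}
template_barcodes = {
    "t1": ["b1", "b2", "b3"],  # three barcodes from reagent 1
    "t2": ["b4", "b5"],        # two barcodes from reagent 2
    "t3": ["b1", "b6"],        # barcodes from two different reagents
}

observed = same_reagent_counts(template_barcodes, barcode_to_reagent)
shuffled = randomize_labels(barcode_to_reagent, seed=0)
control = same_reagent_counts(template_barcodes, shuffled)
print(observed, control)
```

With a library of realistic size, the shuffled control would be expected to show almost no templates with two or more barcodes from the same reagent, which is the contrast the analysis relies on; the toy example here is too small to show that effect reliably.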

[1389] The data from this analysis is shown in FIG. 17, for both the actual experimental data and for the control data with randomized barcode assignments (note the logarithmic scale of the vertical axis). As this figure shows, though the number of unique barcoding events per target synthetic DNA template molecule is small, they overlap almost perfectly with the known barcode content of individual multimeric barcoding reagents. That is, when compared with the randomized barcode data (which contains essentially no template molecules that appear to be ‘multivalently barcoded’), the overwhelming majority (over 99.9%) of template molecules in the actual experiment that appear to be labeled by multiple barcoded oligonucleotides from the same, individual multimeric barcoding reagent, are in fact labeled multiply by the same, single reagents in solution. By contrast, if there were no non-random association between the different barcodes that labelled individual synthetic DNA templates (that is, if FIG. 17 showed no difference between the actual experimental data and the randomized data), then this would have indicated that the barcoding had not occurred in a spatially-constrained manner as directed by the multimeric barcoding reagents. However, as explained above, the data indicates convincingly that the desired barcoding reactions did occur, in which sub-sequences found on single synthetic DNA templates interacted with (and were then barcoded by) only single, individual multimeric barcoding reagents.

[1390] Barcoding Genomic DNA Loci with Multimeric Barcoding Reagents Containing Barcoded Oligonucleotides

[1391] Sequence Extraction and Analysis

[1392] As with the other analyses, scripting was composed in Python and implemented in an Amazon Web Services (AWS) framework. For each sequence read following sample-demultiplexing, each barcode region from the given multimeric barcode reagent was isolated from its flanking upstream-adapter and downstream-adapter sequence and recorded independently for further analysis. Likewise, each sequence to the 3′ end of the downstream region (representing sequence containing the barcoded oligonucleotide, and any sequences that the oligonucleotide had primed along during the experimental protocol) was isolated for further analysis. Each downstream sequence of each read was analysed for the presence of expected adapter oligonucleotide sequences (i.e. from the primers corresponding to one of the three genes to which the oligonucleotides were directed) and relevant additional downstream sequences. Each read was then recorded as being either ‘on-target’ (with sequence corresponding to one of the expected, targeted sequences) or ‘off-target’. Furthermore, for each of the targeted regions, the total number of unique multimeric barcodes (i.e. with identical but duplicate barcodes merged into a single-copy representation) was calculated. A schematic of each expected sequence read, and the constituent components thereof, is shown in FIG. 16.
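The on-target classification and per-target barcode counting described above can be sketched as follows. This is an illustration under stated assumptions only: reads are assumed to be pre-parsed into (barcode, downstream sequence) pairs, matching is by exact prefix, and the target names and 10-nucleotide sequences below are invented placeholders, not the adapter oligonucleotides of SEQ ID NOs 279-290.

```python
def classify_reads(reads, target_prefixes):
    """Classify reads as on-target or off-target and count unique barcodes.

    `reads` is an iterable of (barcode, downstream_sequence) tuples.
    `target_prefixes` maps a target name to the sequence expected at the
    start of the downstream sequence for that target.
    """
    unique_barcodes = {name: set() for name in target_prefixes}
    on_target = 0
    total = 0
    for barcode, downstream in reads:
        total += 1
        for name, prefix in target_prefixes.items():
            if downstream.startswith(prefix):
                # Duplicate barcodes collapse to one entry in the set.
                unique_barcodes[name].add(barcode)
                on_target += 1
                break
    fraction_on_target = on_target / total if total else 0.0
    counts = {name: len(bcs) for name, bcs in unique_barcodes.items()}
    return fraction_on_target, counts


# Placeholder target sequences, for illustration only.
targets = {
    "target_1": "ACGTACGTAC",
    "target_2": "TTGGCCAATT",
}
example_reads = [
    ("bcA", "ACGTACGTACGGG"),
    ("bcA", "ACGTACGTACGGG"),  # duplicate barcode on the same target
    ("bcB", "ACGTACGTACTTT"),
    ("bcC", "TTGGCCAATTAAA"),
    ("bcD", "GGGGGGGGGGGGG"),  # matches no target: off-target
]
fraction, counts = classify_reads(example_reads, targets)
print(fraction, counts)
```

Exact prefix matching is the simplest possible criterion; an analysis of real sequencing data would probably allow a small number of mismatches or use alignment to the targeted loci.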

[1393] Discussion

[1394] FIG. 15 shows the results of this analysis for this method, for four different independent samples. These four samples represent a method wherein the process of annealing the multimeric barcode reagents took place for either 3 hours, or overnight (approximately 12 hours). Further, for each of these two conditions, the method was performed either with the multimeric barcode reagents retained intact as originally synthesized, or with a modified protocol in which the barcoded oligonucleotides are first denatured away from the barcode molecules themselves (through a high-temperature melting step). Each row represents a different amplicon target as indicated, and each cell represents the total number of unique barcodes found associated with each amplicon in each of the four samples. Also listed is the total proportion of on-target reads, across all targets summed together, for each sample.

[1395] As seen in the figure, the majority of reads across all samples are on-target; however there is seen a large range in the number of unique barcode molecules observed for each amplicon target. These trends across different amplicons seem to be consistent across the different experimental conditions, and could be due to different priming (or mis-priming) efficiencies of the different oligonucleotides, or different amplification efficiencies, or different mapping efficiencies, plus potential other factors acting independently or in combination. Furthermore, it is clear that the samples that were annealed for longer have a larger number of barcodes observed, likely due to more complete overall annealing of the multimeric reagents to their cognate genomic targets. And furthermore, the samples where the barcoded oligonucleotides were first denatured from the barcode molecules show lower overall numbers of unique barcodes, perhaps owing to an avidity effect wherein fully assembled barcode molecules can more effectively anneal clusters of primers to nearby genomic targets at the same locus. In any case, taken together, this figure illustrates the capacity of multimeric reagents to label genomic DNA molecules, across a large number of molecules simultaneously, and to do so whether the barcoded oligonucleotides remain bound on the multimeric barcoding reagents or whether they have been denatured therefrom and thus potentially able to diffuse more readily in solution.

Example 2

[1396] Materials and Methods for Linking Sequences from Microparticles

[1397] All experimental steps are conducted in a contamination-controlled laboratory environment, including the use of standard physical laboratory separations (e.g. pre-PCR and post-PCR laboratories).

[1398] Protocol for Isolating a Microparticle Specimen

[1399] A standard blood sample (e.g. 5-15 mL in total) is taken from a subject, and processed with a blood fractionation method using EDTA-containing tubes to isolate the plasma fraction, using centrifugation at 800×G for 10 minutes. A cellular plasma fraction is then carefully isolated and centrifuged at 800×G for 10 minutes to pellet remaining intact cells. The supernatant is then carefully isolated for further processing. The supernatant is then centrifuged at 3000×G for 30 minutes to pellet a microparticle fraction (a high-speed centrifugation mode at 20,000×G for 30 minutes is used to pellet a higher-concentration microparticle specimen); then the resulting supernatant is carefully removed, and the pellet is resuspended in an appropriate buffer for the following processing step. An aliquot from the resuspended pellet is taken and used to quantitate the concentration of DNA in the resuspended pellet (e.g. using a standard fluorescent nucleic acid staining method such as PicoGreen, ThermoFisher Scientific). The specimen is adjusted in volume to achieve an appropriate concentration for subsequent processing steps.

[1400] Protocol for Partitioning and PCR-Amplification

[1401] Following the process of isolating a microparticle specimen as above, the pellet is resuspended in a PCR buffer comprising a full solution of 1×PCR buffer, PCR polymerase enzyme, dNTPs, and a set of primer pairs; a polymerase and PCR buffer appropriate for direct PCR is employed. This resuspending step is performed such that each 5 microliters of the resuspended solution contains approximately 0.1 picograms of DNA from the microparticle specimen itself. A panel of 5-10 primer pairs (a greater number is used for larger amplicon panels) covering one or more gene targets is designed using a multiplex PCR design algorithm (e.g. PrimerPlex; PREMIER Biosoft) to minimise cross-priming and to achieve approximately equal annealing temperatures across all primers; each amplicon length is locked between 70 and 120 nucleotides; each forward primer has a constant forward adapter sequence at its 5′ end, and each reverse primer has a constant reverse adapter sequence at its 5′ end, and the primers are included in the polymerase reaction at equimolar concentrations. The resuspended sample is then spread across a set of PCR tubes (or individual wells in a 384-well plate format) with 5.0 microliters of the reaction solution included in each tube/well; up to 384 or more individual reactions are performed as the total amount of DNA in the microparticle specimen allows; 10-15 PCR cycles are performed for subsequent barcoding with barcoded oligonucleotides; 22-28 PCR cycles are performed for subsequent barcoding with multimeric barcoding reagents.

[1402] Protocol for Barcoding with Barcoded Oligonucleotides

[1403] Following the protocol of PCR amplification as above, barcoded oligonucleotides are added to each well, with each forward barcoded oligonucleotide comprising the forward adapter sequence at its 3′ end, a forward (read 1) Illumina sequencing primer sequence on its 5′ end, and a 6-nucleotide barcode sequence between the two; a reverse primer containing a reverse (read 2) Illumina amplification sequence on its 5′ end and the reverse adapter sequence at its 3′ end is used. A different single barcoded oligonucleotide (i.e. containing a different barcode sequence) is used for each well. The PCR reaction volume is adjusted to 50 microliters to dilute the target-specific primers, and 8-12 PCR cycles are performed to append barcode sequences to the sequences within each tube/well. The amplification products from each well are purified using a SPRI cleanup/size-selection step (Agencourt Ampure XP, Beckman-Coulter Genomics), and the resulting purified products from all wells are merged into a single solution. A final PCR reaction using the full-length Illumina amplification primers (PE PCR Primer 1.0/2.0) is performed for 7-12 cycles to amplify the merged products to the appropriate concentration for loading onto an Illumina flowcell, and the resulting reaction is SPRI purified/size-selected and quantitated.

[1404] Protocol for Barcoding with Multimeric Barcoding Reagents

[1405] To append barcode sequences with multimeric barcoding reagents, following the process of PCR amplification as above, PCR amplification products from individual wells are purified with a SPRI purification step, and then resuspended in 1×PCR reaction buffer (with dNTPs) in individual wells without merging or cross-contaminating the samples from different wells. From a library of at least 10 million different multimeric barcoding reagents, an aliquot containing approximately 5 multimeric barcoding reagents is then added to each well, wherein each multimeric barcoding reagent is a contiguous multimeric barcode molecule made of 10-30 individual barcode molecules, with each barcode molecule comprising a barcode region with a different sequence from the other barcode molecules, and with a barcoded oligonucleotide annealed to each barcode molecule. Each barcoded oligonucleotide contains a forward (read 1) Illumina sequencing primer sequence on its 5′ end, and the forward adapter sequence (also contained in the forward PCR primers) at its 3′ end, with its barcode sequence within the middle section. A reverse primer containing a reverse (read 2) Illumina amplification sequence on its 5′ end and the reverse adapter sequence at its 3′ end is also included in the reaction mixture. A hot-start polymerase is used for this barcode-appending reaction. The polymerase is first activated at its activation temperature, and then 5-10 PCR cycles are performed with the annealing step performed at the forward/reverse adapter annealing temperature to extend the barcoded oligonucleotides along the PCR-amplified products, and to extend the reverse Illumina amplification sequence to these primer-extension products. The resulting products from each well are purified using a SPRI cleanup/size-selection, and the resulting purified products from all wells are merged into a single solution. A final PCR reaction using the full-length Illumina amplification primers (PE PCR Primer 1.0/2.0) is performed for 7-12 cycles to amplify the merged products to the appropriate concentration for loading onto an Illumina flowcell, and the resulting reaction is SPRI purified/size-selected and quantitated.

[1406] Protocol for Sequencing and Informatic Analysis

[1407] Following barcoding and amplification protocols, amplified samples are quantitated and sequenced on Illumina sequencers (e.g. HiSeq 2500). Prior to loading, samples are combined with sequencer-ready phiX genomic DNA libraries such that phiX molecules comprise 50-70% of the final molar fraction of the combined libraries. Combined samples are then each loaded onto one or more lanes of the flowcell at the recommended concentration for clustering. Samples are sequenced to a read depth wherein each individual barcoded sequence is sequenced on average by 5-10 reads, using paired-end 2×100 sequencing cycles. Raw sequences are then quality-trimmed and length-trimmed, constant adapter/primer sequences are trimmed away, and the genomic DNA sequences and barcode sequences from each retained sequence read are isolated informatically. Linked sequences are determined by detecting genomic DNA sequences that are appended to the same barcode sequence, or appended to different barcode sequences from the same set of barcode sequences (i.e. from the same multimeric barcoding reagent).
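The informatic linking step described above can be sketched as follows. This is a minimal illustration only, assuming that reads have already been trimmed and split into (barcode sequence, genomic sequence) pairs; the function name `link_reads`, the `barcode_to_reagent` lookup table, and the toy barcodes and fragment labels are hypothetical and are not part of the protocol.

```python
from collections import defaultdict

def link_reads(reads, barcode_to_reagent=None):
    """Group genomic sequences that share a barcode, or whose barcodes
    belong to the same multimeric barcoding reagent.

    reads: iterable of (barcode_sequence, genomic_sequence) tuples.
    barcode_to_reagent: optional dict mapping a barcode sequence to an
        identifier of the multimeric barcoding reagent it came from
        (hypothetical lookup; assumed to be known in advance).
    """
    barcode_to_reagent = barcode_to_reagent or {}
    linked = defaultdict(list)
    for barcode, genomic_seq in reads:
        # Use the reagent identifier when known, otherwise the barcode itself.
        key = barcode_to_reagent.get(barcode, barcode)
        linked[key].append(genomic_seq)
    return dict(linked)

# Toy input: two barcodes from one reagent, one unrelated barcode.
reads = [
    ("ACGTAC", "fragment_A"),
    ("ACGTAC", "fragment_B"),
    ("TTGCAA", "fragment_C"),
    ("GGATCC", "fragment_D"),
]
reagent_map = {"ACGTAC": "reagent_1", "TTGCAA": "reagent_1"}
groups = link_reads(reads, reagent_map)
```

In this sketch, fragments A, B and C end up in one linked set because their barcodes map to the same reagent, while fragment D forms its own set.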

[1408] Protocol for Barcoding Fragments of Genomic DNA using Barcoded Oligonucleotides

[1409] To isolate microparticles from whole blood, 1.0 milliliters of whole human blood (collected with K2 EDTA tubes) were added to each of two 1.5 milliliter Eppendorf DNA Lo-Bind tubes, and centrifuged in a desktop microcentrifuge for 5 minutes at 500×G; the resulting top (supernatant) layer (approximately 400 microliters from each tube) was then added to new 1.5 milliliter Eppendorf DNA Lo-Bind tubes, and again centrifuged in a desktop microcentrifuge for 5 minutes at 500×G; the resulting top (supernatant) layer (approximately 300 microliters from each tube) was then added to new 1.5 milliliter Eppendorf DNA Lo-Bind tubes, and centrifuged in a desktop microcentrifuge for 15 minutes at 3000×G; the resulting supernatant layer was fully and carefully aspirated, and the pellet in each tube was resuspended in 10 microliters Phosphate-Buffered Saline (PBS) and then the two 10 microliter resuspended samples were merged into a single 20 microliter sample (producing the sample for ‘Variant A’ of the present method).

[1410] In a related variant of the method (‘Variant C’), an aliquot of this original 20 microliter sample was transferred to a new 1.5 milliliter Eppendorf DNA Lo-Bind tube, and centrifuged for 5 minutes at 1500×G, with the resulting pellet then resuspended in PBS and aliquoted into low-concentration solutions as described below.

[1411] Microparticles within the aforementioned 20 microliter sample (and/or from the resuspended ‘Variant C’ sample) were then partitioned prior to appending barcoded oligonucleotides. To partition low numbers of microparticles per partition, the 20-microliter sample was aliquoted into solutions containing lower microparticle concentrations; 8 solutions with different concentrations were used, with the first being the original (undiluted) 20-microliter sample, and each of the subsequent 7 solutions having a 2.5-fold lower microparticle concentration (in PBS) relative to the preceding solution. A 0.5 microliter aliquot of each solution was then added to 9.5 microliters of 1.22×‘NEBNext Ultra II End Prep Reaction Buffer’ (New England Biolabs) in H2O in 200 microliter PCR tubes (Flat cap; from Axygen) and mixed gently. To permeabilise the microparticles, tubes were heated at 65 degrees Celsius for 30 minutes on a thermal cycler with a heated lid. To each tube was added 0.5 microliters ‘NEBNext Ultra II End Prep Enzyme Mix’ and the solutions were mixed gently; the solutions were incubated at 20 degrees Celsius for 30 minutes and then 65 degrees Celsius for 30 minutes on a thermal cycler.
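The dilution series described above (8 solutions, each 2.5-fold more dilute than the preceding one) can be worked through numerically. The following is a minimal sketch, assuming only the fold-dilution and number of solutions stated in the text; the function name `dilution_series` is hypothetical, and concentrations are expressed relative to the undiluted 20-microliter sample because no absolute microparticle concentration is given.

```python
def dilution_series(n_solutions=8, fold=2.5):
    """Relative microparticle concentration of each solution in the series.

    Solution 0 is the undiluted sample (relative concentration 1.0);
    each later solution is `fold` times more dilute than the one before.
    """
    return [1.0 / fold ** i for i in range(n_solutions)]

series = dilution_series()
for i, relative_conc in enumerate(series, start=1):
    # Print the solution number and its concentration relative to the original.
    print(f"solution {i}: {relative_conc:.6f} x original")
```

Under these assumptions the most dilute solution is 2.5 to the power of 7 (approximately 610-fold) less concentrated than the original sample.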

[1412] To each tube was added 5.0 microliters ‘NEBNext Ultra II Ligation Master Mix’, and 0.33 microliters 0.5× (in H2O) ‘NEBNext Ligation Enhancer’, and 0.42 microliters 0.04× (in 0.1× NEBuffer 3) ‘NEBNext Adapter’, and the solutions were mixed gently; the solutions were then incubated at 20 degrees Celsius for 15 minutes (or for 2 hours in “Variant B” of this method) on a thermal cycler with the heated lid turned off. To each tube was added 0.5 microliters ‘NEBNext USER Enzyme’, and the solutions were mixed gently; the solutions were then incubated at 20 degrees Celsius for 20 minutes at 37 degrees Celsius for 30 minutes on a thermal cycler with a heated lid set to 50 degrees Celsius, and then held at 4 degrees Celsius. Each reaction was then purified with 1.1×-volume Ampure XP SPRI beads (Agencourt; as per manufacturer's instructions) and eluted in 21.0 microliters H2O. This process of ligating ‘NEBNext Adapter’ sequences to fragments of genomic DNA from partitioned microparticles provides a process of appending a coupling sequence to said fragments (wherein the ‘NEBNext Adapter’ itself, which comprises partially double-stranded and partially single-stranded sequences, comprises said coupling sequences, wherein the process of appending coupling sequence is performed with a ligation reaction). In a subsequent step of the process, barcoded oligonucleotides are appended to fragments of genomic DNA from partitioned microparticles with an annealing and extension process (performed via a PCR reaction).

[1413] In ‘Variant B’ of this method, following the above USER enzyme step but prior to Ampure XP purification, the USER-digested samples were added to 50.0 microliters ‘NEBNext Ultra II Q5 Master Mix’, and 2.5 microliters ‘Universal PCR Primer for Illumina’, and 2.5 microliters of a specific ‘NEBNext Index Primer’ [from NEBNext Multiplex Oligos Index Primers Set 1 or Index Primers Set 2], and 28.2 microliters H2O, and the solutions were mixed gently, and then amplified by 5 cycles PCR in a thermal cycler, with each cycle being: 98 degrees Celsius for 20 seconds, and 65 degrees Celsius for 3 minutes. Each reaction was then purified with 0.95×-volume Ampure XP SPRI beads (Agencourt; as per manufacturer's instructions) and eluted in 21.0 microliters H2O.

[1414] Ampure XP-purified solutions (either following USER-digestion or following the initial PCR amplification process for ‘Variant B’ of the methods) (20.0 microliters each) were then added to 25.0 microliters ‘NEBNext Ultra II Q5 Master Mix’, and 2.5 microliters ‘Universal PCR Primer for Illumina’, and 2.5 microliters of a specific ‘NEBNext Index Primer’, and the solutions were mixed gently, and then amplified by 28 cycles of PCR (or 26 cycles for Variant B) in a thermal cycler, with each cycle being: 98 degrees Celsius for 10 seconds, and 65 degrees Celsius for 75 seconds; with a single final extension step of 75 degrees Celsius for 5 minutes. Each reaction was then purified with 0.9×-volume Ampure XP SPRI beads (Agencourt; as per manufacturer's instructions) and eluted in 25.0 microliters H2O. These steps of PCR append barcode sequences to the sequences of fragments of genomic DNA from microparticles, wherein the barcode sequences are comprised within barcoded oligonucleotides (i.e. comprised within the specific ‘NEBNext Index Primer’ employed within each PCR reaction). In each primer-binding and extension step of the PCR reactions, the barcoded oligonucleotides hybridise to coupling sequences (e.g. the sequences within the ‘NEBNext Adapter’) and then are used to prime an extension step, wherein the 3′ end of the barcoded oligonucleotide is extended to produce a sequence comprising both the barcode sequence and a sequence of a fragment of genomic DNA from a microparticle. One barcoded oligonucleotide (and thus one barcode sequence) was employed per PCR reaction, with different barcode sequences used for each of the different PCR reactions. Therefore, sequences of fragments of genomic DNA from microparticles in each partition were appended to a single barcode sequence, which links the set of sequences from the partition. The set of sequences in each of the partitions was linked by a different barcode sequence.

[1415] To create a negative-control sample, a separate 20-microliter sample of microparticles was prepared as in the first paragraph above, but then the fragments of genomic DNA therein were isolated and purified with a Qiagen DNEasy purification kit (using the spin-column and centrifugation protocol as per the Qiagen manufacturer's instructions), eluted in 50 microliters H2O, and then processed with the NEBNext End Prep, Ligation, USER, and PCR processing steps as described above. This negative-control sample was employed to analyse the sequencing signals and readouts wherein fragments of genomic DNA from a very large number of microparticles are analysed (i.e. wherein no linking of sequences from one or a small number of microparticles has been performed).

[1416] Following the above steps of centrifuging and partitioning microparticles, and then appending coupling sequences, appending barcode sequences, and PCR amplification and purification, several barcoded libraries comprising sequences from fragments of genomic DNA from microparticles were then merged and sequenced on a Mid-Output Illumina NextSeq 500 flowcell for 150 cycles performed with paired-end reads (100×50), plus a separate (forward-direction) Index Read (to determine the barcode sequences appended with the barcoded oligonucleotides). Typically, between 6 and 12 barcoded libraries (i.e. comprising one barcoded set of linked sequences per library) were merged and sequenced per flowcell; coverage of at least several million total reads was achieved per barcoded library. Sequence reads were demultiplexed according to the barcode within the index read, sequences from each barcoded partition were mapped with Bowtie2 to the reference human genome sequence (hg38), and then mapped (and de-duplicated) sequences were imported into Seqmonk (version 1.39.0) for visualisation, quantitation, and analysis. In typical representative analyses, reads were mapped into sliding windows of 500 Kb along each human chromosome and then the total number of reads across each such window was quantitated and visualised.
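The window-based quantitation described above can be illustrated with a short sketch. It assumes that mapped, de-duplicated reads are available as (chromosome, position) pairs, and it uses non-overlapping 500 Kb windows as a simplification of the sliding windows used in Seqmonk; the function name `count_reads_per_window` and the example coordinates are hypothetical and do not represent experimental data.

```python
from collections import Counter

WINDOW_SIZE = 500_000  # 500 Kb, as in the analysis described above

def count_reads_per_window(alignments, window=WINDOW_SIZE):
    """Count reads in consecutive, non-overlapping windows per chromosome.

    alignments: iterable of (chromosome, position) tuples for mapped,
        de-duplicated reads (positions are 0-based in this sketch).
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chrom, pos in alignments:
        counts[(chrom, pos // window)] += 1
    return counts

# Toy coordinates only: a small cluster on one chromosome plus one stray read.
alignments = [
    ("chr14", 20_100_000),
    ("chr14", 20_250_000),
    ("chr14", 20_499_999),
    ("chr14", 20_500_000),
    ("chr1", 1_000),
]
counts = count_reads_per_window(alignments)
```

A partition containing a focal cluster of linked reads would show a few windows with high counts, whereas an unlinked control would be expected to show counts spread across many windows.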

[1417] Key experimental results of these barcoded oligonucleotide methods are shown in FIGS. 25-29, and described in further detail here:

[1418] FIG. 25 illustrates the linkage of sequences of fragments of genomic DNA within a representative microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant A’ version of the example protocol). Shown is the density of sequence reads across all chromosomes in the human genome within 500 kilobase (Kb) sliding windows tiled across each chromosome. Two clear, self-contained clusters of reads are observed, approximately 200 Kb and 500 Kb in total span respectively. Notably, both of the two read clusters are on the same chromosome, and furthermore are from nearby portions of the same chromosome arm (on chromosome 14), thus confirming the suspicion that, indeed, multiple intramolecular chromosomal structures may be packaged into singular microparticles, whereupon fragments of genomic DNA derived therefrom circulate within the human vasculature.

[1419] FIG. 26 also illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, but as produced by a variant method of appending barcoded oligonucleotides (from the ‘Variant B’ version of the example protocol) wherein the duration of ligation is increased relative to ‘Variant A’. Shown again is the density of sequence reads across all chromosomes in the human genome, with clear clustering of reads within singular chromosomal segments (on chromosome 1 and chromosome 12 respectively). It is possible that the partition employed in this experiment comprised two different microparticles, in which case it is likely that one read cluster arose from each microparticle; alternatively, it is possible that a single microparticle contained a read cluster from each of chromosomes 1 and 12, which would thus demonstrate that inter-molecular chromosomal structures may also be packaged into singular microparticles which then circulate through the blood.

[1420] FIG. 27 illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant B’ version of the example protocol). Shown are the actual sequence reads (of the read cluster from chromosome 12 from FIG. 26) zoomed in within a large and then within a small chromosomal segment, to show the focal, high-density nature of these linked reads, and to demonstrate the fact that the read clusters comprise clear, contiguous clusters of sequences from individual chromosome molecules from single cells, even down to the level of demonstrating immediately adjacent, non-overlapping, nucleosomally-positioned fragments.

[1421] FIG. 28 illustrates the linkage of sequences of fragments of genomic DNA within a microparticle, as produced by a method of appending barcoded oligonucleotides (from the ‘Variant C’ version of the example protocol). In contrast to Variant A and Variant B, this Variant C experiment employed a lower-speed centrifugation process to isolate a different, larger population of microparticles compared with the other two variants. Shown is the density of sequence reads across all chromosomes in the human genome, from this experiment, again with clear clustering of reads observed within singular chromosomal segments. However, such segments are clearly larger in chromosomal span than in the other Variant methods (due to the larger microparticles being pelleted within Variant C compared with Variants A or B).

[1422] FIG. 29 illustrates a negative-control experiment, wherein fragments of genomic DNA are purified with a cleanup kit (Qiagen DNEasy Spin Column Kit) (i.e. therefore being unlinked) before being appended to barcoded oligonucleotides as in the ‘Variant A’ protocol. As would be expected given the input sample of unlinked reads, no clustering of reads is observed at all (rather, what reads do exist are dispersed randomly and essentially evenly throughout all chromosomal regions of the genome), validating that microparticles comprise fragments of genomic DNA from focal, contiguous genomic regions within individual chromosomes. Even with further random sampling/sub-sampling of reads from said control library, no read clusters are observed.

Example 3

[1423] Materials and Methods for Measuring Sets of Linked Signals from Target Biomolecules

[1424] Protocol for CD2 Protein Measurement and Selection

[1425] To measure CD2 protein levels on microparticles, microparticles were isolated and resuspended in phosphate buffered saline (PBS) as described above, and were then incubated with 10 uL washed CD2 Dynabeads (Invitrogen, catalogue number 11159D) for 20 minutes at 4 degrees Celsius. Following bead-sample incubation and binding, the reaction mixture was bound by a magnet and the resulting supernatant (bead-unbound) phase containing ‘CD2-negative’ microparticles was aspirated and transferred to a new tube, and the beads with bound ‘CD2-positive’ microparticles were released from the magnet and resuspended in PBS. The CD2-negative and CD2-positive fractions were then partitioned and aliquoted into low-concentration solutions as described above and then individual aliquots were barcoded and prepared for sequencing with a NEBNext sample-preparation kit as described above; a portion of the CD2-negative fraction was also then further processed for methylation and PMCA measurement as described below.

[1426] Protocol for Measurement and Enrichment of 5-Methylcytosine-Modified DNA

[1427] To measure 5-methylcytosine-modified DNA within fragments of genomic DNA within microparticles, CD2-negative microparticles were isolated as described above, and then partitioned and aliquoted as described above, and then fragments of genomic DNA from the aliquoted and partitioned microparticles were released from said microparticles by incubation at 65 degrees Celsius for 30 minutes as described above, and then the ends of the fragments of genomic DNA were end-repaired, A-tailed, ligated to adapters and then digested with USER enzyme with a NEBNext sample-preparation kit as described above, and then samples were diluted 5-fold by volume in 1× CutSmart buffer (New England Biolabs) and then digested at 37 degrees Celsius for 30 minutes with 1.0 uL HpaII enzyme (New England Biolabs), which digests unmethylated DNA at CCGG sites but which is inhibited from digesting by methylated CCGG sites, thus enriching for fragments of DNA comprising methylated CCGG sequences compared with unmethylated CCGG sequences. The resulting samples were then PCR-amplified with partition barcodes using a ‘NEBNext Ultra II Q5 Master Mix’ and ‘NEBNext Index Primers’ and then cleaned up with Ampure XP beads as described previously. Resulting barcoded and amplified samples were quantitated, pooled, and sequenced on a V2 2×25 basepair MiSeq flowcell (Illumina) such that each individual barcoded sample produced approximately 1 million total sequence reads; data was mapped with Bowtie2 (in the Galaxy cloud-based informatics suite) to the human reference sequence and analysed further in SeqMonk genomics software as described previously.

[1428] Synthesis of Barcoded Affinity Probes

[1429] To synthesise barcoded affinity probes against PMCA (Plasma membrane calcium ATPase protein), two complementary oligonucleotides were synthesised (PolyT_5AM_3dT_1 and PolyT_5AM_3dT_COMPL1, by Integrated DNA Technologies), with each comprising outer forward and reverse sequences for the NEBNext Index primers and an internal synthetic barcode sequence, and each blocked on the 3′ end with an inverted dT base, and with PolyT_5AM_3dT_1 comprising a 5′ C12 amino modifier (for activation and conjugation to an antibody). The oligonucleotides were annealed to each other using a slow primer-annealing cycle on a thermal cycler, cleaned up with 2.8× Ampure XP beads, and resuspended in H2O. Then 100 microliters of 42 micromolar purified, annealed oligonucleotide was conjugated to 100 micrograms of an affinity-purified monoclonal antibody against human PMCA protein (ab2783, Abcam) with the ThunderLink PLUS Oligo Conjugation System (Expedeon, catalogue number 425-0300) as per manufacturer's directions, with activated oligo material conjugated to activated antibody material at a 1:2 volumetric ratio, and then diluted 1:400 in PBS, and then used as a barcoded affinity probe for PMCA measurement as below.

TABLE-US-00004
PolyT_5AM_3dT_1:
/5AmMC12/TTCCCTACACGACGCTCTTCCGATCTCAGTTAGATACAACGTGACCTGAGCAGTCTTAGCGAGATCGGAAGAGCACACGTCTGAACT*C*/3InvdT/
PolyT_5AM_3dT_COMPL1:
G*A*GTTCAGACGTGTGCTCTTCCGATCTCGCTAAGACTGCTCAGGTCACGTTGTATCTAACTGAGATCGGAAGAGCGTCGTGTAGGGA*A*/3InvdT/
In the above sequences:
* = phosphorothioate bond
/5AmMC12/ = 5-prime terminal amino modifier with C12 linker
/3InvdT/ = 3-prime terminal inverted dT base
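The complementarity of the two oligonucleotides listed above can be checked informatically. The following is a minimal sketch, assuming the modification annotations (`/5AmMC12/`, `/3InvdT/` and `*`) are simply stripped before comparison; the helper names `bare_sequence` and `reverse_complement` are hypothetical, and the sequences are copied from the table above.

```python
import re

OLIGO_1 = ("/5AmMC12/TTCCCTACACGACGCTCTTCCGATCTCAGTTAGATACAACG"
           "TGACCTGAGCAGTCTTAGCGAGATCGGAAGAGCACACGTCTGAACT*C*/3InvdT/")
OLIGO_COMPL1 = ("G*A*GTTCAGACGTGTGCTCTTCCGATCTCGCTAAGACTGCTCAGGTCAC"
                "GTTGTATCTAACTGAGATCGGAAGAGCGTCGTGTAGGGA*A*/3InvdT/")

def bare_sequence(oligo):
    """Strip /.../ modification codes and * bond markers, leaving bases only."""
    without_mods = re.sub(r"/[^/]+/", "", oligo)
    return without_mods.replace("*", "").replace(" ", "").upper()

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

forward = bare_sequence(OLIGO_1)
reverse = bare_sequence(OLIGO_COMPL1)
are_complementary = reverse_complement(forward) == reverse
print(len(forward), len(reverse), are_complementary)
```

With the annotations removed, the second oligonucleotide is the reverse complement of the first over its full length, which is consistent with the annealing step described in the synthesis protocol.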

[1430] Protocol for PMCA Protein Measurement

[1431] To measure PMCA protein levels on microparticles, CD2-negative microparticles were isolated as described above, and then 20 microliters CD2-negative microparticles were incubated with 1.0 microliter of 1:400 diluted barcoded affinity probe against PMCA for 30 minutes at 4 degrees Celsius. The sample was then centrifuged at 3000×G for 15 minutes at room temperature, the supernatant was aspirated (with care taken not to disturb the pellet), and the pellet was washed with 300 microliters PBS and then again centrifuged at 3000×G for 15 minutes at room temperature, with the supernatant again aspirated (with care again taken not to disturb the pellet), and the resulting washed, barcoded affinity probe-bound microparticle sample was resuspended in 25 microliters PBS. The resulting microparticle sample was then partitioned and aliquoted into low-concentration solutions as described above and then individual aliquots were barcoded and prepared for sequencing with a NEBNext sample-preparation kit as described above. The resulting samples were then PCR-amplified with partition barcodes using a ‘NEBNext Ultra II Q5 Master Mix’ and ‘NEBNext Index Primers’ and then cleaned up with Ampure XP beads as described previously. Resulting barcoded and amplified samples were quantitated, pooled, and sequenced on a V2 2×25 basepair MiSeq flowcell (Illumina) such that each individual barcoded sample produced approximately 1 million total sequence reads; data was mapped with Bowtie2 (in the Galaxy cloud-based informatics suite) to the human reference sequence and analysed further in SeqMonk genomics software as described previously. Reads comprising the internal synthetic barcode sequences from PMCA barcoded affinity probes were detected, quantitated and analysed separately for each barcoded library.
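Detection of affinity-probe reads within each barcoded library might be sketched as below. This is an illustration under stated assumptions, not the analysis actually used: the 35-nucleotide segment lying between the adapter-derived flanks of PolyT_5AM_3dT_1 is assumed to be the internal synthetic barcode, matching is exact (no mismatch tolerance), and the function names and toy reads are hypothetical.

```python
# Assumed internal barcode: the segment of PolyT_5AM_3dT_1 between the
# adapter-derived flanking sequences (an assumption made for illustration).
PROBE_BARCODE = "CAGTTAGATACAACGTGACCTGAGCAGTCTTAGCG"

def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def is_probe_read(read, barcode=PROBE_BARCODE, min_len=20):
    """True if a short read lies entirely within the barcode on either strand."""
    read = read.upper()
    if len(read) < min_len:
        return False
    return read in barcode or read in reverse_complement(barcode)

def count_probe_reads(reads_by_library):
    """Count probe-derived reads separately for each barcoded library."""
    return {
        library: sum(is_probe_read(read) for read in reads)
        for library, reads in reads_by_library.items()
    }

# Toy 25-nucleotide reads, matching the 2x25 basepair read length above.
toy_libraries = {
    "labelled_partition": [
        "CAGTTAGATACAACGTGACCTGAGC",  # forward-strand start of the barcode
        "CGCTAAGACTGCTCAGGTCACGTTG",  # reverse-strand start of the barcode
        "ACGTACGTACGTACGTACGTACGTA",  # unrelated sequence
    ],
    "control_partition": ["ACGTACGTACGTACGTACGTACGTA"],
}
probe_counts = count_probe_reads(toy_libraries)
```

Counting per library in this way keeps the probe signal associated with the same partition barcode as the genomic DNA reads from that partition, which is what links the two measurements.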

[1432] In FIG. 33, at the top of the figure is shown the schematic of an experimental method wherein a sample of microparticles is generated and then incubated with a solution of beads, wherein the beads are conjugated to antibodies for the CD2 protein (which is found on the membrane of a subset of immune cells and on microparticles that derive therefrom). Following a process of allowing CD2-positive microparticles (i.e. microparticles with a high concentration of CD2 protein on their surface) to bind to the anti-CD2 beads, a magnet is used to collect the beads and the microparticles bound thereto (thus performing a measurement of and selection for CD2 protein comprised on the beads). The supernatant (comprising CD2-negative microparticles) and the bead-bound fraction (containing CD2-positive microparticles) are then diluted and partitioned into partitions, and the nucleic acid content (i.e. fragments of genomic DNA) comprised within each partition is appended to a partition-associated barcode, and then barcoded nucleic acids across several partitions are pooled and sequenced.

[1433] At the bottom of the figure are shown sequences of fragments of genomic DNA within two representative microparticle partitions, as produced by a method of appending barcoded oligonucleotides, and taken from the CD2-positive pool (left) and from the CD2-negative pool (right). Shown is the density of sequence reads across all chromosomes in the human genome within 2 Megabase (Mb) sliding windows tiled across each chromosome. Clear, self-contained clusters of reads are observed, of varying but large sizes, showing that measurement of a target polypeptide (CD2 in this example) from microparticles, combined with measurement of many linked fragments of genomic DNA, is achievable by these experimental methods.

[1434] In FIG. 34, at the top of the figure is shown the schematic of an experimental method wherein a sample of microparticles is generated and then incubated with a solution of beads, wherein the beads are conjugated to antibodies for the CD2 protein (which is found on the membrane of a subset of immune cells and on microparticles that derive therefrom). Following a process of allowing CD2-positive microparticles (i.e. microparticles with a high concentration of CD2 protein on their surface) to bind to the anti-CD2 beads, a magnet is used to collect the beads and the microparticles bound thereto (thus performing a measurement of and selection for CD2 protein comprised on the beads). The supernatant (comprising CD2-negative microparticles) fraction is then diluted and partitioned into partitions, and the nucleic acid content (i.e. fragments of genomic DNA) comprised within each partition is then digested with a 5-methylcytosine-sensitive restriction enzyme (HpaII, which digests at unmethylated CCGG DNA sites but which is inhibited by cytosine methylation), to thus enrich for fragments of genomic DNA which are methylated at CCGG sites (thus performing a measurement of 5-methylcytosine-modified DNA). The resulting un-digested, methylated-enriched DNA fragments are then appended to a partition-associated barcode, and then barcoded nucleic acids across several partitions are pooled and sequenced.

[1435] At the bottom left of the figure is shown sequences of fragments of genomic DNA within a representative microparticle partition, as produced by a method of appending barcoded oligonucleotides, and taken from the CD2-negative pool following depletion of unmethylated DNA fragments by HpaII digestion. Shown is the density of sequence reads across all chromosomes in the human genome within 2 Megabase (Mb) sliding windows tiled across each chromosome. At right is a plot of the percentage of sequence reads containing CCGG sequences, within 4 control (undigested) libraries and 4 HpaII-digested libraries (enriched for methylated CCGG DNA). As expected, the digested libraries exhibit a small but clear depletion of CCGG sequences fractionally within the library, which will correspond to the molecular depletion of unmethylated CCGG-containing fragments in the HpaII samples, thus showing that the methods are cumulatively able to measure polypeptides, and fragments of genomic DNA, and modified DNA nucleotides, from microparticles.
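The CCGG-content comparison plotted in the figure corresponds to a simple per-library calculation, sketched below. The function name `percent_with_ccgg` and the toy reads are hypothetical and are not experimental data; because CCGG is its own reverse complement, a single-strand search is sufficient in this sketch.

```python
def percent_with_ccgg(reads):
    """Percentage of sequence reads containing at least one CCGG site."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum("CCGG" in read.upper() for read in reads)
    return 100.0 * hits / len(reads)

# Toy reads only, to show the calculation for one library.
toy_reads = ["AACCGGTT", "ACGTACGT", "TTTTCCGG", "GGGGAAAA"]
toy_percentage = percent_with_ccgg(toy_reads)
```

Applying the same calculation to undigested control libraries and to HpaII-digested libraries gives the two groups of values compared in the plot.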

[1436] In FIG. 35, at the top of the figure is shown the schematic of an experimental method wherein a sample of microparticles is generated and then incubated with a solution of beads, wherein the beads are conjugated to antibodies for the CD2 protein (which is found on the membrane of a subset of immune cells and on microparticles that derive therefrom). Following a process of allowing CD2-positive microparticles (i.e. microparticles with a high concentration of CD2 protein on their surface) to bind to the anti-CD2 beads, a magnet is used to collect the beads and the microparticles bound thereto (thus performing a measurement of and selection for CD2 protein comprised on the beads). The supernatant (comprising CD2-negative microparticles) fraction is then incubated with a solution of barcoded affinity probes comprising an antibody against PMCA (Plasma membrane calcium ATPase) protein and a barcoded oligonucleotide. The resulting barcoded affinity probe-bound microparticles are then pelleted by a centrifugation step and washed with PBS to remove unbound barcoded affinity probes. The resulting barcoded affinity probe-bound microparticles are then resuspended in PBS and diluted and partitioned into partitions, and the nucleic acid content (i.e. fragments of genomic DNA and sequences from barcoded affinity probes) comprised within each partition is then appended to a partition-associated barcode, and then barcoded nucleic acids across several partitions are pooled and sequenced.

[1437] At the bottom left of the figure is shown sequences of fragments of genomic DNA within a representative microparticle partition, as produced by a method of appending barcoded oligonucleotides, and taken from the CD2-negative pool and then incorporating measurement of PMCA with barcoded affinity probes. Shown is the density of sequence reads across all chromosomes in the human genome within 2 Megabase (Mb) sliding windows tiled across each chromosome. At right is shown the number of sequence reads in each of 4 control samples (without barcoded affinity probe labelling) and 2 samples (i.e. microparticle partitions) following a process of labelling with PMCA-targeted barcoded affinity probes. No sequence reads from the barcoded affinity probe are found in the control samples, but large quantitative amounts of sequences from the barcoded affinity probe are observed in each of the positive samples. Cumulatively these results illustrate that the methods are able to measure multiple polypeptides (including via use of barcoded affinity probes) and fragments of genomic DNA from microparticles.

[1438] Various publications are cited herein, the disclosures of which are incorporated by reference in their entireties.


CPC classification (CHEMISTRY; METALLURGY): C12Q1/6804; C12Q2600/178; C12Q2565/543; C12Q2525/161; C12Q2563/179; C12Q2535/122; C12Q2525/155; C12Q2563/149; C12Q2600/118; C12N15/1065; C12Q1/6806; C12Q1/6809; C12Q2537/159

International classification (CHEMISTRY; METALLURGY): C12Q1/6806; C12N15/10; C12Q1/6804; C12Q1/6809
