NUCLEIC ACID SAMPLE ENRICHMENT AND SCREENING METHODS
20230193247 · 2023-06-22
Assignee
Inventors
- Clement Chu (South San Francisco, CA)
- Mark Theilmann (South San Francisco, CA, US)
- Noah Welker (South San Francisco, CA, US)
- Peter Grauman (South San Francisco, CA, US)
Cpc classification
C12N15/1065
CHEMISTRY; METALLURGY
C12Q2525/161
CHEMISTRY; METALLURGY
C12Q2537/165
CHEMISTRY; METALLURGY
C12Q2565/514
CHEMISTRY; METALLURGY
C12Q1/6806
CHEMISTRY; METALLURGY
International classification
Abstract
Described herein are methods for enriching test samples for target nucleic acid molecules for further genetic screening. Methods may comprise isolating nucleic acid from test subjects, preparing nucleic acid libraries wherein the nucleic acid molecules are tagged or barcoded to identify sample of origin, determining fragment size distribution, determining abundance of a target nucleic acid population, calculating numerical offset values to determine amount of libraries to add for fragment size selection, performing fragment size selection, and performing a diagnostic assay on a sample enriched for a target nucleic acid.
Claims
1. A method of enhancing the sensitivity and resolution of genetic diagnostic assays of pooled nucleic acid samples comprising the steps of: a. isolating and purifying nucleic acid from a plurality of test subjects to generate corresponding samples of origin; b. preparing a library for each test subject wherein the nucleic acid fragments are barcoded and wherein each library corresponds to a specific sample of origin; c. adding a first number of nucleic acid units from each sample of origin to form a first pooled test sample; d. determining the fragment size distribution within each sample of origin; e. determining the abundance of a target nucleic acid population in each sample of origin; f. calculating a unique numerical offset value for each sample of origin; g. adding a second number of nucleic acid units from each sample of origin based on the unique numerical offset value to form a second pooled test sample; and h. performing fragment size selection on the second pooled test sample and isolating the target nucleic acid population in suspension to form a third pooled test sample enriched for said target nucleic acid population, wherein said third pooled test sample is ready for diagnostic assay.
2. The method of claim 1, further comprising the step of sequencing said third pooled test sample and screening the target nucleic acid population for genetic anomalies.
3. The method of claim 1 wherein said fragment size distribution is determined by sequencing.
4. The method of claim 3 wherein said sequencing is paired-end sequencing.
5. The method of claim 1 wherein said fragment size distribution is determined by fluorescence correlation spectroscopy.
6. The method of claim 1, further comprising the step of pairing the nucleic acid fragments in the third pooled test sample with the respective sample of origin.
7. The method of claim 1, wherein said nucleic acid is genomic DNA.
8. The method of claim 1, wherein said nucleic acid is FFPE DNA.
9. The method of claim 1, wherein said nucleic acid is RNA.
10. The method of claim 1, wherein said nucleic acid is cell-free DNA.
11. The method of claim 1, wherein said nucleic acid is isolated from whole blood.
12. The method of claim 1, wherein said unique numerical offset value is calculated by dividing the abundance of the target nucleic acid population determined in step c by the first number of nucleic acid units.
13. The method of claim 12, wherein said target nucleic acid population is a fetal fraction of said cell-free DNA.
14. The method of claim 12, wherein said target nucleic acid population is the tumor fraction of said cell-free DNA.
15. The method of claim 12, wherein said target nucleic acid population are fragments of nucleic acid comprising a particular methylation signature.
16. The method of claim 15, wherein said methylation signature is hypermethylation or hypomethylation.
17. The method of claim 1, wherein said target nucleic acid population is enriched for fragments within a predetermined length range.
18. The method of claim 1, wherein said target nucleic acid population is enriched for fragments of a predetermined length.
19. The method of claim 1, wherein said target nucleic acid population is enriched for fragments comprising a particular methylation signature.
20. The method of claim 19, wherein said methylation signature is hypermethylation or hypomethylation.
21. The method of claim 1, wherein said fragment size selection is performed using gel electrophoresis.
22. The method of claim 1, wherein said first and second number of nucleic acid units is selected from the group consisting of microliters, nanograms, and moles.
23. The method of claim 1, further comprising the step of performing whole genome sequencing.
24. The method of claim 1 wherein said pooled test sample comprises between 2 and 1000 different samples.
25. A method of enhancing the sensitivity and resolution of genetic diagnostic assays of pooled nucleic acid samples comprising the steps of: a. isolating and purifying nucleic acid from a plurality of test subjects to generate corresponding samples of origin; b. preparing a library for each test subject wherein the nucleic acid fragments are barcoded and wherein each library corresponds to a specific sample of origin; c. adding a first number of nucleic acid units from each sample of origin to form a first pooled test sample; d. performing fragment size selection on the first pooled test sample and isolating the target nucleic acid population in suspension to form a second pooled test sample enriched for said target nucleic acid population; e. determining the abundance of a target nucleic acid population in each sample of origin; f. calculating a unique numerical offset value for each sample of origin; g. adding a second number of nucleic acid units from each sample of origin based on the unique numerical offset value to form a third pooled test sample enriched for said target nucleic acid population; and h. performing a second fragment size selection on the third pooled test sample and isolating the target nucleic acid population in suspension to form a fourth pooled test sample enriched for said target nucleic acid population and comprising substantially equal proportions from each said sample of origin, wherein said fourth pooled test sample is ready for diagnostic assay.
26. The method of claim 25 wherein step f is performed by sequencing.
27. The method of claim 25 wherein said sequencing is paired-end sequencing.
28. The method of claim 25 wherein step f is performed by quantitative PCR.
29. The method of claim 25 wherein step f is performed by digital PCR.
30. The method of claim 29 wherein said digital PCR is droplet digital PCR.
31. The method of claim 25, further comprising the step of sequencing said fourth pooled test sample and screening the target nucleic acid population for genetic anomalies.
32. The method of claim 25, further comprising the step of pairing the nucleic acid fragments in the fourth pooled test sample with the respective sample of origin.
33. The method of claim 25, wherein said nucleic acid is genomic DNA.
34. The method of claim 25, wherein said nucleic acid is FFPE DNA.
35. The method of claim 25, wherein said nucleic acid is RNA.
36. The method of claim 25, wherein said nucleic acid is cell-free DNA.
37. The method of claim 25, wherein said nucleic acid is isolated from whole blood.
38. The method of claim 25, wherein said unique numerical offset value is calculated by dividing the abundance of the target nucleic acid population determined in step e by the first number of nucleic acid units.
39. The method of claim 25, wherein said target nucleic acid population is a fetal fraction of said cell-free DNA.
40. The method of claim 25, wherein said target nucleic acid population is the tumor fraction of said cell-free DNA.
41. The method of claim 25, wherein said target nucleic acid population are fragments comprising a particular methylation signature.
42. The method of claim 25, wherein said methylation signature is hypermethylation or hypomethylation.
43. The method of claim 25, wherein said target nucleic acid population is enriched for nucleic acid fragments within a predetermined length range.
44. The method of claim 25, wherein said target nucleic acid population is enriched for fragments of a predetermined length.
45. The method of claim 25, wherein said target nucleic acid population is enriched for fragments comprising a particular methylation signature.
46. The method of claim 25, wherein said methylation signature is hypermethylation or hypomethylation.
47. The method of claim 25, wherein said fragment size selection is performed using gel electrophoresis.
48. The method of claim 25, wherein said first and second number of nucleic acid units is selected from the group consisting of microliters, nanograms, and moles.
49. The method of claim 25, further comprising the step of performing whole genome sequencing.
50. The method of claim 25 wherein said pooled test sample comprises between 2 and 1000 different samples.
51. A method of enhancing the sensitivity and resolution of genetic diagnostic assays comprising the steps of: a. isolating and purifying nucleic acid from at least one test subject to generate at least one sample of origin; b. preparing a nucleic acid library for said at least one test subject wherein the nucleic acid fragments are barcoded and wherein said nucleic acid library corresponds to said at least one sample of origin; c. adding a first number of nucleic acid units from said nucleic acid library to form a first test sample; d. determining the fragment size distribution within said nucleic acid library; e. calculating the abundance of a target nucleic acid population in said nucleic acid library; f. calculating a unique numerical offset value for said nucleic acid library; g. adding a second number of nucleic acid units from said nucleic acid library based on the unique numerical offset value to form a second test sample; and h. performing fragment size selection on the second test sample and isolating the target nucleic acid population in suspension to form a third test sample enriched for said target nucleic acid population, wherein said third test sample is ready for diagnostic assay.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0103] Representative embodiments of the invention are disclosed in more detail with reference to the following figures.
DETAILED DESCRIPTION
[0117] While various embodiments of the invention have been shown and described herein, it will be obvious to those of skill in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. Alternatives to the embodiments of the invention described herein may be employed. Where values are described as ranges, it will be understood that such disclosure includes the disclosure of all possible sub-ranges within such ranges, as well as specific numerical values that fall within such ranges irrespective of whether a specific numerical value or specific sub-range is expressly stated.
[0118] Unless a term is expressly defined in this patent using the sentence “[a]s used herein, the term ‘______’, generally refers to ...” or a similar sentence, there is no intent to limit the meaning of that term, either expressly or by implication, beyond its plain or ordinary meaning, and such term should not be interpreted to be limited in scope based on any statement made in any section of this patent (other than the language of the claims). To the extent that any term recited in the claims at the end of this patent is referred to in this patent in a manner consistent with a single meaning, that is done for sake of clarity only so as to not confuse the reader, and it is not intended that such claim term be limited, by implication or otherwise, to that single meaning. Finally, unless a claim element is defined by reciting the word “means” and a function without the recital of any structure, it is not intended that the scope of any claim element be interpreted based on the application of 35 U.S.C. § 112, sixth paragraph.
[0119] Unless defined otherwise, all technical and scientific terms used herein generally have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Generally, the nomenclature used herein and the laboratory procedures in cell culture, molecular genetics, organic chemistry, and nucleic acid chemistry and hybridization described below are those well-known and commonly employed in the art. The techniques and procedures are generally performed according to conventional methods in the art and various general references (see generally, Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL, 2d ed. (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., which is incorporated herein by reference), which are provided throughout this document. The nomenclature used herein and the laboratory procedures in analytical chemistry and organic synthesis described below are those well-known and commonly employed in the art.
[0120] As used herein, the singular forms “a,” “an,” and “the” include the plural reference unless the context clearly dictates otherwise.
[0121] As used herein, the terms “about” or “approximately,” generally refer to within an acceptable error range for a value as determined by those skilled in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the relevant field. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value.
[0122] As used herein, the term “subject” generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant. For example, the subject can be a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian or a human. Animals may include, but are not limited to, farm animals, sport animals, and pets. A subject can be a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., cancer) or a predisposition to the disease, and/or an individual that is in need of therapy or suspected of needing therapy. A subject can be a patient. A subject can be a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses).
[0123] As used herein, the term “genome,” generally refers to genomic information from a subject, which may be, for example, at least a portion or an entirety of a subject’s hereditary information. A genome can be encoded either in DNA or in RNA. A genome can comprise coding regions (e.g., that code for proteins) as well as non-coding regions. A genome can include the sequence of all chromosomes together in an organism. For example, the human genome ordinarily has a total of 46 chromosomes. The sequence of all of these together may constitute a human genome.
[0124] As used herein, the terms “polynucleotide”, “nucleotide”, “nucleotide sequence”, “nucleic acid” and “oligonucleotide” are used interchangeably and generally refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, intergenic DNA, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short hairpin RNA (shRNA), micro-RNA (miRNA), small nucleolar RNA, ribozymes, cDNA, FFPE DNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, adapters, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component, tag, reactive moiety, or binding partner. Polynucleotide sequences, when provided, are listed in the 5′ to 3′ direction, unless stated otherwise.
[0125] As used herein, the term “gene” generally refers to a DNA segment that is involved in producing a polypeptide and includes regions preceding and following the coding regions as well as intervening sequences (introns) between individual coding segments (exons).
[0126] As used herein, the term “base pair” or “bp” generally refers to a partnership (i.e., hydrogen bonded pairing) of adenine (A) with thymine (T), or of cytosine (C) with guanine (G) in a double stranded DNA molecule. In some embodiments, a base pair may include A paired with Uracil (U), for example, in a DNA/RNA duplex.
[0127] As used herein, the term “barcode” generally refers to a known nucleic acid sequence that allows some feature of a polynucleotide with which the barcode is associated to be identified. In some embodiments, the feature of the polynucleotide to be identified is the sample from which the polynucleotide is derived. In some embodiments, barcodes are about or at least about 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more nucleotides in length. In some embodiments, barcodes are shorter than 10, 9, 8, 7, 6, 5, or 4 nucleotides in length. In some embodiments, barcodes associated with some polynucleotides are of different lengths than barcodes associated with other polynucleotides. In general, barcodes are of sufficient length and comprise sequences that are sufficiently different to allow the identification of samples based on barcodes with which they are associated. In some embodiments, a barcode, and the sample source with which it is associated, can be identified accurately after the mutation, insertion, or deletion of one or more nucleotides in the barcode sequence, such as the mutation, insertion, or deletion of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides. In some embodiments, each barcode in a plurality of barcodes differs from every other barcode in the plurality at at least three nucleotide positions, such as at least 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotide positions. A plurality of barcodes may be represented in a pool of samples, each sample comprising polynucleotides comprising one or more barcodes that differ from the barcodes contained in the polynucleotides derived from the other samples in the pool.
Samples of polynucleotides comprising one or more barcodes can be pooled based on the barcode sequences to which they are joined, such that all four of the nucleotide bases A, G, C, and T are approximately evenly represented at one or more positions along each barcode in the pool (such as at 1, 2, 3, 4, 5, 6, 7, 8, or more positions, or all positions of the barcode). In some embodiments, the methods of the invention further comprise identifying the sample from which a target polynucleotide is derived based on a barcode sequence to which the target polynucleotide is joined. In general, a barcode comprises a nucleic acid sequence that when joined to a target polynucleotide serves as an identifier of the sample from which the target polynucleotide was derived. In some embodiments, separate amplification reactions are carried out for separate samples using amplification primers comprising at least one different barcode sequence for each sample, such that no barcode sequence is joined to the target polynucleotides of more than one sample in a pool of two or more samples. In some embodiments, amplified polynucleotides derived from different samples and comprising different barcodes are pooled before proceeding with subsequent manipulation of the polynucleotides (such as before amplification and/or sequencing on a solid support). Pools can comprise any fraction of the total constituent amplification reactions, including whole reaction volumes. Samples can be pooled evenly or unevenly. In some embodiments, target polynucleotides are pooled based on the barcodes to which they are joined. Pools may comprise polynucleotides derived from about, less than about, or more than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 25, 30, 40, 50, 75, 100, or more different samples.
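By way of non-limiting illustration, the two barcode properties described above (a minimum number of differing positions between any two barcodes, and approximately even representation of A, C, G, and T at each position across the pool) may be checked computationally. The following Python sketch is an assumption made for illustration only; the function names and the example barcode sequences are hypothetical and are not part of the disclosed methods.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: list[str]) -> int:
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def is_base_balanced(barcodes: list[str], tolerance: int = 0) -> bool:
    """True if the A, C, G and T counts differ by at most `tolerance`
    at every barcode position across the pool."""
    for position in range(len(barcodes[0])):
        counts = Counter(bc[position] for bc in barcodes)
        per_base = [counts.get(base, 0) for base in "ACGT"]
        if max(per_base) - min(per_base) > tolerance:
            return False
    return True


# Hypothetical 6-nt barcodes, one per sample (illustrative values only).
barcodes = ["ACGTAC", "CGTACG", "GTACGT", "TACGTA"]

# Every pair should differ at three or more positions.
print(min_pairwise_distance(barcodes) >= 3)
# All four bases should be evenly represented at each position.
print(is_base_balanced(barcodes))
```

A larger minimum distance allows a barcode to be identified after more sequencing errors, at the cost of fewer usable barcodes of a given length.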
[0128] As used herein, the term “sequencing,” generally refers to methods and technologies for determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides can be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA). Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences (PacBio®), Oxford Nanopore®, or Life Technologies (Ion Torrent®). Such systems may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human), as generated by the systems from a sample provided by the subject. In some examples, such systems provide sequencing reads (also “reads” herein). A read may include a string of nucleic acid bases corresponding to a sequence of a nucleic acid molecule that has been sequenced.
[0129] As used herein, the term “Next Generation Sequencing (NGS)” generally refers to sequencing methods that allow for massively parallel sequencing of clonally amplified and of single nucleic acid molecules during which a plurality, e.g., millions, of nucleic acid fragments from a single sample or from multiple different samples are sequenced in unison. Non-limiting examples of NGS include sequencing-by-synthesis, sequencing-by-ligation, real-time sequencing, and nanopore sequencing.
[0130] As used herein, the term “paired end sequencing” generally refers to a method based on high throughput sequencing that generates sequencing data from both ends of a nucleic acid molecule. The method generally involves sequencing from ends of a nucleic acid sequence toward the interior. Paired end sequencing is useful for determining the length of the segment of DNA that falls between two sequences.
[0131] As used herein, the term “whole genome sequencing” refers to determining the complete DNA sequence of the genome at one time. As used herein, a “whole genome sequence”, or WGS (also referred to in the art as a “full”, “complete”, or “entire” genome sequence), generally refers to encompassing a substantial, but not necessarily complete, genome of a subject. In the art the term “whole genome sequence” or WGS is used to refer to a nearly complete genome of the subject, such as at least 95% complete in some usages. The term “whole genome sequence” or WGS as used herein does not encompass “sequences” employed for gene-specific techniques such as single nucleotide polymorphism (SNP) genotyping, for which typically less than 0.1% of the genome is covered. The term “whole genome sequence”, or WGS as used herein does not require that the genome be aligned with any reference sequence and does not require that variants or other features be annotated.
[0132] The term “fragment size distribution” refers to any one value or a set of values that represents a length, mass, weight, or other measure of the size of molecules corresponding to a particular group (e.g. nucleic acid fragments from a particular chromosomal region). Various embodiments can use a variety of size distributions. In some embodiments, a size distribution relates to the rankings of the sizes (e.g., an average, median, or mean) of fragments of one chromosome relative to fragments of other chromosomes. In other embodiments, a size distribution can relate to a statistical value of the actual sizes of the fragments of a chromosome. In one implementation, a statistical value can include any average, mean, or median size of fragments of a chromosome. In another implementation, a statistical value can include a total length of fragments below a cutoff value, which may be divided by a total length of all fragments, or at least fragments below a larger cutoff value.
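As a non-limiting illustration of the statistical values described above, the following Python sketch computes a mean and median fragment size, and the total length of fragments below a cutoff divided by the total length of all fragments (or of fragments below a larger cutoff). The function name, parameter names, and fragment lengths are hypothetical and chosen only for illustration.

```python
from statistics import mean, median
from typing import Optional


def short_length_fraction(lengths: list[int], cutoff: int,
                          larger_cutoff: Optional[int] = None) -> float:
    """Total length of fragments shorter than `cutoff`, divided by the
    total length of all fragments, or of fragments shorter than
    `larger_cutoff` when one is given."""
    numerator = sum(n for n in lengths if n < cutoff)
    if larger_cutoff is None:
        denominator = sum(lengths)
    else:
        denominator = sum(n for n in lengths if n < larger_cutoff)
    if denominator == 0:
        raise ValueError("no fragments available for the denominator")
    return numerator / denominator


# Hypothetical fragment lengths in bp (illustrative values only).
lengths = [120, 140, 160, 170, 210]

print(mean(lengths), median(lengths))
print(short_length_fraction(lengths, cutoff=150))
print(short_length_fraction(lengths, cutoff=150, larger_cutoff=200))
```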
[0133] As used herein, the term “library” or “sequencing library” generally refers to a nucleic acid (e.g., DNA or RNA) that is processed for sequencing, e.g., using massively parallel methods, e.g., NGS. The nucleic acid may optionally be amplified to obtain a population of multiple copies of processed nucleic acid, which can be sequenced by NGS or other suitable technique.
[0134] As used herein, “fraction multiplier technology” or “FX technology” or “FX protocol” generally refers to the methods described herein to increase the yield of the target nucleic acid fraction (e.g., cffDNA) thereby increasing sensitivity for detection of anomalies, such as, for example fetal anomalies arising from copy-number changes of any size across the genome. Embodiments of methods utilizing FX protocol are described in greater detail below. In some embodiments, FX protocol leverages the reduced size of target nucleic acid molecules to increase the relative abundance of the target nucleic acid fraction. In such instances, the methods may be referred to as “fetal fraction amplification” or “FFA”.
[0135] With reference to
[0136] The following description involves performance of the methods on pooled samples; however, it should be noted that the processes described herein are equally applicable to a single sample and/or multiplexed samples. For example, multiplex PCR reactions can be employed to enrich for target nucleic acid. In one multiplexing embodiment, PCR primers can be designed to barcodes and a PCR reaction can be run to amplify sequences comprising the barcodes. In another embodiment, the original samples are combined to a desired mixture ratio to generate a first pooled test sample. In one embodiment, predetermined quantities of original samples are added to produce the first pooled test sample such that the units of each sample are substantially equivalent (e.g., 1:1:1:1, etc.), i.e., an equal mixture ratio. “Units” can be defined as any appropriate unit of measurement, such as, for example, nanograms (ng), microliters (μl), or moles (mol).
[0137] In yet another embodiment, the distribution of nucleic acid fragments bearing a specific characteristic (e.g., fragment size, molecular weight, methylation state) for the first pooled test sample is assayed. For example, in the case of cfDNA or FFPE DNA, paired-end sequencing can be used to deduce the fragment size distribution of each original sample within the first pooled test sample. Paired-end sequencing is known in the art and was originally described in Smith, M. W. et al. (1994). Genomic sequence sampling: a strategy for high resolution sequence-based physical mapping of complex genomes. Nature Genetics. 7: 40-47. Paired-end sequencing obtains information for both ends of each DNA molecule. By finding the coordinates of the two sequences on the genome through sequence alignment, one can deduce the length of the DNA fragment. A single sequencing experiment yields sequence and size information for millions to billions of DNA fragments. Sequencing can be performed by various systems currently available, such as, without limitation, a sequencing system by Illumina®, Pacific Biosciences (PacBio®), Oxford Nanopore®, or Life Technologies (Ion Torrent®). It should be understood that other suitable techniques are available to determine fragment size distribution. For example, fluorescence correlation spectroscopy may be used, as described in Jiang, J., et al. (2018). Analysis of the concentrations and size distributions of cell-free DNA in schizophrenia using fluorescence correlation spectroscopy. Translational Psychiatry. 8:104.
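The deduction of fragment length from the aligned coordinates of the two reads of a pair can be sketched as follows. This Python example is a simplified illustration under stated assumptions: both reads are assumed to align to the same reference sequence, coordinates are assumed to be 0-based with exclusive ends, and the record type `AlignedPair` and all coordinate values are hypothetical. It does not depend on any particular aligner or file format.

```python
from collections import Counter
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """Hypothetical record for one read pair after alignment."""
    sample: str        # sample of origin, as identified by barcode
    read1_start: int   # leftmost aligned position of read 1 (0-based)
    read1_end: int     # one past the rightmost aligned position of read 1
    read2_start: int   # leftmost aligned position of read 2 (0-based)
    read2_end: int     # one past the rightmost aligned position of read 2


def fragment_length(pair: AlignedPair) -> int:
    """Length spanned from the outermost start to the outermost end."""
    start = min(pair.read1_start, pair.read2_start)
    end = max(pair.read1_end, pair.read2_end)
    return end - start


def size_distribution(pairs: list[AlignedPair]) -> dict[str, Counter]:
    """Per-sample histogram mapping fragment length to fragment count."""
    histogram: dict[str, Counter] = {}
    for pair in pairs:
        histogram.setdefault(pair.sample, Counter())[fragment_length(pair)] += 1
    return histogram


# Illustrative coordinates only.
pairs = [
    AlignedPair("S1", 1000, 1050, 1116, 1166),
    AlignedPair("S1", 5116, 5166, 5000, 5050),  # read 2 aligned upstream
    AlignedPair("S2", 200, 250, 293, 343),
]
print(size_distribution(pairs))
```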
[0138] In one embodiment, once the fragment size distribution within the first pooled test sample is determined, then the sample specific relative quantities (i.e., in units of choice) of DNA fragments within a target fragment size range (e.g., 100 to 165 bp) within length bins or, alternatively, DNA fragments of specific target sizes (e.g., 165 bp) may be calculated for each original sample. Sample specific relative quantities can be calculated using amplification procedures, such as polymerase chain reaction (PCR), quantitative PCR (qPCR), droplet digital PCR (ddPCR), and isothermal amplification. In another embodiment, using the sample specific relative quantities, a numerical offset value can be calculated by determining the ratio of sample specific relative quantity (units) / total units in first pooled test sample. In yet another embodiment, the numerical offset value is used to calculate the weighted number of units (e.g., μl, ng, etc.) from the sample specific libraries to be added to a second pooled test sample for fragment size selection.
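The offset calculation described above can be sketched in Python as follows. The ratio (sample specific relative quantity divided by total units in the first pooled test sample) follows the text. How the offset is converted into a weighted number of units is not specified in the text, so the inverse-proportional weighting used here is an assumption made for illustration: it is one weighting that would yield equal target contributions from each sample. All function names and numerical values are hypothetical.

```python
def numerical_offsets(target_units: dict[str, float],
                      total_pool_units: float) -> dict[str, float]:
    """Offset per sample: sample specific relative quantity of target
    fragments divided by the total units in the first pooled test sample."""
    return {s: q / total_pool_units for s, q in target_units.items()}


def weighted_units(offsets: dict[str, float],
                   second_pool_units: float) -> dict[str, float]:
    """ASSUMPTION: units to add per sample are inversely proportional to
    the offset, scaled so that they sum to `second_pool_units`."""
    inverse = {s: 1.0 / o for s, o in offsets.items()}
    scale = second_pool_units / sum(inverse.values())
    return {s: w * scale for s, w in inverse.items()}


# Hypothetical first pool: 10 units from each of three samples (30 total),
# with the listed units of target-size fragments measured per sample.
target_units = {"S1": 2.0, "S2": 4.0, "S3": 8.0}
offsets = numerical_offsets(target_units, total_pool_units=30.0)
units = weighted_units(offsets, second_pool_units=35.0)

for sample, amount in units.items():
    # Expected target contribution = units added * target units per unit.
    expected_target = amount * target_units[sample] / 10.0
    print(sample, round(amount, 3), round(expected_target, 3))
```

Under this assumed weighting, a sample with a low abundance of target fragments contributes more units to the second pooled test sample, so that each sample contributes a similar amount of target fragments before fragment size selection.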
[0139] In one embodiment, the weighted number of units (e.g., based on mass, volume, etc.) of each sample of origin are mixed together to generate the second pooled test sample. In some embodiments, the weighted number of units are mixed such that the units are in substantially equal proportions. In some embodiments, unequal amounts (predetermined sample specific units) of one or more samples of origin are mixed together depending on the assay being performed. Once the second pooled test sample is generated, fragment size selection can be performed to select for the desired fragment lengths. In some embodiments, gel electrophoresis can be used to isolate, excise, and purify the desired nucleic acid size fraction and generate a third pooled test sample containing nucleic acid fragments within the target size range or of specific target size. In one embodiment, nucleic acid electrophoretic separation followed by the recovery of the desired fragment lengths is used. Various known electrophoretic processes may be used for this purpose, but in one embodiment, the NIMBUS Select™ workstation with Ranger Technology™ for high throughput nucleic acid size selection may be used.
[0140] It should be understood that alternative techniques for fragment selection may be used, for example, bisulfite conversion techniques followed by on-column purification for methylated DNA; methylated DNA immunoprecipitation (based on nucleic acid methylation relative to other nucleic acid); solid support capture (e.g., affinity column), such as an antibody-coated spin column; synchronous (or non-synchronous) coefficient of drag alteration sizing (SCODA); solid phase reversible immobilization sizing (e.g., using carboxylated magnetic beads); affinity chromatography processes; or combinations of PCR amplification with varied lengths of amplicons and microchip separation.
[0141] In another embodiment, following preparation of the third pooled test sample comprising a nucleic acid mixture enriched for the target nucleic acid fragments of a specific size or size range, as well as containing the predetermined proportions (equal or variable proportions) from each of the original samples, the third pooled test sample is sequenced by next generation sequencing (NGS) or the like. Once the third pooled test sample is sequenced, barcodes may be deconvoluted via available software programs on the market to pair reads to sample of origin based on barcode sequences as described in greater detail above. Following read pairing to sample of origin, the sequencing reads can then be analyzed as desired depending on the screening being performed.
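The pairing of reads to sample of origin based on barcode sequences can be illustrated with the following Python sketch. It is not a description of any particular commercially available software program; it assumes, for illustration only, that each read has already been split into a barcode and an insert sequence, and that a read is assigned to a sample only when exactly one known barcode lies within a chosen number of mismatches. All names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode: str, barcode_to_sample: dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode uniquely matches within the
    mismatch limit, or None when there is no match or an ambiguous one."""
    hits = [sample for barcode, sample in barcode_to_sample.items()
            if len(barcode) == len(read_barcode)
            and mismatches(barcode, read_barcode) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: list[tuple[str, str]],
                barcode_to_sample: dict[str, str],
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Group insert sequences into bins keyed by sample of origin."""
    bins: dict[str, list[str]] = defaultdict(list)
    for read_barcode, insert in reads:
        sample = assign_sample(read_barcode, barcode_to_sample, max_mismatches)
        bins[sample if sample is not None else "unassigned"].append(insert)
    return dict(bins)


# Illustrative barcodes and reads only: (barcode, insert sequence).
barcode_to_sample = {"ACGTAC": "S1", "CGTACG": "S2"}
reads = [
    ("ACGTAC", "TTGA"),  # exact match to S1
    ("ACGTAT", "GGCA"),  # one mismatch from the S1 barcode
    ("CGTACG", "AATC"),  # exact match to S2
    ("GGGGGG", "CCCC"),  # matches neither barcode
]
print(demultiplex(reads, barcode_to_sample))
```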
[0142] In an alternative embodiment, the numerical offset value is determined by performing fragment size selection on the first pooled sample omitting the fragment size distribution step described above via paired end sequencing, for example. In this embodiment, a predetermined amount of the first pooled test sample is used to isolate, extract, and generate a second pooled test sample containing nucleic acid fragments within the target size range or of a specific target size. In one embodiment, electrophoretic separation of nucleic acid followed by recovery of the desired lengths of nucleic acid fragments is used, as described above. In some embodiments, following recovery, the second pooled test sample is sequenced and the relative abundance of target fragments within each sample of origin may be inferred based on the number of reads assigned to a specific sample within sample specific bins. The sample specific relative quantities (i.e., in units of choice) of DNA fragments within a target fragment size range (e.g., 100 to 165 bp) within length bins or, alternatively, of DNA fragments of specific target sizes (e.g., 165 bp) may then be calculated for each original sample. Sample specific relative quantities can be calculated using amplification procedures, such as polymerase chain reaction (PCR), quantitative PCR (qPCR), droplet digital PCR (ddPCR), and isothermal amplification. In another embodiment, using the sample specific relative quantities, a numerical offset value can be calculated by determining the ratio of sample specific relative quantity (units) / total units in first pooled test sample. In yet another embodiment, the numerical offset value is used to calculate the weighted number of units (e.g., µl, ng, etc.) from the sample specific libraries to be added to a second pooled test sample for fragment size selection. In this embodiment, a numerical offset value is determined based on the relative abundances observed from the sample specific sequencing reads. Aliquots from each sample of origin, adjusted based on the numerical offset value, are pooled to generate a third pooled test sample, and the fragment size selection/sequencing steps repeated. Ideally, equal relative abundances (based on molar concentrations, molecular weights, etc.) of the target fragments are present in the third pooled test sample, which is also enriched for the target fragments. In some embodiments, a second fragment size selection on the third pooled test sample can be performed using conventional techniques and the target nucleic acid population is isolated in suspension to form a fourth pooled test sample enriched for said target nucleic acid population and comprising substantially equal proportions from each said sample of origin. In some embodiments, the third pooled test sample and/or the fourth pooled test sample is sequenced to screen the target nucleic acid population for genetic abnormalities.
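One way the read-based offset calculation could be sketched is shown below. The offset follows the ratio stated above (sample specific quantity / total), here approximated by each sample's share of target-size reads. The inverse weighting used to derive the units to add is an assumption made for illustration, as are the read counts and the function names; the description does not prescribe a specific weighting formula.

```python
def offset_values(target_reads):
    """Offset per sample: its share of target-size reads in the pooled run."""
    total = sum(target_reads.values())
    return {sample: n / total for sample, n in target_reads.items()}


def weighted_units(offsets, base_units):
    """Assumed weighting: scale each input inversely to its offset so that
    every sample is expected to contribute the same amount of target fragments."""
    even_share = 1 / len(offsets)
    return {s: base_units * even_share / o for s, o in offsets.items()}


# Hypothetical counts of target-size reads assigned to each sample of origin.
reads = {"A": 1000, "B": 3000, "C": 4000}
offsets = offset_values(reads)
units = weighted_units(offsets, base_units=10.0)
```

Under this assumption, a sample that was under-represented among the target-size reads (sample A) receives proportionally more input in the re-pooled sample.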
[0143] In one embodiment, FX protocol can be combined with whole-genome sequencing (WGS) based NIPS. For example, WGS-based NIPS (without FX protocol) has been configured to identify novel microdeletions anywhere in the genome, but its sensitivity and resolution are limited to microdeletions exceeding 7 Mb in length. Since many microdeletions span <7 Mb, increasing sensitivity for small regions across the genome could have great clinical value. The resolution limit of genome-wide copy number variant (CNV) detection is driven by the relative amount of signal in a sample (dictated by the fetal fraction (FF) of the sample and the size of the CNV) and the amount of noise present in a sample (dictated by the depth to which a sample is sequenced) (e.g., it is more challenging to detect small deletions in samples with low FF). Attempts to increase resolution by deeper sequencing provide diminishing returns and quickly yield an economically nonviable screening test. Therefore, methods to increase FF are preferable under these circumstances; hence, the impact and applicability of FX protocol.
EXAMPLES
[0144] For each example discussed below, all samples were from patients who had consented to deidentified research and received testing with Prequel NIPS. The study was granted an institutional review board (IRB) exemption by Advarra (Pro0042194).
Example 1
[0145] Plasma was separated from a 10 ml whole-blood sample using a two-step centrifugation process: the sample was first centrifuged at 1,600 x g for 10 min, and the plasma was then transferred to microcentrifuge tubes and centrifuged at 16,000 x g for 10 min. The plasma was stored at -80 °C before DNA extraction. DNA fragments were extracted from 0.6 ml cell-free plasma using the Circulating Nucleic Acid Kit (Qiagen, GE). An Ion Plus Fragment Library Kit (Life Technologies, USA) for the Ion Proton Platform was used to construct sequencing libraries for each plasma sample, and the libraries were quantified on a Qubit Fluorometer, with each sample specific library containing substantially the same concentration of total DNA. Sample specific libraries are barcoded for sample of origin identification. Each library contains different amounts of DNA within a specific size range (100 to 165 bp) or a specific target size (165 bp). As shown in Table 1, samples 1-5 (remaining samples not shown) were mixed at a 1:1 ratio (10 ng each) to generate a first pooled test sample (FPTS).
[0146] In some embodiments, an alternative extraction and library preparation protocol involves extraction of the target nucleic acid fraction (e.g., cffDNA) from plasma using silanol-coated magnetic beads (Dynabeads, ThermoFisher) to yield samples at a relatively uniform concentration and fragment size or length (e.g. 165 bp). The target nucleic acid fraction was quantified (PicoGreen, ThermoFisher) and converted into a barcoded next-generation (NGS)-competent sequencing library suitable for Illumina platform using manufacturer’s instructions. Libraries were amplified via 12 rounds of polymerase chain reaction (PCR) (KAPA HiFi HotStart PCR Kit, Roche) before magnetic bead-based PCR cleanup followed by another round of quantification.
[0147] Fragment size distribution of each sample within the FPTS was determined and the relative amount of DNA (ng) within the target size range (100-165 bp) was obtained. A 2 µl aliquot from each sample represented in the FPTS was analyzed for fragment size distribution using a Fragment Analyzer (Advanced Analytical Technologies, Ames, Iowa). With continued reference to Table 1, the proportion of DNA for each of samples 1-5 within the target size range relative to the total units added (10 ng) to the FPTS was calculated. The values obtained were used to calculate the total amounts of DNA needed from each of the five samples to have equal and predetermined DNA units (1 ng) within the target size range.
[0148] Original library samples were remixed according to the calculation at volumes that would add 1 ng of DNA within the target size range for each sample to generate a second pooled test sample (SPTS). The SPTS was subjected to fragment size selection procedures using 2% E-Gel EX CloneWell Agarose Gels (Invitrogen, Carlsbad, CA, USA) as in Qiao et al. and Liang et al. Each E-Gel contains six effective wells, and each well can run a mixed sample containing five sample specific DNA sequencing libraries. DNA within the target size range was retrieved from the bottom wells on the gel and the selected library (i.e., the third pooled test sample (TPTS)) was sequenced using an Ion Proton system (Life Technologies). Other sequencing strategies may be used, for example, the Illumina HiSeq 4000 followed by processing via a custom bioinformatics pipeline.
[0149] Other strategies for fragment size selection include electrophoresis on 2% agarose cassettes (BluePippin, Sage Science) following the manufacturer’s instructions for “range” mode. Short fragments are eluted from the gel until the desired target size of the eluted DNA is, for example, 140 nt. See
TABLE-US-00001
Sample | Total DNA units (ng) | DNA units within range (ng) | Proportion within range | Desired target units within range (ng) | Total DNA units (ng) required for sample to meet desired target
1 | 10 | 1 | 0.1 | 1 | 10
2 | 10 | 3 | 0.3 | 1 | 3.33
3 | 10 | 5 | 0.5 | 1 | 2
4 | 10 | 7 | 0.7 | 1 | 1.43
5 | 10 | 10 | 1 | 1 | 1
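The arithmetic behind the last column of Table 1 can be sketched as follows: the total DNA required from a sample is the desired in-range amount divided by that sample's in-range proportion. The function and variable names are illustrative only; the input values are those of Table 1.

```python
def required_total_ng(in_range_ng, total_ng, target_ng=1.0):
    """Total DNA (ng) to add so that target_ng falls within the size range.

    Equivalent to target_ng / (in_range_ng / total_ng).
    """
    return target_ng * total_ng / in_range_ng


# DNA within the 100-165 bp range (ng) out of 10 ng total, per Table 1.
in_range = {1: 1, 2: 3, 3: 5, 4: 7, 5: 10}
required = {s: round(required_total_ng(x, 10), 2) for s, x in in_range.items()}
# required == {1: 10.0, 2: 3.33, 3: 2.0, 4: 1.43, 5: 1.0}
```

Dividing each required mass by the measured library concentration would then give the volume to pipette for each sample.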
[0150] Barcodes were deconvoluted via available software programs on the market to pair reads to sample of origin based on barcode sequences and then reads were screened for relevant medical condition or chromosomal abnormality, e.g., fetal aneuploidy.
Example 2
[0151] To validate FX protocol analytically, plasma was extracted from 10 ml of whole blood samples (1,264 NIPS patient samples and 66 controls tested on 11 batches) using a two-step centrifugation process and plasma was transferred to microcentrifuge tubes and centrifuged at 16,000 x g for 10 min to remove residual cells and obtain cell free plasma which was stored at -80 degree C before DNA extraction. DNA fragments were extracted from 0.6 ml cell-free plasma using the Circulating Nucleic Acid Kit (Qiagen, GE). An Ion Plus Fragment Library Kit (Life Technologies, USA) for the Ion Proton Platform was used to construct sequencing libraries for each plasma sample and the libraries quantified on a Qubit Fluorometer - each sample specific library containing substantially the same concentration of total DNA. Sample specific libraries are barcoded for sample of origin identification. Each library contains different amounts of DNA within a specific size range (100 to 165 bp) or a specific target size (165 bp). Samples were mixed at a 1:1 ratio (10 ng each) to generate a first pooled test sample (FPTS).
[0152] Each patient sample was processed through two workflows: (1) standard WGS-based NIPS without FX protocol or (2) WGS-based NIPS with FX protocol. FX protocol leverages the reduced size of fetal-derived cfDNA molecules to increase the relative abundance of fetal cfDNA. The workflows were executed completely independently, each beginning with the extraction of cfDNA from replicate plasma aliquots.
[0153] Fragment size distribution of each sample within the FPTS was determined and the relative amount of DNA (ng) within the target size range (100-165 bp) was obtained. A 2 µl aliquot from each sample represented in the FPTS was analyzed for fragment size distribution using a Fragment Analyzer (Advanced Analytical Technologies, Ames, Iowa). The proportion of DNA for each sample within the target size range relative to the total units added (10 ng) to the FPTS was calculated. The values obtained were used to calculate the total amounts of DNA needed from each of the five samples to have equal and predetermined DNA units (1 ng) within the target size range. Original library samples were remixed according to the calculation at volumes that would add 1 ng of DNA within the target size range for each sample to generate a second pooled test sample (SPTS). The SPTS was subjected to fragment size selection procedures using, in this case, E-Gel CloneWell Agarose Gels (Invitrogen, Carlsbad, CA, USA). Each E-Gel contains six effective wells, and each well can run a mixed sample containing five sample specific DNA sequencing libraries. DNA within the target size range was retrieved from the bottom wells on the gel and the selected library (i.e., the third pooled test sample (TPTS)) was sequenced using an Ion Proton system (Life Technologies).
A. FX Protocol Increases Fetal Fraction (FF)
[0154] To directly measure the impact of FX protocol, samples were tested with both the standard NIPS and FX protocol, focusing particularly on the number of samples with FF > 4%. According to the American College of Medical Genetics and Genomics (ACMG), the threshold for low FF is less than 4%. As shown in
[0155] To confirm that FX protocol did not artifactually increase FF by corrupting our FF inference regression model, we verified that the density of reads from chrY in pregnancies with male fetuses rose commensurately. See
[0156] Sample-level changes in FF resulting from the FX protocol were examined to determine whether the upward shift in the overall FF distribution may obscure downward-shifting FF in a subset of samples. See
B. FX Protocol Increases NIPS Sensitivity
[0157] In the same manner that fetal fraction (FF) can be directly measured in male-fetus pregnancies from the relative NGS depth of chrX and chrY, it is possible to measure FF of aneuploid samples via the relative NGS depth on the aneuploid chromosome (FF .sub.positive /
[0158] In every positive sample tested — across common aneuploidies (e.g., sex chromosome aneuploidy (SCA)), rare autosomal aneuploidies (RAA), and microdeletions, FX protocol yielded an increase in FF (
[0159] A sample that screened negative for the 5p microdeletion with standard NIPS but positive with FX protocol (
[0160] To quantify the gain in sensitivity and specificity achievable with FX protocol, we analyzed the relationship between various clinical and technical metrics, such as z-scores, depth, incidence, and FF (see "Materials and methods"). The ROC curves (
TABLE-US-00002
Condition | Analytical Sensitivity | Analytical Specificity
Common Aneuploidies (aggregate) | 99.988% ± 0.004% | 99.968% ± 0.005%
T21 | 99.990% ± 0.005% | 99.996% ± 0.001%
T18 | 99.990% ± 0.002% | 99.996% ± 0.001%
T13 | 99.978% ± 0.005% | 99.976% ± 0.005%
RAAs (aggregate) | 99.695% ± 0.305% | 99.981% ± 0.010%
Microdeletions (aggregate) | 97.172% ± 0.054% | 99.767% ± 0.012%
DiGeorge syndrome (22q11.2) | 95.633% ± 0.071% | 99.949% ± 0.005%
[0161] In addition to assessing performance with the ROC analysis above, we also observed that all samples with a confirmed aneuploidy or microdeletion were correctly identified with FX protocol (Table 3).
TABLE-US-00003 Prequel with FFA had perfect concordance relative to orthogonally confirmed outcome. Sixty-seven samples had confirmed outcome via diagnostic procedure or assessment at birth. One confirmed negative sample was included in the analysis and contributes to the specificity calculation. 95% confidence intervals (CI) were calculated with the Jeffreys-interval approach.
Aneuploidy | # confirmed positive | FFA clinical sensitivity | FFA clinical specificity
T21 | 24 | 100% (CI: 90.2%-100%) | 100% (CI: 94.5%-100%)
T18 | 24 | 100% (CI: 90.2%-100%) | 100% (CI: 94.5%-100%)
T13 | 10 | 100% (CI: 78.3%-100%) | 100% (CI: 95.8%-100%)
Microdeletion | 2 | 100% (CI: 33.3%-100%) | 100% (CI: 96.3%-100%)
SCA | 7 | 100% (CI: 70.8%-100%) | 100% (CI: 96.0%-100%)
[0162] Moreover, the results were repeatable and reproducible within and across batches, respectively (Tables 4 and 5). Together, these experiments establish the analytical validity of FX protocol.
TABLE-US-00004 Intra-run repeatability. Six screen-positive samples (two each for T13, T18, and T21) and 50 screen-negative samples were tested in duplicate on a single flow cell. The observed percent agreement was 100%.
 | Replicate 1: Negative | Replicate 1: Positive
Replicate 2: Negative | 50 | 0
Replicate 2: Positive | 0 | 6
TABLE-US-00005 Inter-run reproducibility. Nine samples that were screen-positive for common aneuploidies (three each for T13, T18, and T21) and 66 screen-negative samples were tested in duplicate on separate flow cells. The observed percent agreement was 100%.
 | Replicate 1: Negative | Replicate 1: Positive
Replicate 2: Negative | 66 | 0
Replicate 2: Positive | 0 | 9
[0163] Finally, the number of false negative (FN) results per sample screened with standard NIPS or Prequel with FX protocol (labeled FFA here) was estimated. The false negative rate (FNR) is calculated as (1 - sensitivity), where sensitivity is the analytical sensitivity estimated from the ROC analysis. The number of FN per sample screened is the product of the FNR and the prevalence. Prevalence can vary with age and other factors; thus, prevalence values are approximate and expressed as 1 in x, where x is rounded to the nearest hundred (for common aneuploidies and RAAs) or the nearest thousand (for common microdeletions and 22q11.2). The five common microdeletions are 1p, 4p, 5p, 15q11, and 22q11.2. The FNR and the FN / sample screened are much lower when FX protocol is used to enhance the fetal fraction. See Table 6 below.
TABLE-US-00006
Condition | Approximate Prevalence | Standard NIPS: FNR | Standard NIPS: Approximate FN / sample screened | Prequel w/ FFA: FNR | Prequel w/ FFA: Approximate FN / sample screened
Common aneuploidies | 1 in 100 | 1 in 183 | 1 in 18,300 | 1 in 8,333 | 1 in 833,300
RAAs | 1 in 200 | 1 in 7 | 1 in 1,400 | 1 in 344 | 1 in 66,800
Common microdeletions | 1 in 1,000 | 1 in 4 | 1 in 4,000 | 1 in 36 | 1 in 36,000
DiGeorge Syndrome (22q11.2) | 1 in 2,000 | 1 in 3 | 1 in 6,000 | 1 in 22.9 | 1 in 45,800
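The relationship FNR = 1 - sensitivity and FN per sample screened = FNR x prevalence can be sketched as below. The inputs are the analytical sensitivities with FX protocol reported in Table 2 and the approximate prevalences in Table 6. Exact fractions are used to avoid floating-point rounding, so the unrounded results differ slightly from the rounded table entries. The helper names are illustrative only.

```python
from fractions import Fraction


def one_in(rate):
    """Express a rate as the x in '1 in x'."""
    return float(1 / rate)


def fn_per_sample(sensitivity_pct, prevalence_one_in):
    """Return (FNR, FN per sample screened) as exact fractions."""
    fnr = 1 - Fraction(sensitivity_pct) / 100  # FNR = 1 - sensitivity
    return fnr, fnr * Fraction(1, prevalence_one_in)


# Common aneuploidies: sensitivity 99.988%, prevalence about 1 in 100.
fnr_common, fn_common = fn_per_sample("99.988", 100)
# DiGeorge syndrome (22q11.2): sensitivity 95.633%, prevalence about 1 in 2,000.
fnr_dg, fn_dg = fn_per_sample("95.633", 2000)

print(round(one_in(fnr_common)), round(one_in(fn_common)))  # 8333 833333
print(round(one_in(fnr_dg), 1), round(one_in(fn_dg)))       # 22.9 45798
```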
C. FX Protocol Increases Sex-Calling Accuracy
[0164] Sex miscalls in NIPS arise from limitations that are either biological (e.g., true fetal mosaicism, vanishing twin) or technical (e.g., low FF). While the former poses inherent challenges (many sex miscalls occur at FF far greater than 4%), the latter can be mitigated by FX protocol due to its ability to increase the FF of all samples and thereby remove borderline calls.
Example 3
[0165] A goal of many next generation sequencing (NGS) based tests is to consolidate samples prior to sequencing in equal amounts. Ideally all samples would receive the exact number of reads they require to maintain test performance. In many cases, for consistency, this number of reads would be equal across all samples and the coefficient of variation (CV) for mapped reads would be 0. However, due to errors in the process (i.e. liquid handling, quantification, etc.) this is often not the case. This is even further exacerbated when pooled samples are size selected to isolate only a particular size range of nucleic acid.
[0166] In this experiment, ~120 NGS libraries were consolidated (or pooled) in equimolar concentrations, and a gel-based size selection was performed by gel electrophoresis to isolate fragments between 200 and 250 base pairs. The pooled NGS libraries were then sequenced on an Illumina sequencer, and downstream analyses were performed to determine the number of reads that mapped to the genome for every sample. Referring to
[0167] The same equimolar pool was then sequenced on a small-scale Illumina sequencing platform and paired end data was used to determine the fragment length distribution for every sample in the pool. The relative amount of DNA that was present within the 200-250bp size range was determined which was used to create “factors” that can be applied to the original quantification value, named here “in silico factors.” Using these updated quantification values, the NGS libraries were re-consolidated such that each library contained an equimolar amount of DNA within the 200-250 bp size range. This re-consolidated pool was then subjected to the same 200-250 bp gel-based size selection, Illumina sequencing, and analyses as outlined above. The distribution of mean centered mapped reads is indicated in the right-hand boxplot. Since samples were now pooled based on the number of molecules present in the 200-250bp size bin, the CV of mapped reads for this consolidated pool is much lower at 0.03.
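A minimal sketch of the "in silico factor" calculation described above is given below: the fraction of each library's fragments falling in the 200-250 bp bin is applied to its original quantification value, and pooling volumes are set inversely to the adjusted value. The fragment lengths, concentrations, and function names are hypothetical, and the population standard deviation is assumed for the CV.

```python
from statistics import mean, pstdev


def in_range_fraction(lengths, lo=200, hi=250):
    """In silico factor: fraction of fragments whose length is within [lo, hi]."""
    return sum(lo <= n <= hi for n in lengths) / len(lengths)


def cv(values):
    """Coefficient of variation (population standard deviation / mean)."""
    return pstdev(values) / mean(values)


# Hypothetical paired-end fragment lengths (bp) and original quantification (nM).
lengths = {"lib1": [180, 210, 230, 300], "lib2": [205, 215, 245, 249]}
quant_nM = {"lib1": 10.0, "lib2": 10.0}

factors = {k: in_range_fraction(v) for k, v in lengths.items()}
adjusted_nM = {k: quant_nM[k] * factors[k] for k in quant_nM}
# Relative pooling volume for an equimolar amount of in-range DNA per library.
volumes = {k: 1.0 / adjusted_nM[k] for k in adjusted_nM}
```

In this toy case lib1 has half of its fragments in range, so twice the volume of lib1 is pooled relative to lib2.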
[0168] Lower CV (i.e., tighter distribution) of mapped reads allows for lower costs and/or fewer failed samples, since many assays that utilize NGS have a minimum read threshold required for analysis. Libraries with wide distributions of mapped reads often have samples that do not receive a sufficient number of reads. A mitigation strategy for this is combining fewer samples together in a consolidated pool, but this comes with the trade-off of increased cost. This workflow allows for size selection of a pool of DNA samples while still maintaining a tight distribution of mapped reads (low CV).
[0169] Here we validated and characterized the performance of a NIPS that applies FX protocol to every sample. For 99.8% of samples tested, FF increased with FX protocol, with the average gain being 2.3-fold. Low-FF samples received the largest FF scaling, and of 2401 samples tested, 3.7% had low FF before FX protocol, but none had low FF after FX protocol. The gain in FF is molecular and not algorithmic: FX protocol distinguishes between maternal and fetal DNA, and it increases the relative proportion of fetal DNA in the sample undergoing WGS. Though NIPS showed high sensitivity and specificity for common aneuploidies across the FF spectrum without FX protocol technology, application of FX protocol increases performance for each type of aneuploidy. The gain was particularly substantial for microdeletions.
[0170] FX protocol has a dramatic impact on the performance of microdeletion screening in NIPS. For common microdeletions, the expected aggregate sensitivity increases (Table 1,
[0171] Beyond the common microdeletions, our data suggest that FX protocol will increase the resolution of gwCNV detection, enabling confident identification of microdeletions below the current limit of 7 Mb achievable with standard NIPS. Short microdeletions in samples with low FF can be challenging to detect with NIPS and limit sensitivity, but FX protocol raises the achievable sensitivity limit by reducing the frequency of low-FF samples. Notably, the 22q11.2 microdeletion, which causes DiGeorge syndrome, most commonly spans ~2-3 Mb and has an expected sensitivity of 95.6% with FX protocol. To ensure that false positives are rare, the resolution limit for novel gwCNV detection may need to be above 3 Mb, but dbVar contains more than a thousand unique pathogenic microdeletions between 3 Mb and 7 Mb in size, a number of which are associated with clinically serious phenotypes, so any gains in resolution should increase the utility of NIPS for patients and providers.
[0172] Even if two NIPS laboratories were to test the same plasma sample, the reported FF and sensitivity for aneuploidy may differ due to variations in the laboratories’ respective molecular and computational protocols. For instance, based on differing methods of aligning, filtering, counting, and analyzing NGS reads, a laboratory reporting 8% FF could have higher aneuploidy sensitivity than a laboratory reporting 10% FF. These differences complicate interlab comparisons of NIPS performance, especially since laboratories demonstrate performance on different sample sets and with different study designs (e.g., clinical experience study vs. analytical validation study). As such, it can be difficult to make conclusive statements about relative NIPS performance. However, here we have demonstrated an unequivocal NIPS performance gain: two protocols (standard NIPS and FX protocol) were compared on a single set of samples within a single laboratory using a single aneuploidy-calling algorithm. FF increased 2.3-fold on average, and this FF increase resulted from a higher frequency of fetal-derived NGS reads. Beyond showing evidence for a relative gain in performance, the ROC analysis we performed yields an estimate of analytical sensitivity and specificity in an unbiased cohort reflective of a large population of clinical samples.
[0173] The FX protocol strategy described herein increases the FF of a sample at the molecular level via size selection upstream of sequencing, yet it is also possible to increase FF via algorithmic size selection downstream of sequencing. Specifically, the bioinformatics pipeline could calculate each fragment’s length based on the respective mapping positions of its paired-end reads and upweight shorter fragments in the analysis. However, the disadvantage of this bioinformatic approach is that substantial resources would still be consumed by sequencing longer fragments—likely to be maternal-derived—that contribute little to fetal aneuploidy detection. By contrast, when performing molecular size selection upstream of sequencing, all of the sequenced fragments have elevated likelihood of being fetal-derived.
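For contrast with the molecular approach, the algorithmic (downstream) size selection mentioned above could be sketched as follows: each fragment's length is derived from the mapping positions of its paired-end reads, and shorter fragments receive a larger weight when counting depth per genomic bin. The 150 bp cutoff, the weight of 2.0, the 50 kb bin size, and the coordinates are hypothetical values chosen only for illustration.

```python
def fragment_length(left_start, right_end):
    """Fragment length from the leftmost and rightmost mapped bases
    of a read pair (1-based, inclusive coordinates)."""
    return right_end - left_start + 1


def weight(length, cutoff=150, short_weight=2.0):
    """Assumed scheme: upweight fragments at or below the cutoff length."""
    return short_weight if length <= cutoff else 1.0


def weighted_bin_counts(fragments, bin_size=50_000):
    """Sum fragment weights per (chromosome, bin index)."""
    counts = {}
    for chrom, start, end in fragments:
        key = (chrom, start // bin_size)
        counts[key] = counts.get(key, 0.0) + weight(fragment_length(start, end))
    return counts


# Hypothetical fragments: (chromosome, leftmost base, rightmost base).
fragments = [("chr21", 1000, 1139), ("chr21", 2000, 2179), ("chr21", 60000, 60149)]
counts = weighted_bin_counts(fragments)
```

As the paragraph above notes, every one of these fragments still had to be sequenced, including the longer ones that end up with the lower weight.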
[0174] Although the foregoing invention has been described in some detail by way of illustration and examples for purposes of clarity of understanding, it will be apparent to those skilled in the art that certain changes and modifications may be practiced without departing from the spirit and scope of the invention. Therefore, the description should not be construed as limiting the scope of the invention.