Solid state nanopores aided by machine learning for identification and quantification of heparins and glycosaminoglycans
11325987 · 2022-05-10
Assignee
Inventors
- Peiming Zhang (Gilbert, AZ)
- Xu Wang (Tempe, AZ, US)
- Jong One Im (Tempe, AZ, US)
- Stuart Lindsay (Phoenix, AZ)
Cpc classification
C08B37/0075
CHEMISTRY; METALLURGY
G16B40/10
PHYSICS
International classification
C08B37/00
CHEMISTRY; METALLURGY
G01N33/50
PHYSICS
Abstract
The present disclosure provides a method for identifying and quantifying sulfated glycosaminoglycans, including for example heparin, by passing a sample through nanopores. The glycosaminoglycans sample is measured in microliter quantities, at nanomolar concentrations with detection of impurities below 0.5%, and a dynamic range over five decades of magnitude with a trained machine learning algorithm.
Claims
1. A method for characterizing the purity of glycosaminoglycans, comprising: (a) passing one or more calibration samples through a first silicon nitride nanopore while recording a translocation current signal; (b) passing a negatively charged glycosaminoglycan sample to be characterized through the first silicon nitride nanopore or a second silicon nitride nanopore while recording a translocation current signal; and (c) using a machine learning algorithm to determine the purity of the negatively charged glycosaminoglycan sample to be characterized from the translocation current signals; wherein the negatively charged glycosaminoglycan molecule translocates through a silicon nitride nanopore under a voltage bias and wherein the voltage bias causes a transient blockade of ionic current and generates a current spike.
2. The method of claim 1 wherein the machine learning algorithm extracts data by Fourier transform (FFT) and cepstrum transform of the nanopore data recorded in the time domain to index individual current spikes.
3. The method of claim 1 wherein the negatively charged glycosaminoglycan sample is passed through more than one nanopore.
4. The method of claim 2 wherein the nanopore data is a sum of molecular bumping and translocating events.
5. The method of claim 3, wherein the more than one nanopore have different shapes.
6. The method of claim 3, wherein the more than one nanopore have different sizes.
Description
DEFINITIONS
(26) “Glycosaminoglycans” (GAGs) or mucopolysaccharides are long unbranched polysaccharides consisting of a repeating disaccharide unit. The repeating unit (except for keratan) consists of an amino sugar (N-acetylglucosamine or N-acetylgalactosamine) along with a uronic sugar (glucuronic acid or iduronic acid) or galactose. Glycosaminoglycans are highly polar and attract water.
(27) A “calibration sample” is a reference sample of known identity and concentration that is used for analysis.
(28) A “nanopore” is an orifice with nanometer diameter and depth in a solid material. The nanopore shape may be irregular and change according to conditions and use, or the nanopore may be fixed in dimensions and may be a regular shape such as circular. One or more nanopores may be assembled in a device to measure electrical signals such as current and voltage.
(29) “Machine learning” is a statistical technique to iteratively refine and improve models by which raw data can be classified and used to make predictions on data.
(30) A “Support Vector Machine” or “SVM” refers to a supervised learning model to separate different classes in a hyperdimensional space and is a type of machine learning algorithm, used here as a tool of analyzing data from single molecule detection.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
(31) The invention provides a nanopore and SVM (support vector machine) method to identify, quantify, and characterize heparins and chondroitin sulfate, which represent a class of polysaccharides that pose great challenges to current analytical techniques for their separation and identification due to their varied compositions, charges, and polydispersity. Chondroitin sulfate is a sulfated glycosaminoglycan (GAG) composed of a chain of alternating sugars (N-acetylgalactosamine and glucuronic acid) and is usually found attached to proteins as part of a proteoglycan. A chondroitin chain can have over 100 individual sugars, each of which can be sulfated in variable positions and quantities. Chondroitin sulfate is an important structural component of cartilage.
(32) The solid-state nanopore is one of the simplest single-molecule sensors, and SVM is a machine learning algorithm. By combining the two, heparins and chondroitin sulfate can be identified with high accuracy (>90%) and quantified with high accuracy. Chondroitin sulfate can be identified in a mixture with heparin at a level as low as 0.8% (w/w). The data indicate that the nanopore/SVM method has the potential to identify an impurity present at as low as about 0.5 to 0.05% (molar ratio). In addition, the data show that the nanopore has a limit of detection of about 1.0 nanomolar (nM) and a dynamic range of 5 decades of magnitude. The nanopore/SVM technique also distinguished between unfractionated heparin (UFH) and enoxaparin with an accuracy of about 94% on average. Using a reference sample to calibrate nanopores, heparin is quantified using different nanopores with reasonable accuracy, achieving nanomolar sensitivity and a 5-Log dynamic range, demonstrating that the nanopore/SVM technique can be used to monitor heparins and identify GAGs. These results show that the nanopore technique powered by machine learning can be a simple and cheap tool for monitoring heparins and other glycosaminoglycans (GAGs) from lot to lot with reference to standard samples that are fully characterized by NMR, mass spectrometry, and other analytic methods.
(33) The present invention employs a silicon nitride nanopore for differentiation between GAGs, such as two sulfated GAGs: heparin and oversulfated chondroitin sulfate (OSCS). Both heparin and OSCS in a mixture can be qualitatively identified either by current blockade magnitudes or by blockade durations, despite considerable overlap of these parameter distributions in a 2D scatter plot. Nano-electronic technologies are useful for the identification and sequencing of carbohydrates. Solid-state nanopores can be used to distinguish heparins from chondroitin sulfate on a single-molecule basis with the aid of machine learning, and the frequency of translocation signals can be used for heparin quantitation over a wide dynamic range. As illustrated in
(34) Identification of heparins and related GAGs:
(35) Heparin, a member of a group of sulfated glycosaminoglycans (GAGs) has the repeating structure:
(36) ##STR00003##
(37) Heparin (CAS Reg. No. 9005-49-6) has a molecular weight of about 12,000-15,000 and is a naturally occurring anticoagulant produced by basophils and mast cells. In therapeutic doses, it acts as an anticoagulant, preventing the formation of clots and extension of existing clots within the blood.
(38) Heparins were identified using a nanopore device (
(39) Nanopore/SVM Identification of HP.sub.dp20 and CS.sub.dp20.
(40) Monodisperse GAG samples HP.sub.dp20 and CS.sub.dp20 were characterized by size exclusion (
(41)
(42) TABLE-US-00001 TABLE 1 Statistical parameters derived from curve fitting of raw data in FIGS. 5D and 5E
Sample | Run | Dwell Time Median (ms) | Dwell Time Mean (ms) | Blockade Ratio Mean | Blockade Ratio σ
HP.sub.dp20 | Run-1 | 0.083 | 0.092 | 0.120 | 0.027
HP.sub.dp20 | Run-2 | 0.087 | 0.098 | 0.118 | 0.027
CS.sub.dp20 | Run-1 | 0.080 | 0.092 | 0.105 | 0.019
CS.sub.dp20 | Run-2 | 0.073 | 0.081 | 0.111 | 0.028
Note: (1) all adj. R.sup.2 >95%; (2) all fitting errors for median <±1.0% and for mean <±1.0%; (3) all fitting errors for Peak <±0.1% and for Width <±0.2%. ms = milliseconds
(43) Machine learning, particularly SVM, was used to analyze the translocation current signals of the nanopore data. SVM is a method that requires many independent parameters (or features) to classify data in a hyperdimensional space. A plethora of features were extracted by Fourier transform (FFT) and cepstrum transform of the nanopore data recorded in the time domain to index individual spikes (
(44) TABLE-US-00002 TABLE 2 Accuracy of SVM calling individual spikes
SVM | Training data | No. Features | Training accuracy | Analyte | Remaining 50% data: HP.sub.dp20 | Remaining 50% data: CS.sub.dp20 | Untrained data set | Untrained data: HP.sub.dp20 | Untrained data: CS.sub.dp20
SVM-2 | 50% of Run-1 | 11 | 100 | HP.sub.dp20 | 93.2 | 7.9 | Run-2 | 94.2 | 4.9
SVM-2 | 50% of Run-1 | 11 | 100 | CS.sub.dp20 | 6.8 | 92.1 | Run-2 | 5.8 | 95.1
SVM-4 | 50% of Run-2 | 11 | 100 | HP.sub.dp20 | 96.2 | 6.0 | Run-1 | 93.7 | 8.8
SVM-4 | 50% of Run-2 | 11 | 100 | CS.sub.dp20 | 3.8 | 94.0 | Run-1 | 6.3 | 91.2
(45) Detection of CS.sub.dp20 in a binary mixture.
(46) The nanopore/SVM method was tested for its potential use in the detection of impurities in a heparin product. CS.sub.dp20 was mixed with HP.sub.dp20 at molar percentages of 1%, 5%, 10%, 20%, and 50%, respectively. The nanopore measurement was carried out in the same way as described above, following a sequence of CS.sub.dp20, HP.sub.dp20, and the mixture. Here, the pure CS.sub.dp20 and HP.sub.dp20 samples were used as standards to produce the reference data for training the SVM. The nanopore was easily blocked by these mixtures, presumably during the translocation process, so the measurements were conducted using multiple nanopores. As shown in Table 3, six nanopores were used in this study, and each mixture was measured at least twice, either by the same nanopore or by different nanopores.
(47) TABLE-US-00003 TABLE 3 CS.sub.dp20 molar percentages in mixtures determined by the nanopore/SVM method
Columns 1% to 50%: molar percentage of CS.sub.dp20 in HP.sub.dp20; entries: percentage (%) of CS.sub.dp20 determined by SVM calling
Nanopore | SVM score | 1% | 5% | 10% | 20% | 50%
Pore-1 | 98.3 | | | | | 46.1
Pore-1.sup.a | 100 | | | | | 57.2
Pore-2 | 93.3 | 1.4 | | 11.1 | |
Pore-3 | 94.0 | 3.0 | 7.4 | | |
Pore-4 | 84.1 | | | | 23.2 |
Pore-5 | 83.3 | 0.5 | 3.9 | | |
Pore-6 | 80.8 | | | 9.3 | 21.9 |
Average ± SEM.sup.b | | 1.6 ± 0.7 | 5.7 ± 1.8 | 10.2 ± 0.9 | 22.6 ± 0.7 | 51.7 ± 5.6
CV (%).sup.c | | 77.5 | 43.8 | 12.5 | 4.1 | 15.2
Error rate (%).sup.d | | 60.0 | 14.0 | 2.0 | 13.0 | 3.4
.sup.a a repeat of the measurement in Pore-1; .sup.b SEM: standard error of mean; .sup.c CV: coefficient of variation; .sup.d Error rate = |averaged called percentage − molar percentage in sample|/molar percentage in sample.
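As a minimal sketch, the summary rows of Table 3 can be recomputed from the individual calls. The function name `summarize_calls` is hypothetical, and the example assumes that 1.4, 3.0, and 0.5 are the three calls for the 1% mixture, an assignment inferred from the tabulated average.

```python
import math
import statistics


def summarize_calls(called_percentages, true_percentage):
    """Return mean, SEM, CV (%) and error rate (%) for repeated SVM calls."""
    mean = statistics.mean(called_percentages)
    sd = statistics.stdev(called_percentages)  # sample standard deviation
    sem = sd / math.sqrt(len(called_percentages))
    cv = 100.0 * sd / mean
    error_rate = 100.0 * abs(mean - true_percentage) / true_percentage
    return mean, sem, cv, error_rate


# Assumed calls for the 1% CS_dp20 mixture (see lead-in).
mean, sem, cv, err = summarize_calls([1.4, 3.0, 0.5], 1.0)
print(f"{mean:.1f} ± {sem:.1f}, CV {cv:.1f}%, error {err:.1f}%")
```

The mean, SEM, and CV agree with the tabulated 1.6 ± 0.7 and 77.5%. The error rate from the unrounded mean is about 63%, whereas the table lists 60.0%, which suggests the tabulated value was computed from the rounded average of 1.6.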
(48) Following the nanopore measurement, SVMs were trained with data from the reference samples in the same manner as described above and were then used to call HP.sub.dp20 and CS.sub.dp20 in the mixtures from their nanopore data (Table 4). Table 3 lists the percentages of CS.sub.dp20 in the different mixtures determined by the best-scored SVM. Although the measured percentage of CS.sub.dp20 varies somewhat from nanopore to nanopore, which may be attributed to variations in the geometry of the nanopores, the average is close to the actual percentage in the mixture. A plot of the average called percentages against the molar percent of CS.sub.dp20 in the mixtures fits a linear function (
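(49) TABLE-US-00004 TABLE 4 SVM scores of repeat measurements with same nanopore
Columns 1% to 50%: molar percentage of CS.sub.dp20 in HP.sub.dp20; entries: percentage (%) of CS.sub.dp20 determined by SVM calling
Nanopore | Total events HP.sub.dp20 | Total events CS.sub.dp20 | Total events Mixtures | No. Features | SVM score | 1% | 5% | 10% | 20% | 50%
Pore-1 | 399 | 914 | 631 | 7 | 98.3 | | | | | 46.1
Pore-1 | 399 | 914 | 631 | 10 | 97.4 | | | | | 45.1
Pore-1 | 399 | 914 | 631 | 11 | 98.0 | | | | | 44.8
Pore-1.sup.a | 278 | 195 | 957 | 10 | 100 | | | | | 57.2
Pore-1.sup.a | 278 | 195 | 957 | 11 | 100 | | | | | 62.3
Pore-1.sup.a | 278 | 195 | 957 | 12 | 99.6 | | | | | 65.3
Pore-2 | 853 | 1012 | 1011/1933 | 6 | 91.9 | 1.2 | | 9.9 | |
Pore-2 | 853 | 1012 | 1011/1933 | 6 | 92.0 | 0.3 | | 12.4 | |
Pore-2 | 853 | 1012 | 1011/1933 | 11 | 93.3 | 1.4 | | 11.1 | |
Pore-3 | 219 | 845 | 2697/95 | 4 | 90.6 | 1.7 | 6.3 | | |
Pore-3 | 219 | 845 | 2697/95 | 4 | 90.6 | 2.0 | 8.7 | | |
Pore-3 | 219 | 845 | 2697/95 | 14 | 94.0 | 3.0 | 7.4 | | |
Pore-4 | 1731 | 1931 | 2180 | 9 | 82.5 | | | | 22.8 |
Pore-4 | 1731 | 1931 | 2180 | 14 | 84.0 | | | | 22.8 |
Pore-4 | 1731 | 1931 | 2180 | 15 | 84.1 | | | | 23.2 |
Pore-5 | 9663 | 1265 | 22029/9177 | 20 | 79.4 | 0.2 | 6.0 | | |
Pore-5 | 9663 | 1265 | 22029/9177 | 23 | 83.3 | 0.5 | 3.9 | | |
Pore-5 | 9663 | 1265 | 22029/9177 | 26 | 81.8 | 0.5 | 4.5 | | |
Pore-6 | 3317 | 1844 | 2560/2205 | 8 | 80.2 | | | 8.9 | 20.3 |
Pore-6 | 3317 | 1844 | 2560/2205 | 10 | 80.8 | | | 9.3 | 21.9 |
Pore-6 | 3317 | 1844 | 2560/2205 | 12 | 80.4 | | | 7.4 | 19.3 |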
(50) Identification of unfractionated heparin and enoxaparin.
(51) To explore its potential application in pharmaceutical production, the nanopore/SVM method was tested on samples with clinical uses, e.g. unfractionated heparin (UFH) and an LMWH (low molecular weight heparin) drug, enoxaparin (enoxaparin sodium: CAS Reg. No. 9005-49-6, trade names: Lovenox®, Clexane®, Xaparin®). UFH and LMWH drugs are widely used in the prevention and treatment of thromboembolic disorders (Cosmi, B. et al, Thromb. Res. (2012) 129:388-91; Lee, S. et al, Nat. Biotechnol. (2013) 31:220-6). Enoxaparin, with an average MW of about 4500, is a product of depolymerizing UFH, a so-called low molecular weight heparin (LMWH): a mixture of polydisperse oligosaccharides, each containing an unsaturated uronate residue at its non-reducing end and an amino sugar or a 1,6-anhydro amino sugar at its reducing end. Nonetheless, enoxaparin still retains about 20% of the antithrombin-binding fractions of UFH (Mourier, P. A. et al, J. Pharm. Biomed. Anal. 2015, 115:431-42). Besides being shorter than UFH, enoxaparin is also polydisperse and contains an unsaturated uronate residue at its non-reducing end as well as an amino sugar or a 1,6-anhydro amino sugar at its reducing end, having the structure:
(52) ##STR00004##
(53) Serial measurements of HP.sub.dp20, CS.sub.dp20, enoxaparin, and UFH were conducted at 1.0 μM concentrations using a newly drilled nanopore of about 3 nm in diameter (
(54) TABLE-US-00005 TABLE 5 Dwell Times derived from curve fitting to the Lognormal function
Parameter | HP.sub.dp20 | CS.sub.dp20 | Enoxaparin | UFH
Median (ms) | 0.063 | 0.074 | 0.064 | 0.061
Mean (ms) | 0.069 | 0.091 | 0.071 | 0.083
Adj. R.sup.2 | 0.98 | 0.99 | 0.97 | 0.98
(55) To identify these GAGs, 50% of the data was randomly selected from each of the collected data sets to train the SVM with up to 88 available signal features, and the trained SVM was then applied to classify the remaining 50% of the data. In this way, the effectiveness of SVM for GAG identification is quickly determined without laborious multiple runs. The four products can be distinguished in pairs by SVM. All of the SVMs were trained to identify individual spikes in the training data with 100% accuracy, yielding a trained machine learning algorithm. As shown in Table 6, UFH and CS.sub.dp20 were called with accuracies of 94.6% and 91.0%, respectively, 92.5% on average (Entry 1); UFH and enoxaparin (designated as Enox) with accuracies of about 96% and 93%, respectively (Entry 2). However, UFH and HP.sub.dp20 were distinguished with a lower accuracy of about 85% on average (Entry 3). For calling Enox, the SVM distinguished between Enox and HP.sub.dp20 (Entry 5) marginally better than between UFH and HP.sub.dp20. The SVM distinguished between Enox and CS.sub.dp20 with an averaged accuracy of about 73% (Entry 4), significantly lower than it did between UFH and CS.sub.dp20, which is consistent with the scatter plots in
(56) TABLE-US-00006 TABLE 6 Accuracy of SVM calling heparins and CS.sub.dp20
Identification accuracy (%).sup.1 is given per analyte and as an average
Entry | Entity | Training Accuracy (%) | UFH | CS.sub.dp20 | Enox. | HP.sub.dp20 | Average
1 | UFH vs CS.sub.dp20 | 100 | 94.6 ± 1.2 | 91.0 ± 1.5 | | | 92.5 ± 1.0
2 | UFH vs Enox | 100 | 95.9 ± 0.9 | | 92.9 ± 0.9 | | 94.4 ± 0.6
3 | UFH vs HP.sub.dp20 | 100 | 87.2 ± 1.6 | | | 82.1 ± 1.4 | 84.7 ± 1.1
4 | Enox vs CS.sub.dp20 | 100 | | 73.8 ± 2.7 | 72.1 ± 1.7 | | 73.0 ± 1.6
5 | Enox vs HP.sub.dp20 | 100 | | | 88.4 ± 2.3 | 86.4 ± 1.3 | 87.4 ± 1.3
6 | HP.sub.dp20 vs CS.sub.dp20 | 100 | | 88.7 ± 1.8 | | 83.3 ± 1.4 | 86.0 ± 1.1
7 | Pool of Four | 100 | 80.6 ± 1.6 | 65.1 ± 2.6 | 65.2 ± 3.3 | 64.8 ± 3.3 | 68.9 ± 1.4
.sup.1 each value is an average of three calls by the SVMs trained from three randomly selected subsets of data.
(57) Quantification of HP.sub.dp20.
(58) The use of solid-state nanopores was demonstrated for quantitation of sulfated GAGs. In the nanopore measurement, the event rate changes with the concentration of an analyte, so determining a concentration reduces to counting spikes. Event rates increased linearly with heparin concentration in a range of 0.25 to 1.25 μM (a five-fold change). The HP.sub.dp20 concentration was measured over a range of 1.0 nM to 100 μM using multiple nanopores to build a calibration curve with calibration samples. For the measurement, each concentration was repeated at least once in the same or a different nanopore. All ionic current signals above a threshold were extracted from the nanopore data, as was done for the SVM analysis, and the event rate was defined in spikes/s (Table 7). The event rate varied from nanopore to nanopore for the same concentration. This may be attributed to the different diameters and shapes of these nanopores, even though they were fabricated under the same TEM conditions.
(59) TABLE-US-00007 TABLE 7 Translocation Frequencies of HP.sub.dp20 through nanopores (spikes/sec)
Columns 0.001 to 100: sample dilution [μM]
Nanopore | pore d [nm] | 0.001 | 0.01 | 0.05 | 0.1 | 0.5 | 1 | 5 | 10 | 100
1 | 3.3 | | | | 3.078 ± 0.529 | 3.259 ± 0.177 | 5.716 ± 0.299 | | |
2 | 2.0 | 0.143 ± 0.014 | 0.199 ± 0.018 | 0.270 ± 0.057 | 0.358 ± 0.052 | | | | 3.477 ± 0.211 |
2* | 2.0 | 0.147 ± 0.016 | | | 0.399 ± 0.041 | | | | |
3 | 2.5 | | | | 0.760 ± 0.067 | | 2.319 ± 0.656 | 3.503 ± 1.404 | |
4 | 2.5 | | 0.101 ± 0.011 | | 0.157 ± 0.015 | | | | | 6.891 ± 0.631
5 | 2.6 | 0.088 ± 0.012 | | 0.536 ± 0.158 | 0.859 ± 0.147 | 1.204 ± 0.109 | | | 9.530 ± 0.947 |
6 | 3.2 | | | 0.272 ± 0.022 | 0.335 ± 0.047 | | 1.189 ± 0.207 | 2.098 ± 1.260 | |
7 | 2.7 | 0.120 ± 0.017 | 0.138 ± 0.016 | | 0.441 ± 0.018 | | 1.082 ± 0.080 | | | 9.573 ± 0.762
8 | 3.6 | | 0.597 ± 0.053 | | 1.663 ± 0.510 | | 4.373 ± 0.507 | | |
(60) In order to compare the event rates between nanopores, all event rates measured with the same nanopore were normalized by setting the rate at 0.1 μM to 1.0 (Table 8).
(61) TABLE-US-00008 TABLE 8 Normalized frequencies (referenced to those from 0.1 μM sample as 1)
Columns 0.001 to 100: sample dilution, micromolar (μM)
Nanopore | pore d [nm] | 0.001 | 0.01 | 0.05 | 0.1 | 0.5 | 1 | 5 | 10 | 100
1 | 3.3 | | | | 1.000 ± 0.172 | 1.058 ± 0.057 | 1.856 ± 0.097 | | |
2 | 2.0 | 0.400 ± 0.041 | 0.554 ± 0.050 | 0.752 ± 0.160 | 1.000 ± 0.146 | | | | 9.688 ± 0.589 |
2* | 2.0 | 0.368 ± 0.042 | | | 1.000 ± 0.103 | | | | |
3 | 2.5 | | | | 1.000 ± 0.088 | | 3.051 ± 0.863 | 4.609 ± 1.847 | |
4 | 2.5 | | 0.648 ± 0.070 | | 1.000 ± 0.099 | | | | | 43.894 ± 4.024
5 | 2.6 | 0.102 ± 0.014 | | 0.623 ± 0.184 | 1.000 ± 0.171 | 1.400 ± 0.127 | | | 11.085 ± 1.102 |
6 | 3.2 | | | 0.812 ± 0.067 | 1.000 ± 0.141 | | 3.551 ± 0.619 | 6.264 ± 3.762 | |
7 | 2.7 | 0.274 ± 0.039 | 0.314 ± 0.036 | | 1.000 ± 0.041 | | 2.453 ± 0.182 | | | 21.693
8 | 3.6 | | 0.359 ± 0.032 | | 1.000 ± 0.306 | | 2.629 ± 0.305 | | |
Ave. | | 0.286 ± 0.133 | 0.469 ± 0.158 | 0.729 ± 0.096 | 1.000 ± 0.000 | 1.229 ± 0.241 | 2.708 ± 0.637 | 5.436 ± 1.170 | 10.386 ± 0.987 | 32.793 ± 15.69
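The normalization that relates Table 7 to Table 8 can be sketched as follows. The function name `normalize_event_rates` is hypothetical, the concentration assigned to each Pore 2 value is inferred by comparing the two tables, and scaling the uncertainty by the reference rate alone is a simplifying assumption rather than the error treatment used for Table 8.

```python
def normalize_event_rates(rates_by_conc, reference_conc=0.1):
    """Scale one nanopore's event rates so the reference concentration reads 1.0.

    rates_by_conc maps concentration (μM) -> (event rate, uncertainty) in spikes/s.
    """
    if reference_conc not in rates_by_conc:
        raise ValueError("nanopore has no measurement at the reference concentration")
    ref_rate, _ = rates_by_conc[reference_conc]
    return {
        conc: (rate / ref_rate, err / ref_rate)
        for conc, (rate, err) in rates_by_conc.items()
    }


# Pore 2 values from Table 7 (concentration assignment inferred, see lead-in).
pore_2 = {
    0.001: (0.143, 0.014),
    0.01: (0.199, 0.018),
    0.05: (0.270, 0.057),
    0.1: (0.358, 0.052),
    10: (3.477, 0.211),
}
for conc, (rate, err) in normalize_event_rates(pore_2).items():
    print(f"{conc:g} μM: {rate:.3f} ± {err:.3f}")
```

The normalized rates come out close to the Pore 2 row of Table 8; small differences are expected because the tabulated raw rates are already rounded.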
(62) The normalized data plotted on a log-log scale (
(63) Using two newly fabricated nanopores, the standard curve was tested for the HP.sub.dp20 samples. In the same way as shown in
(64) TABLE-US-00009 TABLE 9 Raw and normalized event rate of the pores used in Table 10
Columns 0.01 to 5: concentration [μM]
Pore Index | Quantity | 0.01 | 0.05 | 0.1 | 1 | 5
5-1 | Raw Event Rate | 0.676 | 1.002 | 1.573 | 2.396 | 9.714
5-1 | Normalized Event Rate | 0.430 | 0.637 | 1.000 | 1.523 | 6.175
5-1 | Determined conc. | 0.008 | 0.03 | — | 0.6 | 4.7
5-2 | Raw Event Rate | 0.796 | — | 1.563 | 3.854 | —
5-2 | Normalized Event Rate | 0.509 | — | 1.000 | 2.466 | —
5-2 | Determined conc. | 0.013 | — | — | 1.2 | —
(65) As shown in Table 10, four HP.sub.dp20 samples were measured for their event rates with a nanopore (designated as Pore 9), from which their concentrations were derived by applying either of the two functions in
(66) TABLE-US-00010 TABLE 10 Fitting functions for determination of HP.sub.dp20 concentration by nanopores
Columns 0.01 to 5.0: sample conc. (μM)
Pore | Concentration Measurement | 0.01 | 0.05 | 1.0 | 5.0
9 | Event rate* (spikes/s) | 0.430 | 0.637 | 1.523 | 6.175
9 | Derived conc. (μM) | 0.008 | 0.025 | 0.6 | 4.8
10 | Event rate* (spikes/s) | 0.509 | — | 2.466 | —
10 | Derived conc. (μM) | 0.013 | — | 1.20 | —
Average (μM) | | 0.011 ± 0.004 | | 0.90 ± 0.42 |
*normalized event rates; see Table 9 for their raw data.
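The fitting functions used to derive the concentrations in Table 10 are not reproduced in this text. Purely as an illustration of reading a concentration off a calibration curve, the sketch below interpolates linearly in log-log space between the average normalized event rates of Table 8. The interpolation scheme and the name `concentration_from_rate` are assumptions, so the results are not expected to match the tabulated derived concentrations.

```python
import math

# Average normalized event rates from Table 8: (concentration in μM, normalized rate).
CALIBRATION = [
    (0.001, 0.286), (0.01, 0.469), (0.05, 0.729), (0.1, 1.000), (0.5, 1.229),
    (1, 2.708), (5, 5.436), (10, 10.386), (100, 32.793),
]


def concentration_from_rate(normalized_rate, calibration=CALIBRATION):
    """Estimate concentration (μM) by linear interpolation in log-log space."""
    if not calibration[0][1] <= normalized_rate <= calibration[-1][1]:
        raise ValueError("rate lies outside the calibrated range")
    for (c0, r0), (c1, r1) in zip(calibration, calibration[1:]):
        if r0 <= normalized_rate <= r1:
            frac = (math.log10(normalized_rate) - math.log10(r0)) / (
                math.log10(r1) - math.log10(r0)
            )
            return 10 ** (math.log10(c0) + frac * (math.log10(c1) - math.log10(c0)))
    raise ValueError("calibration points must increase monotonically in rate")


for rate in (0.430, 1.523, 6.175):
    print(f"normalized rate {rate} -> about {concentration_from_rate(rate):.3g} μM")
```

Piecewise interpolation is chosen here only because it needs no assumption about the functional form of the standard curve; a fitted function would smooth over the scatter between calibration points.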
(67) A nanopore/SVM method for identification of sulfated GAGs is demonstrated that distinguishes between heparins, as well as between heparin and chondroitin sulfate, with high accuracy. The nanopore/SVM method was also able to identify CS.sub.dp20 in its mixtures with HP.sub.dp20 at a level down to 0.8% (w/w), comparable to the NMR technique for detection of OSCS. Besides being bulky and expensive, NMR spectrometers also require more material for the analysis.
(68) To address the non-uniformity of nanopore size and geometry, a reference sample (calibration sample, 0.1 μM HP.sub.dp20) was used to calibrate the nanopores, which allowed normalization of the data from different nanopores. It was observed that HP.sub.dp20 can be quantified with reasonable accuracy by the multiple-nanopore measurement. The nanopore measurement has a nanomolar (nM) limit of detection and a dynamic range of five orders of magnitude. Thus, such a nanopore device can potentially be used to monitor the heparin level in human blood, since the range of plasma heparin is about 1 to 2.4 mg per liter, equivalent to a range of 67 to 160 nM (assuming an average molecular weight of 15,000 for UFH). An array of nanopores may be produced and used to optimize different machine learning algorithms for the identification of GAGs.
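The conversion of the plasma heparin range from mg per liter to nM can be checked with a short calculation. The function name is hypothetical; the molecular weight of 15,000 is the value assumed in the text.

```python
def mg_per_liter_to_nanomolar(mass_conc_mg_per_l, molecular_weight_g_per_mol):
    """Convert a mass concentration (mg/L) to a molar concentration (nM)."""
    molar = (mass_conc_mg_per_l * 1e-3) / molecular_weight_g_per_mol  # mol/L
    return molar * 1e9  # nmol/L


for mg_l in (1.0, 2.4):
    nm = mg_per_liter_to_nanomolar(mg_l, 15000)
    print(f"{mg_l} mg/L -> {nm:.0f} nM")
```

Both ends of the range lie well above the roughly 1 nM limit of detection stated above.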
EXAMPLES
Example 1 Fabrication of Nanopores
(69) Silicon chips (5×5 mm) coated with silicon nitride (30 nm thick) were purchased from Norcada Inc. (part number: NX5025X). Following a process of argon plasma cleaning, nanopores were drilled using the electron beam in JEOL 2010FEG and ARM 200F transmission electron microscope (TEM) at 200 keV. The size of the pores was controlled by the electron beam size and exposure time. The nanopores were imaged right after the drilling. The nanopore was drilled in a 30 nm-thick silicon nitride membrane by TEM, which shows a conical shape with a diameter of about 3.2 nm at its narrowest section (
Example 2 Preparation of Sample Solutions
(70) Stock solutions of HP.sub.dp20 and CS.sub.dp20 (Iduron) were each prepared by dissolving the sample in H.sub.2O. Their actual concentrations were determined by carbazole assays. These two stock solutions of HP.sub.dp20 (10 mM) and CS.sub.dp20 (10 mM) were used to prepare mixtures of HP.sub.dp20 and CS.sub.dp20 containing 1, 5, 10, 20, and 50% of CS.sub.dp20. The mixtures were diluted to final concentrations of 0.5-1 μM with an electrolyte solution of 0.4 M KCl in 1 mM phosphate buffer (pH 7.4). For the dilution study, the 10 mM stock solution of HP.sub.dp20 was diluted to various concentrations in a range of 1 mM to 10 nM and injected into the cis reservoir to give final analyte concentrations of 100 μM to 1 nM for the measurement.
Example 3 Nanopore Measurements
(71) Prior to the measurement, a nanopore chip was cleaned by immersion in a hot piranha acid (piranha etch) solution (H.sub.2O.sub.2:H.sub.2SO.sub.4=1:4) for 20 min, and then rinsed with Milli-Q water (a resistivity of about 18.2 MΩ·cm and total organic carbon of less than 5 ppb). Piranha acid solutions are extremely energetic and may result in explosion or skin burns if not handled with extreme caution. After drying with N.sub.2 gas, the nanopore chip was placed in a piranha-cleaned PCTFE cell to form a cis reservoir and sealed with a quick-curing silicone elastomer gasket. The PCTFE cell with a nanopore chip was then assembled with a PTFE base to form a trans reservoir. The electrolyte solution used was 0.4 M KCl in 1 mM phosphate buffer (pH 7.4), which was filtered with a Millipore 0.2 μm filter. Ag/AgCl electrodes, freshly made from Ag wires with bleach, were inserted into both the cis and trans reservoirs for ionic current measurement. All analytes were dissolved in the electrolyte solution for the nanopore analysis.
(72) For the measurement, both the cis and trans reservoirs were filled with the electrolyte solution, and the nanopore was soaked for about 1 to 2 hours, followed by applying a high voltage (about 1 V) between the two reservoirs for about 5 to 10 minutes to obtain a steady baseline current with no electrical spikes, an indicator of an open and wet nanopore. Then, an analyte solution (about 10 μl) was injected into the cis reservoir to a final concentration of about 1 μM. A translocation bias was applied to the Ag/AgCl electrode in the trans reservoir, while the electrode in the cis reservoir was kept grounded to avoid adsorption of analyte molecules to the reference electrode. After recording the ionic current, the cis reservoir was drained and rinsed with the electrolyte solution. Another baseline was recorded to ensure that no contamination remained in the cis reservoir before a new analyte solution was injected.
(73) TABLE-US-00011 TABLE 11 SVM scores of repeat measurements with same nanopore
Numbers in parentheses are event counts.
SVM Index | Training Data (Prediction Data) | No. Features | Training Score | Analyte (Events) | SVM score on trained data set: HP.sub.dp20 (609) | SVM score on trained data set: CS.sub.dp20 (361) | SVM score on untrained data set: HP.sub.dp20 (1640) | SVM score on untrained data set: CS.sub.dp20 (1271)
SVM-1 | Run-1 (Run-2) | 11 | 100 | HP.sub.dp20 (610) | 92.6 | 9.4 | 93.4 | 6.5
SVM-1 | Run-1 (Run-2) | 11 | 100 | CS.sub.dp20 (361) | 7.4 | 90.6 | 6.6 | 93.5
SVM-2 | Run-1 (Run-2) | 11 | 100 | HP.sub.dp20 (610) | 93.2 | 7.9 | 94.2 | 4.9
SVM-2 | Run-1 (Run-2) | 11 | 100 | CS.sub.dp20 (361) | 6.8 | 92.1 | 5.8 | 95.1
SVM-3 | Run-1 (Run-2) | 9 | 100 | HP.sub.dp20 (610) | 93.4 | 12.4 | 95.6 | 7.1
SVM-3 | Run-1 (Run-2) | 9 | 100 | CS.sub.dp20 (361) | 6.6 | 87.6 | 4.4 | 92.9
SVM Index | Training Data (Prediction Data) | No. Features | Training Score | Analyte (Events) | SVM score on trained data set: HP.sub.dp20 (820) | SVM score on trained data set: CS.sub.dp20 (636) | SVM score on untrained data set: HP.sub.dp20 (1219) | SVM score on untrained data set: CS.sub.dp20 (722)
SVM-4 | Run-2 (Run-1) | 11 | 100 | HP.sub.dp20 (820) | 96.2 | 6.0 | 93.7 | 8.8
SVM-4 | Run-2 (Run-1) | 11 | 100 | CS.sub.dp20 (635) | 3.8 | 94.0 | 6.3 | 91.2
SVM-5 | Run-2 (Run-1) | 10 | 100 | HP.sub.dp20 (820) | 95.0 | 6.0 | 88.1 | 5.8
SVM-5 | Run-2 (Run-1) | 10 | 100 | CS.sub.dp20 (635) | 5.0 | 94.0 | 11.9 | 94.2
SVM-6 | Run-2 (Run-1) | 8 | 100 | HP.sub.dp20 (820) | 95.8 | 8.8 | 91.2 | 7.6
SVM-6 | Run-2 (Run-1) | 8 | 100 | CS.sub.dp20 (635) | 4.2 | 91.2 | 8.8 | 92.4
Example 4 Data Collection
(74) Ionic currents were collected at a 500 kHz sampling rate with a 100 kHz low pass filter using patch clamp amplifier Axon Axopatch 200B, with digitizer DigiData 1550A from Axon Instruments Inc. PClamp 10.4 software and an in-house developed LabView program were used for data recording.
Example 5 SVM Data Analysis
(75) A program written in MATLAB was used to process the data and identify GAGs. First, the baseline of the recorded ionic current was determined as the most probable electrical current, and its width was set to 6σ (σ = standard deviation) of the trace. Spikes larger than the baseline width were recognized as translocation events. Each event was then subjected to Fourier transformation, and the spectrum was down-sampled to 20 equal frequency bins, corresponding to a 25 kHz bin size. The Fourier-transformed frequency spectrum was further transformed to the cepstrum domain and down-sampled into 51 equal bins (
(76) TABLE-US-00012 TABLE 12 Features and their descriptions for SVM data analysis
Feature Name | Description
Amplitude | Maximum amplitude of the event.
Average Amplitude | Average amplitude of the event.
Dwell Time | Width of the event.
Blockade Ratio | Ratio of the maximum amplitude of the event with respect to the baseline.
Number of Levels | Number of levels of the event.
Step Size | Magnitude of the differences between levels. (Zero was assigned for no leveled event.)
Fluctuation | Number of local maximum peaks of the event.
Roughness | Standard deviation of the event.
Peak in Beginning | Maximum amplitude of the first 10 μs data of the event. (First one-third of the data for an event shorter than 30 μs.)
Peak in Middle | Maximum amplitude of the data outside the first and last 10 μs of the event. (Second one-third of the data for an event shorter than 30 μs.)
Peak in Last | Maximum amplitude of the last 10 μs data of the event. (Last one-third of the data for an event shorter than 30 μs.)
Peak FFT 1-20 | The normalized power spectrum, down-sampled into 20 equal frequency bands.
Peak FFT Total | Total summation of frequency spectra of the event.
Peak FFT Maximum 1-4 | The ‘n-th’ dominant frequency band on the power spectrum.
Peak HighLow | The ratio of the top quarter of the power spectrum to the bottom quarter of the spectrum.
Peak Cepstrum 1-51 | Average magnitude of the cepstrum spectra of the event, down-sampled into 51 equal windows.
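The event detection and spectral features described in this example can be sketched in a few lines. This is an illustrative Python stand-in, not the MATLAB program referred to above. The following are assumptions: the most probable current is taken as the centre of the fullest histogram bin, a 6σ baseline width is read as a band of ±3σ around the baseline, a naive discrete Fourier transform replaces the FFT, down-sampling is done by averaging, and all function names are hypothetical.

```python
import cmath
import math
import statistics


def find_events(trace, n_hist_bins=100, half_width_sigmas=3.0):
    """Return (baseline, events); events are (start, end) index pairs outside the band."""
    lo, hi = min(trace), max(trace)
    bin_width = (hi - lo) / n_hist_bins or 1.0
    counts = [0] * n_hist_bins
    for x in trace:
        counts[min(int((x - lo) / bin_width), n_hist_bins - 1)] += 1
    # Most probable current: centre of the most populated histogram bin.
    baseline = lo + (counts.index(max(counts)) + 0.5) * bin_width
    threshold = half_width_sigmas * statistics.pstdev(trace)
    events, start = [], None
    for i, x in enumerate(trace):
        outside = abs(x - baseline) > threshold
        if outside and start is None:
            start = i
        elif not outside and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(trace)))
    return baseline, events


def dft(samples, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short events)."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def bin_average(values, n_bins):
    """Down-sample a non-empty sequence into n_bins windows by averaging."""
    out = []
    for b in range(n_bins):
        lo = b * len(values) // n_bins
        hi = max((b + 1) * len(values) // n_bins, lo + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def spectral_features(event, n_fft_bins=20, n_cepstrum_bins=51):
    """Return (20 binned normalized power values, 51 binned cepstrum magnitudes)."""
    mags = [abs(v) for v in dft(event)]
    power = [m * m for m in mags[: len(mags) // 2 + 1]]  # one-sided power spectrum
    total = sum(power) or 1.0
    fft_feats = bin_average([p / total for p in power], n_fft_bins)
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    log_mags = [math.log(m + 1e-12) for m in mags]
    cepstrum = [abs(v.real) for v in dft(log_mags, inverse=True)]
    return fft_feats, bin_average(cepstrum, n_cepstrum_bins)


# Synthetic trace: small periodic noise with one 10-sample blockade.
trace = [0.01 * ((i % 5) - 2) for i in range(1000)]
for i in range(500, 510):
    trace[i] = -1.0
baseline, events = find_events(trace)
fft_feats, cep_feats = spectral_features(trace[events[0][0]:events[0][1]])
print(events, len(fft_feats), len(cep_feats))
```

For events shorter than the number of bins, `bin_average` reuses samples across neighbouring bins; how short events were handled in the original analysis is not stated.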
(77) To prevent features with a large numeric range from dominating those with a small numeric range, all calculated features were normalized so that the mean of each feature, together with its standard deviation, fell between 0 and 1. The normalized correlation was calculated between all pairs of features, and one feature from each correlated pair was retained as the representative for the following analysis. The features were ranked according to the ratio between the in-group fluctuation (variation over repeated experiments of the same analyte) and the out-group fluctuation (variation between different analytes), and the low-ranked features were then removed. The surviving features were evaluated by classification accuracy, from which an optimized set of features was chosen to achieve a maximum true positive accuracy. The SVM was run in kernel mode, adapted from https://github.com/vjethava/svm-thetai, and its running parameters C and gamma were optimized through cross-validation on randomly selected subsets of the data.
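A minimal sketch of the scaling and ranking steps follows. It assumes min-max scaling for the normalization, takes the in-group fluctuation as the average spread of per-run values within an analyte, and takes the out-group fluctuation as the spread of the analyte means. All function names are hypothetical. The example values are the Run-1 and Run-2 entries of Table 1, used here only as stand-ins for per-run feature values.

```python
import statistics


def min_max_scale(column):
    """Rescale one feature column to the range 0 to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def fluctuation_ratio(runs_by_analyte):
    """In-group over out-group fluctuation for one feature; lower separates better."""
    in_group = statistics.mean(
        [statistics.pstdev(runs) for runs in runs_by_analyte.values()]
    )
    out_group = statistics.pstdev(
        [statistics.mean(runs) for runs in runs_by_analyte.values()]
    )
    return float("inf") if out_group == 0 else in_group / out_group


def rank_features(feature_table):
    """Return feature names ordered from best to worst separating power."""
    return sorted(feature_table, key=lambda name: fluctuation_ratio(feature_table[name]))


# Per-run values taken from Table 1 (Run-1, Run-2) as illustrative input.
features = {
    "dwell_time_median": {"HP_dp20": [0.083, 0.087], "CS_dp20": [0.080, 0.073]},
    "blockade_ratio_mean": {"HP_dp20": [0.120, 0.118], "CS_dp20": [0.105, 0.111]},
}
print(rank_features(features))
print(min_max_scale([0.083, 0.087, 0.080, 0.073]))
```

With these inputs the blockade ratio ranks ahead of the dwell time because its run-to-run spread is small relative to the difference between the two analytes. Training the SVM itself and tuning C and gamma are outside the scope of this sketch.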
(78) Statistical analysis was carried out in OriginPro 2017, in which the Levenberg-Marquardt algorithm was used for the curve fitting.
(79) Computational Modeling. DFT calculations were performed using Spartan'16 for Windows, software available from Wavefunction, Inc. Two-dimensional molecular structures were drawn in ChemDraw Ultra 12.0 and imported into Spartan'16 to generate the corresponding 3D structures. Each structure was subjected to energy minimization using the built-in MMFF molecular mechanics force field prior to the optimization calculation. The DFT calculations were performed at the ground-state equilibrium geometry using the B3LYP functional with the 6-31G* basis set in vacuum.
(80) Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, the descriptions and examples should not be construed as limiting the scope of the invention. Accordingly, all suitable modifications and equivalents may be considered to fall within the scope of the invention as defined by the claims that follow. The disclosures of all patent and scientific literature cited herein are expressly incorporated in their entirety by reference.