MODELLING METHOD USING A CONDITIONAL VARIATIONAL AUTOENCODER
20220383985 · 2022-12-01
Inventors
CPC classification
G16B25/10
PHYSICS
International classification
Abstract
The present invention relates to a computer-implemented method for modelling genomic data represented in an unsupervised neural network, trVAE, comprising a conditional variational autoencoder, CVAE, with an encoder (f) and a decoder (g).
Claims
1. A computer-implemented method for modelling single-cell gene expression, scGE, data represented in an unsupervised neural network, trVAE, comprising a conditional variational autoencoder, CVAE, with an encoder (f) and a decoder (g), the method comprising: obtaining first input data comprising one or more sets of multivariate scGE data, referred to as batches X, and one or more first conditions s associated with respective elements of said one or more sets of multivariate scGE data; processing the first input in the encoder (f) of the trVAE, thereby obtaining first latent data Z represented in a low-dimensional space of a hidden layer of the trVAE; processing the obtained latent data Z associated with the first conditions s in a first part (g_1) of the decoder (g), thereby obtaining first reconstructed data Y; processing the obtained first reconstructed data Y of the first layer in one or more subsequent layers in a second part (g_2) of the decoder (g), to obtain second reconstructed data X̂, thereby learning a model for reconstructing the first multivariate scGE data to facilitate curative or diagnostic interpretation, wherein the first reconstructed data Y of the first layer are subject to a cost function derived from a known cost function L_VAE of the CVAE imposed with a penalty based on a distance metric known as maximum mean discrepancy, MMD, or a Wasserstein distance metric.
2. The method of the preceding claim, further comprising the following steps carried out in the trVAE with the learned model incorporated: obtaining second input data comprising one or more batches X of multivariate scGE data with second conditions s, wherein s=0, denoted as (X_{s=0}, s=0); processing the second input in the encoder (f) of the trVAE with the learned model incorporated, thereby obtaining a second latent representation Ẑ with third conditions s, wherein s=0, denoted as (Ẑ_{s=0}, s=0); processing Ẑ_{s=0} with fourth conditions s, wherein s=1, denoted as (Ẑ_{s=0}, s=1), in the decoder (g) of the trVAE with the learned model incorporated, to obtain transformed data X̂ associated with the fourth conditions s, wherein s=1, denoted as X̂_{s=1}, said transformed data representing the second multivariate scGE data associated with the fourth conditions s predicted according to the learned model.
3. The method of the two preceding claims, wherein each of the first, second, third, and fourth conditions s comprises n-tuples of scalar conditions, wherein n is a natural number.
4. The method of any of the preceding claims, wherein the batches of multivariate data are randomized batches.
5. The method of any of the preceding claims, wherein curative interpretation and diagnostic interpretation comprise predictions for cellular perturbation response to treatment and disease, respectively, based on the first multivariate scGE data.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] (a) UMAP visualization of conditions and cell type for gut cells.
[0034] (b-c) Mean and variance expression of 1,000 genes comparing trVAE-predicted and real infected Tuft cells together with the top 10 differentially-expressed genes highlighted in red (R² denotes Pearson correlation between ground truth and predicted values).
[0035] (d) Distribution of Defa24: the top response gene to H.poly.Day10 infection between control, predicted and real stimulated cells for different models. Vertical axis: expression distribution for Defa24. Horizontal axis: control, real and predicted distribution by different models.
[0036] (e) Comparison of Pearson's R² values for mean and variance gene expression between real and predicted cells for different models. Center values show the mean of R² values estimated using n=100 random subsamples for the prediction of each model and error bars depict standard deviation.
[0037] (f) Comparison of R² values for mean gene expression between real and predicted cells by trVAE for the eight different cell types and three conditions. Center values show the mean of R² values estimated using n=100 random subsamples for each cell type and error bars depict standard deviation.
[0039] (a) UMAP visualization of peripheral blood mononuclear cells (PBMCs).
[0040] (b-c) Mean and variance expression of 2,000 genes comparing trVAE-predicted and real natural killer (NK) cells together with the top 10 differentially-expressed genes highlighted in red.
[0041] (d) Distribution of ISG15: the most strongly changing gene after IFN-β perturbation between control, real and predicted stimulated cells for different models. Vertical axis: expression distribution for ISG15. Horizontal axis: control, real and predicted distribution by different models.
[0042] (e) Comparison of R² values for mean and variance gene expression between real and predicted cells for different models. Center values show the mean of R² values estimated using n=100 random subsamples for the prediction of each model and error bars depict standard deviation.
[0043] The present invention is defined by the subject matter of the appended claims.
SUMMARY
[0044] Single-cell transcriptomics has become an established tool for unbiased profiling of complex and heterogeneous systems. The generated datasets are typically used for explaining phenotypes through cellular composition and dynamics. Of particular interest are the dynamics of single cells in response to perturbations, be it dose, treatment or knockout of genes. Moreover, combinatorial drug treatments to cure a disease, e.g. cancer, have recently been studied in single-cell settings. The present invention thus provides a deep learning model, as described herein, which can provide in-silico predictions of how drug combinations will affect diseased samples. Previously published methods provide in-silico predictions only for the scenario in which a single drug exists and are not able to predict drug combinations. Moreover, previous approaches use two-step modeling: a first algorithm projects the data to a latent space, and a second algorithm performs the predictions. The present invention, by contrast, provides an end-to-end solution that performs both steps in one model. The performance of the model is shown on a variety of examples, as illustrated in the examples section herein below.
Accordingly, the present invention relates to
[0045] A computer-implemented method for modelling single-cell gene expression, scGE, data represented in an unsupervised neural network, trVAE, comprising a conditional variational autoencoder, CVAE, with an encoder (f) and a decoder (g), the method comprising:
[0046] obtaining first input data comprising one or more sets of multivariate scGE data, referred to as batches X, and one or more first conditions s associated with respective elements of said one or more sets of multivariate scGE data;
[0047] processing the first input in the encoder (f) of the trVAE, thereby obtaining first latent data Z represented in a low-dimensional space of a hidden layer of the trVAE;
[0048] processing the obtained latent data Z associated with the first conditions s in a first part (g_1) of the decoder (g), thereby obtaining first reconstructed data Y;
[0049] processing the obtained first reconstructed data Y of the first layer in one or more subsequent layers in a second part (g_2) of the decoder (g), to obtain second reconstructed data X̂, thereby learning a model for reconstructing the first multivariate scGE data to facilitate curative or diagnostic interpretation,
[0050] wherein the first reconstructed data Y of the first layer are subject to a cost function derived from a known cost function L_VAE of the CVAE imposed with a penalty based on a distance metric known as maximum mean discrepancy, MMD, or a Wasserstein distance metric.
[0051] Preferably, the computer-implemented method of the present invention further comprises the following steps carried out in the trVAE with the learned model incorporated:
[0052] obtaining second input data comprising one or more batches X of multivariate scGE data with second conditions s, wherein s=0, denoted as (X_{s=0}, s=0);
[0053] processing the second input in the encoder (f) of the trVAE with the learned model incorporated, thereby obtaining a second latent representation Ẑ with third conditions s, wherein s=0, denoted as (Ẑ_{s=0}, s=0);
[0054] processing Ẑ_{s=0} with fourth conditions s, wherein s=1, denoted as (Ẑ_{s=0}, s=1), in the decoder (g) of the trVAE with the learned model incorporated, to obtain transformed data X̂ associated with the fourth conditions s, wherein s=1, denoted as X̂_{s=1}, said transformed data representing the second multivariate scGE data associated with the fourth conditions s predicted according to the learned model.
[0055] Preferably, in the computer-implemented method of the present invention, each of the first, second, third, and fourth conditions s comprises n-tuples of scalar conditions, wherein n is a natural number.
[0056] Preferably, the batches of multivariate data are randomized batches in the context of the computer-implemented method of the present invention.
[0057] Preferably, curative interpretation and diagnostic interpretation comprise predictions for cellular perturbation response to treatment and disease, respectively, based on the first multivariate scGE data in the context of the computer-implemented method of the present invention.
[0058] The methods of the present invention provide for in-silico predictions, e.g. of how drug combinations will affect a diseased subject, e.g. a mammalian subject, such as a human subject.
[0059] The methods of the present invention provide qualitatively improved predictions for cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data.
[0060] In the context of the present invention, genomic data may encompass data obtained from one or more transcriptomes, data obtained from one or more genomes, or data obtained from sequencing, e.g. NGS. Genomic data may be from one or more single cells, such as mammalian cells, e.g. human cells.
DETAILED DESCRIPTION
[0061] For the implementation, we use multi-scale RBF kernels, i.e. a sum of RBF kernels over a set of bandwidth parameters γ_i:

$$k(x, x') = \sum_i k(x, x', \gamma_i), \qquad \text{where } k(x, x', \gamma_i) = e^{-\gamma_i \lVert x - x' \rVert^2}$$
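By way of illustration, the following is a minimal NumPy sketch of a biased estimator of the squared MMD using such a multi-scale RBF kernel; the particular bandwidth values in `gammas` are illustrative assumptions, not values prescribed by the method.

```python
import numpy as np

def multi_scale_rbf(x, y, gammas):
    """Sum of RBF kernels k(x, x', gamma) = exp(-gamma * ||x - x'||^2)
    over several bandwidth parameters gamma."""
    # Pairwise squared Euclidean distances between rows of x and y.
    d2 = (np.sum(x ** 2, axis=1)[:, None]
          + np.sum(y ** 2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    return sum(np.exp(-g * d2) for g in gammas)

def mmd2(x, y, gammas=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Biased estimate of the squared maximum mean discrepancy
    between samples x of shape (n, d) and y of shape (m, d)."""
    return (multi_scale_rbf(x, x, gammas).mean()
            - 2.0 * multi_scale_rbf(x, y, gammas).mean()
            + multi_scale_rbf(y, y, gammas).mean())
```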
[0062] Addressing the domain adaptation problem, the "Variational Fair Autoencoder" (VFAE) (Louizos et al., 2015) uses MMD to match the latent distributions q_φ(z|s=0) and q_φ(z|s=1), where s denotes a domain, by adapting the standard VAE cost function L_VAE according to

$$\mathcal{L}_{\mathrm{VFAE}}(\phi, \theta, X, X', S, S') = \mathcal{L}_{\mathrm{VAE}}(\phi, \theta, X, S) + \mathcal{L}_{\mathrm{VAE}}(\phi, \theta, X', S') - \beta\,\ell_{\mathrm{MMD}}(Z_{s=0}, Z'_{s'=1}) \tag{6}$$
[0063] where X and X′ are two high-dimensional observations with their respective conditions S and S′. In contrast to GANs (Goodfellow et al., 2014), whose training procedure is notoriously hard due to the min-max optimization problem, training models using MMD or Wasserstein distance metrics is comparatively simple (Li et al., 2015; Arjovsky et al., 2017; Dziugaite et al., 2015a), as only the direct minimization of a straightforward loss is involved during training. It has been shown that MMD-based GANs have some advantages over Wasserstein GANs, resulting in a simpler and faster-training algorithm with matching performance (Bińkowski et al., 2018). This motivated us to choose MMD as the metric for regularizing distribution matching.
Defining the Transformer VAE or Transfer VAE
[0064] Let us adopt the following notation for the transformation within a standard CVAE. High-dimensional observations x and a scalar or low-dimensional condition s are transformed using f (encoder) and g (decoder), which are parametrized by weight-sharing neural networks, and give rise to the predictors ẑ, ŷ and x̂:
$$\hat{z} = f(x, s) \tag{7a}$$
$$\hat{y} = g_1(\hat{z}, s) \tag{7b}$$
$$\hat{x} = g_2(\hat{y}) \tag{7c}$$
where we distinguish the first layer (g_1) from the remaining layers (g_2) of the decoder, g = g_2 ∘ g_1.
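For concreteness, this factorization with an exposed first decoder layer can be sketched in Keras as follows; the layer sizes are illustrative assumptions (cf. Table 3 below), and the variational sampling of z is replaced by a deterministic stand-in to keep the sketch short.

```python
from tensorflow import keras
from tensorflow.keras import layers

x_dim, s_dim, z_dim = 1000, 1, 50   # illustrative sizes, not prescribed

# Encoder f: (x, s) -> z.  A deterministic stand-in for the variational encoder.
x_in, s_in = keras.Input((x_dim,)), keras.Input((s_dim,))
h = layers.Dense(800, activation="relu")(layers.Concatenate()([x_in, s_in]))
z = layers.Dense(z_dim)(h)
f = keras.Model([x_in, s_in], z, name="f")

# First decoder layer g1: (z, s) -> y.  Exposed separately so that its
# output y can be MMD-regularized across conditions.
z_in, s_dec = keras.Input((z_dim,)), keras.Input((s_dim,))
y = layers.Dense(128, activation="relu")(layers.Concatenate()([z_in, s_dec]))
g1 = keras.Model([z_in, s_dec], y, name="g1")

# Remaining decoder layers g2: y -> x_hat, so that g = g2 ∘ g1.
y_in = keras.Input((128,))
x_hat = layers.Dense(x_dim, activation="relu")(
    layers.Dense(800, activation="relu")(y_in))
g2 = keras.Model(y_in, x_hat, name="g2")
```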
[0065] While z formally depends on s, it is commonly observed empirically that Z ⊥ S, that is, the representation z is disentangled from the condition information s. By contrast, the original representation typically strongly covaries with S: X ⊥̸ S. This observation can be explained by admitting that an efficient z-representation, suitable for minimizing reconstruction and regularization losses, should be as free as possible from information about s. Information about s is directly and explicitly available to the decoder (equation 7b), and hence there is an incentive to optimize the parameters of f to only explain the variation in x that is not explained by s. Experiments below demonstrate that, indeed, MMD regularization on the bottleneck layer z does not improve performance.
[0066] However, even if z is completely free of variation from s, the y representation has a strong s component, Y ⊥̸ S, which leads to a separation of y_{s=1} and y_{s=0} into different regions of their support. In the standard CVAE, without any regularization of this y representation, a highly varying, non-compact distribution emerges across different values of s. Regularizing y with MMD to match its distribution across s forces learning common features across s where possible. The more of these common features are learned, the more accurately the transformation task will be performed, and the higher are the chances of successful out-of-sample generation. Using one of the benchmark datasets introduced below, we qualitatively illustrate the effect.
[0067] During training, all samples are passed to the model with their corresponding condition labels (x_s, s). At prediction time, we pass (x_{s=0}, s=0) to the encoder f to obtain the latent representation ẑ_{s=0}. In the decoder g, we pass (ẑ_{s=0}, s=1) and, through that, let the model transform the data to x̂_{s=1}.
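Continuing the sketch above, the condition swap at prediction time might look as follows; x_control is a placeholder for real control cells.

```python
import numpy as np

x_control = np.random.rand(32, x_dim).astype("float32")  # placeholder control cells
s0 = np.zeros((32, 1), dtype="float32")   # source condition s = 0
s1 = np.ones((32, 1), dtype="float32")    # target condition s = 1

z_hat = f.predict([x_control, s0])   # encode under the source condition
y_hat = g1.predict([z_hat, s1])      # decode first layer under the *target* condition
x_hat_s1 = g2.predict(y_hat)         # predicted expression under s = 1
```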
[0068] The cost function of trVAE derives directly from the standard CVAE cost function, as introduced in the background section,

$$\mathcal{L}_{\mathrm{CVAE}}(\phi, \theta, X, S, \alpha, \eta) = \eta\,\mathbb{E}_{q_\phi(Z|X,S)}\left[\log p_\theta(X|Z,S)\right] - \alpha\, D_{\mathrm{KL}}\left(q_\phi(Z|X,S)\,\|\,p(Z)\right) \tag{8}$$
[0069] Consistent with the above, let ŷ_{s=0} = g_1(f(x, s=0), s=0) and ŷ_{s=1} = g_1(f(x′, s=1), s=1). Duplicating the cost function for X′ and adding an MMD term, the loss of trVAE becomes:

$$\mathcal{L}_{\mathrm{trVAE}}(\phi, \theta, X, X', S, S', \alpha, \eta, \beta) = \mathcal{L}_{\mathrm{CVAE}}(\phi, \theta, X, X', S, S', \alpha, \eta) - \beta\,\ell_{\mathrm{MMD}}(\hat{Y}_{s=0}, \hat{Y}'_{s'=1}) \tag{9}$$
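A corresponding sketch of the training objective, written as a quantity to minimize (so the signs are flipped relative to equation (9)): it reuses mmd2 from the kernel sketch above, uses a Gaussian reconstruction term as a stand-in for log p(X|Z,S), and takes the weightings from Table 3 purely for illustration.

```python
import numpy as np

def trvae_loss(x, x_rec, xp, xp_rec, y_s0, y_s1, mu, logvar,
               alpha=1e-5, eta=100.0, beta=100.0):
    """Illustrative version of equation (9) as a minimization objective:
    Gaussian reconstruction terms for X and X', a KL term for the latent
    posterior, and an MMD penalty on the first decoder layer."""
    rec = np.mean((x - x_rec) ** 2) + np.mean((xp - xp_rec) ** 2)
    # D_KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian posterior.
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return eta * rec + alpha * kl + beta * mmd2(y_s0, y_s1)
```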
EXPERIMENTS
[0070] We demonstrate the advantages of an MMD-regularized first layer of the decoder by benchmarking against a variety of existing methods and alternatives:
[0071] Vanilla CVAE (Sohn et al., 2015)
[0072] CVAE with MMD on the bottleneck (MMD-CVAE), similar to VFAE (Louizos et al., 2015)
[0073] MMD-regularized autoencoder (Dziugaite et al., 2015b; Amodio et al., 2019)
[0074] CycleGAN (Zhu et al., 2017)
[0075] scGen, a VAE combined with vector arithmetics (Lotfollahi et al., 2019)
[0076] scVI, a CVAE with a negative binomial output distribution (Lopez et al., 2018a)
[0077] First, we demonstrate trVAE's basic out-of-sample style-transfer capacity on two established image datasets, at a qualitative level. We then address quantitative comparisons on challenging benchmarks with clear ground truth, predicting the effects of biological perturbation based on high-dimensional structured data. We used convolutional layers for the imaging examples and fully connected layers for the single-cell gene expression datasets. The optimal hyper-parameters for each application were chosen by a parameter grid search for each model. The detailed hyper-parameters for the different models are reported in tables 1-9 below.
MNIST and CelebA Style Transformation
[0078] Here, we use Morpho-MNIST (Castro et al., 2018), which contains 60,000 images each of "normal" and "transformed" digits, the latter drawn with a thinner or thicker stroke. For training, we used all normal-stroke data. Hence, the training data covers all domains (d ∈ {0, 1, 2, . . . , 9}) in the normal-stroke condition (s=0). In the transformed conditions (thin and thick strokes, s ∈ {1, 2}), we only kept domains d ∈ {1, 3, 6, 7}.
[0079] We train a convolutional trVAE in which we first encode the stroke width via two fully connected layers with 128 and 784 features, respectively. Next, we reshape the 784-dimensional vector into a 28×28×1 image and add it as another channel of the input image. The trained trVAE faithfully transforms digits of normal stroke into digits of thin and thicker stroke in the out-of-sample domains.
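A sketch of this conditioning scheme, with the two fully connected layers and the reshape into an extra image channel as just described; the activations and dropout follow Table 1 below, everything else is an illustrative assumption.

```python
from tensorflow import keras
from tensorflow.keras import layers

img_in = keras.Input((28, 28, 1))   # Morpho-MNIST digit
cond_in = keras.Input((2,))         # one-hot stroke-width condition

# Two fully connected layers with 128 and 784 features (cf. Table 1).
h = layers.LeakyReLU(0.2)(layers.Dense(128)(cond_in))
h = layers.Dropout(0.2)(layers.LeakyReLU(0.2)(layers.Dense(784)(h)))

# Reshape the 784-dimensional vector into a 28x28x1 plane and
# append it to the digit as a second channel.
cond_img = layers.Reshape((28, 28, 1))(h)
x = layers.Concatenate(axis=-1)([img_in, cond_img])   # shape (28, 28, 2)
```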
[0080] Next we apply trVAE to CelebA (Liu et al., 2015), which contains 202,599 images of celebrity faces with 40 binary attributes for each image. We focus on the task of learning a transformation that turns a non-smiling face into a smiling face. We kept the smiling (s) and gender (d) attributes and trained the model with images from both smiling and non-smiling men but only with non-smiling women.
[0081] In this case, we trained a deep convolutional trVAE with a U-Net-like architecture (Ronneberger et al., 2015). We encoded the binary condition labels as in the Morpho-MNIST example and fed them as an additional channel in the input.
[0082] Predicting out-of-sample, trVAE successfully transforms non-smiling faces of women to smiling faces while preserving most aspects of the original image.
Infection Response
[0083] Accurately modeling cell response to perturbations is a key question in computational biology. Recently, neural network models have been proposed for out-of-sample predictions of high-dimensional tabular data that quantifies gene expression of single cells (Lotfollahi et al., 2019; Amodio et al., 2018). However, these models are not trained on the transformation task, relying instead on hard-coded transformations, and cannot handle more than two conditions.
[0084] We evaluate trVAE on a single-cell gene expression dataset that characterizes the gut (Haber et al., 2017) after Salmonella or Heligmosomoides polygyrus (H. poly) infections, respectively. For this, we closely follow the benchmark introduced in (Lotfollahi et al., 2019). The dataset contains eight different cell types in four conditions: control or healthy cells (n=3,240), H.poly infection after three days (H.poly.Day3, n=2,121), H.poly infection after ten days (H.poly.Day10, n=2,711) and Salmonella infection (n=1,770).
[0086] In order to show that our model is able to handle multiple conditions, we performed another experiment with all three conditions included. We trained trVAE holding out each of the eight cell types in all perturbed conditions.
[0087] Applying trVAE to the gut after infection shows the effectiveness of MMD regularization on the first layer of the decoder and of the proposed architecture, allowing multiple drugs to be modeled at the same time. The ability to analyze and predict multiple perturbations allows trVAE to be applied to experiments with many biological conditions. Specifically, recent advances in massive single-cell compound screening (Srivatsan et al., 2020) provide great potential to exploit our model for further experimental design and for the study of interaction effects among different drugs, leading to advances in computational drug discovery.
[0088] Modelling multiple drugs may be carried out by applying trVAE, for example in a scenario involving two drugs:
[0089] Drug A, which has condition s=[1, 0, 0], Drug B, which has condition s=[0, 1, 0], and finally the control condition without any drug, with s=[0, 0, 1], wherein the conditions s are one-hot condition labels.
[0090] After training the model, the combination A+B can be tested by feeding a control cell to the encoder: ẑ = f(x, s=[0, 0, 1]).
[0091] Next, ẑ can be fed to the MMD layer (y) with s=[1, 1, 0], which means transferring this cell into a condition in which both drugs A and B have been applied: ŷ = g_1(ẑ, s=[1, 1, 0]).
[0092] Finally, x̂, the prediction for the drug combination A+B, is obtained by: x̂ = g_2(ŷ).
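A minimal sketch of this recipe, assuming networks f, g1 and g2 analogous to the ones sketched above but built with a three-dimensional condition input; x_control is a placeholder for a matrix of real control cells.

```python
import numpy as np

x_control = np.random.rand(16, 1000).astype("float32")  # placeholder control cells
n = x_control.shape[0]

s_control = np.tile([0., 0., 1.], (n, 1)).astype("float32")  # control label
s_combo = np.tile([1., 1., 0.], (n, 1)).astype("float32")    # drug A + drug B

z_hat = g1_input = f.predict([x_control, s_control])  # encode: z = f(x, s=[0,0,1])
y_hat = g1.predict([z_hat, s_combo])                  # MMD layer under the combined condition
x_hat_ab = g2.predict(y_hat)                          # predicted response to A + B
```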
Stimulation Response
[0093] Similar to modeling infection response as above, we benchmark on another single-cell gene expression dataset consisting of 7,217 IFN-β-stimulated and 6,359 control peripheral blood mononuclear cells (PBMCs) from eight different human lupus patients (Kang et al., 2018). The stimulation with IFN-β induces dramatic changes in the transcriptional profiles of immune cells, which causes a big shift between control and stimulated cells.
[0094] trVAE accurately predicts the mean and variance of gene expression for the held-out natural killer (NK) cells, including the top differentially-expressed genes.
[0095] Comparing R² values for mean and variance gene expression between real and predicted cells, trVAE outperforms the other models.
Summary
[0096] By arguing that the vanilla CVAE yields representations in the first layer following the bottleneck that vary strongly across categorical conditions, we introduced an MMD regularization that forces these representations to be similar across conditions. The resulting model (trVAE) outperforms existing modeling approaches on benchmark and real-world data sets.
[0097] Within the bottleneck layer, CVAEs already display a well-controlled behavior, and regularization does not improve performance. Further regularization at later layers might be beneficial but is numerically costly and unstable as representations become high-dimensional. However, we have not yet systematically investigated this and leave it for future studies.
[0098] Further future work will concern the application of trVAE to larger and more diverse data, focusing on interaction effects among conditions. For this, an important application is the study of drug interaction effects, as previously noted by Amodio et al. (2018). Future conceptual investigations concern establishing connections to causal-inference-inspired models such as CEVAE (Louizos et al., 2017): faithful modeling of an interventional distribution might possibly be re-framed as successful perturbation-effect prediction across domains.
REFERENCES
[0099] Matthew Amodio, David van Dijk, Ruth Montgomery, Guy Wolf, and Smita Krishnaswamy. Out-of-sample extrapolation with neuron editing. arXiv:1805.12198, 2018.
[0100] Matthew Amodio, David van Dijk, Krishnan Srinivasan, William S Chen, Hussein Mohsen, Kevin R. Moon, Allison Campbell, Yujiao Zhao, Xiaomei Wang, Manjunatha Venkataswamy, Anita Desai, V. Ravi, Priti Kumar, Ruth Montgomery, Guy Wolf, and Smita Krishnaswamy. Exploring single-cell data with deep multitasking neural networks. bioRxiv, 2019. doi: 10.1101/237065.
[0101] Martin Arjovsky, Soumith Chintala, and Leon Bottou. Wasserstein generative adversarial networks. In Doina Precup and Yee Whye Teh (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 214-223, International Convention Centre, Sydney, Australia, 6-11 Aug. 2017. PMLR.
[0102] Mikołaj Bińkowski, Dougal J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. arXiv:1801.01401, 2018.
[0103] Daniel C. Castro, Jeremy Tan, Bernhard Kainz, Ender Konukoglu, and Ben Glocker. Morpho-MNIST: Quantitative assessment and diagnostics for representation learning. 2018.
[0104] Gintare Karolina Dziugaite, Daniel M. Roy, and Zoubin Ghahramani. Training generative neural networks via maximum mean discrepancy optimization. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, UAI′15, pp. 258-267, Arlington, Va., United States, 2015a. AUAI Press.
[0105] Gintare Karolina Dziugaite, Daniel M Roy, and Zoubin Ghahramani. Training generative neural networks via maximum mean discrepancy optimization. arXiv:1505.03906, 2015b.
[0106] Andrew Gelman and Jennifer Hill. Data analysis using regression and multilevel/hierarchical models. Cambridge university press, 2006.
[0107] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672-2680, 2014.
[0108] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723-773, 2012.
[0109] Adam L. Haber, Moshe Biton, Noga Rogel, Rebecca H. Herbst, Karthik Shekhar, Christopher Smillie, Grace Burgin, Toni M. Delorey, Michael R. Howitt, Yarden Katz, Itay Tirosh, Semir Beyaz, Danielle Dionne, Mei Zhang, Raktima Raychowdhury, Wendy S. Garrett, Orit Rozenblatt-Rosen, Hai Ning Shi, Omer Yilmaz, Ramnik J. Xavier, and Aviv Regev. A single-cell survey of the small intestinal epithelium. Nature, 551:333, 2017.
[0110] Hyun Min Kang, Meena Subramaniam, Sasha Targ, Michelle Nguyen, Lenka Maliskova, Elizabeth McCarthy, Eunice Wan, Simon Wong, Lauren Byrnes, Cristina M Lanata, et al. Multiplexed droplet single-cell rna-sequencing using natural genetic variation. Nature biotechnology, 36(1):89, 2018.
[0111] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv:1312.6114, 2013.
Yujia Li, Kevin Swersky, and Rich Zemel. Generative moment matching networks. In International Conference on Machine Learning, pp. 1718-1727, 2015.
[0112] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
[0114] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael I Jordan. Learning transferable features with deep adaptation networks. arXiv:1502.02791, 2015.
[0115] Romain Lopez, Jeffrey Regier, Michael B. Cole, Michael I. Jordan, and Nir Yosef. Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12):1053-1058, 2018a.
[0116] Romain Lopez, Jeffrey Regier, Michael I Jordan, and Nir Yosef. Information constraints on auto-encoding variational bayes. In Advances in Neural Information Processing Systems, pp. 6114-6125, 2018b.
[0117] Mohammad Lotfollahi, F Alexander Wolf, and Fabian J Theis. scGen predicts single-cell perturbation responses. Nature methods, 16(8):715, 2019.
[0118] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv:1511.00830, 2015.
[0119] Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models. In Advances in Neural Information Processing Systems, pp. 6446-6456, 2017.
[0120] L. McInnes, J. Healy, and J. Melville. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426, 2018.
[0121] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv:1411.1784, 2014.
[0122] Yong Ren, Jun Zhu, Jialian Li, and Yucen Luo. Conditional generative moment-matching networks. In Advances in Neural Information Processing Systems, pp. 2928-2936, 2016.
[0123] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, pp. 234-241, 2015.
[0124] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems 28, pp. 3483-3491. 2015.
[0125] Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv:1412.3474, 2014.
[0126] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV), 2017.
[0127] Eraslan et al. Deep learning: new computational modelling techniques for genomics. https://doi.org/10.1038/s41576-019-0122-6
[0128] Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
[0129] Doersch (2016). Tutorial on variational autoencoders. arXiv: 1606.05908.
TABLE 1. Convolutional trVAE detailed architecture used for the Morpho-MNIST dataset.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | (28, 28, 1) | X | X | — | —
conditions | — | 2 | X | X | — | —
FC-1 | FC | 128 | X | (?) | Leaky ReLU | conditions
FC-2 | FC | 784 | 0.2 | (?) | Leaky ReLU | FC-1
FC-2_resh | Reshape | (28, 28, 1) | X | X | X | FC-2
Conv2D_1 | Conv2D | (4, 4, 64, 2) | X | X | Leaky ReLU | [FC-2_resh, input]
Conv2D_2 | Conv2D | (4, 4, 64, 64) | X | X | Leaky ReLU | Conv2D_1
FC-3 | FC | 128 | X | (?) | Leaky ReLU | Flatten(Conv2D_2)
mean | FC | 50 | X | X | Linear | FC-3
var | FC | 50 | X | X | Linear | FC-3
z | FC | 50 | X | X | Linear | [mean, var]
FC-4 | FC | 128 | X | (?) | Leaky ReLU | conditions
FC-5 | FC | 784 | 0.2 | (?) | Leaky ReLU | FC-4
FC-5_resh | Reshape | (28, 28, 1) | X | X | X | FC-5
MMD | FC | 128 | X | (?) | Leaky ReLU | [z, FC-5_resh]
FC-6 | FC | 256 | X | (?) | Leaky ReLU | MMD
FC-7_resh | Reshape | (2, 2, 64) | X | X | X | FC-6
Conv_transp_1 | Conv2D Transpose | (4, 4, 128, 64) | X | (?) | Leaky ReLU | FC-7_resh
Conv_transp_2 | Conv2D Transpose | (4, 4, 64, 64) | X | (?) | Leaky ReLU | UpSampling2D(Conv_tr…)
Conv_transp_3 | Conv2D Transpose | (4, 4, 64, 64) | X | (?) | Leaky ReLU | Conv_transp_2
Conv_transp_4 | Conv2D Transpose | (4, 4, 2, 64) | X | (?) | Leaky ReLU | UpSampling2D(Conv_tr…)
output | Conv2D Transpose | (4, 4, 1, 2) | X | (?) | ReLU | UpSampling2D(Conv_tr…)

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 1024; # of Epochs 5000; α 0.001; β 1000.
(?) indicates data missing or illegible when filed.
TABLE 2. U-Net trVAE detailed architecture used for the CelebA dataset.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | (64, 64, 3) | X | X | — | —
conditions | — | 2 | X | X | — | —
FC-1 | FC | 128 | X | (?) | ReLU | conditions
FC-2 | FC | 1024 | 0.2 | (?) | ReLU | FC-1
FC-2_reshaped | Reshape | (64, 64, 1) | X | X | X | FC-2
Conv_1 | Conv2D | (3, 3, 64, 4) | X | X | ReLU | [FC-2_reshaped, input]
Conv_2 | Conv2D | (3, 3, 64, 64) | X | X | ReLU | Conv_1
Pool_1 | Pooling2D | X | X | X | X | Conv_2
Conv_3 | Conv2D | (3, 3, 128, 64) | X | X | ReLU | Pool_1
Conv_4 | Conv2D | (3, 3, 128, 128) | X | X | ReLU | Conv_3
Pool_2 | Pooling2D | X | X | X | X | Conv_4
Conv_5 | Conv2D | (3, 3, 256, 128) | X | X | ReLU | Pool_2
Conv_6 | Conv2D | (3, 3, 256, 256) | X | X | ReLU | Conv_5
Conv_7 | Conv2D | (3, 3, 256, 256) | X | X | ReLU | Conv_6
Pool_3 | Pooling2D | X | X | X | X | Conv_7
Conv_8 | Conv2D | (3, 3, 512, 256) | X | X | ReLU | Pool_3
Conv_9 | Conv2D | (3, 3, 512, 512) | X | X | ReLU | Conv_8
Conv_10 | Conv2D | (3, 3, 512, 512) | X | X | ReLU | Conv_9
Pool_4 | Pooling2D | X | X | X | X | Conv_10
Conv_11 | Conv2D | (3, 3, 512, 256) | X | X | ReLU | Pool_4
Conv_12 | Conv2D | (3, 3, 512, 512) | X | X | ReLU | Conv_11
Conv_13 | Conv2D | (3, 3, 512, 512) | X | X | ReLU | Conv_12
Pool_4 | Pooling2D | X | X | X | X | Conv_13
flat | Flatten | X | X | X | X | Pool_4
FC-3 | FC | 1024 | X | X | ReLU | flat
FC-4 | FC | 256 | 0.2 | X | ReLU | FC-3
mean | FC | 60 | X | X | Linear | FC-4
var | FC | 60 | X | X | Linear | FC-4
z-sample | FC | 60 | X | X | Linear | [mean, var]
FC-5 | FC | 128 | X | (?) | ReLU | conditions
MMD | FC | 256 | X | (?) | ReLU | [z-sample, FC-5]
FC-6 | FC | 1024 | X | X | ReLU | MMD
FC-7 | FC | 4096 | X | X | ReLU | FC-6
FC-7_reshaped | Reshape | (?) | X | X | X | FC-7
Conv_transp_1 | Conv2D Transpose | (3, 3, 512, 512) | X | X | ReLU | FC-7_reshaped
Conv_transp_2 | Conv2D Transpose | (3, 3, 512, 512) | X | X | ReLU | Conv_transp_1
Conv_transp_3 | Conv2D Transpose | (3, 3, 512, 512) | X | X | ReLU | Conv_transp_2
up_sample_1 | UpSampling2D | X | X | X | X | Conv_transp_3
Conv_transp_4 | Conv2D Transpose | (3, 3, 512, 512) | X | X | ReLU | up_sample_1
Conv_transp_5 | Conv2D Transpose | (3, 3, 512, 512) | X | X | ReLU | Conv_transp_4
Conv_transp_6 | Conv2D Transpose | (3, 3, 512, 512) | X | X | ReLU | Conv_transp_5
up_sample_2 | UpSampling2D | X | X | X | X | Conv_transp_6
Conv_transp_7 | Conv2D Transpose | (3, 3, 128, 256) | X | X | ReLU | up_sample_2
Conv_transp_8 | Conv2D Transpose | (3, 3, 128, 128) | X | X | ReLU | Conv_transp_7
up_sample_3 | UpSampling2D | X | X | X | X | Conv_transp_8
Conv_transp_9 | Conv2D Transpose | (3, 3, 64, 128) | X | X | ReLU | up_sample_3
Conv_transp_10 | Conv2D Transpose | (3, 3, 64, 64) | X | X | ReLU | Conv_transp_9
output | Conv2D Transpose | (1, 1, 3, 64) | X | X | ReLU | Conv_transp_10

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 1024; # of Epochs 5000; α 0.001; β 1000.
(?) indicates data missing or illegible when filed.
TABLE 3. trVAE detailed architecture. We used the same architecture for all the single-cell examples herein. The input_dim parameter for each dataset is: IFN-β (2,000), H.poly (1,000).

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | X | — | —
conditions | — | n_conditions | X | (?) | — | —
FC-1 | FC | 800 | 0.2 | (?) | Leaky ReLU | [input, conditions]
FC-2 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-1
FC-3 | FC | 128 | 0.2 | ✓ | Leaky ReLU | FC-2
mean | FC | 50 | X | X | Linear | FC-3
var | FC | 50 | X | X | Linear | FC-3
z-sample | FC | 50 | X | (?) | Linear | [mean, var]
MMD | FC | 128 | 0.2 | (?) | Leaky ReLU | [z-sample, conditions]
FC-4 | FC | 800 | 0.2 | ✓ | Leaky ReLU | MMD
FC-5 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-3
output | FC | input_dim | X | X | ReLU | FC-4

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 512; # of Epochs 5000; α 0.00001; β 100; η 100.
(?) indicates data missing or illegible when filed.
TABLE 4. scGen detailed architecture.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | (?) | — | —
FC-1 | FC | 800 | 0.2 | (?) | Leaky ReLU | input
FC-2 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-1
FC-3 | FC | 128 | 0.2 | ✓ | Leaky ReLU | FC-2
mean | FC | 100 | X | X | Linear | FC-3
var | FC | 100 | X | X | Linear | FC-3
z | FC | 100 | X | (?) | Linear | [mean, var]
MMD | FC | 128 | 0.2 | (?) | Leaky ReLU | z
FC-4 | FC | 800 | 0.2 | ✓ | Leaky ReLU | MMD
FC-5 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-3
output | FC | input_dim | X | X | ReLU | FC-4

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 32; # of Epochs 300; α 0.00050; β 100; η 100.
(?) indicates data missing or illegible when filed.
TABLE 5. CVAE detailed architecture.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | X | — | —
conditions | — | 1 | X | (?) | — | —
FC-1 | FC | 800 | 0.2 | (?) | Leaky ReLU | [input, conditions]
FC-2 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-1
FC-3 | FC | 128 | 0.2 | ✓ | Leaky ReLU | FC-2
mean | FC | 50 | X | X | Linear | FC-3
var | FC | 50 | X | X | Linear | FC-3
z-sample | FC | 50 | X | (?) | Linear | [mean, var]
MMD | FC | 128 | 0.2 | (?) | Leaky ReLU | [z-sample, conditions]
FC-4 | FC | 800 | 0.2 | ✓ | Leaky ReLU | MMD
FC-5 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-3
output | FC | input_dim | X | X | ReLU | FC-4

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 512; # of Epochs 300; α 0.001.
(?) indicates data missing or illegible when filed.
TABLE 6. MMD-CVAE detailed architecture.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | X | — | —
conditions | — | 1 | X | (?) | — | —
FC-1 | FC | 800 | 0.2 | (?) | Leaky ReLU | [input, conditions]
FC-2 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-1
FC-3 | FC | 128 | 0.2 | ✓ | Leaky ReLU | FC-2
mean | FC | 50 | X | X | Linear | FC-3
var | FC | 50 | X | X | Linear | FC-3
z-sample | FC | 50 | X | (?) | Linear | [mean, var]
MMD | FC | 128 | 0.2 | (?) | Leaky ReLU | [z-sample, conditions]
FC-4 | FC | 800 | 0.2 | ✓ | Leaky ReLU | MMD
FC-5 | FC | 800 | 0.2 | ✓ | Leaky ReLU | FC-3
output | FC | input_dim | X | X | ReLU | FC-4

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 512; # of Epochs 500; α 0.001; β 1.
(?) indicates data missing or illegible when filed.
TABLE 7. Style transfer GAN detailed architecture.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | X | — | —
FC-1 | FC | 700 | 0.5 | ✓ | Leaky ReLU | input
FC-2 | FC | 100 | 0.5 | ✓ | Leaky ReLU | FC-1
FC-3 | FC | 50 | 0.5 | ✓ | Leaky ReLU | FC-2
FC-4 | FC | 100 | 0.5 | ✓ | Leaky ReLU | FC-3
FC-5 | FC | 700 | 0.5 | ✓ | Leaky ReLU | FC-4
generator_out | FC | 6,998 | X | ✓ | ReLU | FC-5
FC-6 | FC | 700 | 0.5 | ✓ | Leaky ReLU | generator_out
FC-7 | FC | 100 | 0.5 | ✓ | Leaky ReLU | FC-6
discriminator_out | FC | 1 | X | X | Sigmoid | FC-7

Hyper-parameters: Generator Optimizer Adam; Discriminator Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; # of Epochs 1000.
TABLE 8. scVI detailed architecture.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | X | — | —
conditions | — | 1 | X | X | — | —
FC-1 | FC | 128 | 0.2 | ✓ | ReLU | input
mean | FC | 10 | X | X | Linear | FC-1
var | FC | 10 | X | X | Linear | FC-1
z | FC | 10 | X | X | Linear | [mean, var]
FC-2 | FC | 128 | 0.2 | ✓ | ReLU | [z, conditions]
output | FC | input_dim | X | X | ReLU | FC-2

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Batch Size 128; # of Epochs 1000; α 0.001.
TABLE 9. SAUCIE detailed architecture.

Name | Operation | NoF/Kernel Dim. | Dropout | BN | Activation | Input
input | — | input_dim | X | X | — | —
conditions | — | 1 | X | X | — | —
FC-1 | FC | 512 | X | ✓ | Leaky ReLU | [input, conditions]
FC-2 | FC | 256 | X | X | Leaky ReLU | FC-1
FC-3 | FC | 128 | X | X | Leaky ReLU | FC-2
FC-4 | FC | 20 | X | X | Leaky ReLU | FC-3
FC-5 | FC | 128 | X | X | Leaky ReLU | FC-4
FC-6 | FC | 256 | X | X | Leaky ReLU | FC-5
FC-7 | FC | 512 | X | X | Leaky ReLU | FC-6
output | FC | input_dim | X | X | ReLU | FC-4

Hyper-parameters: Optimizer Adam; Learning Rate 0.001; Leaky ReLU slope 0.2; Batch Size 256; # of Epochs 1000.