AUGMENTATION OF MULTIMODAL TIME SERIES DATA FOR TRAINING MACHINE-LEARNING MODELS

20230045548 · 2023-02-09

    Abstract

    The present invention relates to training a predictive data-driven model for predicting an industrial time dependent process. A data-driven generative model is introduced for modelling and generating complex sequential data comprising multiple modalities, by learning a joint time-dependent representation of the different modalities. The model may be configured to handle any combination of missing modalities, which enables conditional generation based on known modalities, providing a high degree of control over the properties of the generated sequences.

    Claims

    1. A device for generating synthetic samples for expanding a training dataset of a predictive data-driven model for predicting an industrial time dependent process, comprising: an input unit; a processing unit; and an output unit; wherein the input unit is configured to receive historical data of at least one condition parameter indicative of a condition under which the industrial time dependent process took place and at least one KPI provided for quantifying the industrial time dependent process; wherein the processing unit is configured to apply a data-driven generative model to derive synthetic samples of the at least one condition parameter and the at least one KPI from the historical data, wherein the data-driven generative model is parametrized or trained based on a training dataset comprising real-data examples of the at least one condition parameter and the at least one KPI; and wherein the output unit is configured to provide the synthetic samples to the training dataset of the predictive data-driven model.

    2. The device according to claim 1, wherein the synthetic samples comprise a synthetic sequence representative of a time series of the at least one condition parameter and the at least one KPI.

    3. The device according to claim 2, wherein the data-driven generative model comprises an RNN-MVAE model with the at least one condition parameter and the at least one KPI as input and a synthetic sequence of the at least one condition parameter and the at least one KPI as output; wherein the RNN-MVAE model comprises a multimodal variational autoencoder, MVAE; wherein the MVAE comprises two recurrent neural networks, RNNs, that act as an encoder-decoder pair for the at least one condition parameter; and wherein the MVAE comprises two RNNs that act as an encoder-decoder pair for the at least one KPI.

    4. The device according to claim 2, wherein the data-driven generative model comprises a Seq-MVAE model with the at least one condition parameter and the at least one KPI as an initial input and a synthetic sample of the at least one condition parameter and the at least one KPI as output; wherein the Seq-MVAE model comprises a multimodal variational autoencoder, MVAE; wherein the MVAE comprises two feed forward neural networks, FFNNs, that act as an encoder-decoder pair for the at least one condition parameter; wherein the MVAE comprises two feed forward neural networks, FFNNs, that act as an encoder-decoder pair for the at least one KPI; wherein each decoder and encoder are coupled to a respective recurrent neural network, RNN; and wherein for each point in time the output of the Seq-MVAE is aggregated into a vector representative of the synthetic sequence.

    5. The device according to claim 3, wherein the RNN comprises at least one of: an echo state network, ESN; a gated recurrent unit, GRU, network; an ordinary differential equation, ODE, network; and a long short-term memory, LSTM, network.

    6. An apparatus for predicting an industrial time dependent process, comprising: an input unit; a processing unit; and an output unit; wherein the input unit is configured to receive currently measured data indicative of a current condition under which the industrial time dependent process currently takes place, wherein at least one key performance indicator, KPI, is provided for quantifying the industrial time dependent process; wherein the input unit is configured to receive at least one expected condition parameter indicative of a future condition under which the industrial time dependent process will take place within a prediction horizon; wherein the processing unit is configured to apply a predictive data-driven model to an input dataset comprising the currently measured data and the at least one expected condition parameter to estimate a future value of the at least one KPI within the prediction horizon, wherein the predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI; and wherein the output unit is configured to provide a prediction of the future value of at least one KPI within the prediction horizon which is usable for monitoring and/or controlling the industrial time dependent process.

    7. An apparatus for predicting an industrial time dependent process, comprising: an input unit; a processing unit; and an output unit; wherein the input unit is configured to receive previously measured data indicative of a past condition under which the industrial time dependent process took place, wherein at least one key performance indicator, KPI, is provided for quantifying the industrial time dependent process; wherein the input unit is configured to receive at least one condition parameter indicative of a current condition under which the industrial time dependent process currently takes place; wherein the processing unit is configured to apply a predictive data-driven model to an input dataset comprising the previously measured data and the at least one condition parameter to estimate a current value of the at least one KPI, wherein the predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI; and wherein the output unit is configured to provide a prediction of the current value of the at least one KPI which is usable for monitoring and/or controlling the industrial time dependent process.

    8. A method for generating synthetic samples for expanding a training dataset of a predictive data-driven model for predicting an industrial time dependent process, comprising: a) receiving, via an input channel, historical data of at least one condition parameter indicative of a condition under which the industrial time dependent process took place and at least one KPI provided for quantifying the industrial time dependent process; b) applying, via a processor, a data-driven generative model to generate synthetic samples of the at least one condition parameter and the at least one KPI from the historical data, wherein the data-driven generative model is parametrized or trained based on a training dataset comprising real-data examples of the at least one condition parameter and the at least one KPI; and c) providing, via an output channel, the synthetic samples to the training dataset of the predictive data-driven model.

    9. The method according to claim 8, wherein the synthetic samples comprise a synthetic sequence representative of a time series of the at least one condition parameter and the at least one KPI.

    10. The method according to claim 9, wherein the data-driven generative model comprises an RNN-MVAE model with the at least one condition parameter and the at least one KPI as input and a synthetic sequence of the at least one condition parameter and the at least one KPI as output; wherein the RNN-MVAE model comprises a multimodal variational autoencoder, MVAE; wherein the MVAE comprises two recurrent neural networks, RNNs, that act as an encoder-decoder pair for the at least one condition parameter; and wherein the MVAE comprises two RNNs that act as an encoder-decoder pair for the at least one KPI.

    11. The method according to claim 9, wherein the data-driven generative model comprises a Seq-MVAE model with the at least one condition parameter and the at least one KPI as an initial input and a synthetic sample of the at least one condition parameter and the at least one KPI as output; wherein the Seq-MVAE model comprises a multimodal variational autoencoder, MVAE; wherein the MVAE comprises two feed forward neural networks, FFNNs, that act as an encoder-decoder pair for the at least one condition parameter; wherein the MVAE comprises two feed forward neural networks, FFNNs, that act as an encoder-decoder pair for the at least one KPI; wherein each decoder and encoder is coupled to a respective recurrent neural network, RNN; and wherein for each point in time the output of the Seq-MVAE is aggregated into a vector representative of the synthetic sequence.

    12. A method for predicting an industrial time dependent process, comprising: a1) receiving, via an input channel, currently measured data indicative of a current condition under which the industrial time dependent process currently takes place, wherein at least one key performance indicator, KPI, is provided for quantifying the industrial time dependent process; b1) receiving, via the input channel, at least one expected condition parameter indicative of a future condition under which the industrial time dependent process will take place within a prediction horizon; c1) applying, via a processor, a predictive data-driven model to an input dataset comprising the currently measured data and the at least one expected condition parameter to estimate a future value of the at least one KPI within the prediction horizon, wherein the predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI; and d1) providing, via an output channel, a prediction of the future value of at least one KPI within the prediction horizon which is usable for monitoring and/or controlling the industrial time dependent process.

    13. A method for predicting an industrial time dependent process, comprising: a2) receiving, via an input channel, previously measured data indicative of a past condition under which the industrial time dependent process took place, wherein at least one key performance indicator, KPI, is provided for quantifying the industrial time dependent process; b2) receiving, via the input channel, at least one condition parameter indicative of a current condition under which the industrial time dependent process currently takes place; c2) applying, via a processor, a predictive data-driven model to an input dataset comprising the previously measured data and the at least one condition parameter to estimate a current value of the at least one KPI, wherein the predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI; and d2) providing, via an output channel, a prediction of the current value of the at least one KPI which is usable for monitoring and/or controlling the industrial time dependent process.

    14. A computer program product comprising a computer program with program code for performing a method according to claim 8.

    15. (canceled)

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0077] These and other aspects of the invention will be apparent from and elucidated further with reference to the embodiments described by way of examples in the following description and with reference to the accompanying drawings, in which

    [0078] FIG. 1 schematically shows a device for generating synthetic samples for expanding a training dataset of a predictive data-driven model for predicting an industrial time dependent process according to some embodiments of the present disclosure.

    [0079] FIGS. 2A to 2C show an example of an MVAE model, showing how all modalities are generated under different combinations of missing modalities.

    [0080] FIG. 3 illustrates a visualization of the Seq-MVAE architecture for a scenario commonly encountered in industrial dynamic processes.

    [0081] FIG. 4 shows a comparison of the forecasting performance of the RNN-MVAE, the Seq-MVAE, and the LSTM forecasting models.

    [0082] FIG. 5 shows the KPIs predicted by the LSTM model from the condition parameters (PCs) generated by the RNN-MVAE model, compared to the corresponding generated KPIs.

    [0083] FIG. 6 shows the KPIs predicted by the LSTM model from the PCs generated by the Seq-MVAE model, compared to the corresponding generated KPIs.

    [0084] FIG. 7 shows a comparison of the forecasting performance of the Seq-MVAE model versus that of an LSTM forecasting model for the real-world dataset.

    [0085] FIG. 8 shows the performance of a predictive LSTM trained on a training set augmented by different amounts of samples generated using different types of conditioning.

    [0086] FIG. 9 schematically shows an apparatus for predicting an industrial time dependent process according to some embodiments of the present disclosure.

    [0087] FIG. 10 schematically shows an apparatus for predicting an industrial time dependent process according to some other embodiments of the present disclosure.

    [0088] FIG. 11 shows a flow chart illustrating a method for generating synthetic samples for expanding a training dataset of a predictive data-driven model for predicting an industrial time dependent process according to some embodiments of the present disclosure.

    [0089] FIG. 12 shows a flow chart of a method for predicting an industrial time dependent process according to some embodiments of the present disclosure.

    [0090] FIG. 13 shows a flow chart of a method for predicting an industrial time dependent process according to some other embodiments of the present disclosure.

    [0091] It should be noted that the figures are purely diagrammatic and not drawn to scale. In the figures, elements which correspond to elements already described may have the same reference numerals. Examples, embodiments or optional features, whether indicated as non-limiting or not, are not to be understood as limiting the invention as claimed.

    DETAILED DESCRIPTION OF EMBODIMENTS

    [0092] Machine learning is increasingly applied in industrial applications. Machine learning requires a large amount of training data covering enough variation, as well as a sufficiently large set of test data to assess the quality of the trained model. This may be a major challenge in industrial applications, where the data is generated during production runs, which may limit the opportunity to gather more data. Depending on the length of a production cycle (e.g., months or years), gathering the training data may become even more challenging.

    [0093] For example, in forecasting of process behavior in the chemical industry, two problems may limit the performance of non-linear machine learning models on real-world datasets. The first is the overall small size of the training set for the real-world data, while the second is the slight difference in dynamics between the training and test sets caused by changes in the distribution of the process and/or storage condition data.

    [0094] The changes in process data distribution may be caused for example by: catalyst bed exchange in the reactor, changes in plant equipment, changes of feed concentration, etc.

    [0095] The problem of the differences between the training and test datasets may be difficult to overcome, since learning and modelling patterns and/or dynamics different from the training set is outside the scope of machine learning, and thus impossible to achieve for any machine-learning model without any further information about the test data.

    [0096] For this reason, data augmentation via a data-driven generative model is proposed to reduce the negative effects of the small size of the training set and/or the slight difference in dynamics between the training and test sets.

    [0097] FIG. 1 schematically illustrates an example of a device 10 for generating synthetic samples for expanding a training dataset of a predictive data-driven model for predicting an industrial time dependent process. In some examples, prediction of an industrial time dependent process may be used to identify whether a chemical substance, a component, an equipment, and/or a system is deviating from or will deviate from its typical behavior in the future. In some examples, prediction of an industrial time dependent process may be used to identify the off-the-shelf performance of a chemical substance, a component, an equipment, and/or a system.

    [0098] The device 10 comprises an input unit 12, a processing unit 14, and an output unit 16. The input unit 12, the processing unit 14, and the output unit 16 may be a software, or hardware dedicated to running said software, for delivering the corresponding functionality or service.

    [0099] Each unit may be part of, or include an ASIC, an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that execute one or more software or firmware programs, a combinational logical circuit, and/or other suitable components that provide the described functionality.

    [0100] The input unit 12 is configured to receive historical data of at least one condition parameter indicative of a condition under which the industrial time dependent process took place and at least one KPI provided for quantifying the industrial time dependent process. Examples of the condition may include an operating condition and a storage condition. The condition parameter may include operating parameters and/or storage parameters.

    [0101] The at least one KPI may be selected from parameters comprising: a parameter contained in a set of measured process and/or storage condition data, and/or a derived parameter representing a function of one or more parameters contained in such a set. The at least one KPI may be defined by a user (e.g., a process operator) or by a statistical model, e.g. an anomaly score measuring the distance to the "healthy" state of a chemical substance, component, equipment, and/or system in a multivariate space of relevant process and/or storage condition data, such as the Hotelling T² score or the DModX distance derived from principal component analysis (PCA). Here, the healthy state refers to the bulk of states typically observed during periods in the historic process and/or storage condition data that were labelled as "usual"/"unproblematic"/"good" by an expert for the production process.
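    As an illustration of such a PCA-based anomaly score, the Hotelling T² of a sample can be computed from its projections onto the principal components and the per-component variances estimated from the "healthy" data. A minimal sketch (the function name and inputs are illustrative, not part of the disclosure):

```python
def hotelling_t2(scores, eigenvalues):
    """Hotelling T^2 anomaly score of one sample, given its PCA scores t_i
    and the per-component variances lambda_i estimated from healthy data."""
    return sum(t * t / lam for t, lam in zip(scores, eigenvalues))
```

    A threshold on T² would then separate the healthy bulk of states from deviating ones; a sample lying far out along a low-variance component receives a high score.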

    [0102] The historical data may comprise data collected from similar or the same types of chemical substance, component, equipment, and/or system.

    [0103] The processing unit 14 is configured to apply a data-driven generative model to derive synthetic samples of the at least one condition parameter and the at least one KPI from the historical data. The data-driven generative model is parametrized or trained based on a training dataset comprising real-data examples of the at least one condition parameter and the at least one KPI.

    [0104] An example of the data-driven generative model is a latent variable generative model, e.g. a multimodal variational autoencoder (MVAE). A latent representation, i.e. a compressed feature vector, is generated with the help of neural networks suitable for handling time series, such as RNNs. Then, the latent representation is used to generate synthetic data by means of the data-driven generative model.

    [0105] In an example, the data-driven generative model may comprise an RNN-MVAE model with the at least one condition parameter and the at least one KPI as input and a synthetic sequence of the at least one condition parameter and the at least one KPI as output. The RNN-MVAE model may comprise a multimodal variational autoencoder (MVAE). The MVAE may comprise two recurrent neural networks (RNNs) that act as an encoder-decoder pair for the at least one condition parameter. The MVAE may comprise two RNNs that act as an encoder-decoder pair for the at least one KPI. The RNNs may comprise at least one of: an echo state network (ESN), a gated recurrent unit (GRU) network, an ordinary differential equation (ODE) network, and a long short-term memory (LSTM) network.

    [0106] In an example, the data-driven generative model may comprise a Seq-MVAE model with the at least one condition parameter and the at least one KPI as an initial input and a synthetic sample of the at least one condition parameter and the at least one KPI as output. The Seq-MVAE model comprises a multimodal variational autoencoder (MVAE). The MVAE comprises two feed forward neural networks (FFNNs) that act as an encoder-decoder pair for the at least one condition parameter. The MVAE comprises two feed forward neural networks (FFNNs) that act as an encoder-decoder pair for the at least one KPI. Each decoder and encoder are coupled to a respective recurrent neural network (RNN). For each point in time, the output of the Seq-MVAE is aggregated into a vector representative of the synthetic sequence. The RNNs may comprise at least one of: an echo state network (ESN), a gated recurrent unit (GRU) network, an ordinary differential equation (ODE) network, and a long short-term memory (LSTM) network.

    [0107] The output unit 16 is configured to provide the synthetic samples to the training dataset of the predictive data-driven model.

    [0108] In the following, we focus on the scenario of industrial aging process (IAP) forecasting, in particular, where the temporal evolution of a target KPI needs to be predicted based on a sequence of known process conditions represented by condition parameters. We draw upon insights from a series of variational autoencoders capable of modelling and generating data consisting of multiple modalities, to introduce a model capable of learning and generating truly multimodal time series under any combination of missing values. We evaluate the effectiveness of our generative model using two IAP datasets. The first is an artificial dataset, where the differential equation relating the process conditions to the KPI is known. This dataset provides the conditions to unambiguously evaluate how well our generative model captures the underlying process dynamics, by directly comparing the KPIs generated by our novel generative model to those obtained by applying the underlying differential equation. The second is a real-world dataset with a small number of sequences, which also exhibits a slight shift in dynamics between the training and test sets. Using this dataset, we once again obtain a way to unambiguously evaluate the effectiveness of our generative model, by observing how the predictive performance of a simple predictive model on the test set changes when the training set is augmented by different amounts of generated sequences, which have also been conditioned using different modalities.

    1. Background

    [0109] Variational autoencoders are generative models which use the variational inference scheme to approximate the marginal likelihood of the data, which is intractable. To bypass this problem, the evidence lower bound (ELBO) is maximized instead:


    ELBO(x) = E_{q_φ(z|x)}[λ log p_θ(x|z)] − β·KL(q_φ(z|x) ‖ p(z))  (1)

    [0110] Here KL(p, q) is the KL-divergence between two distributions, the parameters λ and β are balancing terms, and the distributions p and q are parametrized as encoding and decoding neural networks, allowing us to maximize the ELBO using gradient descent. In the context of a variational autoencoder, the first term in the ELBO (Eq. 1) represents the reconstruction error, while the second is used for regularization of the approximate posterior, ensuring that it is well behaved and enabling efficient sampling based on the prior p(z). The framework of multimodal VAEs (MVAEs) was developed in a series of models which attempt to learn a joint probability distribution over multiple modalities.
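    As a concrete sketch of Eq. 1 for a diagonal-Gaussian posterior and standard-normal prior, where the KL term has a closed form, and assuming the reconstruction log-likelihood log p_θ(x|z) has already been evaluated (function and argument names are illustrative):

```python
import math

def kl_diag_gaussian(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(s * s + m * m - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mu, sigma))

def elbo(log_px_given_z, mu, sigma, lam=1.0, beta=1.0):
    """Eq. 1: weighted reconstruction term minus beta-weighted KL term."""
    return lam * log_px_given_z - beta * kl_diag_gaussian(mu, sigma)
```

    With μ = 0 and σ = 1 the KL term vanishes, so the ELBO reduces to the weighted reconstruction log-likelihood.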

    [0111] First, we define multimodal data as a set X of N different modalities, x_1, x_2, …, x_N. The central assumption is that given a common latent variable z, the individual modalities are conditionally independent, meaning that it is possible to factorize the joint distribution as:


    p_θ(x_1, …, x_N, z) = p_θ(x_1|z) ⋯ p_θ(x_N|z) p(z).

    [0112] Having the joint distribution in this form means that we can ignore missing modalities when evaluating the marginal likelihood, making it possible to calculate the ELBO based only on the set of currently present modalities, given by X = {x_i | the i-th modality is present}:

    ELBO(X) = E_{q_φ(z|X)}[ Σ_{x_i ∈ X} λ_i log p_θ(x_i|z) ] − β·KL(q_φ(z|X) ‖ p(z))  (2)

    [0113] To handle the missing modalities, a naive implementation would have to define 2^N inference networks, one for each combination of missing and present modalities. This problem can be avoided thanks to the assumption of conditional independence of the modalities, which allows for the following approximation of the joint posterior:

    p(z | x_1, …, x_N) ≈ p(z) ∏_{i=1}^{N} q̃(z|x_i).

    [0114] This gives us the product of experts (PoE), including a prior expert, which as usual is taken to be the standard normal distribution. The PoE is used to combine the distributions of the N individual modalities into an approximate joint posterior. Given that the distributions of the individual modalities are all Gaussian, we can replace the 2^N multimodal inference networks otherwise required with an efficient computation based on the distributions given by the N uni-modal networks. For example, FIGS. 2A to 2C illustrate an example of an MVAE model, showing how all modalities are generated under different combinations of missing modalities. Finally, a latent representation is sampled from the joint distribution and is passed to the N independent decoder networks, which then generate their designated modality.
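    For Gaussian experts this PoE combination has a simple closed form per latent dimension: precisions add, and the joint mean is the precision-weighted average of the expert means. A one-dimensional sketch including the standard-normal prior expert (names are illustrative):

```python
import math

def product_of_experts(mus, sigmas):
    """Combine per-modality Gaussian posteriors N(mu_i, sigma_i^2) with a
    standard-normal prior expert into the joint Gaussian posterior
    (one latent dimension). Missing modalities are simply not passed in."""
    precision = 1.0       # prior expert N(0, 1): precision 1
    weighted_mean = 0.0   # prior contributes 0 to the weighted mean
    for mu, sigma in zip(mus, sigmas):
        p = 1.0 / (sigma * sigma)
        precision += p
        weighted_mean += p * mu
    var = 1.0 / precision
    return weighted_mean * var, math.sqrt(var)
```

    With no modalities present, the joint posterior falls back to the prior N(0, 1); each additional expert pulls the mean toward its own and shrinks the variance.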

    [0115] 2. Sequential Multimodal Variational Autoencoder

    [0116] The MVAE model presented in the previous section is only capable of generating single samples and needs to be adapted in order to generate sequential data. First, we present a straightforward way of generating sequential data with the MVAE model, by using RNNs as encoders and decoders, and argue why this approach is suboptimal. Next, we will introduce our Seq-MVAE model, which is an extension of the MVAE capable of generating multimodal time series one time-point at a time.

    [0117] 2.1. Using RNNs as Encoders and Decoders

    [0118] One possible extension of the MVAE towards sequential data would be to use RNNs as the encoder and decoder networks for the sequential modalities, producing a single joint posterior for the entire sequences of all modalities. This architecture is analogous to the one used in (Wu and Goodman, 2018), with similar architectures being used in other works dealing with the generation of sequences. We will call this model the RNN-MVAE.

    [0119] 2.1.1. RNN-MVAE Architecture

    [0120] We start off by using an RNN followed by fully connected layers to parametrize the variational approximate posterior of each sequential modality as follows:


    μ_i, σ_i = f_{i,φ}(RNN_{i,φ}(x_i(≤T))),  q_φ(z_i | x_i(≤T)) = N(μ_i, σ_i).

    [0121] This means that we obtain one approximate posterior representing an entire sequence of a given modality, so after the modality-specific distributions are combined using the PoE, the result is one joint posterior distribution for an entire multimodal sequence.

    q_φ(z|X) = N(0, I) ∏_{x_i ∈ X} q_φ(z_i | x_i(≤T)).

    [0122] Finally, the individual decoder networks, which are also RNNs, use the latent state z sampled from the joint posterior as an initial conditioning, either by including it as an initial hidden state or by including it with the input at every time step, after which they attempt to reconstruct the corresponding modalities:


    x̂_i(t) = RNN_{i,θ}(z; x̂_i(<t)),  z ∼ q_φ(z|X).
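    The encode-then-decode flow of the RNN-MVAE for one modality can be sketched with a toy one-unit recurrent cell; all weights here are arbitrary illustrative values, not the model's actual parameters:

```python
import math

def rnn_encode(seq, w_in=0.5, w_rec=0.3):
    """Stand-in for RNN_i,phi: fold a univariate sequence x_i(<=T)
    into a final hidden state."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def posterior_params(h, w_mu=1.0, w_logsigma=0.5):
    """Stand-in for the head f_i,phi: map the final hidden state to the
    posterior parameters (mu_i, sigma_i); exp keeps sigma positive."""
    return w_mu * h, math.exp(w_logsigma * h)

def rnn_decode(z, length, w_z=0.4, w_x=0.6):
    """Stand-in for RNN_i,theta: z conditions every step, and each generated
    sample x_hat_i(t) feeds back as input for the next step."""
    x_hat, prev = [], 0.0
    for _ in range(length):
        prev = math.tanh(w_z * z + w_x * prev)
        x_hat.append(prev)
    return x_hat
```

    In the full model the per-modality posteriors from `posterior_params` would first be combined via the PoE before a single z is sampled and passed to every decoder.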

    [0123] 2.1.2. Models and Training Hyperparameter Details

    [0124] For the Seq-MVAE model the encoders and decoders consist of fully connected networks with two hidden layers with a dimensionality of 128, while the RNNs in our case were LSTMs with one layer also with a dimensionality of 128. The size of the latent representations was 64, and we also shared weights across the network by using feature extractor layers of size 128 for each modality and the latent representation.

    [0125] For the RNN-MVAE we used LSTMs with a dimensionality of 512 for the encoders, decoders and for the latent representation, in order to allow for more information to be encoded into the hidden and latent states, which in this case would need to describe the entire dynamics of the sequences.

    [0126] Finally, as a forecasting model for predicting the KPI from the given PCs, we use an LSTM with a dimensionality of 512 for the artificial dataset and 128 for the real-world dataset. The generative models are trained with Adam with a learning rate of 10^−3, reducing it by a factor of 0.2 on plateau, and using early stopping based on the validation set. A batch size of 128 was used for the artificial dataset and 32 for the real-world dataset. The forecasting models are trained with stochastic gradient descent with Nesterov momentum, again with a learning rate of 10^−3 and a momentum of 0.95, with batch sizes of 32 and 16 for the artificial and real-world datasets, respectively. Learning rate adaptation and early stopping were employed in the same way as for the generative models.
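    The reported training settings can be collected into a configuration fragment; the values below are taken directly from the text, while the dictionary structure and key names are only illustrative:

```python
# Training hyperparameters as reported above (structure is illustrative).
GENERATIVE_TRAINING = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "lr_reduce_factor": 0.2,  # applied on validation plateau
    "early_stopping": True,   # based on the validation set
    "batch_size": {"artificial": 128, "real_world": 32},
}

FORECASTING_TRAINING = {
    "optimizer": "SGD with Nesterov momentum",
    "learning_rate": 1e-3,
    "momentum": 0.95,
    "lr_reduce_factor": 0.2,
    "early_stopping": True,
    "batch_size": {"artificial": 32, "real_world": 16},
}
```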

    [0127] The RNN-MVAE may have certain limitations, as the dynamics of many time series may be too complex to capture in just a single latent variable of the RNN-MVAE, and the sampling from the posterior as the single source of variability will likely make the RNN-MVAE struggle to recreate the temporal variability in the original time series, especially with longer sequences.

    [0128] 2.2 Generating Individual Multimodal Sequences in a Time Dependent Manner

    [0129] In order to make the generative model capable of reproducing the time dynamics of any multimodal sequence more accurately, the Seq-MVAE model uses the basic MVAE architecture to generate individual multimodal time samples one at a time, while using RNNs to maintain the temporal context and dependence across samples generated at different time points within each sequence. A visualization of the overall architecture for a scenario commonly encountered in industrial dynamic processes is given in FIG. 3, which is based on a simple example with related time series given as separate modalities, one of which is univariate and one of which is multivariate.

    [0130] To keep the notation more uniform we assume that all modalities are time series of length T; however, due to the independent handling of modalities it is easy to see that each modality can be a sequence of any length, which also includes non-sequential data as a special case. We first describe how the time dependent joint posterior is obtained. For each modality, given the current time sample of the modality x_i(t) along with the hidden state from the previous time point h_i(t−1), which is produced by the RNN used to maintain the time context for the given modality, the modality-specific, time dependent posteriors are obtained as follows:


    $$\mu_i(t), \sigma_i(t) = f_{i,\phi}\big(x_i(t), h_i(t-1)\big), \qquad q_\phi\big(z_i(t) \mid x_i(\le t), z(<t)\big) = \mathcal{N}\big(\mu_i(t), \sigma_i(t)\big).$$

    [0131] Instead of using a standard normal prior expert, in order to encode the temporal context for each modality we use a neural network dependent on the previous hidden state to obtain the prior mean and variance:


    $$\mu_{\mathrm{prior}}(t), \sigma_{\mathrm{prior}}(t) = f_\phi^{\mathrm{prior}}\big(h_1(t-1), \ldots, h_N(t-1)\big).$$

    [0132] Finally, the joint posterior for the current time point is obtained by using the PoE to combine the approximate posteriors of the individual modalities:

    [00004] $$q_\phi\big(z(t) \mid X(t), z(<t)\big) = \mathcal{N}\big(\mu_{\mathrm{prior}}(t), \sigma_{\mathrm{prior}}(t)\big) \prod_{x_i \in X(t)} q_\phi\big(z_i(t) \mid x_i(t), z(<t)\big).$$
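The product of experts has a closed form for Gaussian experts: precisions add, and the joint mean is the precision-weighted average of the expert means. A minimal sketch in plain Python (the function name `poe` and the toy numbers are ours, for illustration only):

```python
import math

def poe(means, stds):
    """Combine Gaussian experts (the prior plus one expert per present
    modality) via a product of experts: precisions add, and the joint
    mean is the precision-weighted average of the expert means."""
    precisions = [1.0 / (s * s) for s in stds]
    total_prec = sum(precisions)
    mu = sum(m * p for m, p in zip(means, precisions)) / total_prec
    sigma = math.sqrt(1.0 / total_prec)
    return mu, sigma

# Prior expert plus two modality posteriors for one latent dimension:
mu, sigma = poe([0.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

Dropping an entry from `means`/`stds` corresponds to leaving out a missing modality, which is what makes conditioning on arbitrary subsets of modalities straightforward.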

    [0133] The decoding process also needs to be modified in order to ensure that the generated sequences maintain the proper time dynamics. Once again we use N decoding networks f.sub.i,θ, one for each modality, which use the latent representation z(t) sampled from the joint posterior along with the hidden state h.sub.i(t−1) to generate a new time sample:


    $$\hat{x}_i(t) = f_{i,\theta}\big(z(t), h_i(t-1)\big), \qquad z(t) \sim q_\phi\big(z(t) \mid X(\le t), z(<t)\big).$$

    [0134] For our implementation we combine the different hidden states by simply taking their mean, thereby keeping the size of the prior network independent of the number of modalities. Using the generated time samples {circumflex over (x)}.sub.i(t) we use the RNNs to update the time context and calculate the new set of hidden states hi(t) which will be used for the generation of the subsequent time samples:


    $$h_i(t) = \mathrm{RNN}_i\big(x_i(t), z(t)\big).$$
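Putting the decoding and context-update steps together, one generation step per time point can be sketched as follows; the linear maps and the tanh update are toy stand-ins for the trained decoders f.sub.i,θ and RNNs, and all coefficients are illustrative assumptions rather than values from the model:

```python
import math
import random

def generate_sequence(T, n_modalities=2):
    """Toy sketch of the Seq-MVAE generation loop: at each time step we
    sample z(t) (here from a standard normal standing in for the learned
    prior), decode one new sample per modality, and then update each
    modality's hidden state for the next step."""
    h = [0.0] * n_modalities                 # hidden states h_i(t-1)
    seqs = [[] for _ in range(n_modalities)]
    for _ in range(T):
        z = random.gauss(0.0, 1.0)           # z(t) sampled once per step
        for i in range(n_modalities):
            x_hat = 0.5 * z + 0.3 * h[i]             # x_hat_i(t) = f_{i,theta}(z(t), h_i(t-1))
            h[i] = math.tanh(0.7 * x_hat + 0.2 * z)  # h_i(t) = RNN_i(x_hat_i(t), z(t))
            seqs[i].append(x_hat)
    return seqs

seqs = generate_sequence(T=10)
```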

    [0135] Finally, the multimodal ELBO in Eq. 2 is modified into a time dependent ELBO, calculating the loss for all modalities present, as shown in Eq. 3. This formulation of the ELBO allows for straightforward handling of missing values and differing sampling rates, by simply leaving out a modality at any time point where its value is missing or has not been sampled. Additionally, the Seq-MVAE can easily incorporate non-sequential modalities, simply by repeatedly providing them as a present modality after a number of time steps have passed; in the extreme, they would be included only once at the beginning of the sequence, or at every time point. For training, we recommend using the sub-sampling paradigm discussed in (Wu and Goodman, 2018), with the additional possibility of choosing which modalities to exclude either once per sequence or at every time point within the sequence.

    [00005] $$\mathrm{ELBO}\big(X(T)\big) = \mathbb{E}_{q_\phi(z(T) \mid X(T))}\Bigg[\sum_{t=1}^{T}\sum_{x_i \in X(t)} \lambda_i \log p_\theta\big(x_i(t) \mid z(t), x(<t)\big) - \beta\,\mathrm{KL}\Big(q_\phi\big(z(t) \mid X(t), z(<t)\big) \,\Big\|\, p\big(z(t) \mid z(<t)\big)\Big)\Bigg]. \quad (3)$$
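For univariate Gaussian likelihoods and posteriors, the time dependent ELBO of Eq. 3 can be sketched as below; missing modalities are encoded as `None` and simply skipped, mirroring the handling of missing values and differing sampling rates. The helper names and the fixed observation noise `sigma_x` are our own simplifications:

```python
import math

def gauss_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def kl_gauss(mu_q, s_q, mu_p, s_p):
    """KL( N(mu_q, s_q^2) || N(mu_p, s_p^2) ) for univariate Gaussians."""
    return math.log(s_p / s_q) + (s_q ** 2 + (mu_q - mu_p) ** 2) / (2 * s_p ** 2) - 0.5

def seq_elbo(X, recon, q_params, prior_params, lam=1.0, beta=1.0, sigma_x=1.0):
    """Time dependent ELBO sketch: per time step, sum the weighted
    log-likelihood over the modalities that are present (None marks a
    missing one) and subtract the KL between the joint posterior and the
    temporal prior."""
    elbo = 0.0
    for t, (x_t, xhat_t) in enumerate(zip(X, recon)):
        for x_i, xhat_i in zip(x_t, xhat_t):
            if x_i is None:          # modality missing / not sampled at t
                continue
            elbo += lam * gauss_logpdf(x_i, xhat_i, sigma_x)
        mu_q, s_q = q_params[t]
        mu_p, s_p = prior_params[t]
        elbo -= beta * kl_gauss(mu_q, s_q, mu_p, s_p)
    return elbo
```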

    [0136] 2.3 Conditioning on Different Modalities

    [0137] Another major advantage of the Seq-MVAE is its ability to sample from the joint posterior distribution conditioned on any combination of modalities. This enables us to condition the model's posterior on any provided modalities, regardless of whether they are sequential or non-sequential, giving a great degree of control over the properties of the multimodal sequences we want to generate for data augmentation, as well as making the model capable of missing-value imputation. New synthetic samples can be generated by sampling in the following ways: from the prior distribution with no modalities as input (1), or from the posterior conditioned on either the condition parameters (PCs) (2), the KPIs (3), or both (4). In the first case, the resulting synthetic sample will be a completely new, independent one, in which the functional relationship between the PCs and KPIs is nevertheless maintained. In the second case, the generated PCs should closely resemble the PCs used as input/conditioning, and the generative model will try to generate KPIs that are still properly functionally related to those PCs. This applies analogously to the third case, while in the fourth and final case, the generative model will attempt to produce synthetic samples that very closely resemble the PCs and KPIs given as input, with only small differences caused by the random sampling of the latent variable from the joint posterior.

    [0138] Conditioning the generative model on existing PCs or KPIs can be used to guide the generated samples depending on the task requirements. For example, suppose we want to increase the model accuracy within operating ranges that are seldom observed in reality, e.g. a low feed load in the plant (here, the feed load represents the condition parameter). Such a regime is mostly observed during the start-up of the plant, because most of the time the plant is operated at maximal feed load. This means we often do not have enough real data to train the model with such values of the condition parameters.

    [0139] Instead, we would feed the generative model these values of the condition parameters and obtain generated KPIs that simulate what happens in the plant under these conditions. Similarly, if we want to train our model for a particular type of KPI cycle (e.g. optimal or suboptimal production), one could give these types of KPI cycles to the generative model to generate the corresponding PCs.

    3. Experiments

    [0140] To evaluate our proposed generative model, we analyzed two examples from the chemical industry. Here, we focused on a particular case of industrial aging processes (IAPs), namely the deactivation of a heterogeneous catalyst due to coking, i.e., surface deposition of elementary carbon in the form of graphite. One of the most important features of such degradation processes is distinct memory effects, where the value of the inputs to the plant x(t), which we will refer to as condition parameters (PCs), affects the output y(t′), as measured by some key performance indicators (KPIs), at much later time points t′>t. Therefore, the catalyst deactivation can be observed only on long-term timescales, which makes such processes very challenging to model using mechanistic models, i.e. as sets of differential equations describing the degradation process. Given enough historical data, we can instead use machine learning to model the degradation process in a data-driven manner. However, data acquisition in real-world chemical plants is highly expensive, leading to a lack of historical data for training. Additionally, covariate shifts can often occur due to the sensitive and changing conditions within the plant itself, which makes this an excellent scenario for testing the effectiveness of data augmentation using generative modelling. In this work, we consider two datasets. The first dataset represents artificial data, generated using a mechanistic model meant to simulate a degradation process. The second dataset contains real-world data from a large-scale plant at BASF.

    [0141] 3.1 Artificial Dataset

    [0142] The reason for working with artificial data based on a deterministic mechanistic model is that we know the exact functional relationship between the PCs x(t) and the KPIs y(t). Since we expect our generative model to be able to learn this relationship, we can use the mechanistic model as a ground truth to evaluate the performance of the generative model.

    [0143] For our artificial use case, we analyzed an example of catalyst deactivation in a continuously operated fixed-bed reactor. The catalyst deactivation over time causes unacceptable conversion rates in the reaction process, requiring catalyst regeneration or replacement. This process step characterizes the end of one cycle.

    [0144] Based on the current operating conditions of the process and the unobservable state variable of the system (here, the catalyst activity), we used a mechanistic model to generate a multivariate time series [x(t), y(t)] for roughly 1000 degradation cycles, which represents 25 years of historical data. The final artificial dataset is composed of 6 PCs x(t) and one KPI y(t), which in this case is the conversion rate. The catalyst activity A(t) is an unobservable state variable and therefore not part of the dataset. It is important to note that the system output y(t) is not only affected by current process parameters x(t), but also the catalyst activity A(t), which decreases non-linearly over each cycle.

    [0145] 3.2. Real-World Dataset

    [0146] This dataset is five times smaller than the full artificial dataset and contains process and/or storage condition data for the production of aldehyde (ALD) in a continuous large-scale production plant at BASF. Here, we give only a brief description of the process. In this case as well, the catalyst in the reactor suffers from coking, which leads to a reduction of catalyst activity and increasing fluid resistance. The latter can be measured as an increasing pressure drop over the reactor (Δp). The real-world dataset consists of 12 PCs x(t) and one KPI y(t) and contains seven years of process and/or storage condition data with 336 degradation cycles belonging to three different catalyst batches. Each catalyst batch exhibits slightly different dynamics owing to small differences between the catalysts in each batch. The input dataset contains four directly measured variables, with the additional eight variables representing engineered features.

    [0147] 3.3. Models and Training

    [0148] For both chemical plant datasets, we separate the data into two sequential modalities, one modality containing all of the PCs for a cycle and the other containing the KPI. The reason why we cannot split the PCs into separate modalities is the PoE's assumption of conditional independence between the modalities. Even though the PCs are independent on their own, they are not conditionally independent given the differential equations governing the process dynamics and the hidden catalyst activity, which our latent variable z represents. Knowing the differential equations and the state of the catalyst activity, having information about some of the PCs allows us to infer possible values for the missing ones.

    [0149] We use the same Seq-MVAE and RNN-MVAE models across all experiments and keep the model sizes relatively small to reduce the risk of overfitting due to the small dataset sizes. Additionally, as a forecasting model for predicting the KPI from the given PCs, we use a single-layer LSTM.

    [0150] For training, the datasets are split into a training, validation and test set, with ratios of 0.8, 0.1, 0.1 for the artificial dataset and 0.68, 0.07, 0.25 for the real-world dataset. In the real-world dataset, data from two of the three catalyst batches is shuffled into the training and validation sets, while the test set contains data exclusively from the third batch, producing the aforementioned covariate shift.
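The batch-wise split that produces the covariate shift can be sketched as follows; this is a minimal sketch in which the function and argument names are ours and the validation fraction is a placeholder:

```python
import random

def split_by_batch(cycles, batch_ids, test_batch, val_frac=0.1, seed=0):
    """Hold out every cycle from one catalyst batch as the test set and
    shuffle the remaining cycles into training and validation sets, so
    the test set comes from a batch never seen during training."""
    test = [c for c, b in zip(cycles, batch_ids) if b == test_batch]
    rest = [c for c, b in zip(cycles, batch_ids) if b != test_batch]
    rng = random.Random(seed)
    rng.shuffle(rest)
    n_val = int(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test  # train, val, test

train, val, test = split_by_batch(list(range(10)),
                                  [0] * 4 + [1] * 3 + [2] * 3, test_batch=2)
```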

    [0151] As mentioned previously, for the generative models we use the semi-supervised procedure, where the modalities were removed as entire sequences rather than one time point at a time.

    4. Evaluation

    [0152] 4.1 Artificial Dataset

    [0153] Forecasting. First, we examine how the generative models perform in the degradation forecasting scenario described in Section 3, where the curve of a KPI describing catalyst degradation over time is predicted based on the sequence of PCs used to control the plant. Neither the Seq-MVAE nor the RNN-MVAE model is trained for forecasting: during the semi-supervised training procedure the loss for the missing modalities is not calculated, so the model is never explicitly trained to predict the missing modalities based on the present ones. Still, a good generative model should accurately capture the relationship between the PCs and KPIs and perform reasonably well compared to a dedicated forecasting model. In FIG. 4 we see a comparison of the forecasting performance of the simple RNN-MVAE, the Seq-MVAE and the LSTM forecasting models. As expected, the RNN-MVAE fails to capture the within-sequence dynamics and only predicts an average degradation curve, with an RMSE of 3.44. On the other hand, both the LSTM and the Seq-MVAE models predict the course of the KPI accurately, with the error of the Seq-MVAE, at 1.12, slightly higher than that of the dedicated forecasting model. The differences in accuracy between the Seq-MVAE and the forecasting LSTM are likely owing to the training procedure, which does not prioritize accurate forecasts. Modifying the training procedure to calculate the loss for the excluded modalities would likely increase the performance, but then the procedure would no longer be applicable to datasets with actual missing values.

    [0154] Modelling the differential equation. A major advantage of the artificial dataset is that the relation between the PCs and KPIs is known exactly, so we can exploit this fact to examine how well the generative models reproduce the relation between these two modalities. We do this by using a generative model to produce the two sequential modalities, then giving the generated PCs as input to the mechanistic model to obtain the true KPIs corresponding to these generated PCs, and finally calculating the RMSE between the true KPIs and the generated ones to obtain the modelling error, which captures how well the generative models can reproduce the dynamics of the mechanistic model. This setting allows us to evaluate the model in more detail than the forecasting scenario, since we can also evaluate how accurately the multimodal sequences generated with different types of conditioning, including entirely new sequences generated with no conditioning, capture the underlying dynamics defined by the mechanistic model.
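The modelling-error metric described above can be sketched in a few lines; `mechanistic_model` is assumed to be any callable mapping a PC sequence to the true KPI sequence, and the running-mean toy model below is purely illustrative:

```python
import math

def modelling_error(generated_pcs, generated_kpis, mechanistic_model):
    """Feed the generated PCs through the ground-truth mechanistic model
    and return the RMSE between its true KPIs and the generated KPIs."""
    true_kpis = mechanistic_model(generated_pcs)
    n = len(true_kpis)
    return math.sqrt(sum((t - g) ** 2
                         for t, g in zip(true_kpis, generated_kpis)) / n)

# Toy "mechanistic model": the KPI is the running mean of a univariate PC.
toy_model = lambda pcs: [sum(pcs[: i + 1]) / (i + 1) for i in range(len(pcs))]
err = modelling_error([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], toy_model)
```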

    [0155] The results are shown in FIGS. 5 and 6 for the RNN-MVAE and Seq-MVAE, respectively. They clearly show that the quality of the samples generated by the Seq-MVAE model is much higher than that of the RNN-MVAE. The RNN-MVAE sequences display virtually no internal dynamics beyond the long-term degradation trend, whereas the Seq-MVAE samples show dynamics close to those of the true artificial dataset, with the modelling error being significantly smaller for the Seq-MVAE in all but the case where the models are conditioned on the KPIs.

    [0156] It is precisely this case of conditioning the models on the KPIs that is interesting, since the largest modelling errors for both models are found with this type of conditioning. The reason is that one particular cycle of KPIs can be produced by many different combinations of PCs, making this case degenerate. Since there is no single set of PCs corresponding to a given KPI, any model of this type is likely to struggle when generating the PCs, so a larger error is no surprise. Still, we can see that for many sequences the modelling error of the Seq-MVAE is small, with the larger average error being driven by a subset of samples where the dynamics are captured particularly poorly. It is also interesting to see that the Seq-MVAE captures the model dynamics very accurately when generating sequences without any conditioning, with the modelling error being almost as small as when conditioning on both the KPIs and the PCs.

    [0157] These results clearly show the advantages of the Seq-MVAE architecture over the RNN-MVAE for highly dynamic sequential data. The modelling error is significantly lower in all but the degenerate case, and the within-sequence dynamics are reproduced in a manner close to those of the artificial dataset itself.

    [0158] 4.2. Real-World Dataset

    [0159] Forecasting. As with the artificial dataset, we first evaluate how the Seq-MVAE model performs on the IAP forecasting task compared to the forecasting LSTM model. The results are shown in FIG. 7, where, unlike for the artificial dataset, we can see that the Seq-MVAE outperforms the LSTM model. In the case of small data and covariate shift, the semi-supervised training procedure for the Seq-MVAE turns out to be an advantage. Since the Seq-MVAE is not directly trained to forecast the KPI from the PCs, it does not overfit to the training set as much as the LSTM model, leading to better performance on the test set.

    [0160] Data augmentation. The main goal of developing the Seq-MVAE is to use the generated data for data augmentation on small datasets, which is why we evaluate the generative model by measuring how much the regression performance of the baseline forecasting LSTM model improves when augmenting the training dataset with data from the Seq-MVAE. We added different amounts of generated samples to the training dataset, generated using all four types of conditioning, and retrained and re-evaluated the predictive model 20 times for each setting to obtain more stable estimates of the performance changes due to data augmentation. FIG. 8 shows the results of these evaluations. Performance significantly increases across all types of conditioning after adding 100 generated samples, reaching its maximum between 350 and 500 added samples. This clearly demonstrates that data augmentation using our Seq-MVAE model can greatly increase predictive performance in the face of small amounts of data and differences between the training and test sets. Further increasing the amount of generated data seems to start degrading performance again, meaning that a certain balance between real and generated samples is needed to achieve optimal performance.
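The augmentation study above amounts to a loop over augmentation sizes with repeated retraining; in the sketch below, `generate(n)` and `fit_eval(dataset)` are assumed callables standing in for the Seq-MVAE sampler and for training plus test evaluation of the forecasting LSTM:

```python
import statistics

def augmentation_curve(train, generate, fit_eval,
                       sizes=(0, 100, 350, 500), repeats=20):
    """For each amount of synthetic data, retrain the forecasting model
    `repeats` times on the original training set plus freshly generated
    samples and average the resulting test errors."""
    results = {}
    for n in sizes:
        errors = [fit_eval(train + generate(n)) for _ in range(repeats)]
        results[n] = statistics.mean(errors)
    return results

# Toy stand-ins that demonstrate the bookkeeping only:
res = augmentation_curve([0.0] * 10, lambda n: [0.0] * n,
                         lambda data: 1.0 / len(data),
                         sizes=(0, 100), repeats=3)
```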

    [0161] The best overall performance is achieved when generating from the posterior conditioned on the KPIs and from the unconditioned model, with the RMSE reaching 5.19 compared to an RMSE of 5.98 for the LSTM model trained only on the original training data, a 13% improvement in performance. A plausible reason for the improved performance of these two types of conditioning is that they produce data with higher variability: the KPI-conditioned model has a weak conditioning that allows for the generation of different sets of PCs for the same KPI, while the unconditioned model is free to generate completely new samples. Another reason for the improved performance with samples from the KPI-conditioned model is that, as seen in FIG. 6, the LSTM forecast model struggles to properly predict the quick rise of the KPI values at the end of each cycle. When conditioned on the KPIs, the Seq-MVAE generates KPIs that are very similar to the ones used for conditioning, which results in the forecasting model seeing many more examples of such steep rises of the KPI values with different accompanying PCs, making it learn to generalize and predict the exponential growth of the KPI more accurately. Another observation is that the two best-performing types of conditioning also exhibit the smallest reduction in error on the original training dataset, once again showing that the samples they produce are more diverse, but not so different that they would also cause an increase in the error.

    [0162] Finally, we also examine how the predictive LSTM performs on the test set when trained exclusively on data generated by the Seq-MVAE. To make a fair comparison, for each type of conditioning we generated 256 samples, the same number as in the training set, and trained the LSTM with the same hyperparameters as with the original training set. We repeated this procedure 20 times, each time generating a new set of sequences, to get a stable estimate of the error. The results are presented in Table 1. Interestingly, we largely see the same pattern as with data augmentation, with training on the generated samples conditioned on the KPI leading to the best performance, even outperforming the original training dataset. The data generated without conditioning once again has the second-lowest error on the test set, with a performance similar to the original training set, while conditioning on the PCs or on both modalities once again leads to less diverse data with lower errors on the original training set but high errors on the test set. This experiment confirms our conclusions from the data augmentation experiments: having control over the properties of the generated data allows us to produce diverse data that still maintains the relation between the modalities, which is crucial for improving performance in scenarios with small datasets and/or covariate shifts.

    TABLE 1

                      original   conditioning
                      data       all     PCs     KPI     none
      test set        5.98       6.38    6.79    5.57    6.00
      training set    3.70       4.00    5.04    5.44    5.40

    [0163] Accordingly, the Seq-MVAE generative model for multimodal time series is also compatible with missing data and non-sequential modalities. The model is also capable of generating data conditioned on known modalities, providing a high degree of control over the types of data being generated. We use this model to tackle a challenging machine learning scenario encountered in many datasets from real world processes, where data is scarce and subject to covariate shifts.

    [0164] Taking the problem of forecasting industrial aging processes as a case study, we show that our generative model is capable of learning and recreating the temporal dynamics within and between the different modalities, and show that controlling the properties of the data being generated is crucial to achieving the best improvement in performance with data augmentation.

    [0165] FIG. 9 schematically shows an apparatus 100a for predicting an industrial time dependent process, in particular for predicting a future value of KPI(s).

    [0166] The apparatus 100a comprises an input unit 110a, a processing unit 120a, and an output unit 130a. The input unit 110a, the processing unit 120a, and the output unit 130a may be software, or hardware dedicated to running said software, for delivering the corresponding functionality or service. Each unit may be part of, or include, an ASIC, an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.

    [0167] The input unit 110a is configured to receive currently measured data indicative of a current condition under which the industrial time dependent process currently takes place. At least one key performance indicator (KPI) is provided for quantifying the industrial time dependent process.

    [0168] The at least one KPI may be selected in dependence on the use cases. Taking the industrial aging process as an example, despite the large variety of affected asset types in a chemical production plant, and the completely different physical or chemical degradation processes that underlie them, the selected parameters representing the one or more degradation KPIs may have at least one of the following characteristics:

    [0169] On a time scale longer than a typical production time scale, e.g., the batch time for discontinuous processes or the typical time between set point changes for continuous processes, the selected parameters change substantially monotonically to a higher or lower value, thereby indicating the occurrence of an irreversible degradation phenomenon. The term “monotonic”, or “monotonically”, means that the selected parameters representing the degradation KPIs either increase or decrease on a longer time scale, e.g., the time scale of the degradation cycle, and the fluctuations on a shorter time scale do not affect this trend. On shorter time scales, the selected parameters may exhibit fluctuations that are not driven by the degradation process itself, but rather by varying condition parameters or background variables such as the ambient temperature. In other words, the one or more degradation KPIs are to a large extent determined by the condition parameters, and not by uncontrolled, external factors, such as the bursting of a flawed pipe, varying outside temperature, or varying raw material quality.
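The "substantially monotonic on the longer time scale" property can be checked heuristically by smoothing away short-term fluctuations first; the window size and the 0.9 threshold below are illustrative assumptions of ours, not values from the description:

```python
def is_degradation_kpi(values, window=24):
    """Heuristic sketch: smooth the series with a moving average so that
    short-term fluctuations drop out, then require that nearly all
    smoothed steps move in one direction and that there is net drift."""
    if len(values) < 2 * window:
        return False
    smooth = [sum(values[i: i + window]) / window
              for i in range(len(values) - window + 1)]
    diffs = [b - a for a, b in zip(smooth, smooth[1:])]
    increasing = sum(d > 0 for d in diffs)
    # "Substantially monotonic": most smoothed steps move one way,
    # in either direction, and the series actually drifts overall.
    frac = max(increasing, len(diffs) - increasing) / len(diffs)
    return frac > 0.9 and abs(smooth[-1] - smooth[0]) > 0
```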

    [0170] The selected parameters may return to their baseline after a regeneration phase. As used herein, the term “regeneration” may refer to any event/procedure that reverses the degradation, including exchange of process equipment or catalyst, cleaning of process equipment, in-situ re-activation of catalyst, burn-off of cokes layers, etc.

    [0171] The input unit 110a is configured to receive at least one expected condition parameter indicative of a future condition under which the industrial time dependent process will take place within a prediction horizon.

    [0172] The condition parameter may include e.g. operating parameters and/or storage parameters.

    [0173] The at least one expected condition parameter may be known and/or controllable over the prediction horizon, as opposed to uncontrolled, external factors. Examples of uncontrolled, external factors include catastrophic events, such as the bursting of a flawed pipe. Further examples include less catastrophic, but more frequent external disturbances, such as a varying outside temperature or varying raw material quality. In other words, the one or more expected condition parameters may be planned or anticipated over the prediction horizon.

    [0174] The processing unit 120a is configured to apply a predictive data-driven model to an input dataset comprising the currently measured data and the at least one expected condition parameter to estimate a future value of the at least one KPI within the prediction horizon. The predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI, which are optionally provided by a device as described above.

    [0175] In other words, an apparatus is proposed for predicting an industrial time dependent process, such as a time dependent process in a chemical production plant, based on a data driven model. The data driven model is trained using real world data and synthetic data. The synthetic data is derived from historical data and represents the correlations in the historical data. The synthetic data is generated with the help of neural networks, such as RNN-MVAEs or Seq-MVAEs. The synthetic data thus increases the span of the training set. The increased span of the training set can help us bridge the gap between the training and test sets and thus improve the generalization performance of the predictive data driven model.

    [0176] The output unit 130a is configured to provide a prediction of the future value of at least one KPI within the prediction horizon which is usable for monitoring and/or controlling the industrial time dependent process.

    [0177] As application examples, the method may be used to predict and forecast at least one of the following degradation processes in a chemical production plant: deactivation of heterogeneous catalysts due to coking, sintering, and/or poisoning; plugging of a chemical process equipment on process side due to coke layer formation and/or polymerization; fouling of a heat exchanger on water side due to microbial and/or crystalline deposits; and erosion of an installed equipment in a fluidized bed reactor. Further application examples may include load forecasting and battery discharge forecasting.

    [0178] FIG. 10 schematically shows an apparatus 100b for predicting an industrial time dependent process, in particular for predicting a current value of KPI(s).

    [0179] The apparatus 100b comprises an input unit 110b, a processing unit 120b, and an output unit 130b. The input unit 110b, the processing unit 120b, and the output unit 130b may be software, or hardware dedicated to running said software, for delivering the corresponding functionality or service. Each unit may be part of, or include, an ASIC, an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.

    [0180] The input unit 110b is configured to receive previously measured data indicative of a past condition under which the industrial time dependent process took place. At least one key performance indicator, KPI, is provided for quantifying the industrial time dependent process.

    [0181] The input unit 110b is configured to receive at least one condition parameter indicative of a current condition under which the industrial time dependent process currently takes place.

    [0182] The processing unit 120b is configured to apply a predictive data-driven model to an input dataset comprising the previously measured data and the at least one condition parameter to estimate a current value of the at least one KPI. The predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI, which are optionally provided by a device as described above.

    [0183] In other words, an apparatus is proposed for predicting an industrial time dependent process, such as the off-the-shelf performance of an enzyme, based on a data driven model. The data driven model is trained using real world data and synthetic data. The synthetic data is derived from historical data and represents the correlations in the historical data. The synthetic data is generated with the help of neural networks, such as RNN-MVAEs or Seq-MVAEs. The synthetic data thus increases the span of the training set. The increased span of the training set can help us bridge the gap between the training and test sets and thus improve the generalization performance of the predictive data driven model.

    [0184] The output unit 130b is configured to provide a prediction of the current value of at least one KPI which is usable for monitoring and/or controlling the industrial time dependent process.

    [0185] As application examples, the method may be used to predict off-the-shelf performance of a chemical substance (e.g., enzyme), component (e.g., battery), equipment, and/or system.

    [0186] FIG. 11 shows a flow chart illustrating a method 200 for generating synthetic samples for expanding a training dataset of a predictive data-driven model for predicting an industrial time dependent process.

    [0187] In step 210, historical data of at least one condition parameter indicative of a condition under which the industrial time dependent process took place and at least one KPI provided for quantifying the industrial time dependent process are received via an input channel.

    [0188] In step 220, i.e. step b), a data-driven generative model is applied, via a processor, to generate synthetic samples of the at least one condition parameter and the at least one KPI from the historical data. The data-driven generative model is parametrized or trained based on a training dataset comprising real-data examples of the at least one condition parameter and the at least one KPI.

    [0189] In some examples, the synthetic samples may comprise a synthetic sequence representative of a time series of the at least one condition parameter and the at least one KPI.

    [0190] In some examples, the data-driven generative model may comprise a Seq-MVAE model with the at least one condition parameter and the at least one KPI as an initial input and a synthetic sample of the at least one condition parameter and the at least one KPI as output. The Seq-MVAE model comprises a multimodal variational autoencoder (MVAE). The MVAE comprises two feed forward neural networks (FFNNs) that act as an encoder-decoder pair for the at least one condition parameter. The MVAE comprises two feed forward neural networks (FFNNs) that act as an encoder-decoder pair for the at least one KPI. Each decoder and encoder are coupled to a respective recurrent neural network (RNN). For each point in time the output of the Seq-MVAE is aggregated into a vector representative of the synthetic sequence.
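The structure described in this example can be summarized in code. The sketch below is a toy, untrained stand-in: trivial linear maps replace the FFNN encoder-decoder pairs, a tanh update replaces the RNNs, and all coefficients are illustrative assumptions rather than trained parameters:

```python
import math
import random

class SeqMVAESketch:
    """Structural sketch of the Seq-MVAE: one encoder/decoder pair per
    modality (e.g. PCs and KPI), each coupled to a recurrent hidden
    state, with the joint posterior formed as a product of Gaussian
    experts."""

    def __init__(self, n_modalities=2):
        self.h = [0.0] * n_modalities  # one hidden state per modality

    def encode(self, x, i):
        # FFNN encoder stand-in -> posterior (mean, std) for modality i
        return 0.5 * x + 0.2 * self.h[i], 1.0

    def decode(self, z, i):
        # FFNN decoder stand-in -> reconstructed sample for modality i
        return 0.5 * z + 0.2 * self.h[i]

    def step(self, xs):
        """One time step: encode present modalities (None = missing),
        combine by product of experts, sample z(t), decode every
        modality, then update the hidden states."""
        experts = [(0.0, 1.0)] + [self.encode(x, i)
                                  for i, x in enumerate(xs) if x is not None]
        prec = [1.0 / (s * s) for _, s in experts]
        mu = sum(m * p for (m, _), p in zip(experts, prec)) / sum(prec)
        z = random.gauss(mu, math.sqrt(1.0 / sum(prec)))
        out = [self.decode(z, i) for i in range(len(self.h))]
        self.h = [math.tanh(0.7 * o + 0.2 * z) for o in out]
        return out

model = SeqMVAESketch()
out = model.step([1.0, None])  # KPI modality missing at this time point
```

At each time point the per-modality outputs can then be aggregated into one vector representing the synthetic multimodal sample.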

    [0191] In some examples, the data-driven generative model may comprise an RNN-MVAE model with the at least one condition parameter and the at least one KPI as input and a synthetic sequence of the at least one condition parameter and the at least one KPI as output. The RNN-MVAE model comprises a multimodal variational autoencoder (MVAE). The MVAE comprises two recurrent neural networks (RNNs) that act as an encoder-decoder pair for the at least one condition parameter. The MVAE comprises two RNNs that act as an encoder-decoder pair for the at least one KPI.
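For illustration only, the role of an RNN encoder in the RNN-MVAE variant, folding a time series of one modality into a fixed summary that the MVAE maps to a latent distribution, can be sketched with a single-unit Elman recurrence. The weights below are illustrative constants, not trained values:

```python
import math

def rnn_encode(sequence, w_in=0.5, w_rec=0.8, h0=0.0):
    """Minimal single-unit Elman RNN: folds a time series of one
    condition parameter (or KPI) into a final hidden state. In the
    RNN-MVAE, such a state would parametrize the modality's
    contribution to the joint latent distribution."""
    h = h0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Usage sketch: summarize a short condition-parameter series.
summary = rnn_encode([0.2, 0.5, 0.9])
```

The decoder RNN of the pair runs the recurrence in the generative direction, emitting one value of the synthetic sequence per time step.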

    [0192] In step 230, the synthetic samples are provided, via an output channel, to the training dataset of the predictive data-driven model.

    [0193] FIG. 12 shows a flow chart of a method 300a for predicting an industrial time dependent process.

    [0194] In step 310a, i.e. step a1), currently measured data indicative of a current condition under which the industrial time dependent process currently takes place is received via an input channel. At least one key performance indicator (KPI) is provided for quantifying the industrial time dependent process.

    [0195] In step 320a, i.e. step b1), at least one expected condition parameter indicative of a future condition under which the industrial time dependent process will take place within a prediction horizon is received via the input channel.

    [0196] In step 330a, i.e. step c1), a predictive data-driven model is applied by a processor to an input dataset comprising the currently measured data and the at least one expected condition parameter to estimate a future value of the at least one KPI within the prediction horizon. The predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI and synthetic samples of the at least one condition parameter and the at least one KPI, which are optionally provided by a method as described above.
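As a hedged illustration of step c1), the sketch below assembles the input dataset from the currently measured data and the expected condition parameters and applies a stand-in linear predictor. The feature layout, weights, and function name are assumptions for the example, not the trained predictive data-driven model itself:

```python
def predict_kpi(current_measurements, expected_conditions, weights, bias=0.0):
    """Apply a stand-in linear model to the concatenation of currently
    measured data and the expected condition parameters for the
    prediction horizon, returning an estimated future KPI value."""
    features = list(current_measurements) + list(expected_conditions)
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    return sum(w * f for w, f in zip(weights, features)) + bias

# Usage sketch with illustrative numbers:
future_kpi = predict_kpi([1.0, 2.0], [3.0], weights=[0.5, 0.5, 1.0], bias=0.1)
```

Any trained regressor could take the place of the linear stand-in; the point is only the shape of the input dataset.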

    [0197] In step 340a, i.e. step d1), a prediction of the future value of the at least one KPI within the prediction horizon, which is usable for monitoring and/or controlling the industrial time dependent process, is provided via an output channel.

    [0198] FIG. 13 shows a flow chart of a method 300b for predicting an industrial time dependent process.

    [0199] In step 310b, i.e. step a2), previously measured data indicative of a past condition under which the industrial time dependent process took place is received via an input channel. At least one key performance indicator (KPI) is provided for quantifying the industrial time dependent process.

    [0200] In step 320b, i.e. step b2), at least one condition parameter indicative of a current condition under which the industrial time dependent process currently takes place is received via the input channel.

    [0201] In step 330b, i.e. step c2), a predictive data-driven model is applied by a processor to an input dataset comprising the previously measured data and the at least one condition parameter to estimate a current value of the at least one KPI. The predictive data-driven model is parametrized or trained according to a training dataset comprising historical data of the at least one condition parameter and the at least one KPI, and synthetic samples of the at least one condition parameter and the at least one KPI, which are optionally provided by a method as described above.

    [0202] In step 340b, i.e. step d2), a prediction of the current value of the at least one KPI, which is usable for monitoring and/or controlling the industrial time dependent process, is provided via the output channel.

    [0203] It will be appreciated that the above operations may be performed in any suitable order, e.g., consecutively, simultaneously, or a combination thereof, subject to, where applicable, a particular order being necessitated, e.g., by input/output relations.

    [0204] The present techniques may be implemented as a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.

    [0205] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

    [0206] Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

    [0207] Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some examples, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present disclosure.

    [0208] Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to aspects of the present disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

    [0209] These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

    [0210] The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

    [0211] The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various aspects of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

    [0212] It has to be noted that embodiments of the invention are described with reference to different subject matters. In particular, some embodiments are described with reference to method type claims whereas other embodiments are described with reference to the device type claims. However, a person skilled in the art will gather from the above and the following description that, unless otherwise notified, in addition to any combination of features belonging to one type of subject matter also any combination between features relating to different subject matters is considered to be disclosed with this application. However, all features can be combined providing synergetic effects that are more than the simple summation of the features.

    [0213] While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive. The invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the dependent claims. In the claims, the word “comprising” does not exclude other elements or steps, and the indefinite article “a” or “an” does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. Any reference signs in the claims should not be construed as limiting the scope.

    REFERENCES

    [0214] Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, and Yoshua Bengio. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems, pages 2980-2988, 2015.

    [0215] Mike Wu and Noah Goodman. Multimodal generative models for scalable weakly-supervised learning. In Advances in Neural Information Processing Systems, pages 5575-5585, 2018.