OPTIMISATION OF PROCESSES FOR THE PRODUCTION OF CHEMICAL, PHARMACEUTICAL AND/OR BIOTECHNOLOGICAL PRODUCTS

Abstract

A computer-implemented method of determining at least one recipe for a production process to produce a chemical, pharmaceutical and/or biotechnological product is provided, wherein the production process is defined by a plurality of steps specified by recipe parameter(s) controlling an execution of the production process and a recipe comprises the plurality of steps defining the production process.

Claims

1-15. (canceled)

16. A computer-implemented method of determining at least one recipe for a production process to produce a chemical, pharmaceutical and/or biotechnological product, wherein the production process is defined by a plurality of steps specified by recipe parameter(s) controlling an execution of the production process and a recipe comprises the plurality of steps defining the production process, the method comprising: retrieving: one or more recipe template composites, wherein a recipe template composite comprises one or more recipe templates and a recipe template is a recipe in which at least one of the recipe parameter(s) specifying the plurality of steps is a variable recipe parameter being variable and having no predetermined value at the outset; process evolution information that describes the time evolution of process variable(s) that describe a state of the production process, wherein the process evolution information comprises evolution parameter(s) and at least one of the evolution parameter(s) is a variable evolution parameter being variable and having no predetermined value at the outset; a likelihood function for the evolution parameter(s), wherein the likelihood function provides a likelihood that value(s) for the evolution parameter(s) are feasible for the production process; a mapping that maps the process variable(s) onto performance indicator(s); a utility function for at least one of the performance indicator(s), wherein the utility function provides a desirability value associated to the performance indicator(s); selecting a recipe template composite and performing a first optimization step comprising: providing a set of input values for the at least one variable recipe parameter in the recipe template composite; performing a second optimization step providing a utility score; repeating the second optimization step for at least another set of input values, thereby obtaining a plurality of utility scores; identifying the optimal utility score from the plurality of utility scores; if only one recipe template composite was retrieved, selecting the set of input values that yields the optimal utility score together with the one recipe template composite as the at least one recipe for the production process; or if more than one recipe template composite was retrieved: repeating the first optimization step for each recipe template composite, thereby obtaining a plurality of optimal utility scores; identifying the best optimal utility score among the plurality of optimal utility scores; selecting the recipe template composite and the set of input values that yield the best optimal utility score as the at least one recipe for the production process; wherein: the second optimization step comprises: providing a set of likely values for the at least one variable evolution parameter based on the likelihood function; performing a third optimization step providing a utility tally; if there is at least another set of likely values that is suitable, repeating the third optimization step for the at least another set of likely values, thereby obtaining a plurality of utility tallies; computing the utility score by weighting the utility tally(ies) by the likelihood function; the third optimization step comprises: performing a fourth optimization step providing a utility value; if a stochasticity flag is set, repeating the fourth optimization step, thereby obtaining a plurality of utility values; computing the utility tally as the utility value or by compounding the plurality of utility values; the fourth optimization step comprises: generating a simulation by simulating the execution of the production process using a recipe template in the recipe template composite with the set of input values for the at least one variable recipe parameter, the process evolution information and the set of likely values for the at least one variable evolution parameter; if the recipe template composite comprises more than one recipe template, repeating the step of generating the simulation for each recipe template in the recipe template composite; determining, from the one or more simulations, trajectory(ies) for the process variable(s), wherein a trajectory corresponds to a time-based profile of values recordable during the simulated execution of the production process; computing the utility value by evaluating the utility function using the process variable(s) of the trajectory(ies) and the mapping.

17. The method of claim 16, wherein the chemical, pharmaceutical and/or biotechnological product is a first product and retrieving the likelihood function comprises: retrieving akin data related to at least one akin production process to produce a second chemical, pharmaceutical and/or biotechnological product, wherein the at least one akin production process has been at least partially performed and wherein the at least one akin production process is different from the production process; retrieving relatedness information providing a quantitative indication of similarity between the production process and the akin production process; computing the likelihood function based on the akin data and the relatedness information.

18. The method of claim 16, wherein retrieving the likelihood function comprises: retrieving current data related to an ongoing run of the production process; and computing the likelihood function based on the current data.

19. The method of claim 16, wherein retrieving the likelihood function comprises: retrieving historical data related to one or more past runs of the production process; and computing the likelihood function based on the historical data.

20. The method of claim 16, wherein retrieving the likelihood function comprises: retrieving historical data related to one or more past runs of the production process; and computing the likelihood function based on the historical data.

21. The method of claim 20, further comprising: storing data from the executed production process as historical data and/or storing data from the executing production process as current data.

22. The method of claim 16, further comprising updating the likelihood function based on the trajectory(ies) and storing the updated likelihood function.

23. A computer program product comprising computer-readable instructions, which, when loaded and executed on a computer system, cause the computer system to perform operations according to the method of claim 16.

24. A computer system operable to determine at least one recipe for a production process to produce a chemical, pharmaceutical and/or biotechnological product, wherein the production process is defined by a plurality of steps specified by recipe parameter(s) controlling an execution of the production process and a recipe comprises the plurality of steps defining the production process, the computer system comprising: a retrieving module configured to retrieve: one or more recipe template composites, wherein a recipe template composite comprises one or more recipe templates and a recipe template is a recipe in which at least one of the recipe parameter(s) specifying the plurality of steps is a variable recipe parameter being variable and having no predetermined value at the outset; process evolution information that describes the time evolution of process variable(s) that describe a state of the production process, wherein the process evolution information comprises evolution parameter(s) and at least one of the evolution parameter(s) is a variable evolution parameter being variable and having no predetermined value at the outset; a likelihood function for the evolution parameter(s), wherein the likelihood function provides a likelihood that value(s) for the evolution parameter(s) are feasible for the production process; a mapping that maps the process variable(s) onto performance indicator(s); a utility function for at least one of the performance indicator(s), wherein the utility function provides a desirability value associated to the performance indicator(s); a computing module configured to select a recipe template composite and perform a first optimization step comprising: providing a set of input values for the at least one variable recipe parameter in the recipe template composite; performing a second optimization step providing a utility score; repeating the second optimization step for at least another set of input values, thereby obtaining a plurality of utility scores; identifying the optimal utility score from the plurality of utility scores; wherein the computing module is further configured, if only one recipe template composite was retrieved, to select the set of input values that yields the optimal utility score together with the one recipe template composite as the at least one recipe for the production process; or the computing module is further configured, if more than one recipe template composite was retrieved, to: repeat the first optimization step for each recipe template composite, thereby obtaining a plurality of optimal utility scores; identify the best optimal utility score among the plurality of optimal utility scores; select the recipe template composite and the set of input values that yield the best optimal utility score as the at least one recipe for the production process; and wherein: the second optimization step comprises: providing a set of likely values for the at least one variable evolution parameter based on the likelihood function; performing a third optimization step providing a utility tally; if there is at least another set of likely values that is suitable, repeating the third optimization step for the at least another set of likely values, thereby obtaining a plurality of utility tallies; computing the utility score by weighting the utility tally(ies) by the likelihood function; the third optimization step comprises: performing a fourth optimization step providing a utility value; if a stochasticity flag is set, repeating the fourth optimization step, thereby obtaining a plurality of utility values; computing the utility tally as the utility value or by compounding the plurality of utility values; the fourth optimization step comprises: generating a simulation by simulating the execution of the production process using a recipe template in the recipe template composite with the set of input values for the at least one variable recipe parameter, the process evolution information and the set of likely values for the at least one variable evolution parameter; if the recipe template composite comprises more than one recipe template, repeating the step of generating the simulation for each recipe template in the recipe template composite; determining, from the one or more simulations, trajectory(ies) for the process variable(s), wherein a trajectory corresponds to a time-based profile of values recordable during the simulated execution of the production process; computing the utility value by evaluating the utility function using the process variable(s) of the trajectory(ies) and the mapping.

25. The computer system of claim 24, wherein the chemical, pharmaceutical and/or biotechnological product is a first product and the retrieving module is further configured to: retrieve akin data related to at least one akin production process to produce a second chemical, pharmaceutical and/or biotechnological product, wherein the at least one akin production process has been at least partially performed and wherein the at least one akin production process is different from the production process; and retrieve relatedness information providing a quantitative indication of similarity between the production process and the akin production process; wherein the computing module is further configured to compute the likelihood function based on the akin data and the relatedness information.

26. The computer system of claim 24, wherein the retrieving module is further configured to retrieve current data related to an ongoing run of the production process and the computing module is further configured to compute the likelihood function based on the current data.

27. The computer system of claim 25, wherein the retrieving module is further configured to retrieve current data related to an ongoing run of the production process and the computing module is further configured to compute the likelihood function based on the current data.

28. The computer system of claim 24, wherein the retrieving module is further configured to retrieve historical data related to one or more past runs of the production process and the computing module is further configured to compute the likelihood function based on the historical data.

29. The computer system of claim 24, further configured to be interfaced with a control system for controlling a production process equipment, wherein: and the computer system is configured to provide the at least one recipe to the control system; the control system is configured to execute the production process based on the at least one recipe.

30. The computer system of claim 29, further configured to store data from the executed production process as historical data and/or to store data from the executing production process as current data.

31. The computer system of claim 24, wherein: the computing module is further configured to update the likelihood function based on the trajectory(ies).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0414] Details of exemplary embodiments are set forth below with reference to the exemplary drawings. Other features will be apparent from the description, the drawings, and from the claims. It should be understood, however, that even though embodiments are separately described, single features of different embodiments may be combined to further embodiments.

[0415] FIG. 1 shows a computer system for determining a recipe for a production process to produce a chemical, pharmaceutical, or biotechnological product.

[0416] FIG. 2 shows a block diagram indicating inputs and outputs of the recipe determination.

[0417] FIGS. 3a to 3e show a method for recipe determination of a production process.

[0418] FIG. 4 shows the nested loops of the method for recipe determination.

[0419] FIGS. 5a to 5c show exemplary plots representing some of the inputs of the recipe determination.

[0420] FIGS. 6a to 6d show a conceptual diagram of an exemplary recipe determination and related plots.

[0421] FIGS. 7a to 7f show a conceptual diagram of another exemplary recipe determination and related plots.

[0422] FIGS. 8a to 8c show a conceptual diagram of yet another exemplary recipe determination and related plots.

[0423] FIGS. 9a to 9e show a conceptual diagram of a further exemplary recipe determination and related plots.

DETAILED DESCRIPTION

[0424] In the following, a detailed description of examples will be given with reference to the drawings. It should be understood that various modifications to the examples may be made. Unless explicitly indicated otherwise, elements of one example may be combined and used in other examples to form new examples.

[0425] FIG. 1 shows a computer system 10 for determining at least one recipe for a production process to produce a chemical, pharmaceutical, or biotechnological product.

[0426] The computer system 10 may include a processing unit, a system memory, and a system bus. The system bus couples various system components including the system memory to the processing unit. The processing unit may perform arithmetic, logic and/or control operations by accessing the system memory. The system memory may store information and/or instructions for use in combination with the processing unit. The system memory may include volatile and non-volatile memory, such as a random access memory (RAM) and a read only memory (ROM).

[0427] The computer system 10 may further include a hard disk drive for reading from and writing to a hard disk (not shown), and an external unit drive for reading from or writing to a removable unit. The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 920. The data structures may include relevant data for the implementation of the method as described above.

[0428] A number of program modules may be stored on the hard disk, external disk, ROM or RAM, including an operating system (not shown), one or more application programs, other program modules (not shown), and program data. The application programs may include at least a part of the functionality as described below.

[0429] A user may enter commands and information into the computer system 10 through input devices such as keyboard and mouse. A monitor or other type of display device is also connected to the system bus via an interface, such as a video input/output.

[0430] The computer system 10 may communicate with other electronic devices. To communicate, the computer system 10 may operate in a networked environment using connections to one or more electronic devices.

[0431] In particular, the computer system may interface and communicate with a control system 20. The computer system 10 may be operable, possibly in conjunction with other devices, to determine a recipe for a production process.

[0432] The control system 20 may be connected to a bioreactor 30 constituting the equipment for performing the production process. The control system 20 and the computer system 10 may be located in different rooms of the same facility or in any two informatically connected (e.g. via a network) locations. The computer system 10 may be a separate entity from the control systems 20. In other examples (not shown) the computer system 10 may coincide with the control system 20.

[0433] In some examples, a database 40 may be provided. The database may be connected to a network, such that the database is accessible by multiple devices/users. The database may be implemented as a cloud database, i.e. a database that runs on a cloud computing platform. In other words, the database may be accessible over the Internet via a provider that makes shared processing resources and data available to computers and other devices on demand. The database may be implemented using a virtual machine image or a database service. The database may use an SQL based or NoSQL data model.

[0434] The database 40 may store any of: sets of values for recipe parameters, recipe template composites, process evolution information, setup specification, likelihood functions, utility functions, mappings. Communications between the database 40 and the computer system 10 may be secured, e.g. via Internet protocol security (IPSEC) or other security protocols. A virtual private network (VPN) may also be used.

[0435] The database 40 may be hosted by a service provider, possibly on a virtual machine, and may be accessible by various users from multiple organizations, possibly located in a variety of different geographic locations around the world. Alternatively, the database 40 may be hosted locally, e.g. in the computer system 10. Accordingly, the computer system 10 and the database 40 may or may not be located in physical proximity. In particular, the database 40 may be located in a location that is geographically distant (e.g. on another continent) from the computer system.

[0436] An example of a production process for which a recipe is determined by the computer system 10 may be a fed-batch process comprising the following phases: [0437] add media to bioreactor [0438] condition (set temperature, pH) [0439] add inoculum [0440] allow to grow in batch phase (control pH, DO, temperature; sample at intervals) [0441] when nutrients exhausted, move to fed phase [0442] allow to grow in fed phase (control pH, DO, temperature; sample at intervals; supply additional nutrients) [0443] harvest product.

[0444] FIGS. 3a to 3e show an exemplary method 300 for determining a recipe for a production process. The method will be described in conjunction with FIG. 2, which shows a block diagram indicating inputs and outputs of the recipe determination, and with FIG. 4, which shows the method from the perspective of the nested loops therein. FIG. 4 is divided into two halves that are connected at the dots according to the corresponding numbers. In the following, the method will be described in relation to a production process in a bioreactor.

[0445] Starting with FIG. 3a, at step 310 one or more recipe template composites 200 are retrieved, wherein each recipe template composite comprises one or more recipe templates. A recipe template is a recipe having one or more free parameters, the variable recipe parameters.

[0446] Further, at 310, process evolution information 210 is retrieved, wherein the process evolution information describes the time evolution of one or more process variables that describe a state of the production process. The process evolution information 210 characterises how the process variables change with time, exemplarily including initial conditions for the process variables and relations among the process variables.

[0447] In particular, the process evolution information 210 may comprise relations empirically derived from previous executions of the production process and/or equations derived by theoretical models about the evolution of the production process. These relations and equations may be characterized by various parameters, denoted as evolution parameters. At least one of these evolution parameters may be a free parameter.

[0448] Further, at 310, a likelihood function 220, a mapping 230 and a utility function 240 are retrieved. The likelihood function 220 is a probability distribution for the values of the evolution parameters. In other words, the likelihood function 220 indicates how likely it is that the process evolution information 210, given a certain value for an evolution parameter, models the production process in an accurate manner. An exemplary plot of a likelihood function 220 for two evolution parameters, growth rate and death rate, is shown in FIG. 5a.

[0449] The mapping 230 translates the values of the process variables into values of performance indicators. In some cases, it is not straightforward to understand the link between the process variables and the relevant outputs of the production process, and the mapping 230 provides exactly this link. FIG. 5b shows an exemplary plot of a mapping 230 that maps two process variables, inhibitor concentration and viable cell density, to product quality, which is a performance indicator.

[0450] The utility function 240 adds a further refinement to assessing the performance of the production process by associating a desirability value to the values of the performance indicators. In other words, the utility function 240 provides a quantitative characterization of the process performance in terms of the values of its performance indicators. An exemplary plot of a utility function 240 for two performance indicators, titre and quality of the product, is shown in FIG. 5c.

[0451] The recipe template composites 200, the process evolution information 210, the likelihood function 220, the mapping 230 and the utility function 240 may be retrieved by means of a file system. Exemplarily, these input data may be stored in a data storage device.

[0452] At 320, a recipe template composite is selected and a first optimisation step is performed. The first optimisation step is illustrated in FIG. 3b. At 410 a set of input values is provided for the variable recipe parameter(s) and, using this set of input values, at 415 a second optimisation step is performed, which gives a utility score as a result. At 420 a plurality of utility scores are obtained by performing the second optimisation step for different sets of input values. The repetition of the second optimisation step corresponds to loop 510 shown in FIG. 4.

[0453] The second optimisation step is illustrated in FIG. 3c. At 430 a set of likely values is provided for the variable evolution parameter(s) based on the likelihood function 220. Once the set of likely values has been assigned, at 435 a third optimisation step is performed, which gives a utility tally as a result. If there are other sets of likely values to be considered, the third optimisation step is repeated, as illustrated also by loop 520 in FIG. 4.

[0454] The third optimisation step is illustrated in FIG. 3d. At 445 a fourth optimisation step is performed, which gives a utility value as a result. If stochasticity is considered, the fourth optimisation step is repeated a predetermined or predeterminable number of times. The repetition of the fourth optimisation step corresponds to loop 530 in FIG. 4.

[0455] The fourth optimisation step is illustrated in FIG. 3e. At 455 a simulation of the production process is generated on the basis of the recipe template composite selected at 320, the set of input values provided at 410 and the set of likely values provided at 420. During the simulation, the values of the process variables over time are recorder in order to obtain trajectories 250 at 460. If there is more than one recipe template in the recipe template composite selected at 320, steps 455 and 460 are performed for each recipe template. The repetition of steps 455 and 460 corresponds to loop 540 in FIG. 4.

[0456] Once the loop 540 has completed, at 465 the utility value is computed by evaluating the utility function 240 using the values of the process variables from the trajectories 250 and the mapping 230.

[0457] Once one or more utility values have been obtained for the same recipe template composite, set of input values and set of likely values, depending on whether stochasticity is accounted for, the method proceeds at 450 by computing the utility tally from the one or more utility values.

[0458] Once one or more utility tallies have been obtained corresponding, respectively, to one or more sets of likely values, and for the same recipe template composite and set of input values, at 440 the utility score is computed by weighting the utility tally(ies) with the likelihood function 220.

[0459] From the plurality of utility scores obtained at 420, at 425 the optimal utility score is identified. In other words, the optimal set of input values that optimizes the utility score is identified. If there is only one recipe template composite, that recipe template composite 270 combined with the optimal set of input values 280 is determined to be the at least one recipe for the production process.

[0460] If there is more than one recipe template composite, each of the recipe template composites is selected in turn and at 330 the first optimisation step is repeated for all remaining recipe template composites. The repetition of the first optimisation step corresponds to the outermost loop 500 shown in FIG. 4.

[0461] Accordingly, a plurality of optimal utility scores corresponding to the plurality of recipe template composites has been identified and, then, at 340 the best optimal utility score 260 is identified among those, i.e. the maximum or minimum of the plurality of optimal utility scores. The combination of the recipe template composite 270 and the set of input values 280 associated with (i.e. that result in) the best optimal utility score 260 is determined at 350 to be the at least one recipe for the production process.

[0462] The general method illustrated above with reference to FIGS. 2, 3a-3e and 4 will be discussed in the following in relation to four specific exemplary scenarios.

First Scenario

[0463] FIGS. 6a to 6d show a conceptual diagram of an exemplary recipe determination and related plots. The first scenario involves a culture process with simple growth.

[0464] At 310, the process evolution information 210 is retrieved. The model for this production process may assume that viable cells reproduce at a maximum rate, r, and that growth is slowed down as the cell population approaches maximum capacity. The process evolution information 210 in this case comprises the following two differential equations:

[00019] $\begin{matrix} \frac{d V}{d t} = r V \frac{k_{V}}{k_{V} + I_{V}} \\ \frac{d I_{V}}{d t} = V \end{matrix}$

wherein V is the viable cell density (VCD) [cells/ml], r is the cell growth rate [hr.sup.-1], k.sub.v is half of a capacity constant [cells.hr/ml] and l.sub.v is the cumulative cell count [cells.hr/ml]. The process evolution information 210 in this case comprises only one variable evolution parameter, namely the cell growth rate.

[0465] Further, a single recipe template composite 200 comprising one recipe template is retrieved, and the recipe template has one variable recipe parameter, the harvest time. A mapping 230 is also retrieved, wherein the mapping 230 is a function of the VCD and is the identity function, i.e. the VCD is itself considered a performance indicator. Additionally, a utility function 240 is retrieved, wherein the utility function 240 is only a function of the VCD, U(V)= a* log(V), wherein a is a constant.

[0466] In this example retrieving the likelihood function 220 comprises retrieving akin data, relatedness information and historical data and computing the likelihood function 220 based on all those, as explained in the following.

[0467] The cell type that will be used for the production process is cell type B. The prior knowledge available includes historical data for cell type B, in particular trajectories for the process variables, D.sub.b and a prior on the growth rate for cell type B, r.sub.b, derived e.g. from literature or from experience, as well as akin data related to an akin production process that used cell type A together with relatedness information. The akin data include trajectories for the process variables, D.sub.a, and a prior for the growth rate of cell type A, r.sub.a. The relatedness information between process B (using cell type B) and process A (using cell type A) may be based on experimental/literature knowledge of cell line behaviours and/or derived from a comparison between the historical data and the akin data.

[0468] The prior on r.sub.b is a uniform distribution in the interval [0, 0.15) and the prior on r.sub.a is a uniform distribution in the interval [0.02, 0.12). For each cell type x, Bayesian inference may be used to obtain a posterior distribution p(r.sub.x | D.sub.x) ~ p(D.sub.x | r.sub.x) p(r.sub.x), wherein p(r.sub.x) is the prior and p(D.sub.x | r.sub.x) is the conditional distribution of the data for a given value of the growth rate. The value of the conditional distribution may be calculated at each data point e.g. by assuming a Gaussian distribution centred on the mean r.sub.x. The posterior distributions obtained in this manner are shown in FIG. 6b.

[0469] The relatedness information in this case is the probability of a value for r.sub.b given a value for r.sub.a. The relatedness is R(r.sub.b, r.sub.a) = N(rb- r.sub.a, σa.sub.b), wherein N is the normal distribution.

[0470] FIG. 6c shows two plots for the relatedness, in the upper plot the variance is set to 0.01 and in the lower plot the variance is set to 0.02. In this scenario the variance is assumed to be 0.01.

[0471] The available prior knowledge discussed above is merged in the joint probability distribution for r.sub.a and r.sub.b, p(r.sub.a, r.sub.b | D.sub.a, D.sub.b) ~ p(r.sub.a | D.sub.a) p(r.sub.b | D.sub.b) R (r.sub.b, r.sub.a). It should be noted that, if the relatedness is not taken into consideration, the joint probability reduces to p(r.sub.a | D.sub.a) p(r.sub.b | D.sub.b). FIG. 6d shows the difference between the normalized joint probability without relatedness (plot on the left) and with relatedness (plot on the right). The probability landscape is more constrained and the maximum probability area is more than two times higher.

[0472] The likelihood function 220, L(r.sub.b), is computed as the integral over the joint probability distribution:

[00020] $L (r_{b}) = \int R (r_{b}, r_{a}) p (r_{b} | D_{b}) p (r_{a} | D_{a}) d r_{a}$

After having retrieved all the necessary inputs, the first optimisation step is performed at 320, which comprises providing 410 an input value (set of one element) for the harvest time and then performing 415 the second optimisation step. The second optimisation step comprises providing 430 a likely value (set of one element) for the growth rate based on the likelihood function 220. In this example, the likely value for r.sub.b is chosen as the most likely value, i.e. the value that maximises the likelihood function

[00021] ${\bar{r}}_{b} =$

argmax L(r.sub.b). Then at 435 the third optimisation step is performed. No other values are considered for the growth rate, this means there is no loop within the second optimisation step and the utility score computed at 440 will coincide with the only utility tally. Further, if stochasticity is not considered, the utility tally computed at 450 will coincide with the utility value computed at 465 by performing the fourth optimisation step at 445.

[0473] In the fourth optimisation step, at 455 a simulation is generated using the process evolution information 210 of equations (5) with the most likely value for the growth rate and given initial conditions, the retrieved recipe template composite 200 comprising only one recipe template, and the input value for the harvest time, denoted e.g. with ht.sub.1. From the simulation, the values of the VCD at different times are obtained at 460, i.e. the trajectory for the VCD. In particular, in order to evaluate the utility function 240, the value of the VCD at the time coinciding with the harvest time may be considered. At 465 the utility value is then computed as U[V(ht.sub.1)].

[0474] At 420 the above steps are repeated and a plurality of utility scores U[V(ht.sub.1)], ... U[V(ht.sub.n)] is obtained, with n greater than or equal to 2. The plurality of input values for the harvest time may be retrieved together with the recipe template composite, or input by a user, or drawn from a distribution according to certain criteria.

[0475] Alternatively, in this very specific case, instead of performing the loop to obtain the plurality of utility scores, the trajectory for the VCD obtained from a single simulation may be used to compute U[V(ht.sub.1)], ... U[V(ht.sub.n)], since the trajectory is indeed V(t). This “shortcut” can be generally applied to cases in which the variable recipe parameter is a timepoint e.g. for performing an action, because the trajectories already provide the values of the process variables at different times. The same holds if there are more variable recipe parameters and each one of them is a timepoint.

[0476] At 425 the score function US(ht) = U[V(ht)] is optimised e.g. using numerical methods in order to find the optimal utility score and the associated ht.sup.opt. The recipe template with harvest time ht.sup.opt is provided to the control system of the bioreactor, so that harvest is carried out automatically at ht.sup.opt, thereby optimizing the utility score and, thus, the production process.

Second Scenario

[0477] FIGS. 7a to 7f show a conceptual diagram of an exemplary recipe determination and related plots. The second scenario involves a culture process with growth and apoptosis with inhibitor production.

[0478] At 310, the process evolution information 210 is retrieved. The model for this production process may assume that viable cells reproduce at a maximum rate, r, and that growth is slowed down is slowed by an inhibitory toxin which is produced as a byproduct of cell growth. Further, cells have a finite lifetime and become apoptotic at a constant rate δ.sub.v, and apoptotic cells quickly lyse at rate δ.sub.A, after which they can no longer be detected. The process evolution information 210 in this case comprises the following three differential equations:

[00022] $\begin{matrix} \frac{d V}{d t} = \frac{r V}{1 + e^{- s (τ - c)}} - δ_{V} V \\ \frac{d A}{d T} = δ_{V} V - δ_{A} A \\ \frac{d τ}{d τ} = \frac{θ r V}{1 + e^{- s (τ - c)}} \end{matrix}$

wherein V is the viable cell density [cells/ml], r is the cell growth rate [hr.sup.-1], s is the sensitivity to inhibitor [mM], c is the sigmoid centre for inhibitor response [mM], δ.sub.v is the apoptosis rate [hr.sup.-1], A is the apoptotic cell density [cells/ml], δ.sub.A is the lysis rate [hr.sup.- .sup.1], τ is the inhibitor concentration [mM] and θ is the inhibitor production [.Math.mol/cell]. The process evolution information 210 in this case comprises two variable evolution parameters, namely the cell growth rate and the inhibitor production.

[0479] Further, a single recipe template composite 200 comprising one recipe template is retrieved, and the recipe template has one variable recipe parameter, the harvest time. A mapping 230 is also retrieved, wherein the mapping 230 is a function of the VCD and the inhibitor concentration and is the identity function, i.e. V and τ are both process variables and performance indicators. Additionally, a utility function 240 is retrieved, wherein the utility function 240 depends not only on reaching sufficient cell density but also on the amount of inhibitor in the media, namely

[00023] $U (V, τ) = \{\begin{matrix} \frac{V [1 - (e^{0.14 τ} - 1)]}{1 + e^{0.4 (15 - V)}} & i f V > 13 \cdot 10^{5} cells/ml \\ 0 & i f V \leq 13 \cdot 10^{5} cells/ml \end{matrix}) .$

In this example retrieving the likelihood function 220 comprises retrieving akin data related to two different akin processes, A and B, and corresponding relatedness information and computing the likelihood function 220 based on all those, as explained in the following.

[0480] The production process C for which an optimal recipe should be determined uses a certain cell type at a large scale. The prior knowledge available includes akin data for akin production process A, D.sub.a, which uses the same cell type as C but at a smaller scale, and akin data for production process B, D.sub.b, which uses a different cell type than C but at the same scale as C. For example, the datasets D.sub.a and D.sub.b may include trajectory data from a large number of runs.

[0481] For each process (A and B), Bayesian inference may be used to obtain a posterior distribution p(r.sub.x, θ.sub.X | D.sub.x) ~ p(D.sub.x | r.sub.x, θ.sub.x), wherein p(D.sub.x | r.sub.x, θ.sub.x) is the conditional distribution of the data for a given value of the growth rate and the inhibitor production, while the prior is uniform. The conditional distribution may be obtained by computing the probability that, for each run in D.sub.x, the run data could have been produced if the “true” values for the evolution parameters had been r.sub.x, θ.sub.x. For example, the run data in D.sub.x may comprise a time series of cell densities, a(t), and simulated data for the times series, b(t), may be obtained based on the evolution parameters r.sub.x, θ.sub.x. Exemplarily, the probability is estimated as the integral of N(a(t)-b(t), σ), where σ is an estimate of the standard error of the measurement system for the cell density. The posterior distributions obtained in this manner are shown in FIG. 7b.

[0482] The relatedness information may be based on experimental/literature knowledge about the production processes. The relatedness information in this case is the probability of value for the evolution parameters for process C given values for evolution parameters for process A/B. It may be assumed that R(A, C) = R(r.sub.c, r.sub.a)* R(θ.sub.c, θ.sub.a ), and similarly for B.

[0483] As mentioned, the cell type for processes A and C is the same. According to the scientific literature, this cell type slows down its processes at the scale of C due to the limits on the dissolved oxygen. Even if the dissolved oxygen is not an evolution parameter in the process evolution information, this information can be taken into consideration in the relatedness between A and C. In particular, the relatedness is chosen as a distribution with a longer tail towards the lower end, reflecting the probability that both the growth rate and the inhibitor production will be lower for process C than for process A. FIG. 7c shows on the left side a plot for the relatedness R(r.sub.c, r.sub.a), assuming a distribution with variance 0.02 and skewness 4, and on the right side a plot for the relatedness R(θ.sub.c, θa), assuming a distribution with variance 0.03 and skewness 4.

[0484] The cell types for processes B and C are different but similar, and it is assumed that they behave similarly in terms of growth rate and inhibitor production, although there is a larger uncertainty about the latter, which is reflected in the choice of a larger variance. FIG. 7d shows on the left side a plot for the relatedness R(r.sub.c, r.sub.b), assuming a normal distribution with variance 0.01, and on the right side a plot for the relatedness R(θ.sub.c, θb), assuming a distribution with variance 0.04.

[0485] The posterior distributions obtain from the akin data and the relatedness information are combined to compute the likelihood function 220 for process C. Specifically,

[00024] $\begin{array}{l} L (r_{c}, θ_{c}) = \frac{1}{2} [L (r_{c}, θ_{c} | A) + L (r_{c}, θ_{c} | B)] \\ = \frac{1}{2} [\int R (r_{c}, r_{a}) R (θ_{c}, θ_{a}) p (r_{a}, θ_{a} | D_{a}) d r_{a} d θ_{a}) \\ (+ \int R (r_{c}, r_{a}) R (θ_{c}, θ_{a}) p (r_{a}, θ_{a} | D_{a}) d r_{a} d θ_{a}] \end{array}$

[0486] After having retrieved all the necessary inputs, the first optimisation step is performed at 320, which comprises providing 410 an input value (set of one element) for the harvest time and then performing 415 the second optimisation step. The second optimisation step comprises providing 430 a set of likely values for the variable evolution parameters based on the likelihood function 220 of equation (9). In this example, the likely values for the growth rate and the inhibitor production are chosen as the most likely values, i.e. the values that maximise the likelihood function

[00025] $({\bar{r}}_{c},$

[00026] $({\bar{θ}}_{c})$

= argmax L(r.sub.c, θ.sub.c). Then at 435 the third optimisation step is performed. No other values are considered for the variable evolution parameters, this means there is no loop within the second optimisation step and the utility score computed at 440 will coincide with the only utility tally. Further, if stochasticity is not considered, the utility tally computed at 450 will coincide with the utility value computed at 465 by performing the fourth optimisation step at 445.

[0487] In the fourth optimisation step, at 455 a simulation is generated using the process evolution information 210 of equations (7) with the most likely values for (r.sub.c, θ.sub.c) and given initial conditions, the retrieved recipe template composite 200 comprising only one recipe template, and the input value for the harvest time, denoted e.g. with ht.sub.1. From the simulation, the values of the process variables (in particular VCD and inhibitor concentration) at different times are obtained at 460, i.e. the trajectories. The left-side plot in FIG. 7f shows the trajectories for the process variables V, τ and A.

[0488] In this example, similarly to the first scenario, the utility score function can be directly obtained from the trajectories, which already provide V and τ as a function of time, without looping over the second optimisation step. The values of the process variables are fed to the utility function 240 of equation (8), a plot of which is shown on the right side of FIG. 7f.

[0489] At 425 the score function US(ht) = U[V(ht), τ (ht)] is optimised e.g. using numerical methods in order to find the optimal utility score and the associated ht.sup.opt. In this example the optimal utility score equals 13.13, while ht.sup.opt = 108 hours. These values correspond to V = 23.65*10.sup.5 cells/ml and τ= 2.54 mM. The recipe template with harvest time ht.sup.opt is provided to the control system of the bioreactor, so that harvest is carried out automatically at ht.sup.opt, thereby optimizing the utility score and, thus, the production process C.

Third Scenario

[0490] FIGS. 8a to 8c show a conceptual diagram of an exemplary recipe determination and related plots. The third scenario involves a culture process with growth, inhibitor production and protein production.

[0491] At 310, the process evolution information 210 is retrieved. The model for this production process may assume that viable cells reproduce at a maximum rate, r, and that growth is slowed down is slowed by an inhibitory toxin which is produced as a byproduct of cell growth. Further, cells have a finite lifetime and become apoptotic at a constant rate δv Viable cells produce protein with activity r and protein is degraded by the build-up of the inhibitor at a rate δτ. The process evolution information 210 in this case comprises the following four differential equations:

[00027] $\frac{d V}{d t} = r V \frac{k}{k + I_{τ}} - δ_{V} V$

[00028] $\begin{matrix} \frac{d I_{τ}}{d t} = τ \\ \frac{d τ}{d t} = θ r V \frac{k}{k + I_{τ}} \\ \frac{d p}{d t} = ρ V - p τ δ_{τ} \end{matrix}$

[0492] Further, a mapping 230 is retrieved, wherein the mapping 230 is a function of the VCD and the viable protein concentration and is the identity function, i.e. V and p are both process variables and performance indicators. Additionally, a utility function U(V,p) 240 is retrieved.

[0493] In this scenario, the production process has already started and the recipe is updated and optimized on the fly. Accordingly, the recipe template composite 200 retrieved corresponds to the recipe currently being used, in which the harvest time is the variable recipe parameter whose value has to be chosen in order to optimize the utility function.

[0494] In this example, retrieving the likelihood function 220 comprises retrieving current data related to an ongoing run of the production process B, D.sub.b, akin data related to an akin process A, D.sub.a, as well as corresponding relatedness information and computing the likelihood function 220 based on all those, as explained in the following. Process B uses cell type B and process A uses cell type A.

[0495] In particular, the dataset D.sub.a may include trajectory data from a large number of runs. For process A, Bayesian inference may be used to obtain a probability or posterior distribution p(r.sub.a, k.sub.a | D.sub.a) ~ p(D.sub.a | r.sub.a, k.sub.a), wherein p(D.sub.a | r.sub.a, k.sub.a) is the conditional distribution of the data given values for the growth rate r, and the inhibitor concentration saturation constant k, while the prior is again uniform. The conditional distribution may be obtained by computing the probability that, for each run in D.sub.a, the run data could have been produced if the “true” values for the evolution parameters had been r.sub.a, k.sub.a.

[0496] Indeed, if data relative to more than a single run are available, the probability surfaces can be combined additively to give an increasingly more informative description of the probability landscape. The larger the dataset, the closer is the probability landscape to predicting the population mean for each of the evolution parameters of interest. FIG. 8b shows the probability surface for r and k as obtained from data D.sub.a when the data relate to a single run (left side) and to 500 runs (right side). The estimates for the population means for rand kfrom a single run are r= 0.054 and k = 17.2. The estimates for the population means for rand k from 500 runs are r = 0.069 and k = 15.7.

[0497] The relatedness information may be based on experimental/literature knowledge about the production processes and may be a joint distribution over the two variable evolution parameters, R(r.sub.b, r.sub.a, k.sub.b , k.sub.a), e.g. a multivariate normal distribution having standard deviation equal to 0.012 for the growth rate and standard deviation equal to 0.008 for the inhibitor saturation constant (the off-diagonal terms of the covariance matrix are set to zero).

[0498] Based on the partial data D.sub.b from the ongoing process, it is also possible to obtain a posterior or probability distribution p(r.sub.b, k.sub.b | D.sub.b) similarly to what explained above for A. Since it is based on a single, partial run, this distribution may not be particularly informative, but it can be combined with p(r.sub.a, k.sub.a | D.sub.a) and the relatedness information R(r.sub.b, r.sub.a, k.sub.b , k.sub.a) to obtain a likelihood function 220 that is more informative, namely

[00029] $L (r_{b}, k_{b}) = \int \int p (r_{b}, k_{b} | D_{b}) R (r_{b}, r_{a}, k_{b}, k_{a}) p (r_{a}, k_{a} | D_{a}) d r_{a} d k_{a}$

[0499] FIG. 8c shows the difference between p(r.sub.b, k.sub.b | D.sub.b) on the left and L(r.sub.b, k.sub.b ) on the right.

[0500] After having retrieved all the necessary inputs, the first optimisation step is performed at 320, which comprises providing 410 an input value (set of one element) for the harvest time and then performing 415 the second optimisation step. The second optimisation step comprises providing 430 a set of likely values for the variable evolution parameters based on the likelihood function 220 of equation (11). In this example, the likely values for the growth rate and the inhibitor concentration capacity are chosen as the most likely values, i.e. the values that maximise the likelihood function (r.sub.b,

[00030] $({\bar{k}}_{b})$

= argmax L(r.sub.b,k.sub.b ). Then at 435 the third optimisation step is performed. No other values are considered for the variable evolution parameters, this means there is no loop within the second optimisation step and the utility score computed at 440 will coincide with the only utility tally. Further, if stochasticity is not considered, the utility tally computed at 450 will coincide with the utility value computed at 465 by performing the fourth optimisation step at 445.

[0501] In the fourth optimisation step, at 455 a simulation is generated using the process evolution information 210 of equations (10) with the most likely values for (r.sub.b, k.sub.b) and given initial conditions, the recipe being used in the ongoing run, and the input value for the harvest time, denoted e.g. with ht.sub.1. From the simulation, the values of the process variables at different times are obtained at 460, i.e. the trajectories.

[0502] In this example, similarly to the first and second scenarios, the utility score function can be directly obtained from the trajectories, which already provide V and p as a function of time, without looping within the first optimisation step. At 425 the score function US(ht) = U[V(ht), p (ht)] is optimised e.g. using numerical methods in order to find the optimal utility score and the associated ht.sup.opt. The value for harvest time ht.sup.opt is provided to the control system of the bioreactor, so that the recipe is updated on the fly and harvest is carried out automatically at ht.sup.opt, thereby optimizing the utility score and, thus, the production process B.

[0503] Once the run is completed, the trajectories data are stored so that they can be part of the prior knowledge space for subsequent runs.

Fourth Scenario

[0504] FIGS. 9a to 9e show a conceptual diagram of an exemplary recipe determination and related plots. The fourth scenario involves a culture process with pH-dependant growth, inhibitor production and protein production.

[0505] At 310, the process evolution information 210 is retrieved. The model for this production process may assume that viable cells reproduce at a maximum rate, r, and that growth is slowed down is slowed by an inhibitory toxin which is produced as a byproduct of cell growth. Further, cells have a finite lifetime and become apoptotic at a constant rate δ.sub.v. Viable cells produce protein with activity ρ and protein is degraded by the build-up of the inhibitor at a rate δτ. The cell growth rate is dependent on the pH being maintained around an optimal level pH.sub.gro, while deviations on either side of this optimum will inhibit growth. The protein is stable for high pH but is denatured at low pH at a rate δ.sub.pH and with centre point of the response c.

[0506] The process evolution information 210 in this case comprises the following four differential equations:

[00031] $\begin{matrix} \frac{d V}{d t} = r V \frac{k}{k + I_{τ}} \cdot e^{- \frac{(p H - p H_{g r o}) 2}{2 σ^{2}}} - δ_{V} V \\ \frac{d I_{τ}}{d t} = τ \\ \frac{d τ}{d t} = θ r V \frac{k}{k + I_{τ}} \\ \frac{d p}{d t} = ρ V - p τ δ_{τ} - \frac{p δ_{p H}}{1 + e^{- s (p H - c)}} \end{matrix}$

wherein V is the viable cell density [cells/ml], r is the cell growth rate [hr.sup.-1], k is the inhibitor concentration capacity [mM*hr/ml], δ.sub.v is the apoptosis rate [hr.sup.-1], l.sub.τ is the cumulative inhibitor concentration [inhibitor*hr/ml], τ is the inhibitor concentration [mM], θ is the inhibitor production [.Math.mol/cell], δ.sub.τ is the protein degradation [mM.sup.-1*hr.sup.-1], p is the viable protein concentration [units/ml], ρ is the protein production rate [units/cell*hr], pH.sub.gro is the optimal pH for growth [pH], σ is the tolerance to deviation from pH.sub.gro [pH], δ.sub.pH is the rate of pH-mediated protein degradation [hr.sup.-1], s is the sensitivity to pH change [pH.sup.-1] and c is the centre point of response [pH]. The process evolution information 210 in this case comprises three variable evolution parameters, namely the cell growth rate, the inhibitor production and the protein production rate.

[0507] Besides the process evolution information, two recipe template composites 200 are retrieved, each comprising a single recipe template. The first recipe template involves a shift of the pH during the execution and comprises three variable recipe templates, the starting value for the pH, the magnitude of the pH shift, Δ.sub.pH, and the time of the shift, t.sub.pH. The second recipe template comprises one variable recipe template, the value at which the pH is set, and according to this recipe template the pH is kept constant.

[0508] Further, a mapping 230 is retrieved, wherein the mapping 230 is a function M of the viable protein concentration and associates it to the titre T as performance indicator. Additionally, a utility function 240 is retrieved. The goal of the optimisation is not simply to produce the most protein but to consider also the costs of the process, in particular due to the running time ft. The utility function 240 is given by U(T,ft) = 10T

[00032] $6 \sqrt{f t}$

(13) and is plotted in FIG. 9b. It can be seen that a gain due to increasing the protein titre is weighted against the increased running time cost of achieving the titre.

[0509] In this example, retrieving the likelihood function 220 comprises retrieving historical data related to one or more previous runs of the production process for which the recipe has to be optimized and computing the likelihood function 220 based on those, as explained in the following. In particular, the historical data may include trajectory data for at least some of the process variables. FIG. 9c comprises two exemplary plots showing the trajectories for the VCD and the viable protein concentration, for a run with pH = 6.9 (on the left) and a run with pH = 7.4 (on the right).

[0510] Based on the historical data, Bayesian inference may be used to obtain the likelihood function L(r, ρ, θ) as the probability or posterior distribution p(r, ρ, θ| D) ~ p(D | r, ρ, θ), wherein p(D | r, ρ, θ) is the conditional distribution of the data for a given value of the growth rate and the inhibitor concentration capacity, while the priors are uniform. The conditional distribution may be obtained by computing the probability that, for each run in D, the run data could have been produced if the “true” values for the evolution parameters had been r, ρ, θ. FIG. 9d shows a slice of the three-dimensional likelihood space for a fixed θ equal to 0.1. The inferred distribution parameters for the likelihood function are, given X = [r, θ, ρ],

[00033] $\begin{matrix} μ_{X} = [0.07, 0.1, 0.01] \\ Σ_{X} = (\begin{matrix} 64 & 0.32 & - 2.5 \\ 0.32 & 4.0 & 0.032 \\ - 2.56 & 0.032 & 2.56 \end{matrix}) \times 10^{- 6} \end{matrix}$

After having retrieved all the necessary inputs, the first recipe template is selected and the first optimisation step is performed at 320, which comprises providing 410 a set of input values for pH, Δ.sub.pH, and t.sub.pH, and then performing 415 the second optimisation step. The second optimisation step comprises providing 430 a set of likely values for the variable evolution parameters r, θ, ρ based on the likelihood function 220. In this example, the likely values may be drawn randomly with the probability of selection dependant on the likelihood function 220.

[0511] Then at 435 the third optimisation step is performed. If stochasticity is not considered, the third optimisation step coincides with the fourth optimisation step and the utility tally computed at 450 will coincide with the utility value computed at 465 by performing the fourth optimisation step at 445. In the fourth optimisation step, at 455 a simulation is generated using the process evolution information 210 of equations (12) with the provided set of likely values, given initial conditions, the first recipe template, and the set of input values for the variable recipe parameters.

[0512] From the simulation, the values of the process variables at different times are obtained at 460, i.e. the trajectories, in particular for the protein concentration p. Via the mapping, the values for the titre T can be obtained and fed to the utility function 240 of equation (13), in particular T at the final time, which is also the running time, T(ft).The utility value is then UV=U(T(ft), ft] and is associated to the given recipe template composite, the given set of input values and the set of likely values.

[0513] The third/fourth optimisation step is repeated by providing different sets of likely values, thereby obtaining a plurality of utility values/utility tallies UV.sub.1, UV.sub.2, ... UV.sub.n. At 440 the utility score for a specific set of input values

[00034] $(\bar{p H}, \bar{t_{P H}}, \bar{Δ_{p H}})$

is computed by weighting the utility values by the likelihood function 220:

[00035] $U S (\bar{p H}, \bar{t_{p H}}, \bar{Δ_{p H}}) | .Math.R_{1}.Math. = \int \int \int U V (r, θ, ρ) | .Math.\bar{p H}, \bar{t_{p H}}, \bar{Δ_{p H}}.Math. L (r, θ, ρ) d r d θ d ρ$

At 420 the second optimisation step is repeated by iterating over different values of pH, Δ.sub.pH, t.sub.pH, thereby obtaining a plurality of utility scores, from which a score function US (pH,t.sub.pH,Δ.sub.pH)|.Math.R.sub.1.Math. is defined, and at 425 the optimal utility score is identified, namely by optimizing the score function

[00036] $p H^{o p t}, t_{p H}^{o p t}, Δ_{p h}^{o p t} = \underset{p H, t_{p H}, Δ p H}{\arg \max} U S (p H, t_{p H}, Δ_{p H}) | .Math.R_{1}.Math.$

[0514] The optimal utility score

[00037] $U S_{R 1}^{o p t}$

= US (pH.sup.opt,

[00038] $t_{p H}^{o p t}$

, Δ.sub.pH.sup.opt)|.Math.R.sub.1.Math. associated with the first recipe template R.sub.1 equals 78.9. FIG. 9e shows on the left side the utility score as a function of Δ.sub.pH, t.sub.pH for an intial pre-shift pH of 7.1.

[0515] The whole procedure is repeated again at 330 for the second recipe template R.sub.2, obtaining an optimal utility score associate thereto,

[00039] $U S_{R 2}^{o p t}$

= 37.55. At 340 the best optimal utility score is identified, which in this case is

[00040] $U S_{R 1}^{o p t} .$

FIG. 9e shows in the upper plot on the right side the trajectories of the simulation performed with the first recipe template and the values of the variable recipe parameters that give the optimal utility score of 78.93, namely starting pH = 7.1, Δ.sub.pH = 0.24 and t.sub.pH = 106 hours. The lower plot on the right side of FIG. 9e shows the trajectories of the simulation performed with the second recipe template and the value that gives the optimal utility score of 37.55, namely constant pH = 7.1.

[0516] The first recipe template with the values starting pH = 7.1, Δ.sub.pH = 0.24 and t.sub.pH = 106 hours is provided to the control system of the bioreactor, so that the production process is optimised, i.e. executed in a manner that maximises utility.

OPTIMISATION OF PROCESSES FOR THE PRODUCTION OF CHEMICAL, PHARMACEUTICAL AND/OR BIOTECHNOLOGICAL PRODUCTS

Assignee

Inventors

Cpc classification

Classification Explorer

G05B19/4155

PHYSICS

Classification Explorer

G05B2219/32287

PHYSICS

Classification Explorer

G05B17/02

PHYSICS

International classification

Classification Explorer

G05B19/4155

PHYSICS

Abstract

Claims

Description