MULTIVARIATE APPROACH FOR CELL SELECTION

Abstract

According to some aspects of the disclosure, a computer-implemented method, a computer program and a process control device for selecting at least one set of target cells from multiple sets of candidate cells are provided. The method can include receiving data collected from a plurality of processes, wherein each of the processes produces a distinct set of candidate cells. The received data includes values of process outputs, each of the process outputs being a product quality attribute or a key performance indicator for selecting the target cells.

Claims

1. A computer-implemented method for selecting at least one set of target cells from multiple sets of candidate cells, the method comprising: receiving data collected from a plurality of processes, wherein each of the processes produces a distinct set of candidate cells, the received data including values of process parameters and process outputs of the processes, each of the process outputs being a product quality attribute or a key performance indicator for selecting the target cells; correlating the received data; receiving a selection of the process parameters and a selection of the process outputs; receiving multivariate evaluation criteria for the selected process parameters and/or the selected process outputs, the multivariate evaluation criteria including one or more of the following: weights for prioritization; prioritization ranges and/or targets, wherein each target is an extremum and/or a target value; calculating, via a multivariate selection function, scores for each one of the sets of candidate cells from the correlated data according to the multivariate evaluation criteria; ranking the sets of candidate cells according to the scores; and selecting at least one of the sets of candidate cells as the target cells using the ranking.

2. The method of claim 1, wherein the target cells are at least one of the following: a cell line, a cell strain, a clone.

3. The method of claim 1, wherein the values of the process parameters include time series values, wherein the process parameters were controlled and/or measured during each of the processes.

4. The method of claim 1, wherein the received data includes substantially all data from each of the processes.

5. The method of claim 1, further comprising: identifying whether the received data for any of the processes is incomplete, wherein one of the processes is identified as having incomplete data when data is not collected during a portion of the process; when any of the processes has incomplete data, predicting values for the incomplete data using at least one multivariate technique, wherein the multivariate technique may include partial least squares regression or interpolation.

6. The method of claim 1, wherein the correlating includes verifying and correcting values of the data, wherein the correcting comprises revising or excluding values that violate one or more known metabolic dependencies.

7. The method of claim 1, further comprising: applying mechanistic modelling to the received data to obtain additional values of the process parameters and/or additional process outputs; supplementing the received data with the additional values of the process parameters and/or the additional process outputs.

8. The method of claim 1, further comprising: excluding, from the correlated data, data received from ones of the processes according to exclusion criteria; if at least one of the selected process outputs has a corresponding acceptability range, then the exclusion criteria include the corresponding acceptability range for the at least one of the selected process outputs.

9. The method of claim 1, wherein the evaluation criteria further comprise: a time based profile of one or more of the process parameters, a profile describing one or more of the process outputs, a trajectory describing time based development of one or more of the process parameters.

10. The method of claim 9, further comprising: displaying the correlated data for the selected process parameters and/or the selected process outputs, comprising, displaying correlation patterns for the glycan profiles of the sets of candidate cells.

11. The method of claim 1, wherein the selection function includes an objective function, particularly a cost function.

12. The method of claim 1, wherein the selection function includes at least one magnifying function, wherein the magnifying function magnifies a distance between values, wherein each of the prioritization ranges and/or targets has an associated magnifying function, and wherein the magnifying function is non-linear, particularly exponential.

13. The method of claim 1, wherein there are at least 5 sets of candidate cells, at least 10 sets of candidate cells, at least 20 sets of candidate cells, at least 30 sets of candidate cells, or at least 50 sets of candidate cells.

14. A computer program comprising computer-readable instructions, which, when loaded and executed on a computer system, cause the computer system to perform operations according to the method of claim 1.

15. A process control device for selecting at least one set of target cells from multiple sets of candidate cells, the device comprising: a plurality of vessels, each of the vessels being configured to contain fluid including one of the sets of candidate cells; a robot capable of addressing each of the vessels, dispensing fluid to each of the vessels, and extracting samples of fluid from each of the vessels; a controller operable to: control, at least partly in parallel, conditions in each of the vessels; receive data collected from a plurality of processes, wherein each of the processes produces a distinct set of candidate cells, the received data including values of process parameters and process outputs of the processes, each of the process outputs being a product quality attribute or a key performance indicator for selecting the target cells; correlate the received data; receive a selection of the process parameters and a selection of the process outputs; receive multivariate evaluation criteria for the selected process parameters and/or the selected process outputs, the multivariate evaluation criteria including one or more of the following: weights for prioritization; prioritization ranges and/or targets, wherein each target is an extremum and/or a target value; calculate, via a multivariate selection function, scores for each one of the sets of candidate cells from the correlated data according to the multivariate evaluation criteria; rank the sets of candidate cells according to the scores; and select at least one of the sets of candidate cells as the target cells using the ranking.

16. The device of claim 15, wherein each of the vessels has at least one of the following characteristics: it is a bioreactor or a microbioreactor; it includes stirring means for stirring its contents, wherein the stirring means may be an impeller; it includes delivery means for gas delivery, wherein the delivery means may include a sparge tube; it includes sensing means for measuring at least one of the following: pH, dissolved oxygen, temperature; it has a volume of: at least 1 ml, at least 10 ml, at least 15 ml, less than 2000 L, less than 1000 L, less than 100 L, less than 50 L, less than 5 L, less than 1 L; and it is disposable.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0075] FIG. 1 shows steps that may be performed in a method for selecting at least one set of target cells from multiple sets of candidate cells.

[0076] FIG. 2 shows an output of the method for selecting at least one set of target cells from multiple sets of candidate cells in which the set of target cells is highlighted.

[0077] FIG. 3 displays evaluations of multiple processes according to multivariate evaluation criteria.

[0078] FIG. 4 shows glycan profiles for multiple sets of candidate cells.

[0079] FIG. 5 displays multivariate evaluations of multiple sets of candidate cells.

[0080] FIG. 6 also shows steps that may be carried out as part of the method for selecting at least one set of target cells from multiple sets of candidate cells.

[0081] FIG. 7 shows an exemplary use of mechanistic modeling to smooth process outputs.

[0082] FIG. 8 is a perspective view from above of a portion of a process control device.

[0083] FIG. 9 shows a cross-sectional view of a vessel of the process control device.

DETAILED DESCRIPTION

[0084] In the following text, a detailed description of examples will be given with reference to the drawings. It should be understood that various modifications to the examples may be made. In particular, one or more elements of one example may be combined and used in other examples to form new examples.

[0085] FIG. 1 shows steps that may be carried out in a method for selecting at least one set of target cells from multiple sets of candidate cells. At step S101, data collected from a plurality of processes is received. Each of the processes produces a distinct set of candidate cells. The received data includes values of process parameters and process outputs of the processes. Each of the process outputs is a product quality attribute or a key performance indicator for selecting the target cells.

[0086] In the following example, the sets of candidate cells are clones and the target cells are the best clone in the sets of candidate clones. It should be understood that although the following example is described in the context of clones, it is applicable to other types of cells.

[0087] The plurality of processes are carried out in parallel via a process control device (shown in FIGS. 8 and 9 and discussed in more detail below). More specifically, there are 24 processes carried out in 24 vessels. Each vessel has about 11 to 15 ml working volume. Data from offline measurements (e.g., glycan measurements, glucose, lactate, viable cell density (VCD), amino acid levels, monoclonal antibody concentration) is received at the process control device.

[0088] The received data is organized into two separate tables (not shown) for 24 candidate clones. A first table is a process data table for seven process parameters including cell viability, process concentration, and specific productivity. Process parameter values were obtained over the course of several days taking one measurement per day. The second data table is a quality data table. The quality data table includes process outputs, specifically, one glycan profile having 13 separate process outputs (i.e., measurements) for each clone and a calculated distance from a target profile.

[0089] Step S101 may also include correlating the received data. Step S101 may also include receiving a selection of the process parameters and a selection of the process outputs. In the present case, the selection may be represented by the data stored in the process data table (i.e., the selection of process parameters) and the quality data table (i.e., the selection of process outputs).

[0090] At step S103, multivariate evaluation criteria for the selected process parameters and/or the selected process outputs are received. According to the present example, the multivariate criteria include the following four prioritization targets:

[0091] final product concentration: maximize, but at least minimum product concentration value (e.g. 3.5 g/L);

[0092] final specific productivity: maximize, but at least minimum specific productivity value (e.g. 4.5 grams per cell per day);

[0093] quality (distance from profile of candidate cells to profile of reference cells): minimize, but not more than maximum distance value (e.g. 15 units);

[0094] final viability: maximize, but at least minimum viability value (e.g. 65%).

[0095] The distance specified for the quality prioritization target may be a Euclidean distance, calculated according to the formula specified in the context of S105.

[0096] The prioritization targets may be listed in order of priority. Priorities may be set by weights, as discussed in more detail below. For example, the final product concentration may have a higher weight than the quality.

[0097] In the example above, each of the prioritization targets includes an extremum; three of the prioritization targets include a maximum and one prioritization target includes a minimum. Each of the prioritization targets also includes a target value (particularly the minimum product concentration value, the minimum specific productivity value, the maximum distance value and/or the minimum viability value, e.g., 3.5 g/L, 4.5 grams per cell per day, 65% or 15 units, as specified above, respectively).

[0098] It should be noted that, although the example above includes only prioritization targets, prioritization ranges may be supplied in addition or as an alternative.

[0100] The example is carried out in the process control device described above. Accordingly, when the four prioritization targets are applied to the 24 candidate clones, no clone meets all criteria. However, lowering the limit for final viability (i.e. the minimum viability value) from 65% to 60% leads to the result that a clone referred to as CS1_7 is selected as the target clone. CS1_7 has a product concentration of 4.5 g/L, a specific productivity of 4.51 grams per cell per day, a quality of 14 and a final viability of 60%. Clone CS1_7 would be selected as the target clone according to conventional approaches, particularly because conventional approaches typically rely on at least one hard limit that leads to the exclusion of clones that do not meet the hard limit.

[0101] Conventional approaches including hard limits can be automated using a decision tree. Further, multiple hard limits can be set by the user. However, use of one or more hard limits may lead to strict exclusion of clones that do not meet those limits and might not result in the selection of the best (i.e., optimal) target clone.
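The hard-limit behaviour described above can be illustrated with a minimal sketch. The dictionary layout, the field names (`titer`, `qp`, `distance`, `viability`) and the helper functions are hypothetical; only the numeric values for clones CS1_7 and CS2_2 and the four limits are taken from the example in this description.

```python
# Hypothetical records holding the values quoted for two of the 24 clones.
clones = {
    "CS1_7": {"titer": 4.5, "qp": 4.51, "distance": 14.0, "viability": 60.0},
    "CS2_2": {"titer": 4.8, "qp": 4.45, "distance": 8.0, "viability": 70.0},
}

# (limit, kind): "min" means the value must be >= limit, "max" means <= limit.
hard_limits = {
    "titer": (3.5, "min"),
    "qp": (4.5, "min"),
    "distance": (15.0, "max"),
    "viability": (65.0, "min"),
}

def passes(values, limits):
    """Return True only if every hard limit is satisfied."""
    for name, (limit, kind) in limits.items():
        if kind == "min" and values[name] < limit:
            return False
        if kind == "max" and values[name] > limit:
            return False
    return True

def survivors(candidates, limits):
    """Names of the clones that are not excluded by any hard limit."""
    return [name for name, values in candidates.items() if passes(values, limits)]

# With the original limits neither clone survives.
strict = survivors(clones, hard_limits)

# Lowering the viability limit from 65% to 60% lets CS1_7 through, while
# CS2_2 is still excluded because its specific productivity is 4.45 < 4.5.
relaxed = survivors(clones, {**hard_limits, "viability": (60.0, "min")})
```

In this sketch a clone that misses a single limit by a small margin is removed outright, regardless of how well it performs on the remaining criteria.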

[0102] Applying a multivariate approach, as discussed in more detail in the following steps, leads to selection of a clone that better matches the given criteria, increasing the likelihood that the selected clone will lead to a safe and effective product. The clone selected according to the following steps would have been excluded according to the traditional approach because the specific productivity of the clone falls just outside the specified limit of 4.5 grams per cell per day. The steps discussed below enable process parameter values and process outputs to be combined into a common score. Scores for candidate clones can then be sorted to provide a final ranking leading to the selection of a given number of clones. A multivariate selection function (discussed in more detail below) used in the ranking accounts for each prioritization target and/or prioritization range, as well as weights for prioritization. Accordingly, a consistently ranked list of available clones is provided. Subjectivity is excluded and each ranking performed on the same data (even by different users) leads to a reliable and consistent result.

[0103] In steps S105 to S111, a multivariate selection function is used to calculate scores for each one of the sets of clones from the correlated data according to the multivariate evaluation criteria.

[0104] At step S105, an objective function may be used to determine a distance between a process variable value (process parameter value and/or process output) and the prioritization target. More particularly, multiple process parameters and/or process outputs may be combined into a component, and the distance may be between the component and the prioritization target. The component may be derived using principal component analysis (PCA); however, other means suitable for calculating orthogonal components (i.e., vectors) from variables could also be used. Accordingly, components for each of the prioritization targets may be calculated. The components may be orthogonal (i.e., not correlated) and suitable for Euclidean distance calculations. Distance calculations may also be performed using partial least squares or orthogonal partial least squares projections. According to one example, the distance between a candidate clone and the prioritization target may be calculated as follows:

[0105] 1. calculate projection vectors $t_i$ for each clone,

[0106] 2. the projection vectors $t_i$ are orthogonal, with a length proportional to the variance explained by the vector, and

[0107] 3. the distance (D) between specific clones, $D^2 = \sum_{k=0}^{i} \left(t(c,i) - t(r,i)\right)^2$

[0108] where c is a candidate clone, and

[0109] r is the prioritization target(s) or set of reference clones.
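The three points above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the sum in point 3 is taken to run over the retained components, the reference is included as an extra row of the data matrix, the variables are mean-centred but not scaled, and the function names and data values are hypothetical.

```python
import numpy as np

def projection_scores(X, n_components):
    """Return PCA score vectors t_i (one row per clone) via SVD.

    The columns of the result are mutually orthogonal, and the squared
    length of column i is proportional to the variance it explains.
    """
    Xc = X - X.mean(axis=0)  # mean-centre each variable
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def squared_distance(T, c, r):
    """D^2 between clone row c and reference row r in score space."""
    diff = T[c] - T[r]
    return float(diff @ diff)

# Hypothetical data: rows 0-2 are candidate clones, row 3 is the reference.
X = np.array([
    [4.5, 4.51, 14.0],
    [4.8, 4.45, 8.0],
    [3.9, 4.60, 11.0],
    [4.6, 4.55, 9.0],
])
T = projection_scores(X, n_components=3)
distances = {c: squared_distance(T, c, r=3) for c in range(3)}
closest = min(distances, key=distances.get)  # candidate nearest the reference
```

When all components are retained, the distance in score space equals the Euclidean distance between the original rows; retaining fewer components keeps only the dominant directions of variation.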

[0110] Thus, an exemplary objective function is provided in point 3 above. The objective function may include principal components (i.e., projection vectors) derived from process parameters and/or process outputs. The objective function may include a Euclidean distance calculation involving the principal (orthogonal) components. The combination of Euclidean distance and orthogonal components may be particularly advantageous, since possible correlations between variables (e.g., correlated glycans, as shown in FIG. 4) are reflected in the orthogonal components and an assumption that the variables are not correlated is unnecessary.

[0111] Regarding points 1 and 2 above, conventional approaches may consider a subset of process outputs as a basis for evaluation. In such approaches, prioritization may assume that there is no correlation between these process outputs, particularly because it is difficult to prioritize one variable over another variable when the variables are correlated. By comparison, the objective function in point 3 does not require an assumption that the variables (i.e., parameters or outputs) are not correlated, since orthogonal projection vectors reflecting variable correlations are calculated from the variables and then used to determine the distance D.

[0112] Advantageously, the objective function may consider all process parameters and process outputs (i.e., if all are selected) or a subset (e.g., proper subset) of the process parameters and process outputs, as discussed in the example. Further, the absence of hard limits may prevent exclusion of optimal (e.g., the most efficient) clones.

[0113] Returning to the example, the clone CS2_2 may have a product concentration of 4.8 g/L. Step S105 may include determining the distance between 4.8 g/L and the target product concentration of 3.5 g/L. Determining the distance may involve normalizing the distance. In this case, since the product concentration is to be maximized, the inverse of the difference between the product concentration of clone CS2_2 and the target product concentration may be used.

[0114] After the distance is determined via the objective function, the distance is magnified in step S107. In particular, the selection function includes a magnifying function. The magnifying function may be a continuous non-linear function.

[0115] Using a non-linear magnifying function (in contrast to a linear function) may be advantageous, since such a function will favor clones (i.e., cause them to be ranked higher) having more acceptable values (e.g., process outputs) in comparison to clones having fewer acceptable values. In this context, an acceptable value may be within a prioritization range (e.g., an acceptable value of 4.5 within a range of 2-6) or between a target value and its corresponding extremum (e.g., an acceptable value of 4.5 having a target value of 3 and an extremum of maximize).

[0116] In contrast, use of a linear magnifying function (or no magnifying function) may result in clones that meet only a few of the targets (e.g., having relatively few acceptable values in between a target value and its corresponding extremum) being selected, if the few acceptable values (e.g., process outputs) are sufficiently close to the extremum in comparison to other clones. In other words, use of a linear function (or no function) could cause selection of clones that do not have acceptable values with respect to a relatively large number of targets. This may be undesirable.

[0117] The magnifying function may also be referred to as a penalty function, since the magnifying function serves to increase the impact of the distance (i.e., impose a penalty according to the distance) between a value (e.g., process output) corresponding to a candidate clone and the prioritization target.
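The difference between a linear and a non-linear magnifying function can be shown with a small sketch. The exponential form, the `steepness` parameter and the sample distances are assumptions chosen for illustration; the description only states that the function may be continuous and non-linear, particularly exponential.

```python
import math

def magnify(distance, steepness=3.0):
    """Hypothetical exponential penalty: 0 at distance 0, convex growth."""
    return math.expm1(steepness * distance)

# Two hypothetical clones with the same total (unmagnified) distance.
one_large_miss = [0.9, 0.0, 0.0]      # far from one target, on target elsewhere
three_small_misses = [0.3, 0.3, 0.3]  # slightly off on every target

linear_a = sum(one_large_miss)
linear_b = sum(three_small_misses)

penalty_a = sum(magnify(d) for d in one_large_miss)
penalty_b = sum(magnify(d) for d in three_small_misses)
```

With a linear function the two clones are tied, whereas the exponential penalty ranks the clone with one large miss clearly behind the clone that is close to all of its targets.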

[0118] The magnifying function may be the same for all prioritization targets. In this way, the magnifying function can be used to influence values corresponding to a candidate clone (without consideration of others), while the weights can be used to prioritize values corresponding to different prioritization targets against each other (e.g., by setting a weight for one prioritization target higher than a weight for another prioritization target).

[0119] At step S109, the magnified distance may be modified based on priority. In particular, the weights for prioritization may be used to modify the magnified distance. Each weight may be a value between 0 and 1 and the magnified distances may be modified by multiplying them by their corresponding weights.

[0120] At step S111, the modified distances may be aggregated to produce the score corresponding to the clone. In particular, for each clone, the distances for all variables (i.e., process parameters and process outputs) are combined into a total distance value. More specifically, the distances may be added together.

[0121] Applying this approach results in clone CS2_2 being ranked higher (i.e., having a lower total distance from the prioritization target) than clone CS1_7. In particular, clone CS2_2 has a product concentration of 4.8 g/L, a specific productivity of 4.45 grams per cell per day, a quality of 8.0 and a final viability of 70%. Even though the specific productivity target of 4.5 grams per cell per day is not reached, the described approach results in the selection of clone CS2_2 as the target clone. Although clone CS2_2 has better values in targets other than specific productivity in comparison to the other candidate clones, clone CS2_2 would still have been discarded according to conventional approaches.
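Steps S105 to S111 can be combined into a short scoring sketch. The clone values and target values are those quoted in this description, and the weights are those described for FIG. 2; everything else is an assumption made for illustration, in particular the ratio-based normalised distance, the exponential magnifying function and its steepness. The resulting scores are not expected to match the scores shown in FIG. 2.

```python
import math

# Targets from the example; weights as described for FIG. 2.
CRITERIA = {
    "viability": {"target": 65.0, "goal": "max", "weight": 0.4},
    "titer": {"target": 3.5, "goal": "max", "weight": 1.0},
    "qp": {"target": 4.5, "goal": "max", "weight": 0.8},
    "distance": {"target": 15.0, "goal": "min", "weight": 0.8},
}

CLONES = {
    "CS1_7": {"titer": 4.5, "qp": 4.51, "distance": 14.0, "viability": 60.0},
    "CS2_2": {"titer": 4.8, "qp": 4.45, "distance": 8.0, "viability": 70.0},
}

def normalised_distance(value, target, goal):
    """Hypothetical S105 distance: below 1 when the target is met, above 1
    when it is missed, and shrinking as the value moves toward the extremum.
    Assumes strictly positive values and targets."""
    return target / value if goal == "max" else value / target

def magnify(distance, steepness=3.0):
    """Hypothetical S107 magnifying function (exponential)."""
    return math.expm1(steepness * distance)

def score(values):
    """S109 and S111: weight each magnified distance, then add them up."""
    total = 0.0
    for name, crit in CRITERIA.items():
        d = normalised_distance(values[name], crit["target"], crit["goal"])
        total += crit["weight"] * magnify(d)
    return total

scores = {name: score(values) for name, values in CLONES.items()}
ranking = sorted(scores, key=scores.get)  # lowest total distance ranks first
```

Under these assumptions CS2_2 ranks ahead of CS1_7 even though its specific productivity is slightly below the target, because no single criterion acts as a hard cut-off.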

[0122] Use of the selection function and the multivariate approach described above ensures the best possible ranking and enables consideration of an arbitrary number of process parameters and process outputs in clone selection.

[0123] FIG. 2 shows the selection of a set of target cells from multiple sets of candidate cells. In the example of FIG. 2, the target cells and the candidate cells are clones; however, the described approach is applicable to other types of cells.

[0124] In a criteria filter pane 201, multivariate evaluation criteria for the selected process outputs are shown. In particular, the displayed prioritization targets include final viability (shown as a percentage with a target value of 65), final product concentration (shown in g/L with a target value of 3.5), final specific productivity (shown as “Qp”, with a target value of 4.5), and quality (shown as “Distance”, with a target value of 15 units). The final viability, product concentration, and final specific productivity are to be maximized. The quality (i.e., distance) is to be minimized. Weights for prioritization are also shown, with a weight of 0.4 for final viability, a weight of 1 for final product concentration, a weight of 0.8 for final specific productivity, and a weight of 0.8 for quality.

[0125] A ranking pane 203 is also shown. The ranking pane 203 includes a first column for the candidate clones and a second column for the corresponding scores of the candidate clones. As shown in the depicted example, the clone CS2_2 (displayed as CS2-2) has the lowest score of 0.473 and therefore the highest rank. A clone/variable plot 205 shows each of the criteria for clone CS2_2 in relation to the other candidate clones. Thus, clone CS2_2 had the highest product concentration, the final viability and final specific productivity values of CS2_2 were about average, and CS2_2 had a relatively low quality (i.e., distance). However, particularly because of the weights for prioritization allocated to each of the criteria, CS2_2 was given the highest rank.

[0126] A raw data pane 207 shows a process trajectory for CS2_2 in comparison to process trajectories for the other clones.

[0127] FIG. 3 shows how process parameter values and process outputs can be combined for evaluation. In particular, multivariate statistical techniques (e.g., principal component analysis) may be used to combine multiple process parameters and/or multiple process outputs. For example, cell similarity indices may be calculated both in the quality domain (e.g., using glycan profiles or spectral fingerprints) and in the process domain, where time series data can be combined and evaluated. Process outputs may correspond to the quality domain, whereas process parameters may correspond to the process data domain.

[0128] Further details regarding principal component analysis and other multivariate statistical process control methods may be found in “Process Analysis, Monitoring and Diagnosis Using Multivariate Projection Methods”, Theodora Kourti, John F. MacGregor, Chemometrics and Intelligent Laboratory Systems 28, 1995, which is incorporated herein by reference.

[0129] In addition to principal component analysis, partial least squares and/or orthogonal partial least squares may also be implemented for selecting target cells. Accordingly, an overview map may be presented, as shown in FIG. 3, such that similarities between cells can be visualized and interpreted.

[0130] In the example depicted in FIG. 3, a CS1 group of clones and a CS2 group of clones are displayed. The groups CS1 and CS2 may be delineated according to separate culture stations of the process control device. Although clones are referred to, the disclosed techniques are applicable to other types of cells.

[0131] In the depicted example, each point is the multivariate representation of the process parameters and the process outputs for one clone. The combination of process parameters and process outputs, as shown in FIG. 3, may highlight potential experimental problems not easily detected in the underlying data. Further, the depicted multivariate analysis may be useful in selecting the target clone.

[0132] Multivariate statistics (e.g., principal component analysis) may be particularly useful when clones are represented using a set of process outputs (e.g., a glycan profile). Accordingly, the multivariate statistical analysis may provide similarity indices (e.g., principal components derived from process outputs), as shown in FIG. 3, to be used in the ranking of sets of candidate clones. The similarity indices may be evaluated using the objective function, e.g., as discussed in the context of step S105 above.

[0133] Further, the depicted example may also provide information on the correlation among different process parameters. This information can be applied when evaluating process parameters and process outputs against the prioritization targets, thereby improving the ranking. Further, correlation information may be useful for the user when setting values for prioritization targets or prioritization ranges.

[0134] In the example of FIG. 3, data is collected from 24 processes. Each of the processes produces a distinct clone. The CS1 clones were produced in a first culture station and the CS2 clones were produced in a second culture station. Data collected from the processes carried out to produce the clones may be correlated and further evaluated as described in the context of FIG. 1. FIG. 3 shows values with respect to two principal components, t[1] and t[2]. In order to calculate scores via the multivariate selection function, further principal components may also be evaluated, in addition to prioritization ranges, prioritization targets, and weights for prioritization, as discussed in the context of FIG. 1.

[0135] The orthogonal components used in FIG. 3 may be more efficient for use in the ranking process in comparison to the many process parameters and process outputs. This is particularly the case because of conflicting (e.g., inversely correlated) targets, as discussed in more detail in the context of FIG. 4.

[0136] FIG. 4 shows process outputs (i.e., glycans) collected from a plurality of processes. More particularly, FIG. 4 shows a glycan profile derived from data collected from processes that produced the components of the 24 clones shown in FIG. 3.

[0137] In the context of FIG. 4, targets (i.e., prioritization targets) have been set for both G1′f and G0f. The prioritization targets include an extremum, i.e., a minimum. Accordingly, it is desirable for both G1′f and G0f to have values as low as possible. However, since G1′f and G0f are inversely correlated, it is not possible to minimize both G1′f and G0f. Accordingly, one of these prioritization targets must be prioritized over the other. In other words, either the prioritization target of minimizing G1′f must be prioritized over the prioritization target of minimizing G0f, or the prioritization target of minimizing G0f must be prioritized over the prioritization target of minimizing G1′f. More generally, once multivariate evaluation criteria for the selected process outputs are received, a display may be provided indicating an inverse correlation between at least two of these selected process outputs. Accordingly, an external input (such as by the user) may provide further prioritization targets before the scores are calculated via the multivariate selection function.
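A check for conflicting targets of this kind could be sketched as follows. The glycan values, the `-0.8` threshold and the function names are hypothetical; the sketch simply flags pairs of process outputs whose Pearson correlation across clones is strongly negative, so that a user can be prompted to prioritize one over the other.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def inversely_correlated_pairs(outputs, threshold=-0.8):
    """Pairs of outputs whose correlation is at or below the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(outputs), 2)
        if pearson(outputs[a], outputs[b]) <= threshold
    ]

# Hypothetical glycan fractions (percent) for five clones.
glycans = {
    "G0f": [60.0, 55.0, 50.0, 45.0, 40.0],
    "G1'f": [20.0, 24.0, 29.0, 33.0, 38.0],
    "G2f": [5.0, 6.0, 5.0, 6.0, 5.0],
}

conflicts = inversely_correlated_pairs(glycans)
```

If both members of a flagged pair carry a "minimize" target, the targets cannot be satisfied together, which is the situation described for G1′f and G0f.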

[0138] In the example depicted in FIG. 4, the glycan profile for each clone is based on 12 process outputs. The glycan profile provides information regarding dependencies between different process outputs. In particular, each glycan may be considered a distinct process output and the glycan profile may show a correlation pattern among the glycans.

[0139] FIG. 5 shows a projection plot depicting a principal component transformation of the 12 glycan profile variables depicted in FIG. 4. Accordingly, a value for each clone is displayed and the value is derived from the 12 glycan profile variable values for that clone. The principal components depicted in FIG. 5 are an example of components (i.e., projection vectors t) that can be used in the context of the objective function discussed in connection with FIG. 1. Further, the components depicted in FIG. 5 may be used to calculate the scores via the multivariate selection function.

[0140] The x axis depicted in FIG. 5 represents 98% of the total variation in the 12 glycan profile variables. Thus, 98% of the variation in the 12 glycan profile process outputs is one-dimensional. While combining process outputs into principal components may make calculation of the scores via the multivariate selection function more efficient, it would also be possible to use process parameters and/or process outputs directly, without the additional step of determining principal components.
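The share of total variation captured by each principal component can be computed from the singular values of the mean-centred data. The sketch below uses synthetic data (one latent factor plus small noise, with a fixed random seed) as a stand-in for 12 glycan variables measured on 24 clones; the data, the noise level and the function name are assumptions and do not reproduce the 98% figure of FIG. 5.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance explained by each principal component."""
    Xc = X - X.mean(axis=0)                  # mean-centre each variable
    S = np.linalg.svd(Xc, compute_uv=False)  # singular values only
    return S**2 / np.sum(S**2)

# Synthetic, nearly one-dimensional data: 24 clones x 12 variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=24)
loadings = np.linspace(-1.0, 1.0, 12)
X = np.outer(latent, loadings) + 0.05 * rng.normal(size=(24, 12))

ratio = explained_variance_ratio(X)
first_component_share = float(ratio[0])
```

A first-component share close to 1 indicates that the variables move together along essentially one direction, which is when a single axis is a reasonable summary of the whole profile.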

[0141] It should be noted that even though principal component analysis is discussed in the context of FIGS. 3 and 5, other multivariate statistical analysis techniques may also be used, such as partial least squares regression and/or orthogonal partial least squares regression.

[0142] In the example of FIG. 5, the origin of the projection plot is a score corresponding to a reference clone. The reference clone may be represented by a plurality of target values.

[0143] FIG. 6 shows functionality of a tool that may be used to implement the method for selecting at least one set of target cells from multiple sets of candidate cells, as discussed above. The tool may be implemented in hardware and/or software. In particular, the tool may be implemented in the process control device mentioned above and depicted in FIGS. 8 and 9.

[0144] As shown by the arrows in FIG. 6, there may be some overlap between steps S601 to S609. For example, actions carried out in step S601 may be carried out after actions carried out in step S603. Similarly, as indicated by the arrows, some of the results of step S601 may be used in steps subsequent to step S603, without the processing carried out in step S603. Corresponding considerations may apply to the other steps.

[0145] At step S601, data is imported. Step S601 may include receiving data collected from a plurality of processes, wherein each process produces a distinct set of candidate cells. The imported data may include process data. Process data may be considered synonymous with process parameter values. Process data may include time dependent data sampled from the processes. Examples of process data are pH, product titer, viable cell density, glucose, dissolved oxygen, and/or oxygen consumption.

[0146] In addition, in step S601, quality data may be imported. Quality data may be understood as process outputs. The quality data may describe the end quality of the candidate cells. More particularly, the quality data may describe the cell line, cell strain, or clone processed. Typical quality data may be glycosylation patterns, charge variants, aggregates, low molecular weight species, and/or glycan residues displayed as a profile (e.g., as profile vectors). Process outputs may also include aggregated process data. For example, viable cell density may be measured throughout the process and the measurements may be received as process parameter values. A final viable cell density may be received as a process output of a process.

[0147] Step S601 may also include handling missing data. This step may be carried out in the context of correlating the received data. Missing data may include data that is not collected or sampled frequently enough. For example, glucose and/or lactate may be sampled only once per day, although a more complete picture of glucose and/or lactate levels may be desired. Accordingly, it may be desirable to simulate hourly measurement of glucose or lactate by filling in data for missing samples.

[0148] The missing data may be filled in using mechanistic modeling procedures and/or multivariate prediction (e.g., partial least squares regression). Mechanistic modeling and multivariate prediction models may also be used to predict future behavior of candidate cells. Accordingly, mechanistic modeling may provide input on the biological state of candidate cells at any given time and may enable early evaluation of candidate cells.
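As a rough illustration of simulating hourly values from daily samples, the sketch below uses plain linear interpolation. This is a deliberately simple stand-in and not the mechanistic modeling or multivariate prediction described above. The function name and the glucose values are hypothetical.

```python
def fill_hourly(sample_hours, values):
    """Linearly interpolate sparse samples onto an hourly grid.

    sample_hours: strictly increasing integer hours at which samples exist.
    values: measurements taken at those hours.
    Returns a dict mapping every hour in the covered span to a value.
    """
    filled = {}
    points = list(zip(sample_hours, values))
    for (h0, v0), (h1, v1) in zip(points, points[1:]):
        for h in range(h0, h1):
            filled[h] = v0 + (v1 - v0) * (h - h0) / (h1 - h0)
    # The last measured sample closes the grid.
    filled[sample_hours[-1]] = values[-1]
    return filled

daily_hours = [0, 24, 48]
glucose = [6.0, 4.8, 3.0]  # illustrative values only
hourly = fill_hourly(daily_hours, glucose)
```

Measured samples are kept unchanged and only the hours in between are estimated. A mechanistic or multivariate model would replace the straight-line assumption with one that reflects the biology of the cells.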

[0149] Prediction of future behavior may make it possible to determine what would happen if processing of candidate cells is terminated prematurely, e.g., due to an infection. For example, if processing of candidate cells is terminated prematurely, prediction of future behavior of the candidate cells may still enable process data for those candidate cells to be incorporated into calculations performed via the multivariate selection function in order to arrive at a score for the candidate cells that can be used in the ranking discussed above.

[0150] Mechanistic modeling can also be used to improve the quality of measured data by verifying and correcting values that violate known metabolic ratios, thereby ensuring that higher quality data is used in the ranking. Mechanistic modeling may also be used to estimate process parameter values (e.g., cell death rate) that are difficult or impossible to measure directly and thereby add further viable information that can be used when calculating scores for each of the candidate cells.

[0151] For example, viable cell density measurements may have an error of +/−10%. Smoothing of the measurements may be carried out in order to exclude the erroneous measurements, e.g., process parameter values or process outputs.

[0152] A specific example of a mechanistic model that can be used to fill in missing data or improve the quality of measured data (e.g., via data smoothing) is described in the context of FIG. 7.

[0153] Step S601 may also include visual quality control. Accordingly, the received data may be graphically presented such that outliers can be easily identified and excluded. Further, data can be corrected as desired.

[0154] Step S601 may also involve data exclusion. More particularly, at least one of the processes may fail. In particular, events such as contamination and/or system failure, unexplained or inconsistent biological factors, or human error may lead to failure of the process. Data from the failed process may be excluded.

[0155] Step S601 may also include grouping the received data. For example, replicates, minipools, or biosimilar cells (e.g., clones) may be grouped for analysis. Prioritization ranges and/or prioritization targets may be set for each group, or for the entire set of cells.

[0156] Step S601 may also include data matching. In particular, data may be received from various sources. The data may be matched and synchronized for analysis. The sources may include the process control device itself, and possibly an external analysis device. External analysis devices may include one or more of the following: a device for offline spectroscopy measurement (spectrometer), a device for inline (i.e., online, in situ) biomass measurement, a device for nutrient and metabolite measurement.

[0157] The spectrometer may be an apparatus to separate (subatomic) particles, atoms, and molecules by their mass, momentum, or energy. For example, spectroscopic measurements may be carried out using an Acquity iClass UPLC and Xevo TQS triple quadrupole mass spectrometer (Waters, Milford, Mass.). Other devices may also be used.

[0158] The device for nutrient and metabolite measurement may measure parameters online. Examples of nutrients include glucose and lactate. Examples of metabolites include methanol and ethanol. In particular, the device may perform up to 60 analyses per hour for a filtration probe and up to 30 analyses per hour via a dialysis setup. More specifically, glucose and lactate may be analyzed using a Bioprofile flex (Nova Biomedical Corporation, Waltham, Mass.).

[0159] Viable cell concentration may be analyzed using a cell viability analyzer, such as the Vi-Cell Automated cell viability analyzer (Beckman Coulter, Brea, Calif.). Other devices may also be used.

[0160] Step S601 may also include creating a project. The project may provide a framework to use and store multivariate evaluation criteria, as discussed in more detail below.

[0161] At step S603, criteria for the selected process parameters and/or the selected process outputs may be set. In particular, step S603 may include selecting a (proper) subset of the process parameters and/or a (proper) subset of the process outputs. Accordingly, the selection of the process parameters may exclude one or more of the process parameters. The selection of the process outputs may exclude one or more of the process outputs. The selected process parameters and/or process outputs may then be received by, e.g., stored by, the process control device.

[0162] Step S603 may include displaying the received data collected from the plurality of processes. Further, the process parameters and process outputs may be displayed.

[0163] Data may be displayed in the form of a table. In particular, process parameter values may be visualized in a data table. The table may facilitate correction of obvious errors in the data.

[0164] Display of the data in the data table may facilitate exclusion of candidate cells or identification of one or more sets of candidate cells as reference cells. For example, if one of the sets of candidate cells exhibits an abnormal profile, the set of candidate cells may be eliminated.

[0165] Acceptability ranges for process outputs may be set. The acceptability ranges may be set so as to exclude one or more outliers. Setting acceptability ranges (i.e., acceptable ranges) may be part of a prefiltering process. To aid in the setting of acceptability ranges, an overview display of the received data may be provided. Accordingly, it can be easily determined how much of the received data is excluded by setting an acceptability range. If the acceptability range is set too strictly, the number of sets of candidate cells that pass through the prefiltering process may be too limited. For example, if an acceptability range is set to filter out sets of candidate cells having a low titer, the absence of high titer producing sets of candidate cells among all the sets of candidate cells may limit the number of sets of candidate cells that are passed through the prefiltering process. This may be undesirable.

[0166] A display showing the sets of candidate cells that are excluded for a specified acceptability range may be useful in assisting the user to set a suitable acceptability range. Step S603 may also include a raw data visualization. The raw data visualization may help facilitate understanding of which data is excluded by the specified acceptability ranges.
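The prefiltering by acceptability ranges can be sketched as follows. This is a minimal illustration under assumed data. The clone names, output names, values, and the function name are hypothetical.

```python
def prefilter(candidates, ranges):
    """Split candidate sets into those inside every acceptability range
    and those excluded by at least one range."""
    passed, excluded = [], []
    for name, outputs in candidates.items():
        ok = all(low <= outputs[key] <= high
                 for key, (low, high) in ranges.items())
        (passed if ok else excluded).append(name)
    return passed, excluded

# Illustrative process outputs per set of candidate cells.
candidates = {
    "clone_A": {"titer": 3.1, "aggregates": 1.2},
    "clone_B": {"titer": 0.9, "aggregates": 0.8},
    "clone_C": {"titer": 2.4, "aggregates": 4.5},
}
# Acceptability ranges as (low, high); an open upper bound uses infinity.
ranges = {"titer": (2.0, float("inf")), "aggregates": (0.0, 2.0)}
passed, excluded = prefilter(candidates, ranges)
```

Returning the excluded sets alongside the passing ones makes it easy to see when a range is set too strictly, as in the low titer example above.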

[0167] Step S605 may include receiving multivariate evaluation criteria for the selected process parameters and/or the selected process outputs. The multivariate evaluation criteria may include aggregating or further processing at least one of the selected process parameters and/or the selected process outputs.

[0168] The multivariate evaluation criteria may include weights for prioritization. The multivariate evaluation criteria may include prioritization ranges and/or targets, wherein each target is an extremum and/or a target value. Prioritization targets may be set for a (proper) subset of the selected process parameters and/or process outputs. The extremum may be to maximize or minimize. The prioritization target may include a target value or set point. The target value may be a specific reference or limit value.

[0169] Prioritization ranges may be received as input from the user. However, the prioritization ranges may be modified when calculating the scores for each one of the sets of candidate cells. Weights for prioritization may be used to define the importance of each of the prioritization targets. For example, a weight of 1.0 may be given to the most important prioritization target. Weights close to 0 may be given to relatively unimportant prioritization targets or prioritization ranges.

[0170] Step S605 may include calculating, via the multivariate selection function, scores for each one of the sets of candidate cells from the correlated data according to the multivariate evaluation criteria. The multivariate selection function may include an objective function, more particularly a cost function. The objective function may be referred to as a desirability function. The objective function may be used to rank the candidate cells according to how well they fit the multivariate evaluation criteria. The objective function may be non-linear. The objective function may be exponential, e.g., quadratic. The objective function may quantify the distance from a numerical target (e.g., a prioritization target or range) and aggregate penalties based on the evaluation criteria and the weights for prioritization. The numerical target may be a set of reference cells having a biosimilar definition in comparison to the sets of candidate cells. The objective function may be as specified in the discussion of step S105 of FIG. 1 above.
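One possible form of such a cost function is a weighted sum of squared, range-normalized distances. The sketch below is an assumption made for illustration and is not the objective function specified in connection with FIG. 1. The criteria layout, field names, and data are hypothetical.

```python
def score(outputs, criteria):
    """Weighted quadratic cost; a lower score means a better fit."""
    total = 0.0
    for key, c in criteria.items():
        low, high = c["range"]
        x = outputs[key]
        if c["goal"] == "maximize":
            gap = max(0.0, high - x)   # no penalty at or above the range top
        elif c["goal"] == "minimize":
            gap = max(0.0, x - low)    # no penalty at or below the range bottom
        else:                          # "target": distance from the set point
            gap = abs(x - c["target"])
        total += c["weight"] * (gap / (high - low)) ** 2
    return total

criteria = {
    "titer": {"goal": "maximize", "range": (0.0, 5.0), "weight": 1.0},
    "aggregates": {"goal": "minimize", "range": (0.0, 5.0), "weight": 0.5},
    "main_glycan": {"goal": "target", "target": 60.0,
                    "range": (40.0, 80.0), "weight": 0.8},
}
candidates = {
    "clone_A": {"titer": 3.0, "aggregates": 1.0, "main_glycan": 58.0},
    "clone_B": {"titer": 4.5, "aggregates": 3.0, "main_glycan": 50.0},
    "clone_C": {"titer": 4.0, "aggregates": 0.5, "main_glycan": 61.0},
}
scores = {name: score(out, criteria) for name, out in candidates.items()}
# Rank from lowest cost (best fit) to highest.
ranking = sorted(scores, key=scores.get)
```

Dividing each distance by the width of the prioritization range keeps outputs with different units comparable before the weights are applied.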

[0171] Calculation of scores for each one of the sets of candidate cells may be performed multiple times (i.e., iteratively) using different evaluation criteria, particularly because it may be determined that two or more of the evaluation criteria are inversely correlated, as shown in FIG. 4.

[0172] The multivariate selection function may produce a score that reflects how well a set of candidate cells fulfills the multivariate evaluation criteria. The score may be referred to as a desirability index. Selected profiles for candidate cells can be visualized graphically and enable the inspection of cell profiles with scores that are close to each other, as shown in FIG. 2. For example, if a set of candidate cells is ranked fifth, e.g., in the ranking 203, the corresponding variable plots for the set of candidate cells and other sets of candidate cells may facilitate determination of how the ranking was generated. Such comparisons may enable reprioritization and guide further selection iterations. Accordingly, the display of graphical data, e.g., as shown in FIG. 2, may facilitate analysis that would not be possible from the underlying raw data. Moreover, the number of selection iterations may be reduced and require less user intervention in comparison to conventional approaches.

[0173] A multivariate correlation analysis tool, e.g., the tool implementing the method for selecting at least one set of target cells from multiple sets of candidate cells, can help the user to adjust the selection process, including prioritization targets, weights, and ranges, in order to ensure that the optimal set of target cells is selected. Further, the selection of sets of candidate cells discussed above may reflect both process parameter values and process outputs.

[0174] The evaluation criteria can be saved for later use with other sets of candidate cells or shared with other users. Use of the same multivariate evaluation criteria by multiple users for different evaluations may ensure consistency. In particular, the same criteria may be applied to different data sets by different users. This may eliminate the subjectivity that is often present in conventional approaches. Further, the same criteria may be used for different batches of data from the same project.

[0175] At step S607, analysis may be carried out. As noted above, some of the actions carried out in the context of step S607 may be carried out in combination with or even before steps carried out in the context of step S605.

[0176] To facilitate ranking of the sets of candidate cells, at least one of the prioritization targets may be calculated. More particularly, at least one of the target values may be calculated. For example, integrals and/or averages of the process parameter values, relations, and predictions based on process history may be calculated. Further, some process outputs may also be calculated. Calculated and/or predicted values may provide more stable input for the ranking of the sets of candidate cells and the selection of at least one of the sets of candidate cells as the target cells.
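Integrals and averages of time-dependent process parameter values can be computed, for example, with the trapezoidal rule. The following is a minimal sketch. The sampling times, the viable cell density values, and the function names are hypothetical.

```python
def trapezoid_integral(times, values):
    """Integrate sampled values over time with the trapezoidal rule."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

def time_weighted_average(times, values):
    """Average that accounts for unevenly spaced samples."""
    return trapezoid_integral(times, values) / (times[-1] - times[0])

days = [0.0, 1.0, 2.0, 4.0]   # uneven sampling is handled
vcd = [2.0, 4.0, 8.0, 10.0]   # illustrative viable cell density values
ivcd = trapezoid_integral(days, vcd)
mean_vcd = time_weighted_average(days, vcd)
```

An aggregated value of this kind is less sensitive to a single noisy sample than any individual measurement, which is one reason calculated values may give more stable input for the ranking.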

[0177] In addition, mechanistic modeling algorithms may be used to fill in missing data and predict process trajectories and/or final results, as discussed above. For example, candidate cells that stopped growing earlier than expected may be compared with other candidate cells that continued producing results until the end of their corresponding processes. Process data and quality data for the candidate cells that stopped growing earlier than expected may be extrapolated or interpolated from the other candidate cells that continued producing results until the end of their corresponding processes. The trajectory predictions for candidate cells that terminated early may provide a more complete set of data that can be used to improve the ranking of the sets of candidate cells.

[0178] Further, multivariate modeling, as discussed above in connection with correlating the received data, may be used to compensate for errors in measurement due to sampling, handling or inherent flaws in external analytic devices. The multivariate modeling may use fundamental bioprocessing correlations to produce more reliable data. This may be part of the correlation carried out after data is received or may be carried out in a further iteration of the described method after an initial ranking of the sets of candidate cells has been produced. The score calculated via the multivariate selection function may be based on all selected process parameters and all selected process outputs. Further, information from the process trajectory of each process may also be used. Scores calculated via the multivariate selection function may be compared to a prioritization target (e.g., biosimilar cells or a group value).

[0179] Comparisons can be made within a group or to a specific reference process (i.e., a process that produces reference cells). These comparisons may facilitate setting different prioritization ranges, targets and weights in further iterations of the above described method. Further, process trajectories for each of the sets of candidate cells can be used to calculate distances between sets of candidate cells, groups, or from target values for use in the ranking of the sets of candidate cells. In addition to process trajectories, other multivariate criteria may be used in the ranking of candidate cells, particularly other quality data such as glycan profiles.

[0180] Step S607 may also include correlation analysis, as discussed in connection with FIGS. 4 and 5. The correlation analysis may give information on how to improve candidate cell selection, since some of the received evaluation criteria may contradict other criteria (e.g., the criteria may include inversely correlated targets). Understanding the correlation between different variables (i.e., process parameters and process outputs) may facilitate tuning of the tool in order to arrive at an optimal selection of target cells, e.g., in a minimal number of iterations.
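A simple way to flag potentially contradictory criteria is to compute the Pearson correlation between two process outputs across the sets of candidate cells. The sketch below assumes illustrative values for two glycans. The variable names and the threshold of -0.8 are hypothetical choices.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# One value per set of candidate cells (illustrative only).
glycan_1 = [10.0, 12.0, 15.0, 18.0, 20.0]
glycan_2 = [30.0, 27.5, 25.0, 21.0, 19.0]
r = pearson(glycan_1, glycan_2)
# Maximizing both outputs at once would conflict if they move in
# opposite directions across candidates.
conflicting = r < -0.8
```

When two outputs are strongly inversely correlated, setting a maximize target on both may leave no candidate that scores well, so one of the targets or its weight may need to be reconsidered in a further iteration.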

[0181] Selected candidate cells may be compared with all other candidate cells for analysis. Multivariate approaches such as principal component analysis (PCA) and partial least squares (PLS) regression can be used for visual comparison and calculation of distance between different sets of candidate cells. Graphic representation and visualization of the results may help the user determine similarities between specific sets of candidate cells or groups. Components describing individual sets of candidate cells may be displayed as dots in two-dimensional diagrams, as shown in FIGS. 3 and 5, with a possibility to color or set sizes according to any variable or preference used in the selection process. The coloring and sizing may be part of a three-dimensional graph.

[0182] At step S609 a report may optionally be produced. The report may include a document reporting one or more selection results, the multivariate evaluation criteria, prioritization targets, target values and/or extrema, rankings, statistical correlations, and/or observations. In particular, the report may provide information supporting (i.e., reasons for) the selection of the set of target cells from the multiple sets of candidate cells. If multiple iterations of the method have been carried out, the report may include a summary of results from each iteration.

[0183] FIG. 7 shows an example of how mechanistic modeling can be used to smooth process outputs. The example of FIG. 7 depicts “E5” cells, i.e., a particular type of candidate cell.

[0184] VCD measurements (in cells/mL) are denoted by “x” marks and cell viability measurements (indicating the percentage of cells that are viable, i.e., still alive) are denoted by filled in circles. The VCD measurements include an initial VCD measurement of 2.38827. The cell viability measurements include an initial cell viability measurement close to 100%. VCD in cells/mL and cell viability in percentage are shown on the vertical axis and time in days is shown on the horizontal axis.

[0185] A VCD curve 701 and a cell viability curve 703 are calculated from the measurements and cell-specific constants “K” corresponding to the cells. The constants “K” are specific to the candidate cells depicted and differ for other (different) sets of candidate cells.

[0186] The curves 701 and 703 may be generated from selected process outputs and cell-specific constants according to the following equations:

$\begin{matrix} X_{total} = X_{VCD} + X_{dead} & (1) \\ \frac{dX_{VCD}}{dt} = u_{\max} \cdot \frac{1}{\frac{X_{total}}{K_{inhibit}} + 1} \, X_{VCD} & (2) \\ \frac{dX_{dead}}{dt} = K_{dead} \log(X_{VCD}) + K_{toxic} \, X_{dead} \, X_{VCD} & (3) \end{matrix}$

[0187] The equations above are an example of how mechanistic modelling could be carried out. Other equations could be used.

[0188] With regard to equations (1) to (3) above, X.sub.total (total number of cells, alive and dead), X.sub.VCD (viable cell density), and X.sub.dead (number of dead cells) reflect values of process parameters measured or controlled during a process. u.sub.max, K.sub.dead, K.sub.toxic, and K.sub.inhibit are cell-specific constants; equations (1) to (3) may be solved to derive the cell-specific constants according to a conventional optimization method.

[0189] Accordingly, as depicted in FIG. 7, u.sub.max (maximum growth rate of the cells) is 0.714950953243, K.sub.toxic (increase in death rate due to environmental toxicity brought about by dead cells) is 0.00439812036682, K.sub.dead (cell death rate) is 0.178336658011, and K.sub.inhibit (coefficient characterizing a reduction in growth rate due to the total number of cells) is 108.398985599.

[0190] K.sub.inhibit reflects the principle that cells grow more slowly when the total number of cells is greater. Thus, the inhibition characterized by K.sub.inhibit may grow with cell density (i.e., cells may be inhibited from growing as cell density increases). Equations (1) to (3) reflect the effects of cell density on cell growth, but other effects may also be considered.
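Equations (1) to (3) can be integrated numerically to produce curves of the kind shown in FIG. 7. The sketch below uses forward Euler integration with the constants and the initial VCD measurement reported above. It assumes that the logarithm in equation (3) is the natural logarithm, that the initial number of dead cells is zero, and that a fixed step of 0.01 days is adequate. The function name is hypothetical, and the sketch does not reproduce the fitting of the constants.

```python
import math

# Cell-specific constants as reported for FIG. 7.
U_MAX = 0.714950953243
K_TOXIC = 0.00439812036682
K_DEAD = 0.178336658011
K_INHIBIT = 108.398985599

def simulate(x_vcd0, x_dead0=0.0, days=10.0, dt=0.01):
    """Forward Euler integration of equations (1) to (3).

    Returns a list of (time, X_VCD, viability in percent) tuples.
    """
    x_vcd, x_dead, t = x_vcd0, x_dead0, 0.0
    steps = int(round(days / dt))
    trajectory = []
    for _ in range(steps + 1):
        x_total = x_vcd + x_dead                              # equation (1)
        trajectory.append((t, x_vcd, 100.0 * x_vcd / x_total))
        d_vcd = U_MAX / (x_total / K_INHIBIT + 1.0) * x_vcd   # equation (2)
        # Natural logarithm assumed for equation (3).
        d_dead = K_DEAD * math.log(x_vcd) + K_TOXIC * x_dead * x_vcd
        x_vcd += dt * d_vcd
        x_dead += dt * d_dead
        t += dt
    return trajectory

trajectory = simulate(2.38827)  # initial VCD measurement from FIG. 7
```

Viability is computed here as the viable fraction of all cells, consistent with the description of cell viability as the percentage of cells that are still alive. A production implementation would likely use an adaptive ODE solver rather than a fixed-step Euler scheme.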

[0191] Mechanistic modeling may be carried out during correlation of received data in order to fill in missing measurements, e.g., by plotting one or more curves (as shown in FIG. 7) based on existing measurements and known physical characteristics of the cells. Accordingly, the mechanistic modeling may affect values of the selected process parameters and selected process outputs.

[0192] The use of cell-specific constants in the context of correlation (e.g., filling in missing data, data smoothing or interpolation) of process parameter values and process outputs via mechanistic modeling (as described above) has the effect of reflecting physical properties of cells when carrying out the correlation. This may lead to more accurate calculation of the scores for the sets of candidate cells (particularly in comparison to approaches that rely exclusively on multivariate statistical approaches such as PCA or PLS), and accordingly, selection of the optimal target cells from the sets of candidate cells.

[0193] A process control device 10 (also referred to as a bioreactor system) including an array of vessels 100 (e.g., microscale bioreactors) is shown in FIG. 8. The process control device 10 may be mounted to the deck of a base station in a larger scale process control device. In particular, the process control device 10 may be a microscale process control device suitable for mounting to a macroscale process control device. The macroscale process control device may include vessels having a size that differs from a size of the vessels of the microscale process control device by at least one order of magnitude.

[0194] The process control device 10 comprises a base 12, to which is mounted a base plate 13 defining a receiving station 14 for removably receiving a plurality of vessels 100. A clamp plate (not shown) may be removably connected to the base plate 13, in a position overlaying the receiving station 14, via a pair of posts 22 projecting from the upper surface of the base plate 13. The clamp plate may facilitate a drive connection between the drive mechanism of a stirrer 116 (described below).

[0195] In the depicted example, the receiving station 14 can hold up to twelve vessels 100 in two rows of six at respective locations 16. In FIG. 8, six vessels 100 are shown in position in their respective vessel receiving locations 16, while six of the vessel receiving locations 16 are shown empty to better illustrate fluid ports 314a-c in the base plate 13.

[0196] The receiving station 14 could be designed to accommodate a greater or lesser number of vessels 100 and the vessels 100 could be arranged in any suitable configuration.

[0197] One or more heaters or chillers (not shown) may be located adjacent to the vessel receiving locations 16 to control the temperature of the vessels 100.

[0198] With reference to FIG. 8, one of the vessels 100 comprises a chamber 105 for receiving a fluid 107 (e.g., a cell culture solution) having a headspace 109 above. The vessel 100 includes a pipette access port 106, to which a cap 108 is removably attached. The cap 108 is removed for fluids to be pipetted into or out of the vessel 100. A fluid input port 112 may include a filter 114.

[0199] The stirrer 116, comprising blades 118, may be rotatably mounted at the base of a vertical shaft 120 within the vessel 100. The upper end of the vertical shaft 120 includes a drive input 124 (e.g., for the drive mechanism, not shown).

[0200] A pH sensor spot 126 and a dissolved oxygen (DO) sensor spot 128 are disposed at the bottom of the vessel 100, such that they are able to detect the pH and DO levels of the fluid 107 and to be interrogated from the exterior of the vessel 100.

[0201] Venting of the vessel chamber 105 is achieved via a labyrinthine path connecting the chamber 105 to the atmosphere via the stirrer shaft drive input 124. Alternatively, a separate vent port may be provided towards the top of the vessel 100.

[0202] A lip 130 may project out to the side of the vessel 100. The lip 130 includes a through port 132b (two optional additional ports 132a and 132c are not shown). A gallery plate 134 is secured above a portion of the top of the vessel 100. The gallery plate 134 includes at least one groove 136b extending to the fluid input port 112 at the top of at least one tube 110b. The gallery plate further includes at least one through port 132b. The lip 130 and the gallery plate 134 together form a rigid ledge projecting to the side of the vessel.

[0203] The clamp plate (not shown) may also reinforce a seal between the through port 132b and the fluid ports 314a-c.

[0204] A valve assembly 300 is mounted to the underside of the base 12. The valve assembly is received in a cavity of the base station when the process control device 10 is connected to the base station.

[0205] In order to carry out a process, the process control device 10 is loaded with vessels 100, each vessel being placed in a respective vessel receiving location within the receiving station 14. When the vessels 100 are inserted into the receiving station 14, the port 132b in the bottom surface of the lip 130 is aligned with and forms a sealed connection with the corresponding receiving station fluid port 314b on the upper surface of the base plate 13.

[0206] The respective ports are automatically aligned with one another on insertion by virtue of the defined locations of the vessel receiving station, including the fluid ports 314a-c adjacent thereto, and the rigid ledge, which places the corresponding vessel connection ports 132a-c in alignment with the receiving station fluid ports 314a-c.

CPC classification: G16B40/00, G05B13/048

International classification: G16B40/00, G05B13/04
