METHOD AND SYSTEM FOR TUNING A COMPUTING ENVIRONMENT USING A KNOWLEDGE BASE

Abstract

A tuning system and related computer implemented tuning method carried on an IT system including a System Under Test (SUT) including a stack of software layers, provided with a number of adjustable parameters are disclosed. The method includes the steps of supplying a characterization and prediction module, a tuner module, and a knowledge base (KB). The KB is composed by N tuples, (s.sub.i, {right arrow over (w)}.sub.i, {right arrow over (x)}.sub.i, y.sub.i) being gathered over iterative tuning sessions where each iteration is started by applying to the SUT s.sub.i a configuration {right arrow over (x.sub.l)} suggested by the tuner module, exposing the system s.sub.i to an external working condition w.sub.i and gathering performance metrics resulting in a performance indicator score y.sub.i. The characterization and prediction module builds a characterization vector {right arrow over (c.sub.l)} for each tuple stored in the KB (KB) using the information stored in the KB and produces a prediction about the characterization vector {right arrow over (c.sub.l+1)} of the next tuning iteration i+1.

Claims

1. A computer implemented tuning method carried on an IT system comprising a System Under Test (SUT) including a stack of software layers, provided with a number of adjustable parameters, the method comprising the steps of supplying a characterization and prediction module, a tuner module, and a knowledge base (KB) composed by N tuples, (s.sub.i, {right arrow over (w)}.sub.i, {right arrow over (x)}.sub.i, y.sub.i) being gathered over iterative tuning sessions where each iteration is started by applying to a System Under Test (SUT) s.sub.i a configuration {right arrow over (x.sub.l)} suggested by said tuner module, exposing said system s.sub.i to an external working condition w.sub.i and gathering performance metrics resulting in a performance indicator score y.sub.i, wherein said characterization and prediction module builds a characterization vector {right arrow over (c.sub.l)} for each tuple stored in said knowledge base (KB) using the information stored in said knowledge base and produces a prediction about the characterization vector {right arrow over (c.sub.l+1)} of the next tuning iteration i+1, where characterization vector {right arrow over (c.sub.l)} is a numeric representation of the tunable properties of system s.sub.i when exposed to said external working condition w.sub.i, and the distance between two characterization vectors {right arrow over (c.sub.l)}, {right arrow over (c.sub.j)} represents how similarly the systems s.sub.i, s.sub.j should have been configured in previous tuning iterations i, j when the two systems s.sub.i, s.sub.j were exposed to external working conditions w.sub.i, w.sub.j respectively, said tuner module leverages said knowledge base (KB) and said characterization vectors {right arrow over (c.sub.l)} to select a new suggested configuration {right arrow over (x.sub.l+1)} to be applied to said System Under Test s.sub.i+1 in the next iteration i+1, and further wherein, assuming that there exists tuples of systems and working conditions (S.sub.i, w.sub.i) which should be configured in a similar way, and a group of said tuples t.sub.k=((s.sub.i, w.sub.i), (s.sub.j, w.sub.j), . . . ) is called archetypal task t.sub.k, said characterization module identifies said archetypal tasks in the knowledge base (KB), said archetypal tasks are identified by assigning each tuple of the knowledge base (KB) to an initial task and then iteratively measuring distances between tasks and clustering similar ones, so that tuples which should be tuned similarly are assigned to the same task, said initial tasks are selected according to a user-specified parameter N, indicating that a new task should be created every N tuning iterations.

2. The computer implemented tuning method as in claim 1, wherein the external working condition w.sub.i is characterized by an external characterization methodology, and this characterization is used as an input for a clustering algorithm to derive said initial tasks instead of the user-specified parameter N.

3. The computer implemented tuning method as in claim 1, wherein the distance between two tasks t, t′ used for the task clustering procedure, is computed as: $\sqrt{\frac{{.Math.}_{i = 1}^{n} {(r_{t} (x_{i}) - r_{t'} (x_{i}))}^{2}}{n}}$ where {x.sub.i}.sup.N.sub.i=1 is the set of configurations that have been evaluated on both tasks t and t′, and r.sub.t(x.sub.i) is a relevance score being a scalar value indicating how much task t benefits from the configuration x.sub.i.

4. The computer implemented tuning method as in claim 3, wherein said relevance score r.sub.t(x.sub.i) is defined as: $r_{t} (x_{i}) = \frac{Σ_{j \in J} \frac{f_{j} (x_{i})}{f_{j} (x_{0})}}{.Math. J .Math.}$ where J is the set of entries in the knowledge base (KB) assigned to task t for which x.sub.i has been evaluated, ƒ.sub.j(x.sub.i) is the performance score y.sub.j obtained at iteration j when the system is configured using configuration x.sub.i, and ƒ.sub.j(x.sub.0) is the performance score obtained at iteration j by the baseline configuration.

5. The computer implemented tuning method as in claim 1, wherein a surrogate model used to select configurations in said tuner model based on Bayesian Optimization (BO) is a Gaussian Process (GP).

6. The computer implemented tuning method as in claim 5, wherein said Gaussian Process (GP) is a contextual gaussian process bandit tuner (CGPTuner) which uses as context said characterization vector {right arrow over (c.sub.l)} provided by said computed similarity.

7. The computer implemented tuning method as in claim 6, wherein Gaussian Process (GP) is provided with a combined kernel function κ(({right arrow over (x)}, {right arrow over (c)}), ({right arrow over (x)}′, {right arrow over (c)}′)) which is the sum of configuration kernel κ({right arrow over (x)}, {right arrow over (x)}′) defined over a configuration space (X) and a task kernel κ({right arrow over (c)}, {right arrow over (c)}′) defined over a task characterization space (C).

8. The computer implemented tuning method as in claim 7, wherein said Gaussian Process (GP) has an additive structure made of a characterization component (g.sub.{right arrow over (c)}) which models overall trends among tasks and a configuration component (g.sub.{right arrow over (x)}) which models configuration-specific deviation from said overall trends.

9. The computer implemented tuning method as in claim 1, wherein it is provided a preliminary step when said knowledge base (KB) is empty, including preliminary tuning iterations dedicated to bootstrap said knowledge base.

10. The computer implemented tuning method as in claim 9, wherein said preliminary tuning iterations include evaluating vendor default (baseline) configuration.

11. Tuning system for a System Under Test (SUT) including a stack of hardware and software layers, provided with a number of adjustable parameters, comprising a knowledge base (KB) composed by N tuples (s.sub.i, {right arrow over (w)}.sub.i, {right arrow over (x)}, y.sub.i, where s.sub.i represents the System Under Test (SUT), {right arrow over (w)}.sub.i represents an external working condition to which the System Under Test (SUT) s is exposed, {right arrow over (x)}.sub.i is a configuration vector in a configuration space X of said adjustable parameters and y.sub.i represents performance indicator score of said System Under Test (SUT), the knowledge base (KB) being stored in a memory, a characterization and prediction module, and a tuner module, wherein said characterization and prediction module and tuner module are arranged and made to operate as in claim 1.

12. The computer implemented tuning method as in claim 10, wherein said preliminary tuning iterations further include evaluating a number of user-defined randomly-selected configurations.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0058] The accompanying drawings are incorporated in and constitute a part of this specification; the drawings are intended to illustrate preferred embodiments of the invention and together with the detailed description are meant to explain the principles of the invention.

[0059] The following detailed description of preferred embodiments is given by way of example and shall be read together with the accompanying drawings, wherein:

[0060] FIGS. 1A-1C are plots of throughput of DBMS in different IT systems where values of some tunable parameters were modified;

[0061] FIG. 2 is a diagram showing a layout of the tuning process and architecture of the invention;

[0062] FIGS. 3A and 3B are two plots exemplifying a Bayesian optimization process; and

[0063] FIGS. 4A-4C are plots of Reaction Matching method used to compute tasks distances, in particular FIG. 4A representing the value of a performance metric obtained by a task t when compiled with flags specified in {right arrow over (x)}; FIG. 4B is the relevance computed with an equation 2 (see below); and FIG. 4C shows the distances between task t and t0, t1, t2 measured with Equation 3 (see below) using {right arrow over (x)}.sub.0 and {right arrow over (x.sub.1)}.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0064] In the following, the solution which is suggested to the autotuning problem is disclosed.

[0065] A tuning method (implemented through a tuner module) for a System under Test (SUT) provided with a number of adjustable parameters is mainly composed of two components: a characterization module and a tuner module. To give a better understanding, each component is described below in a separate way, and finally their combination is disclosed.

[0066] The explanation is however given by first formalizing the configuration autotuning problem, then describing Bayesian Optimization and its contextual extension, then the Reaction Matching characterization and finally the actual autotuning method is described in detail.

Problem Description

[0067] The goal of tuning method or an optimization process is to find such a tuned configuration vector {right arrow over (x)} in the configuration space X which, when applied to an IT system s∈S, leads to optimize a certain performance indicator y∈R.

[0068] The performance indicator y can be any measurable property of the system (like throughput, response time, memory consumption, etc. . . . ) or a formula combining multiple of such properties. The tuning method is intended to select the configuration vector {right arrow over (x)} by taking into account the particular external working conditions {right arrow over (w)}∈W to which the IT system s is exposed, where W is the space of the possible working conditions and {right arrow over (w)} is a descriptor vector of the specific condition provided by a certain characterization methodology.

[0069] We also assume that it is available a knowledge base composed by N tuples . Where y.sub.i is the value of the performance indicator saved in the iteration i of the knowledge base, and was computed after having applied configuration {right arrow over (x.sub.l)} to system s.sub.i: y.sub.i=ƒ(s.sub.i, {right arrow over (x)}.sub.i, {right arrow over (w)}.sub.i). Note that, performance indicators y.sub.i might have been collected on different systems s.sub.i, configurations {right arrow over (x)}.sub.i and under different working conditions {right arrow over (w)}.sub.i. The condition {right arrow over (w.sub.l)} is usually not available, so it is not present in the KB. If, instead, it has been measured and {right arrow over (w.sub.l)} is available, then it might be present in the KB as well.

[0070] As said in the introduction, measuring the condition {right arrow over (w.sub.l)} is often not possible (e.g., the temperature of the datacenter when using a cloud service provider) and so it is not in the knowledge base. A key aspect of the disclosed invention is in fact to work when such a characterization is not available.

In other words, in the following specification it is made reference to the vector {right arrow over (w)} to refer to different incoming workloads and other external factors such as the variability of other cascading services; but the availability of such information is not a strict requirement of the proposed solution. Nonetheless, if such a vector or characterization is available, the disclosed invention is capable of taking advantage of it.

Solution Description

[0071] The tuning method is made to run as depicted in FIG. 2 and it is repeated in an iterative way as time passes.

[0072] If no knowledge is available, the tuning method starts with an empty knowledge base and performs preliminary steps. In particular, the preliminary tuning iterations are dedicated to bootstrap the knowledge base. The bootstrap process works as follows: initially, the vendor default (baseline) configuration is evaluated x.sub.0. Then, the user can specify and input additional configurations to evaluate, if promising configurations are available; otherwise, the method would imply a random step wherein a number of user-defined randomly-selected configurations are evaluated. It is supposed that the knowledge has been gathered in the same search space that it is currently optimizing: x.sub.i∈X.

Once the knowledge base is initialized, the tuning process can start.

[0073] Iteration i is started by applying to a System Under Test (SUT) a configuration {right arrow over (x)}.sub.i suggested by the tuner module (described in detail in the following), see step (1). Then, the configuration is evaluated to gather performance metrics, resulting in a performance score y.sub.i, see step (2). It shall be noted that at iteration i the system s.sub.i is subject to an externally controlled working condition {right arrow over (w)}.sub.i, which cannot be controlled and usually not even measured. The evaluation y.sub.i might be also subject to noise. Meaning that multiple performance evaluations in the same condition (s.sub.i, {right arrow over (x)}.sub.i, {right arrow over (w.sub.l)}) might lead to different performance measurements. At this point the evaluation of x.sub.i has been completed. the collected data (i.e., the tuple)) are stored in the knowledge base (see step (3)) and get ready for the next iteration.

Notice that, while tuning a system, the applied configuration is modified from one iteration to another (i.e., typically {right arrow over (.sub.l)}≠{right arrow over (x.sub.l+1)}). However, the system typically does not change from one tuning iteration to the other (i.e., s.sub.i=s.sub.i+1) and we use the notation s.sub.i to indicate that the system information is stored in the knowledge base, which might include tuples coming from different tuning sessions, conducted on different systems. Nonetheless, it is possible in principle to modify the tuned system at each iteration.

[0074] The characterization and prediction module (described in detail in the following) uses the data stored in the knowledge base (see step (4)) to build a characterization vector {right arrow over (c)}.sub.i, which will be used later by the tuning module. During a characterization step, it is required to reconstruct the missing information about all the past working condition {right arrow over (w)}.sub.0, . . . , i and also to capture the information about previous systems s.sub.o, . . . i. More explicitly, the characterization vector {right arrow over (c.sub.l)} contains information about both s.sub.i and {right arrow over (w.sub.l)}.

[0075] However, the proposed solution does not try to simply derive a characterization of the workload of the system (e.g., requests/sec), which would lack information about other external working conditions. Instead, it builds a characterization of the tuning properties of (s.sub.i, {right arrow over (w.sub.l)}). In other words, the characterization c.sub.i captures how the tuner should behave at iteration i, that is, from which settings the system s.sub.i receives a benefit when exposed to working condition {right arrow over (w.sub.l)}.

The advantage of the proposed approach is that the characterization of the tunable properties simply captures how the system behaves, without trying to explain this behaviour in terms of conditions, applications, software versions, hardware architectures, or a combination of these factors or other unknown causes.

[0076] Furthermore, the characterization and prediction module produces a prediction about the next characterization vector {right arrow over (c)}.sub.i+1.

[0077] Finally, the computed characterization {right arrow over (c)}.sub.i, the predicted characterization vector {right arrow over (c)}.sub.i+1, and the information coming from the knowledge base are supplied to the tuner module, see step (5). At this point, a next iteration can be started again (see step (6)), obtaining a new suggestion from the tuner module and going back to step (1).

[0078] Now a separate description of the tuning module and characterization and prediction module is given in the following.

Bayesian Optimization

[0079] In this section it is explained the behaviour of the tuning module, which is based on Bayesian Optimization.

[0080] Bayesian Optimization (BO) is a powerful tool that has gained great popularity in recent years (an extensive review of BO can be found in Bobak Shahriari et al. “Taking the Human Out of the Loop: A Review of Bayesian Optimization”. In “Proceedings of the IEEE 104.1 (January 2016)”, pp. 148-175. issn: 0018-9219.). Here it is given just a brief introduction, visually summarized in FIGS. 3A and 3B.

[0081] Formally, it is required to optimize an unknown objective function ƒ:

[00001] $\begin{matrix} \overset{.fwdarw.}{x} = \arg \max_{\overset{.fwdarw.}{x} \in X} f (\overset{.fwdarw.}{x}) & (Eq . 1) \end{matrix}$

where ƒ has no simple closed-form but can be evaluated at any arbitrary query point {right arrow over (x)} in the relevant domain. The evaluation of function ƒ can lead to find noisy observations in terms of performance parameters y∈R, and usually is quite expensive to perform. The goal is to quickly converge toward good points {right arrow over (x)} so to quickly optimize ƒ.

[0082] Moreover, it is required to avoid evaluating points {right arrow over (x)} that lead to bad function values. In an ideal scenario, as said above, the autotuner module would run directly in a production environment, and so it is necessary to quickly find well performing configurations and avoid exploring too much the search space. In IT tuning terms, it is desired to find a configuration that optimizes a certain performance indicator y, and this search process should be run to simultaneously: (i) explore the configuration space to gather knowledge, and (ii) exploit the gathered knowledge to quickly converge toward well-performing configurations. Notice that Equation 1 is equivalent to the performance autotuning problem that has been exposed at the beginning of this description, except for the system and working conditions dependence which is not considered at this stage.

[0083] More precisely, it is now given a description of a tuner module that works for a single (fixed) SUT-working condition combination.

[0084] Bayesian Optimization (BO) is a sequential model-based approach for solving the optimization problem of Equation 1. Essentially, a surrogate model of ƒ is created and it is sequentially refined as more data are observed. Using this model, a value of an Acquisition Function α.sub.i, which is used to select the next point {right arrow over (x)}.sub.i+1 to evaluate, is iteratively computed.

[0085] The Acquisition Function α.sub.i is analytically derived from the surrogate model output. Intuitively, the acquisition function α.sub.i evaluates the utility of candidate points for the next evaluation of ƒ by trading off the exploration of uncertain regions with the exploitation of good regions. As the acquisition function α.sub.i is analytically derived from the surrogate model, it is very easy to optimize. As an example, the simplest Acquisition Function is the Lower Confidence Bound (LCB), which, for each point, is defined as the value predicted by the surrogate model for that point minus the uncertainty over the prediction. Assuming our goal is to minimize the goal function, the LCB provides an optimistic estimate of the function. The points to evaluate are thus selected either due to a very good predicted value (exploitation) or to a high prediction uncertainty (exploration).

[0086] Thus, Bayesian Optimization (BO) has two key ingredients: the surrogate model and the acquisition function α.sub.i.

[0087] In this contest, Gaussian Processes (GPs) are considered as surrogate models, which is a very popular choice in Bayesian Optimization (BO), however different choices available in the literature (like Random Forests or Neural Networks) would work as well. In the example depicted in FIGS. 3A and 3B, Lower Confidence Bound (LCB) is used as an Acquisition Function, but other choices are possible, like Expected Improvement, Probability of Improvement, Thompson Sampling, or even mixed strategies like the GPHedge approach which, instead of focusing on a specific acquisition function, adopts a portfolio of acquisition functions governed by an online multi-armed bandit strategy. The key idea of the GP-Hedge approach is to compute many different Acquisition Functions at each tuning iteration, and progressively select the best one according to previous performance.

[0088] Making reference to FIGS. 3A and 3B, an exemplifying Bayesian Optimization (BO) is schematically shown. In that embodiment, it is intended to minimize the unknown function and 3 points have already been observed, which are used to compute the GP predicted mean and uncertainty: these two attributes of GP are combined in the Acquisition Functions to select the next point to evaluate. Different Acquisition Functions select different points.

[0089] A Gaussian process GP(μ.sub.0,κ) is a nonparametric model that is fully characterized by (i) its prior mean function μ.sub.0:X.fwdarw.R and (ii) its positive-definite kernel or covariance function, k:X×X.fwdarw.R.

[0090] Let D.sub.i={({right arrow over (x)}.sub.n, y.sub.n)}.sub.n=0.sup.i−1 be the set of past observations and {right arrow over (x)} an arbitrary test point. The random variable ƒ({right arrow over (x)}) conditioned on observations D.sub.i follows a normal distribution with a mean and variance functions that depend on the prior mean function and the observed data through the kernel. So, the kernel is representing the covariance of the Gaussian Process random variables. In other terms the kernel function models the covariance between each pair in a space.

[0091] The GP prediction depends on the prior function and the observed data through the kernel. The prior function controls the GP prediction in unobserved regions and, thus, can be used to incorporate prior “beliefs” about the optimization problem. The kernel (or covariance function), instead, controls how much the observed points affect the prediction in nearby regions and, thus, governs the predictions in the already explored regions.

[0092] Accordingly, when two points (i.e., two configurations) in the search space {right arrow over (x)}.sub.1, {right arrow over (x)}.sub.2 have a high correlation (according to the selected kernel), it is expected that their associated function values y.sub.1, y.sub.2 (i.e., their performance indicators) to be similar.

[0093] The kernel is thus used to measure the similarity of two configurations, expressing the idea that similar configurations should yield similar performance.

[0094] Essentially, the GP is a regression model that is very good at predicting the expected value of a point and the uncertainty over that prediction. In Bayesian Optimization (BO), the two quantities are combined by the Acquisition Function to drive the search process. In the present embodiment, it has been used for the proposed examples a Matérn 5/2 kernel, which is the most widely used class of kernels in literature, however, other kernel choices would work as well.

[0095] The assumption so far has been that the performance ƒ can be expressed as a function of the configuration {right arrow over (x)} only. However, the performance of an IT system, a DBMS for example, also depends on others, uncontrolled, variables such as the working condition {right arrow over (w)} to which the application is exposed and, more importantly, the actual SUT s that is addressed.

[0096] Bayesian Optimization (BO) has been extended to handle such situations. The key idea is that there are several correlated tasks ƒ.sub.t({right arrow over (x)}) that shall be optimized. Essentially, the data from a certain task ƒ.sub.t({right arrow over (x)}) briefly said t in the following, can provide information about another task ƒ.sub.t′({right arrow over (x)}) briefly said t′ in the following.

[0097] A task t is considered a definition of a certain SUT-working condition combination (e.g., the Cassandra DBMS executing a read-only workload), and it has associated a numeric characterization {right arrow over (c)}, which was introduced in the section above and will be further described in the following.

[0098] It is assumed that {right arrow over (c)} is a numeric characterization of task t and it is defined a new combined kernel (or covariance) function that works over configurations ({right arrow over (x)}, {right arrow over (x′)}) and task characterizations {right arrow over (c)}, {right arrow over (c)}′: κ(({right arrow over (x)}, {right arrow over (c)}), ({right arrow over (x)}′, {right arrow over (c)}′).

[0099] This new combined kernel is formalized as the sum of two kernels defined over the configuration space X and over the task characterization space C, respectively: κ(({right arrow over (x)}, {right arrow over (c)}), ({right arrow over (x)}′, {right arrow over (c)}′))=κ({right arrow over (x)}, {right arrow over (x)}′)+κ({right arrow over (c)}, {right arrow over (c)}′). The kernel κ({right arrow over (x)}, {right arrow over (x)}′) is called the configuration kernel and measures the covariance between configurations, the kernel κ({right arrow over (c)}, {right arrow over (c)}′) is the task kernel and measures the covariance between tasks.

[0100] A GP with this combined kernel is a predictive model with an additive structure made of two components: g=g.sub.{right arrow over (x)}+g.sub.{right arrow over (c)}. The g.sub.{right arrow over (c)} component models overall trends among tasks, while the g.sub.{right arrow over (x)} component models configuration-specific deviation from this trend. Similarly to what has been done for the configuration kernel, the solution uses a Matérn 5/2 kernel also for the task kernel in the proposed examples, but other choices would work as well.

[0101] It should be understood that different covariance functions can be selected for the configuration kernel and for the task kernel.

[0102] In the method of the invention, it is exploited that the performance of a certain SUT-working condition combination is correlated with the performance of other similar SUT-condition combination (as a motivating example see FIG. 1), where the similarity is measured by the combined kernel function defined above. In other terms, the main goal of the invention is to provide a task characterization {right arrow over (c.sub.l)} which models how a particular combination (s, {right arrow over (w.sub.l)}) should be tuned, so to use the tunable properties of the system as a contextual information for the tuning algorithm.

[0103] Notice that the combined kernel has been defined using a sum operation. This means that two points in the search space are considered similar (highly correlated) when they either (1) have a similar configuration (highly correlated according to the selected configuration kernel function) or (2) have a similar task (highly correlated according to the selected task kernel). The two quantities could be combined using different operations, obtaining different behaviors. As an example, if the two kernels are multiplied, the two points would be considered similar only when they are highly correlated both in terms of configuration and task.

[0104] In other terms, by using this combined kernel the Gaussian Process space is enlarged with the task characterization component, but when optimizing the Acquisition Function only the configuration subspace is optimized. So, it is obtained a suggested configuration which is tailored for the current SUT-working condition, and this configuration is selected by leveraging all the configurations which have been tested in the past, even on different SUTs and/or working conditions.

Reaction Matching

[0105] In this section, is it described the Reaction Matching process, i.e. a distance computation methodology which is used to derive a task characterization.

[0106] The final goal is to provide the tuner module with a characterization vector {right arrow over (c.sub.l)}. In order to do that, the method provides to start computing similarities between different tasks by using Reaction Matching: a technique inspired by the field of Recommender Systems (RS).

[0107] Recommender System (RS) programs are widely used in our every-day life. They are adopted to suggest items to users, helping them to navigate huge catalogs (like Netflix® or Amazon®). The essential purpose of a Recommender System (RS) is to predict the relevance r.sub.u(i) of an item i for a user u.

[0108] To achieve this goal, Recommender System (RS) exploits similarities between items or users.

[0109] So, the proposed method treat the target task (SUT-working condition combination) t=(s, {right arrow over (w)}) as a user in a Recommender System (RS), and a certain configuration {right arrow over (x)} as the item within the catalog X of the Recommender System (RS), containing all the feasible settings for the system. The relevance r.sub.(s, {right arrow over (w)})({right arrow over (x)})=r.sub.t({right arrow over (x)}) reflects how much the system s benefits from the configuration {right arrow over (x)} w.r.t. the baseline (i.e., vendor default) configuration {right arrow over (x)}.sub.bsl when exposed to the condition {right arrow over (W)}.

[0110] When it is required to optimize a certain performance indicator y.sub.i (e.g., execution time), it is defined ƒ.sub.(s,{right arrow over (w)})({right arrow over (x)})=ƒ.sub.t({right arrow over (x)}) as the value of this performance indicator obtained by the system under test (SUT) s when configured using a configuration {right arrow over (x)} and exposed to the condition {right arrow over (w)}. Assuming the baseline configuration {right arrow over (x)}.sub.bsl the relevance score is defined as:

[00002] $\begin{matrix} r_{(s, \overset{.fwdarw.}{w})} (\overset{.fwdarw.}{x}) = r_{t} (\overset{.fwdarw.}{x}) = \frac{f_{t} (\overset{.fwdarw.}{x})}{f_{t} ({\overset{.fwdarw.}{x}}_{bsl})} - 1 & (Eq . 2) \end{matrix}$

[0111] To illustrate the Reaction Matching approach, it is supposed to have access to some previously collected information about the performance of some tasks (i.e., SUT-working condition combinations) t.sub.0, t.sub.1, t.sub.2, . . . (which might be the same system exposed to a different condition or even entirely different systems) where each task has been evaluated on a variety of configurations {right arrow over (x)}.sub.i:t.sub.0:({right arrow over (x)}.sub.0, {right arrow over (x)}.sub.1, . . . )t.sub.1:({right arrow over (x)}.sub.2, {right arrow over (x)}.sub.3, . . . )t.sub.2:({right arrow over (x)}.sub.4, {right arrow over (x)}.sub.5, . . . ). The evaluated configurations can be different across different tasks, but it is assumed that there is at least some overlap (i.e., for every pair of tasks we must have at least two common configurations).

[0112] To measure the similarity between two tasks, it is used the Reaction Matching (RM) process, which is graphically represented in FIGS. 4A-4C and described in the following. FIG. 4B shows the relevance scores computed using Eq. 2, which brings all the baselines to 0.

[0113] After having computed the performance scores with Eq. 2, the similarity between two tasks is defined as inversely proportional to the distance of the relevance scores they received with the same configurations.

[0114] Defining {{right arrow over (x)}.sub.i}.sub.i=1.sup.n as the set of the configurations that have been evaluated both on a target task t (the task to be characterized) and on another task t′ (which is available in the knowledge base), the distance between two tasks t, t′ is computed as:

[00003] $\begin{matrix} d_{tt'} = \sqrt{\frac{{.Math.}_{i = 1}^{n} {(r_{t} (\overset{.fwdarw.}{x_{i}}) - r_{t'} (\overset{.fwdarw.}{x_{i}}))}^{2}}{n}} & (Eq . 3) \end{matrix}$

[0115] In the example of FIG. 4C, Reaction Matching (RM) uses {right arrow over (x)}.sub.1 to measure the distances, and identifies t.sub.1 as the most similar task to t, as similarity is defined as the inverse of distance.

[0116] Similarity is also defined as the inverse of a distance of relevance scores of the first task ƒ.sub.t({right arrow over (x)}) to be evaluated and a second task ƒ.sub.t′({right arrow over (x)}) received with a set of configurations {{right arrow over (x)}.sub.j}.sub.j=1.sup.i.

[0117] It shall be noted that the definition of distance depends on the configuration vectors {{right arrow over (x)}.sub.i}.sub.i=1.sup.n which is evaluated on both the tasks and hence it is time dependent. In other words, Reaction Matching (RM) is a ‘real time’ or ‘on the fly’ process and the computed distances vary as the tuning proceeds and more knowledge becomes available. This is in contrast with other characterization methodologies and code-based approaches which never update their “beliefs”. Moreover, by measuring the actual performance, the Reaction Matching (RM) characterization depends on the external conditions and is decoupled from the source code.

Tuning model

[0118] In this section, finally all the components of the tuning method of the invention are brought together. It has been explained how Reaction Matching (RM) measures the distance between different tasks t, t′, but the main goal is to provide to the tuner module a characterization {right arrow over (c)} of the current task t.

[0119] To do that, the method starts by measuring the Reaction Matching (RM) distance between all the tasks that can be found in the knowledge base KB.

[0120] When starting from scratch, with an empty knowledge base KB, as explained above the first N tuning iterations (where N is defined by the user) are used to evaluate either the vendor default configuration or some randomly-sampled ones (depending on the user's preference) so to initialize the available knowledge base KB.

[0121] Then, the distances computed by the Reaction Matching (RM) process are used as an input to a clustering algorithm (for example k-means, DBSCAN, EM).

[0122] The computed clusters are considered as archetypal tasks, as they represent types of tasks that should be tuned in similar manners. Having being identified these archetypal tasks by means of the Reaction Matching, tasks which benefit from the same configurations are automatically grouped together. In other words, archetypal tasks are representative of a particular combination of systems and working conditions which should be configured similarly. The similarity in the tunable properties could be due to a similarity in the conditions, or in the application, or in the versions of the used components, or in the architecture upon which we ran the test, or in a combination of these factors or other unknown causes. The advantage of this approach is that it just observes and exploits similarities in the tunable properties, without trying to explain them.

[0123] Then, the distances between the target task t to be evaluated and the various archetypal tasks are finally used as a characterization for CGPTuner (contextual CP Tuner). To compute this distance, it is sufficient to compute the Reaction Matching (RM) distance between the current target task and all the archetypal tasks in the knowledge base (KB). Then, for each cluster it is taken the average of the distances between the target task and all the tasks that were assigned to the considered cluster by the clustering algorithm. Doing this for all the archetypes, it is obtained a vector of distances, which is used as the characterization vector {right arrow over (c)} to be input into the CGPTuner.

[0124] It is important to stress that the characterization of the working condition does not rely on the identification of the current number and types of requests processed by the system, as per workload definition. Instead, SUT and working conditions are characterized using Reaction Matching (RM). As a general understanding, if two different SUTs, when exposed to different working conditions, require similar settings, then these two tasks should be considered as similar, and can be tuned using the same configuration. If the two tasks experience a performance boost from the same configurations, it can be inferred that they also show similar reactions to other configurations. In other words, it is expected that a pair of configuration-SUT-working conditions have similar patterns so that certain configurations are beneficial to both of them, while other ones are detrimental, and others again have no effect at all.

[0125] As said above, the current target task t is characterized in terms of its Reaction Matching (RM) distance w.r.t. the archetypal tasks that have been determined from the knowledge base by the clustering algorithm. Said archetypal tasks are identified by using the clustering algorithm on the knowledge base KB. So, once the knowledge base KB has been partitioned into some clusters, the real tuning can start. Notice that, as the tuning progresses, more and more points will be added to the knowledge base, potentially modifying the output of the clustering algorithm.

[0126] When tuning a certain system s, the last N iterations of the tuning process {{right arrow over (x)}.sub.i−N, . . . , {right arrow over (x)}.sub.i} are considered. It is supposed that the user is able to provide a good estimate of N such that, inside the N considered iterations, it can be assumed that the actual working condition {right arrow over (w)} did not change. It is considered the last N iterations as the current target task t and the N configurations as the set {{right arrow over (x)}.sub.i}.sub.i=1.sup.N used to compute the Reaction Matching (RM) distance. It is searched in the knowledge base KB all the available clusters and, for each cluster, the Reaction Matching (RM) distance between the current task t and the cluster is measured using the set {{right arrow over (x)}.sub.i}.sub.i=1.sup.N, resulting in a characterization vector {right arrow over (c)} as explained above.

[0127] It is to be noted that a stopping criterion is not necessarily required, as the tuning process can keep optimizing the system continuously. In this way, the tuner takes care of adapting the configuration to the working condition, both in terms of incoming workloads and other external factors. Since the proposed technique is based on BO, it will converge towards optimum configurations once they are found.

[0128] Each iteration of the tuning method involves 5 steps. Iteration i has access to all the previous i−1 entries of the KB: (s.sub.j, {right arrow over (x)}.sub.j, y.sub.j)1≤j≤i−1. [0129] 1. In the starting step, knowledge base KB is divided into tasks and each entry is assigned to a task. The goal of this step is to group together all the entries that were collected on similar working conditions. This can be done in three different ways: [0130] a. if working condition information {right arrow over (w)} is available, it can be directly used and it can be applied an off-the-shelf clustering algorithm; [0131] a. if the user can provide a number N representing a number of tuning iterations under which the underlying condition w does not vary, a new task every N iterations can be created. In other terms, this requires to user to be able to identify the time scale under which the condition remains on similar values;

[0132] In any case, a task t is assigned to each entry: ({right arrow over (x)}.sub.j, y.sub.j, t.sub.j). Ideally, there are more entries than tasks, so to have multiple entries assigned to each task. [0133] 2. Using the computed task information, the RM distance between every pair of tasks in the KB is computed. The task distances are then used to run an off-the-shelf clustering algorithm, assigning each task t to a cluster q. With this step, we have collapsed together the entries which were collected on different working conditions. This can happen for two reasons: [0134] a. the initial guess in step 1 was wrong; as an example, when we use strategy 1b and the working condition repeats in a periodic way. Different repetitions of the same working condition are initially assigned to different tasks, and then grouped together by the clustering. [0135] b. the two different tasks are indeed representative of two different working conditions or systems, but they should be tuned in a similar way (i.e., benefit from the same configurations).

[0136] The knowledge base KB becomes: ({right arrow over (x)}.sub.j, y.sub.j, q.sub.j), where the task t is replaced with the newly created task q. At this point, we have reduced the number of identified tasks, and therefore we have increased the number of entries per task. As this allows to compute a more accurate RM distance, step 2 can be repeated iteratively. [0137] 3. At this point, computed tasks and task distances are used to derive the task characterization information. When characterizing task t, all the other tasks t′, t″, . . . . Are considered. For each other task, the entries that were assigned to it: t′:x′.sub.1, x′.sub.2, . . . are considered. Then, the RM distance between target task t and the other tasks t′, t″, . . . is computed: these will be the distances between the target task t and the archetypal tasks t′, t″, . . . . i.e. the numeric vector {right arrow over (c)}. It can be noted that, if the computed characterization is very different from the one which was forecasted at step 5 of the previous iteration, it can be decided to use this information as a change point detection and create a new task, going back to step 2. [0138] 4. Using ({right arrow over (c)}.sub.j, {right arrow over (x)}.sub.j, y.sub.j) it is computed a CGP and the associated AF. [0139] 5. Using all the computed characterization vectors {right arrow over (c)}.sub.j, it can be predicted the characterization {right arrow over (c)}.sub.i for the next point. {right arrow over (c)}.sub.i represents the characterization of the tunable properties of the SUT-working condition combination of the next iteration, and essentially is a forecast of the working condition. The easiest forecasting methodology consists in using the last one: {right arrow over (c)}.sub.i={right arrow over (c)}.sub.i−1. The forecasted characterization is used to optimize the AF and select the next configuration {right arrow over (x)}.sub.i to evaluate. At this point, said configuration is applied, the associated performance score y.sub.i is measured, the knowledge base KB is updated and then the system can go to the next iteration, restarting from step 1.

[0140] In the above embodiment, it has been assumed that a significantly large knowledge base KB is available, so as to be able to obtain a relevant amount of information about different tasks. This can be done by evaluating many different systems in a test environment before moving to real production systems. In fact, the entire characterization is built with the goal of reusing old KBs. However, when this is not the case, it is even possible to start from scratch with an empty knowledge base KB. In this instance, the process would just create a single task and resort to simple Bayesian Optimization with GP. As more and more points become available, the number of tasks is increased and it is possible to derive useful characterizations.

[0141] As can be understood from the above description, the proposed method represents a valid solution to the configuration autotuning problem. Relevant elements of the method are represented by Reaction Matching process and clustering process to derive a characterization of both the system and the working condition: this allows to favourably exploit knowledge that was gathered on different tunable systems, to obtain a faster and more reliable optimization process.

[0142] Although the present disclosure has been described with reference to specific embodiment, it should be understood that the method and apparatus provided by the present disclosure can have a number of variations and amendments without departing from the scope and background of the present disclosure. The description given above is merely illustrative and is not meant to be an exhaustive list of all possible embodiments, applications or modifications of the invention. Thus, various modifications of the described methods and apparatus of the invention will be apparent to those skilled in the art without departing from the scope of the invention.

METHOD AND SYSTEM FOR TUNING A COMPUTING ENVIRONMENT USING A KNOWLEDGE BASE

Inventors

Cpc classification

Classification Explorer

G06N20/10

PHYSICS

Classification Explorer

G06N7/01

PHYSICS

Classification Explorer

G06F11/3409

PHYSICS

Classification Explorer

G06F11/2289

PHYSICS

Classification Explorer

G06F11/3452

PHYSICS

Classification Explorer

G06F11/3442

PHYSICS

Classification Explorer

G06N5/022

PHYSICS

International classification

Classification Explorer

G06N5/02

PHYSICS

Abstract

Claims

Description