METHOD AND DEVICE FOR TUNING A HYPERPARAMETER OF A MACHINE-LEARNING ALGORITHM
20220397910 · 2022-12-15
Assignee
Inventors
CPC classification
G06N7/01 (PHYSICS)
G06N3/006 (PHYSICS)
G06N5/01 (PHYSICS)
International classification
G06F11/34 (PHYSICS)
Abstract
A computer-implemented method of managing a machine-learning, ML, algorithm which is dependent on one or more hyperparameters is described. The method comprises: providing a plurality of instances of the ML algorithm using different values of the hyperparameters; obtaining a plurality of input states; mapping each input state to a plurality of outputs using the instances of the ML algorithm; evaluating a predefined quality metric for the outputs; and on the basis of statistics of the quality metric for the outputs, selecting at least one instance of the ML algorithm for continued use. In some embodiments, a technical system is controlled using one primary instance of the ML algorithm. If the selected instance of the ML algorithm is not the primary instance, the primary instance may optionally be replaced by the selected instance.
Claims
1. A computer-implemented method of managing a machine-learning, ML, algorithm which is dependent on one or more hyperparameters, the method comprising: providing a plurality of instances of the ML algorithm using different values of the hyperparameters; obtaining a plurality of input states; mapping each input state to a plurality of outputs using the instances of the ML algorithm; evaluating a predefined quality metric for the outputs; and on the basis of statistics of the quality metric for the outputs, selecting at least one instance of the ML algorithm for continued use.
2. The method of claim 1, wherein the hyperparameters include one or more of: learning rate, discount factor, parameters affecting convergence rate, probability of fallback to taking a random action.
3. The method of claim 1, wherein said selecting includes modifying the values of the hyperparameters, which are used in the instances of the ML algorithm, to be more similar to those of the selected instance, and repeating the mapping and evaluating steps.
4. The method of claim 3, wherein said selecting includes ensuring a least spread of the modified values of the hyperparameters, which are used in the instances of the ML algorithm.
5. The method of claim 1, wherein the quality metric relates to at least one of the following quantities for a technical system: revenue, productivity, uptime, operating cost, energy consumption, battery degradation, hardware wear.
6. The method of claim 1, wherein said selecting is based on comparing averages of the quality metric across the instances of the ML algorithms.
7. The method of claim 1, wherein one primary instance, from said plurality of instances of the ML algorithm, is utilized to control a technical system and the remaining instances are operated in a hot standby mode, and wherein the input states are obtained by identifying or estimating states of the technical system, the method further comprising replacing the primary instance by the selected instance of the ML algorithm.
8. The method of claim 1, further comprising: using the selected instance of the ML algorithm in a traffic planning method for controlling a plurality of vehicles, wherein each vehicle occupies one node in a shared set of planning nodes and is movable to other nodes along predefined edges between pairs of the nodes in accordance with a finite set of motion commands, wherein the input state indicates the planning nodes occupied by the vehicles and the output corresponds to a sequence of motion control commands to be applied to the vehicles.
9. The method of claim 8, wherein the vehicles are autonomous vehicles.
10. A device configured to manage a machine-learning, ML, algorithm which is dependent on one or more hyperparameters, the device comprising processing circuitry configured to execute the method of claim 1.
11. The device of claim 10, further comprising an interface configured to control a technical system.
12. A computer program comprising instructions which, when executed, cause a processor to execute the method of claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] Aspects and embodiments are now described, by way of example, with reference to the accompanying drawings.
DETAILED DESCRIPTION
[0027] The aspects of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, on which certain embodiments of the invention are shown. These aspects may, however, be embodied in many different forms and should not be construed as limiting; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and to fully convey the scope of all aspects of the invention to those skilled in the art. Like numbers refer to like elements throughout the description.
[0028]
[0029] The method 100 may be executed by a general-purpose computer or, in particular, by a device 200 of the type illustrated in the accompanying drawings.
[0030] In a first step 110 of the method 100, a plurality of instances of the ML algorithm using different values h_j, j=1, …, J, of the hyperparameters are provided. A scalar- or vector-valued hyperparameter h_j may represent a learning rate, a discount factor (relating to reward), a parameter affecting convergence rate or the like. The values h_j, j=1, …, J may be provided by a random or quasi-random sampling, which may be modified based on knowledge obtained from a foregoing iteration of the method steps. Providing an instance of the ML algorithm may include creating a data structure, reserving memory space, instantiating an object or the like, wherein the value h_j is applied. It may further include training the ML algorithm, wherein the training is in accordance with h_j as regards learning rate, discount factor, convergence etc.
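As a minimal sketch of step 110, plain uniform random sampling may stand in for any random or quasi-random scheme. The hyperparameter names and ranges below are illustrative assumptions, not prescribed by the method:

```python
import random

# Hypothetical hyperparameter names and ranges, given for illustration only;
# step 110 does not prescribe particular hyperparameters or bounds.
HYPERPARAMETER_RANGES = {
    "learning_rate": (1e-4, 1e-1),
    "discount_factor": (0.90, 0.999),
}

def sample_hyperparameters(num_instances, rng=random):
    """Draw J random hyperparameter vectors h_1, ..., h_J (step 110)."""
    return [
        {name: rng.uniform(lo, hi)
         for name, (lo, hi) in HYPERPARAMETER_RANGES.items()}
        for _ in range(num_instances)
    ]

values = sample_hyperparameters(4)
```

Each sampled dict would then parameterize the training of one instance of the ML algorithm.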
[0031] In a second step 112 of the method 100, a plurality of input states x_k, k=1, …, K, are obtained. The input states may be obtained by random sampling (e.g., in offline hyperparameter tuning), simulation of a technical system (e.g., in a testing phase), measurement by reading one or more sensors associated with the technical system, or estimation by processing one or more such sensor readings (e.g., during operation, as discussed below).
[0032] In a next step 114, each input state x_k is mapped to a plurality of outputs y_{j,k} = P(x_k; h_j) using the J instances of the ML algorithm. In cases where the ML algorithm is a neural network, this includes applying the input state and reading the output of the network.
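Step 114 can be sketched as a nested mapping; the instances are represented here as plain callables, which is an assumption about the interface rather than part of the method:

```python
def map_states(instances, input_states):
    """Step 114: map each input state x_k to outputs y_{j,k} = P(x_k; h_j),
    one row per instance j of the ML algorithm.

    `instances` is a list of J callables, each wrapping one trained
    instance P(.; h_j); any model type (neural network, decision tree,
    etc.) fits this interface."""
    return [[policy(x) for x in input_states] for policy in instances]

# Toy usage with two stand-in "instances" (hypothetical policies):
outputs = map_states([lambda x: 2 * x, lambda x: x + 1], [1, 2, 3])
```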
[0033] In a fourth step 116 of the method 100, a predefined quality metric μ is evaluated for each of the outputs y_{j,k}. The quality metric μ may be revenue, productivity, uptime, operating cost, energy consumption, battery degradation, hardware wear or similar factors, or a combination of these. For a vehicle fleet, the quality metric μ may represent a payload quantity, payload-distance, passenger-distance, passengers per unit distance/time, a customer satisfaction indicator (e.g., percentage of on-time delivery of goods), a safety indicator (e.g., incidence of near-collisions), a driving economy indicator (e.g., speed variability, steering-angle variability, energy consumption) etc. In particular, the difference between revenue and cost may be used.
[0034] The quality metric may depend directly on the output y_{j,k}. It may furthermore depend on the evolution of the technical system which occurs when the output y_{j,k} is applied as a control signal. In this case, the evaluating step 116 includes controlling the technical system on the basis of the output y_{j,k}, as decided by the ML algorithm, when the system is in the input state x_k, and determining how the technical system evolves. The evolution may be determined by running a computer simulation of the technical system or by performing measurements on the technical system in operation. In the case of a vehicle fleet, a sequence of motion control commands corresponding to the output y_{j,k} may be applied to the vehicles and the resulting end state of the fleet and/or the intervening vehicle movements may be recorded. The evolution of the vehicle fleet may be determined by computer simulation, a physical simulation (e.g., using a scale model), or by operating the actual vehicles.
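A quality metric that depends on the system's evolution can be sketched as a score over a trajectory of states. The revenue and cost functions here are hypothetical placeholders for whatever the operator measures; only the revenue-minus-cost form comes from the description:

```python
def net_quality(trajectory, revenue_fn, cost_fn):
    """One example quality metric mu: the difference between revenue and
    operating cost, accumulated over the evolution (trajectory of states)
    that results from applying an output y_{j,k} as a control signal."""
    return sum(revenue_fn(s) - cost_fn(s) for s in trajectory)

# Toy usage: states are numbers, revenue and cost are simple functions of
# the state (both hypothetical).
score = net_quality([1, 2, 3], lambda s: 10 * s, lambda s: s)
```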
[0035] In a next step 118, statistics S{μ(y_{j,1}), μ(y_{j,2}), …, μ(y_{j,K})} of the quality metric for the outputs are computed and used as a basis for selecting at least one instance of the ML algorithm for continued use. A statistic S of the quality metric may be a quantity calculated from the metric of the outputs y_{j,1}, y_{j,2}, …, y_{j,K}, such as average, standard deviation, variance, variation coefficient, α-quantile (0<α<1), minimum, maximum and peak-to-average ratio. For example, the average may be calculated as

S_j = (1/K) Σ_{k=1}^{K} μ(y_{j,k}).

(Optionally, the averaging is performed over K(j)≤K terms; that is, even if some instances of the ML algorithm do not produce an output for all K input states, the evaluation may still be performed on the basis of statistics which for those instances are based on K(j)<K outputs only.) Accordingly, step 118 may include a comparison of input states mapped under compositions of the form S∘μ∘P, where P depends parametrically on the hyperparameter h_j in the sense that a learning phase has been in accordance with h_j.
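The average-and-compare selection of steps 116 to 118 can be sketched as follows, including the optional handling of instances with fewer than K outputs; the list-of-lists layout for the metric values is an assumption of this sketch:

```python
def select_best_instance(metric_values):
    """Steps 116-118: given metric_values[j] = [mu(y_{j,1}), ..., mu(y_{j,K(j)})],
    average over the K(j) available terms per instance and return the
    index j* of the best-scoring instance (comparison of averages).
    Instances with no outputs at all are skipped."""
    best_j, best_score = None, float("-inf")
    for j, mus in enumerate(metric_values):
        if not mus:          # K(j) = 0: no outputs for this instance
            continue
        avg = sum(mus) / len(mus)
        if avg > best_score:
            best_j, best_score = j, avg
    return best_j, best_score
```

For instance, with three instances where the third produced no outputs, the second instance would be selected on its average.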
[0036] To illustrate steps 116 and 118, reference is made to
[0037] In step 118, “continued use” may refer to the application to the technical system to be controlled. It may equally refer to a next iteration of the steps of the method 100, which are then repeated after adjusting the values h_j, j=1, …, J, of the hyperparameters in view of any conclusions drawn from the statistics. For this next iteration, the same input states x_k, k=1, …, K, may be used (i.e., no repetition of step 112), or different/modified ones may be used.
[0038] The step 118 may further include modifying the values of the hyperparameters that are to be used in the instances of the ML algorithm in the next iteration to be more similar to (to approach) those of the selected instance h_{j*}, e.g., according to

h_j ← h_j + κ(h_{j*} − h_j),

where 0<κ<1 represents a tunable control gain. If the hyperparameter space is multidimensional, this operation may be applied to some or all of its components. This may optionally be accompanied by a step to ensure a predefined minimum spread of the modified hyperparameter values. For example, if one component is subjected to the approaching operation above, the remaining components may be modified by addition of (independently sampled) random numbers, so as to ensure that the hyperparameter space is efficiently explored in the next iteration. A criterion for the spread may be based on statistics for the mutual spacing of the hyperparameter values or on properties of their convex hull.
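The approaching operation with a spread-preserving perturbation can be sketched as follows; the dict representation of a hyperparameter vector and the uniform noise scheme are assumptions of this sketch, one of several ways to maintain a minimum spread:

```python
import random

def update_hyperparameters(values, selected, kappa=0.5, spread=0.01, rng=random):
    """Move every hyperparameter vector towards the selected one,
        h_j <- h_j + kappa * (h_{j*} - h_j),  0 < kappa < 1,
    then add independently sampled noise so the modified values keep
    some spread for exploration in the next iteration.

    `values` is a list of dicts (one per instance); `selected` is j*."""
    target = values[selected]
    updated = []
    for h in values:
        new_h = {}
        for name, v in h.items():
            moved = v + kappa * (target[name] - v)              # approaching operation
            new_h[name] = moved + rng.uniform(-spread, spread)  # maintain spread
        updated.append(new_h)
    return updated
```

With κ=0.5 and the noise disabled, a value of 0.0 moves halfway towards a selected value of 1.0.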
[0039] In one embodiment, the step 118 is not succeeded by a new iteration in search of a more suitable hyperparameter value, but instead by a step 120 of replacing a primary instance of the ML algorithm with the instance which was selected in step 118. As used in this description, the “primary instance” of the ML algorithm is that instance which is used to control a technical system.
[0041] A quality metric module 402 maps the outputs from the respective instances of the ML algorithm in accordance with a predefined quality metric μ. Downstream of this, a statistics module 403 computes statistics relating to each of the instances of the ML algorithm. The statistics may be a moving quantity, such as a moving average, moving median or moving variance, which reflects the performance of each ML algorithm instance in the recent past, e.g., over a window of fixed length. Alternatively, the quality metric module 402 or statistics module 403 includes a buffer configured to store the values relating to an ongoing operation episode, wherein the statistics module 403 is configured to compute the statistics once a sufficient quantity of data has accumulated. A selection module 404 downstream thereof is configured to select that instance of the ML algorithm which performs best in view of the statistics, and to update the position of the switch 405 accordingly.
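A moving statistic of the kind the statistics module 403 may maintain per instance can be sketched with a fixed-length window; the moving average shown here is one of the options named above:

```python
from collections import deque

class MovingAverage:
    """Example of a moving statistic maintained per ML algorithm instance:
    an average over a window of fixed length, reflecting the instance's
    performance in the recent past."""
    def __init__(self, window):
        self.buffer = deque(maxlen=window)  # oldest values fall out automatically

    def update(self, value):
        """Record a new quality-metric value and return the current average."""
        self.buffer.append(value)
        return sum(self.buffer) / len(self.buffer)
```

Once the window is full, each new value displaces the oldest, so the statistic tracks only the recent past.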
[0042] An optional further functionality of the selection module 404 is to generate updated values of the hyperparameter values in view of the statistics of the quality metric. The updated values of the hyperparameter values may be used to provide a new set of instances of the ML algorithm, and training may be performed in accordance with these.
[0043] A still further optional functionality of the selection module 404 is to operate the switch 405 in such manner that the primary instance of the ML algorithm is replaced in case of an execution failure detected at runtime.
[0044] It is noted that the arrangement described above is merely one example.
[0045] An example vehicle system comprising vehicles v1, v2, v3, v4, which move between waypoints along predefined edges, will now be described with reference to the drawings.
[0046] The vehicles v1, v2, v3, v4 may be trucks, buses or construction equipment. Each vehicle can be controlled by an individual control signal, which may indicate a command chosen from a finite set of predetermined commands. If the vehicles are autonomous, the control signal may be a machine-oriented signal which controls actuators in the vehicle; if the vehicles are conventional, the control signals may be human-intelligible signals directed to their drivers. It is understood that individual control signals may be multiplexed onto a common carrier. A predefined command may represent an action to be taken at the next waypoint (e.g., continue straight, continue right, continue left, stop), a next destination, a speed adjustment, a loading operation or the like. Implicit signaling is possible, in that a command has a different meaning depending on the vehicle's current state (e.g., toggle between an electric engine and a combustion engine, toggle between high and low speed, drive/wait at the next waypoint). A vehicle/driver which receives no control signal or a neutrally-valued control signal may be configured/instructed to continue the previously instructed action or to halt. The predetermined commands preferably relate to tactical decision-making, which corresponds to a time scale typically shorter than strategic decision-making and typically longer than operational (or machine-level) decision-making. Different vehicles may have different sets of commands.
[0047] An ML algorithm of the type which the method 100 or device 200 targets can be used for centralized control of the vehicles v1, v2, v3, v4. The vehicles v1, v2, v3, v4 are to be controlled as a group, with mutual coordination. The mutual coordination may entail that any planning node utilization conflicts which could arise between pairs of vehicles are deferred to the ML algorithm and resolved at the planning stage. The planning may aim to maximize productivity, such as the total quantity of useful transport system work or the percentage of on-schedule deliveries of goods. The planning may additionally aim to minimize cost, including fuel consumption, battery wear, mechanical component wear or the like. These aims may overlap with those measured by the quality metric μ.
[0048] To illustrate the decision-making in the planning phase, the planning node utilization conflicts that may arise will be briefly illustrated. In an initial set of node occupancies (input state), no vehicle blocks any other vehicle. It can also be seen that these node occupancies provide each vehicle with a next waypoint to which it can move in a next epoch. The choice is not arbitrary, however, as both vehicles v1 and v4 may theoretically move to waypoint wp3, but this conflict can be avoided by routing vehicle v1 to waypoint wp2 instead. If the system is evolved in this manner, with vehicle v4 moving to waypoint wp3, then vehicle v4 will block vehicle v1 from moving. This blocking state temporarily reduces the vehicle system's productivity but will be resolved once vehicle v4 continues to waypoint wp4. It is easy to realize that the difficulty of the blocking states (as measured, say, by the number of vehicle movements needed to reach a non-blocking state) in a given waypoint topology will increase with the number of vehicles present. The efficiency gain of deploying many vehicles may therefore be consumed by the increased risk of conflicts. A waypoint topology populated with many vehicles may also include deadlock states, where no vehicle movement is possible at all. This may correspond to a real-life scenario where the controlled vehicles need external help to resume operation, such as operator intervention, towing etc.
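Detecting blocked vehicles in a waypoint topology can be sketched as a simple occupancy check; the particular topology below is an assumed example chosen to echo the blocking situation between vehicles v1 and v4:

```python
def blocked_vehicles(edges, occupancy):
    """Find vehicles with no free next waypoint in a waypoint topology.

    edges: dict mapping each waypoint to the waypoints reachable from it
           along predefined edges.
    occupancy: dict mapping each vehicle to the waypoint it occupies.
    A vehicle is blocked when every successor waypoint is occupied."""
    occupied = set(occupancy.values())
    return {
        v for v, wp in occupancy.items()
        if all(nxt in occupied for nxt in edges.get(wp, []))
    }

# Assumed cyclic topology: v1 at wp2 is blocked by v4 at wp3, and is
# released once v4 moves on to the free waypoint wp4.
edges = {"wp1": ["wp2"], "wp2": ["wp3"], "wp3": ["wp4"], "wp4": ["wp1"]}
blocked = blocked_vehicles(edges, {"v1": "wp2", "v4": "wp3"})
```

If every vehicle were blocked simultaneously, the same check would flag a deadlock state.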
[0049] A model of the vehicle system in this example may represent each vehicle as occupying one node in a shared set of planning nodes, movable to other nodes along predefined edges between pairs of the nodes in accordance with a finite set of motion commands.
[0050] Within the scope of the present disclosure, the ML algorithm for controlling the vehicle system of this example may be managed, and its hyperparameters tuned, in accordance with the method 100 described above.
[0051] The aspects of the present disclosure have mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims.