METHOD AND SYSTEM FOR CONTROLLING A FLUID TRANSPORT SYSTEM

20230221682 · 2023-07-13

    Abstract

    A method for controlling operation of a fluid transport system by applying a self-learning control process. The method includes: receiving obtained values of input signals during operation of the system during a first period of time, which is controlled by a predetermined control process, automatically selecting a subset of the input signals based on the received obtained values of the input signals, receiving obtained values of at least the selected subset of input signals during a second period of time, which is controlled by applying the self-learning control process, which is configured to control operation based only on the selected subset of input signals, and wherein applying the self-learning control process includes updating the self-learning control process based on the received obtained values of the selected subset of the input signals and based on at least an approximation of a performance indicator function.

    Claims

    1. A computer-implemented method for controlling operation of a fluid transport system, by applying a self-learning control process, the method comprising the steps of: receiving obtained values of a plurality of input signals during operation of the fluid transport system during a first period of time, wherein operation of the fluid transport system during the first period of time is controlled by a predetermined control process, automatically selecting a subset of the plurality of input signals based on the received obtained values of the plurality of input signals, receiving obtained values of at least the selected subset of input signals during operation of the fluid transport system during a second period of time, wherein operation of the fluid transport system during the second period of time is controlled by applying the self-learning control process, wherein the self-learning control process is configured to control operation of the fluid transport system based only on the selected subset of input signals, and wherein applying the self-learning control process comprises updating the self-learning control process based on the received obtained values of the selected subset of the input signals and based on at least an approximation of a performance indicator function.

    2. A computer-implemented method according to claim 1, wherein the predetermined control process is a non-adaptive control process.

    3. A computer-implemented method according to claim 1, wherein the plurality of input signals defines an input space having a first number of dimensions; wherein the selected subset of input signals defines a reduced input space having a reduced number of dimensions, smaller than the first number of dimensions.

    4. A computer-implemented method according to claim 1, wherein automatically selecting includes applying one or more information-theoretic selection criteria.

    5. A computer-implemented method according to claim 4, wherein the one or more information-theoretic selection criteria include a mutual information criterion based on a determined mutual information measure between respective ones of the plurality of input signals and an observed performance measure.

    6. A computer-implemented method according to claim 5, wherein the observed performance measure includes at least one observed performance indicator evaluated at a plurality of times, optionally implementing a time-dependent weighting of performance indicator values or a time-dependent weighting in dependence on a rate of fluid flow in the fluid transport system.

    7. A computer-implemented method according to claim 1, wherein the automatically selecting includes selecting at least one input signal that is associated with a time shift delay dependent on a flow rate of fluid flow in the fluid transport system.

    8. A computer-implemented method according to claim 1, further comprising configuring an initial version of the self-learning control process based on the selected subset of input signals; wherein configuring the initial version of the self-learning control process comprises pre-training the initial version of the self-learning control process based on the received obtained values of the plurality of input signals during the first period of time and based on performance indicator values recorded during operation of the fluid transport system during the first period of time.

    9. A computer-implemented method according to claim 8; wherein the automatic selection and the configuration of the initial version of the self-learning control process are performed during a transitional period, subsequent to the first period and prior to the second period.

    10. A computer-implemented method according to claim 1, wherein the self-learning control process implements a reward-based learning agent.

    11. A computer-implemented method according to claim 10, wherein the reward-based learning agent is a reinforcement learning agent.

    12. A computer-implemented method according to claim 10, wherein updating the self-learning control process is based on one or more observed performance indicators, observed during a time horizon, in particular a flow-dependent time horizon.

    13. A computer-implemented method according to claim 1, wherein the self-learning control process includes at least one stochastic component.

    14. A computer-implemented method according to claim 1, wherein updating the self-learning control process is based on an approximation of the performance indicator function, wherein said approximation is a performance approximator function approximating a dependence of the performance indicator function on the selected subset of input signals and/or on one or more control actions taken by the self-learning control process to control the fluid transport system.

    15. A computer-implemented method according to claim 14, wherein the performance approximator function is parametrized by a plurality of weight parameters, and wherein the updating the self-learning control process comprises updating one or more of the plurality of weight parameters.

    16. A computer-implemented method according to claim 1, wherein the performance indicator function includes a comfort indicator and/or a cost indicator.

    17. A computer-implemented method according to claim 1, further comprising: automatically selecting a new subset of the plurality of input signals based on the received obtained values of the plurality of input signals, received during the second period, receiving obtained values of at least the selected new subset of input signals during operation of the fluid transport system during a third period of time, wherein operation of the fluid transport system during the third period of time is controlled by applying a new self-learning control process adapted to the selected new subset of input signals, wherein the new self-learning process is configured to control operation of the fluid transport system based only on the selected new subset of input signals, and wherein applying the new self-learning control process comprises updating the new self-learning control process based on the received obtained values of the selected new subset of the input signals and based on at least the approximation of the performance indicator function.

    18. A control system for controlling a fluid transport system; wherein the control system is configured to perform the steps of the computer-implemented method according to claim 1.

    19. A control system according to claim 18, comprising a control unit communicatively coupled to one or more controllable components of the fluid transport system; wherein the control unit is configured to receive obtained values of at least the selected subset of input signals during operation of the fluid transport system and to selectively control operation of the fluid transport system by applying the predetermined control process or by applying the self-learning control process.

    20. A control system according to claim 18, comprising a data processing system configured to receive the obtained values of a plurality of input signals during operation of the fluid transport system during the first period of time, and to automatically select the subset of the plurality of input signals based on the received obtained values of the plurality of input signals.

    21. A control system according to claim 20, wherein the data processing system is a remote data processing system, in particular a cloud service, located remotely from the control unit.

    22. A control system according to claim 20, wherein the data processing system is further configured to configure an initial version of the self-learning control process based on the selected subset of input signals; wherein configuring the initial version of the self-learning control process comprises training the initial version of the self-learning control process based on the received obtained values of the plurality of input signals during operation of the fluid transport system during the first period of time and based on performance indicator values recorded during operation of the fluid transport system during the first period of time.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0048] The above and other aspects will be apparent from and elucidated by the embodiments described in the following with reference to the drawings, in which:

    [0049] FIG. 1 schematically illustrates an example of a fluid transport system.

    [0050] FIG. 2 schematically illustrates another example of a fluid transport system.

    [0051] FIG. 3 schematically illustrates a control system for controlling a fluid transport system.

    [0052] FIG. 4 schematically illustrates a process for controlling a fluid transport system.

    [0053] FIG. 5 schematically illustrates a process for controlling a fluid transport system by a self-learning control process.

    [0054] FIG. 6 schematically illustrates a process for selecting input signals for a self-learning control process.

    DETAILED DESCRIPTION

    [0055] In the following, embodiments of aspects disclosed herein will be described in the context of heating systems such as HVAC systems or district heating networks.

    [0056] In this context, embodiments of the method and system disclosed herein provide a data driven method for self-commissioning and optimal control of a building or a district heating network. Here, the term self-commissioning refers to a system that itself selects which data points from a large pool to use as inputs to the data-driven control method.

    [0057] Embodiments of the process and system described herein apply a self-learning control process that reduces, if not minimizes, the cost of operation of a building HVAC system or a district heating network, while maintaining good comfort. Contrary to existing solutions, embodiments of the system and method described herein do not require a large amount of posterior knowledge for the purpose of configuring the control system.

    [0058] Additionally, embodiments of the method and system described herein may be applied to new fluid transport systems, where no data log is yet available for configuration.

    [0059] FIG. 1 schematically illustrates an embodiment of a fluid transport system, in particular a heating system.

    [0060] The system comprises one or more controllable components 40, also referred to as actuators. Examples of controllable components of a heating system include valves, pumps and/or dampers. For example, the controllable components 40 may include a valve/pump combination that constitutes a so-called mixing loop. It will be appreciated that some examples of fluid transport systems may include alternative and/or additional types of controllable components, e.g. fans, etc. The system further comprises additional components (not explicitly shown) in addition to the controllable components 40, such as pipes, fluid reservoirs, radiators, etc. Some or all of the additional components are directly or indirectly operationally coupled to—e.g. in fluid communication with—the controllable components 40.

    [0061] The heating system comprises a control system 10 operatively coupled to the controllable components 40 and configured to control one or more controllable variables of the fluid transport system by controlling the controllable components 40. Examples of controllable variables include a valve setting—in particular a valve opening degree—a temperature set point, a pump pressure set point or a pump speed set point, an opening degree of a damper, a fan speed and/or the like.

    [0062] The control system 10 may be implemented by a suitably programmed data processing system, such as a suitably programmed computer, or by another data processing device or control unit. In some embodiments, the control system 10 may be implemented as a distributed system including more than one computer, data processing device or control unit. The control system 10 is communicatively coupled to the controllable components 40, e.g. via a wired or wireless connection. The communication between the control system and the controllable components may be via a direct connection or via an indirect connection, e.g. via one or more nodes of a communications network.

    [0063] Examples of a wired connection include a local area network, a serial communications link, a control bus, a direct control line, etc. Examples of wireless connections include a radio frequency communications link, e.g. Wi-Fi, Bluetooth, cellular communication, etc.

    [0064] The control system 10 comprises a suitably programmed processing unit 11, e.g. a CPU, a microprocessor, etc. The control system further comprises a memory 12 which may have stored thereon a computer program and/or data for use by the processing unit 11. It will be appreciated that the control system 10 may comprise additional components, e.g. one or more communications interfaces and/or a user-interface, e.g. including a graphical user-interface displayed on a display of the data processing system such as on a touch screen. Examples of communications interfaces include a wired or wireless network adaptor, a serial data interface, a Bluetooth transceiver, etc.

    [0065] The heating system comprises a plurality of sensors 30. Examples of sensors include temperature sensors, pressure sensors, sensors for sensing wind speed, sensors for sensing the operational state of windows, doors etc.

    [0066] The sensors 30 are communicatively coupled to the control system 10, e.g. via a wired or wireless connection. The communication between the sensors 30 and the control system 10 may be via a direct connection or via an indirect connection, e.g. via one or more nodes of a communications network. Examples of a wired connection include a local area network, a serial communications link, a data bus, a direct data line, etc. Examples of wireless connections include a radio frequency communications link, e.g. Wi-Fi, Bluetooth, cellular communication, etc. In the example of FIG. 1, the control system is directly coupled to some sensors and indirectly coupled to other sensors. The indirect coupling may be via a building management system 20 or other form of data server that receives sensor signals from multiple sensors and/or from other data sources and that forwards some or all of the sensor data to the control system 10. In some embodiments a building management system may implement both a data server and the control system 10.

    [0067] Generally, examples of input signals include sensor data, e.g. from load-indicating sensors associated with the heating system, e.g. temperatures, pressures, flows etc., or event indicators, such as window or door switches, or other types of sensors.

    [0068] Additionally or alternatively to the sensor signals from sensors 30, the control system 10 may also receive data or other input signals from other sources. For example, the control system may receive weather forecast data from a weather service, occupancy data from a booking system, information about energy prices from an external system, etc.

    [0069] Accordingly, during operation, the control system 10 receives sensor inputs from sensors 30 and, optionally, further input from other sources. Generally, the inputs from the sensors 30 and, optionally, from other sources may be received in the form of digital signals or in the form of analogue signals which may then be converted into digital signals. For the purpose of the present description, the received inputs will be referred to as input signals. The control system 10 may receive the input signals intermittently, e.g. periodically, e.g. such that the control system 10 receives one or more time series of sensed values indicative of respective input variables sensed by the sensors at different points in time.

    [0070] The control system 10 is configured to execute a control process that controls the controllable components 40 responsive to the received input signals or at least responsive to a selected subset of input signals as described herein. In particular, the control system is configured to execute a process as described herein, e.g. one or more of the processes described below with reference to FIGS. 3-6.

    [0071] FIG. 2 shows another embodiment of a fluid transport system. The system of FIG. 2 is identical to the system of FIG. 1, except that the control system 10 is a distributed system. In particular, the control system includes a local control unit 10A which is communicatively coupled to the controllable components 40 and to the sensors 30. The control system further comprises a remote data processing system 10B, e.g. a remote computer, a distributed computing environment, etc. The local control unit 10A comprises a processing unit 11A and a memory 12A, as described in connection with FIG. 1. The remote data processing system 10B also comprises one or more processing units 11B, e.g. one or more CPUs, and at least one memory 12B. The local control unit 10A and the remote data processing system 10B are communicatively coupled with each other, e.g. via a direct or indirect communication link, which may be wired or wireless. For example, the local control unit 10A and the remote data processing system 10B may be communicatively coupled via the internet or another suitable computer network. In some embodiments, the remote data processing system may receive inputs directly from the building management system 20 and/or from sensors 30.

    [0072] In the embodiment of FIG. 2, the local control unit 10A and the remote data processing system 10B may cooperate to implement an embodiment of the process described herein. For example, the remote data processing system 10B may implement the selection of input signals and/or the configuration and, optionally, pre-training of the self-learning control process, while the local control unit 10A may execute the self-learning control process. Alternatively, the remote data processing system 10B may also execute a part of the self-learning control process, e.g. the determination of actions to be taken and/or the updating of the self-learning control process. In such embodiments, the local control unit 10A may receive information from the remote data processing system about actions to be taken and translate the information into specific control commands to the controllable components 40. The local control unit 10A may further collect the input signals and forward at least the selected input signals to the remote data processing system 10B.

    [0073] FIG. 3 schematically illustrates a self-learning control process, generally designated 310, for controlling a fluid transport system, in particular a heating system for heating a building. The self-learning control process 310 may be a reinforcement learning control process or other self-learning control process for controlling controllable components 40 of a fluid transport system 60, e.g. of a hydronic system such as an HVAC or a district heating system.

    [0074] The self-learning control process 310 receives a plurality of input signals, e.g. signals from sensors or from other sources. The input signals may include the controllable control variables through which the self-learning control process can impose control actions. The input signals may further include other signals, e.g. signals that describe the load on the system; these may be considered disturbances to the control. The self-learning control process uses a subset of the available input signals as a representation of the state of the fluid transport system. The selected variables may include some or all of the control variables and/or signals describing the disturbances. In the following, the total pool of available input variables will be denoted by x. The values of x at time t form a vector x.sub.t. The selected input signals will be referred to as states s. The values of s at time t form the state vector s.sub.t. The state vector thus represents the state of the fluid transport system at time t. The process of automatically selecting which subset of input signals is to be used by the self-learning control process will also be referred to as state selection. The self-learning control process may receive the input signals directly from the respective signal sources or from a data server or another suitable data repository where input signals from multiple sources can be stored or otherwise aggregated. The self-learning control process and the data server may be implemented as separate modules or integrated with each other. In particular, they may be implemented by a local control unit, a building management system, or otherwise by the same data processing system, e.g. in a system as illustrated in FIG. 1. Alternatively, the self-learning control process and/or the data server may be implemented by a remote data processing system communicatively coupled to a local control unit, e.g. in a system as illustrated in FIG. 2.

    [0075] The self-learning control process adjusts one or more control variables for the controllable components. The control variables may be set points for local control loops in the heating system, such as temperature, pressure or flow control loops. For example, in connection with a mixing loop, the control set points may be a temperature and a pump pressure. To this end, the control system executing the self-learning control process 310 has an interface to the controllable components 40 of the heating system, via which the control system imposes actions on the heating system. The adjustments of the control variables by the self-learning control process in response to the received state information are referred to as a set of actions. The set of actions imposed at time t will be designated as a vector a.sub.t. The controllable components 40 may for example be valves, pumps or dampers. In one example, the controllable components include a valve/pump combination that constitutes a so-called mixing loop. The process determines the actions a.sub.t responsive to the selected subset of received input signals, i.e. responsive to the state s.sub.t.

    [0076] The control process is self-learning, i.e. the control actions for a given state change over time to improve performance, as new knowledge is learned about the result of previous control actions.

    [0077] To this end, the self-learning control process receives one or more performance indicators indicative of one or more results of the control actions taken by the self-learning control process. The process may base the updating of the self-learning process on a weighted combination of multiple performance indicators and/or another function of one or more performance indicators. The values of a performance indicator at time t will also be referred to as the reward r.sub.t. It will be appreciated that the performance indicator may be an overall indicator, e.g. a combination of multiple different performance indicators. Hence, taking actions a.sub.t and bringing the system into a state s.sub.t+1 yields a reward r.sub.t+1. The reward may include one or more performance indicators, e.g. a control error and, optionally, an operating cost for the heating system. The reward may be a function of the selected input signals.

    [0078] Generally, a reinforcement-learning control process may be configured to update itself, i.e. to learn, by reinforcing the desired behaviour by a reward as illustrated in FIG. 3. The self-learning control process 310 seeks to choose an action that optimizes a combination of rewards over time, also referred to as the return G. In particular, the return may be defined as a cumulative n-step reward:


    G_{t+n} = Σ_{k=0}^{n−1} γ^k r_{t+k+1}.

    [0079] Here, 0 ≤ γ ≤ 1 is a discount rate diminishing the influence of future rewards on the return. The discount rate ensures that the return remains well defined as n → ∞, while giving higher importance to rewards occurring sooner. In reinforcement learning, the control strategy for the control system is often called a policy π.
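
    By way of illustration, the following Python sketch (illustrative only, not part of the claimed method) computes the cumulative n-step return defined above for a given sequence of observed rewards:

        # Illustrative sketch: cumulative n-step discounted return
        # G_{t+n} = sum_{k=0}^{n-1} gamma^k * r_{t+k+1}, as defined above.
        def n_step_return(rewards, gamma=0.9):
            """rewards: list [r_{t+1}, ..., r_{t+n}] observed after time t."""
            return sum((gamma ** k) * r for k, r in enumerate(rewards))

        # Example with gamma = 0.9: G = 1.0 + 0.9*0.5 + 0.81*0.25 = 1.6525
        print(n_step_return([1.0, 0.5, 0.25], gamma=0.9))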

    [0080] At the time an action is taken, the rewards and, hence, the return resulting from that action cannot yet be measured. The self-learning control process may thus consider an expected future return. An action-value function may be defined that describes the expected return of being in a state s.sub.t, taking action a.sub.t and following the policy π, e.g. as


    Q_π(s, a) = E[ G_{t+∞} | s_t = s, a_t = a ].

    [0081] Accordingly, the action-value function is an example of a performance indicator function. Other examples of a performance indicator function include a value function describing an expected return of being in a state s and following the policy.

    [0082] A policy that, given state s, chooses the action a that maximizes the expectation of the return is called a greedy policy. A problem associated with the greedy policy is that no exploration of potentially more rewarding actions is performed. The trade-off between exploration and exploitation of current knowledge is an interesting aspect of reinforcement learning, since both optimality and adaptiveness are desired. Therefore, stochastic policies, where an amount of exploration can be achieved, may be used. An example of such a policy is the so-called ε-greedy policy, which takes random actions with probability ε.
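
    The ε-greedy policy may be sketched as follows in Python (an illustrative sketch; the action names are hypothetical):

        import random

        def epsilon_greedy(q_values, epsilon=0.1):
            """q_values: dict mapping candidate actions to estimated Q(s, a)."""
            if random.random() < epsilon:
                return random.choice(list(q_values))   # explore: random action
            return max(q_values, key=q_values.get)     # exploit: greedy action

        action = epsilon_greedy({"raise_setpoint": 0.4, "hold": 0.7, "lower_setpoint": 0.1})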

    [0083] A self-learning control process seeks to improve the performance of the system by applying a learning process. For example, in Q-learning, the optimal policy, which maximises the return, is approximated by finding the optimal action-value function, no matter what policy is followed:

    [00001]  Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ].

    [0084] Here, α ∈ [0,1] is the learning rate. While one goal for the learning agent is to approximate the action-value function or other performance indicator function, another goal is for the agent to find the optimal policy that maximises the return for every state. Since the value function and the optimal policy are dependent upon each other, they are often optimized in an iterative fashion called the value-policy iteration.
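
    The tabular form of the Q-learning backup above may be sketched as follows (a minimal sketch assuming discrete states and actions; the embodiments described further below use function approximation instead of a table):

        from collections import defaultdict

        Q = defaultdict(float)   # Q[(s, a)] -> current action-value estimate
        alpha, gamma = 0.1, 0.9  # learning rate and discount rate

        def q_learning_update(s, a, r, s_next, actions):
            # Bootstrap target uses the maximizing action in the next state.
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

        q_learning_update("low_temp", "raise_setpoint", 1.0, "ok_temp",
                          ["raise_setpoint", "hold", "lower_setpoint"])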

    [0085] Generally, the self-learning control process may use an approximation, in particular a function approximation, of the performance indicator function, where the selected subset of input signals and, optionally, the one or more actions are inputs to the function approximation. The approximation will also be referred to as the performance approximator function. The performance approximator function may be a parametrized approximation, e.g. in the form of a neural network function approximation, or it may have a functional structure derived from domain knowledge. For example, in some embodiments, the control system may maintain an estimate or approximation of the action-value function as performance approximator function, in the following denoted Q̂(s, a, w), where w denotes the weight vector which parametrizes Q̂. By taking actions a.sub.t and sampling states s.sub.t and rewards r.sub.t, the self-learning control process 310 can improve the estimate Q̂ of the performance indicator function.

    [0086] The update of the estimate of the performance indicator function may employ a suitable learning process, e.g. temporal difference learning. Temporal difference learning is another interesting aspect of reinforcement learning. It may be described as a mixture of Monte Carlo methods and dynamic programming. In Monte Carlo methods, the full episode of actions, state transitions and returns is measured, and the estimate of the state-action-value function is then computed purely from measurements. In dynamic programming, a model of the Markov decision process is already known, so an estimate from this knowledge can be used for bootstrapping. In temporal difference learning, the bootstrap target is calculated both from the sampled reward and from the system knowledge already acquired. The temporal difference error is the error between the current estimate of the state-action-value function and the new estimate.

    [0087] In some embodiments, a multi-step method is employed. Multi-step methods often perform better than single-step methods, as they use more samples. To this end, a parametrization using a trace decay λ of returns may be employed, such that the method can span from λ=0, corresponding to a one-step method, up to λ=1, corresponding to a Monte Carlo method:

    [00002]  G_t^λ = (1 − λ) Σ_{n=1}^{T−t−1} λ^{n−1} G_{t+n} + λ^{T−t−1} G_{t+∞}.

    [0088] A multi-step method may preferably be implemented as eligibility traces due to the computational advantages. An eligibility trace utilizes a trace vector z that changes according to the partial derivatives of an estimated performance indicator function with respect to the weights that parametrize the estimated performance indicator function. The trace vector decays by γλ:


    z_t = γλ z_{t−1} + ∇_w Q̂(s_t, a_t, w_t).

    The weights are then adjusted according to


    w_{t+1} = w_t + α δ_t z_t,

    [0089] where the temporal difference error δ_t is the error between the current estimate of the state-action-value function and the new estimate.
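
    The two update equations above may be sketched as follows (assuming, as in the linear parametrization introduced further below, that the gradient of Q̂ with respect to w equals the basis vector b(s, a)):

        import numpy as np

        def trace_update(z, b, gamma, lam):
            # z_t = gamma*lambda*z_{t-1} + grad_w Q_hat = gamma*lambda*z_{t-1} + b
            return gamma * lam * z + b

        def weight_update(w, delta, z, alpha):
            # w_{t+1} = w_t + alpha * delta_t * z_t
            return w + alpha * delta * z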

    [0090] In some embodiments, the self-learning control process uses a volume flow rate to schedule the time horizon over which rewards are considered for the purpose of updating the performance approximator function. Multiple horizons are possible for multiple signals. The time horizon may be defined explicitly or implicitly, e.g. by a suitable weighting or decay, such as by a trace decay as described herein.

    [0091] The subset of input signals used by the self-learning control process is selected from the pool of available input signals. This selection is performed automatically, preferably by a data-driven method, also referred to as a state selection method.

    [0092] The state selection method may be performed before the self-learning control process is used for the first time to control the fluid transport system. To this end, data may initially be collected from the fluid transport system while it is controlled by a standard non-adaptive control process. Moreover, a new state selection may be performed subsequently, e.g. periodically and/or triggered by a trigger event. In some embodiments, the state selection runs as often as is deemed necessary based on domain knowledge, e.g. in order to keep the subset of selected input signals updated, e.g. as described in connection with FIG. 4. Subsequent state selections may be based on data collected during control of the fluid transport system by a current self-learning control process.

    [0093] Generally, the data-driven state selection may identify the input signals containing the most relevant information for the self-learning control process. Using only a subset of input signals results in a faster learning rate of the self-learning control process. The state selection method may be computationally expensive and may advantageously be performed by a remote data processing system, e.g. by a cloud computing environment, with access to the pool of available input signals x, e.g. as described in connection with FIG. 2. For example, the state selection method may apply a mutual information method, e.g. as described in connection with FIG. 6.

    [0094] FIG. 4 schematically illustrates a process for controlling a fluid transport system.

    [0095] In initial steps S1 and S2, during a first period of time, the process controls the fluid transport system by applying a predetermined control process, in particular a non-adaptive control process. The predetermined control process may e.g. be implemented as a conventional, commercially available controller, e.g. a PLC, known as such in the art. The process samples and records a plurality of input signals while the fluid transport system is controlled by the predetermined control process. The process further samples and records performance indicators during the first period, the performance indicators being indicative of the performance of the fluid transport system under the control of the predetermined control process. During the data collection, in step S1, training data may be gathered for the state selection. The training data may be collected during a predetermined period of time t.sub.s. The training data for the state selection includes a set of input signals x.sub.s, the actions a.sub.s taken by the predetermined control process during period t.sub.s, and the performance indicators, also referred to as rewards r.sub.s, recorded during period t.sub.s. Here, the subscript s refers to the state selection.

    [0096] Additionally, in step S2, further data may be gathered for use as validation data, in particular for defining a stopping criterion. The validation data may be collected during a predetermined period of time t.sub.v. The validation data for the state selection includes a set of input signals x.sub.v, the actions a.sub.v taken by the predetermined control process during period t.sub.v, and the rewards r.sub.v recorded during period t.sub.v. Here, the subscript v refers to the validation data.

    [0097] During step S1 and/or step S2, the process may record flow values q, which may serve as input signals and/or be used for the purpose of performing flow correction.

    [0098] The duration of the first period of time t.sub.1=t.sub.s+t.sub.v and the periods t.sub.s and t.sub.v may be pre-determined. The choice of a suitable duration may depend on various parameters, such as the complexity of the system to be controlled, the number of input signals, the frequency with which input signals are sampled, etc.

    [0099] Some embodiments of the process described herein apply a compensation of flow-dependent effects, in particular a flow-dependent selection of input signals and/or a flow-dependent weighting of rewards by the self-learning control process. To this end, in optional step S3, the process selects a constant φ for use in the compensation of flow-dependent effects, in particular for calculating a flow variable trace decay. Given the selected constant φ, the flow variable trace decay λ may be calculated as

    [00003]  λ(q_η) = ϕ^{q_η(t)},

    where q.sub.η(t) represents a normalized fluid flow in the fluid transport system at time t. The constant φ may be calculated from a normalized lumped volume v.sub.η: ϕ=h(v.sub.η); it may be determined based on domain knowledge about the fluid transport system. For example, the lumped volume coefficient v.sub.η may be based on the delay between two variables that results in a maximum mutual information between the variables, or otherwise on a parameter that indicates a general transport delay in the system. In particular, with regard to an HVAC or district heating system, the two variables may be a supply temperature and a return temperature, the latter being a function of the supply temperature at an earlier instant.
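
    A sketch of the flow variable trace decay in Python (illustrative only; the clamping of the flow between a minimum and a maximum value follows the online algorithm shown further below, and the value of ϕ is an assumption):

        def flow_trace_decay(q, q_max, q_min, phi=0.5):
            """lambda(q_eta) = phi ** q_eta(t); low flow pushes lambda toward 1,
            i.e. a slower trace decay and a longer effective reward horizon."""
            q_eta = min(max(q, q_min), q_max) / q_max   # clamp, then normalize
            return phi ** q_eta

        print(flow_trace_decay(q=2.0, q_max=10.0, q_min=0.5))  # low flow -> larger lambda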

    [0100] In subsequent step S4, when flow compensation is applied for the purpose of state selection, the return G.sub.t or other performance measure is calculated using the decay rate λ determined in step S3.

    [0101] In subsequent step S5, the process selects the subset of input signals for use by a self-learning control process during subsequent control of the fluid transport system by said self-learning control process. This selection process is also referred to as state selection. The state selection preferably uses a mutual information criterion or another suitable selection criterion, e.g. another information-theoretic selection criterion. In particular, the mutual information selection criterion may base the selection on a mutual information between the respective input signals and the calculated return or other suitable performance measure. To this end, the process may apply the flow-compensated return G.sub.t calculated in step S4. An example of a state selection process will be described in greater detail below and with reference to FIG. 6.

    [0102] In step S6, the process configures an initial version of the self-learning control process based on the selected input signals. In particular, the process pre-trains the self-learning control process based on the training data collected during step S1 and/or the data collected during step S2. This pre-training may be performed using the self-learning scheme that is subsequently applied during actual use of the self-learning control process, e.g. as described below with reference to FIG. 5. However, the pre-training is performed “off-policy”, since the actions used during pre-training are the ones performed by the predetermined control process rather than by the self-learning control process that is being pre-trained. The pre-training is further based on the, optionally flow-compensated, return G.sub.t.

    [0103] Finally, in step S7, the process controls the fluid transport system using the pre-trained self-learning control process. Control of the fluid transport system by the self-learning control process includes updating the self-learning control process according to a suitable self-learning scheme, e.g. reinforcement learning. An example of this process will be described in more detail below with reference to FIG. 5. Control of the fluid transport system by the self-learning control process may proceed during a second period of time, e.g. until it is stopped. For example, at predetermined intervals and/or triggered by a suitable trigger event, such as by a user command, the process may determine (step S8) whether a renewed state selection should be performed. This determination may be based on the time elapsed since the previous state selection and/or on one or more performance indicators and/or based on user commands. If the process initiates a renewed state selection, the process returns to step S5. The renewed state selection may then be based on data logged during step S7, i.e. during control of the fluid transport system by the current self-learning control process. If the renewed state selection results in a selection of alternative and/or additional input signals, a new self-learning control process may be configured and pre-trained and then replace the current self-learning control process.

    [0104] Since the state selection is potentially computationally expensive, it may be preferable to perform the state selection by a cloud computing environment or other remote data processing system. Moreover, state selection preferably only runs when the controlled system has experienced a large change whereby other signals become more feasible for the learning agent to use. A large change might for example be due to a structural change in the controlled system, new sensors or a radical load change.

    [0105] If the learning agent uses only a few input signals, the self-learning control process may be sufficiently computationally inexpensive for it to be implemented on a local computing device of a heating system, e.g. of an HVAC or district heating system or even by a control unit of a component of the heating system, such as by a smart valve or a centrifugal pump. Nevertheless, in some embodiments, the self-learning control process may be implemented by a cloud computing environment or other remote data processing system.

    [0106] Generally, during a first period, the process controls the fluid transport system with a predetermined, e.g. conventional, control process, and the process may sample respective time series of the plurality of input signals. In a transition period, the process may select, among the plurality of input signals, a subset of input signals that gives the most information about the performance measure. Also in the transition period, the process may pre-train a self-learning control process with the selected input signals. In a subsequent second period, the process causes the self-learning control process to operate on live, selected input signals and, in this second period, the self-learning control process controls the fluid transport system and continues to optimize itself, in particular by adapting a parametrised estimate of the performance indicator function.

    [0107] A specific example of the process of FIG. 4, e.g. when applied to a mixing loop of a heating system, may be summarized as follows:

    TABLE-US-00001
    Result: Plug and Play Control Scheme
    Initialize: Commercial Controller
    Parameters: m_s, m_v
    repeat
     | Commercial mixing loop control
     | Log input variables x_s, actions a_s and rewards r_s
    until Runtime = t : t+m_s;
    repeat
     | Commercial mixing loop control
     | Log input variables x_v, actions a_v and rewards r_v
    until Runtime = t+m_s : t+m_v;
    Determine v* as max_v I(T_{s, t−v/q : T−v/q}; T_{r, t:T})
    Determine ϕ from v*
    Use ϕ to compute G_{t:T}^{λ(q)} from logged data
    Do state selection as in FIG. 6
    Pretrain RL agent with selected states off-policy Q_ϕ(0, λ) using data sets
     [s_{t:t+m_s+m_v}, a_{t:t+m_s+m_v}, G_{t:t+m_s+m_v}^{λ(q)}]
    repeat
     | RL Q_ϕ(σ, λ) mixing loop control as in FIG. 5
    until Runtime = ∞;

    [0108] Here and in the following, the notation ′ refers to the relation between values of the same parameter at successive steps. An example is s being the state vector at some iteration/step and s′ being the state vector in the following iteration/step.

    [0109] FIG. 5 schematically illustrates a process for controlling a fluid transport system by a self-learning control process. In an initialization step S71, the process loads initial weights w, in particular the weights resulting from pre-training the model. If no pre-training has been performed, or for the purpose of pre-training, the weights may be initialized in another suitable manner, e.g. to random values or to zero. The process further loads an initial trace vector z, which may also result from the pre-training. The process further observes the current state s.sub.t of the fluid transport system, e.g. receives current values of the subset of input signals selected during state selection. The state vector s may describe the state of a controlled system adhering to the Markov property. When sufficient input signals have been observed, e.g. considering delayed input signals as determined during state selection, the process computes an action a according to the current control policy. The action vector a describes how the learning agent controls the system, i.e. which control variables to modify and how.

    [0110] The fluid transport system can be moved to a different state by the action. Being in a state yields a reward r. The self-learning control process seeks to maximize a weighted sum of rewards over a time horizon. The weighted sum is called the return G.sub.t. The learning agent holds a state-action-value function. The state-action-value function describes how well the system is expected to perform in relation to the return given a state, action and a control policy. The state-action values are approximated via a performance approximator function of the form Q̂ = Q̂(s, a, w), such as a neural network. Here, Q̂ is the performance approximator function value approximating the state-action value Q for a given state s and action a. The performance approximator function depends on a set of weights w, e.g. according to

    [00005]  Q̂(s, a, w) = Σ_{i=1}^{d} w_i b_i(s, a),

    where b.sub.i is a suitable basis function, such as a radial basis function.
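
    A sketch of such a linear-in-the-weights approximator with Gaussian radial basis functions (the number of centers, their placement and the width are illustrative assumptions):

        import numpy as np

        def rbf_features(x, centers, width=1.0):
            """x: concatenated state-action vector; returns the basis vector b(s, a)."""
            d2 = ((centers - x) ** 2).sum(axis=1)       # squared distances to centers
            return np.exp(-d2 / (2.0 * width ** 2))     # Gaussian radial basis values

        centers = np.random.default_rng(0).uniform(0, 1, size=(25, 3))  # 25 centers
        w = np.zeros(25)                                # weight vector
        b = rbf_features(np.array([0.2, 0.7, 0.5]), centers)  # x = (s, a), 3 entries
        q_hat = w @ b                                   # Q_hat(s, a, w) = sum_i w_i*b_i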

    [0111] The performance approximator function is continuously improved over time to better match the system. This is done by the learning agent by measuring states and rewards while taking different actions and updating the weights according to a suitable backup function. The backup function utilizes the temporal difference error (δ) which describes the difference between the current knowledge of the state action-value space and newly acquired knowledge that forms a target that the function should be moved towards.

    [0112] To ensure exploration and to facilitate updating the state-action-value function with new information, the agent may take explorative actions that, by the current knowledge of the agent, are suboptimal. How much the agent explores versus exploits its current knowledge to optimize the system is determined by the control policy.

    [0113] The trace vector z with a decay rate of λ is used to determine how fast the influence of historic rewards on the return decays. For a higher λ, the influence of past rewards decays more slowly and thus extends over a longer horizon.

    [0114] Once initialized, the process enters a control and updating loop, which is repeated until the process is terminated. In particular, in step S72, the process observes the reward r.sub.t and states s.sub.t+1 resulting from the previous action taken.

    [0115] In step S73, the process computes action a.sub.t+1 according to a selected control policy, e.g. based on an ε-greedy policy.

    [0116] In step S74, the process calculates the basis vector b.sub.t+1 from the observed states s.sub.t+1 and the chosen action a.sub.t+1 via a basis function, e.g. a radial basis function.

    [0117] In step S75, the process calculates the temporal difference error δ.

    [0118] The selection of the temporal difference error may differ between pre-training (which is off-policy) and online training (which is on-policy).

    [0119] For example, during on-policy training, the temporal difference error may be selected as


    δ_t^S = r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t).

    [0120] In off-policy methods, the behavior policy being used by the agent is different from the target policy being learned. For example, Q-learning is off-policy, since the target policy is the optimal policy, as seen by the bootstrapping using the maximizing action:

    [00006]  δ_t^Q = r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t).

    [0121] During pre-training, i.e. in step S6 of the process of FIG. 4, data obtained during a first period, where a predetermined control process is used to control the fluid transport system, is used for an initial training of the self-learning control process before the self-learning control process takes over control, i.e. the pre-training occurs in a transitional period before the second period. Accordingly, the pre-training is off-policy. To achieve knowledge sharing, a temporal difference error may be based on a parametrization which provides a way of shifting between on-policy and off-policy learning:


    δ_t^σ = σ δ_t^S + (1 − σ) δ_t^Q.

    [0122] By setting σ=0 and letting the reinforcement learning algorithm train on data logged in the first period, a transfer of knowledge can be achieved.
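
    The σ-parametrized temporal difference error may be sketched as follows (illustrative only; the function takes the already-computed value estimates as inputs):

        def td_error_sigma(r, q, q_next_sarsa, q_next_max, gamma, sigma):
            delta_s = r + gamma * q_next_sarsa - q   # on-policy (Sarsa) error
            delta_q = r + gamma * q_next_max - q     # off-policy (Q-learning) error
            return sigma * delta_s + (1.0 - sigma) * delta_q   # blended error

        # sigma = 0 reproduces the pure off-policy error used for pre-training.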

    [0123] In step S76, the process observes the flow q and calculates q.sub.n, normalized by q.sub.max. From this, the process calculates the flow variable trace decay λ and then updates the trace vector z. In particular, to implement the reinforcement learning as a multistep method, a Dutch trace may be used. The flow dependent trace decay changes the horizon over which actions impact the reward. Some embodiments apply a trace decay that depends on the flow. In this and other embodiments of the present method, a lumped pipe volume approximation is used. This means that only the volume with the highest impact on the input-output delay is used. The trace decay may, at every sample, be computed as

    [00007]  λ(q_η) = ϕ^{q_η(t)},

    where q.sub.η(t) ∈ [q.sub.η,min, 1] is the flow normalized by the maximum flow, and where ϕ ∈ [0,1] is a constant that may be determined empirically as a function of the lumped volume, e.g. from the relation between the forward temperature and the return temperature:


    ϕ = h(v_η).

    [0124] A normalization with respect to the maximum flow of the system may be done on the flow and the lumped volume

    [00008]  v_η = v/q_max,   q_η(t) = q(t)/q_max.

    [0125] A description of the lumped volume v is given here in the context of a mixing loop. Considering an example of a system with no terminal units and only pipe connections between supply and return of the mixing loop, the return temperature is a function of the forward temperature acting at different delays due to different pipe routes:

    [00009]  T_r(t) = h(T_s, q), where T_s = [ T_s(t − V_1/q_1), …, T_s(t − V_N/q_N) ]^T and q = [ q_1, …, q_N ]^T.

    [0126] The individual flows in the different pipe routes are not always known; for example, in some applications of a mixing loop, only the total flow leaving the mixing loop is known. Therefore, a flow ratio β may be introduced, where the sum of the flow ratios for p pipe routes is


    Σ_{N=1}^{p} β_N = 1,

    q_N(t) = β_N q(t).

    [0127] The flow ratios β may be assumed constant, since only the main flow is known. The terminal units are controlled by regulating valves, which can change how the flow is distributed between the routes. Changes in outside temperature may change the ratios only little, as they affect all zones, whereas solar radiation hitting only one side of a building may change the ratios more, making the constant-ratio assumption less accurate, depending on the specific building. Now v.sub.N may be defined as

    [00010]  v_N = V_N/β_N.

    [0128] Applying this to the above example gives

    [00011]  T_s = [ T_s(t − v_1/q), …, T_s(t − v_N/q) ]^T.

    [0129] Since the transport delays v.sub.N/q grow large as the flow approaches zero, a minimum flow threshold may be used.

    [0130] In step S77, the process updates the weights w of the performance approximator function. The process stores the basis vector b.sub.t+1 and the state-action value Q for the next iteration.

    [0131] In step S78, the process executes the action a.sub.t+1 and returns to step S72.

    [0132] It will be appreciated that different embodiments may use different types of self-learning control processes, e.g. different types of backup functions and/or different types of temporal difference measures.

    [0133] A specific implementation of the process of FIG. 5, e.g. when applied to a mixing loop, may be summarized as follows:

    TABLE-US-00002
    Result: Online Q_ϕ(σ, λ)
    Initialize: Weights w, trace vector z. Take action a′ according to ε-greedy π(·|s_0).
     Calculate feature state b = b(s_0, a′). Q_old = 0
    Parameters: ε, α, γ, ϕ, σ
    repeat every sample
     | Observe r and s′
     | Choose a′ according to ε-greedy π
     | b′ ← b(s′, a′)
     | Q ← w^T b
     | Q_S′ ← w^T b′
     | Q_Q′ ← max_a′ ( w^T b(s′, a′) )
     | δ^σ ← σ(r + γQ_S′ − Q) + (1−σ)(r + γQ_Q′ − Q)
     | Observe flow q
     | if q_max ≤ q then
     |  | q_n ← 1
     | else if q ≤ q_min then
     |  | q_n ← q_min/q_max
     | else
     |  | q_n ← q/q_max
     | end
     | λ ← ϕ^{q_n}
     | z ← γλz + (1 − αγλ z^T b) b
     | w ← w + α(δ^σ + Q − Q_old) z − α(Q − Q_old) b
     | Q_old ← σQ_S′ + (1−σ)Q_Q′
     | b ← b′
     | Take action a′
    until Mixing Loop Stop;
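
    A condensed Python rendering of the loop above is given below as a sketch under stated assumptions: the env object (providing reset, step, sample_action and an actions collection) and the basis function are hypothetical stand-ins, not part of the disclosed system:

        import numpy as np

        def run_online(env, basis, n_features, eps, alpha, gamma, phi, sigma,
                       q_min, q_max, w=None, steps=1000):
            rng = np.random.default_rng()
            w = np.zeros(n_features) if w is None else w   # pre-trained weights
            z = np.zeros(n_features)                       # eligibility trace
            s = env.reset()
            a = env.sample_action()                        # initial action
            b = basis(s, a)
            q_old = 0.0
            for _ in range(steps):
                r, s_next, q_flow = env.step(a)            # reward, next state, flow
                if rng.random() < eps:                     # epsilon-greedy policy
                    a_next = env.sample_action()
                else:
                    a_next = max(env.actions, key=lambda a2: w @ basis(s_next, a2))
                b_next = basis(s_next, a_next)
                q = w @ b
                q_s = w @ b_next                           # Sarsa target value
                q_q = max(w @ basis(s_next, a2) for a2 in env.actions)  # Q target
                delta = (sigma * (r + gamma * q_s - q)
                         + (1 - sigma) * (r + gamma * q_q - q))
                q_n = min(max(q_flow, q_min), q_max) / q_max   # normalized flow
                lam = phi ** q_n                               # flow-variable decay
                z = gamma * lam * z + (1 - alpha * gamma * lam * (z @ b)) * b  # Dutch trace
                w = w + alpha * (delta + q - q_old) * z - alpha * (q - q_old) * b
                q_old = sigma * q_s + (1 - sigma) * q_q
                s, a, b = s_next, a_next, b_next
            return w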

    [0134] Interesting aspects of the embodiments described herein include the approximation of the state-action space by a performance approximator function. To ensure stability, methods are preferred that guarantee convergence, e.g. by employing performance approximator functions that are linear in the weights. Multistep methods implemented via eligibility traces have shown good performance.

    [0135] The above and other embodiments employ a flow dependent eligibility trace decay. To compensate for the flow variable transport delays of the fluid transport system, a flow variable trace decay is used. With this compensation, lower flow leads to a slower decay, which in turn increases the influence of returns at larger delays. A lumped volume parameter is used to determine the decay rate λ. The lumped volume may be found by analyzing the correlation between supply and return temperature, yielding information about the nature of the delays of the system. An example of how a flow variable eligibility trace may be applied is described in Overgaard, A., Nielsen, B. K., Kallesoe, C. S., & Bendtsen, J. D. (2019). Reinforcement Learning for Mixing Loop Control with Flow Variable Eligibility Trace. In CCTA 2019-3rd IEEE Conference on Control Technology and Applications. https://doi.org/10.1109/CCTA.2019.8920398.

    [0136] The reward function used in the above embodiment is a weighted sum of user comfort and cost. User comfort may be indicated by one or more variables related to comfort in a building. Examples of indicators of user comfort include an average of temperature and/or humidity errors for the different zones in the building heated by a heating system. The cost may be a measure of the cost of heat power and actuator energy consumption. The reward function may be constructed such that, during setback periods, which may be user-defined, the comfort measurement is removed from the reward function and replaced by a soft boundary at a specified temperature. This soft boundary temperature determines how low the zone temperatures are allowed to go during setback periods. Since the learning agent optimizes the reward over time (the return), it will learn how to perform optimal setback with cost-effective reheating depending on the cost structure of the heat source.
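
    A reward of this kind may be sketched as follows (the weights, the averaging over zones and the form of the soft boundary are illustrative assumptions):

        def reward(zone_temps, setpoints, cost, setback, t_soft,
                   w_comfort=1.0, w_cost=0.1):
            if setback:
                # During setback, penalize only temperatures below the soft boundary.
                comfort_penalty = sum(max(t_soft - t, 0.0) for t in zone_temps)
            else:
                # Otherwise, use the average absolute temperature error over the zones.
                comfort_penalty = sum(abs(t - sp) for t, sp in
                                      zip(zone_temps, setpoints)) / len(zone_temps)
            return -(w_comfort * comfort_penalty + w_cost * cost)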

    [0137] FIG. 6 schematically illustrates a process for selecting input signals for a self-learning control process. The selection of input signals, also referred to as state selection, is done such that the dimension of the state space of the learning agent is reduced to improve training speed. Dimension reduction can be done in a multitude of ways, such as Principal Component Analysis or pruning.

    [0138] For the reinforcement agent or another self-learning control process to be able to learn, it needs to be able to predict the future return from the states and actions. This means that the states should hold enough information for the process to be able to make a reasonable prediction of the return. For building heating and cooling via e.g. a mixing loop, this is dependent on the specific building in which the mixing loop is installed. One building may have large windows, where an observation of solar radiation gives information about free heat. Another building might be poorly insulated and leaky, where observations of wind speed hold more information. It can be argued that, if all available inputs were fed into the reinforcement learning agent, it would still converge provided the needed information is available. However, using input variables that hold no information, or that hold redundant information, would decrease the learning rate of the algorithm due to the curse of dimensionality: whereas the dimension of the input set rises linearly, the total volume of the model domain increases exponentially. In the present disclosure, the inventors propose a data-driven state selection such that the input signals may be chosen according to the specific building, but without the need for expert knowledge about the specific building.

    [0139] To this end, the problem of automatic state selection is here handled as a prediction problem, where the process determines a set of input signals that carry information suitable for predicting the future return.

    [0140] The reinforcement learning method that is applied in the above embodiment uses the action-value function, where a prediction of the expected return is made as a function of the state that the system is in and the action that is taken. The action space includes the variables controllable by the control process. For example, in the context of a mixing loop this includes the pump speed and the forward temperature.

    [0141] Since the actions themselves provide information for the prediction of the return, this information may be removed from the prediction target before choosing the input signals.

    [0142] The present embodiment and other embodiments described herein employ variable selection via mutual information. Interesting features of this method include the ability to handle nonlinear correlations, being model-free, and being a filter method in the sense that whole inputs are removed via a filter. Further details of mutual information may be found in A. Overgaard, C. S. Kallesoe, J. D. Bendtsen, and B. K. Nielsen, “Input selection for return temperature estimation in mixing loops using partial mutual information with flow variable delay,” in 1st Annual IEEE Conference on Control Technology and Applications, CCTA 2017, vol. 2017-January, 2017, pp. 1372-1377. The above article describes the application of mutual information to the estimation of a return temperature of a heating system. Embodiments of the methods described in the present disclosure apply the mutual information criteria to the selection of input signals of a self-learning control process for controlling a fluid transport system. In particular, in some embodiments of the method disclosed herein, mutual information is applied to determine whether an input signal carries information about a future return or another suitable performance measure.

    [0143] The process may be performed in an iterative manner: after having found the first input signal containing the highest mutual information with the return, a second input signal is sought that gives the highest mutual information given that the first input signal is already known. The information already given by the first input signal is removed by making an estimation of the return using only the chosen input signal and then subtracting the estimate from the return. To this end, an estimation may be made using a function approximator based on the observed states and performance measure. An example of such a function approximator is a neural network f(s.sub.t:t+k, w)≈G.sub.t:t+k, where the weights w are tuned to minimize the absolute error between the predictor output and the observed performance measure:

    [00013] \min_w \left| G_{t:t+k} - f(s_{t:t+k}, w) \right|

    Likewise, an estimate of each remaining input signal, using only the chosen input, is also made and subtracted from that input signal, thus leaving a set of residuals.
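
    As an illustrative sketch of this residual step, the following Python code uses a small neural-network regressor to estimate the return and the remaining input signals from the chosen input signal and subtracts the estimates; the use of scikit-learn's MLPRegressor with a squared-error fit (rather than the absolute-error criterion above) is an assumption made for brevity.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def remove_information(chosen, returns, remaining_inputs):
            """chosen: (m,) selected input; returns: (m,); remaining_inputs: (m, n)."""
            X = chosen.reshape(-1, 1)
            # Estimate the return from the chosen input; keep the residual.
            g_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X, returns)
            return_residual = returns - g_model.predict(X)
            # Likewise estimate each remaining input from the chosen input.
            input_residuals = np.empty_like(remaining_inputs, dtype=float)
            for j in range(remaining_inputs.shape[1]):
                x_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
                x_model.fit(X, remaining_inputs[:, j])
                input_residuals[:, j] = remaining_inputs[:, j] - x_model.predict(X)
            return return_residual, input_residuals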

    [0144] Specifically referring to FIG. 6, in step S51, the process loads a training data set and a validation data set, in particular the training and validation data sets collected in steps S1 and S2, respectively, of FIG. 4. Each data set contains a complete set of input signals x for the system, as well as a volume flow variable q.sub.t and a return G.sub.t. The return can be calculated via rewards that describe how well the system is controlled.

    [0145] In step S52, the process chooses an input signal x.sup.i to analyze from the set of input signals x. A loop runs through all input signals in the set.

    [0146] In steps S53 through S55, the process calculates mutual information of an input signal.

    [0147] In particular, in step S53, the process determines a volume constant v for every state, which gives a time offset as:

    [00014] t_{o,t} = \frac{v}{q_t}

    such that the time offset maximizes the mutual information with regard to the return. The benefit of using flow-dependent time-shift delays is due to the transport delays in the system. The minimum offset would typically be zero (given by volume constant v.sub.min=0) for signals without transport delay. The mutual information value is maximized over the volume constant interval (v.sub.min to v.sub.max) for a given signal.
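
    A minimal sketch of such a flow-dependent time offset is shown below, assuming uniformly sampled signals with sampling interval dt; rounding the offset to whole samples is an illustrative simplification. The volume constant v would then be swept over the interval v.sub.min to v.sub.max, keeping the value that maximizes the mutual information with the return.

        import numpy as np

        def offset_signal(x, q, v, dt):
            """Shift each sample of x by the flow-dependent offset t_o,t = v / q_t."""
            m = len(x)
            shifted = np.full(m, np.nan)
            for t in range(m):
                k = int(round((v / q[t]) / dt))  # offset in samples at time t
                if 0 <= t - k < m:
                    shifted[t] = x[t - k]        # lower flow -> larger offset
            return shifted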

    [0148] In step S54, the process computes mutual information I between the offset signal vector x.sup.i and the return G.sub.t in the training data set.

    [0149] Mutual information between two variables may be defined as

    [00015] I(X; Y) = \iint p(x, y) \log\left( \frac{p(x, y)}{p(x)\, p(y)} \right) \mathrm{d}x\, \mathrm{d}y

    [0150] The process may compute an approximation of the mutual information. In particular, a discrete approximation of the mutual information may be calculated that is based on estimates of the marginal and joint probability density functions of the input signal and return, using a number m of samples of the input signals and the return, respectively.
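
    As an illustration, a discrete approximation of this kind may be computed from a two-dimensional histogram, as sketched in Python below; the number of bins is an assumption.

        import numpy as np

        def mutual_information(x, g, bins=16):
            """Histogram-based estimate of I(x; G) from m paired samples."""
            joint, _, _ = np.histogram2d(x, g, bins=bins)
            p_xy = joint / joint.sum()              # joint density estimate
            p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x
            p_g = p_xy.sum(axis=0, keepdims=True)   # marginal of G
            mask = p_xy > 0
            return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_g)[mask])))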

    [0151] In step S55, the process checks whether the v between v.sub.min and v.sub.max that yields the highest mutual information has been found. If so, the process proceeds at step S56; otherwise the process returns to step S53.

    [0152] In step S56, the process updates a sorted index vector to keep track of how much mutual information the selected input signal contains compared to other already investigated input signals. In particular, the process stores the i'th index in a vector sorted according to maximum mutual information.

    [0153] In step S57, the process determines whether all input signals have been analyzed. If so, the process proceeds at step S58; otherwise the process returns to step S52, i.e. steps S52 to S57 are repeated until a signal index vector has been sorted according to mutual information level of the signals with the corresponding volume constant v for all signals.

    [0154] In step S58, the process adds the signal with the now highest mutual information to a signal vector s.

    [0155] In step S59, the process computes an estimate of the return G.sub.t using the signals in s from the validation dataset.

    [0156] In step S510, the process checks if a stopping criterion is fulfilled. If not, the process proceeds at step S511 and calculates a new training data set, which does not contain the information from the added signal. This calculation is performed in steps S511 to S514 and it is followed by a repetition of steps S52 to S57 until a sorted index vector for the new data set is obtained.

    [0157] The present method uses partial mutual information in an iterative manner, namely by choosing the input variable with the most information, then removing that information from the prediction target, thus leaving a new residual prediction target. Then the input giving the most information about the residual prediction target is found, and so forth, until the process is stopped or all input variables are sorted.

    [0158] To this end, the process may compare mutual information of the respective input variables at respective time shift delays, in particular at time shift delays resulting in a highest mutual information for the respective variables.

    [0159] In particular, in step S511, the process generates an estimated return Ĝ.sub.t based on s and an estimated state vector {circumflex over (x)}.sub.t based on s.

    [0160] In step S512, the process calculates a new state vector by subtracting the estimated state vector from the previous state vector: x.sub.t,j+1=x.sub.t,j−{circumflex over (x)}.sub.t,j.

    [0161] In step S513, the process calculates a new return by subtracting the estimated return from the previous return: G.sub.t,j+1=G.sub.t,j−Ĝ.sub.t,j.

    [0162] In step S514, the process sets G.sub.t=G.sub.t,j+1 and x.sub.t=x.sub.t,j+1 as new training set and returns to step S52.

    [0163] If the stopping criterion of step S510 is fulfilled, the process selects the signal vector s as the selected set of input signals. The state selection process is then completed.

    [0164] An example of a suitable stopping criterion to be applied in step S510 may be based on a white noise comparison, i.e. a determination as to whether the signal with the now highest mutual information adds more to the description of the return than white noise. If this is not the case, the signal does not contain information about the return and the state selection can be stopped.
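
    A sketch of such a white-noise comparison is given below, reusing the mutual_information estimator sketched above; the number of noise draws and the percentile threshold are illustrative assumptions.

        import numpy as np

        def adds_more_than_noise(candidate, g, n_draws=100, percentile=95, bins=16):
            """True if the candidate signal's mutual information with the return
            exceeds that obtained by white noise of the same length."""
            mi_candidate = mutual_information(candidate, g, bins)
            rng = np.random.default_rng(0)
            noise_mi = [mutual_information(rng.standard_normal(len(g)), g, bins)
                        for _ in range(n_draws)]
            return mi_candidate > np.percentile(noise_mi, percentile)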

    [0165] Another possible stopping criterion may be based on an RMSE (Root Mean Square Error) improvement, i.e. on a determination as to whether the RMSE of the estimation of the return G.sub.t in the validation data set improves by more than a certain amount when adding the signal with the now highest mutual information. If this is not the case, the process may be terminated.

    [0166] A specific implementation of the process of FIG. 6, e.g. when applied to a mixing loop, may be summarized as follows:

    TABLE-US-00003
    Result: State selection
    Initialize: Load training data of all inputs x.sub.t, return G.sup.λ(q).sub.t and flow q.sub.t. Load validation data of n inputs x.sub.t, return G.sup.λ(q).sub.t and flow q.sub.t.
    Parameters: tol
    repeat
      Find the input with the highest mutual information as
        [00016] s ← arg max.sub.j,v I(x.sub.j,t; G.sup.λ(q).sub.t)
      Add s to the set of selected inputs s
      Generate estimators E[G.sup.λ(q).sub.t|s.sub.t] and E[x.sub.t|s.sub.t]
      Calculate residuals as
        G.sup.λ(q).sub.t ← G.sup.λ(q).sub.t − E[G.sup.λ(q).sub.t|s.sub.t]
        x.sub.t ← x.sub.t − E[x.sub.t|s.sub.t]
      Calculate the validation error as
        [00017] RMSE = √( Σ.sub.t=1.sup.m.sup.v (G.sup.λ(q).sub.t:t+m.sub.v − E[G.sup.λ(q).sub.t:t+m.sub.v|s.sub.t:t+m.sub.v]).sup.2 / m.sub.v )
    until
        [00018] tol > (RMSE.sub.prev − RMSE)/RMSE.sub.prev
    where RMSE.sub.prev denotes the RMSE computed in the preceding iteration.
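
    For concreteness, a condensed and runnable Python rendering of this summary is sketched below, reusing the mutual_information, offset_signal and remove_information functions sketched above. The tolerance value, the grid of volume constants and the use of the training residual (rather than a separate validation set) for the RMSE test are simplifying assumptions, not the exact implementation of the embodiment.

        import numpy as np

        def mi_of(xi, g, q, v, dt, bins=16):
            # Mutual information of a flow-offset input signal with the return.
            shifted = offset_signal(xi, q, v, dt)
            ok = ~np.isnan(shifted)
            return mutual_information(shifted[ok], g[ok], bins)

        def select_states(x, g, q, dt, v_grid, tol=0.01):
            """x: (m, n) candidate inputs; g: (m,) return; q: (m,) flow."""
            selected = []
            cols = list(range(x.shape[1]))
            x = x.astype(float).copy()
            g = g.astype(float).copy()
            rmse_prev = np.sqrt(np.mean(g ** 2))
            while cols:
                # Input signal and volume constant with the highest
                # (partial) mutual information with the current return.
                j, v = max(((j, v) for j in cols for v in v_grid),
                           key=lambda jv: mi_of(x[:, jv[0]], g, q, jv[1], dt))
                selected.append((j, v))
                cols.remove(j)
                if not cols:
                    break
                # Remove the chosen signal's information, leaving residuals.
                g_res, x_res = remove_information(x[:, j], g, x[:, cols])
                rmse = np.sqrt(np.mean(g_res ** 2))
                if (rmse_prev - rmse) / rmse_prev < tol:  # stopping criterion
                    break
                g, x[:, cols], rmse_prev = g_res, x_res, rmse
            return selected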

    [0167] While embodiments of the various aspects disclosed herein have mainly been described in the context of heating systems for buildings, it will be appreciated that embodiments of the method and system described herein may also be applied for the control of other types of fluid transport systems.

    [0168] For example, embodiments of the method and system described herein may also be applied for the control of water supply systems. In the context of water supply systems, examples of suitable input signals may include one or more of the following:
    [0169] Flow and pressure measured at respective pump stations, including controllable flow and/or pressure and/or flow/pressure that cannot directly be controlled, e.g. at non-controlled pump stations.
    [0170] Water levels in respective tanks or other reservoirs.
    [0171] Weather forecast data, e.g. regarding precipitation and/or temperature.
    [0172] Data from pressure sensors within a network.
    [0173] Data from meters that measure water consumption at one or more consumers.

    [0174] Control variables controllable by the control process may include flow and/or pressure at one or more pump stations of a water supply system.

    [0175] Similarly, embodiments of the method and system described herein may also be applied for the control of wastewater systems. In the context of wastewater systems, examples of suitable input signals may include one or more of the following:
    [0176] Flow and pressure measured at respective pump stations, including controllable flow and/or pressure and/or flow/pressure that cannot directly be controlled, e.g. at non-controlled pump stations.
    [0177] Water levels in gravity-based wastewater conduits and/or in reservoirs.
    [0178] Weather forecast data, e.g. regarding precipitation.
    [0179] Data regarding water consumption.
    [0180] Wastewater production data from sources of wastewater, e.g. from one or more large industrial sources of wastewater.

    [0181] Control variables controllable by the control process may include flow and/or pressure at one or more pump stations of a wastewater system and/or set points of levels at one or more pump stations.

    [0182] Embodiments of the method described herein can be implemented by means of hardware comprising several distinct elements, and/or at least in part by means of a suitably programmed microprocessor. In the apparatus claims enumerating several means, several of these means can be embodied by one and the same element, component or item of hardware. The mere fact that certain measures are recited in mutually different dependent claims or described in different embodiments does not indicate that a combination of these measures cannot be used to advantage.

    [0183] It should be emphasized that the term “comprises/comprising” when used in this specification is taken to specify the presence of stated features, elements, steps or components but does not preclude the presence or addition of one or more other features, elements, steps, components or groups thereof.