SENSOR CONTROL SYSTEM FOR CONTROLLING A SENSOR NETWORK
20230213920 · 2023-07-06
Assignee
Inventors
CPC classification
G06N7/01
PHYSICS
G06N3/006
PHYSICS
G05B19/4183
PHYSICS
International classification
Abstract
A sensor control system (202) for managing at least a first set of one or more sensors (101) for monitoring a first domain of an industrial process and a second set of one or more sensors (102) for monitoring a second domain of the industrial process, wherein the sensor control system (202) comprises at least a first reinforcement learning, RL, agent (A1) and a second RL agent (A2), wherein the first and second RL agents were trained using reinforcement learning and a process graph (196) representing the industrial process.
Claims
1. A method performed by a sensor control system for managing at least a first set of one or more sensors for monitoring a first domain of an industrial process and a second set of one or more sensors for monitoring a second domain of the industrial process, wherein the sensor control system comprises at least a first reinforcement learning (RL) agent and a second RL agent, wherein the first and second RL agents were trained using reinforcement learning and a process graph representing the industrial process, the method comprising: the sensor control system receiving sensor data from the first set of one or more sensors; the sensor control system using the received sensor data and the process graph to decide whether or not to reconfigure the first set of sensors and/or the second set of sensors; and the sensor control system providing configuration information to the first set of sensors and/or the second set of sensors as a result of the sensor control system deciding to reconfigure the first set of sensors and/or the second set of sensors.
2. The method of claim 1, further comprising: training the first RL agent using: i) the process graph, ii) sensor data, and iii) communication capacity information; and training the second RL agent using: i) the process graph, ii) the sensor data, and iii) the communication capacity information.
3. The method of claim 2, wherein training the first and second RL agents comprises: performing a first training phase where the first RL agent is trained to optimize a first local optimization function and the second RL agent is trained to optimize a second local optimization function; and performing a second training phase where the first and second RL agents are trained to optimize a predefined weighted sum of local objective functions.
4. The method of claim 1, wherein the first and second domains are defined based on sensor locations.
5. The method of claim 1, wherein the first and second domains are defined based on functional similarities.
6. The method of claim 1, wherein providing the configuration information to the first set of sensors and/or the second set of sensors comprises transmitting the configuration information to a sensor gateway that is configured to relay the configuration information to the first set of sensors and/or the second set of sensors.
7. The method of claim 1, wherein the first set of sensors are configured to monitor a first workstation, and the second set of sensors are configured to monitor a second workstation.
8. The method of claim 7, wherein receiving sensor data from the first set of one or more sensors comprises the first RL agent receiving the sensor data from the first set of sensors.
9. The method of claim 8, wherein using the received sensor data and the process graph to decide whether or not to reconfigure the first set of sensors and/or the second set of sensors comprises: the first RL agent detecting an anomaly with respect to the first workstation based on the received sensor data; and the first RL agent, as a result of detecting the anomaly with respect to the first workstation, obtaining information about a third workstation that is monitored by a third set of sensors.
10. The method of claim 9, further comprising the first RL agent using the obtained information about the third workstation to decide whether or not to reconfigure the second set of sensors.
11. A sensor control system for managing at least a first set of one or more sensors for monitoring a first domain of an industrial process and a second set of one or more sensors for monitoring a second domain of the industrial process, the sensor control system comprising: a first reinforcement learning (RL) agent; and a second RL agent, wherein the first and second RL agents were trained using reinforcement learning and a process graph representing the industrial process, and the sensor control system is operable to: i) receive sensor data from the first set of one or more sensors; ii) use the received sensor data and the process graph to decide whether or not to reconfigure the first set of sensors and/or the second set of sensors; and iii) provide configuration information to the first set of sensors and/or the second set of sensors as a result of deciding to reconfigure the first set of sensors and/or the second set of sensors.
12. (canceled)
13. A non-transitory computer readable storage medium storing a computer program comprising instructions which, when executed by processing circuitry of a sensor control system, cause the sensor control system to perform the method of claim 1.
14. (canceled)
15. (canceled)
16. A sensor control system for managing at least a first set of one or more sensors for monitoring a first domain of an industrial process and a second set of one or more sensors for monitoring a second domain of the industrial process, the sensor control system comprising: a receiver for receiving sensor data from a first set of one or more sensors; processing circuitry; and a memory, the memory containing instructions executable by the processing circuitry, wherein the sensor control system is configured to: use the received sensor data and a process graph representing the industrial process to decide whether or not to reconfigure the first set of sensors and/or a second set of sensors; and provide configuration information to the first set of sensors and/or the second set of sensors as a result of the sensor control system deciding to reconfigure the first set of sensors and/or the second set of sensors.
17. The sensor control system of claim 16, further comprising: training a first reinforcement learning (RL) agent using: i) the process graph, ii) sensor data, and iii) communication capacity information; and training a second RL agent using: i) the process graph, ii) the sensor data, and iii) the communication capacity information.
18. The sensor control system of claim 17, wherein training the first and second RL agents comprises: performing a first training phase where the first RL agent is trained to optimize a first local optimization function and the second RL agent is trained to optimize a second local optimization function; and performing a second training phase where the first and second RL agents are trained to optimize a predefined weighted sum of local objective functions.
19. The sensor control system of claim 17, wherein the first set of sensors are configured to monitor a first workstation, the second set of sensors are configured to monitor a second workstation, receiving sensor data from the first set of one or more sensors comprises the first RL agent receiving the sensor data from the first set of sensors, using the received sensor data and the process graph to decide whether or not to reconfigure the first set of sensors and/or the second set of sensors comprises: the first RL agent detecting an anomaly with respect to the first workstation based on the received sensor data; and the first RL agent, as a result of detecting the anomaly with respect to the first workstation, obtaining information about a third workstation that is monitored by a third set of sensors, and the first RL agent is configured to use the obtained information about the third workstation to decide whether or not to reconfigure the second set of sensors.
20. The sensor control system of claim 16, wherein the first and second domains are defined based on sensor locations.
21. The sensor control system of claim 16, wherein the first and second domains are defined based on functional similarities.
22. The sensor control system of claim 16, wherein providing the configuration information to the first set of sensors and/or the second set of sensors comprises transmitting the configuration information to a sensor gateway that is configured to relay the configuration information to the first set of sensors and/or the second set of sensors.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
DETAILED DESCRIPTION
[0023] A process mining module 194 is configured to mine the event logs 192 to produce a process graph 196 that represents the industrial process. For example, the process graph 196 identifies nodes and links describing the production flow from incoming assembly elements to assembled products ready to be shipped. For instance, from the sensor measurements of the physical system elements (equipment, materials, human workers, and environmental conditions), together with the event logs (e.g., business-related events and measures) of the site management platform 190, process mining module 194 can discover a process flow. A network or graph representation (i.e., the process graph) can be created with stations and tasks as nodes, and logical and temporal ordering connections as directed links. These links can be weighted by, e.g., various performance-related measures of time, cost, quality, flux, etc.
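By way of a non-limiting illustration (not part of the original disclosure), the link discovery and weighting described above can be sketched as follows. The event log contents, the station names, and the choice of mean transition time as the link weight are hypothetical assumptions for this sketch only.

```python
# Illustrative sketch of process mining: stations/tasks become nodes and
# temporally ordered transitions within each case become directed links,
# weighted here by mean transition time. All names and data are invented.
from collections import defaultdict

EVENT_LOG = [
    # (case_id, activity, timestamp)
    ("unit-1", "W1:assemble", 0), ("unit-1", "AGV:transport", 5), ("unit-1", "W2:inspect", 9),
    ("unit-2", "W1:assemble", 2), ("unit-2", "AGV:transport", 7), ("unit-2", "W2:inspect", 12),
]

def mine_process_graph(event_log):
    """Discover directed links from the temporal ordering within each case,
    weighting each link by its mean transition time."""
    by_case = defaultdict(list)
    for case_id, activity, ts in event_log:
        by_case[case_id].append((ts, activity))
    durations = defaultdict(list)
    for events in by_case.values():
        events.sort()
        for (t0, a), (t1, b) in zip(events, events[1:]):
            durations[(a, b)].append(t1 - t0)
    return {edge: sum(d) / len(d) for edge, d in durations.items()}

graph = mine_process_graph(EVENT_LOG)
# Each key is a directed link (source, target); each value a time-based weight.
```

Other performance-related measures (cost, quality, flux) could be substituted for the time-based weight without changing the graph structure.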
[0025] Sensor control system 202 is an Artificial Intelligence (AI) system that comprises one or more agents (e.g., RL agents, discussed below) that, through reinforcement learning, can optimize the configuration of the sensor network, or at least a portion thereof. That is, to optimize efficiency and support high-level goal definition with easy prioritization, sensor control system 202 includes a set of one or more agents 240 that are trained to make automatic decisions for sensor control tasks based on sensor output (e.g., state reports). In one embodiment, sensor control system 202 is configured for optimal monitoring within constraints, e.g., to minimize the communicated data load by filtering irrelevant information and to save radio capacity by reconfiguring the sensors to provide reports only when necessary, while at the same time keeping the key performance metrics at a sufficiently high level. This is increasingly significant in the scenario of hyperscale Industrial-IoT sensor networks.
[0027] Agent Training—Reinforcement Learning
[0028] Reinforcement Learning (RL) is a rapidly evolving AI technology that enables an RL agent to initiate real-time adjustments to a system, while continuously training the RL agent using a feedback loop. The skilled person will be familiar with RL and RL agents; nevertheless, the following provides a brief introduction to RL agents.
[0029] Reinforcement learning is a type of machine learning process whereby an RL agent (e.g., a programmed computer) is used to select an action to be performed based on information indicating a current state of a system (or part of the system). For example, based on current state information obtained from the system and an objective, the RL agent can initiate an action (e.g., trigger a sensor to make measurements and send a report) to be performed, which may, for example, comprise adjusting the system towards an optimal or preferred state of the system. The RL agent receives a “reward” based on whether the action changes the system in compliance with the objective (e.g., towards the preferred state), or against the objective (e.g., further away from the preferred state). The RL agent therefore adjusts parameters in the system with the goal of maximizing the rewards received.
[0030] Use of an RL agent allows decisions to be updated (e.g., through learning and updating a model associated with the RL agent) dynamically as the environment changes, based on previous decisions (or actions) performed by the RL agent. Put more formally, an RL agent receives an observation from the environment (denoted St) and selects an action (denoted At) to maximize the expected future reward. Based on the expected future rewards, a value function for each state can be calculated and an optimal policy that maximizes the long-term value function can be derived. Reference [1] describes hierarchical RL for strategic goals.
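The observation/action/reward cycle described above can be illustrated with a minimal tabular Q-learning sketch (one of many possible RL methods; the patent does not prescribe a specific algorithm). The two-state "normal/anomaly" environment and the reward shape, which favors reporting during anomalies and silence otherwise, are invented purely for illustration.

```python
# Minimal tabular Q-learning loop illustrating the state (St), action (At),
# and reward cycle. Environment and reward are hypothetical.
import random

random.seed(0)
ACTIONS = ["report", "stay_idle"]
STATES = ["normal", "anomaly"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Reward reporting during anomalies and idling during normal operation
    (saving radio capacity), loosely mirroring the objective in [0025]."""
    reward = 1.0 if (state == "anomaly") == (action == "report") else -1.0
    return reward, random.choice(STATES)

state = "normal"
for _ in range(2000):
    # Epsilon-greedy action selection.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    reward, nxt = step(state, action)
    # Q-learning update toward reward plus discounted best next value.
    best_next = max(Q[(nxt, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = nxt
```

After training, the greedy policy derived from Q reports only in the anomaly state, i.e., the learned value function encodes the preferred state-dependent behavior.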
[0031] Heterogeneous sensor network control using reinforcement learning can be trained by implementing a microscopic representation of the states of the physical system structure to be monitored by the sensors. This task can be implemented by an automatic process mining technique. Using process graphs, local and global representations with corresponding metrics can be created.
[0032] In embodiments of the sensor control system 202, each i-th low-level unit (i.e., an agent, such as a domain agent 302 or a zone agent 303, that is responsible for control decisions of low-level sensor units), represented by f_i, contributes to the mid-level state and objective function F_k of the k-th agent A_k. The global objective function is then calculated by assigning an importance weight to each agent's interest and using the weighted sum as the global or final goal of the two-phase learning process: G = Σ_k w_k F_k.
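The global objective G = Σ_k w_k F_k can be sketched as follows; the agent labels, local objective values, and weights are hypothetical placeholders, not values from the disclosure.

```python
# Sketch of the global goal of the two-phase learning process: per-agent
# local objective values F_k combined via predefined importance weights w_k.
def global_objective(local_values, weights):
    """Compute G = sum_k w_k * F_k over all agents k."""
    assert set(local_values) == set(weights), "every agent needs a weight"
    return sum(weights[k] * local_values[k] for k in local_values)

F = {"A1": 0.8, "A2": 0.6, "A3": 0.9}   # hypothetical local objective values
w = {"A1": 0.5, "A2": 0.3, "A3": 0.2}   # hypothetical importance weights
G = global_objective(F, w)              # 0.5*0.8 + 0.3*0.6 + 0.2*0.9 = 0.76
```

Raising w_k for one agent shifts the final training goal toward that agent's local interest, which is how an operator could express high-level prioritization.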
[0033] In embodiments, the higher-level, intuitive structure of an industrial site is used to define the RL agents of sensor control system 202. For each such intuitive unit, an agent can be trained to control its respective sensors to serve a local optimization function. In the last phase of the training, a predefined global weighted sum of these local objective functions is used.
[0034] For the continuous training loop of the RL agents of sensor control system 202, a digital twin of the process graph is employed, with measured state information for given scenarios and for updating the model when needed. This is illustrated in
[0036] As the above demonstrates, an agent-based RL system is applied in a sensor control system for controlling a sensor network (e.g., a hyperscale sensor system). An advantage of the embodiments is the definition of agents by domain and the ability to leverage process-level feedback in the training of the agents. High-level goal definition for simplified prioritization by operators is ensured through hierarchical learning. Reinforcement learning is made possible by use of an automatically discovered process graph representation, where efficiency metrics provide feedback during continuous training loops.
Example
[0038] In the normal state of the manufacturing processes, sensors 501 and 502 send reports, and sensor 511 is idle (i.e., not sending any reports). When a KPI (e.g., input rate) drops with respect to W2, agent A2 will learn of this event from a report transmitted by sensor 502. For example, the report may indicate that the rate of units arriving at W2 has fallen below some threshold. Using the process graph 196, agent A2 has discovered that W1 is responsible for outputting the units, and, hence, agent A2 has learned to co-operate with agent A3, which receives reports from sensor 501. Accordingly, agent A2 may seek to determine whether a KPI (e.g., unit output rate) with respect to W1 has fallen below a threshold (e.g., agent A2 may send agent A3 a request for output rate data for W1). If agent A2 determines that the KPI for W1 has fallen below the threshold, then agent A2 may take no action, as agent A2 knows from the process graph that the reduced input rate to W2 is likely not caused by a problem with the link that connects W1 with W2. By taking no action, communication network capacity can be used by other sensors, as there is no reason to activate sensor 511.
[0039] On the other hand, if agent A2 determines that the rate at which W1 is outputting the units is normal, then A2 can, based on the process graph, which informs agent A2 that AGV 510 is the link connecting W1 with W2, deduce that there may be a problem on this link (i.e., a problem with AGV 510). As a result of deducing a problem on the link between W1 and W2, agent A2 can take the action of causing agent A3 to activate sensor 511 by causing agent A3 to send a configuration message to sensor 511. In this way, sensor 511 is activated only when needed, thereby reducing the load on the sensor gateway(s) 180. Once sensor 511 is activated, agent A3 will receive a report from it. If this report indicates a problem with AGV 510, agent A3 can send a report to the site management platform 190, which can then take a corrective action (e.g., re-routing AGV 510).
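Agent A2's decision logic in this example can be restated as a short sketch; the threshold value and function names are hypothetical, and a trained RL agent would learn this policy rather than have it hand-coded.

```python
# Hypothetical restatement of agent A2's learned policy from the example:
# a dropped input KPI at W2 triggers a check of W1's output rate, and only
# a healthy W1 combined with a degraded input implicates the W1->W2 link
# (AGV 510), justifying activation of sensor 511.
THRESHOLD = 10.0  # units per minute; illustrative value only

def decide_action(w2_input_rate, w1_output_rate):
    """Return the reconfiguration action A2 should request, if any."""
    if w2_input_rate >= THRESHOLD:
        return None                   # normal state: keep sensor 511 idle
    if w1_output_rate < THRESHOLD:
        return None                   # upstream cause at W1: link not suspect
    return "activate_sensor_511"      # W1 normal, so suspect AGV 510
```

In the normal state and in the upstream-fault case the function returns no action, preserving communication capacity exactly as described above.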
[0040] The above scenario provides an example of a cooperative strategic decision, where the problem seen at A2 is not solved by a sensor reconfiguration action within A2's own domain. Even the normal state of minimal manufacturing monitoring without any outages is the result of a cooperative decision to collect reports only from the A2 and A1 domains while leaving the AGV's sensor idle when possible. This behavior was learned by a training process over a series of combinations of state reports and action consequences.
[0042] In some embodiments, process 600 further includes the steps of: training the first RL agent using: i) the process graph 196, ii) sensor data, and iii) communication capacity information; and training the second RL agent using: i) the process graph 196, ii) the sensor data, and iii) the communication capacity information. In some embodiments, training the first and second RL agents comprises: i) performing a first training phase where the first RL agent is trained to optimize a first local optimization function and the second RL agent is trained to optimize a second local optimization function; and ii) performing a second training phase where the first and second RL agents are trained to optimize a predefined weighted sum of local objective functions.
[0043] In some embodiments, the first and second domains are defined based on sensor locations, and in other embodiments, the first and second domains are defined based on functional similarities.
[0044] In some embodiments, providing the configuration information to the first set of sensors and/or the second set of sensors comprises the sensor control system 202 transmitting the configuration information to a sensor gateway that is configured to relay the configuration information to the first set of sensors and/or the second set of sensors.
[0045] In some embodiments, the first set of sensors are configured to monitor a first workstation (e.g., workstation W2 shown in
[0047] While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
[0048] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.
REFERENCES
[0049] [1] “OpenAI Five”, 2018, openai.com/five/, blog.openai.com/openai-five/.