METHOD FOR SELF-LEARNING MANUFACTURING SCHEDULING FOR A FLEXIBLE MANUFACTURING SYSTEM BY USING A STATE MATRIX AND DEVICE
20220342398 · 2022-10-27
Inventors
CPC classification
G06N3/006
PHYSICS
G05B2219/33056
PHYSICS
G05B2219/32301
PHYSICS
G05B2219/31264
PHYSICS
International classification
Abstract
A method for self-learning manufacturing scheduling for a flexible manufacturing system (FMS) with processing entities that are interconnected through handling entities is disclosed. The manufacturing scheduling is learned by a reinforcement learning system on a model of the flexible manufacturing system. The model represents at least the behavior and the decision making of the flexible manufacturing system, and the model is transformed into a state matrix to simulate the state of the flexible manufacturing system. A self-learning system for online scheduling and resource allocation is also provided. The system is trained in a simulation and learns the best decision from a defined set of actions for every situation within an FMS. A decision may be made in near real-time during a production process, and the system finds the optimal way through the FMS for every product using different optimization goals.
Claims
1. A method for self-learning manufacturing scheduling for a flexible manufacturing system used to produce at least one product, wherein the flexible manufacturing system includes processing entities interconnected through handling entities, the method comprising: learning a manufacturing scheduling by a reinforcement learning system on a model of the flexible manufacturing system, wherein the model represents at least a behavior and a decision making of the flexible manufacturing system; and transforming the model into a state matrix to simulate a state of the flexible manufacturing system.
2. The method of claim 1, wherein one state of the state matrix represents one situation in the flexible manufacturing system including the at least one product.
3. The method of claim 1, wherein the flexible manufacturing system comprises a known topology, and a state matrix is generated that corresponds to information from the model, and wherein a position of information in the state matrix is ordered according to a topology of the flexible manufacturing system.
4. The method of claim 3, wherein the information in the state matrix is generated automatically, wherein information of the handling entities is placed in the matrix according to an actual position in the flexible manufacturing system, and wherein information of the processing entities is also placed.
5. The method of claim 3, wherein the information in the state matrix regarding the processing entities contains a representation of processing abilities of the respective entities.
6. The method of claim 3, wherein a body of the state matrix contains an input for every product of the at least one product that is located in the flexible manufacturing system at one point of time waiting in a processing queue for a processing entity.
7. The method of claim 3, wherein a body of the state matrix contains an input for a Job list.
8. The method of claim 3, wherein, for training of the reinforcement learning system, the information contained in the state matrix is used by calculating a next transition state of the state matrix containing all status information about the flexible manufacturing system at one time, which is used as input information for the reinforcement learning system as a basis for choosing a next transition to a next step at a time of the reinforcement learning system based on additionally entered and prioritized optimization criteria regarding the manufacturing process of the at least one product or an efficiency of the flexible manufacturing system.
9. The method of claim 1, wherein, for training of the reinforcement learning system, an initial state of the matrix shows a full Job list and a defined product location, and wherein a termination state is characterized by an empty Job list.
10. A reinforcement learning system for self-learning manufacturing scheduling for a flexible manufacturing system configured to produce at least a product, wherein the flexible manufacturing system comprises processing entities interconnected through handling entities, the reinforcement learning system comprising: a model of the flexible manufacturing system, wherein the model represents at least a behavior and a decision making of the flexible manufacturing system, wherein the model is realized as a state matrix, wherein a manufacturing scheduling is configured to be learned by the reinforcement learning system on the model of the flexible manufacturing system, and wherein the model is configured to be transformed into the state matrix to simulate a state of the flexible manufacturing system.
11. The method of claim 1, wherein information in the state matrix is generated automatically, wherein information of the handling entities is placed in the matrix according to an actual position in the flexible manufacturing system, and wherein information of the processing entities is also placed.
12. The method of claim 1, wherein information in the state matrix regarding the processing entities contains a representation of processing abilities of the respective entities.
13. The method of claim 1, wherein a body of the state matrix contains an input for every product of the at least one product that is located in the flexible manufacturing system at one point of time waiting in a processing queue for a processing entity.
14. The method of claim 13, wherein a respective input for every product is for a respective Job list.
15. The method of claim 1, wherein a body of the state matrix contains an input for a Job list.
16. The method of claim 1, wherein, for training of the reinforcement learning system, information contained in the state matrix is used by calculating a next transition state of the state matrix containing all status information about the flexible manufacturing system at one time, which is used as input information for the reinforcement learning system as a basis for choosing a next transition to a next step at a time of the reinforcement learning system based on additionally entered and prioritized optimization criteria regarding the manufacturing process of the at least one product or an efficiency of the flexible manufacturing system.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] The disclosure is illustrated in the following embodiments.
[0024]
[0025]
[0026]
DETAILED DESCRIPTION
[0027] In
[0028] On the top right, a schematic representation 100 of the real FMS 500 is shown, with all the processing entities M1, . . . , M6 and handling entities C0, . . . , C6. The processing entities implement functionalities/actions F1, . . . , F3 (e.g., machining, drilling, etc.).
[0029] After choosing an action from a finite set of actions 302, beginning by making randomized choices, the environment is updated, and the RL agent observes 303 the new state and a reward as an evaluation of its action. The goal of the RL agent is to maximize the long-term discounted rewards 301 by finding the best control policy.
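For illustration, the interaction loop described above may be sketched as a minimal epsilon-greedy Q-learning agent. The environment interface (env.reset, env.actions, env.step) and all names here are hypothetical, not part of the disclosure; SARSA or DQN could be substituted as noted below.

```python
import random
from collections import defaultdict

def train_q_agent(env, episodes=1000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Minimal sketch: choose an action (initially at random), observe the
    new state and reward, and update the estimate of the long-term
    discounted reward for the (state, action) pair."""
    q = defaultdict(float)  # Q[(state, action)] -> expected discounted reward

    for _ in range(episodes):
        state = env.reset()  # e.g., initial state matrix with a full job list
        done = False
        while not done:
            actions = env.actions(state)
            if random.random() < epsilon:      # explore: randomized choice
                action = random.choice(actions)
            else:                              # exploit: best known action
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Value of the best follow-up action (zero at termination).
            best_next = 0.0 if done else max(
                q[(next_state, a)] for a in env.actions(next_state))
            # Move the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

The greedy policy extracted from the learned table (the action with the highest Q-value per state) then serves as the control policy.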
[0030] As RL technology, we may use SARSA, DQN, etc., which in
[0031] As modules may be replaced by various manufacturing processes, this concept is transferable to any intra-plant logistics application.
[0032] If, in some cases, there is a situation which is not known to the system (e.g., when there is a new manufacturing module), the system is able to explore the actions in this situation and learn online how the actions perform. So, the system learns the best actions for unknown situations online, though it will likely choose suboptimal decisions in the beginning. Alternatively, there is the possibility to train the system in the training setup again with the adapted plant topology by using the GUI, which is more deeply described later in
[0033] An important step is the representation of the FMS 500 as a state matrix 200, which serves as a simulation of the FMS. The state matrix may be generated automatically from a representation 100 of the FMS.
[0034] The state matrix is generated automatically after designing the schematic of the FMS, e.g., with the help of the GUI 10 in
[0035] In
[0036] Each processing unit M1, . . . , M6 has a corresponding field in the state matrix, and the fields concerned are arranged according to the topology of the FMS. The content of a particular field shows information about the functions (F1, F2, F3) of the particular processing entity.
[0037] Further, the handling units (C0, . . . , C6) are depicted in their own fields, and the decision points D, with the respective waiting products 1, . . . , 4, may be found in the matrix in the last line 202. The line before the last line, JL, shows the progress of the processing job, e.g., which machines M1, . . . , M6 are still needed.
[0038] The handling units, for example conveyor belts (C0, . . . , C6), are ordered in a similar way to the real plant topology, with the production modules/processing units (M1, . . . , M6) around them. The production modules contain further information on the jobs they are able to execute, or attributes that the plant operator wants to depict, such as production time, quality, or energy efficiency. The controlled product 204 is marked by a specific number, in this example by the number 5, and is updated to the decision-making point 4.1, 4.2, . . . at which it is currently positioned.
[0039] The second-to-last row represents the job-list JL, and the last row 202 contains the number of products currently waiting in the queue of each specific module, in order to consider other products in the manufacturing process. Alternatively, a list with product IDs may be stored in said matrix field.
[0040] The state matrix is used in parallel as a simulation: the product moves to the next position on the conveyor belt, depending on which decision was chosen. If the product steps into a module, it is not depicted in the simulation, as the simulation is only updated at the next decision-making point with the updated job-list. The initial state may be characterized by a full job-list and a defined product location, and the termination state may be defined as a fulfilled job-list, meaning all fields have the value "0" (empty) and no products are waiting.
[0041] For every module or machine of the plant, one place is generated in the matrix. This is done module by module, and the matrix is built up in the same order as the modules are arranged in the plant topology. For every decision-making point of the transport (e.g., a conveyor section between modules), a place is also generated in the matrix, at a position adjacent to the two connected modules. The matrix is thus built up automatically and rule-based in the same order as the plant topology. For example, for the decision to generate a new row in the matrix, the grid in the GUI may help: it helps to locate the modules and conveyor sections and to find the corresponding place in the matrix.
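The row-by-row assembly described above may be sketched as follows. The encoding (a bitmask over the functions F1..F3 for module abilities, zeros for empty queues and unoccupied decision points) and all function and parameter names are illustrative assumptions, not the disclosed format.

```python
def build_state_matrix(modules, conveyors, job_list, functions=("F1", "F2", "F3")):
    """Sketch: assemble a state matrix in plant-topology order.
    modules: list of (name, capabilities) tuples in topological order,
             e.g. [("M1", {"F1"}), ("M2", {"F2", "F3"})]
    conveyors: decision-point/conveyor labels, e.g. ["C0", "C1", "C2"]
    job_list: set of functions the product still requires.
    Encoding here is hypothetical, chosen only for illustration."""
    width = max(len(modules), len(conveyors), len(functions))
    pad = lambda row: row + [0] * (width - len(row))

    # Row of processing entities: each module's abilities as a bitmask.
    module_row = pad([sum(1 << i for i, f in enumerate(functions) if f in caps)
                      for _, caps in modules])
    # Row of handling entities / decision points (0 = no product here).
    conveyor_row = pad([0] * len(conveyors))
    # Second-to-last row: the job-list (1 = function still required).
    jl_row = pad([1 if f in job_list else 0 for f in functions])
    # Last row: number of products waiting in each module's queue.
    queue_row = pad([0] * len(modules))

    return [module_row, conveyor_row, jl_row, queue_row]
```

A termination check then reduces to testing whether the job-list row contains only zeros, matching the termination state defined above.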
[0042] After the state matrix and the simulation are created automatically, the system may be trained on these requirements. A Reinforcement Learning (RL) agent is used to train the system. It is not a Multi Agent System (MAS), so there is no need for the products to communicate with each other as the state of the plant includes the information of the queue length of the modules. The fact that with RL no labelled data is needed makes this approach very attractive for plant operators, who may sometimes struggle with the task of generating labelled data.
[0043] In one embodiment, a GUI may be used, where the plant operator depicts the plant schematically and with very little engineering effort. An example GUI is shown in
[0044] The processing units may be defined via box 11 of the GUI. The maximum number of products at one time in the plant, the maximum number of jobs in one job-list, and all possible jobs of the job-list, as well as the properties of the modules (including available executable jobs or operations or maximum queue length), may be set in the GUI easily, see boxes 12 and 13.
[0045] Actions may be set as well; at a decision point with various directions, the default action is choosing a direction. When there is a decision point in front of a module and there is no conveyor belt leading into the module, the action "step into" may be set. With this schematic drawing of the plant 100 and with the fixed knowledge of the meaning of the input, it is possible to automatically generate a simple simulation of the plant that is sufficient for training, with the products moving from one decision point to the next one.
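The rule for deriving the action set at a decision point may be sketched as follows; the topology dictionary and its keys are hypothetical stand-ins for the information entered via the GUI.

```python
def available_actions(decision_point, topology):
    """Sketch: the default actions at a decision point are its outgoing
    conveyor directions; if a module sits at the point with no conveyor
    belt leading into it, an explicit "step into" action is added.
    The topology dict layout is an assumption for illustration."""
    entry = topology[decision_point]
    actions = ["go " + d for d in entry["directions"]]
    module = entry.get("module")
    if module and not entry.get("conveyor_into_module", False):
        actions.append("step into " + module)
    return actions
```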
[0046] Furthermore, the representation of the state of the FMS may directly and automatically be depicted as a state matrix 15 as the system generating the state matrix has the knowledge about the meaning of the input of the GUI. If there is additional information the plant operator wants to depict in the simulation or state matrix, there is the possibility to code this information directly.
[0047] An alternative is a descriptive (OPC UA) information model, which describes the plant topology, etc., which then may be read by a specific (OPC UA) Client. The Client may then build a simulation and a state matrix.
[0048] The reward function 16 evaluates the action the system chooses, in this case the route that the product takes, as well as how the product complied with given constraints on its route, and checks at each time step whether the action was useful. Therefore, the reward function contains these process-specific constraints, local optimization goals, and global optimization goals, which all may be defined via box 14. Also, the job order constraints (e.g., which job is done first, second, etc.) may be set 17.
[0049] The reward function is automatically generated, as it is a mathematical formulation of optimization goals to be considered.
[0050] The user defines the importance of the optimization goals (for example, in the GUI 14) for instance:
5 × production time, 2 × quality, 1 × energy efficiency
[0051] This information will directly be translated in the mathematical description of the reward function:
0.625 × production time + 0.25 × quality + 0.125 × energy efficiency
[0052] Additionally, the reward function includes optimization goals the system may consider during the manufacturing process. These goals may include makespan, processing time, material costs, production costs, energy demand, and quality. It is the plant operator's task to set process specific constraints and optimization goals in the GUI. It is also possible to consider combined and weighted optimization goals, depending on the plant operator's desire.
[0053] At runtime, the received reward may be compared with the expected reward for further analysis or for decisions to retrain or fine-tune the model.
[0054] In summary, the disclosure provides a RL agent that is trained in a virtual environment (e.g., generated simulation) and learns how to react in every possible situation that it has seen. After choosing an action from a finite set of actions, beginning by making randomized choices, the environment is updated, and the RL agent observes the new state and reward as an evaluation of its action. The goal of the RL agent is to maximize the long-term discounted rewards by finding the best control policy.
[0055] During training, the RL agent sees many possible situations (e.g., a very high state space) multiple times until it knows the optimal action. For every optimization goal, a different RL agent is trained.
[0056] In the first training act, the RL agent is trained to control the product in a way that it is manufactured according to its optimization goal. Other products in the manufacturing process are controlled by a fixed policy.
[0057] In the second training act, different RL agents are trained during the same manufacturing process and simulation. This is done to adjust the RL agents to each other, so that they respect other agents' decisions and react to them. When the RL agents give satisfactory results, the models trained in the virtual environment are transferred to the physical level of the plant, where they are applied as the control policy. Depending on the defined optimization goals for each product, the appropriate control policy is used to control the product routing and therefore the manufacturing. This enables the manufacturing of products with lot size one and a specific optimization goal, such as high energy efficiency or low material costs, at the same time in one FMS. With the control policy, every product in the manufacturing plant is able to make its own decision at every time step during the manufacturing process, depending on the defined optimization goal.
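The two training acts may be sketched as a short driver routine; the training-episode callback and policy objects are hypothetical interfaces, not part of the disclosure.

```python
def train_two_stages(env, agents, fixed_policy, train_episode):
    """Sketch of the two-act training described above.
    Act 1: each agent learns alone while the other products follow a
    fixed policy. Act 2: all agents act in the same simulation so that
    they adjust to each other's decisions.
    train_episode(env, policies) is a hypothetical callback that runs
    one training pass with the given per-product policies."""
    # Act 1: one agent per optimization goal, others on the fixed policy.
    for agent in agents:
        others = [fixed_policy] * (len(agents) - 1)
        train_episode(env, [agent] + others)

    # Act 2: joint training, every product controlled by its own agent.
    train_episode(env, agents)
    return agents
```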
[0058] As already stated, in
[0059] As modules may be replaced by various manufacturing processes, this concept is transferable to any intra-plant logistics application.
[0060] If, in some cases, there is a situation which is not known to the system (e.g., when there is a new manufacturing module), the system is able to explore the actions in this situation and learn online how the actions perform. So, the system learns the best actions for unknown situations online, though it will likely choose suboptimal decisions in the beginning. Alternatively, there is the possibility to train the system in the training setup again with the adapted plant topology by using the GUI.
[0061] An important act in this disclosure is the automatic representation of the FMS as a state matrix. For this purpose, a GUI is used, where the plant operator depicts the plant schematically and with very little engineering effort. An example GUI is shown in
[0062] The maximum number of products at one time in the plant, the maximum number of jobs in one job-list, and all possible jobs of the job-list, as well as the properties of the modules (including available executable jobs or maximum queue length), may be set in the GUI easily. Actions may be set as well; at a decision point with various directions, the default action is choosing a direction. When there is a decision point in front of a module and there is no conveyor belt leading into the module, the action "step into" may be set. With this schematic drawing of the plant and with the fixed knowledge of the meaning of the input, it is possible to automatically generate a simple simulation of the plant that is sufficient for training, with the products moving from one decision point to the next one.
[0063] Various products may be manufactured optimally in one FMS using different optimization goals at the same time.
[0064] The optimal way for a product through the FMS is found automatically by interacting with the simulated environment, without the need for programming (self-training system).
[0065] The simulation is generated automatically from the GUI, so there is no high engineering effort to generate a simulation for the training.
[0066] The representation of the current state of the FMS is generated automatically from the GUI, so there is no high effort to engineer the state description with only the important information from the FMS.
[0067] The decision making is not rule based or engineered. It is a self-learning system with less engineering effort.
[0068] The decision making takes place online and in near real-time as the solution is known for every situation from the training.
[0069] If, in some cases, there is a situation which is not known to the system (e.g., when there is a new manufacturing module), the system is able to explore the actions in this situation and learn online how the actions perform. So, the system learns the best actions for unknown situations online, though it will likely choose suboptimal decisions in the beginning. Alternatively, there is the possibility to train the system in the training setup again with adapted plant topology by using the GUI.
[0070] There is no need for communication between the products, as the information about the current state includes the modules' queues and therefore the important product positions.
[0071] No labelled data is needed for the system to find the best decisions, as it is trained by interacting with the simulation.
[0072] The concept is transferable to any intra-plant logistics application.
[0073] It is to be understood that the elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present disclosure. Thus, whereas the dependent claims appended below depend from only a single independent or dependent claim, it is to be understood that these dependent claims may, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent, and that such new combinations are to be understood as forming a part of the present specification.
[0074] While the present disclosure has been described above by reference to various embodiments, it may be understood that many changes and modifications may be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description.