SELF-LEARNING MANUFACTURING SCHEDULING FOR A FLEXIBLE MANUFACTURING SYSTEM AND DEVICE

20220374002 · 2022-11-24

    Abstract

    A method for self-learning manufacturing scheduling for a flexible manufacturing system that is used to produce at least one product is provided. The manufacturing system includes processing entities that are interconnected through handling entities. The manufacturing scheduling is learned by a reinforcement learning system on a model of the flexible manufacturing system. The model represents at least a behavior and a decision making of the flexible manufacturing system. The model is realized as a petri net.

    An order of the processing entities and the handling entities is interchangeable, and therefore, the whole arrangement is very flexible.

    Claims

    1. A method for self-learning manufacturing scheduling for a flexible manufacturing system that is used to produce at least a product, wherein the flexible manufacturing system includes processing entities that are interconnected through handling entities, the method comprising: learning, by a reinforcement learning system, the manufacturing scheduling based on a model of the flexible manufacturing system, wherein the model represents at least a behavior and a decision making of the flexible manufacturing system, and wherein the model is realized as a petri net.

    2. The method of claim 1, wherein one state of the petri net represents one situation in the flexible manufacturing system.

    3. The method of claim 1, wherein a place of the petri net represents a state of one of the processing entities and a transition of the petri net represents one of the handling entities.

    4. The method of claim 1, wherein a transition of the petri net corresponds to an action of the flexible manufacturing system.

    5. The method of claim 1, wherein the flexible manufacturing system has a known topology, and wherein the method further comprises generating a matrix that corresponds to information from the petri net, the information from the petri net including information about transitions and places, and a position of the information in the matrix is ordered according to the known topology of the flexible manufacturing system.

    6. The method of claim 5, wherein a body of the matrix includes an input for every product that is located in the flexible manufacturing system at one point of time, and wherein the matrix shows a position or a move from one position to another position of the respective product in the flexible manufacturing system.

    7. The method of claim 6, wherein a colored petri net is used to represent characteristics of the respective product.

    8. The method of claim 5, further comprising training the reinforcement learning system using the information included in the matrix, the training comprising calculating a vector that is used as input information for the reinforcement learning system as a basis for choosing a transition to a next step of the reinforcement learning system based on additionally entered and prioritized optimization criteria regarding the manufacturing process of the product or an efficiency of the flexible manufacturing system.

    9. A reinforcement learning system for self-learning manufacturing scheduling for a flexible manufacturing system that is used to produce at least a product, wherein the flexible manufacturing system includes processing entities that are interconnected through handling entities, the reinforcement learning system comprising: a processor configured to: learn the manufacturing scheduling based on an input to a learning process, the input including a model of the flexible manufacturing system, wherein the model represents at least a behavior and a decision making of the flexible manufacturing system, and wherein the model is realized as a petri net.

    10. The reinforcement learning system of claim 9, wherein one state of the petri net represents one situation in the flexible manufacturing system.

    11. The reinforcement learning system of claim 9, wherein a place of the petri net represents a state of one of the processing entities, and a transition of the petri net represents one of the handling entities.

    12. The reinforcement learning system of claim 9, wherein a transition of the petri net corresponds to an action of the flexible manufacturing system.

    13. The reinforcement learning system of claim 9, wherein the flexible manufacturing system has a known topology, and wherein the processor is further configured to generate a matrix that corresponds to information from the petri net, the information from the petri net including information about transitions and places, and a position of the information in the matrix is ordered according to the known topology of the flexible manufacturing system.

    14. The reinforcement learning system of claim 13, wherein a body of the matrix includes an input for every product that is located in the flexible manufacturing system at one point of time, and wherein the matrix shows a position or a move from one position to another position of the respective product in the flexible manufacturing system.

    15. The reinforcement learning system of claim 14, wherein a colored petri net is used to represent characteristics of the respective product.

    16. The reinforcement learning system of claim 13, wherein the processor is further configured to train the reinforcement learning system using the information included in the matrix, the training comprising calculation of a vector that is used as input information for the reinforcement learning system as a basis for choosing a transition to a next step of the reinforcement learning system based on additionally entered and prioritized optimization criteria regarding the manufacturing process of the product or an efficiency of the flexible manufacturing system.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0031] FIG. 1 illustrates a training concept of an RL agent in a virtual level (petri net) and application of the trained model at the physical level (real FMS);

    [0032] FIG. 2 shows a representation of state and behavior of an FMS as a petri net to represent multiple products in the FMS (top) and a matrix that contains system behavior of the petri net (bottom); and

    [0033] FIG. 3 shows a possible draft of a GUI to schematically design the FMS.

    DETAILED DESCRIPTION

    [0034] FIG. 1 shows an overview of one embodiment of a whole system, including a training system 300 with a representation of a real plant 500 as a petri net 102.

    [0035] As the RL technology, SARSA, DQN, or the like may be used. One RL agent model is trained against the petri net 102 to later control exactly one product. Thus, various agents are trained for various products (e.g., one for every product). In some instances, the same agent may be trained for various products. There is no need for the products to communicate with each other, as the state of the plant includes information about a queue length of modules and a location of other products.

    [0036] FIG. 1 shows the concept of training. An RL agent is trained in a virtual environment (e.g., petri net) and learns how to react in different situations. After choosing an action from a finite set of actions, beginning by making randomized choices, the environment is updated, and the RL agent observes the new state and reward as an evaluation of its action. The goal of the RL agent is to maximize the long-term discounted rewards by finding the best control policy.

    [0037] During training, the RL agent sees many situations (e.g., a very high state space) multiple times and may generalize to unseen ones if neural networks are used with the RL agent. After the agent is trained against the petri net, the trained model is fine-tuned in the real FMS before being applied at runtime for the online scheduling.

    [0038] After taking an action 302, the result in the simulation is observed 303, and feedback is given (e.g., Reward 301).

    [0041] With the schematic drawing 101 of the plant and with fixed knowledge of the meaning of its content, it is possible to automatically generate the petri net 102, as schematically depicted in the figures.

    [0042] In the following, the structure of the petri net 102 is explained.

    [0043] The circles are referred to as places M1, . . . M6, and the arrows 1, 2, . . . 24 are referred to as transitions in the petri net environment. The inner hexagon of the petri net in FIG. 2 represents conveyor belt sections (e.g., places 7-12), and the outer places represent places where manufacturing modules may be connected (e.g., numbers 1-6). The numbers 1, . . . 24 are the transitions, which may be fired to move a product (e.g., token) from one place to another place. Transitions 3, 11, 15, 19, 23 let the product stay at the same place; these transitions are useful when a second operation may be executed in the same module after the first operation. The state of the petri net is defined by a product a, b, c, d, e (e.g., token) on a place. For considering many different products in an FMS, a colored petri net with colored tokens as different products may be used. Instead of a color, a product ID may also be used.

    [0044] The petri net, which describes the plant architecture (e.g., places) and its system behavior (e.g., transitions) may be represented in one single matrix shown also in FIG. 2 below.

    [0045] This matrix describes the move of tokens from one place to another by activating transitions. The rows are the places, and the columns are the transitions. The +1 in the second column and first row describes, for example, that one token moves to place 1 by activating transition 2. By using a matrix as in FIG. 2, the following state of the petri net may be easily calculated by adding the product of the matrix C and the transition vector to the previous state. The transition vector is a one-hot encoded vector that describes the transition to be fired by the controlled agent.

    [0046] The petri net representation of the FMS is a well suitable training environment for the RL agent. An RL agent is trained against the petri net, for example, by an algorithm known as Q-Learning, until the policy/Q-values (e.g., long-term discounted rewards over episode) converge. The state of the petri net is one component to represent the situation in the FMS, including the product location of the controlled and the other products, with their characteristics. This state may be expressed in a single vector and is used as one of the input vectors for the RL agent. This vector defines the state for every place in the petri net, including the type of products located on that place.
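A minimal tabular Q-learning loop of the kind referenced above might look as follows; the environment interface (`reset`/`step`), the hyperparameter values, and the reward handling are assumptions for illustration, not the patented method:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2  # assumed learning rate, discount, exploration rate

def train(env, n_transitions, episodes=1000):
    """Train a transition-firing agent with tabular Q-learning against a petri net environment."""
    Q = defaultdict(lambda: [0.0] * n_transitions)  # state -> Q-value per transition
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy choice over the petri net's transitions
            if random.random() < EPSILON:
                action = random.randrange(n_transitions)
            else:
                action = max(range(n_transitions), key=lambda a: Q[state][a])
            next_state, reward, done = env.step(action)  # invalid firings yield negative reward
            # update toward the long-term discounted reward
            target = reward if done else reward + GAMMA * max(Q[next_state])
            Q[state][action] += ALPHA * (target - Q[state][action])
            state = next_state
    return Q
```

In practice, the environment's `step` would apply the marking update S2 = S1 + C·t and compute the reward from the optimization goals.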

    [0047] If, for example, product type a is located on place one, which has the capacity of three, the first vector entry looks as follows: [a, 0, 0].

    [0048] If there is product type b and c on place two with capacity of three, the first and second vector entry look as follows: [[a, 0, 0] [b, c, 0]].
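The capacity-padded state entries in the two examples above can be sketched as follows; the numeric product IDs and the helper name are hypothetical choices for illustration:

```python
PRODUCT_ID = {"a": 1, "b": 2, "c": 3}  # assumed numeric IDs for product types

def encode_place(products, capacity=3):
    """Pad the products located on a place up to the place's capacity (0 = empty slot)."""
    ids = [PRODUCT_ID[p] for p in products]
    return ids + [0] * (capacity - len(ids))

# Product type a on place one, product types b and c on place two:
state = [encode_place(["a"]), encode_place(["b", "c"])]
print(state)  # -> [[1, 0, 0], [2, 3, 0]]

# Flattened into a single input vector for the RL agent:
vector = [slot for place in state for slot in place]
```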

    [0049] The action space of the RL agent is defined by all transitions of the petri net. So, the RL agent's task is to fire transitions depending on the state.


    Transition to be fired:          t = (0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)

    Current marking in state S1:     S1 = (0 0 0 0 0 0 0 1 0 0 0 0)

    Calculation of following state:  S2 = S1 + C·t

    Current marking in state S2:     S2 = (0 1 0 0 0 0 0 0 0 0 0 0)
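This one-line marking update can be sketched in code as follows; the small four-place, four-transition ring is a hypothetical stand-in for the larger incidence matrix C of FIG. 2:

```python
import numpy as np

# Hypothetical 4-place, 4-transition ring as a stand-in for matrix C of FIG. 2:
# C[p, t] = +1 if firing transition t puts a token on place p, -1 if it removes one.
C = np.array([
    [-1,  0,  0,  1],  # place 0
    [ 1, -1,  0,  0],  # place 1
    [ 0,  1, -1,  0],  # place 2
    [ 0,  0,  1, -1],  # place 3
])

s1 = np.array([1, 0, 0, 0])  # current marking: one token on place 0
t = np.array([1, 0, 0, 0])   # one-hot transition vector: fire transition 0

s2 = s1 + C @ t              # next marking in a single line of code
print(s2)                    # -> [0 1 0 0]: the token moved to place 1
```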

    [0050] The next state is then calculated very fast in a single line of code and is propagated back to the reward function and the agent. The agent will first learn the plant behavior by receiving a negative reward when firing invalid transitions and will later be able to fire suitable transitions such that all the products, controlled by different agents, are produced in an efficient way. The action of the agent at runtime is translated into the direction the controlled product should go at every point at which a decision is to be made. With several agents controlling different products by respective optimization goals while considering an additional global optimization goal, this system may be used as an online/reactive scheduling system.

    [0051] The reward function (e.g., reward function is not part of the present embodiments; this paragraph is for understanding how the reward function is involved in training of an RL agent) values the action the agent chooses (e.g., the dispatching of a module) as well as how the agent complied with given constraints. Therefore, the reward function is to contain these process-specific constraints, local optimization goals, and global optimization goals. These goals may include makespan, processing time, material costs, production costs, energy demand, and quality.

    [0052] The reward function is automatically generated, as the reward function is a mathematical formulation of optimization goals to be considered.

    [0053] It is the plant operator's task to set process-specific constraints and optimization goals in, for example, the GUI. It is also possible to consider combined and weighted optimization goals, depending on the plant operator's desire. At runtime, the received reward may be compared with the expected reward for further analysis or for decisions to train the model again or fine-tune the model.

    [0054] As modules may be replaced by various manufacturing processes, this concept is transferable to any intra-plant logistics application. The present embodiments are beneficial for online scheduling but may also be used for offline scheduling or in combination.

    [0055] If in some cases there is a situation that is not known to the system (e.g., when there is a new manufacturing module), the system is able to explore the actions in this situation and learn online how the actions perform. The system thus learns the best actions for unknown situations online, though the system will likely choose suboptimal decisions in the beginning. Alternatively, there is the possibility to train the system in the training setup again with the adapted plant topology (e.g., by using the GUI).

    [0056] In the exemplary GUI 110 in FIG. 3, a representation of the FMS is on the right side. There are boxes M1, . . . M6 for modular and static production modules and thin boxes C, C1, . . . C6 that represent conveyor belt sections. The numbers in the modular boxes M1, . . . M6 represent the processing functionality F1, F5 of the particular manufacturing modules (e.g., drilling, shaping, printing). One task in the manufacturing process may be performed by different manufacturing stations M1, . . . M6, even if those stations realize different processing functionalities that may be interchangeable.

    [0057] Decision making points D1, . . . D6 may be placed at desired positions. Behind the GUI, fixed and generic rules are implemented, such as the fact that at the decision making points, a decision is to be made (e.g., a later agent call), and the products may move on the conveyor belt from one decision making point to the next decision point or stay in the module after a decision is made. The maximum number of products in the plant, the maximum number of operations in the job-list, and job-order constraints 117 such as all possible operations, as well as the properties of the modules (e.g., including maximum capacity or queue length), may be set in the third box 113 of the exemplary GUI. Actions may be set as well, but as default, every transition of the petri net 102 is an action.

    [0058] The importance of the optimization goals may be defined 114 (e.g., by setting the values in the GUI). For example:


    5×Production time, 2×quality, 1×energy efficiency

    [0059] This information will then directly be translated into the mathematical description of the reward function 116, such as, for example:


    0.625×production time+0.25×quality+0.125×energy efficiency
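The translation from operator-set priorities to reward-function weights shown above is a simple normalization; a sketch (the goal names are illustrative):

```python
# Operator-set priorities, as in the example: 5x production time, 2x quality, 1x energy efficiency
priorities = {"production_time": 5, "quality": 2, "energy_efficiency": 1}

# Normalize to reward-function weights that sum to 1
total = sum(priorities.values())
weights = {goal: value / total for goal, value in priorities.items()}

print(weights)  # -> {'production_time': 0.625, 'quality': 0.25, 'energy_efficiency': 0.125}
```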

    [0060] The present embodiments include a scheduling system with the possibility to react online to unforeseen situations very fast. Self-learning online scheduling results in less engineering effort, as the scheduling is not rule-based or engineered. With the present embodiments, the optimal online schedule is found by interacting with the petri net without the need for engineering effort (e.g., defining heuristics).

    [0061] The “simulation” time is very fast in comparison to known plant simulation tools because only one single equation is used for calculating the next state. No communication is needed between the simulation tool and the agent (e.g., the “simulation” is integrated in the agent's environment, so there is also no response time).

    [0062] No simulation tool is needed for the training.

    [0063] No labelled data is needed to find the best decisions, as the scheduling system is trained against the petri net. The petri net for FMSs may be generated automatically.

    [0064] Various products may be manufactured optimally in one FMS using different optimization goals at the same time and an additional global optimization goal.

    [0065] Due to the RL, there is no need for an engineer to overthink every exotic situation to model rules for the system.

    [0066] The decision making of the applied system takes place online and in near real-time. Online training is possible, and retraining of the agents offline (e.g., for a new topology) is also possible.

    [0067] The elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present invention. Thus, whereas the dependent claims appended below depend from only a single independent or dependent claim, it is to be understood that these dependent claims may, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent. Such new combinations are to be understood as forming a part of the present specification.

    [0068] While the present invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description.