INJECTION MOLDING MACHINE SYSTEM

20210001526 · 2021-01-07

Assignee

Inventors

CPC classification

International classification

Abstract

Provided is an injection molding machine system (1) that performs control of molding conditions in an injection molding machine (2) by an agent (6) including a machine learning device which performs reinforcement learning. In the present learning, physical data obtained from the injection molding machine (2) and a defect type indicating the type of a molding defect in a molded article are used as states, molding conditions are used as actions, and a defect state indicating the defect level of the molding defect is used as a reward.

Claims

1. An injection molding machine system comprising: an agent having a machine learner, the machine learner performing reinforcement learning of determining an action according to a value function while receiving rewards for actions done in various states and learning the value function, and an injection molding machine configured to manufacture a mold product under prescribed molding conditions; and the injection molding machine system being configured to adjust the molding conditions using the agent, wherein the machine learner is configured to: use, as the state, physical data obtained from the injection molding machine and a defect type representing a kind of a molding defect of the mold product; use the molding conditions as the action; and use, as the reward, a defect state indicating a defect degree of the molding defect.

2. The injection molding machine system according to claim 1, further comprising: a defect judging device configured to measure the mold product; and a classifier configured to perform learning through supervised learning, wherein the machine learner is configured to use, as the defect type and the defect state, output data obtained from the classifier when input data including measurement data of the mold product measured by using the defect judging device is input to the classifier that has performed the learning.

3. The injection molding machine system according to claim 2, wherein the classifier is configured to perform the learning by using plural actual product data sets each including the measurement data, the defect type, and the defect state of the actual mold product and plural quasi-data sets, and wherein the quasi-data sets include the measurement data, the defect type, and the defect state obtained by modifying the actual product data sets.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0021] FIG. 1 is a block diagram schematically showing an injection molding machine system according to an embodiment of the present invention.

[0022] FIG. 2 is a flowchart for description of a process executed by a generator and work performed by an operator in the injection molding machine system according to the embodiment of the invention.

[0023] FIG. 3 is a diagram showing a classifier of the injection molding machine system according to the embodiment of the invention.

[0024] FIG. 4 is a block diagram schematically showing an injection molding machine system according to an embodiment of the present invention that is provided with an agent that employs an actor-critic algorithm.

DESCRIPTION OF EMBODIMENTS

[0025] An injection molding machine system 1 according to an embodiment is a system in which the molding conditions of an injection molding machine 2 are adjusted utilizing machine learning (i.e., what is called AI). As shown in FIG. 1 in a simplified manner, as in conventional injection molding machines, the injection molding machine 2 is composed of a mold clamping device, an injection device, etc. A takeout device 3 for taking out a mold product produced by the injection molding machine 2 and a camera 4 for photographing the mold product that has been taken out are installed adjacent to the injection molding machine 2. Every time a mold product is produced by the injection molding machine 2, image data of the mold product is acquired by the camera 4.

[0026] An AI system for adjusting the molding conditions in the injection molding machine system 1 is constructed on a prescribed computer and has plural function blocks. First, the AI system has an agent 6 which adjusts the molding conditions for the injection molding machine 2. The agent 6 has a machine learner which learns through reinforcement learning. The agent 6 will be described later in detail.

[0027] The other function blocks constituting the AI system include a classifier 7 and a generator 8. As described later, the classifier 7 has a machine learner which learns through supervised learning. The classifier 7 is configured to judge whether a mold product is defective and to output a defect type (i.e., the kind of defect) and a defect state (i.e., the degree of the defect). To cause the classifier 7 to perform supervised learning, it is necessary to prepare a large number of sets of input data and output data (i.e., data sets) for the classifier 7. To this end, the generator 8 generates, as data sets, a large number of quasi-data sets including quasi-data. Described below are the work to be done by an operator and the process to be executed by the generator 8 in order to prepare a large number of data sets.

[0028] A data set may combine any kinds of input data and output data as long as the combination allows the classifier 7 to judge a mold product and output a defect type and a defect state. The embodiment employs data sets in which the input data is image data of a mold product taken by the camera 4 and the output data is a defect type and a defect state. The image data may be of any kind: it may be a set of plural images taken from two or three directions, an image taken from a single direction, or plural images taken by projecting light beams from different directions. Irrespective of which conditions are employed, the image data is obtained by the camera 4 under conditions unified for all mold products.

[0029] The defect type of the output data consists of plural data indicating occurrence/non-occurrence of a defect for the respective defect types: 1/0 data indicating occurrence/non-occurrence of a sink mark, 1/0 data indicating occurrence/non-occurrence of a burr, etc. The defect state is data indicating the degree of a defect irrespective of the defect type; that is, it relates only to the degree of a defect and is irrelevant to whether the defect is, for example, a sink mark or a void. This data may be expressed by any numerical value. For example, the defect states of a good product, a product having a low-degree defect, and a product having a high-degree defect can be defined as the numerical values 1.0, 0.7, and 0.3, respectively.
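The output-data encoding above can be sketched as follows; the particular defect-type list and the helper name `encode_output` are illustrative assumptions, not taken from the source.

```python
# One 1/0 flag per defect type, plus a single scalar defect state.
DEFECT_TYPES = ["sink_mark", "burr", "void"]

def encode_output(defects_present, defect_state):
    """Return (defect-type vector, defect state) for one mold product."""
    type_vector = [1 if name in defects_present else 0 for name in DEFECT_TYPES]
    return type_vector, defect_state

# A good product carries no flags and defect state 1.0.
good = encode_output(set(), 1.0)
# A product with a high-degree sink mark: the flag is set, the state is low.
bad = encode_output({"sink_mark"}, 0.3)
```

The defect state is deliberately a single number shared across all defect types, so it can later serve directly as the reward signal.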

[0030] At step S1 shown in FIG. 2, an operator prepares a sample of a good mold product and samples of defective mold products of different defect types. For example, the operator prepares one or more defective products having a sink mark, one or more defective products having a burr, and one or more samples for each of the other defect types. The operator determines a numerical value representing the defect state of each prepared defective product sample (step S2). Then the operator acquires image data of each good product sample and each defective product sample (step S3). The data sets thus obtained, each consisting of image data and a combination of a defect type and a defect state, come from samples of an actual good product and actual defective products, and are the actual product data sets mentioned above.

[0031] The generator 8 generates a large number of quasi-data sets by modifying the actual product data sets through calculation (step S4). Quasi-data sets are generated for each defect type. For example, for defective products whose defect type is sink mark, the generator 8 modifies actual product data sets of the sink mark type. More specifically, the generator 8 modifies an image file by moving the position of an actual sink mark through parallel translation, or by changing the size of a sink mark through enlargement or reduction, in image processing. In enlarging or reducing the size of a sink mark, the generator 8 also changes the defect state value according to the size of the sink mark. The generator 8 acquires quasi-data sets in this manner and generates quasi-data sets in the same manner for the other defect types. Any known technique may be used for modifying image data automatically through image processing, and the processing performed in the generator 8 may employ a method using machine learning such as a GAN.
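A minimal sketch of this augmentation step, using NumPy on a small image array. The wrap-around translation, the 2x enlargement, and the 0.8 defect-state penalty applied when the defect grows are all illustrative assumptions, not values from the source.

```python
import numpy as np

def translate(image, dx, dy):
    """Parallel translation of the image contents (wrap-around for brevity)."""
    return np.roll(np.roll(image, dy, axis=0), dx, axis=1)

def make_quasi_sets(image, defect_type, defect_state):
    """Generate quasi-data sets from one actual product data set."""
    quasi = []
    for dx in (-2, 2):  # move the defect by parallel translation
        quasi.append((translate(image, dx, 0), defect_type, defect_state))
    # Enlarge the image 2x; a bigger defect means a worse defect state,
    # so the state value is reduced accordingly (0.8 is an assumption).
    enlarged = np.kron(image, np.ones((2, 2), dtype=image.dtype))
    quasi.append((enlarged, defect_type, round(defect_state * 0.8, 2)))
    return quasi
```

A GAN-based generator, as the paragraph notes, could replace these hand-written transforms while keeping the same data-set interface.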

[0032] The classifier 7 is a machine learner that performs supervised learning, and there is no limitation on the algorithm employed in the classifier 7. For example, the classifier 7 can employ an SVM, a least squares method, a stepwise method, or the like. However, an algorithm capable of expressing a nonlinear input-output relationship is preferable, because the input-output relationship of a data set whose input data is image data and whose output data includes a defect type and a defect state is expected to be nonlinear. In the embodiment, the classifier 7 is formed by a neural network. As shown in FIG. 3, the classifier 7 has a neural network with plural layers, configured in such a manner that image data is applied to the neurons in an input layer and defect types and a defect state are output from the neurons in an output layer.
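The input/output shape of such a network can be sketched as a bare forward pass. The layer sizes (an 8x8 image, 16 hidden units, 3 defect-type outputs plus 1 defect-state output) and the random untrained weights are illustrative assumptions; the source does not specify the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained, randomly initialized weights: 64 pixels -> 16 hidden -> 4 outputs.
W1 = rng.normal(scale=0.1, size=(16, 64))
W2 = rng.normal(scale=0.1, size=(4, 16))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def classify(image):
    """Forward pass: image -> (defect-type values, defect state)."""
    h = np.tanh(W1 @ np.asarray(image, dtype=float).ravel())
    out = sigmoid(W2 @ h)
    return out[:3], float(out[3])  # 3 defect-type flags, 1 defect state
```

After supervised training on the actual and quasi data sets, the three sigmoid outputs would approximate the 1/0 defect-type flags and the fourth would approximate the defect-state value.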

[0033] Image data is applied to the classifier 7 as input, and the corresponding defect type and defect state are applied to the classifier 7 as teaching signals, so that the classifier 7 learns by using a large number of data sets including actual product data sets and quasi-data sets. After that, the classifier 7, having learned properly, can output a defect type and a defect state accurately when image data of a mold product is input to it. In the injection molding machine system 1, a mold product is taken out by the takeout device 3 and photographed by the camera 4 every time injection molding is performed by the injection molding machine 2. Image data taken by the camera 4 is sent to the classifier 7, and the classifier 7 outputs a defect type and a defect state.

[0034] The agent 6 employed in the embodiment will now be described. In general, a machine learner that performs reinforcement learning controls a control target or an environment and is called an agent. The agent determines an action a_t on the basis of a state s_t of the control target, and the control target makes a transition from the state s_t to another state s_{t+1}. At this time, the agent receives a reward r_t from the control target. The agent learns so as to determine actions a_t that maximize the accumulation of future rewards r_t to be received. To realize this, many agents are provided with a prescribed value function and update it through learning. Once the learning has advanced, when a prescribed state s_t is given, the agent determines an action a_t that maximizes the value of the value function. The value function may be of any kind, and the learning algorithm may be a known algorithm such as Q learning, SARSA, TD learning, a Monte Carlo method, or an actor-critic method. That is, the invention is characterized not by the kind of value function or algorithm but by what data constitute the state s_t, the action a_t, and the reward r_t handled by the agent 6 employed in the embodiment.

[0035] The action a_t handled by the agent 6 according to the embodiment consists of molding conditions such as an injection speed, an injection stroke, and a cylinder temperature, so that the agent 6 can determine optimum molding conditions as an action a_t when a prescribed state s_t is given. The state s_t handled by the agent 6 includes various physical data obtained in connection with the injection molding machine 2, such as an injection pressure, a resin temperature, and an external temperature; other data may be added to the state s_t when necessary. Such a state s_t alone, however, is not sufficient for the agent 6 to determine optimum molding conditions. To enable selection of a molding condition to be adjusted, the state s_t handled by the agent 6 according to the embodiment also includes a defect type that is output from the classifier 7.

[0036] Since the state s_t includes a defect type, the agent 6 can judge, according to the defect type, which molding condition should be made an adjustment target, and hence optimum molding conditions can be determined properly as an action a_t under a given state s_t. The reward r_t given to the agent 6 according to the embodiment is a defect state that is output from the classifier 7. The agent 6 can perform reinforcement learning by using the above state s_t, action a_t, and reward r_t. Molding conditions to be made adjustment targets may be narrowed down in a rule-based manner for each defect type by utilizing the knowledge of a skilled person. For example, for the burr defect, the agent 6 may be caused to learn by giving it a rule that arbitrarily increases the action selection probabilities of the injection speed and the keeping pressure. Alternatively, the agent 6 may be caused to learn by leaving how actions branch depending on the defect type to the algorithm, as shown in the drawings of the embodiment.
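The rule-based narrowing for the burr example can be sketched as weighted sampling over candidate molding conditions. The condition names and the weight value 3.0 are illustrative assumptions; a skilled person would choose the actual weights.

```python
import random

CONDITIONS = ["injection_speed", "injection_stroke",
              "cylinder_temperature", "keeping_pressure"]

# For a burr, raise the chance of adjusting injection speed or keeping
# pressure; unlisted conditions keep a baseline weight of 1.0.
RULE_WEIGHTS = {"burr": {"injection_speed": 3.0, "keeping_pressure": 3.0}}

def pick_condition(defect_type, rng):
    """Sample one molding condition to adjust, biased by the rule table."""
    weights = [RULE_WEIGHTS.get(defect_type, {}).get(c, 1.0)
               for c in CONDITIONS]
    return rng.choices(CONDITIONS, weights=weights, k=1)[0]
```

With no rule for a given defect type, all conditions are sampled uniformly, which corresponds to leaving the branching entirely to the learning algorithm.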

[0037] An example will now be described in which the agent 6 according to the embodiment is caused to perform reinforcement learning by an actor-critic method. In this case, as shown in FIG. 4, the agent 6 is composed of an actor 10 and an evaluator 11. To determine an action a_t by the actor-critic method, a state value function V(s_t) is provided in the evaluator 11 as the value function. The state value function V(s_t) indicates how good the state s_t is. The state value function V(s_t) may be configured in any manner; for example, it may be formed by a matrix that stores V values corresponding to respective values of the state s_t, or by an SVM or a neural network that represents an input-output relationship. The state value function V(s_t), which is updated by reinforcement learning, may be updated according to any algorithm. For example, in a case where the state value function V(s_t) is updated by a TD learning method, it can be calculated according to the following Expression 1.


[Expression 1]

V(s_t) ← V(s_t) + α[r_t + γV(s_{t+1}) − V(s_t)]   (Expression 1)

where

[0038] learning coefficient α: 0 < α ≤ 1; and

[0039] discount rate γ: 0 < γ ≤ 1.

[0040] When a prescribed state s_t having a prescribed defect type and physical data is given in the injection molding machine 2, a mold product is obtained by determining molding conditions as an action a_t and performing injection molding. The classifier 7 judges the defect state of the mold product, and the evaluator 11 receives it as a reward r_t. Then the next molding conditions are determined for a state including the defect type judged by the classifier 7, and the injection molding machine 2 performs injection molding. The state value function V(s_t) can be updated according to Expression 1 as such a molding operation is performed repeatedly.
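The repeated update of Expression 1 can be sketched as a few lines of Python, with the state value function held as a dictionary; the function name `td_update` is an illustrative assumption.

```python
def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One TD(0) update of the state value function per Expression 1.
    V maps states to values; alpha is the learning coefficient and
    gamma the discount rate, both in (0, 1]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```

Each molding cycle supplies one (s_t, s_{t+1}, r_t) triple, so the evaluator 11 would call this once per cycle.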

[0041] On the other hand, the actor 10 is provided with a policy π(s_t, a_t; w_t) which indicates what action a_t should be decided on when a state s_t is given. The policy π(s_t, a_t; w_t) is a probability distribution function representing the probability at which the action a_t is decided on under the state s_t, and w_t is the adjustment parameter that determines the policy. For example, in a case where the policy π(s_t, a_t; w_t) is expressed as a normal distribution N(μ, σ), where μ is the average and σ is the standard deviation, adjusting the adjustment parameter w_t substantially means adjusting the average μ and the standard deviation σ, which are functions of w_t. When the policy π(s_t, a_t; w_t) is made a proper probability distribution function by adjusting the adjustment parameter w_t through learning, the probability that a proper action a_t is decided on when a prescribed state s_t is given becomes high, and the probability that an improper action a_t is decided on becomes low. One method of adjusting the adjustment parameter w_t is as follows. First, the degree of appropriateness of the policy π(s_t, a_t; w_t) is defined as the appropriateness e_t by Expression 2-1. Then the appropriateness with a history, D_t, is defined by Expression 2-2 using a discount rate γ. As a result, the adjustment parameter w_t can be updated according to Expression 2-3 using the reward r_t received as a defect state and the state value function V(s_t).

[Expression 2]

e_t = ∂/∂w_t log(π(a_t, s_t; w_t))   (Expression 2-1)

D_t = e_t + γ·D_{t-1}   (Expression 2-2)

where discount rate γ: 0 < γ ≤ 1.


w_t ← w_t + α·δ_t·D_t   (Expression 2-3)

where

[0042] learning coefficient α: 0 < α ≤ 1; and

[0043] TD error δ_t: δ_t = r_t + γV(s_{t+1}) − V(s_t).

[0044] As the learning process is executed repeatedly, both the state value function V(s_t) and the policy π(s_t, a_t; w_t) converge, and the TD error δ_t approaches 0. That is, a state is established in which the agent 6 has learned through reinforcement learning. When a state s_t is given to the policy π(s_t, a_t; w_t) that has reached this state, an optimum action a_t (that is, optimum molding conditions) can be calculated.
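Expressions 2-1 to 2-3 combine into one actor update per molding cycle. The sketch below assumes a Gaussian policy whose mean is the adjustment parameter w_t with a fixed standard deviation sigma; under that assumption the log-density gradient of Expression 2-1 reduces to (a − w)/σ². These modeling choices are assumptions made for concreteness, not details from the source.

```python
def actor_critic_step(w, D_prev, a, r, V_s, V_next,
                      sigma=1.0, alpha=0.05, gamma=0.9):
    """One actor update following Expressions 2-1 to 2-3."""
    e = (a - w) / sigma ** 2           # Expression 2-1: d/dw log pi(a | s; w)
    D = e + gamma * D_prev             # Expression 2-2: discounted trace
    delta = r + gamma * V_next - V_s   # TD error delta_t
    return w + alpha * delta * D, D    # Expression 2-3
```

If the sampled action a earned a positive TD error, the mean w moves toward a, making that action more probable next time; a negative TD error pushes it away.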

[0045] The agent 6 according to the embodiment can have a configuration different from the above. For example, an action value function Q(s_t, a_t) may be used as the value function, and an optimum action a_t, that is, optimum molding conditions, can be determined by the action value function Q(s_t, a_t). The action value function Q(s_t, a_t) is an evaluation function indicating how good a prescribed action a_t is. The action value function Q(s_t, a_t) can be configured in various manners, as with the state value function V(s_t); for example, it may be formed by a matrix in which Q values corresponding to pairs of a state s_t value and an action a_t value are set, that is, a Q table. The action value function Q(s_t, a_t) can be updated by Q learning according to the following expression:

[Expression 3]

Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]   (Expression 3)

where

[0046] learning coefficient α: 0 < α ≤ 1; and

[0047] discount rate γ: 0 < γ ≤ 1.

[0048] As the injection molding and the learning process are performed repeatedly, the action value function Q(s_t, a_t) converges, and optimum molding conditions can be determined using the learned action value function Q(s_t, a_t). That is, when a prescribed state s_t is given, an action a_t that maximizes the action value function Q(s_t, a_t) is searched for; such an action a_t represents optimum molding conditions.
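The Q-table form of Expression 3 can be sketched with a dictionary keyed by (state, action) pairs; the helper name `q_update` is an illustrative assumption.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning update of a Q table per Expression 3.
    Q maps (state, action) pairs to Q values."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q
```

The optimum action for a state s is then simply the action a that maximizes Q[(s, a)] over the action set.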

[0049] In a case where the action value function Q(s_t, a_t) is formed by a Q table, each of the state s_t and the action a_t is handled as a discrete value. Although both can be handled substantially as continuous values if the Q table matrix is made huge, this imposes a heavy calculation load. In contrast, if the action value function Q(s_t, a_t) is formed by what is called a function approximator, such as a neural network, each of the state s_t and the action a_t can be handled as a continuous value and the calculation load can be relatively light.

[0050] The present invention is not limited to the above embodiment, and various modifications and improvements can be made as appropriate within the scope of the invention. Furthermore, the material, shape, dimensions, number, location, etc. of each constituent element or each set of constituent elements according to the above-described embodiment may be determined in any desired manner, that is, are not subject to any restriction, as long as the invention can be realized.

[0051] For example, although the above embodiment has been described on the assumption that the learning converges, convergence to a final state need not be assured in certain algorithms. Furthermore, as in the actor-critic example, an algorithm may update the policy explicitly; as in the Q learning example, an algorithm may instead be limited to updating the value function.

[0052] For another example, in the above-described embodiment the value function is learned while actual molding is performed repeatedly (online learning). However, the value function may be learned offline in advance: even if actual molding is not performed repeatedly, if a certain amount of data about the relationship between a state s_t, an action a_t, and a reward r_t can be acquired in advance, the value function can be learned using those data. Early convergence can be attained if learning of the value function is continued while actual molding is performed, after the offline learning has proceeded to a certain extent.

[0053] As another modification, the input data to the classifier 7 may be changed. Although in the embodiment the input data to the classifier 7 is only image data of a mold product, physical data relating to the mold product, such as its weight, chromaticity, and refractive index, may also be given to the classifier 7 as input data. This makes it possible to judge more kinds of defect types.

[0054] Furthermore, the injection molding machine system according to the embodiment can be modified into a system having plural injection molding machines. That is, in a case where the same mold product is to be manufactured by plural molding machines, information may be exchanged between agents. The learning efficiency can be increased by exchanging the information and performing swarm reinforcement learning.

[0055] Furthermore, the state s_t and the action a_t handled by the agent 6 may be any kind of data. The molding conditions handled as the action a_t may be actual values of an injection speed, an injection stroke, a cylinder temperature, etc., or changes in the molding conditions may be handled instead, that is, an amount of change in an injection speed, an amount of change in an injection stroke, an amount of change in a cylinder temperature, etc. Furthermore, the data of each of the state s_t, the action a_t, and the reward r_t may be normalized in advance so that it falls in the numerical range 0 to 1, or converted so that it falls in the numerical range −1 to 1.
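The two normalizations mentioned above are straightforward linear rescalings; the function names below are illustrative assumptions.

```python
def normalize01(x, lo, hi):
    """Scale a physical value from the range [lo, hi] into 0 to 1."""
    return (x - lo) / (hi - lo)

def normalize_pm1(x, lo, hi):
    """Scale a physical value from the range [lo, hi] into -1 to 1."""
    return 2.0 * normalize01(x, lo, hi) - 1.0
```

For example, a cylinder temperature known to lie between 100 and 200 degrees would be mapped to 0.5 (or 0.0 in the signed range) at 150 degrees.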

[0056] Still further, in the embodiment, image data taken by the camera 4 is used for the defect judgment of a mold product. However, in the injection molding machine system according to the invention, as long as the appearance etc. of a mold product can be measured, a defect judging device other than the camera 4 may be used, and the measurement data measured by that defect judging device may be used for the defect judgment of the mold product.

[0057] The present application is based on Japanese Patent Application No. 2018-055633 filed on Mar. 23, 2018, the disclosure of which is incorporated herein by reference.

INDUSTRIAL APPLICABILITY

[0058] The injection molding machine system according to the invention makes it possible to avoid excessive consumption of computer resources, to reduce the time and cost required for learning, and to adjust molding conditions quickly. The invention providing these advantages can be applied to, for example, systems that perform injection molding of a resin material.

REFERENCE SIGN LIST

[0059] 1: Injection molding machine system [0060] 2: Injection molding machine [0061] 3: Takeout device [0062] 4: Camera (defect judging device) [0063] 6: Agent [0064] 7: Classifier [0065] 8: Generator [0066] 10: Actor [0067] 11: Evaluator