CONTROL DEVICE, METHOD, PROGRAM, AND SYSTEM

Abstract

A control device includes a first controller configured to generate a first operation amount for a device on the basis of an output fed back from the device and a target value; a predicted output generator including a learned model which is machine learned so as to generate a predicted output of the device on the basis of the output fed back from the device and the first operation amount; a second controller configured to generate a second operation amount for the device on the basis of the predicted output and the target value; and an integrated operation amount generator configured to generate an integrated operation amount, which is the operation amount applied to the device, on the basis of the first operation amount and the second operation amount.

Claims

1. A control device configured to perform feedback control on a predetermined device, the control device comprising: a first controller configured to generate a first operation amount for the device on a basis of an output fed back from the device and a target value; a predicted output generator including a learned model which is machine learned so as to generate a predicted output from the device on a basis of the output fed back from the device and the first operation amount; a second controller configured to generate a second operation amount for the device on a basis of the predicted output and the target value; an integrated operation amount generator configured to generate an integrated operation amount which is an operation amount for the device on a basis of the first operation amount and the second operation amount; and a storage configured to, in a case where the second operation amount is subjected to invalidation processing, store the first operation amount, the output fed back from the device and an output from the device corresponding to the integrated operation amount as machine learning data.

2. The control device according to claim 1, further comprising: learning processor circuitry configured to perform learning processing on a basis of the machine learning data and update the learned model.

3. The control device according to claim 1, further comprising: determination processor circuitry configured to determine whether or not the second operation amount satisfies an invalidation condition; and invalidation processor circuitry configured to perform processing of invalidating the second operation amount in a case where it is determined at the determination processor circuitry that the second operation amount satisfies the invalidation condition.

4. The control device according to claim 3, wherein the invalidation condition is a condition that the second operation amount is greater than a first threshold or smaller than a second threshold which is smaller than the first threshold.

5. The control device according to claim 1, wherein the storage is further configured to store the first operation amount, the output fed back from the device, and the output from the device corresponding to the integrated operation amount as machine learning data in a case where the second operation amount is 0 or a value close to 0.

6. The control device according to claim 1, wherein, in a case where the second operation amount is subjected to invalidation processing, the storage is further configured to store as machine learning data, in addition to the first operation amount, the output fed back from the device and the output from the device corresponding to the integrated operation amount relating to a reference time step, the first operation amount, the output fed back from the device and the output from the device corresponding to the integrated operation amount relating to one or a plurality of time steps temporally before the reference time step.

7. The control device according to claim 1, wherein the first controller and/or the second controller is configured to perform one of P control, PI control, PD control, or PID control.

8. The control device according to claim 1, wherein the learned model is a model which is obtained by performing machine learning using a learning model comprising a tree structure constituted by hierarchically disposing a plurality of nodes respectively associated with state spaces which are hierarchically divided.

9. A control method at a control device configured to perform feedback control on a predetermined device, the control device comprising: a first controller configured to generate a first operation amount for the device on a basis of an output fed back from the device and a target value; a predicted output generator including a learned model which is machine learned so as to generate a predicted output from the device on a basis of the output fed back from the device and the first operation amount; and a second controller configured to generate a second operation amount for the device on a basis of the predicted output and the target value, and the control method comprising: generating an integrated operation amount which is an operation amount for the device on a basis of the first operation amount and the second operation amount; and in a case where the second operation amount is subjected to invalidation processing, storing the first operation amount, the output fed back from the device and an output from the device corresponding to the integrated operation amount as machine learning data.

10. A non-transitory computer readable storage medium encoded with computer readable instructions which, when executed by processor circuitry related to a control device, cause the processor circuitry to perform a method for feedback control on a predetermined device, the method comprising: generating a first operation amount for the device on a basis of an output fed back from the device and a target value; generating, using a learned model which is machine learned, a predicted output from the device on a basis of the output fed back from the device and the first operation amount; generating a second operation amount for the device on a basis of the predicted output and the target value; generating an integrated operation amount which is an operation amount for the device on a basis of the first operation amount and the second operation amount; and in a case where the second operation amount is subjected to invalidation processing, storing the first operation amount, the output fed back from the device and an output from the device corresponding to the integrated operation amount as machine learning data.

11. A control system configured to perform feedback control on a predetermined device, the control system comprising: a first controller configured to generate a first operation amount for the device on a basis of an output fed back from the device and a target value; a predicted output generator including a learned model which is machine learned so as to generate a predicted output from the device on a basis of the output fed back from the device and the first operation amount; a second controller configured to generate a second operation amount for the device on a basis of the predicted output and the target value; an integrated operation amount generator configured to generate an integrated operation amount which is an operation amount for the device on a basis of the first operation amount and the second operation amount; and a storage configured to, in a case where the second operation amount is subjected to invalidation processing, store the first operation amount, the output fed back from the device and an output from the device corresponding to the integrated operation amount as machine learning data.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0032] FIG. 1 is a hardware configuration diagram of a control system.

[0033] FIG. 2 is a general flowchart regarding operation of a system.

[0034] FIG. 3 is a block diagram regarding a basic system.

[0035] FIG. 4 is a detailed flowchart regarding operation of the basic system.

[0036] FIG. 5 is a detailed flowchart regarding initial learning.

[0037] FIG. 6 is a detailed flowchart regarding operation of an extended system.

[0038] FIG. 7 is a block diagram regarding the extended system.

[0039] FIG. 8 is a detailed flowchart (No. 1) regarding control processing in the extended system.

[0040] FIG. 9 is a detailed flowchart (No. 2) regarding the control processing in the extended system.

[0041] FIG. 10 is an explanatory diagram regarding a condition of a second operation amount.

[0042] FIG. 11 is a block diagram regarding a basic configuration of a feedback system.

[0043] FIG. 12 is an explanatory diagram regarding a learning tree.

DESCRIPTION OF EMBODIMENTS

[0044] One embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

1. FIRST EMBODIMENT

[0045] <1.1 Configuration>

[0046] FIG. 1 is a hardware configuration diagram of a control system including a control device 100 and a control mechanism 12.

[0047] As is clear from the drawing, the control device 100 includes a control unit 1, a storage unit 2, an I/O unit 3, an input unit 4, a display unit 5 and a communication unit 6, which are connected to one another via a bus. Further, the control device 100 is connected to an operation unit 121 and a detection unit 122, which constitute the control mechanism 12, and can thereby control a control target (not illustrated).

[0048] The control unit 1, which is an information processing unit such as a CPU, reads out and executes various programs stored in the storage unit 2. The storage unit 2, which is a volatile or non-volatile storage device such as a ROM, a RAM, a hard disk or a flash memory, stores various kinds of data described later, including data to be machine learned. The I/O unit 3 is an interface that performs input from and output to an external device. The input unit 4 processes signals input via a keyboard, a touch panel, a button, or the like. The display unit 5 is connected to a display or the like, performs display control, and provides a GUI to a user via the display. The communication unit 6 performs wired or wireless communication with external equipment.

[0049] The operation unit 121, which acts on the control target on the basis of a predetermined operation amount, is constituted by, for example, an actuator. The detection unit 122, which detects a state of the control target, is constituted by, for example, a sensor.

[0050] Note that the hardware configuration is not limited to that of the present embodiment, and components and functions may be distributed or integrated. For example, processing may be performed in a distributed manner using a plurality of control devices 100, or a mass storage device may additionally be provided externally and connected to the control device 100. Alternatively, processing may be performed over a computer network formed via the Internet or the like.

[0051] Further, the processing according to the present embodiment may be implemented in hardware using a semiconductor circuit (such as an IC), for example an FPGA.

[0052] <1.2 Operation>

[0053] Operation of the control device 100 will be described next with reference to FIG. 2 to FIG. 10.

[0054] FIG. 2 is a general flowchart regarding the operation of the control device 100.

[0055] As is clear from the drawing, when the processing is started, the gains to be set in the first PID controller 11 of the basic system 10 described later (that is, a P (proportional) gain, an I (integral) gain and a D (derivative) gain) are set (S1).

[0056] FIG. 3 is a block diagram of the basic system 10. As is clear from the drawing, the basic system 10 includes the first PID controller 11; the control mechanism 12, which is provided downstream of the first PID controller 11 and includes the operation unit 121 and the detection unit 122; and a data logger 13 that records an operation amount u₀ output from the first PID controller 11 and an output value y output from the detection unit 122 of the control mechanism 12. The basic system 10 operates in substantially the same way as the feedback system 200 illustrated in FIG. 11, except that the data logger 13 records the operation amount u₀ and the output value y.

[0057] The user tunes the gains of the first PID controller 11 by a publicly known method, for example by operating the basic system 10 or by simulation, and sets the final gains by entering them via the input unit 4 or the like. The entered gains are stored in the storage unit 2.
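The gains set here determine the operation amount that the first PID controller 11 computes from the deviation at each time step. The sketch below assumes a positional discrete-time PID law with sampling period `dt`, rectangular integration of the deviation and a backward-difference derivative; the class name `DiscretePID` and these discretization choices are illustrative assumptions rather than details taken from this description.

```python
from typing import Optional


class DiscretePID:
    """Hypothetical positional discrete-time PID controller (illustrative sketch)."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float = 1.0) -> None:
        self.kp = kp                  # P (proportional) gain
        self.ki = ki                  # I (integral) gain
        self.kd = kd                  # D (derivative) gain
        self.dt = dt                  # assumed sampling period
        self.integral = 0.0           # running sum of deviation * dt
        self.prev_error: Optional[float] = None

    def step(self, error: float) -> float:
        """Return the operation amount for the deviation r(t) - y(t-1)."""
        self.integral += error * self.dt
        # Backward-difference derivative; taken as 0 on the first step.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = DiscretePID(kp=2.0, ki=0.5, kd=0.1)
u = pid.step(1.0)  # operation amount for a deviation of 1.0
```

Setting `ki` or `kd` to 0 reduces this sketch to P, PI or PD control.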

[0058] Returning to FIG. 2, once the gain setting processing (S1) is completed, the basic system 10 is actually operated with the set gains, that is, processing of acquiring and storing machine learning data is performed (S3).

[0059] FIG. 4 is a detailed flowchart of the operation of the basic system 10. As is clear from the drawing, when the processing is started, a predetermined integer t corresponding to the time step is initialized (for example, to 1) (S31). Once initialization is completed, a predetermined target value r(t) and the output value y(t−1) of the previous time step (t−1) are read out, the deviation (r(t)−y(t−1)) is calculated, and the deviation is input to the first PID controller 11 (S32).

[0060] When the deviation is input, the first PID controller 11 calculates an operation amount u(t) on the basis of the set gains (S33). This operation amount u(t) is provided to the operation unit 121 of the control mechanism 12, whereby predetermined control is performed on the control target. Then, the output value y(t) at the current time step t is detected via the detection unit 122 of the control mechanism 12 (S34).

[0061] When the above series of processing is finished, the output value y(t−1) of the previous time step, the operation amount u(t) and the output value y(t) of the current time step t are stored in the storage unit 2 via the data logger 13 (S36). Then, the value of t is incremented by 1 (S38), and the series of processing (S32 to S38) is performed again.

[0062] In other words, the output value y(t−1) of the previous time step, the operation amount u(t) and the output value y(t) of the current time step are continuously stored in the storage unit 2 via the data logger 13 while the control target is being controlled. In this way, a desired amount of machine learning data is accumulated for generating the learned model to be used in the prediction processing unit 35 described later.
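The data-acquisition loop of FIG. 4 (S31 to S38) can be sketched as follows. The controller and the plant are passed in as plain functions; the first-order plant in the example, the constant target value and the stopping count `t_max` are hypothetical stand-ins (the described loop runs for as long as the control target is controlled), and a Python list plays the role of the data logger 13 and the storage unit 2.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]  # (y(t-1), u(t), y(t))


def collect_learning_data(
    controller: Callable[[float], float],
    plant: Callable[[float, float], float],
    target: Callable[[int], float],
    t_max: int,
    y0: float = 0.0,
) -> List[Sample]:
    """Run the basic system for t = 1..t_max and log one sample per step."""
    log: List[Sample] = []           # stands in for data logger 13 / storage unit 2
    y_prev = y0
    for t in range(1, t_max + 1):    # S31: t starts at 1; S38: t incremented
        error = target(t) - y_prev   # S32: deviation r(t) - y(t-1)
        u = controller(error)        # S33: operation amount u(t)
        y = plant(y_prev, u)         # S34: output y(t) after applying u(t)
        log.append((y_prev, u, y))   # S36: store y(t-1), u(t), y(t)
        y_prev = y
    return log


def first_order_plant(y_prev: float, u: float) -> float:
    """Hypothetical control target: y(t) = 0.8 * y(t-1) + 0.2 * u(t)."""
    return 0.8 * y_prev + 0.2 * u


data = collect_learning_data(
    controller=lambda e: 1.5 * e,    # stand-in for the tuned first PID controller 11
    plant=first_order_plant,
    target=lambda t: 1.0,            # constant target value r(t)
    t_max=100,
)
```

Each stored triple pairs the model inputs (y(t−1), u(t)) with the output y(t) to be predicted, which is the form the initial learning consumes.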

[0063] Returning to FIG. 2, once data has been acquired and stored through the operation of the basic system 10 (S3), initial learning is performed on the basis of the obtained data (S5).

[0064] FIG. 5 is a detailed flowchart of the initial learning. In the present embodiment, the machine learning technique using the tree structure described with reference to FIG. 12 is employed.

[0065] As is clear from the drawing, when the processing is started, parameter files relating to learning, including the structure of the learning tree (such as the number of layers, the number of dimensions and the number of divisions) and various initial parameters, are read out from the storage unit 2. Then, a predetermined integer t is initialized (for example, to 1) (S52).

[0066] After this initialization, the t-th input data, that is, the output value y(t−1) of the previous time step and the operation amount u(t), are read out and input to the learning tree (S53). The input is then classified in accordance with predetermined branch conditions, the plurality of nodes on the path from the root node to a leaf node are identified, and the input is stored in association with each of those nodes (S54).

[0067] Then, at each of those nodes, the arithmetic average of the output values y associated with the node is updated to include the new output value y(t), and the updated average is stored in association with the node (S56).

[0068] Then, it is determined whether the value of t has reached a predetermined maximum value (t_max). If it has not (S57: No), t is incremented by 1 and the above-described learning processing (S53 to S56) is repeated. If it has (S57: Yes), the processing ends.

[0069] In other words, as a result of this processing, a learned model is generated that predicts the output value y(t) on the basis of the output value y(t−1) of the previous time step and the operation amount u(t) of the current time step.
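The initial learning of FIG. 5 can be illustrated with a deliberately simplified tree. The sketch assumes that layer k divides every input dimension into 2**k equal intervals within fixed bounds, so that the nodes visited for an input form a root-to-leaf chain of nested cells, and that prediction returns the average stored at the deepest node on that chain holding data. The class `StateSpaceTree`, the bounds and this prediction rule are assumptions for illustration; the tree of FIG. 12 may use different division and prediction rules.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

NodeKey = Tuple[int, Tuple[int, ...]]  # (layer, cell index per dimension)


class StateSpaceTree:
    """Hypothetical learning tree over hierarchically divided state spaces."""

    def __init__(self, lows: Sequence[float], highs: Sequence[float], depth: int) -> None:
        self.lows, self.highs, self.depth = list(lows), list(highs), depth
        # node key -> (number of stored samples, arithmetic average of y)
        self.nodes: Dict[NodeKey, Tuple[int, float]] = {}

    def _path(self, x: Sequence[float]) -> Iterator[NodeKey]:
        """Yield the nodes from the root (layer 0) to the leaf containing x (S54)."""
        for k in range(self.depth + 1):
            n = 2 ** k  # intervals per dimension at layer k
            cell = []
            for xi, lo, hi in zip(x, self.lows, self.highs):
                idx = int((xi - lo) / (hi - lo) * n)
                cell.append(min(max(idx, 0), n - 1))  # clamp to the grid
            yield (k, tuple(cell))

    def learn(self, x: Sequence[float], y: float) -> None:
        """Update the arithmetic average of y at every node on the path (S56)."""
        for key in self._path(x):
            count, mean = self.nodes.get(key, (0, 0.0))
            count += 1
            mean += (y - mean) / count  # incremental form of the arithmetic mean
            self.nodes[key] = (count, mean)

    def predict(self, x: Sequence[float]) -> float:
        """Return the average at the deepest node on the path that holds data."""
        estimate = 0.0  # assumed fallback when the tree is still empty
        for key in self._path(x):
            if key in self.nodes:
                estimate = self.nodes[key][1]
        return estimate


def initial_learning(tree: StateSpaceTree, samples: Iterable[Tuple[float, float, float]]) -> None:
    """S52-S57: learn each stored (y(t-1), u(t), y(t)) triple in order."""
    for y_prev, u, y in samples:
        tree.learn((y_prev, u), y)
```

The incremental update `mean += (y - mean) / count` yields the same value as recomputing the arithmetic average over all outputs stored at the node.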

[0070] Returning to FIG. 2, once the initial learning processing is completed, the extended system 30 described later, which is obtained by extending the basic system 10, is operated (S7).

[0071] FIG. 6 is a detailed flowchart of the operation of the extended system 30. As is clear from the drawing, when the processing is started, control processing by the extended system 30 is performed (S71).

[0072] FIG. 7 is a block diagram of the extended system 30. As is clear from the drawing, in addition to the components of the basic system 10, which form a first feedback loop, the extended system 30 includes a second feedback loop and a learning processing unit 34. The second feedback loop includes the prediction processing unit 35, which includes the learned model; a second controller 37 provided downstream of the prediction processing unit 35; and an invalidation processing unit 38 and a determination unit 39 provided further downstream of the second controller 37.

[0073] The prediction processing unit 35 includes a learned model that generates a predicted output value ŷ(t) on the basis of the output value y(t−1) of the previous time step and the first operation amount u₁(t) of the current time step. The second controller 37 generates a second operation amount u₂(t) on the basis of the deviation (r(t)−ŷ(t)) between the target value r(t) and the predicted output value ŷ(t). The determination unit 39 performs a predetermined conditional determination on the second operation amount u₂(t) and provides the determination result to the invalidation processing unit 38. In accordance with that result, the invalidation processing unit 38 either invalidates the second operation amount u₂(t) (for example, by setting it to 0) or passes it on as is.

[0074] Further, the learning processing unit 34 reads out the data stored in the storage unit 2 via the data logger 33, performs learning processing under a predetermined condition, and provides the updated learned model to the prediction processing unit 35.

[0075] FIG. 8 and FIG. 9 are detailed flowcharts regarding control processing in the extended system 30.

[0076] In FIG. 8, when the processing is started, a flag used in the processing described later is initialized (S711). Then, the deviation (r(t)−y(t−1)) between the target value r(t) and the output value y(t−1) of the previous time step is input to the first controller 31 (S712). The first controller 31 calculates the first operation amount u₁(t) on the basis of this input and the set gains (S713).

[0077] Then, the first operation amount u₁(t) and the output value y(t−1) of the previous time step are input to the prediction processing unit 35 (S714). The prediction processing unit 35 calculates a predicted output ŷ(t) by inputting u₁(t) and y(t−1) to the learned model (S715). Next, the deviation (r(t)−ŷ(t)) between the target value r(t) and the predicted output ŷ(t) is input to the second controller 37 (S716). The second controller 37 calculates the second operation amount u₂(t) on the basis of this deviation (S717).

[0078] Continuing in FIG. 9, once the second operation amount u₂(t) has been calculated, the determination unit 39 determines whether or not u₂(t) satisfies a predetermined condition (S719).

[0079] FIG. 10 is an explanatory diagram outlining the predetermined condition on the second operation amount u₂(t). As is clear from the drawing, the predetermined condition is whether or not u₂(t) falls within a range (indicated by R in the drawing) that is equal to or greater than a predetermined threshold U_L and equal to or less than a predetermined threshold U_H, that is, whether U_L ≤ u₂(t) ≤ U_H.

[0080] If the second operation amount u₂(t) does not fall within the range R (S719: No), that is, if u₂(t) is smaller than the threshold U_L or greater than the threshold U_H, the determination unit 39 provides the invalidation processing unit 38 with a determination signal indicating that u₂(t) is outside the predetermined range, and the invalidation processing unit 38 invalidates u₂(t) (S720). After this invalidation processing, the flag is set to the ON state, which indicates that invalidation has been performed (S721).

[0081] On the other hand, if u₂(t) falls within the range R (S719: Yes), the determination unit 39 provides the invalidation processing unit 38 with a determination signal indicating that u₂(t) is within the predetermined range, and the invalidation processing unit 38 passes u₂(t) as is to the point downstream of the output of the first controller 31 of the first feedback loop (S722).
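The determination at S719 and the two branches that follow it can be expressed as a small function. It assumes, as in the example given for the invalidation processing unit 38, that invalidation means replacing u₂(t) with 0; the function name `determine_and_invalidate` and its returned flag are illustrative.

```python
from typing import Tuple


def determine_and_invalidate(u2: float, u_low: float, u_high: float) -> Tuple[float, bool]:
    """Return (value passed downstream, flag) for one time step.

    u_low and u_high correspond to the thresholds U_L and U_H that bound range R.
    Invalidation is modelled as replacing u2 with 0 (an assumption taken from the
    example in the text).
    """
    if u_low <= u2 <= u_high:
        # S719: Yes -> u2 is passed on unchanged (S722); flag stays OFF.
        return u2, False
    # S719: No -> u2 is invalidated (S720) and the flag is set ON (S721).
    return 0.0, True
```

Because both comparisons are inclusive, values exactly equal to U_L or U_H are treated as inside R, matching "equal to or greater than" and "equal to or less than" in the text.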

[0082] Then, at the point downstream of the output of the first controller 31 of the first feedback loop, the first operation amount u₁(t) and the second operation amount u₂(t) are added to calculate the operation amount u(t) (S723). This operation amount u(t) is input to the operation unit 121 of a control mechanism 32, and the resulting output value y(t) is detected via the detection unit 122 (S724).
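One cycle of FIG. 8 and FIG. 9 (S712 to S724) can be sketched as below. All components are passed in as functions: `first_ctrl` and `second_ctrl` stand for the first controller 31 and the second controller 37, `model` for the learned model of the prediction processing unit 35, and `plant` for the control mechanism 32 acting on the control target. These names, the `CycleResult` record and the use of 0 as the invalidated value are assumptions for illustration.

```python
from typing import Callable, NamedTuple


class CycleResult(NamedTuple):
    u1: float     # first operation amount u1(t)
    y_hat: float  # predicted output y_hat(t)
    u2: float     # second operation amount after the S719 check
    u: float      # integrated operation amount u(t) = u1(t) + u2(t)
    y: float      # detected output y(t)
    flag: bool    # True if u2(t) was invalidated


def extended_cycle(
    r: float,
    y_prev: float,
    first_ctrl: Callable[[float], float],
    model: Callable[[float, float], float],
    second_ctrl: Callable[[float], float],
    plant: Callable[[float, float], float],
    u_low: float,
    u_high: float,
) -> CycleResult:
    """One control cycle of the hypothetical extended system."""
    u1 = first_ctrl(r - y_prev)          # S712-S713: first feedback loop
    y_hat = model(y_prev, u1)            # S714-S715: learned model predicts y(t)
    u2 = second_ctrl(r - y_hat)          # S716-S717: second feedback loop
    flag = not (u_low <= u2 <= u_high)   # S719: is u2 outside range R?
    if flag:
        u2 = 0.0                         # S720 invalidation; flag ON corresponds to S721
    u = u1 + u2                          # S722-S723: integrated operation amount
    y = plant(y_prev, u)                 # S724: apply u(t) and detect y(t)
    return CycleResult(u1, y_hat, u2, u, y, flag)
```

Returning all intermediate values makes it straightforward to build the record stored at S725 and to check the flag afterwards.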

[0083] After this detection processing, the output value y(t−1) of the previous time step, the operation amount u(t), the output value y(t) and the flag signal are stored (S725), which completes one cycle of the control processing in the extended system 30.

[0084] Returning to FIG. 6, once one cycle of the control processing in the extended system 30 is finished, the state of the stored flag is checked (S73). If the flag is in the OFF state (S73: No), the processing for the next time step in the extended system 30 is performed (S71). If the flag is in the ON state (S73: Yes), that is, if the second operation amount u₂(t) has been invalidated, learning processing is performed (S75).
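The loop of FIG. 6 (S71, S73, S75) might be organized as in the following sketch, in which `cycle` performs one control cycle and returns the record stored at S725, and `learn` performs the learning processing of S75 on that record. The finite `steps` argument and the record layout are assumptions; the described loop continues for as long as control is performed.

```python
from typing import Callable, List, Tuple

Record = Tuple[float, float, float, bool]  # (y(t-1), u(t), y(t), flag) stored at S725


def run_extended_system(
    cycle: Callable[[float], Record],
    learn: Callable[[Record], None],
    y0: float,
    steps: int,
) -> List[Record]:
    """Repeat S71, check the stored flag (S73) and learn online when it is ON (S75)."""
    history: List[Record] = []
    y_prev = y0
    for _ in range(steps):
        record = cycle(y_prev)   # S71: one control cycle (S711-S725)
        history.append(record)
        if record[3]:            # S73: flag ON means u2(t) was invalidated
            learn(record)        # S75: learning on this time step's data
        y_prev = record[2]       # y(t) becomes y(t-1) of the next step
    return history
```

Only flagged time steps reach `learn`, so the model is updated exactly when the second loop's output was rejected.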

[0085] The content of the learning processing (S75) is substantially the same as the processing illustrated in FIG. 5, so its description is omitted here. After this learning processing, the processing for the next time step in the extended system 30 is performed (S71).

[0086] With such a configuration, adaptive control can be performed by a machine learning technique on the basis of data obtained during control, while retaining feedback control, a reliable control technique that has been used for many years.

[0087] Further, with such a configuration, when the second operation amount u₂(t) satisfies the predetermined invalidation condition (in the embodiment, when it falls outside the range R), u₂(t) is invalidated and control is performed on the basis of the first operation amount u₁(t) alone, so that reliable control is maintained. In addition, the data from that period is used as machine learning data, so that control accuracy can be expected to improve over time.

2. MODIFIED EXAMPLE

[0088] The above-described embodiment is an illustrative embodiment, and various modifications can be made to the present invention.

[0089] In the above-described embodiment, a PID controller is described as an example of a controller, but the present invention is not limited to such a configuration. Other controllers having the same type of function may be used, and control using only some of the gains, such as P control, PI control or PD control, may also be used.

[0090] The above-described embodiment has a configuration (online learning) in which learning processing is performed in real time whenever required, with the state of the flag checked at each time step; however, the present invention is not limited to such a configuration. For example, learning may be performed in a batch manner (batch learning or mini-batch learning) after a certain amount of data to be learned has been accumulated.
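For the batch or mini-batch variant, learning can be deferred with a buffer such as the hypothetical `BatchLearningBuffer` below, which stores samples (for example, those from flagged time steps) and hands them to a learning function only once `batch_size` samples have accumulated. The class, its parameters and the choice to clear the buffer after each batch are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (y(t-1), u(t), y(t))


class BatchLearningBuffer:
    """Hypothetical buffer that runs learning only when enough data is stored."""

    def __init__(self, batch_size: int, learn_batch: Callable[[Sequence[Sample]], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.learn_batch = learn_batch
        self.pending: List[Sample] = []

    def add(self, sample: Sample) -> bool:
        """Store a sample; return True if a batch was learned on this call."""
        self.pending.append(sample)
        if len(self.pending) < self.batch_size:
            return False
        # Pass a copy so that clearing the buffer does not affect the learner.
        self.learn_batch(list(self.pending))
        self.pending.clear()
        return True
```

With `batch_size=1` this degenerates to the online learning of the embodiment.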

[0091] The above-described embodiment has a configuration in which, when the flag is in the ON state (S721), the data relating to the single time step concerned is learned; however, the present invention is not limited to such a configuration. For example, learning (S75) may also use data of one or a plurality of time steps leading up to that step. Such learning can be particularly effective when the learning target is continuous.
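Learning a flagged step together with the steps leading up to it requires keeping a short history. The hypothetical `StepHistory` below keeps the reference step plus `lookback` earlier steps in a bounded `collections.deque` and returns them, oldest first, when the flag is ON; the class name and the fixed `lookback` length are assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple

Sample = Tuple[float, float, float]  # (y(t-1), u(t), y(t))


class StepHistory:
    """Hypothetical store of recent samples for multi-step learning."""

    def __init__(self, lookback: int) -> None:
        if lookback < 0:
            raise ValueError("lookback must be non-negative")
        # Reference step plus `lookback` earlier steps; older samples drop out.
        self.window: Deque[Sample] = deque(maxlen=lookback + 1)

    def record(self, sample: Sample, flag: bool) -> List[Sample]:
        """Append this step's sample; if the flag is ON, return the samples to learn."""
        self.window.append(sample)
        return list(self.window) if flag else []
```

Setting `lookback=0` reproduces the single-step learning of the embodiment.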

[0092] In the above-described embodiment, because the invalidation processing (S720) is performed when the second operation amount u₂(t) falls outside the predetermined range (the region indicated by “R” in FIG. 10) (S719: No), learning (S75) is performed when the flag is consequently set to the ON state. However, the present invention is not limited to such a configuration. For example, learning (S75) may be performed when u₂(t) is 0 or close to 0 (that is, within the range 0±ε, where ε is a small value), regardless of whether u₂(t) falls within the predetermined range R. In this case, the small value ε may be settable by the user.
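The difference between the embodiment's learning trigger and the near-zero trigger of this modified example can be made explicit in a small decision function. The function name `should_learn`, the switch `near_zero_trigger` and the choice to test the value of u₂(t) computed at S717 (before any invalidation) are assumptions for illustration.

```python
def should_learn(u2: float, flag: bool, epsilon: float, near_zero_trigger: bool) -> bool:
    """Decide whether to run the learning processing (S75) for one time step.

    u2: second operation amount u2(t) as computed at S717 (assumed pre-invalidation).
    flag: True if u2(t) was invalidated (S720, S721).
    epsilon: user-settable small value defining the band 0 +/- epsilon.
    near_zero_trigger: False -> embodiment rule (learn when the flag is ON);
                       True  -> modified rule (learn when |u2(t)| <= epsilon,
                                regardless of whether u2(t) lies in range R).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if near_zero_trigger:
        return abs(u2) <= epsilon
    return flag
```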

[0093] In the above-described embodiment, a machine learning model based on a tree structure is used, but the present invention is not limited to such a configuration. For example, other machine learning models, such as a neural network or a support vector machine, may be used.

INDUSTRIAL APPLICABILITY

[0094] The present invention can be used in various industries that use control devices.

REFERENCE SIGNS LIST

[0095] 1 control unit
[0096] 2 storage unit
[0097] 3 I/O unit
[0098] 4 input unit
[0099] 5 display unit
[0100] 6 communication unit
[0101] 10 basic system
[0102] 11 first PID controller
[0103] 12 control mechanism
[0104] 100 control device
[0105] 121 operation unit
[0106] 122 detection unit
[0107] 13 data logger
[0108] 30 extended system
[0109] 31 first controller
[0110] 32 control mechanism
[0111] 33 data logger
[0112] 34 learning processing unit
[0113] 35 prediction processing unit
[0114] 37 second controller
[0115] 38 invalidation processing unit
[0116] 39 determination unit
[0117] 200 feedback system
[0118] 201 controller
[0119] 202 control mechanism


CPC classification

G05B13/0265, G06N20/00, G05B6/02, G05B13/048, G06N5/01

International classification

G05B6/02, G05B13/02, G05B13/04
