INTEGRATION-ORIENTED INTELLIGENT SPEED TRAJECTORY OPTIMIZATION METHOD AND SYSTEM FOR AUTONOMOUS TRAIN
20260001582 · 2026-01-01
Assignee
Inventors
- Hairong Dong (Beijing, CN)
- Min Zhou (Beijing, CN)
- Haifeng Song (Beijing, CN)
- Ling Liu (Beijing, CN)
- Xiaoyong Wang (Beijing, CN)
- Xuan Liu (Beijing, CN)
Cpc classification
International classification
Abstract
The present invention relates to an integration-oriented intelligent speed trajectory optimization method and system for an autonomous train. The method includes: constructing an autonomous train speed trajectory optimization model under virtual coupling based on a discrete distance; converting the autonomous train speed trajectory optimization model into a Markov decision process; using a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process, to obtain a trained neural network and agent; and deploying the trained neural network and agent to an autonomous train, to perform an autonomous train speed trajectory optimization decision, so that safe, efficient, and comfortable train autonomous operations can be implemented.
Claims
1. An integration-oriented intelligent speed trajectory optimization method for an autonomous train, comprising: constructing an autonomous train speed trajectory optimization model under virtual coupling based on a discrete distance; converting the autonomous train speed trajectory optimization model into a Markov decision process; using a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process, to obtain a trained neural network and agent; and deploying the trained neural network and agent to an autonomous train, to perform an autonomous train speed trajectory optimization decision, wherein the autonomous train speed trajectory optimization model comprises a plurality of constraint conditions; a first constraint condition in the plurality of constraint conditions is as follows:
2. The method according to claim 1, wherein a second constraint condition in the plurality of constraint conditions is as follows:
3. The method according to claim 2, wherein the autonomous train speed trajectory optimization model further comprises an objective function; the objective function is as follows:
4. The method according to claim 3, wherein the using a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process comprises: generating a plurality of training scenarios for performing reinforcement learning on the agent in the Markov decision process, wherein each training scenario in the plurality of training scenarios comprises but is not limited to a line length, a line topological structure, and line speed restriction information; a training completion judgment step: judging whether training of the neural network and the agent is completed; if the training of the neural network and the agent is not completed, randomly selecting a target training scenario from the plurality of training scenarios, resetting a reinforcement learning environment based on the target training scenario, and determining a maximum speed trajectory of the autonomous train and a planned speed trajectory of the front train; a storage execution step: selecting, by the agent, an action according to state information of a current environment and outputting the action to the environment, updating, by the environment, the state information and calculating a reward value after executing the action, storing the action, state information of a previous time step of the current environment, state information of a current time step of the current environment, and the reward value in a memory buffer as a group of storage data, randomly selecting target group storage data from the memory buffer, and updating a parameter of the neural network by using the target group storage data; judging whether the autonomous train reaches an end point; if the autonomous train does not reach the end point, returning to execute the storage execution step; and if the autonomous train reaches the end point, returning to the training completion judgment step, the cycle being stopped once the training is completed.
5. The method according to claim 4, wherein a state of the Markov decision process is as follows:
6. The method according to claim 4, wherein a reward function of the Markov decision process is as follows:
7. The method according to claim 4, wherein an environment of the Markov decision process comprises a train dynamics simulation environment and a train operation environment under the virtual coupling.
8. An integration-oriented intelligent speed trajectory optimization system for an autonomous train, comprising: a construction module, configured to construct an autonomous train speed trajectory optimization model under virtual coupling based on a discrete distance; a conversion module, configured to convert the autonomous train speed trajectory optimization model into a Markov decision process; a training module, configured to use a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process, to obtain a trained neural network and agent; and a deployment module, configured to deploy the trained neural network and agent to an autonomous train, to perform an autonomous train speed trajectory optimization decision, wherein the autonomous train speed trajectory optimization model comprises a plurality of constraint conditions; a first constraint condition in the plurality of constraint conditions is as follows:
9. An electronic device, comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program to implement the integration-oriented intelligent speed trajectory optimization method for an autonomous train according to claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0032] To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly describes the accompanying drawings required for describing the embodiments of the present invention. It should be understood that, the following accompanying drawings show merely some embodiments of the present invention, and therefore should not be regarded as a limitation on the scope. Those of ordinary skill in the art may still derive other related drawings from these accompanying drawings without creative efforts.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0041] To better explain the present invention and facilitate understanding, the present invention is described in detail below with reference to the specific implementations and the accompanying drawings.
[0042] For problems of changeable virtual coupling scenarios, difficulty of dynamic decoupling and coupling decisions, low utilization of line resources, and the like caused by high-speed railways, intercity railways, and other railway lines with complex line structures, an embodiment of the present invention provides an integration-oriented intelligent speed trajectory optimization method and system for an autonomous train. An autonomous train speed trajectory optimization model under virtual coupling based on a discrete distance is constructed, the autonomous train speed trajectory optimization model is converted into a Markov decision process, a deep reinforcement learning algorithm TD3 is used to train a neural network and an agent in the Markov decision process, and a trained neural network and agent are deployed to an autonomous train, to perform an autonomous train speed trajectory optimization decision, so that safe, efficient, and comfortable train autonomous operations can be implemented.
[0043] To better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although the exemplary embodiments of the present invention are shown in the accompanying drawings, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments described herein. On the contrary, these embodiments are provided so that the present invention can be understood more clearly and thoroughly and so that the scope of the present invention can be fully conveyed to a person skilled in the art.
[0044] Referring to
[0049] For step S110, the autonomous train speed trajectory optimization model includes a plurality of constraint conditions and an objective function.
[0050] It should be understood that a specific condition of the constraint condition and a specific function of the objective function both may be set according to actual requirements. The embodiment of the present invention is not limited thereto.
[0051] Optionally, the autonomous train may be regarded as a mass point, and the forces acting on it during operation include the actual traction or braking force output by the autonomous train, the Davis force, and an additional resistance force. A fundamental dynamics model of the autonomous train is shown in formula (1), which is specifically as follows:
[0053] The force actually output by the autonomous train a at the segment s may be calculated according to a formula (2) that is specifically as follows:
represents a ratio of the actual traction force output by the autonomous train a at the segment s to the maximum traction force that can be output;
(s) represents a ratio of the actual braking force output by the autonomous train a at the segment s to the maximum braking force that can be output; the end speed of the autonomous train is used to calculate the maximum traction force and the maximum braking force that can be output by the autonomous train at the segment s, so that $tf^a(v^a(s))$ represents the maximum traction force that can be output by the autonomous train a at the segment s, calculated from that end speed, and $bf^a(v^a(s))$ represents the maximum braking force that can be output by the autonomous train a at the segment s, calculated from that end speed.
[0055] A relationship between end speeds of the autonomous train at two continuous segments is described through a formula (3) that is specifically as follows:
[0057] A change amount of the acceleration of the autonomous train at the segment s is calculated through a formula (4) that is specifically as follows:
represents the change amount of the acceleration of the autonomous train a at the segment s; and $acc^a(s)$ represents the acceleration of the autonomous train a at the segment s.
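The relationships described for formulas (3) and (4) can be illustrated with a minimal Python sketch. It assumes constant acceleration within each segment, so that consecutive end speeds are linked by the standard kinematic identity, and it takes the change amount of the acceleration as the absolute difference between consecutive segments. Both assumptions and the function names `end_speed` and `acceleration_change` are illustrative and are not taken from the formulas of the present invention.

```python
import math


def end_speed(v_prev: float, acc: float, length: float) -> float:
    """End speed (m/s) of a segment under constant acceleration.

    Uses v_end^2 = v_prev^2 + 2 * acc * length, clamped at zero so that a
    strong braking action cannot produce a negative speed.
    """
    return math.sqrt(max(v_prev ** 2 + 2.0 * acc * length, 0.0))


def acceleration_change(acc_prev: float, acc: float) -> float:
    """Magnitude of the acceleration change between consecutive segments."""
    return abs(acc - acc_prev)
```

A comfort term such as the accumulated acceleration change in the objective function could then be obtained by summing `acceleration_change` over all segments.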
[0059] Actual operation time of the autonomous train at the segment s may be calculated according to a formula (5) that is specifically as follows:
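As a hedged illustration of a segment operation time, the sketch below assumes constant acceleration within a segment, in which case the time equals the segment length divided by the mean of the entry speed and the end speed. This is an assumed model rather than formula (5) itself, and the function name `segment_run_time` is hypothetical.

```python
def segment_run_time(v_prev: float, v_end: float, length: float) -> float:
    """Time (s) to traverse `length` metres when the speed changes
    linearly in time from v_prev to v_end (both in m/s)."""
    mean_speed = (v_prev + v_end) / 2.0
    if mean_speed <= 0.0:
        # A train that enters and leaves at zero speed never clears the segment.
        raise ValueError("train does not move through the segment")
    return length / mean_speed
```

Summing this value over the segments from segment 0 to the segment s gives the time at which the train reaches the tail end of the segment s.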
[0061] The autonomous train may generate a maximum speed trajectory according to static line information (for example, a slope gradient and a static speed restriction) and a received temporary speed restriction order, and a formula (6) limits a speed of the autonomous train at a tail end of the segment s to no more than a maximum speed, which is specifically as follows:
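$$v^a(s) \le V^a(s) \qquad (6)$$
[0062] In the formula, $v^a(s)$ represents the end speed of the autonomous train a at the segment s, and $V^a(s)$ represents the maximum speed of the autonomous train a at the tail end of the segment s.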
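One way to picture how a maximum speed trajectory might be assembled from a static speed restriction and temporary speed restriction orders is sketched below. The function name `max_speed_profile`, the tuple layout `(start, end, limit)`, and the overlap rule are assumptions made for illustration; positions are in metres and limits in km/h, matching the units used for the temporary speed restrictions in the test scenarios.

```python
def max_speed_profile(segment_lengths, static_limits, temporary_restrictions):
    """Return one speed limit per segment.

    Each segment takes its static limit, lowered by any temporary
    restriction (start, end, limit) that overlaps the segment.
    """
    profile = []
    start = 0.0
    for length, static_limit in zip(segment_lengths, static_limits):
        end = start + length
        limit = static_limit
        for tsr_start, tsr_end, tsr_limit in temporary_restrictions:
            # Half-open overlap test between [start, end) and [tsr_start, tsr_end).
            if tsr_start < end and tsr_end > start:
                limit = min(limit, tsr_limit)
        profile.append(limit)
        start = end
    return profile
```

A zero-length restriction with a limit of 0, which marks an interrupt, does not overlap any segment under this rule and would need separate handling.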
[0063] Operation energy consumption of the autonomous train at the segment s is calculated through a formula (7) that is specifically as follows:
[0064] In the formula, $ec^a(s)$ represents the operation energy consumption of the autonomous train a at the segment s;
(s) represents the ratio of the actual traction force output by the autonomous train a at the segment s to the maximum traction force that can be output; $tf^a(v^a(s))$ represents the maximum traction force that can be output by the autonomous train a at the segment s; and $l(s)$ represents the length of the segment s.
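The three quantities defined above suggest that the energy of a segment is the traction work done over its length. The following minimal sketch assumes exactly that product form, with no regenerative braking term; the function names `segment_energy` and `total_energy_kwh` and the unit conversion are illustrative additions.

```python
def segment_energy(traction_ratio: float, max_traction_force: float,
                   length: float) -> float:
    """Traction work (J) over one segment: ratio * maximum force (N) * length (m)."""
    if not 0.0 <= traction_ratio <= 1.0:
        raise ValueError("traction_ratio must lie in [0, 1]")
    return traction_ratio * max_traction_force * length


def total_energy_kwh(segment_energies_joule) -> float:
    """Sum per-segment energies and convert joules to kilowatt-hours."""
    return sum(segment_energies_joule) / 3.6e6
```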
[0065] A requirement of tracking a front train thereof safely by the autonomous train is ensured through a formula (8) that is specifically as follows:
[0066] In the formula, s represents a segment index; $d(s)=\sum_{s^* \in [0, \ldots, s]} l(s^*)$, where $s^*$ represents any one of the segments from segment 0 to the segment s, and $l(s^*)$ represents the length of the segment $s^*$; $ebd(v^a(s))$ represents the emergency braking distance of the autonomous train a at the segment s when its end speed is $v^a(s)$; .sub.l represents the length of the front train; .sub.sm represents the minimum safety margin; since the speed trajectory $V^l$ of the front train is known, $P^l(t(s))$ may be used to represent the position of the front train at the moment $t(s)$; $t(s)$ represents the time at which the autonomous train reaches the tail end of the segment s, $t(s)=\sum_{s^* \in [0, \ldots, s]} rt^a(s^*)$, where $rt^a(s^*)$ represents the operation time of the autonomous train at the segment $s^*$; since the speed trajectory $V^l$ of the front train is known, $V^l(t(s))$ may be used to represent the speed of the front train at the moment $t(s)$, and $ebd(V^l(t(s)))$ represents the emergency braking distance when the speed is $V^l(t(s))$; s represents the segment that is connected to and in front of the segment s; S represents the segment index set;
represents a position of an outbound switch; and
represents a position of an inbound switch.
[0067] It should be noted herein that in the virtual coupling, the front and rear trains operate according to a relative braking mode; that is, after the two trains perform emergency braking simultaneously, the position of the rear train head cannot exceed the position of the front train tail minus the safety margin .sub.sm. Since the autonomous train may be on a different route from its front train when approaching an outbound switch, that is, the autonomous train and the front train operate in parallel, the two trains need to meet the safety rule of relative braking only when the end point of emergency braking of the autonomous train exceeds the switch junction. Similarly, when the end point of emergency braking of the autonomous train exceeds the switch junction of the inbound route to be reached, the autonomous train and the front train do not need to comply with the safety rule of relative braking, even though they still operate on the same track.
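The relative braking rule can be sketched as a simple check on the stopping points of the two trains. The sketch assumes a constant emergency deceleration, takes the front train position as its head position, and ignores the switch junction exemptions described above; the names `emergency_braking_distance` and `is_tracking_safe` and the default deceleration are hypothetical.

```python
def emergency_braking_distance(speed: float, deceleration: float = 1.0) -> float:
    """Stopping distance (m) from `speed` (m/s) at constant deceleration (m/s^2)."""
    return speed * speed / (2.0 * deceleration)


def is_tracking_safe(rear_head_position: float, rear_speed: float,
                     front_head_position: float, front_speed: float,
                     front_length: float, safety_margin: float,
                     deceleration: float = 1.0) -> bool:
    """True if, after simultaneous emergency braking, the rear train head stops
    no further than the front train tail minus the safety margin."""
    rear_stop = rear_head_position + emergency_braking_distance(rear_speed, deceleration)
    front_tail_stop = (front_head_position - front_length
                       + emergency_braking_distance(front_speed, deceleration))
    return rear_stop <= front_tail_stop - safety_margin
```

With equal speeds the two braking distances cancel, so the permitted gap shrinks to roughly the safety margin, which is what allows trains under virtual coupling to follow each other closely.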
[0068] A safe operation of the autonomous train at a switch segment may be ensured through a formula (9). Influences of switches on the autonomous train in a virtual coupling operation mode are as shown in
[0069] As shown in
so that the switch can complete state rotation only at
is duration of a switch rotation action, so that earliest time for the autonomous train to reach the switch is
that is,
Similarly, earliest time
for the autonomous train to reach an inbound switch of a next station is at least
represents a segment index at which the outbound switch is located; and
represents a segment index at which the inbound switch is located.
[0070] The formula (9) is specifically as follows:
[0071] In the formula,
represents the segment index at which the outbound switch is located;
represents time for the autonomous train a to reach the outbound switch;
represents the time for the front train to leave the outbound switch; Ads represents the duration of the switch rotation action;
represents the segment index at which the inbound switch is located;
represents time for the autonomous train a to reach the inbound switch; and
represents time for the front train to leave the inbound switch.
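The timing requirement at switches can be illustrated as follows: the autonomous train may reach a switch no earlier than the moment the front train leaves it plus the duration of the switch rotation action. This is a sketch under that reading of the text; the function names and argument names are hypothetical.

```python
def earliest_switch_arrival(front_leave_time: float, rotation_duration: float) -> float:
    """Earliest time (s) at which the following train may reach a switch."""
    return front_leave_time + rotation_duration


def switch_constraints_ok(arrive_outbound: float, front_leave_outbound: float,
                          arrive_inbound: float, front_leave_inbound: float,
                          rotation_duration: float) -> bool:
    """Check the timing requirement at both the outbound and the inbound switch."""
    return (arrive_outbound >= earliest_switch_arrival(front_leave_outbound, rotation_duration)
            and arrive_inbound >= earliest_switch_arrival(front_leave_inbound, rotation_duration))
```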
[0072] Optionally, the objective function of the autonomous train speed trajectory optimization model is shown in formula (10), which minimizes a weighted sum of the total operation time, the total energy consumption, and the accumulated acceleration change of the autonomous train, and is specifically as follows:
represents an accumulated value of an acceleration change of the autonomous train a at the segment s.
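A weighted-sum objective of this kind can be sketched in a few lines. The weight names and default values below are assumptions, since the text does not state the weights; each input is a per-segment sequence.

```python
def objective(run_times, energies, acc_changes,
              w_time: float = 1.0, w_energy: float = 1.0, w_comfort: float = 1.0) -> float:
    """Weighted sum of total operation time, total energy consumption,
    and accumulated acceleration change over all segments."""
    return (w_time * sum(run_times)
            + w_energy * sum(energies)
            + w_comfort * sum(acc_changes))
```

In practice the three terms have different units and magnitudes, so the weights also act as scaling factors and would need to be tuned for the line in question.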
[0074] For step S130, a specific process of the using a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process, to obtain a trained neural network and agent, may be set according to actual requirements. The embodiment of the present invention is not limited thereto.
[0075] It should be noted herein that the neural network includes an LSTM network, an Actor network, and a Critic network.
[0076] Optionally, a deep reinforcement learning framework constructed in the present invention is as shown in
[0077] For the agent and the action:
[0078] As a controller of the autonomous train, the agent selects an action $a_s$ at each segment s to control the force actually output by the train. To implement more stable train control, the value range of $a_s$ is [-1, 1], and $a_s$ represents the ratio of the force actually output by the train to the maximum force that can be output. If $a_s<0$, the train outputs a braking force of $a_s\,bf^a(v^a(s))$. If $a_s>0$, the train outputs a traction force of $a_s\,tf^a(v^a(s))$. It should be noted herein that the speed used in these expressions is the end speed of the previous segment; that is, the maximum braking force and traction force that can be output at the current segment s are calculated from the end speed of the previous segment.
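The mapping from a single action value to an output force can be sketched as below. The function name `action_to_force` and the convention of returning a signed force (negative for braking) are assumptions for illustration; the two maximum forces are expected to be evaluated beforehand from the end speed of the previous segment.

```python
def action_to_force(action: float, max_traction: float, max_braking: float) -> float:
    """Map an action in [-1, 1] to a signed force in newtons.

    Positive actions scale the maximum traction force; negative actions
    scale the maximum braking force and yield a negative (retarding) force.
    """
    a = max(-1.0, min(1.0, action))  # clip out-of-range actions
    if a > 0.0:
        return a * max_traction
    if a < 0.0:
        return a * max_braking
    return 0.0  # coasting
```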
[0079] For the environment and the state:
[0080] As shown in
[0081] The state of the autonomous train is composed of nine factors, which is as shown in a formula (11):
[0082] In the formula, .sub.s represents the state of the Markov decision process; $v^a(s)$ represents the end speed of the autonomous train a at the segment s; $d(s)$ represents the position of the tail end of the segment s; $l(s)$ represents the length of the segment s; $V^a(s)$ represents the maximum speed of the autonomous train a at the segment s; $rrt^a(s)$ represents the remaining operation time of the autonomous train a from the tail end of the segment s to the next switch; $rd^a(s)$ represents the distance of the autonomous train a from the tail end of the segment s to the next switch; $acc^a(s)$ represents the acceleration of the autonomous train a at the tail end of the segment s;
represents a maximum safety action of the autonomous train a at the segment s; and
represents a minimum safety action of the autonomous train a at the segment s.
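The nine factors listed above can be collected in a small container before being passed to a neural network. The class name `TrainState` and its field names are hypothetical labels chosen for this sketch; only the order and meaning of the factors follow the text.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class TrainState:
    end_speed: float                 # end speed at the segment
    position: float                  # position of the tail end of the segment
    segment_length: float            # length of the segment
    max_speed: float                 # maximum speed at the segment
    remaining_time_to_switch: float  # remaining operation time to the next switch
    distance_to_switch: float        # distance to the next switch
    acceleration: float              # acceleration at the tail end of the segment
    max_safe_action: float           # maximum safety action
    min_safe_action: float           # minimum safety action

    def as_vector(self) -> list:
        """Flatten the state into a list of nine floats, in field order."""
        return list(astuple(self))
```

In a real controller the factors would normally be normalized to comparable ranges before training, which this sketch leaves out.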
[0083] For the reward:
[0084] A definition of the reward function is as shown in a formula (12):
[0085] In the formula, $r_s$ represents the reward value; case1 represents that the action selected by the agent does not violate the constraint represented by the formula (8) or the formula (9); case2 represents that the action selected by the agent violates the constraint represented by the formula (8) or the formula (9); and the preset negative number appearing in the formula may be set according to actual requirements, for example, to a negative number with a relatively large absolute value.
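A two-case reward of this shape can be sketched as follows. The sketch assumes that the preset negative number is returned when a constraint is violated and that the reward in the other case is the negative of a weighted per-segment cost mirroring the objective function; both are assumptions, as are the function name `reward`, the weights, and the default penalty of -1000.

```python
def reward(violates_constraint: bool, run_time: float, energy: float,
           acc_change: float, weights=(1.0, 1.0, 1.0),
           penalty: float = -1000.0) -> float:
    """Per-segment reward: a fixed penalty on a violation of the tracking or
    switch constraint, otherwise the negative weighted cost of the segment."""
    if penalty >= 0.0:
        raise ValueError("penalty must be a negative number")
    if violates_constraint:
        return penalty
    w_time, w_energy, w_comfort = weights
    return -(w_time * run_time + w_energy * energy + w_comfort * acc_change)
```

Choosing a penalty whose absolute value is much larger than any achievable cost keeps unsafe actions strictly worse than any safe trajectory.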
[0086] For ease of understanding the present invention, the following describes a specific process of the using a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process, to obtain a trained neural network and agent.
[0087] Optionally, referring to
[0098] For step S140, a specific process of the deploying the trained neural network and agent to an autonomous train, to perform an autonomous train speed trajectory optimization decision may be set according to actual requirements. The embodiment of the present invention is not limited thereto.
[0099] Optionally, the trained neural network and agent are deployed to the autonomous train, and an operation scenario that requires autonomous train speed trajectory optimization is generated. Then the reinforcement learning environment is reset according to scenario data, and the maximum speed trajectory of the autonomous train and a planned speed trajectory of the front train are calculated. Then whether the autonomous train reaches the end point at the current time step is judged. If the autonomous train does not reach the end point at the current time step, the agent selects the action according to the state information of the current environment and outputs the action to the environment, and the environment updates the state information and calculates the reward value after executing the action and returns to execute the step of judging whether the autonomous train reaches the end point at the current time step until the autonomous train reaches the end point. If the autonomous train reaches the end point at the current time step, an optimized autonomous train speed trajectory and a train control sequence are output, and an output result is displayed visually.
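The decision loop described above can be sketched as a plain rollout: reset the environment, let the agent act until the end point is reached, and collect the resulting speed trajectory and control sequence. `ToyTrainEnv` is a hypothetical stand-in environment (a point mass on equal-length segments with a placeholder reward), and `policy` stands for the trained agent; neither is the environment or network of the present invention.

```python
import math


class ToyTrainEnv:
    """Hypothetical stand-in: a point-mass train on equal-length segments."""

    def __init__(self, n_segments=5, segment_length=100.0, max_acc=1.0):
        self.n_segments = n_segments
        self.segment_length = segment_length
        self.max_acc = max_acc
        self.segment = 0
        self.speed = 0.0

    def reset(self):
        self.segment, self.speed = 0, 0.0
        return (self.speed, self.segment)

    def step(self, action):
        # Scale the clipped action to an acceleration and advance one segment.
        acc = max(-1.0, min(1.0, action)) * self.max_acc
        self.speed = math.sqrt(
            max(self.speed ** 2 + 2.0 * acc * self.segment_length, 0.0))
        self.segment += 1
        reward = 0.0  # placeholder; a real environment would compute it
        done = self.segment >= self.n_segments  # end point reached
        return (self.speed, self.segment), reward, done


def rollout(env, policy):
    """Run one episode and return the speed trajectory and control sequence."""
    state = env.reset()
    speeds, actions = [], []
    done = False
    while not done:
        action = policy(state)
        state, _reward, done = env.step(action)
        actions.append(action)
        speeds.append(state[0])
    return speeds, actions
```

For example, `rollout(ToyTrainEnv(), lambda state: 0.5)` applies a constant half-traction policy over five segments and returns five speeds and five actions.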
[0100] It should be noted herein that the agent used in training and the agent deployed to the autonomous train may or may not be the same controller.
[0101] Therefore, through the above technical solutions, a speed trajectory optimization model of the autonomous train under a virtual coupling operation is constructed, and an influence of an internal structure of a station on coupling and decoupling is considered, so that safe and efficient operations and dynamic coupling and decoupling processes of the autonomous train are implemented.
[0102] The constructed model is converted into the Markov decision process, and across a large number of operation scenarios, an LSTM-TD3 algorithm is used to train the agent, which serves as the controller of the autonomous train, to learn a control sequence generation policy, so as to implement real-time decisions of the autonomous train.
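For readers unfamiliar with TD3, the core of its critic update is the target value, which combines target policy smoothing with the minimum of two target critics. The sketch below shows only that computation in plain Python; the function name `td3_target`, the hyperparameter defaults, and the use of callables in place of neural networks are assumptions, and the LSTM part of the algorithm is not represented.

```python
import random


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, rng=random):
    """Compute the TD3 target value for one transition.

    next_action is the target actor's action for the next state;
    q1_target and q2_target map an action to the target critics'
    estimates for that next state.
    """
    # Target policy smoothing: add clipped Gaussian noise to the next action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    smoothed = max(-1.0, min(1.0, next_action + noise))
    # Clipped double Q-learning: take the smaller of the two estimates.
    q_min = min(q1_target(smoothed), q2_target(smoothed))
    return reward + gamma * (0.0 if done else 1.0) * q_min
```

TD3 additionally delays actor and target-network updates relative to critic updates, which this fragment does not show.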
[0103] To enable a person skilled in the art to learn about the technical solutions of the present invention more clearly, the technology will be described below with reference to specific scenarios.
[0104] Specifically, the scenarios and data used for training and applying the agent are selected from real data of the Beijing-Shanghai high-speed railway. After the agent is trained, four scenarios are selected to test the agent. Test scenario information is shown in Table 1 below. The first column is the scenario serial number. The second column represents the station dwell mode of the autonomous train and its front train at two consecutive stations; for example, [(0, 1), (1, 1)] represents that the front train does not stop at the first station and stops at the second station, and the autonomous train stops at both the first station and the second station. The third column is the total length of the scenario. The final column is temporary speed restriction information; for example, [10000, 20000]: 200 represents that the start point and the end point of a temporary speed restriction segment are 10000 m and 20000 m respectively, and the speed restriction value is 200 km/h. In the third scenario, both the start point and the end point of the temporary speed restriction are 20000 m, and the speed restriction value is 0, which represents that the scenario is an interrupt scenario. Results obtained by the trained agent in the four test scenarios are as shown in
TABLE 1
Scenario serial number | Station dwell mode | Total length | Temporary speed restriction
1 | [(1, 1), (1, 1)] | 100800 | none
2 | [(1, 1), (1, 1)] | 40000 | [10000, 20000]: 200
3 | [(1, 1), (1, 1)] | 100800 | [20000, 20000]: 0
4 | [(0, 1), (1, 1)] | 80000 | none
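The four test scenarios can be encoded as data for a test harness. The values below are copied from Table 1; the class name `Scenario`, its field names, and the helper `is_interrupt` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scenario:
    serial: int
    # ((front train: station 1, station 2), (autonomous train: station 1, station 2))
    dwell_mode: Tuple[Tuple[int, int], Tuple[int, int]]
    total_length: int
    # Each restriction is (start, end, speed limit).
    restrictions: Tuple[Tuple[int, int, int], ...] = ()

    def is_interrupt(self) -> bool:
        """A restriction with a speed limit of 0 marks an interrupt scenario."""
        return any(limit == 0 for _start, _end, limit in self.restrictions)


SCENARIOS = (
    Scenario(1, ((1, 1), (1, 1)), 100800),
    Scenario(2, ((1, 1), (1, 1)), 40000, ((10000, 20000, 200),)),
    Scenario(3, ((1, 1), (1, 1)), 100800, ((20000, 20000, 0),)),
    Scenario(4, ((0, 1), (1, 1)), 80000),
)
```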
[0105] It should be understood that the above integration-oriented intelligent speed trajectory optimization method for an autonomous train is exemplary only. A person skilled in the art may make various variations according to the above method, and modified or variated content falls within the protection scope of the present invention.
[0106] Referring to
[0111] The autonomous train speed trajectory optimization model includes a plurality of constraint conditions; a first constraint condition in the plurality of constraint conditions is as follows:
represents a position of an outbound switch junction; and
represents a position of an inbound switch junction.
[0113] Since the system described in the above embodiment of the present invention is a system used to implement the method of the above embodiment of the present invention, based on the method described in the above embodiment of the present invention, a person skilled in the art can learn about a specific structure and variation of the system/apparatus. Therefore, details are not described herein again. All the systems used in the method of the embodiment of the present invention belong to the protection scope of the present invention.
[0114] In addition, as shown in
[0115] The memory 703 is configured to store a computer program.
[0116] The processor 701 implements the following steps when configured to execute the program stored on the memory 703: [0117] constructing an autonomous train speed trajectory optimization model under virtual coupling based on a discrete distance; converting the autonomous train speed trajectory optimization model into a Markov decision process; using a deep reinforcement learning algorithm TD3 to train a neural network and an agent in the Markov decision process, to obtain a trained neural network and agent; and deploying the trained neural network and agent to an autonomous train, to perform an autonomous train speed trajectory optimization decision.
[0118] The autonomous train speed trajectory optimization model includes a plurality of constraint conditions; a first constraint condition in the plurality of constraint conditions is as follows:
represents a position of an outbound switch; and
represents a position of an inbound switch.
[0120] The communication bus in the above terminal may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used for representation in the figure, but it does not mean that there is only one bus or one type of bus.
[0121] The communication interface is configured for communication between the above terminal and another device.
[0122] The memory may include a random access memory (RAM) or a non-volatile memory, for example, at least one disk memory. Optionally, the memory may also be at least one storage device located far away from the above-mentioned processor.
[0123] The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like, or may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any other programmable logic device, discrete gate or transistor logic device, and discrete hardware component.
[0124] As shown in
[0125] In another embodiment of the present invention, a computer program product including an instruction is further provided. When the computer program product is executed on a computer, the computer executes the integration-oriented intelligent speed trajectory optimization method for an autonomous train in the above embodiments.
[0126] A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may be in the form of a hardware only embodiment, a software only embodiment, or an embodiment with a combination of software and hardware. Moreover, the present invention may be in the form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a disk memory, a CD-ROM, an optical memory, and the like) that include computer-usable program code.
[0127] The present invention is described with reference to the flowcharts and/or block diagrams of the method, the device (system) and the computer program product according to the embodiments of the present invention. It should be understood that a computer program instruction may be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams.
[0128] It should be noted that in the claims, any reference numerals located between parentheses shall not be construed as limiting the claims. The word "comprise" does not exclude the presence of components or steps not listed in the claims. The word "a/an" or "one" preceding a component does not exclude the presence of a plurality of such components. The present invention may be implemented by means of hardware including a plurality of different components and by means of a suitably programmed computer. In the claims enumerating a plurality of apparatuses, a plurality of apparatuses in these apparatuses may be embodied by the same hardware. The use of the words "first", "second", "third", and the like is for description convenience only and does not represent any order. These terms may be understood as a part of a component name.
[0129] In addition, it should be noted that in the description of the specification, descriptions of terms such as "an embodiment", "some embodiments", "embodiment", "example", "specific example", "some examples", or the like mean that a specific feature, structure, material, or characteristic described in conjunction with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific feature, structure, material, or characteristic described may be combined in any suitable manner in any one or more embodiments or examples. In addition, different embodiments or examples described in the specification and features of different embodiments or examples may be combined and integrated by a person skilled in the art without contradicting each other.
[0130] Although the preferred embodiments of the present invention have been described, a person skilled in the art can make additional changes and modifications to these embodiments after knowing the basic inventive concept. Therefore, the claims should be construed as encompassing the preferred embodiments and all the changes and modifications falling in the scope of the present invention.
[0131] Obviously, various modifications and variations to the present invention can be made by a person skilled in the art without departing from the spirit and scope of the present invention. Therefore, the present invention should also encompass all such modifications and variations within the scope of the claims of the present invention and its equivalents.