TEMPORAL DETECTOR SCAN IMAGE METHOD, SYSTEM, AND MEDIUM FOR TRAFFIC SIGNAL CONTROL
20220198925 · 2022-06-23
Inventors
- Soheil MOHAMAD ALIZADEH SHABESTARY (Toronto, CA)
- Baher ABDULHAI (Mississauga, CA)
- Hao Hai MA (St. Kleinburg, CA)
- Scott Patrick SANNER (Scarborough, CA)
CPC classification
G06F18/214
PHYSICS
G06N3/006
PHYSICS
G06V20/588
PHYSICS
International classification
Abstract
Methods, systems, and processor-readable media for generating a temporal detector scan image for traffic signal control are described. An intelligent adaptive cycle-level traffic signal controller uses a deep learning module for traffic signal control, applying image processing techniques to traffic environment data formatted as image data, called “temporal detector scan image” data. A temporal detector scan image is generated by formatting point detector data collected by point detectors (e.g. inductive-loop traffic detectors) over time into two-dimensional matrices representing the traffic environment state in a plurality of lanes over a plurality of points in time, combined with traffic signal data indicating the state of a traffic signal of each lane. The deep learning module may be trained using temporal detector scan image data collected from a traffic environment, and then may be deployed to control the traffic signal for the traffic environment once trained.
Claims
1. A method for generating a temporal detector scan image for traffic signal control, the method comprising: obtaining temporal traffic state data comprising: first location traffic data indicating a traffic state at a first location in each of one or more lanes of the traffic environment at each of a plurality of points in time; second location traffic data indicating a traffic state at a second location in each of the one or more lanes at each of the plurality of points in time; and traffic signal data indicating a traffic signal state of each of the one or more lanes at each of the plurality of points in time; and generating a temporal detector scan image by: processing the first location traffic data to generate a two-dimensional first location traffic matrix; processing the second location traffic data to generate a two-dimensional second location traffic matrix; and processing the traffic signal data to generate a two-dimensional traffic signal matrix.
2. The method of claim 1, further comprising: providing the temporal detector scan image as input to a deep learning module; and processing the temporal detector scan image using the deep learning module to generate traffic signal control data.
3. The method of claim 2, wherein: the deep learning module comprises a deep reinforcement learning module; and processing the temporal detector scan image comprises using the deep reinforcement learning module to generate traffic signal control data by applying a policy to the temporal detector scan image, the method further comprising: determining an updated state of the traffic environment following application of the traffic signal control data to the traffic signal; generating an updated temporal detector scan image based on the updated state of the traffic environment; generating a reward by applying a reward function to the temporal detector scan image and the updated temporal detector scan image; and adjusting the policy based on the reward.
4. The method of claim 3, wherein: the deep reinforcement learning module comprises a deep Q network; and the traffic signal control data comprises a decision between: extending a current phase of a cycle of the traffic signal; and advancing to a next phase of the cycle of the traffic signal.
5. The method of claim 3, wherein: the deep reinforcement learning module comprises a proximal policy optimization (PPO) module; and the traffic signal control data comprises a phase duration for at least one phase of a cycle of the traffic signal.
6. The method of claim 1, further comprising, for each location of the first locations and second locations: sensing vehicle traffic at the location using a point detector; generating point detector data for the location based on the sensed vehicle traffic; and generating the traffic state data based on the point detector data for each location.
7. The method of claim 6, wherein each point detector comprises an inductive-loop traffic detector.
8. The method of claim 6, wherein each point detector comprises a point camera.
9. The method of claim 1, wherein: the traffic environment comprises an intersection; and for each lane of the one or more lanes: the first location and second location in the lane are on the approach to the intersection; and the second location in the lane is closer to the intersection than the first location.
10. The method of claim 3, further comprising, for each location of the first locations and second locations: sensing vehicle traffic at the location using a point detector; generating point detector data for the location based on the sensed vehicle traffic; and generating the traffic state data based on the point detector data for each location, wherein: the traffic environment comprises an intersection; and for each lane of the one or more lanes: the first location and second location in the lane are on the approach to the intersection; and the second location in the lane is closer to the intersection than the first location.
11. A system for generating a temporal detector scan image for traffic signal control, comprising: a processor device; and a memory storing machine-executable instructions thereon which, when executed by the processor device, cause the system to: obtain temporal traffic state data comprising: first location traffic data indicating a traffic state at a first location in each of one or more lanes of the traffic environment at each of a plurality of points in time; second location traffic data indicating a traffic state at a second location in each of the one or more lanes at each of the plurality of points in time; and traffic signal data indicating a traffic signal state of each of the one or more lanes at each of the plurality of points in time; and generate a temporal detector scan image by: processing the first location traffic data to generate a two-dimensional first location traffic matrix; processing the second location traffic data to generate a two-dimensional second location traffic matrix; and processing the traffic signal data to generate a two-dimensional traffic signal matrix.
12. The system of claim 11, wherein: the memory further stores a deep learning module; and the instructions, when executed by the processor device, further cause the system to: provide the temporal detector scan image as input to the deep learning module; and process the temporal detector scan image using the deep learning module to generate traffic signal control data.
13. The system of claim 12, wherein: the deep learning module comprises a deep reinforcement learning module; processing the temporal detector scan image comprises using the deep reinforcement learning module to generate traffic signal control data by applying a policy to the temporal detector scan image; and the instructions, when executed by the processor device, further cause the system to: determine an updated state of the traffic environment following application of the traffic signal control data to the traffic signal; generate an updated temporal detector scan image based on the updated state of the traffic environment; generate a reward by applying a reward function to the temporal detector scan image and the updated temporal detector scan image; and adjust the policy based on the reward.
14. The system of claim 13, wherein: the deep reinforcement learning module comprises a deep Q network; and the traffic signal control data comprises a decision between: extending a current phase of a cycle of the traffic signal; and advancing to a next phase of the cycle of the traffic signal.
15. The system of claim 13, wherein: the deep reinforcement learning module comprises a proximal policy optimization (PPO) module; and the traffic signal control data comprises a phase duration for at least one phase of a cycle of the traffic signal.
16. The system of claim 11, wherein the instructions, when executed by the processor device, further cause the system to, for each location of the first locations and second locations: obtain point detector data for the location; and generate the traffic state data based on the point detector data for each location.
17. The system of claim 16, further comprising, for each location of the first locations and second locations, a point detector configured to generate the point detector data based on sensed vehicle traffic at the location.
18. The system of claim 17, wherein each point detector comprises an inductive-loop traffic detector.
19. The system of claim 17, wherein each point detector comprises a point camera.
20. A processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor device, cause the processor device to perform the method of claim 1.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0051] Reference will now be made, by way of example, to the accompanying drawings which show example embodiments of the present application, and in which:
[0060] Similar reference numerals may have been used in different figures to denote similar components.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[0061] In various examples, the present disclosure describes methods, systems, and processor-readable media for generating a temporal detector scan image for traffic signal control. An intelligent adaptive cycle-level traffic signal controller is described that uses a deep learning module for traffic signal control. The deep learning module applies image processing techniques to temporal detector scan image data.
[0062] Various embodiments are described below with reference to the drawings. The description of the example embodiments is broken into multiple sections. The Example Controller Devices section describes example devices or systems suitable for implementing example traffic signal controllers and methods. The Example Deep Learning Modules section describes how the controller learns and updates the parameters of an inference model, such as a deep reinforcement learning model, of the deep learning module. The Example Methods for Generating a Temporal Detector Scan Image for Traffic Signal Control section describes how temporal traffic state data received from point detectors in the traffic environment can be used to generate a temporal detector scan image, which the deep learning module can process using image processing techniques. The Example Training Methods section describes how temporal detector scan images (also called temporal detector scan image data) can be used to train the deep learning module of the controller. The Examples of Traffic Signal Control Data section describes the action space and outputs of the controller. The Examples of Traffic Environment State Data section describes the state space and inputs of the controller. The Example Reward Functions section describes the reward function of the controller. The Example Systems for Controlling Traffic Signals section describes the operation of the trained controller when it is used to control traffic signals in a real traffic environment.
[0063] Example Controller Devices
[0065] It will be appreciated that, whereas embodiments are described herein with reference to a traffic environment consisting of a single intersection managed by a single signal (e.g., a single set of traffic lights), in some embodiments the traffic environment may encompass multiple nodes or intersections within a transportation grid, and the controller may control multiple traffic signals.
[0067] The controller device 220 may include one or more processor devices 225, such as a processor, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), dedicated logic circuitry, a dedicated artificial intelligence processor unit, or combinations thereof. The controller device 220 may also include one or more optional input/output (I/O) interfaces 232, which may enable interfacing with one or more optional input devices 234 and/or optional output devices 236.
[0068] In the example shown, the input device(s) 234 (e.g., a maintenance console, a keyboard, a mouse, a microphone, a touchscreen, and/or a keypad) and output device(s) 236 (e.g., a maintenance console, a display, a speaker and/or a printer) are shown as optional and external to the controller device 220. In other examples, there may not be any input device(s) 234 and output device(s) 236, in which case the I/O interface(s) 232 may not be needed.
[0069] The controller device 220 may include one or more network interfaces 222 for wired or wireless communication with one or more devices or systems of a network, such as network 210. The network interface(s) 222 may include wired links (e.g., Ethernet cable) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communications. One or more of the network interfaces 222 may be used for sending control signals to the traffic signals 202, 204, 206, 208 and/or for receiving data from the point detectors (e.g., point detector data generated by inductive-loop traffic detectors or point cameras, or traffic state data based on the point detector data, as described below with reference to
[0070] The controller device 220 may also include one or more storage units 224, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The storage units 224 may be used for long-term storage of some or all of the data stored in the memory 228 described below.
[0071] The controller device 220 may include one or more memories 228, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)). The non-transitory memory(ies) 228 may store instructions for execution by the processor device(s) 225, such as to carry out examples described in the present disclosure. The memory(ies) 228 may include software instructions 238, such as for implementing an operating system and other applications/functions. In some examples, the memory(ies) 228 may include software instructions 238 for execution by the processor device 225 to implement a deep learning module 240, as described further below. The deep learning module 240 may be loaded into the memory(ies) 228 by executing the instructions 238 using the processor device 225.
[0072] In some embodiments, the deep learning module 240 is a deep reinforcement learning module, such as a deep Q network or a PPO module, as described below in the Example Deep Learning Modules section. The deep learning module 240 may be coded in the Python programming language using the TensorFlow machine learning library and other widely used libraries, including NumPy. It will be appreciated that other embodiments may use different software libraries and/or different programming languages.
[0073] The memory(ies) 228 may also include one or more samples of temporal traffic state data 250, which may be used as training data samples to train the deep learning module 240 and/or as input to the deep learning module 240 for generating traffic signal control data after the deep learning module 240 has been trained and the controller device 220 is deployed to control the traffic signals in a real traffic environment, as described in detail below. The temporal traffic state data 250 may include first location traffic data 252, second location traffic data 254, and traffic signal data 256, as described in detail below with reference to
[0074] In some examples, the controller device 220 may additionally or alternatively execute instructions from an external memory (e.g., an external drive in wired or wireless communication with the controller device 220) or may be provided executable instructions by a transitory or non-transitory computer-readable medium. Examples of non-transitory computer readable media include a RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage.
[0075] The controller device 220 may also include a bus 242 providing communication among components of the controller device 220, including those components discussed above. The bus 242 may be any suitable bus architecture including, for example, a memory bus, a peripheral bus or a video bus.
[0076] It will be appreciated that various components and operations described herein can be implemented on multiple separate devices or systems in some embodiments.
[0077] Example Deep Learning Modules
[0078] In some embodiments, a self-learning traffic signal controller interacts with a traffic environment and gradually finds an optimal strategy to apply to traffic signal control. The deep learning module uses deep learning algorithms to train a set of parameters or a policy of a deep learning model to perform traffic signal control. The deep learning module may use any type of deep learning algorithm, including supervised or unsupervised learning algorithms, to train any type of deep learning model, such as a convolutional neural network or other type of artificial neural network.
[0079] In some embodiments, the deep learning module (such as deep learning module 240) is a deep reinforcement learning module. The controller (such as controller device 220) generates traffic signal control data by executing the instructions 238 of the deep learning module 240 to apply a function to traffic environment state data (such as temporal traffic state data 250), and using a learned policy of the deep learning module 240 to determine a course of action (i.e., traffic signal control actions in the form of traffic signal control data) based on the output of the function. The function is approximated using a model trained using reinforcement learning, sometimes referred to herein as a “reinforcement learning model” or “RL model”. Thus, in some embodiments, the deep learning module 240 is a deep reinforcement learning module, which uses a reinforcement learning algorithm to train an RL model. The reinforcement learning model may be an artificial neural network, such as a convolutional neural network, in some embodiments. In some embodiments, the traffic environment state data (such as temporal traffic state data 250) may be formatted as one or more two-dimensional matrices, thereby allowing the convolutional neural network or other RL model to apply known image-processing techniques to generate the traffic signal control data.
[0080] Formally, the objective of the reinforcement learning model may be stated as follows: given the traffic demand trajectories over time d(t), t ∈ [0, t_e], find a control policy or control function R that minimizes the objective J, subject to the system equations and the constraints, such that the control variables (e.g., signal phasing) are given by u(t) = R[x(t), t], t ∈ [0, t_e], where x(t) is the vector of system state measurements.
[0081] Reinforcement learning (RL) is a technique suitable for optimal control problems that have highly complicated dynamics. These problems may be difficult to model, difficult to control, or both. In RL, the controller can be functionally represented as an agent having no knowledge of the environment in which it operates. In the early stages of training, the agent starts by taking random actions; this is called exploration. For each action, the agent observes the changes in the environment (e.g., through sensors monitoring a real traffic environment, or through receiving simulated traffic environment data from a simulator), and it also receives a numerical value called a reward, which indicates the degree of desirability of its actions. The objective of the agent is to optimize the cumulative reward over time, not the immediate reward it receives after any given action. This optimization of cumulative reward is necessary in domains such as traffic signal control, in which the actions of the agent affect the future state of the system, requiring the agent to consider the future consequences of its actions beyond their immediate impact. As training progresses, the agent learns about the environment and takes fewer random actions; instead, it takes actions that, based on its experience, lead to better performance of the system.
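By way of illustration, the agent-environment loop described above can be sketched in Python (the language named earlier in this disclosure). This is a minimal sketch under stated assumptions, not the disclosed implementation: the `ToyQueue` environment, the epsilon-greedy exploration scheme, and all names are illustrative stand-ins for the traffic environment and controller.

```python
import random

def run_episode(env, policy, epsilon, max_steps=100, rng=random.Random(0)):
    """One episode of the agent-environment loop: with probability
    epsilon the agent takes a random action (exploration), otherwise
    it follows its current policy (exploitation). Returns the
    cumulative reward -- the quantity the agent optimizes, rather
    than any single-step reward."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        if rng.random() < epsilon:
            action = rng.choice(env.actions)   # exploration
        else:
            action = policy(state)             # exploitation
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total

class ToyQueue:
    """Illustrative stand-in for a traffic environment: a queue of
    vehicles that shrinks by one whenever 'serve' is chosen."""
    actions = ["serve", "hold"]
    def reset(self):
        self.queue = 5
        return self.queue
    def step(self, action):
        if action == "serve":
            self.queue = max(0, self.queue - 1)
        # Negative queue length as reward: fewer waiting vehicles is better.
        return self.queue, -self.queue, self.queue == 0
```

With epsilon = 0 and a policy that always serves, the queue of five vehicles drains in five steps; early in training a large epsilon makes the agent behave mostly randomly, and epsilon is decayed as the agent accumulates experience.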
[0082] In some embodiments, an actor-critic reinforcement learning model is used by the controller. In particular, a Proximal Policy Optimization (PPO) module, including a PPO model trained using PPO, may be used as the deep learning module 240 in some embodiments. A PPO model is a variation of a deep actor-critic RL model. Actor-critic RL models can generate continuous action values (e.g., traffic signal cycle phase durations) as output. An actor-critic RL model has two parts: an actor, which defines the policy of the agent, and a critic, which helps the actor to optimize its policy during training.
[0083] A PPO model of a PPO module may be particularly suited for use as the RL model of the deep learning module 240 in embodiments using cycle-based traffic signal control. Some embodiments may generate traffic signal control data for controlling the duration and timing of one or more phases of a cycle of the traffic signal; other embodiments may generate traffic signal control data for controlling the duration and timing of each phase of one or more complete cycles of the traffic signal. A PPO module may thus be used in some embodiments to generate traffic signal control data comprising a phase duration for at least one phase of a cycle of the traffic signal.
[0084] In other embodiments, a deep Q network may be used by the deep learning module 240. Deep Q networks may be particularly suited for use as the RL model of the deep learning module 240 in embodiments using second-based traffic signal control. Thus, in some embodiments a deep Q network may be used to generate traffic signal control data comprising a decision between extending a current phase of a cycle of the traffic signal, and advancing to a next phase of the cycle of the traffic signal.
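To illustrate the decision described above, the following sketch maps a deep Q network's two Q-values to the extend/advance choice of claim 4. The minimum- and maximum-green constraints shown here are a common practical safeguard assumed for illustration, not a feature recited in the present disclosure, and the Q-value computation itself is taken as given.

```python
import numpy as np

EXTEND, ADVANCE = 0, 1  # the two actions of the decision in claim 4

def select_action(q_values, min_green_elapsed, max_green_reached):
    """Choose between extending the current phase and advancing to
    the next phase. The min/max green checks are an assumed safety
    wrapper around the network's output; otherwise the action with
    the larger Q-value is taken."""
    if not min_green_elapsed:
        return EXTEND                 # a phase cannot end too early
    if max_green_reached:
        return ADVANCE                # a phase cannot extend forever
    return int(np.argmax(q_values))  # follow the Q-values
```

For example, with Q-values of 0.2 for extending and 0.9 for advancing, and both green-time constraints satisfied, the controller advances to the next phase of the cycle.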
[0085] Example Methods for Generating a Temporal Detector Scan Image for Traffic Signal Control
[0086] As described above, traffic signal control may be facilitated by the generation of a temporal detector scan image, which may be used as input to a deep learning module to generate traffic signal control data. Example methods will now be described for generating a temporal detector scan image, including optional steps for obtaining the point detector data used to generate the temporal detector scan image and optional steps for using the temporal detector scan image to train a deep reinforcement learning model of the deep learning module.
[0088] Steps 402 through 406 are optional. In these steps, point detectors located in a traffic environment are used to collect vehicle traffic data and transform that data into traffic data usable by the controller to generate the temporal detector scan image. Steps 402 through 406 may be performed by the controller (such as controller device 220), by hardware controllers of one or more point detectors, by a point detector network controller device, or by some combination thereof.
[0090] Eight point detectors are shown in
[0091] A second set of point detectors are positioned and configured to sense vehicle traffic at a second location in each of one or more lanes of the traffic environment 500: second northbound point detector 504a senses traffic at a second location in the northbound lanes approaching the intersection, second southbound point detector 504b senses traffic at a second location in the southbound lanes approaching the intersection, second eastbound point detector 504c senses traffic at a second location in the eastbound lanes approaching the intersection, and second westbound point detector 504d senses traffic at a second location in the westbound lanes approaching the intersection. In each direction, the second location is located on the approach to the intersection and closer to the intersection than the first location. In some embodiments, the second location is at or near the stop bar of the intersection.
[0092] Each of the four traffic directions (north, south, east, west) shown in
[0093] Returning to
[0094] At 404, for each location of the first locations and second locations, the point detectors (e.g., point detectors 502a-d and 504a-d) generate point detector data for the location based on the sensed vehicle traffic. In some embodiments, the point detector data may be simply a binary indication of the presence or absence of a vehicle at the location at a point in time. In other embodiments, the point detector data may encode information regarding the sensed vehicle traffic over a period of time. For example, in some embodiments, the point detector data may encode the number of vehicles passing through the location over a time period, such as one second or ten seconds. The number of vehicles passing through the location may be determined in some embodiments by identifying a pattern of vehicle presence and vehicle absence corresponding to a number of vehicles passing through the location. In some embodiments, each point detector includes a point detector controller (e.g., a microcontroller or other data processing device) configured to generate the point detector data. In some embodiments, the point detector data is generated by a single point detector controller in communication with multiple point detectors. In some embodiments, the point detectors may provide raw sensor data to the traffic signal controller (e.g., to controller device 220 via the network interface 222), which generates the point detector data (e.g., using the processor device 225).
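The paragraph above mentions determining a vehicle count by identifying a pattern of vehicle presence and absence. One plausible reading of that pattern, sketched here for illustration (the per-sample binary encoding is assumed from the example embodiment described later), is to count rising edges in the presence signal:

```python
def count_vehicles(presence):
    """Count vehicles passing over a point detector from a sequence
    of binary presence samples (1 = vehicle over the detector,
    0 = detector clear). Each 0 -> 1 transition (rising edge) is
    taken as one vehicle arriving -- one plausible reading of the
    presence/absence pattern described above."""
    count = 0
    previous = 0
    for sample in presence:
        if sample == 1 and previous == 0:
            count += 1
        previous = sample
    return count
```

For instance, the sample sequence 0, 1, 1, 0, 0, 1, 0 contains two rising edges and so would be read as two vehicles passing through the location during the period.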
[0095] At 406, traffic state data is generated based on the point detector data for each location. As at step 404, the traffic state data may be generated, e.g., by a point detector controller at each point detector, by a single point detector controller in communication with multiple point detectors, or by the traffic signal controller. In some embodiments, the traffic state data indicates vehicle traffic data for each location for each of a plurality of time periods. In some embodiments, the vehicle traffic data for each location for each time period is a binary value indicating the presence or absence of a vehicle at the location during the time period. In some embodiments, the vehicle traffic data for each location for each time period is a numerical value indicating the number of vehicles passing through the location during the time period. In some embodiments, the traffic state data indicates vehicle traffic data for each location for a single time period or for a single point in time. It will be appreciated that other configurations for the vehicle traffic data are possible.
[0096] Steps 408 through 416 may be referred to as the “temporal detector scan image generation” steps, and may be performed by the traffic signal controller (e.g., controller device 220) in some embodiments.
[0097] At 408, temporal traffic state data is obtained. The temporal traffic state data includes first location traffic data, second location traffic data, and traffic signal data. The first location traffic data indicates a traffic state at a first location in each of one or more lanes of the traffic environment at each of a plurality of points in time. The second location traffic data indicates a traffic state at a second location in each of the one or more lanes at each of the plurality of points in time. The traffic signal data indicates a traffic signal state of each of the one or more lanes at each of the plurality of points in time.
[0098] In some embodiments, the controller device 220 performs step 408 by receiving the first location traffic data and second location traffic data from the one or more point detector controllers as described at steps 404-406 above. As described at step 406, in some embodiments the first location traffic data and second location traffic data may be received over time as traffic state data indicating traffic state at each location for a single point in time or period of time. The traffic state data for each location may be compiled by the controller device 220 into first location traffic data and second location traffic data for a plurality of points in time or periods of time. In other embodiments, the point detector controllers may compile traffic state data for multiple points in time or periods of time and transmit the compiled data to the controller device 220.
[0099] In an example embodiment, the point detector controllers generate point detector data by sampling each point detector once per second. The point detector data for each point detector for a given sample period (i.e. one second) consists of a binary indication of whether a vehicle is present at the time the sample is obtained (e.g., 1 for the presence of a vehicle, 0 for the absence of a vehicle). The traffic state data may consist of the samples from each point detector in the traffic environment 500 for a single sample period. The point detector controller(s) transmit the traffic state data to the traffic signal controller (e.g. controller device 220) at each sample period.
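The compilation of once-per-second samples into per-lane time series, as described in the example embodiment above, can be sketched as a rolling buffer. This is an illustrative sketch only: the window length, the (lanes × time) layout, and the left zero-padding before the window fills are assumed design choices, not taken from the disclosure.

```python
from collections import deque
import numpy as np

class ScanBuffer:
    """Rolling window that compiles one binary sample per lane per
    second into a (lanes x time) matrix, as the controller is
    described as doing when it receives traffic state data at each
    sample period."""
    def __init__(self, n_lanes, window=60):
        self.n_lanes = n_lanes
        self.window = window
        self.columns = deque(maxlen=window)  # oldest column dropped first

    def push(self, sample):
        """Append one sample period: a length-n_lanes binary vector."""
        assert len(sample) == self.n_lanes
        self.columns.append(list(sample))

    def matrix(self):
        """Latest window as a (lanes, time) matrix, zero-padded on
        the left until the window has filled."""
        if not self.columns:
            return np.zeros((self.n_lanes, self.window), dtype=int)
        cols = np.array(self.columns).T          # (lanes, time so far)
        pad = self.window - cols.shape[1]
        return np.pad(cols, ((0, 0), (pad, 0)))

# Four monitored lanes, five-second window, three samples received so far:
buf = ScanBuffer(n_lanes=4, window=5)
for _ in range(3):
    buf.push([1, 0, 0, 1])
```

After three pushes the matrix has shape (4, 5), with its two leftmost columns still zero; once the window is full, each new sample displaces the oldest column.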
[0100] The traffic signal data may be obtained from the traffic controller itself. In some embodiments, as shown in
[0101] At step 410, a temporal detector scan image is generated based on the temporal traffic state data. Step 410 may include sub-steps 412, 414, and 416. At 412, the first location traffic data is processed to generate a two-dimensional first location traffic matrix. At 414, the second location traffic data is processed to generate a two-dimensional second location traffic matrix. At 416, the traffic signal data is processed to generate a two-dimensional traffic signal matrix. Step 410 and sub-steps 412 through 416 will be described with reference to
[0103] The first location traffic data 252 is shown as a first location traffic matrix 603 consisting of data elements arranged along a Y axis 610 representing a plurality of traffic lanes monitored by the point detectors, and an X axis 612 representing time, e.g., a plurality of points in time or periods of time (e.g., a one-second period each). Each element of the first location traffic matrix 603 represents the traffic state (e.g., number of vehicles passing through during the time period) of the first location in each of the plurality of lanes at each time (e.g., point in time or period of time). Thus, the first location traffic matrix 603 may be generated based on data obtained from the point detectors at the first locations 502a-d.
[0104] Similarly, the second location traffic data 254 is shown as a second location traffic matrix 605 consisting of data elements arranged along a Y axis 610 representing a plurality of traffic lanes monitored by the point detectors, and an X axis 612 representing time. Each element of the second location traffic matrix 605 represents the traffic state of the second location in each of the plurality of lanes at each time. Thus, the second location traffic matrix 605 may be generated based on data obtained from the point detectors at the second locations 504a-d.
[0105] The traffic signal data 256 is shown as a traffic signal matrix 607 consisting of data elements arranged along a Y axis 610 representing a plurality of traffic lanes monitored by the point detectors, and an X axis 612 representing time. Each element of the traffic signal matrix 607 represents the traffic signal state of each of the plurality of lanes at each time. In some embodiments, the value of each element may be a first value indicating a green light traffic signal state for that lane or a second value indicating an amber or red light traffic signal state for that lane. Other embodiments may use further values to distinguish amber from red, and/or further values to distinguish advance green turn arrows from regular green lights.
[0106] The traffic temporal detector scan image 601 is generated at step 410 by arranging, concatenating, or otherwise combining the three matrices 603, 605, 607 into a single three-channel image, wherein each element of each matrix is analogous to a pixel value of the image. The traffic temporal detector scan image 601 may be used as input to a deep learning module (e.g., deep learning module 240), which may process the traffic temporal detector scan image 601 using image processing techniques used in deep learning to generate traffic signal control data, as described in detail below in the Example Traffic Signal Control Data section.
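The assembly of the three matrices 603, 605, 607 into a single three-channel image can be illustrated by the following NumPy sketch. The function name, array layout, and binary signal encoding (1 for green, 0 for amber/red) are illustrative assumptions, not details specified by the disclosure:

```python
import numpy as np

def build_scan_image(first_counts, second_counts, signal_states):
    """Stack per-lane, per-time-step detector data and signal states into a
    three-channel "temporal detector scan image".

    Each argument is a 2-D array of shape (num_lanes, num_time_steps):
      - first_counts:  vehicle counts at the first-location point detectors
      - second_counts: vehicle counts at the second-location point detectors
      - signal_states: 1 for a green signal, 0 for an amber or red signal
    """
    first = np.asarray(first_counts, dtype=np.float32)
    second = np.asarray(second_counts, dtype=np.float32)
    signal = np.asarray(signal_states, dtype=np.float32)
    assert first.shape == second.shape == signal.shape
    # Channels-last layout (lanes, time, 3), analogous to an H x W x C image,
    # where each matrix element plays the role of a pixel value.
    return np.stack([first, second, signal], axis=-1)

# Example: 4 lanes observed over 6 one-second periods.
img = build_scan_image(
    np.random.randint(0, 2, (4, 6)),
    np.random.randint(0, 2, (4, 6)),
    np.ones((4, 6)),
)
```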
[0107] Whereas
[0108] Returning to
[0109] At 418, the temporal detector scan image 601 is provided as input to the deep learning module 240. This step 418 may include known deep learning techniques for preprocessing image data used as input to a deep learning model. In some examples, the temporal detector scan image 601 may be used as training data to train the deep learning model of the deep learning module 240, as described in greater detail in reference to
[0110] At 420, the temporal detector scan image 601 is processed using the deep learning module 240 to generate traffic signal control data, as described in greater detail below in the Example Traffic Signal Control Data section.
[0111] Example Training Methods
[0112] The deep learning module 240 used by the controller device 220 must be trained before it can be deployed to control a traffic signal in a traffic environment. In embodiments using a deep reinforcement learning (RL) module, training is carried out by supplying traffic environment data (such as temporal traffic state data 250, described in the previous section) to the deep RL module, using the traffic signal control data generated by the deep RL module to control the traffic signals in the traffic environment, then supplying traffic environment data representing the updated state of the traffic environment (such as an updated version of the temporal traffic state data 250) to the deep RL model for use in adjusting the deep RL model policy and for generating future traffic signal control data.
[0113]
[0114] At 702, a temporal detector scan image 601 is generated based on an initial state of the traffic environment 500. This step 702 may be performed by steps 408 and 410 (and optionally steps 402 through 406) of method 400 described in the previous section.
[0115] At 704, upon receiving the temporal detector scan image 601, the RL model applies its policy to the temporal detector scan image 601 and optionally one or more past temporal detector scan images to generate traffic signal control data, as described in greater detail in the Example Traffic Signal Control Data section below.
[0116] At 706, the traffic signal control data is applied to a real or simulated traffic signal. In the case of a real traffic environment using real traffic signals, the controller device 220 may send control signals to the traffic signal (e.g., lights 202, 204, 206, 208) to effect the decisions dictated by the traffic signal control data. In the case of a simulated traffic environment, the RL model provides the traffic signal control data to a simulator module, which simulates a response of the traffic environment to the traffic signal control decisions dictated by the traffic signal control data.
[0117] At 708, an updated state of the real or simulated traffic environment is determined. The updated traffic state may be represented in some embodiments by updated temporal traffic state data 250 as described above with reference to
[0118] At 710, a new temporal detector scan image 601 is generated based on the updated state of the traffic environment determined at step 708. In some embodiments, step 710 may be performed by the controller device 220 by performing steps 408 and 410 (and optionally steps 402 through 406) of method 400 described above.
[0119] At 712, a reward function of the deep RL module is applied to the initial state of the traffic environment and the updated state of the traffic environment to generate a reward value.
[0120] At 714, the deep RL module adjusts its policy based on the reward generated at step 712. The weights or parameters of the deep RL model may be adjusted using deep reinforcement learning techniques, such as proximal policy optimization (PPO) actor-critic techniques or deep Q-network (DQN) techniques.
[0121] The method 700 then returns to step 704, this time processing the temporal detector scan image 601 generated at step 710, which indicates the updated state of the traffic environment determined at step 708. This loop may be repeated one or more times (typically at least hundreds or thousands of times) to continue training the RL model.
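The loop of steps 702 through 714 can be sketched as follows. The StubEnv and StubModel classes below are trivial placeholders for a real traffic environment (or simulator) and deep RL model; only the control flow mirrors method 700:

```python
import random

class StubEnv:
    """Placeholder traffic environment; states are random 4-element vectors."""
    def reset(self):
        return [random.random() for _ in range(4)]      # step 702: initial state
    def apply_signal_control(self, action):
        pass                                            # step 706: actuate signal
    def observe(self):
        return [random.random() for _ in range(4)]      # steps 708-710: new state
    def reward(self, old_state, new_state):
        return sum(new_state) - sum(old_state)          # step 712: reward value

class StubModel:
    """Placeholder RL model; acts randomly and counts policy updates."""
    def __init__(self):
        self.updates = 0
    def act(self, state):
        return random.choice([0, 1])                    # step 704: apply policy
    def update(self, state, action, reward, next_state):
        self.updates += 1                               # step 714: adjust policy

def train(env, model, num_iterations):
    state = env.reset()
    for _ in range(num_iterations):
        action = model.act(state)
        env.apply_signal_control(action)
        next_state = env.observe()
        reward = env.reward(state, next_state)
        model.update(state, action, reward, next_state)
        state = next_state                              # loop back to step 704
    return model

model = train(StubEnv(), StubModel(), num_iterations=100)
```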
[0122] Thus, method 700 may be used to train the RL model and update the parameters of its policy, in accordance with known reinforcement learning techniques using image data as input.
[0123] Examples of Traffic Signal Control Data
[0124] The deep learning module 240 processes the temporal detector scan image 601 used as input to generate traffic signal control data. The traffic signal control data may be used to make decisions regarding the control (i.e. actuation) of the traffic signal. The action space used by the deep learning module 240 in generating the traffic signal control data may be a continuous action space, such as a range of positive real numbers representing phase durations, or a discrete action space, such as a decision between extending a traffic signal phase for one second or advancing to the next traffic signal phase.
[0125] Some embodiments generate traffic signal control data comprising a phase duration for at least one phase of a cycle of the traffic signal. The traffic signal control data may thus be one or more phase durations of one or more respective phases of a traffic signal cycle. In some embodiments, each phase duration is a value selected from a continuous range of values. This selection of a phase duration from a continuous range of values may be enabled in some examples by the use of an actor-critic RL model, as described in detail above.
[0126] In some embodiments, the traffic signal control data includes phase durations for each phase of at least one cycle of the traffic signal. In other embodiments, the traffic signal control data includes a phase duration for only one phase of a cycle of the traffic signal. Cycle-level control and phase-level control may present trade-offs between granularity and predictability.
[0127] Embodiments operating at cycle-level or phase-level control of the traffic signal may interact with the traffic signal at a relatively low frequency compared to second-level controllers: a cycle-level controller may send control signals to the traffic signal once per cycle, for example at the beginning of the cycle, whereas a phase-level controller may send control signals to the traffic signal once per phase, for example at the beginning of the phase.
[0128] In some embodiments, phase-level or cycle-level control may be constrained to a fixed sequence of phases (e.g., the eight sequential phases 102 through 116 shown in
[0129] Thus, for a traffic signal having P phases per cycle (e.g., P=8 in the example of
[0130] In some embodiments, the phase durations generated by the deep learning module 240 are selected from a different continuous range, such as positive real numbers. The use of an actor-critic RL model (such as a PPO model) may enable the generation of phase durations selected from a continuous range of values, rather than a limited number of discrete values (such as 5-second or 10-second intervals as in existing approaches).
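One way an actor network's unconstrained outputs might be mapped onto a continuous range of phase durations is a bounded squashing function, as in the sketch below. The sigmoid squashing and the 5-to-60-second bounds are illustrative assumptions, not values taken from the disclosure:

```python
import math

def to_phase_durations(raw_outputs, min_green=5.0, max_green=60.0):
    """Map unconstrained actor-network outputs to phase durations drawn from
    the continuous range [min_green, max_green] seconds, one per phase."""
    def squash(x):
        # Sigmoid maps any real number into (0, 1); rescale into the bounds.
        return min_green + (max_green - min_green) / (1.0 + math.exp(-x))
    return [squash(x) for x in raw_outputs]

# Three raw outputs: neutral, strongly negative, strongly positive.
durations = to_phase_durations([0.0, -10.0, 10.0])
```

A neutral output of 0.0 lands at the midpoint of the range, while large-magnitude outputs saturate near the minimum or maximum green time.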
[0131] Other embodiments generate traffic signal control data comprising a decision between extending a current phase of a cycle of the traffic signal, and advancing to a next phase of the cycle of the traffic signal. This decision may be implemented on a per-time-period (e.g. per-second) basis. In a second-based control approach with a fixed order of phases in each cycle, the controller has to decide either to extend the current green phase or to switch to the next phase, which leads to a discrete action space of size two (e.g., 0=extend, 1=switch). In some embodiments, second-based control may also include flexible ordering of phases within each cycle, as described above with reference to cycle-based or phase-based control.
[0132] As described above, a PPO deep reinforcement learning module may be particularly suitable for cycle-based or phase-based control, whereas a DQN deep reinforcement learning module may be particularly suitable for second-based control.
[0133]
[0134] Example Reward Functions
[0135] Different embodiments may use different reward functions. A reward function may be based on a traffic flow metric or performance metric intended to promote desired traffic outcomes. As described above, various embodiments may use different performance metrics to determine reward, such as total throughput (the number of vehicles passing through the intersection per cycle), the longest delay experienced by any single vehicle over one or more cycles, or any other suitable metric.
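As a simple example of a throughput-based metric, a reward could be computed as the total number of vehicles crossing the first-location (stop-line) detectors in all lanes over a cycle. This is one illustrative choice among the metrics mentioned above; the reward used in a given embodiment may differ:

```python
def throughput_reward(stopline_counts):
    """Reward equal to total intersection throughput: the sum of per-second
    vehicle counts at the stop-line detectors, over all lanes and time steps.

    stopline_counts: list of per-lane lists of vehicle counts per time step.
    """
    return sum(sum(lane) for lane in stopline_counts)

# Two lanes, five one-second periods of vehicle counts:
r = throughput_reward([[1, 0, 2, 1, 0], [0, 1, 1, 0, 1]])
```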
[0136] Example Systems for Controlling Traffic Signals
[0137] Once the deep learning model has been trained as described above, the controller device 220 may be deployed for use in controlling a real traffic signal in a real traffic environment. When deployed for the purpose of controlling a real traffic signal, the deep learning module 240 and other components described above operate much as described with reference to the training method 700. When deployed to control a real traffic signal, the controller device 220 may make up all or part of a system for controlling a traffic signal, and in particular a system for generating a temporal detector scan image for traffic signal control. The controller device 220 includes the components described with reference to
[0138] It will be appreciated that, in some embodiments, a system for traffic signal control may also include one or more of the other components described above, such as one or more of the point detectors 502a-d and 504a-d, one or more point detector controllers (included in, or separate from, each point detector), and/or one or more of the traffic lights 202, 204, 206, 208.
[0139] General
[0140] Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate.
[0141] Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.
[0142] The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.
[0143] All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein intends to cover and embrace all suitable changes in technology.