TEMPORAL DETECTOR SCAN IMAGE METHOD, SYSTEM, AND MEDIUM FOR TRAFFIC SIGNAL CONTROL

20220198925 · 2022-06-23


    Abstract

    Methods, systems, and processor-readable media for generating a temporal detector scan image for traffic signal control are described. An intelligent adaptive cycle-level traffic signal controller uses a deep learning module for traffic signal control, applying image processing techniques to traffic environment data formatted as image data, called “temporal detector scan image” data. A temporal detector scan image is generated by formatting point detector data collected by point detectors (e.g. inductive-loop traffic detectors) over time into two-dimensional matrices representing the traffic environment state in a plurality of lanes over a plurality of points in time, combined with traffic signal data indicating the state of a traffic signal of each lane. The deep learning module may be trained using temporal detector scan image data collected from a traffic environment, and then may be deployed to control the traffic signal for the traffic environment once trained.

    Claims

    1. A method for generating a temporal detector scan image for traffic signal control, the method comprising: obtaining temporal traffic state data comprising: first location traffic data indicating a traffic state at a first location in each of one or more lanes of a traffic environment at each of a plurality of points in time; second location traffic data indicating a traffic state at a second location in each of the one or more lanes at each of the plurality of points in time; and traffic signal data indicating a traffic signal state of each of the one or more lanes at each of the plurality of points in time; and generating a temporal detector scan image by: processing the first location traffic data to generate a two-dimensional first location traffic matrix; processing the second location traffic data to generate a two-dimensional second location traffic matrix; and processing the traffic signal data to generate a two-dimensional traffic signal matrix.

    2. The method of claim 1, further comprising: providing the temporal detector scan image as input to a deep learning module; and processing the temporal detector scan image using the deep learning module to generate traffic signal control data.

    3. The method of claim 2, wherein: the deep learning module comprises a deep reinforcement learning module; and processing the temporal detector scan image comprises using the deep reinforcement learning module to generate traffic signal control data by applying a policy to the temporal detector scan image; and the method further comprises: determining an updated state of the traffic environment following application of the traffic signal control data to the traffic signal; generating an updated temporal detector scan image based on the updated state of the traffic environment; generating a reward by applying a reward function to the temporal detector scan image and the updated temporal detector scan image; and adjusting the policy based on the reward.

    4. The method of claim 3, wherein: the deep reinforcement learning module comprises a deep Q network; and the traffic signal control data comprises a decision between: extending a current phase of a cycle of the traffic signal; and advancing to a next phase of the cycle of the traffic signal.

    5. The method of claim 3, wherein: the deep reinforcement learning module comprises a proximal policy optimization (PPO) module; and the traffic signal control data comprises a phase duration for at least one phase of a cycle of the traffic signal.

    6. The method of claim 1, further comprising, for each location of the first locations and second locations: sensing vehicle traffic at the location using a point detector; generating point detector data for the location based on the sensed vehicle traffic; and generating the traffic state data based on the point detector data for each location.

    7. The method of claim 6, wherein each point detector comprises an inductive-loop traffic detector.

    8. The method of claim 6, wherein each point detector comprises a point camera.

    9. The method of claim 1, wherein: the traffic environment comprises an intersection; and for each lane of the one or more lanes: the first location and second location in the lane are on the approach to the intersection; and the second location in the lane is closer to the intersection than the first location.

    10. The method of claim 3, further comprising, for each location of the first locations and second locations: sensing vehicle traffic at the location using a point detector; generating point detector data for the location based on the sensed vehicle traffic; and generating the traffic state data based on the point detector data for each location, wherein: the traffic environment comprises an intersection; and for each lane of the one or more lanes: the first location and second location in the lane are on the approach to the intersection; and the second location in the lane is closer to the intersection than the first location.

    11. A system for generating a temporal detector scan image for traffic signal control, comprising: a processor device; and a memory storing machine-executable instructions thereon which, when executed by the processor device, cause the system to: obtain temporal traffic state data comprising: first location traffic data indicating a traffic state at a first location in each of one or more lanes of a traffic environment at each of a plurality of points in time; second location traffic data indicating a traffic state at a second location in each of the one or more lanes at each of the plurality of points in time; and traffic signal data indicating a traffic signal state of each of the one or more lanes at each of the plurality of points in time; and generate a temporal detector scan image by: processing the first location traffic data to generate a two-dimensional first location traffic matrix; processing the second location traffic data to generate a two-dimensional second location traffic matrix; and processing the traffic signal data to generate a two-dimensional traffic signal matrix.

    12. The system of claim 11, wherein: the memory further stores a deep learning module; and the instructions, when executed by the processor device, further cause the system to: provide the temporal detector scan image as input to the deep learning module; and process the temporal detector scan image using the deep learning module to generate traffic signal control data.

    13. The system of claim 12, wherein: the deep learning module comprises a deep reinforcement learning module; processing the temporal detector scan image comprises using the deep reinforcement learning module to generate traffic signal control data by applying a policy to the temporal detector scan image; and the instructions, when executed by the processor device, further cause the system to: determine an updated state of the traffic environment following application of the traffic signal control data to the traffic signal; generate an updated temporal detector scan image based on the updated state of the traffic environment; generate a reward by applying a reward function to the temporal detector scan image and the updated temporal detector scan image; and adjust the policy based on the reward.

    14. The system of claim 13, wherein: the deep reinforcement learning module comprises a deep Q network; and the traffic signal control data comprises a decision between: extending a current phase of a cycle of the traffic signal; and advancing to a next phase of the cycle of the traffic signal.

    15. The system of claim 13, wherein: the deep reinforcement learning module comprises a proximal policy optimization (PPO) module; and the traffic signal control data comprises a phase duration for at least one phase of a cycle of the traffic signal.

    16. The system of claim 11, wherein the instructions, when executed by the processor device, further cause the system to, for each location of the first locations and second locations: obtain point detector data for the location; and generate the traffic state data based on the point detector data for each location.

    17. The system of claim 16, further comprising, for each location of the first locations and second locations, a point detector configured to generate the point detector data based on sensed vehicle traffic at the location.

    18. The system of claim 17, wherein each point detector comprises an inductive-loop traffic detector.

    19. The system of claim 17, wherein each point detector comprises a point camera.

    20. A processor-readable medium having machine-executable instructions stored thereon which, when executed by a processor device, cause the processor device to perform the method of claim 1.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0051] Reference will now be made, by way of example, to the accompanying drawings which show example embodiments of the present application, and in which:

    [0052] FIG. 1 is a table showing eight phases of an example traffic signal cycle, illustrating an example operating environment for example embodiments described herein.

    [0053] FIG. 2 is a block diagram showing an example traffic environment at an intersection, including a traffic signal, in communication with a traffic signal controller in accordance with embodiments described herein.

    [0054] FIG. 3 is a block diagram of an example traffic signal controller in accordance with embodiments described herein.

    [0055] FIG. 4 is a flowchart showing steps of an example method for generating a temporal detector scan image for traffic signal control, in accordance with embodiments described herein.

    [0056] FIG. 5 is a top view of a traffic environment at an intersection, showing the locations of point detectors used to sense vehicle traffic in accordance with embodiments described herein.

    [0057] FIG. 6 is a schematic diagram of traffic location data and traffic signal data converted into a traffic temporal detector scan image, in accordance with embodiments described herein.

    [0058] FIG. 7 is a flowchart showing steps of an example method of training a deep reinforcement learning model to generate traffic signal control data in accordance with embodiments described herein.

    [0059] FIG. 8 is a block diagram of an example deep learning module of a traffic signal controller showing a traffic temporal detector scan image as input and generated traffic signal control data as output, in accordance with embodiments described herein.

    [0060] Similar reference numerals may have been used in different figures to denote similar components.

    DESCRIPTION OF EXAMPLE EMBODIMENTS

    [0061] In various examples, the present disclosure describes methods, systems, and processor-readable media for generating a temporal detector scan image for traffic signal control. An intelligent adaptive cycle-level traffic signal controller is described that uses a deep learning module for traffic signal control. The deep learning module applies image processing techniques to temporal detector scan image data.

    [0062] Various embodiments are described below with reference to the drawings. The description of the example embodiments is broken into multiple sections. The Example Controller Devices section describes example devices or systems suitable for implementing example traffic signal controllers and methods. The Example Deep Learning Modules section describes how the controller learns and updates the parameters of an inference model, such as a deep reinforcement learning model, of the deep learning module. The Example Methods for Generating a Temporal Detector Scan Image for Traffic Signal Control section describes how temporal traffic state data received from point detectors in the traffic environment can be used to generate a temporal detector scan image, which the deep learning module can process using image processing techniques. The Example Training Methods section describes how temporal detector scan images (also called temporal detector scan image data) can be used to train the deep learning module of the controller. The Examples of Traffic Signal Control Data section describes the action space and outputs of the controller. The Examples of Traffic Environment State Data section describes the state space and inputs of the controller. The Example Reward Functions section describes the reward function of the controller. The Example Systems for Controlling Traffic Signals section describes the operation of the trained controller when it is used to control traffic signals in a real traffic environment.

    [0063] Example Controller Devices

    [0064] FIG. 2 is a block diagram showing an example traffic environment 200 at an intersection 201, including a traffic signal, in communication with an example traffic signal controller 220. The traffic signal is shown as four traffic lights: a south-facing light 202, a north-facing light 204, an east-facing light 206, and a west-facing light 208. (In all drawings showing top-down views of traffic environments, North corresponds to the top of the page.) The controller device 220 sends control signals to the four traffic lights 202, 204, 206, 208. The controller device 220 is also in communication with a network 210, through which it may communicate with one or more servers or other devices, as described in greater detail below.

    [0065] It will be appreciated that, whereas embodiments are described herein with reference to a traffic environment consisting of a single intersection managed by a single signal (e.g., a single set of traffic lights), in some embodiments the traffic environment may encompass multiple nodes or intersections within a transportation grid, and the controller may control multiple traffic signals.

    [0066] FIG. 3 is a block diagram illustrating a simplified example of a controller device 220, such as a computer or a cloud computing platform, suitable for carrying out examples described herein. Other examples suitable for implementing embodiments described in the present disclosure may be used, which may include components different from those discussed below. Although FIG. 3 shows a single instance of each component, there may be multiple instances of each component in the controller device 220.

    [0067] The controller device 220 may include one or more processor devices 225, such as a processor, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), dedicated logic circuitry, a dedicated artificial intelligence processor unit, or combinations thereof. The controller device 220 may also include one or more optional input/output (I/O) interfaces 232, which may enable interfacing with one or more optional input devices 234 and/or optional output devices 236.

    [0068] In the example shown, the input device(s) 234 (e.g., a maintenance console, a keyboard, a mouse, a microphone, a touchscreen, and/or a keypad) and output device(s) 236 (e.g., a maintenance console, a display, a speaker and/or a printer) are shown as optional and external to the controller device 220. In other examples, there may not be any input device(s) 234 and output device(s) 236, in which case the I/O interface(s) 232 may not be needed.

    [0069] The controller device 220 may include one or more network interfaces 222 for wired or wireless communication with one or more devices or systems of a network, such as network 210. The network interface(s) 222 may include wired links (e.g., Ethernet cable) and/or wireless links (e.g., one or more antennas) for intra-network and/or inter-network communications. One or more of the network interfaces 222 may be used for sending control signals to the traffic signals 202, 204, 206, 208 and/or for receiving data from the point detectors (e.g., point detector data generated by inductive-loop traffic detectors or point cameras, or traffic state data based on the point detector data, as described below with reference to FIGS. 5-6). In some embodiments, the traffic signals and/or sensors may communicate with the controller device, directly or indirectly, via other means (such as an I/O interface 232).

    [0070] The controller device 220 may also include one or more storage units 224, which may include a mass storage unit such as a solid state drive, a hard disk drive, a magnetic disk drive and/or an optical disk drive. The storage units 224 may be used for long-term storage of some or all of the data stored in the memory 228 described below.

    [0071] The controller device 220 may include one or more memories 228, which may include a volatile or non-volatile memory (e.g., a flash memory, a random access memory (RAM), and/or a read-only memory (ROM)). The non-transitory memory(ies) 228 may store instructions for execution by the processor device(s) 225, such as to carry out examples described in the present disclosure. The memory(ies) 228 may include software instructions 238, such as for implementing an operating system and other applications/functions. In some examples, the memory(ies) 228 may include software instructions 238 for execution by the processor device 225 to implement a deep learning module 240, as described further below. The deep learning module 240 may be loaded into the memory(ies) 228 by executing the instructions 238 using the processor device 225.

    [0072] In some embodiments, the deep learning module 240 is a deep reinforcement learning module, such as a deep Q network or a PPO module, as described below in the Example Deep Learning Modules section. The deep learning module 240 may be coded in the Python programming language using the TensorFlow machine learning library and other widely used libraries, including NumPy. It will be appreciated that other embodiments may use different software libraries and/or different programming languages.

    [0073] The memory(ies) 228 may also include one or more samples of temporal traffic state data 250, which may be used as training data samples to train the deep learning module 240 and/or as input to the deep learning module 240 for generating traffic signal control data after the deep learning module 240 has been trained and the controller device 220 is deployed to control the traffic signals in a real traffic environment, as described in detail below. The temporal traffic state data 250 may include first location traffic data 252, second location traffic data 254, and traffic signal data 256, as described in detail below with reference to FIGS. 5-6. In some examples, the memory may store temporal traffic state data 250 formatted as one or more temporal detector scan images 601, as described below with reference to FIG. 6.

    [0074] In some examples, the controller device 220 may additionally or alternatively execute instructions from an external memory (e.g., an external drive in wired or wireless communication with the controller device 220) or may be provided executable instructions by a transitory or non-transitory computer-readable medium. Examples of non-transitory computer readable media include a RAM, a ROM, an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a flash memory, a CD-ROM, or other portable memory storage.

    [0075] The controller device 220 may also include a bus 242 providing communication among components of the controller device 220, including those components discussed above. The bus 242 may be any suitable bus architecture including, for example, a memory bus, a peripheral bus or a video bus.

    [0076] It will be appreciated that various components and operations described herein can be implemented on multiple separate devices or systems in some embodiments.

    [0077] Example Deep Learning Modules

    [0078] In some embodiments, a self-learning traffic signal controller interacts with a traffic environment and gradually finds an optimal strategy to apply to traffic signal control. The deep learning module uses deep learning algorithms to train a set of parameters or a policy of a deep learning model to perform traffic signal control. The deep learning module may use any type of deep learning algorithm, including supervised or unsupervised learning algorithms, to train any type of deep learning model, such as a convolutional neural network or other type of artificial neural network.

    [0079] In some embodiments, the deep learning module (such as deep learning module 240) is a deep reinforcement learning module. The controller (such as controller device 220) generates traffic signal control data by executing the instructions 238 of the deep learning module 240 to apply a function to traffic environment state data (such as temporal traffic state data 250), and using a learned policy of the deep learning module 240 to determine a course of action (i.e. traffic signal control actions in the form of traffic signal control data) based on the output of the function. The function is approximated using a model trained using reinforcement learning, sometimes referred to herein as a “reinforcement learning model” or “RL model”. Thus, in some embodiments, the deep learning module 240 is a deep reinforcement learning module, which uses a reinforcement learning algorithm to train an RL model. The reinforcement learning model may be an artificial neural network, such as a convolutional neural network, in some embodiments. In some embodiments, the traffic environment state data (such as temporal traffic state data 250) may be formatted as one or more two-dimensional matrices, thereby allowing the convolutional neural network or other RL model to apply known image-processing techniques to generate the traffic signal control data.
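By way of illustration, the formatting of traffic state data as stacked two-dimensional matrices, and the image-style processing applied to them, can be sketched in NumPy (one of the libraries the disclosure names). The lane count, time-step count, random occupancy data, and fixed averaging kernel below are hypothetical stand-ins; a deployed module would use a trained convolutional network rather than a hand-written kernel.

```python
import numpy as np

# Hypothetical dimensions: 8 lanes observed over 10 one-second time steps.
LANES, STEPS = 8, 10

rng = np.random.default_rng(0)
first_loc = rng.integers(0, 2, size=(LANES, STEPS))   # advance-detector occupancy
second_loc = rng.integers(0, 2, size=(LANES, STEPS))  # stop-bar-detector occupancy
signal = rng.integers(0, 2, size=(LANES, STEPS))      # 1 = green for the lane

# Stack the three matrices as channels, like the colour planes of an image.
scan_image = np.stack([first_loc, second_loc, signal], axis=-1)
assert scan_image.shape == (LANES, STEPS, 3)

def conv2d_valid(plane, kernel):
    """Plain 'valid' 2D cross-correlation of one channel with one kernel."""
    kh, kw = kernel.shape
    h, w = plane.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(plane[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 averaging kernel applied to the stop-bar channel highlights
# lane/time regions of sustained occupancy (e.g., a queue forming).
kernel = np.ones((3, 3)) / 9.0
feature_map = conv2d_valid(scan_image[..., 1], kernel)
```

Because the matrices share the lane and time axes, any standard image-processing operation (convolution, pooling, and so on) applies to them unchanged, which is what allows a convolutional RL model to consume the scan image directly.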

    [0080] Formally, the objective of the reinforcement learning model may be stated as follows: given the traffic demand trajectories over time d(t), t ∈ [0, t_e], find a control policy or control function R such that the control variables (e.g., signal phasing) u(t) = R[x(t), t], t ∈ [0, t_e], where x(t) denotes the system state measurements, minimize the objective J subject to the system equations and the constraints.

    [0081] Reinforcement learning (RL) is a technique suitable for optimal control problems that have highly complicated dynamics. These problems may be difficult to model, difficult to control, or both. In RL, the controller can be functionally represented as an agent having no knowledge of the environment in which it operates. In early stages of training, the agent starts by taking random actions, a process called exploration. For each action, the agent observes the changes in the environment (e.g., through sensors monitoring a real traffic environment, or by receiving simulated traffic environment data from a simulator), and it also receives a numerical value called a reward, which indicates a degree of desirability of its actions. The objective of the agent is to optimize the cumulative reward over time, not the immediate reward it receives after any given action. This optimization of cumulative reward is necessary in domains such as traffic signal control, in which the actions of the agent affect the future state of the system, requiring the agent to consider the future consequences of its actions beyond their immediate impact. As training progresses, the agent learns about the environment and takes fewer random actions; instead, it takes actions that, based on its experience, lead to better performance of the system.
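The two mechanisms described above — optimizing a discounted cumulative reward rather than the immediate reward, and shifting from random exploration to experience-driven action selection — can be sketched as follows. The discount factor, reward values, and epsilon (exploration rate) values are hypothetical illustrations, not values from the disclosure.

```python
import random

def discounted_return(rewards, gamma=0.95):
    """Cumulative discounted reward the agent optimizes; later rewards
    count for less, but are never ignored."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def choose_action(q_values, epsilon):
    """Epsilon-greedy selection: with probability epsilon take a random
    action (exploration); otherwise take the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Early in training epsilon is near 1 (mostly random actions); as training
# progresses epsilon is annealed toward 0 (mostly learned actions).
assert discounted_return([1.0, 1.0, 1.0], gamma=0.5) == 1.0 + 0.5 + 0.25
```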

    [0082] In some embodiments, an actor-critic reinforcement learning model is used by the controller. In particular, a Proximal Policy Optimization (PPO) module, including a PPO model trained using PPO, may be used as the deep learning module 240 in some embodiments. A PPO model is a variation of a deep actor-critic RL model. Actor-critic RL models can generate continuous action values (e.g., traffic signal cycle phase durations) as output. An actor-critic RL model has two parts: an actor, which defines the policy of the agent, and a critic, which helps the actor to optimize its policy during training.

    [0083] A PPO model of a PPO module may be particularly suited for use as the RL model of the deep learning module 240 in embodiments using cycle-based traffic signal control. Some embodiments may generate traffic signal control data for controlling the duration and timing of one or more phases of a cycle of the traffic signal; other embodiments may generate traffic signal control data for controlling the duration and timing of each phase of one or more complete cycles of the traffic signal. A PPO module may thus be used in some embodiments to generate traffic signal control data comprising a phase duration for at least one phase of a cycle of the traffic signal.
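For cycle-based control as described above, the PPO actor's output is a vector of continuous phase durations. The sketch below illustrates only the output format; the phase count, duration bounds, and random stand-in for the trained actor network are hypothetical assumptions, not values from the disclosure.

```python
import numpy as np

# Hypothetical bounds (seconds) and phase count; the disclosure does not
# specify these values.
MIN_GREEN, MAX_GREEN = 7.0, 60.0
N_PHASES = 8  # e.g., an eight-phase cycle as in FIG. 1

rng = np.random.default_rng(2)

def actor(scan_image):
    """Stand-in for a trained PPO actor: maps the temporal detector scan
    image to one continuous duration per phase of the next cycle."""
    raw = rng.random(N_PHASES)  # pretend network output in [0, 1)
    return MIN_GREEN + raw * (MAX_GREEN - MIN_GREEN)

durations = actor(None)  # a real actor would consume the scan image here
assert durations.shape == (N_PHASES,)
```

Bounding the output between a minimum and maximum green time is a common safety constraint in signal control; here it is shown simply by rescaling the actor's raw output.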

    [0084] In other embodiments, a deep Q network may be used by the deep learning module 240. Deep Q networks may be particularly suited for use as the RL model of the deep learning module 240 in embodiments using second-based traffic signal control. Thus, in some embodiments a deep Q network may be used to generate traffic signal control data comprising a decision between extending a current phase of a cycle of the traffic signal, and advancing to a next phase of the cycle of the traffic signal.
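The second-based decision described above is a two-way choice, which a deep Q network expresses as one Q-value per action. The sketch below uses a single random linear layer as a stand-in for a trained network; the image dimensions and weights are hypothetical.

```python
import numpy as np

EXTEND, ADVANCE = 0, 1  # the two second-based control actions

rng = np.random.default_rng(1)
state = rng.integers(0, 2, size=(8, 10, 3)).astype(float)  # a scan image

# Stand-in for a trained deep Q network: a random linear layer mapping
# the flattened scan image to one Q-value per action.
w = rng.normal(size=(state.size, 2))

def q_values(s):
    return s.reshape(-1) @ w

def decide(s):
    """Return EXTEND to prolong the current phase, ADVANCE to move to
    the next phase of the cycle."""
    q = q_values(s)
    return EXTEND if q[EXTEND] >= q[ADVANCE] else ADVANCE

action = decide(state)
assert action in (EXTEND, ADVANCE)
```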

    [0085] Example Methods for Generating a Temporal Detector Scan Image for Traffic Signal Control

    [0086] As described above, traffic signal control may be facilitated by the generation of a temporal detector scan image, which may be used as input to a deep learning module to generate traffic signal control data. Example methods will now be described for generating a temporal detector scan image, including optional steps for obtaining the point detector data used to generate the temporal detector scan image and optional steps for using the temporal detector scan image to train a deep reinforcement learning model of the deep learning module.

    [0087] FIG. 4 shows an example method 400 of generating a temporal detector scan image for traffic signal control. In some embodiments, the temporal detector scan image generation steps of the method 400 are performed by a controller device or system, such as the controller device 220. In other embodiments, the temporal detector scan image may be generated by another device and provided to the controller. Other steps of the method 400 may be performed by the controller or by another device or other devices, as described below.

    [0088] Steps 402 through 406 are optional. In these steps, point detectors located in a traffic environment are used to collect vehicle traffic data and transform that data into traffic data usable by the controller to generate the temporal detector scan image. Steps 402 through 406 may be performed by the controller (such as controller device 220), by hardware controllers of one or more point detectors, by a point detector network controller device, or by some combination thereof.

    [0089] FIG. 5 shows a top view of a traffic environment 500 at an intersection, showing the locations of point detectors used to sense vehicle traffic. The intersection has four approaches. Each approach can be as long as the full length of the road link all the way to an upstream intersection. Each point detector is positioned and configured to detect the presence of vehicles at a particular location along the length of one or more lanes of traffic. In some embodiments, the point detectors may be inductive-loop traffic detectors, also called vehicle detection loops, configured to sense the presence of large metal vehicles using an electric current induced in a conductive loop of material laid across or embedded in a road surface. An inductive-loop traffic detector may be used to detect a vehicle in a single lane, or it may be laid across several lanes to detect a vehicle in any of the lanes it traverses. In some embodiments, the point detectors may be point cameras. Each point camera operates to capture images of vehicles occupying a longitudinal location along the length of one or more traffic lanes. Machine vision techniques may be used to process the image data captured by the point cameras to recognize the presence or absence of vehicles. Some point cameras may be positioned and configured to detect the presence of vehicles in a single lane; others may be positioned and configured to detect the presence of vehicles in each of two or more lanes along a single line or stripe crossing the two or more lanes. Thus, each point detector can detect the presence or absence of vehicle traffic in one or more lanes of traffic, but this detection is limited to a single point or small area along the length of the traffic lane(s). It will be appreciated that other technologies, such as electric eyes, weight sensors, or photoreceptors, may be used to achieve similar detection of vehicles at a highly localized area in a lane, or a plurality of adjacent lanes, of traffic. Some embodiments may use multiple different types of point detectors to sense vehicle traffic in different lanes or at different locations.

    [0090] Eight point detectors are shown in FIG. 5. A first set of point detectors are positioned and configured to sense vehicle traffic at a first location in each of one or more lanes of the traffic environment 500: first northbound point detector 502a senses traffic at a first location in the northbound lanes approaching the intersection, first southbound point detector 502b senses traffic at a first location in the southbound lanes approaching the intersection, first eastbound point detector 502c senses traffic at a first location in the eastbound lanes approaching the intersection, and first westbound point detector 502d senses traffic at a first location in the westbound lanes approaching the intersection. In each direction, the first location is located on the approach to the intersection and distal from the intersection. For example, in some embodiments the first location may be 50 meters from the stop bar of the intersection. In other embodiments, the first location may be a different distance from the intersection in different lanes and/or in different traffic directions.

    [0091] A second set of point detectors are positioned and configured to sense vehicle traffic at a second location in each of one or more lanes of the traffic environment 500: second northbound point detector 504a senses traffic at a second location in the northbound lanes approaching the intersection, second southbound point detector 504b senses traffic at a second location in the southbound lanes approaching the intersection, second eastbound point detector 504c senses traffic at a second location in the eastbound lanes approaching the intersection, and second westbound point detector 504d senses traffic at a second location in the westbound lanes approaching the intersection. In each direction, the second location is located on the approach to the intersection and closer to the intersection than the first location. In some embodiments, the second location is at or near the stop bar of the intersection.

    [0092] Each of the four traffic directions (north, south, east, west) shown in FIG. 5 may include one or more road lanes configured to carry traffic in that direction. Each point detector shown in FIG. 5 may monitor one or more lanes, and in some embodiments there may be multiple individual point detectors positioned at each point detector location (i.e. each first location and each second location), e.g., one point detector to monitor each lane at each location. Thus, in one example embodiment the traffic environment 500 may include three southbound lanes to the north of the intersection, and there may be one individual point detector (e.g., an inductive loop traffic detector) located at the first location (i.e. the location of first southbound point detector 502b) in each of the three southbound lanes, for a total of three inductive-loop traffic detectors at the location of first southbound point detector 502b.

    [0093] Returning to FIG. 4, at 402, each point detector (e.g., point detectors 502a-d at each first location and point detectors 504a-d at each second location) senses vehicle traffic at its respective location. Sensing vehicle traffic may include sensing the presence of a vehicle in a single lane being monitored by a point detector, or sensing the presence of at least one vehicle in one of multiple lanes being monitored by a point detector.

    [0094] At 404, for each location of the first locations and second locations, the point detectors (e.g., point detectors 502a-d and 504a-d) generate point detector data for the location based on the sensed vehicle traffic. In some embodiments, the point detector data may be simply a binary indication of the presence or absence of a vehicle at the location at a point in time. In other embodiments, the point detector data may encode information regarding the sensed vehicle traffic over a period of time. For example, in some embodiments, the point detector data may encode the number of vehicles passing through the location over a time period, such as one second or ten seconds. The number of vehicles passing through the location may be determined in some embodiments by identifying a pattern of vehicle presence and vehicle absence corresponding to a number of vehicles passing through the location. In some embodiments, each point detector includes a point detector controller (e.g., a microcontroller or other data processing device) configured to generate the point detector data. In some embodiments, the point detector data is generated by a single point detector controller in communication with multiple point detectors. In some embodiments, the point detectors may provide raw sensor data to the traffic signal controller (e.g., to controller device 220 via the network interface 222), which generates the point detector data (e.g., using the processor device 225).
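The pattern-based vehicle count described above can be sketched as a rising-edge count over the binary presence samples: each transition from "no vehicle present" to "vehicle present" is counted as one vehicle passing through the location. This is an illustrative sketch only; the function name and the assumption of clean one-sample-per-second binary data are hypothetical, not a required implementation.

```python
def count_vehicles(presence_samples):
    """Count vehicles passing a point detector from binary presence samples.

    A vehicle is counted on each 0 -> 1 transition (rising edge), i.e. each
    time the detector changes from absence (0) to presence (1) of a vehicle.
    """
    count = 0
    previous = 0
    for sample in presence_samples:
        if sample == 1 and previous == 0:
            count += 1
        previous = sample
    return count

# Two vehicles pass during a ten-sample (e.g. ten-second) window:
# count_vehicles([0, 1, 1, 0, 0, 0, 1, 1, 1, 0]) -> 2
```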

    [0095] At 406, traffic state data is generated based on the point detector data for each location. As at step 404, the traffic state data may be generated, e.g., by a point detector controller at each point detector, by a single point detector controller in communication with multiple point detectors, or by the traffic signal controller. In some embodiments, the traffic state data indicates vehicle traffic data for each location for each of a plurality of time periods. In some embodiments, the vehicle traffic data for each location for each time period is a binary value indicating the presence or absence of a vehicle at the location during the time period. In some embodiments, the vehicle traffic data for each location for each time period is a numerical value indicating the number of vehicles passing through the location during the time period. In some embodiments, the traffic state data indicates vehicle traffic data for each location for a single time period or for a single point in time. It will be appreciated that other configurations for the vehicle traffic data are possible.

    [0096] Steps 408 through 416 may be referred to as the “temporal detector scan image generation” steps, and may be performed by the traffic signal controller (e.g., controller device 220) in some embodiments.

    [0097] At 408, temporal traffic state data is obtained. The temporal traffic state data includes first location traffic data, second location traffic data, and traffic signal data. The first location traffic data indicates a traffic state at a first location in each of one or more lanes of the traffic environment at each of a plurality of points in time. The second location traffic data indicates a traffic state at a second location in each of the one or more lanes at each of the plurality of points in time. The traffic signal data indicates a traffic signal state of each of the one or more lanes at each of the plurality of points in time.

    [0098] In some embodiments, the controller device 220 performs step 408 by receiving the first location traffic data and second location traffic data from the one or more point detector controllers as described at steps 404-406 above. As described at step 406, in some embodiments the first location traffic data and second location traffic data may be received over time as traffic state data indicating traffic state at each location for a single point in time or period of time. The traffic state data for each location may be compiled by the controller device 220 into first location traffic data and second location traffic data for a plurality of points in time or periods of time. In other embodiments, the point detector controllers may compile traffic state data for multiple points in time or periods of time and transmit the compiled data to the controller device 220.

    [0099] In an example embodiment, the point detector controllers generate point detector data by sampling each point detector once per second. The point detector data for each point detector for a given sample period (i.e. one second) consists of a binary indication of whether a vehicle is present at the time the sample is obtained (e.g., 1 for the presence of a vehicle, 0 for the absence of a vehicle). The traffic state data may consist of the samples from each point detector in the traffic environment 500 for a single sample period. The point detector controller(s) transmit the traffic state data to the traffic signal controller (e.g. controller device 220) at each sample period.
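The compilation of per-second traffic state samples into first location traffic data and second location traffic data for a plurality of time periods, as described at step 408, can be sketched as a rolling buffer. The class name, the fixed window length, and the per-lane list layout are illustrative assumptions, not the claimed implementation.

```python
from collections import deque

class TrafficStateBuffer:
    """Rolling buffer compiling per-sample traffic state data (one binary
    value per lane per one-second sample) into first location traffic data
    and second location traffic data covering the last `window` samples.
    """
    def __init__(self, num_lanes, window=60):
        self.num_lanes = num_lanes
        self.window = window
        # Each entry is one sample period: a list of num_lanes values.
        self.first = deque(maxlen=window)
        self.second = deque(maxlen=window)

    def push(self, first_sample, second_sample):
        """Append one sample period of data for all monitored lanes."""
        assert len(first_sample) == self.num_lanes
        assert len(second_sample) == self.num_lanes
        self.first.append(list(first_sample))
        self.second.append(list(second_sample))

    def full(self):
        """True once a complete window of samples has been compiled."""
        return len(self.first) == self.window
```

Because the deques are bounded, the oldest sample period is discarded automatically once the window is full, so the buffer always holds the most recent `window` seconds of traffic state data.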

    [0100] The traffic signal data may be obtained from the traffic signal controller itself. In some embodiments, as shown in FIG. 2, the controller device 220 is used to control the state of the traffic signal and thus has direct access to the state of the traffic signal for each lane (e.g., the state of each directional traffic light 202, 204, 206, 208).

    [0101] At step 410, a temporal detector scan image is generated based on the temporal traffic state data. Step 410 may include sub-steps 412, 414, and 416. At 412, the first location traffic data is processed to generate a two-dimensional first location traffic matrix. At 414, the second location traffic data is processed to generate a two-dimensional second location traffic matrix. At 416, the traffic signal data is processed to generate a two-dimensional traffic signal matrix. Step 410 and sub-steps 412 through 416 will be described with reference to FIG. 6.

    [0102] FIG. 6 shows an example schematic diagram of temporal traffic state data 250 converted into a temporal detector scan image 601. The temporal traffic state data 250 includes first location traffic data 252, second location traffic data 254, and traffic signal data 256. In the illustrated example, the first location traffic data 252, second location traffic data 254, and traffic signal data 256 are shown as two-dimensional matrices.

    [0103] The first location traffic data 252 is shown as a first location traffic matrix 603 consisting of data elements arranged along a Y axis 610 representing a plurality of traffic lanes monitored by the point detectors, and an X axis 612 representing time, e.g., a plurality of points in time or periods of time (e.g., a one-second period each). Each element of the first location traffic matrix 603 represents the traffic state (e.g., number of vehicles passing through during the time period) of the first location in each of the plurality of lanes at each time (e.g., point in time or period of time). Thus, the first location traffic matrix 603 may be generated based on data obtained from the point detectors at the first locations 502a-d.

    [0104] Similarly, the second location traffic data 254 is shown as a second location traffic matrix 605 consisting of data elements arranged along a Y axis 610 representing a plurality of traffic lanes monitored by the point detectors, and an X axis 612 representing time. Each element of the second location traffic matrix 605 represents the traffic state of the second location in each of the plurality of lanes at each time. Thus, the second location traffic matrix 605 may be generated based on data obtained from the point detectors at the second locations 504a-d.

    [0105] The traffic signal data 256 is shown as a traffic signal matrix 607 consisting of data elements arranged along a Y axis 610 representing a plurality of traffic lanes monitored by the point detectors, and an X axis 612 representing time. Each element of the traffic signal matrix 607 represents the traffic signal state of each of the plurality of lanes at each time. In some embodiments, the value of each element may be a first value indicating a green light traffic signal state for that lane or a second value indicating an amber or red light traffic signal state for that lane. Other embodiments may use further values to distinguish amber from red, and/or further values to distinguish advance green turn arrows from regular green lights.

    [0106] The temporal detector scan image 601 is generated at step 410 by arranging, concatenating, or otherwise combining the three matrices 603, 605, 607 into a single three-channel image, wherein each element of each matrix is analogous to a pixel value of the image. The temporal detector scan image 601 may be used as input to a deep learning module (e.g., deep learning module 240), which may process the temporal detector scan image 601 using image processing techniques used in deep learning to generate traffic signal control data, as described in detail below in the Example Traffic Signal Control Data section.
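The combination of the three matrices into a single three-channel image can be sketched as follows, with each "pixel" holding one channel value from each matrix. The function name and the channel ordering (first location, second location, traffic signal) are illustrative choices; the description above does not mandate a particular ordering.

```python
def make_scan_image(first_matrix, second_matrix, signal_matrix):
    """Combine the first location traffic matrix, second location traffic
    matrix, and traffic signal matrix into one three-channel image.

    Each input is a lanes x times matrix (list of rows, one row per lane
    along the Y axis, one column per time along the X axis). The result is
    image[lane][time] = (first, second, signal), analogous to an RGB pixel.
    """
    lanes = len(first_matrix)
    times = len(first_matrix[0])
    # All three matrices must share the same lanes x times shape.
    for m in (first_matrix, second_matrix, signal_matrix):
        assert len(m) == lanes and all(len(row) == times for row in m)
    return [
        [(first_matrix[l][t], second_matrix[l][t], signal_matrix[l][t])
         for t in range(times)]
        for l in range(lanes)
    ]
```

For example, two lanes observed over three sample periods yield a 2x3 image of three-valued pixels, which a convolutional model can consume like ordinary image data.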

    [0107] Whereas FIG. 6 shows the temporal traffic state data 250 already formatted as matrices 603, 605, 607, it will be appreciated that in some embodiments the temporal traffic state data 250 will have another format, and may be formatted as matrices 603, 605, 607 by sub-steps 412, 414, and 416 respectively. More generally, it will be appreciated that in some embodiments one or more of the described data entities (e.g., point detector data, traffic state data, and/or temporal traffic state data 250) may have a format equivalent to the format of a predecessor data entity (e.g., the traffic state data may be equivalent to the point detector data in some embodiments), and thus the step of generating the downstream data entity (e.g., the traffic state data) may be performed trivially.

    [0108] Returning to FIG. 4, optional steps 418 and 420 may be performed by the traffic signal controller (e.g., controller device 220) to operate a deep learning module (e.g., deep learning module 240) to generate traffic signal control data by using the temporal detector scan image 601 as input.

    [0109] At 418, the temporal detector scan image 601 is provided as input to the deep learning module 240. This step 418 may include known deep learning techniques for preprocessing image data used as input to a deep learning model. In some examples, the temporal detector scan image 601 may be used as training data to train the deep learning model of the deep learning module 240, as described in greater detail in reference to FIG. 7 below. In other examples, the temporal detector scan image 601 may be used as input to a trained deep learning module (e.g., trained using the method 700 described below with reference to FIG. 7) deployed to operate in an inference mode to control a traffic signal used by a real traffic environment.

    [0110] At 420, the temporal detector scan image 601 is processed using the deep learning module 240 to generate traffic signal control data, as described in greater detail below in the Example Traffic Signal Control Data section.

    [0111] Example Training Methods

    [0112] The deep learning module 240 used by the controller device 220 must be trained before it can be deployed to control a traffic signal in a traffic environment. In embodiments using a deep reinforcement learning module, training is carried out by supplying traffic environment data (such as the temporal traffic state data 250 described in the previous section) to the deep reinforcement learning module, using the traffic signal control data generated by the deep reinforcement learning module to control the traffic signals in the traffic environment, and then supplying traffic environment data representing the updated state of the traffic environment (such as an updated version of the temporal traffic state data 250) to the deep RL model for use in adjusting the deep RL model policy and in generating future traffic signal control data.

    [0113] FIG. 7 shows an example method 700 of training a deep reinforcement learning model to generate traffic signal control data.

    [0114] At 702, a temporal detector scan image 601 is generated based on an initial state of the traffic environment 500. This step 702 may be performed by steps 408 and 410 (and optionally steps 402 through 406) of method 400 described in the previous section.

    [0115] At 704, upon receiving the temporal detector scan image 601, the RL model applies its policy to the temporal detector scan image 601 and optionally one or more past temporal detector scan images to generate traffic signal control data, as described in greater detail in the Example Traffic Signal Control Data section below.

    [0116] At 706, the traffic signal control data is applied to a real or simulated traffic signal. In the case of a real traffic environment using real traffic signals, the controller device 220 may send control signals to the traffic signal (e.g., lights 202, 204, 206, 208) to effect the decisions dictated by the traffic signal control data. In the case of a simulated traffic environment, the RL model provides the traffic signal control data to a simulator module, which simulates a response of the traffic environment to the traffic signal control decisions dictated by the traffic signal control data.

    [0117] At 708, an updated state of the real or simulated traffic environment is determined. The updated traffic state may be represented in some embodiments by updated temporal traffic state data 250 as described above with reference to FIG. 6. The updated temporal traffic state data 250 may include data elements corresponding to times (e.g., along X axis 612) that are subsequent to the point in time at which the traffic signal decision of step 706 was applied to the traffic signal of the traffic environment.

    [0118] At 710, a new temporal detector scan image 601 is generated based on the updated state of the traffic environment determined at step 708. In some embodiments, step 710 may be performed by the controller device 220 by performing steps 408 and 410 (and optionally steps 402 through 406) of method 400 described above.

    [0119] At 712, a reward function of the deep RL module is applied to the initial state of the traffic environment and the updated state of the traffic environment to generate a reward value.

    [0120] At 714, the deep RL module adjusts its policy based on the reward generated at step 712. The weights or parameters of the deep RL model may be adjusted using RL techniques, such as PPO actor-critic or DQN deep reinforcement learning techniques.

    [0121] The method 700 then returns to step 704 to repeat the processing of a temporal detector scan image 601, with the temporal detector scan image 601 generated at step 710 now indicating the updated state of the traffic environment determined at step 708. This loop may be repeated one or more times (typically at least hundreds or thousands of times) to continue training the RL model.
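The training loop of method 700 (steps 702 through 714) can be sketched as follows. The environment, policy, and reward function interfaces shown here (`scan_image`, `apply`, `act`, `update`) are illustrative assumptions standing in for the real or simulated traffic environment and the deep RL model; they are not the claimed implementation.

```python
def train(env, policy, reward_fn, iterations=100):
    """Sketch of training method 700: the environment supplies temporal
    detector scan images, the policy produces traffic signal control data,
    and the reward function compares successive environment states.
    """
    state = env.scan_image()                  # step 702: initial scan image
    for _ in range(iterations):
        action = policy.act(state)            # step 704: apply policy
        env.apply(action)                     # step 706: actuate the signal
        new_state = env.scan_image()          # steps 708-710: updated image
        reward = reward_fn(state, new_state)  # step 712: reward value
        policy.update(state, action, reward)  # step 714: adjust policy
        state = new_state                     # loop back to step 704

# Minimal stand-ins demonstrating the assumed interfaces:
class StubEnv:
    def __init__(self):
        self.cleared = 0          # a real env would track actual traffic
    def scan_image(self):
        return self.cleared       # a real env would return the image
    def apply(self, action):
        self.cleared += 1         # pretend one vehicle clears per step

class StubPolicy:
    def act(self, state):
        return 0                  # a real policy would run the model
    def update(self, state, action, reward):
        self.updates = getattr(self, "updates", 0) + 1
```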

    [0122] Thus, method 700 may be used to train the RL model and update the parameters of its policy, in accordance with known reinforcement learning techniques using image data as input.

    [0123] Examples of Traffic Signal Control Data

    [0124] The deep learning module 240 processes the temporal detector scan image 601 used as input to generate traffic signal control data. The traffic signal control data may be used to make decisions regarding the control (i.e. actuation) of the traffic signal. The action space used by the deep learning module 240 in generating the traffic signal control data may be a continuous or large numerical action space, such as a natural number space or a range of real numbers, or a small discrete action space, such as a decision between extending a traffic signal phase for one second or advancing to the next traffic signal phase.

    [0125] Some embodiments generate traffic signal control data comprising a phase duration for at least one phase of a cycle of the traffic signal. The traffic signal control data may thus be one or more phase durations of one or more respective phases of a traffic signal cycle. In some embodiments, each phase duration is a value selected from a continuous range of values. This selection of a phase duration from a continuous range of values may be enabled in some examples by the use of an actor-critic RL model, as described in detail above.

    [0126] In some embodiments, the traffic signal control data includes phase durations for each phase of at least one cycle of the traffic signal. In other embodiments, the traffic signal control data includes a phase duration for only one phase of a cycle of the traffic signal. Cycle-level control and phase-level control may present trade-offs between granularity and predictability.

    [0127] Embodiments operating at cycle-level or phase-level control of the traffic signal may have relatively low frequency interaction with the traffic signal relative to second-level controllers: a cycle-level controller may send control signals to the traffic signal once per cycle, for example at the beginning of the cycle, whereas a phase-level controller may send control signals to the traffic signal once per phase, for example at the beginning of the phase.

    [0128] In some embodiments, phase-level or cycle-level control may be constrained to a fixed sequence of phases (e.g., the eight sequential phases 102 through 116 shown in FIG. 1), but may dictate durations for the phases. In other embodiments, one or more of the phases in the sequence may be omitted, or the sequence of phases may be otherwise reordered or modified. Constraining the sequence of phases may have advantages in terms of conforming to driver expectations, at the cost of potentially sacrificing some flexibility and therefore potentially some efficiency.

    [0129] Thus, for a traffic signal having P phases per cycle (e.g., P=8 in the example of FIG. 1), the output of a deep learning module 240 using cycle-level control may be P natural numbers, each indicating the length of a traffic signal phase. A deep learning module 240 using phase-level control may generate only one natural number indicating the length of a traffic signal phase. Other embodiments may generate different numbers of phase durations.

    [0130] In some embodiments, the phase durations generated by the deep learning module 240 are selected from a continuous range, such as the positive real numbers, rather than from the natural numbers. The use of an actor-critic RL model (such as a PPO model) may enable the generation of phase durations selected from a continuous range of values, rather than a limited number of discrete values (such as 5-second or 10-second intervals as in existing approaches).
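Mapping a continuous actor output to a bounded phase duration can be sketched as below. The sigmoid squashing and the 5-to-60-second bounds are illustrative assumptions; the description above requires only that each duration come from a continuous range.

```python
import math

def to_phase_durations(actor_outputs, min_s=5.0, max_s=60.0):
    """Map raw continuous actor outputs (one value per phase, P values for
    cycle-level control) to bounded phase durations in seconds.
    """
    durations = []
    for x in actor_outputs:
        squashed = 1.0 / (1.0 + math.exp(-x))      # sigmoid into (0, 1)
        durations.append(min_s + squashed * (max_s - min_s))
    return durations

# For P = 8 phases, eight raw outputs yield eight durations in [5, 60] s.
```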

    [0131] Other embodiments generate traffic signal control data comprising a decision between extending a current phase of a cycle of the traffic signal, and advancing to a next phase of the cycle of the traffic signal. This decision may be implemented on a per-time-period (e.g. per-second) basis. In a second-based control approach with a fixed order of phases in each cycle, the controller has to decide either to extend the current green phase or to switch to the next phase, which leads to a discrete action space of size two (e.g., 0=extend, 1=switch). In some embodiments, second-based control may also include flexible ordering of phases within each cycle, as described above with reference to cycle-based or phase-based control.
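The size-two discrete action space can be sketched as a greedy choice over two action values, in the style of a DQN controller at inference time. The Q-values are assumed to be produced elsewhere by the model; this function and its name are illustrative.

```python
def second_based_action(q_extend, q_switch):
    """Discrete action space of size two for second-based control:
    0 = extend the current green phase for one more second,
    1 = switch to the next phase in the cycle.

    Selects greedily between the two Q-values; ties favor extending.
    """
    return 0 if q_extend >= q_switch else 1
```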

    [0132] As described above, a PPO deep reinforcement learning module may be particularly suitable for cycle-based or phase-based control, whereas a DQN deep reinforcement learning module may be particularly suitable for second-based control.

    [0133] FIG. 8 shows a block diagram of an example deep learning module 240 of a traffic signal controller (e.g., controller device 220) showing a temporal detector scan image 601 as input and generated traffic signal control data 804 as output. The traffic signal control data 804 may be, e.g., cycle-based, phase-based, or second-based traffic signal control data, as described above. The deep learning module 240 is shown using a policy 802 to generate the traffic signal control data 804, as described above with reference to step 704 of method 700.

    [0134] Example Reward Functions

    [0135] Different embodiments may use different reward functions. A reward function may be based on a traffic flow metric or performance metric intended to achieve certain optimal outcomes. As described above, various embodiments may use different performance metrics, such as total throughput (the number of vehicles passing through the intersection per cycle), the longest single delay for a single vehicle over one or more cycles, or any other suitable metric, to determine reward.
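A throughput-based reward function of the kind described above can be sketched as the change in a cumulative departed-vehicle count between the initial and updated environment states (step 712 of method 700). The dictionary-based state representation and the "departed" key are illustrative assumptions; the description allows any suitable performance metric.

```python
def throughput_reward(prev_state, new_state):
    """Reward equal to the number of vehicles that cleared the intersection
    between the previous and updated traffic environment states.

    Each state is assumed to carry a cumulative count of departed vehicles.
    """
    return new_state["departed"] - prev_state["departed"]
```

A delay-based metric (e.g., penalizing the longest single vehicle delay) would follow the same shape, returning a negative value so that reducing delay increases reward.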

    [0136] Example Systems for Controlling Traffic Signals

    [0137] Once the deep learning model has been trained as described above, the controller device 220 may be deployed for use in controlling a real traffic signal in a real traffic environment. When deployed for the purpose of controlling a real traffic signal, the deep learning module 240 and other components described above operate much as described with reference to the training method 700. When deployed to control a real traffic signal, the controller device 220 may make up all or part of a system for controlling a traffic signal, and in particular a system for generating a temporal detector scan image for traffic signal control. The controller device 220 includes the components described with reference to FIG. 3, including the processor device 225 and memory 228. The deep learning module 240 stored in the memory 228 now includes a trained deep learning model, which has been trained in accordance with one or more of the techniques described above. The traffic environment used to train the reinforcement learning model is the same real traffic environment now being controlled, or a simulated version thereof. The instructions 238, when executed by the processor device 225, cause the system to carry out steps of method 700, and in particular steps 702 through 710. In some embodiments, the system continues to train the RL model during deployment by also performing steps 712 and 714.

    [0138] It will be appreciated that, in some embodiments, a system for traffic signal control may also include one or more of the other components described above, such as one or more of the point detectors 502a-d and 504a-d, one or more point detector controllers (included in, or separate from, each point detector), and/or one or more of the traffic lights 202, 204, 206, 208.

    [0139] General

    [0140] Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate.

    [0141] Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.

    [0142] The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.

    [0143] All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein intends to cover and embrace all suitable changes in technology.