LEARNING DEVICE, TRAFFIC EVENT PREDICTION SYSTEM, AND LEARNING METHOD
20220415054 · 2022-12-29
CPC classification: G06V10/778
Abstract
To provide a learning device that improves, using appropriate learning data, the accuracy of a prediction model that predicts a traffic event from a video. The learning device: detects, from a video obtained by imaging a road, an object to be detected including at least a vehicle, by a method different from that of a prediction model that predicts a traffic event on the road; generates learning data for the prediction model on the basis of the detected object and the captured video; and learns the prediction model using the generated learning data.
Claims
1. A learning device comprising: a memory; and at least one processor coupled to the memory, the at least one processor performing operations to: detect a detection target including at least a vehicle, from a video obtained by imaging a road, by a method different from a prediction model that predicts a traffic event on the road; generate learning data for the prediction model based on the detected detection target and the imaged video; and learn the prediction model using the generated learning data.
2. The learning device according to claim 1, wherein the at least one processor is further configured to select a video for detecting the detection target from the imaged video based on at least one of a prediction result using the prediction model, and weather information and a traffic situation on the road and detect the detection target from the selected video.
3. The learning device according to claim 1, wherein the at least one processor is further configured to detect the detection target from the video obtained by imaging the road by a monocular camera, based on a temporal change of the video.
4. The learning device according to claim 1, wherein the at least one processor is further configured to detect the detection target from the video obtained by imaging the road by a compound-eye camera, based on a distance between lenses in the compound-eye camera.
5. The learning device according to claim 1, wherein the at least one processor is further configured to detect the detection target from position information of the detection target calculated using light detection and ranging (LIDAR) and the video obtained by imaging the road.
6. The learning device according to claim 1, wherein the at least one processor is further configured to learn the prediction model based on the generated learning data in a case where the number of the generated learning data is equal to or more than a predetermined threshold value.
7. The learning device according to claim 1, wherein the at least one processor is further configured to update the learned prediction model in a case where an instruction to update is received.
8. A traffic event prediction system comprising: a memory; and at least one processor coupled to the memory, the at least one processor performing operations to: predict a traffic event on a road from a video obtained by imaging the road, using a prediction model; detect a detection target including at least a vehicle, from the imaged video, by a method different from the prediction model; generate learning data for the prediction model based on the detected detection target and the imaged video; and learn the prediction model using the generated learning data.
9. A learning method executed by a computer, comprising: detecting a detection target including at least a vehicle, from a video obtained by imaging a road, by a method different from a prediction model that predicts a traffic event on the road; generating learning data for the prediction model based on the detected detection target and the imaged video; and learning the prediction model using the generated learning data.
Description
BRIEF DESCRIPTION OF DRAWINGS
EXAMPLE EMBODIMENT
First Example Embodiment
[0033] Hereinafter, a first example embodiment according to the present invention will be described.
Prediction Model
[0034] A prediction model used in the present example embodiment will be described.
[0035] A prediction target of the prediction model in the present example embodiment is not limited to the vehicle statistics, and may be a traffic event on a road. For example, the prediction target may be presence or absence of traffic congestion, presence or absence of illegal parking, or presence or absence of a vehicle traveling in a wrong direction on a road.
[0036] The imaging device in the present example embodiment is not limited to a visible light camera. For example, an infrared camera may be used as the imaging device.
[0037] The number of imaging devices in the present example embodiment is not limited to two of the imaging device 50 and the imaging device 60. For example, any one of the imaging device 50 and the imaging device 60 may be used, or three or more imaging devices may be used.
Object Assumed by Present Example Embodiment
[0038] In order to facilitate understanding, an object assumed by the present example embodiment will be described.
[0039] A value of the vehicle statistics for the imaging device 60 is the vehicle statistics “2” illustrated in the vehicle statistics 80 of
[0040] When cases to be annotated are extracted using the prediction model, a prediction model 70 with low accuracy does not extract appropriate cases. As a result, appropriate learning data is not generated.
[0041] Therefore, an object of the first example embodiment is to improve the accuracy of the prediction model 70 by generating appropriate learning data.
Example of Functional Configuration of Learning Device 2000
[0042]
Hardware Configuration of Learning Device 2000
[0043]
[0044] The computer 1000 includes a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input/output interface 1100, and a network interface 1120. The bus 1020 is a data transmission path for the processor 1040, the memory 1060, the storage device 1080, the input/output interface 1100, and the network interface 1120 to transmit and receive data to and from each other. However, a method of connecting the processor 1040 and the like to each other is not limited to the bus connection.
[0045] The processor 1040 is various processors such as a central processing unit (CPU), a graphics processing unit (GPU), and a field-programmable gate array (FPGA). The memory 1060 is a main storage device achieved by using a random access memory (RAM) or the like. The storage device 1080 is an auxiliary storage device achieved by using a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like.
[0046] The input/output interface 1100 is an interface for connecting the computer 1000 and an input/output device to each other. For example, an input device such as a keyboard and an output device such as a display device are connected to the input/output interface 1100. In addition, for example, the imaging device 50 and the imaging device 60 are connected to the input/output interface 1100. However, the imaging device 50 and the imaging device 60 are not necessarily directly connected to the computer 1000. For example, the imaging device 50 and the imaging device 60 may store the acquired data in a storage device shared with the computer 1000.
[0047] The network interface 1120 is an interface for connecting the computer 1000 to a communication network. The communication network is, for example, a local area network (LAN) or a wide area network (WAN). A method of connecting the network interface 1120 to the communication network may be wireless connection or wired connection.
[0048] The storage device 1080 stores a program module that achieves each functional configuration unit of the learning device 2000. The processor 1040 reads each of these program modules into the memory 1060 and executes it, thereby achieving the function corresponding to that program module.
Flow of Processing
[0049]
Video Imaged by Imaging Device 2010
[0050] The video imaged by the imaging device 2010 will be described.
[0051] The imaging data and time indicate a date and time when each image is imaged.
Processing of Detection Unit 2020 Using Monocular Camera
[0052] An example of a method in which the detection unit 2020 detects the detection target in a case where the imaging device 2010 is a monocular camera will be described.
[0053]
[0054]
[0055] As illustrated in
[0056] Next, the detection unit 2020 calculates the change amount (u, v) from the acquired image (S210). For example, the detection unit 2020 compares the image with the image ID “0030” and the image with the image ID “0031” illustrated in
[0057] As a method of calculating the change amount, for example, there is template matching for each partial region in the image. As another calculation method, for example, there is a method of calculating local feature amounts such as scale-invariant feature transform (SIFT) features and comparing the feature amounts.
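The per-region template matching mentioned above can be sketched as a minimal sum-of-squared-differences search over a small window. This is an illustrative reconstruction, not code from the specification; the function name, patch handling, and search-window size are assumptions.

```python
import numpy as np

def match_template_ssd(prev_patch, curr_frame, top_left, search=5):
    """Find the change amount (u, v) of prev_patch within curr_frame by
    minimizing the sum of squared differences over a small search window
    centered on the patch's previous position top_left = (row, col)."""
    h, w = prev_patch.shape
    y0, x0 = top_left
    best, best_uv = None, (0, 0)
    for dv in range(-search, search + 1):          # vertical offset v
        for du in range(-search, search + 1):      # horizontal offset u
            y, x = y0 + dv, x0 + du
            if y < 0 or x < 0 or y + h > curr_frame.shape[0] or x + w > curr_frame.shape[1]:
                continue  # candidate window falls outside the frame
            diff = curr_frame[y:y + h, x:x + w].astype(float) - prev_patch.astype(float)
            ssd = np.sum(diff ** 2)
            if best is None or ssd < best:
                best, best_uv = ssd, (du, dv)
    return best_uv  # (u, v): change amount in pixels
```

In practice a library routine (for example OpenCV's template matching) or local features such as SIFT would be used, as the paragraph above notes; the brute-force search here only illustrates the idea.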
[0058] Next, the detection unit 2020 detects the vehicle 20 based on the calculated change amount (u, v) (S220).
[0059] A method for detecting the vehicle 20 using the change amount (u, v) will be described in detail. The detection unit 2020 calculates a depth distance D of the vehicle 20 based on the calculated change amount (u, v).
[0060] When the detection unit 2020 substitutes the Euclidean distance of the change amount (u, v) into the vehicle movement amount l_{t,t+1} of Equation (1), and calculates θ_t^i and θ_{t+1}^j by a predetermined method (for example, a pinhole camera model), d_t^i and d_{t+1}^j can be calculated. The depth distance D illustrated in
[0061] The detection unit 2020 can calculate the depth distance D as shown in Equation (2). The detection unit 2020 detects the vehicle 20 based on the depth distance D.
[Formula 2]
D = d_{t+1}^j sin θ_{t+1}^j = d_t^i sin θ_t^i (2)
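A possible reconstruction of the geometry behind Equations (1) and (2) can be written as a short routine. Here it is assumed that θ is the viewing angle measured from the road direction, D is the perpendicular depth distance (constant while the vehicle moves along the road), and the movement amount l equals the difference of the along-road offsets D·cot θ_t and D·cot θ_{t+1}; the exact form of Equation (1) is not reproduced in this text, so this is an assumption for illustration.

```python
import math

def depth_from_motion(theta_t, theta_t1, movement_l):
    """Recover the depth distance D and slant ranges d_t, d_{t+1} of a
    vehicle from its viewing angles at times t and t+1 and its movement
    amount l between the two frames (hypothetical reconstruction):
        l = D * (cot(theta_t) - cot(theta_t1))
        D = d_t * sin(theta_t) = d_{t+1} * sin(theta_t1)   # Equation (2)
    """
    cot_t = 1.0 / math.tan(theta_t)
    cot_t1 = 1.0 / math.tan(theta_t1)
    D = movement_l / (cot_t - cot_t1)
    d_t = D / math.sin(theta_t)     # slant range at time t
    d_t1 = D / math.sin(theta_t1)   # slant range at time t+1
    return D, d_t, d_t1
```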
Processing of Detection Unit 2020 Using Compound-Eye Camera
[0062] An example of a method in which the detection unit 2020 detects the detection target in a case where the imaging device 2010 is a compound-eye camera will be described.
[0063] In
[0064]
[0065] As illustrated in
[0066] Next, the detection unit 2020 detects the vehicle 20 based on the distance b between the lenses of the imaging devices (S310). For example, the detection unit 2020 calculates the depth distance D of the vehicle 20 from the imaging device 50 and the imaging device 60 using the principle of triangulation from the two images having the relative parallax and the distance b between the lenses, and detects the vehicle 20 based on the calculated distance.
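The triangulation in S310 reduces, for a rectified stereo pair, to the standard relation D = f·b / disparity. The following sketch assumes a rectified pair and a focal length given in pixels; the function name and parameters are illustrative.

```python
def stereo_depth(f_pixels, baseline_b, x_left, x_right):
    """Depth of a vehicle point from a rectified stereo pair by triangulation.
    f_pixels: focal length in pixels; baseline_b: distance b between lenses;
    x_left/x_right: horizontal image coordinates of the same point in the
    two images (their difference is the relative parallax/disparity)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    return f_pixels * baseline_b / disparity
```

For example, with a 700-pixel focal length, a 0.5 m baseline, and a 35-pixel disparity, the vehicle point lies at a depth of 10 m.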
[0067] Here, the case where the imaging device 2010 includes two or more lenses is described. However, the detection unit 2020 is not limited to using a single imaging device. For example, the detection unit 2020 may detect the vehicle based on the images from two different imaging devices and the distance between those imaging devices.
Processing of Detection Unit 2020 Using Light Detection and Ranging (LIDAR)
[0068] An example of a method in which the detection unit 2020 detects the detection target using light detection and ranging (LIDAR) instead of the imaging device 2010 will be described.
[0069]
[0070]
[0071] In
[0072]
[0073] As illustrated in
[0074] Next, the reception unit of the LIDAR 150 receives the laser light reflected from the vehicle 20 (S410). For example, the reception unit of the LIDAR 150 receives the laser light reflected from the vehicle 20 traveling on the road 10 as a LIDAR point sequence, converts the laser light into an electrical signal, and inputs the electrical signal to the detection unit 2020.
[0075] Next, the detection unit 2020 detects the vehicle 20 based on the electrical signal input from the LIDAR 150 (S420). For example, the detection unit 2020 detects position information of a surface (front surface, side surface, rear surface) of the vehicle 20 based on the electrical signal input from the LIDAR 150.
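The steps S400 to S420 can be sketched as follows: a return's time of flight gives its range (half the round-trip distance at the speed of light), a scan angle turns the range into a position, and a simple cluster check stands in for detecting a vehicle surface. The cluster thresholds and function names are assumptions for illustration, not values from the specification.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def lidar_point(time_of_flight_s, azimuth_rad):
    """Convert one LIDAR return into a 2-D position on the road plane.
    Range = c * t / 2 (round trip), then polar -> Cartesian."""
    r = C * time_of_flight_s / 2.0
    return (r * math.cos(azimuth_rad), r * math.sin(azimuth_rad))

def cluster_is_vehicle(points, min_points=5, max_extent_m=6.0):
    """Crude check that a cluster of returns could be a vehicle surface:
    enough points, and an extent no larger than a typical vehicle."""
    if len(points) < min_points:
        return False
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs) <= max_extent_m and max(ys) - min(ys) <= max_extent_m
```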
Processing of Generation Unit 2030
[0076] Processing of the generation unit 2030 will be described.
[0077] The label assigned by the generation unit 2030 is not limited to binary (“0” and “1”). The generation unit 2030 may determine the acquired detection target and assign a multi-value label. For example, the generation unit 2030 may give labels such as “1” in a case where the acquired detection target is a pedestrian, “2” in a case where the acquired detection target is a bicycle, and “3” in a case where the acquired detection target is a truck.
[0078] As an example of a method of determining the acquired detection target, for example, there is a method of determining whether the acquired detection target satisfies a predetermined condition (for example, conditions for the height, color histogram, and area of the detection target) for each label.
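The per-label conditions described above might be sketched as a chain of threshold tests on the detected object's attributes. The thresholds and the attribute names below are hypothetical placeholders; the specification only says that conditions on height, color histogram, and area are checked for each label.

```python
def assign_label(detection):
    """Assign a multi-value label to a detected object by checking a
    predetermined condition per label (thresholds are illustrative).
    Labels: 1 = pedestrian, 2 = bicycle, 3 = truck, 0 = other."""
    h = detection["height_m"]   # estimated height of the detection target
    a = detection["area_m2"]    # estimated footprint area
    if h > 2.2 and a > 8.0:
        return 3  # truck: tall with a large footprint
    if 1.4 < h <= 2.0 and a < 1.0:
        return 1  # pedestrian: person-height, small footprint
    if 0.9 < h <= 1.4 and a < 2.0:
        return 2  # bicycle
    return 0      # none of the per-label conditions matched
```

A real implementation would also consult the color histogram mentioned in the paragraph above; it is omitted here to keep the sketch short.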
Processing of Learning Unit 2040
[0079] Processing of the learning unit 2040 will be described. The learning unit 2040 learns the prediction model 70 based on the generated learning data in a case where the number of pieces of generated learning data is equal to or more than a predetermined threshold value. Examples of the learning method of the learning unit 2040 include neural networks, linear discriminant analysis (LDA), support vector machines (SVM), and random forests (RF).
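The threshold-triggered behavior of the learning unit can be sketched as a small buffer that only invokes training once enough samples have accumulated. The class name and the pluggable `train_fn` are illustrative; the actual trainer would fit one of the models listed above (SVM, neural network, etc.).

```python
class LearningUnit:
    """Train the prediction model only once the number of pieces of
    learning data reaches a predetermined threshold."""

    def __init__(self, threshold, train_fn):
        self.threshold = threshold
        self.train_fn = train_fn   # e.g. fits an SVM or neural network
        self.buffer = []           # accumulated (sample, label) pairs

    def add(self, sample, label):
        self.buffer.append((sample, label))

    def maybe_learn(self):
        """Return a newly learned model, or None if below the threshold."""
        if len(self.buffer) >= self.threshold:
            model = self.train_fn(self.buffer)
            self.buffer = []  # a sliding window could be kept instead
            return model
        return None
```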
Action and Effect
[0080] As described above, the learning device 2000 according to the present example embodiment can generate appropriate learning data without depending on the accuracy of the prediction model by detecting the detection target by the method different from the prediction model. As a result, the learning device 2000 can improve the accuracy of the prediction model that predicts the traffic event from the video by learning the prediction model using appropriate learning data.
Second Example Embodiment
[0081] Hereinafter, a second example embodiment according to the present invention will be described. The second example embodiment is different from the first example embodiment in that a selection unit 2050 is provided. Details will be described below.
Example of Functional Configuration of Learning Device 2000
[0082]
Flow of Processing
[0083]
Selection Condition
[0084] In the second example embodiment, information stored in the condition storage unit 2012 will be described.
[0085] As illustrated in
[0086] When the indices are the "weather information" and "traffic situation", the selection unit 2050 selects a video based on the imaging date and time of the imaged video and on the weather information and road traffic situation acquired from an external source.
[0087] Alternatively, when the indices are the "weather information" and "traffic situation", the selection unit 2050 may extract the weather information and the road traffic situation from the acquired video itself and select the video accordingly.
Selection Method of Selection Unit 2050
[0088] An example of a method in which the selection unit 2050 selects the video for detecting the detection target will be described.
[0089] As illustrated in
[0090] Next, the selection unit 2050 determines whether the acquired prediction result satisfies the condition (“10 or less per hour” illustrated in
[0091] When the selection unit 2050 determines that the prediction result satisfies the condition (S620; YES), the selection unit 2050 selects the acquired video as the video for detecting the detection target (S630).
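Steps S600 to S630 can be sketched as a filter over candidate videos, keeping only those whose prediction result satisfies the condition (for example, 10 vehicles or less per hour). The function name and the shape of `predict_fn` are assumptions for illustration.

```python
def select_videos(videos, predict_fn, max_per_hour=10):
    """Select, as videos for detecting the detection target, only those
    whose predicted traffic volume satisfies the selection condition."""
    selected = []
    for video in videos:
        predicted = predict_fn(video)   # predicted vehicles per hour (S610)
        if predicted <= max_per_hour:   # condition check (S620)
            selected.append(video)      # select the video (S630)
    return selected
```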
[0092] In the present example embodiment, the case where the index is the “prediction result of the prediction model” is described. However, the selection unit 2050 may combine the indices illustrated in
Action and Effect
[0093] As described above, since the learning device 2000 according to the present example embodiment selects, for example, the video with a small traffic volume and detects the detection target, a possibility of erroneously detecting a vehicle is reduced, and thus, the detection target can be detected with high accuracy. As a result, the learning device 2000 can generate appropriate learning data, and can improve the accuracy of the prediction model that predicts the traffic event from the video.
Third Example Embodiment
[0094] Hereinafter, a third example embodiment according to the present invention will be described. The third example embodiment is different from the first and second example embodiments in that an update unit 2060 is provided. Details will be described below.
Example of Functional Configuration of Learning Device 2000
[0095]
Flow of Processing
[0096]
Determination Method of Update Unit 2060
[0097] An example of a method in which the update unit 2060 performs update determination of the prediction model will be described. The update unit 2060 receives an instruction as to whether to update the learned prediction model from the user 2013. When receiving an instruction for update, the update unit 2060 updates the prediction model stored in the prediction model storage unit 2011.
[0098] For example, the update unit 2060 applies the video acquired from the imaging device 2010 to both the prediction model before learning and the learned prediction model, and displays the obtained prediction results on a terminal used by the user 2013. The user 2013 confirms the displayed prediction results and, for example, in a case where the results of the two prediction models differ, inputs an instruction as to whether to update the prediction model to the update unit 2060 via the terminal.
[0099] In the present example embodiment, the case where the update unit 2060 receives an instruction for update from the user 2013 is described. However, the update unit 2060 may determine whether to update the prediction model without receiving an instruction from the user 2013. For example, in a case where the prediction results of the two prediction models described above are different, the update unit 2060 may determine to update the prediction model.
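Both variants of the update determination (explicit user instruction, or automatic update when the two models disagree) can be sketched together. The class and method names are illustrative, and `predict_fn` stands in for applying a prediction model to a video.

```python
class UpdateUnit:
    """Sketch of the update unit 2060: holds the deployed prediction model
    (standing in for the prediction model storage unit 2011) and replaces
    it with the learned model when instructed, or automatically when the
    before/after models disagree on the acquired video."""

    def __init__(self, deployed_model):
        self.deployed_model = deployed_model

    def maybe_update(self, learned_model, video, predict_fn, user_instruction=None):
        if user_instruction is not None:
            do_update = user_instruction  # explicit instruction from the user
        else:
            # automatic determination: update when predictions differ
            do_update = predict_fn(self.deployed_model, video) != predict_fn(learned_model, video)
        if do_update:
            self.deployed_model = learned_model
        return do_update
```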
Action and Effect
[0100] As described above, the learning device 2000 according to the present example embodiment presents to the user the prediction result of the prediction model before learning and that of the prediction model after learning, and receives an update instruction. After comparing the prediction results of the models before and after learning, the user instructs whether to replace the pre-learning prediction model with the post-learning prediction model. Accordingly, the learning device 2000 can improve the accuracy of the prediction model.
[0101] The learning device 2000 of the present example embodiment may further include the selection unit 2050 described in the second example embodiment.
Fourth Example Embodiment
[0102] Hereinafter, a fourth example embodiment according to the present invention will be described.
Example of Functional Configuration of Traffic Event Prediction System 3000
[0103]
[0104] In parallel with the prediction unit 3010, the detection unit 3020, the generation unit 3030, and the learning unit 3040 learn the prediction model and update the prediction model stored in the prediction model storage unit 2011. That is, the prediction unit 3010 performs prediction using the prediction model as appropriately updated by the learning unit 3040.
Action and Effect
[0105] As described above, the traffic event prediction system 3000 according to the present example embodiment can accurately predict a traffic event by using a prediction model learned using appropriate learning data.
[0106] The traffic event prediction system 3000 of the present example embodiment may further include the selection unit 2050 described in the second example embodiment and the update unit 2060 described in the third example embodiment.
[0107] In the present example embodiment, the case where both the prediction unit 3010 and the detection unit 3020 use the imaging device 2010 is described. However, the prediction unit 3010 and the detection unit 3020 may use different imaging devices.
[0108] The invention of the present application is not limited to the above example embodiments, and can be embodied by modifying the components without departing from the gist thereof at the implementation stage. Various inventions can be formed by appropriately combining a plurality of components disclosed in the above example embodiments. For example, some components may be deleted from all the components shown in the example embodiments. The components of different example embodiments may be appropriately combined.
REFERENCE SIGNS LIST
[0109] 10 road
[0110] 20 vehicle
[0111] 30 vehicle
[0112] 40 vehicle
[0113] 50 imaging device
[0114] 60 imaging device
[0115] 70 prediction model
[0116] 80 vehicle statistics
[0117] 90 house
[0118] 100 vehicle statistics
[0119] 150 LIDAR
[0120] 1000 computer
[0121] 1020 bus
[0122] 1040 processor
[0123] 1060 memory
[0124] 1080 storage device
[0125] 1100 input/output interface
[0126] 1120 network interface
[0127] 2000 learning device
[0128] 2010 imaging device
[0129] 2011 prediction model storage unit
[0130] 2012 condition storage unit
[0131] 2013 user
[0132] 2020 detection unit
[0133] 2030 generation unit
[0134] 2040 learning unit
[0135] 2050 selection unit
[0136] 2060 update unit
[0137] 3000 traffic event prediction system
[0138] 3010 prediction unit
[0139] 3020 detection unit
[0140] 3030 generation unit
[0141] 3040 learning unit