TRAFFIC HAND SIGNAL DETECTION SYSTEM AND METHOD THEREOF
20230169797 · 2023-06-01
Inventors
Cpc classification
G06V20/56
PHYSICS
International classification
Abstract
A traffic hand signal detection system includes: an imaging unit configured to acquire a photographed image from a camera photographing a predetermined range; an image classifier configured to classify an arm motion from the photographed image provided from the imaging unit by imparting a class; a detection module configured to detect the arm motion from the photographed image classified by the image classifier and generate a traffic hand signal sequence converted into a number; and an analysis module configured to receive the traffic hand signal sequence converted into the number from the detection module and determine a type of a traffic hand signal.
Claims
1. A traffic hand signal detection system comprising: an imaging unit configured to acquire a photographed image from a camera photographing a predetermined range; an image classifier configured to classify an arm motion from the photographed image provided from the imaging unit by imparting a class; a detection module configured to detect the arm motion from the photographed image classified by the image classifier and generate a traffic hand signal sequence converted into a number; and an analysis module configured to receive the traffic hand signal sequence converted into the number from the detection module and determine a type of a traffic hand signal.
2. The traffic hand signal detection system of claim 1, wherein the detection module comprises: an arm motion detector configured to detect an arm motion performed by a hand signal performer in a photographed image sequence classified by the image classifier; and an arm motion number converter configured to convert the class of the detected arm motion image into the number.
3. The traffic hand signal detection system of claim 2, wherein the detection module further comprises a traffic hand signal sequence adjustor configured to generate the traffic hand signal sequence as a combination of numbers representing a class of an arm direction image.
4. The traffic hand signal detection system of claim 3, wherein the traffic hand signal sequence adjustor randomly generates the class of the arm direction image within a predetermined range and adds it to or deletes it from the traffic hand signal sequence.
5. The traffic hand signal detection system of claim 1, wherein the analysis module comprises: a traffic hand signal learning unit configured to learn the traffic hand signal sequence converted into the number; and a traffic hand signal determinator configured to receive the traffic hand signal sequence and determine the type of the traffic hand signal.
6. The traffic hand signal detection system of claim 5, wherein the traffic signal learning unit is one of Vanilla RNN, LSTM, GRU and LSTM GRU.
7. The traffic hand signal detection system of claim 5, wherein when a probability value of the traffic hand signal is 0.4 or more, the traffic hand signal determinator determines it as a corresponding traffic hand signal.
8. The traffic hand signal detection system of claim 5, wherein the traffic hand signal determinator comprises a fully connected layer and a softmax.
9. The traffic hand signal detection system of claim 1, wherein the analysis module further comprises a database configured to store information about the photographed traffic hand signal sequence image, the arm direction image extracted from the photographed image, and the traffic hand signal sequence expressed in the number.
10. A traffic hand signal detection method comprising: an image acquiring step of acquiring a photographed image from a camera photographing a predetermined range; an image classifying step of classifying types of arm motions from the photographed image acquired in the image acquiring step; a detecting step of detecting an arm direction motion from the classified image and generating a traffic hand signal sequence converted into a number; and an analysis step of determining a type of a traffic hand signal corresponding to the traffic hand signal sequence converted into the number based on information detected in the detecting step.
11. The traffic hand signal detection method of claim 10, wherein the detecting step further comprises adding or deleting an arbitrary traffic hand signal sequence to the converted traffic hand signal sequence.
12. The traffic hand signal detection method of claim 10, wherein the detecting step comprises learning the traffic hand signal sequence converted into the number.
13. The traffic hand signal detection method of claim 10, wherein in the analysis step, when using the traffic hand signal sequence as an input, if a signal is substantially equal to or more than a threshold value, the signal is determined as the traffic hand signal.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] The above and other features and aspects of the present disclosure will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
DETAILED DESCRIPTION
[0033] Hereinafter, a traffic hand signal detection system and a traffic hand signal detection method according to an embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. Since the present disclosure may have various changes and may have various forms, specific embodiments are illustrated in the drawings and described in detail in the description. However, this is not intended to limit the present disclosure to the specific disclosed form, and it should be understood to include all modifications, equivalents and substitutes included in the spirit and scope of the present disclosure. In describing each figure, like reference numerals have been used for like elements. In the accompanying drawings, the dimensions of the structures are shown enlarged relative to their actual size for clarity of the present disclosure.
[0034] Terms such as first, second, and the like may be used to describe various elements, but the elements should not be limited by the terms. The above terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present disclosure, a first component may be referred to as a second component, and similarly, a second component may also be referred to as a first component.
[0035] The following embodiments are detailed descriptions to help the understanding of the present disclosure, and do not limit the scope of the present disclosure. Accordingly, an invention of the same scope performing the same function as the present disclosure will also fall within the scope of the present disclosure.
[0036] The terms used in the present disclosure are only used to describe specific embodiments, and are not intended to limit the present disclosure. The singular expression includes the plural expression unless the context clearly dictates otherwise. In the present disclosure, terms such as “include (comprise)” or “have” are intended to designate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, but it is to be understood that they do not preclude the possibility of the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
[0037] Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in a commonly used dictionary should be interpreted as having a meaning consistent with the meaning in the context of the related art, and should not be interpreted in an ideal or excessively formal meaning unless explicitly defined in the present disclosure.
[0038] In general, a traffic hand signal means a signal performed by a hand signal performer to control an amount of traffic on the road and is specified by law. According to the curriculum for hand signals for out-of-office police officers in the Central Police Academy, there are 10 hand signals used by the police for traffic control. However, referring to
[0039] A traffic hand signal motion consists of a combination of 2 to 4 motions. All signals, except for a stop signal that simply raises the arm diagonally in the corresponding direction, are combinations of motions that point to the target of the signal with the arm and then move the arm to point in the next moving direction.
[0040]
[0041]
[0042] That is, regardless of the direction the police officer is facing, the traffic hand signal is completed by changing the direction in which the arm points. In addition, whether the hand signal is intended for the driver may be determined from whether the police officer's arm points to the driver. If a preceding motion points to the driver, the signal is valid for the driver. Otherwise, it is a signal for another driver in another direction, so the driver does not need to take any action. Accordingly, the traffic hand signal may be inferred from the direction of the police officer's arm and its sequence.
[0043]
[0044] Referring to
[0045] The imaging unit 200 is connected to the camera 15 to acquire a photographed image, that is, a photographed image of the front of the vehicle, from the camera 15. In such a case, the imaging unit 200 is preferably connected to the camera 15 through a wired or wireless communication network.
[0046] The imaging unit 200 photographs images from before execution of a hand signal to after the execution of the hand signal, and records them as one hand signal sequence.
[0047] Referring to
[0048] The image classifier 300 selects only an image taken from the front from among all the photographed images and annotates it as a front image. In addition, the image classifier 300 converts the traffic hand signal sequence into, for example, 15 images per second, and then performs a bounding box annotation on the arm motion based on the class of
[0049] The image classifier 300 may divide the entire set of images into a training dataset and a test dataset at a ratio of 8:2 in order to train the detection module 400. In addition, the image classifier 300 divides the training dataset into training data and validation data at a ratio of 8:2. For example, the number of classified images is shown in Table 1.
TABLE 1
Signal    Training    Valid     Test
(a)          5,470    1,367    1,689
(b)          3,700      924    1,189
(c)          3,764      940    1,191
(d)          3,681      920    1,174
(e)          5,768    1,442    1,768
(f)          3,952      988    1,194
(g)          4,322    1,080    1,359
(h)          4,148    1,037    1,289
(i)          4,369    1,092    1,389
(j)          3,819      954    1,193
(k)          4,340    1,084    1,386
(l)          3,960      990    1,230
(m)          4,193    1,048    1,286
(n)          3,984      995    1,265
(o)          3,969      992    1,221
Total       63,439   15,853   19,823
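The two successive 8:2 splits can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `split_dataset`, the fixed seed, and the use of integer indices in place of images are assumptions.

```python
import random

def split_dataset(items, test_ratio=0.2, valid_ratio=0.2, seed=0):
    """Split items into (train, valid, test) using two successive 8:2 splits."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # the seed is an assumption, for repeatability
    n_test = round(len(shuffled) * test_ratio)
    test, remainder = shuffled[:n_test], shuffled[n_test:]
    n_valid = round(len(remainder) * valid_ratio)
    valid, train = remainder[:n_valid], remainder[n_valid:]
    return train, valid, test

# 99,115 is the sum of the three totals in Table 1 (63,439 + 15,853 + 19,823).
train, valid, test = split_dataset(range(99115))
print(len(train), len(valid), len(test))
```

A single global split reproduces the test total of Table 1 exactly, while the training and validation counts come out close to, but not identical with, the table; the table may have been produced by splitting each signal separately.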
[0050] Table 2 shows an example of the number of instances of each arm motion class that the image classifier 300 assigns to the training dataset and the test dataset. Pedestrians, who are not traffic hand signal performers, are the most numerous class. The next most numerous is down, the basic motion of lowering the arm. The left and right arm motion, which stops the drivers on the left and right sides rather than the driver, is the least numerous, and the up arm motion, which appears only in the signal indicating going straight, is the next least numerous.
TABLE 2
Signal            Training      Test
down                40,123    10,056
front                7,100     1,755
back                 8,227     2,012
left                 7,299     1,840
right                6,090     1,544
up                   3,448       876
oblique              4,324     1,055
left and right       2,764       706
Pedestrian         104,683    26,351
[0051] The detection module 400 detects the arm motion of the traffic hand signal from the classified photographed image, generates a traffic hand signal sequence obtained by converting the arm motion into a number, learns the traffic hand signal, and determines a type of the traffic hand signal. Referring to
[0052] One traffic hand signal sequence may consist of about 100 images, one for each frame of the traffic hand signal motion, from before the start of the hand signal to after the end of the hand signal.
[0053] The arm motion detector 410 detects the arm motion performed by the hand signal performer in the photographed image sequence classified by the image classifier 300 in order to classify the hand signal motion.
[0054] The arm motion detector 410 may be one of YOLOv3 and YOLOv4.
[0055] Here, YOLOv4 improves on the performance of YOLOv3 by using recent deep learning techniques such as BoF (Bag of Freebies) and BoS (Bag of Specials). Furthermore, YOLOv4 uses Mosaic Data Augmentation, which augments data by placing multiple classes in one image, and DropBlock Regularization, which, instead of dropping units at random as Dropout does, drops a contiguous region of a predetermined range. In addition, YOLOv4 mitigates overfitting by using class label smoothing, which changes labels previously expressed as 0 and 1 into probabilities such as 0.1 and 0.9.
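Class label smoothing can be illustrated with the commonly used uniform formulation, y * (1 - epsilon) + epsilon / K for K classes. The sketch below is an assumption about the general technique, not a reproduction of the YOLOv4 code; the function name `smooth_labels` and the epsilon values are chosen only for illustration.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: y * (1 - epsilon) + epsilon / K."""
    k = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / k for y in one_hot]

# Two-class case: epsilon = 0.2 turns the hard labels 0 and 1
# into approximately 0.1 and 0.9.
print(smooth_labels([0, 1], epsilon=0.2))
```

The smoothed labels still sum to 1, so they remain a valid probability distribution over the classes.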
[0056] The arm motion number converter 420 converts each class in the detected arm motion image into numbers, for example, from 0 to 8 in order. That is, the arm motion number converter 420 assigns a number to the extracted arm direction image for each class. Referring to Table 3, the traffic hand signal may indicate go straight, turn right, turn left, stop, and an invalid signal with a combination of numbers of the arm direction.
TABLE 3
Signal           Traffic hand signal                        Arm motion
Go straight      (a) front to back                          front(1) - up(5) - back(2)
                 (g) right to left
                 (k) left to right
Turn right       (b) front to left                          front(1) - right(4)
                 (f) right to front
                 (l) left to back
                 (m) back to right
Turn left        (c) front to right                         front(1) - left(3)
                 (h) right to back
                 (j) left to front
                 (n) back to left
Stop             (d) front stop                             oblique(6)
                 (o) back stop
                 (f) front and back simultaneous stop
Invalid signal   (i) right and left simultaneous stop       left and right(7)
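The conversion performed by the arm motion number converter 420 can be sketched as a lookup from class name to number. The numbers 1 to 7 follow the annotations in Table 3; assigning 0 to down and 8 to pedestrian is an assumption based on the class order of Table 2, and the names `ARM_CLASS_TO_NUMBER` and `to_number_sequence` are hypothetical.

```python
# Numbers 1-7 match the arm motion annotations in Table 3.
# Placing "down" at 0 and "pedestrian" at 8 is an assumption based on
# the class order of Table 2 and the stated range of 0 to 8.
ARM_CLASS_TO_NUMBER = {
    "down": 0, "front": 1, "back": 2, "left": 3, "right": 4,
    "up": 5, "oblique": 6, "left and right": 7, "pedestrian": 8,
}

def to_number_sequence(frame_classes):
    """Convert per-frame arm motion class names into a digit string."""
    return "".join(str(ARM_CLASS_TO_NUMBER[c]) for c in frame_classes)

frames = ["front"] * 3 + ["up"] * 2 + ["back"] * 4
print(to_number_sequence(frames))  # 111552222
```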
[0057] The traffic hand signal sequence adjustor 430 generates six types of traffic hand signal sequences that are meaningful to the driver by combining numbers representing the classes of images indicating the arm direction. In addition, the traffic hand signal sequence adjustor 430 may randomly generate a class of an arm direction image within a predetermined range and add it to the traffic hand signal sequence. An arm direction label, which is detected by the arm motion detector 410 and assigned a class, is not cut to a predetermined length when it is transmitted to the RNN, which is the traffic hand signal learning unit 510; instead, labels are delivered for each frame in a continuous stream. Accordingly, if a traffic hand signal motion is performed after a period of no motion, there is a possibility that the input is classified as a completely different hand signal before the hand signal has even started. To prevent this, the traffic hand signal sequence adjustor 430 may generate a sequence of random length consisting only of zeros. Additionally, the traffic hand signal sequence adjustor 430 may generate a sequence in which a short run of a number, other than the number of the stop signal and the number indicating the driver, is appended to a sequence of zeros.
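The generation of artificial inactive sequences can be sketched as follows. This is an illustration under stated assumptions: the function name `make_inactive_sequence`, the length bounds, the probability of appending a tail, and the maximum tail length are all hypothetical values, and the stop and driver-pointing classes are taken to be oblique (6) and front (1).

```python
import random

STOP, TO_DRIVER = 6, 1  # assumed: oblique (stop) and front (pointing at the driver)

def make_inactive_sequence(rng, min_len=6, max_len=91, tail_prob=0.5, max_tail=6):
    """Return a digit string of zeros, optionally ending in a short non-signal run."""
    sequence = "0" * rng.randint(min_len, max_len)
    if rng.random() < tail_prob:
        # Any arm class except 0, the stop class, and the class pointing at the driver.
        digit = rng.choice([d for d in range(1, 8) if d not in (STOP, TO_DRIVER)])
        sequence += str(digit) * rng.randint(1, max_tail)
    return sequence

rng = random.Random(0)  # fixed seed only for repeatability
for _ in range(3):
    print(make_inactive_sequence(rng))
```

Sequences of this shape give the recurrent network examples in which nothing, or only a brief irrelevant arm motion, follows a stretch of inactivity.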
[0058] The length of the generated sequences ranges from a minimum of 6 to a maximum of 91, with an average length of 33.8 and a standard deviation of 9.4. Considering that the dataset is obtained at 15 frames/sec, it may be inferred that most of the hand signals last between about 2 and 4 seconds.
[0059] The traffic hand signal sequence adjustor 430 randomly shuffles the number-converted traffic hand signal sequences and divides them into training, validation, and test sets at a ratio of 6:2:2. In addition, the traffic hand signal sequence adjustor 430 adjusts the length of each traffic hand signal sequence. That is, when a traffic hand signal sequence is longer than the input length, the traffic hand signal sequence adjustor 430 truncates the sequence from the rear. On the other hand, when the sequence is shorter than the input length, zero padding is performed to fill the front part of the sequence with zeros.
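The length adjustment can be sketched as below, assuming that a sequence longer than the input length is truncated from the rear and a shorter one is zero-padded at the front. The function name `fit_to_input_length` is hypothetical.

```python
def fit_to_input_length(sequence, input_length):
    """Truncate from the rear or zero-pad the front so the result has input_length items."""
    if len(sequence) >= input_length:
        return sequence[:input_length]  # drop the rear part
    return [0] * (input_length - len(sequence)) + sequence  # pad the front with zeros

print(fit_to_input_length([1, 1, 5, 2, 2], 8))  # [0, 0, 0, 1, 1, 5, 2, 2]
print(fit_to_input_length([1, 1, 5, 2, 2], 3))  # [1, 1, 5]
```

Padding at the front keeps the most recent arm directions at the end of the input, which is where a recurrent network reads them last.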
[0060] Referring to
[0061] The traffic hand signal learning unit 510 according to an embodiment of the present disclosure learns a pedestrian class and eight traffic hand signal classes according to the arm direction expressed in numbers. The hyperparameters are 100,000 iterations, an IoU of 0.5, and a batch size of 64. The traffic hand signal learning unit 510 performs learning with the training set and validation set of Tables 2 and 3. 100,000 iterations correspond to about 80 epochs. As a result of learning, the final mAP was 91.3%.
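The relation between iterations and epochs can be checked with a short calculation. It assumes that an epoch is counted over the combined training and validation images of Table 1 (63,439 + 15,853); that assumption is not stated in the description, but it is the reading under which the figure of about 80 epochs comes out.

```python
iterations, batch_size = 100_000, 64
train_images, valid_images = 63_439, 15_853  # totals from Table 1

samples_seen = iterations * batch_size            # 6,400,000 images processed
epochs = samples_seen / (train_images + valid_images)
print(f"{epochs:.1f} epochs")  # 80.7 epochs
```

Counting an epoch over the 63,439 training images alone would instead give roughly 100.9 epochs.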
[0062] The traffic hand signal learning unit 510 includes RNN for a skeleton-based action recognition to analyze motion. In an embodiment, the traffic hand signal learning unit 510 may be one of Vanilla RNN, LSTM, GRU, and LSTM GRU.
[0063] RNN is a kind of deep learning model in which hidden nodes are connected by directed edges to form a directed cycle. Vanilla RNN is the basic RNN model.
[0064] LSTM is a type of RNN with more complex cells than Vanilla RNN, and it works better than Vanilla RNN on longer input data. The most significant characteristic of LSTM is its cell state. Whereas a basic RNN has a repeating module with a single layer, LSTM adds three sigmoid layers that determine which information is memorized and which is forgotten. Bi-LSTM refers to a network in which an LSTM layer that processes the input in the reverse direction is added to the existing LSTM in order to solve a data bottleneck that occurs in the LSTM. Since the reverse layer is added, end-to-end learning that takes the entire chronologically ordered input into account is possible.
[0065] GRU is a neural network that serves the same role as LSTM, in which sigmoid layers are added to the RNN, but improves computational efficiency by simplifying the structure of LSTM. Unlike LSTM, which has input and forget gates, GRU includes a reset gate and an update gate, requiring only two sigmoid operations and one tanh operation.
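The reset gate and update gate structure, with its two sigmoid operations and one tanh operation, can be sketched as a single recurrent step in plain Python. This is a textbook-style illustration rather than the disclosed network: the parameter names (`W_r`, `W_z`, `W_h` and the biases), the layer sizes, and the random weights are assumptions, and the sign convention of the update gate varies between formulations.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gru_step(x, h, params):
    """One GRU step: two sigmoid gates (reset, update) and one tanh candidate."""
    xh = x + h  # concatenate the input and the previous hidden state
    r = [sigmoid(a + b) for a, b in zip(matvec(params["W_r"], xh), params["b_r"])]
    z = [sigmoid(a + b) for a, b in zip(matvec(params["W_z"], xh), params["b_z"])]
    x_rh = x + [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    cand = [math.tanh(a + b) for a, b in zip(matvec(params["W_h"], x_rh), params["b_h"])]
    # The update gate interpolates between the old state and the candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

# Hypothetical sizes: 9 one-hot arm classes in, 4 hidden units.
n_in, n_hid = 9, 4
rng = random.Random(0)
params = {name: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hid)]
                 for _ in range(n_hid)] for name in ("W_r", "W_z", "W_h")}
params.update({name: [0.0] * n_hid for name in ("b_r", "b_z", "b_h")})

h = [0.0] * n_hid
for digit in [1, 1, 5, 5, 2, 2]:  # a short front - up - back stream
    x = [1.0 if i == digit else 0.0 for i in range(n_in)]
    h = gru_step(x, h, params)
print(h)
```

Each step consumes one arm direction label, which matches the frame-by-frame streaming of labels into the recurrent network.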
[0066]
[0067] Referring to
[0068] Referring to
[0069] Table 4 shows the evaluation accuracy and the test accuracy of each RNN according to the number of hidden layers. The maximum test accuracy of each RNN is as follows: Vanilla RNN: 89%, LSTM: 95.47%, Bi-LSTM: 95.79%, and GRU: 95.79%. The two configurations with the highest test accuracy are Bi-LSTM with 4 hidden layers and GRU with 7 hidden layers. The least accurate RNN is Vanilla RNN.
TABLE 4
GT         Hidden layers   Evaluation accuracy (%)   Test accuracy (%)
V-RNN       2              96.43                     88.35
V-RNN       4              97.24                     87.38
V-RNN       7              95.54                     89.00
V-RNN      10              96.19                     87.70
LSTM        2              95.94                     95.47
LSTM        4              96.19                     94.82
LSTM        7              95.62                     95.47
LSTM       10              95.86                     94.82
Bi-LSTM     2              96.27                     95.15
Bi-LSTM     4              94.97                     95.79
Bi-LSTM     7              95.54                     93.53
Bi-LSTM    10              95.05                     92.88
GRU         2              96.43                     94.82
GRU         4              96.35                     93.85
GRU         7              95.86                     95.79
GRU        10              96.19                     94.82
[0070] Based on the training results, the traffic hand signal learning unit 510 preferably uses a GRU with 7 hidden layers, which has the same test accuracy as the Bi-LSTM with 4 hidden layers but a relatively fast operation speed.
[0071] Referring to
[0072]
[0073] Accordingly, in a case where the traffic hand signal sequence is used as an input by streaming the label, a threshold value is set, and when a signal is substantially equal to or greater than the threshold value, the traffic hand signal determinator 520 determines it as a traffic hand signal. The threshold value may be 0.4.
[0074] Neither the inactive sequence nor the invalid sequence signals to the driver. However, their probability graphs show different patterns. Unlike the flat probability graph of the inactive sequence, the probability graph of the invalid sequence changes in value as the hand signal motion proceeds. Accordingly, the traffic hand signal determinator 520 may determine a signal having a probability lower than the threshold value of 0.4 and higher than 0.2 as an invalid signal.
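The thresholding described above can be sketched as a small decision function. The thresholds 0.4 and 0.2 come from the description; the function name `decide`, the dictionary input keyed by signal name, and the treatment of probabilities at or below 0.2 as inactive are assumptions made for illustration.

```python
VALID_THRESHOLD = 0.4    # threshold value given in the description
INVALID_THRESHOLD = 0.2  # lower bound of the invalid band in the description

def decide(probabilities):
    """Map per-signal probabilities to a signal name, 'invalid', or 'inactive'."""
    signal, p = max(probabilities.items(), key=lambda kv: kv[1])
    if p >= VALID_THRESHOLD:
        return signal
    if p > INVALID_THRESHOLD:
        return "invalid"
    return "inactive"  # assumption: a flat, low probability means no signal

print(decide({"go straight": 0.7, "turn right": 0.1, "turn left": 0.1, "stop": 0.1}))
print(decide({"go straight": 0.3, "turn right": 0.25, "turn left": 0.1, "stop": 0.1}))
print(decide({"go straight": 0.1, "turn right": 0.1, "turn left": 0.1, "stop": 0.1}))
```

The three calls print `go straight`, `invalid`, and `inactive` respectively.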
[0075] In order to evaluate the traffic hand signal learning unit 510, the traffic hand signal sequence adjustor 430 may be used to expand the data in advance. The traffic hand signal sequence adjustor 430 may generate 300 sequences and adjust them evenly according to the type of the traffic hand signal.
[0076] Table 5 shows examples of traffic hand signal sequences extracted from the dataset. Each number corresponds to one image of an arm direction, and a sequence consisting only of zeros indicates that no hand signal motion is performed, so it is regarded as no signal. There are 15 types of traffic hand signals in total, but it is preferable to interpret the traffic hand signals from the driver's point of view.
TABLE 5
Signal            Example of hand signal sequence                Original sequence   Artificial sequence   Total sequence
Go                111111115552222222222222222222222222           100                 1,100                 1,200
                  1111111111111155555555552222222222222222222
                  1111111111155552222222222
Turn right        1111111111111333333333333333                   100                 1,100                 1,200
                  111111111133333333333
                  1111111111133333333333333
Turn left         111111111144444444444444444                    100                 1,100                 1,200
                  11111111111111111444444444444444
                  11111111111111111111111111444444444444444
Stop              111111116666666666666                          100                 1,100                 1,200
                  111111111666666666666666666666
                  6666666666666666666
Invalid signal    22222222222222222222222                        1,100               100                   1,200
                  44444444444444444111111111111111111
                  55555555555555555522222222222222222
Inactive signal   000000000000000000000000000                    0                   1,200                 1,200
                  00000000000000000000222222
                  000000000000000000000000000555
Total                                                            1,500               5,700                 7,200
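To inspect sequences such as those in Table 5, consecutive repeats can be collapsed into the order of arm directions. This is only an inspection aid, not the RNN-based analysis of the disclosure, and the function name `collapse_runs` is hypothetical.

```python
from itertools import groupby

def collapse_runs(sequence):
    """Collapse consecutive repeats: '1115552222' -> [(1, 3), (5, 3), (2, 4)]."""
    return [(int(digit), len(list(run))) for digit, run in groupby(sequence)]

go = "111111115552222222222222222222222222"  # first "Go" example in Table 5
print([digit for digit, _ in collapse_runs(go)])  # [1, 5, 2]
```

The collapsed order 1, 5, 2 corresponds to the front, up, back arm motion listed for the go straight signal, while the run lengths show how many frames each direction was held.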
[0077] The traffic hand signal determinator 520 determines the type of the traffic hand signals for 15 signals as go straight, turn right, turn left, stop, an inactive signal in which the hand signal performer does not perform any action and an invalid signal in which the hand signal is not for the driver. Referring back to
[0078] The traffic hand signal determinator 520 receives the number-converted traffic hand signal sequence from the detection module 400, passes it through the fully connected layer, and processes the result by softmax to determine the type of the traffic hand signal.
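The fully connected layer followed by softmax can be sketched as below. The six output types follow the description; the function names, the two-unit hidden state, and the weight values are made-up illustrations, since the actual layer sizes and trained weights are not given.

```python
import math

SIGNAL_TYPES = ["go straight", "turn right", "turn left", "stop", "invalid", "inactive"]

def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(hidden_state, weights, biases):
    """Fully connected layer followed by softmax over the six signal types."""
    logits = [sum(w * h for w, h in zip(row, hidden_state)) + b
              for row, b in zip(weights, biases)]
    return dict(zip(SIGNAL_TYPES, softmax(logits)))

# Hypothetical weights for a two-unit hidden state (one row per signal type).
weights = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.5, 0.5], [0.0, 0.0]]
biases = [0.0] * 6
probs = classify([1.0, 0.0], weights, biases)
print(max(probs, key=probs.get))  # go straight
```

The softmax output sums to 1, so each value can be compared directly against a probability threshold.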
[0079] The traffic hand signal determinator 520 outputs a result determined as a go straight, turn right, turn left, or stop signal as a valid signal, and outputs an invalid signal when instructing another driver in a different direction and when a signal does not correspond to a traffic hand signal.
[0080] The database 530 stores information about the photographed traffic hand signal sequence image, the arm direction image extracted from the photographed image, and the traffic hand signal sequence expressed in numbers.
[0081] The traffic hand signal determinator 520 compares information on the traffic hand signal sequence provided from the detection module 400 with information stored in the database 530 and determines the type of the traffic hand signal included in the photographed image.
[0082] The traffic hand signal detection system 100 according to the present disclosure configured as described above may interpret the signal indicated by the traffic hand signal based on information on the traffic hand signal in the image photographed by the camera 15, such that the driver may be provided with information on the signal indicated by the traffic hand signal, and an autonomous vehicle may also recognize the signal indicated by the traffic hand signal.
[0083]
[0084] Referring to
[0085] The image acquisition step S100 is a step of acquiring a photographed image from the camera 15 photographing a predetermined range. The imaging unit 200 connected to the camera 15 receives a photographed image from the camera 15.
[0086] The imaging unit 200 photographs an image from before the execution of the hand signal to after the execution, and collects one hand signal sequence.
[0087] In the image classification step S200, images taken from the front among the collected, photographed traffic hand signal images are classified according to the type of arm motions. The image classifier 300 converts the traffic hand signal sequence into 15 images per second, and then performs the bounding box annotation based on the class. In addition, the image classifier 300 classifies arm motions into eight types: down, front, back, left, right, up, oblique, and left and right.
[0088] The arm motion detecting step S300 is a step of detecting a motion of an arm direction from the photographed arm motion image obtained in the image classification step S200, assigning a class with a number in a range from 0 to 8 to the detected arm motion image, and generating a traffic hand signal sequence.
[0089] In the arm motion detector 410, an arm motion taken by the hand signal performer in the photographed image sequence classified by the image classifier is detected. The arm motion number converter 420 converts the class of the detected arm motion image into numbers.
[0090] In the detecting step S300, an arbitrary traffic hand signal sequence may be added to or deleted from the number-converted traffic hand signal sequence by the traffic hand signal sequence adjustor 430.
[0091] The traffic hand signal analysis step S400 is a step of determining the type of the traffic hand signal corresponding to the number-converted traffic hand signal sequence based on the information detected in the detecting step S300.
[0092] The analysis step S400 includes a step of the traffic hand signal learning unit 510 learning the number-converted traffic hand signal sequence. The traffic hand signal determinator 520 receives the number-converted traffic hand signal sequence from the detection module 400, passes it through the fully connected layer, and processes the result by the softmax to determine the type of the traffic hand signal.
[0093] The traffic hand signal determinator 520 outputs a result determined as a go straight, turn right, turn left, or stop signal as a valid signal, and outputs an invalid signal when instructing another driver in a different direction and when a signal does not correspond to a traffic hand signal.
[0094] In a case where the label is streamed and the traffic hand signal sequence is used as an input, a threshold value may be set, and when a signal is substantially equal to or greater than the threshold value, the traffic hand signal determinator 520 determines it as a traffic hand signal. The threshold value may be 0.4.
[0095] The description of the presented embodiments is provided to allow any person skilled in the art to implement or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the embodiments presented herein, but is to be construed in the widest scope consistent with the principles and novel features presented herein.