METHOD FOR UPDATING ROAD SIGNS AND MARKINGS ON BASIS OF MONOCULAR IMAGES

20230135512 · 2023-05-04

Abstract

The present invention discloses a method for updating road signs and markings on the basis of monocular images, comprising the following steps: acquiring streetscape images of urban roads together with the GPS phase center coordinates and spatial attitude data corresponding to the streetscape images; extracting image coordinates of the road signs and markings; constructing a sparse three-dimensional model, and then generating a streetscape image depth map; calculating the spatial positions of the road signs and markings according to the semantic and depth values of the image, the collinear equation and the spatial distance relationship; if the same road sign or marking is visible in multiple views, solving its position information; and vectorizing the obtained position information and fusing it into the original data to realize the updating of the road sign and marking data.

Claims

1. A method for updating road signs and markings on the basis of monocular images, specifically comprising the following steps: S1, acquiring streetscape images of urban roads and GPS phase center coordinates and spatial attitude data corresponding to the streetscape images through mobile data acquisition equipment; S2, sequentially preprocessing and distortion correcting the streetscape images obtained in step S1, semantically segmenting elements on the corrected images using the deep learning semantic segmentation technology by category, converting the segmented streetscape images of the road signs and markings into binary image maps, and extracting image coordinates of skeleton points of road signs and image positions of road markings in the binary image maps; S3, constructing a sparse three-dimensional model, calculating the attitude of the mobile data acquisition equipment and the spatial position of the sparse three-dimensional point cloud, and then generating a streetscape image depth map on the basis of the multi-view dense reconstruction technology according to the reconstructed sparse three-dimensional point cloud and the internal and external parameters of the mobile data acquisition equipment; S4, calculating the spatial positions of the road signs and markings according to the semantic and depth values of the image, the collinear equation and the spatial distance relationship; S5, if the same road sign and marking is visible in multiple views, solving the position information of the road sign and marking by adopting a multi-view forward intersection method; and S6, vectorizing the position information of the road signs and markings obtained in steps S4 and S5, and fusing into the original data to update the data of the road signs and markings.

2. The method for updating road signs and markings on the basis of monocular images according to claim 1, wherein the mobile data acquisition equipment in step S1 integrates a monocular camera and a GPS/IMU equipment, the mobile data acquisition equipment is installed in a forward-looking window of a vehicle, and the relative pose relationship between the monocular camera and the GPS/IMU equipment and the internal reference information of the monocular camera are then obtained through calibration.

3. The method for updating road signs and markings on the basis of monocular images according to claim 2, wherein step S2 specifically comprises the following steps: S21, preprocessing the streetscape images using image enhancement and de-noising technologies to reduce the influence of image noise on the streetscape images; S22, carrying out distortion correction on the preprocessed streetscape images in combination with the internal reference of the monocular camera; S23, making the streetscape images after distortion correction into a DeeplabV3+ network training dataset on the basis of a data labeling tool, adopting the GeoAI deep learning framework to realize the training, validation and testing of the model, and segmenting label elements on the streetscape images by category, the label elements comprising sky, trees, road signs, road markings and lane surfaces; S24, eliminating the sky and trees in the images after distortion correction by using image mask technology on the basis of the semantically segmented label data; S25, carrying out semantic segmentation on elements of road signs and markings in the streetscape images, converting the images into binary images, and extracting image coordinate information of the road signs and markings from the binary images by using different vision algorithms; S251, extracting connected regions of the binary image for road sign elements, and calculating shape descriptors of the connected regions to extract image coordinates of skeleton points of road signs; and S252, extracting image positions of road markings for road marking elements by using the Hough transform algorithm.

4. The method for updating road signs and markings on the basis of monocular images according to claim 3, wherein in step S4 the spatial positions of the road signs and markings are calculated according to the collinear equation and the spatial distance relationship, with object space imaging in front of the monocular camera as the constraint condition; the calculation formula is formula (1):

$$
\begin{cases}
\begin{bmatrix} x - x_0 \\ f \\ y - y_0 \end{bmatrix} = \lambda
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ b_{11} & b_{12} & b_{13} \\ c_{11} & c_{12} & c_{13} \end{bmatrix}
\begin{bmatrix} a_{21} & a_{22} & a_{23} \\ b_{21} & b_{22} & b_{23} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}
\begin{bmatrix} X_A - X_{S1} \\ Y_A - Y_{S1} \\ Z_A - Z_{S1} \end{bmatrix} \\
d^2 = (X_A - X_{S1})^2 + (Y_A - Y_{S1})^2 + (Z_A - Z_{S1})^2 \\
\alpha = \cos^{-1}\dfrac{\mathbf{a}\cdot\mathbf{b}}{\lvert\mathbf{a}\rvert\,\lvert\mathbf{b}\rvert}
\end{cases}
\tag{1}
$$

where $(x, y)$ is the image-plane coordinate of the skeleton point of the road pole or of the sign and marking, and $f$ is the focal length of the camera; $(x_0, y_0)$ is the coordinate of the principal point of the image; $\lambda$ is the projection coefficient; $[a_{11}, a_{12}, a_{13}; b_{11}, b_{12}, b_{13}; c_{11}, c_{12}, c_{13}]$ is the transition matrix from the auxiliary coordinate system of image space to the image-plane coordinate system; $[a_{21}, a_{22}, a_{23}; b_{21}, b_{22}, b_{23}; c_{21}, c_{22}, c_{23}]$ is the transition matrix from the geodetic coordinate system to the auxiliary coordinate system of image space; $[X_A, Y_A, Z_A]$ is the coordinate of the skeleton point of the road pole or the sign and marking in the geodetic coordinate system, and is the value to be calculated in this calculation method; $[X_{S1}, Y_{S1}, Z_{S1}]$ and $[X_{S2}, Y_{S2}, Z_{S2}]$ are the photographing center coordinates of the cameras at the front and rear camera stations, i.e. the GPS values; $d$ is the distance from the object space point to the photographing center of the camera; and $\alpha$ is the plane-projected value of the included angle between the line $\mathbf{a}$ connecting the photographing centers of the front and rear monocular cameras and the line $\mathbf{b}$ connecting the object space point and the front monocular camera.

5. The method for updating road signs and markings on the basis of monocular images according to claim 3, wherein step S3 specifically comprises the following steps: S31, extracting feature points of the streetscape images through the SIFT algorithm, and applying image masking, feature extraction and matching to the original streetscape images to generate a multi-view geometrical relationship map; S32, setting parameters, selecting an initial matching image pair on the basis of an incremental SFM algorithm, and constructing a regional network model; adding new sequence images iteratively in the construction of the regional network model to generate a new sparse three-dimensional model; when the number of streetscape images in the sparse three-dimensional model is less than 3, continuing to add new sequence images until the number of streetscape images in the sparse three-dimensional model is greater than 3; S33, when the number of streetscape images in the sparse three-dimensional model is greater than 3, fusing GPS/IMU prior constraint data to reduce the accumulation of model errors, incorporating the regional network model coordinates into the real geodetic coordinate system by an absolute orientation method, and solving the error of the absolute orientation model by formula (2); if the error is greater than 10 cm, discarding the reconstructed regional network model, repeating steps S32˜S33, and continuing to initialize and construct the next regional network model until the error is less than 10 cm; S34, when the error is less than 10 cm, adopting the local bundle adjustment to further optimize the attitude information of the monocular camera; after all the streetscape images are constructed, using the global bundle adjustment to further optimize and solve the internal and external parameter information of the sparse three-dimensional point cloud and the monocular camera; and S35, reconstructing a streetscape image depth map by using the multi-view dense reconstruction method; the formula (2) in step S33 being:

$$
\begin{cases}
\begin{bmatrix} \sigma_{xi} \\ \sigma_{yi} \\ \sigma_{zi} \end{bmatrix} = \lambda
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} +
\begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} -
\begin{bmatrix} X_t^i \\ Y_t^i \\ Z_t^i \end{bmatrix} \\
\sigma = \sqrt{\dfrac{\sum_{i=1}^{n}\left(\sigma_{xi}^2 + \sigma_{yi}^2 + \sigma_{zi}^2\right)}{n}}
\end{cases}
\tag{2}
$$

where $[X_i, Y_i, Z_i]$ is the model coordinate of the $i$-th checkpoint; $\lambda$, $[a_1, a_2, a_3; b_1, b_2, b_3; c_1, c_2, c_3]$ and $[X_0, Y_0, Z_0]$ are the parameters of the absolute orientation 7-parameter model; $[X_t^i, Y_t^i, Z_t^i]$ is the real geodetic coordinate of the $i$-th checkpoint; $[\sigma_{xi}, \sigma_{yi}, \sigma_{zi}]$ is the mean square error component corresponding to the $i$-th checkpoint, $\sigma$ is the mean square error of the point, and $n$ is the number of checkpoints.

6. The method for updating road signs and markings on the basis of monocular images according to claim 3, wherein in step S5, when the same road sign or marking is visible in multiple views, the accurate position information is obtained by least square fitting, as shown in formula (3):

$$
\begin{cases}
\begin{bmatrix} x_1 - x_0 \\ f \\ y_1 - y_0 \end{bmatrix} = \lambda
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ b_{11} & b_{12} & b_{13} \\ c_{11} & c_{12} & c_{13} \end{bmatrix}
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} X_A - X_{S1} \\ Y_A - Y_{S1} \\ Z_A - Z_{S1} \end{bmatrix} \\
\quad\vdots \\
\begin{bmatrix} x_n - x_0 \\ f \\ y_n - y_0 \end{bmatrix} = \lambda
\begin{bmatrix} a_{n1} & a_{n2} & a_{n3} \\ b_{n1} & b_{n2} & b_{n3} \\ c_{n1} & c_{n2} & c_{n3} \end{bmatrix}
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} X_A - X_{Sn} \\ Y_A - Y_{Sn} \\ Z_A - Z_{Sn} \end{bmatrix} \\
d_i^2 = (X_A - X_{Si})^2 + (Y_A - Y_{Si})^2 + (Z_A - Z_{Si})^2, \quad 1 \le i \le n
\end{cases}
\tag{3}
$$

where $[x_1, y_1], \ldots, [x_n, y_n]$ are the image coordinates of the same object-space road sign or marking projected on the multiple views; $(x_0, y_0)$ is the coordinate of the principal point of the image; $f$ is the focal length of the camera; $\lambda$ is the projection coefficient; $[X_A, Y_A, Z_A]$ is the object space coordinate to be calculated for the road sign and marking; $[X_{Si}, Y_{Si}, Z_{Si}]$ is the photographing center position of the camera for the $i$-th image, $1 \le i \le n$; $[a_{11}, a_{12}, a_{13}; b_{11}, b_{12}, b_{13}; c_{11}, c_{12}, c_{13}]$ through $[a_{n1}, a_{n2}, a_{n3}; b_{n1}, b_{n2}, b_{n3}; c_{n1}, c_{n2}, c_{n3}]$ are the transition matrices from the auxiliary coordinate system of image space to the image-plane coordinate systems of the corresponding images; $[a_1, a_2, a_3; b_1, b_2, b_3; c_1, c_2, c_3]$ is the transition matrix from the geodetic coordinate system to the auxiliary coordinate system of image space; and $d_i$ is the spatial distance between the road sign or marking to be calculated and the photographing center of the $i$-th streetscape image.

7. The method for updating road signs and markings on the basis of monocular images according to claim 5, wherein step S4 further comprises: for the problem of longitudinal imaging of the road pole sign on the streetscape image, when the plane-projected included angle $\alpha$ between the line $\mathbf{a}$ connecting the photographing center of the monocular camera with the undetermined position of the road pole sign and the vehicle traveling direction $\mathbf{b}$ satisfies $\alpha < 90°$, this coordinate is selected as the spatial position of the road pole sign.

8. The method for updating road signs and markings on the basis of monocular images according to claim 5, wherein step S23 of carrying out element segmentation on the streetscape images after distortion correction specifically comprises the following steps: S231, data preparation: with reference to format specifications of open source datasets, structuring the data of streetscape images, and importing data labels; S232, carrying out model training, validation and testing by using the GeoAI deep learning framework in combination with the DeeplabV3+ neural network model; S2321, setting parameters, and importing the training model; S2322, carrying out model training; S2323, if the obtained model is the global optimum, carrying out model validation; if not, returning to step S2322 and carrying out model training again until the obtained model is the global optimum; S2324, if the model validation result meets the accuracy requirement, outputting the model; if not, returning to step S2321 and repeating steps S2321˜S2324 until the model validation result meets the accuracy requirement, and outputting the model; S2325, visualizing the test results of the output model, determining whether the generalization requirement is satisfied, and if so, deploying and applying the model; if not, returning to step S231 and repeating steps S231˜S2325 until the generalization requirement is satisfied; and S233, model deployment and application: saving the model locally and deploying the model to the server to achieve semantic segmentation of the scene.

Description

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

[0052] FIG. 1 is a flowchart of a method for updating road signs and markings on the basis of monocular images according to the present invention;

[0053] FIG. 2 is a flowchart of GeoAI-based image semantic segmentation in a method for updating road signs and markings on the basis of monocular images according to the present invention;

[0054] FIG. 3 is a flowchart of constructing a monocular depth map in a method for updating road signs and markings on the basis of monocular images according to the present invention;

[0055] FIG. 4 is a flowchart of multi-view dense reconstruction algorithm MVS in a method for updating road signs and markings on the basis of monocular images according to the present invention; and

[0056] FIG. 5 shows the effect of a method for updating road signs and markings on the basis of monocular images according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0057] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention.

[0058] Embodiment: as shown in FIG. 1, a method for updating road signs and markings on the basis of monocular images, specifically comprising the following steps:

[0059] S1, equipment installation, calibration and field acquisition: acquiring streetscape images of urban roads and the GPS phase center coordinates and spatial attitude data corresponding to the streetscape images by mobile data acquisition equipment; the mobile data acquisition equipment in step S1 integrates a monocular camera and a GPS/IMU equipment, the mobile data acquisition equipment is installed in a forward-looking window of a vehicle, and the relative pose relationship between the monocular camera and the GPS/IMU equipment and the internal reference information of the monocular camera are then obtained through calibration in a two-dimensional/three-dimensional calibration field; the pose inconsistency between the monocular camera and the GPS/IMU equipment is mainly caused by the non-parallel axes during the installation process. The internal reference calibration of the camera mainly solves the radial and tangential distortion of the camera lens. Such calibration data directly determine the accuracy of the subsequent depth map calculation;

[0060] S2, image distortion correction and semantic segmentation: sequentially preprocessing and distortion-correcting the streetscape images obtained in step S1, and semantically segmenting the elements on the corrected streetscape images by category using deep learning semantic segmentation technology; converting the semantically segmented streetscape images of the road sign and marking elements into binary images, and then extracting the image coordinates of the skeleton points of the road signs and the image positions of the road markings from the binary image maps.

[0061] Step S2 specifically comprises the following steps:

[0062] S21, preprocessing the streetscape images using image enhancement and de-noising technologies to reduce the influence of image noise on the streetscape images;

[0063] S22, carrying out distortion correction on the preprocessed streetscape images in combination with the internal reference of the monocular camera; and

[0064] S23, making the streetscape images after distortion correction into a DeeplabV3+ network training dataset on the basis of a data labeling tool, adopting the GeoAI deep learning framework to realize the training, validation and testing of the model, and segmenting the elements on the images by category, the elements comprising sky, trees, road signs, road markings and lane surfaces.

[0065] As shown in FIG. 2, in step S23 of carrying out element segmentation on streetscape images after distortion correction specifically comprises the following steps:

[0066] S231, data preparation: structuring the streetscape image data according to the open source datasets (such as Mapillary, KITTI), and importing the data labels by sample batches;

[0067] S232, carrying out model training, validation and testing by using the GeoAI deep learning framework in combination with the DeeplabV3+ neural network model;

[0068] S2321, setting parameters, and importing the training model;

[0069] S2322, carrying out model training;

[0070] S2323, if the obtained model is the global optimum, carrying out model validation; if not, returning to step S2322 and carrying out model training again until the obtained model is the global optimum;

[0071] S2324, if the model validation result meets the accuracy requirement, outputting the model; if not, returning to step S2321 and repeating steps S2321˜S2324 until the model validation result meets the accuracy requirement, and outputting the model;

[0072] S2325, visualizing the test results of the output model, determining whether the generalization requirement is satisfied, and if so, deploying and applying the model; if not, returning to step S231 and repeating steps S231˜S2325 until the generalization requirement is satisfied; and

[0074] S233, model deployment and application: saving the model locally and deploying the model to the server, and batch forecasting the segmentation results of streetscape images to achieve semantic segmentation of the scene.

[0075] S24, eliminating the sky and trees in the images after distortion correction by using image mask technology; and S25, converting the segmented road sign and marking elements in the streetscape images into binary image maps, and extracting the skeleton-point information of the road signs from the binary images by using different vision algorithms.

[0076] S251, extracting a connected region of a binary image for road sign elements, and calculating a shape descriptor of the connected region to extract image coordinates of skeleton points of road signs; and

[0077] S252, for the road marking elements, extracting the image positions of road markings by the Hough transform algorithm, as described in "Generalizing the Hough transform to detect arbitrary shapes".

[0078] The process of image enhancement and de-noising in step S2 mainly solves the problem that the exposure intensities of the photos are inconsistent due to the influence of solar rays at different incident angles when the vehicle acquires data on the road.
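As an illustration of the enhancement of step S21 and the correction of step S22, the sketch below implements plain histogram equalization (to even out exposure) and an iterative inversion of the Brown-Conrady distortion model on a normalized image point. It is a minimal stand-in with hypothetical calibration coefficients `k1, k2, p1, p2`; a production pipeline would undistort whole images through a precomputed remap built the same way per pixel.

```python
import numpy as np

def equalize(img):
    """Histogram equalization: spreads a narrow exposure range over 0..255
    (assumes a non-constant uint8 image)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

def distort(pt, k1, k2, p1, p2):
    """Brown-Conrady radial/tangential distortion of a normalized image point."""
    x, y = pt
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.array([xd, yd])

def undistort(pt_d, k1, k2, p1, p2, iters=20):
    """Invert the distortion model by fixed-point iteration, starting from
    the distorted coordinate and peeling the distortion off each round."""
    x, y = pt_d
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 * r2
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (pt_d[0] - dx) / radial
        y = (pt_d[1] - dy) / radial
    return np.array([x, y])
```

Round-tripping a point through `distort` and `undistort` recovers it to well below a pixel for realistic coefficient magnitudes.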

[0079] S3, image mask, reconstruction of sparse point cloud and construction of depth map: on the basis of the incremental SFM algorithm, fusing GPS/IMU data, reconstructing sparse three-dimensional point cloud, calculating the internal and external parameters of the camera and the sparse three-dimensional point cloud information, and then generating a streetscape image depth map on the basis of the multi-view dense reconstruction algorithm according to the reconstructed sparse three-dimensional point cloud and the internal and external parameters of the camera.

[0080] As shown in FIG. 3, step S3 specifically comprises the following steps:

[0081] S31, extracting feature points of the streetscape images through SIFT algorithm, and adopting image mask, feature extraction and matching to the original streetscape images to generate a multi-view geometrical relationship map;

[0082] S32, setting parameters, selecting an initial matching image pair on the basis of an incremental SFM algorithm, and constructing a relative orientation model, and then gradually adding the unconstructed streetscape image to construct a regional network model; adding new sequence images iteratively in the regional network model to generate a new sparse three-dimensional model; when the number of streetscape images in the sparse three-dimensional model is less than 3, continuing to add new sequence images, until the number of streetscape images in the sparse three-dimensional model is greater than 3; carrying out the next GPS/IMU data fusion to construct an absolute orientation model.

[0083] S33, when the number of streetscape images in the sparse three-dimensional model is greater than 3, fusing GPS/IMU prior constraint data to reduce the accumulation of model errors, incorporating the regional network model coordinates into the real geodetic coordinate system by an absolute orientation method, solving the error with the absolute orientation model by formula (2), if the error is greater than 10 cm, discarding the reconstructed regional network model, repeating steps S32˜S33, and continuing to initialize and construct the next regional network model, until the error is less than 10 cm; the formula (2) in step S33 being:

[00004]
$$
\begin{cases}
\begin{bmatrix} \sigma_{xi} \\ \sigma_{yi} \\ \sigma_{zi} \end{bmatrix} = \lambda
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} X_i \\ Y_i \\ Z_i \end{bmatrix} +
\begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} -
\begin{bmatrix} X_t^i \\ Y_t^i \\ Z_t^i \end{bmatrix} \\
\sigma = \sqrt{\dfrac{\sum_{i=1}^{n}\left(\sigma_{xi}^2 + \sigma_{yi}^2 + \sigma_{zi}^2\right)}{n}}
\end{cases}
\tag{2}
$$

[0084] where $[X_i, Y_i, Z_i]$ is the model coordinate of the $i$-th checkpoint; $\lambda$, $[a_1, a_2, a_3; b_1, b_2, b_3; c_1, c_2, c_3]$ and $[X_0, Y_0, Z_0]$ are the parameters of the absolute orientation 7-parameter model; $[X_t^i, Y_t^i, Z_t^i]$ is the real geodetic coordinate of the $i$-th checkpoint; $[\sigma_{xi}, \sigma_{yi}, \sigma_{zi}]$ is the mean square error component corresponding to the $i$-th checkpoint, $\sigma$ is the mean square error of the point, and $n$ is the number of checkpoints;
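The 10 cm gate of step S33 amounts to computing the checkpoint RMSE of the fitted similarity transform, exactly as formula (2) prescribes. A sketch (the helper name is our own; `lam`, `R` and `t` stand for the 7-parameter model's scale, rotation and translation):

```python
import numpy as np

def absolute_orientation_rmse(lam, R, t, model_pts, geo_pts):
    """Transform each model-space checkpoint with the similarity (lam, R, t),
    take residuals against the true geodetic coordinates, and return the
    per-point RMSE sigma of formula (2)."""
    model_pts = np.asarray(model_pts, float)
    geo_pts = np.asarray(geo_pts, float)
    resid = lam * model_pts @ R.T + t - geo_pts        # sigma_xi, sigma_yi, sigma_zi
    return np.sqrt((resid ** 2).sum(axis=1).mean())    # sigma in formula (2)
```

If the returned σ exceeds 0.10 m, the regional network model is discarded as described above.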

[0085] S34, when the error is less than 10 cm, adopting the local bundle adjustment to further optimize the attitude information of the monocular camera; after all the streetscape images are constructed, using the global bundle adjustment to further optimize and solve the internal and external parameter information of the sparse three-dimensional point cloud and the camera; and

[0086] S35, reconstructing a streetscape image depth map by using the multi-view dense reconstruction method: as shown in FIG. 4,

[0087] S351, inputting images, internal and external parameters of monocular camera and sparse point cloud;

[0088] S352, clustering views for multi-view stereo: merging sparse point clouds, filtering redundant images, and determining whether the classification conditions are met; if the classification conditions are met, carrying out block stereo matching; if not, adding streetscape images while the clustered images fit within the size of the classification container, and re-determining whether the classification conditions are met; if the conditions are still not met, continuing to add streetscape images until the classification conditions are met; and

[0089] S353, block stereo matching: carrying out an initial matching, then cycling through multiple rounds of diffusion and filtering until the dense point cloud and the streetscape image depth map are generated.
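The initial-matching step of S353 can be illustrated on rectified image pairs: for a pixel in one view, slide a patch along the epipolar scanline of the neighboring view and keep the disparity with the highest normalized cross-correlation; depth then follows from disparity as f·B/d. A toy sketch with our own function names and synthetic imagery, not the actual MVS implementation:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def match_disparity(left, right, row, col, half=2, max_d=10):
    """Best integer disparity for left pixel (row, col) by NCC patch search
    along the rectified scanline; depth would then be f * baseline / d."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1]
    best_score, best_d = -2.0, 0
    for d in range(max_d + 1):
        c = col - d
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(patch, cand)
        if score > best_score:
            best_score, best_d = score, d
    return best_d
```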

[0090] In step S3, the present invention proposes to gradually fuse GPS/IMU data, and introduce the local bundle adjustment model, so as to solve the problem on model distortion or camera attitude error in SFM reconstruction, and realize the conversion from regional network model coordinates to real geodetic coordinates.

[0091] S4, solving an initial position of the road sign and marking: calculating the spatial positions of the road signs and markings according to the semantic and depth values of the image, the collinear equation and the spatial distance relationship;

[0092] In step S4, the spatial positions of the road signs and markings are calculated according to the collinear equation and the spatial distance relationship, with object space imaging in front of the monocular camera as the constraint condition, i.e., solving the spatial position of the road pole sign; the calculation formula is formula (1):

[00005]
$$
\begin{cases}
\begin{bmatrix} x - x_0 \\ f \\ y - y_0 \end{bmatrix} = \lambda
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ b_{11} & b_{12} & b_{13} \\ c_{11} & c_{12} & c_{13} \end{bmatrix}
\begin{bmatrix} a_{21} & a_{22} & a_{23} \\ b_{21} & b_{22} & b_{23} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}
\begin{bmatrix} X_A - X_{S1} \\ Y_A - Y_{S1} \\ Z_A - Z_{S1} \end{bmatrix} \\
d^2 = (X_A - X_{S1})^2 + (Y_A - Y_{S1})^2 + (Z_A - Z_{S1})^2 \\
\alpha = \cos^{-1}\dfrac{\mathbf{a}\cdot\mathbf{b}}{\lvert\mathbf{a}\rvert\,\lvert\mathbf{b}\rvert}
\end{cases}
\tag{1}
$$

[0093] where $(x, y)$ is the image-plane coordinate of the skeleton point of the road pole or of the sign and marking, and $f$ is the focal length of the camera; $(x_0, y_0)$ is the coordinate of the principal point of the image; $\lambda$ is the projection coefficient; $[a_{11}, a_{12}, a_{13}; b_{11}, b_{12}, b_{13}; c_{11}, c_{12}, c_{13}]$ is the transition matrix from the auxiliary coordinate system of image space to the image-plane coordinate system; $[a_{21}, a_{22}, a_{23}; b_{21}, b_{22}, b_{23}; c_{21}, c_{22}, c_{23}]$ is the transition matrix from the geodetic coordinate system to the auxiliary coordinate system of image space; $[X_A, Y_A, Z_A]$ is the coordinate of the skeleton point of the road pole or the sign and marking in the geodetic coordinate system, and is the value to be calculated; $[X_{S1}, Y_{S1}, Z_{S1}]$ and $[X_{S2}, Y_{S2}, Z_{S2}]$ are the photographing center coordinates of the cameras at the front and rear camera stations, i.e. the GPS values; $d$ is the distance from the object space point to the photographing center of the camera; and $\alpha$ is the plane-projected value of the included angle between the line $\mathbf{a}$ connecting the photographing centers of the front and rear monocular cameras and the line $\mathbf{b}$ connecting the object space point and the front monocular camera. Step S4 further comprises: for the problem of longitudinal imaging of the road pole sign on the streetscape image, when the plane-projected included angle $\alpha$ between the line connecting the photographing center of the monocular camera with the undetermined position of the road pole sign and the vehicle traveling direction satisfies $\alpha < 90°$, this coordinate is selected as the spatial position of the road pole sign.
When the road pole sign is imaged longitudinally, the extracted skeleton point varies with the height of the road pole, and there are two cases: higher than the installation position of the monocular camera, or lower than it. This results in multiple solutions to the spatial position of the road sign solved in step S4. Therefore, the constraint of the vehicle driving direction is added: if the plane-projected included angle $\alpha$ between the line connecting the camera photographing center with the undetermined position of the pole and the vehicle traveling direction satisfies $\alpha < 90°$, this coordinate is selected as the spatial position of the pole.
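The α < 90° decision rule above can be sketched as follows; the function and argument names are our own, and the candidate positions stand for the two ambiguous solutions of formula (1):

```python
import numpy as np

def pick_pole_position(cam_center, travel_dir, candidates):
    """Resolve the two-solution ambiguity for a longitudinally imaged pole:
    keep the candidate whose plane-projected angle alpha between the
    camera-to-candidate line and the vehicle travel direction is < 90 deg."""
    c = np.asarray(cam_center, float)[:2]        # plane projection: drop Z
    d = np.asarray(travel_dir, float)[:2]
    d = d / np.linalg.norm(d)
    for p in candidates:
        a = np.asarray(p, float)[:2] - c
        cos_alpha = a @ d / np.linalg.norm(a)
        alpha = np.degrees(np.arccos(np.clip(cos_alpha, -1.0, 1.0)))
        if alpha < 90.0:
            return p
    return None
```

With the vehicle heading along +X, a candidate behind the camera is rejected and the one ahead of it is kept.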

[0094] S5, position fitting of road signs and marking under multi-view overlap: if the same road sign and marking is visible in multiple views, solving the position information of the road sign and marking by adopting a multi-view forward intersection method;

[0095] In step S5, when the same road signs and markings are visible in multiple views, the accurate position information is obtained by the least square fitting, as shown in formula (3),

[00006]
$$
\begin{cases}
\begin{bmatrix} x_1 - x_0 \\ f \\ y_1 - y_0 \end{bmatrix} = \lambda
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ b_{11} & b_{12} & b_{13} \\ c_{11} & c_{12} & c_{13} \end{bmatrix}
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} X_A - X_{S1} \\ Y_A - Y_{S1} \\ Z_A - Z_{S1} \end{bmatrix} \\
\quad\vdots \\
\begin{bmatrix} x_n - x_0 \\ f \\ y_n - y_0 \end{bmatrix} = \lambda
\begin{bmatrix} a_{n1} & a_{n2} & a_{n3} \\ b_{n1} & b_{n2} & b_{n3} \\ c_{n1} & c_{n2} & c_{n3} \end{bmatrix}
\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}
\begin{bmatrix} X_A - X_{Sn} \\ Y_A - Y_{Sn} \\ Z_A - Z_{Sn} \end{bmatrix} \\
d_i^2 = (X_A - X_{Si})^2 + (Y_A - Y_{Si})^2 + (Z_A - Z_{Si})^2, \quad 1 \le i \le n
\end{cases}
\tag{3}
$$

[0096] where $[x_1, y_1], \ldots, [x_n, y_n]$ are the image coordinates of the same object-space road sign or marking projected on the multiple views; $(x_0, y_0)$ is the coordinate of the principal point of the image; $f$ is the focal length of the camera; $\lambda$ is the projection coefficient; $[X_A, Y_A, Z_A]$ is the object space coordinate to be calculated for the road sign and marking; $[X_{Si}, Y_{Si}, Z_{Si}]$ is the photographing center position of the camera for the $i$-th image, $1 \le i \le n$; $[a_{11}, a_{12}, a_{13}; b_{11}, b_{12}, b_{13}; c_{11}, c_{12}, c_{13}]$ through $[a_{n1}, a_{n2}, a_{n3}; b_{n1}, b_{n2}, b_{n3}; c_{n1}, c_{n2}, c_{n3}]$ are the transition matrices from the auxiliary coordinate system of image space to the image-plane coordinate systems of the corresponding images; $[a_1, a_2, a_3; b_1, b_2, b_3; c_1, c_2, c_3]$ is the transition matrix from the geodetic coordinate system to the auxiliary coordinate system of image space; and $d_i$ is the spatial distance between the road sign or marking to be calculated and the photographing center of the $i$-th streetscape image; and
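Once each image measurement in formula (3) is converted, through its rotation matrices, into a viewing ray in the geodetic frame, the multi-view forward intersection reduces to a linear least-squares problem: find the point minimizing the summed squared distances to all rays. A sketch under that formulation (our own function name, not the patent's exact parameterization):

```python
import numpy as np

def intersect_rays(centers, dirs):
    """Least-squares forward intersection: the point closest, in the
    squared-distance sense, to all viewing rays.  Each ray i starts at the
    photographing center S_i and points along direction r_i derived from
    the image measurement and the rotation matrices."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for S, r in zip(centers, dirs):
        r = np.asarray(r, float)
        r = r / np.linalg.norm(r)
        P = np.eye(3) - np.outer(r, r)   # projector orthogonal to the ray
        A += P                            # normal equations: sum_i P_i X = sum_i P_i S_i
        b += P @ np.asarray(S, float)
    return np.linalg.solve(A, b)
```

For noiseless rays that all pass through the same object point, the solve recovers that point exactly; with noisy rays it returns the least-squares fit, matching the role of formula (3).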

[0097] S6, data fusion and updating: in GIS software, vectorizing the position information of the road signs and markings obtained in steps S4 and S5, and fusing into the original data to update the data of the road signs and markings.
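The vectorization of step S6 can be as simple as serializing the solved coordinates into a GIS-ingestible text format; the sketch below emits WKT geometries (the function name and record structure are illustrative, not from the patent):

```python
def to_wkt(signs, markings):
    """Serialize solved positions as WKT rows for import into GIS software:
    `signs` are (X, Y, Z) pole positions, `markings` are lists of (X, Y, Z)
    vertices forming polylines."""
    rows = []
    for x, y, z in signs:
        rows.append(f"POINT Z ({x:.3f} {y:.3f} {z:.3f})")
    for line in markings:
        pts = ", ".join(f"{x:.3f} {y:.3f} {z:.3f}" for x, y, z in line)
        rows.append(f"LINESTRING Z ({pts})")
    return rows
```

The resulting rows can then be merged with the original sign-and-marking layer to perform the update.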

[0098] FIG. 5 is a software interface diagram of the automatic acquisition of road signs and markings by this method in this embodiment, where the elements rendered with white traffic light symbols represent the road pole-shaped signs; since instance segmentation is not performed in the image semantic segmentation, their type attributes need to be further entered against the streetscape image; the white lines indicate the road markings and stop lines; the marking categories are not further distinguished here and need to be finished according to the actual situation. The execution process of the whole software function is roughly as follows: as the video plays, the spatial positions of the road signs and markings are solved in real time and rendered on the map; and when a user finds that there is no element change in the current road section, he/she can drag the video progress bar to update the changed road section.

[0099] The foregoing are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.