AUTOMATED COMPUTER SYSTEM AND METHOD OF ROAD NETWORK EXTRACTION FROM REMOTE SENSING IMAGES USING VEHICLE MOTION DETECTION TO SEED SPECTRAL CLASSIFICATION
20220414376 · 2022-12-29
Inventors
- Grant B. Boroughs (Southlake, TX, US)
- John J. Coogan (Bedford, TX, US)
- Lisa A. McCoy (Springfield, TX, US)
CPC classification
G06V20/194
PHYSICS
G06V10/751
PHYSICS
G06V10/763
PHYSICS
International classification
Abstract
A fully-automated computer-implemented system and method for generating a road network map from a remote sensing (RS) image in which the classification accuracy is satisfactory combines moving vehicle detection with spectral classification to overcome the limitations of each. Moving vehicle detections from an RS image are used as seeds to extract and characterize image-specific spectral roadway signatures from the same RS image. The RS image is then searched and the signatures matched against the scene to grow a road network map. The entire process can be performed using the radiance measurements of the scene without having to perform the complicated geometric and atmospheric conversions, thus improving computational efficiency, the accuracy of moving vehicle detection (location, speed, heading) and ultimately classification accuracy.
Claims
1. A computer-implemented method of automated road network identification, said computer programmed to implement the steps of: receiving a single multi-band image including a first set of band images and a second set of band images collected with a time lag relative to the first set; exploiting the time lag between the first and second sets to detect moving vehicles in the single multi-band image; using the detected moving vehicles to extract pixel values from the single multi-band image at pixels adjacent to the locations of detected moving vehicles to characterize at least one image-specific spectral roadway signature; and searching the single multi-band image to match the at least one image-specific spectral roadway signature to pixel values for each of the bands for groupings including at least one pixel to grow a road network map.
2. The computer-implemented method of claim 1, further comprising: forming first and second pseudo-pan images as a weighted average of one or more band images in the first set and a weighted average of one or more band images in the second set, respectively, wherein the time lag between the first and second pseudo pan images is exploited to detect the moving vehicles.
3. The computer-implemented method of claim 2, further comprising pre-processing the first and second pseudo-pan images to improve registration by for each column in the first and second pseudo-pan images, selecting a subset of rows; for each row in the subset, performing a localized sub-pixel correlation to produce X and Y sub-pixel correlation offsets between the first and second pseudo-pan images; averaging the sub-pixel correlation offsets X and Y to produce Xavg and Yavg average sub-pixel correlation offsets; using the average sub-pixel correlation offsets Xavg and Yavg to interpolate pixel values from the first or second pseudo-pan images to produce resampled pixel values for all rows in the column; and concatenating the resampled pixel values for all columns to form resampled first or second pseudo-pan images.
4. The computer-implemented method of claim 3, further comprising: removing X and Y correlation offsets that exceed a first threshold; computing a Figure of Merit (FOM) for each of the remaining X and Y correlation offsets and removing the X and Y sub-pixel correlation offsets whose FOM is less than a second threshold; and then averaging the remaining X and Y sub-pixel correlation offsets to produce the Xavg and Yavg average correlation offsets.
5. The computer-implemented method of claim 3, further comprising: using the average sub-pixel correlation offsets Xavg and Yavg to interpolate pixel values from the one or more band images in the first and second sets to produce resampled values for a resampled multi-band image that is used to extract the at least one image-specific spectral roadway signature and grow the road network map.
6. The computer-implemented method of claim 2, wherein the step of exploiting the time lag to detect moving vehicles comprises, segmenting the first pseudo-pan image into a plurality of local template windows; correlating each local template window to a larger search window in the second pseudo-pan image to produce a correlation surface; extracting X and Y correlation offsets from the correlation surface for each template window; rejecting correlations where X and Y correlation offsets are less than a first threshold; computing a Figure of Merit (FOM) for each of the correlations; and removing correlations whose FOM is less than a second threshold.
7. The computer-implemented method of claim 6, wherein the step of correlating each local template window to the larger search window further comprises: for each possible template offset position relative to the larger search window, extracting pixels from the search window that correspond to the template offset position; for each pixel in the template window, computing a relative contrast metric M from pixel values in the template and search windows for the first and second pseudo-pan images, respectively; raising the relative contrast metric M to a power X where X>1; weighting the M to emphasize center pixels in the template window and de-emphasize edge pixels in the template window; and computing a running sum of the metric M as a cost; assigning the cost to the correlation surface for each template offset position.
8. The computer-implemented method of claim 6, wherein the first threshold is greater than zero and less than one to detect sub-pixel vehicle motion.
9. The computer-implemented method of claim 6, wherein the step of computing the FOM comprises for each said template window, resampling the pixel values within template window of the first pseudo pan image or the search window of the second pseudo-pan image using the X and Y correlation offsets; correlating a template window in the first pseudo-pan image to a search window in the second pseudo-pan image to produce a resampled correlation surface; and computing the FOM from the resampled correlation surface.
10. The computer-implemented method of claim 9, wherein the FOM is computed as 1−A/B where B is a maximum value in the resampled correlation surface and A is a next highest value at least a radius R away from the location of the maximum value in the resampled correlation surface.
11. The computer-implemented method of claim 6, further comprising: for remaining correlations, grouping pixels whose X and Y offsets exceed a threshold into objects; measuring a local contrast around candidate vehicles to reject objects below a contrast threshold; measuring a local contrast within candidate vehicles to split objects with interior low contrast lines; measuring a size and dimensions of objects to reject objects that exceed a specified size; eroding objects to remove pixels around objects; dilating objects to obtain more accurate centroid locations and more complete object definition; outputting remaining objects and their locations as moving vehicles.
12. The computer-implemented method of claim 6, further comprising: for each detected moving vehicle, outputting its location, speed and heading with sub-pixel accuracy.
13. The computer-implemented method of claim 1, further comprising: computing a candidate spectral roadway signature from the pixels adjacent to each detected moving vehicle; and clustering the candidate spectral roadway signatures to identify a reduced number of image-specific spectral roadway signatures.
14. The computer-implemented method of claim 13, further comprising including the speed of the moving vehicle as a dimension in the clustering of candidate spectral roadway signatures, wherein the image-specific spectral roadway signatures do not include speed as a dimension.
15. The computer-implemented method of claim 13, further comprising: comparing the image-specific spectral roadway signatures to a library of signatures for different classes of roadways to label each image-specific spectral roadway signature.
16. The computer-implemented method of claim 13, wherein each said image-specific spectral roadway signature includes pixel values representing radiance measurements of a scene in all of the bands in both the first and second sets of the multi-band image.
17. The computer-implemented method of claim 1, wherein the step of searching the multi-band image to grow the road network map comprises for each image-specific spectral roadway signature, selecting a subset of pixels from the multi-band image; computing a spectral similarity metric to each pixel in the subset; applying a threshold to the similarity metric for each pixel to create a binary roadway map; combining the binary roadway map for all of the image-specific spectral roadway signatures to generate the road network map.
18. The computer-implemented method of claim 17, further comprising: using only the spectral similarity metrics at pixels adjacent to the locations of detected moving vehicles to compute an image-specific threshold.
19. The computer-implemented method of claim 17, further comprising: for each image-specific spectral roadway signature, analyzing the binary roadway map based at least in part on the headings of the detected moving vehicles to adjust and reapply the threshold to the similarity metric for each pixel to update the binary roadway map.
20. The computer-implemented method of claim 17, wherein the step of exploiting the time lag detects moving vehicles including their location, heading and speed, further comprising: using the speed and heading of detected moving vehicles to label the road network map.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
[0023] The present disclosure provides a fully-automated computer-implemented system and method for generating a road network map from a remote sensing (RS) image in which the classification accuracy is satisfactory. This approach combines moving vehicle detection with spectral classification to overcome the limitations of each. Moving vehicle detections from an RS image are used as seeds to extract and characterize image-specific spectral roadway signatures from the same RS image. The RS image is then searched and the signatures matched against the scene to grow a road network map. The entire process can be performed using the radiance measurements of the scene without having to perform the complicated geometric and atmospheric conversions, thus improving computational efficiency, the accuracy of moving vehicle detection (location, speed, heading) and ultimately classification accuracy.
[0024] The RS image is a multi-band image that includes first and second sets of band images that are collected with a time lag. For example, WorldView images, WV-2 or WV-3, are 8-band images. The first set includes Near-IR2 (860-1040 nm), Coastal Blue (400-450 nm), Yellow (585-625 nm) and Red-Edge (705-745 nm) and the second set includes Blue (450-510 nm), Green (520-580 nm), Red (630-690 nm) and Near-IR1 (770-895 nm). There is a time lag of 0.2-0.3 seconds between the collection of the band images in the first and second sets. Other multi-band images that exhibit a suitable time lag between at least two bands may be used. Without loss of generality, aspects of the computer-implemented system and method for generating a road network map will be described for a WV-2 8-band image.
[0025] Referring now to
[0026] The computer-implemented system and method is configured to exploit the time lag between the sets of band images to detect moving vehicles (step 32) and generate an output 33 including at least the location of each detected moving vehicle. The system may also generate and output the speed and heading of each moving vehicle. The system may also generate resampled band images that correct sub-pixel column misregistration to improve image quality for subsequent processing. The computer-implemented system and method is configured to extract pixel values from the multi-band image at pixels adjacent to the locations of detected moving vehicles to characterize at least one image-specific spectral roadway signature 35 (step 34). For example, the signature may be an average pixel value and an uncertainty measure (e.g. a standard deviation) for each of the 8 bands. The pixel values may represent the radiance of the scene measured by the band images. Alternately, the multi-band images could be processed to perform geometric and atmospheric conversions such that the pixel values represent reflectance of the scene.
[0027] The computer-implemented system and method is configured to search the multi-band image to match each of the image-specific spectral roadway signatures to pixel values in the scene (radiance) or representative of the scene (reflectance) to grow a road network map (step 36). The union of the initial searches for each of the signatures provides the initial road network map 28. The computer-implemented system and method are configured to apply morphological analysis and clean-up (step 38) to output the final road network map 30.
Moving Vehicle Detection
Referring now to
[0028] To condition the band images, the computer-implemented system forms first and second “pseudo-pan” images (e.g., average grayscale images) from the first and second sets 12 and 16 of band images 14 and 18 (step 46). Because each band has its own spectral response, the same feature will have different intensities in different bands, so in order to make the correlation processing more accurate, it is helpful to balance the intensities of the pseudo-pan images. The computer-implemented system is configured to perform a least-squares intensity matching for the pseudo-pan images (step 48) in which one image is the simple average of its bands and the other image uses a least squares approach to solve for a gain $g_i$ for each band and a total offset $c$ that best matches the first pseudo-pan image value.
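$$g_1 b_1 + g_2 b_2 + \cdots + g_n b_n + c = p \qquad \text{(eq. 1)}$$
To balance the two grayscale images, an observation matrix is constructed using the band values for each pixel of Image B. If there are k pixels, the observation matrix for n bands is:
$$A = \begin{bmatrix} b_{1,1} & b_{1,2} & \cdots & b_{1,n} & 1 \\ b_{2,1} & b_{2,2} & \cdots & b_{2,n} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ b_{k,1} & b_{k,2} & \cdots & b_{k,n} & 1 \end{bmatrix} \qquad \text{(eq. 2)}$$
where $b_{j,i}$ is the value of band i at pixel j and the column of 1's is for the constant offset c. And the pseudo-pan values matrix for Image A is:
$$B = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix} \qquad \text{(eq. 3)}$$
The least-squares solution for the gains and offset is then obtained by:
$$\begin{bmatrix} g_1 & g_2 & \cdots & g_n & c \end{bmatrix}^{T} = (A^{T}A)^{-1}(A^{T}B) \qquad \text{(eq. 4)}$$
In (eq. 2) through (eq. 4), A and B denote the observation matrix and the pseudo-pan values matrix, respectively, and are not to be confused with Image A and Image B.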
The resulting array is the gain values and the offset value. The gains and offset are applied to the pixel values of band set B to yield Image B as described in (eq. 1). Alternately, a single gain and offset may be calculated but will provide less accurate balancing.
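By way of non-limiting illustration, the intensity balancing of (eq. 1) and (eq. 4) may be sketched as follows. The function name `balance_pseudo_pan` and the array layout are assumptions made for illustration only. The sketch uses NumPy's `lstsq` solver in place of the explicit inverse of (eq. 4); both yield the least-squares solution, and the substitution is a numerical convenience rather than part of the disclosure.

```python
import numpy as np

def balance_pseudo_pan(bands_b, pan_a):
    """Solve for per-band gains and one offset so that the weighted bands of
    set B best match pseudo-pan image A in the least-squares sense.

    bands_b: array of shape (n, rows, cols), the n band images of set B.
    pan_a:   array of shape (rows, cols), the pseudo-pan image A.
    Returns (gains, offset, balanced_pan_b).
    """
    n = bands_b.shape[0]
    # Observation matrix: one row per pixel, one column per band, plus a
    # column of ones for the constant offset c.
    obs = np.column_stack([bands_b.reshape(n, -1).T, np.ones(pan_a.size)])
    target = pan_a.reshape(-1).astype(float)
    # lstsq returns the same least-squares solution as (A^T A)^-1 (A^T B)
    # without forming the inverse explicitly (a substitution, see lead-in).
    solution, *_ = np.linalg.lstsq(obs, target, rcond=None)
    gains, offset = solution[:-1], solution[-1]
    # Apply the gains and offset to band set B as in eq. 1.
    balanced = np.tensordot(gains, bands_b, axes=1) + offset
    return gains, offset, balanced
```

The returned `balanced` array plays the role of the balanced pseudo-pan Image B; passing a single averaged band instead of n bands reduces the sketch to the single gain and offset alternative.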
[0029] The computer-implemented system is suitably configured to apply masks to the pseudo-pan images (step 50) to exclude areas of the pseudo-pan images from further processing for reasons of computational efficiency. For example, the system may apply cloud, water or vegetation masks.
[0030] In order to use time lag to detect moving vehicles, the sets of band images in the multi-band image must otherwise be closely registered, typically within less than one pixel. The current sources of multi-band images including WV-2 and WV-3 satisfy this criterion. For reasons unknown, certain multi-band images (e.g. WV-2 and WV-3) that satisfy this overall criterion exhibit a sub-pixel misregistration 52 as shown in
[0031] As shown in
[0032] Referring back to
[0033] The computer-implemented system is suitably configured to implement a Least Squares Correlation (LSC) or Normalized Cross Correlation (NCC), traditional techniques that allow for sub-pixel measurement of the X and Y offsets (step 80) from a correlation surface 79 for each template window. Both techniques compute the difference in pixel values between the template window and search window at each template window position to determine a cost function that expresses the similarity (or difference) between pixel values. The LSC uses a squared difference and the NCC uses an absolute value difference to compute the cost function. The pixel cost is added to a running sum for the current template position to assign a total cost to the correlation surface for the current template position. Low costs are representative of a positive correlation to a detected moving vehicle.
[0034] Referring to
[0035] Correlation 500 slides a template window 502 across a somewhat larger search window 504 to extract search window pixels that correspond to the current template window position (step 506) for each possible template offset position (step 505) within the search window. For each pixel in the template window, the correlation computes a relative contrast metric M between the corresponding pixels in the first and second pseudo-pan images (step 508). For example, M=(pT−pS)/(pT+pS)*100 where pT and pS are the pixel values in the template and search windows in the first and second pseudo-pan images, respectively. The metric M is raised to an exponent X, e.g. M=M^X where X>1, to penalize large differences more heavily (step 510). A weight factor W is applied to the metric M (M=W·M) to weight center pixels more heavily and to de-emphasize edge pixels, thereby emphasizing vehicles (step 512). For example, W=1 for center pixels and W=0.25 for edge pixels. The pixel costs M are added in a running sum for the current template (step 514) to generate a total cost (step 516) that is assigned to the correlation surface for the current template position (step 518). The total costs for each possible template offset position generate a cost surface (step 520). This surface is normalized and inverted (step 522) so that high costs have a low correlation score and low costs have a high correlation score. A low cost representing the alignment of a vehicle in the template and search windows will have a high correlation score, e.g. a sharp peak in the correlation surface.
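By way of non-limiting illustration, the correlation of steps 505 through 522 may be sketched as follows. The function name `vehicle_correlation_surface`, the exponent of 2, the one-pixel border treated as "edge pixels", and the use of the absolute value of M before exponentiation (so that the cost stays non-negative for any exponent) are assumptions made for illustration only.

```python
import numpy as np

def vehicle_correlation_surface(template, search, power=2.0, edge_weight=0.25):
    """Slide `template` over the larger `search` window and return a
    normalized, inverted cost surface (high score = good match)."""
    th, tw = template.shape
    sh, sw = search.shape
    # Weight map: 1.0 for interior pixels, `edge_weight` on the one-pixel border.
    weights = np.full((th, tw), edge_weight)
    weights[1:-1, 1:-1] = 1.0
    cost = np.empty((sh - th + 1, sw - tw + 1))
    t = template.astype(float)
    for dy in range(cost.shape[0]):
        for dx in range(cost.shape[1]):
            # Search-window pixels under the current template offset.
            s = search[dy:dy + th, dx:dx + tw].astype(float)
            denom = t + s
            # Relative contrast in percent; zero where both pixels are zero.
            m = np.divide(t - s, denom, out=np.zeros_like(t), where=denom != 0) * 100.0
            # abs() is an assumption that keeps the cost non-negative.
            cost[dy, dx] = np.sum(weights * np.abs(m) ** power)
    # Normalize to [0, 1] and invert so that low cost maps to a high score.
    span = cost.max() - cost.min()
    if span == 0:
        return np.ones_like(cost)
    return 1.0 - (cost - cost.min()) / span
```

The location of the maximum of the returned surface gives the integer-pixel X and Y offsets; sub-pixel refinement of the peak is outside the scope of this sketch.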
[0036] The ability to provide sub-pixel measurements is important because in practice, some slower-moving vehicles have been observed to have total motion magnitudes of 0.7-0.8 pixels (˜0.5 pixels in the individual line and sample directions). Also, the decimal component of the offset is important for making an accurate speed calculation; when the motion offset is just one or two pixels, an offset of 1.2 pixels versus an offset of 1.6 pixels (for example) is a 33% difference in speed.
[0037] The correlator measures the template window's best matching location within the search window, and provides the X (line) and Y (sample) pixel offsets of this matching location. For example, the correlator might calculate the best match for a moving vehicle's template window to have an offset of +1.2 line pixels and −0.7 sample pixels in the search window, for a total displacement Δp of 1.4 pixels.
$$\Delta p = \sqrt{\Delta l^{2} + \Delta s^{2}} \qquad \text{(eq. 5)}$$
[0038] The line and sample offsets are used to calculate the speed and direction of motion. The velocity can be calculated as the correlator offset in pixels Δp times the ground sample distance (GSD) of the image divided by the time between band sets Δt.
$$v = \frac{\Delta p \cdot \mathrm{GSD}}{\Delta t} \qquad \text{(eq. 6)}$$
The direction of motion can be calculated as the arctangent of the correlator line and sample offsets Δl and Δs.
$$\theta = \operatorname{atan2}(\Delta l, \Delta s) \qquad \text{(eq. 7)}$$
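By way of non-limiting illustration, the displacement, speed and heading calculations described above may be sketched as follows. The function name `vehicle_motion` is hypothetical, and the GSD of 2.0 m and time lag of 0.25 s in the usage line are assumed values chosen only to make the arithmetic concrete.

```python
import math

def vehicle_motion(delta_line, delta_sample, gsd_m, time_lag_s):
    """Return (displacement_px, speed_m_per_s, heading_rad) from the
    correlator line and sample offsets."""
    # Total displacement in pixels (eq. 5).
    displacement = math.hypot(delta_line, delta_sample)
    # Speed: displacement times GSD divided by the time between band sets.
    speed = displacement * gsd_m / time_lag_s
    # Direction of motion from the line and sample offsets (eq. 7).
    heading = math.atan2(delta_line, delta_sample)
    return displacement, speed, heading

# Offsets from the example above: +1.2 line pixels, -0.7 sample pixels.
dp, v, theta = vehicle_motion(1.2, -0.7, gsd_m=2.0, time_lag_s=0.25)
```

With these assumed values the displacement is about 1.39 pixels, consistent with the rounded 1.4 pixels of the example above, and the speed is about 11.1 m/s.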
[0039] With reference to
[0040] Although remaining X and Y offsets indicate candidate detected moving vehicles with sufficient detected motion to warrant further processing, the computer-implemented system is suitably configured to further assess confidence in those correlations and the X and Y offsets, i.e., whether they represent actual moving vehicles or a correlation error. The system is configured to compute a Figure of Merit (FOM) (step 84) for each offset pair (correlation) and reject correlations whose FOM is less than a threshold (TH4) (step 86). The FOM threshold is typically derived numerically from test data.
[0041] A traditional approach would be to assess the quality of the peak at the X and Y offsets in the correlation surface (cost surface) generated by the LSC, NCC or the moving vehicle correlator. The peak correlation score is an indicator of how good the match is overall. Also, the noise component of the result can be assessed by comparing the peak correlation score to the next-highest score that is at least a certain distance away from the peak. A certain distance range must be enforced to prevent using a secondary value that is in actuality part of the main peak. This distance limit can either be a defined value, or a fraction of the search area size, e.g., one-eighth of the search area size. Accordingly, the FOM may be computed using known techniques such as a ratio of primary to secondary peak height or using an approach tailored to moving vehicle detection, which compensates for sub-pixel offsets when computing peak heights. An example of the latter computes the FOM as 1−A/B where B is a maximum value in the resampled correlation surface and A is a next highest value at least a radius R away from the location of the maximum value in the resampled correlation surface.
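By way of non-limiting illustration, the FOM of 1−A/B may be sketched as follows for a correlation surface that has already been computed. The function name `figure_of_merit` and the choice to return zero when no usable peak or secondary region exists are assumptions; the resampling step that compensates for sub-pixel offsets is not shown.

```python
import numpy as np

def figure_of_merit(surface, radius):
    """FOM = 1 - A/B, where B is the peak of the correlation surface and A is
    the highest value at least `radius` pixels away from the peak."""
    peak_row, peak_col = np.unravel_index(np.argmax(surface), surface.shape)
    peak = surface[peak_row, peak_col]
    rows, cols = np.indices(surface.shape)
    # Exclude everything closer than `radius` so that the shoulders of the
    # main peak are not mistaken for a secondary peak.
    far = np.hypot(rows - peak_row, cols - peak_col) >= radius
    if peak <= 0 or not far.any():
        return 0.0  # assumption: treat a degenerate surface as no confidence
    return float(1.0 - surface[far].max() / peak)
```

A sharp, isolated peak gives a FOM near one, while a surface with a competing secondary peak of similar height gives a FOM near zero and would be rejected by threshold TH4.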
[0042] As shown in
[0043] Referring back to
[0044] The system is configured to measure a local contrast around each object to reject false alarms (step 102) and within objects to split abutting objects (step 104). For example, a ring contrast may be computed as a ratio of average center pixel values (presumably moving vehicle pixels) to average pixel values in a ring around the center (presumably not moving vehicle pixels). The specific contrast may be computed as 1−(min(centerAvg,ringAvg)/max(centerAvg,ringAvg)). This may be augmented with a directional contrast calculation. Instead of the entire ring, a directional contrast is calculated between the center pixel and directional offset groups of pixels (upper-left, upper, upper-right, right, lower-right, lower, lower-left, and left). To reject false alarms (FAs), if the ring contrast is greater than a threshold, the object is retained. If not, the object is still retained if at least a certain number (e.g., 5) of the directional contrasts exceed a threshold. To split objects, the contrast calculations are done for each pixel in the image (each pixel is the center of a ring). After the detections are grouped into objects, the system looks for lines of minimal contrast within the object to split the object.
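By way of non-limiting illustration, the ring contrast may be sketched as follows. The function name `ring_contrast`, the square center and ring geometry, and the default sizes `inner=1` and `outer=3` are assumptions made for illustration only; the directional contrast variant is not shown.

```python
import numpy as np

def ring_contrast(image, row, col, inner=1, outer=3):
    """Contrast between a square center of half-width `inner` and the
    surrounding ring that extends to `outer` pixels from (row, col)."""
    r0, r1 = max(row - outer, 0), min(row + outer + 1, image.shape[0])
    c0, c1 = max(col - outer, 0), min(col + outer + 1, image.shape[1])
    patch = image[r0:r1, c0:c1].astype(float)
    rows, cols = np.indices(patch.shape)
    # Chebyshev distance from the center pixel separates center from ring.
    dist = np.maximum(np.abs(rows + r0 - row), np.abs(cols + c0 - col))
    center_avg = patch[dist <= inner].mean()
    ring_avg = patch[dist > inner].mean()
    hi = max(center_avg, ring_avg)
    if hi == 0:
        return 0.0
    # 1 - min/max, so 0 means no contrast and values near 1 mean strong contrast.
    return 1.0 - min(center_avg, ring_avg) / hi
```

Because the ratio is taken as minimum over maximum, the measure treats a bright vehicle on a dark road and a dark vehicle on a bright road alike.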
[0045] The system is suitably configured to erode the object to remove false alarms and noise pixels from the object (step 106). Standard erosion techniques for image processing to remove outer pixels from a shape may be employed. The system is suitably configured to dilate objects to provide more accurate object centroids (step 108). Standard dilation techniques for image processing to add outer pixels to complete a specified shape may be employed. Erosion removes the outer pixels from each object. This completely removes objects that are too small/narrow to be vehicles (e.g., bits of noise, road edges, etc.) but leaves the interior of vehicles, which can then be expanded out again by dilating. Dilating also closes in gaps in the vehicle (in practice, we sometimes only detect a moon-shaped or other concave subset of pixels from the car instead of the full outline, so dilation expands this). Once erosion and dilation are complete, the system is suitably configured to measure object size and dimensions to reject objects that are too large (to be a moving vehicle) (step 110).
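By way of non-limiting illustration, erosion followed by dilation of a binary detection mask may be sketched as follows. The 3×3 structuring element and the function names are assumptions; any standard morphological implementation may be substituted.

```python
import numpy as np

def _neighbourhood_stack(mask):
    """Stack the 3x3 neighbourhood of every pixel along a new first axis,
    treating pixels outside the image as unset."""
    padded = np.pad(mask, 1, constant_values=False)
    rows, cols = mask.shape
    return np.stack([padded[dr:dr + rows, dc:dc + cols]
                     for dr in range(3) for dc in range(3)])

def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is set."""
    return _neighbourhood_stack(mask).all(axis=0)

def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is set."""
    return _neighbourhood_stack(mask).any(axis=0)

def clean_detections(mask):
    """Erode then dilate to drop thin or isolated detections while
    restoring the extent of the objects that survive erosion."""
    return dilate(erode(np.asarray(mask, dtype=bool)))
```

In this sketch a one-pixel-wide line of detections, such as a road edge, is removed entirely by the erosion, whereas a 4×4 blob is reduced to its 2×2 interior and then restored by the dilation.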
[0046] The computer-implemented system is suitably configured to use overlapping template windows so that there are multiple detections per vehicle. This allows the system to further reject noise and false alarms from isolated single detects or shapes that do not match a vehicle size. For example, the edge of a building or roadway may produce a long, thin line of detections that may have the same correlator offset as a slow-moving vehicle. The size and shape of each object can be analyzed for consistency with a vehicle profile. This can also allow us to estimate the size of vehicles, possibly differentiating between cars, trucks, and larger vehicles.
[0047] The system is configured to at minimum output a list of detected moving vehicles and their image coordinate locations (step 112). The system is suitably configured to estimate geolocation and associated geolocation uncertainty (step 114). Geolocation position uncertainty is estimated by propagating sensor uncertainty metadata and elevation uncertainty metadata through a suitable sensor model when calculating vehicles' geolocation. The system is suitably configured to use the time lag between collection of the first and second sets of band images and the pixel GSD to calculate vehicle speed and associated uncertainty from the correlation offsets (step 116). Speed uncertainty is calculated by propagating uncertainties in the sensor time lag, image GSD, and/or correlation offset through (eq. 6). The system may also output vehicle speed and headings, which may be used in subsequent processing to extract the road network map (step 118). The system may also output vehicle geolocation and uncertainties for each of the geolocation, speed and headings, which may be useful for other applications (step 118).
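By way of non-limiting illustration, the propagation of uncertainty through the speed calculation may be sketched as follows. The sketch assumes a first-order propagation with independent errors in the correlation offset, the GSD and the time lag, so that the relative variances add; this independence assumption and the function name `speed_with_uncertainty` are not taken from the disclosure.

```python
import math

def speed_with_uncertainty(dp, sigma_dp, gsd, sigma_gsd, dt, sigma_dt):
    """Return (speed, sigma_speed) for speed = dp * gsd / dt.

    dp, gsd and dt must be non-zero. For a product or quotient of
    independent quantities, the squared relative uncertainties add
    (first-order approximation, an assumption of this sketch).
    """
    speed = dp * gsd / dt
    relative = math.sqrt((sigma_dp / dp) ** 2
                         + (sigma_gsd / gsd) ** 2
                         + (sigma_dt / dt) ** 2)
    return speed, abs(speed) * relative
```

Under these assumptions the correlation offset tends to dominate for slow vehicles, since a fixed sub-pixel error is a large fraction of a displacement of about one pixel.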
Extraction of Image-Specific Spectral Roadway Signatures
[0048] Referring now to
[0049] The system is configured to compute a candidate spectral roadway signature for the remaining pixels for each detection (step 210). The system may compute the signature as an average pixel value and standard deviation in each of the bands of the multi-band image (e.g. each of the 8 bands in the WV-2 image). To improve processing efficiency and classification accuracy, the system is suitably configured to cluster the candidate signatures to identify a reduced number N of predominant spectral roadway signatures (step 212). The number N can be preset or can be determined by the clustering algorithm. For simplicity, clusters 214 of candidate signatures 216 based on average pixel values in Bands 1 and 2 are depicted in
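By way of non-limiting illustration, the computation of candidate signatures (step 210) and their clustering (step 212) may be sketched as follows. The disclosure does not name a clustering algorithm; the plain k-means with farthest-point initialization shown here, and the function names, are assumptions made for illustration only.

```python
import numpy as np

def candidate_signature(pixels):
    """pixels: (num_pixels, num_bands) values adjacent to one detection.
    Returns the per-band mean and per-band standard deviation."""
    pixels = np.asarray(pixels, dtype=float)
    return pixels.mean(axis=0), pixels.std(axis=0)

def cluster_signatures(means, n_clusters, n_iter=20):
    """Plain k-means over candidate mean vectors; returns (centers, labels)."""
    means = np.asarray(means, dtype=float)
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [means[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(means - c, axis=1) for c in centers], axis=0)
        centers.append(means[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each candidate to its nearest center, then update centers.
        dists = np.linalg.norm(means[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = means[labels == k].mean(axis=0)
    return centers, labels
```

Each returned center may serve as the mean of one image-specific spectral roadway signature, for example one for asphalt and one for concrete when the scene contains both.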
Growing the Road Network Map
[0050] Referring now to
[0051] For each image-specific spectral roadway signature (step 306), the system may be configured to downsample or aggregate pixels in the remaining portions of the multi-band image (step 308) to reduce the computational load. The system computes a spectral similarity between the signature and each remaining pixel in the multi-band image (step 310). Standard spectral similarities include the Mahalanobis distance, Spectral Angle Mapper, Spectral Correlation Mapper and Euclidean distance that measure the similarity between two vectors (e.g. the 8-band signature) and output a single score. Alternately, a similarity metric could be configured to produce a score for each band. The spectral similarity may account for both the differences in pixel values and standard deviations. The system then compares the score(s) to a threshold (TH5) to form a binary map of road pixels (step 312). First, the system is configured to generate the threshold (TH5). In one case, the system is configured to use spectral similarity values across the entire multi-band image (or a database of similar multi-band images) to set threshold (TH5) (step 314). In another case, the system is configured to use only the spectral similarity values for pixels adjacent to detected moving vehicles to set threshold (TH5) (step 316). The threshold is set so that adjacent pixels that are vehicle pixels satisfy the threshold and adjacent pixels that were excluded as outliers do not satisfy the threshold.
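By way of non-limiting illustration, steps 310 through 316 may be sketched as follows using the Spectral Angle Mapper as the similarity metric. The function names, the `seed_mask` argument marking pixels adjacent to detected moving vehicles, and the use of a percentile of the seed-pixel angles as threshold TH5 are assumptions made for illustration only.

```python
import numpy as np

def spectral_angle(image, signature):
    """image: (bands, rows, cols); signature: (bands,).
    Returns the spectral angle in radians for every pixel (0 = identical shape)."""
    signature = np.asarray(signature, dtype=float)
    flat = image.reshape(image.shape[0], -1).astype(float)
    dot = signature @ flat
    norms = np.linalg.norm(flat, axis=0) * np.linalg.norm(signature)
    cos = np.divide(dot, norms, out=np.zeros_like(dot), where=norms > 0)
    return np.arccos(np.clip(cos, -1.0, 1.0)).reshape(image.shape[1:])

def road_mask(image, signature, seed_mask, percentile=95.0):
    """Threshold the angle map using only the seed pixels, i.e. those
    adjacent to detected moving vehicles (assumed threshold rule)."""
    angles = spectral_angle(image, signature)
    threshold = np.percentile(angles[seed_mask], percentile)
    return angles <= threshold, threshold
```

Because the spectral angle ignores overall brightness, this particular metric tolerates illumination differences along a road; a metric that also uses the per-band standard deviations, such as the Mahalanobis distance, may be substituted.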
[0052] The system may be configured to output the binary map as the road network map for each spectral signature (step 318) or to perform a spatial analysis of the binary map to iteratively refine and reapply threshold TH5 (step 320). As shown in
[0053] The system is configured to form a union of the road network maps for all of the image-specific spectral roadway signatures (step 320) to produce the initial road network map 28. As with the moving vehicles, the road network map is processed to clean up the map to form the final road network map 30. The system may be configured to implement morphological erosion by M and dilation by N where N>=M to remove noise and fill in gaps in the roads (step 322) and morphological size/shape analysis to remove structures such as parking lots or buildings that may have similar signatures to concrete roads (step 324). The system may be configured to label different portions of the road network map to signify the type of road (concrete, asphalt, gravel, dirt etc.), the direction of traffic flow and the approximate speed of traffic flow (using the headings and speeds of the detected moving vehicles) (step 326).
[0054] While several illustrative embodiments of the disclosure have been shown and described, numerous variations and alternate embodiments will occur to those skilled in the art. Such variations and alternate embodiments are contemplated, and can be made without departing from the spirit and scope of the disclosure as defined in the appended claims.