Method of detecting structural parts of a scene
09864927 · 2018-01-09
Assignee
Inventors
- Colin Alexander McManus (Oxford, GB)
- Paul Michael Newman (Oxford, GB)
- Benjamin Charles Davis (Oxford, GB)
CPC classification
G06V10/751
PHYSICS
International classification
Abstract
A method of detecting the structural elements within a scene sensed by at least one sensor within a locale, the method comprising: a) capturing data from the sensor, which data provides a first representation of the sensed scene at the current time; b) generating a second representation of the sensed scene where the second representation is generated from a prior model of the locale; and c) comparing the first and second representations with one another to determine which parts of the first representation represent structural elements of the locale.
Claims
1. A method of detecting structural elements within a scene sensed by a first sensor within a locale, the method comprising: acquiring data from the first sensor, which data provides a first representation of the sensed scene at a current time; generating a second representation of the sensed scene where the second representation is generated from a prior model of the locale, the prior model of the locale generated using a second sensor different from the first sensor; and comparing the first and second representations with one another to determine which parts of the first representation represent structural elements of the locale, wherein comparing the first and second representations comprises generating disparity differences between the first and second representations which are weighted according to an associated measurement uncertainty.
2. A method according to claim 1 in which generating the second representation includes generating the second representation by reprojecting at least a portion of the prior model to represent the first representation.
3. A method according to claim 2 in which the prior model is processed so that the second representation includes only the structural elements and excludes ephemeral objects.
4. A method according to claim 1 in which the measurement uncertainty is provided by using a pre-computed filter.
5. A method according to claim 4 in which the pre-computed filter is an average depth-Jacobian image where the depth Jacobian, ∂z^s/∂x, is computed for each pixel.
6. A method according to claim 5 in which the average depth Jacobian image is computed over a plurality of frames and the results combined.
7. A method according to claim 1 in which generating the second representation includes generating the second representation using a determination of movement between a time the first representation was provided and the sensed scene was provided by the prior model.
8. A method according to claim 1 in which the method is used to localize a vehicle within a vehicle locale, the first sensor is mounted upon the vehicle, and wherein the method is applied to data obtained from the first sensor on the vehicle.
9. A method according to claim 1 wherein the first sensor and the second sensor are of different sensor modalities.
10. A method according to claim 1 wherein the first sensor is a camera and the second sensor is a laser scanner.
11. A system comprising a memory, processing circuitry, and a first sensor, the system arranged to detect structural elements within a sensed scene, wherein the system is arranged to perform the following tasks: acquire, using the first sensor, a first representation of a sensed scene taken at a first time; generate, using the processing circuitry, a second representation of the sensed scene, where the second representation is generated from a prior model of a locale containing the sensed scene, the prior model of the locale having been generated using a second sensor different from the first sensor; compare, using the processing circuitry, the first and second representations with one another to determine which parts of the first representation are used to determine structural elements of the locale; and generate, using the processing circuitry, disparity differences between the first and second representations which are weighted according to an associated measurement uncertainty.
12. A system according to claim 11 which is arranged to reproject at least a portion of the prior model to provide a similar view to that of the first representation.
13. A system according to claim 11 which is arranged to generate the second representation using a determination of movement between the first time that the first representation was acquired and a different time.
14. A system according to claim 11 which is arranged to weight the first and second representations by a pre-computed filter.
15. A system according to claim 11 wherein the first sensor and the second sensor are of different sensor modalities.
16. A system according to claim 11 wherein the first sensor is a camera and the second sensor is a laser scanner.
17. A non-transitory machine readable medium containing instructions which when read by a computer cause the computer to: acquire, using a first sensor, a first representation of a scene taken at a first time; generate a second representation of the scene, where the second representation is generated from a prior model of a locale containing the scene, the prior model of the locale generated using a second sensor different from the first sensor; compare the first and second representations with one another to determine which parts of the first representation are used to determine structural elements of the locale; and generate disparity differences between the first and second representations which are weighted according to an associated measurement uncertainty.
18. A non-transitory machine readable medium according to claim 17 in which the measurement uncertainty is provided by using a pre-computed filter.
19. A non-transitory machine readable medium according to claim 18 in which the pre-computed filter is an average depth-Jacobian image where the depth Jacobian, ∂z^s/∂x, is computed for each pixel.
20. A non-transitory machine readable medium according to claim 19 in which the average depth Jacobian image is computed over a plurality of frames and the results combined.
Description
(1) There now follows, by way of example only and with reference to the accompanying figures, a detailed description of embodiments of the present invention.
(31) Whilst it is convenient to describe embodiments in relation to a vehicle which is arranged to process its locale, embodiments of the invention may find wider applicability. The ability to determine which parts of a scene are ephemeral and/or which parts relate to structural elements (i.e. to differentiate background from foreground) may find applicability in a number of other fields. For example, embodiments may find utility in surveillance systems (perhaps to aid object detection), in smartphone applications, or in surveying applications interested in change detection (e.g., returning to a pre-surveyed environment to see whether any infrastructure has changed).
(32) Thus, embodiments of the invention are described in relation to a sensor 100 mounted upon a vehicle 102 and in relation to the flow chart of
(33) In the embodiment being described, the sensor 100 is a passive sensor (i.e. it does not create radiation and merely detects radiation) and in particular is a stereoscopic camera (such as the PointGrey BumbleBee); it comprises two cameras 104, 106. The skilled person will appreciate that such a sensor could be provided by two separate cameras rather than as a single sensor 100.
(34) In other embodiments, the sensor 100 may comprise other forms of sensor such as a laser scanner or the like. As such, the sensor 100 may also be an active sensor arranged to send radiation out therefrom and detect reflected radiation.
(35) In some embodiments, it is conceivable that the vehicle 102 also comprises a second sensor 103, which in the embodiment being described is a laser scanner, such as a scanning LIDAR sensor (the SICK LMS-151). Other laser sensors, such as those produced by Velodyne, may also be used. The second sensor is used to generate prior data providing a prior model, from which a second representation of the sensed scene can be generated; thus, the second representation may be thought of as a synthetic representation of the locale. However, in other embodiments the second sensor may not be provided on the same vehicle as the first sensor 100 and the prior data may be collected and stored, as a prior model, for use on the vehicle 102.
(36) In the embodiment shown in
(37) Within the sensed scene, some of the objects remain static (i.e. do not move or change other than changes in lighting, etc) and an example of such a static image within
(38) Looking at
(39) Returning to
(40) The skilled person will appreciate that memory 124 may be provided by a variety of components including a volatile memory, a hard drive, a non-volatile memory, etc. Indeed, the memory 124 may comprise a plurality of components under the control of, or at least accessible by, the processing unit 118.
(41) However, typically the memory 124 provides a program storage portion 126 arranged to store program code which when executed performs an action and a data storage portion 128 which can be used to store data either temporarily and/or permanently.
(42) In other embodiments at least a portion of the processing circuitry 112 may be provided remotely from the vehicle. As such, it is conceivable that processing of the data generated by the sensor 100 is performed off the vehicle 102, or partially on and partially off the vehicle 102. In embodiments in which the processing circuitry is provided both on and off the vehicle, a network connection (such as 3G UMTS (Universal Mobile Telecommunication System), WiFi (IEEE 802.11), or the like) may be used.
(43) In the embodiment shown, the program storage portion 126 comprises a localiser 130 arranged to localise the vehicle 102 (i.e. to provide a co-ordinate reference identifying the location of the vehicle); an image synthesizer 132 arranged to synthesize images; and a comparator 134 arranged to compare images. The data storage portion of the memory 128 comprises one or more sets of data each including a prior representation; one or more prior visual experiences 142 used by the localiser 130; and a pre-computed Jacobian 144 used to filter images.
(44) It is convenient to refer to a vehicle travelling along a road, or off-road, or the like, but the skilled person will appreciate that embodiments need not be limited to land vehicles and could be water-borne vessels, such as ships, boats or the like, or indeed air-borne vessels, such as airplanes or the like. Likewise, it is convenient in the following description to refer to image data generated by cameras 104, 106, but other embodiments may generate other types of data. In particular, embodiments may utilise stereo cameras; monocular cameras; laser scanners (such as LIDAR); or other sensors capable of generating data containing depth information.
(45) The sensor 100, together with the processing circuitry 112 to which the sensor 100 is connected, and together with the software running on the processing circuitry 112, form what is often termed a Visual Odometry (VO) system. In the embodiment being described, the VO system continuously produces a (possibly ephemeral, i.e. containing ephemeral objects) 3D model of the world using the data generated from the cameras 104, 106. Typically, the VO system locates points (which may be referred to as nodes) which can be located in both images of the stereo pair. These points are then tracked between subsequent images (i.e. between a first time and a second time) to generate a trajectory of the sensor 100. Since the location of the sensor 100 relative to the vehicle is known, it is possible to also calculate, perhaps using the processing circuitry 112, the trajectory of the vehicle 102.
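The stereo triangulation underlying such a VO system can be sketched as follows. This is a minimal illustration of the standard pinhole/stereo geometry (depth from disparity), not the patented implementation; the function name and parameter values are illustrative only.

```python
import numpy as np

def triangulate_stereo(u, v, disparity, f, b, cu, cv):
    """Recover a 3D point from a stereo match via the pinhole model.

    Depth follows from the stereo baseline: z = f*b/d, and the lateral
    coordinates follow from inverting the pinhole projection.
    """
    z = f * b / disparity
    x = (u - cu) * z / f
    y = (v - cv) * z / f
    return np.array([x, y, z])
```

Tracking such triangulated points between frame pairs, and solving for the rigid motion that best explains their displacement, yields the sensor trajectory described above.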
(46) In contrast to the prior art, such as trained detector systems, embodiments described herein do not detect objects per se (although object detection could conceivably be performed in addition) but rather determine what may be thought of as the relevance of objects within the sensed scene. This may be thought of as what should be focused on within the sensed scene.
(47) Embodiments described herein rely on a prior model of the sensed scene. In the embodiment being described this prior model is generated using a prior scan of the environment using the laser scanner 103 as the sensor to make the survey together with the output of a VO system.
(48) Embodiments may use a LIDAR sensor together with the output of the VO system to generate a 3D point cloud as illustrated in
(49) Other embodiments may use techniques of generating the prior model other than a LIDAR/VO combination.
(50) In order to fuse the data output from each of the first 100 and second 103 sensors, the trajectory of the vehicle (such as, for example, that output by the VO system) is used and the LIDAR data (i.e. the output from the second sensor 103) is retrospectively compared with the camera data (i.e. the output from the first sensor). This retrospective comparison of the LIDAR and camera data exploits the fact that the motion of the vehicle 102 causes the respective fields of view of the sensors 100, 103 to overlap, if such an overlap did not occur already.
(51) Embodiments construct a swathe using a base trajectory estimate, x^b(t), obtained using the trajectory generated by the VO and the putative calibration, ^bT_l, between the base trajectory and the LIDAR, l. Other embodiments may generate the trajectory from other sensor systems, such as from an Inertial Navigation System (INS).
(52) The swathe is then projected into the camera using a calibration, ^bT_c, between the camera, c, and the base trajectory. An interpolated LIDAR reflectance image is then generated.
(53) Thus, the embodiment being described relies on using vision sensors (i.e. the cameras 104, 106) in conjunction with a prior 3D survey (i.e. a prior model) generated from a laser scanner 103. The skilled person will appreciate that whilst the laser scanner 103 is shown on the same vehicle 102, the prior representation of the scene may be generated by another vehicle.
(54) In particular, therefore, embodiments use a prior model of the scene generated by a survey vehicle equipped with 3D laser sensors 103, cameras, and an Inertial Navigation System (INS), which in one embodiment is an Oxford Technical Solutions (OxTS) RT-3042 INS used for groundtruth. More specifically, the embodiment being described uses a prior model provided by a 3D point-cloud in conjunction with stereo imagery generated from the sensor 100 (i.e. the VO system). An example of a prior model of the environment, and in particular of a 3D point cloud, is shown in
(55) At least some of the embodiments process the data collected in the survey to generate a prior model which is free, or at least substantially free, of ephemeral objects. Processing of the data collected to remove ephemeral objects to produce the prior model is described in D. Wang, I. Posner, and P. Newman, What could move? finding cars, pedestrians and bicyclists in 3d laser data, in Proceedings of the IEEE International Conference on Robotics and Automation, Saint Paul, Minn., USA, May 14-18, 2012 and embodiments may use a similar method.
(56) Thus, it will be seen that
(57) At a high level, a first embodiment may be described as follows. At runtime, the localiser 130 provided by the processing circuitry is arranged to match live stereo images, generated by the sensor 100, against prior visual experiences 142 (step 1200) using, in the embodiment being described, an Experience-Based Navigation (EBN) system in order to localise the vehicle 102 in its surroundings. EBN is described in the patent application GB1202344.6 and also in Practice makes perfect? Managing and leveraging visual experiences for lifelong navigation, 2012 IEEE International Conference on Robotics and Automation (ICRA), 14-18 May 2012, each of which is hereby incorporated by reference and to the contents of which the skilled person is directed. Such matching allows the current position of the vehicle 102 to be localised within the previous experiences of the (or another) vehicle such that the current location of the vehicle is determined. Typically, the localisation of the vehicle is performed such that the lateral position is determined to have an error of less than substantially 2 meters. In some embodiments, the lateral positional error is less than substantially 1 m. Further, the orientation error in the localisation may be less than substantially 10 degrees. Such accuracies may be more readily obtained by embodiments using VO systems than by other sensor technologies, such as GPS and the like.
(58) The EBN provides a transformation estimate against a prior visual experience (recall that one of the requirements for the prior representation 138 (i.e. a prior model) is that it has an associated image sequence). Table I below provides the system parameters used in our experiments, corresponding to the notation introduced below.
(59) Other systems may use other means to localise within the available prior representations available.
(60) For example, other embodiments may use prior data in the form of street-view images (such as GOOGLE) and a use a place-recognition system that queries a database of images to find the prior image, or a set of candidate prior images, that best matches the queried image. Then, the position and orientation is estimated using standard computer-vision techniques (e.g., estimating the Essential matrix).
(61) Embodiments using EBN are believed advantageous because they can more naturally handle sudden or gradual appearance changes such as due to changing weather conditions, lighting or seasons.
(62) Each prior visual experience 142 has an associated 3D point-cloud (i.e. a prior model 138) and the localiser 130 is arranged to select the prior model corresponding to the selected prior visual experience for further processing (step 1202).
(63) Once the vehicle 102 has been localised within a prior model 138 the image synthesizer is arranged to synthesise depth images (which may be thought of as a second representations) from estimated camera poses in this 3D model (step 1204). These synthetic depth images (second representations) are then used, by a comparator 134, to compare the current structure of the scene (given by the output from the sensor 100; i.e. the first representation) with the static structure of scene (given by the prior and associated synthesized image; i.e. the second representation) to identify large discrepancies (step 1206). This comparison provides a clean segmentation of the image into foreground and background elements, without the need for an object detection system.
(64) Thus, describing the first embodiment in more detail, the pose of the vehicle 102 is generated by the localiser 130. In the embodiment being described, this is performed by the EBN, which is arranged to provide an estimated pose of the vehicle, denoted by a 6×1 column vector x, within the 3D scene model.
(65) Using this estimated pose, the image synthesizer 132 reprojects all of the points from the 3D prior model into the frame of the cameras 104, 106 of the sensor 100 to produce a synthetic depth image.
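The reprojection step above can be sketched as a standard pinhole projection with per-pixel z-buffering. This is a minimal sketch under stated assumptions (a homogeneous 4×4 camera-from-world transform and a 3×3 intrinsic matrix K); the function name is hypothetical and the real system would use an optimised, vectorised implementation.

```python
import numpy as np

def synthesize_depth_image(points_world, T_cam_world, K, shape):
    """Reproject 3D prior-map points into a camera to form a sparse
    synthetic depth image (the nearest point wins at each pixel)."""
    h, w = shape
    depth = np.full((h, w), np.inf)
    # Transform points into the camera frame via homogeneous coordinates.
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    for X, Y, Z in pts_cam:
        if Z <= 0:
            continue  # behind the camera
        u = int(round(K[0, 0] * X / Z + K[0, 2]))
        v = int(round(K[1, 1] * Y / Z + K[1, 2]))
        if 0 <= u < w and 0 <= v < h:
            depth[v, u] = min(depth[v, u], Z)  # keep the closest surface
    return depth
```

Pixels that receive no point remain at infinity and are handled by the interpolation described in paragraph (68).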
(66) For reasons of efficiency, the localiser 130 is arranged to restrict the size of the 3D prior model (i.e. the prior representation) that is considered by using a sliding window about the estimated position of the cameras 104, 106 within the sensor 100. Embodiments arranged in such a manner improve the speed of the system and reduce the amount of work for the localiser 130. In the embodiment being described the window corresponded to a physical size within the prior representation of substantially 40 m. In other embodiments other window sizes may be used; for example, window sizes of substantially any of the following may be used: 20 m; 30 m; 50 m; 60 m; 70 m or the like.
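The sliding window above amounts to discarding prior-map points far from the estimated camera position before reprojection. A minimal sketch, assuming a Euclidean radius as the window criterion (an axis-aligned box along the trajectory would serve equally well); the function name and default are illustrative.

```python
import numpy as np

def window_prior(points, camera_position, window=40.0):
    """Keep only prior-map points within a sliding window (here a radius,
    matching the substantially 40 m extent described above) of the
    estimated camera position."""
    d = np.linalg.norm(points - camera_position, axis=1)
    return points[d <= window]
```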
(67) Thus, for every pixel, i, in the image, the image synthesizer computes the estimated depth, z.sub.i, in the local map according to the localisation estimate,
z_i = z_i(x + δx),  δx ~ N(0, P_x),  (1)

where δx is normally distributed noise given by the covariance P_x, which represents the localisation and calibration uncertainty.
(68) Due to the sparsity and sub-pixel values of the reprojections in the image, the image synthesizer performs bilinear interpolation and then applies a median filter for smoothing. Other embodiments may utilise other image processing techniques. The image synthesizer may only perform interpolations on pixels that are within a specified threshold of their reprojected neighbours. In the embodiment being described, the specified threshold is to only perform interpolations on pixels that have at least one neighbour within a 4-pixel radius. In other embodiments other thresholds may be used.
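The neighbour-gated fill and smoothing can be sketched as follows. This is a simplified stand-in, not the described implementation: nearest-valid fill replaces the bilinear interpolation, and a plain 3×3 median filter is applied only where the window is fully valid; the function name and radius default are illustrative.

```python
import numpy as np

def densify(depth, radius=4):
    """Fill gaps in a sparse depth image, then median-filter for smoothing.
    A pixel is only filled if it has at least one valid neighbour within
    `radius` pixels, mirroring the 4-pixel gating described above."""
    h, w = depth.shape
    out = depth.copy()
    valid = np.isfinite(depth)
    ys, xs = np.nonzero(valid)
    for v in range(h):
        for u in range(w):
            if valid[v, u]:
                continue
            d2 = (ys - v) ** 2 + (xs - u) ** 2
            if len(d2) and d2.min() <= radius ** 2:
                # Nearest valid neighbour stands in for bilinear interpolation.
                out[v, u] = depth[ys[d2.argmin()], xs[d2.argmin()]]
    sm = out.copy()
    for v in range(1, h - 1):
        for u in range(1, w - 1):
            win = out[v - 1:v + 2, u - 1:u + 2]
            if np.isfinite(win).all():
                sm[v, u] = np.median(win)  # 3x3 median smoothing
    return sm
```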
(69) As noted above, the prior representations 138 held in the memory have been preprocessed to remove ephemeral objects and as such the synthesized depth image (i.e. second representation) contains only the static/invariant/structural components of the scene.
(70) The removal of the ephemeral objects in the prior representation is exemplified with reference to
(71)
(72) It should be noted that the image shown in
(73) In addition to the step described above, some embodiments may cause the comparator 134 to perform additional steps as described below.
(74) Firstly, it is worth noting that calibration and/or localisation errors may lead to large disagreements in the foreground because of the inverse relationship between depth and disparity.
(75) Secondly, disparity differences for distant objects will naturally be smaller, meaning that there may be advantages in amplifying these weaker signals and some embodiments may be arranged to do this.
(76) Embodiments being described cause the localiser 130 to localise the position of the vehicle 102 against a single prior model 138. Accordingly, in such embodiments, it is not possible to learn a statistical model for the background as is done in most background subtraction methods of the prior art (i.e., we only have a single sample of the prior). However, it is conceivable that other embodiments may use more than one prior model in which case statistical models may become applicable.
(77) Thus, in the embodiment being described, the comparator 134 is arranged to take a probabilistic approach and weight the disparity differences by their associated measurement uncertainties, which are obtained from the average depth Jacobian (step 1208). For every pixel, i, in the image, we define a disparity measurement from the dense-stereo algorithm, d^c, and from the synthetic depth image, d^s, as follows,
(78)

d_i^c := d̄_i^c + δd_i^c,  δd_i^c ~ N(0, σ_{d_i^c}²),
d_i^s := f·b / z_i^s(x + δx),  δx ~ N(0, P_x),

where δd_i^c is normally distributed pixel noise with variance σ_{d_i^c}², {f, b} are the intrinsic focal length and baseline, and z_i^s(·) is the synthetic depth produced by a map-localisation estimate, x, with normally distributed noise given by the covariance matrix P_x. Dropping the pixel subscript for convenience, it is possible to define a disparity difference measurement as,
(79)

e_d := d^c − d^s,

where, to first order:
(80)

σ_{e_d}² = σ_{d^c}² + (f·b / (z^s)²)² (∂z^s/∂x) P_x (∂z^s/∂x)^T.
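The first-order error propagation above can be checked numerically with scalar quantities. A minimal sketch: the function name is hypothetical, and Z_x stands for the (pre-computed or exact) depth variance induced by the localisation uncertainty, as introduced in the following paragraphs.

```python
import math

def weighted_disparity_diff(d_c, z_s, Z_x, f, b, var_dc):
    """Uncertainty-weighted (Mahalanobis) disparity difference.

    Since d = f*b/z, a depth perturbation maps to disparity through the
    derivative dd/dz = -f*b/z^2, so the depth variance Z_x contributes
    (f*b/z^2)^2 * Z_x to the disparity-difference variance.
    """
    d_s = f * b / z_s                 # synthetic disparity from prior depth
    e_d = d_c - d_s                   # raw disparity difference
    var_e = var_dc + (f * b / z_s ** 2) ** 2 * Z_x
    return e_d / math.sqrt(var_e)
```

Note how a large Z_x (uncertain near-field depth) inflates the denominator and so down-weights the difference, which is the behaviour described in paragraph (88).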
(81) The Jacobian, ∂z^s/∂x, represents the change in depth that occurs given small perturbations of the vehicle's 102 pose. In view of the time that would be required to compute the Jacobian, some embodiments use the following approximation. To begin, the term Z_x is defined as follows:
(82)

Z_x := (∂z^s/∂x) P_x (∂z^s/∂x)^T,
which provides an estimate of the depth change at a particular pixel location, given the localisation uncertainty.
(83) Thus, embodiments may use an average depth Jacobian which is precomputed 144. In order to generate this pre-computed Jacobian, a Jacobian was averaged over 500 keyframes (i.e. over a plurality of images) from a separate dataset. In other embodiments, other numbers of images may be used to generate the pre-computed Jacobian 144.
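Combining the per-keyframe Jacobian images into the single pre-computed filter can be sketched simply as a per-pixel mean. The function name is illustrative; the source does not specify the exact combination rule beyond averaging over the plurality of frames.

```python
import numpy as np

def average_jacobian(jacobian_images):
    """Average per-pixel depth-variance (Jacobian-derived) images over many
    keyframes to obtain a single pre-computed filter image (item 144)."""
    stack = np.stack(jacobian_images)   # shape: (n_frames, h, w)
    return stack.mean(axis=0)
```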
(84) This pre-computed average depth-Jacobian image 144, which may be thought of as a pre-computed filter, is shown in
(85) Denoting this approximation as Ẑ_x, we have
(86)

σ_{e_d}² ≈ σ_{d^c}² + (f·b / (z^s)²)² · Ẑ_x,

allowing us to define our Mahalanobis disparity difference measurement as,
(87)

ẽ_d := √(e_d² / σ_{e_d}²).
(88) Thus, an effect, in embodiments that use the precomputed Jacobian 144, is that errors in the near field are down-weighted, which naturally brings out differences with objects that are farther away (i.e., the weaker signals for distant objects appear stronger since the foreground noise is reduced).
(89) An output from the comparator 134 may be termed a background-likelihood image which can be obtained by thresholding the uncertainty-weighted disparity as shown in
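The thresholding step that produces the background-likelihood image can be sketched as below. A minimal sketch: the source does not give the threshold value, so the default here is an assumption, and a hard binary mask stands in for whatever soft likelihood the described system may produce.

```python
import numpy as np

def background_likelihood(weighted_diff, threshold=1.0):
    """Background-likelihood mask from the uncertainty-weighted disparity
    difference: pixels whose Mahalanobis difference stays below the
    threshold agree with the static prior (likely background)."""
    return (np.abs(weighted_diff) < threshold).astype(float)
```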
(90) In
(91) A second embodiment is now described which provides an alternative method for generating the background-likelihood image which relies on a so-called optical flow between images (the optical flow may be thought of as being the movement between portions of two images). Such an embodiment may utilise monocular images as opposed to the stereo images used in the method outlined above.
(92) To create a synthetic optical flow image at time t_k, the synthetic depth image created from the prior model as described above and the camera image at t_{k−1} are used to create a point cloud, which is typically shown as a coloured point cloud within images (step 1300). The motion estimate between times t_{k−1} and t_k, denoted by the 4×4 SE(3) transformation T_{k,k−1} (a Special Euclidean transformation defined in 3D space in both translation and rotation, i.e. a 6 degrees-of-freedom estimate of the pose of the vehicle in x, y, z, roll, pitch and yaw), is applied and the point cloud is reprojected into the estimated camera pose at time t_k to create a synthetic camera image as shown in
(93) Regions without any data (i.e., pixel locations where the nearest reprojected point is beyond a certain distance) are filled in with the intensity values from the true camera image. This step is performed in order to ensure that a full image is created and that there is no missing data, otherwise the optical flow algorithm produces a noisy result. After reprojecting the point cloud and filling in missing regions, an interpolation is applied (which in this embodiment is bilinear), followed by a Gaussian low-pass filter to smooth the image.
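The warp-and-fill procedure of paragraphs (92)–(93) can be sketched as follows. A simplified stand-in, not the described implementation: the bilinear interpolation and Gaussian low-pass smoothing are omitted, hole filling is done by starting from the true image, and the function name is hypothetical.

```python
import numpy as np

def warp_forward(image_prev, depth_prev, K, T, image_true):
    """Create a synthetic view at time k: back-project each pixel of the
    previous image using the synthetic depth, apply the SE(3) motion T,
    and reproject; holes are filled from the true image at time k."""
    h, w = image_prev.shape
    out = image_true.copy()          # hole fill: start from the true image
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    for v in range(h):
        for u in range(w):
            z = depth_prev[v, u]
            if not np.isfinite(z) or z <= 0:
                continue
            # Back-project pixel (u, v) to 3D, move it through T, reproject.
            p = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
            X, Y, Z = (T @ p)[:3]
            if Z <= 0:
                continue
            u2 = int(round(fx * X / Z + cx))
            v2 = int(round(fy * Y / Z + cy))
            if 0 <= u2 < w and 0 <= v2 < h:
                out[v2, u2] = image_prev[v, u]
    return out
```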
(94) Once we have generated a synthetic intensity image at time t.sub.k (step 1304), embodiments may use the method set out in C. Liu, Beyond pixels: Exploring new representations and applications for motion analysis, Ph.D. dissertation, MIT, 2009 to compute the true optical flow that is taken between the true image at t.sub.k-1 and the true image at t.sub.k (and the result is seen in
(95) The true optical flow measurement, f^c, and synthetic optical flow measurement, f^s, for pixel i are defined as,

f_i^c := f̄_i^c + δf_i^c,  δf_i^c ~ N(0, σ_{f_i^c}²),  (9)
f_i^s := f_i^s(z^s(x + δx)),  δx ~ N(0, P_x).  (10)
(96) As with the disparity-based method, a flow difference measurement is defined as,

e_f := f^c − f^s,  (11)

(97) with a first-order uncertainty of,

σ_{e_f}² = σ_{f^c}² + (∂f^s/∂z^s)² (∂z^s/∂x) P_x (∂z^s/∂x)^T.  (12)
(98) Such a derivation introduces another Jacobian term, ∂f^s/∂z^s, which represents changes in optical flow due to changes in depth. Computing this Jacobian term involves reprojecting coloured points, interpolating a grayscale image, and running it through an optical flow algorithm that computes local spatial and temporal derivatives. It is conceivable that embodiments could compute this Jacobian term; however, in the embodiment being described an approximation is performed as follows (step 1308).
(99) The approximation uses the intuition that scaling 2D flow fields by their associated depth approximates the 3D velocity. In the embodiment being described, the difference between the expected and observed flow is scaled by the expected depth to amplify large differences:
ẽ_f := e_f · z^s.  (13)
(100) Whilst this approximation does not explicitly account for uncertainties in the flow difference, it was found to work well in practice and to obviate the need to compute the Jacobian term, thereby increasing the computational speed.
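Equation (13) is a one-line operation; the sketch below makes the depth-scaling intuition concrete. The function name is illustrative.

```python
def scaled_flow_diff(flow_true, flow_synth, depth_synth):
    """Depth-scaled flow difference (equation (13)): scaling the 2D flow
    difference by the expected depth approximates a 3D velocity
    difference, amplifying disagreements on distant objects whose raw
    flow differences are naturally small."""
    return (flow_true - flow_synth) * depth_synth
```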
(101) As with the output of the first embodiment described above, it will be seen that the background likelihood image comprises lighter regions for structural portions of the image shown in
(102) The skilled person will appreciate that each of the two embodiments described above generates a background likelihood image highlighting areas of the original input image that have a high likelihood of being a structural portion thereof. This background likelihood image may find further use in further embodiments.
(103) In one such embodiment, the background likelihood image may be used to improve the Visual Odometry (VO) front end. Initially the VO front end extracts features in the images of the stereo pair by using the FAST corner detector with a low threshold to obtain thousands of candidate features. (E. Rosten, G. Reitmayr, and T. Drummond, Real-time video annotations for augmented reality, in Advances in Visual Computing, 2005).
(104) In order to allow the method to process images in real time, the embodiment being described takes the top N features, ranked by their corner score, s_i, for further processing. In order to ensure that the features are well distributed spatially, the image is partitioned into a number of quadrants and the desired number of features, N, is divided equally among the quadrants.
(105) The background-likelihood images may then be used to re-weight each corner score by looking up the closest likelihood weight, b_i, and re-weighting according to the following,
(106)

s̃_i = b_i·s_i if b_i ≥ τ_b, and s̃_i = 0 otherwise,

where τ_b is a threshold for the minimum required likelihood. This thresholding on the background-likelihood is useful because, otherwise, embodiments would seek to find a minimum number of features in each quadrant provided that the corner scores are above zero. This means that there could be a quadrant with low likelihood scores (close to zero, but not exactly zero) and yet the target number of features would still be taken, since all scores have decreased by a proportional amount. Thus, the embodiment being described disregards features that are extracted in quadrants believed to belong to foreground objects. Thus, such embodiments remove outlier feature points (i.e., extracted feature points believed to be on ephemeral objects).
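The corner-score reweighting can be sketched as below. A minimal sketch: the threshold value is not given in the source, so the default here is an assumption, and the quadrant partitioning is left out for brevity.

```python
def reweight_scores(scores, likelihoods, tau=0.25):
    """Re-weight FAST corner scores by the background likelihood, zeroing
    features whose likelihood falls below the threshold tau so that
    quadrants believed to be foreground contribute no features at all."""
    return [s * b if b >= tau else 0.0
            for s, b in zip(scores, likelihoods)]
```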
(107) In the experimental data that was collected, a localisation error was computed by measuring the estimated frame-to-frame pose change output from the VO system against that measured by the INS. It is believed that this was a more appropriate measure than looking at cumulative errors, since an orientation error in one frame may skew the results for the rest of the trajectory. Denoting the true frame-to-frame translation as Δt and the estimated translation as Δe, we define a frame-to-frame error measure as:

E_xyz := | ‖Δe‖₂ − ‖Δt‖₂ |.  (14)
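Equation (14) compares only the magnitudes of the two translations. A direct sketch:

```python
import numpy as np

def frame_error(t_est, t_true):
    """Frame-to-frame translation error E_xyz (equation (14)): the
    absolute difference of the estimated and true translation norms.
    Note this compares magnitudes only, not directions."""
    return abs(np.linalg.norm(t_est) - np.linalg.norm(t_true))
```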
(108) This error measure was computed for three implementations: (i) our standard VO system using RANSAC, (ii) our disparity-based method with RANSAC (i.e. the first embodiment described above), and (iii) our flow-based method with RANSAC (i.e. the second embodiment described above). The skilled person will appreciate that RANSAC (RANdom SAmple Consensus) is a standard outlier rejection technique used to estimate parameters of a model (in the case of the embodiment being described, the pose of the vehicle) based on observations (in the case of the embodiment being described, feature matches between stereo images), where it is assumed that the set of observations contains outliers.
(109) To reiterate, the method used by at least some embodiments provides an extra step of outlier rejection before proceeding with RANSAC. The goal is to illustrate the improvements in VO by incorporating these likelihood images for feature reweighing.
(110) The system parameters used in relation to
(111) TABLE I: SYSTEM PARAMETERS (columns: Parameter, Description, Value).
(112) A number of representative cases where the methods of the first and second embodiments out-perform the baseline are shown in
(113) Referring to
(114) In a second set of results, in a dense urban environment, there were signal-strength issues which resulted in poor GPS measurements, which are not accurate enough to groundtruth our motion estimates; i.e. the INS system did not function correctly. The skilled person will appreciate that this can be a common problem in urban environments, which strengthens the case that improving the robustness of relative motion estimation (as performed by the first and second embodiments described above) can be used to improve vehicle navigation.
(115) Referring to
(116) The results show that both the disparity-based (i.e. the first embodiment) and flow-based (i.e. the second embodiment) methods outperformed our standard VO system with comparable results (see
(117) An observable difference between the two embodiments was with regard to how they handled stationary objects. For the flow-based method, stationary objects were only identified if the camera was in motion; otherwise, the objects would reproject to the exact same location, which would not produce a flow difference. In contrast, the disparity-based method was able to detect stationary objects regardless of whether or not the camera was in motion. However, since tracking features on stationary objects does not directly impact the performance of egomotion estimation, these two techniques ended up performing comparably.
(118) The skilled person will appreciate that elements of the embodiments described above may be provided in hardware, firmware or software.