Vehicle localisation using the ground surface with an event camera
11365966 · 2022-06-21
CPC classification
G01S19/485
PHYSICS
International classification
G01B11/25
PHYSICS
G01S19/48
PHYSICS
Abstract
A method for estimating vehicle location by obtaining change events from an event camera's observations of a ground surface moving relative to the vehicle, determining a signature of the ground surface from the change events, and estimating the location using the signature. The change events may be processed to produce a first invariant representation of a ground surface patch for use as the signature. Alternatively, range measurements representing a patch may be used as the signature. A map is constructed containing the representations of the ground surface patches, including the locations of the patches. The same patch of ground surface is subsequently measured, thereby obtaining a sequence of change events which are processed to produce a second representation. The second representation is matched to the map of first invariant representations. The location of the vehicle on the ground is determined based on the match.
Claims
1. A method for estimating a location of a vehicle, the method comprising: (a) obtaining change events from an event camera's observations of a ground surface moving relative to the vehicle, the change events arising from the ground surface moving relative to the vehicle; (b) determining a signature of the ground surface from the obtained change events; and (c) estimating the location using the signature; wherein the ground surface is a railway track bed surface.
2. The method of claim 1, wherein the signature of the ground surface characterizes ground surface texture having a wavelength of 50 mm or less.
3. The method of claim 1, further comprising the step of using motion data corresponding to the event camera's observations to produce a spatially-organized collection of the change events for the use as the signature.
4. The method of claim 1, further comprising the step of using timing data corresponding to the event camera's observations to produce a time-organized collection of the change events for the use as the signature.
5. The method of claim 1, further comprising illuminating the ground surface with structured electromagnetic radiation in the event camera's field of view.
6. The method of claim 1, further comprising the step of determining range measurements representing the ground surface based on the event camera's observations of the ground surface.
7. The method of claim 6, further comprising the step of aggregating the determined range measurements to produce a collection of range measurements representing a patch of the ground surface for the use as the signature.
8. The method of claim 6, further comprising the step of using motion data corresponding to the event camera's observations to combine the determined range measurements into a spatially-organized collection of range measurements representing a patch of the ground surface for the use as the signature.
9. The method of claim 6, further comprising the step of using timing data corresponding to the event camera's observations to combine the determined range measurements into a time-organized collection of range measurements representing a patch of the ground surface for the use as the signature.
10. The method of claim 1, further comprising the step of processing the change events to produce an invariant representation for the use as the signature.
11. The method of claim 10, wherein the step of processing the change events comprises using a spatio-temporal filter to produce a response when a sequence of change events with defined space and time properties are obtained, so as to produce the invariant representation.
12. The method of claim 10, wherein the invariant representation comprises an orthographic map of change events and the step of processing the change events comprises mapping the obtained change events from local event camera coordinates into orthographic map coordinates, so as to produce the orthographic map of change events.
13. The method of claim 12, wherein the mapping of the obtained change events comprises determining estimates of motion of the event camera.
14. The method of claim 13, wherein the estimates are determined using optical flow.
15. The method of claim 1 further comprising the step of positioning the event camera under the vehicle.
16. The method of claim 1 comprising the steps of repeating the observations of the ground surface and estimating the location by searching signatures from previous observations that have associated location information.
17. A system for estimating a location of a vehicle comprising: an event camera; and a processor configured to: (a) obtain change events from the event camera's observations of a ground surface moving relative to the vehicle, the change events arising from the ground surface moving relative to the vehicle wherein as the vehicle moves, a view of the ground surface in front of the event camera changes and this generates change events in the event camera as the intensity of any pixel increases or decreases; (b) determine a signature of the ground surface from the obtained change events; and (c) estimate the location using the signature; wherein the ground surface is a railway track bed surface.
18. A computer program product embodied on a nontransitory computer readable medium comprising one or more sequences of machine-readable instructions for estimating a location of a vehicle, the instructions being adapted to cause one or more processors to perform a method for estimating a location of a vehicle, the method comprising: (a) obtaining change events from an event camera's observations of a ground surface moving relative to the vehicle, the change events arising from the ground surface moving relative to the vehicle wherein as the vehicle moves, a view of the ground surface in front of the event camera changes and this generates change events in the event camera as the intensity of any pixel increases or decreases; (b) determining a signature of the ground surface from the obtained change events; and (c) estimating the location using the signature; wherein the ground surface is a railway track bed surface.
Description
BRIEF DESCRIPTION OF DRAWINGS
(1) Embodiments of the present invention will now be described, by way of example only, with reference to the drawings, in which:
(2)-(6) [Brief descriptions of the drawings omitted.]
DESCRIPTION OF EMBODIMENTS
(7) The inventors have devised a way to determine vehicle location that is lower-cost (when manufactured at scale), more compact, more discreet and more efficient than conventional approaches, that determines position to millimetre accuracy, and that works at practical vehicle speeds. Other advantages are robustness to environmental factors, simpler acquisition of mapping data and less mapping being required, and lower power and processing requirements.
(8) In embodiments described below, an example ground surface is a road surface. A road surface is a particularly important ground surface in the context of autonomous vehicles because road surfaces generally form a network allowing the autonomous vehicles to travel over large geographical areas.
(9) In an embodiment, a patch of road surface moving below the vehicle is observed to obtain a signature that uniquely determines the vehicle's location. A laser scanner, having a laser structured light source, may be used to capture range measurements from the road surface in a way that operates at vehicle speeds. This uses a special type of camera known as an ‘event camera’ or ‘dynamic vision sensor’ that can react much faster than conventional cameras. Conventional cameras are synchronous and run at a specific frame rate, whereas event cameras are generally asynchronous, with each pixel reacting independently as soon as an intensity change is observed.
(10) A ground surface patch can be characterised by its structure and its appearance. Ground surface texture, such as road surface texture, comprises its structure and/or appearance.
(11) The structure of ground surface patches can be characterized by a number of properties such as their wavelength and mean depth. In the case of roads, these properties are intentional by-products of road construction and relate directly to desirable effects such as improved road traction, noise reduction and water dispersal; as well as less desirable effects such as increased rolling resistance and tyre wear. Macrotexture is defined as having a wavelength of 0.5 mm to 50 mm and is formed by the structure of graded aggregate particles protruding from an asphalt binding matrix. The individual size, shape, pose and dispersal of these particles, along with the manner in which they are combined and compacted into a road surface, create a unique arrangement of surface features.
(12) Ground surface patches can also be characterised by their appearance. The appearance is both a property of the materials used (colour) and the interaction of the structure of the ground surface with light (shadows and shading). The appearance of the graded aggregates and asphalt binding matrix may also have variation with a wavelength of 0.5 mm to 50 mm, commensurate with the structure.
(13) In an embodiment, a sensor configuration using an event camera and a plane of laser light is used that can capture range measurements very much more rapidly than conventional LiDAR and therefore allows vehicle location estimation using fine-scale structure and in particular the fine structure of the ground surface along which the vehicle is travelling.
(14) Electromagnetic radiation will be referred to as light herein. The light may be visible or not visible.
(15) In the embodiment described with reference to
(16) Calibration methods for laser scanner systems are well known and typically involve capturing one or more images of a 2D or 3D calibration target intersecting the plane of laser light. Various calibration models exist, such as: (1) direct calibration where the range at each camera pixel is stored as a look-up-table; (2) a planar projection model where the range at each camera pixel is represented as a planar projection from the camera image plane to the laser plane; and (3) a camera-plane model comprising a camera model that defines a ray for each pixel and a plane model representing the position of the plane relative to the camera such that the range at each pixel is determined by the intersection of the ray and the plane.
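By way of a non-limiting illustration only, the camera-plane model (option (3) above) might be sketched as follows in Python; the pinhole intrinsics and plane parameters below are example assumptions rather than part of the described embodiment.

```python
import numpy as np

def pixel_range(u, v, fx, fy, cx, cy, plane_normal, plane_d):
    """Camera-plane model: range at pixel (u, v) from the intersection of the
    pixel's back-projected ray with the calibrated laser plane n . X = d."""
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    ray /= np.linalg.norm(ray)
    denom = float(np.dot(plane_normal, ray))
    if abs(denom) < 1e-9:
        return None              # ray (nearly) parallel to the laser plane
    t = plane_d / denom
    return t if t > 0 else None  # range along the ray; None if behind the camera

# Example: a laser plane roughly one metre ahead, tilted 30 degrees about x.
n = np.array([0.0, np.sin(np.radians(30.0)), np.cos(np.radians(30.0))])
print(pixel_range(320, 240, 600.0, 600.0, 320.0, 240.0, n, 1.0))
```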
(17) With reference to
(18)
(19) The vehicle is travelling across a ground surface 104 such as a road or track bed (the rails are not shown). The vehicle has an acquisition unit 106, the operation of which is illustrated by
(20) As illustrated by
(21) In this way, motion data corresponding to the event camera's observations is used to produce a spatially-organised collection of the range measurements. In a further embodiment, motion information may not be available and a range model is constructed where the axis orthogonal to the lines of range measurements being collected represents time (this axis typically represents the direction, but not the magnitude, of motion). Thus, timing data corresponding to the event camera's observations is used to produce a time-organised collection of range measurements. In order to facilitate matching to a collection of later-acquired range measurements, the range model may need to be adjusted to a different binning frequency or warped to accommodate differences in speed and direction. This operation can be eliminated by applying a spatial mapping to the temporal data.
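A minimal sketch of the two organisations, assuming each range measurement arrives with a timestamp, an along-track displacement integrated from the motion data (when available) and a lateral position along the laser stripe; the bin sizes and field names are illustrative assumptions.

```python
from collections import defaultdict

def build_range_model(measurements, spatial_bin=0.001, time_bin=0.0005,
                      use_motion=True):
    """Group (timestamp, along_track, lateral, range) tuples into a range model.

    With motion data the rows are indexed by along-track distance (a
    spatially-organised model); without it they are indexed by time (a
    time-organised model), which may later need stretching or warping to
    match a spatially-organised model acquired at a different speed."""
    model = defaultdict(dict)
    for t, x_along, y_lateral, r in measurements:
        row_key = x_along if use_motion else t
        row = int(row_key // (spatial_bin if use_motion else time_bin))
        col = int(y_lateral // spatial_bin)
        model[row][col] = r
    return model

# Example: three measurements taken 0.5 ms apart while moving at about 2 m/s.
meas = [(0.0000, 0.000, 0.010, 0.312),
        (0.0005, 0.001, 0.011, 0.310),
        (0.0010, 0.002, 0.012, 0.309)]
print(dict(build_range_model(meas)))
```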
(22) To reduce the detection of events that are not created at the observed laser stripe the event camera may be fitted with an optical filter tuned to the frequency of the laser light. This filter may also be tuned to the polarisation of the laser light. Events not created at the observed laser stripe may also be filtered out using software running on the processor. For example, the image position of the observed laser stripe may be estimated by histogramming the position of events and then rejecting events that do not contribute to the peak of the histogram. It may be preferable to use a frequency of laser light that does not naturally occur in the environment to further reduce unwanted events.
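The software filter mentioned above might, for example, histogram the image columns of recent events and reject events far from the peak; the column-wise peak finding and the tolerance value below are assumptions for illustration only.

```python
import numpy as np

def filter_stripe_events(events, image_width, tolerance=3):
    """Keep only events near the dominant image column of the laser stripe.

    events: iterable of (x, y, t, polarity). The stripe is assumed to appear
    as a peak in the histogram of event x positions; events more than
    `tolerance` pixels from that peak are rejected."""
    xs = np.array([e[0] for e in events])
    hist, edges = np.histogram(xs, bins=image_width, range=(0, image_width))
    peak_x = edges[np.argmax(hist)]
    return [e for e in events if abs(e[0] - peak_x) <= tolerance]

events = [(100, 5, 0.001, +1), (101, 6, 0.002, -1), (250, 7, 0.002, +1)]
print(filter_stripe_events(events, image_width=640))  # the event at x=250 is rejected
```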
(23) With reference to
(24) Various methods for matching range measurements may be used. Some methods work directly on range measurements, for example the well-known Iterative Closest Point (ICP) algorithm. Other methods process the range measurements to extract features and matching is performed using these features. A common technique for feature based matching is the RANSAC (Random sample consensus) algorithm and its variants. An initial match based on extracted features may be further refined by applying ICP and the refined match qualified as a good or bad match based on the fit error.
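A minimal point-to-point ICP sketch in two dimensions is given below for illustration; it uses a fixed iteration count and no outlier rejection, so a practical implementation would add the feature-based RANSAC pre-alignment and fit-error qualification described above.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(source, target, iterations=20):
    """Rigidly align `source` (N x 2) to `target` (M x 2); returns (R, t, RMS error)."""
    R, t = np.eye(2), np.zeros(2)
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iterations):
        _, idx = tree.query(src)                  # nearest-neighbour correspondences
        matched = target[idx]
        mu_s, mu_m = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_m)     # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:             # guard against reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_m - R_step @ mu_s
        src = src @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    dists, _ = tree.query(src)
    return R, t, float(np.sqrt((dists ** 2).mean()))

# Example: the target is the source rotated by 2 degrees and slightly shifted.
rng = np.random.default_rng(0)
src = rng.uniform(0.0, 0.05, size=(200, 2))
ang = np.radians(2.0)
R_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
tgt = src @ R_true.T + np.array([0.001, -0.0005])
print(icp_2d(src, tgt)[2])   # RMS error should become small once aligned
```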
(25) Matching of features and subsequent localisation may be done using Particle Filter Localisation, also known as Monte Carlo Localisation. A Particle Filter maintains a set of potential matches between the stored model and newly acquired data. Each particle is evaluated for goodness of fit and a strategy is employed to discard poorly fitting particles and to generate new replacement particles. An advantage of Particle Filter Localisation is that it provides an efficient and effective way to optimise matching by using the results of earlier matches and an estimate of the motion.
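A one-dimensional along-track sketch of one predict/update/resample cycle is shown below; the map access is abstracted into a hypothetical score_fn that compares the newly acquired signature against the stored signature at a candidate position. The location estimate may then be taken as the weighted mean of the particles.

```python
import numpy as np

def particle_filter_step(particles, weights, motion, motion_std, score_fn,
                         resample_frac=0.5):
    """One predict/update/resample cycle for along-track localisation.

    particles: array of candidate positions along the route (metres).
    motion:    odometry/flow estimate of distance travelled since the last step.
    score_fn:  hypothetical function scoring how well the newly acquired
               signature matches the stored map signature at a given position."""
    # Predict: propagate each particle by the motion estimate plus noise.
    particles = particles + motion + np.random.normal(0.0, motion_std, particles.shape)
    # Update: re-weight by goodness of fit (assumes at least one non-zero score).
    weights = weights * np.array([score_fn(p) for p in particles])
    weights = weights / weights.sum()
    # Resample: discard poorly fitting particles and duplicate good ones.
    n_eff = 1.0 / np.sum(weights ** 2)
    if n_eff < resample_frac * len(particles):
        idx = np.random.choice(len(particles), len(particles), p=weights)
        particles, weights = particles[idx], np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```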
(26) In an embodiment where motion information is not available in the construction of a range model, a range model may be stretched or warped relative to another range model representing the same location along the axis in the direction of motion. The matching algorithm is required to determine one (for simple linear stretching) or more (for more complex warping) additional parameters to account for this stretching or warping.
(27) As well as being used for matching, a signature can also be used as an index into an associative array of stored [signature, location data] pairs. Matching uses one-by-one comparisons, whereas indexing uses the signature as a key to access a value in an associative array. This may be implemented using hashing. It depends on the quantisation of the signature producing a consistent and unique hash value, and on data subsequently sampled from the same physical point being captured with sufficient invariance to generate the same hash value.
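A sketch of indexing rather than matching: the signature is coarsely quantised and hashed into a key for an associative array of location data. The quantisation step, hash function and location fields below are illustrative assumptions.

```python
import hashlib
import numpy as np

def signature_key(range_profile, quantum=0.0005):
    """Quantise a range profile (metres) and hash it into a dictionary key.

    Quantisation (here 0.5 mm) makes the key tolerant of small measurement
    noise, so the same physical patch should reproduce the same key."""
    quantised = np.round(np.asarray(range_profile) / quantum).astype(np.int32)
    return hashlib.sha1(quantised.tobytes()).hexdigest()

# Build the associative array during mapping ...
signature_map = {}
profile = [0.3121, 0.3103, 0.3095, 0.3110]
signature_map[signature_key(profile)] = {"easting": 401203.41, "northing": 289774.88}

# ... and look the location up directly at run time, with no one-by-one matching.
observed = [0.3119, 0.3104, 0.3096, 0.3109]
print(signature_map.get(signature_key(observed), "no indexed match"))
```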
(28) Matching and indexing are both examples of searching.
(29) With reference to
(30) 202: The structure of patches of the ground surface is measured using an event camera (dynamic vision sensor) and structured light, for example a laser plane projected on the ground surface. A sequence of change events arises from the event camera's observations of the laser stripe scattering off the ground surface moving relative to the vehicle. The change events are thus generated in response to motion of the event camera in the vehicle across the ground surface. In the description and claims, ‘arising from’ should be interpreted to mean ‘as a consequence of’. The change events are stored in a memory. As discussed above, the change events are processed to determine range measurements using the known calibrated position of the laser plane relative to the event camera. An accelerometer (and/or other source of motion information) is used to combine range measurements obtained using the event camera/laser system into a collection of range measurements stored as a range model. The purpose of combining motion information is that it organises the range model observations spatially, as opposed to temporally, which provides invariance to the speed of capture and simplifies matching.
(31) 204: Construct a map of the ground surface comprising range models of the ground surface patches including the locations of the patches. A map is a spatial organisation of information within a common coordinate frame and in this example is a collection of signatures annotated with locations.
(32) 206: Measure structure of the same patch of ground surface using an event camera (dynamic vision sensor) and structured light, for example a laser plane projected on the ground surface. Thereby a sequence of change events is obtained from the event camera's observations of the same patch of ground surface moving relative to the vehicle. Again, the change events are processed to determine range measurements using the known position of the laser plane relative to the event camera. An accelerometer (and/or other source of motion information) is used to combine the range measurements obtained using the event camera/laser system into a collection of range measurements stored as a range model.
(33) 208: Optionally, an initial position is determined using GPS and/or odometry data.
(34) 210: Match the range model generated in step 206 to the map of the ground surface constructed in steps 202 and 204. This may use the initial position if available from step 208 to limit the search to range models in the map close to the initial position.
(35) 212: Determine location of vehicle on the ground based on the match.
(36) The fine structure of the ground surface is thus used to estimate the location by using the range model, derived from event camera change events, as a signature of the ground surface at the location of the vehicle. This surface moves very quickly at typical vehicle speeds and this problem is addressed by capturing the relative movement of the ground surface structure with an event camera.
(37) An advantage of this embodiment is that, because the lighting is controlled, the sequence of change events is consistent between two subsequent traversals over the same surface; the consistent lighting avoids different shading and shadows. Furthermore, the system works well in the dark.
(38) In a further embodiment the range models may be processed to extract range model features. Features may, for example, be defined at extrema points or points of inflection of the curvature of the range model and the feature may describe the local geometry. Range model features may then be used as a signature of the ground surface at the location of the vehicle.
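For illustration, features might be extracted at the local extrema of a one-dimensional range profile, with a small zero-meaned neighbourhood as the descriptor of the local geometry; the window size and descriptor below are assumptions, not a definitive design.

```python
import numpy as np
from scipy.signal import find_peaks

def extrema_features(profile, half_window=3):
    """Features at local maxima and minima of a 1D range profile.

    Each feature is (index, descriptor); the descriptor is the zero-meaned
    neighbourhood of the extremum, describing the local geometry."""
    profile = np.asarray(profile, dtype=float)
    peaks, _ = find_peaks(profile)
    troughs, _ = find_peaks(-profile)
    features = []
    for i in np.sort(np.concatenate([peaks, troughs])):
        lo, hi = i - half_window, i + half_window + 1
        if lo < 0 or hi > len(profile):
            continue                     # skip extrema too close to the ends
        patch = profile[lo:hi]
        features.append((int(i), patch - patch.mean()))
    return features

# Example: a synthetic 60-sample profile with a few millimetres of relief.
profile = 0.31 + 0.002 * np.sin(np.linspace(0.0, 6.0 * np.pi, 60))
print(len(extrema_features(profile)))
```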
(39) In another embodiment different light sources may be used. A simple light source is a non-structured, for example uniform, source of light bright enough so that the appearance of the ground surface as observed by the event camera is largely dependent on this light rather than the external light. This can be improved further by (1) selecting the position of the acquisition unit, such as under the vehicle, to reduce external light or (2) using a light source of a certain wavelength and having a filter in front of the event camera to exclude other wavelengths (for example near-infrared).
(40) The height of operation of the event camera may be determined relative to the scale of features present in the ground surface and the optical system used to resolve it.
(41) The optical system of the event camera may be optimised to minimise the effects of radial distortion and other aberrations by calibration. The optical system of the event camera may be kept clean by choice of housing material and mechanical means.
(42) As the vehicle moves, the surface in front of the camera changes and this generates change events in the event camera (as the intensity of any pixel increases or decreases by an amount above a threshold). This sequence of change events is used to construct an invariant representation that is used as a signature for the location of the vehicle. The change events depend upon both the 3D ground surface structure, since the intensities viewed by the pixels will depend upon the structure due to shading and shadows, and upon the ground surface appearance due to its material properties. Because the vehicle approach angle (within lane) is very constrained, the view of the surface is often very similar between traversals.
(43) With reference to
(44)
(45) As illustrated by
(46) With reference to
(47) 402: A patch of ground surface is observed using an event camera. This generates a sequence of change events arising from the ground surface moving relative to the vehicle.
(48) 403: The sequence of change events is processed to produce a first invariant representation that is stored in a memory.
(49) 404: Construct a map comprising the first invariant representations of the ground surface patches including the locations of the patches.
(50) 406: Measure the same patch of ground surface thereby obtaining a sequence of change events from an event camera's observations of the same patch of ground surface moving relative to the vehicle.
(51) 407: The sequence of change events is processed to produce a second invariant representation.
(52) 408: Optionally an initial position is determined using GPS and/or odometry data.
(53) 410: Match the second invariant representation determined from change events generated in step 406 to the map of first invariant representations determined from the change events in steps 402, 403 and 404. This may use the initial position from step 408 to limit the search to first invariant representations in the map close to the initial position.
(54) 412: Determine location of vehicle on the ground based on the match.
(55) Again, the fine details of the ground surface are thus used to estimate the location by using the change events as a signature of the ground surface at the location of the vehicle. This surface moves very quickly at typical vehicle speeds and this problem is addressed by capturing the relative movement of the ground surface structure and/or appearance with an event camera.
(56) We now describe the sequence of change event data and how this may be used to create an invariant representation. Each event has a pixel location, a time stamp and polarity (+ if it gets brighter and − if it gets darker for example). Individual “raw” events are unlikely to be useful for matching without some processing since, unless the camera follows exactly the same trajectory under exactly the same lighting and at the same speed, the positions and relative timing of the events will be different. The events are therefore processed to produce an “invariant” representation that is suitable for searching by matching or indexing. Invariant means a representation that has properties that do not change given certain changes in the input, e.g. lighting and speed.
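For concreteness, the raw event record described above and a trivial aggregation into a per-pixel polarity image might be sketched as follows; the 640 by 480 sensor resolution is an assumption.

```python
import numpy as np
from typing import NamedTuple

class ChangeEvent(NamedTuple):
    x: int          # pixel column
    y: int          # pixel row
    t: float        # timestamp in seconds
    polarity: int   # +1 brighter, -1 darker

def accumulate(events, t_start, t_end, shape=(480, 640)):
    """Sum event polarities per pixel over a short time window.

    This is one simple way of aggregating raw events into an image-like
    array that later processing (filters, flow, mapping) can consume."""
    frame = np.zeros(shape, dtype=np.int32)
    for e in events:
        if t_start <= e.t < t_end:
            frame[e.y, e.x] += e.polarity
    return frame

events = [ChangeEvent(100, 50, 0.0012, +1), ChangeEvent(101, 50, 0.0013, -1)]
print(accumulate(events, 0.0, 0.002).sum())   # net polarity over the window
```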
(57) In one embodiment this invariant representation is not explicitly defined and machine learning techniques are used to determine it automatically from examples. Learning may be supervised, where features in the event data being used for training are labelled, or unsupervised, where the event data being used for training is not labelled. There may also be a control signal, in which case a third form of machine learning, reinforcement learning, may be used.
(58) In a further embodiment multiple events are explicitly integrated into a larger representation that has invariant properties that can be matched. For example, events may be aggregated over a short period of time to produce features.
(59) In a further embodiment spatio-temporal filters are used that produce a response when a sequence of events with certain space/time properties are observed. The output of these filters is the invariant representation used for searching by matching or indexing.
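One possible reading of such a filter is sketched below: it responds when events sweep across a small pixel neighbourhood in a consistent direction within a short time window. The neighbourhood size, time window and direction test are assumptions, not a definitive design.

```python
def sweep_filter_response(events, x0, y0, window=5, dt=0.002):
    """Respond when events sweep left-to-right across a short pixel row.

    events: iterable of (x, y, t, polarity). The response is 1.0 if at least
    three events within `window` pixels of (x0, y0) on row y0 occur within
    `dt` seconds and their timestamps increase with pixel column."""
    local = sorted((e[0], e[2]) for e in events
                   if abs(e[0] - x0) <= window and e[1] == y0)
    if len(local) < 3:
        return 0.0
    xs, ts = zip(*local)
    if max(ts) - min(ts) > dt:
        return 0.0
    return 1.0 if all(t2 >= t1 for t1, t2 in zip(ts, ts[1:])) else 0.0

events = [(100, 50, 0.0010, 1), (102, 50, 0.0012, 1), (104, 50, 0.0014, 1)]
print(sweep_filter_response(events, 102, 50))   # 1.0: a rightward sweep
```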
(60) In a further embodiment the invariant representation is an orthographic map of change events. The ground is assumed to be planar and viewed from a known height with an event camera with known calibration allowing the events to be mapped from perspective coordinates to orthographic coordinates. The motion of the event camera is estimated using optical flow. The motion can be estimated from the change events, the change events arising from the ground surface moving relative to the vehicle.
(61) A suitable example of estimating motion directly from the change events using optical flow is described in “Simultaneous Optical Flow and Intensity Estimation From an Event Camera”, Patrick Bardow, Andrew J. Davison, Stefan Leutenegger; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 884-892. Optical flow may be defined as the correspondence field between two temporally close intensity images, for example an estimation of two parameters per pixel, the horizontal and vertical displacement between one frame and the next. Another way of estimating motion from the change events using optical flow is by forming event images by accumulating events over a period of time and using conventional methods for optical flow applied to consecutive event images.
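The second approach, conventional optical flow applied to consecutive accumulated event images, might be sketched as follows using OpenCV's dense Farnebäck flow; the accumulation windows and flow parameters are illustrative assumptions.

```python
import numpy as np
import cv2

def event_image(events, t0, t1, shape=(480, 640)):
    """Accumulate event counts over [t0, t1) into an 8-bit image."""
    img = np.zeros(shape, dtype=np.float32)
    for x, y, t, _polarity in events:
        if t0 <= t < t1:
            img[y, x] += 1.0
    img = cv2.normalize(img, None, 0, 255, cv2.NORM_MINMAX)
    return img.astype(np.uint8)

def flow_between_windows(events, t0, t1, t2):
    """Dense optical flow between two consecutive accumulated event images."""
    prev_img = event_image(events, t0, t1)
    next_img = event_image(events, t1, t2)
    # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n,
    # poly_sigma, flags (values here are commonly used defaults).
    return cv2.calcOpticalFlowFarneback(prev_img, next_img, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```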
(62) An orthographic map is determined by mapping events from local event camera coordinates into orthographic map coordinates using the optical flow estimates. The values stored in the map are derived from the change events, for example by calculating the mean value of polarity of change events at each point in the orthographic map. The map may be stored as a discrete sparse array. This is illustrated in
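A sketch of the accumulation into the orthographic map: each batch of events is shifted into map coordinates using the integrated motion estimate, and the mean polarity per map cell is maintained in a sparse dictionary. The cell size is an assumption, and the event pixel positions are assumed to have already been converted to metres on the planar ground using the known height and calibration.

```python
def update_orthographic_map(ortho, events, camera_offset, cell_size=0.001):
    """Fold a batch of events into a sparse orthographic polarity map.

    ortho: dict mapping (row, col) -> [polarity_sum, event_count].
    camera_offset: camera position in map coordinates (metres), integrated
    from the optical-flow motion estimates."""
    for x_m, y_m, _t, polarity in events:
        col = int((x_m + camera_offset[0]) / cell_size)
        row = int((y_m + camera_offset[1]) / cell_size)
        cell = ortho.setdefault((row, col), [0.0, 0])
        cell[0] += polarity
        cell[1] += 1

def mean_polarity(ortho):
    """The value stored at each map point: mean polarity of its change events."""
    return {rc: s / n for rc, (s, n) in ortho.items()}

ortho = {}
update_orthographic_map(ortho, [(0.0123, 0.0043, 0.001, +1),
                                (0.0123, 0.0044, 0.002, -1)], (1.250, 0.0))
print(mean_polarity(ortho))   # both events fall in one 1 mm cell; mean is 0.0
```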
(63) With reference to
(64) In another embodiment, the ground surface is observed with an event camera, but controlled or powered light is not introduced and the natural environmental lighting is exploited.
(65) Embodiments of the present invention provide a sensor for capturing the detailed structure and/or appearance of the underlying ground surface patches that is significantly cheaper (when manufactured at scale), capable of operating at speed, robust to environmental lighting, and has lower power and processing requirements. A map comprising only the ground surface below the vehicle (rather than other approaches that capture much more of the environment) may be used to perform accurate localisation. Such a map comprises less data, is applicable in both rural and urban environments and is able to offer an extremely precise location.
(66) There are various advantages. The cost of manufacturing the acquisition unit in volume can be less expensive than LiDAR.
(67) Localisation accuracy is much higher than that of conventional systems given the scale of surface resolved (improving accuracy from metres or centimetres to millimetres).
(68) The sensor is much more compact and can be mounted discreetly on a vehicle as it only requires a view of the ground surface, not the full environment.
(69) The system will work wherever there is a visible, stable ground surface. Even if the surface is partially occluded this approach can work due to the robustness of the matching/representation in the absence of data. Existing systems rely upon fixed large-scale structures that might not be present outside urban centres. Some embodiments work in the dark because of the active illumination of the ground, for example by a laser plane or spread-out light source.
(70) Known systems require complex 3D maps of the environment that may contain lots of extraneous data (pedestrians, vehicles, vegetation) that needs to be cleaned up or ignored in matching algorithms.
(71) Embodiments use a simple top-down view of the ground surface, which simplifies the capture, storage, and processing required.
(72) Location could be (but is not limited to) a particular place or position. Location data may mean data representing a location within a global coordinate system, for example GPS coordinates. Location data may also mean data representing location within a local or relative coordinate system, for example a road-centric coordinate system where the data directly represents a location on a road or a rail-centric coordinate system where the data directly represents a location on a railway. Location data may also mean data that is stored at a location and accessed at that location and not necessarily a description of that location.
(73) The term signature can mean, but is not limited to, a distinctive pattern, product, or characteristic by which something can be identified.
(74) The term ground surface can mean, but is not limited to, any ground surface along which a vehicle may travel, including for example roads, railway track beds (outside or between the rails), sidewalks, parking lots, the surface of an airport or sea port and the ground in an industrial complex.
(75) Embodiments of the invention may be implemented using a computer program comprising one or more sequences of machine-readable instructions describing methods of estimating a location of a vehicle as described above. This computer program may be executed in a programmable data processing apparatus, for example within a vehicle. There may also be provided a data storage medium (e.g. semiconductor memory, magnetic or optical disk) having such a computer program stored therein.