VISUAL POSITIONING SYSTEM

20200401617 · 2020-12-24

Abstract

Systems and methods for image-based self-localization of an image capture device include receiving a query image captured at an undetermined location and comparing the query image to a set of geotagged reference images to determine the location of the query image. The system and method may alternatively compare a series of query images to reference images, or compare portions of feature vectors generated from said query images and reference images, to determine which query image in the series is closest to respective reference images. The set of geotagged reference images may include a sequence of previously obtained reference images of a route, each reference image corresponding to a known geolocation. The geolocation of the user may be determined based on the location of the query image within the set. Ambient conditions of the images may be used to improve comparison of a query image to reference images. Segments and/or abstractions of cell images may be used to reduce the required computational and/or communication resources.

Claims

1. A method for image based self-localization of a user comprising the steps of: receiving one or more query images at a processor; identifying an approximate location of query image capture; selecting a subset of geotagged reference images proximal to said approximate location of query image capture, wherein said reference images each correspond to a known geolocation; using the processor to compare said query images to said subset of geotagged reference images to determine which reference image most closely matches a query image; and using the processor to determine a geolocation of the capture of said query image by reference to the geotag of the reference image most similar to said query image.

2. The method of claim 1 comprising using the processor to control a device based on determination of the geolocation of the user.

3. The method of claim 1 comprising: assigning a location sensitive tag to the query image; and comparing the tag of the query image to tags linked to each reference image to determine the location of capture of the query image.

4. The method of claim 1 comprising using the processor to determine the location of capture of the query image within the reference images based on a known parameter of the user's movement.

5. The method of claim 1 wherein the step of comparing comprises using a Convolutional Neural Network.

6. The method of claim 1 wherein the reference images are linked to known ambient conditions.

7. The method of claim 1 comprising using the query image to update the reference images.

8. The method of claim 1 comprising using the processor to select a subset of reference images based on a previously determined geolocation of the image capture and based on a probable route.

9. The method of claim 1 comprising: assigning a location sensitive tag to the query image; comparing the tag assigned to the query image to tags linked to each reference image in the sequence of reference images, the tags sensitive to location of each reference image within the sequence, to determine a location of the query image within the sequence; and determining the matching image based on the location of the query image within the sequence.

10. The method of claim 9 wherein determining ambient conditions is based on information automatically obtained from a time recording device associated with the user.

11. The method of claim 10 comprising: determining a location of the first matching reference image within a sequence of the first set of reference images to determine a first geolocation; determining a location of the second matching reference image within a sequence of the second set of reference images to determine a second geolocation; combining the first and second geolocations to a combined geolocation; and determining the geolocation of the user based on the combined geolocation.

12. The method of claim 1 comprising using the query image to update the sequence of images linked to known ambient conditions.

13. The method according to claim 1 wherein one or more reference images are processed to determine features that contrast with similar reference images proximal to said one or more reference image(s); and weighting said contrasting features of the reference images in the processor's determination of similarity of said reference images to said query image.

14. A method for self-geolocation comprising the steps of: determining approximate location of a query image capture; identifying a geotagged reference image captured proximally to said location of said image capture, wherein said geotagged reference image includes a marked point of interest; comparing a series of query images to said identified geotagged reference image to find the query image that best matches the reference image; and tracking a current location relative to a location of capture of the query image that best matches the reference image.

15. The method according to claim 14 further comprising the step of identifying aspects of a query image corresponding to a marked point of interest in said reference image to determine the relative location of the query image that best matches the reference image and the geotagged position of the reference image.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0106] FIG. 1 schematically illustrates a system for image based self-localization and mapping.

[0107] FIGS. 2A-C schematically illustrate methods for image based self-localization and mapping in a known route.

[0108] FIGS. 3A-D schematically illustrate methods for image based self-localization and mapping using ambient conditions.

[0109] FIG. 4 shows an image-based location recognition system.

[0110] FIG. 5 shows a method for self-location which may be carried out by processor 102.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0111] Before the present invention is described in further detail, it is to be understood that the invention is not limited to the particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

[0112] In the following description, various aspects of the present invention will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the present invention. However, it will also be apparent to one skilled in the art that the present invention may be practiced without the specific details presented herein. Furthermore, well known features may be omitted or simplified in order not to obscure the present invention.

[0113] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.

[0114] Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as analyzing, processing, computing, calculating, determining, detecting, identifying, creating, producing, finding, combining or the like, refer to the action and/or processes of a hardware based or software driven computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. Unless otherwise stated, these terms refer to action of a processor or hardware.

[0115] All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention or work of an inventor. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.

[0116] An exemplary system, which may be used for image based self-localization and mapping, is schematically illustrated in FIG. 1.

[0117] System 100 may include a processor 102 in communication with one or more camera(s) 103 and with a device, such as a user interface device 106 and/or other devices, such as storage device 108.

[0118] Components of the system may be in wired or wireless communication and may include suitable ports and/or network hubs. Components of the system may communicate via USB or Ethernet or appropriate cabling, etc.

[0119] Processor 102 may include, for example, one or more processors and may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a microprocessor, a controller, a chip, a microchip, an integrated circuit (IC), or any other suitable multi-purpose or specific processor or controller. Processor 102 may be locally embedded or remote.

[0120] Processor 102 may receive image data (which may include data such as pixel values that represent the intensity of reflected light as well as partial or full images or videos) from the one or more camera(s) 103 and run processes according to embodiments of the invention.

[0121] Processor 102 is typically in communication with a memory unit 112, which may store at least part of the image data received from camera(s) 103.

[0122] Memory unit 112 may include, for example, a random access memory (RAM), a dynamic RAM (DRAM), a flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units.

[0123] In some embodiments the memory unit 112 stores executable instructions that, when executed by processor 102, facilitate performance of operations of processor 102, as described herein.

[0124] System 100 may be a mobile device. Components of system 100 may be vehicle-mounted. System 100 may be wearable, like smart glasses. The components may be located in different physical devices.

[0125] For example, processor 102 may be integrated into an onboard vehicle computing system or a smartphone, or may be networked to a remote location. Cameras 103 may be vehicle-mounted, integrated into a vehicle, or integrated into a wearable device such as smart glasses. User interface device 106 may be vehicle-mounted or part of a connected smart device, local or remote.

[0126] Storage device 108 may be a server including, for example, volatile and/or non-volatile storage media, such as a hard disk drive (HDD) or solid-state drive (SSD). Storage device 108 may be connected locally or remotely, e.g., in the cloud. In some embodiments storage device 108 may include software to create and maintain databases of reference image sequences and reference images linked to known geolocations and/or known ambient conditions.

[0127] Camera(s) 103 is typically configured to obtain images of a route traveled, by a vehicle or otherwise. A vehicle may be a mobile or moving device such as a robot, or any other traveling being or device, such as a car, boat or electric bicycle. Routes traveled by a vehicle may include routes in an urban environment or in other outdoor or indoor environments. Thus, in one embodiment, a camera 103 may be placed on and/or fixed to, e.g., glasses worn by a person or a vehicle, such that at least part of the route being traveled is within the field of view (FOV) of the camera 103.

[0128] Camera 103 is an image capture device and may include a CCD or CMOS or other appropriate chip and a suitable optical system. The camera 103 may be a 2D or 3D camera. In some embodiments the camera 103 may include a standard camera provided, for example, with mobile devices such as smart-phones or tablets.

[0129] In one embodiment, processor 102 receives a query image, namely, an image captured by camera 103 whose location is sought. Processor 102 compares the query image to reference images, which are previously captured images, and finds a matching reference image, typically based on similarities between the image and the reference image. The match need not be an identical match. The reference image may be matched to a query image based on an acceptable degree of similarity in the absolute sense or relative to other reference images.
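The two acceptance criteria mentioned above (absolute similarity, and similarity relative to other reference images) can be illustrated with a minimal sketch. It assumes each image has already been reduced to a numeric descriptor vector and that cosine similarity is the similarity measure; the function names and the thresholds `min_similarity` and `min_margin` are hypothetical and are not taken from this disclosure.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_reference(
    query: Sequence[float],
    references: List[Sequence[float]],
    min_similarity: float = 0.8,  # hypothetical absolute threshold
    min_margin: float = 0.05,     # hypothetical required lead over the runner-up
) -> Optional[int]:
    """Index of the accepted matching reference image, or None if no match is accepted."""
    scores = [cosine_similarity(query, ref) for ref in references]
    if not scores:
        return None
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = ranked[0]
    # Absolute criterion: the best match must be similar enough on its own.
    if scores[best] < min_similarity:
        return None
    # Relative criterion: the best match must clearly beat the second-best reference.
    if len(ranked) > 1 and scores[best] - scores[ranked[1]] < min_margin:
        return None
    return best
```

Returning `None` for a weak or ambiguous match leaves room for the caller to fall back on other cues, such as a prior fix or movement hints.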

[0130] In one embodiment, the reference images may be indexed by location. The set of reference images may be distributed over an area of interest. The area of interest may be a geographic region or be more limited, such as along a route or anticipated route. When used in a vehicle whose movement is limited, the set of images may be similarly limited. For example, a train-mounted system would only need images along train tracks. Processor 102 may access the location index for a matching reference image. The set of reference images may be a sequence of reference images of a route, in which each reference image may be linked to a known geolocation. The set of reference images indexed by location may be stored, for example, in storage device 108. Processor 102 may determine the geolocation of the user (or of the camera 103) based on the location of the best matching reference image, as further detailed herein.

[0131] In another embodiment, processor 102 receives a query image captured by camera 103 located in or on a vehicle, or carried by a person or other user. Processor 102 may compare the query image to a set of reference images, which are previously captured images linked to known ambient conditions. Based on the comparison, processor 102 may search for a matching reference image and determine the geolocation of the vehicle based on the matching reference image, as further detailed herein.

[0132] The term ambient conditions refers to conditions in the environment which affect the imaged scene. Such conditions may include, for example, illumination levels and color in the environment being imaged. These conditions may be influenced, for example, by the season of the year, the time of day or night, the location within the city or other site, the amount of vegetation in the scene being imaged and more. Thus, ambient conditions may include time or location related descriptions. For example, ambient conditions may include conditions such as summer, winter, evening, city center at noon, city center at night, countryside at night, etc.

[0133] In some embodiments images obtained by camera 103 (e.g., still images or a video) may be displayed to a user, e.g., via user interface device 106. Augmented reality annotations and/or additional information may be added to the displayed images.

[0134] The user interface device 106 may include a display, such as a monitor or screen, for displaying images, instructions and/or notifications to a user (e.g., via text or other content displayed on the monitor). User interface device 106 may also be designed to receive input from a user. For example, user interface device 106 may include a monitor and keyboard and/or mouse and/or touch screen and/or a smartphone to enable user input.

[0135] An example of a method for system self-location, which is carried out by processor 102, may be based on a process of comparing a query image to a set of reference images having known capture locations and extrapolating the location of capture of the query image from the capture location index of one or more of the closest reference images, as schematically illustrated in FIG. 2A.

[0136] In one embodiment, the method may include accepting an indication of route (or location) (step 202), receiving an image (e.g., from camera 103) whose capture location is to be ascertained (also termed query image) (step 204), and comparing the query image to a set of reference images, which includes a sequence of previously obtained images of a route, to find a reference image which matches the query image (step 206). Typically, each of the reference images in the set corresponds to a known geolocation. The indication accepted in step 202 may be based on a prior location determination or a GPS reading, and need not be error-free. The location of capture of the query image is determined to be at or near the location of capture of a matching reference image.
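Steps 202 to 208 can be sketched as follows, assuming reference images are stored as descriptor/geotag pairs and the approximate location from step 202 is a latitude/longitude pair. The `Reference` record, the 200 m search radius, the haversine distance and the squared-Euclidean descriptor distance are all illustrative assumptions; the approximate location only narrows the candidate subset and is not itself trusted as exact.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Reference:
    """Hypothetical record for one geotagged reference image."""
    descriptor: List[float]  # compact image descriptor
    geotag: LatLon           # known geolocation of capture


def haversine_m(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(h))


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def localize(
    query: Sequence[float],
    approx_location: LatLon,
    references: List[Reference],
    radius_m: float = 200.0,  # hypothetical radius around the approximate fix
) -> Optional[LatLon]:
    # Select the subset of reference images captured near the approximate location.
    subset = [r for r in references if haversine_m(r.geotag, approx_location) <= radius_m]
    if not subset:
        return None
    # The most similar reference image supplies the geolocation of the query image.
    best = min(subset, key=lambda r: squared_distance(query, r.descriptor))
    return best.geotag
```

Restricting the search to the subset first means a visually identical but distant reference image (for example, a repeated storefront in another district) cannot win the match.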

[0137] The set may include previously captured reference images of the specific route, each reference image corresponding to a known geolocation. It is possible to derive an ordered subset of reference images, the sequence of which corresponds to the specific route. Thus, processor 102 may determine the self-geolocation, i.e., the geolocation of the camera 103 used to capture the query image, based on the geolocation of the matching reference image (step 208).

[0138] In another embodiment processor 102 may use the query image to update the reference image set. For example, a camera may capture an image of a location (e.g., location X) along the route from a viewpoint, or under an ambient condition, that differs from those of one or more of the reference images that correspond to location X; such an image may be used to enhance the set of reference images. This may be due to a unique capture location or a changing environment. Thus, the query image may be sufficiently similar to a reference image to ascertain location and still be useful to supplement a reference image set. The query image may be added to the set of reference images such that more viewpoints and/or appearances are available for comparison, thereby updating and improving the reference set. Thus, in some embodiments, the reference images and location index in storage device 108 may be updated.
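The update rule described above (similar enough to ascertain location, yet different enough to add a new viewpoint or appearance) can be expressed as a simple band of similarity. This is a sketch under stated assumptions: the reference set is modelled as a hypothetical dictionary from a location key to the descriptors stored for that location, `location` is the key of the already-matched reference image, and the `accept` and `novelty` thresholds are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def maybe_update_reference_set(
    reference_set: Dict[str, List[List[float]]],
    location: str,          # key of the matched location (must already exist)
    query: List[float],
    accept: float = 0.8,    # hypothetical: similar enough to trust the localization
    novelty: float = 0.95,  # hypothetical: at or above this the query is a near-duplicate
) -> bool:
    """Append the query descriptor to `location` if it falls inside the useful band."""
    stored = reference_set[location]
    best = max((cosine_similarity(query, ref) for ref in stored), default=0.0)
    if accept <= best < novelty:
        stored.append(query)
        return True
    return False
```

The upper threshold keeps the set from filling with near-identical images, which would add storage and comparison cost without adding new appearances.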

[0139] In some embodiments the step of comparing the query image to the reference set (step 206) may include using a machine learning process. For example, a convolutional neural network (CNN) may be used. In one embodiment a CNN having a netVLAD component is used. Such a CNN aggregates mid-level convolutional features extracted from an entire image into a compact feature vector representation that can be efficiently indexed. This may be achieved, for example, by plugging into the CNN architecture a generalized Vector of Locally Aggregated Descriptors layer (netVLAD), which outputs an aggregated representation that can then be compressed (e.g., using Principal Component Analysis (PCA)) to obtain a compact descriptor of the image.
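The aggregation idea behind a netVLAD layer can be sketched without a neural network framework. In this simplified, non-learned version the cluster centers are fixed inputs, each local feature is softly assigned to the centers with weights softmax(-alpha * squared distance), which is the distance-based form of netVLAD's soft assignment, and the weighted residuals are summed per center, intra-normalized, concatenated and L2-normalized. In an actual netVLAD layer the centers and assignment parameters are learned end to end, and the PCA compression step is omitted here; `soft_vlad` and `alpha` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def soft_vlad(
    features: Sequence[Sequence[float]],  # local features, e.g. one per CNN feature-map cell
    centers: Sequence[Sequence[float]],   # K cluster centers (fixed here, learned in netVLAD)
    alpha: float = 10.0,                  # hypothetical assignment sharpness
) -> Vector:
    """Aggregate local features into a single K*D descriptor."""
    dim = len(centers[0])
    blocks = [[0.0] * dim for _ in centers]
    for x in features:
        # Soft assignment: softmax over -alpha * squared distance to each center.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        for k, c in enumerate(centers):
            weight = exps[k] / total
            for d in range(dim):
                # Accumulate the weighted residual between the feature and the center.
                blocks[k][d] += weight * (x[d] - c[d])
    # Intra-normalize each per-center block, then L2-normalize the concatenation.
    flat = [value for block in blocks for value in l2_normalize(block)]
    return l2_normalize(flat)
```

The output has a fixed length K*D regardless of how many local features an image produces, which is what makes it convenient to index.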

[0140] The vector representation (or other compact representation) of the image may be indexed, e.g., by using hashing methods, to facilitate retrieval of the images from the database during the comparing step (step 206). For example, locality-sensitive hashing (LSH) can be used. Although LSH reduces the dimensionality of high-dimensional data and may therefore lose some information, in a limited set of images (as obtained by embodiments of the invention) this possible loss of data is not a constraint, and hashing may even provide better results than neural networks alone.
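One common LSH family for descriptor vectors compared by cosine similarity is random-hyperplane hashing: each bit of the key records on which side of a random hyperplane the vector falls, so similar vectors tend to share keys. The sketch below uses a single hash table; the class name, `n_bits` and `seed` are hypothetical, and a practical index would typically use several tables and re-rank the returned candidates with an exact descriptor comparison.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity, with one hash table."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hyperplane normal is drawn from an isotropic Gaussian.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: DefaultDict[Tuple[int, ...], List[str]] = defaultdict(list)

    def key(self, v: Sequence[float]) -> Tuple[int, ...]:
        # One bit per hyperplane: 1 if v lies on the non-negative side.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    def add(self, image_id: str, v: Sequence[float]) -> None:
        self.buckets[self.key(v)].append(image_id)

    def candidates(self, v: Sequence[float]) -> List[str]:
        # .get avoids creating empty buckets for keys never inserted.
        return list(self.buckets.get(self.key(v), []))
```

Because only the sign of each projection is kept, scaling a descriptor does not change its key, which matches the scale invariance of cosine similarity.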

[0141] The images or compact descriptors of the images may be input to a Siamese network, which learns to differentiate between two inputs. A Siamese network consists of two identical neural networks that share the same weights, each taking one of the two input images. The outputs of the last layers of the two networks are compared; during training, a contrastive loss function computed on the distance between these outputs teaches the network to place images of the same location close together and images of different locations far apart. The output from the Siamese network may thus be used to compare images and to find a match between a query image and a reference image.
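The standard form of the contrastive loss for one pair of embeddings from the two weight-sharing branches can be written in a few lines. This sketch assumes Euclidean distance between embeddings and a hypothetical `margin` parameter; it is the generic loss, not necessarily the exact variant used in any embodiment. Pairs of the same place are pulled together (loss grows with distance), while pairs of different places are pushed apart only until they are at least `margin` apart.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    emb_a: Sequence[float],
    emb_b: Sequence[float],
    same_place: bool,
    margin: float = 1.0,  # hypothetical margin for dissimilar pairs
) -> float:
    """Contrastive loss for one pair of Siamese-branch embeddings."""
    d = euclidean(emb_a, emb_b)
    if same_place:
        # Similar pair: penalize any separation.
        return 0.5 * d * d
    # Dissimilar pair: penalize only separations smaller than the margin.
    return 0.5 * max(0.0, margin - d) ** 2
```

At inference time no loss is computed; the distance between the two embeddings serves directly as the (dis)similarity score between a query image and a reference image.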

[0142] In one embodiment, an example of which is schematically illustrated in FIG. 2B, a set of reference images 20 of a specific route may be created, for example, by traveling the route in advance while capturing reference images of the environment at known geolocations (e.g., known GPS coordinates) in the order of these locations within the route. A reference image may be captured once every few kilometers, meters or centimeters (for example), depending on the speed of travel and required resolution of the map.

[0143] In another example, a set of reference images 20 may be created by sampling images from reference image database 21, which are images at known geolocations of the world, such as panoramic views from positions along streets or other areas featured, for example, by Google Maps and Google Earth.

[0144] Typically, not all of the images of the set of images 21 are used to create set of reference images 20 (as illustrated by the dashed arrows).

[0145] In this embodiment map 20 includes an ordered subset of reference images, the sequence of which corresponds to a specific route, each reference image being linked or corresponding to a known geolocation. The order of the reference images in the sequence is typically route-dependent and is not necessarily the same as the order of the images in the set of images 21. For example, a pre-determined route may pass through locations X₁, X₂, X₃, X₄, X₅ and X₆, in that order. In this case, images of map 20 will be arranged such that an image (or images) of location X₁ will be first in the sequence, followed by an image (or images) of location X₂, followed by an image (or images) of location X₃ and so on, until location X₆, even though this was not necessarily the order of these images in the set of images from which set of reference images 20 was created.
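Deriving the route-ordered sequence from an unordered image database amounts to filtering and sorting by each location's position along the route. In this sketch the database is assumed to be a list of hypothetical (image id, location id) pairs; images of locations not on the route are dropped, mirroring the observation in [0144] that not all database images are used.

```python
from typing import Iterable, List, Sequence, Tuple


def build_route_sequence(
    route: Sequence[str],                 # location ids in travel order, e.g. ["X1", "X2", ...]
    database: Iterable[Tuple[str, str]],  # (image_id, location_id) pairs in arbitrary order
) -> List[str]:
    """Return reference image ids ordered by their location's position along the route."""
    position = {location: i for i, location in enumerate(route)}
    on_route = [(image, location) for image, location in database if location in position]
    # list.sort is stable, so several images of one location keep their relative order.
    on_route.sort(key=lambda pair: position[pair[1]])
    return [image for image, _ in on_route]
```

For example, with route X1, X2, X3 and a database listing img7 (X3), img2 (X9), img5 (X1), img9 (X2) and img1 (X1), the resulting sequence is img5, img1, img9, img7, and img2 is discarded.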

[0146] In some embodiments, a set of reference images of a route may be created by processor 102 on the fly and may be dynamically changed based on locations the user passes and based on possible and probable routes as dictated by physical constraints in the real world, such as roads, railways, rivers, etc. For example, two consecutive geolocations of an image capture device may be determined to be X₁ and X₂. These locations may be mapped to a known map of routes (e.g., from Google Maps). In the map of routes, locations X₁ and X₂ are mapped to a set of reference images that proceeds through locations X₃, X₄, X₅ and X₆. For example, locations X₁ and X₂ may be mapped to a one-way road on which a vehicle can only proceed by passing through locations X₃, X₄, X₅ and X₆. Thus, a probable route for a vehicle passing through locations X₁ and X₂ would include locations X₃, X₄, X₅ and X₆. In this case, processor 102 can select a set of reference images of the route based on a previously determined geolocation of the user (e.g., locations X₁ and/or X₂) and based on a probable route.
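Selecting reference images along a probable route can be sketched as a bounded breadth-first search over a road network. This sketch assumes the map of routes is available as a directed adjacency dictionary (one-way roads have edges in one direction only), excludes the previous fix so the search does not head back the way the user came, and uses a hypothetical `max_hops` limit; real route maps would also carry distances and turn restrictions that are ignored here.

```python
from collections import deque
from typing import Dict, List, Optional


def probable_route(
    graph: Dict[str, List[str]],
    current: str,
    previous: Optional[str] = None,
    max_hops: int = 4,  # hypothetical look-ahead depth
) -> List[str]:
    """Locations reachable from `current` within `max_hops` moves, in breadth-first order."""
    seen = {current} | ({previous} if previous is not None else set())
    order: List[str] = []
    queue = deque([(current, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, hops + 1))
    return order


def select_route_references(
    graph: Dict[str, List[str]],
    references_by_location: Dict[str, List[str]],
    current: str,
    previous: Optional[str] = None,
    max_hops: int = 4,
) -> List[str]:
    """Reference image ids for the locations on the probable route, nearest first."""
    locations = probable_route(graph, current, previous, max_hops)
    return [img for loc in locations for img in references_by_location.get(loc, [])]
```

On the one-way road of the example, a user last seen at X₁ and then X₂ gets candidate locations X₃, X₄, X₅ and X₆, in that order.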

[0147] In one embodiment, an example of which is schematically illustrated in FIG. 2C, each of reference images A, B, C and D in the reference image set 20 for a route may have a location sensitive tag linked to it (tags 1, 2, 3, and 4, respectively). A location sensitive tag may include a textual, spatial or other description relating to the location of the images within the sequence of images that make up a reference image set 20 for a route. The location sensitive tags linked to reference images may be established as the reference database is curated and updated. In addition to location of the image within the sequence, the tags may also include geographical location and/or other information. Assigning or linking tags to images may be assisted by hints, such as the geolocation, the specific route, known velocity of the user traveling the route, etc., as further described below.

[0148] Location of an image within a sequence of images may be determined based on similarity between images, e.g., by using image analysis and/or machine learning techniques as described above. For example, reference image A may be compared to reference images B, C and D and determined to be most similar to reference image B, whereas reference image D is determined to be most similar to reference image C and reference image B is similar to both reference images A and C, and so on.
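One simple way to turn pairwise similarities into an order, as in the A, B, C, D example above, is a greedy nearest-neighbour chain: start from a known first image and repeatedly append the most similar image not yet placed. This is an illustrative heuristic chosen for this sketch, not necessarily the technique used in the embodiments; it assumes the starting image is known (e.g., from the start of the route) and can mis-order images when descriptors are ambiguous. Descriptors here are hypothetical numeric vectors.

```python
from typing import Dict, List, Sequence


def order_by_similarity(descriptors: Dict[str, Sequence[float]], start: str) -> List[str]:
    """Greedy nearest-neighbour chain over image descriptors, beginning at `start`."""

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    order = [start]
    remaining = set(descriptors) - {start}
    while remaining:
        last = descriptors[order[-1]]
        # Smallest distance wins; ties are broken by name so the result is deterministic.
        nxt = min(remaining, key=lambda name: (dist(last, descriptors[name]), name))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

With one-dimensional descriptors A = 0.0, B = 1.0, C = 2.1 and D = 3.0, listed in any order, the chain starting at A recovers A, B, C, D.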

[0149] In one embodiment a location sensitive tag 3 is assigned to a query image I, and tag 3 is compared to the tags 1, 2, 3 and 4 linked to each of the reference images in the set of reference images 20 to determine a location of the query image I within the set of reference images 20. In this case the tag of query image I is most similar to tag 3 of reference image C; thus, query image I is determined to be located at the location of reference image C in set of reference images 20.
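A minimal sketch of the tag comparison, assuming for illustration that a location sensitive tag is a single number encoding position along the route (the disclosure also allows textual or spatial tags). The function name and the list-of-pairs layout are hypothetical; the point is that each comparison is one scalar difference rather than a full image or descriptor comparison.

```python
from typing import List, Tuple


def locate_by_tag(query_tag: float, reference_tags: List[Tuple[str, float]]) -> Tuple[int, str]:
    """Return (position in sequence, image id) of the reference whose tag is closest to the query tag.

    reference_tags holds (image_id, tag) pairs in route order.
    """
    position, (image_id, _) = min(
        enumerate(reference_tags), key=lambda item: abs(item[1][1] - query_tag)
    )
    return position, image_id
```

For the example above, reference tags A=1, B=2, C=3 and D=4 with a query tag of 3 yield position 2, image C.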

[0150] Comparing tags rather than images requires less processing power and therefore provides a faster and more economical method for self-localization in real time (or near real time). Use of this method and other embodiments described herein greatly improves and facilitates self-localization and mapping devices.

[0151] In some embodiments, determining the location of the query image I within set of reference images 20 can be done by using hints. For example, a known parameter of the user's (e.g., vehicle's) movement can be used to determine the location of the query image. Parameters of the vehicle's movement, such as velocity and/or acceleration, can be received, for example, from the telematics device of the vehicle and used to estimate the location at which the query image was captured. In another embodiment, data from motion sensors and/or image processing that detects changes between consecutive images can be used to estimate the change in location over time and thus to provide an estimate of the location of the image capture device.

[0152] For example, a set of reference images A, B, C and D is created for a pre-determined route which includes locations X₁, X₂, X₃ and X₄, in that order. Reference images A, B, C and D relate to locations X₁, X₂, X₃ and X₄, respectively. An image capture device traveling this route captures query images along the way, and each query image is compared to the reference images of the reference image set to determine a matching reference image, thereby determining a geolocation of the user. Once a first query image is matched to reference image A, the second query image (a query image captured after the first query image was captured) may be compared to reference images B, C and D only, thereby limiting the search for a matching reference image. The search may be further limited by using, for example, the velocity or another parameter of the vehicle to calculate an estimated distance traveled from the last query location (e.g., location X₁, which corresponds to reference image A) to a subsequent query location. The estimated distance may indicate, for example, that the subsequent query image location is close to location X₄, such that the search for a matching reference image may be further limited to images C and D only.
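The narrowing described in this example can be sketched as a dead-reckoning window over the route. The sketch assumes each reference image has a known distance along the route in metres and that speed is constant over the elapsed time; the spacing of 100 m between A, B, C and D, the speed, the elapsed time and the `tolerance_m` value are hypothetical numbers chosen only to reproduce the "C and D only" outcome.

```python
from typing import List, Sequence


def candidate_window(
    route_positions_m: Sequence[float],  # distance along the route of each reference image
    last_index: int,                     # index of the last matched reference image
    speed_mps: float,                    # e.g. from vehicle telematics
    elapsed_s: float,                    # time since the last matched query image
    tolerance_m: float,                  # hypothetical allowance for speed/position error
) -> List[int]:
    """Indices of later reference images near the dead-reckoned position."""
    predicted = route_positions_m[last_index] + speed_mps * elapsed_s
    return [
        i
        for i in range(last_index + 1, len(route_positions_m))
        if abs(route_positions_m[i] - predicted) <= tolerance_m
    ]
```

With A, B, C and D at 0, 100, 200 and 300 m, a last match at A, 10 m/s for 27 s (predicted position 270 m) and a 75 m tolerance, only C and D (indices 2 and 3) remain as candidates.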

[0153] In some embodiments the reference images A, B, C and D of the set of reference images 20 correspond to known ambient conditions in addition to geolocations. The ambient conditions may be used to further limit the search for a matching reference image, thereby reducing the required processing power and improving the self-localization and mapping device.

[0154] A query image may be compared to the sequence of reference images making up set of reference images 20, to find a matching image in the sequence, based on an ambient condition, as further detailed below.

[0155] FIGS. 3A and 3B schematically illustrate a method for image based self-localization using ambient conditions. The method, which may be carried out by processor 102, may include receiving a query image I of an undetermined geolocation of the user (e.g., a vehicle) (step 302). Query image I may be captured, for example, by a camera located at a point of interest, such as in a vehicle, by a camera mounted on an object, or by a camera carried by a user. The query image I is compared, at processor 102, to a set of reference images of a route (step 304), which can be maintained in storage device 108. The set of reference images may be linked to known ambient conditions, and the query image I is compared to the set of reference images to find a matching reference image. The geolocation of the vehicle (self-location) is determined based on the matching reference image (step 306).

[0156] In some embodiments, the location of the matching reference image within the sequence of the set of reference images is determined by processor 102. The geolocation of the image capture device can then be determined based on the location of the matching reference image within the sequence, as described above. The location of the matching reference image within the sequence can be used together with a known parameter of the movement of the image capture device to determine the geolocation of the image capture device, as described above.

[0157] Ambient conditions are conditions in the environment that affect the imaged scene. Ambient conditions may include time or location related descriptions. For example, ambient conditions may include conditions such as summer, winter, evening, city center at noon, city center at night, countryside at night, etc. A set of reference images linked to a known ambient condition is typically a set of images captured while the known ambient condition was prevailing. For example, a set of reference images linked to the ambient condition nighttime includes images captured during night hours, a map of reference images linked to the ambient condition city center at night includes reference images of a city center captured at night, and so on.

[0158] In one embodiment, the query image I is assigned to one or more ambient conditions. For example, a query image I captured in the city center area during the summer season may be assigned to the conditions summer and city center. Query image I will then be compared to a pre-prepared set of images assigned the condition summer and to a pre-prepared set of images assigned the condition city center (or to a single set assigned to both conditions).
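Choosing which pre-prepared reference sets to compare against can be sketched as label matching. The sketch assumes each reference set is keyed by a frozenset of condition labels and selects every set whose labels all apply to the query image, most specific first, so a single set labelled both summer and city center is tried before the separate summer and city center sets. The data layout and function name are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Labels = FrozenSet[str]


def select_reference_sets(
    query_conditions: Iterable[str],
    reference_sets: Dict[Labels, List[str]],  # condition labels -> reference image ids
) -> List[Tuple[Labels, List[str]]]:
    """Reference sets whose condition labels all apply to the query, most specific first."""
    q = frozenset(query_conditions)
    matches = [labels for labels in reference_sets if labels and labels <= q]
    # More labels means a more specific set; ties are broken alphabetically for determinism.
    matches.sort(key=lambda labels: (-len(labels), sorted(labels)))
    return [(labels, reference_sets[labels]) for labels in matches]
```

A set labelled winter is never selected for a summer query, which is how the ambient conditions shrink the search.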

[0159] Typically, the ambient conditions in the query image are determined prior to comparing the query image to the set of reference images, such that processor 102 may find a set of reference images linked to ambient conditions similar to those of the query image and may use this set in the comparison.

[0160] The ambient condition(s) of the query image may be determined automatically, for example, based on image analysis of the query image. For example, color detection algorithms and/or other known image analysis algorithms may be used to determine the ambient condition in a query image. In another embodiment, ambient condition(s) of the query image may be determined based on information automatically obtained from a time recording device associated with the user. For example, processor 102 may receive information from a clock of a vehicle to determine the time and/or date of a query image, and may deduce the ambient conditions from the time and/or date.
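Deducing ambient conditions from a clock reading can be as simple as mapping the date to a season and the hour to a period of the day. The sketch below assumes meteorological seasons (December to February is winter in the northern hemisphere, swapped for the southern hemisphere) and fixed hour boundaries; these boundaries, the label names and the function name are hypothetical. A fuller version might use local sunrise and sunset times, and location-related conditions such as city center would come from the location estimate rather than the clock.

```python
from datetime import datetime
from typing import Set


def conditions_from_timestamp(ts: datetime, northern_hemisphere: bool = True) -> Set[str]:
    """Coarse season and time-of-day labels deduced from a timestamp."""
    month = ts.month
    if month in (12, 1, 2):
        season = "winter"
    elif month in (3, 4, 5):
        season = "spring"
    elif month in (6, 7, 8):
        season = "summer"
    else:
        season = "autumn"
    if not northern_hemisphere:
        # Seasons are reversed south of the equator.
        season = {"winter": "summer", "summer": "winter",
                  "spring": "autumn", "autumn": "spring"}[season]
    hour = ts.hour
    if 6 <= hour < 12:
        period = "morning"
    elif 12 <= hour < 18:
        period = "afternoon"
    elif 18 <= hour < 21:
        period = "evening"
    else:
        period = "night"
    return {season, period}
```

For example, a vehicle clock reading of 15 July, 23:30 would yield the labels summer and night for a northern-hemisphere query.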

[0161] A set of reference images may be arranged in an ordered sequence (e.g., a sequence that corresponds to a specific route). As described above, comparing a query image to a set of reference images corresponding to known ambient conditions (step 304) can include assigning a location sensitive tag to the query image and comparing the tag of the query image to tags linked to each reference image in the sequence of reference images. A matching reference image is determined based on the comparison and its location within the sequence is determined to be the location of the query image within the sequence.

[0162] Hints, such as known parameters of the user's movement, can be used to find a matching reference image in the sequence.
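A movement hint could be used as in the following sketch, which rests on two assumptions. First, the distance travelled since the last match (for example, from odometry) is known. Second, the reference images are roughly uniformly spaced. Under those assumptions, the search is limited to a small window around the expected index. The tolerance, the names, and the integer tag format are hypothetical.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def locate_with_hint(query_tag: int, sequence_tags: List[int], last_index: int,
                     distance_m: float, spacing_m: float, tolerance: int = 2) -> int:
    """Search only near the index predicted from the user's movement."""
    n = len(sequence_tags)
    # Expected position: last match advanced by the distance travelled,
    # expressed in reference-image spacings, clamped to the sequence.
    expected = last_index + round(distance_m / spacing_m)
    expected = min(max(expected, 0), n - 1)
    lo, hi = max(0, expected - tolerance), min(n - 1, expected + tolerance)
    return min(range(lo, hi + 1), key=lambda i: hamming(query_tag, sequence_tags[i]))


# Repetitive route: tag 0b1111 appears at index 1 and index 7.
tags = [0b0001, 0b1111, 0b0010, 0b0100, 0b1000, 0b0011, 0b0110, 0b1111, 0b1100, 0b1001]
print(locate_with_hint(0b1111, tags, last_index=5, distance_m=20.0, spacing_m=10.0, tolerance=1))  # 7
```

Without the hint, a global search would return the first identical tag (index 1). The movement hint resolves this ambiguity.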

[0163] Machine learning techniques, such as a CNN (e.g., a CNN having a NetVLAD component) and/or a Siamese network, can be used in the comparison step. Additionally, a hashing method can be used to index the images for efficient comparison.
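The hashing method is not specified above. One possible choice is random-hyperplane locality-sensitive hashing (SimHash), a standard technique swapped in here for illustration. With this scheme, feature vectors pointing in similar directions tend to receive the same bit code. A lookup therefore returns a small bucket of candidates, and only those need a full comparison. The class name, bit count, and seed are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


class HyperplaneLSH:
    """Random-hyperplane LSH index over image feature vectors (illustrative)."""

    def __init__(self, dim: int, n_bits: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per output bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[int, List[str]] = defaultdict(list)

    def hash(self, vec: Sequence[float]) -> int:
        code = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            code = (code << 1) | (1 if dot >= 0 else 0)  # which side of the plane
        return code

    def add(self, image_id: str, vec: Sequence[float]) -> None:
        self.buckets[self.hash(vec)].append(image_id)

    def candidates(self, vec: Sequence[float]) -> List[str]:
        """Reference images sharing the query's bucket."""
        return list(self.buckets.get(self.hash(vec), []))


index = HyperplaneLSH(dim=4, n_bits=8, seed=1)
index.add("ref_a", [1.0, 0.5, 0.0, 0.2])
print(index.candidates([2.0, 1.0, 0.0, 0.4]))  # ['ref_a'] (same direction, same code)
```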

[0164] In one embodiment, each reference image in the sequence of reference images may be linked to a known geolocation, in addition to being linked to an ambient condition, such that the geolocation of the query image is known from the matching reference image.

[0165] In another embodiment, examples of which are schematically illustrated in FIGS. 3C and 3D, the query image is compared to a first set of images linked to known geolocations and to a second set of images linked to ambient conditions.

[0166] A method for image based self-localization, according to one embodiment, includes determining ambient conditions of a query image I (step 312). The query image I is then compared to a first set of reference images (map 1) linked to known geolocations to obtain a first matching reference image (step 314). The query image I is then compared to a second set of reference images (reference image set 2) linked to ambient conditions (which are similar to the ambient conditions of the query image), to obtain a second matching reference image (step 316).

[0167] The geolocation of the query image capture (self-location of the user) is then determined based on the first matching image and the second matching image (step 318). In some embodiments, a device may be controlled based on the determined geolocation, for example, as described above.

[0168] The first matching reference image may serve to limit the search for the second matching reference image. For example, the first matching reference image, which can be detected based on machine learning techniques as described herein, may indicate a specific geographical location and may thus limit the next search to the specific geographical location. However, due to noise and changing conditions in the environment, the specific geographical location may not be accurate and may be a few meters off from the actual location of the image capture device while the query image is captured. The search for the second matching reference image will provide a more accurate match because noise, due to ambient conditions, is reduced in the second set of reference images, such that a more accurately matching image may be found and thus, a more accurate localization may be achieved. In addition, the search for the second matching reference image (which may also be carried out using the machine learning techniques described herein), is simplified because it can be restricted to the specific geographical area (and therefore restricted to a limited number of reference images).
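The coarse-to-fine search described above can be sketched as follows. The sketch makes several assumptions. Each reference carries a latitude/longitude and a global descriptor. Similarity is measured by cosine similarity, which stands in for the machine-learning comparison. The 50 m search radius is a placeholder. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Ref:
    ref_id: str
    lat: float
    lon: float
    descriptor: Tuple[float, ...]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (spherical Earth approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query: Sequence[float], refs: List[Ref]) -> Ref:
    return max(refs, key=lambda r: cosine(query, r.descriptor))


def coarse_to_fine(query: Sequence[float], map1: List[Ref], set2: List[Ref],
                   radius_m: float = 50.0) -> Tuple[Ref, Optional[Ref]]:
    first = best_match(query, map1)  # coarse match against geotagged set (map 1)
    # Restrict the ambient-condition set to the area around the first match.
    nearby = [r for r in set2 if haversine_m(first.lat, first.lon, r.lat, r.lon) <= radius_m]
    second = best_match(query, nearby) if nearby else None
    return first, second


map1 = [Ref("m1", 32.0, 34.0, (1.0, 0.0)), Ref("m2", 32.01, 34.0, (0.0, 1.0))]
set2 = [Ref("s1", 32.0001, 34.0, (0.9, 0.1)), Ref("s2", 32.0002, 34.0, (1.0, 0.0)),
        Ref("s3", 32.01, 34.0, (1.0, 0.0))]
first, second = coarse_to_fine((1.0, 0.05), map1, set2)
print(first.ref_id, second.ref_id)  # m1 s2
```

In this example, `s3` has the same descriptor as `s2`. However, it lies about 1.1 km from the first match, so the restricted search never considers it.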

[0169] In one embodiment, the query image is used to update the set of reference images linked to known ambient conditions.

[0170] In some embodiments, a geolocation of the first matching reference image and a geolocation of the second matching reference image can be combined to obtain a combined geolocation, and the geolocation of the vehicle (or other user) is determined based on the combined geolocation. Combining geolocations may include, for example, calculating a statistical value such as a mean location. In some embodiments, a weight may be assigned to each of the first and second geolocations, and the weighted locations are combined to obtain a combined geolocation. The geolocation of the query image may then be determined based on the combined geolocation, as shown in the following equation:


$$L_{\text{query}} = W_1 L_1 + W_2 L_2$$

where W1 and W2 are different weights, L1 is the location of the first matching reference image, and L2 is the location of the second matching reference image.

[0171] In some embodiments, the query image is captured along a specific route and the reference images are images previously captured along that route (possibly during specific ambient conditions). In this case, L1 and L2 may be locations of the images along the route.

[0172] Thus, locations L1 and L2 may refer to locations in the world (geolocations) and/or locations within a sequence of images.

[0173] Typically, in environments where ambient conditions are more significant (e.g., during a snow storm, during dark hours, etc.) the weight assigned to the location of images linked to ambient conditions is greater than the weight assigned to the location of images linked to geolocations.
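The weighted combination of L1 and L2 given above can be sketched minimally as follows. The sketch normalises the weights to sum to one, which the text does not state, so the result is a weighted mean. The weight values for adverse versus normal conditions are hypothetical placeholders. Locations are represented as tuples, so the same function applies either to one-dimensional positions along a route or, over short distances, to latitude/longitude pairs.

```python
from typing import Sequence, Tuple


def combine_locations(l1: Sequence[float], l2: Sequence[float],
                      w1: float, w2: float) -> Tuple[float, ...]:
    """Weighted combination W1*L1 + W2*L2, with weights normalised to sum to 1."""
    total = w1 + w2
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    a, b = w1 / total, w2 / total
    return tuple(a * x + b * y for x, y in zip(l1, l2))


def weights_for_conditions(adverse: bool) -> Tuple[float, float]:
    """Hypothetical weights (W1 for map 1, W2 for the ambient-condition set).
    In adverse conditions (snow, darkness) the ambient-condition match gets more weight."""
    return (0.3, 0.7) if adverse else (0.6, 0.4)


w1, w2 = weights_for_conditions(adverse=True)
print(combine_locations((100.0,), (110.0,), w1, w2))  # approximately (107.0,)
```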

[0174] Embodiments of the invention enable self-localization without relying on an external positioning system and provide more accurate localization than existing positioning systems. Additionally, embodiments of the invention provide a fast and computationally economical method for self-localization and mapping based on visual data.

[0175] A specialized visual search engine may be used as a location recognition system. The visual search query may be an image of a location, for example, a location in a city. Image-based location recognition in an urban environment presents particular challenges, created in part by repetitive man-made structures. Buildings, fences, walls, and other repetitive structures make images harder to differentiate and therefore harder to recognize. An image-based location recognition system may rely on correspondence between a query image and a reference image to identify and recognize a location.

[0176] FIG. 4 shows an image-based location recognition system. Reference images and metadata 403 may be provided to a feature extractor 404. The feature extractor processes the reference images and generates a plurality of feature vectors, which collectively form feature maps 402 stored in a reference database 401. The reference database may be established offline, and may be supplemented, augmented, or updated over time using the query images. The reference database 401 may store location references in the form of feature maps 402, each describing an image of a particular location.

[0177] A query image 405 may be captured by any system that would benefit from recognizing a location from an image. A simple use would be a smartphone application that allows a tourist to photograph a particular location so that the system recognizes the location and provides location-based information to the user. Location recognition by a system as described may be used when other location services are inadequate; GPS-based location services, for example, are often not sufficiently responsive or accurate for certain applications. Traditional location services may still be used to narrow the set of references in the reference database that is consulted for location recognition. A query image 405 may be provided to an integrated neural network 406 for recognition. The integrated neural network 406 may include a feature extractor 407, which receives the query image 405 and generates a query feature map that is provided to a correspondence matcher 408. The correspondence matcher 408 may be implemented within the neural network 406 to identify correspondence between the feature map generated by the feature extractor 407 from the query image 405 and a feature map generated from a reference image. The correspondence matcher 408 may output a correspondence map 409. The correspondence map 409 reflects a confidence level in the correspondence between the query and reference images, and thus the confidence that the query image 405 was captured at the location corresponding to the reference feature map 402.
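Correspondence matcher 408 is described as part of a neural network, and the sketch below is not that network. Instead, it substitutes a classic mutual-nearest-neighbour matcher over local feature vectors. The goal is only to illustrate what a correspondence map and its confidence might contain. The function names and the confidence formula (clamped similarity of mutual matches, normalised by the larger feature count) are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mutual_nn_matches(query: List[Vec], ref: List[Vec]) -> List[Tuple[int, int, float]]:
    """Correspondences (i, j, similarity) where query feature i and reference
    feature j are each other's most similar feature."""
    sims = [[cosine(q, r) for r in ref] for q in query]
    q_best = [max(range(len(ref)), key=row.__getitem__) for row in sims]
    r_best = [max(range(len(query)), key=lambda i: sims[i][j]) for j in range(len(ref))]
    return [(i, j, sims[i][j]) for i, j in enumerate(q_best) if r_best[j] == i]


def correspondence_confidence(query: List[Vec], ref: List[Vec]) -> float:
    """Crude score in [0, 1]: higher when many features correspond strongly."""
    matches = mutual_nn_matches(query, ref)
    return sum(max(s, 0.0) for _, _, s in matches) / max(len(query), len(ref))


q = [(1.0, 0.0), (0.0, 1.0)]
print(mutual_nn_matches(q, [(0.0, 1.0), (1.0, 0.0)]))  # [(0, 1, 1.0), (1, 0, 1.0)]
```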

[0178] The system may be maintained or updated by incorporating feature maps, derived from actual images captured during the location recognition process, into the reference database 401. A location may undergo changes over time, and these changes may occur in small increments. The system may accurately recognize a location that has undergone a small incremental change since its reference was entered. Over time, however, the location may be altered enough to defeat location recognition based on an early reference image. Feeding query images back into the reference database 401 allows the database to change over time to match changes in locations. Advantageously, very small changes, such as occlusions caused by the temporary presence of individuals or vehicles, need not be incorporated into the reference database.

[0179] According to an advantageous embodiment, scene recognition may be accomplished by finding the approximate location of a moving camera, using a database of sparse position-tagged images. A process for self-location is shown in FIG. 5.

[0180] An aspect of this process is the use of anchors, or traps, designated in the reference images of a sparse geotagged reference database. A trap may be a segment of an image that is easily recognized by image processing techniques. A series of query images may be captured by a vehicle-mounted or user-carried image capture device 103, for example a camera. Once the process identifies the general vicinity of the image capture device, it may select a feature in a reference image generally corresponding to that location. Successive query images may then be evaluated to determine the level of correlation between the trap segment of the reference image and each query image. As the level of correlation increases, the evaluation determines that the image capture device is getting closer to the known location of the reference image, until the level of correlation begins to decrease. At that point, the closest query position is known. The level of correspondence increases as the query and reference capture locations converge, and the highest level of correspondence between a reference image and a query image occurs when their capture locations are closest to each other. Because image capture for both the reference set and the query set is discrete, the two capture locations generally do not coincide exactly; the peak identifies the closest pair. This process is repeated for other reference image locations. By determining a sequence of query positions, each closest to one of a series of known locations, the current location of a query image may be found. Correspondence represents a degree of match between two images and may be determined by various techniques, for example a correlation function. Other measures may also be used, including, but not limited to, edge or pixel matching/comparison.
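Trap-based closest-point detection can be sketched under the assumption that each query image has already been cropped to a patch aligned with the trap segment and flattened to a list of pixel values. Normalised cross-correlation serves as the correlation function. The closest query is reported when the score starts to fall after rising. Real score sequences are noisy, so smoothing or a minimum-drop threshold would likely be needed. The names are hypothetical.

```python
import math
from typing import Optional, Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalised cross-correlation of two equal-length pixel sequences, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0


def closest_query_index(scores: Sequence[float]) -> Optional[int]:
    """Index of the query image at which correlation stops rising and begins
    to fall, i.e. the query captured closest to the trap; None if not yet seen."""
    rising = False
    for i in range(1, len(scores)):
        if scores[i] > scores[i - 1]:
            rising = True
        elif scores[i] < scores[i - 1] and rising:
            return i - 1
    return None


# In use: scores = [ncc(trap_patch, q) for q in query_patches]
print(closest_query_index([0.2, 0.4, 0.7, 0.9, 0.6, 0.3]))  # 3
```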

[0181] FIG. 5 shows an example of a method for self-localization, which may be carried out by processor 102.

[0182] The method may include accepting an indication of a route or location (step 502) and includes receiving nearby reference images or reference image data, together with a sequence of images (e.g., from camera 103) whose capture location is to be ascertained (also termed query images) (step 504). The sequence of query images is compared to the reference images to find the query image closest to each of one or more of the reference images (step 506). Each of the reference images corresponds to a known geolocation. The known capture location of the matching reference image is used to determine self-geolocation (step 508).

[0183] The set of reference images may include previously captured reference images of the specific route, each reference image corresponding to a known geolocation. A subset of reference images near a known or estimated vehicle location may be derived. Thus, processor 102 may determine the self-geolocation, i.e., the geolocation of camera 103 (which is located on or attached to the vehicle) at the time it captured the query image, based on the geolocation of the reference images matching the query image (step 508).
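The flow of FIG. 5 (steps 504 to 508) can be sketched as follows. The sketch assumes reference locations in local metric coordinates (east/north in metres) and a caller-supplied similarity function that stands in for the image comparison. For each nearby reference, it finds the most similar query image in the sequence, and that query index inherits the reference's known location. All names and the default radius are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed local east/north coordinates in metres


def nearby_references(refs: Dict[str, Point], estimate: Point, radius_m: float) -> List[str]:
    """Subset of reference ids within radius_m of the estimated location."""
    return sorted(rid for rid, p in refs.items() if math.dist(p, estimate) <= radius_m)


def localize_sequence(similarity: Callable[[str, str], float], query_ids: Sequence[str],
                      refs: Dict[str, Point], estimate: Point,
                      radius_m: float = 50.0) -> List[Tuple[int, Point]]:
    """Return (query index, known location) fixes, ordered by query index."""
    fixes = []
    for rid in nearby_references(refs, estimate, radius_m):
        # Query image in the sequence that best matches this reference (step 506).
        best_q = max(range(len(query_ids)), key=lambda k: similarity(query_ids[k], rid))
        fixes.append((best_q, refs[rid]))  # inherit the reference geotag (step 508)
    return sorted(fixes)


refs = {"r1": (0.0, 0.0), "r2": (10.0, 0.0), "r3": (500.0, 0.0)}
sim = {("q0", "r1"): 0.9, ("q1", "r1"): 0.5, ("q2", "r1"): 0.1,
       ("q0", "r2"): 0.2, ("q1", "r2"): 0.4, ("q2", "r2"): 0.95, ("q1", "r3"): 0.99}
print(localize_sequence(lambda q, r: sim.get((q, r), 0.0), ["q0", "q1", "q2"], refs, (5.0, 0.0)))
# [(0, (0.0, 0.0)), (2, (10.0, 0.0))]
```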

[0184] In some embodiments the vehicle may be controlled based, in part, on the determined self-geolocation (step 510). For example, the user interface device 106 may be controlled to display the query image and to further display an annotation or other mark related to the geolocation of the vehicle at the time the query image was captured.

[0185] According to an advantageous feature, the traps are sufficiently dense in the reference dataset to ensure that a trap is located in a reference image sufficiently near any major segment of a path to be recognized.

[0186] The process is repeated for other reference images around the known area of the vehicle. The location of each trap is known and may be used as an accurate positioning point. The approximate position relative to a trap may be calculated. The error of that calculation decreases as the query image capture location nears the trap, and the smallest error occurs at the point closest to the trap. The movement between traps is also tracked using relevant measurements, and together these provide accurate locating, tracking, and mapping of a path.
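Tracking between traps can be sketched as dead reckoning that is reset at each trap fix. The sketch assumes that per-frame displacements from some relative motion measurement (for example, odometry) are available, that trap fixes are keyed by frame index, and that positions are in local metric coordinates. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def track(fixes: Dict[int, Point], steps: Sequence[Point]) -> List[Optional[Point]]:
    """Per-frame positions: an absolute trap fix where available, otherwise
    the previous position plus the measured displacement for that frame."""
    path: List[Optional[Point]] = []
    pos: Optional[Point] = None
    for k, (dx, dy) in enumerate(steps):
        if k in fixes:
            pos = fixes[k]  # trap fix discards drift accumulated since the last fix
        elif pos is not None:
            pos = (pos[0] + dx, pos[1] + dy)
        path.append(pos)  # None until the first trap fix is seen
    return path


steps = [(0.0, 0.0), (3.0, 0.0), (3.0, 0.0), (3.0, 0.0), (2.0, 0.0)]
print(track({0: (0.0, 0.0), 3: (10.0, 0.0)}, steps))
# [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
```

In this example, odometry alone would place frame 3 at x = 9.0. The trap fix resets it to 10.0, and later frames continue from the corrected position.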

[0187] Another feature, for locating a point of interest (POI), is a method for locating an area in a query image that matches an annotated area in a reference image. Locating the POI allows the match to the reference image to be measured on that area, and may reduce the portion of the image that must be processed to the area of interest.

[0188] Traps may be defined either over a whole reference image or over areas of interest within the reference image. The traps may represent POIs appearing in an image, for example, billboards, shops, buildings, windows, etc.

[0189] Where images are highly repetitive, the process will find a match only if there is a sequence of matches. To make the sequence long enough to extend beyond the repetitive zone, the sequence length may be increased until the sequence is unique to a threshold confidence. This may rely on a combination of traps and POIs: a POI may be used to mark a specific area as a trap, and the trap-building stage of scene recognition ensures that the trapping sequence yields a unique result.
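Extending a match sequence until it is unique can be sketched as follows, simplified to exact tag equality; a real system would compare match scores against a confidence threshold. Starting from the latest query tag, the window is extended backwards until exactly one end position in the reference sequence remains consistent. The names and the string-tag representation are assumptions.

```python
from typing import Optional, Sequence, Tuple


def unique_alignment(ref_tags: Sequence[str], query_tags: Sequence[str],
                     max_len: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (end index in ref_tags, window length) once the trailing query
    window matches exactly one place in the reference sequence, else None."""
    limit = len(query_tags) if max_len is None else min(max_len, len(query_tags))
    # Candidate end positions in the reference sequence matching the latest query.
    candidates = [j for j, tag in enumerate(ref_tags) if tag == query_tags[-1]]
    length = 1
    while len(candidates) > 1 and length < limit:
        length += 1
        tag = query_tags[-length]
        # Keep candidates whose reference window also matches this older query.
        candidates = [j for j in candidates
                      if j - length + 1 >= 0 and ref_tags[j - length + 1] == tag]
    return (candidates[0], length) if len(candidates) == 1 else None


route = ["A", "B", "A", "B", "C", "A", "B"]  # repetitive "A B" pattern
print(unique_alignment(route, ["C", "A", "B"]))  # (6, 3)
print(unique_alignment(route, ["A", "B"]))       # None (still ambiguous)
```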

[0190] The system may include apparatus and a method for detecting changes by using the approximate location of a moving camera.

[0191] Once a path of travel covered with traps is determined, a particular trap (or a few consecutive traps) may show a significant reduction in match quality. If match quality on the path is good before and after a zone of reduced match quality, it may be inferred that a portion of the path has changed in a way not reflected in the reference database. This inference may be used to generate a report of an area that should be rescanned to update the reference database. The query image stream may be used as an alternative to rescanning, or to temporarily update the reference database pending a rescan.
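The change-detection rule described above can be sketched as a scan of per-trap match scores along a path, looking for runs of low scores that are flanked on both sides by high scores. The thresholds 0.4 and 0.7 are hypothetical placeholders, and the function name is an assumption. The returned index ranges could feed the rescan report.

```python
from typing import List, Sequence, Tuple


def changed_zones(scores: Sequence[float], low: float = 0.4,
                  high: float = 0.7) -> List[Tuple[int, int]]:
    """Inclusive (start, end) ranges of traps whose match quality dropped below
    `low` while the traps immediately before and after matched at or above `high`."""
    zones: List[Tuple[int, int]] = []
    i, n = 0, len(scores)
    while i < n:
        if scores[i] < low:
            start = i
            while i < n and scores[i] < low:
                i += 1
            end = i - 1
            # Require good matches on both sides of the degraded run.
            if start > 0 and i < n and scores[start - 1] >= high and scores[i] >= high:
                zones.append((start, end))
        else:
            i += 1
    return zones


print(changed_zones([0.9, 0.85, 0.2, 0.1, 0.8, 0.9]))  # [(2, 3)]
```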

[0192] The invention has been described in detail with respect to preferred embodiments, and it will now be apparent from the foregoing to those skilled in the art that changes and modifications may be made without departing from the invention in its broader aspects. The invention, therefore, as defined in the claims, is intended to cover all such changes and modifications that fall within the true spirit of the invention.

[0193] Thus, specific apparatus for and methods of abstraction of images have been disclosed. It should be apparent, however, to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the spirit of the disclosure. Moreover, in interpreting the disclosure, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms "comprises" and "comprising" should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced.