Image classification and information retrieval over wireless digital networks and the internet
10990811 · 2021-04-27
Assignee
Inventors
Cpc classification
G06V40/169
PHYSICS
G06V20/52
PHYSICS
Y10S707/99948
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06V40/171
PHYSICS
International classification
Abstract
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides an internet-hosted system to find, compare, contrast and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, or wireless device for the purpose of returning information regarding similar faces to the user. The system features classification of unknown facial images from a variety of internet-accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications and databases. Once the image is classified, the matching person's name, image and associated metadata are sent back to the user. The method and system use human perception techniques to weight the feature vectors.
Claims
1. A method for matching an unknown image with a known image, the method comprising: receiving, at a server that includes a neural network, an unknown facial image from a video camera within a video surveillance system; processing the unknown facial image at the server to create a first set of variables that represent the unknown facial image; comparing the first set of variables to a plurality of additional variables stored in a database, the plurality of additional variables including a second set of variables representing a known facial image; determining a match value based at least in part on match prediction carried out by algorithmically employing the first set of variables and the second set of variables within the neural network; and selectively and concurrently displaying the unknown facial image and the known facial image on a wireless communication device when a match exists based at least in part on an indication provided by the match value.
2. The method of claim 1 wherein the neural network is a high-dimensional neural network.
3. The method of claim 1 wherein the wireless communication device is a wireless mobile telephone.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(1)
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
(10)
(11)
(12)
(13)
(14)
(15)
(16)
DETAILED DESCRIPTION OF THE INVENTION
(17) A flow chart of a preferred specific method of the present invention is illustrated in
(18) At block 102, the facial image is transmitted over a network to an image classification server, preferably over a wireless network. The facial image is preferably sent to a male or female designation site at the image classification server. The facial image is subsequently sent over the internet using HTTP or e-mail to the image classification server. The facial image, preferably a compressed digital facial image such as a JPEG image, is sent to a wireless carrier as an MMS, an SMS, an SMTP, or a WAP upload. Alternatively, the facial image is uploaded to a PC from a digital camera or scanner and then transferred over the internet to the image classification server as an e-mail attachment or HTTP upload.
(19) At block 103, the facial image is analyzed at the image classification server to determine if the facial image is of adequate quality to be processed for matching. Quality issues with the facial image include but are not limited to a poor pose angle, brightness, shading, eyes closed, sunglasses worn, obscured facial features, or the like. At block 104, an image determination is made concerning the quality of the image. A negative image determination is made at block 105. At block 106, a transmission is sent to the sender informing the sender that the facial image provided is inadequate and requesting that the sender provide a new facial image. The matching procedure for such a negative image may continue, and the matched images will be sent with an additional statement informing the sender that the image was of bad quality and that a better match may be possible with a higher quality image.
(20) At block 107, if the image determination is positive, then the facial image is processed at block 108. It should be noted that the facial image is previously unknown to the image classification server and that this is the first time the facial image has been analyzed by the image classification server. Thus, the method of the present invention involves processing an unknown image to find a match with facial images of other individuals, unlike typical facial recognition systems, which match an image of an individual with a known image of that individual in the database. At block 108, processing of the image preferably comprises using an algorithm which includes a principal component analysis technique to process the face of the facial image into an average of a multitude of faces, otherwise known as the principal component, and a set of images that are the variance from the average face image, known as the additional components. Each face is reconstructed by multiplying the principal components and the additional components against a feature vector and adding the resulting images together. The resulting image reconstructs the original face of the facial image. Processing of the facial image takes into account factors such as facial hair, hair style, facial expression, the presence of accessories such as sunglasses, hair color, eye color, and the like. Essentially, a primary feature vector is created for the facial image.
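The decomposition described in this paragraph can be sketched as follows. This is a minimal illustration only, assuming the standard eigenface formulation in which the feature vector holds the projection weights of the mean-centered face onto a set of orthonormal component images. The function names `project` and `reconstruct` and the toy two-pixel "faces" are hypothetical and do not come from the source.

```python
def project(face, mean_face, components):
    """Return the feature vector: one weight per component image.

    Assumes `components` are orthonormal vectors of the same length as `face`.
    """
    centered = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


def reconstruct(weights, mean_face, components):
    """Rebuild a face as the mean face plus the weighted sum of components."""
    face = list(mean_face)
    for w, comp in zip(weights, components):
        for i, c in enumerate(comp):
            face[i] += w * c
    return face


# Toy example: two-pixel "faces" and two orthonormal components (assumed values).
mean_face = [1.0, 2.0]
components = [[0.6, 0.8], [-0.8, 0.6]]
feature_vector = project([3.0, 5.0], mean_face, components)
rebuilt = reconstruct(feature_vector, mean_face, components)
```

When the components span the whole pixel space, as in this toy case, the reconstruction recovers the original face up to floating-point error. A real system would keep only the leading components, so the reconstruction would be approximate.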
(21) At block 109, the processed image, or primary feature vector, is compared to a plurality of database processed images preferably located at the image classification server. During the comparison, the primary feature vector is compared to a plurality of database feature vectors which represent the plurality of database processed images. The database preferably includes at least 10,000 processed images, more preferably at least 50,000 processed images, and most preferably from 50,000 processed images to 100,000 processed images. Those skilled in the pertinent art will recognize that the database may contain any number of images without departing from the scope and spirit of the present invention. The processed images preferably include multiple images of one individual, typically from two to twenty images, more preferably from four to ten images of a single individual in different poses, different facial expressions, different hair styles and the like. The database of processed images preferably includes celebrities, including, but not limited to, actors, actresses, musicians, athletes, models, government officials, and other publicly well-known individuals. Again, it should be noted that the facial image sent by the sender is an unknown image which is being best matched to a known image.
(22) At block 110, the processed image undergoes raw matching of a small plurality of database images with each having a feature vector value that is close to the value of the primary feature vector. At block 110a, the iterative processing of the raw matching is performed wherein the human perception of what is a good match is one of the primary factors in creating the matched images. At block 111, a perception value for the matched images is determined based on the feature vector values. The perception value ranges from 0% to 100%, with 100% being an ideal match. At block 111a, the matches are sorted based on predicted human perception.
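The raw matching step at block 110 can be illustrated with a short nearest-neighbor sketch. It assumes that "close" means a small Euclidean distance between feature vectors, which the source does not specify. The names `euclidean` and `raw_match`, the value of `k`, and the sample data are hypothetical. The mapping from feature vector values to a perception value is not given in the source and is therefore left out.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def raw_match(primary, database, k=3):
    """Return the k database entries closest to the primary feature vector.

    `database` is a list of (name, feature_vector) pairs. One individual may
    appear several times, once per enrolled image.
    """
    scored = ((euclidean(primary, vec), name) for name, vec in database)
    return heapq.nsmallest(k, scored)


# Hypothetical database with two enrolled images of the same individual.
database = [
    ("Actor A", [0.0, 0.0]),
    ("Actor B", [1.0, 1.0]),
    ("Actor C", [5.0, 5.0]),
    ("Actor A", [0.1, 0.0]),
]
candidates = raw_match([0.0, 0.0], database, k=3)
```

The result is a list of (distance, name) pairs ordered from closest to farthest, which a later step could re-sort according to predicted human perception.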
(23) At block 112, the matched images and the perception value are transmitted to the sender over a network as discussed above for the initial transmission. The entire process preferably occurs within a time period of sixty seconds, and most preferably within a time of ten seconds. The process may be delayed due to the wireless carrier and network carrier. In this manner, the sender will know which celebrity the facial image best matches. The output of the matched images and any additional text is preferably sent to the sender's wireless communication device for instantaneous feedback on the sender's inquiry as to which celebrity the facial image looks like. Further, the output is also sent to a sender's web page on a web site hosted through the image classification server, wherein the sender can control access to the sender's web page and modify the matched images and the additional text. Further, the output is sent to a voting site as discussed below.
(24) At decision 113, the quality of the matched images is determined to decide if the matched images should be sent to the voting site on the web site. At block 115, the matched images are sent to the sender's wireless communication device and to the sender's web page on the web site for viewing by the sender and other viewers determined by the sender. At block 114, the matched images are sent to the voting site if of sufficient quality, preferably based on the perception value, to be voted upon by visitors to the voting site.
(25) In this manner, a statistical modeling element is added to the matching process to better match images based on human perception as determined by the scores for previously matched images on the voting site. In other embodiments regression analysis or Bayesian analysis is utilized. Under this alternative scenario, a Support Vector Machine, preferably a high-dimensional neural network, with two feature vectors of a match, along with average vote scores collected from viewers of the web site will be utilized to provide better matching of images. A more detailed explanation of a Support Vector Machine is set forth in Cortes & Vapnik, Support Vector Networks, Machine Learning, 20, 1995, which is hereby incorporated by reference in its entirety. The previous voting patterns are implemented in a statistical model for the algorithm to capture the human perception element to better match images as perceived by humans.
(26) A more general method of the present invention is illustrated in
(27) A system of the present invention is illustrated in
(28) The present invention preferably uses facial recognition software commercially or publicly available such as the FACEIT brand software from IDENTIX, the FACEVACS brand software from COGNITEC, and others. Those skilled in the pertinent art will recognize that there are many facial recognition software packages, including those in the public domain, that may be used without departing from the scope and spirit of the present invention.
(29) The operational components of the image classification server 53 are schematically shown in
(30) As shown in
(31) The present invention collects votes on match results for two primary purposes: 1) determine which matches are the best according to human perception; and 2) use the human perception measurement to improve the matching algorithm of the present invention.
(32) Votes are preferably cast anonymously over the internet and/or from mobile telephones and similar communication devices. In order to develop a more consistent measurement of human perception for a particular match, a weighted average is used for the match. The weighted average is calculated as follows:
(33) $W = \frac{v}{v+m}\,R + \frac{m}{v+m}\,C$
(34) W=Weighted Rating
(35) R=average vote for the match as a number from 0 to 10 (mean)=(Rating)
(36) v=number of votes for the match=(votes)
(37) m=minimum votes required (currently 10)
(38) C=the mean vote (currently 5)
(39) The “Weighted Rating” tends to equal 5 (the mean vote C) when few votes have been cast. As more votes are cast, the rating gradually shifts away from 5 and toward the average vote R. A preferred method of calculating the weighted rating is disclosed at www.IMDB.com.
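The behavior described above can be checked with a small sketch. It assumes the weighted average takes the form W = (v/(v+m))·R + (m/(v+m))·C, which is consistent with the definitions of W, R, v, m and C given here and with the rank expression (score+50)/(nvotes+10) in the voting code. The function name `weighted_rating` is hypothetical.

```python
def weighted_rating(R, v, m=10, C=5.0):
    """Weighted rating of a match.

    R: mean vote for the match (0 to 10)
    v: number of votes cast for the match
    m: minimum votes required (10 per the text)
    C: the mean vote (5 per the text)
    """
    return (v / (v + m)) * R + (m / (v + m)) * C


# With no votes the rating equals C. With many votes it approaches R.
no_votes = weighted_rating(8.0, 0)
few_votes = weighted_rating(8.0, 10)
many_votes = weighted_rating(8.0, 10000)
```

Because R·v is the summed score, the same quantity can be written as (score + C·m)/(v + m), which for C = 5 and m = 10 gives (score + 50)/(nvotes + 10).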
(40) The present invention uses voting results to improve selection of images for enrollment. Enrollment is preferably defined as creating a feature vector on a celebrity image and adding that celebrity image feature vector into the database.
(41) Match ratings are applied to the matching algorithm as set forth below. The best matches by vote are used to build a test set. For example as shown in
(42) where x_0, x_1, etc. are the match positions. In the example above, the match appears only in position 0, so f(0)=1. If the matched actor had appeared in positions 0 and 2, for example, the fitness would be:
(43)
(44) The total fitness is calculated by summing the fitness for each match in the test set:
(45) $F = \sum_{i=1}^{n} f_i$
where F is the total fitness, f_i is the fitness of the i-th match, and n is the number of matches in the test set, as shown in the graph in
(46) Enrollment parameters are adjusted until the fitness function (F) is maximized. Enrollment parameters are a set of thresholds which determine if an image of the actor should be used to create a feature vector and added to the database of target feature vectors that are matched against. The choice of which feature vectors are kept helps improve the algorithm results. These enrollment parameters measure the quality of the image, such as resolution, focus, brightness, exposure, shades of gray, confidence of the eye detector, etc. If enrollment parameters are chosen too strictly, then the variation of feature vectors in the database is insufficient for each actor, and good matches are not found. If all images for an actor are used, too many variations of feature vectors exist and the good matches cannot be found. By optimizing the choice of enrollment parameters based on human perception voting, we get the set of feature vectors for an actor that best represents the actor as perceived by people.
(47) The present invention also uses voting results to weigh feature vectors. In addition to using vote results to select which actor images are good for enrollment, vote results can also be used to weigh the feature vector itself so that qualities of the image that are perceived by humans are more heavily weighted when searching for a good match. Biometric security software (Cognitec, Identix, etc.) selects and weighs the features of an image in order to match an image of a person to another image of the same person, and optimizes the vector to achieve this result. The feature vector can be made up of local facial features, or overall components of the face as determined by principal component analysis.
(48) Human perception voting results can be used to optimize the look-a-likeness of a person to a different person, regardless of how the feature vectors are determined. In other words, the algorithm for determining the set of feature vectors that best represent a face can be augmented with a second algorithm which takes these feature vectors, typically represented as a vector of floating point numbers, and weighs the values in the vector so that the characteristics of the image that are based on human perception are used more heavily.
(49) Statistical methods such as neural networks or support vector machines (SVMs) can be fed the source and actor feature vectors to predict the human perception vote.
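One way to picture the second algorithm is a per-element weight applied when two feature vectors are compared. The sketch below assumes a weighted Euclidean distance, which is one possible choice and is not stated in the source. The name `weighted_distance` and the weight values are hypothetical. In practice the weights would be derived from voting results.

```python
import math


def weighted_distance(source, actor, weights):
    """Distance between two feature vectors with per-element weights.

    A larger weight makes a difference in that element count more heavily,
    and a weight of zero ignores the element entirely.
    """
    return math.sqrt(
        sum(w * (s - a) ** 2 for s, a, w in zip(source, actor, weights))
    )


# Same pair of vectors compared under two hypothetical weightings.
uniform = weighted_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
first_only = weighted_distance([0.0, 0.0], [3.0, 4.0], [1.0, 0.0])
```

With uniform weights this reduces to the ordinary Euclidean distance, so the weighting can be added to an existing matcher without changing how its feature vectors are produced.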
(50)
(51) Once trained, the Neural Network or SVM can predict whether a match is good or not by using the feature vectors, determined from a separate algorithm.
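The idea of predicting a vote from a pair of feature vectors can be illustrated with a deliberately simple stand-in. The sketch below uses ordinary linear regression trained by batch gradient descent rather than a neural network or SVM, and it assumes the model input is the element-wise absolute difference between the source and actor feature vectors. All names (`features`, `train`, `predict_vote`), the learning rate, and the training data are hypothetical. The default learning rate is only suitable for small, normalized feature values.

```python
def features(source, actor):
    """Element-wise absolute difference of the two feature vectors."""
    return [abs(s - a) for s, a in zip(source, actor)]


def train(examples, lr=0.5, epochs=2000):
    """Fit vote = bias + sum(w_j * x_j) by batch gradient descent.

    `examples` is a list of (source_vector, actor_vector, average_vote).
    Returns the model as a (bias, weights) pair.
    """
    rows = [(features(s, a), vote) for s, a, vote in examples]
    dim = len(rows[0][0])
    n = len(rows)
    bias, weights = 0.0, [0.0] * dim
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * dim
        for x, y in rows:
            err = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def predict_vote(model, source, actor):
    """Predicted human perception vote for a source/actor pair."""
    bias, weights = model
    x = features(source, actor)
    return bias + sum(w * xi for w, xi in zip(weights, x))
```

A model trained this way can score a new candidate match before any votes exist for it. A neural network or SVM would play the same role with a more flexible decision function.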
(52) An example of an embodiment of the present invention is provided in
(53) Preferably up to twenty votes are collected for each match. If the rating falls below a predetermined threshold, the match is removed from the voting queue of the FaceDouble Web site. However, friends of the user that has provided the match can still vote on the match on the user's Web page.
(54) Code for the voting is as follows:
# updateVote(score, facedoubleId)
'updateVote' => UPDATE votes SET score=score+?, nvotes=nvotes+1, rank=CAST((CAST((score+50) AS DECIMAL) / (nvotes+10)) AS DECIMAL) WHERE facedoubleid=?
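The update statement can be exercised with a small self-contained sketch using Python's built-in `sqlite3` module. Several details here are assumptions rather than the source's implementation: the table schema, the helper names `create_votes_table` and `update_vote`, the use of REAL instead of DECIMAL (SQLite would otherwise perform integer division), and the decision to compute the rank from the post-vote score and vote count. SQLite evaluates every right-hand side against the row's values before the update, so the new vote is added explicitly inside the rank expression.

```python
import sqlite3

PRIOR_SCORE = 50  # C * m = 5 * 10, matching the constant in the statement
PRIOR_VOTES = 10  # m, the minimum number of votes


def create_votes_table(conn):
    """Create a hypothetical votes table with the columns the update uses."""
    conn.execute(
        'CREATE TABLE votes ('
        ' facedoubleid INTEGER PRIMARY KEY,'
        ' score INTEGER NOT NULL,'
        ' nvotes INTEGER NOT NULL,'
        ' "rank" REAL NOT NULL)'
    )


def update_vote(conn, score, facedouble_id):
    """Add one vote and recompute the weighted rank for a match."""
    conn.execute(
        'UPDATE votes'
        ' SET score = score + :s,'
        '     nvotes = nvotes + 1,'
        '     "rank" = CAST(score + :s + :ps AS REAL) / (nvotes + 1 + :pv)'
        ' WHERE facedoubleid = :id',
        {"s": score, "ps": PRIOR_SCORE, "pv": PRIOR_VOTES, "id": facedouble_id},
    )
    conn.commit()
```

A caller would open a connection, create or reuse the table, and call `update_vote(conn, 7, 59)` to record a vote of 7 for match 59.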
(55) Table One provides sample data for the voting:
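(56) TABLE ONE

| FACEDOUBLE Id | Score | nvotes | rank | funny | Funny votes | Funny rank |
|---|---|---|---|---|---|---|
| 0 | 62 | 81 | 1.23 | −1 | 1 | 0.71 |
| 59 | 1183 | 188 | 6.23 | 2 | 1 | 0.48 |
| 60 | 1783 | 368 | 4.85 | 0 | 0 | 0 |
| 62 | 1323 | 303 | 4.39 | 0 | 0 | 0 |
| 63 | 484 | 99 | 4.9 | 0 | 0 | 0 |
| 64 | 85 | 20 | 4.5 | 0 | 0 | 0 |
| 65 | 62 | 16 | 4.31 | 0 | 0 | 0 |
| 66 | 821 | 162 | 5.06 | 0 | 0 | 0 |
| 67 | 212 | 45 | 4.76 | 0 | 0 | 0 |
| 68 | 83 | 20 | 4.43 | 0 | 0 | 0 |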
(57) From the foregoing it is believed that those skilled in the pertinent art will recognize the meritorious advancement of this invention and will readily understand that while the present invention has been described in association with a preferred embodiment thereof, and other embodiments illustrated in the accompanying drawings, numerous changes, modifications and substitutions of equivalents may be made therein without departing from the spirit and scope of this invention, which is intended to be unlimited by the foregoing except as may appear in the following appended claims. Therefore, the embodiments of the invention in which an exclusive property or privilege is claimed are defined in the following appended claims.