Image classification and information retrieval over wireless digital networks and the internet
09798922 · 2017-10-24
CPC classification
G06V20/52
PHYSICS
G06V40/169
PHYSICS
Y10S707/99948
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G06V40/171
PHYSICS
Abstract
A method and system for matching an unknown facial image of an individual with an image of a celebrity using facial recognition techniques and human perception is disclosed herein. The invention provides an internet-hosted system to find, compare, contrast, and identify similar characteristics among two or more individuals using a digital camera, cellular telephone camera, or other wireless device, for the purpose of returning information regarding similar faces to the user. The system features classification of unknown facial images from a variety of internet-accessible sources, including mobile phones, wireless camera-enabled devices, images obtained from digital cameras or scanners that are uploaded from PCs, third-party applications, and databases. Once classified, the matching person's name, image, and associated meta-data are sent back to the user. The method and system use human perception techniques to weight the feature vectors.
Claims
1. A method for matching an unknown facial image with a known facial image, the method comprising: receiving the unknown facial image at an image classification server; processing the unknown facial image at the image classification server to create a first set of variables that represent one or more features of the unknown facial image; comparing the first set of variables to a second set of variables that represent one or more features of facial images among a plurality of known facial images; matching the first set of variables to a third set of variables that represent one or more features of a second known facial image in the second set of variables to create a matched set of variables that represent matching features between the unknown facial image and the second known facial image; determining weights to be assigned to one or more of the variables in the matched set of variables, wherein the weights are based on a human perception rating; and transmitting the second known facial image and a perception value based at least in part on the weights.
2. The method according to claim 1, wherein the first set of variables and each of the second set of variables are based on at least one of: a facial expression, a hair style, a hair color, a facial pose, an eye color, a texture of the face, a color of the face, and facial hair.
3. The method according to claim 1, wherein the image classification server comprises at least one of: an input module, a transmission engine, facial recognition software, an input feed, a feature vector database, a perception engine, and an output module.
4. The method according to claim 1, wherein the perception value ranges from 0% to 100%.
5. A method for matching an unknown facial image with a known facial image, the method comprising: receiving one or more unknown facial images of a person from a video camera at an image classification server; processing the one or more unknown facial images at the image classification server to create one or more sets of variables that represent one or more features of the one or more unknown facial images; when there are two or more sets of variables, combining the two or more sets of variables into a first set of variables that represent one or more features of the one or more unknown facial images; comparing the first set of variables to a second set of variables that represent one or more features of facial images among a plurality of known facial images stored in a database; matching the first set of variables to a third set of variables that represent one or more features of a second known facial image in the second set of variables to create a matched set of variables that represent matching features between the one or more unknown facial images and the second known facial image; determining a perception value to be assigned to one or more of the variables in the matched set of variables, wherein the perception value is based on a human perception rating; transmitting the second known facial image and the perception value to a computing device; and in response to receiving the second known facial image and the perception value from the computing device, transmitting the second known facial image and a confidence value of the second known facial image, based at least in part on the received perception value, to a video surveillance system.
6. The method of claim 5, further comprising adding the one or more unknown facial images to the database and adding the first set of variables that represent one or more features of the one or more unknown facial images to the database.
7. The method according to claim 5, wherein the first set of variables and each of the second set of variables are based on at least one of: a facial expression, a hair style, a hair color, a facial pose, an eye color, a texture of the face, a color of the face, and facial hair.
8. The method according to claim 7, wherein the second set of variables that represent one or more features of facial images among a plurality of known facial images further comprise at least one of: a distance between eyes, a distance between a center of the eyes to a chin, a size and a shape of eyebrows.
9. The method according to claim 5, wherein the image classification server comprises at least one of: an input module, a transmission engine, facial recognition software, an input feed, a feature vector database, a perception engine, and an output module.
10. The method according to claim 5, wherein the perception value ranges from 0% to 100%.
11. A method for matching an unknown facial image with a known facial image, the method comprising: receiving the unknown facial image at an image classification server; processing the unknown facial image at the image classification server to create a first set of variables that represent one or more features of the unknown facial image; comparing the first set of variables to a second set of variables that represent one or more features of facial images among a plurality of known facial images; matching the first set of variables to a third set of variables that represent one or more features of a second known facial image in the second set of variables to create a matched set of variables that represent matching features between the unknown facial image and the second known facial image; determining weights to be assigned to one or more of the variables in the matched set of variables, wherein the weights are based on a human perception rating; determining a perception value based at least in part on the weights; and transmitting the second known facial image based on the perception value.
12. The method according to claim 11, wherein the first set of variables and each of the second set of variables are based on at least one of: a hair style, a hair color, a facial pose, an eye color, a texture of the face, a color of the face, and facial hair.
13. The method according to claim 11, wherein the image classification server comprises at least one of: an input module, a transmission engine, facial recognition software, an input feed, a feature vector database, a perception engine, and an output module.
14. The method according to claim 11, wherein the perception value ranges from 0% to 100%.
15. The method according to claim 11, further comprising transmitting the perception value with the second known facial image.
16. A non-transitory computer-readable medium containing instructions which, when executed on a processor, perform an operation for matching an unknown facial image with a known facial image, the operation comprising: receiving the unknown facial image at an image classification server; processing the unknown facial image at the image classification server to create a first set of variables that represent one or more features of the unknown facial image; comparing the first set of variables to a second set of variables that represent one or more features of facial images among a plurality of known facial images; matching the first set of variables to a third set of variables that represent one or more features of a second known facial image in the second set of variables to create a matched set of variables that represent matching features between the unknown facial image and the second known facial image; determining weights to be assigned to one or more of the variables in the matched set of variables, wherein the weights are based on a human perception rating; determining a perception value based at least in part on the weights; and transmitting the second known facial image based on the perception value.
17. The non-transitory computer-readable medium according to claim 16, wherein the first set of variables and each of the second set of variables are based on at least one of: a facial expression, a hair style, a hair color, a facial pose, an eye color, a texture of the face, a color of the face, and facial hair.
18. The non-transitory computer-readable medium according to claim 16, wherein the image classification server comprises at least one of: an input module, a transmission engine, facial recognition software, an input feed, a feature vector database, a perception engine, and an output module.
19. The non-transitory computer-readable medium according to claim 16, wherein the perception value ranges from 0% to 100%.
20. The non-transitory computer-readable medium according to claim 16, further comprising transmitting the perception value with the second known facial image.
21. A method for matching an unknown facial image with a known facial image, the method comprising: receiving the unknown facial image at an image classification server; processing the unknown facial image at the image classification server to create a first set of variables that represent one or more features of the unknown facial image; comparing the first set of variables to a second set of variables that represent one or more features of facial images among a plurality of known facial images; matching the first set of variables to a third set of variables that represent one or more features of a second known facial image in the second set of variables to create a matched set of variables that represent matching features between the unknown facial image and the second known facial image; determining a confidence value assigned to one or more of the variables in the matched set of variables; and transmitting the second known facial image and the confidence value.
22. The method according to claim 21, wherein determining the confidence value further comprises: determining a perception value assigned to one or more of the variables in the matched set of variables; transmitting the second known facial image and the perception value to a computing device; and in response to receiving the second known facial image and the perception value from the computing device, determining a confidence value of the second known facial image, based at least in part on the received perception value.
23. The method according to claim 22, wherein determining the perception value further comprises: determining weights to be assigned to one or more of the variables in the matched set of variables, wherein the weights are based on a human perception rating.
24. The method according to claim 21, wherein receiving the unknown facial image further comprises: receiving the unknown facial image of a person from a video camera.
25. The method according to claim 21, further comprising: transmitting the second known facial image based on the confidence value to a video surveillance system.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
DETAILED DESCRIPTION OF THE INVENTION
(17) A flow chart of a preferred specific method of the present invention is illustrated in the accompanying drawings.
(18) At block 102, the facial image is transmitted over a network, preferably a wireless network, to an image classification server. The facial image is preferably sent to a male or female designation site at the image classification server. The facial image is subsequently sent over the internet using HTTP or e-mail to the image classification server. The facial image, preferably a compressed digital facial image such as a JPEG image, is sent to a wireless carrier as an MMS, an SMS, an SMTP message, or a WAP upload. Alternatively, the facial image is uploaded to a PC from a digital camera or scanner and then transferred over the internet to the image classification server as an e-mail attachment or HTTP upload.
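By way of illustration, the PC upload path described above could be exercised with a plain HTTP multipart upload. The Python sketch below is an assumption-laden example: the server URL and the form-field name are hypothetical placeholders, not part of the patent.

    # Illustrative HTTP upload of a JPEG facial image from a PC to the
    # image classification server. The URL and form-field name are
    # hypothetical placeholders, not part of the patent.
    import requests

    def upload_facial_image(jpeg_path, server_url="https://example.com/classify"):
        with open(jpeg_path, "rb") as f:
            resp = requests.post(server_url,
                                 files={"image": ("face.jpg", f, "image/jpeg")})
        resp.raise_for_status()
        return resp.json()   # matched images and meta-data, per the flow above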
(19) At block 103, the facial image is analyzed at the image classification server to determine if the facial image is of adequate quality to be processed for matching. Quality issues with the facial image include, but are not limited to, a poor pose angle, brightness, shading, closed eyes, sunglasses worn, obscured facial features, or the like. At block 104, an image determination is made concerning the quality of the image. A negative image determination is made at block 105. At block 106, a transmission is sent to the sender informing the sender that the facial image provided is inadequate and requesting that the sender provide a new facial image. The matching procedure for such a negative image may still continue, in which case the matched images are sent with an additional statement informing the sender that the image was of poor quality and that a better match may be possible with a higher quality image.
(20) At block 107, if the facial image is positive, then the facial image is processed at block 108. It should be noted that the facial image is previously unknown to the image classification server and this is the first time that the facial image has been analyzed by the image classification server. Thus, the method of the present invention involves processing an unknown image to find a match with facial images of other individuals, unlike typical facial recognition systems, which match an image of an individual against a known image of the same individual in the database. At block 108, processing of the image preferably comprises using an algorithm which includes a principal component analysis technique to decompose the face of the facial image into an average of a multitude of faces, otherwise known as the principal component, and a set of images that are the variance from the average face image, known as the additional components. Each face is reconstructed by multiplying the principal component and the additional components against a feature vector and adding the resulting images together; the resulting image reconstructs the original face of the facial image. Processing of the facial image accounts for factors such as facial hair, hair style, facial expression, the presence of accessories such as sunglasses, hair color, eye color, and the like. Essentially, a primary feature vector is created for the facial image.
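The principal component processing described above can be sketched in a few lines. The following Python fragment is a minimal eigenfaces-style illustration, assuming a set of aligned grayscale training faces; the function names and the number of components are inventions of this sketch, not the patented implementation.

    # Minimal eigenfaces-style sketch of the processing step described above.
    # Assumes a training set of aligned, grayscale face images; all names are
    # illustrative, not part of the patented method.
    import numpy as np

    def fit_face_space(train_images, n_components=50):
        """Compute the average face and the top principal components."""
        X = np.stack([img.ravel() for img in train_images]).astype(float)
        mean_face = X.mean(axis=0)                      # the "average face"
        U, S, Vt = np.linalg.svd(X - mean_face, full_matrices=False)
        components = Vt[:n_components]                  # the "additional components"
        return mean_face, components

    def to_feature_vector(image, mean_face, components):
        """Project a face onto the component space (the primary feature vector)."""
        return components @ (image.ravel().astype(float) - mean_face)

    def reconstruct(feature_vector, mean_face, components, shape):
        """Rebuild an approximation of the original face from its feature vector."""
        flat = mean_face + feature_vector @ components
        return flat.reshape(shape)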
(21) At block 109, the processed image, or primary feature vector, is compared to a plurality of database processed images, preferably located at the image classification server. During the comparison, the primary feature vector is compared to a plurality of database feature vectors which represent the plurality of database processed images. The database preferably includes at least 10,000 processed images, more preferably at least 50,000 processed images, and most preferably from 50,000 to 100,000 processed images. Those skilled in the pertinent art will recognize that the database may contain any number of images without departing from the scope and spirit of the present invention. The processed images preferably include multiple images of one individual, typically from two to twenty images, more preferably from four to ten images of a single individual in different poses, different facial expressions, different hair styles, and the like. The database of processed images preferably includes celebrities, including, but not limited to, actors, actresses, musicians, athletes, models, government officials, and other publicly well-known individuals. Again, it should be noted that the facial image sent by the sender is an unknown image which is being best matched to a known image.
(22) At block 110, the processed image undergoes raw matching against a small plurality of database images, each having a feature vector value close to the value of the primary feature vector. At block 110a, iterative processing of the raw matching is performed, wherein the human perception of what constitutes a good match is one of the primary factors in creating the matched images. At block 111, a perception value for the matched images is determined based on the feature vector values. The perception value ranges from 0% to 100%, with 100% being an ideal match. At block 111a, the matches are sorted based on predicted human perception.
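The raw matching and perception scoring of blocks 110 through 111 can be pictured as a nearest-neighbor search over the database feature vectors whose distances are mapped onto the 0% to 100% scale. The Euclidean metric and the linear distance-to-percentage mapping below are assumptions of this sketch, not choices specified by the patent.

    # Illustrative raw matching (blocks 110-111): find the database feature
    # vectors closest to the primary feature vector and map each distance to
    # a 0-100% perception value. Metric and scaling are assumptions.
    import numpy as np

    def raw_match(primary, database_vectors, top_k=10):
        """Return (database index, perception %) for the top_k closest images."""
        dists = np.linalg.norm(database_vectors - primary, axis=1)
        order = np.argsort(dists)[:top_k]
        worst = float(dists.max()) or 1.0          # avoid division by zero
        return [(int(i), 100.0 * (1.0 - dists[i] / worst)) for i in order]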
(23) At block 112, the matched images and the perception value are transmitted to the sender over a network, as discussed above for the initial transmission. The entire process preferably occurs within a time period of sixty seconds, and most preferably within ten seconds, although the process may be delayed by the wireless carrier and network carrier. In this manner, the sender will know which celebrity the facial image best matches. The output of the matched images and any additional text is preferably sent to the sender's wireless communication device for instantaneous feedback on the inquiry of which celebrity the facial image most resembles. Further, the output is also sent to a sender's web page on a web site hosted through the image classification server, wherein the sender can control access to the sender's web page and modify the matched images and the additional text. Further, the output is sent to a voting site as discussed below.
(24) At decision 113, the quality of the matched images is determined to decide whether the matched images should be sent to the voting site on the web site. At block 115, the matched images are sent to the sender's wireless communication device and to the sender's web page on the web site for viewing by the sender and other viewers determined by the sender. At block 114, the matched images are sent to the voting site if they are of sufficient quality, preferably based on the perception value, to be voted upon by visitors to the voting site.
(25) In this manner, a statistical modeling element is added to the matching process to better match images based on human perception, as determined by the scores for previously matched images on the voting site. In other embodiments, regression analysis or Bayesian analysis is utilized. Under this alternative scenario, a Support Vector Machine, or preferably a high-dimensional neural network, is fed the two feature vectors of a match, along with the average vote scores collected from viewers of the web site, to provide better matching of images. A more detailed explanation of a Support Vector Machine is set forth in Cortes, C. and Vapnik, V., "Support-Vector Networks," Machine Learning, 20(3):273-297, 1995, which is hereby incorporated by reference in its entirety. The previous voting patterns are implemented in a statistical model so that the algorithm captures the human perception element and better matches images as perceived by humans.
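A hedged sketch of that statistical modeling element follows, using scikit-learn's support vector regression. Concatenating the two feature vectors of a match and regressing against the average vote score are assumptions of this example, not the patent's specified construction.

    # Sketch of the statistical modeling element: learn human perception of
    # a match from voting data. Each sample concatenates the two feature
    # vectors of a match; the target is its average vote score. The library
    # (scikit-learn) and the pairing scheme are assumptions of this example.
    import numpy as np
    from sklearn.svm import SVR

    def train_perception_model(matches, vote_scores):
        """matches: list of (source_vector, candidate_vector) pairs."""
        X = np.array([np.concatenate([a, b]) for a, b in matches])
        y = np.array(vote_scores)                  # average vote per match
        model = SVR(kernel="rbf")
        model.fit(X, y)
        return model

    def predict_match_quality(model, source_vec, candidate_vec):
        return model.predict([np.concatenate([source_vec, candidate_vec])])[0]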
(26) A more general method of the present invention is illustrated in the accompanying drawings.
(27) A system of the present invention is illustrated in the accompanying drawings.
(28) The present invention preferably uses facial recognition software that is commercially or publicly available, such as the FACEIT brand software from IDENTIX, the FACEVACS brand software from COGNITEC, and others. Those skilled in the pertinent art will recognize that there are many facial recognition software packages, including those in the public domain, that may be used without departing from the scope and spirit of the present invention.
(29) The operational components of the image classification server 53 are schematically shown in the accompanying drawings.
(31) The present invention collects votes on match results for two primary purposes: 1) to determine which matches are the best according to human perception; and 2) to use the human perception measurement to improve the matching algorithm of the present invention.
(32) Votes are cast anonymously, preferably from the internet and/or from mobile telephones and similar communication devices. In order to develop a more consistent measurement of human perception for a particular match, a weighted average is used for the match. The weighted average is calculated as follows:
(33) W = (R·v + C·m) / (v + m)
(34) W = weighted rating
(35) R = average vote for the match as a number from 0 to 10 (mean) = (Rating)
(36) v = number of votes for the match = (votes)
(37) m = minimum votes required (currently 10)
(38) C = the mean vote across all matches (currently 5)
(39) The weighted rating W tends to equal 5 (the mean vote C) when few votes have been cast. As more votes are cast, the rating gradually shifts away from the mean vote and toward the average of the votes actually received. A preferred method of calculating the weighted rating is disclosed at www.IMDB.com.
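Transcribed directly, the weighted average defined in paragraphs (33) through (38) can be computed as follows; the function below simply mirrors those definitions.

    def weighted_rating(R, v, m=10, C=5.0):
        """Weighted rating W = (R*v + C*m) / (v + m).

        R: average vote for the match (0-10); v: number of votes cast;
        m: minimum votes required; C: mean vote across all matches.
        With few votes W stays near C; as v grows W approaches R.
        """
        return (R * v + C * m) / (v + m)

    # Example: a match rated 8.0 by only 3 voters is pulled toward the mean:
    # weighted_rating(8.0, 3) == (8.0*3 + 5.0*10) / (3 + 10) ≈ 5.69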
(40) The present invention uses voting results to improve the selection of images for enrollment. Enrollment is preferably defined as creating a feature vector for a celebrity image and adding that celebrity image feature vector to the database.
(41) Match ratings are applied to the matching algorithm as set forth below. The best matches by vote are used to build a test set. For each match in the test set, a fitness f is computed from the positions xᵢ at which the best-voted image appears in the returned match list:

f = Σᵢ 1/(xᵢ + 1)
(42) where x₀, x₁, etc. are the positions at which the match appears. If the match appears only in position 0, then f = 1/(0+1) = 1. If the matched actor had appeared in positions 0 and 2, for example, the fitness would be:
(43) f = 1/(0+1) + 1/(2+1) = 1 + 1/3 ≈ 1.33
(44) The total fitness is calculated by summing the fitness for each match in the test set:
(45) F = Σ f, summed over all matches in the test set
(46) Enrollment parameters are adjusted until the fitness function (F) is maximized. Enrollment parameters are a set of thresholds which determine whether an image of the actor should be used to create a feature vector and added to the database of target feature vectors that are matched against. The choice of which feature vectors are kept helps improve the algorithm results. These enrollment parameters measure the quality of the image, such as resolution, focus, brightness, exposure, shades of gray, confidence of the eye detector, etc. If the enrollment parameters are chosen too strictly, then there is insufficient variation of feature vectors in the database for each actor, and good matches are not found. If all images for an actor are used, too many variations of feature vectors exist and good matches likewise cannot be found. By optimizing the choice of enrollment parameters based on human perception voting, we get the set of feature vectors for an actor that best represents the actor as perceived by people.
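The tuning loop described above can be sketched as a simple search over candidate thresholds that keeps whichever setting maximizes the total fitness F. In the sketch below, enroll and run_matches are hypothetical caller-supplied callables standing in for the enrollment and matching machinery, not functions defined by the patent.

    # Sketch of enrollment tuning: search candidate quality thresholds and
    # keep the setting that maximizes total fitness F over a voted test set.
    # enroll and run_matches are hypothetical caller-supplied callables.

    def fitness(match_positions):
        """f = sum of 1/(x+1) over positions where the correct image appears."""
        return sum(1.0 / (x + 1) for x in match_positions)

    def tune_enrollment(test_set, param_grid, enroll, run_matches):
        """param_grid: iterable of enrollment-threshold settings.
        enroll(params) -> database built under those thresholds.
        run_matches(database, probe) -> positions of the correct image."""
        best_params, best_F = None, float("-inf")
        for params in param_grid:
            database = enroll(params)
            F = sum(fitness(run_matches(database, probe)) for probe in test_set)
            if F > best_F:
                best_params, best_F = params, F
        return best_params, best_F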
(47) The present invention also uses voting results to weigh feature vectors. In addition to using vote results to select which actor images are good for enrollment, vote results can also be used to weigh the feature vector itself, so that qualities of the image that are perceived by humans are weighted more heavily when searching for a good match. Biometric security software (Cognitec, Identix, etc.) selects and weighs the features of an image in order to match an image of a person to another image of the same person, optimizing the vector to achieve this result. The feature vector can be made up of local facial features, or of overall components of the face as determined by principal component analysis.
(48) The use of human perception voting results to optimize the look-alikeness of a person to a different person can be employed regardless of how the feature vectors are determined. In other words, the algorithm for determining the set of feature vectors that best represent a face can be augmented with a second algorithm which takes these feature vectors, typically represented as a vector of floating point numbers, and weighs the values in the vector so that the characteristics of the image that drive human perception are used more heavily.
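One plausible realization of that second, perception-driven weighting algorithm is a per-dimension weight applied inside the matching distance. The weight-learning rule below, which correlates each dimension's difference with vote outcomes, is an assumption of this sketch rather than the patented method.

    # Sketch: weigh feature-vector dimensions by how well they track human
    # perception votes, then use those weights in the matching distance.
    # The correlation-based weighting rule is illustrative only.
    import numpy as np

    def learn_weights(pairs, vote_fractions):
        """pairs: (source_vec, candidate_vec); vote_fractions: fraction of
        'good match' votes in [0, 1] for each pair."""
        diffs = np.array([np.abs(a - b) for a, b in pairs])
        votes = np.array(vote_fractions)
        # A dimension matters to perceived similarity when small differences
        # in it go with high vote fractions (negative correlation).
        corr = np.array([np.corrcoef(diffs[:, j], votes)[0, 1]
                         for j in range(diffs.shape[1])])
        weights = np.clip(-corr, 0.0, None)
        return weights / (float(weights.sum()) or 1.0)

    def weighted_distance(a, b, weights):
        return float(np.sqrt(np.sum(weights * (a - b) ** 2)))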
(49) Statistical methods such as neural networks or support vector machines (SVMs) can be fed the source and actor feature vectors in order to predict the human perception vote.
(51) Once trained, the neural network or SVM can predict whether a match is good or not from the feature vectors, which are determined by a separate algorithm.
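As a companion to the regression sketch in paragraph (25), the good/not-good prediction described here could be framed as binary classification. The MLPClassifier choice and the 0.5 vote threshold are assumptions of this example, not the patent's specified model.

    # Companion sketch: the good/not-good prediction as binary classification.
    # MLPClassifier and the 0.5 vote threshold are assumptions of this example.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_match_classifier(pairs, vote_fractions, threshold=0.5):
        """pairs: (source_vector, candidate_vector); vote_fractions in [0, 1]."""
        X = np.array([np.concatenate([a, b]) for a, b in pairs])
        y = (np.array(vote_fractions) >= threshold).astype(int)
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(X, y)
        return clf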
(52) An example of an embodiment of the present invention is provided in the accompanying drawings.
(53) Preferably, up to twenty votes are collected for each match. If the rating falls below a predetermined threshold, the match is removed from the voting queue of the FaceDouble web site. However, friends of the user who provided the match can still vote on the match on the user's web page.
(54) Code for the voting is as follows, where rank = (score + 50) / (nvotes + 10), consistent with the rank column of Table One:

    # updateVote(score, facedoubleId)
    'updateVote' => UPDATE votes
                    SET score = score + ?,
                        nvotes = nvotes + 1,
                        rank = CAST((CAST((score + 50) AS DECIMAL) / (nvotes + 10)) AS DECIMAL)
                    WHERE facedoubleid = ?
(55) Table One provides sample data for the voting:
(56) TABLE ONE

    FACEDOUBLE Id   Score   nvotes   rank   funny   funny votes   funny rank
                0      62       81   1.23      -1             1         0.71
               59    1183      188   6.23       2             1         0.48
               60    1783      368   4.85       0             0         0
               62    1323      303   4.39       0             0         0
               63     484       99   4.9        0             0         0
               64      85       20   4.5        0             0         0
               65      62       16   4.31       0             0         0
               66     821      162   5.06       0             0         0
               67     212       45   4.76       0             0         0
               68      83       20   4.43       0             0         0
(57) From the foregoing, it is believed that those skilled in the pertinent art will recognize the meritorious advancement of this invention and will readily understand that, while the present invention has been described in association with a preferred embodiment thereof, and other embodiments illustrated in the accompanying drawings, numerous changes, modifications and substitutions of equivalents may be made therein without departing from the spirit and scope of this invention, which is intended to be unlimited by the foregoing except as may appear in the following appended claims. Therefore, the embodiments of the invention in which an exclusive property or privilege is claimed are defined in the following appended claims.