Patent classifications
G06F16/5846
WITHHOLDING NOTIFICATIONS DUE TO TEMPORARY MISPLACED PRODUCTS
A system for processing images captured in a retail store and automatically identifying misplaced products is provided. The system may comprise at least one processor configured to receive one or more images captured by one or more image sensors from an environment of a retail store, detect in the one or more images a first product, determine that the first product is not located in a first correct display location, cause an issuance of a user-notification associated with the first product, detect in the one or more images a second product, determine that the second product is not located in a second correct display location, and, after determining that the second product is not located in the second correct display location and when a second urgency level associated with the second product is lower than a first urgency level associated with the first product, withhold issuance of a user-notification associated with the second product.
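The withholding rule in this abstract can be illustrated with a minimal sketch. The abstract does not say how urgency levels are computed or represented, so the numeric `urgency` field, the `MisplacedProduct` class, and both function names are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MisplacedProduct:
    name: str
    urgency: int  # hypothetical scale: a larger number means more urgent


def should_withhold(first: MisplacedProduct, second: MisplacedProduct) -> bool:
    """Withhold the second notification when its urgency is lower than the first."""
    return second.urgency < first.urgency


def notifications(first: MisplacedProduct, second: MisplacedProduct) -> list:
    """Always notify for the first product; notify for the second unless withheld."""
    issued = [f"Misplaced: {first.name}"]
    if not should_withhold(first, second):
        issued.append(f"Misplaced: {second.name}")
    return issued
```

For example, a misplaced frozen item with urgency 9 followed by a canned item with urgency 2 would produce only the first notification under these assumptions.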
SYNCHRONIZATION AND TAGGING OF IMAGE AND TEXT DATA
A computing system accesses an image-based document and a text document having text extracted from the image-based document and provides a user interface displaying at least a portion of the image-based document. In response to selection of a text portion of the image-based document, the system determines an occurrence of the text portion within at least a portion of the image-based document and then applies a search model on the text document to identify the same occurrence of the text portion. Once matched, alignment data indicating a relationship between a selected tag and both the text portion of the image-based document and the text portion of the text document is stored.
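One way to picture the occurrence matching is the sketch below. It replaces the abstract's "search model" with plain substring search, which is an assumption, and it treats the image-based document as already having a text layer with character offsets. The function names and the shape of the alignment record are hypothetical.

```python
def occurrence_index(text: str, phrase: str, selected_offset: int) -> int:
    """Count how many occurrences of phrase start before the selected offset."""
    count = 0
    pos = text.find(phrase)
    while pos != -1 and pos < selected_offset:
        count += 1
        pos = text.find(phrase, pos + 1)
    return count


def find_nth(text: str, phrase: str, n: int) -> int:
    """Return the offset of the n-th (0-based) occurrence of phrase, or -1."""
    pos = -1
    for _ in range(n + 1):
        pos = text.find(phrase, pos + 1)
        if pos == -1:
            return -1
    return pos


def align(tag: str, image_text: str, text_doc: str, phrase: str, selected_offset: int) -> dict:
    """Build an alignment record linking a tag to the same occurrence in both documents."""
    n = occurrence_index(image_text, phrase, selected_offset)
    return {
        "tag": tag,
        "phrase": phrase,
        "image_offset": selected_offset,
        "text_offset": find_nth(text_doc, phrase, n),
    }
```

Matching by occurrence number keeps the tag attached to the right instance when the same phrase appears several times, even if the extracted text has different offsets than the image layer.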
AUTOMATIC CREATION OF A TILED IMAGE BASED ON USER INTERESTS
An embodiment for creating a tiled image using different zoom levels based on user interests is provided. The embodiment may include receiving one or more photographs captured by a user. The embodiment may also include analyzing features associated with the one or more photographs. The embodiment may further include identifying one or more known objects of interest. The embodiment may also include, in response to determining that the user wants to take a tiled zoom photograph, capturing a photographic image. The embodiment may further include scanning the photographic image to identify features of the one or more known objects of interest. The embodiment may also include, in response to determining that at least one object of interest is found in the scanned photographic image, assembling the tiled zoom photograph with a plurality of tiles.
TECHNIQUES FOR IDENTIFYING QUOTATIONS IN IMAGES POSTED TO A FEED
Described herein are techniques for using supervised machine learning to determine whether an image that has been posted to a feed of an online service includes a quotation. In some instances, supervised machine learning techniques are used to infer or predict an intent of a content poster in posting a content item to a feed of an online service. By better understanding the nature of the content being posted, various recommendations can be made during the time when an end-user is posting content, and thereafter.
SYSTEMS AND METHODS FOR VISION-LANGUAGE DISTRIBUTION ALIGNMENT
Embodiments described herein provide a CROss-Modal Distribution Alignment (CROMDA) model for vision-language pretraining, which can be used for downstream retrieval tasks. In the CROMDA model, global cross-modal representations are aligned on each unimodality. Specifically, a uni-modal global similarity between an image/text and the image/text feature queue is computed. A softmax-normalized distribution is then generated based on the computed similarity. The distribution thus takes advantage of the global structure of the queue. CROMDA then aligns the two distributions and learns a modal-invariant global representation. In this way, CROMDA is able to obtain an invariant property in each modality, where images with similar text representations should be similar and vice versa.
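The distribution alignment step can be sketched in a few lines of plain Python. The abstract does not name the divergence used to align the two distributions, so the symmetric KL divergence below is an assumption, as are the temperature value, the dot-product similarity, and all function names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of similarity scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def queue_distribution(feature, queue, temperature=0.07):
    """Softmax-normalized similarity between one feature and a same-modality queue."""
    return softmax([dot(feature, q) for q in queue], temperature)


def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) with a small epsilon for stability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(img_feat, txt_feat, img_queue, txt_queue, temperature=0.07):
    """Symmetric KL between the image-side and text-side queue distributions.

    Assumes the two queues hold paired samples in the same order.
    """
    p_img = queue_distribution(img_feat, img_queue, temperature)
    p_txt = queue_distribution(txt_feat, txt_queue, temperature)
    return 0.5 * (kl(p_img, p_txt) + kl(p_txt, p_img))
```

When an image and its paired text induce the same distribution over their queues, the loss is zero. The loss grows as the two modalities disagree about which queued samples are similar.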
Correlating image annotations with foreground features
A machine may be configured to execute a machine-learning process for identifying and understanding fine properties of various items of various types by using images and their corresponding annotations, such as titles, captions, tags, keywords, or other textual information applied to these images. By use of a machine-learning process, the machine may perform property identification accurately and without human intervention. These item properties may be used as annotations for other images that have similar features. Accordingly, the machine may answer user-submitted questions, such as “What do rustic items look like?”, and items or images depicting items that are deemed to be rustic can be readily identified, classified, ranked, or any suitable combination thereof.
Multi-modal differential search with real-time focus adaptation
Multi-modal differential search with real-time focus adaptation techniques are described that overcome the challenges of conventional techniques in a variety of ways. In one example, a model is trained to support a visually guided machine-learning embedding space that supports visual intuition as to “what” is represented by text. The visually guided language embedding space supported by the model, once trained, may then be used to support visual intuition as part of a variety of functionality. In one such example, the visually guided language embedding space as implemented by the model may be leveraged as part of a multi-modal differential search to support search of digital images and other digital content with real-time focus adaptation which overcomes the challenges of conventional techniques.
System and Method for Selecting Sponsored Images to Accompany Text
A system is provided for selecting an image to accompany text from a user in connection with a social media post. Operation of the system includes receiving text from the user; identifying one or more search terms based on the text; identifying candidate images from images in one or more image databases using the search terms, where the candidate images comprise a sponsored image; presenting one or more candidate images to the user, where the sponsored image is presented preferentially compared to other candidate images; receiving from the user a selected image from the one or more candidate images; generating the social media post comprising the selected image and the user-submitted text; and transmitting the social media post for display.
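The term extraction and preferential presentation steps might look like the sketch below. The stop-word list, the tag-based matching, and the dictionary layout of an image record (`id`, `tags`, `sponsored`) are all assumptions. The abstract does not specify how search terms are derived or how preference is expressed, so here "preferentially" is read as "listed first".

```python
# Hypothetical stop-word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "at", "in", "on", "of", "my", "with", "and"}


def search_terms(text):
    """Derive search terms from user text by dropping punctuation and stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def rank_candidates(images, terms):
    """Keep images whose tags match any term, listing sponsored images first.

    sorted() is stable, so non-sponsored images keep their original order.
    """
    matches = [img for img in images if terms & img["tags"]]
    return sorted(matches, key=lambda img: not img["sponsored"])
```

Sorting on `not img["sponsored"]` places sponsored images (key `False`) ahead of the rest without otherwise reordering the candidates.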
Implicit Coordinates and Local Neighborhood
A system and method are disclosed for using a local neighborhood for determining similar targets in different documents or using implicit coordinates for obtaining a coordinate location of a target. The local neighborhood method may include identifying a first target in a first document; identifying one or more first elements within a first distance range from the first target; creating a first local neighborhood based on the identifying; determining that the first local neighborhood is similar to a third local neighborhood in a second document; and determining a second target in the second document that corresponds to the first target in the first document, based on the determined similarity. The implicit coordinates method may include performing OCR on the first document to find the first target; and obtaining a first location of the first target by using at least one of OCR or element recognition.
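The local neighborhood method can be sketched as follows. The abstract does not define the similarity measure, so Jaccard similarity over the text of nearby elements is an assumption, as are the `(text, x, y)` element representation, the Euclidean distance, the `min_sim` cutoff, and the function names.

```python
import math


def local_neighborhood(target, elements, max_dist):
    """Texts of the elements lying within max_dist of the target (target excluded)."""
    _, tx, ty = target
    return {
        text
        for (text, x, y) in elements
        if (text, x, y) != target and math.hypot(x - tx, y - ty) <= max_dist
    }


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_corresponding_target(target, doc1, doc2, max_dist, min_sim=0.5):
    """Return the element of doc2 whose neighborhood best matches the target's."""
    reference = local_neighborhood(target, doc1, max_dist)
    best, best_sim = None, 0.0
    for candidate in doc2:
        sim = jaccard(reference, local_neighborhood(candidate, doc2, max_dist))
        if sim > best_sim:
            best, best_sim = candidate, sim
    return best if best_sim >= min_sim else None
```

Because only the surrounding elements are compared, a field labeled "Total" in one document can be matched to one labeled "Sum" in another when both sit next to the same neighbors.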
Image processing apparatus, image processing system, control method thereof, and storage medium
An image processing apparatus according to the present disclosure automatically transmits a document file by using a result of a character recognition process on a scan image of a document as a property, and includes at least one processor that executes a program to perform: extracting a confidence factor indicating a degree of certainty of the result of the character recognition process; in a case where the extracted confidence factor is above a predetermined threshold value, determining that the document file using the result of the character recognition process as the property is allowed to be automatically transmitted; and setting the predetermined threshold value such that an incorrect transmission ratio of document files to be automatically transmitted reaches a target incorrect transmission ratio.
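A minimal sketch of the threshold-setting idea is given below. The abstract does not describe how the threshold is tuned, so this assumes a history of `(confidence, was_correct)` pairs is available and picks the lowest threshold whose incorrect transmission ratio does not exceed the target. All names are hypothetical.

```python
def incorrect_ratio(samples, threshold):
    """Share of incorrect results among files that would be auto-transmitted."""
    sent = [ok for conf, ok in samples if conf > threshold]
    if not sent:
        return 0.0
    return sum(1 for ok in sent if not ok) / len(sent)


def choose_threshold(samples, target_ratio):
    """Lowest candidate threshold whose incorrect ratio is within the target.

    Candidates are 0.0 plus every observed confidence value. The highest
    observed value always qualifies because nothing exceeds it.
    """
    candidates = sorted({conf for conf, _ in samples} | {0.0})
    for t in candidates:
        if incorrect_ratio(samples, t) <= target_ratio:
            return t
    return candidates[-1]


def may_auto_transmit(confidence, threshold):
    """Allow automatic transmission only when confidence is above the threshold."""
    return confidence > threshold
```

Choosing the lowest qualifying threshold maximizes the number of files sent automatically while keeping the estimated error ratio at or below the target.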