Patent classifications
G06F16/56
Indexing key frames for localization
A mobile client device is localized based on a captured image by identifying which of a set of known locations the device is in. The set of known locations is associated with a set of regions, where each region is associated with a set of key frames representing the important features of the region. Latent vectors and keypoints are calculated for each of the key frames and for an image captured by the client device. The system compares the latent vectors of the captured image to the latent vectors associated with the regions to determine a subset of similar regions, then compares the keypoints of the captured image to the keypoints associated with the regions in the subset to determine a best match. The best-matching region is taken as the location of the client device and may be combined with other localization information to maintain localization of the device.
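The two-stage matching described above (latent-vector shortlisting followed by keypoint comparison) can be sketched as follows. The `Region` class, the `localize` function, and the use of cosine similarity and keypoint-set overlap are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Region:
    name: str
    latent: list   # latent vector summarizing the region's key frames
    keypoints: set  # hashable keypoint descriptors from the key frames

def localize(query_latent, query_keypoints, regions, shortlist=2):
    # Stage 1: shortlist the regions whose latent vectors are most similar
    # to the captured image's latent vector.
    ranked = sorted(regions, key=lambda r: cosine(query_latent, r.latent),
                    reverse=True)[:shortlist]
    # Stage 2: within the shortlist, pick the region sharing the most
    # keypoints with the captured image.
    return max(ranked, key=lambda r: len(query_keypoints & r.keypoints))
```

In this sketch the coarse latent comparison prunes the search space cheaply, so the more expensive keypoint comparison only runs on a few candidate regions.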
MULTI-TASK DEEP HASH LEARNING-BASED RETRIEVAL METHOD FOR MASSIVE LOGISTICS PRODUCT IMAGES
The present disclosure provides a multi-task deep hash learning-based retrieval method for massive logistics product images. Following the idea of multi-task learning, hash codes of a plurality of lengths are learned simultaneously as high-level image representations. Compared with single-task learning in the prior art, the method avoids shortcomings such as wasted hardware resources and the high time cost of retraining a model for each code length. Compared with the traditional approach of learning a single hash code as the image representation and using it for retrieval, information associations among hash codes of a plurality of lengths are mined, and a mutual-information loss is designed to improve the representational capacity of the hash codes. This addresses the poor representational capacity of a single hash code and thus improves retrieval performance.
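One way to picture hash codes of several lengths sharing one representation is the toy sketch below. The real method trains a deep network with a mutual-information loss, which is omitted here; the truncate-and-threshold scheme is purely an assumption for illustration.

```python
def binarize(embedding, length):
    """Derive a hash code of the requested length from one shared
    embedding by truncating and thresholding at zero (toy scheme)."""
    return tuple(1 if v > 0 else 0 for v in embedding[:length])

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_emb, database_embs, length):
    """Rank database items by Hamming distance at the chosen code
    length; short codes are fast, long codes are more precise."""
    q = binarize(query_emb, length)
    codes = [binarize(e, length) for e in database_embs]
    return sorted(range(len(codes)), key=lambda i: hamming(q, codes[i]))
```

Because every code length is read off the same embedding, adding a new length does not require retraining a separate model, which mirrors the resource-saving argument in the abstract.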
SCENE GRAPH EMBEDDINGS USING RELATIVE SIMILARITY SUPERVISION
Systems and methods for image processing are described. One or more embodiments of the present disclosure identify an image including a plurality of objects, generate a scene graph of the image including a node representing an object and an edge representing a relationship between two of the objects, generate a node vector for the node, wherein the node vector represents semantic information of the object, generate an edge vector for the edge, wherein the edge vector represents semantic information of the relationship, generate a scene graph embedding based on the node vector and the edge vector using a graph convolutional network (GCN), and assign metadata to the image based on the scene graph embedding.
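A minimal sketch of the node/edge message-passing idea follows, assuming a single averaging round and mean-pooling in place of the trained GCN; all function names and the aggregation rule are illustrative.

```python
def embed_scene_graph(node_vecs, edges):
    """One round of message passing over (src, edge_vec, dst) triples,
    then mean-pooling node states into a single graph embedding.
    A toy stand-in for the GCN described in the abstract."""
    dim = len(next(iter(node_vecs.values())))
    msgs = {n: [0.0] * dim for n in node_vecs}
    counts = {n: 0 for n in node_vecs}
    # Each edge sends the source node vector plus the edge (relationship)
    # vector to the destination node.
    for src, edge_vec, dst in edges:
        for i in range(dim):
            msgs[dst][i] += node_vecs[src][i] + edge_vec[i]
        counts[dst] += 1
    # Update each node by averaging its own state with its mean message.
    updated = {}
    for n, vec in node_vecs.items():
        if counts[n]:
            updated[n] = [(vec[i] + msgs[n][i] / counts[n]) / 2
                          for i in range(dim)]
        else:
            updated[n] = list(vec)
    # Mean-pool node states into one fixed-size embedding for the image.
    return [sum(updated[n][i] for n in updated) / len(updated)
            for i in range(dim)]
```

The key point the sketch preserves is that relationship (edge) vectors participate in the aggregation, so two images with the same objects but different relationships get different embeddings.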
DECOMPOSITIONAL LEARNING FOR COLOR ATTRIBUTE PREDICTION
The present disclosure describes a model for large scale color prediction of objects identified in images. Embodiments of the present disclosure include an object detection network, an attention network, and a color classification network. The object detection network generates object features for an object in an image and may include a convolutional neural network (CNN), region proposal network, or a ResNet. The attention network generates an attention vector for the object based on the object features, wherein the attention network takes as input a query vector based on the object features, a plurality of key vectors, and a plurality of value vectors corresponding to a plurality of colors. The color classification network generates a color attribute vector based on the attention vector, wherein the color attribute vector indicates the probability that the object includes each of the plurality of colors.
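The query/key/value step can be sketched as standard scaled dot-product attention over per-color key and value vectors. The function below is a hedged stand-in for the trained attention network, not the patented implementation.

```python
import math

def attend_colors(query, color_keys, color_values):
    """Scaled dot-product attention of an object query over per-color
    key/value pairs. Returns the per-color attention weights and the
    resulting attention vector; names here are illustrative."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in color_keys]
    # Numerically stable softmax over the color scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention vector: weighted sum of the per-color value vectors.
    dim = len(color_values[0])
    attn = [sum(w * v[i] for w, v in zip(weights, color_values))
            for i in range(dim)]
    return weights, attn
```

A downstream classifier could then map the attention vector to independent per-color probabilities (e.g. with a sigmoid per color), matching the abstract's description of an object possibly including several colors.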
Information recommendation method, computer device, and storage medium
Information recommendation methods are provided. Image information corresponding to an image is obtained by processing circuitry. The image is associated with a user identifier. A user tag set corresponding to the user identifier and the image information is generated. A feature vector corresponding to user tags in the user tag set and the image information is formed. The feature vector is processed according to a trained information recommendation model, to obtain a recommendation parameter of to-be-recommended information. A recommendation of the to-be-recommended information is provided to a terminal corresponding to the user identifier according to the recommendation parameter.
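As a rough illustration of turning user tags and image information into a recommendation parameter, the sketch below uses a simple linear model with a sigmoid in place of the trained recommendation model; all weights and feature names are hypothetical.

```python
import math

def recommend_score(user_tags, image_info, tag_weights, info_weights,
                    bias=0.0):
    """Toy stand-in for the trained recommendation model: a linear
    score over tag features and numeric image-information features,
    squashed to (0, 1) as the recommendation parameter."""
    score = bias
    # Tag features: each tag present contributes its learned weight.
    score += sum(tag_weights.get(t, 0.0) for t in user_tags)
    # Image-information features: weighted numeric values.
    score += sum(info_weights.get(k, 0.0) * v
                 for k, v in image_info.items())
    return 1.0 / (1.0 + math.exp(-score))
```

The resulting parameter could be compared against a threshold to decide whether the to-be-recommended information is sent to the user's terminal, mirroring the flow in the abstract.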
Methods and systems for depth-aware image searching
Embodiments provide systems, methods, and non-transitory computer storage media for providing search result images based on associations of keywords and depth-levels of an image. In embodiments, depth-levels of an image are identified using depth-map information of the image to identify depth-segments of the image. The depth-segments are analyzed to determine keywords associated with each depth-segment based on objects, features, or content in each depth-segment. An image depth-level data structure is generated by matching keywords generated for the entire image with the keywords at each depth-level and assigning each matched depth-level to the corresponding keyword in the data structure. The image depth-level data structure may be queried for images whose keywords and depth-level information match the keywords and depth-level information specified in a search query.
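The keyword-to-depth-level data structure can be sketched as a simple inverted index; the dictionary layout and function names below are assumptions for illustration, not the patent's structure.

```python
def build_depth_index(image_keywords, depth_segments):
    """Map each whole-image keyword to the depth levels whose segment
    keywords mention it. `depth_segments` is {level: set_of_keywords}."""
    index = {}
    for kw in image_keywords:
        levels = {lvl for lvl, kws in depth_segments.items() if kw in kws}
        if levels:  # keywords not found in any segment are dropped
            index[kw] = levels
    return index

def matches_query(index, keyword, depth_level):
    """True if the image's index places `keyword` at `depth_level`,
    i.e. the image satisfies a (keyword, depth) search constraint."""
    return depth_level in index.get(keyword, set())
```

A query such as "person in the foreground" would then reduce to checking the keyword "person" against the shallowest depth level, without rescanning pixels at search time.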