G06V10/464

Texture Based Fusion For Images With Cameras Having Differing Modalities

Techniques for generating an enhanced image. A first image is generated using a first camera of a first modality, and a second image is generated using a second camera of a second modality. Pixels that are common between the two images are identified. Textures for the common pixels are determined. Saliencies of the two images are determined, where the saliencies reflect amounts of texture variation present in those images. An alpha map is generated and reflects edge detection weights that have been computed for each one of the common pixels based on the two saliencies. A determination is made as to how much texture from the first and/or second images to use to generate an enhanced image. This determining process is based on the edge detection weights included within the alpha map. Based on the edge detection weights, textures are merged from the common pixels to generate the enhanced image.
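
The per-pixel blend described above can be sketched as follows. This is a minimal illustration, not the patented method: it uses local variance as a stand-in for the texture/saliency measure (the abstract does not fix a specific one) and a soft per-pixel alpha map in place of the edge-detection weights.

```python
import numpy as np

def local_variance(img, k=3):
    """Per-pixel texture measure: variance over a k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.var(axis=(-1, -2))

def fuse(img_a, img_b, eps=1e-8):
    """Blend two co-registered images, weighting each common pixel
    toward the image with more local texture (saliency)."""
    sal_a = local_variance(img_a)
    sal_b = local_variance(img_b)
    alpha = sal_a / (sal_a + sal_b + eps)   # per-pixel weight in [0, 1]
    return alpha * img_a + (1 - alpha) * img_b
```

Where one modality carries no texture at a pixel, its weight collapses to zero and the enhanced image takes its texture entirely from the other modality.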

VIRTUAL USER INPUT CONTROLS IN A MIXED REALITY ENVIRONMENT
20220083198 · 2022-03-17 ·

A wearable display system can automatically recognize a physical remote, or a device that the remote serves, using computer vision techniques. The wearable system can generate a virtual remote with a virtual control panel viewable and interactable by a user of the wearable system. The virtual remote can emulate the functionality of the physical remote. The user can select a virtual remote for interaction, for example, by looking or pointing at the parent device or its remote control, or by selecting from a menu of known devices. The virtual remote may include a virtual button, which is associated with a volume in the physical space. The wearable system can detect that a virtual button is actuated by determining whether a portion of the user's body (e.g., the user's finger) has penetrated the volume associated with the virtual button.
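
The actuation test at the end reduces to a point-in-volume check. A minimal sketch, assuming the button's volume is an axis-aligned box (the abstract does not specify the volume's shape):

```python
def button_pressed(finger, center, half_extent):
    """Return True when a tracked fingertip position lies inside the
    axis-aligned box volume associated with a virtual button.
    `half_extent` is half the box edge length along each axis."""
    return all(abs(f - c) <= half_extent for f, c in zip(finger, center))
```

The wearable system would run this check each frame against the tracked fingertip position of the user.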

MAP CONSTRUCTING METHOD, POSITIONING METHOD AND WIRELESS COMMUNICATION TERMINAL
20220114750 · 2022-04-14 ·

According to embodiments of the present disclosure, a map constructing method, a positioning method, and a wireless communication terminal are provided. The map constructing method includes: acquiring a series of environment images of a current environment; obtaining first image feature information of the environment images, where the first image feature information includes feature point information and descriptor information; performing, based on the first image feature information, feature point matching on the environment images to select keyframe images; acquiring depth information of matched feature points in the keyframe images based on the feature point information; and generating map data of the current environment based on the keyframe images, where the map data includes the image feature information and the depth information of the keyframe images.
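
The feature-point-matching step can be sketched with nearest-neighbour descriptor matching plus a ratio test; the keyframe threshold is illustrative, and the abstract does not commit to any particular matcher:

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.75):
    """Nearest-neighbour descriptor matching with a ratio test: keep a
    match only when the best distance clearly beats the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        if len(order) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

def is_keyframe(matches, min_matches=30):
    """Promote a frame to keyframe when it shares enough matched
    feature points with the reference image (threshold illustrative)."""
    return len(matches) >= min_matches
```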

Generative adversarial network based modeling of text for natural language processing

Mechanisms are provided to implement a generative adversarial network (GAN) for natural language processing. With these mechanisms, a generator neural network of the GAN is configured to generate a bag-of-ngrams (BoN) output based on a noise vector input, and a discriminator neural network of the GAN is configured to receive a BoN input, where the BoN input is either the BoN output from the generator neural network or a BoN input associated with an actual portion of natural language text. The mechanisms further configure the discriminator neural network of the GAN to output an indication of a probability as to whether the input BoN is from the actual portion of natural language text or is the BoN output of the generator neural network. Moreover, the mechanisms train the generator neural network and discriminator neural network based on a feedback mechanism that compares the output indication from the discriminator neural network to an indicator of whether the input BoN is from the actual portion of natural language text or the BoN output of the generator neural network.
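
The representation both networks exchange is a bag-of-ngrams. A minimal sketch of building a BoN from real text (tokenization and the choice of n are assumptions; the abstract leaves them open):

```python
from collections import Counter

def bag_of_ngrams(text, n=2):
    """Build a bag-of-ngrams (BoN): an unordered multiset of the
    n-grams occurring in the text, with their counts."""
    tokens = text.lower().split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)
```

A real BoN from this function plays the role of the "actual text" input to the discriminator; the generator's job is to emit count vectors that are indistinguishable from such bags.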

Framework for training machine-learned models on extremely large datasets

A MapReduce-based training framework exploits both data parallelism and model parallelism to scale training of complex models. Particular model architectures facilitate and benefit from use of such training framework. As one example, a machine-learned model can include a shared feature extraction portion configured to receive and process a data input to produce an intermediate feature representation and a plurality of prediction heads that are configured to receive and process the intermediate feature representation to respectively produce a plurality of predictions. For example, the data input can be a video and the plurality of predictions can be a plurality of classifications for content of the video (e.g., relative to a plurality of classes).
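
The shared-trunk-plus-heads architecture can be sketched as below. All shapes and layer choices are illustrative; the point is that the intermediate feature representation is computed once and reused by every prediction head, which is what makes the model amenable to model parallelism:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_extractor(x, w):
    """Shared feature-extraction portion: one dense layer with ReLU
    producing the intermediate feature representation."""
    return np.maximum(x @ w, 0.0)

def head(features, w):
    """One prediction head: a linear read-out over the shared features
    (e.g., logits over one set of classes)."""
    return features @ w

# Hypothetical shapes: 16-dim input, 8-dim shared features, 3 heads of 4 classes.
w_shared = rng.normal(size=(16, 8))
w_heads = [rng.normal(size=(8, 4)) for _ in range(3)]

x = rng.normal(size=(2, 16))               # batch of 2 data inputs
feats = shared_extractor(x, w_shared)      # computed once
preds = [head(feats, w) for w in w_heads]  # reused by every head
```

In the video example, each head would score the same intermediate representation against a different set of content classes.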

Retail store with sensor-fusion enhancements

In one aspect, a retail store includes a multitude of cameras, including a plurality of 3D cameras, and a plurality of other cameras. Certain of the cameras provide imagery from which a shopper's track through the store is monitored, and certain of the cameras are positioned to detect removal of items from store shelves. The store also includes a computer system that provides a database of information about store layout, indicating stock locations of different items. The computer system receives imagery from the cameras (or information derived from such imagery) and uses this data, together with information from the database and information derived from other sensors in the store, to produce a probabilistic tally of items selected by a store shopper. This tally includes an item bearing a barcode, but is produced without reading the barcode. Each item on the tally is associated with a confidence score that meets a computer system-determined threshold. A great number of other features and arrangements are also detailed.
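
The probabilistic tally can be sketched as fusing per-sensor confidences and thresholding, with all constants illustrative (the abstract does not disclose a fusion rule):

```python
def fused_confidence(scores):
    """Naive fusion of independent sensor confidences: the complement
    of the probability that every sensor missed the item."""
    miss = 1.0
    for s in scores:
        miss *= (1.0 - s)
    return 1.0 - miss

def tally(detections, threshold=0.9):
    """Keep only hypothesized items whose fused confidence score
    meets the system-determined threshold."""
    return [(item, score) for item, score in detections if score >= threshold]
```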

DIAGNOSTIC TOOL FOR DEEP LEARNING SIMILARITY MODELS
20220101035 · 2022-03-31 ·

A diagnostic tool for deep learning similarity models and image classifiers provides valuable insight into neural network decision-making. A disclosed solution generates a saliency map by: receiving a baseline image and a test image; determining, with a convolutional neural network (CNN), a first similarity between the baseline image and the test image; based on at least determining the first similarity, determining, for the test image, a first activation map for at least one CNN layer; based on at least determining the first similarity, determining, for the test image, a first gradient map for the at least one CNN layer; and generating a first saliency map as an element-wise function of the first activation map and the first gradient map. Some examples further determine a region of interest (ROI) in the first saliency map, crop the test image to an area corresponding to the ROI, and determine a refined similarity score.
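
The saliency and ROI steps can be sketched as follows. The abstract leaves the element-wise function open; ReLU of the activation-gradient product, in the spirit of Grad-CAM-style attributions, is used here as one plausible choice, and the ROI rule is illustrative:

```python
import numpy as np

def saliency_map(activation, gradient):
    """One element-wise function of the activation map and gradient
    map: keep only positively attributed responses."""
    return np.maximum(activation * gradient, 0.0)

def roi_bbox(sal, frac=0.5):
    """Bounding box of pixels at or above a fraction of the peak
    saliency; the test image can be cropped to this area before
    recomputing a refined similarity score."""
    ys, xs = np.nonzero(sal >= frac * sal.max())
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())
```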

Cart-based shopping arrangements employing probabilistic item identification

In one aspect, a retail store has multiple sensors, including item sensors in a shopping cart for gathering data from a shopper-selected first item. At least certain of the sensor data is provided to a classifier, which was previously trained (using data including optical data from known items) to identify possible item matches corresponding to data sensed from the first item. An item identification hypothesis that the shopper-selected first item has a particular identity is evaluated based on (a) information from the classifier, and (b) store layout data indicating items associated with a store location visited by the cart or shopper. The item identification hypothesis has a confidence score. If the score meets a criterion, an item of the hypothesized identity is added to a shopping tally. A great number of other features and arrangements are also detailed.
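
Combining classifier evidence with store-layout data can be sketched as re-weighting classifier probabilities by a location prior. The boost constant and item names are illustrative; the abstract does not specify how the two evidence sources are combined:

```python
def hypothesis_score(classifier_probs, items_at_location, boost=2.0):
    """Re-weight classifier item probabilities using store-layout data:
    hypotheses for items stocked at the cart's current location are
    boosted, then the scores are renormalized."""
    scores = {item: p * (boost if item in items_at_location else 1.0)
              for item, p in classifier_probs.items()}
    total = sum(scores.values())
    return {item: s / total for item, s in scores.items()}
```

The top-scoring hypothesis would then be compared against the confidence criterion before its item is added to the tally.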

Virtual user input controls in a mixed reality environment
11150777 · 2021-10-19 ·

A wearable display system can automatically recognize a physical remote, or a device that the remote serves, using computer vision techniques. The wearable system can generate a virtual remote with a virtual control panel viewable and interactable by a user of the wearable system. The virtual remote can emulate the functionality of the physical remote. The user can select a virtual remote for interaction, for example, by looking or pointing at the parent device or its remote control, or by selecting from a menu of known devices. The virtual remote may include a virtual button, which is associated with a volume in the physical space. The wearable system can detect that a virtual button is actuated by determining whether a portion of the user's body (e.g., the user's finger) has penetrated the volume associated with the virtual button.

Memory Identification and Recovery Method and System Based on Recognition
20210319877 · 2021-10-14 ·

A memory identification and recovery device, based on recognition, includes a storing means that stores memory data, an interaction means that interacts with users, a feature mark generation means that generates feature marks according to the input from the interaction means, a search means that searches memory data in the storing means, and a scene enhancement means that enhances the memory information of a user by utilizing the memory information of other users.