G06V10/464

Image processing apparatus and non-transitory computer readable medium

An image processing apparatus includes a unifying unit, a memory, a storing unit, a setting unit, a selecting unit, an extracting unit, and a determining unit. The unifying unit unifies, to a fixed size, images of identification target regions cut out from a learning image. The memory stores a learning model. The storing unit stores identification target images converted into images of different image sizes. The setting unit sets a position and a size of a candidate region that is likely to include an identification target object of an identification target image. The selecting unit selects an identification target image of an image size with which the size of the cut-out candidate region is closest to the fixed size. The extracting unit extracts information from the image of the candidate region. The determining unit determines a target object included in the image of the candidate region.
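The size-selection step can be sketched as follows. This is an illustrative reading of the abstract only; the fixed size, the set of scales, and the function names are assumptions, not details from the disclosure.

```python
# Illustrative sketch of the selecting unit: among the stored image sizes
# (scales), pick the one at which the cut-out candidate region comes
# closest to the fixed size expected by the learning model.
FIXED_SIZE = 64  # assumed fixed input size in pixels

def select_image_size(candidate_size, available_scales):
    """Return the scale whose scaled candidate size is nearest FIXED_SIZE."""
    return min(available_scales, key=lambda s: abs(candidate_size * s - FIXED_SIZE))
```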

Low-latency gesture detection

Low-latency gesture detection is described, for example, to compute a gesture class from a live stream of image frames of a user making a gesture, for example, as part of a natural user interface controlling a game system or other system. In examples, machine learning components are trained to learn gesture primitives and, at test time, are able to detect gestures using the learned primitives in a fast, accurate manner. For example, a gesture primitive is a latent (unobserved) variable describing features of a subset of frames from a sequence of frames depicting a gesture. For example, the subset has many fewer frames than a sequence depicting a complete gesture. In various examples, gesture primitives are learned from instance-level features computed by aggregating frame-level features to capture temporal structure. In examples, frame-level features comprise body position and body part articulation state features.
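Aggregating frame-level features into an instance-level feature might look like the following sketch; the mean-plus-delta aggregation is an assumption for illustration, not the patented learning method.

```python
import numpy as np

def instance_feature(frames):
    """frames: (T, D) array of frame-level features for a primitive window.

    Aggregates the window into one instance-level vector: the mean captures
    the static pose, the first-to-last delta captures coarse temporal structure.
    """
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    delta = frames[-1] - frames[0]
    return np.concatenate([mean, delta])
```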

Steering Seismic Texture Analysis Algorithms Using Expert Input

A method is provided, the method including: displaying an image on a display; detecting a user input corresponding to one or more portions of the image; analyzing the user input to determine at least one feature vector corresponding to the user input; and determining a classification for the one or more portions of the image based at least on the at least one feature vector.
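One way to read the claim: derive a feature vector from the stroke the user draws over the seismic image, then classify it against known texture classes. The feature choice (centroid and spread of the stroke points) and the nearest-centroid classifier below are illustrative assumptions.

```python
import numpy as np

def stroke_feature(points):
    """Reduce a user stroke (list of (x, y) points) to a simple feature vector."""
    pts = np.asarray(points, dtype=float)
    return np.concatenate([pts.mean(axis=0), pts.std(axis=0)])

def classify(feature, centroids):
    """Nearest-centroid classification; centroids maps label -> feature vector."""
    return min(centroids, key=lambda label: np.linalg.norm(feature - centroids[label]))
```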

Fast efficient vocabulary computation with hashed vocabularies
09870383 · 2018-01-16

The disclosed embodiments describe a method, an apparatus, an application-specific integrated circuit, and a server that provide a fast and efficient lookup for data analysis. The apparatus and server may be configured to obtain data segments from a plurality of input devices. The data segments may be individual unique subsets of the entire data set obtained by the plurality of input devices. A hash function may be applied to an aggregated set of the data segments. A result of the hash function may be stored in a data structure. A codebook may be generated from the hash function results.
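A minimal sketch of the hashed-vocabulary idea, assuming SHA-1 bucketing into a fixed-size table (the abstract does not specify the hash function or table size):

```python
import hashlib

NUM_BUCKETS = 1024  # assumed codebook size

def bucket_of(segment):
    """Map a data segment (bytes) to a codebook bucket via a hash function."""
    digest = hashlib.sha1(segment).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def build_codebook(segments):
    """Store hash results in a data structure: bucket index -> segments."""
    codebook = {}
    for seg in segments:
        codebook.setdefault(bucket_of(seg), []).append(seg)
    return codebook
```

Lookup is then constant time per segment: hash the query and inspect only its bucket rather than scanning the whole vocabulary.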

Virtual user input controls in a mixed reality environment
12175054 · 2024-12-24

A wearable display system includes a mixed reality display for presenting a virtual image to a user, an outward-facing imaging system configured to image an environment of the user, and a hardware processor operably coupled to the mixed reality display and to the imaging system. The hardware processor is programmed to generate a virtual remote, having a virtual control element, associated with a parent device; render the virtual remote and the virtual control element on the mixed reality display; determine when the user of the wearable system interacts with the virtual control element of the virtual remote; and perform certain functions in response to user interaction with the virtual control element. These functions may include causing the virtual control element to move on the mixed reality display and, when movement of the virtual control element surpasses a threshold condition, generating a focus indicator for the virtual control element.
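The threshold condition for the focus indicator could be as simple as a distance test; the threshold value and the normalized-coordinate convention below are assumptions for illustration.

```python
MOVE_THRESHOLD = 0.05  # assumed threshold in normalized display coordinates

def needs_focus_indicator(start, current, threshold=MOVE_THRESHOLD):
    """True once the control element has moved farther than the threshold."""
    dx = current[0] - start[0]
    dy = current[1] - start[1]
    return (dx * dx + dy * dy) ** 0.5 > threshold
```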

Methods and systems for processing documents with task-specific highlighting
12197863 · 2025-01-14

Methods and systems for automatically processing a document may include classifying a document, such as a medical document, as one or more document types based at least in part on one or more machine learning models and one or more tokens extracted from the medical document, determining a token contribution weight of each token towards the classification, modifying the medical document based on the token contribution weights of the one or more tokens, and displaying the modified medical document on a display to a user.
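Given per-token contribution weights (however the underlying model produces them), the highlighting step might be sketched as follows; the marker syntax and threshold are illustrative assumptions, not details from the disclosure.

```python
def highlight(tokens, weights, threshold=0.5):
    """Wrap tokens whose contribution weight meets the threshold in markers."""
    return " ".join(
        f"**{tok}**" if w >= threshold else tok
        for tok, w in zip(tokens, weights)
    )
```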

Systems and Methods for Collaborative Edge Computing
20250039987 · 2025-01-30

A method and edge computing system configured to generate or process extended reality (XR) data. Processors in the edge computing system receive, process, and analyze a sensory feed from an optical device to generate analysis results that include a position of the optical device relative to surrounding objects. The processors generate mapper output results (that include virtual coordinates) based on the analysis results, request and receive information (e.g., salient points of interest, etc.), and compare the generated mapper output results to the received information to identify a correlation between a feature included in the processed sensory feed and a feature included in the received information. The processors generate and send augmented information to a renderer and/or send the processed sensory feed to a cloud object recognizer for further processing.
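The correlation step can be read as matching mapper output coordinates against received salient points within a tolerance; the point representation and tolerance value are assumptions for illustration.

```python
def correlate(mapper_points, salient_points, tol=0.1):
    """Return (i, j) index pairs whose points lie within tol of each other."""
    matches = []
    for i, p in enumerate(mapper_points):
        for j, q in enumerate(salient_points):
            dist = sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            if dist <= tol:
                matches.append((i, j))
    return matches
```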

Three-dimensional facial recognition method and system

The present disclosure provides a three-dimensional facial recognition method and system. The method includes: performing pose estimation on an input binocular vision image pair by using a three-dimensional facial reference model, to obtain a pose parameter and a virtual image pair of the three-dimensional facial reference model with respect to the binocular vision image pair; reconstructing a facial depth image of the binocular vision image pair by using the virtual image pair as prior information; detecting, according to the pose parameter, a local grid scale-invariant feature descriptor corresponding to an interest point in the facial depth image; and generating a recognition result of the binocular vision image pair according to the detected local grid scale-invariant feature descriptor and training data having attached category annotations. The present disclosure can reduce computational costs and required storage space.
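The final matching of local grid scale-invariant feature descriptors against annotated training data could, under one reading, be a nearest-neighbour vote over descriptors; this is a stand-in sketch, not the method claimed in the disclosure.

```python
import numpy as np

def recognize(descriptors, train_descs, train_labels):
    """Vote each query descriptor to its nearest training descriptor's label."""
    votes = {}
    train_descs = np.asarray(train_descs, dtype=float)
    for d in np.asarray(descriptors, dtype=float):
        dists = np.linalg.norm(train_descs - d, axis=1)
        label = train_labels[int(dists.argmin())]
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```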

Global-scale damage detection using satellite imagery
09858479 · 2018-01-02

A system for performing global-scale damage detection using satellite imagery, comprising a damage detection server that receives and analyzes image data to identify objects within an image via a curated computational method, and a curation interface that enables a user to curate image information for use in object identification; a corresponding curated computational method for performing global-scale damage detection is also provided.

IMAGE PROCESSING AND MATCHING

A configured machine performs image matching and retrieval of natural images that may depict logos. The machine generates and uses color-localized spatial masks, which may be computationally less expensive than spatial verification techniques. Key points are detected within images that form a reference database of images. Local masks are defined by the machine around each key point based on the scale and orientation of the key point. To utilize color information present in logo images, ordered color histograms may be extracted by the machine from locally masked regions of each image. A cascaded index may then be constructed for both visual descriptors and color histograms. For faster matching, the cascaded index maps the visual descriptors and color histograms to a list of relevant or similar images. This list may then be ranked to generate relevant matches for an input query image.
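Extracting an ordered color histogram from a locally masked patch might look like the sketch below; the bin count, channel ordering, and normalization are assumptions, not details of the described index.

```python
import numpy as np

def ordered_color_histogram(patch, bins=4):
    """patch: (H, W, 3) uint8 region around a key point.

    Concatenates per-channel histograms in a fixed (ordered) channel order,
    then normalizes so patches of different sizes are comparable.
    """
    hists = [
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(hists).astype(float)
    return hist / hist.sum()
```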