G06V10/80

LABEL IDENTIFICATION METHOD AND APPARATUS, DEVICE, AND MEDIUM
20230230400 · 2023-07-20 ·

Provided are a label identification method and apparatus, a device, and a medium. The method includes: obtaining a target feature of a first image, where the target feature characterizes both a visual feature of the first image and a word feature of at least one label; and identifying a label of the first image from the at least one label based on the target feature. Because the target feature jointly characterizes the visual feature of the image and the word feature of each candidate label, the accuracy of label identification can be improved.
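
One plausible reading of such a fused-feature approach is scoring an image embedding against each label's word embedding and picking the best match. The sketch below uses cosine similarity and toy embeddings; the scoring function, feature dimensions, and names are assumptions, since the abstract does not specify how the features are fused.

```python
import numpy as np

def identify_label(image_feature, label_word_features, labels):
    """Pick the label whose word feature best matches the image's visual feature.

    The 'target feature' is modeled here as the pair (visual feature,
    label word features); the matching score is cosine similarity --
    an assumption, not the patent's stated method.
    """
    img = image_feature / np.linalg.norm(image_feature)
    words = label_word_features / np.linalg.norm(
        label_word_features, axis=1, keepdims=True)
    scores = words @ img  # cosine similarity per candidate label
    return labels[int(np.argmax(scores))]

# Toy example: 3 candidate labels with 4-dimensional word features.
labels = ["cat", "dog", "car"]
word_feats = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.0]])
image_feat = np.array([0.1, 0.9, 0.1, 0.0])  # closest to the "dog" word feature
predicted = identify_label(image_feat, word_feats, labels)
```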

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM
20230230368 · 2023-07-20 ·

The present technology relates to an information processing apparatus, an information processing method, and a program capable of obtaining a distance to an object more accurately.

An extraction unit extracts, on the basis of an object recognised in a captured image obtained by a camera, the sensor data corresponding to an object region including the object in the captured image, from among sensor data obtained by a rangefinding sensor. The present technology can be applied, for example, to an evaluation apparatus for distance information.

METHOD FOR GENERATING AT LEAST ONE BIRD'S EYE VIEW REPRESENTATION OF AT LEAST A PART OF THE ENVIRONMENT OF A SYSTEM
20230230385 · 2023-07-20 ·

A method for generating at least one bird's eye view representation of at least a part of the environment of a system, based on one or more digital image representations obtained from one or more cameras of the system. The method comprises: a) obtaining a digital image representation (2), advantageously representing a single digital image, together with at least one camera parameter of the camera that captured the image; b) extracting at least one feature from the digital image representation, wherein features are generated at different scales; c) transforming the at least one feature from the image space into a bird's eye view space, to obtain at least one bird's eye view feature.
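
Step c) can be illustrated with a simple flat-ground transform: for each cell of a bird's eye view grid, project the corresponding ground point into the image via the camera intrinsics and sample the feature there. This is a minimal sketch under a flat-ground, known-height assumption; the patent does not specify the transform, and all parameter names here are illustrative.

```python
import numpy as np

def image_to_bev(feature_map, K, cam_height, bev_size=8, cell_m=1.0):
    """Resample an image-space feature map onto a flat-ground BEV grid.

    Assumes a camera looking along +Z at height cam_height above a flat
    ground plane (a simplification); K is the 3x3 intrinsic matrix.
    """
    C, H, W = feature_map.shape
    bev = np.zeros((C, bev_size, bev_size))
    for i in range(bev_size):          # forward distance (Z)
        for j in range(bev_size):      # lateral offset (X)
            X = (j - bev_size / 2) * cell_m
            Z = (i + 1) * cell_m
            Y = cam_height             # ground point below the camera
            u, v, w = K @ np.array([X, Y, Z])
            u, v = int(u / w), int(v / w)
            if 0 <= u < W and 0 <= v < H:
                bev[:, i, j] = feature_map[:, v, u]  # sample image feature
    return bev

# Toy intrinsics (focal length 10 px, principal point at image center).
K = np.array([[10.0, 0.0, 4.0],
              [0.0, 10.0, 4.0],
              [0.0, 0.0, 1.0]])
fm = np.ones((3, 8, 8))                # C=3 feature channels, 8x8 image grid
bev = image_to_bev(fm, K, cam_height=1.5)
```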

Interactive assistant

An interactive troubleshooting assistant and a method for troubleshooting a system in real time to repair one or more problems in the system are disclosed. The interactive troubleshooting assistant may receive multimodal inputs from sensors, wearable devices, a person, and the like, which are fed into a feature extractor including attention layers and pre-processing units of a cloud computing system hosted by one or more servers, such as a private cloud system. A pre-processing unit converts the raw multimodal input into a structured form so that an attention layer can give weights to the features provided by the pre-processing unit according to their importance. The weighted extracted features may be provided to an actions predictor, which generates the most suitable action based on the weighted extracted features produced by the feature extractor from the multimodal inputs. After the most suitable action is performed, the interactive troubleshooting assistant considers new information from the multimodal inputs so that it can provide the next recommended action, and it may repeat these operations until the repair is completed.

Visual, depth and micro-vibration data extraction using a unified imaging device
11706377 · 2023-07-18 ·

A unified imaging device for detecting and classifying objects in a scene, including their motion and micro-vibrations. A plurality of images of the scene is captured by an imaging sensor of the unified imaging device, which comprises a light source adapted to project onto the scene a predefined structured light pattern constructed of a plurality of diffused light elements. Objects present in the scene are classified by visually analyzing the images; depth data of the objects is extracted by analyzing the position of diffused light elements reflected from the objects; and micro-vibrations of the objects are identified by analyzing changes in the speckle pattern of the reflected diffused light elements across at least some consecutive images. The classification, the depth data, and the micro-vibration data are output; because all three are derived from images captured by the same imaging sensor, they are inherently registered in a common coordinate system.
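
Extracting depth from the position of a projected element typically reduces to triangulation: the element's pixel shift (disparity) relative to a reference position maps to depth as Z = f·b / d. The sketch below assumes this standard geometry; the reference-plane convention and all names are illustrative, since the patent does not give the exact formulation.

```python
import numpy as np

def depth_from_pattern_shift(expected_u, observed_u, focal_px, baseline_m):
    """Estimate depth from the pixel shift of a projected light element.

    Projector-sensor triangulation: Z = f * b / disparity, where the
    disparity is the element's shift from its expected position at a
    reference plane (an assumed convention).
    """
    disparity = np.abs(observed_u - expected_u)
    return focal_px * baseline_m / np.maximum(disparity, 1e-9)

# An element shifted by 20 px, with f = 800 px and a 5 cm baseline:
z = depth_from_pattern_shift(expected_u=400, observed_u=420,
                             focal_px=800.0, baseline_m=0.05)
```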

Robotic interactions for observable signs of intent

Described herein are assistant robots that anticipate the needs of one or more people (or animals). The assistant robots may combine recognition of a person's current activity with knowledge of the person's routines and contextual information, and can accordingly provide, or offer to provide, appropriate robotic assistance. The assistant robots can learn users' habits or be provided with knowledge regarding the humans in their environment, and they develop a schedule and a contextual understanding of each person's behavior and needs. The assistant robots may interact, understand, and communicate with people before, during, or after providing assistance. A robot can combine gesture, clothing, emotional aspect, time, pose recognition, action recognition, and other observational data to understand a person's medical condition, current activity, and future intended activities and intents.

Classifying time series image data

The present invention extends to methods, systems, and computer program products for classifying time series image data. Aspects of the invention include encoding motion information from video frames in an eccentricity map. An eccentricity map is essentially a static image that aggregates apparent motion of objects, surfaces, and edges, from a plurality of video frames. In general, eccentricity reflects how different a data point is from the past readings of the same set of variables. Neural networks can be trained to detect and classify actions in videos from eccentricity maps. Eccentricity maps can be provided to a neural network as input. Output from the neural network can indicate if detected motion in a video is or is not classified as an action, such as, for example, a hand gesture.
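
The eccentricity idea can be sketched per pixel with a recursive mean and variance over the frame history, in the style of the TEDA framework: a pixel's eccentricity grows when its current value deviates from its past readings, and the resulting static map highlights apparent motion. The exact update rule below is an assumption; the patent's formulation may differ.

```python
import numpy as np

def eccentricity_map(frames):
    """Aggregate per-pixel eccentricity over a stack of video frames.

    Eccentricity of the k-th reading x against the running mean mu and
    variance var: eps = 1/k + (mu - x)^2 / (k * var). This is a
    TEDA-style recursive formulation, used here as an illustration.
    """
    frames = np.asarray(frames, dtype=float)
    mu = np.zeros_like(frames[0])
    var = np.zeros_like(frames[0])
    ecc = np.zeros_like(frames[0])
    for k, x in enumerate(frames, start=1):
        mu = ((k - 1) * mu + x) / k                # recursive mean
        var = ((k - 1) * var + (x - mu) ** 2) / k  # recursive variance
        if k > 1:
            ecc = 1.0 / k + (mu - x) ** 2 / (k * np.maximum(var, 1e-12))
    return ecc  # static image: high where recent pixels deviate from history

# A pixel that changes (apparent motion) scores higher than a static one.
static = np.zeros((2, 2))
moving = np.array([[0.0, 1.0],
                   [0.0, 0.0]])  # pixel (0, 1) turns on in the last frame
emap = eccentricity_map([static, static, moving])
```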

Method for processing cross matching image based on deep learning and apparatus thereof
11704920 · 2023-07-18 ·

The present disclosure relates to a method and an apparatus for processing a cross-matching image based on deep learning.
