G06V10/76

DISENTANGLED RECURRENT REPRESENTATION LEARNING FOR VIDEO GENERATION
20250209781 · 2025-06-26 ·

A method for video generation in machine learning is provided. The method includes encoding an input audio into a plurality of audio features, encoding a first pose state into a first pose feature, constructing a first latent encoding having the audio features and the first pose feature, encoding a second pose state into a second pose feature, constructing a second latent encoding having the audio features and the second pose feature, decoding features in the first latent encoding into first sequences, decoding features in the second latent encoding into second sequences, and rendering a video based on the first sequences. The first pose feature, the second pose feature, and each of the audio features respectively correspond to one frame. The first pose state is different from the second pose state.
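The encoding-and-pairing steps above can be sketched as follows. The feature dimensions, the stand-in audio encoder, and the concatenation scheme are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def encode_audio(audio_frames, dim=16):
    # Stand-in audio encoder: one feature vector per frame.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(audio_frames), dim))

def build_latent_encoding(audio_features, pose_feature):
    # Pair every per-frame audio feature with the (broadcast) pose feature.
    pose = np.broadcast_to(pose_feature, (audio_features.shape[0], pose_feature.shape[-1]))
    return np.concatenate([audio_features, pose], axis=-1)

audio = encode_audio(range(8))   # 8 frames of audio features
pose_a = np.zeros(4)             # first pose state
pose_b = np.ones(4)              # second, different pose state
latent_a = build_latent_encoding(audio, pose_a)
latent_b = build_latent_encoding(audio, pose_b)
print(latent_a.shape)  # (8, 20): one fused feature per frame
```

The two latent encodings share the audio features and differ only in the pose feature, mirroring the claim's two pose states.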

System and method for using artificial intelligence to enable elevated temperature detection of persons using commodity-based thermal cameras

A multi-sensor threat detection system and method for elevated temperature detection using commodity-based thermal cameras and mask-wearing compliance using optical cameras. The proposed method does not rely on the accuracy of thermal cameras but on a combination of mathematics, statistics, machine learning, artificial intelligence, computer vision, and manifold learning to construct a classifier, or set of classifiers, able, either alone or working as an ensemble, to evaluate a person as having a normal or elevated temperature by virtue of how they present to the camera rather than by any absolute temperature measurements from the camera itself.
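A minimal sketch of that ensemble idea: several weak classifiers vote on "elevated" versus "normal" using relative features (how a face region compares to the rest of the frame) rather than absolute camera temperatures. The feature names, thresholds, and synthetic data below are invented for illustration.

```python
import numpy as np

def relative_features(face_region, background):
    # Relative statistics are robust to an uncalibrated thermal camera.
    return {
        "delta_mean": face_region.mean() - background.mean(),
        "delta_max": face_region.max() - background.max(),
        "z_score": (face_region.mean() - background.mean()) / (background.std() + 1e-9),
    }

def ensemble_predict(feats, classifiers):
    # Majority vote over the individual classifiers.
    votes = [clf(feats) for clf in classifiers]
    return "elevated" if sum(votes) > len(votes) / 2 else "normal"

classifiers = [
    lambda f: f["delta_mean"] > 2.0,
    lambda f: f["delta_max"] > 3.0,
    lambda f: f["z_score"] > 2.5,
]

rng = np.random.default_rng(1)
background = rng.normal(25.0, 1.0, size=1000)  # scene pixels (arbitrary units)
face_hot = rng.normal(29.0, 1.0, size=200)     # subject presenting warmer than the scene
print(ensemble_predict(relative_features(face_hot, background), classifiers))
```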

Image based assessment for dental treatment monitoring

Systems and methods for monitoring a dental patient's progress during treatment. A first teeth mask for a captured 2D image of teeth at a particular time during treatment and a second teeth mask for an expected 2D image of the teeth may be generated. The expected 2D image may be generated from an expected 3D model representing an expected configuration of the teeth at the particular time. The captured 2D image and the expected 2D image may be compared, with the first and second teeth masks aligned, to determine whether the teeth are within a threshold level of correspondence to the expected configuration. An indication as to whether the dental treatment is proceeding as expected based on whether the configuration of the teeth is within the threshold level of correspondence may be provided.
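The mask comparison step can be illustrated with a simple intersection-over-union check between the two aligned teeth masks. The IoU metric and the 0.85 threshold are stand-in assumptions; the patent only specifies "a threshold level of correspondence".

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    # Intersection over union of two aligned boolean masks.
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def treatment_on_track(captured_mask, expected_mask, threshold=0.85):
    # Indication of whether treatment is proceeding as expected.
    return mask_iou(captured_mask, expected_mask) >= threshold

# Toy masks: captured teeth vs. the expected configuration, slightly shifted.
captured = np.zeros((64, 64), bool); captured[10:40, 10:40] = True
expected = np.zeros((64, 64), bool); expected[12:42, 10:40] = True
print(treatment_on_track(captured, expected))
```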

SHELL OR OFFSET FEATURE DETECTION IN A 3D MODEL REPRESENTING A MECHANICAL PART
20250336176 · 2025-10-30 ·

A computer-implemented method for shell or offset feature detection in a 3D model representing a mechanical part. The method comprises obtaining a segmentation of the 3D model into segments. The method further comprises browsing all possible pairs of segments, and for each pair of segments, detecting whether the segments of the pair are offset to each other with an offset value. This constitutes an improved solution for shell or offset feature detection in a 3D model representing a mechanical part.
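The browse-all-pairs step can be sketched on point-cloud segments: two segments are treated as offset to each other if every point of one lies at (roughly) the same distance from the other. The naive nearest-neighbour criterion below is an assumption standing in for the patent's actual detection test.

```python
import itertools
import numpy as np

def offset_value(seg_a, seg_b, tol=1e-3):
    # Distance from each point of seg_a to its nearest point of seg_b;
    # a (near-)constant distance suggests an offset pair.
    d = np.linalg.norm(seg_a[:, None, :] - seg_b[None, :, :], axis=-1)
    nearest = d.min(axis=1)
    return nearest.mean() if nearest.std() < tol else None

def detect_offset_pairs(segments, tol=1e-3):
    # Browse all possible pairs of segments, as in the claimed method.
    pairs = []
    for (i, a), (j, b) in itertools.combinations(enumerate(segments), 2):
        v = offset_value(a, b, tol)
        if v is not None:
            pairs.append((i, j, v))
    return pairs

grid = np.array([[x, y, 0.0] for x in range(5) for y in range(5)])
segments = [
    grid,                                                        # base face
    grid + [0.0, 0.0, 2.0],                                      # exact offset of the base
    grid + np.random.default_rng(2).normal(0, 0.5, grid.shape),  # unrelated segment
]
print(detect_offset_pairs(segments))  # only the (base, offset) pair is detected
```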

TEXT BASED IMAGE SEARCH
20250329184 · 2025-10-23 ·

Method and system for building a machine learning model for finding visual targets from text queries, the method comprising the steps of: receiving a set of training data comprising text-attribute-labelled images, wherein each image has more than one text attribute label; receiving a first vector space comprising a mapping of words, the mapping defining relationships between words; generating a visual feature vector space by grouping images of the set of training data having similar attribute labels; mapping each attribute label within the training data set onto the first vector space to form a second vector space; fusing the visual feature vector space and the second vector space to form a third vector space; and generating a similarity matching model from the third vector space.
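The fuse-then-match idea can be sketched as follows. The abstract does not specify the fusion operation, so normalized concatenation is used here as an assumption, with toy one-hot word embeddings standing in for the first vector space.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy first vector space: orthogonal word embeddings for the attribute labels.
words = ["red", "dress", "blue", "shirt"]
word_space = dict(zip(words, np.eye(4)))

def text_vec(labels):
    # Map a set of attribute labels into the word space (second space).
    return np.mean([word_space[w] for w in labels], axis=0)

def fuse(visual, text):
    # Third space: L2-normalized concatenation of visual and text vectors.
    v = np.concatenate([visual, text])
    return v / np.linalg.norm(v)

# Gallery images: a visual feature plus their attribute labels, fused.
gallery = {
    "img1": fuse(rng.standard_normal(16), text_vec(["red", "dress"])),
    "img2": fuse(rng.standard_normal(16), text_vec(["blue", "shirt"])),
}

def search(query_labels):
    # Text-only query: zero visual part, cosine matching in the fused space.
    q = fuse(np.zeros(16), text_vec(query_labels))
    return max(gallery, key=lambda k: float(np.dot(gallery[k], q)))

print(search(["red", "dress"]))  # retrieves the matching gallery image
```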

Method and system for identifying objects

The present disclosure provides methods and/or systems for identifying an object. An example method includes: generating a plurality of synthesized images according to a three-dimensional digital model, the plurality of synthesized images having different view angles; respectively extracting eigenvectors of the plurality of synthesized images; generating a first fused vector by fusing the eigenvectors of the plurality of synthesized images; inputting the first fused vector into a classifier to train the classifier; acquiring a plurality of pictures of the object, the plurality of pictures respectively having same view angles as at least a portion of the plurality of synthesized images; respectively extracting eigenvectors of the plurality of pictures; generating a second fused vector by fusing the eigenvectors of the plurality of pictures; and inputting the second fused vector into the trained classifier to obtain a classification result of the object.
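The fuse-then-classify pipeline can be sketched like this: per-view feature vectors are averaged into one fused vector and matched against a nearest-centroid classifier. The "eigenvector" extractor, the averaging fusion, and the classifier choice are placeholder assumptions.

```python
import numpy as np

def extract(view_image):
    # Stand-in feature extractor: per-row mean and standard deviation.
    return np.concatenate([view_image.mean(axis=1), view_image.std(axis=1)])

def fuse(views):
    # First/second fused vector: average the per-view feature vectors.
    return np.mean([extract(v) for v in views], axis=0)

class NearestCentroid:
    def fit(self, fused_vectors, labels):
        self.centroids = {l: v for v, l in zip(fused_vectors, labels)}
        return self
    def predict(self, fused):
        return min(self.centroids, key=lambda l: np.linalg.norm(self.centroids[l] - fused))

rng = np.random.default_rng(4)
# Synthesized multi-view images of two 3D models (toy 8x8 "renders").
synth_a = [np.full((8, 8), 1.0) + rng.normal(0, 0.05, (8, 8)) for _ in range(3)]
synth_b = [np.full((8, 8), 5.0) + rng.normal(0, 0.05, (8, 8)) for _ in range(3)]
clf = NearestCentroid().fit([fuse(synth_a), fuse(synth_b)], ["part_a", "part_b"])

# Real pictures of the object, taken at the same view angles.
photos = [np.full((8, 8), 5.0) + rng.normal(0, 0.05, (8, 8)) for _ in range(3)]
print(clf.predict(fuse(photos)))  # classification result of the object
```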

Method of and system for performing object recognition in data acquired by ultrawide field of view sensors
12482254 · 2025-11-25 ·

There is provided a method and system for training an object recognition machine learning model to perform object recognition in data acquired by ultrawide field of view (UW FOV) sensors to thereby obtain a distortion-aware object recognition model. The object recognition model comprises convolution layers each associated with a set of kernels. During training on a UW FOV labelled training dataset, deformable kernels are learned in a manifold space, mapped back to Euclidian space and used to perform convolutions to obtain output feature maps which are used to perform object recognition predictions. Model parameters of the distortion-aware object recognition model may be transferred to other architectures of object recognition models, which may be further compressed for deployment on embedded systems such as electronic devices on board autonomous vehicles.
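A toy illustration of the distortion-aware convolution: each kernel tap carries a 2-D sampling offset, here fixed by hand to stand in for offsets learned in a manifold space and mapped back to Euclidean coordinates. Real implementations use bilinear sampling and learned offsets; this nearest-pixel version only shows the mechanism.

```python
import numpy as np

def deformable_conv2d(image, weights, offsets):
    # offsets[i, j] = (dy, dx) shift applied to kernel tap (i, j).
    h, w = image.shape
    k = weights.shape[0]
    pad = k // 2
    out = np.zeros_like(image, dtype=float)
    for y in range(pad, h - pad):
        for x in range(pad, w - pad):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    dy, dx = offsets[i, j]
                    yy = int(round(np.clip(y + i - pad + dy, 0, h - 1)))
                    xx = int(round(np.clip(x + j - pad + dx, 0, w - 1)))
                    acc += weights[i, j] * image[yy, xx]
            out[y, x] = acc
    return out

image = np.arange(49, dtype=float).reshape(7, 7)
weights = np.full((3, 3), 1 / 9)   # simple averaging kernel
offsets = np.zeros((3, 3, 2))      # zero offsets -> ordinary convolution
print(deformable_conv2d(image, weights, offsets)[3, 3])  # 24.0, the local mean
```

With nonzero offsets, the same kernel samples a warped neighbourhood, which is how distortion in ultrawide FOV imagery can be compensated tap by tap.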

IMAGE BASED ASSESSMENT FOR DENTAL TREATMENT MONITORING

Dental treatment monitoring systems and methods may include accessing an input image of teeth taken at a particular time during dental treatment, and determining virtual-camera parameters that represent an estimated position and orientation of a virtual camera for producing a generated image from a time-projected 3D model of the teeth. The virtual-camera parameters may be iteratively adjusted by: generating a first generated image by modifying the virtual-camera parameters based on a first jaw in the generated image; determining a pixel-associated cost based on a comparison of the first generated image to the input image; generating a second generated image by modifying the first virtual-camera parameters based on a second jaw in the first generated image; and determining a pixel-associated cost based on a comparison of the second generated image and the input image. The generated image may be generated from the time-projected 3D model using the adjusted virtual-camera parameters.
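The alternating refinement loop can be reduced to a toy coordinate-descent sketch: two camera parameters (one adjusted per "jaw" pass) are tuned in turn to reduce a pixel-difference cost between a rendered image and the input image. The renderer, step schedule, and cost are invented stand-ins for the patent's actual rendering pipeline.

```python
import numpy as np

def render(params):
    # Placeholder renderer: image depends smoothly on both camera parameters.
    x, y = params
    grid = np.linspace(0, 1, 32)
    return np.outer(np.sin(grid + x), np.cos(grid + y))

target = render((0.3, -0.2))  # pretend this is the captured input image

def pixel_cost(params):
    # Pixel-associated cost: mean squared difference to the input image.
    return float(np.mean((render(params) - target) ** 2))

params = np.array([0.0, 0.0])
step = 0.1
for _ in range(50):
    for axis in (0, 1):  # axis 0 ~ first-jaw pass, axis 1 ~ second-jaw pass
        for delta in (+step, -step):
            trial = params.copy()
            trial[axis] += delta
            if pixel_cost(trial) < pixel_cost(params):
                params = trial
    step *= 0.9  # shrink the adjustment as the fit improves
print(params, pixel_cost(params))  # converges near the true parameters
```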

Information processing apparatus, information processing method, and storage medium

An information processing apparatus that performs control related to movement of a moving object configured to measure its own position includes a memory storing instructions, and at least one processor that, upon execution of the instructions, is configured to operate as a first acquisition unit configured to acquire environmental information about an environment where the moving object moves, an estimation unit configured to estimate, based on the environmental information, first positional information indicating a region where measurement accuracy degrades below a threshold value, and a determination unit configured to determine content of control information based on the first positional information.
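A minimal sketch of the estimate-then-control idea: from an environmental map (here, visual-feature density per grid cell), flag regions where self-localization accuracy is expected to degrade, and pick a control policy accordingly. The density values, threshold, and control modes are invented.

```python
# Environmental information: feature density per map cell (toy values).
feature_density = {
    (0, 0): 0.9, (0, 1): 0.8,
    (1, 0): 0.2, (1, 1): 0.85,   # (1, 0): e.g. a blank corridor wall
}

def estimate_degraded_regions(env, threshold=0.3):
    # First positional information: cells where accuracy degrades below threshold.
    return {cell for cell, density in env.items() if density < threshold}

def control_for(cell, degraded):
    # Determination unit: slow down and switch mode inside degraded regions.
    if cell in degraded:
        return {"speed": 0.2, "mode": "dead_reckoning"}
    return {"speed": 1.0, "mode": "visual_slam"}

degraded = estimate_degraded_regions(feature_density)
print(degraded, control_for((1, 0), degraded))
```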

Content processing method and apparatus, computer device, and storage medium

A content processing method is disclosed, including: obtaining a description text of to-be-processed content and an image included in the to-be-processed content; performing feature extraction on the description text based on text semantics to obtain a text eigenvector; performing feature extraction on the image based on image semantics to obtain an image eigenvector; combining the text eigenvector with the image eigenvector to obtain an image-text multi-modal vector; and generating an estimated click-through rate of the to-be-processed content according to the image-text multi-modal vector.
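The multimodal pipeline above can be sketched end to end: a text eigenvector and an image eigenvector are concatenated and fed through a logistic layer that outputs an estimated click-through rate. The feature extractors and the "pre-trained" weights are arbitrary placeholders.

```python
import math
import numpy as np

def text_features(text):
    # Placeholder text-semantic extractor (text eigenvector).
    return np.array([len(text) / 100.0, text.count("!") / 5.0])

def image_features(image):
    # Placeholder image-semantic extractor (image eigenvector).
    return np.array([image.mean(), image.std()])

def estimated_ctr(text, image, weights, bias):
    # Image-text multimodal vector -> logistic CTR estimate.
    multimodal = np.concatenate([text_features(text), image_features(image)])
    return 1.0 / (1.0 + math.exp(-(multimodal @ weights + bias)))

weights = np.array([0.5, 1.0, 0.8, -0.3])  # pretend learned parameters
image = np.clip(np.random.default_rng(5).normal(0.5, 0.1, (16, 16)), 0, 1)
ctr = estimated_ctr("Breaking: new gadget!", image, weights, bias=-1.0)
print(round(ctr, 3))  # a probability in (0, 1)
```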