G06V10/80

METHOD OF PERFORMING OBJECT SEGMENTATION ON VIDEO USING SEMANTIC SEGMENTATION MODEL, DEVICE AND STORAGE MEDIUM

A method of performing object segmentation on a video using a semantic segmentation model, a device, and a storage medium are provided. They relate to the field of artificial intelligence, in particular to computer vision and deep learning technologies. The method includes: sequentially inputting a current video frame and a previous video frame into a first feature extraction network to obtain a feature map sequence; sequentially inputting object segmentation information of the previous video frame into a second feature extraction network to obtain a segmentation feature sequence; sequentially inputting the current video frame and the previous video frame into a temporal encoding network to obtain a temporal feature sequence; generating a fused feature sequence based on the feature map sequence, the segmentation feature sequence, and the temporal feature sequence; and inputting the fused feature sequence into a segmentation network to obtain object segmentation information of the current video frame.
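
The abstract fixes the data flow between the four networks but not their internals or the fusion rule. The sketch below traces only that data flow, under stated assumptions: every network is a caller-supplied stub (the names feature_net, mask_net, temporal_net and seg_net are hypothetical), frames are toy lists of floats, and "fusion" is assumed to be concatenation along the sequence axis, which tolerates the segmentation sequence being one entry shorter than the frame sequences.

```python
from typing import Callable, List, Sequence

Frame = List[float]     # toy stand-in for an image or a segmentation mask
Feature = List[float]   # toy stand-in for a feature map
Net = Callable[[Frame], Feature]


def segment_current_frame(current: Frame,
                          previous: Sequence[Frame],
                          previous_masks: Sequence[Frame],
                          feature_net: Net,
                          mask_net: Net,
                          temporal_net: Net,
                          seg_net: Callable[[List[Feature]], Frame]) -> Frame:
    """Data flow of the described method; all networks are hypothetical stubs."""
    frames = list(previous) + [current]                   # oldest first, current last
    feature_maps = [feature_net(f) for f in frames]       # first feature extraction network
    seg_features = [mask_net(m) for m in previous_masks]  # second feature extraction network
    temporal = [temporal_net(f) for f in frames]          # temporal encoding network
    # Assumed fusion rule: concatenate the three sequences along the sequence axis.
    fused = feature_maps + seg_features + temporal
    return seg_net(fused)                                 # segmentation network
```

With one previous frame and one previous mask, the fused sequence has five entries: two feature maps, one segmentation feature, and two temporal features. A real system would more likely fuse with attention or per-position addition; the sketch only makes the order of operations concrete.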

IMAGE RECOGNITION METHOD AND APPARATUS, TRAINING METHOD, ELECTRONIC DEVICE, AND STORAGE MEDIUM
20220375207 · 2022-11-24

An image recognition method and apparatus, a training method, an electronic device, and a storage medium are provided. The image recognition method includes: acquiring an image to be recognized, the image to be recognized including a target text; and determining text content of the target text based on knowledge information and image information of the image to be recognized.

UNMANNED FORKLIFT
20220375206 · 2022-11-24

An image obtaining section obtains a captured image from an imaging device. A pallet type identification section has a learning model trained on combinations of images of a plurality of types of pallets and the corresponding pallet types, and identifies the type of a target pallet by inputting the captured image of the target pallet, obtained by the image obtaining section, into the learning model. A pallet position/shape obtaining section obtains position/shape data of the target pallet from a distance measuring device that measures the distance to the target pallet. A pallet deviation detection section stores position/shape data of the pallets in advance and compares the stored position/shape data corresponding to the identified type of the target pallet with the obtained position/shape data of the target pallet.
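
The deviation check in the last step can be sketched as a lookup of per-type reference data followed by a per-item comparison. The reference table, its field names (pocket_width, pocket_height, center_offset), the millimetre units, and the fixed tolerance are all hypothetical; the abstract does not say which position/shape quantities are stored or how a deviation is judged.

```python
from typing import Dict

# Hypothetical reference table: pallet type -> expected position/shape values (mm).
REFERENCE_DATA: Dict[str, Dict[str, float]] = {
    "type_a": {"pocket_width": 300.0, "pocket_height": 100.0, "center_offset": 0.0},
    "type_b": {"pocket_width": 250.0, "pocket_height": 90.0, "center_offset": 0.0},
}


def detect_pallet_deviation(pallet_type: str,
                            measured: Dict[str, float],
                            tolerance: float = 10.0) -> Dict[str, float]:
    """Return the items whose |measured - reference| exceeds the tolerance.

    pallet_type is the output of the (not sketched) image-based identification;
    measured comes from the distance measuring device.
    """
    reference = REFERENCE_DATA[pallet_type]
    deviations: Dict[str, float] = {}
    for key, expected in reference.items():
        diff = measured[key] - expected
        if abs(diff) > tolerance:
            deviations[key] = diff   # signed deviation, for downstream correction
    return deviations
```

For example, a type_a pallet measured with a 305 mm pocket width and a 25 mm center offset yields only the center offset as a deviation, because the width error stays within the assumed 10 mm tolerance.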

Automatic robotically steered camera for targeted high performance perception and vehicle control
11592832 · 2023-02-28

Disclosed are methods, systems, and non-transitory computer-readable media that control an autonomous vehicle via at least two sensors. One aspect includes capturing an image of a scene ahead of the vehicle with a first sensor; identifying an object in the scene, at a confidence level, based on the image; determining that the confidence level of the identification is below a threshold; in response to the confidence level being below the threshold, directing a second sensor, which has a field of view smaller than that of the first sensor, to generate a second image including the location of the identified object; further identifying the object in the scene based on the second image; and controlling the vehicle based on the further identification of the object.
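
The confidence-gated hand-off between the two sensors can be sketched as a small control loop. The classifier, the narrow-field camera, the 0.8 threshold, and the braking policy are hypothetical stand-ins; in particular, the sketch simply replaces the first identification with the second, whereas a real system might fuse the two.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Detection:
    label: str
    confidence: float
    location: Tuple[float, float]   # where the object appears in the wide image


def perceive(wide_image: object,
             narrow_camera: Callable[[Tuple[float, float]], object],
             classify: Callable[[object], Detection],
             threshold: float = 0.8) -> Detection:
    """Identify with the wide sensor; re-identify with the narrow one if unsure."""
    detection = classify(wide_image)            # first sensor, wide field of view
    if detection.confidence >= threshold:
        return detection
    # Confidence below threshold: aim the narrow-FOV second sensor at the object.
    zoomed_image = narrow_camera(detection.location)
    return classify(zoomed_image)               # further identification


def control_action(detection: Detection) -> str:
    """Hypothetical policy: brake for pedestrians or still-uncertain objects."""
    if detection.label == "pedestrian" or detection.confidence < 0.5:
        return "brake"
    return "continue"
```

Only the low-confidence branch moves the second sensor, so the narrow camera is spent where the wide view is ambiguous.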

Flexible multi-channel fusion perception
11592565 · 2023-02-28

A method may include obtaining first sensor data from a first sensor system and second sensor data from a second sensor system. The first and second sensor systems may capture sensor data from a total measurable world. The method may include identifying a first object included in the first sensor data and a second object included in the second sensor data, and determining first parameters corresponding to the first object and second parameters corresponding to the second object. The first parameters may be compared with the second parameters, and whether the first object and the second object are the same object may be determined based on that comparison. Responsive to determining that the first object and the second object are the same object, a set of objects representative of the objects in the total measurable world, including the same object, may be generated.
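
The compare-then-merge step can be sketched with a greedy association. The parameters compared (category plus 2D position), the 1.0 distance gate, and averaging the positions of a matched pair are assumptions for illustration; the abstract leaves the parameters and the merging rule open.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SensedObject:
    x: float
    y: float
    category: str


def same_object(a: SensedObject, b: SensedObject, max_distance: float = 1.0) -> bool:
    """Assumed comparison: same category and positions within max_distance."""
    return a.category == b.category and math.hypot(a.x - b.x, a.y - b.y) <= max_distance


def fuse_objects(first: List[SensedObject], second: List[SensedObject],
                 max_distance: float = 1.0) -> List[SensedObject]:
    """Build one object set from two sensors, merging pairs judged identical."""
    fused: List[SensedObject] = []
    unmatched = list(second)
    for a in first:
        match = next((b for b in unmatched if same_object(a, b, max_distance)), None)
        if match is not None:
            unmatched.remove(match)
            # Merge the duplicate detections into one object (averaged position).
            fused.append(SensedObject((a.x + match.x) / 2, (a.y + match.y) / 2, a.category))
        else:
            fused.append(a)           # seen only by the first sensor
    fused.extend(unmatched)           # seen only by the second sensor
    return fused
```

Greedy first-match association is the simplest choice; a production tracker would typically use an optimal assignment (for example the Hungarian algorithm) over a richer parameter set.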

APPARATUS AND METHOD FOR IMAGE CLASSIFICATION AND SEGMENTATION BASED ON FEATURE-GUIDED NETWORK, DEVICE, AND MEDIUM
20230055256 · 2023-02-23

The present invention provides an apparatus and method for image classification and segmentation based on a feature-guided network, a device, and a medium, and belongs to the technical field of deep learning. The feature-guided classification network and the feature-guided segmentation network of the present invention include basic unit blocks; local features are enhanced and global features are extracted among the basic unit blocks. This addresses the problem that existing image classification and image segmentation network models do not fully utilize features, so that the trained feature-guided classification and segmentation networks perform better and are more robust. The present invention selects the feature-guided classification network or the feature-guided segmentation network according to the requirement of the input image and outputs the corresponding category or segmented image, addressing the unsatisfactory classification or segmentation performance of existing network models.

DETECTING AN OBJECT IN AN IMAGE USING MULTIBAND AND MULTIDIRECTIONAL FILTERING

A detection method includes performing multiband filtering on a first area, which is an area in a first video frame, to obtain a plurality of band sub-images, and performing multidirectional filtering on the band sub-images to obtain a plurality of direction sub-images. The method further includes acquiring a direction-band fused feature of the first area from the direction sub-images, inputting the direction-band fused feature into a detection model, and using the detection model to detect, based on the direction-band fused feature, whether the first area contains an object.
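
A toy version of the feature extraction can make the two filtering stages concrete. The abstract does not name the filters, so this sketch swaps in simple stand-ins: band sub-images are differences of successively stronger box blurs, direction sub-images are first differences along four directions (0°, 90°, 45°, 135°), and the fused feature is the mean absolute response of each (band, direction) pair. The detection model that consumes the feature is not sketched.

```python
from typing import List

Image = List[List[float]]


def box_blur(img: Image, r: int) -> Image:
    """Mean filter of radius r (r == 0 returns a copy); borders are clamped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, n = 0.0, 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    n += 1
            out[y][x] = total / n
    return out


def band_subimages(img: Image, levels: int = 2) -> List[Image]:
    """Band-pass sub-images: blur_k - blur_(k+1), finest band first (assumed scheme)."""
    blurred = [box_blur(img, r) for r in range(levels + 1)]
    return [[[a - b for a, b in zip(ra, rb)] for ra, rb in zip(blurred[k], blurred[k + 1])]
            for k in range(levels)]


# First-difference directions as (dy, dx): horizontal, vertical, two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def direction_subimages(band: Image) -> List[Image]:
    """One directional-difference sub-image per entry of DIRECTIONS."""
    h, w = len(band), len(band[0])
    subs = []
    for dy, dx in DIRECTIONS:
        sub = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    sub[y][x] = band[yy][xx] - band[y][x]
        subs.append(sub)
    return subs


def direction_band_feature(area: Image, levels: int = 2) -> List[float]:
    """Mean absolute energy of every (band, direction) sub-image, concatenated."""
    feature = []
    for band in band_subimages(area, levels):
        for sub in direction_subimages(band):
            cells = [abs(v) for row in sub for v in row]
            feature.append(sum(cells) / len(cells))
    return feature
```

A uniform area yields an all-zero feature, while an area containing a vertical edge responds in the horizontal direction but not the vertical one, which is the kind of band/direction signature a downstream detector could learn from.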

IMAGE GAZE CORRECTION METHOD, APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT

An image gaze correction method, apparatus, electronic device, computer-readable storage medium, and computer program product are provided. The image gaze correction method includes: acquiring a to-be-corrected eye image from a to-be-corrected image; generating, based on the to-be-corrected eye image, an eye motion flow field and an eye contour mask, where the eye motion flow field is used for adjusting pixel positions in the to-be-corrected eye image and the eye contour mask indicates the probability that each pixel position in the to-be-corrected eye image belongs to an eye region; performing, based on the eye motion flow field and the eye contour mask, gaze correction processing on the to-be-corrected eye image to obtain a corrected eye image; and generating a gaze-corrected image based on the corrected eye image.
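
One plausible reading of how the flow field and the mask combine is: warp the eye image with the flow field, then blend the warped and original pixels using the mask as a per-pixel weight, and finally paste the corrected eye back into the full image. The sketch below follows that reading on grayscale toy images. The blending rule, nearest-neighbour sampling, and the (dy, dx) flow convention are assumptions; the networks that predict the flow and the mask are not sketched.

```python
from typing import List, Tuple

Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]   # per-pixel (dy, dx) sampling offset


def warp(eye: Image, flow: Flow) -> Image:
    """Backward warp: out[y][x] = eye[y + dy][x + dx], nearest neighbour, clamped."""
    h, w = len(eye), len(eye[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(int(round(y + dy)), 0), h - 1)
            sx = min(max(int(round(x + dx)), 0), w - 1)
            out[y][x] = eye[sy][sx]
    return out


def correct_gaze(eye: Image, flow: Flow, mask: Image) -> Image:
    """Assumed rule: mask * warped + (1 - mask) * original, per pixel."""
    warped = warp(eye, flow)
    return [[m * wv + (1.0 - m) * ov for m, wv, ov in zip(mrow, wrow, orow)]
            for mrow, wrow, orow in zip(mask, warped, eye)]


def paste(image: Image, patch: Image, top: int, left: int) -> Image:
    """Write the corrected eye patch back into a copy of the full image."""
    out = [row[:] for row in image]
    for dy, row in enumerate(patch):
        out[top + dy][left:left + len(row)] = row
    return out
```

Weighting by the mask keeps pixels outside the eye region (mask near 0) unchanged, so the warp cannot smear the surrounding skin.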

TERM WEIGHT GENERATION METHOD, APPARATUS, DEVICE AND MEDIUM
20230057010 · 2023-02-23

A term weight determination method includes: obtaining a video and video-associated text, the video-associated text including at least one term; generating a halfway vector of the at least one term by performing multimodal feature fusion on features of the video, the video-associated text, and the at least one term; and generating a weight of the at least one term based on its halfway vector.
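
The two-stage structure (fuse into a halfway vector, then map that vector to a weight) can be sketched with the simplest possible fusion. Concatenation followed by one tanh projection is a stand-in chosen for illustration, not the fusion the abstract describes; the projection matrix proj and output vector out_w are hypothetical learned parameters, and the logistic output is an assumption that keeps weights in (0, 1).

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def term_weights(video_feat: Sequence[float],
                 text_feat: Sequence[float],
                 term_feats: Sequence[Sequence[float]],
                 proj: Matrix,
                 out_w: Sequence[float]) -> List[float]:
    """One weight per term, computed from its (assumed) halfway vector."""
    weights = []
    for term_feat in term_feats:
        # Stand-in multimodal fusion: concatenate the three features, project, squash.
        fused_input = list(video_feat) + list(text_feat) + list(term_feat)
        halfway = [math.tanh(v) for v in linear(proj, fused_input)]
        # Map the halfway vector to a scalar weight in (0, 1).
        score = sum(w * h for w, h in zip(out_w, halfway))
        weights.append(1.0 / (1.0 + math.exp(-score)))
    return weights
```

Because the video and text features are shared across terms, only the term feature changes between iterations; a learned fusion such as cross-attention would let the video content modulate each term's weight more directly than this linear stand-in can.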

METHODS, SYSTEMS, AND MEDIA FOR GENERATING VIDEO CLASSIFICATIONS USING MULTIMODAL VIDEO ANALYSIS

Methods, systems, and media for generating video classifications using multimodal video analysis are provided. In some embodiments, a method for classifying videos comprises: receiving, from a computing device, a video identifier; parsing a video associated with the video identifier into an audio portion and a plurality of image frames; analyzing the plurality of image frames using (i) an optical character recognition technique to obtain first textual information corresponding to text appearing in at least one of the image frames and (ii) an image classifier to obtain, for each of a plurality of objects appearing in at least one of the image frames, a probability that the object falls within an image class; concurrently with analyzing the image frames, analyzing the audio portion of the video using an automated speech recognition technique to obtain second textual information corresponding to words spoken in the video; combining the first textual information, the probability of each of the plurality of objects, and the second textual information to obtain a combined analysis output for the video; determining, using a neural network into which the combined analysis output is input, a safety score for each of a plurality of categories, indicating whether the video contains content belonging to that category; and, in response to receiving the video identifier, transmitting the plurality of safety scores corresponding to the plurality of categories to the computing device for the video associated with the video identifier.
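
The pipeline shape (frame analysis running concurrently with speech recognition, a combined feature output, then one score per category) can be sketched with the standard library. The OCR, classifier, and ASR callables are hypothetical stubs, frames are toy strings, and the per-category logistic model over bag-of-words and object-probability features is a deliberately minimal stand-in for the neural network the abstract mentions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence, Tuple


def analyze_video(frames: Sequence[str], audio: str,
                  ocr: Callable[[str], str],
                  classify: Callable[[Sequence[str]], Dict[str, float]],
                  asr: Callable[[str], str]) -> Tuple[str, Dict[str, float], str]:
    """Run frame analysis (OCR + classifier) concurrently with speech recognition."""
    def analyze_frames() -> Tuple[str, Dict[str, float]]:
        ocr_text = " ".join(ocr(f) for f in frames)
        return ocr_text, classify(frames)

    with ThreadPoolExecutor(max_workers=2) as pool:
        image_job = pool.submit(analyze_frames)
        audio_job = pool.submit(asr, audio)
        ocr_text, object_probs = image_job.result()
        speech_text = audio_job.result()
    return ocr_text, object_probs, speech_text


def combine(ocr_text: str, object_probs: Dict[str, float],
            speech_text: str) -> Dict[str, float]:
    """Combined analysis output: word counts plus object probabilities."""
    features: Dict[str, float] = {}
    for word in (ocr_text + " " + speech_text).lower().split():
        key = "word:" + word
        features[key] = features.get(key, 0.0) + 1.0
    for obj, p in object_probs.items():
        features["object:" + obj] = p
    return features


def safety_scores(features: Dict[str, float],
                  weights: Dict[str, Dict[str, float]],
                  bias: Dict[str, float]) -> Dict[str, float]:
    """One logistic unit per category (stand-in for the neural network)."""
    scores = {}
    for category, w in weights.items():
        z = bias.get(category, 0.0) + sum(w.get(k, 0.0) * v for k, v in features.items())
        scores[category] = 1.0 / (1.0 + math.exp(-z))
    return scores
```

Threads suit this shape because the real OCR, vision, and ASR calls would typically be I/O-bound requests to separate services; CPU-bound local models would call for processes instead.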