TRAINING METHOD FOR CHARACTER GENERATION MODEL, CHARACTER GENERATION METHOD, APPARATUS AND STORAGE MEDIUM

Provided are a training method for a character generation model, a character generation method, an apparatus, a device and a storage medium, which relate to the technical field of artificial intelligence, particularly the technical fields of computer vision and deep learning. The specific implementation scheme includes: a first training sample is acquired, and a target model is trained based on the first training sample to obtain a first character adversarial loss; a second training sample is acquired, and the target model is trained based on the second training sample to obtain a second character adversarial loss, a component classification loss and a style adversarial loss; and a parameter of the character generation model is adjusted according to the first character adversarial loss, the second character adversarial loss, the component classification loss and the style adversarial loss.
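
The abstract leaves the loss combination and model architecture unspecified; the following is a minimal sketch of the parameter-adjustment step, assuming a simple weighted sum of the four losses. The ToyCharacterGenerator, training_step, and the weights w are hypothetical illustrations, not the patented design.

```python
import torch
import torch.nn as nn

class ToyCharacterGenerator(nn.Module):
    """Stand-in for the character generation model (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, x):
        return self.net(x)

def training_step(optimizer, first_char_adv, second_char_adv,
                  component_cls, style_adv, w=(1.0, 1.0, 1.0, 1.0)):
    """Adjust parameters from the four losses; the weights are assumptions."""
    total = (w[0] * first_char_adv + w[1] * second_char_adv
             + w[2] * component_cls + w[3] * style_adv)
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()

if __name__ == "__main__":
    model = ToyCharacterGenerator()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    out = model(torch.randn(4, 1, 64, 64))      # batch of character images
    # Stand-in scalars; in the patent these come from discriminators/classifiers.
    l1, l2 = out.mean().abs(), out.var()
    l3, l4 = (out ** 2).mean(), out.std()
    print(training_step(opt, l1, l2, l3, l4))
```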

THREE-DIMENSIONAL LOCATION PREDICTION FROM IMAGES

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting three-dimensional object locations from images. One of the methods includes obtaining a sequence of images that comprises, at each of a plurality of time steps, a respective image that was captured by a camera at the time step; generating, for each image in the sequence, respective pseudo-lidar features of a respective pseudo-lidar representation of a region in the image that has been determined to depict a first object; generating, for a particular image at a particular time step in the sequence, image patch features of the region in the particular image that has been determined to depict the first object; and generating, from the respective pseudo-lidar features and the image patch features, a prediction that characterizes a location of the first object in a three-dimensional coordinate system at the particular time step in the sequence.
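
As an illustration of the fusion described above, the sketch below encodes per-time-step pseudo-lidar features with a recurrent network and concatenates them with image-patch features from the particular frame to regress a 3D location. The GRU encoder, the feature dimension, and the name LocationPredictor are assumptions rather than the patent's architecture.

```python
import torch
import torch.nn as nn

class LocationPredictor(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim * 2, 3)   # (x, y, z) in a 3D frame

    def forward(self, pseudo_lidar_seq, image_patch_feat):
        # pseudo_lidar_seq: (batch, time, feat_dim)
        # image_patch_feat: (batch, feat_dim)
        _, h = self.temporal(pseudo_lidar_seq)
        fused = torch.cat([h[-1], image_patch_feat], dim=-1)
        return self.head(fused)

if __name__ == "__main__":
    model = LocationPredictor()
    seq = torch.randn(2, 5, 128)     # pseudo-lidar features over 5 time steps
    patch = torch.randn(2, 128)      # patch features at the particular time step
    print(model(seq, patch).shape)   # torch.Size([2, 3])
```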

SYSTEM AND METHOD FOR TRACKING DETECTED OBJECTS

Systems and methods for tracking objects are disclosed herein. In one embodiment, a system having a processor merges features of detected objects extracted from a point cloud and a corresponding image to generate fused features for the detected objects, generates a learned distance metric for the detected objects using the fused features, determines matched and unmatched detected objects based on the distance metric, applies the tracking identifiers assigned to the detected objects at a prior time to the matched detected objects, determines a confidence score for the fused features of the unmatched detected objects, and applies new tracking identifiers to the unmatched detected objects based on the confidence score.
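
A rough sketch of the identifier-assignment step follows, with a precomputed cost matrix standing in for the learned distance metric and a Hungarian assignment for matching; the thresholds match_thresh and conf_thresh are hypothetical, as the abstract does not fix them.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_ids(cost, prior_ids, next_id, confidence,
               match_thresh=0.5, conf_thresh=0.7):
    """cost[i, j]: learned distance between prior track i and detection j."""
    rows, cols = linear_sum_assignment(cost)    # optimal one-to-one matching
    ids, matched = [None] * cost.shape[1], set()
    for i, j in zip(rows, cols):
        if cost[i, j] < match_thresh:           # matched: keep prior identifier
            ids[j] = prior_ids[i]
            matched.add(j)
    for j in range(cost.shape[1]):              # unmatched detections
        if j not in matched and confidence[j] >= conf_thresh:
            ids[j] = next_id                    # confident: start a new track
            next_id += 1
    return ids, next_id

if __name__ == "__main__":
    cost = np.array([[0.1, 0.9], [0.8, 0.2]])
    print(assign_ids(cost, prior_ids=[7, 8], next_id=9, confidence=[0.9, 0.95]))
    # ([7, 8], 9): both detections match prior tracks, so no new identifiers
```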

OBJECT DETECTION NETWORK AND METHOD

An object detection network includes: a hybrid voxel feature extractor configured to acquire a raw point cloud, extract a hybrid scale voxel feature from the raw point cloud, and project the hybrid scale voxel feature to generate a pseudo-image feature map; a backbone network configured to perform a hybrid voxel scale feature fusion by using the pseudo-image feature map to generate multi-class pyramid features; and a detection head configured to predict a three-dimensional object box of a corresponding class according to the multi-class pyramid features. The object detection network effectively addresses a trade-off inherent in a single voxel scale: a smaller voxel scale lengthens inference time, while a larger voxel scale cannot capture intricate features or accurately locate smaller objects. Different classes of 3D objects can be detected quickly and accurately in a 3D scene.
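
The pipeline shape can be illustrated with a two-scale toy backbone: voxel features at a fine and a coarse scale are fused on the pseudo-image plane and decoded by per-class heads. Channel counts, the two scales, and the 7-parameter box encoding are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class HybridVoxelBackbone(nn.Module):
    def __init__(self, in_ch=64, num_classes=3):
        super().__init__()
        self.fine = nn.Conv2d(in_ch, 64, 3, stride=1, padding=1)
        self.coarse = nn.Conv2d(in_ch, 64, 3, stride=2, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.heads = nn.ModuleList(
            nn.Conv2d(128, 7, 1) for _ in range(num_classes))  # 7-DoF box per class

    def forward(self, pseudo_image):
        f = self.fine(pseudo_image)                # fine-scale branch
        c = self.up(self.coarse(pseudo_image))     # coarse-scale branch, upsampled
        fused = torch.cat([f, c], dim=1)           # hybrid voxel scale fusion
        return [head(fused) for head in self.heads]

if __name__ == "__main__":
    net = HybridVoxelBackbone()
    boxes = net(torch.randn(1, 64, 128, 128))      # pseudo-image feature map
    print([b.shape for b in boxes])                # one box map per class
```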

COLOR REPRESENTATIONS FOR TEXTUAL PHRASES

Systems and methods for color representation are described. Embodiments of the inventive concept are configured to receive an attribute-object pair including a first term comprising an attribute label and a second term comprising an object label, encode the attribute-object pair to produce encoded features using a neural network that orders the first term and the second term based on the attribute label and the object label, and generate a color profile for the attribute-object pair based on the encoded features, wherein the color profile is based on a compositional relationship between the first term and the second term.
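
A minimal sketch of the ordered encoding and color-profile decoding, assuming integer token ids for the two terms and an RGB triple as the profile; the vocabulary, dimensions, and ColorProfileNet name are invented for illustration.

```python
import torch
import torch.nn as nn

class ColorProfileNet(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, 3), nn.Sigmoid())

    def forward(self, attribute_id, object_id):
        # Order the terms: attribute first, then object, as a fixed composition.
        pair = torch.stack([attribute_id, object_id], dim=1)
        _, h = self.encoder(self.embed(pair))
        return self.decoder(h[-1])                 # RGB color profile in [0, 1]

if __name__ == "__main__":
    net = ColorProfileNet()
    attr = torch.tensor([3])     # e.g. token for "deep"
    obj = torch.tensor([42])     # e.g. token for "ocean"
    print(net(attr, obj))        # one RGB triple for the attribute-object pair
```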

VIDEO OBJECT DETECTION AND TRACKING METHOD AND APPARATUS

A target object detection method and apparatus are provided, applicable to fields such as artificial intelligence, object tracking, object detection, and image processing. An object is detected in a frame image of a video comprising a plurality of frame images, based on a target template set that includes one or more target templates.
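
By way of illustration, template-based detection can be sketched as cross-correlating each frame with every template in the target template set and keeping the strongest response; real embodiments would use learned features, so the pixel-level matching below is an assumption.

```python
import torch
import torch.nn.functional as F

def detect(frame, templates):
    """frame: (1, 1, H, W); templates: list of (1, 1, h, w) tensors."""
    best = None
    for t in templates:
        resp = F.conv2d(frame, t)                 # cross-correlation response map
        score, idx = resp.flatten().max(dim=0)
        y, x = divmod(idx.item(), resp.shape[-1])
        if best is None or score > best[0]:
            best = (score, (y, x))                # best-matching location so far
    return best

if __name__ == "__main__":
    frame = torch.randn(1, 1, 64, 64)
    templates = [frame[:, :, 10:18, 20:28].clone()]   # template set of one
    print(detect(frame, templates))                   # peak near (10, 20)
```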

Video synthesis method, model training method, device, and storage medium

Embodiments of this application disclose methods, systems, and devices for video synthesis. In one aspect, a method comprises obtaining a plurality of frames corresponding to source image information of a first to-be-synthesized video, each frame of the source image information comprising a source image and a corresponding source motion key point. The method also comprises obtaining a plurality of frames corresponding to target image information of a second to-be-synthesized video. For each frame of the plurality of frames corresponding to the target image information of the second to-be-synthesized video, the method comprises fusing a respective source image from the first to-be-synthesized video, a corresponding source motion key point, and a respective target motion key point corresponding to the frame using a pre-trained video synthesis model, and generating a respective output image in accordance with the fusing. The method further comprises repeating the fusing and the generating steps for the second to-be-synthesized video to produce a synthesized video.
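
The per-frame fusion loop might be sketched as follows, with a toy synthesis model that conditions a source image on its own motion key points and those of each target frame; the key-point count, architecture, and ToySynthesisModel name are all assumptions.

```python
import torch
import torch.nn as nn

class ToySynthesisModel(nn.Module):
    def __init__(self, num_kp=10):
        super().__init__()
        self.kp_proj = nn.Linear(num_kp * 2 * 2, 16)   # source + target key points
        self.conv = nn.Conv2d(3 + 16, 3, 3, padding=1)

    def forward(self, src_img, src_kp, tgt_kp):
        b, _, h, w = src_img.shape
        kp = torch.cat([src_kp, tgt_kp], dim=1).flatten(1)
        kp_map = self.kp_proj(kp)[:, :, None, None].expand(b, 16, h, w)
        return self.conv(torch.cat([src_img, kp_map], dim=1))  # fused output image

if __name__ == "__main__":
    model = ToySynthesisModel()
    src_img = torch.randn(1, 3, 64, 64)                # one source frame
    src_kp = torch.randn(1, 10, 2)                     # its motion key points
    frames = [model(src_img, src_kp, torch.randn(1, 10, 2))  # per target frame
              for _ in range(8)]
    print(len(frames), frames[0].shape)                # 8 output images
```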

Cognitive classification of workload behaviors in multi-tenant cloud computing environments

One embodiment provides a method comprising receiving data relating to a tenant utilizing a cloud computing environment, and determining one or more classifications for a variation in current workload resource consumption of the tenant based on the data. The current workload resource consumption is indicative of current usage of one or more computing resources of the cloud computing environment.
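
Assuming the telemetry has already been reduced to per-resource utilization figures, the classification step could be sketched with simple ratio thresholds; the labels and thresholds below are illustrative, as the abstract leaves the classifier open.

```python
def classify_variation(current, baseline, spike_ratio=1.5, idle_ratio=0.5):
    """Return one or more labels for a tenant's consumption variation."""
    labels = []
    for resource in current:                  # e.g. "cpu", "memory", "io"
        ratio = current[resource] / max(baseline[resource], 1e-9)
        if ratio >= spike_ratio:
            labels.append(f"{resource}:spike")
        elif ratio <= idle_ratio:
            labels.append(f"{resource}:underuse")
        else:
            labels.append(f"{resource}:steady")
    return labels

if __name__ == "__main__":
    current = {"cpu": 0.92, "memory": 0.40, "io": 0.05}
    baseline = {"cpu": 0.50, "memory": 0.45, "io": 0.30}
    print(classify_variation(current, baseline))
    # ['cpu:spike', 'memory:steady', 'io:underuse']
```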

Multisensor data fusion method and apparatus to obtain static and dynamic environment features

A multisensor data fusion perception method includes receiving feature data from a plurality of types of sensors, obtaining static feature data and dynamic feature data from the feature data, constructing current static environment information based on the static feature data and reference dynamic target information, and constructing current dynamic target information based on the dynamic feature data and reference static environment information, such that the construction of a dynamic target and the construction of a static environment each refer to the other's construction results, improving the perception capability for both the dynamic target and the static environment in which the moving carrier is located.
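
The mutual-reference idea can be sketched on a toy occupancy grid: the static map is built while excluding cells claimed by the previous dynamic targets, and dynamic targets are extracted relative to the previous static map. The grid cells and dictionaries are invented stand-ins for the sensors' feature data.

```python
def build_static(static_feats, ref_dynamic):
    # Drop static features in cells claimed by reference dynamic targets.
    occupied = {cell for target in ref_dynamic for cell in target["cells"]}
    return {cell: v for cell, v in static_feats.items() if cell not in occupied}

def build_dynamic(dynamic_feats, ref_static):
    # Features observed outside the known static structure become dynamic targets.
    return [{"cells": [cell], "velocity": v}
            for cell, v in dynamic_feats.items() if cell not in ref_static]

if __name__ == "__main__":
    static_feats = {(0, 0): "wall", (0, 1): "wall", (2, 3): "curb"}
    dynamic_feats = {(0, 1): 0.0, (5, 5): 4.2}
    ref_dynamic = [{"cells": [(2, 3)]}]               # previous dynamic result
    static_map = build_static(static_feats, ref_dynamic)
    print(static_map)                    # {(0, 0): 'wall', (0, 1): 'wall'}
    print(build_dynamic(dynamic_feats, static_map))
    # [{'cells': [(5, 5)], 'velocity': 4.2}]
```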

Temporally distributed neural networks for video semantic segmentation

A Video Semantic Segmentation System (VSSS) is disclosed that performs accurate and fast semantic segmentation of videos using a set of temporally distributed neural networks. The VSSS receives as input a video signal comprising a contiguous sequence of temporally-related video frames. The VSSS extracts features from the video frames in the contiguous sequence and, based upon the extracted features, selects, from a set of labels, a label to be associated with each pixel of each video frame in the video signal. In certain embodiments, a set of multiple neural networks is used to extract the features for video segmentation, and the extraction of features is distributed among the multiple neural networks in the set. A strong feature representation representing the entirety of the features is produced for each video frame in the sequence by aggregating the output features extracted by the multiple neural networks.
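
One way to picture the temporal distribution is a round-robin of lightweight sub-networks, where the strong representation for the current frame aggregates the most recent output of every sub-network; the round-robin schedule, summation aggregation, and toy layers below are assumptions, not the disclosed design.

```python
import torch
import torch.nn as nn

class TemporallyDistributedSegmenter(nn.Module):
    def __init__(self, num_subnets=4, num_labels=19):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Conv2d(3, 32, 3, padding=1) for _ in range(num_subnets))
        self.classifier = nn.Conv2d(32, num_labels, 1)
        self.cache = [None] * num_subnets          # last feature from each subnet

    def forward(self, frame, t):
        i = t % len(self.subnets)                  # one sub-network per frame
        self.cache[i] = self.subnets[i](frame)
        feats = [f for f in self.cache if f is not None]
        strong = torch.stack(feats).sum(dim=0)     # aggregate strong representation
        return self.classifier(strong).argmax(dim=1)   # per-pixel label selection

if __name__ == "__main__":
    model = TemporallyDistributedSegmenter()
    video = torch.randn(8, 1, 3, 64, 64)           # 8 temporally-related frames
    masks = [model(video[t], t) for t in range(8)]
    print(masks[0].shape)                          # torch.Size([1, 64, 64])
```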