
REFORM INPUT IN FLOW EXECUTION
20220382994 · 2022-12-01

Systems and processes for operating an intelligent automated assistant are provided. An example method includes, at an electronic device having one or more processors and memory, receiving an utterance including a user request, determining a natural language representation of the user request, determining a first software process associated with the natural language representation, determining whether the natural language representation can be executed by a task flow of the first software process, and in accordance with a determination that the natural language representation cannot be executed by the task flow of the first software process: determining a set of transformation instructions, determining a revised natural language representation using the set of transformation instructions, and providing the revised natural language representation to a second software process.
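The fallback flow above can be sketched in a few lines. This is a minimal illustration, not the patented implementation; the slot tables, the transformation map, and all names (`TASK_FLOW_SLOTS`, `TRANSFORMS`, `can_execute`, `revise`) are hypothetical stand-ins for the natural language representation, the task flow check, and the transformation instructions.

```python
# Slots the first software process's task flow can execute (hypothetical).
TASK_FLOW_SLOTS = {"play_music": {"artist", "song"}}

# Transformation instructions: rename slots into the form a second
# software process expects (hypothetical mapping).
TRANSFORMS = {"song": "track_title", "artist": "performer"}

def can_execute(intent, slots):
    """Return True if the first process's task flow covers every slot."""
    known = TASK_FLOW_SLOTS.get(intent)
    return known is not None and set(slots) <= known

def revise(slots):
    """Apply the transformation instructions to produce a revised representation."""
    return {TRANSFORMS.get(k, k): v for k, v in slots.items()}

request = {"intent": "play_music",
           "slots": {"artist": "Holst", "album": "The Planets"}}
if not can_execute(request["intent"], request["slots"]):
    revised = revise(request["slots"])  # handed to the second software process
```

Here the `album` slot is outside the first task flow's coverage, so the representation is revised and rerouted rather than failing outright.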

Sensor based semantic object generation

Provided are methods, systems, and devices for generating semantic objects and an output based on the detection or recognition of the state of an environment that includes objects. State data, based in part on sensor output, can be received from one or more sensors that detect a state of an environment including objects. Based in part on the state data, semantic objects are generated. The semantic objects can correspond to the objects and include a set of attributes. Based in part on the set of attributes of the semantic objects, one or more operating modes associated with the semantic objects can be determined. Based in part on the one or more operating modes, object outputs associated with the semantic objects can be generated. The object outputs can include one or more visual indications or one or more audio indications.
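The pipeline described (state data → semantic objects → operating mode → output) can be sketched as follows. Everything here is an invented toy: the attribute set, the distance threshold, and the mode names are assumptions, not the disclosed system.

```python
def to_semantic_objects(state_data):
    """Generate semantic objects (dicts of attributes) from raw state data."""
    return [{"label": d["label"], "distance_m": d["distance_m"]} for d in state_data]

def operating_mode(obj):
    """Pick an operating mode from the object's attributes (toy rule)."""
    return "alert" if obj["distance_m"] < 1.0 else "ambient"

def object_output(obj):
    """Generate a visual or audio indication based on the operating mode."""
    if operating_mode(obj) == "alert":
        return {"audio": f"{obj['label']} nearby"}
    return {"visual": obj["label"]}

state = [{"label": "door", "distance_m": 0.5},
         {"label": "chair", "distance_m": 3.0}]
outputs = [object_output(o) for o in to_semantic_objects(state)]
```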

Method and apparatus for determining user intent

The disclosed embodiments describe methods, systems, and apparatuses for determining user intent. A method is disclosed comprising: obtaining a session text of a user; calculating a feature vector based on the session text; determining probabilities that the session text belongs to a plurality of intent labels, the probabilities calculated using a multi-level hierarchical intent classification model, the intent labels being assigned to levels in the multi-level hierarchical intent classification model; and assigning a user intent to the session text based on the probabilities.
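One common way to score leaves in a multi-level hierarchy is to multiply the probabilities along each path and pick the best-scoring leaf. The sketch below assumes that scheme; the two-level tree, the label names, and the probability values are invented placeholders for real model outputs.

```python
# P(top-level intent | text) and P(leaf | top-level, text) — stand-ins for
# what a trained model would compute from the session text's feature vector.
TOP = {"shopping": 0.7, "support": 0.3}
LEAF = {
    "shopping": {"buy": 0.8, "browse": 0.2},
    "support": {"refund": 0.9, "complaint": 0.1},
}

def assign_intent():
    """Score every leaf by the product of probabilities along its path."""
    scores = {
        (top, leaf): p_top * p_leaf
        for top, p_top in TOP.items()
        for leaf, p_leaf in LEAF[top].items()
    }
    return max(scores, key=scores.get)

best = assign_intent()
```

Note that the path product can prefer a leaf under a weaker top-level intent when the leaf probability is concentrated, which is one reason to score full paths rather than greedily descending level by level.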

Teaching GAN (generative adversarial networks) to generate per-pixel annotation

A method and apparatus for joint image and per-pixel annotation synthesis with a generative adversarial network (GAN) are provided. The method includes: inputting data to the GAN and obtaining a first image from the GAN; inputting, to a decoder, a first feature value that is obtained from at least one intermediate layer of the GAN according to the inputting of the data to the GAN; and obtaining a first semantic segmentation mask from the decoder according to the inputting of the first feature value to the decoder.
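The decoder step can be pictured as a per-pixel classifier over intermediate-layer features. The toy below fabricates a 2×2 grid of 3-dimensional "intermediate layer" features and a linear decoder; none of this is the actual GAN or decoder architecture, only an illustration of features-in, mask-out.

```python
# 2x2 grid of 3-dim feature vectors, standing in for an intermediate layer.
features = [
    [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]],
    [[0.1, 0.2, 0.9], [0.7, 0.1, 0.3]],
]

# Linear "decoder": one weight vector per semantic class (invented values).
weights = {"background": [1.0, 0.0, 0.0], "object": [0.0, 1.0, 1.0]}

def decode(feat):
    """Pick the class whose weight vector gives the largest dot product."""
    return max(weights, key=lambda c: sum(a * b for a, b in zip(weights[c], feat)))

# The per-pixel decisions form the semantic segmentation mask.
mask = [[decode(f) for f in row] for row in features]
```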

Learning apparatus, operation program of learning apparatus, and operation method of learning apparatus
11594056 · 2023-02-28

A learning apparatus learns a machine learning model for performing semantic segmentation of determining a plurality of classes in an input image in units of pixels by extracting, for each layer, features which are included in the input image and have different frequency bands of spatial frequencies. A learning data analysis unit analyzes the frequency bands included in an annotation image of learning data. A learning method determination unit determines a learning method using the learning data based on an analysis result of the frequency bands by the learning data analysis unit. A learning unit learns the machine learning model via the determined learning method using the learning data.
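The analysis-then-selection step can be sketched crudely: estimate how much high-frequency content an annotation image contains and branch on it. Neighbour differences below are only a rough proxy for a spatial-frequency analysis, and the threshold and method names are invented.

```python
def high_freq_energy(image):
    """Mean absolute difference between horizontally adjacent pixels —
    a crude stand-in for measuring high spatial-frequency content."""
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)

def choose_learning_method(annotation):
    """Fine-detail annotations select a (hypothetical) fine-scale path."""
    if high_freq_energy(annotation) > 0.25:
        return "fine_scale_training"
    return "coarse_scale_training"

smooth = [[0, 0, 0, 0], [0, 0, 0, 0]]    # one flat region
detailed = [[0, 1, 0, 1], [1, 0, 1, 0]]  # pixel-level alternation
```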

Self-supervised hierarchical motion learning for video action recognition

There are numerous features in video that can be detected using computer-based systems, such as objects and/or motion. The detection of these features, and in particular the detection of motion, has many useful applications, such as action recognition, activity detection, object tracking, etc. The present disclosure provides a neural network that learns motion from unlabeled video frames. In particular, the neural network uses the unlabeled video frames to perform self-supervised hierarchical motion learning. The present disclosure also describes how the learned motion can be used in video action recognition.
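The "hierarchical" idea can be illustrated with the simplest possible motion signal: frame differencing, applied once for motion and again for change in motion. This is only an intuition pump under that assumption; the disclosed network learns motion representations rather than computing differences.

```python
def frame_diffs(frames):
    """Element-wise differences between consecutive frames."""
    return [[b - a for a, b in zip(f1, f2)] for f1, f2 in zip(frames, frames[1:])]

frames = [[0, 0, 0], [1, 0, 0], [3, 0, 0]]  # a toy 3-pixel "video"
level1 = frame_diffs(frames)   # first-order motion between frames
level2 = frame_diffs(level1)   # second level: change in motion
```

No labels were needed to compute either level, which is the sense in which motion structure can be extracted from unlabeled video.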

Method and apparatus for generating context information

A memory stores therein a document and a plurality of word vectors that are word embeddings respectively computed for a plurality of words. A processor extracts from the document, with respect to one of the words, two or more surrounding words within a prescribed range of one occurrence position where the one word occurs, and computes a sum vector by adding the word vectors corresponding to the surrounding words. The processor determines a parameter so as to predict the surrounding words from the sum vector and the parameter, using a machine learning model. The processor stores the parameter as context information for the one occurrence position, in association with the word vector corresponding to the one word.
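The surrounding-word and sum-vector steps are concrete enough to sketch directly. The vocabulary, the 2-dimensional vectors, and the window size below are illustrative; the parameter-fitting step is omitted.

```python
# Toy word embeddings (the stored word vectors).
WORD_VECTORS = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [0.5, 0.5]}

def sum_vector(document, position, window=1):
    """Add the vectors of words within `window` of `position`, excluding
    the occurrence position itself."""
    lo, hi = max(0, position - window), min(len(document), position + window + 1)
    total = [0.0, 0.0]
    for i in range(lo, hi):
        if i == position:
            continue
        total = [a + b for a, b in zip(total, WORD_VECTORS[document[i]])]
    return total

doc = ["the", "cat", "sat"]
ctx = sum_vector(doc, 1)  # neighbours of this occurrence of "cat"
```

Because the sum is taken per occurrence position, the same word gets different context information at different positions in the document.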

Method for generating tag of video, electronic device, and storage medium

A method for generating a tag of a video, an electronic device, and a storage medium relate to the fields of natural language processing and deep learning. The detailed implementing solution includes: obtaining multiple candidate tags and video information of the video; determining first correlation information between the video information and each of the multiple candidate tags; sorting the multiple candidate tags based on the first correlation information to obtain a sort result; and generating the tag of the video based on the sort result.
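The correlate-then-sort steps can be sketched with cosine similarity over toy feature vectors standing in for the "first correlation information". The vectors and tag names are invented; a real system would derive them from the video and tag text.

```python
import math

def cosine(a, b):
    """Cosine similarity, a stand-in correlation measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

video_info = [0.9, 0.1, 0.0]
candidates = {"cooking": [1.0, 0.0, 0.0],
              "travel": [0.0, 1.0, 0.0],
              "vlog": [0.5, 0.5, 0.0]}

def rank_tags():
    """Sort candidate tags by correlation with the video information."""
    return sorted(candidates, key=lambda t: cosine(video_info, candidates[t]),
                  reverse=True)

tags = rank_tags()
top_tag = tags[0]  # the generated tag, per the sort result
```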

System and method for synthetic image generation with localized editing

Embodiments described herein provide a system for generating synthetic images with localized editing. During operation, the system obtains a source image and a target image for image synthesis and selects a semantic element from the source image. The semantic element indicates a semantically meaningful part of an object depicted in the source image. The system then determines the style information associated with the source and target images. Subsequently, the system generates a synthetic image by transferring the style of the semantic element from the source image to the target image based on their feature representations. In this way, the system can facilitate localized editing of the target image.
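The "localized" part can be illustrated with the crudest possible style statistic: inside a mask covering the selected semantic element, shift target pixels so their mean matches the source region's mean, leaving pixels outside the mask untouched. This mean-matching is an assumed stand-in for the system's style information, not the disclosed method.

```python
def region_mean(img, mask):
    """Mean of the pixels where the mask is set."""
    vals = [v for i, row in enumerate(img) for j, v in enumerate(row) if mask[i][j]]
    return sum(vals) / len(vals)

def transfer_style(source, target, mask):
    """Edit only the masked region of the target; leave the rest intact."""
    shift = region_mean(source, mask) - region_mean(target, mask)
    return [
        [v + shift if mask[i][j] else v for j, v in enumerate(row)]
        for i, row in enumerate(target)
    ]

mask = [[1, 0], [1, 0]]       # the selected semantic element's extent
source = [[10, 0], [10, 0]]
target = [[2, 5], [4, 5]]
edited = transfer_style(source, target, mask)
```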

DETERMINING OBJECT MOBILITY PARAMETERS USING AN OBJECT SEQUENCE
20230057118 · 2023-02-23

A system can use semantic images, lidar images, and/or 3D bounding boxes to determine mobility parameters for objects in the semantic image. In some cases, the system can generate virtual points for an object in a semantic image and associate the virtual points with lidar points to form denser point clouds for the object. The denser point clouds can be used to estimate the mobility parameters for the object. In certain cases, the system can use semantic images, lidar images, and/or 3D bounding boxes to determine an object sequence for an object. The object sequence can indicate a location of the object at different times. The system can use the object sequence to estimate the mobility parameters for the object.
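The densification step can be sketched with a nearest-neighbour association: each virtual point (which has an image-plane position but no depth) borrows the depth of its closest lidar point. The point data below is fabricated, and real systems would project and associate in a calibrated 3D frame rather than this toy 2D distance.

```python
lidar = [(0.0, 0.0, 5.0), (2.0, 0.0, 7.0)]  # (x, y, depth) lidar returns
virtual = [(0.2, 0.1), (1.8, -0.1)]         # virtual points, no depth yet

def densify(virtual_pts, lidar_pts):
    """Assign each virtual point the depth of the closest lidar point,
    yielding extra 3D points for the object."""
    def nearest_depth(vx, vy):
        return min(lidar_pts, key=lambda p: (p[0] - vx) ** 2 + (p[1] - vy) ** 2)[2]
    return [(vx, vy, nearest_depth(vx, vy)) for vx, vy in virtual_pts]

dense = lidar + densify(virtual, lidar)  # denser cloud to estimate mobility from
```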