G06V30/274

Floorplan generation based on room scanning

Various implementations disclosed herein include devices, systems, and methods that generate floorplans and measurements using a three-dimensional (3D) representation of a physical environment generated based on sensor data.

Apparatus for generating annotated image information using multimodal input data, apparatus for training an artificial intelligence model using annotated image information, and methods thereof
11694021 · 2023-07-04

A method for providing a user interface (UI) for generating training data for an artificial intelligence (AI) model may include providing, for display via the UI, image information that depicts an object, a set of operations of the object, and a process associated with the set of operations. The method may include providing, for display via the UI, text information that describes the object, the set of operations of the object, and the process associated with the set of operations. The method may include receiving, via the UI, a user input that associates respective image information of the image information with corresponding text information of the text information. The method may include generating association information that associates the respective image information with the corresponding text information, based on the user input. The method may include generating discourse and semantic information from the text information associated with the image information.
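The association step described above can be sketched in Python. This is an illustrative reading only: the record structure, the id-keyed item dictionaries, and the `(image_id, text_id)` pair format for user input are all assumptions, not the disclosed implementation.

```python
def build_association_info(image_items, text_items, user_links):
    """Build association records from user-selected (image, text) pairs.

    image_items / text_items: dicts keyed by id; user_links: iterable of
    (image_id, text_id) tuples received via the UI. The record structure
    here is an illustrative assumption, not the patented format.
    """
    associations = []
    for img_id, txt_id in user_links:
        associations.append({
            "image": image_items[img_id],
            "text": text_items[txt_id],
        })
    return associations


# Example: the user links one image of an operation to its description.
info = build_association_info(
    {"i1": "pump photo"},
    {"t1": "start the pump"},
    [("i1", "t1")],
)
```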

Automatic identification of misleading videos using a computer network

Machine-based video classifying to identify misleading videos by training a model using a misleading video corpus, obtaining a subject video from a content server, generating respective feature vectors of a title, a thumbnail, a description, and a content of the subject video, determining first semantic similarities between pairs of the feature vectors, determining a second semantic similarity between the title of the subject video and titles of videos in the misleading video corpus in a same domain as the subject video, determining a third semantic similarity between comments of the subject video and comments of videos in the misleading video corpus in the same domain as the subject video, classifying the subject video using the model and based on the first semantic similarities, the second semantic similarity, and the third semantic similarity, and outputting the classification of the subject video to a user.
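The three similarity signals in this claim can be sketched as follows. This is a minimal illustration, assuming cosine similarity over precomputed feature vectors and a generic callable as the trained model; feature extraction and the model itself are placeholders, not the patented implementation.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify_video(features, corpus_titles, corpus_comments, model):
    """Assemble the three similarity signals and hand them to a model.

    features: dict of feature vectors for the subject video; corpus_titles /
    corpus_comments: feature vectors from misleading-corpus videos in the
    same domain as the subject video.
    """
    vecs = [features[k] for k in ("title", "thumbnail", "description", "content")]
    # First: pairwise semantic similarities among the subject video's own features.
    first = [cosine_sim(vecs[i], vecs[j])
             for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    # Second: subject title vs. titles of same-domain corpus videos.
    second = max(cosine_sim(features["title"], t) for t in corpus_titles)
    # Third: subject comments vs. comments of same-domain corpus videos.
    third = max(cosine_sim(features["comments"], c) for c in corpus_comments)
    return model(first + [second, third])
```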

Intent detection with a computing device

A method can perform a process including capturing an image, determining an environment in which a user is operating a computing device, detecting a hand gesture based on an object in the image, determining, using a machine-learned model, an intent of the user based on the hand gesture and the environment, and executing a task based at least on the determined intent.
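The claimed flow can be sketched as a small pipeline. Every component model here is a stand-in callable, not an actual trained network, and the task-table dispatch is an illustrative assumption.

```python
def detect_intent(image, environment_classifier, gesture_detector, intent_model):
    """Pipeline sketch of the claimed steps."""
    environment = environment_classifier(image)  # e.g. "kitchen"
    gesture = gesture_detector(image)            # e.g. "point"
    # The machine-learned model maps (gesture, environment) to an intent.
    return intent_model(gesture, environment)

def execute_task(intent, task_table):
    """Execute a task chosen by the determined intent (no-op fallback)."""
    return task_table.get(intent, lambda: None)()
```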

Analytics system onboarding of web content

Analytics system onboarding of web content is described. In one example, an analytics onboarding system is configured to process web content to generate recommendations, automatically and without user intervention. The recommendations are configured to assist in mapping of web content variables in web content to data elements supported by an analytics system to generate metrics that describe occurrence of events as part of user interaction with web content.
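The variable-to-data-element mapping recommendation can be illustrated with a simple name-similarity heuristic. The actual matching signal used by the described system is not specified in the abstract; string similarity here is purely a stand-in.

```python
from difflib import SequenceMatcher

def recommend_mappings(web_vars, data_elements, threshold=0.5):
    """Recommend, for each web content variable, the analytics data
    element with the most similar name, or None below the threshold."""
    recommendations = {}
    for var in web_vars:
        best, best_score = None, 0.0
        for element in data_elements:
            score = SequenceMatcher(None, var.lower(), element.lower()).ratio()
            if score > best_score:
                best, best_score = element, score
        recommendations[var] = best if best_score >= threshold else None
    return recommendations
```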

TRAINING METHOD OF TEXT RECOGNITION MODEL, TEXT RECOGNITION METHOD, AND APPARATUS

The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, and specifically to the technical fields of deep learning and computer vision, which can be applied in scenarios such as optical character recognition. The specific implementation solution is: performing mask prediction on visual features of an acquired sample image to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text to obtain a predicted semantic feature, where the sample image includes text; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; and training, according to the first loss value and the second loss value, to obtain the text recognition model.
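The two-branch masked-prediction objective can be sketched as a single training step. Zero-masking, mean-squared error, and summing the two losses are simplifying assumptions for illustration; the disclosure does not fix these choices.

```python
def mask_positions(features, mask_idx):
    """Zero out the masked positions (a simple stand-in for masking)."""
    return [0.0 if i in mask_idx else v for i, v in enumerate(features)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def training_step(visual_feats, semantic_feats, visual_predictor,
                  semantic_predictor, mask_idx):
    """One step of the two-branch objective: each branch predicts its
    masked features back and is scored against the originals; the model
    would be trained on the combined loss (summed here)."""
    first_loss = mse(visual_predictor(mask_positions(visual_feats, mask_idx)),
                     visual_feats)
    second_loss = mse(semantic_predictor(mask_positions(semantic_feats, mask_idx)),
                      semantic_feats)
    return first_loss + second_loss
```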

Extraction of genealogy data from obituaries

Systems, methods, and other techniques for extracting data from obituaries are provided. In some embodiments, an obituary containing a plurality of words is received. Using a machine learning model, an entity tag from a set of entity tags may be assigned to each of one or more words of the plurality of words. Each particular tag from the set of entity tags may include a relationship component and a category component. The relationship component may indicate a relationship between a particular word and the deceased individual. The category component may indicate a categorization of the particular word to a particular category from a set of categories. The extracted data may be stored in a genealogical database.
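The two-component tag scheme can be illustrated as follows. The `relationship:category` string format and the `"O"` outside tag are illustrative assumptions borrowed from common sequence-labeling conventions, not the disclosed encoding.

```python
def parse_entity_tag(tag):
    """Split a composite tag into its relationship and category components."""
    relationship, category = tag.split(":", 1)
    return relationship, category

def extract_records(words, tags, outside_tag="O"):
    """Turn per-word model tags into records for a genealogical database."""
    records = []
    for word, tag in zip(words, tags):
        if tag == outside_tag:  # word carries no genealogical data
            continue
        relationship, category = parse_entity_tag(tag)
        records.append({"relationship": relationship,
                        "category": category,
                        "text": word})
    return records
```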

APPLICATION-SPECIFIC OPTICAL CHARACTER RECOGNITION CUSTOMIZATION

A method for customizing an optical character recognition system is disclosed. The optical character recognition system includes a general-purpose decoder configured to convert character images, recognized in a digital image, into text based on a general-purpose text structure. An application-specific customization is received. The application-specific customization includes an application-specific text structure that differs from the general-purpose text structure. A customized model is generated based on the application-specific customization. An enhanced application-specific decoder is generated by modifying the general-purpose decoder to, during run-time execution of the optical character recognition system, leverage the customized model to convert character images demonstrating the application-specific text structure into text.
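One way to picture an application-specific text structure constraining a general-purpose decoder is backtracking over ranked per-character candidates until the full string fits the structure. Modeling the structure as a regex and the decoder output as candidate lists is an assumption for illustration, not the patented decoder.

```python
import re

def decode_with_structure(candidate_lists, pattern):
    """Return the first full string, built from per-character candidates
    ordered best-first, that matches the application-specific structure
    (modeled here as a regular expression). Greedy backtracking sketch."""
    def search(prefix, remaining):
        if not remaining:
            return prefix if re.fullmatch(pattern, prefix) else None
        for candidate in remaining[0]:
            result = search(prefix + candidate, remaining[1:])
            if result is not None:
                return result
        return None
    return search("", candidate_lists)
```

For example, an all-digit structure resolves the classic O/0 and I/1 confusions that a general-purpose decoder would otherwise rank by appearance alone.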

Method and apparatus for generating training data for VQA system, and medium

Embodiments of the present disclosure are directed to a method and an apparatus for generating training data for a visual question answering (VQA) system, and a computer readable medium. The method for generating training data for a visual question answering system includes: obtaining a first set of training data of the visual question answering system, the first set of training data comprising a first question for an image in the visual question answering system and a first answer corresponding to the first question; obtaining information related to the image; and generating a second question corresponding to the first answer based on the information to obtain a second set of training data for the image in the visual question answering system, the second set of training data comprising the second question and the first answer.
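The augmentation idea, deriving a second question that shares the first answer, can be sketched with templates. Template-based generation and attribute-value image information are illustrative stand-ins for the disclosed method.

```python
def generate_second_question(first_answer, image_info,
                             template="What is the {attr} in the image?"):
    """Find an image attribute whose value equals the first answer and
    rephrase it as a new question with that same answer."""
    for attr, value in image_info.items():
        if value == first_answer:
            return template.format(attr=attr)
    return None

def augment_training_data(first_set, image_info):
    """first_set: (first_question, first_answer) -> second training set."""
    _, answer = first_set
    question = generate_second_question(answer, image_info)
    return (question, answer) if question else None
```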

PERSONALLY IDENTIFIABLE INFORMATION REMOVAL BASED ON PRIVATE AREA LOGIC
20220382903 · 2022-12-01

Removal of PII is provided. Sensor data is captured using sensors of a vehicle. Object detection is performed on the sensor data to create a semantic labeling of objects in the sensor data. A model is utilized to classify regions of the sensor data with a public or private labeling according to the semantic labeling and a PII filter corresponding to a jurisdiction of a current location of the vehicle. The sensor data is utilized in accordance with the public or private labeling.
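The jurisdiction-dependent labeling can be sketched as follows. Modeling a PII filter as a set of semantic labels considered private, and the region/label names themselves, are illustrative assumptions.

```python
def label_regions(regions, pii_filters, jurisdiction):
    """Label detected regions public/private using the PII filter for the
    vehicle's current jurisdiction."""
    private_labels = pii_filters[jurisdiction]
    labeled = []
    for region in regions:
        visibility = ("private" if region["semantic_label"] in private_labels
                      else "public")
        labeled.append({**region, "visibility": visibility})
    return labeled

def usable_regions(labeled):
    """Utilize only the data labeled public (e.g. for map building)."""
    return [r for r in labeled if r["visibility"] == "public"]
```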