G06V30/262

GEOGRAPHIC MANAGEMENT OF DOCUMENT CONTENT
20230215207 · 2023-07-06

Methods and systems are provided to manage documents and extract information from documents by defining segments in each document, each of which is assigned a location in a coordinate system defined over a collection of documents. Metadata is attached to each segment to describe the contents, position, and semantic meaning of material within the segment. A segmenting-specific query language can be used to query the segments and respond to requests for information contained in the documents.
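The segment model in this abstract can be pictured with a small sketch. Everything below is a hypothetical illustration, not the claimed system: the `Segment` class, its field names, and the `query_segments` helper are invented here, and a plain filter function stands in for the query language, whose syntax the abstract does not give. Each segment carries a bounding box in one coordinate system shared by the whole collection, plus free-form metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A document segment placed in a coordinate system shared by all documents."""
    doc_id: str
    x0: float
    y0: float
    x1: float
    y1: float
    metadata: dict = field(default_factory=dict)

    def intersects(self, x0, y0, x1, y1):
        # Standard axis-aligned rectangle overlap test (touching edges do not count).
        return self.x0 < x1 and x0 < self.x1 and self.y0 < y1 and y0 < self.y1


def query_segments(segments, region=None, **metadata_filters):
    """Return segments overlapping `region` whose metadata matches every filter."""
    results = []
    for seg in segments:
        if region is not None and not seg.intersects(*region):
            continue
        if all(seg.metadata.get(k) == v for k, v in metadata_filters.items()):
            results.append(seg)
    return results


# Hypothetical collection: the second document is placed below the first
# (y offset 100) so both live in the same coordinate system.
segments = [
    Segment("invoice-1", 0, 0, 100, 20, {"kind": "header", "text": "Invoice"}),
    Segment("invoice-1", 0, 20, 100, 80, {"kind": "table", "text": "Line items"}),
    Segment("invoice-2", 0, 100, 100, 120, {"kind": "header", "text": "Invoice"}),
]

headers_in_first_doc = query_segments(segments, region=(0, 0, 100, 100), kind="header")
```

Here the region restricts the search to the area occupied by the first document, and the `kind` filter selects on metadata, so the two criteria (position and meaning) combine in one request.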

Floorplan generation based on room scanning

Various implementations disclosed herein include devices, systems, and methods that generate floorplans and measurements using a three-dimensional (3D) representation of a physical environment generated based on sensor data.

Optical character recognition method and apparatus, electronic device and storage medium

The present application discloses a method and an apparatus for optical character recognition, an electronic device, and a storage medium, and relates to the fields of artificial intelligence and deep learning. The method may include: determining, for a to-be-recognized image, a text bounding box of a text area in the image, and extracting a text area image from the to-be-recognized image according to the text bounding box; determining a bounding box of text lines in the text area image, and extracting a text-line image from the text area image according to the bounding box; and performing text sequence recognition on the text-line image to obtain a recognition result. The solution can improve recognition speed, among other benefits.
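The three stages in this abstract (text area, then text lines, then sequence recognition) can be sketched as a coarse-to-fine pipeline. This is a minimal illustration under stated assumptions: the function names are hypothetical, the "image" is a grid of characters with `.` as background, and the three `toy_*` functions are trivial stand-ins for the learned detectors and recognizer, which the abstract does not specify.

```python
def crop(image, box):
    """Crop a list-of-rows image to box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def recognize(image, detect_text_area, detect_text_lines, recognize_line):
    """Run the three stages: area crop, per-line crop, per-line recognition."""
    area_image = crop(image, detect_text_area(image))
    results = []
    for line_box in detect_text_lines(area_image):
        line_image = crop(area_image, line_box)
        results.append(recognize_line(line_image))
    return results


# Toy stand-ins (assumptions for illustration only).
def toy_detect_text_area(image):
    # Tight bounding box around every non-background character.
    rows = [r for r, row in enumerate(image) if any(ch != "." for ch in row)]
    cols = [c for row in image for c, ch in enumerate(row) if ch != "."]
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


def toy_detect_text_lines(area_image):
    # Treat every row that contains a character as one text line.
    width = len(area_image[0])
    return [(r, 0, r + 1, width)
            for r, row in enumerate(area_image)
            if any(ch != "." for ch in row)]


def toy_recognize_line(line_image):
    # "Recognition" here just reads the characters back out of the grid.
    return "".join(line_image[0]).strip(".")


page = [
    "........",
    "..HELLO.",
    "........",
    "..OCR...",
    "........",
]
text = recognize(page, toy_detect_text_area, toy_detect_text_lines, toy_recognize_line)
```

Passing the stages in as callables keeps the pipeline shape separate from the models, so a real detector could replace a stand-in without changing `recognize`.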

Apparatus for generating annotated image information using multimodal input data, apparatus for training an artificial intelligence model using annotated image information, and methods thereof
11694021 · 2023-07-04

A method for providing a user interface (UI) for generating training data for an artificial intelligence (AI) model may include providing, for display via the UI, image information that depicts an object, a set of operations of the object, and a process associated with the set of operations. The method may include providing, for display via the UI, text information that describes the object, the set of operations of the object, and the process associated with the set of operations. The method may include receiving, via the UI, a user input that associates respective image information of the image information with corresponding text information of the text information. The method may include generating, based on the user input, association information that associates the respective image information with the corresponding text information. The method may include generating discourse and semantic information from the text information associated with the image information.

Automatic identification of misleading videos using a computer network

Machine-based video classification to identify misleading videos by training a model using a video corpus, obtaining a subject video from a content server, generating respective feature vectors of a title, a thumbnail, a description, and a content of the subject video, determining first semantic similarities between ones of the feature vectors, determining a second semantic similarity between the title of the subject video and titles of videos in the misleading video corpus in a same domain as the subject video, determining a third semantic similarity between comments of the subject video and comments of videos in the misleading video corpus in the same domain as the subject video, classifying the subject video using the model and based on the first semantic similarities, the second semantic similarity, and the third semantic similarity, and outputting the classification of the subject video to a user.
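The three groups of similarities described above can be assembled into one input vector for a classifier, sketched below. Several choices here are assumptions not stated in the abstract: cosine similarity as the semantic similarity measure, taking the maximum over corpus videos to reduce many comparisons to one value, and the dictionary layout of a video. The function names are hypothetical, and the feature vectors are assumed to be precomputed.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_features(video, corpus):
    """Build [first similarities..., second similarity, third similarity]."""
    parts = ["title", "thumbnail", "description", "content"]
    # First similarities: every pair among the subject video's own feature vectors.
    first = [cosine(video[a], video[b]) for a, b in combinations(parts, 2)]
    # Second similarity: subject title vs. titles in the same-domain corpus.
    second = max((cosine(video["title"], o["title"]) for o in corpus), default=0.0)
    # Third similarity: subject comments vs. comments in the same-domain corpus.
    third = max((cosine(video["comments"], o["comments"]) for o in corpus), default=0.0)
    return first + [second, third]


# Toy 2-dimensional feature vectors, for illustration only.
subject = {
    "title": [1.0, 0.0],
    "thumbnail": [0.0, 1.0],
    "description": [1.0, 0.0],
    "content": [1.0, 0.0],
    "comments": [0.0, 1.0],
}
same_domain_corpus = [{"title": [1.0, 0.0], "comments": [0.0, 1.0]}]

features = similarity_features(subject, same_domain_corpus)
```

In this toy case the thumbnail disagrees with the title, description, and content, which shows up as low values among the first similarities. The resulting vector would then be passed to the trained model, which is outside the scope of this sketch.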

Intent detection with a computing device

A method includes capturing an image, determining an environment in which a user is operating a computing device, detecting a hand gesture based on an object in the image, determining, using a machine-learned model, an intent of the user based on the hand gesture and the environment, and executing a task based at least on the determined intent.

Analytics system onboarding of web content

Analytics system onboarding of web content is described. In one example, an analytics onboarding system is configured to process web content to generate recommendations, automatically and without user intervention. The recommendations are configured to assist in mapping of web content variables in web content to data elements supported by an analytics system to generate metrics that describe occurrence of events as part of user interaction with web content.
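One way to picture a mapping recommendation is as name matching between web content variables and analytics data elements. The abstract does not say how recommendations are generated, so the fuzzy string matching below is an assumption made for illustration, and the variable names, data element names, and `recommend_mappings` function are hypothetical. The sketch uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib


def normalize(name):
    """Lowercase a name and drop separators so 'page_name' and 'pageName' compare equal."""
    return name.replace("_", "").replace("-", "").lower()


def recommend_mappings(variables, data_elements, cutoff=0.6):
    """Suggest, for each web content variable, the closest data element or None."""
    by_normalized = {normalize(e): e for e in data_elements}
    recommendations = {}
    for var in variables:
        matches = difflib.get_close_matches(
            normalize(var), list(by_normalized), n=1, cutoff=cutoff
        )
        recommendations[var] = by_normalized[matches[0]] if matches else None
    return recommendations


# Hypothetical names, for illustration only.
web_variables = ["pageName", "prodSku", "cartTotal", "xyz"]
analytics_elements = ["page_name", "product_sku", "cart_total", "user_id"]

mappings = recommend_mappings(web_variables, analytics_elements)
```

A variable with no sufficiently close data element maps to `None`, which leaves that case for manual review rather than forcing a match.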

TRAINING METHOD OF TEXT RECOGNITION MODEL, TEXT RECOGNITION METHOD, AND APPARATUS

The present disclosure provides a training method of a text recognition model, a text recognition method, and an apparatus, relating to the technical field of artificial intelligence, specifically to the technical fields of deep learning and computer vision, which can be applied in scenarios such as optical character recognition. The specific implementation solution is: performing mask prediction on visual features of an acquired sample image, to obtain a predicted visual feature; performing mask prediction on semantic features of acquired sample text, to obtain a predicted semantic feature, where the sample image includes text; determining a first loss value of the text of the sample image according to the predicted visual feature; determining a second loss value of the sample text according to the predicted semantic feature; and training, according to the first loss value and the second loss value, to obtain the text recognition model.
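The two-loss structure can be illustrated with a toy calculation. The abstract specifies neither the loss functions nor how the two values are combined, so this sketch assumes a mean squared error over masked positions for each branch and a weighted sum for the total. All names, the `weight` parameter, and the numbers are hypothetical.

```python
def masked_mse(predicted, target, mask):
    """Mean squared error computed only at positions where mask is True."""
    errors = [(p - t) ** 2 for p, t, m in zip(predicted, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def total_loss(first_loss, second_loss, weight=1.0):
    """Combine the visual-branch and semantic-branch losses (assumed weighted sum)."""
    return first_loss + weight * second_loss


# Visual branch: positions 0 and 2 were masked and then predicted.
first = masked_mse(
    predicted=[0.5, 0.0, 1.0], target=[1.0, 0.0, 0.0], mask=[True, False, True]
)

# Semantic branch: position 1 was masked and then predicted.
second = masked_mse(predicted=[1.0, 2.0], target=[1.0, 0.0], mask=[False, True])

loss = total_loss(first, second)
```

In a training loop, a combined value like `loss` would drive the parameter update, so both the visual and the semantic branch influence the resulting model.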

Message composition and customization in a user handwriting style

Message composition and customization in a user's handwriting style includes obtaining electronic source text from a user, the electronic source text to be sent to a recipient, ascertaining properties of the electronic source text, the properties including words used in the electronic source text and a context of the electronic source text, and the context including an emotion of the electronic source text, and building an electronic message based on the ascertained properties, the electronic message including the electronic source text presented graphically in a handwriting style of the user.