G06V30/19173

Re-training a model for abnormality detection in medical scans based on a re-contrasted training set

A method includes generating first contrast significance data for a first computer vision model generated from a first training set of medical scans. First significant contrast parameters are identified based on the first contrast significance data. A first re-contrasted training set is generated by performing a first intensity transformation function, which utilizes the first significant contrast parameters, on the first training set of medical scans. A first re-trained model is generated from the first re-contrasted training set, whose entries are associated with corresponding output labels based on abnormality data for the first training set of medical scans. Re-contrasted image data of a new medical scan is generated by performing the first intensity transformation function on image data of the new medical scan. Inference data indicating at least one abnormality detected in the new medical scan is generated by utilizing the first re-trained model on the re-contrasted image data.
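The abstract does not define the intensity transformation. A minimal sketch, assuming (hypothetically) that the "significant contrast parameters" are a window center and width, as in radiological windowing; the function and parameter names are illustrative only:

```python
def recontrast(scan, window_center, window_width):
    """Hypothetical windowing-style intensity transformation: values
    inside [center - width/2, center + width/2] are rescaled to [0, 1];
    values outside are clipped. The same transform would be applied to
    the training set and to each new scan at inference time."""
    lo = window_center - window_width / 2.0
    hi = window_center + window_width / 2.0
    return [[min(max((v - lo) / (hi - lo), 0.0), 1.0) for v in row]
            for row in scan]

# Re-contrast a toy 2x2 "scan" of raw intensities.
scan = [[-100.0, 40.0], [60.0, 400.0]]
out = recontrast(scan, window_center=50.0, window_width=100.0)
```

The key property the abstract relies on is that the identical transform is reused for training data and new scans, so the re-trained model sees a consistent contrast distribution.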

Optical character recognition method and apparatus, electronic device and storage medium

The present application discloses a method and an apparatus for optical character recognition, an electronic device and a storage medium, and relates to the fields of artificial intelligence and deep learning. The method may include: determining, for a to-be-recognized image, a text bounding box of a text area therein, and extracting a text area image from the to-be-recognized image according to the text bounding box; determining bounding boxes of text lines in the text area image, and extracting text-line images from the text area image according to the bounding boxes; and performing text sequence recognition on the text-line images to obtain a recognition result. Applying this solution can improve recognition speed, among other benefits.
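The two-stage cropping described (text area first, then text lines within it) can be sketched as follows; the `(x, y, w, h)` box format and relative line coordinates are assumptions, not specified by the abstract:

```python
def crop(image, box):
    """Crop a region from an image stored as a list of pixel rows.
    box = (x, y, w, h) -- a hypothetical bounding-box format."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def extract_text_lines(image, text_area_box, line_boxes):
    """Two-stage extraction: first cut the text area out of the full
    image, then cut each text-line image out of that area. Line boxes
    are assumed to be relative to the text area."""
    area = crop(image, text_area_box)
    return [crop(area, b) for b in line_boxes]

# Toy "image" of characters standing in for pixels.
image = [list(r) for r in ["abcdef", "ghijkl", "mnopqr", "stuvwx"]]
lines = extract_text_lines(image, (1, 1, 4, 3), [(0, 0, 4, 1), (0, 1, 4, 1)])
```

Cropping lines out of the smaller text-area image, rather than re-scanning the full image per line, is one plausible source of the claimed speed improvement.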

Apparatus for generating annotated image information using multimodal input data, apparatus for training an artificial intelligence model using annotated image information, and methods thereof
11694021 · 2023-07-04

A method for providing a user interface (UI) for generating training data for an artificial intelligence (AI) model may include providing, for display via the UI, image information that depicts an object, a set of operations of the object, and a process associated with the set of operations. The method may include providing, for display via the UI, text information that describes the object, the set of operations of the object, and the process associated with the set of operations. The method may include receiving, via the UI, a user input that associates respective image information of the image information with corresponding text information of the text information. The method may include generating association information that associates the respective image information with the corresponding text information, based on the user input. The method may include generating discourse and semantic information from the text information associated with the image information.

Vision based target tracking that distinguishes facial feature targets
11544964 · 2023-01-03

A facial recognition method using online sparse learning includes initializing a target position and scale, extracting positive and negative samples, and extracting high-dimensional Haar-like features. A sparse coding function can be used to determine sparse Haar-like features and form a sparse feature matrix, and the sparse feature matrix in turn is used to classify targets.
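The abstract does not name its sparse coding function. A common choice in sparse learning is soft thresholding (the ISTA proximal step), shown here as a stand-in that zeroes small Haar-like feature responses; the sparsity parameter `lam` is hypothetical:

```python
def soft_threshold(features, lam):
    """Soft-threshold a feature vector: responses with magnitude below
    lam are set to zero, larger ones are shrunk toward zero. The result
    is a sparse version of the high-dimensional Haar-like features."""
    out = []
    for v in features:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out

sparse = soft_threshold([0.9, -0.05, 0.2, -1.3], lam=0.25)
```

Stacking such sparse vectors for many samples yields a sparse feature matrix of the kind the abstract feeds to the target classifier.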

Method and apparatus to train image recognition model, and image recognition method and apparatus

An apparatus and method to train an image recognition model to accurately estimate a location of a reference point for each class of landmark are disclosed. The apparatus and method calculate a class loss and a class-dependent localization loss from training data using the image recognition model, and train the model using a total loss comprising the class loss and the localization loss.
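A minimal sketch of such a combined loss, under assumptions the abstract does not fix: cross-entropy for the class loss, squared error on the ground-truth class's reference point for the class-dependent localization loss, and a hypothetical weighting factor `alpha`:

```python
import math

def class_loss(probs, label):
    # Cross-entropy over landmark classes (one standard choice).
    return -math.log(probs[label])

def localization_loss(pred_points, true_points, label):
    """Class-dependent localization: only the reference point of the
    ground-truth class contributes to the loss."""
    px, py = pred_points[label]
    tx, ty = true_points[label]
    return (px - tx) ** 2 + (py - ty) ** 2

def total_loss(probs, label, pred_points, true_points, alpha=1.0):
    # Total loss = class loss + weighted class-dependent localization loss.
    return class_loss(probs, label) + alpha * localization_loss(
        pred_points, true_points, label)

loss = total_loss([0.7, 0.3], 0,
                  [(1.0, 2.0), (0.0, 0.0)],
                  [(1.5, 2.0), (9.0, 9.0)])
```

Making the localization term conditional on the class label is what distinguishes a "class-dependent" localization loss from one averaged over all landmark classes.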

Intent detection with a computing device

A method includes capturing an image, determining an environment in which a user is operating a computing device, detecting a hand gesture based on an object in the image, determining, using a machine-learned model, an intent of the user based on the hand gesture and the environment, and executing a task based at least on the determined intent.

System and method for multi-modal image classification

Systems and methods for classifying images (e.g., ads) are described. An image is accessed. Optical character recognition is performed on at least a first portion of the image. Image recognition is performed via a convolutional neural network on at least a second portion of the image. At least one class for the image is automatically identified, via a fully connected neural network, based on one or more predictions, each of the one or more predictions being based on both the optical character recognition and the image recognition. Finally, the at least one class identified for the image is output.
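One plausible reading of the fusion step, sketched with hypothetical weights: feature vectors derived from the OCR branch and the CNN branch are concatenated and scored by a fully connected layer, so every prediction depends on both modalities:

```python
def fully_connected(features, weights, bias):
    # One dense layer: scores[j] = sum_i features[i] * weights[i][j] + bias[j]
    n_out = len(bias)
    return [sum(f * w[j] for f, w in zip(features, weights)) + bias[j]
            for j in range(n_out)]

def classify(ocr_features, image_features, weights, bias):
    """Fuse OCR-derived and CNN-derived features by concatenation, then
    score classes with a fully connected layer and return the argmax.
    A minimal stand-in for the described multi-modal classifier."""
    fused = ocr_features + image_features
    scores = fully_connected(fused, weights, bias)
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 2-feature-per-branch, 2-class setup with made-up weights.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.0, 0.5]]
bias = [0.0, 0.0]
label = classify([1.0, 0.0], [1.0, 0.0], weights, bias)
```

Because `fused` carries both branches' features into the same layer, neither modality alone determines the class, matching the abstract's requirement that each prediction is based on both the OCR and the image recognition.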

Prefetching and/or computing resource allocation based on predicting classification labels with temporal data

Methods, systems and computer program products are provided for prefetching information and/or (pre)allocating computing resources based on predicting classification labels with temporal data. A trained temporal classification model forecasts events (e.g., too numerous for individual modeling) by predicting classification labels indicating whether events will occur, or a number of occurrences of the events, during each of a plurality of future time intervals. Time-series datasets, indicating whether events occurred, or a number of occurrences of the events, during each of a plurality of past time intervals, are transformed into temporal classification datasets. Classifications may be based, at least in part, on extracted features, such as data seasonality, temporal representation, statistical and/or real-time features. Classification labels are used to determine whether to take one or more actions, such as, for example, prefetching information or (pre)allocating a computing resource.
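The transformation from time-series datasets into temporal classification datasets can be sketched as a sliding window: the history of event counts becomes the features, and whether the event occurs in the next interval becomes the classification label (window size and labeling rule are assumptions):

```python
def make_classification_dataset(counts, window):
    """Turn a per-interval event-count series into (features, label)
    pairs: features are the counts over a sliding history window, and
    the label indicates whether the event occurs in the next interval.
    A minimal version of the described time-series transform."""
    examples = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        label = 1 if counts[t] > 0 else 0
        examples.append((history, label))
    return examples

# Six past intervals of event counts, two-interval history window.
data = make_classification_dataset([0, 2, 0, 1, 3, 0], window=2)
```

A model trained on such pairs predicts the label for future intervals, and a positive prediction can trigger prefetching or resource (pre)allocation as the abstract describes.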

Entity extraction with encoder decoder machine learning model
11544943 · 2023-01-03

A method includes executing an encoder machine learning model on multiple token values contained in a document to create an encoder hidden state vector. A decoder machine learning model executing on the encoder hidden state vector generates raw text comprising an entity value and an entity label for each of multiple entities. The method further includes generating a structural representation of the entities directly from the raw text and outputting the structural representation of the entities of the document.
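The step of building a structural representation directly from the decoder's raw text might look like the following; the `label: value` line format is a hypothetical stand-in, since the abstract does not specify what the decoder emits:

```python
def to_structure(raw_text):
    """Parse decoder output into a structural representation of the
    document's entities, assuming one 'label: value' pair per line."""
    entities = {}
    for line in raw_text.splitlines():
        if ":" not in line:
            continue  # skip lines that carry no entity pair
        label, value = line.split(":", 1)
        entities[label.strip()] = value.strip()
    return entities

doc = to_structure("vendor: Acme Corp\ntotal: 12.50\ndate: 2023-01-03")
```

The point of the abstract's "directly from the raw text" phrasing is that no separate span-alignment pass over the original document is needed; the generated text already pairs each entity value with its label.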

INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM
20220415018 · 2022-12-29

An information processing system (10) includes: an acquisition unit (50) configured to sequentially acquire a plurality of elements included in sequential data; a first calculation unit (110) configured to calculate, for each of the plurality of elements, a first indicator indicating which one of a plurality of classes the element belongs to; a weight calculation unit (130) configured to calculate, for each of the plurality of elements, a weight according to a confidence related to calculation of the first indicator; a second calculation unit (120) configured to calculate, based on the first indicators each weighted with the weight, a second indicator indicating which one of the plurality of classes the sequential data belongs to; and a classification unit (60) configured to classify the sequential data as any one of the plurality of classes, based on the second indicator. According to such an information processing system, sequential data can be appropriately classified.
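The first/second indicator scheme can be sketched as confidence-weighted voting: each element's per-class indicator (first indicator) is scaled by its confidence weight, the weighted indicators are summed into a sequence-level score (second indicator), and the argmax class is chosen. The specific weighting is an assumption; the abstract only says the weight depends on confidence:

```python
def classify_sequence(element_scores, confidences):
    """Weight each element's per-class indicator by its confidence,
    sum into a sequence-level indicator, and return the argmax class."""
    n_classes = len(element_scores[0])
    second_indicator = [0.0] * n_classes
    for scores, conf in zip(element_scores, confidences):
        for c in range(n_classes):
            second_indicator[c] += conf * scores[c]
    return max(range(n_classes), key=second_indicator.__getitem__)

# Three elements, two classes; the low-confidence middle element is
# effectively discounted, so the sequence is classified as class 0.
cls = classify_sequence(
    [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]],
    confidences=[1.0, 0.1, 1.0])
```

Down-weighting low-confidence elements is what lets the system classify sequential data appropriately even when some elements are ambiguous.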