Patent classifications
G06V30/19147
Systems and Methods for Generating Document Numerical Representations
Described embodiments relate to a method comprising: determining a candidate document comprising image data and character data and extracting the image data and the character data from the candidate document. The method comprises providing, to an image-based numerical representation generation model, the image data, and generating, by the image-based numerical representation generation model, an image-based numerical representation of the image data. The method comprises providing, to a character-based numerical representation generation model, the character data; and generating, by the character-based numerical representation generation model, a character-based numerical representation of the character data. The method comprises providing, to a consolidated image-character based numerical representation generation model, the image-based numerical representation and the character-based numerical representation; and generating, by the consolidated image-character based numerical representation generation model, a combined image-character based numerical representation of the candidate document.
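The three-model flow in this abstract can be sketched as follows. This is a minimal illustration only: `image_model`, `character_model`, and `consolidation_model` are hypothetical stand-ins built from simple hand-written statistics, not the trained models the abstract describes, and concatenation is an assumed way of consolidating the two representations.

```python
from typing import List, Sequence

Vector = List[float]


def image_model(pixels: Sequence[int]) -> Vector:
    """Hypothetical stand-in: mean and max 8-bit intensity, scaled to [0, 1]."""
    return [sum(pixels) / (255 * len(pixels)), max(pixels) / 255]


def character_model(text: str) -> Vector:
    """Hypothetical stand-in: fraction of digits and fraction of letters."""
    n = max(len(text), 1)
    digits = sum(c.isdigit() for c in text)
    letters = sum(c.isalpha() for c in text)
    return [digits / n, letters / n]


def consolidation_model(image_vec: Vector, char_vec: Vector) -> Vector:
    """Hypothetical stand-in: concatenate the two representations (an assumption)."""
    return image_vec + char_vec


def represent_document(pixels: Sequence[int], text: str) -> Vector:
    """Run both branch models, then consolidate their outputs."""
    return consolidation_model(image_model(pixels), character_model(text))
```

In a real system each stand-in would be replaced by a learned model; the sketch only shows how the extracted image data and character data flow through separate branches before being combined.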
TRAINING METHOD FOR HANDWRITTEN TEXT IMAGE GENERATION MODEL, ELECTRONIC DEVICE AND STORAGE MEDIUM
A training method for a handwritten text image generation model includes: obtaining training data including a sample content image, a first sample handwritten text image and a second sample handwritten text image; constructing an initial training model; obtaining a first predicted handwritten text image by inputting the sample content image and the second sample handwritten text image into an initial handwritten text image generation model of the initial training model; obtaining a second predicted handwritten text image by inputting the sample content image and the first sample handwritten text image into an initial handwritten text image reconstruction model of the initial training model; training the initial training model according to the first and second predicted handwritten text images and the first sample handwritten text image; and determining a handwritten text image generation model of the training model after training as a target handwritten text image generation model.
System and method of synthesizing potential malware for predicting a cyberattack
A system and method for malware classification using machine learning models trained using synthesized feature sets based on features extracted from samples of known malicious objects and known safe objects. The synthesized feature sets act as virtual samples for training a machine learning classifier to recognize new objects in the wild that are likely to be malicious.
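The abstract does not say how the feature sets are synthesized. As one possible illustration, the sketch below creates virtual samples by linear interpolation between pairs of feature vectors from known samples, in the spirit of SMOTE-style oversampling. The function names and the interpolation scheme are assumptions, not the patented method.

```python
from itertools import combinations
from typing import List

Vector = List[float]


def interpolate(a: Vector, b: Vector, alpha: float) -> Vector:
    """Return the point a fraction `alpha` of the way from a to b."""
    return [x + alpha * (y - x) for x, y in zip(a, b)]


def virtual_samples(samples: List[Vector], alpha: float = 0.5) -> List[Vector]:
    """Synthesize one virtual sample per unordered pair of known samples."""
    return [interpolate(a, b, alpha) for a, b in combinations(samples, 2)]
```

The synthesized vectors would then be added to the training set of a classifier alongside the feature vectors of real malicious and safe objects.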
KNOWLEDGE-GROUNDED COMPLETE CRITERIA GENERATION
Disclosed herein is a model flow that generates eligibility criteria for a clinical trial based on eligibility criteria associated with a protocol title of the trial. Unlike standard black-box generation models, the techniques disclosed herein leverage existing knowledge to enhance the title. The enhanced title also acts as an intermediate between the title and the generated criteria clauses, enabling explicit control of the generated content as well as an explanation of why the generated content is relevant. The resulting workflow is knowledge-grounded, controllable, transparent, and interpretable.
UTILIZING VISUAL AND TEXTUAL ASPECTS OF IMAGES WITH RECOMMENDATION SYSTEMS
Described herein are systems and methods for generating an embedding—a learned representation—for an image. The embedding for the image is derived to capture visual aspects, as well as textual aspects, of the image. An encoder-decoder is trained to generate the visual representation of the image. An optical character recognition (OCR) algorithm is used to identify text/words in the image. From these words, an embedding is derived by performing an average pooling operation on pre-trained embeddings that map to the identified words. Finally, the embedding representing the visual aspects of the image is combined with the embedding representing the textual aspects of the image to generate a final embedding for the image.
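The textual branch described above, average pooling over pre-trained word embeddings followed by combination with the visual embedding, can be sketched as below. The lookup table, the handling of unknown words, and the use of concatenation as the combining step are assumptions for illustration; the abstract only says the two embeddings are combined.

```python
from typing import Dict, List

Vector = List[float]


def average_pool(words: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the pre-trained embeddings of the words found in `table`.

    Words missing from the table are skipped; if none are found,
    a zero vector of length `dim` is returned (an assumed fallback).
    """
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def final_embedding(
    visual: Vector, ocr_words: List[str], table: Dict[str, Vector], dim: int
) -> Vector:
    """Combine visual and textual embeddings by concatenation (an assumption)."""
    return visual + average_pool(ocr_words, table, dim)
```

Here `ocr_words` stands for the output of the OCR step and `visual` for the output of the trained encoder, neither of which is implemented in this sketch.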
VISION PROCESSING AND MODEL TRAINING METHOD, DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT
The present disclosure provides a vision processing and model training method, device, storage medium and program product. A specific implementation solution is as follows: establishing an image classification network with the same backbone network as the vision model, and performing self-supervised training on the image classification network by using an unlabeled first data set; initializing a weight of a backbone network of the vision model according to a weight of a backbone network of the trained image classification network to obtain a pre-training model, the structure of the pre-training model being consistent with that of the vision model, and optimizing the weight of the backbone network by using a real data set in a current computer vision task scenario, so that the weight is more suitable for the current computer vision task; then, training the pre-training model by using a labeled second data set to obtain a trained vision model.
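The backbone initialization step can be illustrated with plain dictionaries of named weights. This is a sketch under stated assumptions: the `backbone.` prefix convention and the function name `init_backbone` are hypothetical, and real frameworks store weights as tensors rather than lists.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def init_backbone(
    vision_weights: Weights, classifier_weights: Weights, prefix: str = "backbone."
) -> Weights:
    """Copy backbone weights from the trained classifier into the vision model.

    Only parameters whose names start with `prefix` and exist in both models
    are copied; task-specific heads keep their original values.
    """
    pretrained = dict(vision_weights)
    for name, value in classifier_weights.items():
        if name.startswith(prefix) and name in pretrained:
            pretrained[name] = list(value)
    return pretrained
```

The returned dictionary plays the role of the pre-training model: it has the same structure as the vision model, with only the shared backbone parameters replaced.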
AUTOMATIC GENERATION OF TRAINING DATA FOR HAND-PRINTED TEXT RECOGNITION
A method for generating training data for hand-printed text recognition includes obtaining a structured document, obtaining a set of hand-printed character images and database metadata from a database, generating a modified document page image, and outputting a training file. The structured document includes a document page image that includes text characters and document metadata that associates each of the text characters to a document character label. The database metadata associates each of the set of hand-printed character images to a database character label. The modified document page image is generated by iteratively processing each of the text characters. The iterative processing includes determining whether an individual text character should be replaced, selecting a replacement hand-printed character image from the set of hand-printed character images, scaling the replacement hand-printed character image, and inserting the replacement hand-printed character image into the modified document page image.
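The iterative processing in this abstract can be sketched as a loop over the page's text characters. The data classes, the probability-based replacement decision, and the height-based scale factor are assumptions made for illustration; the sketch returns placement records rather than editing real image data.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PageChar:
    label: str   # document character label
    height: int  # height of the character's box on the page, in pixels


@dataclass
class HandImage:
    label: str     # database character label
    height: int    # native height of the hand-printed image, in pixels
    image_id: str  # hypothetical identifier of the stored image


def build_modified_page(
    chars: List[PageChar],
    hand_images: List[HandImage],
    replace_prob: float,
    rng: random.Random,
) -> List[Tuple[int, str, float]]:
    """Return (character index, image id, scale factor) for each replacement."""
    by_label: Dict[str, List[HandImage]] = {}
    for img in hand_images:
        by_label.setdefault(img.label, []).append(img)

    placements = []
    for index, ch in enumerate(chars):
        candidates = by_label.get(ch.label)
        # Decide whether this character should be replaced at all.
        if not candidates or rng.random() >= replace_prob:
            continue
        chosen = rng.choice(candidates)     # select a replacement image
        scale = ch.height / chosen.height   # scale it to the character's box
        placements.append((index, chosen.image_id, scale))
    return placements
```

Each placement record carries what an image-editing step would need to insert the scaled hand-printed character into the modified document page image.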
Methods and systems for the automated quality assurance of annotated images
A framework in which annotated images can be analyzed in small batches to learn and distinguish between higher-quality annotations and lower-quality annotations, especially in the case of manual annotations for which quality assurance is desired. This framework is extremely generalizable and can be used for indoor images, outdoor images, medical images, etc., without limitation. An echo state network (ESN) is provided as a special case of a semantic segmentation model that can be trained using as few as tens of annotated images to predict semantic regions and provide metrics that can be used to distinguish between higher-quality annotations and lower-quality annotations.
Method and apparatus for data efficient semantic segmentation
A method and system for training a neural network are provided. The method includes receiving an input image, selecting at least one data augmentation method from a pool of data augmentation methods, generating an augmented image by applying the selected at least one data augmentation method to the input image, and generating a mixed image from the input image and the augmented image.
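The select, augment, and mix steps can be sketched on a tiny grayscale image stored as nested lists. The two augmentations in the pool and the pixel-wise weighted average used for mixing are assumptions; the abstract does not specify either.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # rows of pixel intensities in [0, 1]


def hflip(img: Image) -> Image:
    """Mirror the image horizontally."""
    return [row[::-1] for row in img]


def invert(img: Image) -> Image:
    """Invert pixel intensities."""
    return [[1.0 - p for p in row] for row in img]


# Hypothetical pool of data augmentation methods.
POOL: Dict[str, Callable[[Image], Image]] = {"hflip": hflip, "invert": invert}


def mix(a: Image, b: Image, weight: float) -> Image:
    """Pixel-wise weighted average of two images of the same shape."""
    return [
        [weight * x + (1.0 - weight) * y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def augment_and_mix(
    img: Image, rng: random.Random, weight: float = 0.5
) -> Tuple[str, Image]:
    """Select one augmentation from the pool, apply it, and mix with the input."""
    name = rng.choice(sorted(POOL))
    augmented = POOL[name](img)
    return name, mix(img, augmented, weight)
```

The mixed image would then be used as a training input to the neural network, which this sketch does not cover.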
AUTOMATICALLY IDENTIFYING AND DISPLAYING OBJECTS OF INTEREST IN A GRAPHIC NOVEL
Locations and presentation orders of objects of interest (e.g., speech bubbles) in digital graphic novel content are identified such that expanded versions of the objects of interest can be presented to a reader. Specifically, digital graphic novel content is received and locations of interest regions (e.g., rectangular text regions of speech bubbles) in the content are identified by applying a machine-learned model to the content. Locations and presentation orders of objects of interest in the digital graphic novel content are identified based on the identified locations of the interest regions. The digital graphic novel content and presentation metadata including the locations and presentation orders of the objects of interest are provided to a reading device such that expanded versions of the objects of interest are presented to the user in accordance with the presentation metadata.
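Once the interest regions are located, the presentation metadata could be assembled along the following lines. This sketch assumes the region boxes have already been produced by the machine-learned model, and it uses a simple top-to-bottom, left-to-right ordering with a fixed row band; the `Region` class, the `row_band` parameter, and the ordering rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


def presentation_metadata(regions: List[Region], row_band: int = 100) -> List[Dict]:
    """Assign a presentation order to each region.

    Regions whose top edges fall in the same horizontal band of `row_band`
    pixels are treated as one row and ordered left to right.
    """
    ordered = sorted(regions, key=lambda r: (r.y // row_band, r.x))
    return [
        {"order": i, "box": (r.x, r.y, r.width, r.height)}
        for i, r in enumerate(ordered, start=1)
    ]
```

A reading device could step through the returned entries in `order` and show an expanded view of each `box` in turn. Content read right to left would need a different sort key.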