G06V30/00

Instruction generation using one or more neural networks

Apparatuses, systems, and techniques are presented for generating instructional text. In at least one embodiment, an instructional video is analyzed to determine logical steps of a process or task demonstrated in that video, and instructive text is generated for those logical steps.

Automatic product description generation
12386920 · 2025-08-12

Systems, devices, and techniques are disclosed for automatic product description generation. A first set of features, including labels made up of words, may be generated from an image using a first feature extraction model. A second set of features, also including labels made up of words, may be generated from the image using a second feature extraction model. A text description of a product depicted in the image may be generated by inputting the image and metadata for the image to a description generating model. Each word of the text description may be generated by assigning probabilities to candidate words, boosting the probabilities of candidate words that are similar to words in the labels of the first or second set of features, and then selecting one of the candidate words, based on the boosted probabilities, as the next word of the text description.
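The boosting step can be illustrated with a minimal Python sketch of a single decoding step. It assumes that the description model's candidate probabilities arrive as a dict, that "similar" means a `difflib` string-similarity ratio at or above a threshold, and that boosting multiplies a probability by a fixed factor before renormalizing. The function name `boost_and_select` and the `boost` and `threshold` parameters are hypothetical; the abstract does not specify how similarity or boosting is computed.

```python
import difflib
from typing import Dict, Iterable, Tuple


def boost_and_select(
    candidate_probs: Dict[str, float],
    label_words: Iterable[str],
    boost: float = 2.0,       # hypothetical multiplicative boost factor
    threshold: float = 0.8,   # hypothetical similarity cut-off
) -> Tuple[str, Dict[str, float]]:
    """Boost candidates similar to any label word, renormalize, pick the most likely."""
    labels = [w.lower() for w in label_words]
    boosted = {}
    for word, p in candidate_probs.items():
        # Assumption: "similar" = difflib ratio at or above the threshold.
        similar = any(
            difflib.SequenceMatcher(None, word.lower(), lw).ratio() >= threshold
            for lw in labels
        )
        boosted[word] = p * boost if similar else p
    total = sum(boosted.values())
    normalized = {w: p / total for w, p in boosted.items()}
    # Greedy selection of the highest-probability candidate after boosting.
    best = max(normalized, key=normalized.get)
    return best, normalized


# Labels from the two feature extraction models are pooled into one word list.
first_model_labels = ["shirt", "cotton"]
second_model_labels = ["striped"]
word, probs = boost_and_select(
    {"shirt": 0.30, "dress": 0.40, "jacket": 0.30},
    first_model_labels + second_model_labels,
)
print(word)  # shirt
```

Without boosting, "dress" would be selected; the matching label word "shirt" tips the choice. A full decoder would repeat this at every step, and could sample from the boosted distribution instead of choosing greedily.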

Information processing apparatus, information processing method, computer program product, and recording medium

An information processing apparatus includes a memory and one or more hardware processors. The memory stores order information that defines an order of pieces of meta-information for a character to be recognized. The one or more hardware processors are connected to the memory and function as a recognition unit and an update unit. The recognition unit performs character recognition on an image including a character string by using first meta-information specified from the pieces of meta-information. When a confidence score of the character recognition satisfies a predetermined condition, the update unit updates the first meta-information to second meta-information in accordance with the order information. The character recognition is then performed using the second meta-information.
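The update loop can be sketched in Python as follows, assuming that the "predetermined condition" is a confidence score below a threshold and that the pieces of meta-information are simple labels such as character types. The names `recognize_with_fallback`, `fake_recognize`, and `min_confidence`, the meta-information values, and the fallback to the most confident result are all hypothetical illustrations, not details from the abstract.

```python
from typing import Callable, Optional, Sequence, Tuple

# A recognizer takes (image, meta_information) and returns (text, confidence).
Recognizer = Callable[[object, str], Tuple[str, float]]


def recognize_with_fallback(
    image: object,
    order: Sequence[str],
    recognize: Recognizer,
    min_confidence: float = 0.9,
) -> Optional[Tuple[str, str, float]]:
    """Recognize with each meta-information in the defined order until confident."""
    best: Optional[Tuple[str, str, float]] = None
    for meta in order:
        text, confidence = recognize(image, meta)
        if best is None or confidence > best[2]:
            best = (text, meta, confidence)
        # Assumption: the condition that triggers an update to the next
        # meta-information is a confidence below min_confidence.
        if confidence >= min_confidence:
            return text, meta, confidence
    # Assumption: if every entry fails, return the most confident result seen.
    return best


def fake_recognize(image: object, meta: str) -> Tuple[str, float]:
    # Stand-in for a real OCR model; results are made up for illustration.
    table = {"digits": ("1204", 0.40), "alphanumeric": ("12OA", 0.95)}
    return table[meta]


print(recognize_with_fallback(None, ["digits", "alphanumeric", "kanji"], fake_recognize))
# ('12OA', 'alphanumeric', 0.95)
```

Because recognition succeeds with the second entry, the third entry ("kanji") is never tried; the order information determines how far the fallback goes.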

Character recognition model training method and apparatus, character recognition method and apparatus, device and storage medium

The present disclosure provides a character recognition model training method and apparatus, a character recognition method and apparatus, a device and a medium. It relates to the technical field of artificial intelligence, specifically to deep learning, image processing and computer vision, and can be applied to scenarios such as character detection and recognition. Specifically, the solution includes: partitioning an unlabeled training sample into at least two sub-sample images; dividing the sub-sample images into a first training set and a second training set, where the first training set includes a first sub-sample image with a visible attribute and the second training set includes a second sub-sample image with an invisible attribute; and performing self-supervised training on an encoder to be trained, using the second training set as the label for the first training set, to obtain a target encoder.
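The data preparation resembles masked-image-modeling pretraining, and can be sketched in Python under stated assumptions: the sample is split on a regular grid of square patches, and a random fraction of patches (the hypothetical `mask_ratio`) is marked invisible. The function names and parameters are illustrative only; the encoder and the self-supervised loss (predicting invisible patches from visible ones) are not shown.

```python
import random
from typing import List, Tuple

Image = List[List[int]]


def split_into_patches(image: Image, size: int) -> List[Image]:
    """Partition an image into non-overlapping size x size sub-sample images.

    Assumes the height and width are multiples of size.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, size):
        for left in range(0, w, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def visible_invisible_split(
    num_patches: int, mask_ratio: float = 0.75, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly assign patch indices to a visible set and an invisible set."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    n_invisible = int(num_patches * mask_ratio)
    invisible = sorted(indices[:n_invisible])
    visible = sorted(indices[n_invisible:])
    return visible, invisible


image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = split_into_patches(image, 4)  # four 4x4 sub-sample images
visible, invisible = visible_invisible_split(len(patches))
# An encoder would receive patches[i] for i in visible and be trained to
# predict the content of patches[j] for j in invisible.
```

A fixed seed keeps the split reproducible; a real training loop would draw a fresh split for each sample and epoch.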

Object recognition processing method, processing apparatus, electronic device, and storage medium

Disclosed are an object recognition processing method, a processing apparatus, an electronic device, and a non-transitory computer-readable storage medium. The method includes: obtaining an object to be recognized; recognizing a type of the object on the basis of a type recognition model; determining a processing rule corresponding to the object; if the type of the object is a basic type, taking the object as the target object to be recognized, and if the type of the object is a non-basic type, transforming the object into the target object by means of a transformer learning model; and performing, by means of the transformer learning model, recognition processing on the target object to obtain a corresponding target result.
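The routing logic can be sketched in Python with the models replaced by plain callables. This assumes that the "processing rule" can be modeled as a per-type lookup of transform functions and that objects are dicts; the names `recognize_object`, `classify_type`, `to_printed`, and `read_text`, and the type labels, are hypothetical stand-ins for the type recognition model and the transformer learning model.

```python
from typing import Callable, Dict, FrozenSet

Obj = dict


def recognize_object(
    obj: Obj,
    classify_type: Callable[[Obj], str],
    transformers: Dict[str, Callable[[Obj], Obj]],
    recognize: Callable[[Obj], str],
    basic_types: FrozenSet[str] = frozenset({"printed_text"}),
) -> str:
    """Route an object by type, transform non-basic types, then recognize."""
    obj_type = classify_type(obj)
    if obj_type in basic_types:
        target = obj  # basic type: used directly as the target object
    else:
        # The processing rule is modeled as a per-type transform lookup;
        # an unknown non-basic type raises KeyError.
        target = transformers[obj_type](obj)
    return recognize(target)


def classify_type(o: Obj) -> str:
    return o["type"]  # stand-in for the type recognition model


def to_printed(o: Obj) -> Obj:
    # Stand-in transform: turns a non-basic object into a basic one.
    return {"type": "printed_text", "data": o["data"].strip()}


def read_text(o: Obj) -> str:
    return o["data"]  # stand-in for the recognition step


transformers = {"handwritten": to_printed}
print(recognize_object({"type": "handwritten", "data": "  hello "}, classify_type, transformers, read_text))
# hello
```

Keeping the transforms in a dict lets new non-basic types be supported by registering one function, without changing the routing code.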

Image processing device with intelligent typesetting function and method thereof
12423515 · 2025-09-23

An image processing device with an intelligent typesetting function, and a method thereof, are provided. A processor is connected to an image capture module, a memory module and an output module. The processor acquires a first side image through the image capture module and a set of scanning position information from the memory module, and applies a set of typesetting parameters corresponding to the scanning position information. The processor then obtains a second side image through the image capture module and applies the same set of typesetting parameters. The processor generates a typesetting image based on the set of typesetting parameters, the first side image and the second side image, and causes the output module to output the typesetting image. Intuitive scanning procedures and the intelligent, quick application of preset image typesetting methods reduce tedious operations, thereby improving efficiency and convenience.

Search device, search system, search method, and storage medium
12437568 · 2025-10-07

A search device generates a character string image of a first character string by using the first character string. The search device inputs the character string image to a classifier, which outputs a classification of a character string given an input image. Based on the classification result of the classifier, the search device outputs another character string that is different from the first character string.
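The search flow can be sketched in Python with rendering and classification injected as callables. This assumes that each classification label maps to a list of indexed strings and that the output is every indexed string in the same class other than the query. The `render`, `classify`, and `index` arguments are hypothetical stand-ins; the toy classifier below simply normalizes look-alike characters instead of running an image model.

```python
from typing import Callable, Dict, List


def search_similar_strings(
    query: str,
    render: Callable[[str], object],
    classify: Callable[[object], str],
    index: Dict[str, List[str]],
) -> List[str]:
    """Render the query, classify the image, return other strings in that class."""
    image = render(query)   # character string image of the first string
    label = classify(image)  # classification result of the classifier
    # Output strings sharing the class but differing from the query.
    return [s for s in index.get(label, []) if s != query]


def toy_render(s: str) -> object:
    return s  # stand-in: a real system would draw the string into an image


def toy_classify(img: object) -> str:
    # Stand-in: maps visually confusable characters to one canonical class.
    return str(img).replace("0", "O").replace("1", "l")


index = {"HELLO": ["HELLO", "HELL0"]}
print(search_similar_strings("HELL0", toy_render, toy_classify, index))
# ['HELLO']
```

Classifying a rendered image rather than the raw characters means strings that look alike are grouped together even when their code points differ.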

Systems and methods of monitoring location labels of product storage structures of a product storage facility

Systems and methods of monitoring location labels on product storage structures of a product storage facility include an image capture device that captures images of the product storage structures and a computing device programmed to analyze those images to detect location labels located on the product storage structures. Upon detecting that one or more location labels on the product storage structures are associated with an error condition, the computing device generates a location label alert indicating at least one location label that requires a location label check by a worker at the product storage facility. A mobile application executable on the worker's device displays a user interface that lists location label alerts and permits the worker to print replacement labels for product structures associated with the alerts.
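The alert-generation step can be sketched in Python, assuming that label detection has already produced the text read at each storage structure (or `None` when no label was found or it was unreadable) and that an error condition means a missing, unreadable, or mismatched label. The `LabelAlert` record, the `check_location_labels` function, and the structure and label identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LabelAlert:
    structure_id: str
    expected: str
    detected: Optional[str]
    reason: str


def check_location_labels(
    detected: Dict[str, Optional[str]], expected: Dict[str, str]
) -> List[LabelAlert]:
    """Compare detected label text against the expected location map."""
    alerts = []
    for structure_id, expected_label in expected.items():
        found = detected.get(structure_id)
        if found is None:
            # Assumed error condition: label missing or unreadable.
            alerts.append(LabelAlert(structure_id, expected_label, None, "missing or unreadable"))
        elif found != expected_label:
            # Assumed error condition: label text does not match the map.
            alerts.append(LabelAlert(structure_id, expected_label, found, "mismatch"))
    return alerts


expected = {"A1": "A-01-01", "A2": "A-01-02", "A3": "A-01-03"}
detected = {"A1": "A-01-01", "A2": "A-01-20", "A3": None}
for alert in check_location_labels(detected, expected):
    print(alert)
```

Each `LabelAlert` carries the expected label text, which is what a worker-facing application would need in order to print a replacement.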

Text-based information extraction from images

A method for extracting text information from images includes obtaining an extraction request associated with live data comprising an image; generating, using a prediction model, rotational variant features and rotational invariant features associated with the live data; generating, using the prediction model, text embeddings associated with the rotational variant features using overlapping kernel-based embedding on the live data; generating, using the prediction model, attention values for each pixel in the live data using context attention; applying a trained language model to the text embeddings, attention values, and the live data to generate predictions; and performing extraction actions based on the predictions.
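One reading of "overlapping kernel-based embedding" is an overlapping patch embedding: square windows taken with a stride smaller than the kernel size, so neighboring windows share pixels, each flattened and linearly projected. The Python sketch below illustrates only that interpretation, which is an assumption; the function names, the weights, and the sizes are hypothetical, and the rotational features, context attention, and language model are not shown. A real model would use learned weights in a deep learning framework (equivalent to a strided convolution).

```python
from typing import List

Matrix = List[List[float]]


def overlapping_patches(image: Matrix, kernel: int, stride: int) -> Matrix:
    """Flatten every kernel x kernel window; stride < kernel makes windows overlap."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - kernel + 1, stride):
        for left in range(0, w - kernel + 1, stride):
            patches.append([
                image[r][c]
                for r in range(top, top + kernel)
                for c in range(left, left + kernel)
            ])
    return patches


def embed(patches: Matrix, weights: Matrix) -> Matrix:
    """Project each flattened patch with a (dim x kernel*kernel) weight matrix."""
    return [[sum(wi * xi for wi, xi in zip(row, p)) for row in weights] for p in patches]


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = overlapping_patches(image, kernel=3, stride=1)  # 2x2 grid of 3x3 windows
embeddings = embed(patches, weights=[[1.0] * 9])          # toy 1-dim projection (sum)
print(embeddings)  # [[45.0], [54.0], [81.0], [90.0]]
```

With `kernel=3` and `stride=1`, adjacent windows share six of their nine pixels, so characters that straddle a window boundary still appear whole in some window.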