G06V30/191

Information processing apparatus and information processing method

Provided is an information processing apparatus capable of improving the accuracy of detecting a position pointed to by a target object. An acquisition unit acquires a distance image indicating the distance to each object present within a predetermined range. A vector calculation unit then calculates, on the basis of the acquired distance image, a vector extending from the target object present within the predetermined range in the direction pointed to by the target object. An intersection calculation unit then calculates, on the basis of the acquired distance image, the position of the intersection of the calculated vector and a predetermined surface present within the predetermined range. Finally, a processing execution unit executes processing corresponding to the calculated position of the intersection.
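The intersection step in the abstract above can be illustrated as a ray-plane intersection. This is a minimal sketch, not the apparatus's actual method: it assumes two 3D points on the target object (a hypothetical `base` and `tip`, e.g. of a finger) and a planar surface (`plane_point`, `plane_normal`) have already been recovered from the distance image. All names are illustrative.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pointing_intersection(
    base: Vec3,
    tip: Vec3,
    plane_point: Vec3,
    plane_normal: Vec3,
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Intersect the ray from `tip` along (tip - base) with a plane.

    Returns None when the ray is parallel to the plane or points away from it.
    The planar surface is an assumption; the source only says "predetermined surface".
    """
    direction = _sub(tip, base)
    denom = _dot(plane_normal, direction)
    if abs(denom) < eps:
        return None  # ray runs parallel to the surface
    t = _dot(plane_normal, _sub(plane_point, tip)) / denom
    if t < 0:
        return None  # surface lies behind the pointing direction
    return (
        tip[0] + t * direction[0],
        tip[1] + t * direction[1],
        tip[2] + t * direction[2],
    )
```

For example, a target with `base=(0, 0, 2)` and `tip=(1, 0, 1)` pointing at the plane `z = 0` yields an intersection on that plane further along the pointing direction.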

DOCUMENT ENTITY EXTRACTION

An end-to-end solution for creating document templates and performing document entity extraction from a query document based on a subset (e.g., one or a few) of representative document templates. Certain embodiments employ a RANSAC algorithm in a new way for document entity extraction, e.g., using a combination of text embedding and RANSAC to find the nearest neighbor in a document gallery. These embodiments use OCR features in the RANSAC application, as opposed to the vision descriptors used in document classification (e.g., where OCR is treated as noise for the classification). In addition, the innovations in OCR usage include filtering out the unique OCR words between the document templates and query documents during RANSAC to increase accuracy and efficiency, and filtering out command keywords in the extracted OCR text between a document template and a filled query document.
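The idea of running RANSAC over OCR words rather than vision descriptors can be sketched as follows. This is an illustration under stated assumptions, not the claimed method. It interprets the unique-word filtering as keeping only words that occur exactly once in both the template and the query, and it fits a pure translation model, where a real system would likely fit an affine transform or homography. The text-embedding nearest-neighbor step is not shown, and all names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, x, y): centre of an OCR word box
Pair = Tuple[Tuple[float, float], Tuple[float, float]]


def unique_matches(template: List[Word], query: List[Word]) -> List[Pair]:
    """Pair up words that occur exactly once in both documents."""
    t_counts = Counter(text for text, _, _ in template)
    q_counts = Counter(text for text, _, _ in query)
    q_pos = {text: (x, y) for text, x, y in query if q_counts[text] == 1}
    return [
        ((x, y), q_pos[text])
        for text, x, y in template
        if t_counts[text] == 1 and text in q_pos
    ]


def ransac_translation(
    pairs: List[Pair], tol: float = 5.0, iters: int = 50, seed: int = 0
) -> Tuple[Tuple[float, float], int]:
    """Estimate the template-to-query offset that most correspondences agree on."""
    best_offset, best_inliers = (0.0, 0.0), 0
    if not pairs:
        return best_offset, best_inliers
    rng = random.Random(seed)
    for _ in range(iters):
        # Minimal sample: one correspondence fully determines a translation.
        (tx, ty), (qx, qy) = rng.choice(pairs)
        dx, dy = qx - tx, qy - ty
        inliers = sum(
            1
            for (ax, ay), (bx, by) in pairs
            if abs(bx - ax - dx) <= tol and abs(by - ay - dy) <= tol
        )
        if inliers > best_inliers:
            best_offset, best_inliers = (dx, dy), inliers
    return best_offset, best_inliers
```

The inlier count could then serve as a similarity score when ranking gallery templates against a query, and the fitted offset could map template field regions onto the query document.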

Methods, systems, and media for generating video classifications using multimodal video analysis

Methods, systems, and media for generating video classifications using multimodal video analysis are provided. In some embodiments, a method for classifying videos comprises: receiving, from a computing device, a video identifier; parsing a video associated with the video identifier into an audio portion and a plurality of image frames; analyzing the plurality of image frames associated with the video using (i) an optical character recognition technique to obtain first textual information corresponding to text appearing in at least one of the plurality of image frames and (ii) an image classifier to obtain, for each of a plurality of objects appearing in at least one of the plurality of image frames of the video, a probability that the object falls within an image class; concurrently with analyzing the plurality of image frames associated with the video, analyzing the audio portion of the video using an automated speech recognition technique to obtain second textual information corresponding to words spoken in the video; combining the first textual information, the probability of each of the plurality of objects appearing in the at least one of the plurality of image frames of the video, and the second textual information to obtain a combined analysis output for the video; determining, using a neural network, a safety score for each of a plurality of categories that the video contains content belonging to a category of the plurality of categories, wherein the combined analysis output is input into the neural network; and, in response to receiving the video identifier, transmitting a plurality of safety scores corresponding to the plurality of categories to the computing device for the video associated with the video identifier.
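The concurrent analysis and combination steps described above can be sketched with a thread pool. This is a minimal sketch, assuming the OCR, image classification, speech recognition, and neural-network scoring components are supplied as callables. The parameter names `ocr`, `classify`, `asr`, and `score` are hypothetical placeholders, not real library APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Sequence


def analyze_video(
    frames: Sequence[Any],
    audio: Any,
    ocr: Callable[[Sequence[Any]], str],
    classify: Callable[[Sequence[Any]], Dict[str, float]],
    asr: Callable[[Any], str],
    score: Callable[[Dict[str, Any]], Dict[str, float]],
) -> Dict[str, float]:
    """Run frame analysis and audio analysis concurrently, then score the result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Frame branch: on-screen text plus per-object class probabilities.
        visual = pool.submit(lambda: (ocr(frames), classify(frames)))
        # Audio branch: spoken words, running at the same time as the frame branch.
        speech = pool.submit(asr, audio)
        ocr_text, object_probs = visual.result()
        spoken_text = speech.result()

    # Combined analysis output that is fed to the scoring model.
    combined = {
        "ocr_text": ocr_text,
        "object_probs": object_probs,
        "spoken_text": spoken_text,
    }
    return score(combined)  # one safety score per category
```

Passing the components in as callables keeps the sketch independent of any particular OCR, ASR, or model framework.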

METHOD AND SYSTEM FOR MANAGING APPLICATIONS USING ARTIFICIAL INTELLIGENCE (AI)

This disclosure relates to a method and a system for managing applications. The method includes identifying a text label from a test case and a real-time image associated with an application. The application is either a mobile application or a web application. The method further includes determining a positioning of each of a set of web elements within the real-time image. The method further includes mapping the text label to a web element from the set of web elements based on the determined positioning using a mapping algorithm. The text label is mapped to the web element based on a corresponding set of attributes. The method further includes generating, upon mapping, a segmented image comprising the text label and the web element, and transmitting the segmented image to a testing unit for performing an action associated with the text label and the web element.
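The abstract does not specify the mapping algorithm. One plausible illustration is to map the label to the web element whose bounding-box center lies closest to the label's center. The sketch below makes that assumption explicitly; the box format and element names are hypothetical.

```python
import math
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in image pixels


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def map_label_to_element(label_box: Box, elements: Dict[str, Box]) -> str:
    """Return the name of the element nearest to the text label.

    Nearest-center matching is an assumed stand-in for the mapping algorithm.
    """
    label_center = center(label_box)
    return min(
        elements,
        key=lambda name: math.dist(label_center, center(elements[name])),
    )


# Usage: a "Username" label sitting to the left of an input field.
elements = {
    "username_input": (80, 10, 120, 20),
    "submit_button": (80, 200, 100, 30),
}
nearest = map_label_to_element((10, 10, 60, 20), elements)
```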

REFLOWING INFOGRAPHICS FOR CROSS-DEVICE DISPLAY

Embodiments are disclosed for reflowing an infographic image for display on a mobile device using machine learning models. In particular, in one or more embodiments, the method may include receiving a document for display on a user device, the document including an infographic image. The method may further include identifying, using a convolutional neural network, visual components of the infographic image. The method may further include determining, using an encoder-decoder network, an ordered sequence of the identified visual components. A generative adversarial network then generates a modified visual representation of the infographic image based on the identified visual components and the determined ordered sequence of the identified visual components. The modified visual representation of the infographic image is then presented for display in a viewing pane of the user device in place of the infographic image.

Smart content load
12222899 · 2025-02-11

A system and a method are disclosed for automatic content upload and processing. The system retrieves a set of files from a source location based on instructions received from a client device of a user. The system then classifies the set of files into a plurality of categories corresponding to a sequence of one or more services configured to process or store files. The system then generates a data structure storing key values, where the key values are derived based on respective processing of subsets of files. Responsive to receiving an input to execute logic relating to the set of files, the system determines that the input is associated with one or more of the key values, retrieves the one or more of the key values, and executes the logic using the one or more retrieved key values.

Position information providing system
12262151 · 2025-03-25

A position information providing system includes: an information device which includes a GPS sensor and a camera, the information device being configured to extract, from image data acquired by the camera, image data including subject identification information for identifying a photographing subject, and to generate subject location information that associates position information acquired by the GPS sensor with the subject identification information; and a computer configured to generate information provision data based on the subject location information generated by the information device, wherein the information device has installed therein dedicated software for extracting, from a plurality of pieces of the acquired image data, image data including the subject identification information as required information worth providing, and is configured to generate the subject location information for the image data extracted through use of the dedicated software, and to transmit the generated subject location information to the computer.

Machine-learning models for image processing

Presented herein are systems and methods that employ machine learning models for image processing. A method may include capturing, at a client device, a video feed including image data of a document. The client device can provide the video feed to another computing device. The method can include, by the client device or the other computing device, performing object recognition to recognize a type of document and capturing, from among the frames within the video feed, an image of the document that exceeds a quality threshold. The method may further include the execution of other image processing operations on the image data to improve the quality of the image or of features extracted therefrom. The method may further include anti-fraud detection or scoring operations to determine an amount of risk associated with the image data.
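Selecting a frame that exceeds a quality threshold can be illustrated with a simple sharpness proxy. The abstract does not say how quality is measured, so the sketch below uses the mean absolute difference between horizontally adjacent grayscale pixels as an assumed stand-in. Frames are plain nested lists, and both function names are hypothetical.

```python
from typing import List, Optional

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def sharpness(frame: Frame) -> float:
    """Mean absolute horizontal gradient; higher values suggest sharper edges."""
    diffs = [
        abs(row[i + 1] - row[i])
        for row in frame
        for i in range(len(row) - 1)
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


def first_good_frame(frames: List[Frame], threshold: float) -> Optional[int]:
    """Return the index of the first frame whose score exceeds the threshold."""
    for index, frame in enumerate(frames):
        if sharpness(frame) > threshold:
            return index
    return None  # no frame in the feed was good enough


# Usage: a low-contrast frame followed by a high-contrast one.
blurry = [[100, 101, 102], [100, 101, 102]]
sharp = [[0, 255, 0], [255, 0, 255]]
chosen = first_good_frame([blurry, sharp], threshold=50.0)
```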

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM
20250078549 · 2025-03-06

Provided is an information processing apparatus including: a character recognition unit configured to perform a character recognition process on an image of a processing target document; a generation unit configured to generate an instruction message based on a result of the character recognition process, the instruction message being a message for causing a large language model to reply with a first character string corresponding to a predetermined item included in the document; a transmission unit configured to transmit the instruction message in order to obtain a reply to the instruction message from the large language model; and a reception unit configured to receive the reply to the instruction message from the large language model.
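The generation unit's role can be sketched as a function that turns the character recognition result into an instruction message. This is a minimal sketch with assumed wording: the prompt text and the function name are hypothetical. The transmission and reception steps are omitted because the abstract does not name a specific large language model interface.

```python
def build_instruction_message(ocr_text: str, item: str) -> str:
    """Compose a message asking a language model for one item's value.

    The phrasing below is illustrative; the source does not give the message text.
    """
    return (
        "The following text was extracted from a document by character recognition.\n"
        f'Reply with only the character string corresponding to the item "{item}".\n'
        "\n"
        "Document text:\n"
        f"{ocr_text}"
    )


# Usage: ask for the invoice number found in a recognized document.
message = build_instruction_message(
    ocr_text="INVOICE No. 2024-0117\nTotal: 1,250.00",
    item="invoice number",
)
```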

SYSTEM AND METHOD FOR QUESTIONNAIRE DATA DIGITIZATION AND RECONCILIATION

Various methods and processes, apparatuses/systems, and media for questionnaire data digitization and reconciliation are disclosed. A processor generates an autonomous program for continuously monitoring a shared mailbox for unread emails having questionnaire data containing a plurality of line items filled out by a client; converts, by utilizing an OCR tool, the questionnaire data containing the plurality of line items into machine-readable data; reads, by utilizing an automated reconciliation tool, the machine-readable data for each line item; compares, by utilizing the automated reconciliation tool, the data for each line item against corresponding predefined guidance data; identifies, based on the comparing and by applying predefined rules, missing response data, negative response data, and insufficient response data corresponding to the questionnaire data filled out by the client; and automatically reconciles the missing response data, negative response data, and insufficient response data.
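The comparison and identification steps can be illustrated with a small rule-based check. The abstract does not disclose the predefined rules or the format of the guidance data. The sketch below therefore assumes that guidance data is a minimum answer length per line item, and that a fixed set of words counts as a negative response. Both rules and all names are hypothetical.

```python
from typing import Dict, List

# Assumed set of answers treated as negative responses.
NEGATIVE_ANSWERS = {"no", "n", "none", "n/a"}


def classify_line_items(
    responses: Dict[str, str], guidance: Dict[str, int]
) -> Dict[str, List[str]]:
    """Sort line items into missing, negative, and insufficient responses.

    `guidance` maps each expected line item to an assumed minimum answer length.
    """
    issues: Dict[str, List[str]] = {"missing": [], "negative": [], "insufficient": []}
    for item, min_length in guidance.items():
        answer = responses.get(item, "").strip()
        if not answer:
            issues["missing"].append(item)
        elif answer.lower() in NEGATIVE_ANSWERS:
            issues["negative"].append(item)
        elif len(answer) < min_length:
            issues["insufficient"].append(item)
    return issues


# Usage: four expected line items, one answered adequately.
guidance = {"q1": 10, "q2": 10, "q3": 10, "q4": 10}
responses = {"q1": "We review access quarterly.", "q2": "No", "q3": "Yes, ok"}
issues = classify_line_items(responses, guidance)
```

Each flagged item could then be routed to whatever reconciliation action applies to its category.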