G06V30/19167

METHOD FOR TRAINING TEXT POSITIONING MODEL AND METHOD FOR TEXT POSITIONING
20220392242 · 2022-12-08

A method for training a text positioning model includes: obtaining a sample image, where the sample image contains a sample text to be positioned and a text marking box for the sample text; inputting the sample image into a text positioning model to be trained to position the sample text, and outputting a prediction text box for the sample image; obtaining a sample prior anchor box corresponding to the sample image; and adjusting the model parameters of the text positioning model based on the sample prior anchor box, the text marking box and the prediction text box, then continuing to train the adjusted text positioning model on the next sample image until model training is completed, to generate a target text positioning model.
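The abstract does not say how the prior anchor box, the text marking box and the prediction text box are combined when adjusting the parameters. One common pattern, assumed here for illustration only, is to express both boxes as offsets relative to the anchor and penalize the difference with a smooth L1 loss. The names `encode`, `smooth_l1` and `anchor_box_loss` are hypothetical; this sketch computes only the quantity a training loop would minimize, not the patented method.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def encode(box: Box, anchor: Box) -> Tuple[float, float, float, float]:
    """Express a box as center/size offsets relative to a prior anchor box (assumed encoding)."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    bx, by = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    bw, bh = box[2] - box[0], box[3] - box[1]
    return ((bx - ax) / aw, (by - ay) / ah, math.log(bw / aw), math.log(bh / ah))


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def anchor_box_loss(pred: Box, marked: Box, anchor: Box) -> float:
    """Compare the prediction box and the marking box in the anchor's coordinate frame."""
    p = encode(pred, anchor)
    t = encode(marked, anchor)
    return sum(smooth_l1(pi - ti) for pi, ti in zip(p, t))


if __name__ == "__main__":
    anchor = (0.0, 0.0, 10.0, 10.0)
    marked = (0.0, 0.0, 10.0, 10.0)
    pred = (1.0, 0.0, 11.0, 10.0)  # shifted one pixel to the right
    print(anchor_box_loss(pred, marked, anchor))
```

Encoding relative to the anchor keeps the regression targets small and scale-independent, which is the usual motivation for using prior anchor boxes at all.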

Deep learning based on image encoding and decoding
11593632 · 2023-02-28

A deep learning based compression (DLBC) system trains multiple models that, when deployed, generate a compressed binary encoding of an input image that achieves a reconstruction quality and a target compression ratio. The applied models effectively identify structures of an input image, quantize the input image to a target bit precision, and compress the binary code of the input image via adaptive arithmetic coding to a target codelength. During training, the DLBC system reconstructs the input image from the compressed binary encoding and determines the loss in quality caused by the encoding process. The models can thus be trained continually so that, when applied to an input image, they minimize the loss in reconstruction quality introduced by encoding while still achieving the target compression ratio.
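Two of the steps named above, quantizing to a target bit precision and adaptive arithmetic coding, can be illustrated without any neural network. The sketch below assumes values already normalized to [0, 1], uses uniform quantization, and, instead of implementing a full arithmetic coder, computes the ideal codelength (the sum of -log2 p per symbol) under an adaptive frequency model that a real arithmetic coder would approach. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantize(values: Sequence[float], bits: int) -> List[int]:
    """Uniformly quantize values in [0, 1] to integer symbols with `bits` of precision."""
    levels = (1 << bits) - 1
    return [round(min(max(v, 0.0), 1.0) * levels) for v in values]


def adaptive_codelength(symbols: Sequence[int], alphabet_size: int) -> float:
    """Ideal bit cost of coding `symbols` with an adaptive frequency model.

    Counts start at 1 (Laplace smoothing) and are updated after each symbol,
    so frequent symbols become cheaper as coding proceeds.
    """
    counts = [1] * alphabet_size
    total = alphabet_size
    bits = 0.0
    for s in symbols:
        bits -= math.log2(counts[s] / total)
        counts[s] += 1
        total += 1
    return bits


def compression_ratio(num_values: int, raw_bits_per_value: int, coded_bits: float) -> float:
    """Ratio of the uncompressed size to the coded size."""
    return num_values * raw_bits_per_value / coded_bits
```

In a training setup like the one described, the codelength would be compared with the target codelength while a reconstruction loss is minimized; that trade-off is not modeled here.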

MACHINE LEARNING-BASED TEXT RECOGNITION SYSTEM WITH FINE-TUNING MODEL

A non-transitory processor-readable medium stores instructions to be executed by a processor. The instructions cause the processor to receive a first trained machine learning model that generates a transcription based on a document. The instructions cause the processor to execute the first trained machine learning model and a second trained machine learning model to generate a refined transcription based on the transcription. The instructions also cause the processor to execute a quality assurance program to generate a transcription score based on the document and the transcription, and to generate a refined transcription score based on the refined transcription and at least one of the document or the transcription. The refined transcription score indicates better automation performance than the transcription score.
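The abstract leaves both the second (refining) model and the quality assurance score unspecified. As a stand-in, the sketch below uses a vocabulary-based corrector built on `difflib.get_close_matches` as the "second model" and character-level similarity to a reference text (`difflib.SequenceMatcher.ratio`) as the score. Both choices, and the assumption that a reference text is available, are illustrative only.

```python
from difflib import SequenceMatcher, get_close_matches
from typing import Callable, List, Tuple


def transcription_score(reference: str, transcription: str) -> float:
    """Hypothetical QA score: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, reference, transcription).ratio()


def make_vocab_refiner(vocab: List[str]) -> Callable[[str], str]:
    """Hypothetical refining model: snap each word to its closest vocabulary entry."""
    def refine(text: str) -> str:
        out = []
        for word in text.split():
            match = get_close_matches(word, vocab, n=1, cutoff=0.6)
            out.append(match[0] if match else word)
        return " ".join(out)
    return refine


def refine_and_score(reference: str, transcription: str,
                     refine: Callable[[str], str]) -> Tuple[str, float, float]:
    """Return the refined text plus the original and refined scores."""
    refined = refine(transcription)
    return (refined,
            transcription_score(reference, transcription),
            transcription_score(reference, refined))


if __name__ == "__main__":
    refiner = make_vocab_refiner(["invoice", "total", "amount"])
    print(refine_and_score("invoice total amount", "inv0ice tota1 amount", refiner))
```

Comparing the two scores mirrors the abstract's final step: the refinement is worthwhile only if the refined score is higher.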

Document processing using hybrid rule-based artificial intelligence (AI) mechanisms

A hybrid rule-based artificial intelligence (AI) document processing system processes a non-editable document containing at least one invoice to accurately extract data from tables in the invoices. The non-editable document is preprocessed for conversion into a markup format, and the pages that include the invoice are identified. The invoice is processed in two ways: a document process parses the pages in different directions to generate a first set of predictions, and a block process handles logical information blocks from the invoice to generate a second set of predictions. Missing entries in a selected table are identified by applying rules to the first and second sets of predictions. Any discrepancies in the missing-entry values between the two sets of predictions are resolved, and the resulting data is exported to downstream systems for further use.
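The reconciliation step can be sketched as a merge of two prediction sets over the empty cells of a table. The resolution rule used here (take the shared value when both processes agree, otherwise keep the more confident prediction) is an assumption for illustration; the abstract does not state which rules are applied. `fill_missing` and the cell/prediction types are hypothetical.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, str]          # (row index, column name)
Prediction = Tuple[str, float]  # (value, confidence)


def fill_missing(table: Dict[Cell, Optional[str]],
                 doc_preds: Dict[Cell, Prediction],
                 block_preds: Dict[Cell, Prediction]) -> Dict[Cell, Optional[str]]:
    """Fill empty cells using the document-process and block-process predictions."""
    filled = dict(table)
    for cell, value in table.items():
        if value is not None:
            continue  # only missing entries are reconciled
        a, b = doc_preds.get(cell), block_preds.get(cell)
        if a and b:
            # Agreement: use the shared value; disagreement: keep the more confident one.
            filled[cell] = a[0] if a[0] == b[0] or a[1] >= b[1] else b[0]
        elif a or b:
            filled[cell] = (a or b)[0]  # only one process produced a value
    return filled


if __name__ == "__main__":
    table = {(0, "qty"): "2", (0, "price"): None, (1, "qty"): None}
    doc = {(0, "price"): ("9.99", 0.6), (1, "qty"): ("3", 0.9)}
    block = {(0, "price"): ("9.90", 0.8)}
    print(fill_missing(table, doc, block))
```

Cells that neither process predicts stay `None`, so downstream systems can still see which entries remain unresolved.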

Automated classification and interpretation of life science documents

A computer-implemented tool for the automated classification and interpretation of documents, such as life science documents supporting clinical trials, is configured to perform a combination of raw text, document construct, and image analyses. This combination enhances classification accuracy by enabling a more comprehensive machine-based understanding of document content. Compared with conventional automated classification tools, the combined analyses provide context for classification by leveraging relative spatial relationships among text and image elements, identifying the characteristics and formatting of elements, and extracting additional metadata from the documents. Natural language processing (NLP) is applied to associate text with tokens, and relevant differences and similarities between protocols are identified.

Data interpretation analysis

The quality of an interpretation of data captured as unstructured data can be determined. Attributes can be identified automatically within the unstructured data, and the sentiment associated with each attribute can then be determined from the unstructured data. The correctness of the unstructured data, and thus of the interpretation, can be assessed by comparing each attribute and its associated sentiment with structured data. A quality score can be generated that captures the quality of the interpretation in terms of correctness as well as the results of other analyses, such as completeness. Comparing the quality score with a threshold determines whether the interpretation is subject to further review.
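A minimal sketch of the scoring step, assuming attribute extraction and sentiment analysis have already produced an `attribute -> sentiment` mapping. Correctness is taken as the share of mentioned attributes whose sentiment matches the structured data, completeness as the share of structured attributes that are mentioned, and the score as a weighted average of the two. The weighting, the 0.8 threshold and the function names are hypothetical.

```python
from typing import Dict


def quality_score(extracted: Dict[str, str], structured: Dict[str, str],
                  w_correct: float = 0.5) -> float:
    """Weighted combination of correctness and completeness (assumed formula).

    extracted:  attribute -> sentiment found in the unstructured data
    structured: attribute -> sentiment recorded in the structured data
    """
    if not structured:
        return 0.0
    mentioned = [a for a in extracted if a in structured]
    correct = sum(1 for a in mentioned if extracted[a] == structured[a])
    correctness = correct / len(mentioned) if mentioned else 0.0
    completeness = len(mentioned) / len(structured)
    return w_correct * correctness + (1 - w_correct) * completeness


def needs_review(score: float, threshold: float = 0.8) -> bool:
    """Route low-scoring interpretations to further review."""
    return score < threshold


if __name__ == "__main__":
    notes = {"battery": "negative", "screen": "positive"}
    record = {"battery": "negative", "screen": "negative", "price": "positive"}
    s = quality_score(notes, record)
    print(round(s, 3), needs_review(s))
```

Here one of two mentioned attributes is correct (0.5) and two of three structured attributes are mentioned (about 0.667), so the interpretation falls below the threshold and is flagged.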

Blocking deceptive online content

In one aspect, the present disclosure relates to a method for reducing fraud in computer networks, the method including receiving, from each of a plurality of user devices, a request to block an ad displayed within a web browser installed on the user device, the request comprising image data and a forwarding URL associated with the ad; storing crowdsourced ad blocking data based on the received requests to block ads; receiving a request for a list of blocked ads; generating a list of blocked ads based on analyzing the crowdsourced ad blocking data, wherein analyzing the crowdsourced ad blocking data comprises identifying ads blocked by at least a threshold number of users; and sending the list of blocked ads to a first user device, the first user device comprising a browser extension configured to prevent ads within the list of blocked ads from being rendered in a browser.
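The thresholding step can be sketched as counting distinct users per ad. This sketch assumes an ad is identified by the SHA-256 digest of its image data together with its forwarding URL; a deployed system might instead use perceptual image hashing or URL normalization, which the abstract does not specify. `BlockStore` and its methods are hypothetical names.

```python
from collections import defaultdict
from hashlib import sha256
from typing import Dict, List, Set, Tuple

AdKey = Tuple[str, str]  # (image digest, forwarding URL)


class BlockStore:
    """Hypothetical store of crowdsourced ad-blocking requests."""

    def __init__(self) -> None:
        self._blockers: Dict[AdKey, Set[str]] = defaultdict(set)

    def record(self, user_id: str, image_data: bytes, forwarding_url: str) -> None:
        """Store one block request; repeated requests from the same user count once."""
        key = (sha256(image_data).hexdigest(), forwarding_url)
        self._blockers[key].add(user_id)

    def blocked_ads(self, threshold: int) -> List[AdKey]:
        """Ads blocked by at least `threshold` distinct users."""
        return sorted(k for k, users in self._blockers.items() if len(users) >= threshold)


if __name__ == "__main__":
    store = BlockStore()
    store.record("u1", b"img-a", "https://a.example/offer")
    store.record("u2", b"img-a", "https://a.example/offer")
    store.record("u1", b"img-b", "https://b.example/deal")
    print(store.blocked_ads(threshold=2))
```

Counting distinct users rather than raw requests keeps a single user from pushing an ad over the threshold by reporting it repeatedly.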

Manual curation tool for map data using aggregated overhead views

Examples disclosed herein may involve (i) obtaining a first layer of map data associated with sensor data capturing a geographical area, the first layer of map data comprising an aggregated overhead-view image of the geographical area, where the aggregated overhead-view image is generated from aggregated pixel values from a plurality of images associated with the geographical area, (ii) obtaining a second layer of map data, the second layer of map data comprising label data for the geographical area derived from the aggregated overhead-view image of the geographical area, and (iii) causing the first layer of map data and the second layer of map data to be presented to a user for curation of the label data.
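How the pixel values are aggregated is not specified. A per-pixel median over images already aligned to the same overhead grid is one plausible choice, since it suppresses transient objects such as passing vehicles. The sketch below assumes that alignment has been done and marks unobserved pixels as `None`; `aggregate_overhead` is a hypothetical name.

```python
from statistics import median
from typing import List, Optional

Image = List[List[Optional[float]]]  # aligned overhead tile; None = pixel not observed


def aggregate_overhead(images: List[Image]) -> Image:
    """Per-pixel median across aligned images, ignoring unobserved pixels."""
    h, w = len(images[0]), len(images[0][0])
    out: Image = []
    for r in range(h):
        row: List[Optional[float]] = []
        for c in range(w):
            vals = [img[r][c] for img in images if img[r][c] is not None]
            row.append(median(vals) if vals else None)
        out.append(row)
    return out


if __name__ == "__main__":
    # Three 1x2 tiles; the 200 in the first pixel is an outlier (e.g. a passing car).
    tiles = [[[10, None]], [[12, 5]], [[200, 7]]]
    print(aggregate_overhead(tiles))
```

The resulting image would form the first layer of map data; label data derived from it would be a separate second layer.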

AUTONOMOUSLY REMOVING SCAN MARKS FROM DIGITAL DOCUMENTS UTILIZING CONTENT-AWARE FILTERS
20230090313 · 2023-03-23

The present disclosure relates to systems, non-transitory computer-readable media, and methods for implementing content-aware filters to autonomously remove scan marks from digital documents. In particular implementations, the disclosed systems utilize a set of targeted scan mark models in a scan mark removal pipeline. For example, each scan mark model includes a corresponding content-aware filter configured to identify document regions that match a designated class of scan marks to filter. Examples of scan mark models include staple scan mark models, punch hole scan mark models, and page turn scan mark models. In certain embodiments, the disclosed systems then use the scan mark models to generate mark-specific masks based on document input features. Additionally, in some embodiments, the disclosed systems combine the mark-specific masks into a final segmentation mask and apply the final segmentation mask to the digital document for correcting the identified regions with scan marks.
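The final stage described above, combining mark-specific masks into one segmentation mask and applying it, can be sketched on small grayscale pages. Masks are combined here with a per-pixel logical OR, and correction is reduced to filling masked pixels with a background value; a real pipeline might inpaint instead. The mask names and functions are hypothetical, and the content-aware filters that produce the masks are not modeled.

```python
from typing import Dict, List

Mask = List[List[bool]]
Page = List[List[int]]  # grayscale, 0 = black, 255 = white


def combine_masks(masks: Dict[str, Mask]) -> Mask:
    """Union of the mark-specific masks: a pixel is flagged if any model flags it."""
    ms = list(masks.values())
    h, w = len(ms[0]), len(ms[0][0])
    return [[any(m[r][c] for m in ms) for c in range(w)] for r in range(h)]


def apply_mask(page: Page, mask: Mask, background: int = 255) -> Page:
    """Replace flagged pixels with the background value (simplified correction)."""
    return [[background if mask[r][c] else px for c, px in enumerate(row)]
            for r, row in enumerate(page)]


if __name__ == "__main__":
    page = [[0, 50], [100, 150]]
    masks = {
        "staple": [[True, False], [False, False]],
        "punch_hole": [[False, False], [False, True]],
    }
    print(apply_mask(page, combine_masks(masks)))
```

Using a union means a region flagged by any single scan mark model is corrected, which matches the idea of each model targeting its own class of marks.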
