G06V30/10

Method and apparatus for automatically extracting information from unstructured data

Various methods, apparatuses/systems, and media for automatically extracting information from unstructured data are provided. A receiver receives digitized data of a document having an unstructured data format. A processor applies machine learning models for sectioning the digitized data. An OCR device applies OCR processing to the sectioned digitized data. The processor matches the sectioned digitized data to patterns and rules; applies classification models to the matched digitized data to identify entities and events from the sectioned digitized data; automatically links each entity with its corresponding event in a hierarchical format to generate a document having a structured data format; and outputs the document having the structured data, with metadata having the linked entity and corresponding event in the hierarchical format, to downstream applications.

SYSTEM AND METHOD FOR FORMAT-AGNOSTIC DOCUMENT INGESTION
20230237101 · 2023-07-27

A system for format-agnostic document ingestion including a document ingestion server and a database is disclosed. The server is configured to receive an image of a document comprising text in an unknown format, and to convert the image, using OCR, into a plurality of text elements, each having a content, a size, and an absolute position. The server is also configured to retrieve data detectors from the database, each associated with a data type anticipated to be in the document, and each comprising at least one identifier and direction and at least one validation criterion. The server is also configured to identify a potential descriptor by comparing the content of each text element with the at least one identifier, and then determine whether the text element pointed to by the data detector meets the validation criterion. Finally, the server is configured to associate the validated text element with the data detector, and store the content.
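The detector mechanism described above can be illustrated with a minimal Python sketch. It is not the implementation from the application: the `TextElement` and `DataDetector` classes, the two supported directions ("right" and "below"), the nearest-neighbor rule, and the regular-expression validators are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TextElement:
    """One OCR text element (hypothetical shape): content, size, absolute position."""
    content: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


@dataclass
class DataDetector:
    """Hypothetical detector: an identifier, a direction, and a validation check."""
    data_type: str
    identifier: str                  # label text to look for
    direction: str                   # "right" or "below" in this sketch
    validate: Callable[[str], bool]  # validation criterion


def find_neighbor(anchor: TextElement, elements: List[TextElement],
                  direction: str) -> Optional[TextElement]:
    """Return the nearest element in the given direction from the anchor, or None."""
    if direction == "right":
        candidates = [e for e in elements
                      if e is not anchor and e.x > anchor.x
                      and abs(e.y - anchor.y) <= anchor.height]
        return min(candidates, key=lambda e: e.x - anchor.x, default=None)
    if direction == "below":
        candidates = [e for e in elements
                      if e is not anchor and e.y > anchor.y
                      and abs(e.x - anchor.x) <= anchor.width]
        return min(candidates, key=lambda e: e.y - anchor.y, default=None)
    raise ValueError(f"unsupported direction: {direction}")


def extract(elements: List[TextElement],
            detectors: List[DataDetector]) -> Dict[str, str]:
    """Associate validated text elements with detectors and collect their content."""
    results: Dict[str, str] = {}
    for det in detectors:
        for el in elements:
            # A potential descriptor is any element whose content contains the identifier.
            if det.identifier.lower() not in el.content.lower():
                continue
            target = find_neighbor(el, elements, det.direction)
            if target is not None and det.validate(target.content):
                results[det.data_type] = target.content
                break
    return results


elements = [
    TextElement("Invoice No:", 10, 10, 80, 12),
    TextElement("INV-0042", 100, 11, 60, 12),
    TextElement("Date", 10, 40, 30, 12),
    TextElement("2023-07-27", 12, 60, 70, 12),
]
detectors = [
    DataDetector("invoice_number", "invoice no", "right",
                 lambda s: re.fullmatch(r"INV-\d+", s) is not None),
    DataDetector("date", "date", "below",
                 lambda s: re.fullmatch(r"\d{4}-\d{2}-\d{2}", s) is not None),
]
extracted = extract(elements, detectors)
```

Because each detector carries its own direction and validation check, the same loop handles documents whose layout is not known in advance; only the detector list changes.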

SYSTEM AND METHOD FOR DATA PROCESSING AND COMPUTATION
20230237340 · 2023-07-27

A data processing device and a computer-implemented method are configured to execute in parallel a data hub process (6) and a plurality of processes in the form of computation modules (7). The data hub process (6) comprises at least a segmentation sub-process (61), which segments input data into data segments, and at least one keying sub-process (62), which provides keys to the data segments, creating keyed data segments; the data hub process (6) stores the keyed data segments in a shared memory device (4) as shared keyed data segments. Each computation module (7) is configured to access the at least one shared memory device (4) to look for module-specific data segments, which are shared keyed data segments keyed with at least one key that is specific for at least one of the computation modules (7), and to execute a machine learning method on the module-specific data segments, said machine learning method comprising data interpretation and classification methods using at least one pre-trained neuronal network (71), and to output the result of the executed machine learning method to the shared memory device (4) or another computation module.
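A rough Python sketch of the keyed shared-store pattern follows. Several simplifications are assumptions, not details from the application: `SharedStore` is an in-process stand-in for the shared memory device, the hub runs to completion before the modules start (the abstract describes them running in parallel), segmentation is whitespace splitting, and the `classify` callables are trivial placeholders where the application uses pre-trained networks.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class SharedStore:
    """In-process stand-in for the shared memory device (hypothetical)."""

    def __init__(self):
        self._lock = Lock()
        self.segments = []  # list of (key, segment)
        self.results = []   # list of (module_name, segment, label)

    def put_segment(self, key, segment):
        with self._lock:
            self.segments.append((key, segment))

    def segments_for(self, key):
        with self._lock:
            return [s for k, s in self.segments if k == key]

    def put_result(self, module, segment, label):
        with self._lock:
            self.results.append((module, segment, label))


def data_hub(store, text):
    """Segment the input and key each segment before sharing it."""
    for segment in text.split():                         # segmentation sub-process
        key = "number" if segment.isdigit() else "word"  # keying sub-process
        store.put_segment(key, segment)


def computation_module(store, name, key, classify):
    """Process only the segments keyed for this module and share the result."""
    for segment in store.segments_for(key):
        store.put_result(name, segment, classify(segment))


store = SharedStore()
data_hub(store, "order 42 shipped 7")

# The two modules run concurrently; each sees only its own keyed segments.
with ThreadPoolExecutor() as pool:
    pool.submit(computation_module, store, "numbers", "number",
                lambda s: "even" if int(s) % 2 == 0 else "odd")
    pool.submit(computation_module, store, "words", "word",
                lambda s: "long" if len(s) > 5 else "short")

results = sorted(store.results)
```

The keys are what keep the modules decoupled: a module never inspects the raw input, only the segments the hub has labeled for it.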

MACHINE LEARNING BASED END-TO-END EXTRACTION OF TABLES FROM ELECTRONIC DOCUMENTS
20230237828 · 2023-07-27

In some embodiments, a method includes identifying a set of word bounding boxes in a first electronic document, and identifying locations of horizontal white space between two adjacent rows from a set of rows in a table. The method includes determining, using a Natural Language Processing algorithm, an entity name from a set of entity names for each table cell from a set of table cells in the table. The method includes determining, using a machine learning algorithm, a class from a set of classes for each row from the set of rows. The method includes extracting a set of table cell values associated with the set of table cells, and generating a second electronic document including the set of table cell values arranged in the set of rows and the set of columns such that the set of words in the table are computer-readable in the second electronic document.
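The row-separation step can be sketched in Python as below. This is one plausible way to locate horizontal white space from word bounding boxes, not the method claimed in the application; the `(x0, y0, x1, y1)` box layout, the `min_gap` threshold, and the function names are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), with y growing downward


def horizontal_gaps(boxes: List[Box], min_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (top, bottom) y-intervals between the boxes that no word box covers."""
    intervals = sorted((y0, y1) for _, y0, _, y1 in boxes)
    gaps: List[Tuple[float, float]] = []
    if not intervals:
        return gaps
    covered_to = intervals[0][1]  # lowest y covered so far
    for top, bottom in intervals[1:]:
        if top - covered_to >= min_gap:
            gaps.append((covered_to, top))
        covered_to = max(covered_to, bottom)
    return gaps


def group_rows(boxes: List[Box], min_gap: float = 2.0) -> List[List[Box]]:
    """Split the word boxes into rows, cutting at the middle of each white-space gap."""
    cuts = [(top + bottom) / 2 for top, bottom in horizontal_gaps(boxes, min_gap)]
    rows: List[List[Box]] = [[] for _ in range(len(cuts) + 1)]
    for box in boxes:
        center = (box[1] + box[3]) / 2
        row_index = sum(1 for c in cuts if center > c)  # number of cuts above the box
        rows[row_index].append(box)
    return [sorted(row) for row in rows]  # left-to-right within each row


word_boxes = [(0, 0, 20, 10), (30, 1, 50, 11), (0, 20, 20, 30), (30, 21, 50, 31)]
gaps = horizontal_gaps(word_boxes)
rows = group_rows(word_boxes)
```

Once words are grouped into rows this way, per-row classification and per-cell entity labeling can operate on each row independently.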

MULTI-PAGE DOCUMENT RECOGNITION IN DOCUMENT CAPTURE
20230005285 · 2023-01-05

Techniques to capture document data are disclosed. It is determined that a sequence of pages in a stream of document page images comprises a single multi-page document. Data is extracted from two or more different pages included in the sequence. The data extracted from the two or more different pages is used to populate a data entry form associated with the multi-page document.

CONTEXTUALLY DISAMBIGUATING QUERIES

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for contextually disambiguating queries are disclosed. In an aspect, a method includes receiving an image being presented on a display of a computing device and a transcription of an utterance spoken by a user of the computing device, identifying a particular sub-image that is included in the image, and based on performing image recognition on the particular sub-image, determining one or more first labels that indicate a context of the particular sub-image. The method also includes, based on performing text recognition on a portion of the image other than the particular sub-image, determining one or more second labels that indicate the context of the particular sub-image, based on the transcription, the first labels, and the second labels, generating a search query, and providing, for output, the search query.

System and Method for Internal Etching Surfaces of Transparent Materials with Information Pertaining to a Blockchain
20230239147 · 2023-07-27

In one embodiment, a system includes a tangible token comprising a transparent gemstone, wherein the transparent gemstone is internally etched with information pertaining to a blockchain, the information comprises at least a private key, a public key, and an address, and the information is represented as a quick response code. The system includes a computing device configured to execute instructions that cause the computing device to: read the information, and validate, via a network and the address, that the public key and the private key are associated with at least one block on the blockchain.

Fool-Proofing Product Identification
20230237091 · 2023-07-27

A method includes receiving, from an image capture device in communication with data processing hardware, image data for an area of interest of a user. The method also includes receiving a query from the user referring to one or more objects detected within the image data and requesting a digital assistant to discern insights associated with the one or more objects referred to by the query. The method also includes processing the query and the image data to: identify, based on context data extracted from the image data, the one or more objects referred to by the query; and determine the insights associated with the identified one or more objects for the digital assistant to discern. The method also includes generating, for output from a user device associated with the user, content indicating the discerned insights associated with the identified one or more objects.

APPARATUSES AND METHODS FOR PARSING AND COMPARING VIDEO RESUME DUPLICATIONS

Aspects relate to apparatuses and methods for parsing and comparing video resume duplications. An exemplary apparatus includes a memory communicatively connected to at least a processor and includes instructions configuring the at least a processor to: acquire a plurality of video elements from an existing video resume, wherein the existing video resume includes at least an image component; recognize subject-specific data of the existing video resume as a function of the at least an image component, wherein the subject-specific data includes verbal content and non-verbal content; recognize at least a keyword of the existing video resume as a function of the subject-specific data; recognize at least a feature of the existing video resume as a function of the subject-specific data; compare the subject-specific data to a target video resume; and determine, as a function of the comparison result, a duplication coefficient for the target video resume.