G06V30/40

Information extraction from open-ended schema-less tables

Systems and methods for generating and annotating cell documents include extracting tables from a document using a table extraction engine. Headers are extracted for each of the tables using a header detection engine. Cells are extracted from each of the tables using a cell extraction engine. A cell document is generated for each cell, with each cell correlated to the corresponding portions of the headers and each cell document recording that correlation. Each cell document is then annotated by a cell recognition model trained to perform natural language processing on the cell documents, classifying each term in each cell document and extracting relationships between the terms.
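The cell-document step can be illustrated with a minimal sketch. It assumes a table whose first row holds column headers and whose first column holds row headers; the `CellDocument` class and `build_cell_documents` function are hypothetical names chosen for illustration, and the extraction engines and the annotation model from the abstract are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class CellDocument:
    """One cell plus the header text it is correlated with (hypothetical structure)."""
    value: str
    column_header: str
    row_header: str

    def text(self) -> str:
        # Flatten the correlation into a short text a downstream NLP model could read.
        return f"{self.row_header} | {self.column_header} | {self.value}"


def build_cell_documents(table):
    """Assumption: row 0 holds column headers and column 0 holds row headers."""
    column_headers = table[0]
    documents = []
    for row in table[1:]:
        row_header = row[0]
        for col_index in range(1, len(row)):
            documents.append(
                CellDocument(
                    value=row[col_index],
                    column_header=column_headers[col_index],
                    row_header=row_header,
                )
            )
    return documents


table = [
    ["Region", "Q1", "Q2"],
    ["North", "10", "12"],
    ["South", "7", "9"],
]
docs = build_cell_documents(table)
```

Each resulting document carries its own header context, so a term classifier could process it without access to the rest of the table.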

Generating training sets to train machine learning models

A computer system trains a machine learning model. A vector representation is generated for each document in a collection of documents. The documents are clustered based on the vector representations of the documents to produce a plurality of clusters. A training set is produced by selecting one or more documents from each cluster, wherein the selected documents represent a sample of the collection of documents to train the machine learning model. The machine learning model is trained by applying the training set to the machine learning model. Embodiments of the present invention further include a method and program product for training a machine learning model in substantially the same manner described above.
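A minimal sketch of cluster-based training-set selection follows. The abstract does not specify the vector representation, the clustering algorithm, or the selection rule, so this sketch assumes bag-of-words count vectors, a small k-means loop with farthest-point initialization, and selection of the one document nearest each centroid. All function names are hypothetical.

```python
def vectorize(documents):
    # Assumption: bag-of-words counts stand in for the vector representation.
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    return [
        [float(doc.lower().split().count(word)) for word in vocabulary]
        for doc in documents
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster(vectors, k, iterations=10):
    # Farthest-point initialization keeps the sketch deterministic.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        )
    assignments = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def select_training_set(documents, k):
    vectors = vectorize(documents)
    assignments, centroids = cluster(vectors, k)
    selected = []
    for j in range(k):
        members = [i for i, a in enumerate(assignments) if a == j]
        if members:
            # Assumption: pick the single document nearest the cluster centroid.
            selected.append(
                min(members, key=lambda i: squared_distance(vectors[i], centroids[j]))
            )
    return [documents[i] for i in selected]


corpus = [
    "invoice total amount due",
    "invoice amount payment due",
    "patient blood test result",
    "patient test result normal",
]
training_set = select_training_set(corpus, k=2)
```

Sampling from every cluster is what keeps the training set representative of the whole collection rather than dominated by its largest group of similar documents.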

Learning user interface controls via incremental data synthesis

A User Interface (UI) object detection system employs an initial dataset comprising a set of images, which may include synthesized images, to train a Machine Learning (ML) engine to generate an initial trained model. A data point generator is employed to generate an updated synthesized image set, which is used to further train the ML engine. The data point generator may employ images generated by an application program as a reference by which to generate the updated synthesized image set. The images generated by the application program may be tagged in advance. Alternatively, or in addition, the images generated by the application program may be captured dynamically by a user using the application program.

Systems and methods for processing images
11514702 · 2022-11-29

Systems and methods are provided for identifying landmarks of a document from a digital representation of the document. The method comprises accessing the digital representation of the document and operating a Machine Learning Algorithm (MLA), the MLA having been trained based on a set of training digital representations of documents associated with labels. Operating the MLA comprises down-sampling the digital representation of the document from a first resolution to a second resolution, detecting landmarks, and generating fractional pixel coordinates for the detected landmarks. The method further determines the pixel coordinates of the landmarks by upscaling the fractional pixel coordinates from the second resolution to the first resolution and outputs the pixel coordinates of the landmarks.
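The final upscaling step can be sketched as a coordinate rescale. This is an illustration only: the function name `upscale_landmarks` is hypothetical, the mapping is assumed to be a plain per-axis scale factor with no half-pixel offset, and the landmark detector itself is not modeled.

```python
def upscale_landmarks(fractional_points, low_res_size, full_res_size):
    """Map fractional (x, y) coordinates from the low resolution to the full one."""
    low_w, low_h = low_res_size
    full_w, full_h = full_res_size
    scale_x = full_w / low_w
    scale_y = full_h / low_h
    # Round to whole pixels only after scaling, so sub-pixel precision
    # from the low-resolution prediction is not lost beforehand.
    return [(round(x * scale_x), round(y * scale_y)) for x, y in fractional_points]


# Landmarks predicted on a 256x256 down-sampled image of a 1024x2048 original.
landmarks = upscale_landmarks([(10.25, 20.5), (200.0, 100.25)], (256, 256), (1024, 2048))
```

Keeping the coordinates fractional at the lower resolution matters because one low-resolution pixel covers several full-resolution pixels (4 by 8 in this example), so rounding early would discard most of the positional detail.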

Multi-user collective preferences profile

Described herein is a method by which more than one person is enabled to actively participate in the process of finalizing a real estate property either for purchase or rent. Each deciding party is enabled to create a custom style profile capturing their individual preferences at an attribute level by providing both visual and verbal feedback. A Collective Preferences Profile (CPP) is created by integrating the multiple style profiles of the chosen deciding parties. The generated CPP is then utilized to curate the different houses available and to surface those houses that are most likely to fit the aesthetic and requirements of the combined audience. The CPP evolves on an ongoing basis by active solicitation of feedback on properties viewed or waitlisted to accommodate changing preferences and provide the most suited recommendation at any time.
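One way to picture the integration of style profiles is as an average over attribute scores, followed by a scoring pass over the available houses. The abstract does not say how profiles are combined or how houses are scored, so the averaging rule, the weighted-sum score, and all names below are assumptions made for illustration.

```python
def collective_profile(profiles):
    """Assumption: the CPP is the per-attribute mean of the individual profiles."""
    attributes = sorted({name for profile in profiles for name in profile})
    return {
        name: sum(profile.get(name, 0.0) for profile in profiles) / len(profiles)
        for name in attributes
    }


def rank_houses(houses, profile):
    """Assumption: a house scores the weighted sum of its attribute values."""
    def score(house):
        return sum(profile.get(name, 0.0) * value
                   for name, value in house["attributes"].items())
    return sorted(houses, key=score, reverse=True)


alice = {"modern": 1.0, "garden": 0.5}
bob = {"modern": 0.0, "garden": 1.0, "garage": 1.0}
cpp = collective_profile([alice, bob])

houses = [
    {"id": "A", "attributes": {"modern": 1.0}},
    {"id": "B", "attributes": {"garden": 1.0, "garage": 1.0}},
]
ranked = rank_houses(houses, cpp)
```

Updating the profile from later feedback would amount to adjusting the individual profiles and recomputing the combined one.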

Method and apparatus for generating context information

A memory stores a document and a plurality of word vectors that are word embeddings respectively computed for a plurality of words. For one of the words, a processor extracts from the document two or more surrounding words within a prescribed range of one occurrence position where the word occurs, and computes a sum vector by adding the word vectors corresponding to the surrounding words. The processor determines a parameter such that the surrounding words are predicted from the sum vector and the parameter using a machine learning model. The processor stores the parameter as context information for the one occurrence position, in association with the word vector corresponding to the one word.
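The sum-vector computation can be sketched as follows. Only the window extraction and vector addition are shown; the parameter fitting with a machine learning model is not modeled. The function name, the window size of 2, and the toy two-dimensional vectors are assumptions for illustration.

```python
def context_sum_vector(tokens, position, word_vectors, window=2):
    """Sum the vectors of the words within `window` positions of `position`."""
    start = max(0, position - window)
    end = min(len(tokens), position + window + 1)
    # The word at the occurrence position itself is excluded.
    surrounding = [tokens[i] for i in range(start, end) if i != position]
    dimension = len(next(iter(word_vectors.values())))
    total = [0.0] * dimension
    for word in surrounding:
        for d, value in enumerate(word_vectors[word]):
            total[d] += value
    return surrounding, total


# Toy embeddings, not taken from any real model.
word_vectors = {
    "the": [1.0, 0.0],
    "bank": [0.0, 1.0],
    "of": [0.5, 0.5],
    "river": [2.0, 1.0],
}
tokens = ["the", "bank", "of", "the", "river"]
surrounding, sum_vector = context_sum_vector(tokens, 1, word_vectors)
```

Because the sum depends on the occurrence position, two occurrences of the same word in different contexts yield different sum vectors even though they share one word vector.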

Document lineage management system

In some implementations, a system may obtain document lineage training data associated with a plurality of historical documents and corresponding lineage data of independent historical documents of the plurality of historical documents. The system may train, based on the document lineage training data, a lineage analysis model to determine a lineage of edited sections of a source document. The system may receive a plurality of document files that correspond to a plurality of versions of a document. The system may determine, using a similarity analysis model, that a first section from a first version of the plurality of versions corresponds to a second section from a second version of the plurality of versions. The system may determine, using the lineage analysis model, a lineage of a corresponding section of the document that is associated with the first section and the second section.
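The section-matching step can be sketched with a generic text-similarity measure. Here Python's standard `difflib.SequenceMatcher` stands in for the similarity analysis model, which the abstract does not describe; the 0.6 threshold and the function name are assumptions, and the trained lineage analysis model is not modeled.

```python
from difflib import SequenceMatcher


def match_sections(old_sections, new_sections, threshold=0.6):
    """For each new section, return the index of the most similar old section.

    A value of None means no old section reached the (assumed) threshold,
    which would suggest the section is newly added in this version.
    """
    matches = {}
    for new_index, new_text in enumerate(new_sections):
        best_index, best_score = None, 0.0
        for old_index, old_text in enumerate(old_sections):
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_index, best_score = old_index, score
        matches[new_index] = best_index if best_score >= threshold else None
    return matches


version_1 = [
    "The system shall log every request.",
    "Payments are processed nightly.",
]
version_2 = [
    "Payments are processed nightly at 2 am.",
    "The system shall log every request and response.",
    "Appendix B: 2024 fee table",
]
matches = match_sections(version_1, version_2)
```

Chaining such matches across successive versions gives the raw correspondence data from which a lineage for each section could be assembled.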

Dynamic generation of client-specific feature maps
11507786 · 2022-11-22

The present disclosure relates to methods and systems to generate a modified feature map specific to a client. A template feature map may be modified based on usage data associated with a client. The template feature map may be a visual representation of a plurality of features provided by an operator, each feature associated with a plurality of instructions to be processed for the client. The usage data may be compared with each feature to determine whether any feature is associated with and/or utilized by the client. When the usage data indicates that a feature is associated with and/or utilized by the client, an action may be performed on the template feature map to indicate that the feature is associated with and/or utilized by the client. A modified template feature map may thereby be generated that is specific to the client.
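The comparison of usage data against a template can be sketched as a simple marking pass. The data shapes are assumptions: the template is taken to be a list of feature records with an `id` field, the usage data a list of feature identifiers, and the "action" a boolean flag. The function name is hypothetical.

```python
def personalize_feature_map(template, used_feature_ids):
    """Return a copy of the template with each feature flagged as used or not."""
    used = set(used_feature_ids)
    modified = []
    for feature in template:
        entry = dict(feature)  # copy so the template itself is left unchanged
        entry["used_by_client"] = feature["id"] in used
        modified.append(entry)
    return modified


template = [
    {"id": "billing", "label": "Billing"},
    {"id": "reports", "label": "Reports"},
    {"id": "sso", "label": "Single sign-on"},
]
usage_data = ["billing", "reports", "billing"]
client_map = personalize_feature_map(template, usage_data)
```

A rendering layer could then highlight or grey out features based on the flag, which is one plausible form of the visual modification the abstract mentions.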

Methods and systems for automatically identifying IR security marks in a document based on halftone frequency information

The present disclosure relates to methods and systems for automatically detecting an Infrared (IR) security mark based on unknown halftone frequency information. The method includes receiving a document from a user including an IR security mark. The document is scanned. Then, one or more halftone frequencies associated with the IR security mark portion are estimated. Based on the estimation, the IR security mark portion is classified into a background region and an IR marked region including the IR security mark. The IR security mark is extracted and pixels falling in the IR marked region are reconstructed to identify content in the IR security mark. Finally, the identified content is compared with one or more pre-stored IR security marks to ascertain the presence of the IR security mark in the document for further assessment. In this way, the method automatically detects the IR security mark in the document.