G06F16/316

Determining Similarity Between Documents
20220309116 · 2022-09-29

A method and system for processing digital works. The method comprises: identifying terms within each digital work of a plurality of digital works, wherein the terms are words and/or phrases; determining the number of times that the identified terms occur within each digital work; generating a fingerprint for each digital work, based on the identified terms and the number of times they occur within that digital work; using a neural network to find an encoding function, g, that encodes the higher dimensionality space, x, of each fingerprint into a lower dimensionality space, y; applying the encoding function to each fingerprint to reduce its dimensionality; and determining a similarity between a first fingerprint and one or more of the dimensionality-reduced fingerprints.
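
The abstract does not specify the network or the similarity measure, so the following is only a minimal sketch under stated assumptions: the fingerprint is a raw term-count vector, the encoding function g is the encoder half of a small linear autoencoder trained by gradient descent, and similarity is cosine similarity. All function names, the sample documents, and the hyperparameters are illustrative, not taken from the patent.

```python
import math
import random
import re
from collections import Counter


def fingerprint(text, vocab):
    """Term-count vector over a fixed vocabulary (the high-dimensional x)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[term]) for term in vocab]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def train_encoder(X, dim, epochs=300, lr=0.05, seed=0):
    """Fit a linear autoencoder (assumed architecture) and return the encoder g."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    w_enc = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(m)]
    w_dec = [[rng.gauss(0, 0.1) for _ in range(m)] for _ in range(dim)]
    for _ in range(epochs):
        Y = matmul(X, w_enc)                      # encoded fingerprints, n x dim
        recon = matmul(Y, w_dec)                  # reconstruction, n x m
        err = [[r - x for r, x in zip(rr, xr)] for rr, xr in zip(recon, X)]
        # Gradients of the mean squared reconstruction error.
        g_dec = matmul(transpose(Y), err)
        g_enc = matmul(transpose(X), matmul(err, transpose(w_dec)))
        w_dec = [[w - lr * g / n for w, g in zip(wr, gr)] for wr, gr in zip(w_dec, g_dec)]
        w_enc = [[w - lr * g / n for w, g in zip(wr, gr)] for wr, gr in zip(w_enc, g_enc)]
    return lambda x: matmul([x], w_enc)[0]       # g: x -> y


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


docs = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock prices fell sharply today",
]
vocab = sorted({w for d in docs for w in re.findall(r"[a-z]+", d.lower())})
X = [fingerprint(d, vocab) for d in docs]
g = train_encoder(X, dim=2)
reduced = [g(x) for x in X]
similarity = cosine(reduced[0], reduced[1])
```

A nonlinear network could replace the linear encoder without changing the surrounding steps; the linear form is chosen here only to keep the sketch short.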

Reading and Information Enhancement System and Method

A written document in electronic format (hereinafter referred to as a “work,” which includes stories, novels, educational texts, biographies, compilations, collections, anthologies, tracts, and any other traditional format for relatively extensive texts) provides access to reference, bibliography, and/or definition material through an electronic software capability associated with the work. Depending upon reader access information or characteristics (e.g., age, grade, proficiency, position within the work, or any other identifiable reader characteristic or access limitation), any request for reference material, definitions, explanations, translations, or other material provided in the associated software capability is automatically limited by system acknowledgement of the reader access information or characteristics. As the reader's access information or characteristics change, the quality, quantity, and/or format of the requested information with respect to a work changes.

TRAINING AND APPLYING STRUCTURED DATA EXTRACTION MODELS

A computer system for extracting structured data from unstructured or semi-structured text in an electronic document, the system comprising: a graphical user interface configured to present to a user a graphical view of a document for use in training multiple data extraction models for the document, each data extraction model associated with a user defined question; a user input component configured to enable the user to highlight portions of the document; the system configured to present in association with each highlighted portion an interactive user entry object which presents a menu of question types to a user in a manner to enable the user to select one of the question types, and a field for receiving from the user a question identifier in the form of human readable text, wherein the question identifier and question type selected by the user are used for selecting a data extraction model, and wherein the highlighted portion of the document associated with the question identifier is used to train the selected data extraction model.

Asynchronous image repository functionality

Embodiments of methods for asynchronous image repository functionality are presented. In an embodiment, a method includes storing user data in a data storage device that is local to a user interface device, storing a copy of the user data to a storage location that is remote from the user interface device, performing a service for a user of the user interface device using the copy of the user data stored to the storage location, and communicating information associated with the service back to the user interface device. Additionally, the data image may be directly scanned for malicious software. In a further embodiment, the method may include providing a software inventory associated with the user data, such as software, stored in the image.

METHOD AND SYSTEM FOR SEARCHING PHRASE CONCEPTS IN DOCUMENTS
20170228456 · 2017-08-10

A system and method for fast concept search in multiple documents, where the concept is expressed by a plurality of words, all of which have to be in the same sentence and within a specified range. The system automatically finds equivalent expressions of the same concept and returns as search results all documents in which the concept is contained.
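
As a rough illustration of the matching rule described above, the sketch below treats a concept as a list of word sets, where each set holds hand-supplied equivalent words (an assumption; the abstract does not say how equivalents are found). A document matches when one sentence contains a word from every set and the matched words lie within `max_span` token positions of each other. The function names and sample documents are hypothetical.

```python
import re
from itertools import product


def sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def concept_in_sentence(sentence, concept, max_span):
    """True if every word set in the concept is hit within max_span tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    positions = []
    for alternatives in concept:
        hits = [i for i, tok in enumerate(tokens) if tok in alternatives]
        if not hits:
            return False
        positions.append(hits)
    # Try every combination of hit positions and keep the tightest window.
    return any(max(combo) - min(combo) <= max_span for combo in product(*positions))


def search(documents, concept, max_span):
    """Return the ids of all documents that contain the concept."""
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(concept_in_sentence(s, concept, max_span) for s in sentences(text))
    )


documents = {
    "a": "The engine failed to start. The weather was cold.",
    "b": "The motor would not start today.",
    "c": "The engine is new. We start tomorrow.",
}
concept = [{"engine", "motor"}, {"start"}]
matches = search(documents, concept, max_span=4)
```

Document "c" is rejected because its two concept words fall in different sentences, which is the constraint the abstract emphasizes.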

GENERATING FEATURE EMBEDDINGS FROM A CO-OCCURRENCE MATRIX

Methods and systems, including computer programs encoded on computer storage media, for generating compressed representations from a co-occurrence matrix. A method includes obtaining a set of sub matrices of a co-occurrence matrix, where each row of the co-occurrence matrix corresponds to a feature from a first feature vocabulary and each column of the co-occurrence matrix corresponds to a feature from a second feature vocabulary; selecting a sub matrix, wherein the sub matrix is associated with a particular row block and column block of the co-occurrence matrix; assigning respective d-dimensional initial row and column embedding vectors to each row and column from the particular row and column blocks, respectively; and determining a final row embedding vector and a final column embedding vector by iteratively adjusting the initial row embedding vectors and the initial column embedding vectors using the co-occurrence matrix.
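
The abstract leaves the update rule open, so this sketch assumes one for illustration: the dot product of a row embedding and a column embedding is fitted to log(1 + count) by stochastic gradient descent on squared error, applied to a single sub matrix. The matrix values, function names, and hyperparameters are all hypothetical.

```python
import math
import random


def block_loss(cooc, row_ids, col_ids, rows, cols):
    """Sum of squared errors over the selected sub matrix."""
    total = 0.0
    for r in row_ids:
        for c in col_ids:
            pred = sum(a * b for a, b in zip(rows[r], cols[c]))
            total += (pred - math.log1p(cooc[r][c])) ** 2
    return total


def train_block(cooc, row_ids, col_ids, dim=2, epochs=300, lr=0.05, seed=0):
    """Assign initial d-dimensional vectors, then adjust them iteratively."""
    rng = random.Random(seed)
    rows = {r: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for r in row_ids}
    cols = {c: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for c in col_ids}
    initial = block_loss(cooc, row_ids, col_ids, rows, cols)
    for _ in range(epochs):
        for r in row_ids:
            for c in col_ids:
                pred = sum(a * b for a, b in zip(rows[r], cols[c]))
                err = pred - math.log1p(cooc[r][c])
                for k in range(dim):
                    rk, ck = rows[r][k], cols[c][k]
                    rows[r][k] -= lr * err * ck   # adjust row embedding
                    cols[c][k] -= lr * err * rk   # adjust column embedding
    final = block_loss(cooc, row_ids, col_ids, rows, cols)
    return rows, cols, initial, final


# Toy 4 x 4 co-occurrence matrix; the sub matrix is row block {0, 1}, column block {0, 1}.
cooc = [
    [10, 2, 0, 1],
    [3, 8, 1, 0],
    [0, 1, 6, 2],
    [1, 0, 2, 9],
]
row_emb, col_emb, loss_before, loss_after = train_block(cooc, [0, 1], [0, 1])
```

Working one sub matrix at a time keeps only that block's embeddings in play, which is presumably why the method partitions the matrix into row and column blocks.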

Multi-magnitudinal vectors with resolution based on source vector features
11237830 · 2022-02-01

Methods, systems and computer program products for resolving multiple magnitudes assigned to a target vector are disclosed. A target vector that includes one or more target vector dimensions is received. One of the target vector dimensions is processed to determine a total number of magnitudes assigned to the processed target vector dimension. Also, a source vector that includes one or more source vector dimensions is received. The received source vector is processed to determine a total number of features associated with the source vector. When it is detected that the total number of magnitudes assigned to the processed target vector dimension exceeds one, one of the assigned magnitudes is selected based on one of the determined features associated with the source vector.
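
The resolution step can be sketched as follows. The data layout is an assumption made for illustration: each target vector dimension is a dictionary of labelled candidate magnitudes, and the only source vector feature used for selection is the sign of the sum of its components. None of these names or choices come from the patent.

```python
def source_features(source):
    """Derive a small set of features from the source vector (hypothetical choice)."""
    return {
        "dimensions": len(source),
        "sign": "negative" if sum(source) < 0 else "positive",
    }


def resolve(target, source):
    """Collapse each target dimension to a single magnitude."""
    features = source_features(source)
    resolved = []
    for magnitudes in target:
        if len(magnitudes) > 1:
            # More than one magnitude assigned: select by a source vector feature.
            resolved.append(magnitudes[features["sign"]])
        else:
            # Exactly one magnitude assigned: nothing to resolve.
            resolved.append(next(iter(magnitudes.values())))
    return resolved


target = [
    {"any": 3.0},                          # one magnitude
    {"positive": 1.0, "negative": -1.0},   # two magnitudes, needs resolution
]
resolved_negative = resolve(target, [0.5, -2.0])
resolved_positive = resolve(target, [0.5, 2.0])
```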

METHOD AND APPARATUS FOR POSTAL ADDRESS MATCHING
20170220975 · 2017-08-03

Provided are methods and apparatus for matching postal addresses. In an example, provided is a method for comparing postal addresses. The method includes receiving a first postal address, standardizing the form of the first postal address, removing a component of the first postal address to create a canonical representation of the first postal address, and utilizing a signature-based algorithm to identify at least one stored signature which substantially matches the first postal address.
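
A minimal sketch of the pipeline above, under several assumptions: standardization means upper-casing, stripping punctuation, and applying a small abbreviation table; the removed component is the apartment or unit designator; and the signature is a SHA-1 digest of the canonical string. Exact digest equality is a simplification of "substantially matches", and the abbreviation table and addresses are illustrative only.

```python
import hashlib
import re

# Hypothetical abbreviation table; a real system would use a postal standard.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "NORTH": "N"}
UNIT_MARKERS = {"APT", "UNIT", "SUITE", "STE"}


def standardize(address):
    """Upper-case, drop punctuation, and abbreviate common words."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]


def canonical(address):
    """Remove the unit component (marker plus its value) from the address."""
    kept, skip_next = [], False
    for tok in standardize(address):
        if skip_next:
            skip_next = False
            continue
        if tok in UNIT_MARKERS:
            skip_next = True
            continue
        kept.append(tok)
    return " ".join(kept)


def signature(address):
    return hashlib.sha1(canonical(address).encode("utf-8")).hexdigest()


def find_match(address, stored):
    """Return the stored record whose signature equals the query's, if any."""
    return stored.get(signature(address))


stored = {signature("123 Main Street, Apt 4B, Springfield"): "customer-17"}
match = find_match("123 main st. springfield", stored)
miss = find_match("125 Main Street, Springfield", stored)
```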

System and method to search and generate reports from semi-structured data including dynamic metadata
09721016 · 2017-08-01

Embodiments of the invention provide a system and method for searching and reporting on semistructured data that can include dynamic metadata. One embodiment can comprise providing a user interface to a user based on an object type definition for an object type that allows the user to specify search criteria associated with a set of metadata, mapping the user search criteria to a query that comprises at least one structured query constraint and at least one unstructured query constraint, processing the query to search a set of data objects containing semistructured data associated with the object type according to the query and returning a set of results to the user. The search results can be returned to a user based on user-specified reporting parameters. Additionally, the reporting definition can be saved as an object for future execution.
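
The mapping from user criteria to a mixed query might look like the sketch below. It assumes that an object type definition is simply the set of metadata field names for that type, that structured constraints are equality tests on metadata, and that unstructured constraints are case-insensitive substring tests on the object's text. The class, functions, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    object_type: str
    metadata: dict   # dynamic metadata: keys may differ between objects
    text: str        # unstructured content


def build_query(criteria, metadata_fields):
    """Split user criteria into structured and unstructured constraints."""
    structured = {k: v for k, v in criteria.items() if k in metadata_fields}
    unstructured = [str(v).lower() for k, v in criteria.items() if k not in metadata_fields]
    return structured, unstructured


def run_query(objects, object_type, query):
    """Return objects of the given type that satisfy every constraint."""
    structured, unstructured = query
    results = []
    for obj in objects:
        if obj.object_type != object_type:
            continue
        if any(obj.metadata.get(k) != v for k, v in structured.items()):
            continue
        if all(term in obj.text.lower() for term in unstructured):
            results.append(obj)
    return results


objects = [
    DataObject("invoice", {"status": "open", "region": "EU"}, "Payment overdue for server hardware"),
    DataObject("invoice", {"status": "closed", "region": "EU"}, "Payment overdue for licenses"),
    DataObject("invoice", {"status": "open"}, "Paid in full"),
    DataObject("ticket", {"status": "open"}, "Payment overdue"),
]
query = build_query({"status": "open", "keywords": "overdue"}, {"status", "region"})
results = run_query(objects, "invoice", query)
```

Because `query` is an ordinary value, it could be stored and executed again later, which loosely mirrors saving a reporting definition as an object.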

Extracting method, computer product, extracting system, information generating method, and information contents

An extracting method includes storing to a storage device: files that include character units; first index information indicating which file includes at least one character unit in a character unit group having a usage frequency less than a predetermined frequency and among character units having common information in a predetermined portion, the usage frequency indicating the extent of files having a given character unit; second index information indicating which file includes a first character unit having a usage frequency at least equal to the predetermined frequency and among the character units having common information in a predetermined portion; and referring to the first and second index information to extract a file having character units in the first and second index information, when a request is received for extraction of a file having the first character unit and a second character unit that is included in the character unit group.
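
One possible reading of the two-index scheme is sketched here, with assumptions labelled: a character unit is a single character, usage frequency is the number of files containing it, and the "common information in a predetermined portion" is taken to be the upper bits of the code point (`ord(ch) >> 4`). Because the group index only narrows the candidates, the sketch adds a final scan of the candidate files, which is an addition not stated in the abstract.

```python
from collections import defaultdict


def build_indexes(files, threshold):
    """Build a per-group index for rare units and a per-unit index for frequent ones."""
    usage = defaultdict(set)
    for name, text in files.items():
        for ch in set(text):
            usage[ch].add(name)
    group_index = defaultdict(set)   # first index: rare units, pooled by group
    unit_index = {}                  # second index: frequent units, one entry each
    for ch, names in usage.items():
        if len(names) >= threshold:
            unit_index[ch] = names
        else:
            group_index[ord(ch) >> 4] |= names
    return group_index, unit_index


def extract(files, group_index, unit_index, frequent, rare):
    """Find files containing both a frequent unit and a rare unit."""
    candidates = unit_index.get(frequent, set()) & group_index.get(ord(rare) >> 4, set())
    # Assumed verification step: the group index may admit false positives.
    return sorted(name for name in candidates if rare in files[name])


files = {"a.txt": "tea", "b.txt": "tex", "c.txt": "toy", "d.txt": "zap"}
group_index, unit_index = build_indexes(files, threshold=3)
found = extract(files, group_index, unit_index, frequent="t", rare="x")
```

Pooling rare units into groups keeps the first index small, at the cost of the extra scan over the narrowed candidate set.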