Patent classifications
G06F40/194
Systems and Methods for the Comparison of Selected Text
Systems and methods are disclosed for comparing selections of text to show differences between the two selections. The text may be selected from the same source or from two different sources. In one implementation, a system receives a first selection of text for comparison and places the selection in a first buffer. The system receives a second selection of text for comparison and places the second selection in a second buffer. The system compares the first buffer and the second buffer to determine differences and displays the differences. In some embodiments, the system may allow a user to choose two buffers from among a plurality of buffers for comparison.
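As an illustration of the buffer-based comparison described above, the following minimal sketch stores selections in named buffers and compares any two of them. It assumes line-oriented text and uses Python's standard difflib module; the buffer names and the compare_buffers helper are hypothetical and are not taken from the patent.

```python
import difflib

def compare_buffers(first: str, second: str) -> list[str]:
    """Return unified-diff lines describing how `second` differs from `first`."""
    return list(difflib.unified_diff(
        first.splitlines(),
        second.splitlines(),
        fromfile="buffer_1",
        tofile="buffer_2",
        lineterm="",
    ))

# Hypothetical store holding several selections; the user picks two to compare.
buffers = {
    "a": "alpha\nbeta\ngamma",
    "b": "alpha\nbeta2\ngamma",
    "c": "alpha\nbeta\ngamma",
}

diff = compare_buffers(buffers["a"], buffers["b"])
for line in diff:
    print(line)
```

Identical buffers produce an empty diff, so a caller could use the empty list as the "no differences" case.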
TEXT OUTPUT METHOD AND SYSTEM, STORAGE MEDIUM, AND ELECTRONIC DEVICE
Embodiments of the present application provide a text output method and system, a storage medium, and an electronic device. The system includes at least an automatic speech recognition (ASR) model group, a text alignment model, and a re-scoring model that are sequentially connected. The ASR model group includes a plurality of ASR models, each configured to convert input audio data into a respective first text. The text alignment model is configured to align the plurality of first texts to obtain a plurality of target texts of equal length. The re-scoring model is configured to score the words/terms at each alignment position of the target texts, select the word/term with the highest score at each position as the target word/term, and assemble the target words/terms into an output text in order of their alignment positions.
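The align-then-rescore pipeline can be sketched roughly as follows. This is a simplified illustration, not the patented models: alignment is approximated by projecting each hypothesis onto the first one with difflib.SequenceMatcher, and a plain majority vote stands in for the re-scoring model. The names PAD, align_to_pivot, and vote are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

PAD = "<pad>"  # placeholder for positions where a hypothesis has no word

def align_to_pivot(pivot: list[str], hyp: list[str]) -> list[str]:
    """Project `hyp` onto the pivot's positions so every output has equal length."""
    out = [PAD] * len(pivot)
    matcher = SequenceMatcher(a=pivot, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "replace"):
            for k in range(min(i2 - i1, j2 - j1)):
                out[i1 + k] = hyp[j1 + k]
    return out

def vote(hypotheses: list[list[str]]) -> list[str]:
    """Pick the most frequent word at each aligned position (assumed scoring rule)."""
    pivot = hypotheses[0]
    aligned = [align_to_pivot(pivot, h) for h in hypotheses]
    result = []
    for column in zip(*aligned):
        word, _count = Counter(column).most_common(1)[0]
        if word != PAD:
            result.append(word)
    return result

hypotheses = [
    ["the", "cat", "sat"],
    ["the", "cat", "sad"],
    ["the", "bat", "sat"],
]
output = vote(hypotheses)
```

Words that a hypothesis inserts relative to the pivot are dropped in this sketch; a fuller multi-sequence alignment would keep them as extra columns.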
String Alignment with Translocation Insensitivity
A method, apparatus, system, and computer program code are disclosed for determining string alignment with insensitivity to translocation. A computer system arranges a pair of strings in a similarity matrix. The computer system determines a match score for an optimal local alignment of whole-word sequences between the pair of strings. The computer system masks the whole-word sequences of the optimal local alignment to generate word-masked strings. Using the word-masked strings, the computer system repeats the arranging, determining, and masking steps a number of times to generate a number of match scores. The computer system combines the match scores into a combined score that represents similarities between the pair of strings, wherein the combined score is insensitive to translocation and word truncations. Based on the combined score, the computer system determines alignment between the pair of strings.
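The arrange, score, and mask loop can be illustrated with a word-level Smith-Waterman local alignment. The choice of Smith-Waterman, the scoring parameters, the number of rounds, and summation as the combining rule are all assumptions made for this sketch; the abstract specifies only a similarity matrix and an optimal local alignment. Insensitivity to word truncation is not modeled here.

```python
def best_local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Word-level Smith-Waterman. Returns (score, list of matched index pairs)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def same(i, j):
        # Masked words are stored as None and never match anything.
        return a[i - 1] is not None and a[i - 1] == b[j - 1]

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if same(i, j) else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell to recover which words were aligned.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if same(i, j) else mismatch):
            if same(i, j):
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, pairs

def combined_score(s1: str, s2: str, rounds: int = 3) -> int:
    """Sum the scores of successive local alignments, masking matched words each round."""
    a, b = s1.split(), s2.split()
    total = 0
    for _ in range(rounds):
        score, pairs = best_local_alignment(a, b)
        if score == 0:
            break
        total += score
        for i, j in pairs:
            a[i] = None
            b[j] = None
    return total

single = best_local_alignment("new york city hall".split(), "city hall new york".split())[0]
combined = combined_score("new york city hall", "city hall new york")
```

With the two halves swapped, a single local alignment can cover only one of the two-word runs, while the masked, repeated passes credit both, which is the translocation insensitivity the abstract describes.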
Methods and systems for detecting duplicate document using document similarity measuring model based on deep learning
Disclosed are a method and system. The method includes extracting similar and dissimilar document pair sets from a document database, the similar document pair set including similar document pairs having a common attribute and the dissimilar document pair set including dissimilar document pairs extracted randomly. A mathematical similarity is calculated for each of the similar and dissimilar document pairs using a mathematical measure, to obtain first and second mathematical similarities. A semantic similarity is calculated for each of the similar and dissimilar document pairs, to obtain first and second semantic similarities, the first semantic similarities being higher than the first mathematical similarities and the second semantic similarities being lower than the second mathematical similarities. A similarity model is trained based on the similar and dissimilar document pairs and the first and second semantic similarities, to obtain a trained similarity model, and a duplicate document is detected using the trained similarity model.
System and engine for seeded clustering of news events
The present invention provides a seeded news event clustering and retrieval system configured to first create a candidate data set of documents, second create a set of initial clusters based on nearness or duplicate similarity status, and third create an aggregate cluster by merging initial clusters with seed documents. The invention generates top-level clusters for news events based on an editorially supplied topical label or “seed” component and generates sub-topic-focused clusters based on an algorithm. The system uses an agglomerative clustering algorithm to gather and structure documents into distinct result sets. Decisions on whether to merge related documents or clusters are made according to similarity evidence derived from two distinct sources: one relies on a digital signature based on the unstructured text in the document, and the other on the presence of named entity tags that have been assigned to the document by an event or named entity tagger such as the Thomson Reuters Calais engine/web service.
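The two-source merge decision can be sketched with a small agglomerative loop. Everything specific here is an assumption for illustration: word-trigram sets stand in for the digital signature, Jaccard overlap is the similarity measure, the thresholds are arbitrary, and entity tags are supplied by hand rather than by a tagger. Seed-based merging into top-level clusters is not shown.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def signature(text: str, k: int = 3) -> set:
    """Assumed stand-in for a text signature: the set of word k-grams."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def cluster(docs, text_threshold=0.5, entity_threshold=0.5):
    """Single-link agglomerative merging using either evidence source."""
    sigs = [signature(d["text"]) for d in docs]
    ents = [set(d["entities"]) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def should_merge(c1, c2):
        return any(
            jaccard(sigs[i], sigs[j]) >= text_threshold
            or jaccard(ents[i], ents[j]) >= entity_threshold
            for i in c1 for j in c2
        )

    merged = True
    while merged:
        merged = False
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if should_merge(clusters[x], clusters[y]):
                    clusters[x] += clusters.pop(y)
                    merged = True
                    break
            if merged:
                break
    return clusters

docs = [
    {"text": "Central bank raises interest rates again", "entities": ["ECB", "Frankfurt"]},
    {"text": "ECB lifts rates for third time", "entities": ["ECB", "Frankfurt"]},
    {"text": "Local team wins cup final", "entities": ["FC Example"]},
]
clusters = cluster(docs)
```

In this example the first two documents share no trigrams but carry the same entity tags, so the entity evidence alone is enough to merge them.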
System and method for automatically generating calculations for fields in compliance forms
A method and system to learn new forms to be incorporated into an electronic document preparation system, or to learn the behavior of existing systems, receive form data related to a new form having a plurality of data fields that expect data values based on specific functions. The method and system gather training set data including previously filled forms having completed data fields corresponding to the data fields of the new form. The method and system include multiple analysis modules that each generate candidate functions for providing data values for the data fields of the new form. The method and system evaluate the candidate functions from each analysis technique and select the candidate functions that are most accurate based on comparisons with the training set data.
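The selection step, in which candidate functions are scored against previously filled forms, might look like the sketch below. The field names, the two candidate functions, and exact-match accuracy as the scoring rule are hypothetical; the abstract does not say how candidates are generated or compared beyond "most accurate".

```python
def select_candidate(candidates, filled_forms, field):
    """Return (name, accuracy) of the candidate that best reproduces `field`."""
    def accuracy(fn):
        hits = sum(1 for form in filled_forms if fn(form) == form[field])
        return hits / len(filled_forms)

    scored = {name: accuracy(fn) for name, fn in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical training set: previously completed forms.
filled_forms = [
    {"wages": 100, "interest": 5, "total_income": 105},
    {"wages": 200, "interest": 0, "total_income": 200},
    {"wages": 50, "interest": 10, "total_income": 60},
]

# Hypothetical candidate functions, as if produced by different analysis modules.
candidates = {
    "wages_only": lambda f: f["wages"],
    "wages_plus_interest": lambda f: f["wages"] + f["interest"],
}

best_name, best_accuracy = select_candidate(candidates, filled_forms, "total_income")
```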
Mapping annotations to ranges of text across documents
An annotation corresponding to a first range of text of a first document may be received. Based on the annotation, comparisons may be performed between a text string that comprises the first range of text and a group of text of a second document at different positions in the group of text. Based on the comparisons, similarity scores between the text string and the group of text may be determined at the different positions in the group of text. A position for the annotation in the group of text may be selected based on the similarity scores at the different positions. The annotation may be associated with a second range of text in the group of text that corresponds to the position.
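One way to picture the position-by-position comparison is a sliding window over the second document, scored with difflib.SequenceMatcher. The fixed window size (equal to the annotated text's length), the character-level ratio as the similarity score, and the locate_annotation helper are assumptions for this sketch.

```python
from difflib import SequenceMatcher

def locate_annotation(anchor_text: str, target_text: str):
    """Return (start, end, score) of the window in target_text most similar to anchor_text."""
    window = len(anchor_text)
    best_pos, best_score = 0, -1.0
    for pos in range(max(1, len(target_text) - window + 1)):
        candidate = target_text[pos:pos + window]
        score = SequenceMatcher(None, anchor_text, candidate, autojunk=False).ratio()
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos, best_pos + window, best_score

# The annotated range in the first document reads "brown fax jumps";
# the second document has a corrected spelling.
target = "The quick brown fox jumps over the lazy dog."
start, end, score = locate_annotation("brown fax jumps", target)
matched = target[start:end]
```

The annotation would then be attached to the range from start to end in the second document. Scoring every position is quadratic in practice, so a real system would likely prune candidate positions first.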