Patent classifications
G06V30/1983
OPTICAL CHARACTER RECOGNITION OF DOCUMENTS HAVING NON-COPLANAR REGIONS
Systems and methods for performing OCR of an image depicting text symbols and imaging a document having a plurality of planar regions are disclosed. An example method comprises: receiving a first image of a document having a plurality of planar regions and one or more second images of the document; identifying a plurality of coordinate transformations corresponding to each of the planar regions of the first image of the document; identifying, using the plurality of coordinate transformations, a cluster of symbol sequences of the text in the first image and in the one or more second images; and producing a resulting OCR text comprising a median symbol sequence for the cluster of symbol sequences.
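The abstract does not define "median symbol sequence". One common reading, assumed here, is the set median: the member of the cluster with the smallest total edit distance to all the other members. The following minimal sketch illustrates that reading only. The function names `edit_distance` and `median_sequence` and the choice of Levenshtein distance are illustrative assumptions, not taken from the patent.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def median_sequence(cluster: Sequence[str]) -> str:
    """Return the cluster member with the smallest summed distance to all members."""
    return min(cluster, key=lambda s: sum(edit_distance(s, t) for t in cluster))


# Hypothetical OCR readings of the same text fragment taken from several images.
readings = ["invoice", "invo1ce", "invoice", "lnvoice"]
best = median_sequence(readings)
```

With several noisy readings of the same fragment, the reading that agrees most closely with the rest is selected, so isolated recognition errors in individual images tend to be voted out.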
METHOD FOR TESTING MEDICAL DATA
A method for testing medical data is provided. Each medical datum includes a plurality of information units and a plurality of separators, and the method includes the following steps: a. matching the medical data against a standard library including a plurality of patterns, a matching expression being:
[\s\S][number/sequence/relation]&[\b|\B] (S101); and b. determining, based on the matching result of step a, whether the medical datum is qualified (S102). A standardized library is first established. A matching result is then obtained by matching the medical datum against the standard library with respect to a non-initial boundary, an initial boundary, an information quantity, information sequences, a semantic relationship quantity, a character boundary, and a non-character boundary. Whether the medical datum meets a requirement is then determined according to the matching result.
Automated classification and interpretation of life science documents
A computer-implemented tool for automated classification and interpretation of documents, such as life science documents supporting clinical trials, is configured to perform a combination of raw text, document construct, and image analyses to enhance classification accuracy by enabling a more comprehensive machine-based understanding of document content. The combination of analyses provides context for classification by leveraging relative spatial relationships among text and image elements, identifying characteristics and formatting of elements, and extracting additional metadata from the documents as compared to conventional automated classification tools.
SYSTEMS AND METHODS FOR USING DYNAMIC REFERENCE GRAPHS TO ACCURATELY ALIGN SEQUENCE READS
A method for matching character strings to a reference character string is disclosed. One or more processors receive a plurality of character strings. The one or more processors match each of the plurality of character strings to a main reference character string and register a match to positions on the main reference character string that satisfy pre-set match criteria. The one or more processors match each of the plurality of character strings to an alternate reference character string and register a match to positions on the alternate reference character string that satisfy the pre-set match criteria. The alternate reference character string is derived from the main character string. The one or more processors identify a match for each of the plurality of character strings that matches a position on either the main reference character string or the alternate reference character string.
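The matching flow in this abstract can be illustrated with a small sketch. It assumes the pre-set match criterion is a maximum number of mismatched characters at a given position (Hamming distance), which the abstract does not specify. The names `find_matches` and `align_reads`, the example strings, and the single-character substitution used to derive the alternate reference are all hypothetical.

```python
from typing import Dict, List


def find_matches(read: str, reference: str, max_mismatches: int = 0) -> List[int]:
    """Return every start position where `read` fits `reference`
    with at most `max_mismatches` differing characters."""
    positions = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            positions.append(start)
    return positions


def align_reads(reads: List[str], main_ref: str, alt_ref: str,
                max_mismatches: int = 0) -> Dict[str, Dict[str, List[int]]]:
    """Register, for each read, its matching positions on both references."""
    return {
        read: {
            "main": find_matches(read, main_ref, max_mismatches),
            "alt": find_matches(read, alt_ref, max_mismatches),
        }
        for read in reads
    }


# Hypothetical data: the alternate reference is derived from the main one
# by substituting a single character at index 4.
main_ref = "ACGTACGTTA"
alt_ref = main_ref[:4] + "G" + main_ref[5:]
hits = align_reads(["GTAC", "GTGC", "TTTT"], main_ref, alt_ref)
```

A read that carries the substituted character matches only the alternate reference, which is the situation the second matching pass is meant to catch.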
Image-processing device for document image, image-processing method for document image, and storage medium on which program is stored
An image-processing device includes: a reliability calculation unit configured to calculate reliability of a character recognition result on a document image which is a character recognition target on the basis of a feature amount of a character string of a specific item included in the document image; and an output destination selection unit configured to select an output destination of the character recognition result in accordance with the reliability.
Domain-Specific Phrase Mining Method, Apparatus and Electronic Device
A domain-specific phrase mining method, apparatus and electronic device are provided. A specific implementation includes: performing word vector conversion on a domain-specific phrase in a target text to obtain a first word vector, and performing word vector conversion on an unknown phrase in the target text to obtain a second word vector, where the domain-specific phrase is a phrase in a domain to which the target text belongs; obtaining a word vector space formed by the first and second word vectors, and identifying a preset quantity of target word vectors around the second word vector in the word vector space; determining, based on similarity values indicative of similarity between the preset quantity of target word vectors and the second word vector, whether the unknown phrase is a phrase in the domain to which the target text belongs.
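The neighbor-based decision described above can be sketched as follows. The abstract does not name a similarity measure or a decision rule, so this sketch assumes cosine similarity and a threshold on the mean similarity of the k nearest known domain vectors. The function names, the toy two-dimensional vectors, and the values of `k` and `threshold` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_domain_phrase(unknown_vec: Sequence[float],
                     domain_vecs: List[Sequence[float]],
                     k: int = 2,
                     threshold: float = 0.9) -> bool:
    """Take the k domain vectors most similar to the unknown phrase vector and
    accept the phrase if their mean similarity reaches the threshold."""
    nearest = sorted((cosine(unknown_vec, v) for v in domain_vecs), reverse=True)[:k]
    return bool(nearest) and sum(nearest) / len(nearest) >= threshold


# Toy 2-D vectors standing in for real word embeddings (hypothetical values).
domain_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
close = is_domain_phrase([1.0, 0.05], domain_vecs)
far = is_domain_phrase([-1.0, 0.0], domain_vecs)
```

In practice the vectors would come from a trained embedding model, and the preset quantity and the threshold would be tuned on labeled phrases.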
Apparatus for detecting contextually-anomalous sentence in document, method therefor, and computer-readable recording medium having program for performing same method recorded thereon
Disclosed are an apparatus and a method for detecting whether an anomalous sentence having a context different from that of other sentences exists in a document. The apparatus for detecting a contextually-anomalous sentence in a document according to the present invention includes: a sentence encoder for encoding individual sentences constituting document data by means of a predetermined rule (function) to generate encoding vectors; a context embedder neural network for converting the generated encoding vectors into embedding vectors corresponding thereto; and a context anomaly detector neural network for detecting whether an anomalous sentence exists in the converted document data.
Regular expression generation using longest common subsequence algorithm on combinations of regular expression codes
Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
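A rough sketch of the general idea follows, under simplifying assumptions that are not taken from the patent: each character is mapped to a one-letter code ("D" for an ASCII digit, "L" for an ASCII letter, the character itself otherwise), the longest common subsequence of the code lists is folded pairwise across the inputs, and runs of codes are collapsed into quantified character classes. The function names and the coding scheme are hypothetical. The sketch ignores characters that fall outside the common subsequence, so the generated pattern is not guaranteed to match every input.

```python
import re
import string
from functools import reduce
from itertools import groupby
from typing import List


def to_codes(text: str) -> List[str]:
    """Map each character to a class code: D (digit), L (letter), or itself."""
    return ["D" if ch in string.digits
            else "L" if ch in string.ascii_letters
            else ch
            for ch in text]


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two code lists (dynamic programming)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            table[i + 1][j + 1] = (table[i][j] + 1 if a[i] == b[j]
                                   else max(table[i][j + 1], table[i + 1][j]))
    out, i, j = [], m, n
    while i and j:  # walk back through the table to recover the subsequence
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def generate_regex(samples: List[str]) -> str:
    """Build a regex from the codes common to all samples."""
    common = reduce(lcs, (to_codes(s) for s in samples))
    parts = []
    for code, run in groupby(common):
        if code == "D":
            parts.append(r"\d+")
        elif code == "L":
            parts.append("[A-Za-z]+")
        else:  # literal character, repeated as often as it occurs in the run
            parts.append(re.escape(code) * len(list(run)))
    return "".join(parts)


pattern = generate_regex(["AB-123", "XYZ-7"])
```

For the two sample strings, the common codes are two letters, a hyphen, and one digit, so the resulting pattern describes "letters, hyphen, digits" and matches both samples.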
Medical test results and identity authentication system and method
A system and method that enables users to provide authenticated medical records (e.g., vaccination records, viral antibody test results, etc.) to a third party (e.g., a venue) to gain access to the third party is provided. In this way, the third party may confirm that the user is sufficiently immune to a particular disease (e.g., COVID-19) and may thereby minimize the threat of the user introducing the contagious disease to the third party. The system includes a biometric data recognition system that authenticates the identity of a user, a medical records acquisition system that acquires the medical records of the authenticated user, and a system for displaying or otherwise providing the medical records to the third party for review. The system also includes a system identification card that includes the user's contact information, alphanumeric characters associated with the user's driver's license number, medical records of the user, and other elements.