G06V30/19093

METHOD AND APPARATUS FOR TRAINING EMBEDDING VECTOR GENERATION MODEL

A method and apparatus for training an embedding vector generation model are provided. The method includes identifying a keyword in a query sentence, generating an embedding vector of the query sentence and an embedding vector of the keyword based on the embedding vector generation model, and training the embedding vector generation model such that a first similarity between the embedding vector of the query sentence and the embedding vector of the keyword is greater than a second similarity between an embedding vector of a reference sentence that does not include the keyword and the embedding vector of the keyword.
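The training objective described above can be illustrated with a margin-based ranking loss. This is only a sketch under stated assumptions: the abstract does not name a similarity measure or a loss function, so cosine similarity, the hinge form, and the names `cosine`, `keyword_ranking_loss`, and `margin` are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def keyword_ranking_loss(query_vec, keyword_vec, reference_vec, margin=0.2):
    """Hinge loss that is zero once the first similarity exceeds the
    second similarity by at least `margin` (an assumed hyperparameter)."""
    # First similarity: query sentence vs. keyword found in that sentence.
    first = cosine(query_vec, keyword_vec)
    # Second similarity: reference sentence (without the keyword) vs. keyword.
    second = cosine(reference_vec, keyword_vec)
    return max(0.0, margin - (first - second))
```

Minimizing such a loss over many (query, keyword, reference) triples would push the model toward the ordering the abstract requires; the actual training procedure in the application may differ.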

COMPUTER-READABLE RECORDING MEDIUM STORING TRAINING DATA GENERATION PROGRAM, TRAINING DATA GENERATION METHOD, AND TRAINING DATA GENERATION APPARATUS
20220309814 · 2022-09-29

A non-transitory computer-readable recording medium storing a training data generation program for causing a computer to execute processing including: identifying, from among meta-analysis literatures stored in a memory, a plurality of meta-analysis literatures in which a first literature is cited; determining a degree of similarity between the plurality of identified meta-analysis literatures based on feature information of the plurality of identified meta-analysis literatures; and in response to the degree of similarity being equal to or higher than a threshold, generating training data for machine learning including the first literature.
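The selection logic described above might look roughly like the following. The abstract does not say how the degree of similarity is computed or aggregated, so this sketch assumes Jaccard similarity over feature sets and requires the lowest pairwise similarity to reach the threshold; the record fields `id`, `cites`, and `features` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature collections (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_training_literature(first_literature, meta_analyses, threshold=0.5):
    """Return a training-data record for `first_literature` if the
    meta-analyses citing it are sufficiently similar, else None."""
    # Step 1: identify the meta-analyses in which the first literature is cited.
    citing = [m for m in meta_analyses if first_literature in m["cites"]]
    if len(citing) < 2:
        return None  # "a plurality" needs at least two meta-analyses
    # Step 2: degree of similarity between the identified meta-analyses
    # (assumption: the lowest pairwise Jaccard score is used).
    similarity = min(
        jaccard(x["features"], y["features"]) for x, y in combinations(citing, 2)
    )
    # Step 3: generate training data only when the threshold is met.
    if similarity >= threshold:
        return {"literature": first_literature,
                "sources": [m["id"] for m in citing]}
    return None
```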

DISTRIBUTED COMPUTER SYSTEM FOR DOCUMENT AUTHENTICATION
20220237937 · 2022-07-28

Methods and distributed computer devices are provided for automatically determining whether a document is genuine. The method involves generating an image of the document, pre-processing the image to obtain at least one segment of the image with an area of interest, and dividing the at least one segment into portions containing single characters and/or combinations of characters. A validation of at least two single characters and/or at least two combinations of characters is performed for each of the single characters and/or character combinations for at least two different categories. Score values are created for each category for each validated single character and/or character combination. Feature vectors are created for each single character and/or character combination, with the respective score values for each category as components. The method involves classifying the feature vectors to determine whether the single character or character combination with which the feature vector is associated is genuine.

Three-dimensional Environment Analysis Method and Device, Computer Storage Medium and Wireless Sensor System
20210406515 · 2021-12-30

A three-dimensional environment analysis method is disclosed. The method includes (i) receiving original point cloud data of a working environment, (ii) processing a map constructed on the basis of the original point cloud data in order to separate out a ground surface, a wall surface and an obstacle in the working environment, (iii) pairing the ground surface with the wall surface according to the degree of proximity between the ground surface and wall surface that are separated out to form one or more adjacent ground-wall pair sets, and (iv) subjecting the one or more adjacent ground-wall pair sets to ray tracing analysis in order to obtain a line-of-sight zone and a non-line-of-sight zone in the working environment. A three-dimensional environment analysis device, a computer storage medium and a wireless sensor system are also disclosed.
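Step (iii), pairing ground and wall surfaces by proximity, can be sketched as follows. The abstract does not define the proximity measure, so this example assumes each separated surface is summarized by an axis-aligned bounding box and that two surfaces are adjacent when the gap between their boxes is at most `max_gap`; the names `box_gap`, `pair_ground_with_walls`, and `max_gap` are hypothetical.

```python
def box_gap(a, b):
    """Euclidean gap between two axis-aligned boxes, each given as
    ((xmin, ymin, zmin), (xmax, ymax, zmax)); 0.0 if they touch or overlap."""
    gap_sq = 0.0
    for axis in range(3):
        # Separation along this axis, clamped at zero when the extents overlap.
        d = max(a[0][axis] - b[1][axis], b[0][axis] - a[1][axis], 0.0)
        gap_sq += d * d
    return gap_sq ** 0.5


def pair_ground_with_walls(grounds, walls, max_gap=0.1):
    """Return (ground_id, wall_id) pairs whose boxes lie within `max_gap`."""
    return [
        (g_id, w_id)
        for g_id, g_box in grounds.items()
        for w_id, w_box in walls.items()
        if box_gap(g_box, w_box) <= max_gap
    ]
```

The resulting pairs would then be the input to the ray tracing analysis of step (iv), which is not sketched here.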

METHOD FOR TESTING MEDICAL DATA

A method for testing medical data is provided. Each medical datum includes a plurality of information units and a plurality of separators, and the method includes the following steps: a. matching the medical data against a standard library including a plurality of patterns, a matching expression being:

[\s\S][number/sequence/relation]&[\b|\B] (S101); and b. determining, based on a matching result of the step a, whether the medical datum is qualified (S102). A standardized standard library is first established, a matching result is obtained by matching the medical datum and the standard library for a non-initial boundary, an initial boundary, an information quantity, information sequences, a semantic relationship quantity, a character boundary, and a non-character boundary, and whether the medical datum meets a requirement is further determined according to the matching result.

INFORMATION PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM
20220198183 · 2022-06-23

An information processing apparatus includes a processor configured to: receive an image on a paper sheet having an entry field ready to be filled with information; and present in a user selectable manner three production methods to produce definition information indicating an attribute of information to fill in the entry field, the three production methods including a method in which a user newly produces definition information, a method of reusing definition information that has been produced for another paper sheet and is prepared beforehand, and a method of producing definition information by using results provided by an artificial intelligence having sorted the received paper sheet.

INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING SYSTEM, CONTROL METHOD OF THE SAME, AND STORAGE MEDIUM
20220201146 · 2022-06-23

An image processing apparatus for setting a property of a document file by using a result of a character recognition process performed on a scanned image of a document is provided and includes an obtaining unit and a setting unit. The obtaining unit obtains a character string by performing the character recognition process on a scanned image relating to a document file to be generated in this operation. The setting unit automatically sets the character string obtained by the obtaining unit as a character string to be used in a property of the document file to be generated in this operation if the character string obtained by the obtaining unit is a character string obtained in the character recognition process performed on a scanned image relating to a document file generated in the past and approved by a user a certain number of times or more.
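The approval-count rule applied by the setting unit can be illustrated with a small sketch. The character recognition step is out of scope here, and the class name `PropertySuggester`, its methods, and the `min_approvals` parameter are hypothetical; the abstract only states that a string is set automatically once it has been approved "a certain number of times or more".

```python
from collections import Counter


class PropertySuggester:
    """Tracks how often a recognized string was approved by the user."""

    def __init__(self, min_approvals=3):
        self.min_approvals = min_approvals  # assumed threshold
        self.approvals = Counter()

    def record_approval(self, text):
        """Call when the user approves `text` as a property of a past file."""
        self.approvals[text] += 1

    def auto_property(self, recognized_text):
        """Return the string to set automatically, or None when the
        approval count is still below the threshold."""
        if self.approvals[recognized_text] >= self.min_approvals:
            return recognized_text
        return None
```

For example, with `min_approvals=2`, a string such as "Invoice" would be returned by `auto_property` only after `record_approval("Invoice")` has been called twice.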

Term extraction in highly technical domains

A language model is fine-tuned by extracting terminology terms from a text document. The method comprises identifying a text snippet, identifying candidate multi-word expressions using part of speech tags, and determining a specificity score value for each of the candidate multi-word expressions. Moreover, the method comprises determining a topic similarity score value for each of the candidate multi-word expressions, selecting remaining expressions from the candidate multi-word expressions using a function of a specificity value and a topic similarity value of each of the candidate multi-word expressions, adding a noun comprised in the text snippet to the remaining expressions depending on a correlation function, labeling the remaining multi-word expressions, and fine-tuning an existing pre-trained transformer-based language model using as training data the identified text snippet marked with the labeled remaining expressions.
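The selection step, which keeps candidates according to a function of their specificity and topic similarity values, might be sketched as below. The abstract does not specify the function, so a weighted average compared against a threshold is assumed; `select_terms`, `weight`, and `threshold` are hypothetical names, and the scores are taken as already computed.

```python
def select_terms(candidates, threshold=0.5, weight=0.5):
    """Keep candidate multi-word expressions whose combined score reaches
    `threshold`. Each candidate is (term, specificity, topic_similarity),
    with both scores assumed to lie in [0, 1]."""
    kept = []
    for term, specificity, topic_similarity in candidates:
        # Assumed combination: weighted average of the two score values.
        combined = weight * specificity + (1.0 - weight) * topic_similarity
        if combined >= threshold:
            kept.append(term)
    return kept
```

The remaining expressions would then be labeled in the text snippet and used as fine-tuning data, as the abstract describes.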

METHOD OF GENERATING FONT DATABASE, AND METHOD OF TRAINING NEURAL NETWORK MODEL
20220180650 · 2022-06-09

A method of generating a font database and a method of training a neural network model are provided, which relate to the field of artificial intelligence, in particular to computer vision and deep learning technology. The method of generating the font database includes: determining, by using a trained similarity comparison model, a basic font database most similar to handwriting font data of a target user in a plurality of basic font databases as a candidate font database; and adjusting, by using a trained basic font database model for generating the candidate font database, the handwriting font data of the target user, so as to obtain a target font database for the target user.

INFORMATION PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM RECORDING INFORMATION PROCESSING PROGRAM
20230274567 · 2023-08-31

An information processing apparatus includes a processor, the processor extracting, from at least a part of a document, a feature of the part, extracting related elements which are elements related to the feature from a question and an answer about a first target object stored in a storage unit using the extracted feature, and combining the extracted related elements and the feature to generate a question about a second target object specified in the document.