G06V30/19187

SEARCH QUERY GENERATION BASED UPON RECEIVED TEXT

In an example, a first set of text may be received from a client device. A set of content items may be selected from among content items based upon the first set of text and a plurality of sets of content item text associated with the content items. A set of terms may be determined based upon the first set of text and the set of content items. A similarity profile associated with the set of terms may be generated. The similarity profile is indicative of similarity scores associated with similarities between terms of the set of terms. Relevance scores associated with the set of terms may be determined based upon the similarity profile. One or more search terms may be selected from among the set of terms based upon the relevance scores. A search may be performed based upon the one or more search terms.
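
The term-scoring steps of this abstract (similarity profile, relevance scores, search-term selection) can be illustrated with a minimal Python sketch. It is not the claimed method: character-trigram cosine similarity is swapped in as an assumed similarity measure, averaging a term's similarities is an assumed relevance score, and the function names `similarity_profile`, `relevance_scores`, and `select_search_terms` are hypothetical. The content-item selection step is omitted, and the terms are assumed to be distinct strings.

```python
from collections import Counter
from itertools import combinations
import math


def char_ngrams(term, n=3):
    # Assumed representation: bag of character trigrams with boundary markers.
    padded = f"#{term.lower()}#"
    return Counter(padded[i:i + n] for i in range(max(1, len(padded) - n + 1)))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_profile(terms):
    # Pairwise similarity scores between all terms, stored symmetrically.
    vecs = {t: char_ngrams(t) for t in terms}
    profile = {}
    for a, b in combinations(terms, 2):
        profile[(a, b)] = profile[(b, a)] = cosine(vecs[a], vecs[b])
    return profile


def relevance_scores(terms, profile):
    # Assumed relevance score: a term's mean similarity to every other term.
    return {
        t: sum(profile[(t, o)] for o in terms if o != t) / max(1, len(terms) - 1)
        for t in terms
    }


def select_search_terms(terms, k=2):
    # Keep the k terms with the highest relevance scores.
    scores = relevance_scores(terms, similarity_profile(terms))
    return sorted(terms, key=lambda t: scores[t], reverse=True)[:k]
```

Under these assumptions, a term that shares no trigrams with the rest of the set receives a relevance score of zero and is the first to be dropped from the query.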

Article topic alignment

A method including: analyzing, by a computing device, a plurality of portions of a document; determining, by the computing device and based on the analyzing, a concept of each of the portions of the document; comparing, by the computing device, a title of the document with the concept of each of the portions of the document; determining, by the computing device and based on the comparing, an alignment of the concept of each of the portions of the document with the title; generating, by the computing device and based on the alignment, a propensity score for each of the portions of the document; and reordering, by the computing device and based on the propensity scores, the portions of the document from most aligned with the title to least aligned with the title.
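
As a rough illustration of the compare, score, and reorder steps, the following Python sketch uses word-set overlap (Jaccard similarity) as an assumed stand-in for concept alignment. The abstract does not specify how concepts or propensity scores are computed, so `propensity_score` and `reorder_by_alignment` are hypothetical names and the scoring rule is an assumption.

```python
import re


def tokens(text):
    # Lowercased alphanumeric word set; a crude proxy for a portion's "concept".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def propensity_score(title, portion):
    # Assumed score: Jaccard overlap between title words and portion words.
    t, p = tokens(title), tokens(portion)
    union = t | p
    return len(t & p) / len(union) if union else 0.0


def reorder_by_alignment(title, portions):
    # Most aligned portions first; sorted() is stable, so ties keep document order.
    return sorted(portions, key=lambda p: propensity_score(title, p), reverse=True)
```

A production system would presumably replace the word-overlap proxy with a semantic model; the ordering logic would stay the same.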

DATA NETWORK, SYSTEM AND METHOD FOR DATA INGESTION IN A DATA NETWORK

The present invention provides a data network, a data ingestion system and a method of data ingestion in the data network for a supply chain management enterprise application. The data network includes one or more data objects of different data types, received from different data sources structured on multiple distinct architectures and connected to each other for executing multiple functions in the enterprise application.

METHOD AND DEVICE FOR CONSTRUCTING LEGAL KNOWLEDGE GRAPH BASED ON JOINT ENTITY AND RELATION EXTRACTION

A method and device for constructing a legal knowledge graph based on joint entity and relation extraction. The construction method comprises the following steps: constructing a triple data set; designing a model architecture and training a model, wherein the model architecture comprises an encoding layer, a head entity extraction layer and a relation-tail entity extraction layer; determining the relation between the sentences of the text; and combining triples and visualizing the graph. The model architecture of the present disclosure adopts a Chinese BERT pre-trained model as an encoder. In the entity extraction part, two BiLSTM binary classifiers are used to identify the start position and end position of an entity. The head entity is extracted first, and then, for the extracted head entity, the tail entity corresponding to the entity relation is extracted.
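
The start-position and end-position tagging described above can be decoded into entity spans along the following lines. This Python sketch covers only the decoding step and assumes the two binary classifiers have already produced per-token 0/1 flags. Pairing each start with the nearest end at or after it is an assumed heuristic, not something the abstract states, and `decode_spans` is a hypothetical name.

```python
def decode_spans(start_flags, end_flags):
    # start_flags[i] == 1 marks token i as an entity start,
    # end_flags[j] == 1 marks token j as an entity end (both assumed inputs).
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        # Assumed pairing rule: take the nearest end position at or after the start.
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))
                break
    return spans
```

Each returned pair is an inclusive token range; a start with no following end is silently dropped.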

Document analysis architecture

Systems and methods for generation and use of document analysis architectures are disclosed. A model builder component may be utilized to receive user input data for labeling a set of documents as in class or out of class. That user input data may be utilized to train one or more classification models, which may then be utilized to predict classification of other documents. Trained models may be incorporated into a model taxonomy for searching and use by other users for document analysis purposes.

METHOD FOR DETERMINING TEXT SIMILARITY, METHOD FOR OBTAINING SEMANTIC ANSWER TEXT, AND QUESTION ANSWERING METHOD
20220121824 · 2022-04-21 ·

A method for determining a text similarity, a method for obtaining a semantic answer text, and a question answering method relate to the field of smart question answering technologies. The method for determining the text similarity includes: converting a text to be answered into a semantic vector to be answered; and calculating a similarity between the semantic vector to be answered and a question semantic vector of each of at least one question text, each similarity being a text similarity between the text to be answered and a question text.
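
The similarity calculation can be sketched in Python as follows, assuming the semantic vectors are already available as plain lists of floats. Cosine similarity is an assumed choice of measure (the abstract does not name one), and `cosine_similarity` and `rank_questions` are hypothetical helper names.

```python
import math


def cosine_similarity(u, v):
    # Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_questions(query_vec, question_vecs):
    # question_vecs maps each question text to its semantic vector (assumed precomputed).
    sims = {q: cosine_similarity(query_vec, vec) for q, vec in question_vecs.items()}
    # Highest similarity first.
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
```

The top-ranked question text would then be the natural candidate from which to retrieve an answer.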

PSEUDO LABELLING FOR KEY-VALUE EXTRACTION FROM DOCUMENTS

A computing device may access visually rich documents comprising an image and metadata. A graph, based on the image or metadata, can be generated for a visually rich document. The graph's nodes can correspond to words from the visually rich document. Features for nodes can be determined by the device. The device may generate model labeled graphs by assigning a pseudo-label to nodes using a pretrained model. The device may generate a plurality of graph labeled graphs by assigning a pseudo-label to nodes by matching a first node from a first graph to at least a second node from a second graph. The device may generate a plurality of updated graphs by cross referencing labels from the model labeled graphs and the graph labeled graphs. Until a change in labels is below a threshold, a model can be trained to perform key-value extraction using the updated graphs.
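
The cross-referencing step and the stopping criterion can be illustrated with a small Python sketch. It assumes labels are stored as dictionaries from node identifiers to label strings. The agreement rule (keep a pseudo-label only when both sources agree, otherwise leave the node unlabeled) and the change measure are assumptions made for illustration, and `cross_reference` and `label_change` are hypothetical names.

```python
def cross_reference(model_labels, graph_labels):
    # Assumed rule: keep a pseudo-label only where the model-assigned and
    # graph-matched labels agree; otherwise mark the node as unlabeled (None).
    return {
        node: label if graph_labels.get(node) == label else None
        for node, label in model_labels.items()
    }


def label_change(previous, current):
    # Fraction of nodes whose label differs from the previous round.
    if not current:
        return 0.0
    changed = sum(1 for node in current if previous.get(node) != current[node])
    return changed / len(current)
```

A training loop built on these helpers would retrain the model on the updated labels each round and stop once `label_change` falls below a chosen threshold.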

METHOD FOR RECOGNIZING RECEIPT, ELECTRONIC DEVICE AND STORAGE MEDIUM
20230282016 · 2023-09-07 ·

Provided are a method for recognizing a receipt, an electronic device and a storage medium, which relate to the fields of deep learning and pattern recognition. The method may include: a target receipt to be recognized is acquired; two-dimensional position information of each of multiple text blocks on the target receipt is encoded, to obtain multiple encoding results; graph convolution is performed on the multiple encoding results respectively, to obtain multiple convolution results; and each of the multiple convolution results is recognized based on a first conditional random field model, to obtain a first prediction result at text block-level of the target receipt, wherein the first conditional random field model and a second conditional random field model are co-trained, so as to obtain a second prediction result at token-level of the target receipt.

CHARACTER-BASED REPRESENTATION LEARNING FOR TABLE DATA EXTRACTION USING ARTIFICIAL INTELLIGENCE TECHNIQUES
20230368556 · 2023-11-16 ·

Methods, apparatus, and processor-readable storage media for character-based representation learning for table data extraction using artificial intelligence techniques are provided herein. An example computer-implemented method includes identifying, from unstructured documents comprising tabular data, items of text and corresponding document position information using artificial intelligence-based text extraction techniques; generating an intermediate output by implementing character embedding with respect to the unstructured documents using an artificial intelligence-based encoder; determining structure-related information for the unstructured documents using one or more artificial intelligence-based graph-related techniques by inferring columns from the tabular data; generating a character-based representation of the unstructured documents using an artificial intelligence-based decoder by converting the inferred columns into one or more line items; classifying portions of the character-based representation using artificial intelligence-based statistical modeling techniques; and performing one or more automated actions based on the classifying.
