G06F16/2468

Generalization processing method, apparatus, device and computer storage medium

The present disclosure provides a generalization processing method, apparatus, device and computer storage medium, and relates to the technical field of artificial intelligence, specifically to deep learning techniques. A specific implementation is: determining a set of candidate queries in a query library that are similar to a requested query in at least one of a literal matching manner, a semantic matching manner and a query rewriting manner; and determining a generalized query corresponding to the requested query from the set of candidate queries by using a pre-trained query matching model, wherein the query matching model is obtained by pre-training based on a cross attention model. Generalization of the requested query can be achieved according to the present disclosure.

System and method for processing of events

Systems and methods for processing events are disclosed. Event data comprising passive event data, active event data, or both is received. It is determined whether the received event data is available for a pattern of passive event data and active event data. In response to determining that the received event data is available for the pattern of passive event data and active event data, one or more constraints between the passive event data and the active event data are converted into one or more query terms. The query terms are used to construct at least one query. Remaining passive event data that is related to some, but not all, of the active event data is obtained using the constructed at least one query.

Search system and search method for finding new relationships between material property parameters

A search system generates a graph, in which material property parameters are nodes and the relationships between them are edges, from a database of material property parameter pairs whose relationships are already known, and conducts a path search in the generated graph. To make effective use of the knowledge of relationships among material property parameters that users hold, whether tangibly or intangibly, the search system, which includes the database, a graph generator that generates the graph, and a graph searcher that searches the graph, further includes a user interface and a user information storage unit corresponding to each user. A user conducts a search unique to that user by entering the relationship information between material property parameters that he or she holds into the user information storage unit and integrating that information into the graph. Further, by accumulating a history of the user's searches in the user information storage unit and analyzing that history, the system can provide the user with new knowledge.
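
The idea of merging a user's own relationship information into the shared graph before searching can be sketched as follows. This is only an illustration under stated assumptions: the graph is treated as undirected, the path search is a plain breadth-first search, and the property pairs and the names `build_graph` and `find_path` are hypothetical, not taken from the abstract.

```python
from collections import deque

def build_graph(pairs):
    """Build an undirected adjacency map from (property, property) pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(graph[node]):  # sorted for a deterministic result
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical pairs standing in for the shared database.
known_pairs = [
    ("density", "thermal conductivity"),
    ("thermal conductivity", "electrical conductivity"),
]
# Hypothetical relationship held only by one user.
user_pairs = [("electrical conductivity", "band gap")]

shared_graph = build_graph(known_pairs)
personal_graph = build_graph(known_pairs + user_pairs)

shared_result = find_path(shared_graph, "density", "band gap")      # None
personal_result = find_path(personal_graph, "density", "band gap")  # 4-node path
```

In this sketch the search over the shared graph finds nothing, while the same search over the user's integrated graph returns a path that passes through the user-supplied edge.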

Personalized merchant scoring based on vectorization of merchant and customer data

Provided are various mechanisms and processes for generating dynamic merchant scoring predictions. A system is configured to receive datasets comprising pairings between training customer profiles and training merchant profiles. For each pairing, a set of feature values corresponding to features specified by the customer and merchant profiles is extracted and converted into a training vector, which is used to train a machine learning model to determine a weighted coefficient for each feature. Once sufficiently trained, the system determines a set of available merchant profiles for a customer profile in response to receiving a search request from a customer associated with the customer profile. For each pairing between the customer profile and an available merchant profile, the system determines an order score for the available merchant based on the weighted coefficients and an input set of feature values specified by the customer profile and the available merchant profile.
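
The scoring step can be illustrated with a minimal sketch that assumes the weighted coefficients have already been learned and that the order score is a simple weighted sum. The abstract does not specify the model form, and the feature names, weights and function names below are hypothetical.

```python
def to_vector(customer, merchant, feature_names):
    """Flatten one customer/merchant pairing into an ordered feature vector."""
    combined = {**customer, **merchant}
    return [float(combined.get(name, 0.0)) for name in feature_names]

def order_score(weights, vector):
    """Weighted sum of feature values (assumed linear scoring)."""
    return sum(w * x for w, x in zip(weights, vector))

def rank_merchants(customer, merchants, weights, feature_names):
    """Return merchant names ordered from highest to lowest order score."""
    scored = [
        (order_score(weights, to_vector(customer, profile, feature_names)), name)
        for name, profile in merchants.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical features and coefficients, standing in for a trained model.
feature_names = ["likes_spicy", "serves_spicy", "distance_km"]
weights = [0.1, 0.8, -0.2]

customer = {"likes_spicy": 1}
merchants = {
    "merchant_a": {"serves_spicy": 1, "distance_km": 3.0},
    "merchant_b": {"serves_spicy": 0, "distance_km": 1.0},
}

ranking = rank_merchants(customer, merchants, weights, feature_names)
```

Here `merchant_a` ranks first because the positive weight on `serves_spicy` outweighs its distance penalty.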

Full-text fuzzy search method for similar-form Chinese characters in ciphertext domain

The invention discloses a full-text fuzzy search method for similar-form Chinese characters in a ciphertext domain. The method realizes a fuzzy search in the Chinese ciphertext domain based on a symmetric searchable encryption scheme and an inverted index structure, supports a fuzzy search on Chinese characters having similar glyphs while the data remains in ciphertext form, ensures that search results are ordered, and supports a fuzzy search over multiple keywords joined by logical connectives. The invention uses the distributed search engine Lucene and the Chinese word segmenter IKAnalyzer to perform full-text word segmentation on a document, and constructs a plaintext inverted index comprising similar-form Chinese characters by means of an established similar-form character library of 3,755 commonly used Chinese characters. To protect the inverted index structure, each keyword in the plaintext inverted index and its corresponding document numbers are constructed in an encrypted chain form, and a B+ tree structure is used to speed up the search. The invention realizes a fuzzy search in a Chinese full-text ciphertext domain on a semi-trusted cloud server without false detections or missed detections.
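
The plaintext stage only (an inverted index that also contains similar-form variants of each keyword) might look like the sketch below. It deliberately omits the encryption, the encrypted chain and the B+ tree, assumes word segmentation has already been done, and uses a tiny hypothetical similar-form table in place of the 3,755-character library.

```python
from itertools import product

# Hypothetical stand-in for the similar-form character library.
SIMILAR_FORMS = {
    "己": {"已", "巳"},
    "已": {"己", "巳"},
    "巳": {"己", "已"},
}

def variants(word):
    """All spellings of `word` obtained by swapping in similar-form characters."""
    options = [sorted({ch} | SIMILAR_FORMS.get(ch, set())) for ch in word]
    return {"".join(combo) for combo in product(*options)}

def build_index(docs):
    """Map every keyword and each of its similar-form variants to document ids."""
    index = {}
    for doc_id, keywords in docs.items():
        for keyword in keywords:
            for variant in variants(keyword):
                index.setdefault(variant, set()).add(doc_id)
    return index

# Keywords are assumed to be the output of a word segmenter.
docs = {1: ["自己"], 2: ["已经"]}
index = build_index(docs)

hits_for_typo = index.get("自已", set())  # query written with a look-alike character
```

A query that confuses 己 with 已 still reaches document 1, because the variant was indexed alongside the original keyword.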

DETECTION OF ABBREVIATION AND MAPPING TO FULL ORIGINAL TERM

Translation capability for language processing determines the existence of an abbreviation, followed by non-exact matching to map the abbreviation to the original full term. A received string in a source language is provided as input to a translation service. Translation proposals in a different target language are received back. A ruleset (considering factors such as camel case format, the presence of a concluding period, and/or consecutive consonants) is applied to generate abbreviation candidates from the translation proposals. Non-exact matching (referencing, e.g., a comparison metric) may then be used to map the abbreviation candidates to text strings of their original full terms. A mapping of the abbreviation to the text string of the original full term is stored in a translation database comprising linguistic data. Embodiments leverage existing resources (e.g., translation service, non-exact matching) to reduce the effort and expense of accurately identifying abbreviations and then mapping them to their full original terms.
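
A rough sketch of the two stages, rule-based candidate detection followed by non-exact matching, is given below. The specific rules (three or more consecutive consonants, a lowercase-to-uppercase transition, a trailing period), the use of `difflib.SequenceMatcher` as the comparison metric, the 0.5 threshold and all function names are assumptions made for illustration; the abstract does not fix any of them.

```python
import re
from difflib import SequenceMatcher

# Assumed rule: three or more consecutive consonants (y treated as a vowel).
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{3,}", re.IGNORECASE)
# Assumed rule: a lowercase letter directly followed by an uppercase one.
CAMEL_CASE = re.compile(r"[a-z][A-Z]")

def is_abbreviation_candidate(token):
    """Apply the ruleset: concluding period, camel case, or consonant run."""
    return (
        token.endswith(".")
        or bool(CAMEL_CASE.search(token))
        or bool(CONSONANT_RUN.search(token))
    )

def map_to_full_term(abbreviation, full_terms, threshold=0.5):
    """Return the best non-exact match above the threshold, or None."""
    core = abbreviation.rstrip(".").lower()
    best_term, best_score = None, threshold
    for term in full_terms:
        score = SequenceMatcher(None, core, term.lower()).ratio()
        if score > best_score:
            best_term, best_score = term, score
    return best_term

full_terms = ["quantity", "amount", "address"]
mapped = map_to_full_term("Amt.", full_terms)
```

In practice the threshold and metric would need tuning, since short abbreviations can score similarly against several full terms.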

SYSTEM AND METHOD OF AUTOMATIC TOPIC DETECTION IN TEXT

A method and system for automatic topic detection in text may include receiving a text document of a corpus of documents and extracting one or more phrases from the document, based on one or more syntactic patterns. For each phrase, embodiments of the invention may: apply a word embedding neural network on one or more words of the phrase, to obtain one or more respective word embedding vectors; calculate a weighted phrase embedding vector; and compute a phrase saliency score, based on the weighted phrase embedding vector. Embodiments of the invention may subsequently produce one or more topic labels, representing one or more respective topics in the document, based on the computed phrase saliency scores, and may select one or more topic labels according to their relevance to the business domain of the corpus.

QUERY EXECUTION UTILIZING PROBABILISTIC INDEXING
20220382751 · 2022-12-01

A method for execution by at least one processor of a database system includes indexing a first column via a probabilistic indexing scheme. An IO pipeline that includes a probabilistic index-based IO construct for access of the first column is determined based on a query including a query predicate indicating the first column. The probabilistic index-based IO construct is applied in conjunction with execution of the query via the IO pipeline by applying an index element of the probabilistic index-based IO construct to identify a first subset of rows based on index data of the probabilistic indexing scheme for the first column. A filter element of the probabilistic index-based IO construct is applied to identify ones of a first subset of the plurality of column values corresponding to the first subset of rows that compare favorably to the query predicate.
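
The index-then-filter pattern can be sketched with a deliberately lossy hash-bucket index: the index element returns a superset of matching rows (hash collisions produce false positives), and the filter element then compares the actual column values against the predicate. The bucket scheme, the equality predicate and the class and function names are assumptions for illustration only, not the scheme claimed in the abstract.

```python
import zlib

class ProbabilisticIndex:
    """Lossy index: maps a hash bucket to the row ids whose value falls in it."""

    def __init__(self, values, num_buckets=4):
        self.num_buckets = num_buckets
        self.buckets = {}
        for row_id, value in enumerate(values):
            self.buckets.setdefault(self._bucket(value), []).append(row_id)

    def _bucket(self, value):
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        return zlib.crc32(str(value).encode("utf-8")) % self.num_buckets

    def candidate_rows(self, value):
        """Index element: rows that may match (no false negatives)."""
        return list(self.buckets.get(self._bucket(value), []))

def equality_query(values, index, target):
    """Run the index element, then the filter element on the actual values."""
    candidates = index.candidate_rows(target)
    return [row for row in candidates if values[row] == target]

column = ["red", "green", "blue", "red", "teal", "cyan"]
index = ProbabilisticIndex(column)
matching_rows = equality_query(column, index, "red")
```

The filter step is what makes the result exact: the index alone only narrows the rows that need to be read.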

Data normalization system
11507549 · 2022-11-22

A data normalization system receives a first string and a second string that are ordered according to an initial string ordering. The data normalization system analyzes the first string and the second string based on a list of known character sets included in surnames, yielding an analysis, and determines, based on the analysis, that a set of characters in the second string matches a known character set included in the list. In response to that determination, the data normalization system orders the first string and the second string according to an updated string ordering.

MACHINE LEARNING TECHNIQUES FOR GENERATING STRING-BASED DATABASE MAPPING PREDICTION

Various embodiments of the present invention provide methods, apparatus, systems, computing devices, computing entities, and/or the like for performing predictive mapping operations with respect to a ground-truth database table. Certain embodiments of the present invention utilize systems, methods, and computer program products that perform predictive mapping operations utilizing a hierarchical string matching machine learning framework using one or more of an exact match model, a probabilistic match model, a disjoint match model, and an embedding-based match model.
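
One way to picture a hierarchical matching framework is a cascade that tries cheaper matchers first and falls through to looser ones. The sketch below covers only two tiers, an exact match and a `difflib`-based fuzzy match standing in for the probabilistic model; the tier order, the 0.8 threshold, the table contents and the function names are all assumptions, and the disjoint and embedding-based models are not represented.

```python
from difflib import SequenceMatcher

def normalize(text):
    return text.strip().lower()

def exact_match(query, table):
    """Tier 1: case-insensitive exact lookup."""
    for entry in table:
        if normalize(entry) == normalize(query):
            return entry
    return None

def fuzzy_match(query, table, threshold=0.8):
    """Tier 2: best similarity ratio above a threshold (stand-in for a probabilistic model)."""
    best_entry, best_score = None, threshold
    for entry in table:
        score = SequenceMatcher(None, normalize(query), normalize(entry)).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry

def predict_mapping(query, table):
    """Try each tier in order; report which tier produced the mapping."""
    for tier_name, matcher in (("exact", exact_match), ("fuzzy", fuzzy_match)):
        hit = matcher(query, table)
        if hit is not None:
            return hit, tier_name
    return None, "unmatched"

# Hypothetical ground-truth table entries.
ground_truth = ["acetaminophen", "ibuprofen", "amoxicillin"]

exact_result = predict_mapping("Ibuprofen", ground_truth)
fuzzy_result = predict_mapping("ibuprofin", ground_truth)
miss_result = predict_mapping("zzz", ground_truth)
```

Returning the tier name alongside the match makes it easy to audit how often the looser tiers are doing the work.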