Patent classifications
G06F16/355
Preventing the distribution of forbidden network content using automatic variant detection
The subject matter of this specification generally relates to preventing the distribution of forbidden network content. In one aspect, a system includes a front-end server that receives content for distribution over a data communication network. A back-end server identifies, in a query log, a set of received queries for which a given forbidden term was used to identify a search result in response to the received query even though the given forbidden term was not included in any query in the set. The back-end server classifies, as a variant of the given forbidden term, a term from one or more queries in the set that caused a search engine to use the given forbidden term to identify one or more search results in response to those queries, and prevents distribution of content that includes a variant.
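The variant-detection idea in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical query-log format in which each entry records the raw query and the terms the search engine actually matched on; the field names `query` and `matched_terms`, the functions `find_variants` and `is_blocked`, and the sample term `gizmo` are all illustrative and do not come from the patent.

```python
from collections import defaultdict

def find_variants(query_log, forbidden_term):
    """Count query terms that led the engine to match on forbidden_term
    even though the query itself did not contain it."""
    variants = defaultdict(int)
    for entry in query_log:
        query_terms = entry["query"].lower().split()
        matched = {t.lower() for t in entry["matched_terms"]}
        if forbidden_term in matched and forbidden_term not in query_terms:
            # Query terms the engine did not match literally are candidates
            # for having been rewritten to the forbidden term.
            for term in query_terms:
                if term not in matched:
                    variants[term] += 1
    return dict(variants)

def is_blocked(content, forbidden_term, variants):
    """Block content containing the forbidden term or any detected variant."""
    words = set(content.lower().split())
    return forbidden_term in words or any(v in words for v in variants)

# Hypothetical log entries (assumed structure).
log = [
    {"query": "buy g1zmo online", "matched_terms": ["buy", "gizmo", "online"]},
    {"query": "cheap g1zmo", "matched_terms": ["cheap", "gizmo"]},
    {"query": "buy gizmo", "matched_terms": ["buy", "gizmo"]},
]
variants = find_variants(log, "gizmo")
blocked = is_blocked("New G1ZMO sale", "gizmo", variants)
```

A real system would likely require a minimum count before accepting a candidate as a variant; the sketch keeps the raw counts so that such a threshold could be applied afterwards.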
Label guided unsupervised learning based network-level application signature generation
Application-initiated network traffic is intercepted and analyzed by an application firewall in order to identify streams of traffic for a target application. An application signature generator preprocesses the raw data packets from the intercepted network traffic by tokenizing the data packets and then weighting each token according to its importance for application identification. The weighted features for each data packet are clustered using an unsupervised learning model, and the resulting clusters are iteratively refined and re-clustered using a proximity score between the clusters and feature vectors for key tokens for the target application. The application signature generator generates a signature for the clusters corresponding to the target application which the application firewall implements for filtering network traffic.
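The preprocessing stage described here (tokenize each packet, then weight each token) might look roughly like the sketch below. The abstract does not say how tokens are weighted, so a smoothed TF-IDF weight is used as a stand-in; the function names and sample payloads are assumptions, and the clustering and refinement steps are not shown.

```python
import math
import re
from collections import Counter

def tokenize(payload):
    """Split a decoded payload into lowercase tokens on delimiter characters."""
    return [t for t in re.split(r"[^a-z0-9_.-]+", payload.lower()) if t]

def weight_tokens(payloads):
    """Return one {token: weight} dict per payload using smoothed TF-IDF
    (an assumed weighting, not the one claimed in the patent)."""
    docs = [Counter(tokenize(p)) for p in payloads]
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())  # document frequency of each token
    weighted = []
    for d in docs:
        total = sum(d.values())
        weighted.append({
            tok: (cnt / total) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            for tok, cnt in d.items()
        })
    return weighted

# Hypothetical HTTP-like payloads.
payloads = [
    "GET /api/v1/sync HTTP/1.1 Host: api.exampleapp.test",
    "GET /index.html HTTP/1.1 Host: www.other.test",
    "POST /api/v1/sync HTTP/1.1 Host: api.exampleapp.test",
]
weights = weight_tokens(payloads)
```

Tokens that appear in every packet, such as `http`, receive a lower weight than tokens seen in only some packets. The resulting weight dictionaries could then be vectorized and passed to a clustering model.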
Computer-based systems for data entity matching detection based on latent similarities in large datasets and methods of use thereof
At least some embodiments are directed to an entity matching detection system. The entity matching detection system includes a latent similarity identification machine learning model that receives one or more data records and generates a final similarity score indicative of a latent similarity between the one or more data records and a second data record. The entity matching detection system can identify lexical and semantic similarities between attribute values and can analyze and compute similarity scores for direct-linked attribute values and cross-linked attribute values extracted from different data records.
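The distinction between direct-linked and cross-linked attribute values can be sketched as follows. This is only a lexical approximation using the standard library's `difflib.SequenceMatcher`; the abstract describes a machine learning model that also captures semantic similarity, which is not reproduced here. The aggregation rule in `final_score` (average of the best score per attribute) is an assumption.

```python
from difflib import SequenceMatcher

def sim(a, b):
    """Lexical similarity in [0, 1] between two attribute values."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def similarity_scores(rec_a, rec_b):
    # Direct-linked: the same attribute name in both records.
    direct = {k: sim(rec_a[k], rec_b[k]) for k in rec_a if k in rec_b}
    # Cross-linked: different attribute names across the two records.
    cross = {
        (ka, kb): sim(va, vb)
        for ka, va in rec_a.items()
        for kb, vb in rec_b.items()
        if ka != kb
    }
    return direct, cross

def final_score(rec_a, rec_b):
    """Average, over attributes of rec_a, of the best direct or cross score."""
    direct, cross = similarity_scores(rec_a, rec_b)
    best = []
    for ka in rec_a:
        candidates = [direct.get(ka, 0.0)]
        candidates += [s for (a, _), s in cross.items() if a == ka]
        best.append(max(candidates))
    return sum(best) / len(best) if best else 0.0

# Hypothetical records whose fields were entered in swapped positions.
rec_a = {"name": "Acme Corp", "address": "12 Main St"}
rec_b = {"name": "12 Main St", "address": "Acme Corp"}
direct, cross = similarity_scores(rec_a, rec_b)
score = final_score(rec_a, rec_b)
```

In this example the direct-linked comparison alone would miss the match, while the cross-linked comparison recovers it.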
Method and system for automatic discovery of topics and trends over time
A method and system for automatically performing a discovery of topics within temporal ordered text document collections are provided. The method includes generating a bag of words vector for each text document collection using a predefined dictionary. The method also includes iteratively calculating, based on the generated bag of words vectors, for each text document collection, a hidden topic vector representing topics of the respective text document collection using a calculated hidden state vector memorizing a hidden state of all previous text document collections.
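The first step of this method, building a bag-of-words vector for each document collection from a predefined dictionary, can be sketched as below. The dictionary, the sample collections, and the function name are illustrative assumptions; the recurrent calculation of hidden topic vectors is not shown.

```python
import re
from collections import Counter

def bag_of_words(documents, dictionary):
    """Count dictionary words over all documents in one collection."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    # One vector component per dictionary entry, in dictionary order.
    return [counts[word] for word in dictionary]

# Hypothetical predefined dictionary and temporally ordered collections.
dictionary = ["battery", "screen", "price"]
collections_by_year = {
    2020: ["Battery life is poor", "The battery and screen"],
    2021: ["Price went up", "Screen is brighter, price is higher"],
}
vectors = {year: bag_of_words(docs, dictionary)
           for year, docs in sorted(collections_by_year.items())}
```

Each vector would then be fed, in temporal order, to the model that updates the hidden state and produces the topic vector for that collection.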
Summary evaluation device, method, program, and storage medium
The present disclosure relates to a method of evaluating the accuracy of a summary of a document. The method includes receiving a plurality of reference summaries of a document and a system summary of the document. The system summary is generated by a machine. The method further includes extracting, for each reference summary, a tuple that consists of a pair of words, composed of a modified word and a dependent word having a dependency relation to the modified word, and a label representing the dependency relation. The method further includes replacing, for each extracted tuple, both the modified word and the dependent word of the tuple's word pair with a class predetermined for those words. The method further includes generating a score of the system summary based on the class and a set of tuples of the system summary.
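A rough sketch of the class-replacement and scoring steps follows. It assumes dependency tuples have already been extracted by a parser and are given as `(modified_word, dependent_word, label)` triples. The word-to-class mapping and the scoring rule (average fraction of each reference's class-level tuples that also occur in the system summary) are assumptions, since the abstract does not specify the formula.

```python
def to_class_tuple(tup, word_class):
    """Replace both words of a tuple with their class, if one is defined."""
    modified, dependent, label = tup
    return (word_class.get(modified, modified),
            word_class.get(dependent, dependent),
            label)

def score_summary(reference_tuple_sets, system_tuples, word_class):
    """Average, over references, of the share of class-level reference
    tuples found in the system summary (assumed scoring rule)."""
    system = {to_class_tuple(t, word_class) for t in system_tuples}
    scores = []
    for ref in reference_tuple_sets:
        ref_cls = {to_class_tuple(t, word_class) for t in ref}
        scores.append(len(ref_cls & system) / len(ref_cls) if ref_cls else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical classes and tuples.
word_class = {"car": "VEHICLE", "truck": "VEHICLE",
              "bought": "ACQUIRE", "purchased": "ACQUIRE"}
references = [[("bought", "car", "obj"), ("bought", "she", "subj")]]
system = [("purchased", "truck", "obj")]
score = score_summary(references, system, word_class)
```

Without class replacement the system tuple would not match any reference tuple; with it, one of the two reference tuples is matched.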
Detecting extraneous topic information using artificial intelligence models
Systems and methods for improving machine learning systems used to model topics on a plurality of calls are described herein. In an embodiment, a server computer receives a plurality of digitally stored call transcripts that have been prepared from digitally recorded voice calls. The server computer uses a topic model of an artificial intelligence machine learning system, the topic model modeling words of a call as a function of one or more word distributions for each topic of a plurality of topics, to generate an output of the topic model which identifies the plurality of topics represented in the plurality of call transcripts. The server computer computes, for a particular topic of the plurality of topics, a first value representing a vocabulary of the particular topic and a second value representing a consistency of the particular topic in two or more call transcripts of the plurality of call transcripts which include the particular topic. Based, at least in part, on one or more of the first value or the second value, the server computer determines that the particular topic meets a particular criterion and, in response, updates the output of the topic model to remove the particular topic or distinguish the particular topic from other topics of the plurality of topics which do not meet the particular criterion.
Learning interpretable strategies in the presence of existing domain knowledge
A mechanism is provided in a data processing system to implement a medical concept searching engine for improving searches of medical concepts based on an index model. The mechanism generates a concept index model data structure that records medical concepts and corresponding numbers of instances of the medical concepts in the corpus of documents. Responsive to receiving a search request from a user, the medical concept searching engine identifies at least one medical concept in the search request and one or more related medical concepts that are related to the at least one medical concept based on an ontology data structure. The medical concept searching engine generates a bubble graph user interface comprising a plurality of bubbles corresponding to the at least one medical concept and the one or more related medical concepts.
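The concept index model and the ontology lookup can be sketched as below. The concept list, the ontology mapping, and the naive substring matching are all illustrative assumptions; the bubble graph itself is represented only as a list of `(concept, count)` pairs that a user interface could size bubbles from.

```python
from collections import Counter

def build_concept_index(documents, concepts):
    """Record each concept and its number of instances in the corpus."""
    index = Counter()
    for doc in documents:
        text = doc.lower()
        for concept in concepts:
            index[concept] += text.count(concept)  # naive substring count
    return index

def bubbles_for_query(query, index, ontology):
    """Return (concept, count) pairs for concepts found in the query
    plus their ontology-related concepts."""
    q = query.lower()
    found = [c for c in index if c in q]
    related = []
    for c in found:
        for r in ontology.get(c, []):
            if r not in found and r not in related:
                related.append(r)
    return [(c, index[c]) for c in found + related]

# Hypothetical corpus, concept list, and ontology.
docs = [
    "Diabetes is treated with insulin. Insulin dosing varies.",
    "Hypertension and diabetes often co-occur.",
]
concepts = ["diabetes", "insulin", "hypertension"]
ontology = {"diabetes": ["insulin", "hypertension"]}
index = build_concept_index(docs, concepts)
bubbles = bubbles_for_query("diabetes treatment", index, ontology)
```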
Database query generation using natural language text
Some embodiments may obtain a natural language question, determine a context of the natural language question, and generate a first vector based on the natural language question using encoder neural network layers. Some embodiments may access a data table comprising column names, generate vectors based on the column names, and determine attention scores based on the vectors. Some embodiments may update the vectors based on the attention scores, generate a second vector based on the natural language question, and determine, based on the vectors, a set of strings comprising a name of the column names and a database language operator. Some embodiments may determine values based on the determined database language operator and the name using a transformer neural network model. Some embodiments may generate a query based on the set of strings and the values.
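The attention step between the question vector and the column-name vectors can be illustrated with scaled dot-product attention, which is a common choice but is assumed here rather than taken from the abstract. The vectors are hand-written toy values standing in for encoder outputs, and `build_query` is a hypothetical helper that assembles a parameterized SQL string.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_scores(question_vec, column_vecs):
    """Scaled dot-product attention of the question over the columns."""
    scale = math.sqrt(len(question_vec))
    logits = [sum(q * c for q, c in zip(question_vec, col)) / scale
              for col in column_vecs]
    return softmax(logits)

def build_query(table, column, operator, value):
    # The value is returned as a bound parameter rather than inlined.
    return f"SELECT * FROM {table} WHERE {column} {operator} ?", (value,)

# Toy vectors (assumed; a real system would obtain these from an encoder).
question_vec = [1.0, 0.0, 0.5]
columns = {"name": [0.1, 0.9, 0.0],
           "salary": [0.9, 0.0, 0.4],
           "dept": [0.0, 0.2, 0.1]}
scores = attention_scores(question_vec, list(columns.values()))
best_column = max(zip(columns, scores), key=lambda pair: pair[1])[0]
sql, params = build_query("employees", best_column, ">", 50000)
```

In the described embodiments the operator and value would also be predicted by the model; here they are fixed by hand to keep the sketch short.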
Generating numerical data estimates from determined correlations between text and numerical data
The present invention relates to a method and apparatus for determining correlations between text or text-derived data and numerical data. Specifically, the present invention relates to determining one or more correlations between text-derived data and numerical data in order to generate estimated numerical data for specific text-derived data using the determined correlations. Aspects and/or embodiments seek to provide a method for estimating numerical data using historical numerical data and historical text-derived data. Aspects and/or embodiments also seek to determine a correlation between the historical numerical data and historical text-derived data for use in generating the estimated numerical data from text-derived data, optionally to identify relevant trends in text-derived data that can be used to generate estimated or predicted numerical data, and optionally to train a computer-implemented model to generate estimates of numerical data for given text-derived data.
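As a simple illustration of estimating numerical data from a text-derived signal, the sketch below computes a Pearson correlation between a historical text-derived series and a historical numerical series, then fits a least-squares line to produce an estimate. The choice of a linear model, the variable names, and the sample figures are assumptions made for illustration only.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical history: weekly product mentions in text vs. units sold.
mentions = [1, 2, 3, 4]
units_sold = [12, 14, 16, 18]
r = pearson(mentions, units_sold)
slope, intercept = fit_line(mentions, units_sold)
estimate = slope * 5 + intercept  # estimate for a week with 5 mentions
```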
Text sequence generating method and apparatus, device and medium
Embodiments of the present disclosure disclose a text sequence generating method and apparatus, a device and a medium. The method includes: obtaining an initial text sequence, extracting an initial text feature from the initial text sequence; obtaining a structured text sequence, and extracting a structured feature from the structured text sequence, where the structured text sequence is associated with a fact in the initial text sequence; and fusing and generating a target text sequence based on the initial text feature and the structured feature.