G06F40/216

Revealing content reuse using coarse analysis

Systems and methods for managing content provenance are provided. A network system accesses a plurality of documents. The plurality of documents is then hashed to identify one or more content features within each of the documents. In one embodiment, the hash is a MinHash. The network system compares the content features of each of the plurality of documents to determine a similarity score between each of the plurality of documents. In one embodiment, the similarity score is a Jaccard score. The network system then clusters the plurality of documents into one or more clusters based on the similarity score of each of the plurality of documents. In one embodiment, the clustering is performed using DBSCAN. DBSCAN can be iteratively performed with decreasing epsilon values to derive clusters of related but relatively dissimilar documents. The clustering information associated with the clusters is stored for use during runtime.
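
The hash, compare, and cluster steps described above can be sketched roughly as follows. This is a minimal illustration and not the claimed implementation: the three-word shingles, the 128 salted BLAKE2b hashes, the `eps` and `min_samples` defaults, and every function name are assumptions made for the example. The DBSCAN here is a small textbook version that works on a precomputed distance matrix, with distance taken as one minus the estimated Jaccard score.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (assumed content features)."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(features, num_hashes=128):
    """Keep the smallest 64-bit digest of the features under each salted hash."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            hashlib.blake2b(f.encode("utf-8"), digest_size=8, salt=salt).digest()
            for f in features
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of the Jaccard score."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dbscan(dist, eps, min_samples=2):
    """Minimal DBSCAN over a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_samples:
            labels[i] = -1  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = [m for m in range(n) if dist[j][m] <= eps]
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep growing
    return labels


def cluster_documents(docs, eps=0.5, min_samples=2):
    """Hash each document, compare signatures pairwise, then cluster."""
    sigs = [minhash_signature(shingles(d)) for d in docs]
    dist = [[1.0 - estimated_jaccard(a, b) for b in sigs] for a in sigs]
    return dbscan(dist, eps, min_samples)


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog near the river bank",
        "the quick brown fox jumps over the lazy dog near the river shore",
        "stock markets closed higher today after the central bank held rates steady",
    ]
    print(cluster_documents(docs))
```

To approximate the iterative variant, `cluster_documents` could be called repeatedly with a different `eps` on each pass, storing the labels from each pass as the clustering information.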

COMPUTER DATA SYSTEM DATA SOURCE REFRESHING USING AN UPDATE PROPAGATION GRAPH

Described are methods, systems and computer readable media for data source refreshing.

ASSISTING ENTITIES IN RESPONDING TO A REQUEST OF A USER
20180013699 · 2018-01-11

A third-party service may be used to assist entities in responding to requests of users. A third-party service may receive, directly or indirectly, a request of a first user for assistance from a first entity. The third-party service may request information about the first user by sending a request to a computer of the first entity. The third-party service may use the request of the first user and the information about the first user to automatically generate a response to the request of the first user. The third-party service may then transmit, directly or indirectly, the response to the first user.

Table item information extraction with continuous machine learning through local and global models

A bipartite application implements a table auto-completion (TAC) algorithm on the client side and the server side. A client module runs a local model of the TAC algorithm on a user device and a server module runs a global model of the TAC algorithm on a server machine. The local model is continuously adapted through on-the-fly training, with as little as a single negative example, to perform TAC on the client side, one document at a time. Knowledge thus learned by the local model is used to improve the global model on the server side. The global model can be utilized to automatically and intelligently extract table information from a large number of documents with significantly improved accuracy, requiring minimal human intervention even on complex tables.

Dynamic and unscripted virtual agent systems and methods

Systems and methods that offer significant improvements to current chatbot conversational experiences are disclosed. The proposed systems and methods are configured to manage conversations in real-time with human customers based on a dynamic and unscripted conversation flow with a virtual assistant. In one embodiment, a knowledge graph or domain model represents the sole or primary source of information for the virtual assistant, thereby removing the reliance on any form of conversational modelling. Based on the information provided by the knowledge graph, the virtual agent chatbot will be equipped to answer customer queries, as well as demonstrate reasoning, offering customers a more natural and efficacious dialogue experience.

Method and system for suggesting revisions to an electronic document

A method for suggesting revisions to a document-under-analysis from a seed database, the seed database including a plurality of original texts each respectively associated with one of a plurality of final texts, the method for suggesting revisions including selecting a statement-under-analysis (“SUA”), selecting a first original text of the plurality of original texts, determining a first edit-type classification of the first original text with respect to its associated final text, generating a first similarity score for the first original text based on the first edit-type classification, the first similarity score representing a degree of similarity between the SUA and the first original text, selecting a second original text of the plurality of original texts, determining a second edit-type classification of the second original text with respect to its associated final text, generating a second similarity score for the second original text based on the second edit-type classification, the second similarity score representing a degree of similarity between the SUA and the second original text, selecting a candidate original text from one of the first original text and the second original text, and creating an edited SUA (“ESUA”) by modifying a copy of the SUA consistent with the candidate final text associated with the candidate original text.

SPEECH ENDPOINTING BASED ON WORD COMPARISONS

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
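
The two counts described above can be illustrated with a short sketch. It rests on assumptions the abstract does not state: a text sample "matches" when its leading terms equal the terms of the transcription, and the utterance is classified as likely incomplete when the second value exceeds the first. The function name `classify_utterance` and the sample collection are hypothetical.

```python
def classify_utterance(transcription, text_samples):
    """Count exact and extended matches, then compare the two values."""
    terms = transcription.lower().split()
    first_value = 0   # samples that match and contain no additional terms
    second_value = 0  # samples that match and contain additional terms
    for sample in text_samples:
        sample_terms = sample.lower().split()
        # Assumed matching rule: the sample starts with the transcription.
        if sample_terms[:len(terms)] != terms:
            continue
        if len(sample_terms) == len(terms):
            first_value += 1
        else:
            second_value += 1
    # Assumed decision rule: more continuations than exact matches
    # suggests the speaker has not finished.
    if second_value > first_value:
        label = "likely incomplete"
    else:
        label = "not likely incomplete"
    return first_value, second_value, label


if __name__ == "__main__":
    samples = ["what is", "what is the weather", "what is the time", "call mom"]
    print(classify_utterance("what is", samples))
    print(classify_utterance("call mom", samples))
```

A ratio or a tuned threshold would fit the same comparison step; the strict inequality is used here only to keep the example small.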

METHOD AND APPARATUS FOR DETECTING ANOMALIES IN MISSION CRITICAL ENVIRONMENTS

A method including isolating a protocol language of a data set comprising a text structure representing data regarding a network communication procedure between a plurality of user devices, wherein the protocol language comprises a pattern for implementing the network communication procedure; generating a document from the data set, wherein the document includes a text structure; organizing, in light of the protocol language, the text structure into a natural language scheme; and detecting, using the natural language scheme, insights in the document.