Patent classifications
G06F16/3349
Method of and server for generating meta-feature for ranking documents
A method and a server for generating a meta-feature for ranking documents by a machine learning algorithm (MLA). A past query having been previously submitted on a server is acquired, and a set of past documents having been presented as search results in response to the past query is acquired, where each respective document includes a plurality of features, and respective values for the plurality of features. The meta-feature is generated, where a respective value of the meta-feature for a respective document is based on: a respective value of a given feature of the plurality of features for the respective document, and a value of a parameter associated with the set of past documents. The meta-feature is validated based on its usefulness for ranking future search engine results pages (SERPs). The MLA is then trained to generate the meta-feature for ranking documents in response to a new query.
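As a rough illustration of the meta-feature described above, the sketch below derives a per-document value from one feature and one set-level parameter. The abstract does not say which parameter or which combination is used; the choice of the set-wide mean, the division, and the names `meta_feature_values` and `past_documents_scores` are all assumptions made for this example.

```python
from statistics import mean


def meta_feature_values(feature_values):
    """Return each document's feature value relative to the set-wide mean.

    Assumption: the "parameter associated with the set of past documents"
    is the mean of the given feature over that set.
    """
    set_mean = mean(feature_values)
    if set_mean == 0:
        # Avoid division by zero when the feature is zero across the set.
        return [0.0 for _ in feature_values]
    return [value / set_mean for value in feature_values]


# Hypothetical values of one feature for three documents shown for a past query.
past_documents_scores = [2.0, 4.0, 6.0]
relative = meta_feature_values(past_documents_scores)
```

A value above 1.0 means the document scored higher on that feature than the average document returned for the same past query, which is the kind of set-relative signal a ranker could consume.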
ITERATIVE APPLICATION OF A MACHINE LEARNING-BASED INFORMATION EXTRACTION MODEL TO DOCUMENTS HAVING UNSTRUCTURED TEXT DATA
An apparatus comprises a processing device configured to receive a query to extract information from a document, and to perform two or more iterations of utilizing a machine learning-based information extraction model to extract portions of unstructured text data from the document. In each iteration, a portion of the unstructured text data extracted from the document and an associated relevance score are output. In a first iteration, the query and document are input while in subsequent iterations the query and modified versions of the document are input, the modified versions having previously-extracted portions of the unstructured text data removed therefrom. The processing device is also configured to generate a response to the query comprising a subset of the portions of the unstructured text data extracted from the document determined to have associated relevance scores exceeding a threshold relevance score and at least a threshold level of similarity to the query.
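The iteration scheme in this abstract can be sketched as a loop that extracts a span, removes it from the document, and queries again. The extractor below is a word-overlap stand-in, not the machine learning model the abstract refers to, and the sketch applies a single score threshold where the abstract describes both a relevance threshold and a similarity threshold. All names (`toy_extract`, `iterative_extract`, `min_score`) are hypothetical.

```python
def toy_extract(query, document):
    """Stand-in for the extraction model: best sentence by Jaccard word overlap."""
    query_words = set(query.lower().split())
    best, best_score = "", 0.0
    for sentence in document.split(". "):
        words = set(sentence.lower().strip(".").split())
        if not words:
            continue
        score = len(query_words & words) / len(query_words | words)
        if score > best_score:
            best, best_score = sentence, score
    return best, best_score


def iterative_extract(query, document, iterations=3, min_score=0.2):
    """Extract up to `iterations` spans, removing each span before the next pass."""
    answers = []
    for _ in range(iterations):
        span, score = toy_extract(query, document)
        if not span:
            break  # nothing left that overlaps the query
        if score > min_score:
            answers.append((span, score))
        # Later iterations see the document with the extracted span removed.
        document = document.replace(span, "", 1)
    return answers


text = (
    "To reset password open settings. The sky is blue. "
    "Password reset emails expire in one hour."
)
answers = iterative_extract("reset password", text)
```

Removing each extracted span is what lets a later pass surface a second relevant passage instead of returning the same top span again.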
Self-adapting resource aware phrase indexes
In one aspect, a method is provided. The method may include adding a tuple phrase to a phrase index as part of a lazy creation of the phrase index. The adding may include adding the tuple phrase to a dictionary mapped to the phrase index, adding to the phrase index a document identifier and positional information for the tuple phrase, and adapting a first threshold value and a second threshold value. The first threshold represents a lower estimate of the query processing burden associated with the least burdensome tuple phrase in the dictionary, and the second threshold represents an upper estimate of the query processing burden associated with the most burdensome tuple phrase in the dictionary. The method may further include sending a query response including a generated query result corresponding to the tuple phrase.
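A minimal sketch of the bookkeeping this abstract describes, assuming that the query processing burden is available as a single number per tuple phrase. How the burden is estimated, and how the thresholds are used afterwards, is not stated in the abstract; the class name `LazyPhraseIndex` and its methods are hypothetical.

```python
class LazyPhraseIndex:
    """Phrase index that is filled lazily, one tuple phrase at a time."""

    def __init__(self):
        self.dictionary = {}        # tuple phrase -> estimated burden (assumed numeric)
        self.postings = {}          # tuple phrase -> list of (doc_id, position)
        self.low_threshold = None   # burden of the least burdensome phrase
        self.high_threshold = None  # burden of the most burdensome phrase

    def add(self, phrase, doc_id, position, burden):
        # Register the phrase in the dictionary mapped to the index.
        self.dictionary[phrase] = burden
        # Record where the phrase occurs.
        self.postings.setdefault(phrase, []).append((doc_id, position))
        # Adapt both thresholds to the current dictionary contents.
        burdens = self.dictionary.values()
        self.low_threshold = min(burdens)
        self.high_threshold = max(burdens)

    def query(self, phrase):
        """Return the postings for a phrase, or an empty list if not indexed."""
        return self.postings.get(phrase, [])


index = LazyPhraseIndex()
index.add(("new", "york"), doc_id=1, position=4, burden=120.0)
index.add(("san", "jose"), doc_id=2, position=0, burden=15.0)
```

Recomputing the minimum and maximum on every insertion keeps the example short; a real index would presumably update the thresholds incrementally.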
INFORMATION PROCESSING APPARATUS AND NON-TRANSITORY COMPUTER READABLE MEDIUM
An information processing apparatus includes a processor configured to display a screen for specifying one or more search conditions of a search. The screen presents one or more search histories. Each search history of the one or more search histories presents one or more previous search conditions used for a previous search, together with an attribute of a file on which a user performed an operation following the previous search.
Neural network system for text classification
A computer-implemented method is provided to perform text classification with a neural network system. The method includes providing a computing device to receive input datasets including user input question text and feed the datasets to the neural network system. The neural network system includes one or more neural networks configured to extract character-based features and word-based features from the question datasets and to concatenate them with clickstream embeddings of clickstream data, forming a representation vector indicative of the question text and user behavior. The representation vector is fed into fully connected layers of a feed-forward network. The feed-forward network is configured to predict a first class and a second class associated with respective user input questions based on the representation vector.
DOCUMENT PRE-PROCESSING FOR QUESTION-AND-ANSWER SEARCHING
Disclosed are methods, systems, devices, apparatus, media, design structures, and other implementations, including a method that includes receiving a source document, applying one or more pre-processes to the source document to produce contextual information representative of the structure and content of the source document, and transforming the source document, based on the contextual information, to generate a question-and-answer searchable document.
SYSTEMS AND METHODS FOR DATA CURATION IN A DOCUMENT PROCESSING SYSTEM
Disclosed are implementations that include a method including obtaining at a local device, a question-and-answer data item representative of a question-and-answer pair relating to source content of a source document, with the question-and-answer data item being associated with question-and-answer history information representative of historical information and access information for the question-and-answer data item, and determining based on the question-and-answer history information whether to perform data refinement operations on the question-and-answer data item. The method further includes selecting, in response to a determination that the question-and-answer data item is to be refined, one of a plurality of users according to one or more matching criteria relating, at least in part, to subject-matter associated with the question-and-answer data item and expertise information respectively associated with the plurality of users, and communicating at least the question-and-answer data item to a destination device associated with the selected one of the plurality of users.
DOCUMENT PROCESSING AND RESPONSE GENERATION SYSTEM
- David Nahamoo
- Igor Roditis Jablokov
- Vaibhava Goel
- Etienne Marcheret
- Ellen Eide Kislal
- Steven John Rennie
- Marie Wenzel Meteer
- Soonthorn Ativanichayaphong
- Joseph Allen Pruitt
- John Pruitt
- Bryan Dempsey
- Rangachari Anand
- Meghan Hickey
- Ajinkya Zadbuke
- Neil Rohit Mallinar
- Lucas Siler
- Gregory Pelton
- Chul Sung
- Matthew Brian Cobb
- Carmi Joseph Rothberg
Disclosed are methods, systems, devices, apparatus, media, design structures, and other implementations, including a method that includes receiving, at a local device from a remote device, query data representative of a question relating to source content of a source document in a repository of a plurality of source documents, with the source content being associated with transformed content accessible from the local device. The method further includes generating, in response to a determination, at the local device, of a match between the query data and at least one portion of the transformed content, output data comprising one or more of: a pointer to access, in the source document accessible from the remote device, at least one portion of the source document corresponding to the at least one portion of the transformed content, or a copy of the at least one portion of the source document.
FORM TEMPLATE MATCHING TO POPULATE FORMS DISPLAYED BY CLIENT DEVICES
A server includes a memory and a processor to receive from a client device a screenshot of an application page from an application. The application page includes a form requiring data to be filled in by a user of the client device. A form template is extracted from the screenshot, with the extracted form template not including form field values. The extracted form template is compared to a private form template database for a match. The private form template database includes private form templates from different applications, with each private form template having form field values previously filled in for the user. Form field values from a matched private form template are provided to the client device for the client device to populate the form in the screenshot.
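The matching step can be sketched as follows, assuming the field labels have already been recognized from the screenshot (for example by OCR, which is outside this sketch). Representing a template as a set of normalized labels and requiring an exact match are simplifications chosen for this example; the abstract does not specify the comparison. The names `extract_template`, `match_and_fill`, and `private_templates` are hypothetical.

```python
def extract_template(field_labels):
    """Build a value-free template: the set of normalized field labels."""
    return frozenset(label.strip().lower() for label in field_labels)


def match_and_fill(template, private_templates):
    """Return stored field values for the first matching private template."""
    for stored in private_templates:
        if stored["template"] == template:
            return stored["values"]
    return None  # no private template matches this form


# Hypothetical per-user database of previously filled forms.
private_templates = [
    {
        "template": extract_template(["Name", "Email"]),
        "values": {"name": "A. User", "email": "user@example.com"},
    },
]

# Labels recognized in a new screenshot, in a different order and casing.
incoming = extract_template(["email ", "NAME"])
filled = match_and_fill(incoming, private_templates)
```

Because the template carries labels only, the comparison never depends on values typed into the new form, which matches the abstract's point that the extracted template excludes form field values.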
Sequence to sequence conversational query understanding
Systems and techniques for sequence to sequence conversational query understanding are described herein. A query may be received that includes multiple words. It may be identified that the query is to be reformulated based on an attention value for an attention word in the query. Relationships may be determined among words of the query and words in a previously submitted query and words in results from the previously submitted query. The query may be reformulated based on the relationships. The reformulated query may be employed to retrieve query results.