Patent classifications
G06F40/237
FACETED NAVIGATION
A method includes extracting a set of candidate keywords from clickstream data and from natural language processing of product text for a plurality of search queries. The set of candidate keywords is filtered based on the clickstream data. The filtered set is then ranked based on the clickstream data. The ranked set is clustered to remove near duplicates. The resulting ranked set of candidate keywords for a respective search query is output.
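The filter, rank, and cluster steps of this abstract can be illustrated with a minimal sketch. The abstract does not say which signals or similarity measures are used, so everything here is an assumption: the click counts, the `min_clicks` and `sim_threshold` parameters, the function name `rank_keywords`, and the use of character-level similarity for near-duplicate detection are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def rank_keywords(candidates, clicks, min_clicks=2, sim_threshold=0.85):
    """Filter, rank, and de-duplicate candidate keywords for one query.

    candidates: iterable of keyword strings (assumed already extracted)
    clicks: Counter mapping keyword -> click count (hypothetical signal)
    """
    # Filter: drop keywords with too little click support.
    kept = [k for k in set(candidates) if clicks[k] >= min_clicks]
    # Rank: most-clicked first; ties broken alphabetically for determinism.
    kept.sort(key=lambda k: (-clicks[k], k))
    # Cluster: greedily keep a keyword only if it is not a near duplicate
    # of a higher-ranked keyword that was already kept.
    result = []
    for k in kept:
        if all(SequenceMatcher(None, k, r).ratio() < sim_threshold for r in result):
            result.append(k)
    return result


clicks = Counter({"running shoes": 40, "running shoe": 35, "trail": 12, "waterproof": 1})
facets = rank_keywords(["running shoes", "running shoe", "trail", "waterproof"], clicks)
print(facets)
```

In this toy input, "waterproof" is filtered out for lack of clicks and "running shoe" is dropped as a near duplicate of the higher-ranked "running shoes".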
Translation Method, Apparatus and Storage Medium
The present disclosure provides a translation method and apparatus, an electronic device, and a non-transitory storage medium. An implementation includes: determining an encoded feature of a sentence to be translated by an encoding module; determining, by a graph network module, a knowledge fusion feature of the sentence to be translated based on a preset graph network, wherein the preset graph network is constructed based on a polysemous word in a source language corresponding to the sentence to be translated and a plurality of translated words corresponding to the polysemous word in a target language; and determining, by a decoding network, a translated sentence corresponding to the sentence to be translated based on the encoded feature and the knowledge fusion feature.
SMART DATASET COLLECTION SYSTEM
Datasets are available from different dataset servers and often lack well-defined metadata, which makes comparing datasets difficult. Additionally, there may be different versions of the same dataset, which makes searching even more difficult. Using the systems and methods described herein, quality scores, dataset versioning, topic identification, and semantic relatedness metadata are stored for datasets hosted on dataset servers. A user interface is provided to allow a user to search for datasets by specifying search criteria (e.g., a topic and a minimum quality score) and to be informed of responsive datasets. The user interface may further inform the user of the quality scores of the responsive datasets, the versions of the responsive datasets, or other metadata. From the search results, the user may select and download one or more of the responsive datasets.
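The search described above (a topic plus a minimum quality score) can be sketched as a filter over stored metadata. This is only an illustration under stated assumptions: the `DatasetRecord` fields, the 0 to 1 quality scale, the sample catalog, and the `search` function are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    # Hypothetical metadata fields stored per dataset.
    name: str
    version: str
    topic: str
    quality: float  # assumed to lie in [0, 1]


def search(records, topic, min_quality):
    """Return records matching the topic with quality >= min_quality, best first."""
    hits = [r for r in records if r.topic == topic and r.quality >= min_quality]
    return sorted(hits, key=lambda r: r.quality, reverse=True)


catalog = [
    DatasetRecord("weather-eu", "1.0", "climate", 0.62),
    DatasetRecord("weather-eu", "2.0", "climate", 0.91),
    DatasetRecord("taxi-trips", "1.3", "transport", 0.88),
]
results = search(catalog, topic="climate", min_quality=0.7)
for r in results:
    print(r.name, r.version, r.quality)
```

Keeping the version in each record lets the result list show which version of a dataset met the quality bar.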
SYSTEMS AND METHODS FOR DETERMINING THE SHAREABILITY OF VALUES OF NODE PROFILES
The present disclosure relates to determining the shareability of values of node profiles. Record objects and electronic activities of a system of record corresponding to a data source provider may be accessed. Each record object may correspond to a record object type and have one or more object field-value pairs. Node profiles may be maintained. Values of fields corresponding to a predetermined type of field including fewer than a predetermined threshold number of data source providers may be identified. A restriction tag used to restrict populating other node profiles may be generated. Provision of the value with a second data source provider may be restricted.
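One possible reading of the threshold test is that a field value is restricted when fewer than a threshold number of data source providers have contributed it. The sketch below illustrates that reading only; the tuple layout of `observations`, the function name `restricted_values`, and the default threshold are assumptions, not details from the disclosure.

```python
from collections import defaultdict


def restricted_values(observations, threshold=2):
    """Return the (field, value) pairs seen from fewer than `threshold` providers.

    observations: iterable of (provider, field, value) tuples (assumed layout).
    """
    providers = defaultdict(set)
    for provider, field, value in observations:
        providers[(field, value)].add(provider)
    # A pair backed by too few distinct providers gets a restriction tag.
    return {fv for fv, seen in providers.items() if len(seen) < threshold}


observations = [
    ("provider_a", "phone", "555-0100"),
    ("provider_b", "phone", "555-0100"),
    ("provider_a", "phone", "555-0199"),
]
tags = restricted_values(observations)
print(tags)
```

A value tagged this way would then be withheld when populating node profiles for other providers.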
Enhancing ASR System Performance for Agglutinative Languages
A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.
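The inference-stage merge rule can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the token layout `(text, start_sec, end_sec)`, the explicit `subterms` set used to recognize sub-terms, the `max_gap` value, and the example tokens are all assumptions.

```python
def merge_subterms(tokens, subterms, max_gap=0.1):
    """Merge a sub-term into the preceding term when the pause between them is short.

    tokens: list of (text, start_sec, end_sec) in time order (assumed layout)
    subterms: set of token texts treated as sub-terms (hypothetical marker)
    max_gap: largest allowed pause, in seconds, for a merge
    """
    merged = []
    for text, start, end in tokens:
        # The gap is measured from the end of the previous (possibly merged) term.
        if merged and text in subterms and start - merged[-1][2] <= max_gap:
            prev_text, prev_start, _ = merged[-1]
            merged[-1] = (prev_text + text, prev_start, end)
        else:
            merged.append((text, start, end))
    return merged


tokens = [("ev", 0.0, 0.3), ("ler", 0.32, 0.6), ("de", 1.5, 1.7)]
output = merge_subterms(tokens, subterms={"ler", "de"})
print(output)
```

Here "ler" follows "ev" almost immediately and is merged into "evler", while "de" arrives after a long pause and stays a separate term.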
Classifier Determination through Label Function Creation and Unsupervised Learning
Software architectures relating to machine learning (e.g., relating to classifying sequential text data) are disclosed. Unlabeled sequential text data may be produced by a variety of sources such as text messages, email messages, message chats, social media applications, and web pages. Classifying such data may be difficult due to the freeform and unlabeled nature of text data from these sources. Thus, techniques are provided for training a machine learning algorithm to classify unlabeled text data in freeform format. Training is based on generating labelling functions from lexical databases, applying the labelling functions to unlabeled text data in an unsupervised manner, and generating trained classifiers that accurately classify the unlabeled text data. The trained classifiers may then be implemented to classify text data accessed from the variety of sources. The present techniques provide high-quality and efficient labeling of unlabeled text data in freeform formats.
AUTOMATIC REPLACEMENT OF MEDIA CONTENT ASSOCIATED WITH A REAL-TIME BROADCAST
A method for automatically replacing a first type of content associated with a real-time broadcast with a second type of content is provided. The method may include automatically parsing media content associated with the real-time broadcast and assigning timecode to the parsed media content. The method may further include determining whether the parsed media content includes the first type of content. The method may further include, in response to determining the parsed media content includes the first type of content, automatically determining a context associated with the first type of content. The method may further include automatically identifying the second type of content that matches the determined context. The method may also include automatically replacing the first type of content in the parsed media content with the second type of content. The method may further include automatically presenting the real-time broadcast with the second type of content.
SYSTEM AND METHOD FOR K-NUGGET DISCOVERY AND RETROFITTING FRAMEWORK AS SELF-ORGANISING TREE ALGORITHM (SOTA) FACTORY
The present disclosure provides a system (110) for retrofitting words represented as vectors for Natural Language Processing (NLP) models, together with a streamlined pipeline suited to NLP tasks. The system (110) may discover user metadata, or k-nuggets, in five stages for retrofitting and stacking the retrofitted embeddings. Further, the system (110) may use the retrofitted embeddings for NLP tasks. The five stages of the k-nugget discovery pipeline are the lexical, syntactic, semantic, transactional, and language-agnostic stages for retrofitting the word embeddings. The embedding layer is replaced with the retrofitted embedding obtained after the fifth stage, which may improve performance. To validate the approach, the k-nugget discovery pipeline was tested on the SemEval dataset (Hinglish and English tweets) and the HOT dataset (Hinglish tweets), where it achieved state-of-the-art results on the test data.