Patent classifications
G06F40/237
Word vector changing device, method, and program
The objective is to arrange all words so that the distance between the words of any given pair is appropriate. The input is a concept base 22, a set of pairs each consisting of a word and a vector representing the word's concept, and a dictionary 24, a set of word pairs that are semantically close or distant. When a word pair C, consisting of given words A and B in the concept base 22, is present in the dictionary 24, conversion means 30 associates with the word pair C a magnitude D: the magnitude of the difference between the difference vector V′ of the converted vectors of words A and B, and the vector kV obtained by multiplying the difference vector V between the vectors of words A and B in the concept base 22 by a scalar value k. When the word pair C is not present in the dictionary 24, the conversion means 30 instead associates with the word pair C the magnitude D of the difference between the difference vector V′ and the difference vector V. The conversion means 30 converts the vector of each word in the concept base 22 such that the total sum of the magnitudes D over every word pair C is minimized.
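The objective described above can be sketched as a cost function over word pairs. This is a minimal illustration, not the patented implementation: the function names, the way close/distant pairs map to a scalar k, and the specific k values are all assumptions.

```python
import numpy as np

def pair_cost(v_a_new, v_b_new, v_a_old, v_b_old, k):
    """Cost D for one word pair C = (A, B): the magnitude of the difference
    between the converted difference vector V' and k * V (illustrative names)."""
    v_prime = v_a_new - v_b_new          # V': difference of converted vectors
    v = v_a_old - v_b_old                # V: difference of original vectors
    return np.linalg.norm(v_prime - k * v)

def total_cost(converted, concept_base, dictionary, k_close=0.5, k_far=2.0):
    """Sum D over all word pairs. Only pairs present in the dictionary get
    k != 1 (assumed: k < 1 pulls close pairs together, k > 1 pushes distant
    pairs apart); all other pairs compare V' directly against V."""
    words = list(concept_base)
    total = 0.0
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if (a, b) in dictionary:
                k = k_close if dictionary[(a, b)] == "close" else k_far
            else:
                k = 1.0  # pairs absent from the dictionary keep their distance
            total += pair_cost(converted[a], converted[b],
                               concept_base[a], concept_base[b], k)
    return total
```

A solver would then adjust the converted vectors to minimize `total_cost`; any gradient-based optimizer over the converted vectors would fit this sketch.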
Learning to select vocabularies for categorical features
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining, for each of one or more categorical features, a respective vocabulary of categorical feature values of the categorical feature that should be active during processing of inputs by a machine learning model. In one aspect, a method comprises: generating a batch of output sequences, each output sequence in the batch specifying, for each of the categorical features, a respective vocabulary of categorical feature values of the categorical feature that should be active; for each output sequence in the batch, determining a performance metric of the machine learning model on a machine learning task after the machine learning model has been trained to perform the machine learning task with only the respective vocabulary of categorical feature values of each categorical feature specified by the output sequence being active.
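The search procedure above, in which candidate vocabularies are generated in a batch and each is scored by a task performance metric, can be sketched as follows. All names are illustrative, and the `evaluate` callback stands in for the expensive step of training the model with only the candidate vocabulary active and measuring its metric.

```python
import random

def sample_vocabularies(feature_values, size, rng):
    """Sample one candidate active vocabulary (a subset of values) per
    categorical feature; names and sampling scheme are assumptions."""
    return {feat: set(rng.sample(sorted(vals), min(size, len(vals))))
            for feat, vals in feature_values.items()}

def best_vocabulary(feature_values, evaluate, batch_size=4, size=2, seed=0):
    """Generate a batch of candidate vocabularies, score each with the
    caller-supplied `evaluate` callback (a stand-in for training plus a
    performance metric), and keep the highest-scoring candidate."""
    rng = random.Random(seed)
    batch = [sample_vocabularies(feature_values, size, rng)
             for _ in range(batch_size)]
    return max(batch, key=evaluate)
```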
Unusual score generators for a neuro-linguistic behavioral recognition system
Techniques are disclosed for generating anomaly scores for a neuro-linguistic model of input data obtained from one or more sources. According to one embodiment, generating anomaly scores includes receiving a stream of symbols generated from an ordered stream of normalized vectors, which in turn are generated from input data received from one or more sensor devices during a first time period. Upon receiving the stream of symbols, the techniques include generating a set of words based on occurrences of groups of symbols from the stream, determining a number of previous occurrences of a first word of the set, determining a number of previous occurrences of words of the same length as the first word, and determining a first anomaly score based on those two numbers of previous occurrences.
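A score built from those two counts might look like the sketch below: the rarer a word is among previously seen words of the same length, the higher its score. The exact scoring formula is an assumption; the abstract only specifies which counts feed it.

```python
from collections import Counter

def anomaly_score(word, word_counts):
    """Illustrative unusualness score: how rare `word` is relative to all
    previously seen words of the same length. `word_counts` maps each
    previously observed word to its number of occurrences."""
    same_length_total = sum(c for w, c in word_counts.items()
                            if len(w) == len(word))
    if same_length_total == 0:
        return 1.0  # no word of this length seen before: maximally unusual
    return 1.0 - word_counts.get(word, 0) / same_length_total
```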
Natural language processing for mapping dependency data and parts-of-speech to group labels
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for information extraction using natural language processing. One of the methods includes determining, for one or more tokens from a plurality of tokens that represent an unstructured sentence, a token type from a plurality of predetermined token types that indicates an element type for a phrase that corresponds to the token and has one or more properties using dependency data and a part-of-speech label for the token; assigning, for a token whose associated dependency data indicates that the token has a child, data for the child token to one of the one or more properties for the token type of the token; and providing, for use by a downstream semantic system and for the token, a textual representation of the phrase for the token and the phrases for one or more of the child tokens.
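The mapping from dependency data and part-of-speech labels to token types, and the assignment of child tokens to properties, can be sketched as below. The lookup table and token/field names are toy assumptions, not the patent's actual mapping.

```python
def token_type(pos, dep):
    """Toy mapping from (part-of-speech, dependency label) to an element
    type for the token's phrase; the table here is illustrative only."""
    table = {("NOUN", "nsubj"): "agent",
             ("NOUN", "dobj"): "object",
             ("VERB", "ROOT"): "action"}
    return table.get((pos, dep), "other")

def phrase_for(token, children):
    """Assign each child token's text to a property keyed by the child's
    token type, then render a textual representation of the phrase from
    the token and its children for a downstream semantic system."""
    props = {token_type(c["pos"], c["dep"]): c["text"] for c in children}
    text = " ".join([token["text"]] + [c["text"] for c in children])
    return {"type": token_type(token["pos"], token["dep"]),
            "properties": props, "text": text}
```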
Detecting cross-lingual comparable listings
In various example embodiments, a system and method are provided for a Listing Engine that translates a first listing from a first language to a second language. The first listing includes one or more images of a first item. The Listing Engine provides, as input to an encoded neural network model, one or more portions of the translated first listing and one or more portions of a second listing in the second language. The second listing includes one or more images of a second item. The Listing Engine receives from the encoded neural network model a first feature vector for the translated first listing and a second feature vector for the second listing. Both feature vectors include at least one type of image signature feature and at least one type of listing text-based feature. When a similarity score between the first and second feature vectors meets a similarity score threshold, the Listing Engine pairs the first listing in the first language with the second listing in the second language for inclusion in the training data of a machine translation system.
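The final pairing step reduces to comparing the two feature vectors against a threshold. The sketch below assumes cosine similarity; the abstract does not specify the similarity measure, and the threshold value is likewise illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_listings(vec_first, vec_second, threshold=0.8):
    """Return the similarity score and whether the translated first listing
    and the second listing should be paired as comparable (score meets the
    threshold). The 0.8 threshold is an assumed placeholder."""
    score = cosine_similarity(vec_first, vec_second)
    return score, score >= threshold
```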
Methods, apparatuses, devices, and computer-readable storage media for determining category of entity
According to embodiments of the present disclosure, a method, an apparatus, a device, and a computer-readable storage medium for determining a category of an entity are provided. The method includes: based on a suffix of the entity, obtaining a suffix feature associated with the suffix; determining one or more candidate categories of the entity based on a name of the entity; and determining a set of categories of the entity based on the one or more candidate categories and the suffix feature.
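The flow of combining name-derived candidate categories with a suffix feature can be sketched as a set intersection. The lookup tables, the whitespace-based suffix extraction, and the fallback behavior are all assumptions for illustration.

```python
def categorize_entity(name, suffix_categories, name_categories):
    """Intersect candidate categories derived from the entity name with
    categories implied by its suffix; both lookup tables are illustrative
    stand-ins for learned features."""
    suffix = name.rsplit(" ", 1)[-1]  # e.g. "Hospital" in "City Hospital"
    from_suffix = suffix_categories.get(suffix, set())
    candidates = name_categories.get(name, set())
    # If the suffix carries no signal, fall back to the name-based candidates.
    return candidates & from_suffix if from_suffix else candidates
```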
Contextualizing searches in a collaborative session
A computer-implemented method, computer system, and computer program product for contextualizing searches in a collaborative session having two or more users. The method may include generating, by a processor, one or more keywords from user context sources of the collaborative session. Users engaged in the collaborative session may use computing devices interconnected via a collaborative tool. Each user context source may comprise a document, a file, a webpage, a search history, or an application. The context of the collaborative session, which has a start and a stop, may be established using a natural language processing system. The method may include adding one of the one or more keywords to the search string of one of the users participating in the collaborative session. In some embodiments, one user may be an expert user whose user context source is the only user context source collected and analyzed during the collaborative session.
Named entity disambiguation using entity distance in a knowledge graph
According to an embodiment, a method includes converting a knowledge base into a graph. In this embodiment, the knowledge base contains a plurality of entities and specifies a plurality of relationships among those entities; entities in the knowledge base correspond to vertices in the graph, and relationships between entities correspond to edges between vertices. The method may also include extracting a plurality of vertex embeddings from the graph. An example vertex embedding represents, for a particular vertex, a proximity of that vertex to other vertices of the graph. Further, the method may include performing, based at least in part on the plurality of vertex embeddings, entity linking between input text and the knowledge base.
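The knowledge-base-to-graph conversion and a proximity-style vertex embedding can be sketched as below. The hop-distance embedding is a deliberately simple stand-in for learned graph embeddings (e.g. random-walk methods such as DeepWalk or node2vec); the patent does not specify the embedding technique.

```python
from collections import deque

def kb_to_graph(relations):
    """Entities become vertices; each (entity, entity) relationship becomes
    an undirected edge between the corresponding vertices."""
    graph = {}
    for a, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def vertex_embedding(graph, start):
    """Toy proximity embedding for `start`: one coordinate per reachable
    vertex holding 1 / (1 + hop distance), computed by breadth-first search.
    Closer vertices get larger coordinates."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {v: 1.0 / (1 + dist[v]) for v in sorted(graph) if v in dist}
```

Entity linking would then compare embeddings of candidate vertices against mentions in the input text, e.g. preferring candidates whose embeddings are close to those of already-linked entities.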
Discovering ranked domain relevant terms using knowledge
One embodiment of the invention provides a method for terminology ranking for use in natural language processing. The method comprises receiving a list of terms extracted from a corpus, where the list comprises a ranking of the terms based on frequencies of the terms across the corpus. The method further comprises accessing a domain ontology associated with the corpus, and re-ranking the list based on the domain ontology. The resulting re-ranked list comprises a different ranking of the terms based on relevance of the terms using knowledge from the domain ontology. The method further comprises generating clusters of terms via a trained model adapted to the corpus, and boosting a rank of at least one term of the re-ranked list based on the clusters to increase a relevance of the at least one term using knowledge from the trained model.
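The re-ranking-plus-boosting pipeline above can be sketched with a simple additive scoring scheme. The scoring weights, and the rule that a term is boosted when it shares a cluster with an ontology term, are assumptions; the abstract specifies only that ontology knowledge re-ranks the list and cluster knowledge boosts individual terms.

```python
def rerank_terms(freq_ranked, ontology_terms, clusters, boost=1.0):
    """Re-rank a frequency-ranked term list: terms found in the domain
    ontology are promoted first, then any term sharing a cluster with an
    ontology term gets an additional boost (illustrative scheme)."""
    n = len(freq_ranked)
    scores = {}
    for rank, term in enumerate(freq_ranked):
        score = n - rank                       # preserve frequency order as a base
        if term in ontology_terms:
            score += n                         # ontology relevance dominates
        if any(term in c and ontology_terms & c for c in clusters):
            score += boost * n                 # cluster-based boost
        scores[term] = score
    return sorted(freq_ranked, key=lambda t: -scores[t])
```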