Patent classifications
G10L2015/227
MULTI-TIER SPEECH PROCESSING AND CONTENT OPERATIONS
A multi-tier architecture is provided for processing user voice queries and making routing decisions for generating responses, including responses to book browsing requests and other content requests. When an utterance is associated with multiple applications in a given domain, the applications may be organized into a subdomain and a tier of routing decisions may be added to the inter-domain and intra-domain routing decision system. The system uses contextual signals to make subdomain routing decisions, including signals regarding content items that are already in a user's content catalog, consumption status of individual content items in the user's catalog, and the like.
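The subdomain routing tier described above can be illustrated with a minimal sketch. It assumes a rule-based router as a stand-in for whatever decision logic the patent actually uses; the `CatalogContext` class, the `route_book_request` function, and the application names are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogContext:
    """Contextual signals about the user's content catalog (hypothetical shape)."""
    owned: set = field(default_factory=set)        # titles already in the catalog
    in_progress: set = field(default_factory=set)  # titles partly consumed


def route_book_request(title: str, ctx: CatalogContext) -> str:
    """Pick an application inside an assumed 'books' subdomain.

    The rules below are illustrative assumptions, not the patented logic.
    """
    if title in ctx.in_progress:
        return "reader.resume"   # consumption status says the user is mid-book
    if title in ctx.owned:
        return "reader.open"     # item is in the catalog but not started
    return "store.browse"        # item is not owned, so treat as browsing


ctx = CatalogContext(owned={"Dune", "Emma"}, in_progress={"Dune"})
print(route_book_request("Dune", ctx))
print(route_book_request("Ulysses", ctx))
```

Here the same spoken title leads to a different application depending only on catalog membership and consumption status, which is the kind of contextual signal the abstract names.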
MULTI-DOMAIN INTENT HANDLING WITH CROSS-DOMAIN CONTEXTUAL SIGNALS
A multi-tier domain is provided for processing user voice queries and making routing decisions for generating responses, including for user voice queries that include multi-domain trigger words or phrases. When an utterance is recognized as different intents in different domains, a routing system for a domain may consider contextual signals, including those associated with other domains, to determine whether the domain is the proper one to handle the request. This determination can be performed with a statistical model specifically trained to make such determinations using the available contextual data.
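As a rough illustration of a domain deciding whether to claim an utterance from cross-domain context, the sketch below uses a logistic score. The abstract only says a trained statistical model is used; the logistic form, the signal names, and the weight values here are all assumptions chosen for the example, and in practice the weights would be learned rather than written by hand.

```python
import math


def claim_probability(signals, weights, bias):
    """Logistic score that this domain should handle the request.

    signals: contextual feature values, including ones from other domains.
    weights, bias: hypothetical parameters standing in for a trained model.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative parameters for an assumed "music" domain.
MUSIC_WEIGHTS = {
    "own_intent_confidence": 2.0,   # how sure the music NLU is about its intent
    "video_session_active": -3.0,   # cross-domain signal: a video is playing
}
MUSIC_BIAS = -1.0

idle = {"own_intent_confidence": 0.8, "video_session_active": 0.0}
watching = {"own_intent_confidence": 0.8, "video_session_active": 1.0}

print(claim_probability(idle, MUSIC_WEIGHTS, MUSIC_BIAS) > 0.5)
print(claim_probability(watching, MUSIC_WEIGHTS, MUSIC_BIAS) > 0.5)
```

With identical intent confidence, the cross-domain signal alone moves the score from above 0.5 to below it, so the music domain would decline the request while a video session is active.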
Apparatus, systems and methods for determining a commentary rating
Commentary rating determination systems and methods determine a commentary rating for commentary about a subject media content event that has been generated by a community member. An exemplary embodiment receives video information acquired by a 360° video camera, identifies a physical object from the received video information, determines a physical attribute associated with the identified physical object, wherein the determined physical attribute describes a characteristic of the identified physical object, compares the determined physical attribute of the identified physical object with a plurality of predefined physical object attributes stored in a database, and in response to identifying one of the plurality of predefined physical object attributes that matches the determined physical attribute, associates the quality value of the identified one of the plurality of predefined physical object attributes with the identified physical object. Then, the commentary rating is determined for the commentary based on the associated quality value.
Electronic device for processing user utterance and controlling method thereof
A system includes at least one communication interface, at least one processor operatively connected to the at least one communication interface, and at least one memory operatively connected to the at least one processor and storing a plurality of natural language understanding (NLU) models. The at least one memory stores instructions that, when executed, cause the processor to receive first information associated with a user from an external electronic device associated with a user account, using the at least one communication interface, to select at least one of the plurality of NLU models, based on at least part of the first information, and to transmit the selected at least one NLU model to the external electronic device, using the at least one communication interface such that the external electronic device uses the selected at least one NLU model for natural language processing.
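The server-side selection step can be sketched as follows. The abstract does not say what the "first information" contains, so this example assumes it is a per-domain usage count; `select_nlu_models`, the `app_usage` key, and the `limit` parameter are hypothetical names introduced for illustration.

```python
def select_nlu_models(user_info, available_models, limit=2):
    """Choose which stored NLU models to send to the user's device.

    Assumption: user_info["app_usage"] maps a model/domain name to how
    often the user invoked it. Models the user never used are skipped.
    """
    usage = user_info.get("app_usage", {})
    used = [name for name in available_models if usage.get(name, 0) > 0]
    used.sort(key=lambda name: usage[name], reverse=True)  # most used first
    return used[:limit]


user_info = {"app_usage": {"music": 40, "alarm": 12, "maps": 3}}
stored = ["alarm", "maps", "music", "shopping"]
print(select_nlu_models(user_info, stored))
```

Sending only the top few models keeps the on-device footprint small, which is one plausible reason to select per user rather than shipping every model.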
Noise data augmentation for natural language processing
Techniques are provided for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof, irrespective of the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
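A minimal sketch of the augmentation step follows. It makes two assumptions not stated in the abstract: the "predefined augmentation ratio" is read as the number of noisy copies generated per original utterance, and noise is inserted as a contiguous run of words at a random position. The function and parameter names are hypothetical.

```python
import random


def augment_with_noise(utterances, noise_words, ratio, noise_len=2, seed=0):
    """Return the original (text, intent) pairs plus noisy copies.

    ratio: noisy copies per original utterance (assumed interpretation).
    noise_len: number of noise words inserted into each copy (assumption).
    """
    rng = random.Random(seed)          # seeded for reproducible augmentation
    augmented = list(utterances)
    for text, intent in utterances:
        tokens = text.split()
        for _ in range(ratio):
            noise = [rng.choice(noise_words) for _ in range(noise_len)]
            pos = rng.randint(0, len(tokens))   # position relative to original text
            noisy = tokens[:pos] + noise + tokens[pos:]
            augmented.append((" ".join(noisy), intent))  # label is unchanged
    return augmented


train = [("book a flight", "travel"), ("play some jazz", "music")]
noise = ["lorem", "ipsum", "dolor"]
for text, intent in augment_with_noise(train, noise, ratio=2):
    print(intent, "|", text)
```

Each noisy copy keeps the original words in order and the original intent label, so the classifier sees the same intent surrounded by irrelevant text.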
Terminal device, server and controlling method thereof
A terminal device is provided and includes a communication interface including circuitry, a display and at least one processor configured to: control the communication interface to transmit a user voice including a plurality of intents to an external server; based on word use information included in the user voice and summary information regarding the user voice generated based on user-related information being received from the external server, control the display to display the received summary information; based on a user feedback regarding the summary information being input, transmit information regarding the user feedback to the external server; and based on response information regarding the user voice generated based on the user feedback being received from the external server, control the display to provide the response information.
Multicomputer System Providing Voice Enabled Event Processing
Arrangements for voice enabled event processing are provided. In some aspects, a self-service kiosk may detect a mobile device of a user and a connection may be established between the self-service kiosk and the mobile device. The user may request, via natural language data input, processing of an event, such as a transaction. The natural language data input may be captured by the mobile device of the user and transmitted to the self-service kiosk or other processing device. The natural language input may be processed to identify the requested event. Based on the processed natural language data, an event processing request may be generated. Based on processing the event, one or more event processing commands may be generated. The event processing commands may be executed to perform one or more functions associated with completion of the event processing (e.g., distributing funds, activating a deposit receptacle, or the like).
Provision of targeted advertisements based on user intent, emotion and context
An electronic device and method are disclosed herein. The electronic device includes a microphone, a camera, an output device, a memory, and a processor. The processor implements the method, including receiving a first voice input and/or capturing an image, analyzing the first voice input or the image to determine at least one of a user's intent, emotion, and situation based on predefined keywords and expressions, identifying a category based on the input, selecting first information based on the category, selecting and outputting a first query prompting confirmation of output of the first information, detecting a first responsive input to the first query, when a condition to output the first information is satisfied, outputting a second query, detecting a second input responsive to the second query, and selectively outputting the first information based on the second input.
Apparatus control system and apparatus control method
A voice inputting device inputs a voice operation of a user, and transmits voice data based on the voice operation to a first cloud server. The first cloud server receives the voice data from the voice inputting device, analyzes the received voice data, and determines an operational skill level of the user and the details of the voice operation. A second cloud server generates a control command for an air conditioner based on the operational skill level and the details of the voice operation determined by the first cloud server, and transmits the generated control command to the air conditioner.
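The second cloud server's role can be pictured with a small sketch. The abstract does not say how the skill level changes the command, so the policy below (clamping a beginner's requested temperature and forcing an automatic mode) is purely an assumption, as are the function name, the field names, and the 20 to 28 °C range.

```python
def build_command(skill_level, operation):
    """Turn an analyzed voice operation into an air conditioner command.

    skill_level: "beginner" or "expert" (assumed categories).
    operation: dict with "target_temp_c" and optionally "mode" (assumed shape).
    """
    target = operation["target_temp_c"]
    if skill_level == "beginner":
        # Assumed policy: keep novice requests inside a conservative range
        # and let the unit choose its own operating mode.
        target = max(20, min(28, target))
        return {"mode": "auto", "target_temp_c": target}
    # Assumed policy: experienced users get exactly what they asked for.
    return {"mode": operation.get("mode", "auto"), "target_temp_c": target}


request = {"target_temp_c": 16, "mode": "cool"}
print(build_command("beginner", request))
print(build_command("expert", request))
```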
Dynamic multilingual speech recognition
A method, computer program product, and system in which a processor(s) monitors multilingual switches performed on a client on behalf of a given user. Based on the monitoring, the processor(s) identifies switch patterns of the given user to generate a service profile for the user of machine learned multilingual switch patterns for the given user. The processor(s) determines a priority order for languages comprising the voice input streams, for the given user. The processor(s) obtains a new translation request initiated by the client on behalf of the given user, and applies the priority order to identify one or more languages spoken in a voice input stream of the new translation request. The processor(s) transmits indicators of the identified one or more languages to the client, where, upon receiving the indicators, the client translates the voice input stream from the identified one or more languages to one or more target languages.
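One simple way to picture the priority order is a frequency count over the user's past language switches. This is a sketch under assumptions: the abstract refers to machine-learned switch patterns, whereas the code below just counts how often the user switched into each language. The log format and both function names are hypothetical.

```python
from collections import Counter


def language_priority(switch_log):
    """Order languages by how often the user switched into them.

    switch_log: list of (from_language, to_language) pairs (assumed format).
    """
    counts = Counter(target for _source, target in switch_log)
    return [lang for lang, _n in counts.most_common()]


def rank_candidates(candidates, priority):
    """Sort candidate languages for a new request by the user's priority order.

    Languages missing from the priority list are placed last.
    """
    rank = {lang: i for i, lang in enumerate(priority)}
    return sorted(candidates, key=lambda lang: rank.get(lang, len(priority)))


log = [("en", "es"), ("es", "en"), ("en", "es"), ("en", "fr"), ("fr", "es")]
priority = language_priority(log)
print(priority)
print(rank_candidates(["fr", "de", "es"], priority))
```

A recognizer could then try the candidate languages in this order, so the languages the user most often switches into are checked first.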