Patent classifications
G10L2015/228
Electronic device for providing update information through an artificial intelligence agent service
An electronic device is provided that includes a processor and a memory operatively connected to the processor. The memory stores instructions that, when executed, cause the processor to: acquire a first assistant result including data indicative of a first intent understood from an utterance of a first user, data indicative of an attribute of the utterance, first information provided to a terminal of the first user as a response of an artificial intelligence (AI) agent to the utterance, and a first parameter indicative of an attribute of the first information; recognize the utterance as an information request utterance based on the first intent and the attribute; and, when the utterance is recognized as an information request utterance and the first parameter is identified as being of a specified type, track second information to be provided to the first user terminal as update information for the first information.
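The acquire/recognize/track logic described above can be sketched in Python. All class names, intent labels, and parameter types below are invented stand-ins for demonstration, not the patent's actual interfaces:

```python
from dataclasses import dataclass

# Hypothetical structure standing in for the "first assistant result";
# field names and values are assumptions.
@dataclass
class AssistantResult:
    intent: str            # first intent understood from the utterance
    attribute: str         # attribute of the utterance (e.g. "question")
    information: str       # first information returned to the user terminal
    parameter_type: str    # attribute of that information

# Parameter types whose underlying information may change over time
# (the "specified type" that makes the information worth tracking).
TRACKABLE_TYPES = {"volatile", "time_sensitive"}

def is_information_request(result: AssistantResult) -> bool:
    # Recognize an information-request utterance from intent + attribute.
    return result.intent == "get_info" and result.attribute == "question"

def should_track_updates(result: AssistantResult) -> bool:
    # Track second (update) information only when the utterance is an
    # information request AND the first parameter is of a specified type.
    return is_information_request(result) and result.parameter_type in TRACKABLE_TYPES

weather = AssistantResult("get_info", "question", "Rain expected at 6pm", "time_sensitive")
fact = AssistantResult("get_info", "question", "Mt. Everest is 8,849 m", "static")
```

With these toy inputs, only the weather answer (a time-sensitive parameter) is flagged for update tracking; the static fact is not.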
Voice user interface for intervening in conversation of at least one user by adjusting two different thresholds
An electronic device is provided. The electronic device includes a memory configured to store at least one instruction, and at least one processor. The at least one processor is configured to execute the instruction to: obtain voice data from a conversation of at least one user; convert the voice data to text data; determine at least one parameter indicating a characteristic of the conversation based on at least one of the voice data or the text data; adjust a condition for triggering intervention in the conversation based on the determined at least one parameter; and output feedback based on the text data when the adjusted condition is satisfied. The adjustment of the condition includes adjusting a first threshold and a second threshold based on a change of the at least one parameter.
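A minimal sketch of the two-threshold adjustment, assuming hypothetical conversation parameters (speech rate in words/second and pause ratio) and an illustrative mapping from those parameters to the thresholds; none of these formulas come from the patent:

```python
def adjust_thresholds(base_confidence, base_relevance, speech_rate, pause_ratio):
    # Fast, dense conversation -> raise the confidence bar before intervening.
    confidence_t = base_confidence + 0.1 * max(0.0, speech_rate - 2.0)
    # Long pauses -> lower the relevance threshold, intervene more readily.
    relevance_t = base_relevance - 0.2 * pause_ratio
    return confidence_t, max(0.0, relevance_t)

def should_intervene(confidence, relevance, thresholds):
    confidence_t, relevance_t = thresholds
    # Feedback is output only when both adjusted conditions are satisfied.
    return confidence >= confidence_t and relevance >= relevance_t

# A lively conversation tightens the trigger; an idle one relaxes it.
busy = adjust_thresholds(0.7, 0.5, speech_rate=3.0, pause_ratio=0.05)
idle = adjust_thresholds(0.7, 0.5, speech_rate=1.0, pause_ratio=0.5)
```

The same candidate feedback (confidence 0.75, relevance 0.45) would be spoken into the idle conversation but suppressed in the busy one.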
Agent system and information processing method
An agent system includes: a recognizer configured to recognize speech including speech contents of an occupant in a mobile object; an acquirer configured to acquire an image including the occupant; and an estimator configured to compare wording included in the speech contents recognized by the recognizer with unclear information, stored in a storage, that includes wording which makes speech contents unclear. When the speech contents of the occupant include unclear wording, the estimator estimates, on the basis of the image acquired by the acquirer, a first direction which is a sight direction of the occupant or a second direction which is indicated by the occupant, and estimates an object located in the estimated first direction or the estimated second direction. The recognizer is configured to recognize the speech contents of the occupant on the basis of the object estimated by the estimator.
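The estimator's behavior can be illustrated with a toy sketch. The stored unclear-wording list, the direction-keyed scene map, and the degree values are all invented for demonstration:

```python
# Stored "unclear information": wording that makes speech contents unclear.
UNCLEAR_WORDS = {"that", "this", "over there", "it"}

# Objects visible around the mobile object, keyed by direction in degrees.
SCENE = {
    45: "convenience store",
    90: "gas station",
    270: "park",
}

def contains_unclear_wording(utterance: str) -> bool:
    # Compare the occupant's wording against the stored unclear wording.
    tokens = utterance.lower().split()
    return any((w in tokens if " " not in w else w in utterance.lower())
               for w in UNCLEAR_WORDS)

def nearest_object(direction: float) -> str:
    # Estimate the object located in the estimated sight/pointing direction.
    best = min(SCENE, key=lambda d: abs(d - direction))
    return SCENE[best]

def resolve(utterance: str, gaze_direction: float):
    # Fall back to the image-based estimate only when wording is unclear.
    if contains_unclear_wording(utterance):
        return nearest_object(gaze_direction)
    return None
```

So "what is that" with a gaze toward 80 degrees resolves to the gas station, while an already-explicit utterance needs no image-based estimation.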
Dialogue system and dialogue processing method
A dialogue system includes: a storage configured to store a parameter tree including at least one parameter used for performing an action; a speech input device configured to receive speech from a user; an input processor configured to apply a natural language understanding algorithm to the received speech to generate a speech recognition result; a dialogue manager configured to determine an action corresponding to the received speech based on the speech recognition result, to retrieve a parameter tree corresponding to the action from the storage, and to determine additional information needed to perform the action based on the retrieved parameter tree; and a result processor configured to generate a dialogue response for requesting the additional information.
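A sketch of how the dialogue manager might walk a stored parameter tree to find the additional information to request. The tree shape and the "navigate" action are illustrative assumptions:

```python
# A minimal parameter tree: each node is a parameter needed for an action,
# with children that become relevant once the parent is filled.
PARAMETER_TREES = {
    "navigate": {
        "destination": {
            "children": {"route_preference": {"children": {}}},
        },
    },
}

def missing_parameters(action, filled, tree=None):
    """Walk the action's parameter tree and collect parameters still needed."""
    tree = PARAMETER_TREES[action] if tree is None else tree
    missing = []
    for name, node in tree.items():
        if name not in filled:
            missing.append(name)   # ask for this before descending further
        else:
            missing += missing_parameters(action, filled, node["children"])
    return missing

def dialogue_response(action, filled):
    # The result processor turns any gap into a request for information.
    missing = missing_parameters(action, filled)
    if missing:
        return f"Please provide: {', '.join(missing)}"
    return "Performing action."
```

With no slots filled the system asks for the destination first; once that arrives, the tree exposes the next needed parameter.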
Resolving natural language ambiguities with respect to a simulated reality setting
The present disclosure relates to resolving natural language ambiguities with respect to a simulated reality setting. In an exemplary embodiment, a simulated reality setting having one or more virtual objects is displayed. A stream of gaze events is generated from the simulated reality setting and a stream of gaze data. A speech input is received within a time period and a domain is determined based on a text representation of the speech input. Based on the time period and a plurality of event times for the stream of gaze events, one or more gaze events are identified from the stream of gaze events. The identified one or more gaze events are used to determine a parameter value for an unresolved parameter of the domain. A set of tasks representing a user intent for the speech input is determined based on the parameter value, and the set of tasks is performed.
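The time-window matching between gaze events and the speech period can be sketched as follows; the event stream, timestamps, and object names are invented examples:

```python
# Hypothetical gaze-event stream: (event_time_seconds, fixated_virtual_object).
gaze_events = [
    (1.0, "lamp"),
    (2.5, "bookshelf"),
    (4.2, "lamp"),
    (9.0, "window"),
]

def events_in_window(events, start, end):
    # Identify the gaze events whose event times fall within the period
    # in which the speech input was received.
    return [obj for t, obj in events if start <= t <= end]

def resolve_parameter(events, speech_start, speech_end):
    """Fill an unresolved domain parameter (e.g. the referent of "turn it
    on") with the object gazed at most often during the speech period."""
    candidates = events_in_window(events, speech_start, speech_end)
    if not candidates:
        return None
    return max(set(candidates), key=candidates.count)
```

For a speech input spanning 0.5 s to 5.0 s, the lamp (fixated twice in that window) becomes the parameter value.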
Electronic apparatus including a control command identification tool generated by using a control command identified by voice recognition, identifying a control command corresponding to a user voice, and control method thereof
An electronic apparatus is provided. The electronic apparatus includes a microphone, a transceiver, a memory configured to store a control command identification tool based on control commands identified by a voice recognition server that performs voice recognition processing on user voices received from the electronic apparatus, and at least one processor configured to: based on a user voice being received through the microphone, acquire user intention information by performing voice recognition processing on the received user voice; receive, from a device control server, status information of a plurality of external devices related to the acquired user intention information; identify a control command for controlling a device to be controlled among the plurality of external devices by applying the acquired user intention information and the received status information to the control command identification tool; and transmit the identified control command to the device control server.
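A toy version of the identification step, with the "tool" reduced to a lookup keyed by intent and device status. Every intent label, status string, and command here is invented:

```python
# A toy "control command identification tool": a lookup assembled from
# commands previously identified by the voice recognition server.
COMMAND_TOOL = {
    ("lower_temperature", "ac_off"): {"device": "air_conditioner", "command": "power_on"},
    ("lower_temperature", "ac_on"):  {"device": "air_conditioner", "command": "temp_down"},
}

def identify_command(intent, device_status):
    # Apply user intention information + external-device status to the tool
    # to pick the device to be controlled and the command to send.
    return COMMAND_TOOL.get((intent, device_status))

def handle_user_voice(intent, status_info):
    command = identify_command(intent, status_info)
    if command is None:
        return None
    # In the patent, the identified command is transmitted to the
    # device control server; here we simply return it.
    return command
```

Note how the same intent ("lower the temperature") maps to different commands depending on whether the air conditioner is currently off or on.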
Development of voice and other interaction applications
Among other things, a developer of an interaction application for an enterprise can create items of content to be provided to an assistant platform for use in responses to requests of end-users. The developer can deploy the interaction application using defined items of content and an available general interaction model including intents and sample utterances having slots. The developer can deploy the interaction application without requiring the developer to formulate any of the intents, sample utterances, or slots of the general interaction model.
Method and apparatus with speech processing
Disclosed are a method and apparatus for processing speech. The method includes obtaining context information from a speech signal of a user using a neural network-based encoder, determining intent information of the speech signal based on the context information, determining, based on the context information, attention information corresponding to a segment included in the speech signal, and determining, based on the attention information, a segment value of the segment by recognizing, using a decoder, a portion of the context information identified as corresponding to the segment.
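The attention step can be illustrated with a minimal dot-product attention over context positions. The vectors, tokens, and the "contact name" query are invented stand-ins for real encoder outputs, not the patent's model:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(segment_query, context_keys, context_tokens):
    """Score each context position against a segment query (dot product),
    then decode the segment value from the most-attended position."""
    scores = [sum(q * k for q, k in zip(segment_query, key)) for key in context_keys]
    weights = softmax(scores)
    best = max(range(len(weights)), key=lambda i: weights[i])
    return context_tokens[best], weights

tokens = ["call", "mom", "tomorrow"]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy per-token context vectors
query = [0.1, 0.9]                            # query for a "contact" segment
```

Here the attention information points at the second position, so the contact segment is filled with "mom".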
Automated conversation content items from natural language
A conversation augmentation system can automatically augment a conversation with content items based on natural language from the conversation. The conversation augmentation system can select content items to add to the conversation based on determined user “intents” generated using machine learning models. The conversation augmentation system can generate intents for natural language from various sources, such as video chats, audio conversations, textual conversations, virtual reality environments, etc. The conversation augmentation system can identify constraints for mapping the intents to content items or context signals for selecting appropriate content items. In various implementations, the conversation augmentation system can add selected content items to a storyline the conversation describes or can augment a platform in which an unstructured conversation is occurring.
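The intent-to-content mapping with constraints can be sketched as a filtered lookup. The catalog, intent shape, and constraint keys are all illustrative assumptions:

```python
# Invented catalog of content items with attributes usable as constraints.
CONTENT_ITEMS = [
    {"id": "photo_hawaii", "type": "photo", "topic": "travel"},
    {"id": "doc_budget",   "type": "document", "topic": "finance"},
    {"id": "map_hawaii",   "type": "map", "topic": "travel"},
]

def select_content(intent, constraints):
    """Map a determined intent to content items, keeping only items that
    satisfy the identified constraints (illustrative matching logic)."""
    topic = intent["topic"]
    return [item["id"] for item in CONTENT_ITEMS
            if item["topic"] == topic
            and all(item.get(k) == v for k, v in constraints.items())]

def augment_conversation(storyline, intent, constraints):
    # Add the selected content items to the storyline the conversation
    # describes (here, simply an ordered list of item ids).
    return storyline + select_content(intent, constraints)
```

A travel intent with a `type: map` constraint surfaces only the map, while the unconstrained intent pulls in every travel item.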
Natural language input routing
Techniques for performing runtime ranking of skill components are described. A skill developer may generate a rule indicating that a skill component is to be invoked at runtime when a natural language input corresponds to a specific context. At runtime, a virtual assistant system may implement a machine-learned model to generate an initial ranking of skill components. Thereafter, the virtual assistant system may use skill component-specific rules to adjust the initial ranking, and this second ranking is used to determine which skill component to invoke to respond to the natural language input. Over time, if a rule results in beneficial user experiences, the virtual assistant system may incorporate the rule into the machine-learned model.
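The two-stage ranking can be sketched as a model score plus rule-based boosts. The scores are hard-coded stand-ins for a machine-learned ranker, and the rule triples are invented examples of what a skill developer might register:

```python
def rank_skills(model_scores, rules, context):
    """Adjust the initial model ranking with skill-specific rules, then
    return skills in the second (final) ranking order."""
    adjusted = dict(model_scores)
    for skill, rule_context, boost in rules:
        # A rule fires only when the input matches its specific context.
        if rule_context == context and skill in adjusted:
            adjusted[skill] += boost
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Initial (first-stage) scores, assumed to come from the ML ranker.
model_scores = {"music_skill": 0.6, "podcast_skill": 0.55}
# Developer rule: prefer the podcast skill when the user is in a vehicle.
rules = [("podcast_skill", "in_vehicle", 0.2)]
```

In the vehicle context the rule flips the ranking in favor of the podcast skill; elsewhere the initial model ordering stands.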