G10L2015/227

Electronic device and method for controlling the same

An electronic device is provided. The electronic device includes a microphone to receive audio, a communicator, a memory configured to store computer-executable instructions, and a processor configured to execute the computer-executable instructions. The processor is configured to determine whether the received audio includes a predetermined trigger word; based on determining that the predetermined trigger word is included in the received audio, activate a speech recognition function of the electronic device; detect a movement of a user while the speech recognition function is activated; and based on detecting the movement of the user, transmit a control signal to a second electronic device to activate a speech recognition function of the second electronic device.
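
The control flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Device` class, the `TRIGGER_WORD` value, and the method names are hypothetical, and real trigger-word detection and motion sensing are replaced by a text transcript and an explicit method call.

```python
from dataclasses import dataclass, field

TRIGGER_WORD = "hello device"  # hypothetical trigger word, not from the source


@dataclass
class Device:
    name: str
    recognition_active: bool = False
    sent_signals: list = field(default_factory=list)

    def on_audio(self, transcript: str) -> None:
        # Activate speech recognition only when the trigger word is present.
        if TRIGGER_WORD in transcript.lower():
            self.recognition_active = True

    def on_user_movement(self, second_device: "Device") -> None:
        # Send the control signal only while recognition is active here.
        if self.recognition_active:
            self.sent_signals.append(("activate_recognition", second_device.name))
            second_device.recognition_active = True


living_room = Device("living_room")
kitchen = Device("kitchen")
living_room.on_audio("Hello device, play some music")
living_room.on_user_movement(kitchen)
```

In this sketch a movement detected before the trigger word has been heard sends nothing, which mirrors the condition that movement is detected while the speech recognition function is activated.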

Agent system, and, information processing method

An agent system includes: a recognizer configured to recognize speech including speech contents of an occupant in a mobile object; an acquirer configured to acquire an image including the occupant; and an estimator configured to compare wording included in the speech contents of the occupant recognized by the recognizer with unclear information which is stored in a storage and includes wording making the speech contents unclear, to estimate a first direction which is a sight direction of the occupant or a second direction which is indicated by the occupant on the basis of the image acquired by the acquirer when the speech contents of the occupant include unclear wording, and to estimate an object which is located in the estimated first direction or the estimated second direction. The recognizer is configured to recognize the speech contents of the occupant on the basis of the object estimated by the estimator.

Method and system for completing an operation

A computer server system comprises a communications module; a processor coupled with the communications module; and a memory coupled to the processor and storing processor-executable instructions which, when executed by the processor, configure the processor to receive, via the communications module and from a server associated with a first device, a request to perform an operation; determine that the first device cannot perform the operation; send, via the communications module and to the server associated with the first device, a signal causing the first device to output a message indicating that the first device cannot perform the operation and requesting authentication from a second device; receive, via the communications module and from the second device, a signal including authentication information; and send, via the communications module and to the second device, a signal including a selectable option to perform the operation.

Sentiment aware voice user interface

Described herein is a system for responding to a frustrated user with a response determined based on spoken language understanding (SLU) processing of a user input. The system detects user frustration and responds to a repeated user input by confirming an action to be performed or presenting an alternative action, instead of performing the action responsive to the user input. The system also detects poor audio quality of the captured user input, and responds by requesting the user to repeat the user input. The system processes sentiment data and signal quality data to respond to user inputs.
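
One way to picture the response selection in this abstract is a small decision function. The sketch below is an assumption-laden illustration: the function name, the string labels, the `min_quality` threshold, and the order in which the checks run are all hypothetical, since the abstract does not specify them.

```python
def choose_response(frustrated: bool, repeated: bool, audio_quality: float,
                    min_quality: float = 0.5) -> str:
    """Pick a response type from sentiment data and signal quality data."""
    # Poor audio quality: ask the user to repeat the input.
    if audio_quality < min_quality:
        return "ask_repeat"
    # Frustrated user repeating an input: confirm or offer an alternative
    # instead of performing the action directly.
    if frustrated and repeated:
        return "confirm_or_offer_alternative"
    # Otherwise perform the action responsive to the user input.
    return "perform_action"


decision = choose_response(frustrated=True, repeated=True, audio_quality=0.9)
```

Checking audio quality first is a design choice made for this sketch only; a system could equally weigh both signals together.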

Determining input for speech processing engine

A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.
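
The gating step in this abstract amounts to a threshold test on each portion of the audio signal. A minimal sketch follows, assuming a caller-supplied `probability_fn` (hypothetical) that scores how likely a portion is to be speech directed at the engine; the portion labels and scores are made up for illustration.

```python
from typing import Callable, Iterable, List


def gate_portions(portions: Iterable[str],
                  probability_fn: Callable[[str], float],
                  threshold: float = 0.5) -> List[str]:
    """Return only the portions whose probability exceeds the threshold."""
    # A strict comparison is used because the abstract says "exceeds".
    return [p for p in portions if probability_fn(p) > threshold]


# Hypothetical scores standing in for a trained classifier.
scores = {"set a timer": 0.92, "background chatter": 0.12, "maybe later": 0.5}
forwarded = gate_portions(scores.keys(), scores.get, threshold=0.5)
```

Portions that are filtered out are simply never presented to the speech processing engine.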

Communication system and method of extracting emotion data during translations
11587561 · 2023-02-21

A communication system is provided that generates human emotion metadata during language translation of verbal content. The communication system includes a media control unit that is coupled to a communication device and a translation server that receive verbal content from the communication device in a first language. An adapter layer having a plurality of filters determines emotion associated with the verbal content, wherein the adapter layer associates emotion metadata with the verbal content based on the determined emotion. The plurality of filters may include user-specific filters and non-user-specific filters. An emotion lexicon is provided that links an emotion value to the corresponding verbal content. The communication system may include a display that graphically displays emotions alongside the corresponding verbal content.

Dialogue method, dialogue system, dialogue apparatus and program

It is an object of the present invention to promote a user's understanding or agreement, and to sustain a dialogue for a long time. A dialogue system 100 conducts a dialogue with a user 101. A humanoid robot 50-1 presents a first utterance, which is a certain utterance. When the user 101 performs, or is predicted to perform, an action indicating that the user cannot understand the first utterance, or when the user does not perform, or is predicted not to perform, any action indicating that the user can understand the first utterance, the humanoid robot 50-1 presents a second utterance, which is at least one utterance resulting from paraphrasing the contents of the first utterance.

Dynamic adjustment of story time special effects based on contextual data

The disclosure provides technology for enabling a computing device to provide context-sensitive special effects that supplement a text source as it is read aloud. An example method includes receiving, by a processing device, audio data comprising a spoken word of a user; analyzing contextual data associated with the user; determining a match between the audio data and data of a text source; and initiating a physical effect in response to determining the match, wherein the physical effect corresponds to the text source and is based on the contextual data.
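
The match-then-trigger step can be illustrated with a toy example. Everything specific here is an assumption for the sketch: the `EFFECTS` table, the `bedtime` context key, and word-level matching on a text transcript in place of real audio data.

```python
import re
from typing import Optional

# Hypothetical mapping from words in the text source to physical effects.
EFFECTS = {"thunder": "rumble", "owl": "hoot"}


def normalize(text: str) -> list:
    # Lowercase the text and split it into words.
    return re.findall(r"[a-z']+", text.lower())


def select_effect(spoken: str, text_source: str, context: dict) -> Optional[dict]:
    """Return an effect if a spoken word matches the text source, else None."""
    source_words = set(normalize(text_source))
    for word in normalize(spoken):
        if word in source_words and word in EFFECTS:
            # Contextual data adjusts how the effect is delivered.
            volume = "low" if context.get("bedtime") else "normal"
            return {"effect": EFFECTS[word], "volume": volume}
    return None


result = select_effect("The thunder rolled",
                       "Thunder rolled over the hills.",
                       {"bedtime": True})
```

Here the contextual data only changes the volume; the abstract leaves open which aspects of the effect depend on context.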

Voice recognition function link control system and method of vehicle

A voice recognition function link control system of a vehicle, which is configured for mounting a smart speaker used in the home or office in the vehicle and utilizing the smart speaker in linkage with an infotainment system of the vehicle, includes a traffic management system server, an infotainment system, a content service provider system server, and a smart speaker for receiving and transmitting a voice command of any user to the content service provider system server, receiving specific content from the content service provider system server, and outputting the received specific content.

Apparatus and method for providing voice assistant service
11501755 · 2022-11-15

Provided are an electronic device and method for providing a voice assistant service. The method, performed by the electronic device, of providing the voice assistant service includes: obtaining a voice of a user; obtaining voice analysis information of the voice of the user by inputting the voice of the user to a natural language understanding model; determining whether a response operation with respect to the voice of the user is performable, according to a preset criterion, based on the obtained voice analysis information; and based on the determining that the response operation is not performable, outputting a series of guide messages for learning the response operation related to the voice of the user.
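
The decision described in this abstract can be sketched as a check on the voice analysis information followed by a fallback to guide messages. The preset criterion used below (a supported intent plus a confidence floor), the `SUPPORTED_INTENTS` set, and the message wording are all hypothetical stand-ins; the abstract does not define them.

```python
# Hypothetical set of intents the device already knows how to perform.
SUPPORTED_INTENTS = {"set_alarm", "play_music"}


def respond(analysis: dict, min_confidence: float = 0.6) -> list:
    """Return response messages for the voice analysis information."""
    intent = analysis.get("intent")
    confidence = analysis.get("confidence", 0.0)
    # Assumed preset criterion: known intent with sufficient confidence.
    if intent in SUPPORTED_INTENTS and confidence >= min_confidence:
        return [f"Performing {intent}."]
    # Not performable: output a series of guide messages for learning.
    return [
        "I don't know how to do that yet.",
        "Please show me the steps you would like me to take.",
        "Say 'done' when you are finished.",
    ]


messages = respond({"intent": "order_pizza", "confidence": 0.9})
```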