MULTI-USER VOICE ASSISTANT WITH DISAMBIGUATION

Disambiguating question-answering responses by: receiving voice command data associated with a first user; determining a first user identity and a first user activity context from that voice command data; determining a first response for the first user; receiving voice command data associated with a second user; determining a second user identity and a second user activity context from that voice command data; determining a second response for the second user; determining a predicted ambiguity between the first response and the second response; altering the first response according to the predicted ambiguity; and providing the first response and the second response.
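
The pipeline above can be sketched minimally as follows. All names here (`PendingResponse`, `disambiguate`) and the exact-match ambiguity test are illustrative assumptions, not the patent's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class PendingResponse:
    user_id: str
    activity_context: str
    text: str

def predicted_ambiguity(a: PendingResponse, b: PendingResponse) -> bool:
    """Crude ambiguity check: identical wording intended for different users."""
    return a.user_id != b.user_id and a.text.strip().lower() == b.text.strip().lower()

def disambiguate(first: PendingResponse, second: PendingResponse) -> tuple:
    """Alter the first response when the two answers could be confused."""
    if predicted_ambiguity(first, second):
        # Prefix the addressee's name so each user knows which answer is theirs.
        first = PendingResponse(first.user_id, first.activity_context,
                                f"{first.user_id}, {first.text}")
    return first.text, second.text
```

A real system would presumably compare responses semantically and use the activity contexts, rather than exact string equality.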

Generating input alternatives

Exemplary embodiments relate to a system for recovering a conversation between a user and the system when the system is unable to properly respond to a user's input. The system may process the user input and determine an error condition exists. The system may query one or more storage systems to identify candidate text data based on their semantic similarity to the user input. The storage systems may store data related to past frequently entered inputs and/or user-generated inputs. Alternative text data is selected from the candidate text data, and presented to the user for confirmation.
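
A minimal sketch of the candidate-retrieval step, using token-set Jaccard overlap as a stand-in for whatever semantic-similarity measure the system actually uses (the function names and threshold are assumptions):

```python
def jaccard(a: str, b: str) -> float:
    """Token-set overlap as a cheap proxy for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def candidate_alternatives(user_input, stored_inputs, k=3, threshold=0.2):
    """Rank past frequent/user-generated inputs by similarity to the failed input."""
    scored = [(jaccard(user_input, s), s) for s in stored_inputs]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

The top-ranked alternatives would then be presented to the user for confirmation before the conversation resumes.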

Systems and methods for variably paced real-time translation between the written and spoken forms of a word
11581006 · 2023-02-14

An enunciation system (ES) enables users to gain acquaintance with, understanding of, and mastery of the relationship between letters and sounds in the context of an alphabetic writing system. The ES enables the user to experience the action of sounding out a word before their own phonics knowledge enables them to sound out the word independently; its continuous, unbroken speech output or input avoids the common confusions that ensue from analyzing words by breaking them up into discrete sounds; its user-controlled pacing allows the user to slow down enunciation at specific points of difficulty within the word; its real-time touch control allows the written word to be “played” like a musical instrument, with expressive and aesthetic possibilities; and its highlighting, as it occurs, of the letter cluster responsible for the recognized phoneme enunciated by the user allows the user to more easily associate the letters with the sounds.
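
The touch-controlled highlighting could be sketched as a lookup from a position along the written word to its grapheme cluster and phoneme. The word segmentation and `cluster_at` name are hypothetical; a real ES would use a full grapheme-phoneme alignment:

```python
# "sheep" pre-segmented into (letter cluster, phoneme) pairs -- an assumed alignment.
WORD = [("sh", "ʃ"), ("ee", "iː"), ("p", "p")]

def cluster_at(position: float, segments=WORD):
    """Map a touch position (0.0-1.0 along the written word) to the
    letter cluster under the finger and the phoneme to enunciate."""
    letters = "".join(grapheme for grapheme, _ in segments)
    idx = min(int(position * len(letters)), len(letters) - 1)
    start = 0
    for grapheme, phoneme in segments:
        if start <= idx < start + len(grapheme):
            return grapheme, phoneme
        start += len(grapheme)
```

Dragging slowly through a cluster would simply hold its phoneme longer, which is how user-controlled pacing falls out of this mapping.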

Background audio identification for speech disambiguation
11557280 · 2023-01-17

Implementations relate to techniques for providing context-dependent search results. A computer-implemented method includes receiving an audio stream at a computing device during a time interval, the audio stream comprising user speech data and background audio, separating the audio stream into a first substream that includes the user speech data and a second substream that includes the background audio, identifying concepts related to the background audio, generating a set of terms related to the identified concepts, influencing a speech recognizer based on at least one of the terms related to the background audio, and obtaining a recognized version of the user speech data using the speech recognizer.
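
The "influencing a speech recognizer" step could amount to rescoring the recognizer's n-best hypotheses toward the background-audio terms. This is a sketch under that assumption; the function name, boost scheme, and substring matching are all illustrative:

```python
def rescore_hypotheses(nbest, context_terms, boost=0.1):
    """nbest: list of (hypothesis_text, acoustic_score) pairs.
    Boost hypotheses that mention terms derived from the background audio."""
    rescored = []
    for text, score in nbest:
        hits = sum(term in text.lower() for term in context_terms)
        rescored.append((text, score + boost * hits))
    # Return the best hypothesis after context-aware rescoring.
    return max(rescored, key=lambda pair: pair[1])[0]
```

With a song playing in the background, terms like "song" or a track title would tip recognition toward the contextually plausible transcript.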

VOICE ASSISTANT SYSTEM WITH AUDIO EFFECTS RELATED TO VOICE COMMANDS

A voice command type entry is used as a basis for applying “audio effects” (see definition herein), “sound effects” (see definition herein), and/or audio edits (see definition herein) to a sound signal. This may be done so that the various types of instructed audio processing evoke, in typical listeners, a desired sentiment or mood. Artificial intelligence may be used to accomplish this objective.
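
At its simplest, the command-to-processing mapping could be a table from the requested mood to an effect chain. The preset names and moods below are invented for illustration; the patent's AI-driven version would learn or infer this mapping:

```python
# Illustrative mood -> effect-chain presets (all names are assumptions).
EFFECT_PRESETS = {
    "spooky": ["reverb", "pitch_down", "tremolo"],
    "cheerful": ["brightness_eq", "slight_pitch_up"],
    "calm": ["low_pass", "soft_compression"],
}

def effects_for_command(command: str):
    """Pick an effect chain based on the mood named in the voice command."""
    for mood, chain in EFFECT_PRESETS.items():
        if mood in command.lower():
            return chain
    return []  # no recognized mood: apply no effects
```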

Speech fluency evaluation and feedback

Speech fluency evaluation and feedback tools are described. A computing device such as a smartphone may be used to collect speech (and/or other data). The collected data may be analyzed to detect various speech events (e.g., stuttering) and feedback may be generated and provided based on the detected speech events. The collected data may be used to generate a fluency score or other performance metric associated with speech. Collected data may be provided to a practitioner such as a speech therapist or physician for improved analysis and/or treatment.
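
One plausible shape for the fluency score is the proportion of speech time free of detected disfluency events. This sketch assumes that event representation and scoring formula; the described tools do not specify either:

```python
def fluency_score(events, speech_duration_s):
    """events: list of dicts like {"type": "stutter", "duration": 0.4}.
    Returns a 0-100 score: the share of speech time free of disfluencies."""
    if speech_duration_s <= 0:
        return 0.0
    disfluent = sum(event["duration"] for event in events)
    return round(max(0.0, 1.0 - disfluent / speech_duration_s) * 100, 1)
```

The per-event records (type, timestamp, duration) are also what a practitioner would want exported for analysis and treatment planning.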

METHOD AND DEVICE FOR IMPROVING DYSARTHRIA
20230237928 · 2023-07-27

A method of providing language training to a user by a computing device comprising a processor and a memory is provided. The method comprises: providing contents corresponding to the language training to a user terminal; receiving the user’s voice data from the user terminal; detecting a pitch and a loudness of the user’s voice by analyzing the voice data; and generating a training evaluation by evaluating the user’s training for the contents based on the user’s voice data. The method further comprises determining a phoneme with poor pronunciation accuracy by analyzing the user’s voice data, and automatically generating and providing at least one of a vocabulary, a sentence, and a paragraph including the determined phoneme.
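
The weak-phoneme selection and targeted-content steps can be sketched as below. The accuracy map and the item "bank" with phoneme annotations are assumed inputs; the patent generates such material automatically:

```python
def weakest_phoneme(accuracy_by_phoneme):
    """Pick the phoneme with the lowest pronunciation accuracy."""
    return min(accuracy_by_phoneme, key=accuracy_by_phoneme.get)

def practice_items(phoneme, bank):
    """Select vocabulary/sentence/paragraph items containing the weak phoneme.
    bank: list of dicts like {"text": ..., "phonemes": {...}}."""
    return [item for item in bank if phoneme in item["phonemes"]]
```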

Voice recognition method using artificial intelligence and apparatus thereof
11568853 · 2023-01-31

Disclosed is a voice recognition method and apparatus using artificial intelligence. A voice recognition method using artificial intelligence may include: generating an utterance by receiving a voice command of a user; obtaining the user's intention by analyzing the generated utterance; deriving an urgency level of the user on the basis of the generated utterance and prestored user information; generating a first response in association with the user's intention; obtaining main vocabularies included in the first response; generating a second response by using the main vocabularies and the urgency level of the user; determining a speech rate of the second response on the basis of the urgency level of the user; and outputting the second response at that speech rate by synthesizing the second response into a voice signal.
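
The urgency-dependent steps might look like the sketch below: the second response keeps only the main vocabularies when urgency is high, and the speech rate scales with urgency. The rate range, threshold, and word filtering are assumptions for illustration:

```python
def speech_rate(urgency: float, base_wpm=160, max_wpm=220):
    """Scale words-per-minute linearly with urgency in [0, 1]."""
    urgency = min(max(urgency, 0.0), 1.0)
    return base_wpm + (max_wpm - base_wpm) * urgency

def condense(first_response: str, main_vocabulary: set, urgency: float):
    """For urgent requests, keep only the main vocabularies of the first response."""
    if urgency < 0.5:
        return first_response
    kept = [w for w in first_response.split()
            if w.lower().strip(".,") in main_vocabulary]
    return " ".join(kept) or first_response
```

An urgent "where is the nearest hospital?" would thus get a terse, fast-spoken answer, while a casual query keeps the full sentence at normal pace.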

Methods and systems for recommending content in context of a conversation

A media guidance application may monitor a conversation among users, and identify keywords in the conversation, without the use of wakewords. The keywords are used to search for media content that is relevant to the on-going conversation. Accordingly, the media guidance application presents relevant content to the users, during the conversation, to more actively engage the users. A conversation monitoring window may be used to present conversation information as well as relevant content. A listening mode may be used to manage when the media guidance application processes speech from a conversation. The media guidance application may access user profiles for keywords, select content types, select content sources, and determine relevancy of media content, to provide content in context of a conversation.
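
The keyword-extraction and relevancy steps could be sketched as follows. The stopword list, tag-overlap relevancy measure, and catalog shape are assumptions, not the application's actual matching logic:

```python
STOPWORDS = {"the", "a", "an", "to", "and", "is", "of", "let's"}

def extract_keywords(utterance: str):
    """Pull candidate keywords from conversational speech (no wakeword needed)."""
    words = [w.strip(".,!?").lower() for w in utterance.split()]
    return [w for w in words if w and w not in STOPWORDS]

def rank_content(keywords, catalog):
    """catalog: list of dicts with 'title' and 'tags'.
    Rank items by keyword/tag overlap; drop items with no overlap."""
    def relevancy(item):
        return len(set(keywords) & set(item["tags"]))
    ranked = sorted(catalog, key=relevancy, reverse=True)
    return [item["title"] for item in ranked if relevancy(item) > 0]
```

A user profile could extend `keywords` with known interests before ranking, which is how personalization slots into the same pipeline.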

FREE-FORM TEXT PROCESSING FOR SPEECH AND LANGUAGE EDUCATION

Methods, systems, and computer-readable storage media for providing reading performance feedback to a user from a voice recording of the user reading an arbitrary text. A target text comprising a text passage that a user intends to read and a user recording comprising an audio recording of the user reading the target text aloud are received from a user device. The user recording is converted to a user speech hypothesis comprising text corresponding to speech recognized in the audio recording. The user speech hypothesis is then compared to the target text to generate reading performance feedback comprising relevant differences between the speech in the user recording and the target text and the reading performance feedback is displayed to the user on the user device.
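
The comparison of the speech hypothesis against the target text is essentially a sequence alignment over words. A minimal sketch using `difflib` (the feedback labels "missed"/"misread"/"extra" are invented for illustration):

```python
import difflib

def reading_feedback(target_text: str, hypothesis: str):
    """Align the recognized speech against the target passage and report
    the relevant differences as (kind, words) pairs."""
    target = target_text.lower().split()
    spoken = hypothesis.lower().split()
    feedback = []
    matcher = difflib.SequenceMatcher(a=target, b=spoken)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":        # words in the text the reader skipped
            feedback.append(("missed", " ".join(target[i1:i2])))
        elif op == "replace":     # words the reader substituted
            feedback.append(("misread", " ".join(target[i1:i2])))
        elif op == "insert":      # words the reader added
            feedback.append(("extra", " ".join(spoken[j1:j2])))
    return feedback
```

Matching spans are ignored, so the user sees only the differences that matter, as the abstract describes.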