Patent classifications
G10L25/48
Remote care system
The innovation disclosed and claimed herein, in one aspect thereof, comprises systems and methods for remote scheduling of calendar-based and other cues and reminders to a user of a presentation device. The user of the presentation device may be suffering from a progressive cognitive disorder, and the cues and other care may be provided remotely by a loved one or other administrator.
Stylizing text-to-speech (TTS) voice response for assistant systems
In one embodiment, a method includes receiving a voice input from a user and determining a first style of the voice input based on first features extracted from the voice input. A second style for a voice response having second features may then be determined based on the first style. Finally, the voice response may be generated based on the second features of the second style, and this voice response may be provided in response to the voice input.
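The two-stage pipeline in this abstract (input features → input style → response style) can be sketched as below. The feature names (`energy`, `speaking_rate`), the style labels, and the mapping rules are assumptions chosen for illustration, not the patent's actual implementation.

```python
# Toy style pipeline: extract first features, classify the first style,
# then pick second features for the response style.

def extract_style_features(voice_input):
    # First features: crude prosodic statistics from the (toy) input signal.
    samples = voice_input["samples"]
    energy = sum(s * s for s in samples) / len(samples)
    speaking_rate = voice_input["word_count"] / voice_input["duration_s"]
    return {"energy": energy, "speaking_rate": speaking_rate}

def classify_style(features):
    # First style: a coarse label derived from the first features.
    if features["energy"] > 0.5 and features["speaking_rate"] > 3.0:
        return "excited"
    return "calm"

def response_style_for(input_style):
    # Second features: synthesis parameters chosen to match the input style.
    return {"excited": {"pitch_shift": 2, "rate_scale": 1.1},
            "calm": {"pitch_shift": 0, "rate_scale": 1.0}}[input_style]

voice_input = {"samples": [0.9, -0.8, 0.7, -0.9], "duration_s": 1.0, "word_count": 4}
first_style = classify_style(extract_style_features(voice_input))
second_features = response_style_for(first_style)
```

A production system would of course use learned acoustic embeddings rather than hand-written thresholds; the sketch only shows the claimed data flow.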
TECHNIQUES FOR AUDIO FEATURE DETECTION
Training a user-specific perturbation generator for an audio feature detection model includes receiving one or more positive audio samples of a user, each of the one or more positive audio samples including an audio feature; receiving one or more negative audio samples of the user, each of the one or more negative audio samples sharing an acoustic similarity with at least one of the one or more positive audio samples; and adversarially training a user-specific perturbation generator model to generate a user-specific perturbation, the training based on the one or more positive audio samples and the one or more negative audio samples. Perturbing audio samples of the user with the user-specific perturbation can cause an audio feature detection model to recognize the audio feature in audio samples that include the audio feature and/or to refrain from recognizing the audio feature in audio samples that do not include the audio feature.
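A minimal sketch of the idea, assuming a fixed linear detector and representing the "perturbation generator" as a single learned additive vector: the perturbation is trained so the detector fires on the user's positive samples and stays silent on acoustically similar negatives. The detector weights, sample vectors, margins, and loss are illustrative assumptions, not the claimed model.

```python
# Learn a user-specific additive perturbation delta against a fixed detector.

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def detect(w, x):
    return score(w, x) > 0.0

w = [1.0, -0.5]                        # fixed, imperfect detector
positives = [[0.9, 0.2], [1.0, 0.4]]   # contain the audio feature
negatives = [[0.5, 0.8], [0.4, 0.9]]   # acoustically similar, must not trigger

delta, lr = [0.0, 0.0], 0.1
for _ in range(50):
    grad = [0.0, 0.0]
    for x in positives:                # keep positives above a +0.25 margin
        if score(w, [a + d for a, d in zip(x, delta)]) < 0.25:
            grad = [g - wi for g, wi in zip(grad, w)]
    for x in negatives:                # push negatives below a -0.25 margin
        if score(w, [a + d for a, d in zip(x, delta)]) > -0.25:
            grad = [g + wi for g, wi in zip(grad, w)]
    delta = [d - lr * g for d, g in zip(delta, grad)]

perturbed_pos = [[a + d for a, d in zip(x, delta)] for x in positives]
perturbed_neg = [[a + d for a, d in zip(x, delta)] for x in negatives]
```

Note the simplification: a constant additive perturbation can only shift every sample's score by the same amount, so it works here only because the toy positives already score above the toy negatives. The generator described in the abstract would condition the perturbation on each input sample.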
MULTIMODAL CONVERSATIONAL PLATFORM FOR REMOTE PATIENT DIAGNOSIS AND MONITORING
A virtual agent instructs a responding person to perform specific verbal exercises. Audio and image inputs from the responding person's performance of the exercises are used to identify speech, video, cognitive, and/or respiratory biomarkers, which are then used to evaluate speech motor function and/or neurological health. Contemplated exercises test aspects of oral motor proficiency, sustained phonation, diadochokinesis, read speech, spontaneous speech, spirometry, picture description, and emotion elicitation. Metrics from evaluation of the responding person's performance are advantageously produced automatically and are presented in spreadsheet format.
ALIGNING PARAMETER DATA WITH AUDIO RECORDINGS
Various techniques relate to aligning parameters and audio recordings obtained at a rescue scene. An example method includes receiving, from a first device, a first file including first measurements of a first parameter at first discrete times in a time interval. The first file further indicates a marker output by the first device during the time interval. The method also includes receiving, from a second device, a second file including second measurements of a second parameter at second discrete times in the time interval. The method includes detecting the marker output by the first device in the second measurements of the second parameter and, based on detecting the marker output by the first device in the second measurements, generating aligned data by time-aligning the first measurements of the first parameter and the second measurements of the second parameter. The method further includes outputting the aligned data.
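The alignment step described above can be sketched as follows: the first device records when it emitted a marker (e.g. an audible tone), the second device's measurements contain that marker as a spike, and detecting the spike yields the clock offset used to re-timestamp the second device onto the first device's timeline. The file layouts, field names, and threshold detector are assumptions for the example.

```python
# Align two devices' measurement streams using a shared marker.

def detect_marker(measurements, threshold=5.0):
    # Return the timestamp of the first sample exceeding the threshold.
    for t, value in measurements:
        if value > threshold:
            return t
    raise ValueError("marker not found in second measurements")

def align(first_file, second_file):
    marker_time_b = detect_marker(second_file["measurements"])
    offset = first_file["marker_time"] - marker_time_b
    # Shift the second device's timestamps onto the first device's clock.
    aligned_second = [(t + offset, v) for t, v in second_file["measurements"]]
    return first_file["measurements"], aligned_second

first_file = {"marker_time": 10.0,
              "measurements": [(9.0, 72), (10.0, 74), (11.0, 73)]}    # e.g. heart rate
second_file = {"measurements": [(2.0, 0.1), (3.0, 9.9), (4.0, 0.2)]}  # spike = marker

first_aligned, second_aligned = align(first_file, second_file)
```

Real signals would need a correlation-based detector rather than a simple threshold, but the offset-and-shift logic is the same.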
Adapting Automated Speech Recognition Parameters Based on Hotword Properties
A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
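The adaptation step in this abstract can be sketched as a mapping from hotword attributes to decoding parameters. The attribute names and parameter effects below (widening the beam for quiet speech, extending the endpoint timeout for slow speech) are assumptions chosen to illustrate the idea, not the patent's actual parameters.

```python
# Adjust ASR decoding parameters from attributes of the detected hotword.

def extract_hotword_attributes(acoustic_segment):
    samples, duration_s = acoustic_segment
    energy = sum(s * s for s in samples) / len(samples)
    return {"loudness": energy, "duration_s": duration_s}

def adjust_asr_parameters(attributes, base_beam=8, base_timeout_s=5.0):
    params = {"beam_width": base_beam, "endpoint_timeout_s": base_timeout_s}
    if attributes["loudness"] < 0.1:      # quiet speaker: search more widely
        params["beam_width"] = base_beam * 2
    if attributes["duration_s"] > 0.8:    # slow speaker: wait longer to endpoint
        params["endpoint_timeout_s"] = base_timeout_s * 1.5
    return params

segment = ([0.05, -0.04, 0.06, -0.05], 1.0)   # quiet, slowly spoken hotword
params = adjust_asr_parameters(extract_hotword_attributes(segment))
```

The adjusted `params` would then be applied before decoding the second acoustic segment (the query or command that follows the hotword).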
VOICE TRANSLATION AND VIDEO MANIPULATION SYSTEM
A communication modification system includes an audio gathering unit that gathers an audio stream and a language detection unit that converts the audio stream into text. The language detection unit correlates portions of the text with audio portions of the audio stream and determines a first and a second deviation in an audio stream portion based on the corresponding text portion and the audio portion gathered by the audio gathering unit.
Conversation assistance system
Systems and methods for providing conversation assistance include receiving conversation information from at least one user device of a user and determining that the conversation information is associated with a conversation involving the user and a first person that is associated with first conversation assistance information in a non-transitory memory. Body measurement data of the user is retrieved from the at least one user device. A need for conversation assistance in the conversation involving the user and the first person is detected using the body measurement data. First conversation assistance information associated with the first person is retrieved from the non-transitory memory. The first conversation assistance information associated with the first person is provided through the at least one user device.
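The detect-then-retrieve flow above can be sketched as follows. Using heart rate as the body measurement and a fixed threshold as the stress signal is purely an assumption for this example; the stored text and person name are likewise hypothetical.

```python
# Retrieve stored assistance information when body measurements indicate need.

conversation_assistance_db = {   # stands in for the non-transitory memory
    "Alex": "Met at the 2019 reunion; works in radiology; two kids.",
}

def needs_assistance(body_measurements, hr_threshold=100):
    # Stress proxy: any heart-rate reading above the threshold.
    return any(hr > hr_threshold for hr in body_measurements["heart_rate"])

def assist(first_person, body_measurements):
    if needs_assistance(body_measurements):
        return conversation_assistance_db.get(first_person)
    return None

hint = assist("Alex", {"heart_rate": [88, 104, 97]})
```

The returned hint would then be surfaced through the user device (e.g. a wearable display or earpiece).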
Method of performing function of electronic device and electronic device using same
An electronic device includes: a camera; a microphone; a display; a memory; and a processor configured to receive an input for activating an intelligent agent service from a user while at least one application is executed, identify context information of the electronic device, control to acquire image information of the user through the camera based on the identified context information, detect movement of the user's lips included in the acquired image information to recognize speech of the user, and perform a function corresponding to the recognized speech.
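The context-driven step above can be illustrated with a simple decision rule: in a context where the microphone is unreliable, the device acquires camera input and falls back to lip reading. The context fields and the noise threshold are assumptions for this sketch.

```python
# Choose the speech-recognition source from the device's context information.

def choose_speech_source(context):
    # Noisy environment or muted microphone: use camera-based lip reading.
    if context["ambient_noise_db"] > 70 or context["mic_muted"]:
        return "camera_lip_reading"
    return "microphone"

source = choose_speech_source({"ambient_noise_db": 82, "mic_muted": False})
```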