G10L25/00

Voice application platform

Among other things, requests are received from voice assistant devices expressed in accordance with different corresponding protocols of one or more voice assistant frameworks. Each of the requests represents a voiced input by a user to the corresponding voice assistant device. The received requests are re-expressed in accordance with a common request protocol. Based on the received requests, responses to the requests are expressed in accordance with a common response protocol. Each of the responses is re-expressed according to a protocol of the framework with respect to which the corresponding request was expressed. The responses are sent to the voice assistant devices for presentation to the users.
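
The request and response flow described above can be sketched as a pair of adapter tables around a framework-neutral handler. This is only an illustration: the payload shapes for "framework_a" and "framework_b", the field names, and the `handle` logic are hypothetical and are not taken from the abstract or from any real voice assistant framework.

```python
from dataclasses import dataclass


@dataclass
class CommonRequest:
    framework: str   # which framework the request arrived from
    user_id: str
    utterance: str   # text of the user's voiced input


@dataclass
class CommonResponse:
    speech: str


# Inbound adapters: framework-specific payload -> common request protocol.
# Both payload layouts are invented for this sketch.
def from_framework_a(payload: dict) -> CommonRequest:
    return CommonRequest("framework_a", payload["session"]["user"], payload["request"]["text"])


def from_framework_b(payload: dict) -> CommonRequest:
    return CommonRequest("framework_b", payload["uid"], payload["query"])


# Outbound adapters: common response protocol -> framework-specific payload.
def to_framework_a(response: CommonResponse) -> dict:
    return {"response": {"outputSpeech": response.speech}}


def to_framework_b(response: CommonResponse) -> dict:
    return {"reply": response.speech}


INBOUND = {"framework_a": from_framework_a, "framework_b": from_framework_b}
OUTBOUND = {"framework_a": to_framework_a, "framework_b": to_framework_b}


def handle(request: CommonRequest) -> CommonResponse:
    # Placeholder application logic that only sees the common protocol.
    return CommonResponse(speech=f"You said: {request.utterance}")


def process(framework: str, payload: dict) -> dict:
    request = INBOUND[framework](payload)          # re-express the request
    response = handle(request)                     # respond in the common protocol
    return OUTBOUND[request.framework](response)   # re-express for the originating framework
```

Keeping the application logic behind `handle` means a new framework would only need one inbound and one outbound adapter in this sketch.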

Sound source direction estimation device, sound source direction estimation method, and recording medium therefor
10524051 · 2019-12-31

A sound source direction estimation device includes: a phase difference calculator which calculates, from an acoustic signal obtained by a microphone array, a first phase difference of a pair of microphone units; a similarity calculator which calculates similarities between the calculated first phase difference and second phase differences precalculated for directions and stored in a phase difference database; a peak searcher which searches for a direction for which a highest similarity is calculated by the similarity calculator, and estimates the direction searched out to be a sound source direction; a feature quantity calculator which uses the calculated similarities, the estimated sound source direction, and an acoustic feature quantity obtained from the obtained acoustic signal, to calculate a feature quantity obtained by correcting the acoustic feature quantity; and a speech/non-speech determiner which determines whether the obtained acoustic signal indicates speech, using the feature quantity calculated by the feature quantity calculator.
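
A minimal sketch of the first three stages (precomputed phase difference database, similarity calculation, peak search) follows. It assumes a single pair of microphones, a far-field plane wave, and a cosine-based similarity; the spacing, frequency bins, candidate directions, and similarity formula are all assumptions for illustration, since the abstract does not specify them. The observed phase differences are taken as given rather than computed from an acoustic signal, and the feature quantity correction and speech/non-speech determination are omitted.

```python
import math

SPEED_OF_SOUND = 343.0                    # m/s, assumed
MIC_SPACING = 0.05                        # m between the two microphone units, assumed
FREQS = [500.0, 1000.0, 1500.0, 2000.0]   # Hz, assumed analysis bins
DIRECTIONS = list(range(-90, 91, 10))     # candidate directions in degrees, assumed


def expected_phase_diffs(angle_deg: float) -> list:
    """Phase difference per frequency bin for a plane wave arriving from angle_deg."""
    delay = MIC_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [2.0 * math.pi * f * delay for f in FREQS]


# "Phase difference database": second phase differences precalculated per direction.
PHASE_DB = {angle: expected_phase_diffs(angle) for angle in DIRECTIONS}


def similarity(observed: list, expected: list) -> float:
    """Mean cosine of the phase error; 1.0 means a perfect match in every bin."""
    return sum(math.cos(o - e) for o, e in zip(observed, expected)) / len(observed)


def estimate_direction(observed: list):
    """Peak search: return the direction with the highest similarity, plus all scores."""
    scores = {angle: similarity(observed, expected) for angle, expected in PHASE_DB.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With these assumed values the largest phase difference stays below π, so the similarity peak is unambiguous; a wider spacing or higher frequencies would introduce spatial aliasing that this sketch does not handle.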

Voice controlled system

A distributed voice controlled system has a primary assistant and at least one secondary assistant. The primary assistant has a housing to hold one or more microphones, one or more speakers, and various computing components. The secondary assistant is similar in structure, but is void of speakers. The voice controlled assistants perform transactions and other functions primarily based on verbal interactions with a user. The assistants within the system are coordinated and synchronized to perform acoustic echo cancellation, selection of a best audio input from among the assistants, and distributed processing.
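
The abstract does not say how the best audio input is chosen. As one possible illustration, the sketch below picks the assistant whose captured frame has the highest RMS energy; the selection criterion, the function names, and the frame format (a list of float samples per assistant) are all assumptions.

```python
import math


def rms(samples: list) -> float:
    """Root-mean-square level of one audio frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_best_input(frames: dict) -> str:
    """Return the name of the assistant whose frame has the highest RMS level.

    `frames` maps an assistant name to the samples it captured for the same
    time window. Energy is a stand-in criterion for this sketch only.
    """
    return max(frames, key=lambda name: rms(frames[name]))
```

A real system would more likely compare signal-to-noise ratio or recognition confidence, but the selection structure would be similar.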

Adapting automated assistant functionality based on generated proficiency measure(s)
11935527 · 2024-03-19

Implementations relate to generating a proficiency measure, and utilizing the proficiency measure to adapt one or more automated assistant functionalities. The generated proficiency measure is for a particular class of automated assistant actions, and is specific to an assistant device and/or is specific to a particular user. A generated proficiency measure for a class can reflect a degree of proficiency, of a user and/or of an assistant device, for that class. Various automated assistant functionalities can be adapted, for a particular class, responsive to determining the proficiency measure satisfies a threshold, or fails to satisfy the threshold (or an alternate threshold). The adaptation(s) can make automated assistant processing more efficient and/or improve (e.g., shorten the duration of) user-assistant interaction(s).
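
As a rough sketch of the idea, the example below tracks a per-class proficiency measure and adapts the response style when the measure satisfies a threshold. The measure used here (the fraction of successful interactions), the threshold and minimum sample count, and the "brief" versus "verbose" adaptation are assumptions chosen for illustration, not details from the abstract.

```python
from collections import defaultdict


class ProficiencyTracker:
    """Hypothetical per-class proficiency tracker for one user or device."""

    def __init__(self, threshold: float = 0.8, min_samples: int = 5):
        self.threshold = threshold
        self.min_samples = min_samples
        self._outcomes = defaultdict(list)  # action class -> list of success flags

    def record(self, action_class: str, succeeded: bool) -> None:
        self._outcomes[action_class].append(succeeded)

    def measure(self, action_class: str) -> float:
        """Fraction of successful interactions; 0.0 until enough samples exist."""
        outcomes = self._outcomes.get(action_class, [])
        if len(outcomes) < self.min_samples:
            return 0.0
        return sum(outcomes) / len(outcomes)

    def response_style(self, action_class: str) -> str:
        """Adapt one functionality: shorter responses once the threshold is satisfied."""
        return "brief" if self.measure(action_class) >= self.threshold else "verbose"
```

Because the measure is keyed by action class, a user who is proficient with one class still gets the fuller interaction for classes they rarely use.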

Techniques for detecting and processing domain-specific terminology

Various embodiments set forth systems and techniques for explaining domain-specific terms detected in a media content stream. The techniques include detecting a speech portion included in an audio signal; determining that the speech portion comprises a domain-specific term; determining an explanatory phrase associated with the domain-specific term; and integrating the explanatory phrase associated with the domain-specific term into playback of the audio signal.
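
The detect, look up, and integrate steps can be illustrated on text. The sketch below assumes the speech portion has already been transcribed and that explanations come from a small hard-coded glossary; both are simplifications, and the glossary entry and function name are hypothetical. Integrating the phrase into audio playback is not shown.

```python
import re

# Hypothetical glossary: lowercase domain-specific term -> explanatory phrase.
GLOSSARY = {
    "arbitrage": "profiting from price differences between markets",
}

# One pattern matching any glossary term as a whole word, case-insensitively.
_TERM_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in GLOSSARY) + r")\b",
    re.IGNORECASE,
)


def annotate(transcript: str) -> str:
    """Insert the explanatory phrase right after each detected term."""

    def add_explanation(match: re.Match) -> str:
        term = match.group(0)
        return f"{term} ({GLOSSARY[term.lower()]})"

    return _TERM_PATTERN.sub(add_explanation, transcript)
```

For example, `annotate("Arbitrage is common.")` returns `"Arbitrage (profiting from price differences between markets) is common."`.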

Electronic device and method of controlling text-to-speech (TTS) rate

Disclosed are an electronic device and a method of controlling a text-to-speech (TTS) rate. An electronic device may include a processor, and a memory configured to store instructions to be executed by the processor. The processor may receive a voice signal of a user. The processor may calculate a speaking rate of the voice signal based on the voice signal. The processor may generate an output text to be output to the user based on the voice signal. The processor may determine a TTS rate of the output text based on the speaking rate. The processor may convert the output text into voice data based on the TTS rate and output the voice data.
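
One way to picture the mapping from speaking rate to TTS rate is shown below. The abstract does not give a formula, so this sketch assumes the speaking rate is measured in words per second and that the TTS rate is a multiplier relative to an assumed baseline, clamped to a comfortable range. The baseline, the clamp limits, and the function names are all assumptions.

```python
BASELINE_WORDS_PER_SECOND = 2.5  # assumed "normal" speaking rate


def speaking_rate(word_count: int, duration_s: float) -> float:
    """Words per second for the user's utterance."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return word_count / duration_s


def tts_rate(rate_wps: float, lowest: float = 0.75, highest: float = 1.5) -> float:
    """TTS speed multiplier that follows the user's pace, clamped to [lowest, highest]."""
    return max(lowest, min(highest, rate_wps / BASELINE_WORDS_PER_SECOND))
```

Clamping keeps the synthesized voice intelligible even when the measured speaking rate is extreme or unreliable, for instance on very short utterances.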

Dynamic voice search transitioning

Systems, methods, and computer-readable media are disclosed for dynamic voice search transitioning. Example methods may include receiving, by a computer system in communication with a display, a first incoming voice data indication, initiating a first user interface theme for presentation at a display, wherein the first user interface theme is a default user interface theme, and receiving first voice data. Example methods may include sending the first voice data to a remote server for processing, receiving an indication from the remote server to initiate a second user interface theme, and initiating the second user interface theme for presentation at the display.
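
The sequence of events in the example methods can be sketched as a small client-side state holder. The class name, the theme names, and the `send_to_server` callable (which stands in for the remote server and returns the theme it indicates, or `None`) are hypothetical and used only for illustration.

```python
from typing import Callable, Optional


class VoiceSearchUI:
    """Tracks which user interface theme should be presented at the display."""

    DEFAULT_THEME = "default"

    def __init__(self, send_to_server: Callable[[bytes], Optional[str]]):
        self.send_to_server = send_to_server
        self.theme: Optional[str] = None  # nothing presented yet

    def on_voice_incoming(self) -> None:
        # First incoming voice data indication: show the default theme.
        self.theme = self.DEFAULT_THEME

    def on_voice_data(self, voice_data: bytes) -> None:
        # Send the voice data for remote processing; switch themes only if
        # the server indicates a second theme.
        indicated = self.send_to_server(voice_data)
        if indicated:
            self.theme = indicated
```

In this sketch the display keeps the default theme whenever the server gives no indication.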

Signal processing apparatus, training apparatus, and method
11894008 · 2024-02-06

Provided is a signal processing apparatus that includes a voice quality conversion unit that converts acoustic data of any sound of an input sound source to acoustic data of voice quality of a target sound source different from the input sound source on the basis of a voice quality converter parameter obtained by training using acoustic data for each of one or more sound sources as training data, the acoustic data being different from parallel data or clean data.