Patent classifications
G10L21/02
VOICE DRIVEN DYNAMIC MENUS
Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
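A minimal sketch of the menu-selection logic this abstract describes, assuming a simple energy-based voice-activity check; the function names, menu contents, and threshold below are illustrative, not from the patent.

```python
import numpy as np

VOICE_MENU = ["Mute", "Captions", "Volume"]   # hypothetical first menu (voice present)
SILENT_MENU = ["Play", "Pause", "Seek"]       # hypothetical second menu (voice absent)

def detect_voice(audio: np.ndarray, threshold: float = 0.01) -> bool:
    """Return True if the frame's RMS energy suggests a voice signal."""
    rms = np.sqrt(np.mean(audio.astype(np.float64) ** 2))
    return rms > threshold

def choose_menu(audio: np.ndarray) -> list:
    """Display the first menu when voice is present, the second otherwise."""
    return VOICE_MENU if detect_voice(audio) else SILENT_MENU

if __name__ == "__main__":
    silence = np.zeros(16000)               # one second of silence
    speech = 0.1 * np.random.randn(16000)   # stand-in for a voiced frame
    print(choose_menu(silence))             # -> SILENT_MENU
    print(choose_menu(speech))              # -> VOICE_MENU
```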
Adaptive coefficients and samples elimination for circular convolution
Technologies are disclosed for improving the efficiency of real-time audio processing, and specifically for improving the efficiency of continuously modifying a real-time audio signal. Efficiency is improved by reducing memory bandwidth requirements and by reducing the amount of processing used to modify the real-time audio signal. In some configurations, memory bandwidth requirements are reduced by selectively transferring active samples in the frequency domain, e.g., avoiding the transfer of samples with amplitudes of zero or near-zero. This is particularly important when the specialized hardware retrieves samples from main memory in real time. In some configurations, the amount of processing needed to modify the audio signal is reduced by omitting operations that do not meaningfully affect the output audio signal. For example, a multiplication of samples may be avoided when at least one of the samples has an amplitude of zero or near-zero.
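A hedged sketch of the sample-elimination idea: frequency-domain bins whose magnitude is near zero are skipped so they are neither transferred nor multiplied. Function names and the threshold are assumptions for illustration.

```python
import numpy as np

def sparse_freq_multiply(signal_bins: np.ndarray,
                         filter_bins: np.ndarray,
                         eps: float = 1e-6) -> np.ndarray:
    """Multiply spectra bin-by-bin, omitting bins where either factor is ~0."""
    out = np.zeros_like(signal_bins)
    active = (np.abs(signal_bins) > eps) & (np.abs(filter_bins) > eps)
    out[active] = signal_bins[active] * filter_bins[active]   # only active bins are processed
    return out

if __name__ == "__main__":
    x = np.fft.rfft(np.random.randn(256))
    h = np.fft.rfft(np.r_[np.hanning(32), np.zeros(224)])     # mostly-zero filter spectrum
    y = np.fft.irfft(sparse_freq_multiply(x, h))               # circular convolution result
    print(y.shape)
```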
IN-VEHICLE COMMUNICATION SUPPORT SYSTEM
A right seat processing unit includes, assuming that a voice of a user in a right front seat is a right front seat voice, a filter HR that converts the right front seat voice being output from a left front seat microphone disposed on a headrest of a left front seat into the right front seat voice collected by a right seat virtual microphone that is a virtual microphone located on a left side of a headrest of the right front seat, a delay unit Z^-TR that delays and outputs an output from a right front seat microphone located on a right side of the headrest of the right front seat, a filter WRA that extracts the right front seat voice being output from the filter HR, a filter WRB that extracts the right front seat voice being output from the delay unit Z^-TR, and a right adder that adds outputs of the filter WRA and the filter WRB and outputs an added output as a right front seat speech voice signal.
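A minimal signal-flow sketch of the right-seat processing unit, assuming simple FIR filters for HR, WRA, and WRB and an integer sample delay for Z^-TR; all coefficient values would come from system identification and are placeholders here.

```python
import numpy as np

def fir(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Apply an FIR filter, truncated to the input length."""
    return np.convolve(x, h)[: len(x)]

def right_seat_unit(left_mic: np.ndarray, right_mic: np.ndarray,
                    h_hr: np.ndarray, h_wra: np.ndarray, h_wrb: np.ndarray,
                    delay_tr: int) -> np.ndarray:
    virtual_mic = fir(left_mic, h_hr)              # filter HR: left mic -> right seat virtual mic
    delayed_right = np.roll(right_mic, delay_tr)   # delay unit Z^-TR
    delayed_right[:delay_tr] = 0.0
    branch_a = fir(virtual_mic, h_wra)             # filter WRA
    branch_b = fir(delayed_right, h_wrb)           # filter WRB
    return branch_a + branch_b                     # right adder -> right front seat speech voice signal
```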
GENERATING AUDIO WAVEFORMS USING ENCODER AND DECODER NEURAL NETWORKS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
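A hedged sketch of the encoder/decoder arrangement: the encoder maps the input waveform to a set of feature vectors and the decoder maps those feature vectors to an output waveform with one sample per output time step. Layer types, sizes, and strides are illustrative assumptions, not the claimed architecture.

```python
import torch
import torch.nn as nn

class WaveformGenerator(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Encoder: strided 1-D convolutions produce one feature vector per frame.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, feat_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
        )
        # Decoder: transposed convolutions upsample the features back to samples.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(feat_dim, feat_dim, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(feat_dim, 1, kernel_size=8, stride=4, padding=2), nn.Tanh(),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        features = self.encoder(waveform)   # (batch, feat_dim, frames): feature vectors
        return self.decoder(features)       # (batch, 1, samples): output audio waveform

if __name__ == "__main__":
    x = torch.randn(1, 1, 16000)
    print(WaveformGenerator()(x).shape)     # -> torch.Size([1, 1, 16000])
```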
ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF
An electronic apparatus is provided. The electronic apparatus includes a communication interface with communication circuitry, a memory configured to store at least one instruction, and a processor. The processor is configured to receive, from an external device, first audio that the external device recognized as a wake-up word, determine whether the first audio corresponds to the wake-up word by analyzing the first audio, based on determining that the first audio does not correspond to the wake-up word, obtain a neural network model for detecting wake-up word misrecognition based on the first audio, and transmit information regarding the neural network model to the external device.
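A control-flow sketch of that apparatus, assuming a hypothetical second-stage verify_wake_word() check and a trivial template classifier standing in for the misrecognition-detection model returned to the external device; none of these names come from the patent.

```python
import numpy as np

def verify_wake_word(audio: np.ndarray) -> bool:
    """Placeholder second-stage check of the device's wake-word decision."""
    return float(np.mean(np.abs(audio))) > 0.05        # illustrative threshold only

def build_misrecognition_model(false_trigger: np.ndarray) -> dict:
    """Build a trivial template 'model' from the misrecognized audio."""
    return {"template": false_trigger / (np.linalg.norm(false_trigger) + 1e-9)}

def handle_device_audio(first_audio: np.ndarray):
    """Return a misrecognition-detection model for the external device when the
    first audio does not actually contain the wake-up word; otherwise None."""
    if verify_wake_word(first_audio):
        return None
    return build_misrecognition_model(first_audio)
```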
Shared speech processing network for multiple speech applications
A device to process speech includes a speech processing network that includes an input configured to receive audio data corresponding to audio captured by one or more microphones. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules. A first speech application module corresponds to a speaker verifier, and a second speech application module corresponds to a speech recognition network.
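A minimal sketch of the shared-front-end arrangement: one set of network layers produces a common output that is fed to both a speaker-verification head and a speech-recognition head. Dimensions and layer choices are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SharedSpeechNetwork(nn.Module):
    def __init__(self, in_dim=40, shared_dim=128, num_speakers=100, num_tokens=32):
        super().__init__()
        self.shared = nn.Sequential(                                  # shared network layers
            nn.Linear(in_dim, shared_dim), nn.ReLU(),
            nn.Linear(shared_dim, shared_dim), nn.ReLU(),
        )
        self.speaker_verifier = nn.Linear(shared_dim, num_speakers)   # first application module
        self.speech_recognizer = nn.Linear(shared_dim, num_tokens)    # second application module

    def forward(self, features: torch.Tensor):
        common = self.shared(features)          # network output, common input to both modules
        return self.speaker_verifier(common), self.speech_recognizer(common)

if __name__ == "__main__":
    frames = torch.randn(10, 40)                # 10 feature frames from microphone audio
    spk_logits, asr_logits = SharedSpeechNetwork()(frames)
    print(spk_logits.shape, asr_logits.shape)
```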
VOICE REINFORCEMENT IN MULTIPLE SOUND ZONE ENVIRONMENTS
A microphone signal is received from at least one microphone. Acoustic echo cancellation (AEC) produces an echo-cancelled microphone signal using first adaptive filters to estimate and cancel feedback that results from the environment. Acoustic feedback cancellation (AFC) produces a processed microphone signal using second adaptive filters to estimate and cancel feedback resulting from application of the reinforced voice signal within the environment. The uttered speech is reinforced in the processed microphone signal to produce the reinforced voice signal. The reinforced voice signal and the audio signal are applied to the loudspeakers. A step size of adjustment of the second adaptive filters may be increased responsive to detection of reverberation in the microphone signal. The reverberation used to control the step size of the second adaptive filters may be added artificially. This may provide multiple benefits, including improving the adjustment of the second adaptive filters and improving the sound impression of the voice.
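A hedged sketch of the step-size control idea, assuming an NLMS update for one of the second (feedback-cancellation) adaptive filters whose step size is raised when reverberation is detected or artificially added; the step-size values and detector are placeholders.

```python
import numpy as np

def nlms_step(weights: np.ndarray, x_buf: np.ndarray, mic_sample: float,
              reverb_detected: bool, mu_low: float = 0.05, mu_high: float = 0.2):
    """One NLMS update; returns (new weights, error = feedback-cancelled sample)."""
    y_hat = np.dot(weights, x_buf)                 # estimated feedback component
    err = mic_sample - y_hat                       # processed (feedback-cancelled) sample
    mu = mu_high if reverb_detected else mu_low    # larger step size under reverberation
    weights = weights + mu * err * x_buf / (np.dot(x_buf, x_buf) + 1e-9)
    return weights, err
```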
Electronic device and method of controlling thereof
An electronic device and a method for controlling the electronic device are disclosed. The electronic device of the disclosure includes a microphone, a memory storing at least one instruction, and a processor configured to execute the at least one instruction. The processor, by executing the at least one instruction, is configured to: obtain second voice data by inputting first voice data input via the microphone to a first model trained to enhance sound quality, obtain a weight by inputting the first voice data and the second voice data to a second model, and identify input data to be input to a third model using the weight.
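A sketch of the three-model pipeline this abstract outlines, assuming the weight from the second model blends the raw and enhanced voice data before the result is passed to the third model; every module and feature choice below is a placeholder.

```python
import torch
import torch.nn as nn

enhance_model = nn.Identity()                                  # stand-in for the first (enhancement) model
weight_model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())    # second model -> weight in [0, 1]
third_model = nn.Identity()                                    # stand-in for the third model

def process(first_voice: torch.Tensor) -> torch.Tensor:
    second_voice = enhance_model(first_voice)                       # enhanced voice data
    stats = torch.stack([first_voice.std(), second_voice.std()])    # illustrative summary features
    w = weight_model(stats)                                         # weight from the second model
    input_data = w * second_voice + (1 - w) * first_voice           # input data identified for the third model
    return third_model(input_data)

if __name__ == "__main__":
    print(process(torch.randn(16000)).shape)
```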
Apparatus and method for processing an audio signal using a harmonic post-filter
An apparatus for processing an audio signal having associated therewith pitch lag information and gain information includes a domain converter for converting a first domain representation of the audio signal into a second domain representation of the audio signal; and a harmonic post-filter for filtering the second domain representation of the audio signal, wherein the post-filter is based on a transfer function including a numerator and a denominator, wherein the numerator includes a gain value indicated by the gain information, and wherein the denominator includes an integer part of a pitch lag indicated by the pitch lag information and a multi-tap filter depending on a fractional part of the pitch lag.
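As a worked illustration of how such a transfer function can be structured (the form and symbols below are assumptions for illustration, not the claimed filter), a harmonic post-filter is often written with the gain in the numerator and the integer pitch lag together with a fractional-delay multi-tap filter in the denominator:

```latex
% Illustrative shape only; \alpha, g, d_{int}, d_{fr}, and b_k are assumed symbols.
H(z) = \frac{1 - \alpha\, g}{1 - \alpha\, g\, B(z, d_{\mathrm{fr}})\, z^{-d_{\mathrm{int}}}},
\qquad
B(z, d_{\mathrm{fr}}) = \sum_{k=-K}^{K} b_k(d_{\mathrm{fr}})\, z^{-k}
```

Here g stands for the gain value indicated by the gain information, d_int for the integer part of the pitch lag, and B for a multi-tap filter whose coefficients b_k depend on the fractional part d_fr of the pitch lag.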