Patent classifications
G10L19/0018
PHASE RECONSTRUCTION IN A SPEECH DECODER
Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values and then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low-bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.
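The two decoder-side ideas can be sketched together. This is a minimal illustration, not the patented method: the cosine basis and the linear phase-slope extrapolation for the higher band are assumptions chosen for clarity.

```python
import numpy as np

def reconstruct_phases(weights, num_low, num_high):
    """Sketch: rebuild lower-frequency phase values as a weighted sum of
    basis functions, then synthesize higher-frequency phases from the
    reconstructed low band. The DCT-like cosine basis and the linear
    extrapolation of the band-edge phase slope are illustrative choices."""
    k = np.arange(num_low)
    # basis[j, k] = cos(pi * j * (k + 0.5) / num_low)
    basis = np.cos(np.pi * np.outer(np.arange(len(weights)), (k + 0.5) / num_low))
    low_phases = weights @ basis                 # weighted sum of basis functions
    slope = low_phases[-1] - low_phases[-2]      # local phase slope at the band edge
    high_phases = low_phases[-1] + slope * np.arange(1, num_high + 1)
    return np.concatenate([low_phases, high_phases])

# decode 8 lower-frequency phases from 3 transmitted weights,
# then synthesize 4 higher-frequency phases from them
phases = reconstruct_phases(np.array([0.5, -0.2, 0.1]), num_low=8, num_high=4)
```

Only the basis-function weights need to be transmitted for the low band; the high band costs no bits at all, which is where the low-bitrate gain comes from.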
Audio encoding for functional interactivity
Some examples include a computing device that receives media content to distribute to a plurality of electronic devices. The computing device may receive an indication of first data to relate to the media content for distribution to the plurality of electronic devices. A portion of the media content may be decoded to enable a determination that the media content already has second data embedded in it. A psychoacoustic mask may be extracted from the media content and subtracted from the received media content to remove the embedded second data. The first data may be associated with the media content either by embedding third data in the media content or by embedding the first data in the media content.
Calibration of haptic device using sensor harness
A haptic calibration device comprises a signal generator configured to receive a subjective force value and a force location from a subjective magnitude input device. The signal generator also receives a sensor voltage value from at least one of a plurality of haptic sensors, the at least one sensor corresponding to the force location. The signal generator stores the subjective force value and the corresponding sensor voltage value in a data store. Using the data from the data store, the signal generator generates a calibration curve indicating a correspondence between subjective force values and sensor voltage values for the location where the subjective force was experienced, wherein the calibration curve is used to calibrate a haptic feedback device.
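The calibration-curve step reduces to a regression over the stored pairs. A sketch under assumptions (a simple polynomial fit and made-up sample values; the patent does not specify the curve form):

```python
import numpy as np

def build_calibration_curve(sensor_voltages, subjective_forces, degree=1):
    """Fit a curve mapping sensor voltage -> subjective force magnitude
    for one force location, from paired readings in the data store."""
    coeffs = np.polyfit(sensor_voltages, subjective_forces, degree)
    return np.poly1d(coeffs)

# paired samples for one location (hypothetical values from the data store)
volts = np.array([0.1, 0.4, 0.8, 1.2])
forces = np.array([1.0, 3.9, 8.1, 12.0])

curve = build_calibration_curve(volts, forces)
predicted_force = curve(0.4)  # expected subjective force at 0.4 V
```

The haptic feedback device can then be driven so that its sensors report the voltage corresponding to a desired subjective force, per location.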
Envelope encoding of speech signals for transmission to cutaneous actuators
A haptic communication device includes a speech signal generator configured to receive speech sounds or a textual message and generate speech signals corresponding to the speech sounds or the textual message. An envelope encoder is operably coupled to the speech signal generator to extract a temporal envelope from the speech signals. The temporal envelope represents changes in amplitude of the speech signals. Carrier signals having a periodic waveform are generated. Actuator signals are generated by encoding the changes in the amplitude of the speech signals from the temporal envelope into the carrier signals. One or more cutaneous actuators are operably coupled to the envelope encoder to generate haptic vibrations representing the speech sounds or the textual message using the actuator signals.
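The envelope-encoding pipeline above can be sketched in a few lines. Assumptions for illustration: envelope extraction by rectification plus a moving-average lowpass, a sinusoidal carrier, and arbitrary parameter values; a real encoder may use a different envelope detector and carrier waveform.

```python
import numpy as np

def encode_envelope(speech, fs, carrier_hz=250.0, smooth_ms=10.0):
    """Extract the temporal envelope (rectify + moving-average smoothing)
    and amplitude-modulate it onto a periodic carrier, producing the
    actuator signal that drives the cutaneous actuators."""
    rectified = np.abs(speech)
    win = max(1, int(fs * smooth_ms / 1000.0))
    envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
    t = np.arange(len(speech)) / fs
    carrier = np.sin(2 * np.pi * carrier_hz * t)   # periodic carrier waveform
    return envelope * carrier                      # actuator drive signal

# synthetic "speech": a 200 Hz tone with a 3 Hz amplitude modulation
fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 200 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
drive = encode_envelope(speech, fs)
```

The carrier frequency matters in practice because skin mechanoreceptors are most sensitive in roughly the 100–300 Hz range, while the envelope carries the amplitude pattern of the speech.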
Artificially generated speech for a communication session
A device for communicating with a remote device is disclosed, which includes a processor and a memory in communication with the processor. The memory includes executable instructions that, when executed, cause the processor to control the device to perform functions of establishing, via a communication network, a communication session with the remote device; capturing speech spoken by a user and generating audio data representing the captured speech; encoding the audio data for transmission to the remote device via the communication network; converting the audio data to text data representing the captured speech; and transmitting, during the communication session, the encoded audio data and the text data to the remote device via the communication network. The device can thus provide the text data representing the captured speech when the quality of the encoded audio data received by the remote device is below a predetermined level.
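The receiving side's fallback logic is simple to sketch. The packet fields and the quality metric here are hypothetical; the abstract only specifies that both streams arrive and that text is used below a quality threshold.

```python
def receive(packet, quality_threshold=0.5):
    """Fallback at the remote device: play decoded audio when the measured
    quality is acceptable, otherwise display the accompanying text data."""
    if packet["audio_quality"] >= quality_threshold:
        return ("audio", packet["audio"])
    return ("text", packet["text"])

# degraded network: quality 0.2 is below the 0.5 threshold
pkt = {"audio": b"...", "text": "hello", "audio_quality": 0.2}
mode, payload = receive(pkt)
# -> ("text", "hello")
```

Because the text is generated at the sender during the session, the fallback costs the receiver nothing beyond displaying data it has already received.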
REAL TIME DIGITAL VOICE COMMUNICATION METHOD
A communication system includes at least one first device and at least one second device, which are linked in a manner that enables data transfer with each other. The first device expresses the speech signal it receives as input in terms of energy functions representing the energy patterns, information functions representing the information patterns, and noise functions of the frames of the real speech samples; it then transfers the database indexes of these functions and the frame gain factor of each frame to the second device. The second device retrieves the functions via the indexes from a copy of the database and reconstructs the speech signal from these functions and the frame gain factors, providing it as the voice output.
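The decoder side reduces to table lookups plus a per-frame recombination. A sketch under stated assumptions: the tiny database contents and the combination rule `gain * (energy * information + noise)` are inventions for illustration, since the abstract does not specify how the three functions combine.

```python
import numpy as np

FRAME = 160  # samples per frame (assumed)

# copy of the encoder's database, indexed by the transmitted indexes
copy_database = {
    "energy": [np.hanning(FRAME), np.ones(FRAME)],
    "info":   [np.sin(2 * np.pi * 5 * np.arange(FRAME) / FRAME)],
    "noise":  [np.zeros(FRAME)],
}

def decode_frame(energy_idx, info_idx, noise_idx, gain):
    """Look up the three functions by index and rebuild one speech frame.
    The combination rule below is an assumption for illustration."""
    e = copy_database["energy"][energy_idx]
    i = copy_database["info"][info_idx]
    n = copy_database["noise"][noise_idx]
    return gain * (e * i + n)

# one transmitted frame: three indexes plus the frame gain factor
frame = decode_frame(energy_idx=0, info_idx=0, noise_idx=0, gain=0.8)
```

Since only indexes and one gain per frame cross the link, the bitrate is decoupled from the waveform complexity stored in the shared database.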
DURATION INFORMED ATTENTION NETWORK (DURIAN) FOR AUDIO-VISUAL SYNTHESIS
A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A spectrogram frame is generated based on the duration model. An audio waveform is generated based on the spectrogram frame. Video information is generated based on the audio waveform. The audio waveform is provided as an output along with a corresponding video.
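The duration-informed step between the text components and the spectrogram frames can be sketched as a simple expansion: each component's encoding is replicated for its predicted number of frames. The component/duration values below are made up for illustration.

```python
def expand_by_duration(component_encodings, durations):
    """Replicate each text component's encoding by its predicted duration
    (in frames), yielding a frame-aligned sequence for spectrogram
    generation."""
    frames = []
    for enc, dur in zip(component_encodings, durations):
        frames.extend([enc] * dur)  # one copy per predicted frame
    return frames

# hypothetical components and durations from the duration model
frames = expand_by_duration(["h", "e", "l", "o"], [2, 3, 1, 4])
# -> ['h', 'h', 'e', 'e', 'e', 'l', 'o', 'o', 'o', 'o']
```

Making the alignment explicit this way is what lets the downstream video generation stay synchronized with the audio waveform: both derive from the same frame timeline.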
HIERARCHICAL ENCODER FOR SPEECH CONVERSION SYSTEM
A speech conversion system is described that includes a hierarchical encoder and a decoder. The system may comprise a processor and memory storing instructions executable by the processor. The instructions may include instructions to: using a second recurrent neural network (RNN) (GRU1) and a first set of encoder vectors derived from a spectrogram as input to the second RNN, determine a second concatenated sequence; determine a second set of encoder vectors by doubling a stack height and halving a length of the second concatenated sequence; using the second set of encoder vectors, determine a third set of encoder vectors; and decode the third set of encoder vectors using an attention block.
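The "double the stack height, halve the length" operation is the pyramid step familiar from hierarchical sequence encoders: adjacent time steps are paired, concatenating their feature vectors. A minimal sketch (the tensor shapes are assumptions; the claim does not fix them):

```python
import numpy as np

def double_stack_halve_length(vectors):
    """Pyramid step: pair adjacent time steps, doubling the feature
    ("stack") height while halving the sequence length."""
    T, D = vectors.shape
    assert T % 2 == 0, "sequence length must be even to pair steps"
    # row-major reshape concatenates each pair of consecutive rows
    return vectors.reshape(T // 2, 2 * D)

seq = np.arange(8 * 4).reshape(8, 4).astype(float)  # 8 time steps, dim 4
out = double_stack_halve_length(seq)                # 4 time steps, dim 8
```

Each pyramid level halves the temporal resolution the attention block must cover, which is the usual motivation for stacking such layers between RNNs.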