G10L25/69

PHASE RECONSTRUCTION IN A SPEECH DECODER

Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.

PHASE RECONSTRUCTION IN A SPEECH DECODER

Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.

SPEAKER RECOGNITION WITH QUALITY INDICATORS

Embodiments described herein provide for a machine-learning architecture for modeling quality measures for enrollment signals. Modeling these enrollment signals enables the machine-learning architecture to identify deviations from expected or ideal enrollment signal in future test phase calls. These differences can be used to generate quality measures for the various audio descriptors or characteristics of audio signals. The quality measures can then be fused at the score-level with the speaker recognition's embedding comparisons for verifying the speaker. Fusing the quality measures with the similarity scoring essentially calibrates the speaker recognition's outputs based on the realities of what is actually expected for the enrolled caller and what was actually observed for the current inbound caller.

Single-sided speech quality measurement
09786300 · 2017-10-10 · ·

A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.

Single-sided speech quality measurement
09786300 · 2017-10-10 · ·

A non-intrusive speech quality estimation technique is based on statistical or probability models such as Gaussian Mixture Models (“GMMs”). Perceptual features are extracted from the received speech signal and assessed by an artificial reference model formed using statistical models. The models characterize the statistical behavior of speech features. Consistency measures between the input speech features and the models are calculated to form indicators of speech quality. The consistency values are mapped to a speech quality score using a mapping optimized using machine learning algorithms, such as Multivariate Adaptive Regression Splines (“MARS”). The technique provides competitive or better quality estimates relative to known techniques while having lower computational complexity.

COMMUNICATION TERMINAL DEVICE, COMMUNICATION SYSTEM, COMMUNICATION METHOD, RECORDING MEDIUM FOR TRANSMITTING SPEECH DATA
20170243591 · 2017-08-24 · ·

A microphone receives speech data to be transmitted. A business wireless communication unit transmits the received speech data. A processing unit causes a storage unit to store the speech data and time information for identifying blocks produced by dividing the received speech data into a plurality of blocks. The business wireless communication unit receives a request signal for requesting transmission of the speech data stored in the storage unit, the request signal including the time information. The processing unit acquires from the storage unit the speech data for the block identified based on the time information included in the received request signal, and causes the business wireless communication unit to transmit the acquired speech data.

REALTIME ASSESSMENT OF TTS QUALITY USING SINGLE ENDED AUDIO QUALITY MEASUREMENT
20170278506 · 2017-09-28 ·

A system and method of regulating speech output by a text-to-speech (TTS) system includes: evaluating speech that has been converted from text using an initial speech quality test before presentation to a user; applying a classification test to the evaluated speech if the evaluated speech falls below a threshold based on the initial speech quality test; generating an abnormal speech classification for the evaluated speech; and applying a corrective action to the evaluated speech based on the abnormal speech classification.

Maintaining Audio Communication in a Congested Communication Channel

The invention relates to a communication system and a method of maintaining audio communication in a congested communication channel currently bearing the transmission of speech in audio communication between a sender side and a receiver side, the communication channel having at least one signaling channel and at least one payload channel having a quality of service. During the audio communication the quality of service of the payload channel is monitored. If the quality of service of the payload channel is below a threshold the speech at the respective sender side is converted to text; and transmitted over the retained communication channel to the respective receiver side. The text may be converted back to speech at the receiver side.

Maintaining Audio Communication in a Congested Communication Channel

The invention relates to a communication system and a method of maintaining audio communication in a congested communication channel currently bearing the transmission of speech in audio communication between a sender side and a receiver side, the communication channel having at least one signaling channel and at least one payload channel having a quality of service. During the audio communication the quality of service of the payload channel is monitored. If the quality of service of the payload channel is below a threshold the speech at the respective sender side is converted to text; and transmitted over the retained communication channel to the respective receiver side. The text may be converted back to speech at the receiver side.

SPEECH SIGNAL PROCESSING CIRCUIT

A speech-signal-processing-circuit configured to receive a time-frequency-domain-reference-speech-signal and a time-frequency-domain-degraded-speech-signal. The time-frequency-domain-reference-speech-signal comprises: an upper-band-reference-component with frequencies that are greater than a frequency-threshold-value; and a lower-band-reference-component with frequencies that are less than the frequency-threshold-value. The time-frequency-domain-degraded-speech-signal comprises: an upper-band-degraded-component with frequencies that are greater than the frequency-threshold-value; and a lower-band-degraded-component with frequencies that are less than the frequency-threshold-value. The speech-signal-processing-circuit comprises: a disturbance calculator configured to determine one or more SBR-features based on the time-frequency-domain-reference-speech-signal and the time-frequency-domain-degraded-speech-signal by: for each of a plurality of frames: determining a reference-ratio based on the ratio of (i) the upper-band-reference-component to (ii) the lower-band-reference-component; determining a degraded-ratio based on the ratio of (i) the upper-band-degraded-component to (ii) the lower-band-degraded-component; and determining a spectral-balance-ratio based on the ratio of the reference-ratio to the degraded-ratio; and (ii) determining the one or more SBR-features based on the spectral-balance-ratio for the plurality of frames.