G10L21/0388

ENCODING DEVICE AND METHOD, DECODING DEVICE AND METHOD, AND PROGRAM
20170270940 · 2017-09-21 · ·

The present technology relates to an encoding device and method, a decoding device and method, and a program that enable high-quality sound to be obtained even in a resource-poor setting.

A demultiplexer demultiplexes a supplied code string to obtain a quantized low-band spectrum, a spectral characteristic code, and one or more quantized expansion coefficients. Depending on the spectral characteristic code, the code string includes either a single quantized expansion coefficient or one quantized expansion coefficient for each band in the high band. A spectral inverse quantization unit obtains the low-band spectrum by inversely quantizing the quantized low-band spectrum. An expansion coefficient inverse quantization unit obtains the expansion coefficient(s) by inversely quantizing the quantized expansion coefficient(s). An expanded spectrum generation unit generates an expanded spectrum from the low-band spectrum and the expansion coefficient(s), in a manner that depends on the spectral characteristic code. An IMDCT unit generates a band-expanded time-series signal from the low-band spectrum and the expanded spectrum. The present technology can be applied to decoding devices.
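
The expanded spectrum generation step can be illustrated with a minimal sketch. The abstract does not state how the expanded spectrum is derived from the low-band spectrum, so the copy-and-scale scheme below is an assumption, as are the function name `generate_expanded_spectrum` and the parameter `num_bands`. A one-element coefficient list stands in for the case where the spectral characteristic code signals a single shared coefficient.

```python
def generate_expanded_spectrum(low_band, coeffs, num_bands):
    """Build a high-band spectrum by scaling copies of low-band segments.

    Assumption (not from the abstract): the high band is a band-wise scaled
    copy of the low band. `coeffs` holds either one gain shared by every
    high band or exactly one gain per high band.
    """
    if len(coeffs) == 1:
        # Single-coefficient case: reuse the same gain for every band.
        coeffs = list(coeffs) * num_bands
    if len(coeffs) != num_bands:
        raise ValueError("expected 1 or num_bands expansion coefficients")
    if len(low_band) % num_bands != 0:
        raise ValueError("low_band length must be divisible by num_bands")

    band_size = len(low_band) // num_bands
    expanded = []
    for b in range(num_bands):
        segment = low_band[b * band_size:(b + 1) * band_size]
        expanded.extend(coeffs[b] * x for x in segment)
    return expanded


shared = generate_expanded_spectrum([1.0, 2.0, 3.0, 4.0], [0.5], 2)
per_band = generate_expanded_spectrum([1.0, 2.0, 3.0, 4.0], [0.5, 0.25], 2)
```

In a full decoder the low-band spectrum and the expanded spectrum would then be passed together to the IMDCT, which this sketch omits.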

SPEECH SIGNAL PROCESSING CIRCUIT

A speech-signal-processing-circuit is configured to receive a time-frequency-domain-reference-speech-signal and a time-frequency-domain-degraded-speech-signal. The time-frequency-domain-reference-speech-signal comprises: an upper-band-reference-component with frequencies that are greater than a frequency-threshold-value; and a lower-band-reference-component with frequencies that are less than the frequency-threshold-value. The time-frequency-domain-degraded-speech-signal comprises: an upper-band-degraded-component with frequencies that are greater than the frequency-threshold-value; and a lower-band-degraded-component with frequencies that are less than the frequency-threshold-value. The speech-signal-processing-circuit comprises a disturbance calculator configured to determine one or more SBR-features based on the time-frequency-domain-reference-speech-signal and the time-frequency-domain-degraded-speech-signal by: (a) for each of a plurality of frames: determining a reference-ratio based on the ratio of (i) the upper-band-reference-component to (ii) the lower-band-reference-component; determining a degraded-ratio based on the ratio of (i) the upper-band-degraded-component to (ii) the lower-band-degraded-component; and determining a spectral-balance-ratio based on the ratio of the reference-ratio to the degraded-ratio; and (b) determining the one or more SBR-features based on the spectral-balance-ratio for the plurality of frames.
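
The per-frame ratios described above can be sketched as follows. Several details are assumptions rather than part of the abstract: band components are measured as summed squared magnitudes, the bin at `threshold_bin` is assigned to the upper band, a small `eps` guards against division by zero, and the mean and maximum are used as example SBR-features. All function names are hypothetical.

```python
def band_energy(frame, start, stop):
    """Sum of squared magnitudes over bins start..stop-1 of one frame."""
    return sum(abs(x) ** 2 for x in frame[start:stop])


def spectral_balance_ratios(reference, degraded, threshold_bin, eps=1e-12):
    """Return one spectral-balance-ratio per frame.

    reference, degraded: lists of frames, each a list of spectral bins.
    threshold_bin: first bin of the upper band (assumed convention).
    """
    ratios = []
    for ref, deg in zip(reference, degraded):
        ref_ratio = (band_energy(ref, threshold_bin, len(ref)) + eps) / (
            band_energy(ref, 0, threshold_bin) + eps)
        deg_ratio = (band_energy(deg, threshold_bin, len(deg)) + eps) / (
            band_energy(deg, 0, threshold_bin) + eps)
        ratios.append(ref_ratio / deg_ratio)
    return ratios


def sbr_features(ratios):
    """Example aggregate features over all frames (an assumed choice)."""
    return {"mean": sum(ratios) / len(ratios), "max": max(ratios)}


# The degraded frame has lost upper-band energy, so the ratio exceeds 1.
ratios = spectral_balance_ratios([[1, 1, 1, 1]], [[1, 1, 0.5, 0.5]], 2)
features = sbr_features(ratios)
```

A ratio near 1 indicates that the degraded signal keeps the same upper-to-lower balance as the reference; values above 1 indicate relative loss of upper-band content.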

DENOISING A SIGNAL
20170270945 · 2017-09-21 ·

A computer-implemented method according to one embodiment includes: creating a clean dictionary utilizing a clean signal; creating a noisy dictionary utilizing a first noisy signal; determining a time-varying projection utilizing the clean dictionary and the noisy dictionary; and denoising a second noisy signal utilizing the time-varying projection.

SYSTEM AND METHOD FOR PERFORMING AUTOMATIC GAIN CONTROL USING AN ACCELEROMETER IN A HEADSET

A method of performing automatic gain control (AGC) using an accelerometer in a headset starts with an accelerometer-based voice activity detector (VADa) generating a VADa output based on (i) acoustic signals received from at least one microphone included in a pair of earbuds and (ii) data output by at least one accelerometer that is included in the pair of earbuds. The at least one accelerometer detects vibration of the user's vocal cords. The headset includes the pair of earbuds. An AGC controller then performs AGC on the acoustic signals from the at least one microphone based on the VADa output. Other embodiments are also described.
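
One way to read "AGC based on the VADa output" is that the gain adapts only while the detector reports speech. The sketch below assumes exactly that; the abstract does not specify the control law, so the RMS target, the smoothing step, and the name `vad_gated_agc` are illustrative. The VADa decision itself, which combines microphone and accelerometer data, is taken as a given list of booleans.

```python
import math


def vad_gated_agc(frames, vad_flags, target_rms=0.1, step=0.5, gain=1.0):
    """Apply a gain that is updated only on frames flagged as speech.

    frames: list of frames, each a list of samples.
    vad_flags: one boolean per frame (assumed VADa output).
    Returns the scaled frames and the final gain.
    """
    out = []
    for frame, is_speech in zip(frames, vad_flags):
        if is_speech:
            rms = math.sqrt(sum(x * x for x in frame) / len(frame))
            if rms > 0:
                desired = target_rms / rms
                # Move part of the way toward the desired gain.
                gain += step * (desired - gain)
        # Non-speech frames reuse the last gain, so noise is not boosted.
        out.append([gain * x for x in frame])
    return out, gain


scaled, final_gain = vad_gated_agc(
    [[0.05, -0.05], [0.5, -0.5]], [True, False])
```

Freezing the gain during non-speech frames is a common design choice because it avoids amplifying background noise between utterances.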

Method and apparatus for encoding and decoding high frequency for bandwidth extension
09761238 · 2017-09-12 · ·

Disclosed are a method and apparatus for encoding and decoding a high frequency for bandwidth extension. The method includes: estimating a weight; and generating a high frequency excitation signal by applying the weight between random noise and a decoded low frequency spectrum.
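
The mixing step can be sketched as a weighted sum. The abstract only says the weight is applied "between" random noise and the decoded low-frequency spectrum, so the linear blend below, the Gaussian noise source, and the name `high_frequency_excitation` are assumptions. Weight estimation is not shown.

```python
import random


def high_frequency_excitation(low_spectrum, weight, rng=None):
    """Blend random noise with the decoded low-frequency spectrum.

    weight = 0 returns the low-frequency spectrum unchanged;
    weight = 1 returns pure noise (assumed linear interpolation).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    rng = rng or random.Random()
    return [weight * rng.gauss(0.0, 1.0) + (1.0 - weight) * x
            for x in low_spectrum]


tonal = high_frequency_excitation([0.2, -0.4, 0.6], 0.0)
mixed = high_frequency_excitation([0.2, -0.4, 0.6], 0.3, random.Random(0))
```

A larger weight would suit noise-like high-band content, while a smaller weight preserves the harmonic structure carried by the low band.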

Speech processing device, teleconferencing device, speech processing system, and speech processing method
11398220 · 2022-07-26 · ·

A speech processing method executes at least one of first speech processing and second speech processing. The first speech processing identifies a language based on speech, performs signal processing according to the identified language, and transmits the processed speech to a far-end side. The second speech processing identifies a language based on speech, receives the speech from the far-end side, and performs signal processing on the received speech according to the identified language.
