G10L2019/0012

System and method to correct for packet loss in ASR systems

A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.
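
As a rough illustration of two of the ideas named in the abstract above (replacing a lost frame with a best estimate from previous frames, and normalizing recognition scores so lost frames do not dominate), here is a minimal Python sketch. The function names, the `None` marker for a lost frame, the averaging window, and the per-frame score list are all assumptions made for illustration; the abstract does not specify the estimator, and the use of acoustic models in the estimate is not modelled here.

```python
def conceal_lost_frames(frames, context=2):
    """Replace lost frames (None) with the mean of up to `context` preceding frames.

    Earlier estimates count as preceding frames. A lost frame with no usable
    predecessor is left as None.
    """
    out = []
    for frame in frames:
        if frame is None:
            recent = [f for f in out[-context:] if f is not None]
            if recent:
                # Component-wise average of the most recent usable frames.
                frame = [sum(col) / len(recent) for col in zip(*recent)]
        out.append(frame)
    return out


def normalized_score(frame_scores, lost_mask):
    """Average per-frame log-scores over the frames that were actually received."""
    kept = [s for s, lost in zip(frame_scores, lost_mask) if not lost]
    return sum(kept) / len(kept) if kept else float("-inf")


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], None, [5.0, 6.0]]
    print(conceal_lost_frames(feats))
    print(normalized_score([-2.0, -4.0, -100.0, -6.0], [False, False, True, False]))
```

Because both steps operate only on the incoming frames and scores, nothing about the trained acoustic models changes, which matches the abstract's point that behavior without packet loss is unaffected.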

Signal filtering

In methods and systems for filtering an information input signal, a system may have: a first filter unit that filters an input signal at an initial subinterval of a current update interval according to parameters associated with the preceding update interval, the parameters being scaled by a first scaling factor changing towards 0; and a second filter unit that filters a second filter input signal, based on the output of the first filter unit, at the initial subinterval, according to parameters associated with the current update interval, the parameters being scaled by a second scaling factor changing from 0, or a value close to 0, toward a value more distant from 0.
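
The two-unit arrangement can be sketched in Python as a cascade in which the old parameters fade out while the new ones fade in. This is a sketch under stated assumptions, not the claimed implementation: the filter form (identity plus a scaled FIR term over past samples), the linear ramps, and the names `scaled_fir` and `transition_filter` are hypothetical choices made for illustration.

```python
def scaled_fir(x, history, coeffs, scales):
    """y[n] = x[n] + scales[n] * sum_k coeffs[k] * x[n-1-k].

    `history` holds the samples preceding x and must have at least
    len(coeffs) entries. A scale of 0 makes the unit an identity.
    """
    buf = list(history) + list(x)
    off = len(history)
    out = []
    for n, s in enumerate(scales):
        acc = sum(c * buf[off + n - 1 - k] for k, c in enumerate(coeffs))
        out.append(buf[off + n] + s * acc)
    return out


def transition_filter(x, hist_in, hist_mid, old_coeffs, new_coeffs):
    """Filter the initial subinterval `x` with two cascaded units."""
    n = len(x)
    fade_out = [1 - i / n for i in range(n)]  # first factor: 1 toward 0
    fade_in = [i / n for i in range(n)]       # second factor: 0 toward 1
    # First unit: parameters of the preceding update interval, fading out.
    mid = scaled_fir(x, hist_in, old_coeffs, fade_out)
    # Second unit: parameters of the current update interval, fading in,
    # applied to the output of the first unit.
    return scaled_fir(mid, hist_mid, new_coeffs, fade_in)


if __name__ == "__main__":
    print(transition_filter([1.0, 1.0, 1.0, 1.0], [0.0], [0.0], [0.5], [0.0]))
```

Cascading the two scaled units, rather than switching parameters abruptly at the interval boundary, avoids a discontinuity in the output when the parameters change.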

Method for speech coding, method for speech decoding and their apparatuses
09852740 · 2017-12-26

High-quality speech is reproduced from a small amount of data in speech coding and decoding, which compress a speech signal into a digital signal and decode it. In a speech coding method based on code-excited linear prediction (CELP), the noise level of the speech in the coding period concerned is evaluated using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks are used depending on the evaluation result.

ENCODING DEVICE, DECODING DEVICE, AND COMMUNICATION SYSTEM FOR EXTENDING VOICE BAND
20170330584 · 2017-11-16

A first encoding unit generates a first encoded signal by encoding a component within a first band in a voice signal. A frequency shifting unit shifts the frequency of a component within a second band in the voice signal, the second band having a frequency higher than that of the first band, to the frequency of a component within the first band. A second encoding unit generates a second encoded signal by encoding the component whose frequency has been shifted in the frequency shifting unit. An output unit outputs both the first encoded signal generated in the first encoding unit and the second encoded signal generated in the second encoding unit.

SYSTEM AND METHOD TO CORRECT FOR PACKET LOSS IN ASR SYSTEMS

A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.

Signal filtering

In methods and systems for filtering an information input signal, a system may have: a first filter unit that filters an input signal at an initial subinterval of a current update interval according to parameters associated with the preceding update interval, the parameters being scaled by a first scaling factor changing towards 0; and a second filter unit that filters a second filter input signal, based on the output of the first filter unit, at the initial subinterval, according to parameters associated with the current update interval, the parameters being scaled by a second scaling factor changing from 0, or a value close to 0, toward a value more distant from 0.

System and method to correct for packet loss using hidden Markov models in ASR systems

A system and method are presented for the correction of packet loss in audio in automatic speech recognition (ASR) systems. Packet loss correction, as presented herein, occurs at the recognition stage without modifying any of the acoustic models generated during training. The behavior of the ASR engine in the absence of packet loss is thus not altered. To accomplish this, the actual input signal may be rectified, the recognition scores may be normalized to account for signal errors, and a best-estimate method using information from previous frames and acoustic models may be used to replace the noisy signal.

Speech recognition method based on artificial intelligence and terminal

Embodiments of the present disclosure provide a speech recognition method based on artificial intelligence, and a terminal. The method includes: obtaining speech data to be recognized; processing the speech data using a trained sub-band energy normalized acoustic model to determine a normalized energy feature corresponding to each time-frequency unit in the speech data; and determining text data corresponding to the speech data according to the normalized energy feature of each time-frequency unit.
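
To make "normalized energy feature per time-frequency unit" concrete, the Python sketch below divides each sub-band energy by a running average of that sub-band's past energy. This is a hand-set normalization chosen for illustration, not the trained model the abstract describes; the function name and the `smoothing`, `alpha`, and `eps` parameters are assumptions.

```python
def normalize_subband_energy(energy, smoothing=0.5, alpha=1.0, eps=1e-6):
    """Normalize energy[t][f] by a per-sub-band running average over time.

    smoothed[t][f] = (1 - smoothing) * smoothed[t-1][f] + smoothing * energy[t][f]
    normalized[t][f] = energy[t][f] / (eps + smoothed[t][f]) ** alpha
    The running average is initialized with the first frame.
    """
    smoothed = None
    out = []
    for frame in energy:
        if smoothed is None:
            smoothed = list(frame)
        else:
            smoothed = [(1 - smoothing) * m + smoothing * e
                        for m, e in zip(smoothed, frame)]
        out.append([e / (eps + m) ** alpha for e, m in zip(frame, smoothed)])
    return out


if __name__ == "__main__":
    quiet = [[1.0, 2.0], [3.0, 2.0]]
    loud = [[10.0, 20.0], [30.0, 20.0]]
    # With alpha = 1 and eps = 0, a constant gain on the input cancels out.
    print(normalize_subband_energy(quiet, eps=0.0))
    print(normalize_subband_energy(loud, eps=0.0))
```

The design point this illustrates is that a feature normalized per sub-band is less sensitive to overall loudness than the raw energy.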

Apparatus and method for processing soundfield data

An apparatus for processing soundfield data is provided. The soundfield data defines a soundfield within a spatial reproduction region comprising at least one bright zone and at least one quiet zone. The apparatus comprises an applicator configured to apply a spatially continuously varying weighting function to the soundfield data in order to obtain weighted soundfield data defining a weighted soundfield, wherein the spatially continuously varying weighting function is configured to enhance the soundfield in at least one of the bright zone and the quiet zone.

Variable length coding of indices and bit scheduling in a pyramid vector quantizer
10366698 · 2019-07-30

Certain embodiments enable improved encoding and decoding of a vector of coefficients by associating a vector element of a signed pyramid with an encoded value that includes a first portion and a second portion, where the first portion identifies a corresponding vector element of an unsigned pyramid and the second portion characterizes sign values for nonzero components of the vector element of the signed pyramid. As a result, computational constraints such as word size apply to the unsigned pyramid instead of the signed pyramid. The smaller size of the unsigned pyramid enables extending the range of signed pyramid parameters that are operable within the computational constraints.
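
The split into an unsigned-pyramid index plus sign bits can be sketched in Python as follows. The lexicographic ranking, the sign-bit convention (1 for negative), and all function names are assumptions made for illustration; the abstract does not state which enumeration the embodiments use. Here a pyramid is taken to be the set of integer vectors of length n whose absolute values sum to k, and the unsigned pyramid is its non-negative subset.

```python
from functools import lru_cache
from math import comb


def split_signs(vec):
    """Return (magnitudes, sign bits for the nonzero components; 1 = negative)."""
    return [abs(v) for v in vec], [1 if v < 0 else 0 for v in vec if v != 0]


def join_signs(magnitudes, sign_bits):
    """Inverse of split_signs."""
    bits = iter(sign_bits)
    return [(-m if next(bits) else m) if m else 0 for m in magnitudes]


def unsigned_rank(magnitudes):
    """Lexicographic index of a non-negative vector among those with the same sum."""
    n, remaining, rank = len(magnitudes), sum(magnitudes), 0
    for i, x in enumerate(magnitudes[:-1]):
        tail = n - i - 1  # number of components after position i
        for v in range(x):
            # Vectors with a smaller value v here: the other `tail`
            # components must sum to remaining - v.
            rank += comb(remaining - v + tail - 1, tail - 1)
        remaining -= x
    return rank


def unsigned_size(n, k):
    """Number of non-negative integer vectors of length n summing to k."""
    return comb(n + k - 1, k)


@lru_cache(maxsize=None)
def signed_size(n, k):
    """Number of integer vectors of length n whose absolute values sum to k."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return signed_size(n - 1, k) + signed_size(n, k - 1) + signed_size(n - 1, k - 1)


if __name__ == "__main__":
    mags, signs = split_signs([2, 0, -1])
    print(unsigned_rank(mags), signs, join_signs(mags, signs))
    print(unsigned_size(3, 3), signed_size(3, 3))
```

For n = 3 and k = 3 the unsigned pyramid has 10 elements against 38 for the signed one, which illustrates why an index into the unsigned pyramid fits a given word size for a wider range of parameters.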