G10L25/12

System and method for continuous media segment identification
11272226 · 2022-03-08

This invention provides a means to identify unknown media programming using the audio component of that programming. The invention extracts audio information from the media received by consumer electronic devices such as smart TVs and TV set-top boxes, then conveys that information to a remote server, which in turn identifies the unknown audio by testing it against a database of known audio segment information. The system identifies unknown media programming in real time so that time-sensitive services can be offered, such as interactive television applications that provide contextually related information, or television advertisement substitution. Other uses include tracking media consumption, among many other services.

ENCODING DEVICE AND ENCODING METHOD

This encoding device is able to encode an S signal efficiently in MS prediction encoding. An M signal encoding unit generates first encoding information by encoding a sum signal indicating a sum of a left channel signal and a right channel signal that constitute a stereo signal. An energy difference calculation unit calculates a prediction parameter for predicting a difference signal indicating a difference between the left channel signal and the right channel signal by using a parameter regarding an energy difference between the left channel signal and the right channel signal. An entropy encoding unit generates second encoding information by encoding the prediction parameter.
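The relationship between the energy difference and the prediction parameter can be illustrated with a minimal sketch. It assumes, purely for illustration, that the left and right channels are in-phase scaled copies of one source, so the difference signal is a constant multiple of the sum signal; the function names and the closed-form gain are hypothetical and are not taken from the abstract, and entropy encoding of the parameter is omitted.

```python
import math

def ms_transform(left, right):
    """Return the sum (M) and difference (S) signals of a stereo pair."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def prediction_gain_from_energy(left, right):
    """Hypothetical prediction parameter derived only from channel energies.

    Assumption: L = gL * x and R = gR * x for a common source x, so that
    S / M = (gL - gR) / (gL + gR), and gL, gR are proportional to the
    square roots of the channel energies.
    """
    amp_l = math.sqrt(sum(x * x for x in left))
    amp_r = math.sqrt(sum(x * x for x in right))
    if amp_l + amp_r == 0.0:
        return 0.0  # silent frame: nothing to predict
    return (amp_l - amp_r) / (amp_l + amp_r)

def predict_side(mid, alpha):
    """Predict the difference signal from the sum signal."""
    return [alpha * m for m in mid]

source = [1.0, -2.0, 0.5, 3.0]
left = [2.0 * x for x in source]
right = [1.0 * x for x in source]
mid, side = ms_transform(left, right)
alpha = prediction_gain_from_energy(left, right)
residual = [s - p for s, p in zip(side, predict_side(mid, alpha))]
```

Under this assumption the residual is essentially zero, so only the single parameter `alpha` would need to be transmitted for the S signal; real stereo content would leave a non-zero residual.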

PRONUNCIATION CONVERSION APPARATUS, PITCH MARK TIMING EXTRACTION APPARATUS, METHODS AND PROGRAMS FOR THE SAME

Provided is a system which allows a learner who is a non-native speaker of a given language to intuitively improve pronunciation of the language. A pronunciation conversion apparatus includes a conversion section which converts a first feature value corresponding to a first speech signal, obtained when a first speaker who speaks a given language as his/her native language speaks another language, such that the first feature value approaches a second feature value corresponding to a second speech signal, obtained when a second speaker who speaks the other language as his/her native language speaks the other language. Each of the first feature value and the second feature value is a feature value capable of representing a difference in pronunciation, and a speech signal obtained from the first feature value after the conversion is presented to the first speaker.

Method of error concealment, and associated device

In an embodiment, a method includes: receiving an audio frame; decomposing the received audio frame into M sub-band pulse-code modulation (PCM) audio frames, where M is a positive integer number; predicting a PCM sample of one sub-band PCM audio frame of the M sub-band PCM audio frames; comparing the predicted PCM sample with a corresponding received PCM sample to generate a prediction error sample; comparing an instantaneous absolute value of the prediction error sample with a threshold; and replacing the corresponding received PCM sample with a value based on the predicted PCM sample when the instantaneous absolute value of the prediction error sample is greater than the threshold.
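The predict, compare, and replace steps can be sketched for a single sub-band as follows. This is only an illustration under stated assumptions: the decomposition into M sub-bands is omitted, the predictor is a simple linear extrapolation from the two previous output samples (the abstract does not specify a predictor), and `conceal_subband` and `threshold` are hypothetical names.

```python
def conceal_subband(samples, threshold):
    """Conceal outlier samples in one sub-band PCM frame.

    Each sample is predicted from the two previous output samples
    (linear extrapolation, an assumed predictor). If the absolute
    prediction error exceeds the threshold, the received sample is
    replaced with the predicted value.
    """
    out = []
    for n, received in enumerate(samples):
        if n < 2:
            out.append(received)  # not enough history to predict yet
            continue
        predicted = 2 * out[-1] - out[-2]
        error = received - predicted
        if abs(error) > threshold:
            out.append(predicted)  # treat as corrupted: conceal
        else:
            out.append(received)   # accept the received sample
    return out

frame = [0, 1, 2, 100, 4, 5]
concealed = conceal_subband(frame, threshold=10)
```

Predicting from the already-concealed output rather than from the raw input keeps a single corrupted sample from contaminating the predictions that follow it.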

Acoustic feature extractor selected according to status flag of frame of acoustic signal

A method, a computer system, and a computer program product for adaptively selecting an acoustic feature extractor in an Artificial Intelligence system are provided. The present invention may include acquiring a frame of an acoustic signal. The present invention may include checking a status of a flag to be used to indicate a proper acoustic feature extractor to be selected. The present invention may include processing the frame of the acoustic signal by the selected acoustic feature extractor indicated by the checked status. The present invention may include determining, based on data generated in the processing of the frame of the acoustic signal, an actual status of the frame of the acoustic signal. The present invention may include updating the status of the flag according to the actual status.
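The acquire, check, process, determine, and update loop can be sketched as below. The two statuses ("quiet" and "active"), the two extractors, and the energy threshold are all hypothetical choices made for illustration; the abstract does not say which statuses or extractors are used.

```python
def mean_energy(frame):
    return sum(x * x for x in frame) / len(frame)

def extract_light(frame):
    """Cheap extractor (assumed): energy only."""
    return {"energy": mean_energy(frame)}

def extract_full(frame):
    """Richer extractor (assumed): energy plus zero-crossing count."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return {"energy": mean_energy(frame), "zero_crossings": crossings}

# Hypothetical mapping from flag status to extractor.
EXTRACTORS = {"quiet": extract_light, "active": extract_full}

def process_stream(frames, energy_threshold=0.01, initial_flag="quiet"):
    """Process frames, selecting the extractor from the current flag."""
    flag = initial_flag
    results = []
    for frame in frames:
        extractor = EXTRACTORS[flag]          # check the flag, select extractor
        features = extractor(frame)           # process the frame
        results.append((flag, features))
        # Determine the actual status from data produced during processing,
        # then update the flag for the next frame.
        flag = "active" if features["energy"] > energy_threshold else "quiet"
    return results

frames = [[0.0] * 4, [1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0], [0.0] * 4]
results = process_stream(frames)
flags_used = [flag for flag, _ in results]
```

In this sketch the flag lags the signal by one frame: the first active frame is still handled by the light extractor, and the flag is corrected only after that frame has been processed.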

PHASE RECONSTRUCTION IN A SPEECH DECODER

Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.
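A decoder-side reconstruction of phase values as a weighted sum of basis functions might look like the following sketch. The sine-shaped basis, the function names, and the weights are assumptions for illustration only; the abstract does not state which basis functions are used, and the synthesis of higher-frequency phase values from lower-frequency ones is not shown.

```python
import math

def basis(j, k, count):
    """Hypothetical j-th basis function evaluated at phase index k."""
    return math.sin(math.pi * j * (k + 0.5) / count)

def reconstruct_phases(weights, count):
    """Rebuild `count` phase values from a short list of decoded weights."""
    return [
        sum(w * basis(j, k, count) for j, w in enumerate(weights, start=1))
        for k in range(count)
    ]

# Two decoded weights stand in for eight transmitted phase values.
phases = reconstruct_phases([0.8, -0.3], count=8)
```

The potential bitrate saving comes from sending a few weights instead of every phase value, at the cost of an approximation whose accuracy depends on how well the chosen basis fits the true phase contour.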

LINEAR PREDICTION ANALYSIS DEVICE, METHOD, PROGRAM, AND STORAGE MEDIUM

An autocorrelation calculation unit 21 calculates an autocorrelation R_o(i) from an input signal. A prediction coefficient calculation unit 23 performs linear prediction analysis by using a modified autocorrelation R'_o(i) obtained by multiplying the autocorrelation R_o(i) by a coefficient w_o(i). It is assumed here that, for at least some orders i, the coefficient w_o(i) corresponding to the order i increases monotonically with an increase in a value that is negatively correlated with the fundamental frequency of the input signal in the current frame or a past frame.
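The pipeline of autocorrelation, coefficient multiplication, and linear prediction analysis can be sketched as follows. The pitch period is used here as the value negatively correlated with the fundamental frequency, and the Gaussian-shaped coefficient with its constant `alpha` is a hypothetical choice that merely satisfies the stated monotonic relationship; the Levinson-Durbin recursion is a standard way to solve for the prediction coefficients, not necessarily the one used in the patent.

```python
import math

def autocorrelation(signal, max_order):
    """R(i) for i = 0..max_order."""
    n = len(signal)
    return [
        sum(signal[t] * signal[t - i] for t in range(i, n))
        for i in range(max_order + 1)
    ]

def lag_window(i, pitch_period, alpha=0.1):
    """Hypothetical coefficient w(i): grows toward 1 as the pitch period grows."""
    return math.exp(-0.5 * (2.0 * math.pi * alpha * i / pitch_period) ** 2)

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... from autocorrelation values r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input: stop early
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, m):
            new_a[j] = a[j] + k * a[m - j]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lpc_with_adaptive_window(signal, order, pitch_period):
    """Linear prediction using the modified autocorrelation R'(i) = w(i) * R(i)."""
    r = autocorrelation(signal, order)
    r_mod = [lag_window(i, pitch_period) * r[i] for i in range(order + 1)]
    return levinson_durbin(r_mod, order)

coeffs, pred_error = lpc_with_adaptive_window(
    [1.0, 0.5, 0.25, 0.125, 0.0625], order=2, pitch_period=80.0
)
```

With this choice, a lower fundamental frequency (longer pitch period) yields coefficients closer to 1, so the autocorrelation is modified less.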