Patent classifications
G10L25/12
SYSTEM AND METHOD FOR CONTINUOUS MEDIA SEGMENT IDENTIFICATION
This invention provides a means to identify unknown media programming using the audio component of said programming. The invention extracts audio information from the media received by consumer electronic devices, such as smart TVs and TV set-top boxes, and then conveys said information to a remote server means, which in turn identifies said audio information of unknown identity by testing it against a database of known audio segment information. The system identifies unknown media programming in real time so that time-sensitive services may be offered, such as interactive television applications providing contextually related information, or television advertisement substitution. Other uses include tracking media consumption, among many other services.
VOICE PROCESSING METHOD, APPARATUS, AND DEVICE AND STORAGE MEDIUM
A voice processing method includes: determining a historical voice frame corresponding to a target voice frame; determining a frequency-domain characteristic of the historical voice frame; invoking a network model to predict the frequency-domain characteristic of the historical voice frame, to obtain a parameter set of the target voice frame, the parameter set including a plurality of types of parameters, the network model including a plurality of neural networks (NNs), and a number of the types of the parameters in the parameter set being determined according to a number of the NNs; and reconstructing the target voice frame according to the parameter set.
AUDIO SIGNAL ENCODING AND DECODING METHOD USING LEARNING MODEL, TRAINING METHOD OF LEARNING MODEL, AND ENCODER AND DECODER THAT PERFORM THE METHODS
An audio signal encoding and decoding method using a learning model, a training method of the learning model, and an encoder and decoder that perform the methods are disclosed. The audio signal decoding method may include extracting a first residual signal and a first linear prediction coefficient by decoding a bitstream received from an encoder, generating a first audio signal from the first residual signal using the first linear prediction coefficient, generating a second linear prediction coefficient and a second residual signal from the first audio signal, obtaining a third linear prediction coefficient by inputting the second linear prediction coefficient into a trained learning model, and generating a second audio signal from the second residual signal using the third linear prediction coefficient.
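The decoding steps in this abstract can be illustrated with a minimal sketch. It assumes a plain all-pole linear prediction filter with the convention that each sample is predicted as a weighted sum of past samples. The names lpc_analysis, lpc_synthesis, and decode_with_refinement are hypothetical, and the estimate_lpc and model arguments are placeholder callables standing in for the coefficient estimator and the trained learning model, neither of which the abstract specifies.

```python
def lpc_analysis(signal, coeffs):
    """Return the residual e[n] = s[n] - sum_k coeffs[k-1] * s[n-k]."""
    residual = []
    for n, s in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def lpc_synthesis(residual, coeffs):
    """Rebuild s[n] = e[n] + sum_k coeffs[k-1] * s[n-k] (inverse of lpc_analysis)."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


def decode_with_refinement(first_residual, first_lpc, estimate_lpc, model):
    """Follow the order of steps in the abstract.

    estimate_lpc and model are hypothetical callables supplied by the caller:
    estimate_lpc maps a signal to coefficients, model maps coefficients
    to refined coefficients.
    """
    first_audio = lpc_synthesis(first_residual, first_lpc)
    second_lpc = estimate_lpc(first_audio)
    second_residual = lpc_analysis(first_audio, second_lpc)
    third_lpc = model(second_lpc)
    return lpc_synthesis(second_residual, third_lpc)
```

With an identity model and an estimator that returns the first coefficients unchanged, the second audio signal equals the first, which is a convenient sanity check before plugging in a real model.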
SYSTEMS AND METHODS FOR DETECTING MANIPULATED VOCAL SAMPLES
Systems and methods for detecting manipulated vocal audio are disclosed. The system may receive a communication from a user, which may include a vocal sample. The system may transform the vocal sample from a wavelength domain into a frequency domain. The system may determine a divergence of one or more amplitude values of the transformed frequency domain from a predetermined frequency distribution. According to some embodiments, the predetermined frequency distribution may be a Benford's distribution. When the divergence exceeds a predetermined threshold, the system may execute one or more security measures. The one or more security measures may include (i) transferring the user from an automated operator to a human operator, (ii) requiring second factor authentication from the user, and/or (iii) denying a user-initiated request.
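The divergence test described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: it uses a naive discrete Fourier transform for the frequency-domain step, takes the leading digits of the spectral amplitudes, and measures their distance from Benford's distribution with the Kullback-Leibler divergence. The choice of divergence measure, the function names, and the default threshold are all assumptions.

```python
import cmath
import math

# Benford's law: probability that the leading digit is d, for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..n/2 (fine for short example signals)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def leading_digit(value):
    """First significant digit of a positive finite number."""
    return int(f"{value:e}"[0])


def benford_divergence(amplitudes):
    """KL divergence of the observed leading-digit distribution from Benford's."""
    digits = [leading_digit(a) for a in amplitudes if a > 0]
    if not digits:
        raise ValueError("no positive amplitudes to analyse")
    counts = [0] * 9
    for d in digits:
        counts[d - 1] += 1
    total = len(digits)
    return sum((c / total) * math.log((c / total) / q)
               for c, q in zip(counts, BENFORD) if c)


def needs_security_measure(samples, threshold=0.5):
    """Hypothetical decision rule; the threshold value is an arbitrary assumption."""
    return benford_divergence(magnitude_spectrum(samples)) > threshold
```

A real deployment would replace the naive transform with an FFT and calibrate the threshold on known genuine and manipulated samples.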
System for converting vibration to voice frequency wirelessly
The present application discloses a system for converting vibration to voice frequency wirelessly and a method thereof. By sensing a first vibration variation data and a voice frequency variation data of a vocal vibration part in a first sensing period, a voice frequency reference data is obtained from the voice frequency variation data and the first vibration result. A second vibration result is obtained at a second sensing period for converting to a voice frequency output signal, and the voice frequency output signal is output as a voice signal corresponding to the voice frequency variation result. Thus, the present application provides a voice signal close to a human voice.
Speech quality under heavy noise conditions in hands-free communication
When the noise in an audio signal made up of both speech and noise is suppressed, the quality of the speech in the audio signal is usually degraded. The speech obtained from a noise-suppressed audio signal is improved by determining linear predictive coding (LPC) characteristics of the audio signal without or prior to noise suppression and by determining the LPC characteristics of the noise-suppressed audio. The convolution of those differing characteristics provides an improved-quality speech signal, with the original noise level reduced or suppressed.
Linear prediction residual energy tilt-based audio signal classification method and apparatus
A linear prediction residual energy tilt-based audio signal classification method and apparatus, where the method includes: determining, according to voice activity of a current audio frame, whether to obtain a linear prediction residual energy tilt of the current audio frame and store a frequency spectrum fluctuation of the current audio frame in a frequency spectrum fluctuation memory, where the linear prediction residual energy tilt denotes an extent to which an audio signal's linear prediction residual energy changes as a linear prediction order increases; updating, according to whether the audio frame is percussive music or activity of a historical audio frame, frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory; and classifying the current audio frame as a speech frame or a music frame according to statistics of some or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
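The notion of residual energy changing with prediction order can be made concrete with a small sketch. It runs the Levinson-Durbin recursion on a frame's autocorrelation to get the prediction error energy at each order, then summarises how quickly that energy falls. The tilt formula used here (sum of products of consecutive energies divided by the sum of squared energies) is an assumption chosen for illustration; the abstract does not give the exact definition. The function names and the demo signals are likewise hypothetical.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Autocorrelation of the frame for lags 0..max_lag."""
    return [sum(frame[t] * frame[t - lag] for t in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def residual_energies(frame, order):
    """Prediction error energy at orders 0..order via Levinson-Durbin."""
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    energies = [err]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Frame is already perfectly predicted (or silent).
            energies.append(0.0)
            continue
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err = max(err * (1.0 - k * k), 0.0)
        energies.append(err)
    return energies


def residual_energy_tilt(frame, order=10):
    """Assumed tilt measure: near 1 when energy barely drops with order."""
    e = residual_energies(frame, order)
    denom = sum(x * x for x in e[:-1])
    if denom == 0.0:
        return 0.0
    return sum(e[i] * e[i + 1] for i in range(order)) / denom


# Demo signals: a pure tone is highly predictable, white noise is not.
random.seed(0)
tone = [math.sin(0.3 * t) for t in range(400)]
noise = [random.gauss(0.0, 1.0) for _ in range(400)]
tone_tilt = residual_energy_tilt(tone)
noise_tilt = residual_energy_tilt(noise)
```

Under this measure the tone yields a much lower tilt than the noise, because its residual energy collapses after the first couple of orders while the noise energy stays nearly flat.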