Patent classifications
G10L25/69
Method and apparatus for differentiating between human and electronic speaker for voice interface security
A system for distinguishing between a human voice generated command and an electronic speaker generated command is provided. An exemplary system comprises a microphone array for receiving an audio signal collection, preprocessing circuitry configured for converting the audio signal collection into processed recorded audio signals, energy balance metric determination circuitry configured for calculating a final energy balance metric based on the processed recorded audio signals, and energy balance metric evaluation circuitry for outputting a command originator signal based at least in part on the final energy balance metric.
Optimization apparatus, optimization method, and program
The aim is to optimize parameters with various structures without manually redesigning the encoding and decoding processing. The following steps are executed: an evaluation step of obtaining an evaluated value representing an evaluation result of signal processing that uses a first signal processing parameter value; a coding step of converting the first signal processing parameter value into a first external parameter value, based on at least a definition file that defines an attribute of the signal processing parameter; a generation step of generating, based on the evaluated value and the first external parameter value, a second external parameter value that differs from the first external parameter value; and a decoding step of converting the second external parameter value, based on the definition file, into a second signal processing parameter value.
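The evaluate, encode, generate, decode cycle described in this abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `DEFINITION` dictionary standing in for the definition file, the parameter names `gain_db` and `frame_len`, the normalization of external parameters to [0, 1], the toy objective in `evaluate`, and the simple hill-climbing rule in `optimize`.

```python
import random

# Hypothetical stand-in for the definition file: one attribute record per
# signal processing parameter (names and ranges are illustrative only).
DEFINITION = {
    "gain_db": {"type": "float", "min": -12.0, "max": 12.0},
    "frame_len": {"type": "int", "min": 128, "max": 1024},
}

def encode(params, definition):
    """Coding step: map signal processing values to external values in [0, 1]."""
    return {k: (params[k] - d["min"]) / (d["max"] - d["min"])
            for k, d in definition.items()}

def decode(external, definition):
    """Decoding step: map external values back, restoring the declared type."""
    out = {}
    for k, d in definition.items():
        v = d["min"] + external[k] * (d["max"] - d["min"])
        out[k] = int(round(v)) if d["type"] == "int" else v
    return out

def generate(external, rng, step=0.1):
    """Generation step (assumed rule): random perturbation clipped to [0, 1]."""
    return {k: min(1.0, max(0.0, v + rng.uniform(-step, step)))
            for k, v in external.items()}

def evaluate(params):
    """Evaluation step: toy objective, higher is better (an assumption)."""
    return -((params["gain_db"] - 3.0) ** 2) - ((params["frame_len"] - 512) / 100.0) ** 2

def optimize(initial, definition, iterations=200, seed=0):
    """Hill climbing: keep a candidate only if its evaluated value improves."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(iterations):
        candidate = decode(generate(encode(best, definition), rng), definition)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Because `encode` and `decode` read only the attribute records, adding a parameter in this sketch means adding an entry to `DEFINITION` rather than rewriting either function, which is the property the abstract emphasizes.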
Techniques for computing perceived audio quality based on a trained multitask learning model
In various embodiments, a quality inference application estimates perceived audio quality. The quality inference application computes a set of feature values for a set of audio features based on an audio clip. The quality inference application then uses a trained multitask learning model to generate predicted labels based on the set of feature values. The predicted labels specify metric values for metrics that are relevant to audio quality. Subsequently, the quality inference application computes an audio quality score for the audio clip based on the predicted labels.
Apparatus and method for determining a predetermined characteristic related to a spectral enhancement processing of an audio signal
An apparatus for determining a predetermined characteristic related to a spectral enhancement processing of an audio signal includes a deriver configured for obtaining a spectrum of the audio signal and for deriving a local maximum signal from the spectrum. The apparatus includes a determiner configured for determining a similarity between segments of the local maximum signal and includes a processor for providing information indicating that the audio signal includes the predetermined characteristic, dependent on an evaluation of the similarity.
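As a rough illustration of the deriver, determiner, and processor roles named above, the following Python sketch takes a magnitude spectrum as a plain list, keeps only its local maxima, and compares fixed-length segments of the result. The abstract does not specify the similarity measure or the decision rule, so the cosine similarity, the segment length `seg_len`, and the threshold of 0.9 are assumptions chosen for the example.

```python
import math

def local_maximum_signal(spectrum):
    """Keep values at local maxima of the spectrum, zero elsewhere."""
    out = [0.0] * len(spectrum)
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]:
            out[i] = spectrum[i]
    return out

def segment_similarity(a, b):
    """Cosine similarity between two segments (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def has_characteristic(spectrum, seg_len, threshold=0.9):
    """Report True if any two non-overlapping segments are highly similar."""
    lm = local_maximum_signal(spectrum)
    segments = [lm[i:i + seg_len]
                for i in range(0, len(lm) - seg_len + 1, seg_len)]
    best = max((segment_similarity(segments[i], segments[j])
                for i in range(len(segments))
                for j in range(i + 1, len(segments))),
               default=0.0)
    return best >= threshold
```

The intuition behind this sketch is that an enhancement that copies one spectral region into another would leave a repeated pattern of peaks, which shows up as a high similarity between segments.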
Audio quality feedback during live transmission from a source
A method and system are provided for audio quality feedback during live transmission from a source that is received at multiple audience devices. The method carried out at a server includes: obtaining audio information of an audio signal as received by at least some of the audience devices in a transmission session; classifying one or more subsets of the audience devices by one or more common factors per subset; and analyzing the obtained audio information from the audience devices in conjunction with the classifications of the subsets of the audience devices to determine one or more common factors that affect received audio quality at an identified subset of the audience devices classified by the one or more common factors. The method provides feedback of the one or more common factors to at least one of the audience devices in the identified subset or to the source device, or to both.
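The server-side analysis described above can be sketched in Python as grouping device reports by each candidate factor and flagging groups whose quality falls well below the session average. The report fields (`network`, `region`, `quality`), the use of a single numeric quality score per device, and the `margin` rule are hypothetical choices for this example; the abstract does not define them.

```python
from collections import defaultdict
from statistics import mean

def find_common_factors(reports, factor_names, margin=1.0):
    """Return (factor, value, mean_quality) for subsets that underperform.

    A subset is flagged when it has at least two devices and its mean
    quality is at least `margin` below the overall mean (assumed rule).
    """
    overall = mean(r["quality"] for r in reports)
    findings = []
    for name in factor_names:
        # Classify devices into subsets that share the same factor value.
        groups = defaultdict(list)
        for r in reports:
            groups[r[name]].append(r["quality"])
        for value, scores in groups.items():
            if len(scores) >= 2 and overall - mean(scores) >= margin:
                findings.append((name, value, mean(scores)))
    return findings

# Hypothetical reports from four audience devices in one session.
reports = [
    {"network": "wifi", "region": "eu", "quality": 4.5},
    {"network": "wifi", "region": "us", "quality": 4.3},
    {"network": "cellular", "region": "eu", "quality": 2.0},
    {"network": "cellular", "region": "us", "quality": 2.2},
]
findings = find_common_factors(reports, ["network", "region"])
```

In this toy data the cellular subset is the only one flagged, so feedback naming that factor could be sent to those devices, to the source, or to both.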
Audio quality estimation apparatus, audio quality estimation method and program
A voice quality estimation apparatus includes: a packet sequence creation unit configured to create a first sequence by applying a first characteristic indicating that quality degradation caused by packet loss is perceived by a user all at once, to a sequence consisting of elements each indicating whether or not a packet of a voice call has been lost; a smoothing unit configured to create a second sequence from the first sequence; a degradation amount emphasis unit configured to create a third sequence from the second sequence; a packet loss tolerance characteristics reflection unit configured to create a fourth sequence from the third sequence; a degradation amount calculation unit configured to calculate a degradation amount from the fourth sequence; and a listening quality estimation unit configured to estimate voice quality that is to be experienced by the user, from the degradation amount.
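The chain of units in this abstract maps naturally onto a pipeline of small functions, one per sequence. The Python sketch below shows only that structure. The abstract does not disclose the actual transforms, so every operation here is an assumption made for illustration: merging nearby losses into one burst, a moving average, a power-law emphasis, a fixed tolerance offset, a mean as the degradation amount, and a linear mapping onto a 1.0 to 4.5 quality scale.

```python
def merge_bursts(lost, gap=2):
    """First sequence: losses separated by at most `gap` packets form one burst."""
    out = list(lost)
    last = None
    for i, v in enumerate(lost):
        if v:
            if last is not None and i - last - 1 <= gap:
                for j in range(last + 1, i):
                    out[j] = 1
            last = i
    return out

def smooth(seq, window=3):
    """Second sequence: centered moving average (shorter at the edges)."""
    half = window // 2
    result = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        result.append(sum(chunk) / len(chunk))
    return result

def emphasize(seq, power=2.0):
    """Third sequence: weight sustained loss more than isolated loss."""
    return [v ** power for v in seq]

def apply_tolerance(seq, tolerance=0.1):
    """Fourth sequence: ignore degradation below a tolerance level."""
    return [max(0.0, v - tolerance) for v in seq]

def degradation_amount(seq):
    """Degradation amount: mean of the fourth sequence."""
    return sum(seq) / len(seq) if seq else 0.0

def estimate_quality(degradation, best=4.5, worst=1.0, scale=5.0):
    """Listening quality: linear map from degradation to a bounded score."""
    return max(worst, best - scale * degradation)

def estimate_from_losses(lost):
    """Run the full pipeline on a 0/1 packet loss sequence (1 = lost)."""
    fourth = apply_tolerance(emphasize(smooth(merge_bursts(lost))))
    return estimate_quality(degradation_amount(fourth))
```

For example, `estimate_from_losses([0] * 20)` returns the best score of 4.5 in this sketch, and a sequence containing a loss burst returns a lower score.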