Patent classifications
G10L17/04
COMPUTER-IMPLEMENTED DETECTION OF ANOMALOUS TELEPHONE CALLS
Computer-implemented detection of anomalous telephone calls, for example detection of interconnect bypass fraud, is disclosed. A telephone call associated with user devices is analyzed remote from the user devices. A first set of multiple features, for example Mel Frequency Cepstral Coefficients, is derived from a call audio stream. The first set is converted to an embedding vector, for example via a model based on a Universal Background Model comprising a Gaussian Mixture Model, which model is preferably configured based on a training plurality of first sets of multiple features derived from a corresponding training plurality of audio streams. Occurrence, or probability of occurrence, of an anomalous telephone call is determined based on the embedding vector, for example via a back-end classifier, such as a Gaussian Backend Model, which classifier is preferably configured based on labels associated with the training plurality of audio streams.
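The pipeline this abstract describes (frame features → UBM-normalised embedding → Gaussian backend classifier) can be sketched in miniature. This is a toy illustration, not the patented method: the "UBM" here is a single diagonal Gaussian standing in for a full Gaussian Mixture Model, the data is synthetic, and all names (`make_call`, `embed`, `is_anomalous`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-call frame features (stand-ins for MFCC frames).
def make_call(mean, n_frames=200, dim=13):
    return mean + rng.normal(scale=1.0, size=(n_frames, dim))

normal_calls = [make_call(np.zeros(13)) for _ in range(20)]
bypass_calls = [make_call(np.full(13, 0.8)) for _ in range(20)]  # shifted statistics

# "UBM": a single diagonal Gaussian over all training frames
# (a one-component stand-in for the Gaussian Mixture Model in the abstract).
all_frames = np.vstack(normal_calls + bypass_calls)
ubm_mean, ubm_std = all_frames.mean(axis=0), all_frames.std(axis=0)

# Embedding: UBM-normalised mean of a call's frames (supervector-style).
def embed(call):
    return (call.mean(axis=0) - ubm_mean) / ubm_std

# Gaussian backend: one diagonal Gaussian per class over training embeddings.
def fit_gaussian(embeddings):
    e = np.vstack(embeddings)
    return e.mean(axis=0), e.std(axis=0) + 1e-6

def log_lik(x, mean, std):
    return -0.5 * np.sum(((x - mean) / std) ** 2 + 2 * np.log(std))

g_normal = fit_gaussian([embed(c) for c in normal_calls])
g_bypass = fit_gaussian([embed(c) for c in bypass_calls])

def is_anomalous(call):
    x = embed(call)
    return log_lik(x, *g_bypass) > log_lik(x, *g_normal)
```

A production system would extract real MFCCs from the call audio, use a many-component GMM adapted per call, and train the backend on labelled call recordings, as the abstract outlines.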
SMART SPEAKER, MULTI-VOICE ASSISTANT CONTROL METHOD, AND SMART HOME SYSTEM
The invention discloses a smart speaker that includes a voice input module, a language recognition module, and at least two voice assistants. The language recognition module receives voice information from the voice input module, determines the language category based on the voice information, and activates the voice assistant corresponding to that language category.
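The control flow described (voice input → language recognition → activate the matching assistant) amounts to a dispatcher keyed by language category. A minimal sketch, with an intentionally trivial stand-in for the language recogniser (a real module would run spoken-language identification on audio); all class and function names here are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict

# Trivial stand-in for the language recognition module: classifies by
# checking for CJK code points in a transcript. A real system would run a
# spoken-language-ID model directly on the captured audio.
def detect_language(utterance: str) -> str:
    if any('\u4e00' <= ch <= '\u9fff' for ch in utterance):
        return "zh"
    return "en"

@dataclass
class VoiceAssistant:
    name: str
    def handle(self, utterance: str) -> str:
        return f"[{self.name}] handling: {utterance}"

class SmartSpeaker:
    def __init__(self, assistants: Dict[str, VoiceAssistant]):
        # One voice assistant per language category, per the abstract.
        self.assistants = assistants

    def on_voice_input(self, utterance: str) -> str:
        lang = detect_language(utterance)   # language recognition module
        assistant = self.assistants[lang]   # activate the matching assistant
        return assistant.handle(utterance)

speaker = SmartSpeaker({"en": VoiceAssistant("Assistant-EN"),
                        "zh": VoiceAssistant("Assistant-ZH")})
```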
SPEECH ENHANCEMENT APPARATUS, LEARNING APPARATUS, METHOD AND PROGRAM THEREOF
A mask to enhance speech emitted from a speaker is estimated from an observation signal; applying the mask to the observation signal yields a post-mask speech signal. The mask is estimated from a feature obtained by combining a feature for speaker recognition extracted from the observation signal with a feature for generalized mask estimation extracted from the same observation signal.
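The key step, combining a speaker-recognition feature with a general mask-estimation feature before estimating the mask, can be sketched as follows. This is a schematic only: the two feature extractors and the single-layer mask estimator are hypothetical placeholders (a real apparatus would use learned networks), and the weights here are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq = 64  # spectral bins per frame of the observation signal

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-ins for the two feature extractors named in the abstract:
def speaker_feature(frame):        # e.g. a speaker-recognition embedding
    return frame[:8]               # hypothetical proxy: first 8 bins
def general_feature(frame):        # e.g. log-magnitude for mask estimation
    return np.log1p(frame)

# Hypothetical mask estimator: one linear layer on the combined feature
# (untrained random weights; a real system would learn these).
W = rng.normal(scale=0.1, size=(n_freq, 8 + n_freq))
b = np.zeros(n_freq)

def estimate_mask(frame):
    combined = np.concatenate([speaker_feature(frame), general_feature(frame)])
    return sigmoid(W @ combined + b)   # per-bin mask in (0, 1)

def enhance(frame):
    return estimate_mask(frame) * frame  # the post-mask speech signal
```

Because the mask lies in (0, 1) per bin, the enhanced frame is a pointwise attenuation of the observed magnitude spectrum, which is the standard masking formulation the abstract builds on.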
EMOTION TAG ASSIGNING SYSTEM, METHOD, AND PROGRAM
Provided are an emotion tag assigning system, method, and program for assigning, to a content, an emotion tag indicating an emotion of a user during execution of an event using the content.
An emotion tag assigning method includes a step of detecting, by a voice detector, voice data indicating a voice uttered by a person who participates in an event using a content during execution of the event; a step of recognizing, by an emotion recognizer, an emotion of the person based on the voice data; a step of acquiring, by a processor, emotion information indicating the recognized emotion of the person during the execution of the event using the content; and a step of assigning, by the emotion recognizer, an emotion rank calculated from the acquired emotion information to the content as an emotion tag.
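The four steps above (detect voice, recognize emotion, acquire emotion information, assign a calculated emotion rank as a tag) can be sketched end to end. All specifics here are hypothetical: the recogniser is a stand-in that maps a pitch-variance score to a valence in [-1, 1], and the 1 to 5 rank scale is an illustrative choice, not taken from the abstract.

```python
from statistics import mean

# Hypothetical emotion recogniser: maps an utterance's acoustic measurement
# to a valence in [-1, 1]. A real system would run a classifier on the
# detected voice data.
def recognize_emotion(voice_sample: dict) -> float:
    return max(-1.0, min(1.0, voice_sample["pitch_var"] - 0.5))

def emotion_rank(valences) -> int:
    # Collapse per-utterance valence into an illustrative 1-5 rank.
    avg = mean(valences)
    return round((avg + 1) * 2) + 1

def tag_content(content: dict, voice_samples: list) -> dict:
    # Acquire emotion information for each detected utterance during the
    # event, then assign the calculated rank to the content as a tag.
    valences = [recognize_emotion(v) for v in voice_samples]
    tagged = dict(content)
    tagged["emotion_tag"] = emotion_rank(valences)
    return tagged
```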
EXTRANEOUS VOICE REMOVAL FROM AUDIO IN A COMMUNICATION SESSION
The technology disclosed herein enables removal of extraneous voices from audio in a communication session. In a particular embodiment, a method includes receiving audio captured from an endpoint operated by a user on a communication session. The method further includes identifying an extraneous voice in the audio, wherein the voice is from a person other than the user, and removing the extraneous voice from the audio. After removing the extraneous voice, the method includes transmitting the audio to another endpoint on the communication session.
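One common way to realise "identify the extraneous voice and remove it" is to compare per-segment speaker embeddings against the enrolled user's embedding and silence segments that do not match. The sketch below assumes that approach; the embeddings, threshold, and function names are hypothetical, and the abstract does not commit to this particular mechanism.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# segments: short windows of captured audio; seg_embeddings: one speaker
# embedding per window (in practice produced by a speaker-recognition model).
def remove_extraneous(segments, seg_embeddings, user_embedding, threshold=0.8):
    cleaned = []
    for seg, emb in zip(segments, seg_embeddings):
        if cosine(emb, user_embedding) >= threshold:
            cleaned.append(seg)                  # user's own voice: keep
        else:
            cleaned.append(np.zeros_like(seg))   # extraneous voice: silence
    return cleaned                               # audio ready to transmit
```

The cleaned segments would then be transmitted to the other endpoint, as the abstract's final step describes.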