G10L17/16

Methods and system for distributing information via multiple forms of delivery services

A content distribution facilitation system is described comprising configured servers and a network interface configured to interface with a plurality of terminals in a client-server relationship and optionally with a cloud-based storage system. A request from a first source for content comprising content criteria is received, the content criteria comprising content subject matter. At least a portion of the content criteria of the request is transmitted to a selected content contributor. If recorded content is received from the first content contributor, the first source is provided with access to the received recorded content. The recorded content may be transmitted via one or more networks to one or more destination devices. Optionally, a voice analysis and/or facial recognition engine is utilized to determine if the recorded content is from the first content contributor.

CALL RECORDING
20180324293 · 2018-11-08

An enterprise voice system such as a contact centre is disclosed which provides a speech analytics capability. Whilst call recording is common in many contact centres, calls are normally recorded in single-channel audio files in order to save costs. Previous attempts to provide automatic diarization of those recorded calls have relied on training the system to recognise voiceprints of users of the system, and then comparing utterances within the recorded calls to those voiceprints in order to identify who was speaking at that time. In order to avoid the need to train the system to recognise voiceprints, an enterprise voice system is disclosed which inserts a digital watermark into the digitised audio signal from each user's microphone. By inserting the digital watermark with an energy, and, in some cases also with a spectrum, which matches the digitised audio signal, and taking advantage of typically only one user speaking at a time, a mark is left in the recorded call which a speech analytics system can use in order to identify who was speaking at different times in the conversation.
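
The watermarking idea in this abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: the pseudo-random ±1 sequence per user, the frame length `FRAME`, the relative amplitude `ALPHA`, and the correlation detector are all assumptions chosen for illustration, and spectral shaping of the mark is left out. The sketch scales each user's mark to the energy of that user's microphone frame, so a silent microphone contributes no mark, and then labels each frame of the mixed single-channel recording with the user whose sequence correlates best.

```python
import math
import random

FRAME = 4096  # samples per analysis frame (assumed value)
ALPHA = 0.2   # watermark amplitude relative to frame RMS (assumed value)

def pn_sequence(user_id, length=FRAME):
    """Deterministic pseudo-random +/-1 sequence acting as the user's mark."""
    rng = random.Random(user_id)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(signal, user_id):
    """Add the user's mark to a microphone signal, scaled to each frame's energy."""
    pn = pn_sequence(user_id)
    out = []
    for start in range(0, len(signal), FRAME):
        frame = signal[start:start + FRAME]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # A silent frame has rms == 0, so no mark is added to it.
        out.extend(s + ALPHA * rms * pn[i] for i, s in enumerate(frame))
    return out

def identify(recording, user_ids):
    """Label each frame of a mono recording with the best-correlating user."""
    sequences = {u: pn_sequence(u) for u in user_ids}
    labels = []
    for start in range(0, len(recording), FRAME):
        frame = recording[start:start + FRAME]
        labels.append(max(
            user_ids,
            key=lambda u: sum(s * p for s, p in zip(frame, sequences[u])),
        ))
    return labels

def tone(freq, n, rate=8000):
    """Stand-in for speech: a pure tone of n samples."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Two microphones, one speaker active at a time, mixed into a single channel.
silence = [0.0] * FRAME
agent_mic = embed(tone(220, FRAME) + silence, "agent")
customer_mic = embed(silence + tone(330, FRAME), "customer")
mono = [a + c for a, c in zip(agent_mic, customer_mic)]
labels = identify(mono, ["agent", "customer"])
```

The detector relies on only one user speaking in a given frame, as the abstract notes. A frame in which nobody speaks carries no mark, and this sketch makes no attempt to flag such frames.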

Blind diarization of recorded calls with arbitrary number of speakers
10109280 · 2018-10-23

In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
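
The pipeline in this abstract can be sketched end to end on toy data. The details below are assumptions made for illustration, not the patented method: each utterance model is the mean of its feature vectors, clustering is a greedy distance threshold (so the number of speakers is not fixed in advance), each speaker model is a cluster centroid with a unit-variance Gaussian emission, and the hidden Markov model uses a single `stay_prob` for self-transitions. Decoding is standard Viterbi.

```python
import math

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_utterances(models, threshold):
    """Greedy threshold clustering; the cluster count is found, not given."""
    clusters = []
    for m in models:
        for c in clusters:
            if sq_dist(m, mean(c)) < threshold:
                c.append(m)
                break
        else:
            clusters.append([m])
    return clusters

def viterbi(observations, speaker_models, stay_prob=0.9):
    """Most likely speaker sequence under a sticky-transition HMM."""
    n = len(speaker_models)
    if n == 1:
        return [0] * len(observations)
    log_stay = math.log(stay_prob)
    log_move = math.log((1 - stay_prob) / (n - 1))

    def emit(obs, k):
        # Unit-variance Gaussian log-likelihood, up to an additive constant.
        return -0.5 * sq_dist(obs, speaker_models[k])

    def trans(j, k):
        return log_stay if j == k else log_move

    score = [math.log(1 / n) + emit(observations[0], k) for k in range(n)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for k in range(n):
            best = max(range(n), key=lambda j: prev[j] + trans(j, k))
            pointers.append(best)
            score.append(prev[best] + trans(best, k) + emit(obs, k))
        back.append(pointers)
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def diarize(utterances, threshold=1.0):
    utterance_models = [mean(u) for u in utterances]
    clusters = cluster_utterances(utterance_models, threshold)
    speaker_models = [mean(c) for c in clusters]
    return viterbi(utterance_models, speaker_models)

# Four utterances alternating between two well-separated toy "speakers".
utterances = [
    [[0.0, 0.1], [0.1, 0.0]],
    [[5.0, 5.1], [5.1, 4.9]],
    [[0.2, 0.0], [0.0, 0.2]],
    [[4.9, 5.0], [5.0, 5.2]],
]
print(diarize(utterances))  # → [0, 1, 0, 1]
```

Here the HMM is decoded over utterance models for brevity. A fuller system would more plausibly decode over frame-level feature vectors, which is where the transition penalty matters most.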

Speaker diarization with early-stop clustering

A method and apparatus for speaker diarization with early-stop clustering are provided. The method comprises: segmenting an audio stream into at least one speech segment (710), the audio stream comprising speech from at least one speaker; clustering the at least one speech segment into a plurality of clusters (720), the number of the plurality of clusters being greater than the number of the at least one speaker; selecting, from the plurality of clusters, at least one cluster of the highest similarity (730), the number of the selected at least one cluster being equal to the number of the at least one speaker; establishing a speaker classification model based on the selected at least one cluster (740); and aligning, through the speaker classification model, speech frames in the audio stream to the at least one speaker (750).
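
Steps 720 to 750 can be sketched on toy embeddings. Everything concrete here is an assumption for illustration rather than the claimed method: segments and frames are plain 2-D vectors, clustering is centroid-linkage agglomerative merging by cosine similarity, the "similarity" of a cluster is the mean cosine of its members to its centroid (single-member clusters are excluded), and the classification model is nearest-centroid. The parameter `stop_at` is the hypothetical early-stop cluster count, chosen larger than the number of speakers.

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def early_stop_clusters(segments, stop_at):
    """Step 720: merge the closest clusters, stopping while stop_at remain."""
    clusters = [[s] for s in segments]
    while len(clusters) > stop_at:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cosine(centroid(clusters[p[0]]),
                                 centroid(clusters[p[1]])),
        )
        clusters[i].extend(clusters.pop(j))  # i < j, so index i stays valid
    return clusters

def purity(cluster):
    """Assumed selection score: mean similarity of members to the centroid."""
    if len(cluster) < 2:
        return float("-inf")  # a lone segment says nothing about consistency
    c = centroid(cluster)
    return sum(cosine(v, c) for v in cluster) / len(cluster)

def diarize(segments, frames, num_speakers, stop_at):
    clusters = early_stop_clusters(segments, stop_at)
    # Step 730: keep as many clusters as there are speakers.
    selected = sorted(clusters, key=purity, reverse=True)[:num_speakers]
    # Step 740: one centroid per selected cluster acts as the classifier.
    models = [centroid(c) for c in selected]
    # Step 750: assign every frame to the most similar speaker model.
    return [max(range(len(models)), key=lambda k: cosine(f, models[k]))
            for f in frames]

# Three segments per speaker plus one mixed segment that should be left out.
segments = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
            [0.0, 1.0], [0.1, 0.9], [0.1, 1.0],
            [0.7, 0.7]]
frames = [[1.0, 0.05], [0.05, 1.0], [0.8, 0.2]]
labels = diarize(segments, frames, num_speakers=2, stop_at=3)
```

Stopping at three clusters leaves the mixed segment in a cluster of its own instead of forcing it into a speaker cluster, so it does not contaminate either speaker model. The speaker indices in `labels` are arbitrary; only which frames share an index is meaningful.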

COMMUNICATION METHOD, AND ELECTRONIC DEVICE THEREFOR
20180268824 · 2018-09-20

Provided is a method including: receiving an audio signal of a transmitter; detecting sensitive information in the audio signal based on content of the audio signal; encrypting the sensitive information by using characteristic information of a receiver; and transmitting the audio signal including the encrypted sensitive information.

METHODS AND SYSTEM FOR DISTRIBUTING INFORMATION VIA MULTIPLE FORMS OF DELIVERY SERVICES
20180241714 · 2018-08-23

A content distribution facilitation system is described comprising configured servers and a network interface configured to interface with a plurality of terminals in a client-server relationship and optionally with a cloud-based storage system. A request from a first source for content comprising content criteria is received, the content criteria comprising content subject matter. At least a portion of the content criteria of the request is transmitted to a selected content contributor. If recorded content is received from the first content contributor, the first source is provided with access to the received recorded content. The recorded content may be transmitted via one or more networks to one or more destination devices. Optionally, a voice analysis and/or facial recognition engine is utilized to determine if the recorded content is from the first content contributor.