G10L17/16

Automatic speaker identification using speech recognition features

Features are disclosed for automatically identifying a speaker. Artifacts of automatic speech recognition (ASR) and/or other automatically determined information may be processed against individual user profiles or models. Scores may be determined reflecting the likelihood that individual users made an utterance. The scores can be based on, e.g., individual components of Gaussian mixture models (GMMs) that score best for frames of audio data of an utterance. A user associated with the highest likelihood score for a particular utterance can be identified as the speaker of the utterance. Information regarding the identified user can be provided to components of a spoken language processing system, separate applications, etc.
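As a rough illustration of the scoring scheme the abstract describes, the sketch below scores an utterance's feature frames against one Gaussian mixture model per enrolled user and picks the user with the highest total log-likelihood. All function names, model shapes, and the diagonal-covariance assumption are illustrative choices, not details taken from the patent.

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames:    (T, D) audio feature frames (e.g. MFCCs)
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) component variances
    """
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)    # (T, K)
    log_norm = -0.5 * (D * np.log(2 * np.pi)
                       + np.sum(np.log(variances), axis=1))    # (K,)
    log_comp = np.log(weights) + log_norm + exponent           # (T, K)
    # Log-sum-exp over components; per the abstract, a variant could keep
    # only the best-scoring component per frame (np.max over axis=1).
    m = log_comp.max(axis=1, keepdims=True)
    return m.squeeze(1) + np.log(np.exp(log_comp - m).sum(axis=1))  # (T,)

def identify_speaker(frames, user_models):
    """Return the user whose GMM gives the highest summed log-likelihood.

    user_models: dict mapping user id -> (weights, means, variances)
    """
    scores = {user: diag_gmm_loglik(frames, *gmm).sum()
              for user, gmm in user_models.items()}
    return max(scores, key=scores.get), scores
```

In a full system the winning user id would then be passed on to downstream spoken-language-processing components, as the abstract notes.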

Communication method, and electronic device therefor
10650827 · 2020-05-12

Provided is a method including: receiving an audio signal of a transmitter; detecting sensitive information in the audio signal based on content of the audio signal; encrypting the sensitive information by using characteristic information of a receiver; and transmitting the audio signal including the encrypted sensitive information.
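A toy sketch of the claimed flow, under stated assumptions: "sensitive information" is stood in for by long digit runs in a transcript of the audio, and the receiver's characteristic information is stretched into a keystream with SHA-256. Both choices are illustrative only; a real implementation would use a proper detector and a vetted authenticated cipher (e.g. AES-GCM), not this XOR stream.

```python
import hashlib
import re

def derive_key(receiver_info: str, length: int) -> bytes:
    """Stretch receiver characteristic info into a keystream via SHA-256."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(f"{receiver_info}:{counter}".encode()).digest()
        counter += 1
    return stream[:length]

def encrypt_sensitive(transcript: str, receiver_info: str) -> str:
    """Replace digit runs (stand-in for detected sensitive content) with hex ciphertext."""
    def enc(match: re.Match) -> str:
        data = match.group().encode()
        key = derive_key(receiver_info, len(data))
        return bytes(a ^ b for a, b in zip(data, key)).hex()
    return re.sub(r"\d{4,}", enc, transcript)
```

Only the detected spans are transformed, so the rest of the signal's content can be transmitted in the clear, matching the abstract's "audio signal including the encrypted sensitive information".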

System and method for assessing expressive language development of a key child

A method of assessing expressive language development of a key child. The method can include processing an audio recording taken in a language environment of the key child to identify segments of the audio recording that correspond to vocalizations of the key child. The method also can include applying an adult automatic speech recognition phone decoder to the segments of the audio recording to identify each occurrence of a plurality of phone categories and to determine a duration for each of the plurality of phone categories. The method additionally can include determining a duration distribution for the plurality of phone categories based on the durations for the plurality of phone categories. The method further can include using the duration distribution for the plurality of phone categories in an age-based model to assess the expressive language development of the key child. The age-based model is selected based on a chronological age of the key child and includes a plurality of different weights associated with the plurality of phone categories. Other embodiments are described.
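The duration-distribution and age-weighting steps can be sketched as follows. The data shapes, the nearest-age-band selection rule, and the linear weighted score are all assumptions made for illustration; the patent's actual model form is not specified in the abstract.

```python
from collections import defaultdict

def duration_distribution(phone_events):
    """Normalize total duration per phone category into a distribution.

    phone_events: iterable of (phone_category, duration_seconds) pairs,
    e.g. the output of an adult ASR phone decoder run on child segments.
    """
    totals = defaultdict(float)
    for category, duration in phone_events:
        totals[category] += duration
    grand_total = sum(totals.values())
    return {cat: dur / grand_total for cat, dur in totals.items()}

def assess_development(phone_events, age_models, chronological_age_months):
    """Score the child with the weight vector of the nearest age band.

    age_models: dict mapping age-in-months -> {phone_category: weight}
    """
    dist = duration_distribution(phone_events)
    band = min(age_models,
               key=lambda age: abs(age - chronological_age_months))
    weights = age_models[band]
    return sum(weights.get(cat, 0.0) * p for cat, p in dist.items())
```

Selecting the weight vector by chronological age mirrors the abstract's point that the age-based model is chosen per child and carries category-specific weights.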

Methods and system for distributing information via multiple forms of delivery services

A content distribution facilitation system is described comprising configured servers and a network interface configured to interface with a plurality of terminals in a client-server relationship and optionally with a cloud-based storage system. A request for content is received from a first source, the request comprising content criteria that include content subject matter. At least a portion of the content criteria is transmitted to a selected content contributor. If recorded content is received from the content contributor, the first source is provided with access to the received recorded content. The recorded content may be transmitted via one or more networks to one or more destination devices. Optionally, a voice analysis and/or facial recognition engine is utilized to determine whether the recorded content is from the content contributor.
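The request-routing and contributor-verification flow might be schematized as below. The class and callback names are invented for illustration, and the voice/face check is abstracted behind a boolean callback; none of these identifiers come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ContentRequest:
    source_id: str        # the first source making the request
    subject_matter: str   # the content criteria

def fulfil_request(
    request: ContentRequest,
    choose_contributor: Callable[[str], str],
    solicit: Callable[[str, str], Optional[bytes]],
    verify_contributor: Callable[[bytes, str], bool],
) -> Optional[bytes]:
    """Route a content request and verify the returned recording's origin.

    choose_contributor(criteria) -> contributor id
    solicit(contributor, criteria) -> recorded content, or None
    verify_contributor(content, contributor) -> bool (e.g. voice/face match)
    """
    contributor = choose_contributor(request.subject_matter)
    content = solicit(contributor, request.subject_matter)
    if content is None:
        return None
    if not verify_contributor(content, contributor):
        return None  # reject content not traceable to the chosen contributor
    return content   # grant the requesting source access to the recording
```

Keeping the verification step behind a callback reflects the abstract's "optionally" wording: the same flow works with or without the voice/face check plugged in.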
