Patent classifications
G10L17/16
METHODS AND SYSTEM FOR DISTRIBUTING INFORMATION VIA MULTIPLE FORMS OF DELIVERY SERVICES
A content distribution facilitation system is described comprising configured servers and a network interface configured to interface with a plurality of terminals in a client-server relationship and optionally with a cloud-based storage system. A request from a first source for content comprising content criteria is received, the content criteria comprising content subject matter. At least a portion of the content criteria of the request is transmitted to a selected first content contributor. If recorded content is received from the first content contributor, the first source is provided with access to the received recorded content. The recorded content may be transmitted via one or more networks to one or more destination devices. Optionally, a voice analysis and/or facial recognition engine is utilized to determine whether the recorded content is from the first content contributor.
Dynamic Face and Voice Signature Authentication for Enhanced Security
Techniques and apparatuses for dynamic face and voice signature authentication are described herein. In one or more implementations, an authentication system is configured to authenticate a user using a combination of voice and facial recognition techniques. The authentication system supports multiple phrases per user, such that the user can use different phrases to gain different types of access to a device or resources. Doing so provides customized access to the device or resources.
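The multiple-phrases-per-user idea can be illustrated with a minimal Python sketch. Everything named here (`EnrolledUser`, `resolve_access`, the access levels "full" and "restricted", and the boolean match flags) is hypothetical and is not taken from the patent; the face and voice matching itself is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EnrolledUser:
    name: str
    # Hypothetical mapping: each enrolled phrase unlocks one access level.
    phrase_access: Dict[str, str] = field(default_factory=dict)


def resolve_access(user: EnrolledUser, spoken_phrase: str,
                   face_match: bool, voice_match: bool) -> Optional[str]:
    """Return the access level tied to the phrase, or None if any check fails."""
    # Both biometric checks must pass before the phrase is considered.
    if not (face_match and voice_match):
        return None
    # Normalise the phrase so lookup ignores case and surrounding whitespace.
    return user.phrase_access.get(spoken_phrase.strip().lower())


user = EnrolledUser("alice", {"open my phone": "full", "guest mode": "restricted"})
```

In this sketch, the same user saying "guest mode" receives a narrower access level than when saying "open my phone", and an unenrolled phrase or a failed biometric check yields no access.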
WORD-LEVEL BLIND DIARIZATION OF RECORDED CALLS WITH ARBITRARY NUMBER OF SPEAKERS
Disclosed herein are methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models, wherein the first-pass blind diarization is on a per-frame basis and the second-pass blind diarization is on a per-word basis, and methods of creating acoustic signatures for a common speaker based only on the statistical models of the speakers in each audio session.
ACOUSTIC SIGNATURE BUILDING FOR A SPEAKER FROM MULTIPLE SESSIONS
Disclosed herein are methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models, wherein the first-pass blind diarization is on a per-frame basis and the second-pass blind diarization is on a per-word basis, and methods of creating acoustic signatures for a common speaker based only on the statistical models of the speakers in each audio session.
ELECTRONIC DEVICE FOR RECOGNIZING SPEECH
An electronic device includes a microphone obtaining an audio signal, a memory in which a speaker model is stored, and at least one processor. The at least one processor is configured to obtain a voice signal from the audio signal, to compare the voice signal with the speaker model to verify a user, and, if a verification result indicates that the user corresponds to a pre-enrolled speaker, to perform an operation corresponding to the obtained voice signal.
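The verify-then-act flow in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the voice signal and the stored speaker model are both represented as plain feature vectors, cosine similarity stands in for whatever comparison the device actually uses, and the names `cosine_similarity`, `handle_utterance`, and the 0.8 threshold are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def handle_utterance(voice_features: Sequence[float],
                     speaker_model: Sequence[float],
                     command: str,
                     threshold: float = 0.8) -> str:
    """Perform the command only if the voice matches the enrolled speaker model."""
    # Assumed decision rule: accept when similarity reaches the threshold.
    if cosine_similarity(voice_features, speaker_model) >= threshold:
        return f"executing: {command}"
    return "rejected: speaker not verified"
```

A features vector close to the stored model passes the check and the command runs; an orthogonal vector scores 0.0 and is rejected.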
Systems and methods for audio command recognition with speaker authentication
The present application discloses a method, an electronic system and a non-transitory computer readable storage medium for recognizing audio commands in an electronic device. The electronic device obtains audio data based on an audio signal provided by a user and extracts characteristic audio fingerprint features from the audio data. The electronic device further determines whether the corresponding audio signal is generated by an authorized user by comparing the characteristic audio fingerprint features with an audio fingerprint model for the authorized user and with a universal background model that represents user-independent audio fingerprint features, respectively. When the corresponding audio signal is generated by the authorized user of the electronic device, an audio command is extracted from the audio data, and an operation is performed according to the audio command.
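Comparing features against both a user model and a universal background model is commonly done with a log-likelihood ratio. The sketch below illustrates that general idea only. As a simplifying assumption, each model is a single diagonal-covariance Gaussian given as a `(mean, variance)` pair rather than a full mixture model, and the function names and the threshold of 0.0 are hypothetical rather than taken from the application.

```python
import math
from typing import List, Sequence, Tuple

# Assumed model form: (mean vector, variance vector) of one diagonal Gaussian.
Model = Tuple[Sequence[float], Sequence[float]]


def diag_gaussian_loglik(x: Sequence[float], mean: Sequence[float],
                         var: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def llr_score(frames: List[Sequence[float]], user_model: Model,
              background_model: Model) -> float:
    """Average per-frame log-likelihood ratio: user model versus background model."""
    total = 0.0
    for frame in frames:
        total += (diag_gaussian_loglik(frame, *user_model)
                  - diag_gaussian_loglik(frame, *background_model))
    return total / len(frames)


def is_authorized(frames: List[Sequence[float]], user_model: Model,
                  background_model: Model, threshold: float = 0.0) -> bool:
    """Accept the speaker when the user model explains the frames better."""
    return llr_score(frames, user_model, background_model) > threshold
```

A positive score means the frames are more likely under the authorized user's model than under the user-independent background model; only then would the audio command be extracted and executed.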
SYSTEM AND METHOD FOR ASSESSING EXPRESSIVE LANGUAGE DEVELOPMENT OF A KEY CHILD
A method of assessing expressive language development of a key child is described. The method can include processing an audio recording taken in a language environment of the key child to identify segments of the audio recording that correspond to vocalizations of the key child. The method also can include applying an adult automatic speech recognition phone decoder to the segments of the audio recording to identify each occurrence of a plurality of phone categories and to determine a duration for each of the plurality of phone categories. The method additionally can include determining a duration distribution for the plurality of phone categories based on the durations for the plurality of phone categories. The method further can include using the duration distribution for the plurality of phone categories in an age-based model to assess the expressive language development of the key child. The age-based model is selected based on a chronological age of the key child and the age-based model includes a plurality of different weights associated with the plurality of phone categories. Other embodiments are described.
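The duration-distribution step and the age-selected weighting can be sketched in Python. This is an illustration under assumptions, not the patented method: the decoder output is assumed to arrive as `(phone_category, duration_seconds)` pairs, the distribution is taken to be each category's share of total duration, the model is chosen as the one with the largest age bracket not exceeding the child's age, and the score is a plain weighted sum. All names and weight values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def duration_distribution(phone_segments: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Share of total decoded duration spent in each phone category."""
    totals: Dict[str, float] = defaultdict(float)
    for category, duration in phone_segments:
        totals[category] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: d / grand_total for category, d in totals.items()}


def development_score(distribution: Dict[str, float], age_months: int,
                      models_by_age: Dict[int, Dict[str, float]]) -> float:
    """Weighted sum of duration shares using the model selected by age."""
    # Assumed selection rule: the largest age bracket not exceeding the child's age.
    eligible = [age for age in models_by_age if age <= age_months]
    if not eligible:
        raise ValueError("no age-based model covers this age")
    weights = models_by_age[max(eligible)]
    return sum(weights.get(category, 0.0) * share
               for category, share in distribution.items())
```

With per-age weight tables, the same duration distribution can yield different scores for children of different ages, which is the role the abstract assigns to the age-based model.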