Patent classifications
G10L17/14
REDUCING BANDWIDTH REQUIREMENTS OF VIRTUAL COLLABORATION SESSIONS
A computer-implemented method, a computer system and a computer program product reduce bandwidth requirements of a virtual collaboration session. The method includes capturing session data from a virtual collaboration session. The session data is selected from a group consisting of video data, audio data, an image of a screen of a connected device and text data. The method also includes connecting to a live blog platform. The method further includes transmitting a text transcription of the virtual collaboration session to the live blog platform. The text transcription is generated by scanning the audio data using a speech-to-text algorithm. In addition, the method includes classifying a topic in the virtual collaboration session based on importance. Lastly, the method includes transmitting a multimedia file related to the topic to the live blog platform in response to the topic being classified as important. The multimedia file is extracted from the session data.
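A minimal Python sketch of the claimed flow. All names here (SessionData, LiveBlogClient, speech_to_text, topic_is_important) are hypothetical stand-ins for the patent's components, not anything the patent names:

```python
from dataclasses import dataclass

@dataclass
class SessionData:
    """Captured session data: video, audio, a screen image, or text."""
    audio: bytes = b""
    video: bytes = b""
    screen_image: bytes = b""
    text: str = ""

def speech_to_text(audio: bytes) -> str:
    """Placeholder for a real speech-to-text algorithm."""
    return "transcript of the session audio"

def topic_is_important(transcript: str) -> bool:
    """Placeholder importance classifier (simple keyword stand-in)."""
    return any(w in transcript.lower() for w in ("decision", "deadline", "action"))

class LiveBlogClient:
    """Stand-in for a connection to a live blog platform."""
    def post_text(self, text: str) -> None:
        print(f"[blog] text: {text}")
    def post_media(self, media: bytes) -> None:
        print(f"[blog] media: {len(media)} bytes")

def publish_session(session: SessionData, blog: LiveBlogClient) -> None:
    transcript = speech_to_text(session.audio)
    blog.post_text(transcript)              # text is cheap to transmit
    if topic_is_important(transcript):      # heavy media only for important topics
        blog.post_media(session.video or session.screen_image)

publish_session(SessionData(audio=b"...", video=b"frame-bytes"), LiveBlogClient())
```

The bandwidth saving comes from always sending the cheap transcript and gating the expensive multimedia behind the importance classification.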
VOICE WAKEUP METHOD AND VOICE WAKEUP DEVICE
A voice wakeup method is applied to wake up an electronic apparatus. The voice wakeup method includes executing a speaker identification function to analyze user voice and acquire a predefined identification of the user voice, executing a voiceprint extraction function to acquire a voiceprint segment of the user voice, executing an on-device training function via the voiceprint segment to generate an updated parameter, and utilizing the updated parameter to calibrate a speaker verification model, so that the speaker verification model is used to analyze a wakeup sentence and decide whether to wake up the electronic apparatus.
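A minimal sketch of that wakeup pipeline, assuming a toy centroid "model" and invented helpers (extract_voiceprint, SpeakerVerificationModel) in place of the patent's actual voiceprint extraction and on-device training:

```python
import math

def extract_voiceprint(samples):
    """Placeholder voiceprint extraction: a tiny fixed-length embedding."""
    n = max(len(samples), 1)
    return [sum(samples) / n, max(samples, default=0.0)]

class SpeakerVerificationModel:
    def __init__(self, enrolled_print):
        self.enrolled = enrolled_print

    def calibrate(self, updated_parameter):
        """On-device training step: blend a new voiceprint segment in."""
        self.enrolled = [(a + b) / 2
                         for a, b in zip(self.enrolled, updated_parameter)]

    def verify(self, voiceprint, threshold=1.0):
        return math.dist(self.enrolled, voiceprint) < threshold

# Calibration path: extract a voiceprint segment from the identified user's
# voice and use it as the updated parameter for the model.
model = SpeakerVerificationModel(enrolled_print=[0.4, 0.9])
model.calibrate(extract_voiceprint([0.2, 0.5, 0.8]))

# Wakeup path: analyze the wakeup sentence with the calibrated model.
wake = model.verify(extract_voiceprint([0.3, 0.6, 0.7]))
print("wake device" if wake else "stay asleep")
```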
AUTHENTICATION VIA A DYNAMIC PASSPHRASE
A computerized method for voice authentication of a customer in a self-service system is provided. A request for authentication of the customer is received, and the customer is enrolled in the self-service system with a text-independent voice print. A passphrase to transmit to the customer is determined from a plurality of passphrases by comparing each of the plurality of passphrases to a text-dependent or text-independent voice biometric model. The passphrase is transmitted to the customer, and when the customer responds, an audio stream of the passphrase is received. The customer is authenticated by comparing the audio stream of the passphrase against the text-independent voice print. If the customer is authenticated, the audio stream of the passphrase and the topic of the passphrase are stored.
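A sketch of the passphrase-selection and authentication loop. The scoring heuristic and all identifiers (coverage_score, matches_voice_print, and so on) are illustrative assumptions, not the patent's biometric models:

```python
def coverage_score(passphrase: str, voice_model: dict) -> float:
    """Placeholder: how much the phrase would add to the text-dependent /
    text-independent biometric model (here, simply unseen words)."""
    seen = voice_model.get("seen_words", set())
    return sum(1 for w in passphrase.split() if w not in seen)

def choose_passphrase(passphrases, voice_model):
    # Compare each candidate against the biometric model; pick the best.
    return max(passphrases, key=lambda p: coverage_score(p, voice_model))

def matches_voice_print(audio_stream: bytes, voice_print: bytes) -> bool:
    """Placeholder for the real biometric comparison."""
    return len(audio_stream) > 0

def authenticate_customer(passphrases, voice_model, get_customer_audio):
    phrase = choose_passphrase(passphrases, voice_model)
    audio = get_customer_audio(phrase)          # transmit phrase, receive audio
    if matches_voice_print(audio, voice_model["print"]):
        # On success, store the audio and the passphrase topic for reuse.
        voice_model.setdefault("samples", []).append((phrase, audio))
        return True
    return False

model = {"print": b"enrolled", "seen_words": {"my", "voice"}}
ok = authenticate_customer(["my voice is my passport", "open sesame"],
                           model, lambda phrase: b"spoken-audio")
print("authenticated" if ok else "rejected")
```

Storing successful samples lets later passphrase choices favor phrases that expand the biometric model rather than repeat it.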
TECHNOLOGIES FOR AUTHENTICATING A SPEAKER USING VOICE BIOMETRICS
Technologies for authenticating a speaker in a voice authentication system using voice biometrics include a speech collection computing device and a speech authentication computing device. The speech collection computing device is configured to collect a speech signal from a speaker and transmit the speech signal to the speech authentication computing device. The speech authentication computing device is configured to compute a speech signal feature vector for the received speech signal, retrieve a speech signal classifier associated with the speaker, and feed the speech signal feature vector to the retrieved speech signal classifier. Additionally, the speech authentication computing device is configured to determine whether the speaker is an authorized speaker based on an output of the retrieved speech signal classifier. Additional embodiments are described herein.
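A sketch of the split between the speech collection device (feature extraction input) and the speech authentication device (classifier retrieval and scoring). The toy features and distance-based classifier are assumptions standing in for real voice biometrics:

```python
def compute_feature_vector(speech_signal):
    """Placeholder features; real systems use MFCCs or learned embeddings."""
    n = max(len(speech_signal), 1)
    mean = sum(speech_signal) / n
    energy = sum(x * x for x in speech_signal) / n
    return [mean, energy]

class SpeakerClassifier:
    """Per-speaker classifier retrieved by the authentication device."""
    def __init__(self, reference):
        self.reference = reference
    def score(self, features):
        # Negative squared distance: higher means a closer match.
        return -sum((a - b) ** 2 for a, b in zip(self.reference, features))

CLASSIFIER_STORE = {"alice": SpeakerClassifier([0.10, 0.50])}

def authenticate(speaker_id, speech_signal, threshold=-0.25):
    features = compute_feature_vector(speech_signal)  # from the collected signal
    classifier = CLASSIFIER_STORE[speaker_id]         # retrieve speaker's classifier
    return classifier.score(features) >= threshold    # authorized or not

print(authenticate("alice", [0.1, 0.2, 0.9, 0.6]))
```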
Speaker verification using co-location information
A method includes generating an audio signal encoding an utterance captured by a microphone of a user device and transmitting the audio signal encoding the utterance to a server. The server is configured to determine a speaker of the utterance from one of a plurality of different users of the user device based on a comparison between the audio signal encoding the utterance and corresponding speaker verification data, and process the audio signal encoding the utterance using a speech recognition module to identify a particular action. The method also includes executing the particular action identified by the server to cause a particular application to launch on the user device, based on the user permissions, associated with the speaker determined by the server, to access the particular data.
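A sketch of the server/device split, with placeholder speaker determination and speech recognition; the permission table and all function names are invented for illustration:

```python
def determine_speaker(audio: bytes, verification_data: dict) -> str:
    """Server side: compare the utterance against each user's verification
    data. Placeholder: nearest stored sample by length."""
    return min(verification_data,
               key=lambda user: abs(len(audio) - len(verification_data[user])))

def recognize_action(audio: bytes) -> str:
    """Placeholder speech recognition module identifying the action."""
    return "launch:calendar"

PERMISSIONS = {"alice": {"calendar", "mail"}, "bob": {"calendar"}}

def server_process(audio, verification_data):
    # The server both determines the speaker and recognizes the action.
    return determine_speaker(audio, verification_data), recognize_action(audio)

def device_execute(speaker: str, action: str) -> None:
    app = action.split(":", 1)[1]
    if app in PERMISSIONS.get(speaker, set()):  # enforce the speaker's permissions
        print(f"launching {app} for {speaker}")
    else:
        print(f"{speaker} may not launch {app}")

speaker, action = server_process(b"utterance", {"alice": b"sample-a", "bob": b"sb"})
device_execute(speaker, action)
```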
Systems and methods for automatic joining as a virtual meeting participant for transcription
Method, system, device, and non-transitory computer-readable medium for joining a virtual participant in a conversation. In some examples, a computer-implemented method includes: identifying a first conversation scheduled to be participated in by a first group of actual participants; joining a first virtual participant into the first conversation; obtaining, via the first virtual participant, a first set of audio data associated with the first conversation while the first conversation occurs; transcribing, via the first virtual participant, the first set of audio data into a first set of text data while the first conversation occurs; and presenting the first set of text data to the first group of actual participants while the first conversation occurs.
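A sketch of the virtual participant loop, assuming an invented FakeMeetingClient in place of a real conferencing API (the patent names no such API):

```python
class FakeMeetingClient:
    """Minimal stub standing in for a real conferencing API."""
    def __init__(self, audio_chunks):
        self.chunks = list(audio_chunks)
    def join(self, meeting_id):
        print(f"joined {meeting_id} as a participant")
    def is_live(self):
        return bool(self.chunks)
    def next_audio_chunk(self):
        return self.chunks.pop(0)
    def post_caption(self, text):
        print(f"[caption] {text}")        # presented to the actual participants

class VirtualParticipant:
    def __init__(self, client, transcribe):
        self.client = client
        self.transcribe = transcribe      # placeholder speech-to-text callable

    def attend(self, meeting_id):
        self.client.join(meeting_id)                          # join the conversation
        while self.client.is_live():                          # while it occurs
            chunk = self.client.next_audio_chunk()            # obtain audio data
            self.client.post_caption(self.transcribe(chunk))  # transcribe + present

bot = VirtualParticipant(FakeMeetingClient([b"chunk-1", b"chunk-2"]),
                         lambda c: f"text for {len(c)}-byte chunk")
bot.attend("standup-0900")
```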
AUTOMATICALLY ADAPTING AUDIO DATA BASED ASSISTANT PROCESSING
Implementations relate to at least intermittently processing dynamic contextual parameters and dynamically and automatically adapting, in dependence on the processing of the dynamic contextual parameters, audio data processing that is performed at an assistant device. The dynamic and automatic adapting of the audio data processing mitigates occurrences of false positives and/or false negatives in hot word processing, invocation-free speech recognition, and/or other automated assistant audio-data-based processing techniques. Implementations dynamically and automatically adapt the audio data processing between two or more states, and the automatic adaptation of the audio data processing from a current state to an alternate state is in response to the processing of current values for the dynamic contextual parameters satisfying one or more conditions.
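A sketch of that state adaptation as a two-state machine; the parameter names, thresholds, and conditions are assumptions chosen to illustrate the false-positive/false-negative trade-off, not conditions taken from the patent:

```python
from enum import Enum, auto

class ProcessingState(Enum):
    HOTWORD_ONLY = auto()      # conservative: require the hot word first
    INVOCATION_FREE = auto()   # permissive: recognize speech without invocation

def adapt_state(current: ProcessingState, params: dict) -> ProcessingState:
    """Re-evaluate the dynamic contextual parameters; switch state only
    when one of the (assumed) conditions is satisfied."""
    if params["ambient_noise"] > 0.7 or not params["user_facing_device"]:
        return ProcessingState.HOTWORD_ONLY      # mitigate false positives
    if params["seconds_since_interaction"] < 30:
        return ProcessingState.INVOCATION_FREE   # mitigate false negatives mid-dialog
    return current                               # no condition met: keep state

state = ProcessingState.HOTWORD_ONLY
state = adapt_state(state, {"ambient_noise": 0.2,
                            "user_facing_device": True,
                            "seconds_since_interaction": 10})
print(state)   # ProcessingState.INVOCATION_FREE under these example parameters
```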