G10L17/22

Barrier-free intelligent voice system and control method thereof
11705126 · 2023-07-18

A barrier-free intelligent voice system and a control method thereof, wherein multiple words are recognized from voice audio to create multiple independent semantic units. The system continuously determines whether each semantic unit matches one of multiple voice tags created by the user. A target object, a program command, and a remark corresponding to the voice tags are then determined from the successfully matched voice tag combination. Accordingly, a corresponding program can be started or a remote device can be triggered to operate. The present disclosure can be regarded as an AI intelligent voice processing engine. By allowing users to define different types of voice tag combinations, it can eliminate the grammatical and semantic analysis of natural language processing, eliminate speech translation differences and errors between different languages, effectively reduce the amount of calculation, increase the processing speed of the system, and minimize system judgment errors.
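
The tag-matching idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tag tables, the `Action` type, and the rule that words following a matched target and command are treated as the remark are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    target: str   # target object resolved from a voice tag
    command: str  # program command resolved from a voice tag
    remark: str   # free-form remark (assumed: words after both tags matched)


# Hypothetical user-defined voice tags; the abstract does not list any.
TARGET_TAGS = {"lamp": "living_room_lamp", "tv": "television"}
COMMAND_TAGS = {"on": "power_on", "off": "power_off"}


def match_voice_tags(words: List[str]) -> Optional[Action]:
    """Compare each recognized word against the tag tables by direct lookup.

    No grammatical or semantic parsing is done; unmatched words that come
    before both tags are found are simply ignored.
    """
    target = None
    command = None
    remarks = []
    for word in words:
        key = word.lower()
        if target is None and key in TARGET_TAGS:
            target = TARGET_TAGS[key]
        elif command is None and key in COMMAND_TAGS:
            command = COMMAND_TAGS[key]
        elif target is not None and command is not None:
            remarks.append(word)
    if target is None or command is None:
        return None  # no complete tag combination was matched
    return Action(target, command, " ".join(remarks))
```

For example, `match_voice_tags(["please", "lamp", "on", "slowly"])` yields an `Action` for the lamp with the remark `"slowly"`, while an utterance with no complete tag combination yields `None`.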

Hotword detection on multiple devices
11557299 · 2023-01-17

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value. The actions further include, based on comparing the first value to the second value, initiating speech recognition processing on the audio data.
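
The comparison step can be sketched as follows. The abstract only says that recognition is initiated "based on comparing" the two values; the rule that the device with the higher likelihood proceeds, and the device-identifier tie-break, are assumptions of this sketch, as are all names in it.

```python
def should_start_recognition(
    local_score: float,
    remote_score: float,
    local_id: str,
    remote_id: str,
) -> bool:
    """Decide whether this device should run speech recognition.

    local_score is the first value (computed on this device) and
    remote_score is the second value (received from the other device).
    Assumed policy: the higher hotword likelihood wins; on an exact tie,
    the device with the lexicographically smaller identifier wins so that
    exactly one device proceeds.
    """
    if local_score != remote_score:
        return local_score > remote_score
    return local_id < remote_id
```

With this policy, when both devices evaluate the same pair of scores, exactly one of them returns `True`.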

User Identification with Voiceprints on Online Social Networks
20230222371 · 2023-07-13

In one embodiment, a method includes, by one or more computing devices of an online social network, receiving, from a client system at a first location, an audio input from an unknown user, identifying a first user who is proximate to the first location, identifying the unknown user as a second user based on a comparison of the audio input to one or more voiceprints of one or more candidate users accessible by the client system, respectively, wherein each voiceprint comprises audio data for auditory identification of a unique user, and wherein each candidate user is a contact of the first user, and sending customized content to one or more of the first user or the second user, wherein the content is customized using interest information associated with the first or second user.

Hotword-based speaker recognition
11557301 · 2023-01-17

Systems, methods performed by data processing apparatus and computer storage media encoded with computer programs for receiving an utterance from a user in a multi-user environment, each user having an associated set of available resources, determining that the received utterance includes at least one predetermined word, comparing speaker identification features of the uttered predetermined word with speaker identification features of each of a plurality of previous utterances of the predetermined word, the plurality of previous predetermined word utterances corresponding to different known users in the multi-user environment, attempting to identify the user associated with the uttered predetermined word as matching one of the known users in the multi-user environment, and based on a result of the attempt to identify, selectively providing the user with access to one or more resources associated with a corresponding known user.
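
A minimal sketch of the identify-then-grant flow described above, assuming that speaker identification features are fixed-length numeric vectors compared by cosine similarity against one enrolled vector per known user. The abstract does not specify the feature type, the comparison metric, or a threshold; these, along with every name below, are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_speaker(
    features: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.8,  # assumed acceptance threshold
) -> Optional[str]:
    """Return the best-matching known user, or None if nobody clears the threshold."""
    best_user = None
    best_score = threshold
    for user, reference in enrolled.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def accessible_resources(
    features: List[float],
    enrolled: Dict[str, List[float]],
    resources: Dict[str, List[str]],
) -> List[str]:
    """Grant the matched user's resources; an unidentified speaker gets none."""
    user = identify_speaker(features, enrolled)
    return resources.get(user, []) if user is not None else []
```

Here `enrolled` stands in for features taken from previous utterances of the predetermined word by each known user, and `resources` maps each known user to the resources associated with them.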

Processing Multimodal User Input for Assistant Systems
20230222605 · 2023-07-13

In one embodiment, a method includes receiving at a head-mounted device a speech input from a user and a visual input captured by cameras of the head-mounted device, wherein the visual input comprises subjects and attributes associated with the subjects, and wherein the speech input comprises a co-reference to one or more of the subjects, resolving entities corresponding to the subjects associated with the co-reference based on the attributes and the co-reference, and presenting a communication content responsive to the speech input and the visual input at the head-mounted device, wherein the communication content comprises information associated with executing results of tasks corresponding to the resolved entities.

Transcription System with Contextual Automatic Speech Recognition
20230223030 · 2023-07-13

An automated speech recognition (“ASR”) system with an audio processing engine and a contextual transcription engine on a computing device is provided. The audio processing engine determines audio segmentation corresponding to multiple identified speakers in the audio data. The contextual transcription engine generates a text file based on the audio data in a legally-formatted transcript using one or more AI/ML models. Embodiments of the ASR system provide results that will comply with most of the stenographic standards for legal transcription out of the box without further setup or tuning.

System and Method for Augmented Data Channel Processing Using Acoustic Devices
20230223029 · 2023-07-13

Systems, methods, and computer program products are provided for augmented data channel processing using acoustic devices. The method includes receiving a request for a user data channel processing action associated with a user. The user data channel processing action is associated with a vendor. The method also includes causing an audible notification to be sent to an acoustic device associated with the user. The audible notification is a prompt to authorize execution of the user data channel processing action. The method further includes receiving a voice command from the acoustic device associated with the user. The voice command is a confirmation of the user data channel processing action. The method still further includes verifying that the voice command from the acoustic device associated with the user is from the user. The method also includes causing an execution of the user data channel processing action based upon the verification of the voice command.
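
The sequence of steps in this abstract (prompt, voice confirmation, speaker verification, execution) can be sketched as below. Everything here is hypothetical scaffolding: the `VoiceCommand` type, the callables standing in for the acoustic device and the executor, the use of the word "yes" as the confirmation, and verification by comparing a speaker identifier are assumptions, not details from the filing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceCommand:
    text: str        # recognized words of the reply
    speaker_id: str  # identity assigned by some speaker-verification step (assumed)


def process_request(
    action: str,
    user_id: str,
    notify: Callable[[str], None],    # plays an audible prompt on the device
    listen: Callable[[], VoiceCommand],  # returns the user's spoken reply
    execute: Callable[[str], None],   # carries out the requested action
) -> bool:
    """Run the prompt / confirm / verify / execute sequence; True if executed."""
    notify(f"Do you authorize: {action}?")
    command = listen()
    if command.text.strip().lower() != "yes":  # assumed confirmation word
        return False
    if command.speaker_id != user_id:  # the reply must come from the user
        return False
    execute(action)
    return True
```

Passing the device and executor in as callables keeps the sketch independent of any particular hardware or vendor interface.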