Patent classifications
G10L17/00
Systems and methods for proactive listening bot-plus person advice chaining
A computing system facilitates live and robo-advising sessions, and transitions therebetween based on detection of triggers. The computing system may extract voice inputs of a user from ambient sounds captured using a sound sensor. Based on an analysis of the voice inputs and detection of corresponding triggers, the computing system may initiate a robo-advising session, or a live communication session between a user computing device and an advisor computing device. Detection of another trigger during the robo-advising session or the live communication session may lead to a transition to a live communication session or a robo-advising session, respectively.
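The trigger-driven chaining between robo-advising and live sessions could be sketched as a small state machine. The trigger phrases and session names below are illustrative assumptions, not taken from the patent.

```python
# Hypothetical trigger sets; real systems would use trained intent detection.
ROBO_TRIGGERS = {"check my balance", "portfolio summary"}
LIVE_TRIGGERS = {"talk to a person", "complex tax question"}

def next_session(current_session, voice_input):
    """Return the session to run after analyzing one extracted voice input.

    current_session is None (no session yet), "robo", or "live".
    """
    text = voice_input.lower()
    if any(t in text for t in LIVE_TRIGGERS):
        return "live"   # escalate to (or start) a live advisor session
    if any(t in text for t in ROBO_TRIGGERS):
        return "robo"   # hand off to (or start) a robo-advising session
    return current_session  # no trigger detected: stay where we are

session = None
session = next_session(session, "Could you give me a portfolio summary?")
# session == "robo"
session = next_session(session, "Actually, I have a complex tax question.")
# session == "live"
```

A trigger detected mid-session simply flips the state, matching the abstract's bidirectional transitions.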
AUTOMATIC INTERPRETATION SERVER AND METHOD BASED ON ZERO UI
Provided is a method performed by an automatic interpretation server based on a zero user interface (UI), which communicates with a plurality of terminal devices having a microphone function, a speaker function, a communication function, and a wearable function. The method includes connecting terminal devices disposed within a designated automatic interpretation zone, receiving a voice signal of a first user from a first terminal device among the terminal devices within the automatic interpretation zone, matching a plurality of users placed within a speech-receivable distance of the first terminal device, and performing automatic interpretation on the voice signal and transmitting results of the automatic interpretation to a second terminal device of at least one second user corresponding to a result of the matching.
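The matching step, selecting listeners within a speech-receivable distance of the speaking terminal, could look like the following sketch. Terminal positions and the radius are assumed values; the patent does not specify how the distance is obtained.

```python
import math

SPEECH_RECEIVABLE_DISTANCE = 3.0  # meters; assumed threshold

def match_listeners(speaker_id, positions):
    """Return ids of terminals within speech-receivable distance of the speaker."""
    sx, sy = positions[speaker_id]
    matched = []
    for tid, (x, y) in positions.items():
        if tid == speaker_id:
            continue
        if math.hypot(x - sx, y - sy) <= SPEECH_RECEIVABLE_DISTANCE:
            matched.append(tid)
    return matched

# Terminals registered in one automatic interpretation zone.
zone = {"t1": (0.0, 0.0), "t2": (1.0, 1.0), "t3": (10.0, 0.0)}
match_listeners("t1", zone)  # ["t2"]: t3 is too far away to receive the speech
```

The interpretation results would then be transmitted only to the matched terminals.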
Automatic synthesis of translated speech using speaker-specific phonemes
An embodiment includes converting an original audio signal to an original text string, the original audio signal being from a recording of the original text string spoken by a specific person in a source language. The embodiment generates a translated text string by translating the original text string from the source language to a target language, including translation of a word from the source language to the target language. The embodiment assembles a standard phoneme sequence from a set of standard phonemes, where the standard phoneme sequence includes a standard pronunciation of the translated word. The embodiment also associates a custom phoneme with a standard phoneme of the standard phoneme sequence, where the custom phoneme includes the specific person's pronunciation of a sound in the translated word. The embodiment synthesizes the translated text string to a translated audio signal including the translated word pronounced using the custom phoneme.
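The phoneme-override step could be sketched as a substitution over the standard sequence. The phoneme inventory, the example word, and the speaker-specific unit name below are invented for illustration.

```python
# Standard pronunciation of a translated word as a phoneme sequence
# (hypothetical symbols loosely modeling French "bonjour").
standard_sequence = ["B", "OH", "N", "ZH", "UH"]

# Custom phonemes captured from the specific speaker, keyed by the
# standard phoneme they replace.
custom_phonemes = {"ZH": "ZH_speaker17"}  # the speaker's own fricative unit

def apply_custom_phonemes(sequence, overrides):
    """Replace standard phonemes with speaker-specific ones where available."""
    return [overrides.get(p, p) for p in sequence]

personalized = apply_custom_phonemes(standard_sequence, custom_phonemes)
# ["B", "OH", "N", "ZH_speaker17", "UH"]: synthesis then uses the custom unit
```

Synthesis over the personalized sequence yields translated speech that keeps the original speaker's pronunciation of that sound.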
Methods and apparatus to determine an audience composition based on voice recognition
Methods, apparatus, systems and articles of manufacture are disclosed. An example apparatus includes a controller to cause a people meter to emit a prompt for input of audience identification information at a first time and determine a first audience count based on the input, an audio detector to determine a second audience count based on signatures generated from audio data captured in the media environment, and a comparator to cause the people meter to not emit the prompt for at least a first time period after the first time when the first audience count is equal to the second audience count.
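The comparator logic could be sketched as follows: when the audio-derived count agrees with the people-meter count, the prompt is suppressed for a quiet period. The period length and timestamps are illustrative assumptions.

```python
SUPPRESSION_PERIOD = 42 * 60  # seconds; assumed, not specified by the patent

def should_prompt(now, last_prompt_time, meter_count, audio_count):
    """Decide whether the people meter should emit an identification prompt.

    Suppresses the prompt only while the two independent audience counts
    agree and the quiet period after the last prompt has not elapsed.
    """
    if meter_count == audio_count and now < last_prompt_time + SUPPRESSION_PERIOD:
        return False  # counts agree: skip the prompt for the quiet period
    return True

should_prompt(now=100, last_prompt_time=0, meter_count=3, audio_count=3)  # False
should_prompt(now=100, last_prompt_time=0, meter_count=3, audio_count=2)  # True
```

Disagreement between the counts re-enables prompting immediately, since the audio evidence suggests the registered audience is stale.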
Systems and methods for performing commands in a vehicle using speech and image recognition
Systems and methods are disclosed herein for implementation of a vehicle command operation system that may use multi-modal technology to authenticate an occupant of the vehicle to authorize a command and receive natural language commands for vehicular operations. The system may utilize sensors to receive first sensor data indicative of a voice command from an occupant of the vehicle. The system may receive second sensor data to aid in the determination of the corresponding vehicular operation in response to the received command. The system may retrieve authentication data for the occupants of the vehicle. The system authenticates the occupant to authorize a vehicular operation command using a neural network based on at least one of the first sensor data, the second sensor data, and the authentication data. Responsive to the authentication, the system may authorize the operation to be performed in the vehicle based on the vehicular operation command.
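The patent describes a neural network combining the sensor and authentication inputs; as a minimal stand-in, the decision can be sketched as a weighted fusion of per-modality match scores. All scores, weights, and the threshold below are illustrative assumptions.

```python
WEIGHTS = {"voice": 0.5, "image": 0.3, "profile": 0.2}  # assumed weighting
THRESHOLD = 0.7                                          # assumed accept level

def authenticate(voice_score, image_score, profile_score):
    """Fuse per-modality match scores (each in [0, 1]) into accept/reject.

    A linear fusion stands in here for the patent's neural network.
    """
    fused = (WEIGHTS["voice"] * voice_score
             + WEIGHTS["image"] * image_score
             + WEIGHTS["profile"] * profile_score)
    return fused >= THRESHOLD

authenticate(0.9, 0.8, 0.9)  # True: strong agreement across modalities
authenticate(0.9, 0.1, 0.1)  # False: voice alone is not sufficient
```

Only after the fused decision accepts the occupant would the vehicular operation be authorized.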
Utilizing sensor data for automated user identification
This disclosure describes techniques for identifying users that are enrolled for use of a user-recognition system. To be identified using the user-recognition system, a user may first enroll in the system by stating an utterance at a first device having a first microphone. In response, the first microphone may generate first audio data. Later, when the user would like to be identified by the system, the user may state the utterance again, although this time to a second device having a second microphone. This second microphone may accordingly generate second audio data. Because the acoustic response of the first microphone may differ from the acoustic response of the second microphone, however, this disclosure describes techniques to apply a relative transfer function to one or both of the first or second audio data prior to comparing these data so as to increase the recognition accuracy of the system.
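The relative-transfer-function correction could be sketched at the level of per-frequency-band magnitudes: the probe utterance from the second microphone is mapped into the first microphone's response before comparison. The band magnitudes, the RTF values, and the distance measure below are invented for illustration.

```python
def apply_rtf(band_magnitudes, rtf):
    """Map one microphone's per-band magnitudes through a relative transfer
    function (per-band ratio of mic 1's response to mic 2's)."""
    return [m * h for m, h in zip(band_magnitudes, rtf)]

def band_distance(a, b):
    """Simple per-band comparison used as a stand-in for the recognizer."""
    return sum(abs(x - y) for x, y in zip(a, b))

enrolled = [1.0, 2.0, 0.5]  # bands of the utterance on the first microphone
probe    = [2.0, 1.0, 1.0]  # same utterance captured on the second microphone
rtf      = [0.5, 2.0, 0.5]  # first mic response / second mic response, per band

corrected = apply_rtf(probe, rtf)  # [1.0, 2.0, 0.5]
band_distance(enrolled, corrected) < band_distance(enrolled, probe)  # True
```

After correction the two captures of the same utterance compare much more closely, which is the recognition-accuracy gain the disclosure targets.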
LOCATING INDIVIDUALS USING MICROPHONE ARRAYS AND VOICE PATTERN MATCHING
Examples disclosed herein provide the ability to identify the location of an individual within a room by using a combination of microphone arrays and voice pattern matching. In one example, a computing device may extract a voice detected by microphones of a microphone array located in a room, perform voice pattern matching to identify an individual associated with the extracted voice, and determine a location of the individual in the room based on an intensity of the voice detected individually by the microphones of the microphone array.
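The intensity-based localization step could be sketched as an intensity-weighted centroid of the microphone positions, one simple way to realize the idea. The microphone layout and measured levels below are invented for illustration.

```python
def locate_speaker(mic_positions, intensities):
    """Estimate the speaker's (x, y) as the intensity-weighted centroid
    of the microphone array: louder microphones pull the estimate closer."""
    total = sum(intensities)
    x = sum(i * px for i, (px, _) in zip(intensities, mic_positions)) / total
    y = sum(i * py for i, (_, py) in zip(intensities, mic_positions)) / total
    return (x, y)

mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]  # room corners, meters
levels = [1.0, 3.0, 1.0, 3.0]  # louder on the right-hand microphones
locate_speaker(mics, levels)    # (3.0, 2.0): speaker toward the right wall
```

Voice pattern matching on the extracted signal would then attach an identity to the estimated location.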