Patent classifications
G10L2015/088
CALL MANAGEMENT SYSTEM AND ITS SPEECH RECOGNITION CONTROL METHOD
A speech recognition server has a speech recognition engine and a mode control table that holds a speech recognition mode for each call. The speech recognition engine has a mode management unit that designates a speech recognition mode for a decoder, and an output analysis unit that analyzes recognition result data produced by speech-to-text conversion. The output analysis unit designates the speech recognition mode for the mode management unit in accordance with the result of that analysis, and the mode management unit in turn designates the speech recognition mode for the decoder in accordance with the designation by the output analysis unit. When performing speech recognition on call data, this makes it possible to suppress hardware resource consumption while improving user satisfaction.
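A minimal sketch of the per-call mode control table described above, with all class, mode, and function names invented for illustration: an output analyzer inspects the recognized text and asks a mode manager to switch the decoder's recognition mode for that call, keeping a cheap mode as the default.

```python
class ModeManager:
    """Holds a speech recognition mode per call (the mode control table)."""

    def __init__(self):
        self.mode_control_table = {}  # call_id -> recognition mode

    def designate(self, call_id, mode):
        self.mode_control_table[call_id] = mode

    def mode_for(self, call_id):
        # Default to a lightweight mode to conserve hardware resources.
        return self.mode_control_table.get(call_id, "keyword-spotting")


def analyze_output(manager, call_id, recognized_text):
    """Output analysis unit: pick a mode from the recognition result."""
    # If the caller appears to be speaking free-form sentences, switch
    # to the more expensive large-vocabulary mode; otherwise stay cheap.
    if len(recognized_text.split()) > 3:
        manager.designate(call_id, "large-vocabulary")
    else:
        manager.designate(call_id, "keyword-spotting")


mgr = ModeManager()
analyze_output(mgr, "call-1", "please transfer me to billing support")
print(mgr.mode_for("call-1"))  # large-vocabulary
print(mgr.mode_for("call-2"))  # keyword-spotting (table default)
```

The word-count heuristic is purely illustrative; the abstract does not specify what property of the recognition result drives the mode switch.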
Privacy device for smart speakers
Systems, apparatuses, and methods are described for a privacy blocking device configured to prevent receipt, by a listening device, of video and/or audio data until a trigger occurs. A blocker may be configured to prevent receipt of video and/or audio data by one or more microphones and/or one or more cameras of a listening device. The blocker may use the one or more microphones, the one or more cameras, and/or one or more second microphones and/or one or more second cameras to monitor for a trigger. The blocker may process the data. Upon detecting the trigger, the blocker may transmit data to the listening device. For example, the blocker may transmit all or a part of a spoken phrase to the listening device.
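A hedged sketch of the blocking behavior, with all names invented: audio frames are withheld from the listening device until a trigger is detected, after which part of the buffered phrase (from the trigger onward) is forwarded.

```python
class ListeningDevice:
    """Stand-in for a smart speaker that receives audio frames."""

    def __init__(self):
        self.received = []

    def receive(self, frames):
        self.received.extend(frames)


class PrivacyBlocker:
    """Buffers audio and forwards it only once a trigger is heard."""

    def __init__(self, trigger, listening_device):
        self.trigger = trigger
        self.listening_device = listening_device
        self.buffer = []

    def on_audio_frame(self, frame):
        self.buffer.append(frame)
        if self.trigger in self.buffer:
            # Trigger detected: forward the phrase from the trigger
            # onward; everything before it never reaches the device.
            start = self.buffer.index(self.trigger)
            self.listening_device.receive(self.buffer[start:])
            self.buffer.clear()


device = ListeningDevice()
blocker = PrivacyBlocker("hey-assistant", device)
for frame in ["private", "chat", "hey-assistant"]:
    blocker.on_audio_frame(frame)
print(device.received)  # ['hey-assistant']
```

Frames are modeled as words for readability; a real blocker would operate on raw audio or video buffers and run its own trigger detector.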
Removal of identifying traits of a user in a virtual environment
A virtual environment platform may receive, from a user device, a request to access a virtual reality (VR) environment and may verify, based on the request, a user of the user device to allow the user device access to the VR environment. The virtual environment platform may receive, after verifying the user of the user device, user voice input and user handwritten input from the user device. The virtual environment platform may generate processed user speech by processing the user voice input, wherein a characteristic of the processed user speech and a corresponding characteristic of the user voice input are different and may generate formatted user text by processing the user handwritten input, wherein the formatted user text is machine-encoded text. The virtual environment platform may cause the processed user speech to be audibly presented and the formatted user text to be visually presented in the VR environment.
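An illustrative sketch of the "processed user speech" step, with the function name and rate factor invented: one measurable characteristic of the voice input (here its duration, via naive nearest-neighbor resampling) is altered so the processed speech differs from the original recording.

```python
def anonymize_speech(samples, rate_factor=1.25):
    """Return samples resampled by rate_factor (nearest-neighbor).

    Changing the playback rate alters a characteristic
    (duration/apparent pitch) of the voice input, so the processed
    speech and the original voice input differ.
    """
    out_len = int(len(samples) / rate_factor)
    return [samples[int(i * rate_factor)] for i in range(out_len)]


original = list(range(100))            # stand-in for PCM samples
processed = anonymize_speech(original)
print(len(original), len(processed))   # 100 80
```

A production system would use a proper pitch-shifting or voice-conversion model; nearest-neighbor resampling is only the simplest way to show "a characteristic of the processed user speech and a corresponding characteristic of the user voice input are different".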
Locally distributed keyword detection
In one aspect, a playback device includes at least one microphone configured to detect a voice input and generate sound input data. The playback device detects a first command keyword in the detected sound and, in response, makes a first determination, via a first local natural language unit (NLU), of whether the sound input data include at least one keyword within a first predetermined library of keywords. The playback device receives an indication of a second determination, made by a second NLU, of whether the sound input data include at least one keyword from a second predetermined library of keywords. The playback device compares the results of the first and second determinations and, based on the comparison, foregoes further processing of the sound input data.
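A rough sketch of the two-stage check, with both keyword libraries and all names invented: each NLU tests the detected words against its own library, and further processing is skipped when the two determinations do not both find a keyword.

```python
# Hypothetical keyword libraries for the first (local) and second NLU.
FIRST_LIBRARY = {"play", "pause", "skip"}
SECOND_LIBRARY = {"play", "stop", "volume"}


def nlu_determination(words, library):
    """One NLU's determination: does any word fall in its library?"""
    return any(w in library for w in words)


def should_forego(words):
    """Compare the two determinations; skip processing on disagreement."""
    first = nlu_determination(words, FIRST_LIBRARY)
    second = nlu_determination(words, SECOND_LIBRARY)
    # Only continue processing when both NLUs found a keyword.
    return not (first and second)


print(should_forego(["play", "music"]))  # False: both libraries matched
print(should_forego(["skip", "track"]))  # True: second NLU found nothing
```

The "both must agree" rule is one plausible reading of "based on the comparison, foregoes further processing"; the abstract does not pin down the exact comparison.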
Method and apparatus for speech analysis
A method and apparatus for speech analysis are disclosed. The speech analysis apparatus and a server are capable of communicating with each other in a 5G communication environment by executing on-board artificial intelligence (AI) algorithms and/or machine learning algorithms. The speech analysis method and apparatus may collect and analyze speech data to build a database of structured speech data.
SPEECH ENDPOINTING BASED ON WORD COMPARISONS
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
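The counting step above can be sketched directly, with the sample corpus invented: the first value counts text samples that match the transcription and nothing more, the second counts samples that match it and then continue, and the utterance is classified by comparing the two.

```python
# Hypothetical collection of text samples (e.g. past queries).
TEXT_SAMPLES = [
    "what is the weather",
    "what is the weather today",
    "what is the weather in paris",
    "set a timer",
]


def classify(transcription):
    """Classify an utterance as likely incomplete or not."""
    words = transcription.split()
    # First value: samples matching the transcription with no extra terms.
    exact = sum(1 for s in TEXT_SAMPLES if s.split() == words)
    # Second value: samples matching the transcription plus extra terms.
    extended = sum(
        1 for s in TEXT_SAMPLES
        if s.split()[: len(words)] == words and len(s.split()) > len(words)
    )
    return "likely incomplete" if extended > exact else "likely complete"


print(classify("what is the"))       # likely incomplete (0 exact, 3 extended)
print(classify("set a timer"))       # likely complete   (1 exact, 0 extended)
```

With this toy corpus even "what is the weather" classifies as likely incomplete (one exact match against two extensions), which illustrates why the patent compares the two counts rather than merely checking for any exact match.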
GENERATING IOT-BASED NOTIFICATION(S) AND PROVISIONING OF COMMAND(S) TO CAUSE AUTOMATIC RENDERING OF THE IOT-BASED NOTIFICATION(S) BY AUTOMATED ASSISTANT CLIENT(S) OF CLIENT DEVICE(S)
Remote automated assistant component(s) generate client device notification(s) based on a received IoT state change notification that indicates a change in at least one state associated with at least one IoT device. The generated client device notification(s) can each indicate the change in state associated with the at least one IoT device, and can optionally identify the at least one IoT device. Further, the remote automated assistant component(s) can identify candidate assistant client devices that are associated with the at least one IoT device, and determine whether each of the candidate assistant client device(s) should render a corresponding client device notification. The remote automated assistant component(s) can then transmit a corresponding command to each of the assistant client device(s) determined to render a corresponding client device notification, where each transmitted command causes the corresponding assistant client device to render the corresponding client device notification.
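A simplified sketch of the routing decision, with the device registry and predicate invented: a state-change notification is turned into a client notification, and a command is produced only for candidate assistant clients associated with the IoT device that pass the should-render check.

```python
# Hypothetical registry: IoT device -> candidate assistant client devices.
DEVICE_ASSOCIATIONS = {
    "doorbell-1": ["kitchen-display", "living-room-speaker", "phone"],
}


def route_notification(iot_device, new_state, should_render):
    """Build (client, notification) commands for clients that should render."""
    text = f"{iot_device} is now {new_state}"
    commands = []
    for client in DEVICE_ASSOCIATIONS.get(iot_device, []):
        if should_render(client):
            # Transmitting this command causes the client to render
            # the notification.
            commands.append((client, text))
    return commands


# Example policy: suppress rendering on phones (say, do-not-disturb).
cmds = route_notification("doorbell-1", "ringing", lambda c: c != "phone")
print(cmds)
```

The per-client policy is left as a caller-supplied predicate because the abstract only says the components "determine whether" each candidate should render, not how.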
NETWORKED DEVICES, SYSTEMS, & METHODS FOR INTELLIGENTLY DEACTIVATING WAKE-WORD ENGINES
In one aspect, a playback device is configured to identify in an audio stream, via a second wake-word engine, a false wake word for a first wake-word engine that is configured to receive as input sound data based on sound detected by a microphone. The first and second wake-word engines are configured according to different sensitivity levels for false positives of a particular wake word. Based on identifying the false wake word, the playback device is configured to (i) deactivate the first wake-word engine and (ii) cause at least one network microphone device to deactivate a wake-word engine for a particular amount of time. While the first wake-word engine is deactivated, the playback device is configured to cause at least one speaker to output audio based on the audio stream. After a predetermined amount of time has elapsed, the playback device is configured to reactivate the first wake-word engine.
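A toy sketch of the two-engine arrangement, with thresholds and names invented: both engines score the same sound data, but the second uses a stricter sensitivity; when the lenient first engine fires and the strict second one does not, the event is treated as a false wake word and the first engine is deactivated.

```python
class WakeWordEngine:
    def __init__(self, sensitivity):
        self.sensitivity = sensitivity  # higher = fewer false positives
        self.active = True

    def detects(self, score):
        return self.active and score >= self.sensitivity


first = WakeWordEngine(sensitivity=0.4)   # lenient: more false positives
second = WakeWordEngine(sensitivity=0.8)  # strict: spots false wakes


def handle_sound(score):
    """Deactivate the first engine when only the lenient engine fires."""
    if first.detects(score) and not second.detects(score):
        first.active = False
        return "false wake: engine deactivated"
    return "ok"


result = handle_sound(0.5)  # fires the lenient engine only
print(result)
```

A deactivated engine here simply stops detecting; the timed reactivation and the propagation to other network microphone devices described in the abstract are omitted for brevity.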
COMMUNICATION SYSTEM AND EVALUATION METHOD
A communication system is configured to broadcast utterance voice data received from one mobile communication terminal to the other mobile communication terminals, to control text delivery so that the result of voice recognition processing on the received utterance voice data is displayed on the mobile communication terminals in synchronization, and to use that recognition result to evaluate communication. The evaluation includes a first evaluation that assesses a dialogue between users based on a group dialogue index to produce group communication evaluation information, a second evaluation that assesses the utterances constituting the dialogue based on a personal utterance index to produce personal utterance evaluation information, and a third evaluation that uses the group communication evaluation information and the personal utterance evaluation information to produce overall communication group evaluation information.
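A compact sketch of the three evaluations, with every metric invented for illustration: a group dialogue index computed over all utterances, a personal utterance index per speaker, and an overall score combining the two.

```python
# Toy recognized dialogue: (speaker, recognized text) pairs.
utterances = [
    ("alice", "shall we start?"),
    ("bob", "yes"),
    ("alice", "agenda item one"),
]


def group_evaluation(utts):
    """Invented group dialogue index: turns per participant."""
    speakers = {speaker for speaker, _ in utts}
    return len(utts) / len(speakers)


def personal_evaluation(utts):
    """Invented personal utterance index: words spoken per speaker."""
    words = {}
    for speaker, text in utts:
        words[speaker] = words.get(speaker, 0) + len(text.split())
    return words


def overall_evaluation(utts):
    """Third evaluation: combine group and personal information."""
    personal = personal_evaluation(utts)
    group = group_evaluation(utts)
    # Penalize groups whose quietest member barely speaks.
    return {"group": group, "personal": personal,
            "overall": group * min(personal.values())}


print(overall_evaluation(utterances))
```

The actual indices in the patent are unspecified; the point of the sketch is only the layering of group, personal, and combined evaluations.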
INFORMATION OUTPUT APPARATUS, INFORMATION OUTPUT METHOD, AND NON-TRANSITORY RECORDING MEDIUM
An information output apparatus capable of providing notifications based on input voice content is realized. The information output apparatus according to the present embodiment includes an input unit to which data indicating a sound are input from outside; a voice extraction unit that analyzes the data input from the input unit and extracts data indicating a voice emitted by a person; a vibration generation unit that generates vibration data, associated in advance with preset data indicating a sound, based on a comparison between the voice data extracted by the voice extraction unit and the preset data; and a vibrator that vibrates based on the vibration data generated by the vibration generation unit.
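A minimal sketch of the comparison step, with the preset table invented: extracted voice data are compared against preset sound entries, and a match yields the vibration data that drive the vibrator.

```python
# Hypothetical preset table: sound keyword -> vibration pattern (ms on/off).
PRESET_VIBRATIONS = {
    "doorbell": [200, 100, 200],
    "alarm": [500, 500, 500],
}


def generate_vibration(extracted_voice):
    """Vibration generation unit: compare extracted voice with presets."""
    for preset, pattern in PRESET_VIBRATIONS.items():
        if preset in extracted_voice:
            return pattern  # vibration data passed to the vibrator
    return []               # no match: no vibration


pattern = generate_vibration("someone said the doorbell rang")
print(pattern)  # [200, 100, 200]
```

Substring matching over recognized text stands in for the apparatus's comparison of sound data; the real device compares audio-derived data, not strings.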