G10L2025/783

SYSTEM AND METHOD TO VISUALIZE CONNECTED LANGUAGE
20170309200 · 2017-10-26 ·

Systems and methods for visualizing connected speech are disclosed. The systems and methods include receiving reading content as vocalized speech; analyzing the vocalizations to determine the nature and duration of the vocalized breath strings of a text when read aloud; and generating, as the visual content, highlighting for the beginning and end points based on the nature and duration of the vocalized breath strings.

SPEECH THERAPEUTIC DEVICES AND METHODS

Disclosed herein are speech therapeutic devices and methods. In one aspect, the speech therapeutic device includes audio input circuitry, signal processing circuitry, and stimulus circuitry. In certain embodiments, the audio input circuitry is configured to provide an input signal that is indicative of speech provided by a user and the signal processing circuitry is configured to utilize a reconfigurable rule that includes a condition, receive the input signal, process the input signal using the reconfigurable rule, and provide an alert signal responsive to attainment of the condition. The stimulus circuitry is configured to receive the alert signal and provide a stimulus to the user. The signal processing circuitry is additionally configured to (i) receive the reconfigurable rule from a communication network, and/or (ii) generate a record indicative of the alert signal, store the record in a memory, and send the record to a communication network.
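
The rule-driven processing described above can be sketched as follows. This is a minimal illustration only: the abstract does not say what the condition is, so the `Rule` fields (a level threshold in dB and a frame count), the function names, and the "level too low for too long" condition are all assumptions made for the example.

```python
from dataclasses import dataclass
import math


@dataclass
class Rule:
    """Hypothetical reconfigurable rule: the condition is attained when the
    speech level stays below min_level_db for min_frames consecutive frames."""
    min_level_db: float
    min_frames: int


def frame_level_db(frame):
    """RMS level of one frame of samples, in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12))


def process(frames, rule):
    """Return the frame indices at which an alert signal would be provided."""
    alerts = []
    run = 0  # number of consecutive frames below the level threshold
    for i, frame in enumerate(frames):
        if frame_level_db(frame) < rule.min_level_db:
            run += 1
        else:
            run = 0
        if run == rule.min_frames:  # alert once per low-level run
            alerts.append(i)
    return alerts
```

Swapping in a different `Rule` instance (for example, one received over a network) changes the behavior without touching `process`, which is the sense in which the rule is reconfigurable in this sketch.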

REDUCING BANDWIDTH VIA VOICE DETECTION
20170301203 · 2017-10-19 ·

A method for an automation system is described. In one embodiment, the method includes monitoring for detection of sound via a microphone on a security camera. The security camera is configured to generate an audio stream and a video stream and to transmit the audio and video streams via a transmitter associated with the security camera. Upon detecting sound via the microphone, the method includes determining whether the sound includes a human voice and, upon determining the sound includes the human voice, modifying at least one aspect of the audio or video streams of the security camera.
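
A rough sketch of the decision flow above, under stated assumptions: the abstract only says that "at least one aspect" of the streams is modified, so the choice to switch a video bitrate, the energy-based sound detector, the `settings` keys, and the `is_human_voice` callable are all hypothetical.

```python
def is_sound(frame, energy_threshold=0.01):
    """Assumed sound detector: mean energy of the frame exceeds a threshold."""
    return sum(s * s for s in frame) / len(frame) > energy_threshold


def update_stream(frame, settings, is_human_voice):
    """Return a copy of settings with the video bitrate chosen for this frame.

    is_human_voice is a caller-supplied classifier (hypothetical); it is only
    consulted after sound has been detected, mirroring the two-step check.
    """
    new_settings = dict(settings)
    if is_sound(frame) and is_human_voice(frame):
        new_settings["video_kbps"] = settings["high_kbps"]
    else:
        new_settings["video_kbps"] = settings["low_kbps"]
    return new_settings
```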

Apparatuses and methods for multi-channel signal compression during desired voice activity detection
09792927 · 2017-10-17 ·

Apparatuses and methods are described to identify desired audio. A first input of an apparatus is configured to receive a main signal. A second input of the apparatus is configured to receive a reference signal. A normalizer is configured to normalize a compressed main signal by a compressed reference signal to create a normalized main signal. A single channel normalized voice threshold comparator is configured to receive as an input the normalized main signal and to output a desired voice activity detection signal.
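
The compress, normalize, and compare chain above might look like the following sketch. The abstract does not name the compression function or the threshold, so square-root compression, per-frame energy inputs, the `eps` guard, and the default threshold are assumptions chosen only to make the example concrete.

```python
import math


def compress(x):
    """Assumed compression: square root of a non-negative energy value."""
    return math.sqrt(max(x, 0.0))


def desired_vad(main_energy, ref_energy, threshold=1.5, eps=1e-12):
    """Return one desired-voice flag per frame.

    Each compressed main value is normalized by the compressed reference
    value, and the single normalized channel is compared to a threshold.
    """
    flags = []
    for m, r in zip(main_energy, ref_energy):
        normalized = compress(m) / (compress(r) + eps)  # eps avoids divide-by-zero
        flags.append(normalized > threshold)
    return flags
```

The intuition in this sketch is that desired speech is stronger on the main input than on the reference input, so the normalized value rises above the threshold only when the main channel dominates.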

Privacy device for smart speakers
11503418 · 2022-11-15 ·

Systems, apparatuses, and methods are described for a privacy blocking device configured to prevent receipt, by a listening device, of video and/or audio data until a trigger occurs. A blocker may be configured to prevent receipt of video and/or audio data by one or more microphones and/or one or more cameras of a listening device. The blocker may use the one or more microphones, the one or more cameras, and/or one or more second microphones and/or one or more second cameras to monitor for a trigger. The blocker may process the data. Upon detecting the trigger, the blocker may transmit data to the listening device. For example, the blocker may transmit all or a part of a spoken phrase to the listening device.

Frame Loss Compensation Processing Method and Apparatus
20170287493 · 2017-10-05 ·

A frame loss compensation processing method and apparatus are presented, where the method includes, when an i-th frame is a lost frame, estimating a spectrum frequency parameter, a pitch period, and a gain of the i-th frame according to at least one of an inter-frame relationship between first N frames of the i-th frame or an intra-frame relationship between first N frames of the i-th frame. A parameter of the i-th frame is determined using the signal correlation between the first N frames, the signal energy stability between the first N frames, intra-frame signal correlation of each frame, and intra-frame signal energy stability of each frame.

Artificial intelligence voice interaction method, computer program product, and near-end electronic device thereof

An artificial intelligence voice interaction method and a near-end electronic device thereof are disclosed. The method includes the following steps: receiving a voice input by a user; transmitting the voice to a remote artificial intelligence server; determining whether the voice has ended; when it is determined that the voice has ended and no stop recording signal has been received from the remote artificial intelligence server, stopping transmission of the voice to the remote artificial intelligence server; when the stop recording signal is received from the remote artificial intelligence server before it is determined that the voice has ended, stopping transmission of the voice to the remote artificial intelligence server; and receiving a response signal sent back from the remote artificial intelligence server.
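
The two stop conditions in the method above (local end-of-voice detection, or a stop recording signal from the server, whichever comes first) can be illustrated with a small simulation. Everything here is hypothetical: the function name, the `is_voice_end` callable, and the `server_stop_after` parameter, which stands in for a real network signal.

```python
def stream_voice(chunks, is_voice_end, server_stop_after=None):
    """Simulate near-end transmission and return (sent_chunks, stop_reason).

    is_voice_end: local end-of-voice detector applied to each chunk.
    server_stop_after: the simulated server sends its stop recording signal
        once this many chunks have been transmitted (None means never).
    """
    sent = []
    for chunk in chunks:
        # Case 1: the server's stop signal arrives before the local detector fires.
        if server_stop_after is not None and len(sent) >= server_stop_after:
            return sent, "server_stop"
        # Case 2: the local detector decides the voice has ended.
        if is_voice_end(chunk):
            return sent, "local_end"
        sent.append(chunk)  # "transmit" the chunk
    return sent, "input_exhausted"
```

In both cases transmission stops; the only difference is which side made the decision, which is what the returned reason records.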

Speech recognition wake-up of a handheld portable electronic device
09734830 · 2017-08-15 ·

A system and method are described for parallel speech recognition processing of multiple audio signals produced by multiple microphones in a handheld portable electronic device. In one embodiment, a primary processor transitions to a power-saving mode while an auxiliary processor remains active. The auxiliary processor then monitors the speech of a user of the device to detect a wake-up command by performing speech recognition processing on the audio signals in parallel. When the auxiliary processor detects the command, it signals the primary processor to transition to active mode. The auxiliary processor may also identify to the primary processor which microphone resulted in the command being recognized with the highest confidence. Other embodiments are also described.
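
A toy sketch of the parallel recognition and best-microphone selection described above. No real recognizer is used: each microphone's signal is stood in for by a text string, and the confidence is a string-similarity ratio from Python's `difflib`. The wake word, the threshold, and all function names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def confidence(signal, wake_word):
    """Stand-in recognizer: similarity between a 'signal' string and the wake word."""
    return SequenceMatcher(None, signal, wake_word).ratio()


def detect_wake_up(signals, wake_word="hello", threshold=0.9):
    """Score every microphone signal in parallel.

    Returns (microphone_index, confidence) for the best-scoring microphone if
    it reaches the threshold, otherwise None. signals must be non-empty.
    """
    with ThreadPoolExecutor(max_workers=len(signals)) as pool:
        # map() preserves input order, so scores[i] belongs to microphone i.
        scores = list(pool.map(lambda s: confidence(s, wake_word), signals))
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return best, scores[best]
    return None
```

A non-None result corresponds to the point at which the auxiliary processor would signal the primary processor, passing along the index of the microphone with the highest confidence.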

METHOD AND APPARATUS FOR PROCESSING LIVE STREAM AUDIO, AND ELECTRONIC DEVICE AND STORAGE MEDIUM
20220270638 · 2022-08-25 ·

A method for processing live stream audio, an electronic device, and a storage medium are provided. The method is applied to a live streamer end, and includes: acquiring a first audio signal formed by mixing a guest audio signal with a background audio signal of the live streamer end; obtaining a second audio signal by performing echo cancellation on the guest audio signal in the first audio signal according to the guest audio signal; detecting a voice activity state of a guest end according to the guest audio signal, the first audio signal and the second audio signal; obtaining a third audio signal by performing echo cancellation on the first audio signal in a mixed audio signal according to the voice activity state and the first audio signal; and synthesizing and pushing the second audio signal and the third audio signal to the guest end.

UTTERANCE SECTION DETECTION DEVICE, UTTERANCE SECTION DETECTION METHOD, AND PROGRAM

An utterance section detection device is provided that is capable of detecting an utterance section with high accuracy on the basis of whether or not an end of a speech section is an end of utterance. The utterance section detection device includes: a speech/non-speech determination unit configured to determine whether a given frame of an acoustic signal is speech or non-speech; an utterance end determination unit configured to determine, for each speech section (a section determined to be speech by the speech/non-speech determination), whether or not the end of the speech section is an end of utterance; a non-speech section duration threshold determination unit configured to determine a threshold on the duration of a non-speech section on the basis of the result of the utterance end determination; and an utterance section detection unit configured to detect an utterance section by comparing the duration of the non-speech section following the speech section with the corresponding threshold.
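
The four units above can be sketched as a small pipeline over per-frame speech flags. This is an illustration under assumptions: the `is_utterance_end` callable stands in for the utterance end determination unit, and the two gap thresholds (`short_gap`, `long_gap`, in frames) are invented values; the abstract only says that the threshold depends on the utterance end determination.

```python
def speech_sections(flags):
    """Group per-frame speech flags into (start, end) sections, end exclusive."""
    sections, start = [], None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(flags)))
    return sections


def utterance_sections(flags, is_utterance_end, short_gap=2, long_gap=6):
    """Merge speech sections into utterance sections.

    A speech section judged to end an utterance gets the short threshold;
    otherwise the long one. The utterance closes when the following
    non-speech gap is at least the chosen threshold.
    """
    sections = speech_sections(flags)
    utterances, current_start = [], None
    for k, (start, end) in enumerate(sections):
        if current_start is None:
            current_start = start
        threshold = short_gap if is_utterance_end(start, end) else long_gap
        is_last = k + 1 == len(sections)
        gap = None if is_last else sections[k + 1][0] - end
        if is_last or gap >= threshold:
            utterances.append((current_start, end))
            current_start = None
    return utterances
```

With the same flags, a detector that always reports an utterance end splits at every short pause, while one that never does bridges pauses shorter than `long_gap`, which is the behavior the adaptive threshold is meant to provide.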