G10L2025/783

Smart device input method based on facial vibration
11662610 · 2023-05-30

A smart device input method based on facial vibration includes: collecting a facial vibration signal generated when a user performs voice input; extracting a Mel-frequency cepstral coefficient from the facial vibration signal; and taking the Mel-frequency cepstral coefficient as an observation sequence to obtain text input corresponding to the facial vibration signal by using a trained hidden Markov model. The facial vibration signal is collected by a vibration sensor arranged on glasses. The vibration signal is processed by: amplifying the collected facial vibration signal; transmitting the amplified facial vibration signal to the smart device via a wireless module; and intercepting a section from the received facial vibration signal as an effective portion and extracting the Mel-frequency cepstral coefficient from the effective portion by the smart device.
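The "effective portion" step above, which trims the received vibration signal to its high-energy section before MFCC extraction, can be sketched as a per-frame energy threshold. This is a minimal illustration; the frame length, threshold value, and function name are assumptions, not taken from the patent:

```python
def effective_portion(signal, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the contiguous high-energy
    region of `signal`, found by thresholding per-frame mean energy."""
    energies = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[i:i + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None  # no frame exceeded the threshold
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

In the patent's pipeline, MFCCs would then be extracted from this slice and fed to the trained hidden Markov model as the observation sequence.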

SYSTEMS AND METHODS FOR IMPROVED ACCURACY OF BULLYING OR ALTERCATION DETECTION OR IDENTIFICATION OF EXCESSIVE MACHINE NOISE
20230162756 · 2023-05-25

Systems and methods for identifying potential bullying are disclosed. In various aspects, a system for identifying potential bullying includes a sound detector configured to provide samples of sounds over time, a processor, and a memory storing instructions. The instructions, when executed by the processor, cause the system to determine that a noise event has occurred by processing the samples to determine that the sounds exceed a sound level threshold over a time period that exceeds a time period threshold, process the samples to provide frequency spectrum information of the noise event, determine whether the noise event is a potential bullying occurrence based on comparing the frequency spectrum information of the noise event and at least one frequency spectrum profile, and initiate a bullying notification in a case of determining that the noise event is a potential bullying occurrence.
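The noise-event condition described above, sound level above a threshold for longer than a time-period threshold, can be sketched as a run-length check over sampled levels. The names and units below are illustrative assumptions:

```python
def is_noise_event(levels, level_threshold, min_samples):
    """True if `levels` stays above `level_threshold` for at least
    `min_samples` consecutive samples (i.e., exceeds the time threshold)."""
    run = 0
    for level in levels:
        run = run + 1 if level > level_threshold else 0
        if run >= min_samples:
            return True
    return False
```

A full system would then compare the event's frequency spectrum against stored profiles (e.g., shouting versus machine noise) before initiating a notification.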

VOICE TRANSMISSION COMPENSATION APPARATUS, VOICE TRANSMISSION COMPENSATION METHOD AND PROGRAM

A speech transmission compensation apparatus that assists a user in discriminating heard speech includes one or more computers, each with a memory and a processor configured to: accept input of a speech signal; detect a specific type of sound in the speech signal; analyze an acoustic characteristic of that sound and output the acoustic characteristic; accept the output acoustic characteristic, generate a vibration signal of a duration corresponding to the acoustic characteristic, and output the vibration signal; and accept the output vibration signal and provide the user with vibration for that duration on the basis of the vibration signal.
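A minimal sketch of the vibration-generation step, mapping a detected sound's measured characteristics to a vibration signal of matching duration. The `duration_ms` and `level` field names and the sample rate are assumptions for illustration, not the patent's interface:

```python
def vibration_signal(acoustic, rate_hz=200):
    """Generate constant-amplitude vibration samples whose duration
    matches the analyzed sound; amplitude follows its measured level."""
    n = max(1, acoustic["duration_ms"] * rate_hz // 1000)
    amp = min(1.0, max(0.0, acoustic["level"]))
    return [amp] * n
```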

Acoustic voice activity detection (AVAD) for electronic systems

Acoustic Voice Activity Detection (AVAD) methods and systems are described. The AVAD methods and systems, including corresponding algorithms or programs, use microphones to generate virtual directional microphones which have very similar noise responses and very dissimilar speech responses. The ratio of the energies of the virtual microphones is then calculated over a given window size and the ratio can then be used with a variety of methods to generate a VAD signal. The virtual microphones can be constructed using either an adaptive or a fixed filter.
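The core AVAD decision, the ratio of the two virtual microphones' energies computed over a window and compared with a threshold, can be sketched as follows. The window size and threshold are illustrative, and the virtual-microphone construction itself (adaptive or fixed filtering) is omitted:

```python
def vad_from_energy_ratio(v1, v2, window=160, threshold=10.0):
    """Per-window VAD flags: speech is declared when the energy of
    virtual mic v1 exceeds that of v2 by more than `threshold` times."""
    flags = []
    for i in range(0, min(len(v1), len(v2)) - window + 1, window):
        e1 = sum(x * x for x in v1[i:i + window])
        e2 = sum(x * x for x in v2[i:i + window]) + 1e-12  # avoid divide-by-zero
        flags.append(e1 / e2 > threshold)
    return flags
```

Because the virtual microphones have very similar noise responses but very dissimilar speech responses, the ratio stays near 1 in noise and rises sharply during speech.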

VOICE INTERACTIVE WAKEUP ELECTRONIC DEVICE AND METHOD BASED ON MICROPHONE SIGNAL, AND MEDIUM
20220319538 · 2022-10-06

An electronic device configured with a microphone, a voice interaction wake-up method executed by such a device, and a computer-readable medium are provided. The electronic device comprises a memory and a central processing unit, wherein the memory stores computer-executable instructions that, when executed by the central processing unit, perform the following operations: analyzing a sound signal collected by the microphone; identifying whether the sound signal contains speech spoken by a person and whether it contains wind noise generated by airflow from that speech hitting the microphone; and, in response to determining that the sound signal contains both, processing the sound signal as speech input by the user. The disclosed solution is applicable to voice input while a user carries an intelligent electronic device; the operation is natural and simple, simplifying the steps of voice input, reducing the burden and difficulty of interaction, and making the interaction more natural.
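The wind-noise check, breath airflow hitting the microphone concentrates energy at low frequencies, can be approximated by the fraction of spectral energy below a cutoff. This is a naive-DFT sketch; the cutoff frequency and the two-condition decision rule are assumptions, not the patent's method:

```python
import cmath

def low_band_fraction(frame, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or below `cutoff_hz` (naive DFT)."""
    n = len(frame)
    spectrum = [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))) ** 2
                for k in range(n // 2 + 1)]
    total = sum(spectrum) or 1.0
    k_cut = int(cutoff_hz * n / sample_rate)
    return sum(spectrum[:k_cut + 1]) / total

def accept_as_voice_input(has_speech, frame, sample_rate=8000,
                          cutoff_hz=200, wind_fraction=0.6):
    """Accept only when speech is detected AND wind-like low-frequency
    energy dominates, per the two-condition rule above."""
    return has_speech and low_band_fraction(frame, sample_rate, cutoff_hz) > wind_fraction
```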

Multithreaded Speech Data Preprocessing

An apparatus includes a processor to: receive, from a requesting device, a request to perform speech-to-text conversion of a speech data set; within a first thread of a thread pool, perform a first pause detection technique to identify a first set of likely sentence pauses; within a second thread of the thread pool, perform a second pause detection technique to identify a second set of likely sentence pauses; perform a speaker diarization technique to identify a set of likely speaker changes; divide the speech data set into data segments representing speech segments based on a combination of at least the first set of likely sentence pauses, the second set of likely sentence pauses, and the set of likely speaker changes; use at least an acoustic model with each data segment to identify likely speech sounds; and generate a transcript based, at least in part, on the identified likely speech sounds.
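The dividing step, merging the two pause sets and the speaker-change set into segment boundaries, can be sketched as follows. The cut points are assumed to be sample offsets; the pause-detection techniques themselves are not shown:

```python
def split_into_segments(pauses_a, pauses_b, speaker_changes, total_len):
    """Combine candidate cut points from both pause detectors and the
    speaker-diarization step, then emit (start, end) segment spans."""
    cuts = sorted({c for c in set(pauses_a) | set(pauses_b) | set(speaker_changes)
                   if 0 < c < total_len})
    bounds = [0] + cuts + [total_len]
    return list(zip(bounds[:-1], bounds[1:]))
```

Each resulting span would then be passed to the acoustic model independently, which is what makes the thread-pool parallelism of the earlier steps worthwhile.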

Event detection for playback management in an audio device
11621017 · 2023-04-04

In accordance with embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for communication to at least one transducer of the audio device, receiving at least one input signal indicative of ambient sound external to the audio device, detecting from the at least one input signal a near-field sound in the ambient sound, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detection of the near-field sound.
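A minimal sketch of the final step: modifying (here, ducking) the reproduced audio in frames where a near-field sound, such as nearby speech, was detected in the ambient-sound input. The gain value is an illustrative assumption; the patent covers modifying a characteristic generally, not this specific policy:

```python
def duck_on_near_field(playback_frames, near_field_flags, duck_gain=0.2):
    """Attenuate playback in frames where a near-field sound was
    detected in the ambient-sound input signal."""
    return [frame * (duck_gain if near else 1.0)
            for frame, near in zip(playback_frames, near_field_flags)]
```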

User presence detection

A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect user presence. Audio frames are individually scored as to whether human presence is detected in each particular frame. The scores are then smoothed relative to nearby frames to produce a decision for each frame. Presence information can then be sent on a periodic schedule to a remote device, creating a presence “heartbeat” that regularly indicates whether a user is detected proximate to the speech-capture device.
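The per-frame scoring and neighborhood smoothing described above can be sketched as a moving average followed by a threshold. The window radius and threshold values are illustrative assumptions:

```python
def smooth_scores(scores, radius=1):
    """Average each frame's score with its neighbors within `radius`."""
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - radius), min(len(scores), i + radius + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed

def presence_decisions(scores, threshold=0.5, radius=1):
    """Per-frame presence decisions after neighborhood smoothing."""
    return [s >= threshold for s in smooth_scores(scores, radius)]
```

Smoothing suppresses isolated spurious frames: a single high score surrounded by low ones is pulled below the threshold, while a sustained run of high scores survives.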

IDENTIFYING METHOD OF SOUND WATERMARK AND SOUND WATERMARK IDENTIFYING APPARATUS
20230142323 · 2023-05-11

An identifying method of a sound watermark and a sound watermark identifying apparatus are provided. The method includes the following. A synthesized sound signal is received through a network. Noise interference transferred through the network in the synthesized sound signal is determined according to a reflection-cancelling sound signal. A coding threshold is determined according to the noise interference. A sound watermark signal in the synthesized sound signal is identified according to the coding threshold.
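The threshold step, deriving a coding threshold from the estimated noise interference and using it to recover the watermark, can be sketched as follows. The margin and the per-symbol energy representation are assumptions for illustration:

```python
def decode_watermark_bits(symbol_energies, noise_level, margin=0.1):
    """Set the coding threshold above the measured noise interference,
    then decide each watermark bit by comparing its symbol energy."""
    threshold = noise_level + margin
    return [1 if e > threshold else 0 for e in symbol_energies]
```

Raising the threshold with the noise estimate keeps network-introduced interference from being misread as watermark bits.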

Adapting automated speech recognition parameters based on hotword properties
11620990 · 2023-04-04

A method for optimizing speech recognition includes receiving a first acoustic segment characterizing a hotword detected by a hotword detector in streaming audio captured by a user device, extracting one or more hotword attributes from the first acoustic segment, and adjusting, based on the one or more hotword attributes extracted from the first acoustic segment, one or more speech recognition parameters of an automated speech recognition (ASR) model. After adjusting the speech recognition parameters of the ASR model, the method also includes processing, using the ASR model, a second acoustic segment to generate a speech recognition result. The second acoustic segment characterizes a spoken query/command that follows the first acoustic segment in the streaming audio captured by the user device.
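A sketch of the adjustment step, in which attributes measured on the hotword segment tune ASR decoding parameters before the following query is processed. The attribute names, default values, and adjustment rules below are hypothetical, not taken from the patent:

```python
def adjust_asr_parameters(hotword_attrs, asr_params):
    """Return a copy of `asr_params` tuned by attributes measured on
    the hotword segment (e.g. loudness, speaking rate)."""
    params = dict(asr_params)
    if hotword_attrs.get("loudness", 1.0) < 0.2:       # quiet speaker: search wider
        params["beam_width"] = params.get("beam_width", 8) * 2
    if hotword_attrs.get("speaking_rate", 1.0) > 1.5:  # fast speech: endpoint sooner
        params["endpoint_ms"] = int(params.get("endpoint_ms", 800) * 0.75)
    return params
```

The intuition is that how the user said the hotword predicts how they will say the query, so the recognizer adapts before the second acoustic segment arrives.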