G10L21/00

Voice command processing method and electronic device utilizing the same
09836276 · 2017-12-05

A voice command processing method provides a unified voice control interface to access and control Internet of Things (IoT) devices and to configure the values of attributes of graphical user interface (GUI) elements, attributes of applications, and attributes of the IoT devices. When a voice command comprises an expression of a percentage or a fraction of a baseline value of an attribute, or an exact value of an attribute of an IoT device, the unified voice control interface sets the attribute of the IoT device according to the percentage, the fraction, or the exact value in the voice command.
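The abstract's three value forms (percentage, fraction, exact value) can be illustrated with a minimal sketch. The function name and the regular expressions are hypothetical, not taken from the patent, which does not specify how the expressions are parsed:

```python
import re

def resolve_attribute_value(command: str, baseline: float) -> float:
    """Resolve a spoken value expression against an attribute's baseline.

    Handles the three forms named in the abstract: a percentage
    ("50 percent"), a fraction (here written "1/4"), and an exact value.
    """
    m = re.search(r"(\d+(?:\.\d+)?)\s*(?:percent|%)", command)
    if m:  # percentage of the baseline value
        return baseline * float(m.group(1)) / 100.0
    m = re.search(r"(\d+)\s*/\s*(\d+)", command)
    if m:  # fraction of the baseline value
        return baseline * int(m.group(1)) / int(m.group(2))
    m = re.search(r"(\d+(?:\.\d+)?)", command)
    if m:  # exact value of the attribute
        return float(m.group(1))
    raise ValueError("no value expression found in command")
```

With a brightness baseline of 200, "set brightness to 50 percent" resolves to 100, "dim to 1/4" to 50, and "set to 80" to the exact value 80.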

Method, apparatus and systems for audio decoding and encoding

An audio processing system (100) accepts an audio bitstream having one of a plurality of predefined audio frame rates. The system comprises a front-end component (110), which receives a variable number of quantized spectral components, corresponding to one audio frame in any of the predefined audio frame rates, and performs an inverse quantization according to predetermined, frequency-dependent quantization levels. The front-end component may be agnostic of the audio frame rate. The audio processing system further comprises a frequency-domain processing stage (120) and a sample rate converter (130), which provide a reconstructed audio signal sampled at a target sampling frequency independent of the audio frame rate. By its frame-rate adaptability, the system can be configured to operate frame-synchronously in parallel with a video processing system that accepts plural video frame rates.

Time domain level adjustment for audio signal decoding or encoding

An audio signal decoder for providing a decoded audio signal representation on the basis of an encoded audio signal representation has a decoder preprocessing stage for obtaining a plurality of frequency band signals from the encoded audio signal representation, a clipping estimator, a level shifter, a frequency-to-time-domain converter, and a level shift compensator. The clipping estimator analyzes the encoded audio signal representation and/or side information relative to a gain of the frequency band signals in order to determine a current level shift factor. The level shifter shifts levels of the frequency band signals according to the level shift factor. The frequency-to-time-domain converter converts the level shifted frequency band signals into a time-domain representation. The level shift compensator acts on the time-domain representation for at least partly compensating a corresponding level shift and for obtaining a substantially compensated time-domain representation.
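The shift-then-compensate pipeline above can be sketched in miniature. This is a toy, assumed implementation: the "frequency-to-time-domain converter" is stood in for by a sample-wise sum across bands rather than a real filter bank, and the clipping-estimation rule (worst-case amplitude from the band gains) is an illustrative guess:

```python
import math

def decode_frame(band_signals, band_gains, int_limit=1.0):
    """Toy decode path: estimate from the side-info gains how far the
    synthesis output could exceed the numeric range, shift all band
    signals down before the frequency-to-time conversion, then undo
    the shift on the time-domain samples."""
    # clipping estimator: worst-case output amplitude from the gains
    worst = sum(abs(g) for g in band_gains)
    shift = max(0, math.ceil(math.log2(worst / int_limit))) if worst > int_limit else 0
    scale = 2.0 ** shift
    # level shifter: scale each frequency band signal down
    shifted = [[s / scale for s in band] for band in band_signals]
    # frequency-to-time conversion (toy stand-in: sum across bands)
    time_samples = [sum(col) for col in zip(*shifted)]
    # level shift compensator: restore the level in the time domain
    return [t * scale for t in time_samples]
```

Because the shift is a power of two applied before the transform and removed after, the intermediate arithmetic stays within range while the output is (up to rounding) unchanged.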

Adaptive noise cancellation

Systems and methods for controlling adaptivity of noise cancellation are presented. One or more audio signals are received by one or more corresponding microphones. The one or more signals may be decomposed into frequency sub-bands. Noise cancellation consistent with identified adaptation constraints is performed on the one or more audio signals. The one or more audio signals may then be reconstructed from the frequency sub-bands and outputted via an output device.
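A per-sub-band suppression gain with an adaptation constraint, as described above, might look like the following sketch. The Wiener-style gain rule and the floor acting as the "adaptation constraint" are assumptions; the abstract does not specify the suppression rule:

```python
def suppress_noise(subband_powers, noise_powers, max_suppression=0.9):
    """Compute a suppression gain per frequency sub-band.

    Applies a Wiener-style gain snr / (1 + snr) to each band, floored so
    that no band is attenuated by more than `max_suppression` -- the
    adaptation constraint in this sketch.
    """
    floor = 1.0 - max_suppression
    gains = []
    for sig, noise in zip(subband_powers, noise_powers):
        snr = sig / noise if noise > 0 else float("inf")
        wiener = 1.0 if snr == float("inf") else snr / (1.0 + snr)
        gains.append(max(wiener, floor))
    return gains
```

High-SNR bands pass nearly unchanged, while noise-dominated bands are attenuated only down to the constrained floor, which limits artifacts when the noise estimate is wrong.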

Media content playing scheme
09830933 · 2017-11-28

A system may include a server configured to detect speech data from media content and to divide the detected speech data into one or more speech data segments in accordance with at least a respective speaker and a break in the detected speech data; and a media content playing device configured to receive the speech data segments from the server, to receive, from an input device, a control signal to play the media content, and to skip forward or rewind so that the media content plays starting at an identified starting point corresponding to a first one of the respective speech data segments.
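Given the segment boundaries produced by the server, the playing device's skip behavior reduces to snapping the playback position to a segment start. A minimal sketch (function name and the tie-breaking details are assumptions):

```python
from bisect import bisect_right

def skip_target(segment_starts, position, direction):
    """Return the playback start point after a skip command.

    segment_starts: sorted start times (seconds) of the speech data
    segments, as produced by the server's speaker/break segmentation.
    position: current playback position in seconds.
    direction: "forward" or "rewind".
    """
    i = bisect_right(segment_starts, position)
    if direction == "forward":
        # jump to the start of the next segment, if any
        return segment_starts[i] if i < len(segment_starts) else position
    # rewind: jump to the start of the previous segment
    return segment_starts[max(i - 2, 0)]
```

With segments starting at 0, 10, 20, and 30 seconds and playback at 25 s, skipping forward jumps to 30 s and rewinding jumps to 10 s, the start of the segment before the current one.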

Matching output volume to a command volume

A speech recognition system that automatically sets the volume of output audio based on a sound intensity of a command spoken by a user to adjust the output volume. The system can compensate for variation in the intensity of the captured speech command based on the distance between the speaker and the audio capture device, the pitch of the spoken command and the acoustic profile of the system, and the relative intensity of ambient noise.
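The compensation idea can be sketched for the distance and ambient-noise terms. The spherical-spreading correction and the minimum offset above ambient are illustrative assumptions, not the patent's formulas, and the pitch/acoustic-profile term is omitted for brevity:

```python
import math

def output_volume_db(command_db, distance_m, ambient_db,
                     ref_distance_m=1.0, ambient_offset_db=10.0):
    """Estimate an output volume from a spoken command's intensity.

    Compensates the captured level for spreading loss over the
    speaker-to-microphone distance (+20*log10(d / d_ref)) and keeps
    the output at least `ambient_offset_db` above the ambient noise.
    """
    spreading_loss = 20.0 * math.log10(max(distance_m, ref_distance_m) / ref_distance_m)
    estimated_source_db = command_db + spreading_loss
    return max(estimated_source_db, ambient_db + ambient_offset_db)
```

A quietly spoken command captured from far away is thus boosted back toward the level the speaker actually produced, and a command spoken in a noisy room is raised enough to stay audible over the noise.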

Method and apparatus for speech recognition using device usage pattern of user

A method and apparatus for improving the performance of voice recognition in a mobile device are provided. The method of recognizing a voice includes: monitoring the usage pattern of a user inputting speech on a device; selecting predetermined words from among words stored in the device based on the result of the monitoring, and storing the selected words; and recognizing speech based on an acoustic model and the selected words. In this way, speech can be recognized using a prediction of whom the user most frequently calls. Also, by automatically modeling the device usage pattern of the user and applying that pattern, with probabilities, to the vocabulary for voice recognition, the performance of voice recognition as actually perceived by the user can be enhanced.
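The usage-pattern weighting can be sketched as counting call frequency and deriving prior probabilities for the most-called contact names. The function name and the frequency-proportional weighting are assumptions; the patent does not specify the probability model:

```python
from collections import Counter

def build_recognition_vocab(call_log, top_n=3):
    """Select the contact names the user calls most often and assign
    each a probability proportional to its share of all calls, to bias
    the recognizer's vocabulary toward likely callees."""
    counts = Counter(call_log)
    total = sum(counts.values())
    return {name: counts[name] / total
            for name, _ in counts.most_common(top_n)}
```

For a call log of ["mom", "mom", "bob", "mom", "ann", "bob"] with top_n=2, the vocabulary keeps "mom" (probability 0.5) and "bob" (probability 1/3) and drops the rarely called "ann".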

Time domain spectral bandwidth replication

A wireless audio system for encoding and decoding an audio signal using spectral bandwidth replication is provided. Bandwidth extension is performed in the time-domain, enabling low-latency audio coding.

Arbitrating between multiple potentially-responsive electronic devices
11670293 · 2023-06-06

Techniques described herein are directed to arbitrating between multiple potentially-responsive, automated-assistant-capable electronic devices to determine which should respond to a user's utterance and which should defer to the other electronic device(s). In various implementations, a spoken utterance provided by a user may be detected at a microphone of a first electronic device. Sound(s) emitted by additional electronic device(s) may also be detected at the microphone. Each of the sound(s) may encode a timestamp corresponding to detection of the spoken utterance at a respective electronic device. Timestamp(s) may be extracted from the sound(s) and compared to a local timestamp corresponding to detection of the spoken utterance at the first electronic device. Based on the comparison, the first electronic device may either invoke an automated assistant locally or defer to one of the additional electronic devices.
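The comparison step reduces to an earliest-detection-wins rule over the local and extracted timestamps. A minimal sketch, assuming ties are broken deterministically by device identifier (a detail the abstract does not specify):

```python
def arbitrate(local_ts, local_id, overheard):
    """Decide whether this device should invoke its assistant.

    local_ts: this device's detection timestamp for the utterance.
    local_id: this device's identifier (hypothetical tie-breaker).
    overheard: list of (timestamp, device_id) pairs decoded from the
        sounds emitted by nearby devices.
    Returns True to respond locally, False to defer.
    """
    candidates = [(local_ts, local_id)] + list(overheard)
    # the device that detected the utterance earliest wins;
    # timestamp ties fall through to the device-id comparison
    return min(candidates) == (local_ts, local_id)
```

A device that hears its neighbor report an earlier detection timestamp defers; otherwise it invokes its automated assistant locally.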