H04R2227/009

Sound reproduction

A method and system of digital room correction for a device that includes a loudspeaker, such as a smart speaker. The method comprises capturing audio from an environment local to the device, for example from one or more microphones of a smart speaker. The captured audio is then processed to recognize one or more categories of sound. A digital room correction procedure may then be controlled dependent upon recognition and/or analysis of at least one of the categories of sound.

ACTIVE NETWORK MANAGEMENT METHODS FOR ENSURING LIVE VOICE QUALITY
20210250236 · 2021-08-12

Active management of a data network facilitates the high-quality live broadcast of captured digital audio from multiple client devices present in a venue. In various aspects, the system includes a network hub. The network hub transmits and receives data from a plurality of the client devices which may be personal mobile devices. A network manager component manages communications of the client devices with the network hub. The network manager uses measured dynamic network parameters and determines conditions for maintaining audio quality according to configurable rules. The dynamic network parameters may include device qualification(s), live measured audio quality of service metrics, and unwanted data packets. In various aspects, dynamic authentication and de-authentication of client devices as well as data transmission throttling of client devices are techniques utilized for active network management. The network manager maintains audio quality for the client device configured or selected to broadcast over the public address system.
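As an illustration only, the rule-driven management described in this abstract could be sketched as follows. The field names (`qualified`, `audio_qos`, `unwanted_packets`), the rule keys, and the three action labels are hypothetical and are not taken from the application; the sketch assumes that non-broadcasting devices are throttled whenever the broadcasting device's measured audio quality falls below a configured minimum.

```python
def plan_actions(devices, broadcaster_id, rules):
    """Choose an action for each client device from measured parameters.

    devices: dict mapping device id -> dict with hypothetical keys
             "qualified" (bool), "audio_qos" (float, higher is better),
             "unwanted_packets" (int).
    rules:   dict with hypothetical keys "min_audio_qos" and
             "max_unwanted_packets".
    Returns a dict mapping device id -> "allow", "throttle" or
    "deauthenticate".
    """
    # The broadcaster's live quality metric decides whether others are throttled.
    degraded = devices[broadcaster_id]["audio_qos"] < rules["min_audio_qos"]
    actions = {}
    for dev_id, dev in devices.items():
        too_noisy = dev["unwanted_packets"] > rules["max_unwanted_packets"]
        if not dev["qualified"] or too_noisy:
            # Unqualified devices and sources of unwanted traffic are removed.
            actions[dev_id] = "deauthenticate"
        elif dev_id != broadcaster_id and degraded:
            # Free up capacity for the device that is currently broadcasting.
            actions[dev_id] = "throttle"
        else:
            actions[dev_id] = "allow"
    return actions
```

In this sketch the broadcaster itself is never throttled, which mirrors the stated goal of maintaining audio quality for the device selected to broadcast.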

MIXING DEVICE, MIXING METHOD, AND MIXING PROGRAM

A mixing technique is provided that can suppress degradation of non-priority sound and output more natural mixed sound, regardless of the size and quality of a playback device.

A mixing device for mixing a first signal and a second signal on a time-frequency plane includes a control signal generation unit configured to generate a control signal indicating whether to perform prioritized mixing that includes amplification of the first signal and attenuation of the second signal; and a gain derivation unit configured to derive a first gain for amplifying the first signal and a second gain for attenuating the second signal based on the control signal, wherein the control signal takes at least a first value and a second value different from the first value, wherein the first value is not continuous beyond a predetermined bandwidth on a frequency axis, and wherein the mixing device applies the prioritized mixing to the first signal and the second signal in response to the control signal indicating the first value, and applies simple addition to the first signal and the second signal in response to the control signal indicating the second value.
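A minimal sketch of the per-bin switching between prioritized mixing and simple addition is given below. It is not the claimed implementation: the function names, the fixed gains `boost` and `cut`, and the way `limit_runs` breaks up long runs of the first value are assumptions made for illustration, and the inputs are treated as plain lists of bin values for one time frame.

```python
def limit_runs(control, max_run):
    """Return a copy of the control signal in which the first value (1)
    never stays set for more than max_run consecutive frequency bins.
    Forcing a 0 after each full run is an assumed strategy."""
    out, run = [], 0
    for c in control:
        if c and run < max_run:
            out.append(1)
            run += 1
        else:
            out.append(0)
            run = 0
    return out


def mix_bins(first, second, control, boost=1.5, cut=0.5):
    """Mix two signals bin by bin.

    control[i] == 1 -> prioritized mixing: amplify first, attenuate second.
    control[i] == 0 -> simple addition.
    boost and cut are hypothetical fixed gains.
    """
    if not (len(first) == len(second) == len(control)):
        raise ValueError("all inputs must have the same number of bins")
    mixed = []
    for a, b, c in zip(first, second, control):
        if c:
            mixed.append(boost * a + cut * b)
        else:
            mixed.append(a + b)
    return mixed
```

For example, `mix_bins([1.0, 2.0], [1.0, 2.0], [1, 0])` applies prioritized mixing to the first bin and simple addition to the second, and `limit_runs([1, 1, 1, 1, 1], 2)` yields `[1, 1, 0, 1, 1]`.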

INFORMATION PROCESSING DEVICE, MIXING DEVICE USING THE SAME, AND LATENCY REDUCTION METHOD
20210152936 · 2021-05-20

An information processing device includes a first time-frequency converter configured to perform a time-frequency conversion with respect to an input signal, using a window function having a first width, a second time-frequency converter configured to perform a time-frequency conversion with respect to the input signal, using a second window function having a second width smaller than the first width, and a modification processing unit configured to modify an output of the second time-frequency converter, using a frequency analysis result based on an output of the first time-frequency converter.

Voice detection optimization using sound metadata

Systems and methods for optimizing voice detection via a network microphone device are disclosed herein. In one example, individual microphones of a network microphone device detect sound. The sound data is captured in a first buffer and analyzed to detect a trigger event. Metadata associated with the sound data is captured in a second buffer and provided to at least one network device to determine at least one characteristic of the detected sound based on the metadata. The network device provides a response that includes an instruction, based on the determined characteristic, to modify at least one performance parameter of the NMD. The NMD then modifies the at least one performance parameter based on the instruction.

Metadata for ducking control

An audio encoding device and an audio decoding device are described herein. The audio encoding device may examine a set of audio channels/channel groups representing a piece of sound program content and produce a set of ducking values to associate with one of the channels/channel groups. During playback of the piece of sound program content, the ducking values may be applied to all other channels/channel groups. Application of these ducking values may cause (1) the reduction in dynamic range of ducked channels/channel groups and/or (2) movement of channels/channel groups in the sound field. This ducking may improve intelligibility of audio in the non-ducked channel/channel group. For instance, a narration channel/channel group may be more clearly heard by listeners through the use of selective ducking of other channels/channel groups during playback.
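The decoder-side step, where ducking values associated with one channel are applied to all other channels, might look like the sketch below. It assumes the ducking values are per-sample linear gains between 0 and 1 and covers only level reduction, not movement of channels in the sound field; the function and parameter names are hypothetical.

```python
def apply_ducking(channels, source, ducking_gains):
    """Apply ducking gains to every channel except the source channel.

    channels:      dict mapping channel name -> list of samples.
    source:        name of the channel the ducking values are associated
                   with (for example a narration channel); left unchanged.
    ducking_gains: per-sample linear gains (assumed format), 1.0 = no ducking.
    """
    ducked = {}
    for name, samples in channels.items():
        if name == source:
            ducked[name] = list(samples)
        else:
            ducked[name] = [s * g for s, g in zip(samples, ducking_gains)]
    return ducked
```

With `channels = {"narration": [1.0, 1.0], "music": [0.8, 0.8]}` and gains `[1.0, 0.5]`, the music channel is halved on the second sample while the narration channel passes through unchanged.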

AUDIO OUTPUT CONTROL

Systems and methods for audio output control are disclosed. Audio may be output via a speaker of a communal device associated with a first portion of an environment. A user may provide a user utterance indicating an intent to add another device in a second portion of the environment to the audio-output session, and/or an intent to move the audio-output session from the first device to the second device, and/or an intent to remove a device from an audio-output session. Based on this determined intent, audio-session queues may be associated and dissociated from devices and device states may be altered to effectuate the intent of the user utterance.

Doppler microphone processing for conference calls
10978085 · 2021-04-13

Systems and methods are provided for conducting conference calls using Doppler-based (i.e., reverberation-based) techniques. The embodiments comprise a call device performing operations to join a call session hosted on a session server; receive sensor data comprising an audio signal from a first microphone and location information associated with the first microphone; determine a reverberation parameter associated with the location information; generate a first processed audio signal based on the audio signal and the reverberation parameter; and transmit the first processed audio signal to the session server. The session server may perform operations to receive a respective processed audio signal; determine a sound quality parameter of the respective processed audio signal; generate a balanced audio signal based on the sound quality parameter and the received processed audio signal; and transmit the balanced audio signal to a remote call device belonging to a second party.

Electronic device and operation method thereof

Provided are an electronic device and an operation method thereof. The operation method of an electronic device for processing an audio signal may include obtaining viewing environment information related to sound intelligibility, processing an input audio signal by separating the input audio signal into a first channel including a primary signal and a second channel including an ambient signal based on the viewing environment information, processing the input audio signal based on a frequency band and based on the viewing environment information, and generating an output signal based on processing the input audio signal.

Audio signal processing with acoustic echo cancellation

Multi-channel audio signal processing includes receiving a left stereo audio signal from a first channel and a right stereo audio signal from a second channel; up-mixing the left and the right stereo audio signals to generate an up-mixed audio signal for a third channel; de-correlating the up-mixed audio signal from the left and the right stereo audio signals to generate a de-correlated up-mixed audio signal; providing the left and right stereo signals and the de-correlated up-mixed audio signal to first, second, and third loudspeakers respectively to generate first, second, and third sound signals, respectively; picking up the first, second and third sound signals with a microphone to generate a microphone output signal; and adaptively filtering the microphone output signal with an acoustic echo canceller based on the left, the right, and the de-correlated up-mixed audio signal to generate an echo compensated microphone signal.
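The final adaptive-filtering step can be illustrated with a multi-reference normalized LMS (NLMS) filter. NLMS is a common choice for acoustic echo cancellation and is used here as an assumption; the abstract does not name the adaptation algorithm. The tap count, step size `mu`, and function name are likewise illustrative, and the up-mixing and de-correlation stages are not shown.

```python
def nlms_echo_cancel(mic, refs, taps=4, mu=0.5, eps=1e-8):
    """Remove echo of several loudspeaker signals from a microphone signal.

    mic:  list of microphone samples.
    refs: list of reference channels (e.g. left, right, de-correlated
          up-mix), each a list as long as mic.
    Returns the echo-compensated (error) signal.
    """
    n_ch = len(refs)
    # One FIR filter per reference channel, all adapted jointly.
    w = [[0.0] * taps for _ in range(n_ch)]
    out = []
    for n in range(len(mic)):
        # Most recent `taps` samples of each reference, newest first,
        # zero-padded before the start of the signal.
        x = [[ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
             for ref in refs]
        # Echo estimate is the sum of all filtered reference channels.
        est = sum(w[c][k] * x[c][k] for c in range(n_ch) for k in range(taps))
        e = mic[n] - est
        # Normalize the update by the total reference power.
        power = sum(v * v for ch in x for v in ch) + eps
        for c in range(n_ch):
            for k in range(taps):
                w[c][k] += mu * e * x[c][k] / power
        out.append(e)
    return out
```

De-correlating the up-mixed channel matters for a filter of this kind: if the third reference were a fixed linear combination of the left and right signals, the joint filter solution would not be unique.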