Patent classifications
H04S7/30
DATA PROCESSING METHOD AND APPARATUS, DEVICE, AND READABLE STORAGE MEDIUM
A data processing method includes acquiring video frame data including one or more video frames and audio data of a video, and determining position attribute information of a target object in the acquired one or more video frames, the target object being associated with the audio data. The method also includes acquiring a channel encoding parameter associated with the position attribute information, and performing azimuth enhancement processing on the audio data according to the channel encoding parameter to obtain enhanced audio data. Apparatus and non-transitory computer-readable storage medium counterpart embodiments are also contemplated.
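The mapping from on-screen position to a "channel encoding parameter" can be pictured with a small sketch. The abstract does not specify the parameterization, so this assumes the simplest case: the target object's normalized horizontal position in the frame drives constant-power stereo panning gains; all function names are illustrative.

```python
import math

def pan_gains_from_position(x_norm: float) -> tuple[float, float]:
    """Map a target object's normalized horizontal position in the frame
    (0.0 = far left, 1.0 = far right) to constant-power stereo gains.
    One illustrative choice of 'channel encoding parameter'."""
    theta = x_norm * math.pi / 2          # sweep 0..pi/2 across the image
    return math.cos(theta), math.sin(theta)

def enhance_azimuth(mono: list[float], x_norm: float):
    """Apply the gains to a mono track so the perceived azimuth of the
    enhanced audio tracks the object's on-screen position."""
    g_l, g_r = pan_gains_from_position(x_norm)
    return [g_l * s for s in mono], [g_r * s for s in mono]
```

Constant-power panning keeps `g_l**2 + g_r**2 == 1`, so perceived loudness stays roughly constant as the object moves across the frame.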
VOICE OUTPUT CONTROL DEVICE, CONFERENCE SYSTEM DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM
A voice output control device includes a base control unit configured to set, based on information on relative positions between an own base of the base control unit and other bases, a direction of a position where voice to be output to each of the other bases is localized, and set, based on information on relative distances between the own base and the other bases, a height of the position where the voice is localized; and a sound source processor configured to localize voice from the other base to generate a voice signal to be output, based on the position set by the base control unit.
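The two mappings the abstract describes (relative position to direction, relative distance to height) can be sketched as follows. The abstract gives no concrete formulas, so this assumes azimuth from plain 2-D geometry and a linear, clamped distance-to-height rule; `max_height` and `ref_dist` are made-up tuning knobs.

```python
import math

def localization_for_base(own_xy, other_xy, max_height=2.0, ref_dist=50.0):
    """Derive a localization azimuth (degrees) from the relative positions
    of the own base and another base, and a localization height that grows
    with their relative distance, clamped at max_height."""
    dx, dy = other_xy[0] - own_xy[0], other_xy[1] - own_xy[1]
    azimuth = math.degrees(math.atan2(dy, dx))
    distance = math.hypot(dx, dy)
    height = max_height * min(distance / ref_dist, 1.0)
    return azimuth, height
```

A sound source processor would then render the other base's voice at that (azimuth, height) so far-away bases sound both directional and elevated.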
Dynamic audio upmixer parameters for simulating natural spatial variations
A system and method for creating natural spatial variations in an audio output. At least one parameter in a set of mixer tuning parameters is dynamically modified over time and within a predetermined range that is defined by a set of modification control parameters. The set of mixer tuning parameters that includes the at least one dynamically modified parameter is applied to a mixer allowing the mixer to create natural spatial variations in the audio output to be played at one or more loudspeakers.
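"Dynamically modified over time and within a predetermined range" can be read as a bounded modulation of one tuning parameter. A minimal sketch, assuming a random walk whose step size and bounds play the role of the modification control parameters (the abstract does not name the modulation law):

```python
import random

def modulate(value, lo, hi, step, rng):
    """One update of a bounded random walk: nudge the tuning parameter by
    a random amount, then clamp it into the [lo, hi] range defined by the
    modification control parameters."""
    value += rng.uniform(-step, step)
    return max(lo, min(hi, value))

def simulate(n_frames, start=0.0, lo=-1.0, hi=1.0, step=0.05, seed=0):
    """Trajectory of one mixer tuning parameter over time; each frame's
    value would be handed to the upmixer before rendering that frame."""
    rng = random.Random(seed)
    values, v = [], start
    for _ in range(n_frames):
        v = modulate(v, lo, hi, step, rng)
        values.append(v)
    return values
```

Feeding such a slowly varying parameter (e.g. a decorrelation amount or pan offset) to the upmixer per frame yields the small spatial variations a static mix lacks.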
Audio system for dynamic determination of personalized acoustic transfer functions
An eyewear device includes an audio system. In one embodiment, the audio system includes a microphone array that includes a plurality of acoustic sensors. Each acoustic sensor is configured to detect sounds within a local area surrounding the microphone array. For a plurality of the detected sounds, the audio system performs a direction of arrival (DoA) estimation. Based on parameters of the detected sound and/or the DoA estimation, the audio system may then generate or update one or more acoustic transfer functions unique to a user. The audio system may use the one or more acoustic transfer functions to generate audio content for the user.
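The DoA estimation step can be illustrated for the smallest possible array. This is a deliberately tiny stand-in, not the patent's method: it assumes two sensors, brute-force cross-correlation for the time difference of arrival, and a far-field plane wave; real arrays use many sensors and subsample interpolation.

```python
import math

def best_lag(a, b, max_lag):
    """Lag (in samples) at which b best matches a; positive lag means the
    signal reaches sensor b that many samples after sensor a."""
    def score(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

def doa_from_pair(a, b, fs, spacing_m, c=343.0):
    """Direction of arrival (degrees from broadside) for a two-microphone
    array, from the inter-sensor time delay. fs in Hz, spacing in metres."""
    max_lag = int(spacing_m / c * fs) + 1          # physically possible lags
    tdoa = best_lag(a, b, max_lag) / fs
    sin_theta = max(-1.0, min(1.0, tdoa * c / spacing_m))
    return math.degrees(math.asin(sin_theta))
```

Per-direction estimates like this, accumulated over many detected sounds, are what the system could fold into user-specific acoustic transfer functions.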
Display apparatus and method for processing audio
A display apparatus and a method for processing audio are provided, the display apparatus includes a circuit board provided with a hybrid circuit, a filter circuit and a speaker; the hybrid circuit is configured to receive an original audio signal and superpose a first sub-signal of the original audio signal on a second sub-signal of the original audio signal to obtain a hybrid audio signal; the first sub-signal includes at least one channel of audio signal, the second sub-signal includes at least two channels of audio signal; the filter circuit is configured to filter the hybrid audio signal according to a frequency characteristic of the first sub-signal and the second sub-signal to obtain a restored original audio signal; and the speaker, connected with the filter circuit, is configured to output the restored original audio signal.
QUANTIZATION OF SPATIAL AUDIO DIRECTION PARAMETERS
A method for spatial audio signal encoding comprising: obtaining a plurality of audio direction parameters, wherein each parameter comprises an elevation value and an azimuth value and wherein each parameter has an ordered position; deriving for each of the plurality of audio direction parameters a corresponding derived audio direction parameter (SP) comprising an elevation and an azimuth value, corresponding derived audio direction parameters (SP) being arranged in a manner determined by a spatial utilization defined by the elevation values and the azimuth values of the plurality of audio direction parameters; rotating each derived audio direction parameter (SP) by the azimuth value (φ₀) of an audio direction parameter in the first position of the plurality of audio direction parameters and quantizing the rotation to determine for each a corresponding quantized rotated derived audio direction parameter; changing the ordered position of an audio direction parameter to a further position coinciding with a position of a rotated derived audio direction parameter when the azimuth value of the audio direction parameter is closest to the azimuth value of the further rotated derived audio direction parameter compared to the azimuth values of other rotated derived audio direction parameters, followed by determining for each of the plurality of audio direction parameters a difference between each audio direction parameter and its corresponding quantized rotated derived audio direction parameter; and quantizing a difference for each of the plurality of audio direction parameters, wherein a difference quantization resolution for each of the plurality of audio direction parameters is defined based on a spatial extent of the audio direction parameters.
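The rotate-then-quantize-residuals idea can be sketched in miniature. This is a heavily simplified, azimuth-only illustration of the general pattern (rotate by the first parameter's azimuth φ₀, coarse-quantize, then quantize residuals with a resolution tied to the spatial extent); the grid steps and the extent rule are invented for the example, not taken from the patent.

```python
def wrap(deg):
    """Wrap an angle into (-180, 180]."""
    return (deg + 180.0) % 360.0 - 180.0

def quantize(value, step):
    """Snap a value to the nearest multiple of step."""
    return round(value / step) * step

def encode_azimuths(azimuths, coarse_step=15.0):
    """Rotate every azimuth by phi_0, coarse-quantize the rotated values,
    then quantize the residuals with a step chosen from the spatial extent
    (spread) of the inputs - wider scenes get a coarser residual grid."""
    phi0 = azimuths[0]
    rotated = [wrap(a - phi0) for a in azimuths]
    coarse = [quantize(r, coarse_step) for r in rotated]
    residuals = [wrap(r - c) for r, c in zip(rotated, coarse)]
    extent = max(rotated) - min(rotated) if len(rotated) > 1 else 0.0
    fine_step = 2.0 if extent > 90.0 else 0.5
    return quantize(phi0, coarse_step), coarse, [quantize(r, fine_step) for r in residuals]

def decode_azimuths(q_phi0, coarse, q_residuals):
    """Inverse: add residuals back onto the coarse grid and undo the rotation."""
    return [wrap(q_phi0 + c + r) for c, r in zip(coarse, q_residuals)]
```

Rotating by φ₀ first concentrates the values near zero, which is what lets a coarse grid plus small residuals cover the scene cheaply.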
SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND SIGNAL PROCESSING SYSTEM
A signal processing apparatus includes: an audio signal processing unit configured to perform wavefront synthesis processing for at least part of a plurality of sound source data; a first output unit configured to output N-channel audio signals output from the audio signal processing unit to a first speaker device; a mix processing unit configured to mix the N-channel audio signals output from the audio signal processing unit; and a second output unit configured to output an audio signal output from the mix processing unit to a second speaker device, in which a setting regarding an output of the second speaker device is possible.
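The mix-processing path for the second speaker device amounts to collapsing the N wavefront-synthesis channels into fewer signals under a user setting. A minimal mono-downmix sketch, where `gain` stands in for the "setting regarding an output of the second speaker device" (the abstract does not say what that setting is):

```python
def downmix(channels, gain=1.0):
    """Mix N-channel wavefront-synthesis output into a single signal for
    a second speaker device. Dividing by the channel count is a simple
    way to keep the summed mix from clipping."""
    n = len(channels)
    return [gain * sum(ch[i] for ch in channels) / n
            for i in range(len(channels[0]))]
```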
TRAINING DATA EXTENSION APPARATUS, TRAINING DATA EXTENSION METHOD, AND PROGRAM
An input of a first observation signal corresponding to an incoming signal from a first direction is received; an angular rotation operation is performed on the first observation signal to obtain a second observation signal corresponding to an incoming signal from a second direction that is different from the first direction, and the second observation signal is added to a set of training data.
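One cheap way to realize such an angular rotation, assumed here for illustration: for an ideal uniform circular microphone array, rotating the source azimuth by a whole number of inter-microphone angles is equivalent to circularly shifting the channel order. The patent does not commit to this construction; it is just the simplest instance of the idea.

```python
def rotate_observation(channels, shift):
    """Circularly shift the channel order of a uniform-circular-array
    observation; under ideal symmetry this mimics rotating the incoming
    direction by `shift` inter-microphone angles."""
    n = len(channels)
    return [channels[(i - shift) % n] for i in range(n)]

def extend_training_set(dataset, channels, azimuth_deg, step_deg):
    """Add the original observation and one rotated copy, with the
    rotated copy relabeled to its new incoming direction."""
    dataset.append((azimuth_deg, channels))
    dataset.append(((azimuth_deg + step_deg) % 360,
                    rotate_observation(channels, 1)))
    return dataset
```

Each recorded direction thus yields several labeled training directions at no extra recording cost.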
EXTRACTION OF AN AUDIO OBJECT
A method for extracting at least one audio object from at least two audio input signals, each of which contains the audio object. The second audio input signal is synchronized with the first audio input signal while obtaining a synchronized second audio input signal. The audio object is extracted by applying at least one trained model to the first audio signal and to the synchronized second audio input signal. The audio object is outputted. Further, the step of synchronizing the second audio input signal with the first audio input signal includes the steps of: generating audio signals; analytically calculating a correlation vector between the audio signals; optimizing the correlation vector; and determining the synchronized second audio input signal using the optimized correlation vector.
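The synchronization step can be sketched with a plain cross-correlation search. This assumes "optimizing the correlation vector" means picking the lag that maximizes it, which is the textbook reading rather than anything the abstract spells out:

```python
def correlation_vector(a, b, max_lag):
    """Cross-correlation of b against a for lags in [-max_lag, max_lag]."""
    return [sum(a[i] * b[i + lag]
                for i in range(len(a)) if 0 <= i + lag < len(b))
            for lag in range(-max_lag, max_lag + 1)]

def synchronize(a, b, max_lag=32):
    """Shift b by the lag that maximizes the correlation vector, so both
    inputs are time-aligned before the trained extraction model sees them.
    Zero-padding keeps the output the same length as the input."""
    corr = correlation_vector(a, b, max_lag)
    lag = corr.index(max(corr)) - max_lag
    if lag >= 0:
        return b[lag:] + [0.0] * lag      # b lags a: advance it
    return [0.0] * (-lag) + b[:lag]       # b leads a: delay it
```

With the two inputs aligned, the trained model sees the audio object at the same time index in both signals, which is what makes the joint extraction tractable.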
SYSTEM FOR INTELLIGENT AUDIO RENDERING USING HETEROGENEOUS SPEAKER NODES AND METHOD THEREOF
A system for intelligent audio rendering using speaker nodes is provided. A source device determines a spatial location and speaker capability of one or more speaker nodes based on information embedded in a corresponding node of each of the one or more media devices, selects the speaker most suitable for each audio channel based on the speaker capability and the spatial location of each of the one or more speakers, generates speaker profiles for the one or more speakers, maps an audio channel to each of the one or more speakers based on the speaker profile corresponding to that speaker, estimates a media path between the source device and each of the one or more speakers, detects changes in the estimated media paths, and renders audio on the one or more speakers in real time based on the speaker profiles and the changes in the media paths corresponding to each of the one or more speakers.