Patent classifications
H04S7/302
Apparatus, Method, or Computer Program for Processing an Encoded Audio Scene using a Bandwidth Extension
An apparatus for processing an audio scene representing a sound field, the audio scene comprising information on a transport signal and a set of parameters. The apparatus comprises: an output interface for generating a processed audio scene using the set of parameters and the information on the transport signal, wherein the output interface is configured to generate a raw representation of two or more channels using the set of parameters and the transport signal; a multichannel enhancer for generating an enhancement representation of the two or more channels using the transport signal; and a signal combiner for combining the raw representation of the two or more channels and the enhancement representation of the two or more channels to obtain the processed audio scene.
APPARATUS AND METHOD FOR ENCODING A PLURALITY OF AUDIO OBJECTS USING DIRECTION INFORMATION DURING A DOWNMIXING OR APPARATUS AND METHOD FOR DECODING USING AN OPTIMIZED COVARIANCE SYNTHESIS
An apparatus for encoding a plurality of audio objects and related metadata indicating direction information on the plurality of audio objects has: a downmixer for downmixing the plurality of audio objects to obtain one or more transport channels; a transport channel encoder for encoding one or more transport channels to obtain one or more encoded transport channels; and an output interface for outputting an encoded audio signal comprising the one or more encoded transport channels, wherein the downmixer is configured to downmix the plurality of audio objects in response to the direction information on the plurality of audio objects.
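As a minimal sketch of what a direction-dependent downmix could look like, the Python below mixes mono objects into two transport channels with gains derived from each object's azimuth. The abstract does not say how the direction information is used; the cardioid-shaped gain law, the convention that positive azimuth points left, and the name `downmix_by_direction` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def downmix_by_direction(
    objects: Sequence[Sequence[float]],
    azimuths_deg: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Downmix mono objects into two transport channels (hypothetical helper).

    Assumption: each transport channel behaves like a virtual cardioid
    pointing at +90 degrees (left) or -90 degrees (right).
    """
    n = len(objects[0])
    left = [0.0] * n
    right = [0.0] * n
    for samples, az in zip(objects, azimuths_deg):
        az_rad = math.radians(az)
        # Cardioid gain: 1 on-axis, 0.5 at 90 degrees off-axis, 0 at the rear.
        g_left = 0.5 * (1.0 + math.cos(az_rad - math.pi / 2))
        g_right = 0.5 * (1.0 + math.cos(az_rad + math.pi / 2))
        for i, s in enumerate(samples):
            left[i] += g_left * s
            right[i] += g_right * s
    return left, right
```

With this gain law, an object at azimuth 0 contributes equally to both transport channels, while an object at +90 degrees goes entirely to the left channel.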
SIGNAL PROCESSING APPARATUS AND METHOD, AND PROGRAM TO REDUCE CALCULATION AMOUNT BASED ON MUTE INFORMATION
The present technology relates to a signal processing apparatus and method, and a program, that make it possible to reduce the amount of arithmetic operations. The signal processing apparatus performs at least one of a decoding process and a rendering process on an object signal of an audio object, on the basis of audio object mute information indicating whether or not the signal of the audio object is a mute signal. The present technology can be applied to a signal processing apparatus.
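One plausible way mute information can reduce computation is to skip decoding and mixing for objects flagged as mute. The sketch below illustrates that idea only; the `AudioObject` type, its `is_mute` flag, the per-frame `decode` callable, and the simple gain-based mix are hypothetical and are not taken from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class AudioObject:
    is_mute: bool                      # hypothetical mute flag for this frame
    decode: Callable[[], List[float]]  # hypothetical per-frame decoder
    gain: float = 1.0                  # stand-in for a real rendering gain


def render_frame(
    objects: Sequence[AudioObject], frame_len: int
) -> Tuple[List[float], int]:
    """Mix all non-mute objects; return the mix and how many were decoded."""
    out = [0.0] * frame_len
    decoded = 0
    for obj in objects:
        if obj.is_mute:
            # Mute objects contribute nothing, so decoding and mixing are skipped.
            continue
        samples = obj.decode()
        decoded += 1
        for i in range(frame_len):
            out[i] += obj.gain * samples[i]
    return out, decoded
```

The returned count makes the saving visible: for a frame in which most objects are mute, only the remaining objects incur decoding and mixing cost.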
VIDEO PROCESSING DEVICE AND METHOD
A video processing apparatus includes a memory storing instructions, and at least one processor configured to execute the instructions to generate a plurality of feature information by analyzing a video signal comprising a plurality of images based on a first DNN, extract a first altitude component and a first planar component corresponding to a movement of an object in a video from the video signal based on a second DNN, extract a second planar component corresponding to a movement of a sound source in audio from a first audio signal based on a third DNN, generate a second altitude component based on the first altitude component, the first planar component, and the second planar component, output a second audio signal comprising the second altitude component based on the feature information, and synchronize the second audio signal with the video signal and output the synchronized second audio signal and video signal.
SIGNAL PROCESSING DEVICE, METHOD, AND PROGRAM
The present technology relates to a signal processing device, a method, and a program capable of improving transmission efficiency and efficiency in the data processing amount. A signal processing device includes: an acquisition unit that acquires polar coordinate position information indicating a position of a first object expressed by polar coordinates, audio data of the first object, absolute coordinate position information indicating a position of a second object expressed by absolute coordinates, and audio data of the second object; a coordinate conversion unit that converts the absolute coordinate position information into polar coordinate position information indicating a position of the second object; and a rendering processing unit that performs rendering processing on the basis of the polar coordinate position information and the audio data of the first object and the polar coordinate position information and the audio data of the second object. The present technology can be applied to a content reproduction system.
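The coordinate conversion step can be sketched as follows, assuming the absolute coordinates are Cartesian (x, y, z) and the polar coordinates are (azimuth, elevation, radius) relative to a listener position. The axis and angle conventions here (azimuth measured from the x axis toward the y axis, elevation measured up from the horizontal plane, angles in degrees) and the function name are assumptions; the abstract does not specify them.

```python
import math
from typing import Sequence, Tuple


def absolute_to_polar(
    obj_xyz: Sequence[float],
    listener_xyz: Sequence[float] = (0.0, 0.0, 0.0),
) -> Tuple[float, float, float]:
    """Convert an absolute position to (azimuth_deg, elevation_deg, radius).

    Assumed convention: azimuth is the angle in the x-y plane from the
    x axis; elevation is the angle above that plane.
    """
    dx, dy, dz = (o - l for o, l in zip(obj_xyz, listener_xyz))
    radius = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(dy, dx))
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation, radius
```

After such a conversion, both kinds of object carry polar position information, so a single polar-coordinate renderer can handle them uniformly.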
LOUDSPEAKER CONTROL
There is provided a computer-implemented method of generating audio signals for an array of loudspeakers, the method comprising: receiving a plurality of input audio signals, wherein a respective one of the plurality of input audio signals is to be reproduced, by the array, at each of a plurality of control points in an acoustic environment, and wherein each of the plurality of control points is associated with a respective one of a plurality of loudspeaker groups; receiving an estimate of a position of each of the plurality of control points; assigning, using the received estimate of the position of each of the plurality of control points, each of the loudspeakers in the array to at least one of the plurality of loudspeaker groups, wherein the assigning of a particular loudspeaker to a particular loudspeaker group is based on a relative position of the particular loudspeaker with respect to one or more of the at least one control points associated with the particular loudspeaker group; and generating a respective output audio signal for each of the loudspeakers in the array by applying a set of filters to the plurality of input audio signals, the output audio signal for a particular loudspeaker being generated according to the at least one loudspeaker group to which the particular loudspeaker is assigned.
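The assignment step can be illustrated with a simple nearest-control-point rule. The abstract says only that assignment is based on the relative position of a loudspeaker with respect to the control points; using Euclidean distance, assigning each loudspeaker to exactly one group, and the name `assign_groups` are assumptions for this sketch.

```python
import math
from typing import Dict, List, Sequence


def assign_groups(
    speakers: Sequence[Sequence[float]],
    control_points: Sequence[Sequence[float]],
) -> Dict[int, List[int]]:
    """Map each group index to the loudspeaker indices assigned to it.

    Assumption: group g is associated with control point g, and each
    loudspeaker joins the group of its nearest control point.
    """
    groups: Dict[int, List[int]] = {g: [] for g in range(len(control_points))}
    for s_idx, s_pos in enumerate(speakers):
        nearest = min(
            range(len(control_points)),
            key=lambda g: math.dist(s_pos, control_points[g]),
        )
        groups[nearest].append(s_idx)
    return groups
```

Because the grouping depends on the estimated control-point positions, re-running the assignment when a position estimate changes would regroup the array accordingly.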
Apparatus, method, computer program for enabling access to mediated reality content by a remote user
An apparatus comprising means for: simultaneously controlling content rendered by a hand portable device and content rendered by a spatial audio device; and providing for rendering to a user, in response to an action by the user, of a first part, not a second part, of a spatial audio content via the hand portable device not the spatial audio device.
COLORLESS GENERATION OF ELEVATION PERCEPTUAL CUES USING ALL-PASS FILTER NETWORKS
A system includes one or more computing devices that encode spatial perceptual cues into a monaural channel to generate a plurality of output channels. A computing device determines a target amplitude response for the mid and side channels of the plurality of output channels, defining a spatial perceptual cue associated with one or more frequency-dependent phase shifts. The computing device determines a transfer function of a single-input, multi-output allpass filter based on the target amplitude response, determines coefficients of the allpass filter based on the transfer function, and processes the monaural channel with the coefficients of the allpass filter to generate the plurality of output channels having the encoded spatial perceptual cues. The allpass filter is configured to be colorless with respect to the individual output channels, allowing the placement of spatial cues into the audio stream to be decoupled from the overall coloration of the audio.
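To see why allpass filtering can be colorless per channel yet still shape the mid and side channels, consider a toy case: the left output is the input unchanged and the right output is the input passed through a first-order allpass filter H(z) = (a + z^-1) / (1 + a z^-1). This is not the filter design described in the abstract; the first-order section, the identity left branch, and the function names are assumptions chosen to keep the example small.

```python
import cmath
from typing import Tuple


def allpass_response(a: float, omega: float) -> complex:
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1), real |a| < 1."""
    z_inv = cmath.exp(-1j * omega)
    return (a + z_inv) / (1 + a * z_inv)


def channel_magnitudes(a: float, omega: float) -> Tuple[float, float, float, float]:
    """Return (|left|, |right|, |mid|, |side|) at angular frequency omega."""
    left = 1 + 0j                      # identity branch (assumed)
    right = allpass_response(a, omega)  # phase-shifted branch
    mid = (left + right) / 2
    side = (left - right) / 2
    return abs(left), abs(right), abs(mid), abs(side)
```

Both individual channels keep unit magnitude at every frequency, while the mid magnitude falls from 1 at omega = 0 to 0 at omega = pi and the side magnitude does the opposite. The frequency-dependent phase shift therefore appears as an amplitude response only in the mid/side domain.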
ARRANGEMENT FOR DISTRIBUTING HEAD RELATED TRANSFER FUNCTION FILTERS
In the arrangement, a user device sends a request for a head related transfer function filter to the service being used. The service verifies whether the user of the device has a subscription for head related transfer function filters in the service being used and, in response to a positive verification result, retrieves a filter. The service may filter audio channels and transmit the filtered audio onward. In an alternative embodiment, the service transmits the filter to the user device, which then filters the audio.
CALL ENVIRONMENT GENERATION METHOD, CALL ENVIRONMENT GENERATION APPARATUS, AND PROGRAM
Provided is a technique to generate a call environment that prevents call contents from being heard by a person other than a person speaking on the phone in a case where call voice is output from a speaker. Speakers installed in an automobile are denoted by $\mathrm{SP}_1, \ldots, \mathrm{SP}_N$, a first filter coefficient used to generate an input signal for a speaker $\mathrm{SP}_n$ is denoted by $F_n(\omega)$, and a second filter coefficient that is different from the first filter coefficient and is used to generate an input signal for the speaker $\mathrm{SP}_n$ is denoted by $\tilde{F}_n(\omega)$. A call environment generation method includes: an acoustic signal generation step of generating, when detecting a start signal of a call, a call-time acoustic signal that is obtained by adjusting volume of an acoustic signal to be reproduced during the call, by using a predetermined volume value; a first local signal generation step of generating a sound signal $S_n$ as an input signal for the speaker $\mathrm{SP}_n$ from a voice signal of the call by using the first filter coefficient $F_n(\omega)$; and a second local signal generation step of generating an acoustic signal $A_n$ as an input signal for the speaker $\mathrm{SP}_n$ from the call-time acoustic signal by using the second filter coefficient $\tilde{F}_n(\omega)$.