G10L19/173

Metadata transcoding

The present document relates to transcoding of metadata, and in particular to a method and system for transcoding metadata with reduced computational complexity. A transcoder configured to transcode an inbound bitstream comprising an inbound content frame and an associated inbound metadata frame into an outbound bitstream comprising an outbound content frame and an associated outbound metadata frame is described. The inbound content frame is indicative of a signal encoded according to a first codec system and the outbound content frame is indicative of the signal encoded according to a second codec system. The transcoder is configured to identify an inbound block of metadata from the inbound metadata frame, the inbound block of metadata associated with an inbound descriptor indicative of one or more properties of metadata comprised within the inbound block of metadata, and to generate the outbound metadata frame from the inbound metadata frame based on the inbound descriptor.
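
A minimal sketch of descriptor-driven metadata transcoding, under stated assumptions. The abstract says only that the outbound metadata frame is generated based on the inbound descriptor. The `Block` type, the descriptor keys `codec_specific` and `needs_conversion`, and the `convert` callback are hypothetical names introduced for illustration. The idea shown is that reading the descriptor lets the transcoder copy or drop most blocks without parsing their payload.

```python
from dataclasses import dataclass

@dataclass
class Block:
    descriptor: dict  # hypothetical properties of the metadata in this block
    payload: bytes    # opaque metadata bytes

def transcode_metadata(inbound_frame, convert):
    """Build the outbound metadata frame by inspecting descriptors only."""
    outbound_frame = []
    for block in inbound_frame:
        if block.descriptor.get("codec_specific"):
            # Assumed rule: metadata tied to the first codec is dropped.
            continue
        if block.descriptor.get("needs_conversion"):
            # Only these blocks have their payload touched.
            outbound_frame.append(Block(block.descriptor, convert(block.payload)))
        else:
            # Everything else is passed through without being parsed.
            outbound_frame.append(block)
    return outbound_frame
```

In this sketch the payload of a pass-through block is never opened, which is one way the descriptor could reduce computational complexity.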

Apparatus and method for processing an encoded audio signal by a mapping driven by SBR from QMF onto MCLT

An apparatus for processing an encoded audio signal, which includes a sequence of access units, each access unit including a core signal with a first spectral width and parameters describing a spectrum above the first spectral width, has: a demultiplexer generating, from an access unit of the encoded audio signal, the core signal and a set of the parameters; an upsampler upsampling the core signal of the access unit and outputting a first upsampled spectrum and a temporally consecutive second upsampled spectrum, both having the same content as the core signal and a second spectral width greater than the first spectral width of the core spectrum; a parameter converter converting parameters of the set of parameters of the access unit to obtain converted parameters; and a spectral gap filling processor processing the first upsampled spectrum and the second upsampled spectrum using the converted parameters.

TRANSFORM AMBISONIC COEFFICIENTS USING AN ADAPTIVE NETWORK

A device includes a memory configured to store untransformed ambisonic coefficients at different time segments. The device also includes one or more processors configured to obtain the untransformed ambisonic coefficients at the different time segments, where the untransformed ambisonic coefficients represent a soundfield at the different time segments. The one or more processors are also configured to apply one adaptive network, based on a constraint, to the untransformed ambisonic coefficients at the different time segments to generate transformed ambisonic coefficients at the different time segments, wherein the transformed ambisonic coefficients represent a soundfield at the different time segments that was modified based on the constraint.

Optimized Audio Forwarding
20210297777 · 2021-09-23

Methods and systems for optimizing a routing of audio data to audio transmitting devices using a Bluetooth network are disclosed. One method includes receiving, at a first speaker of an audio rendering system, an encoded audio bitstream comprising a first and a second audio channel; separating a first set of spectral components of the first audio channel and a second set of spectral components of the second audio channel from the encoded audio bitstream, without decoding the audio bitstream; generating a first encoded bitstream from the first set of spectral components; and forwarding the first encoded bitstream to a second speaker of the audio rendering system over a wireless link.

LOW BITRATE AUDIO ENCODING/DECODING SCHEME HAVING CASCADED SWITCHES

An audio encoder has a first, information-sink-oriented encoding branch such as a spectral domain encoding branch; a second, information-source- or SNR-oriented encoding branch such as an LPC-domain encoding branch; and a switch for switching between the first and second encoding branches. The second encoding branch has a converter into a specific domain different from the spectral domain, such as an LPC analysis stage generating an excitation signal. It also has a specific domain coding branch such as an LPC domain processing branch, a specific spectral domain coding branch such as an LPC spectral domain processing branch, and an additional switch for switching between the specific domain coding branch and the specific spectral domain coding branch. An audio decoder has a first domain decoder, a second domain decoder, and a third domain decoder, as well as two cascaded switches for switching between the decoders.

APPARATUS AND METHOD FOR PROVIDING ENHANCED GUIDED DOWNMIX CAPABILITIES FOR 3D AUDIO

An apparatus for downmixing three or more audio input channels to obtain two or more audio output channels is provided. The apparatus includes a receiving interface for receiving the three or more audio input channels and for receiving side information. Moreover, the apparatus includes a downmixer for downmixing the three or more audio input channels depending on the side information to obtain the two or more audio output channels. The number of the audio output channels is smaller than the number of the audio input channels. The side information indicates a characteristic of at least one of the three or more audio input channels, or a characteristic of one or more sound waves recorded within the one or more audio input channels, or a characteristic of one or more sound sources which emitted one or more sound waves recorded within the one or more audio input channels.
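
As an illustration only, the following sketch downmixes three input channels to two output channels with a gain that depends on side information. The abstract does not say how the side information steers the downmix. The `center_is_ambient` flag, the two gain values, and the function name are assumptions made for this example.

```python
def guided_downmix(left, right, center, side_info):
    """Downmix L/R/C to stereo; the center gain depends on side information."""
    # Assumed rule: an ambient center channel is mixed in more quietly
    # than a center channel that carries a direct sound source.
    gain = 0.25 if side_info.get("center_is_ambient") else 0.5
    out_left = [l + gain * c for l, c in zip(left, center)]
    out_right = [r + gain * c for r, c in zip(right, center)]
    return out_left, out_right
```

The number of output channels (two) is smaller than the number of input channels (three), as the abstract requires; the mapping from characteristic to gain is the part a real system would define.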

AUDIO METADATA PROVIDING APPARATUS AND METHOD, AND MULTICHANNEL AUDIO DATA PLAYBACK APPARATUS AND METHOD TO SUPPORT DYNAMIC FORMAT CONVERSION

An audio metadata providing apparatus and method and a multichannel audio data playback apparatus and method to support a dynamic format conversion are provided. Dynamic format conversion information may include information about a plurality of format conversion schemes that are used to convert a first format set by a writer of multichannel audio data into a second format that is based on a playback environment of the multichannel audio data and that are set for each of playback periods of the multichannel audio data. The audio metadata providing apparatus may provide audio metadata including the dynamic format conversion information. The multichannel audio data playback apparatus may identify the dynamic format conversion information from the audio metadata, may convert the first format of the multichannel audio data into the second format based on the identified dynamic format conversion information, and may play back the multichannel audio data with the second format.
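
One way to picture dynamic format conversion information is as a list of playback periods, each carrying one conversion scheme per target format. The layout below is a hypothetical sketch: the keys `start`, `end`, and `schemes`, the use of downmix matrices as schemes, and both function names are assumptions, not part of the abstract.

```python
def scheme_for(metadata, time_s, target_format):
    """Return the conversion scheme set for the period containing time_s."""
    for period in metadata:
        if period["start"] <= time_s < period["end"]:
            return period["schemes"][target_format]
    raise LookupError(f"no conversion scheme covers t={time_s}")

def convert_frame(frame, matrix):
    """Apply a conversion matrix (one row per output channel) to one frame."""
    return [sum(g * x for g, x in zip(row, frame)) for row in matrix]

# Hypothetical metadata: two playback periods with different 3-to-2 schemes.
METADATA = [
    {"start": 0.0, "end": 10.0,
     "schemes": {"stereo": [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]}},
    {"start": 10.0, "end": 20.0,
     "schemes": {"stereo": [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]}},
]
```

A playback apparatus in this sketch would call `scheme_for` with the current playback time and its own output format, then apply the returned scheme to each frame of that period.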

Method and System for Implementing Split and Parallelized Encoding or Transcoding of Audio and Video Content
20210168390 · 2021-06-03

Novel tools and techniques are provided for implementing split and parallelized encoding or transcoding of audio and video. In various embodiments, a computing system might split an audio-video file that is received from a content source into a single video file and a single audio file. The computing system might encode or transcode the single audio file. Concurrently, the computing system might split the single video file into a plurality of video segments. A plurality of parallel video encoders/transcoders might concurrently encode or transcode the plurality of video segments, each video encoder/transcoder encoding or transcoding one video segment of the plurality of video segments. Subsequently, the computing system might assemble the plurality of encoded or transcoded video segments with the encoded or transcoded audio file to produce an encoded or transcoded audio-video file, which may be output to a display device(s), an audio playback device(s), or the like.
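
The flow described above can be sketched with a thread pool, assuming toy stand-ins for the real components. The `encode` function, the dictionary used as an audio-video file, and the segment length are all hypothetical; a real system would call actual encoders and a muxer.

```python
from concurrent.futures import ThreadPoolExecutor

def encode(frames, tag):
    """Stand-in for an encoder/transcoder: labels each frame."""
    return [f"{tag}({frame})" for frame in frames]

def split_segments(frames, seg_len):
    """Split the single video file into consecutive segments."""
    return [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]

def transcode(av_file, seg_len=2, workers=4):
    video, audio = av_file["video"], av_file["audio"]  # split into two files
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The audio job runs concurrently with the per-segment video jobs.
        audio_job = pool.submit(encode, audio, "a")
        video_jobs = [pool.submit(encode, seg, "v")
                      for seg in split_segments(video, seg_len)]
        encoded_audio = audio_job.result()
        # Collect results in submission order so segments stay in sequence.
        encoded_video = [f for job in video_jobs for f in job.result()]
    # Assemble the encoded video segments with the encoded audio.
    return {"video": encoded_video, "audio": encoded_audio}
```

Results are gathered in submission order rather than completion order, so the assembled video keeps its original segment sequence regardless of which encoder finishes first.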

LOUDNESS CONTROL METHODS AND DEVICES

Audio data in a first format may be processed to produce audio data in a second format, which may be a reduced or simplified version of the first format. A loudness correction process may produce loudness-corrected audio data in the second format. A first power of the audio data in the second format and a second power of the loudness-corrected audio data in the second format may be determined. A second-format loudness correction factor for the audio data in the second format may be based, at least in part, on a power ratio between the first power and the second power. A first-format loudness correction factor for the audio data in the first format may be based, at least in part, on the power ratio and a power relationship between the audio data in the first format and the audio data in the second format.
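
A worked example of the second-format step, as a sketch under stated assumptions: power is taken as the mean of the squared samples, and the ratio is formed as corrected power over uncorrected power. The abstract fixes neither choice, and it does not specify how the power ratio is combined with the first-format power relationship, so that step is left out here. The function names are hypothetical.

```python
import math

def mean_power(samples):
    """Mean-square power of a block of samples (assumed power measure)."""
    return sum(x * x for x in samples) / len(samples)

def second_format_correction(audio, corrected):
    """Return the power ratio and its value in dB.

    audio:     second-format audio before loudness correction
    corrected: the same audio after the loudness correction process
    """
    first_power = mean_power(audio)
    second_power = mean_power(corrected)
    ratio = second_power / first_power  # assumed direction of the ratio
    return ratio, 10 * math.log10(ratio)
```

For instance, if loudness correction doubles every sample amplitude, the power ratio is 4, which corresponds to a correction of about 6.02 dB.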

METHOD AND SYSTEM FOR PARALLEL AUDIO TRANSCODING

A parallel audio transcoding method includes splitting audio into segments of a certain length; performing parallel transcoding by allocating the split segments to a plurality of encoders; and concatenating the segments encoded through the parallel transcoding and merging them into a single encoded file. Performing parallel transcoding includes adding to each split segment additional regions that neighbor and overlap it, and sending the extended segments to the encoders. Merging includes cutting out the additional regions from the encoded stream to create a stream corresponding to the split segments.
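
The split, overlap, and trim steps can be sketched as follows. This is an illustration, not the patented method: a three-tap moving average stands in for an encoder whose output depends on neighboring samples, and the trimming is done on samples rather than on an encoded stream. All function names and parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def smooth(samples):
    """Stand-in encoder with memory: 3-tap moving average, edges clamped."""
    n = len(samples)
    return [(samples[max(i - 1, 0)] + samples[i] + samples[min(i + 1, n - 1)]) / 3
            for i in range(n)]

def split_with_overlap(samples, seg_len, overlap):
    """Return (padded_segment, lead, body_len) for each segment.

    lead is the number of extra samples added before the segment body.
    """
    chunks = []
    for start in range(0, len(samples), seg_len):
        end = min(start + seg_len, len(samples))
        lo = max(start - overlap, 0)
        hi = min(end + overlap, len(samples))
        chunks.append((samples[lo:hi], start - lo, end - start))
    return chunks

def parallel_transcode(samples, seg_len, overlap, workers=4):
    chunks = split_with_overlap(samples, seg_len, overlap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(lambda chunk: smooth(chunk[0]), chunks))
    merged = []
    for out, (_, lead, body_len) in zip(encoded, chunks):
        # Cut out the additional regions, keeping only the segment body.
        merged.extend(out[lead:lead + body_len])
    return merged
```

With an overlap at least as long as the stand-in encoder's memory (one sample here), the merged result matches encoding the whole signal in one pass; with no overlap, the samples at segment boundaries differ.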