Patent classifications
G10L19/167
AUDIO ENCODER AND DECODER WITH DYNAMIC RANGE COMPRESSION METADATA
An audio processing unit (APU) is disclosed. The APU includes a buffer memory configured to store at least one frame of an encoded audio bitstream, where the encoded audio bitstream includes audio data and a metadata container. The metadata container includes a header and one or more metadata payloads after the header. The one or more metadata payloads include dynamic range compression (DRC) metadata, and the DRC metadata is or includes profile metadata indicative of whether the DRC metadata includes DRC control values for use in performing dynamic range compression, in accordance with at least one compression profile, on audio content indicated by at least one block of the audio data.
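The container structure described above (a header followed by metadata payloads, with a profile indicator) can be sketched as a simple parser. The byte layout, field widths, and flag position below are illustrative assumptions, not the encoded format from the disclosure:

```python
import struct

def parse_metadata_container(frame: bytes):
    """Parse a hypothetical metadata container: a 4-byte header
    (big-endian payload count and flags word) followed by
    length-prefixed metadata payloads."""
    count, flags = struct.unpack_from(">HH", frame, 0)
    offset = 4
    payloads = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", frame, offset)
        offset += 2
        payloads.append(frame[offset:offset + length])
        offset += length
    # Bit 0 of the flags word stands in for the profile metadata:
    # whether the DRC metadata carries per-profile control values.
    has_profile_drc = bool(flags & 0x1)
    return has_profile_drc, payloads
```

A decoder would branch on `has_profile_drc` to decide whether compression-profile control values are present before applying DRC to the associated audio blocks.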
TRANSMISSION-AGNOSTIC PRESENTATION-BASED PROGRAM LOUDNESS
This disclosure falls within the field of audio coding; in particular, it relates to providing a framework for loudness consistency among differing audio output signals. More specifically, the disclosure relates to methods, computer program products, and apparatus for encoding and decoding audio data bitstreams in order to attain a desired loudness level of an output audio signal.
System for maintaining reversible dynamic range control information associated with parametric audio coders
On the basis of a bitstream (P), an n-channel audio signal (X) is reconstructed by deriving an m-channel core signal (Y) and multichannel coding parameters (α) from the bitstream, where 1≤m<n. Also derived from the bitstream are pre-processing dynamic range control, DRC, parameters (DRC2) quantifying an encoder-side dynamic range limiting of the core signal. The n-channel audio signal is obtained by parametric synthesis in accordance with the multichannel coding parameters and while cancelling any encoder-side dynamic range limiting based on the pre-processing DRC parameters. In particular embodiments, the reconstruction further includes use of compensated post-processing DRC parameters quantifying a potential decoder-side dynamic range compression. Cancellation of an encoder-side range limitation and range compression are preferably performed by different decoder-side components. Cancellation and compression may be coordinated by a DRC pre-processor.
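The cancellation and compression steps above can be illustrated with a toy gain model, where the encoder-side limiting is undone by inverting the signalled DRC2 gains and the decoder-side compression is a separate component. The per-sample gain representation and function names are assumptions for illustration; real codecs signal DRC gains per block and interpolate:

```python
def cancel_encoder_drc(core, drc2_gains):
    """Undo encoder-side dynamic range limiting of the core signal by
    applying the inverse of the signalled DRC2 gains (per-sample here
    for simplicity)."""
    return [x / g for x, g in zip(core, drc2_gains)]

def apply_post_drc(signal, post_gains):
    """Decoder-side dynamic range compression, performed by a
    different component than the cancellation above, as the abstract
    suggests (coordination by a DRC pre-processor)."""
    return [x * g for x, g in zip(signal, post_gains)]
```

Keeping cancellation and compression in separate components, as described, lets the decoder restore the original dynamic range exactly before any playback-specific compression is applied.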
METHODS AND SYSTEMS FOR GENERATING AND RENDERING OBJECT BASED AUDIO WITH CONDITIONAL RENDERING METADATA
Methods and audio processing units are disclosed for generating an object based audio program including conditional rendering metadata corresponding to at least one object channel of the program, where the conditional rendering metadata is indicative of at least one rendering constraint, based on playback speaker array configuration, that applies to each corresponding object channel. Also disclosed are methods for rendering audio content determined by such a program, including rendering content of at least one audio channel of the program in a manner compliant with each applicable rendering constraint, in response to at least some of the conditional rendering metadata. Rendering a selected mix of the program's content may provide an immersive experience.
SIGNAL PROCESSING DEVICE, METHOD, AND PROGRAM
The present technology relates to a signal processing device, a method, and a program capable of improving transmission efficiency and reducing the amount of data processing. The signal processing device includes: an acquisition unit that acquires polar coordinate position information indicating a position of a first object expressed in polar coordinates, audio data of the first object, absolute coordinate position information indicating a position of a second object expressed in absolute coordinates, and audio data of the second object; a coordinate conversion unit that converts the absolute coordinate position information into polar coordinate position information indicating the position of the second object; and a rendering processing unit that performs rendering processing on the basis of the polar coordinate position information and audio data of the first object and the polar coordinate position information and audio data of the second object. The present technology can be applied to a content reproduction system.
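The coordinate conversion unit's role can be sketched as a Cartesian-to-polar transform, so that both objects reach the renderer in the same polar representation. The axis convention and angle signs below are assumptions made for illustration; the abstract does not fix a specific convention:

```python
import math

def absolute_to_polar(x, y, z):
    """Convert absolute (Cartesian) object coordinates to polar
    (azimuth, elevation, radius). Assumed convention: y points to the
    front, x to the right, z up; azimuth in degrees, positive toward
    the left, with 0 degrees straight ahead."""
    radius = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(-x, y))
    elevation = math.degrees(math.asin(z / radius)) if radius > 0 else 0.0
    return azimuth, elevation, radius
```

After this conversion, a single polar-coordinate rendering path can process the first object's acquired position and the second object's converted position uniformly, which is the efficiency gain the abstract points to.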
Multi-stride packet payload mapping for robust transmission of data
Systems and methods for packet payload mapping for robust transmission of data are described. For example, methods may include: receiving, using a network interface, packets that each include a primary frame from a sequence of frames of data and one or more preceding frames that are separated from the primary frame in the sequence by respective multiples of a stride parameter; storing the frames of the packets in a buffer whose entries each hold the primary frame and the one or more preceding frames of a packet; reading a first frame from the buffer as the primary frame from one of the entries; determining that a packet whose primary frame is the next frame in the sequence has been lost; and, responsive to the determination, reading the next frame from the buffer as a preceding frame from one of the entries.
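The stride-based redundancy and loss recovery described above can be sketched as follows. The packet structure, stride, and redundancy depth are hypothetical values chosen for illustration:

```python
STRIDE = 2  # hypothetical stride parameter
DEPTH = 2   # hypothetical number of preceding frames carried per packet

def make_packet(frames, i):
    """Packet whose primary frame is frames[i] also carries the
    preceding frames at indices i - k*STRIDE, k = 1..DEPTH."""
    preceding = [frames[i - k * STRIDE]
                 for k in range(1, DEPTH + 1) if i - k * STRIDE >= 0]
    return {"primary_index": i, "primary": frames[i], "preceding": preceding}

def recover(buffer, j):
    """If the packet carrying frame j as its primary was lost, search
    buffered packets for one that carries frame j as a preceding copy."""
    for pkt in buffer:
        k = (pkt["primary_index"] - j) // STRIDE
        if 1 <= k <= DEPTH and pkt["primary_index"] - k * STRIDE == j:
            return pkt["preceding"][k - 1]
    return None  # frame j is unrecoverable from this buffer
```

Because the redundant copies are spaced STRIDE packets apart, a burst loss shorter than STRIDE packets can still be repaired from later-arriving packets, which is the robustness property the mapping targets.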
SYSTEMS, METHODS AND APPARATUS FOR CONVERSION FROM CHANNEL-BASED AUDIO TO OBJECT-BASED AUDIO
Embodiments are disclosed for channel-based audio (CBA) (e.g., 22.2-ch audio) to object-based audio (OBA) conversion. The conversion includes converting CBA metadata to object audio metadata (OAMD) and reordering the CBA channels based on channel shuffle information derived in accordance with channel ordering constraints of the OAMD. The OBA with reordered channels is rendered in a playback device using the OAMD or in a source device, such as a set-top box or audio/video recorder. In an embodiment, the CBA metadata includes signaling that indicates a specific OAMD representation to be used in the conversion of the metadata. In an embodiment, pre-computed OAMD is transmitted in a native audio bitstream (e.g., AAC) for transmission (e.g., over HDMI) or for rendering in a source device. In an embodiment, pre-computed OAMD is transmitted in a transport layer bitstream (e.g., ISO BMFF, MPEG4 audio bitstream) to a playback device or source device.
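The channel reordering step can be illustrated with a toy shuffle. The 5.1 channel orderings and the way the shuffle is derived below are illustrative assumptions, not the actual OAMD channel ordering constraints:

```python
# Hypothetical source (channel-based) and target (OAMD-constrained)
# channel orderings for a 5.1 layout.
CBA_ORDER = ["L", "R", "C", "LFE", "Ls", "Rs"]
OAMD_ORDER = ["L", "R", "C", "Ls", "Rs", "LFE"]

def derive_shuffle(src, dst):
    """Channel shuffle information: for each output slot, the index
    of the source channel that fills it."""
    return [src.index(name) for name in dst]

def reorder_channels(blocks, shuffle):
    """Apply the shuffle to one frame's worth of per-channel blocks."""
    return [blocks[i] for i in shuffle]
```

The shuffle is computed once from the two orderings and then applied per frame, so the per-frame cost is a simple permutation of channel blocks.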
Speech encoding using a pre-encoded database
Methods, systems, and devices for encoding are described. A device, which may be otherwise known as user equipment (UE), may support standards-compatible audio encoding (e.g., speech encoding) using a pre-encoded database. The device may receive a digital representation of an audio signal and identify, based on receiving the digital representation of the audio signal, a database that is pre-encoded according to a coding standard and that includes a quantity of digital representations of other audio signals. The device may encode the digital representation of the audio signal using a machine learning scheme and information from the database pre-encoded according to the coding standard. The device may generate a bitstream of the digital representation that is compatible with the coding standard based on encoding the digital representation of the audio signal, and output a representation of the bitstream.
TRANSMISSION DEVICE, TRANSMISSION METHOD, RECEPTION DEVICE, AND RECEPTION METHOD
The processing load on the receiving side is reduced when a plurality of classes of audio data is transmitted. A predetermined number of audio streams including coded data of a plurality of groups is generated, and a container of a predetermined format carrying this predetermined number of audio streams is transmitted. Command information for creating a command that specifies a group to be decoded from among the plurality of groups is inserted into the container and/or the audio stream. For example, a command insertion area, into which the receiving side inserts a command specifying a group to be decoded, is provided in at least one of the predetermined number of audio streams.
Integration of high frequency reconstruction techniques with reduced post-processing delay
A method for decoding an encoded audio bitstream is disclosed. The method includes receiving the encoded audio bitstream and decoding the audio data to generate a decoded lowband audio signal. The method further includes extracting high frequency reconstruction metadata and filtering the decoded lowband audio signal with an analysis filterbank to generate a filtered lowband audio signal. The method also includes extracting a flag indicating whether either spectral translation or harmonic transposition is to be performed on the audio data and regenerating a highband portion of the audio signal using the filtered lowband audio signal and the high frequency reconstruction metadata in accordance with the flag. The high frequency regeneration is performed as a post-processing operation with a delay of 3010 samples per audio channel.