Patent classifications
G10L19/167
Inter-channel bandwidth extension spectral mapping and adjustment
A method includes generating a synthesized non-reference high-band channel based on a non-reference high-band excitation corresponding to a non-reference target channel. The method further includes estimating one or more spectral mapping parameters based on the synthesized non-reference high-band channel and a high-band portion of the non-reference target channel. The method also includes applying the one or more spectral mapping parameters to the synthesized non-reference high-band channel to generate a spectrally shaped synthesized non-reference high-band channel. The method further includes generating an encoded bitstream based on the one or more spectral mapping parameters and the spectrally shaped synthesized non-reference high-band channel.
Methods and systems for encoding frequency-domain data
An illustrative frequency-domain encoder system transforms time-domain data representative of a content instance into frequency-domain data representative of the content instance. The frequency-domain data includes a plurality of complex coefficients each representing different frequency components of a plurality of frequency components incorporated by the content instance. The frequency-domain encoder system generates a frequency-domain data container that includes the complex coefficients of the frequency-domain data and metadata descriptive of the frequency-domain data. Additionally, within the frequency-domain data container, the frequency-domain encoder system integrates the complex coefficients of the frequency-domain data with timing data representative of a time-dependent feature of the content instance. Corresponding systems and methods are also disclosed.
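As a rough illustration of the container idea in the abstract above, the following Python sketch transforms a short time-domain frame into complex coefficients and stores them alongside metadata and timing data. The field names (`metadata`, `timing`, `coefficients`), the use of a plain DFT, and the JSON serialization are all assumptions made for the example; the abstract does not specify a transform or a container layout.

```python
import cmath
import json


def dft(samples):
    """Naive O(n^2) discrete Fourier transform returning complex coefficients."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def build_container(samples, sample_rate_hz, frame_start_s):
    """Pack coefficients, descriptive metadata, and timing data into one dict.

    The layout is hypothetical; it only mirrors the three ingredients
    named in the abstract.
    """
    coeffs = dft(samples)
    return {
        "metadata": {"sample_rate_hz": sample_rate_hz, "num_bins": len(coeffs)},
        "timing": {
            "frame_start_s": frame_start_s,
            "frame_duration_s": len(samples) / sample_rate_hz,
        },
        # JSON has no complex type, so each coefficient is stored as [real, imag].
        "coefficients": [[c.real, c.imag] for c in coeffs],
    }


container = build_container([1.0, 0.0, -1.0, 0.0], sample_rate_hz=8000, frame_start_s=0.0)
serialized = json.dumps(container)
```

Keeping the timing data in the same structure as the coefficients means a reader of the container can place each frame on a timeline without consulting a separate stream.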
Methods and devices for encoding and/or decoding immersive audio signals
The present document describes a method (700) for encoding a multi-channel input signal (201). The method (700) comprises determining (701) a plurality of downmix channel signals (203) from the multi-channel input signal (201) and performing (702) energy compaction of the plurality of downmix channel signals (203) to provide a plurality of compacted channel signals (404). Furthermore, the method (700) comprises determining (703) joint coding metadata (205) based on the plurality of compacted channel signals (404) and based on the multi-channel input signal (201), wherein the joint coding metadata (205) is such that it allows upmixing of the plurality of compacted channel signals (404) to an approximation of the multi-channel input signal (201). In addition, the method (700) comprises encoding (704) the plurality of compacted channel signals (404) and the joint coding metadata (205).
Systems and methods of audio decoder determination and selection
Playback devices can support audio encoded using various encoding schemes. Playing back such content includes receiving, at a playback device, audio data from an audio source, and receiving an indication from the audio source that the audio data is encoded in a compressed audio format. The device determines, independently of the indication received from the audio source, whether the audio data is encoded in the compressed audio format. If the audio data is determined to be encoded in the compressed audio format, the device selects a decoder from among a plurality of decoders, decodes the audio data using the selected decoder, and plays back the decoded audio data. If the audio data is determined not to be encoded in the compressed audio format, the device inhibits playback of the audio data.
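The decision flow in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the `DECODERS` table, and the placeholder decoder outputs are hypothetical, and the magic-byte checks (`fLaC` for a FLAC stream, `OggS` for an Ogg page) merely stand in for whatever independent detection a real device performs.

```python
def detect_compressed_format(audio_data: bytes):
    """Inspect the payload itself and return a format name, or None.

    Only the leading bytes are checked; this is an illustrative stand-in
    for the device's own format determination.
    """
    if audio_data[:4] == b"fLaC":
        return "flac"
    if audio_data[:4] == b"OggS":
        return "ogg"
    return None


# Hypothetical decoder table; each entry returns placeholder PCM bytes.
DECODERS = {
    "flac": lambda data: b"pcm-from-flac",
    "ogg": lambda data: b"pcm-from-ogg",
}


def handle_audio(audio_data: bytes, source_says_compressed: bool):
    """Decide on playback without relying on the source's indication."""
    # source_says_compressed is received but deliberately not consulted:
    # the determination is made independently of it.
    detected = detect_compressed_format(audio_data)
    if detected is None:
        return ("inhibited", None)
    decoder = DECODERS[detected]
    return ("played", decoder(audio_data))


played = handle_audio(b"fLaC\x00\x00\x00\x22", source_says_compressed=True)
blocked = handle_audio(b"RIFF\x00\x00\x00\x00", source_says_compressed=True)
```

In the second call the source claims the data is compressed, but the payload check disagrees, so playback is inhibited.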
Concept for combined dynamic range compression and guided clipping prevention for audio devices
The invention provides a concept for combined dynamic range compression and guided clipping prevention for audio devices. An audio decoder for decoding an audio bitstream and a metadata bitstream related to the audio bitstream according to the concept includes an audio processing chain including a plurality of adjustment stages including a dynamic range control stage for adjusting a dynamic range of the audio output signal and a guided clipping prevention stage for preventing clipping of the audio output signal; and a metadata decoder configured to receive the metadata bitstream and to extract dynamic range control gain sequences and guided clipping prevention gain sequences from the metadata bitstream, at least a part of the dynamic range control gain sequences being supplied to the dynamic range control stage, and at least a part of the guided clipping prevention gain sequences being supplied to the guided clipping prevention stage.
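The two-stage chain described above can be illustrated with a small Python sketch. It assumes per-sample gains expressed in decibels and a dictionary-shaped metadata record with hypothetical keys `drc_gains_db` and `gcp_gains_db`; the abstract does not define the gain resolution or the bitstream syntax, so these are placeholders only.

```python
def apply_gain_sequence(samples, gains_db):
    """Scale each sample by its matching gain, given in decibels."""
    return [s * 10 ** (g / 20) for s, g in zip(samples, gains_db)]


def decode_metadata(metadata):
    """Extract the two gain sequences from a (hypothetical) metadata record."""
    return metadata["drc_gains_db"], metadata["gcp_gains_db"]


def process(samples, metadata):
    """Run the dynamic range control stage, then the clipping prevention stage."""
    drc_gains, gcp_gains = decode_metadata(metadata)
    after_drc = apply_gain_sequence(samples, drc_gains)
    return apply_gain_sequence(after_drc, gcp_gains)


metadata = {
    "drc_gains_db": [6.0, 6.0, 6.0],   # boost applied by dynamic range control
    "gcp_gains_db": [0.0, -6.0, 0.0],  # attenuation only where the boost would clip
}
output = process([0.5, 0.9, 0.5], metadata)
```

Here the 6 dB boost alone would push the middle sample above full scale, and the guided clipping prevention gain pulls that one sample back while leaving the others untouched.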
MEDIA DISTRIBUTION DEVICE, MEDIA DISTRIBUTION METHOD, AND PROGRAM
The present disclosure relates to a media distribution device, a media distribution method, and a program that enable a guide voice to be generated more appropriately. A media distribution device includes a guide voice generation unit that generates a guide voice describing a rendered image viewed from a viewpoint in a virtual space by using a scene description, which is information describing a scene in the virtual space, and user viewpoint information indicating a position and a direction of the viewpoint of a user; and an audio encoding unit that mixes the guide voice with original audio, and encodes the guide voice. The present technology can be applied to, for example, a media distribution system that distributes 6DoF media.
Optimized audio forwarding
Methods and systems for optimizing the routing of audio data to audio transmitting devices using a Bluetooth network are disclosed. One method includes receiving, at a first speaker of an audio rendering system, an encoded audio bitstream comprising a first and a second audio channel; separating a first set of spectral components of the first audio channel and a second set of spectral components of the second audio channel from the encoded audio bitstream, without decoding the audio bitstream; generating a first encoded bitstream from the first set of spectral components; and forwarding the first encoded bitstream to a second speaker of the audio rendering system over a wireless link.
ROBUST RETRANSMISSION TOPOLOGIES USING ERROR CORRECTION
Methods and systems for improving the robustness of wireless communications. The methods and systems provided transmit a plurality of data packets over a first isochronous stream and transmit one or more supplemental data packets over the same time intervals. The one or more supplemental data packets are used to re-create and/or enhance at least a portion of one or more data packets of the plurality of data packets that have already been sent. Alternatively, the one or more supplemental data packets are used to create and/or enhance at least a portion of one or more data packets of the plurality of data packets that will be received during the next isochronous intervals. The methods and systems described herein increase robustness by allowing for better retransmission with correctly received packets. The methods set forth herein work with any Bluetooth broadcaster sink without modification.
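One way to picture a supplemental packet that can re-create an already-sent packet is simple XOR parity, sketched below in Python. This is an assumption chosen for illustration: the abstract does not say which error-correction scheme is used, and the function names and equal-length packets are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def xor_parity(packets):
    """Build one supplemental packet as the XOR of all data packets.

    Assumes every packet has the same length.
    """
    parity = bytes(len(packets[0]))
    for packet in packets:
        parity = xor_bytes(parity, packet)
    return parity


def recover_missing(received, parity):
    """Rebuild the single lost packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost packet")
    rebuilt = parity
    for packet in received:
        if packet is not None:
            rebuilt = xor_bytes(rebuilt, packet)
    restored = list(received)
    restored[missing[0]] = rebuilt
    return restored


packets = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(packets)
restored = recover_missing([b"abcd", None, b"ijkl"], parity)
```

A receiver that missed the second packet but received the others and the parity packet can rebuild it locally, without asking the sender to retransmit.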
LAYERED DESCRIPTION OF SPACE OF INTEREST
Aspects of the disclosure provide methods and apparatuses for audio processing. In some examples, an apparatus for media processing includes processing circuitry. The processing circuitry receives audio inputs associated with a layered description for a space of interest in an audio scene. The space of interest includes a plurality of subspaces. The layered description includes a first layer and a second layer. The first layer has a common node with a first value that is a common attribute value of two or more subspaces in the plurality of subspaces. The second layer has individual nodes respectively associated with each of the plurality of subspaces. The processing circuitry determines the plurality of subspaces of the space of interest based on the layered description, and renders an audio output based on the audio inputs in response to a location of a subject of the audio scene being in the space of interest.
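The layered description can be pictured with a small Python sketch. It assumes, purely for illustration, that subspaces are axis-aligned boxes, that the common node carries a shared height range, and that each individual node carries its own horizontal extent; the `Box` class, the dictionary keys, and the averaging mix are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned subspace given as (low, high) ranges per axis."""
    x: tuple
    y: tuple
    z: tuple

    def contains(self, point):
        ranges = (self.x, self.y, self.z)
        return all(lo <= v <= hi for v, (lo, hi) in zip(point, ranges))


def resolve_subspaces(description):
    """Combine the common node's attributes with each individual node."""
    common = description["common"]
    return [Box(**{**common, **node}) for node in description["individual"]]


def render(audio_inputs, location, description):
    """Mix the inputs only when the subject is inside the space of interest."""
    subspaces = resolve_subspaces(description)
    if not any(box.contains(location) for box in subspaces):
        return None
    # Placeholder rendering: average the inputs sample by sample.
    return [sum(frame) / len(frame) for frame in zip(*audio_inputs)]


description = {
    "common": {"z": (0.0, 3.0)},  # first layer: height shared by all subspaces
    "individual": [               # second layer: one node per subspace
        {"x": (0.0, 5.0), "y": (0.0, 5.0)},
        {"x": (5.0, 10.0), "y": (0.0, 2.0)},
    ],
}
inside = render([[1.0, 0.0], [0.0, 1.0]], (6.0, 1.0, 1.0), description)
outside = render([[1.0, 0.0], [0.0, 1.0]], (6.0, 3.0, 1.0), description)
```

Storing the shared height once in the common node avoids repeating it in every individual node, which is the kind of saving a layered description is meant to provide.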
AUTOCORRECTION OF PRONUNCIATIONS OF KEYWORDS IN AUDIO/VIDEOCONFERENCES
The present disclosure relates to automatically correcting mispronounced keywords during a conference session. More particularly, the present invention provides methods and systems for automatically correcting audio data generated from audio input having indications of mispronounced keywords in an audio/videoconferencing system. In some embodiments, the process of automatically correcting the audio data may require a re-encoding process of the audio data at the conference server. In alternative embodiments, the process may require updating the audio data at the receiver end of the conferencing system.