Patent classifications
G10L19/167
INFORMATION EXCHANGE ON MOBILE DEVICES USING AUDIO
In some implementations, a user device may receive input that triggers transmission of information via sound. The user device may select an audio clip based on a setting associated with the device, and may modify a digital representation of the selected audio clip using an encoding algorithm and based on data associated with a user of the device. The user device may transmit, to a remote server, an indication of the selected audio clip, an indication of the encoding algorithm, and the data associated with the user. The user device may use a speaker to play audio, based on the modified digital representation, for recording by other devices. Accordingly, the user device may receive, from the remote server and based on the speaker playing the audio, a confirmation that users associated with the other devices have performed an action based on the data associated with the user of the device.
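The abstract above does not specify the encoding algorithm used to modify the clip's digital representation; as a hedged illustration only, one simple way to carry user data in an audio clip is to overwrite the least-significant bits of its PCM samples. The function names and the LSB scheme below are assumptions, not the patented method.

```python
# Hypothetical sketch: embed user data bits into the least-significant
# bits of 16-bit PCM samples of a selected audio clip. The LSB scheme
# is an illustration; the patent does not disclose its actual encoder.

def encode_data_into_clip(samples, data: bytes):
    """Return a modified copy of `samples` carrying `data` in sample LSBs."""
    bits = [(byte >> i) & 1 for byte in data for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("clip too short for payload")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # overwrite the LSB with a data bit
    return out

def decode_data_from_clip(samples, n_bytes: int):
    """Recover `n_bytes` of payload from the sample LSBs."""
    data = bytearray()
    for b in range(n_bytes):
        byte = 0
        for i in range(8):
            byte |= (samples[b * 8 + i] & 1) << i
        data.append(byte)
    return bytes(data)
```

A receiving device recording the played audio would run the inverse step to recover the payload, alongside the server-side confirmation flow the abstract describes.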
Using metadata to aggregate signal processing operations
A technique including receiving and decoding a coded bitstream encoded with audio content and with audio metadata corresponding to the audio content, the audio content including first audio objects corresponding to a first media content type of two consecutive media content types and second audio objects corresponding to a second media content type of the two consecutive media content types. The audio metadata includes first and second audio object gains, for the first and second audio objects, generated in part based on a first fading curve of the first media content type and a second fading curve of the second media content type, respectively. The technique further includes applying the first and second audio object gains to the first and second audio objects, and rendering a sound field represented by the first audio object with the applied first audio object gain and the second audio object with the applied second audio object gain.
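As a minimal sketch of the gain-application step above (not the patented coder), per-object gains derived from the fading curves of two consecutive content types can be applied sample-wise before the objects are summed into a sound field. The linear curves and function names are assumptions for illustration.

```python
# Illustrative sketch: gains from a fade-out curve (outgoing content
# type) and a fade-in curve (incoming content type) are applied to the
# two audio objects, which are then mixed into one rendered signal.

def fade_out_gain(t, duration):
    """Linear fade-out curve for the first (outgoing) content type."""
    return max(0.0, 1.0 - t / duration)

def fade_in_gain(t, duration):
    """Linear fade-in curve for the second (incoming) content type."""
    return min(1.0, t / duration)

def render(first_obj, second_obj, t, duration):
    """Apply the two gains sample-wise and sum the objects."""
    g1, g2 = fade_out_gain(t, duration), fade_in_gain(t, duration)
    return [g1 * a + g2 * b for a, b in zip(first_obj, second_obj)]
```

In the patented scheme the gains travel as metadata in the bitstream, so the decoder applies them without re-evaluating the curves.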
Switching Binaural Sound
A method provides binaural sound to a person through electronic earphones. The binaural sound localizes to a sound localization point (SLP) in empty space that is away from but proximate to the person. When an event occurs, the binaural sound switches or changes to stereo sound, to mono sound, or to altered binaural sound.
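The switch from binaural to mono described above can be sketched as a renderer fallback: on an event, both ears receive the same signal, so the sound no longer localizes to the SLP. This is an illustration under assumed names, not the patent's implementation.

```python
# Illustrative sketch: on an event the renderer collapses binaural
# (distinct left/right) output to mono (identical channels).

def to_mono(left, right):
    """Average the binaural channels into a single mono signal."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def render_frame(left, right, event_occurred: bool):
    """Pass binaural audio through, or collapse it to mono on an event."""
    if event_occurred:
        m = to_mono(left, right)
        return m, m          # same signal on both ears: no localization
    return left, right       # unchanged binaural frame
```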
Correlating scene-based audio data for psychoacoustic audio coding
In general, techniques are described by which to correlate scene-based audio data for psychoacoustic audio coding. A device comprising a memory and one or more processors may be configured to perform the techniques. The memory may store a bitstream including a plurality of encoded correlated components of a soundfield represented by scene-based audio data. The one or more processors may perform psychoacoustic audio decoding with respect to one or more of the plurality of encoded correlated components to obtain a plurality of correlated components, and obtain, from the bitstream, an indication representative of how the one or more of the plurality of correlated components were reordered in the bitstream. The one or more processors may reorder, based on the indication, the plurality of correlated components to obtain a plurality of reordered components, and reconstruct, based on the plurality of reordered components, the scene-based audio data.
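The decoder-side reordering step above can be sketched by modeling the bitstream's "indication" as a permutation that the decoder inverts; the concrete representation in the actual bitstream is not specified here and is an assumption.

```python
# Sketch of decoder-side reordering: the bitstream carries an
# indication of how the correlated components were reordered, modeled
# here as a permutation, and the decoder undoes it before
# reconstructing the scene-based audio data.

def reorder_components(components, indication):
    """Undo the encoder-side reordering.

    `indication[i]` gives the original position of decoded component i.
    """
    restored = [None] * len(components)
    for i, original_pos in enumerate(indication):
        restored[original_pos] = components[i]
    return restored
```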
BITRATE DISTRIBUTION IN IMMERSIVE VOICE AND AUDIO SERVICES
Embodiments are disclosed for bitrate distribution in immersive voice and audio services. In an embodiment, a method of encoding an IVAS bitstream comprises: receiving an input audio signal; downmixing the input audio signal into one or more downmix channels and spatial metadata; reading a set of one or more bitrates for the downmix channels and a set of metadata quantization levels from a bitrate distribution control table; determining a combination of the one or more bitrates for the downmix channels; determining a metadata quantization level from the set of metadata quantization levels using a bitrate distribution process; quantizing and coding the spatial metadata using the metadata quantization level; generating, using the combination of one or more bitrates, a downmix bitstream for the one or more downmix channels; and combining the downmix bitstream, the quantized and coded spatial metadata, and the set of quantization levels into the IVAS bitstream.
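A hedged sketch of the table-driven distribution step: for a total bitrate, a control table lists candidate downmix-channel bitrate combinations and metadata quantization levels, and the process picks a feasible pair. Every table value and the cost model below are invented for illustration; they are not IVAS figures.

```python
# Illustrative bitrate distribution control table (all values invented):
# total_kbps -> (candidate downmix bitrate combos in kbps, quant levels)
BITRATE_TABLE = {
    32: ([(24.4,)], [3, 4]),
    64: ([(32.0, 24.4), (48.0, 13.2)], [4, 5]),
}

def distribute_bitrate(total_kbps, metadata_cost_per_level=1.0):
    """Choose a downmix bitrate combination and a metadata quantization
    level whose combined cost fits within the total bitrate budget."""
    combos, levels = BITRATE_TABLE[total_kbps]
    for combo in combos:
        for level in sorted(levels, reverse=True):  # prefer finer metadata
            if sum(combo) + level * metadata_cost_per_level <= total_kbps:
                return combo, level
    raise ValueError("no feasible distribution")
```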
INFORMATION PROCESSING APPARATUS, REPRODUCTION PROCESSING APPARATUS, AND INFORMATION PROCESSING METHOD
There is provided an information processing apparatus, a reproduction processing apparatus, and an information processing method that improve data transmission efficiency. A preprocessing unit (102) generates, as scene configuration information indicating a configuration of a scene of 6DoF content including a three-dimensional object in a three-dimensional space, dynamic scene configuration information that changes over time and static scene configuration information that does not change over time, the static scene configuration information being scene configuration information different from the dynamic scene configuration information.
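The efficiency gain described above comes from separating scene configuration that never changes (sent once) from configuration that changes over time (sent per update). A minimal sketch, with field names that are assumptions rather than anything from the disclosure:

```python
# Illustrative split of a 6DoF scene description into static
# configuration (transmitted once) and dynamic configuration
# (transmitted on change). Field names are assumed for illustration.

STATIC_FIELDS = {"object_id", "mesh", "audio_source"}

def split_scene_config(scene):
    """Partition a scene description dict into static and dynamic parts."""
    static = {k: v for k, v in scene.items() if k in STATIC_FIELDS}
    dynamic = {k: v for k, v in scene.items() if k not in STATIC_FIELDS}
    return static, dynamic
```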
Apparatus and method for screen related audio object remapping
An apparatus for generating loudspeaker signals includes an object renderer configured to receive an audio object and to generate the loudspeaker signals depending on the audio object and on position information. The apparatus further includes an object metadata processor configured to receive metadata indicating a first position of the audio object, to calculate a second position of the audio object depending on the first position and on a size of a screen if the audio object is indicated in the metadata as being screen-related, to feed the first position of the audio object as the position information into the object renderer if the audio object is indicated in the metadata as being not screen-related, and to feed the second position of the audio object as the position information into the object renderer if the audio object is indicated in the metadata as being screen-related.
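As a sketch of the remapping decision, the second position can be derived by rescaling the object's azimuth from a nominal reference screen to the reproduction screen; a non-screen-related object passes its first position through unchanged. The linear mapping and the reference half-width are assumptions for illustration.

```python
# Illustrative screen-related remapping: a flagged object's azimuth is
# rescaled to the reproduction screen's width; otherwise the original
# (first) position is fed to the object renderer unchanged.

NOMINAL_HALF_WIDTH_DEG = 29.0   # assumed reference screen half-width

def remap_position(azimuth_deg, screen_related, screen_half_width_deg):
    """Return the position information fed to the object renderer."""
    if not screen_related:
        return azimuth_deg                      # first position, unchanged
    scale = screen_half_width_deg / NOMINAL_HALF_WIDTH_DEG
    return azimuth_deg * scale                  # second, screen-scaled position
```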
AUDIO PACKET LOSS CONCEALMENT VIA PACKET REPLICATION AT DECODER INPUT
A system includes a server to generate a real-time stream of audio packets and a client device to decode and playback the audio content of the stream. The client device includes a network interface configured to receive a stream of audio packets via a network and a buffer configured to temporarily buffer a subset of audio packets of the stream. The client device further includes an audio decoder having an input to receive audio packets from the buffer and an output to provide corresponding segments of a decoded audio data stream. The client device also includes a stream monitoring module configured to provide an audio packet of the subset in the buffer which was previously decoded by the decoder to the input of the decoder again for a repeated decoding in place of a decoding of an audio packet that is lost or late.
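The core of the concealment scheme above is that a lost or late packet is replaced at the decoder *input* by re-feeding the last successfully decoded packet, rather than synthesizing a replacement at the decoder output. A minimal sketch, with class and method names assumed:

```python
# Sketch of packet replication at the decoder input: on loss, the
# previously decoded packet from the buffer is handed to the decoder
# again in place of the missing one.

class StreamMonitor:
    def __init__(self):
        self.last_packet = None

    def next_decoder_input(self, incoming_packet):
        """Return the packet to decode; replicate the previous one on loss."""
        if incoming_packet is not None:
            self.last_packet = incoming_packet
            return incoming_packet
        return self.last_packet     # lost/late: repeat the previous packet
```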
ENABLING STEREO CONTENT FOR VOICE CALLS
Disclosed are systems and methods to modify the Bluetooth mono HFP protocol to support bi-directional stereo operation for high bandwidth audio including 12-kHz wide-band, 16-kHz super-wide-band (SWB), and 24-kHz full-band (FB) audio. The techniques leverage the larger packet size and longer duty cycle of the 2-EV5 transport packet and expand the block size of the audio frames generated by the AAC-ELD codec to increase the maximum data throughput from the 64 kbps of the mono HFP protocol to 192 kbps using a stereo HFP protocol. The increased throughput not only supports stereo operations, but allows the transport of redundant or FEC packets for increased robustness against packet loss. In one aspect, the AAC-ELD codec may be configured for dynamic bit rate switching to flexibly perform trade-offs between audio quality and robustness against packet loss. The stereo HFP may configure the maximum throughput based on the desired audio quality.
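The quality/robustness trade-off described above can be sketched as a budget split: within the fixed stereo-HFP throughput, bits not spent on the primary AAC-ELD stream can carry redundant (FEC) frames. The 192 kbps ceiling comes from the abstract; the split policy itself is an assumption.

```python
# Back-of-envelope sketch of the trade-off: a higher primary audio
# bitrate leaves fewer bits for redundancy within the fixed budget.

MAX_THROUGHPUT_KBPS = 192   # stereo HFP ceiling stated in the abstract

def split_budget(audio_kbps):
    """Return (audio, redundancy) bitrates within the HFP budget."""
    if not 0 < audio_kbps <= MAX_THROUGHPUT_KBPS:
        raise ValueError("audio bitrate outside budget")
    return audio_kbps, MAX_THROUGHPUT_KBPS - audio_kbps
```

Dynamic bit rate switching would then amount to moving `audio_kbps` up (quality) or down (robustness) as channel conditions change.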
Mass media presentations with synchronized audio reactions
Systems and methods of the present disclosure provide a plurality of audio reactions from a plurality of client devices. The audio reactions are captured by microphones on the client devices and are time-stamped. The method also includes mixing the audio reactions by a mixer server to form a mixed audio reaction, and sending the mixed audio reaction to at least one of the client devices. The client device is adapted to play the mixed audio reaction and a mass media presentation. The mixed audio reaction and the mass media presentation are synchronized to create an audience effect for the mass media presentation. The present technology also provides echo removal, volume balancing, compression, and time stamping of an audio stream by the client device. Reactions from at least one of buttons and gestures activate synthesized sounds, for example clapping, booing, and cheering, which are mixed into the mixed audio reaction.
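The server-side mixing step can be sketched by aligning each time-stamped reaction clip on a shared timeline and summing them into one track. The sample-index timeline and the averaging normalization are assumptions; the patent's actual volume-balancing method is not specified here.

```python
# Illustrative mixer-server step: time-stamped reaction clips from many
# clients are aligned by offset and summed into one mixed reaction.

def mix_reactions(reactions, length):
    """Sum reaction clips at their time-stamped offsets.

    `reactions` is a list of (start_index, samples) pairs.
    """
    mix = [0.0] * length
    for start, samples in reactions:
        for i, s in enumerate(samples):
            if 0 <= start + i < length:
                mix[start + i] += s
    # crude volume balancing: average over the number of clients
    n = max(1, len(reactions))
    return [v / n for v in mix]
```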