G10L19/00

Mass media presentations with synchronized audio reactions

Systems and methods of the present disclosure provide a plurality of audio reactions from a plurality of client devices. The audio reactions are captured by microphones on the client devices and are time-stamped. The method also includes mixing the audio reactions by a mixer server to form a mixed audio reaction, and sending the mixed audio reaction to at least one of the client devices. The client device is adapted to play the mixed audio reaction and a mass media presentation. The mixed audio reaction and the mass media presentation are synchronized to create an audience effect for the mass media presentation. The present technology also provides echo removal, volume balancing, compression, and time stamping of an audio stream by the client device. Reactions from at least one of buttons and gestures activate synthesized sounds, for example clapping, booing, and cheering, which are mixed into the mixed audio reaction.
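The timestamp alignment and volume balancing described above can be sketched roughly as follows. This is an illustrative reading, not the patent's implementation; the sample rate, the (timestamp, samples) representation, and all names are assumptions.

```python
# Hypothetical sketch of a mixer server's core step: align time-stamped
# audio reactions from several client devices and sum them into one
# audience track, scaling down only if the sum would clip.
# Names and the sample rate are illustrative, not from the patent.

SAMPLE_RATE = 8000  # samples per second (assumed)

def mix_reactions(reactions):
    """reactions: list of (start_time_ms, samples) with samples as
    floats in [-1.0, 1.0] at a shared sample rate. Returns the mix."""
    if not reactions:
        return []
    # Convert each start time to a sample offset from the earliest stream.
    t0 = min(t for t, _ in reactions)
    offsets = [((t - t0) * SAMPLE_RATE) // 1000 for t, _ in reactions]
    length = max(off + len(s) for off, (_, s) in zip(offsets, reactions))
    mix = [0.0] * length
    for off, (_, samples) in zip(offsets, reactions):
        for i, x in enumerate(samples):
            mix[off + i] += x
    # Simple volume balancing: rescale only when the sum exceeds full scale.
    peak = max(abs(x) for x in mix)
    if peak > 1.0:
        mix = [x / peak for x in mix]
    return mix
```

A real mixer server would additionally handle echo removal, compression, and per-device codec decoding; this sketch covers only timestamp alignment and clipping-safe summation.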

Audio encoding/decoding based on an efficient representation of auto-regressive coefficients

An encoder for encoding a parametric spectral representation (f) of auto-regressive coefficients that partially represent an audio signal. The encoder includes a low-frequency encoder configured to quantize elements of a part of the parametric spectral representation that correspond to a low-frequency part of the audio signal. It also includes a high-frequency encoder configured to encode a high-frequency part (f^H) of the parametric spectral representation (f) by weighted averaging based on the quantized elements (f̂^L) flipped around a quantized mirroring frequency (f̂_m), which separates the low-frequency part from the high-frequency part, and on a frequency grid determined from a frequency grid codebook in a closed-loop search procedure. Also described are a corresponding decoder, corresponding encoding/decoding methods, and user equipments (UEs) including such an encoder/decoder.
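One plausible reading of the flipping and closed-loop grid search can be sketched as below. The averaging weight, the codebook contents, and all names are assumptions for illustration; in the patent the quantized elements f̂^L and mirroring frequency f̂_m come from the low-frequency quantizer.

```python
# Illustrative sketch (not the patent's exact algorithm): mirror the
# quantized low-band elements around the mirroring frequency to get a
# first approximation of the high band, then pick the codebook grid
# whose weighted average with the flipped elements best matches the
# target high band (closed-loop search).

def flip_around_mirror(f_hat_L, f_hat_m):
    """Reflect each low-band element around the mirroring frequency:
    an element at f_hat_m - d maps to f_hat_m + d. Results are
    returned in ascending frequency order."""
    return sorted(2.0 * f_hat_m - f for f in f_hat_L)

def search_grid(target_H, flipped, codebook, w=0.5):
    """Closed-loop search: form a candidate high band as a weighted
    average of the flipped elements and each codebook grid, and keep
    the codebook index with the least squared error."""
    best_idx, best_err = 0, float("inf")
    for idx, grid in enumerate(codebook):
        cand = [w * f + (1.0 - w) * g for f, g in zip(flipped, grid)]
        err = sum((c - t) ** 2 for c, t in zip(cand, target_H))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx
```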

Methods and systems for creating virtual and augmented reality

Configurations are disclosed for presenting virtual reality and augmented reality experiences to users. The system may comprise an image capturing device to capture one or more images, the one or more images corresponding to a field of view of a user of a head-mounted augmented reality device, and a processor communicatively coupled to the image capturing device to extract a set of map points from the one or more images, to identify a set of sparse points and a set of dense points from the extracted set of map points, and to perform a normalization on the set of map points.
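The abstract does not specify the normalization scheme. A common choice for 3D map points, shown here purely as an assumed stand-in, is to translate the points to their centroid and scale them so the mean distance from the origin is one:

```python
# Hypothetical normalization sketch: center map points at their
# centroid and rescale to unit mean distance. The patent does not
# disclose the exact scheme; this is a conventional stand-in.

def normalize_map_points(points):
    """points: list of (x, y, z) tuples. Returns centered points
    scaled so the mean distance from the origin equals 1."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(p[0] - cx, p[1] - cy, p[2] - cz) for p in points]
    mean_dist = sum((x * x + y * y + z * z) ** 0.5
                    for x, y, z in centered) / n
    if mean_dist == 0:
        return centered  # all points coincide; nothing to scale
    return [(x / mean_dist, y / mean_dist, z / mean_dist)
            for x, y, z in centered]
```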

Speech processing method and device thereof
11587573 · 2023-02-21

The disclosure provides a speech processing method and a device thereof. The method includes: acquiring a speech sampling signal frame in a mixed-excitation linear prediction (MELP) speech coding system and estimating signal quality of the speech sampling signal frame; determining, based on the signal quality, a specific linear prediction coding (LPC) order used by an LPC circuit; controlling the LPC circuit to convert the speech sampling signal frame into a line spectrum pair parameter based on the specific LPC order; replacing a speech signal spectrum of the speech sampling signal frame with the line spectrum pair parameter to generate a predicted speech signal; and performing a speech coding operation and a signal synthesizing operation of the MELP speech coding system based on the predicted speech signal.
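The step of choosing an LPC order from the estimated signal quality can be illustrated as below. The thresholds and the use of SNR as the quality measure are invented for the sketch; the patent does not publish the mapping. Only the standard MELP order of 10 is taken from the MELP coder itself.

```python
# Hypothetical sketch of quality-dependent LPC order selection.
# The SNR thresholds are illustrative assumptions; MELP's standard
# LPC order is 10, used here as the fallback for noisy frames.

def select_lpc_order(snr_db):
    """Map an estimated SNR in dB (a stand-in for 'signal quality')
    to an LPC order: cleaner frames support a higher-order model."""
    if snr_db >= 20.0:
        return 16
    if snr_db >= 10.0:
        return 12
    return 10  # MELP's standard order, kept for noisy frames
```

The selected order would then be passed to the LPC circuit, which converts the sampled frame into a line spectrum pair parameter of that order.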

Encoder, decoder, encoding method, decoding method, program, and recording medium

The present invention aims to encode and decode a sequence of integer values while assigning, in effect, a fractional number of bits per sample. An integer converter 11 selects M selected integer values from a set of L input integer values and obtains J-value selection information that specifies which of the L input integer values the M selected integer values are. Furthermore, the integer converter 11 obtains one converted integer value by reversibly converting the M selected integer values and an integer value corresponding to the J-value selection information. An integer encoder 12 encodes the converted integer value to obtain a code.
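The J-value selection information and the reversible conversion can be illustrated with a combinatorial index: J = C(L, M) distinguishes every possible choice of M positions out of L, and folding that index together with a value integer is exactly invertible by divmod. This is an illustrative construction, not necessarily the patent's exact mapping; all names are assumed.

```python
# Illustrative sketch: represent "which M of the L inputs were
# selected" as a lexicographic combination index in [0, C(L, M)),
# then reversibly fold it into a single integer with a value integer.
from math import comb

def selection_index(positions, L):
    """Lexicographic index of the sorted M-subset `positions` among
    all M-subsets of {0, ..., L-1}; a J-valued integer, J = C(L, M)."""
    idx, m, prev = 0, len(positions), -1
    for i, p in enumerate(sorted(positions)):
        for q in range(prev + 1, p):
            # count subsets that start with a smaller position here
            idx += comb(L - q - 1, m - i - 1)
        prev = p
    return idx

def combine(sel_idx, value_int, J):
    """Reversible conversion: one integer from the J-valued selection
    information and the value integer."""
    return value_int * J + sel_idx

def split(converted, J):
    """Inverse of combine: recover (value_int, sel_idx)."""
    return divmod(converted, J)
```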

Techniques for modifying audiovisual media titles to improve audio transitions
11503264 · 2022-11-15

A playback application is configured to analyze audio frames associated with transitions between segments within a media title to identify one or more portions of extraneous audio. The playback application is configured to analyze the one or more portions of extraneous audio and then determine which of the one or more corresponding audio frames should be dropped. In doing so, the playback application can analyze a topology associated with the media title to determine whether any specific portions of extraneous audio are to be played outside of a logical ordering of audio samples set forth in the topology. These specific portions of extraneous audio are preferentially removed.
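The topology check described above can be sketched as follows. The frame and topology representations and all names are illustrative assumptions, not the patent's data model.

```python
# Hypothetical sketch: given the media title's topology (segment ids
# in logical playback order) and decoded frames tagged with their
# segment, drop frames that would play outside the logical ordering,
# i.e. frames from an earlier segment arriving after a later segment
# has already started.

def drop_extraneous_frames(frames, topology):
    """frames: list of (segment_id, pts) in decode order.
    topology: segment ids in logical playback order.
    Returns the frames kept for playback."""
    order = {seg: i for i, seg in enumerate(topology)}
    kept, highest = [], -1
    for seg, pts in frames:
        rank = order.get(seg)
        if rank is None or rank < highest:
            continue  # extraneous: out of logical order, drop it
        highest = max(highest, rank)
        kept.append((seg, pts))
    return kept
```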
