Patent classifications
G10L19/173
Game streaming with spatial audio
A game engine may generate video and audio content on a per-frame basis. Audio data corresponding to a current frame may be generated to comprise sound-field information independent of a speaker configuration or spatialization technology that may be used to play the associated audio. The sound-field information may be generated based on monaural audio data corresponding to a sound produced by an in-game object at the object's position as of the current frame. The sound-field information may be transmitted to a remote computing device for reproduction using a selected, available speaker configuration and spatialization technology.
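As an illustration of speaker-independent sound-field information, the sketch below encodes one frame of monaural samples into first-order B-format channels (W, X, Y, Z) from the object's position relative to a listener. The abstract does not name a sound-field representation; the choice of first-order ambisonics with the traditional 1/sqrt(2) weighting on W, and the function names `encode_first_order` and `encode_frame`, are assumptions made for this example only.

```python
import math


def encode_first_order(sample, listener, source):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumption: traditional B-format weighting, with W scaled by 1/sqrt(2).
    Positions are (x, y, z) tuples; only the direction to the source is used.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dz = source[2] - listener[2]
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return (w, x, y, z)


def encode_frame(mono_samples, listener, source):
    """Encode all samples of the current frame at the object's current position."""
    return [encode_first_order(s, listener, source) for s in mono_samples]
```

The four channels carry no assumption about the playback layout, so a receiving device could decode them to whichever speaker configuration or spatialization technology it has available. Distance attenuation is deliberately left out of this sketch.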
Methods, Apparatus and Systems for Determining Reconstructed Audio Signal
According to an aspect of the present invention, a method for reconstructing an audio signal having a baseband portion and a highband portion is disclosed. The method includes obtaining a decoded baseband audio signal by decoding an encoded audio signal and obtaining a plurality of subband signals by filtering the decoded baseband audio signal. The method further includes generating a high-frequency reconstructed signal by copying a number of consecutive subband signals of the plurality of subband signals and obtaining an envelope adjusted high-frequency signal. The method further includes generating a noise component based on a noise parameter. Finally, the method includes adjusting a phase of the high-frequency reconstructed signal and obtaining a time-domain reconstructed audio signal by combining the decoded baseband audio signal and the combined high-frequency signal.
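The copy-up, envelope adjustment, and noise steps can be sketched on subband signals represented as plain lists of samples. This is a simplified illustration, not the claimed method: the analysis and synthesis filter banks and the phase adjustment are omitted, and the helper names, the per-band target energies, and the uniform noise model are all assumptions introduced here.

```python
import math
import random


def copy_up(subbands, start, count):
    """Copy `count` consecutive baseband subbands, beginning at index `start`."""
    return [list(band) for band in subbands[start:start + count]]


def adjust_envelope(bands, target_energies):
    """Scale each copied band so its mean energy matches the target envelope."""
    adjusted = []
    for band, target in zip(bands, target_energies):
        energy = sum(v * v for v in band) / len(band)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        adjusted.append([gain * v for v in band])
    return adjusted


def add_noise(bands, noise_level, rng):
    """Add a noise component whose amplitude is set by a noise parameter."""
    return [[v + noise_level * rng.uniform(-1.0, 1.0) for v in band]
            for band in bands]


def reconstruct_subbands(low_bands, start, count, target_energies,
                         noise_level, seed=0):
    """Return baseband subbands followed by the regenerated highband subbands."""
    rng = random.Random(seed)
    high = copy_up(low_bands, start, count)
    high = adjust_envelope(high, target_energies)
    high = add_noise(high, noise_level, rng)
    return [list(band) for band in low_bands] + high
```

In a full decoder the combined subbands would then pass through a synthesis filter bank to produce the time-domain signal.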
SYSTEMS AND METHODS FOR A TWO PASS DIARIZATION, AUTOMATIC SPEECH RECOGNITION, AND TRANSCRIPT GENERATION
In one embodiment, a method for transcript generation includes receiving an audio file and dividing it into a plurality of chunks. The method further includes sending each instance of the plurality of chunks to a speech service module. The method further includes converting speech to text for each instance of the plurality of chunks and returning the text for each instance of the plurality of chunks. The method further includes merging the text for each instance of the plurality of chunks to yield an audio file transcript and sending the audio file and chunks to a diarization module. The method further includes performing first pass diarization on the chunks to yield a plurality of diarized chunks and performing second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file. The method further includes merging the files to yield a final transcript.
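The chunk, transcribe, and merge stages can be sketched as follows. The speech service is passed in as a callable, since the abstract does not specify its interface; `split_into_chunks`, `transcribe_chunks`, and `merge_text` are hypothetical names, and the two diarization passes are left out of this example.

```python
def split_into_chunks(samples, chunk_size):
    """Divide the audio samples into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def transcribe_chunks(chunks, speech_service):
    """Send each chunk to the speech service and collect the returned text."""
    return [speech_service(chunk) for chunk in chunks]


def merge_text(texts):
    """Merge per-chunk text, in chunk order, into one transcript string."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_audio(samples, chunk_size, speech_service):
    """Run the chunk, transcribe, and merge stages end to end."""
    chunks = split_into_chunks(samples, chunk_size)
    return merge_text(transcribe_chunks(chunks, speech_service))
```

Because chunk order is preserved throughout, the merged transcript keeps the original temporal order of the audio.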
METADATA TRANSCODING
The present document relates to transcoding of metadata, and in particular to a method and system for transcoding metadata with reduced computational complexity. A transcoder configured to transcode an inbound bitstream comprising an inbound content frame and an associated inbound metadata frame into an outbound bitstream comprising an outbound content frame and an associated outbound metadata frame is described. The inbound content frame is indicative of a signal encoded according to a first codec system and the outbound content frame is indicative of the signal encoded according to a second codec system. The transcoder is configured to identify an inbound block of metadata from the inbound metadata frame, the inbound block of metadata associated with an inbound descriptor indicative of one or more properties of metadata comprised within the inbound block of metadata, and to generate the outbound metadata frame from the inbound metadata frame based on the inbound descriptor.
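One way to read the complexity saving is that the transcoder decides what to do with each metadata block from its descriptor alone, without parsing the payload. The sketch below illustrates that reading under stated assumptions: the `codec_independent` property, the `block_id` field, and the `converters` table are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    block_id: str
    codec_independent: bool  # hypothetical property carried by the descriptor


@dataclass(frozen=True)
class MetadataBlock:
    descriptor: Descriptor
    payload: bytes


def transcode_metadata_frame(inbound_blocks, converters):
    """Build the outbound metadata frame from the inbound descriptors.

    Codec-independent blocks are passed through without parsing the payload.
    Other blocks are converted if a converter is registered for their
    block_id; otherwise they are dropped from the outbound frame.
    """
    outbound = []
    for block in inbound_blocks:
        desc = block.descriptor
        if desc.codec_independent:
            outbound.append(block)
        elif desc.block_id in converters:
            converted = converters[desc.block_id](block.payload)
            outbound.append(MetadataBlock(desc, converted))
    return outbound
```

Only blocks that actually depend on the first codec system incur conversion work, which is where a reduction in computation would come from in this sketch.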
FRAME CODING FOR SPATIAL AUDIO DATA
The techniques disclosed herein provide apparatuses and related methods for the communication of spatial audio and related metadata. In some implementations, a source provides prerecorded spatial audio that has embedded metadata. A computing device processes the prerecorded spatial audio to generate an audio codec that is segmented to include a first section of audio data and a second section that includes metadata extracted from the prerecorded spatial audio. The generated audio codec may be received by a device that includes an encoder. The encoder may process the generated audio codec to generate audio data that includes the metadata.
Low bitrate audio encoding/decoding scheme having cascaded switches
An audio encoder has a first information sink oriented encoding branch, a second information source or SNR oriented encoding branch, and a switch for switching between the first encoding branch and the second encoding branch, wherein the second encoding branch has a converter into a specific domain different from the spectral domain, and wherein the second encoding branch furthermore has a specific domain coding branch, and a specific spectral domain coding branch, and an additional switch for switching between the specific domain coding branch and the specific spectral domain coding branch. An audio decoder has a first domain decoder, a second domain decoder for decoding a signal, and a third domain decoder and two cascaded switches for switching between the decoders.
Signature tuning filters
A method of providing audio information, said method comprising: (i) receiving audio filter settings in a client device; (ii) receiving audio data in the client device, wherein the received audio data is in an audio coding format; (iii) converting the audio filter settings to an audio filter signal in a processor of the client device, where the audio filter signal is a time-varying signal; (iv) converting the received audio data to an audio signal in a processor of the client device, where the audio signal is a time-varying signal; and (v) transmitting the converted audio filter signal and the converted audio signal from the client device to an audio output device, where the audio output device is separate from and in communication with the client device, and where the audio output device is configured for modifying the audio signal according to the audio filter signal to generate a time-varying audio output.
Device and method for processing internal channel for low complexity format conversion
A method of processing an audio signal according to an embodiment of the present invention includes: receiving a signal for one channel pair element (CPE) to which internal channel gains (ICGs) have been pre-applied; when a reproduction channel configuration is not stereo, acquiring inverse ICGs for the one CPE based on Moving Picture Experts Group Surround 212 (MPS212) parameters and on rendering parameters corresponding to MPS212 output channels defined in a format converter; and generating output signals based on the received signal for the one CPE and the acquired inverse ICGs.
Audio metadata providing apparatus and method, and multichannel audio data playback apparatus and method to support dynamic format conversion
An audio metadata providing apparatus and method and a multichannel audio data playback apparatus and method to support a dynamic format conversion are provided. Dynamic format conversion information may include information about a plurality of format conversion schemes that are used to convert a first format set by an author of multichannel audio data into a second format that is based on a playback environment of the multichannel audio data and that are each set for corresponding playback periods of the multichannel audio data. The audio metadata providing apparatus may provide audio metadata including the dynamic format conversion information. The multichannel audio data playback apparatus may identify the dynamic format conversion information from the audio metadata, may convert the first format of the multichannel audio data into the second format based on the identified dynamic format conversion information, and may play back the multichannel audio data in the second format.
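A minimal sketch of per-period format conversion follows, assuming each format conversion scheme can be expressed as a downmix matrix that is valid for one playback period. The `ConversionPeriod` type, its field names, and the matrix representation are assumptions for illustration; the abstract does not define how a scheme is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionPeriod:
    start_s: float   # start of the playback period, in seconds
    end_s: float     # end of the playback period (exclusive), in seconds
    matrix: tuple    # one row per output channel, one gain per input channel


def scheme_for_time(periods, t):
    """Return the conversion period that covers playback time t, if any."""
    for period in periods:
        if period.start_s <= t < period.end_s:
            return period
    return None


def convert_frame(frame, matrix):
    """Apply a downmix matrix to one multichannel sample frame."""
    return [sum(gain * sample for gain, sample in zip(row, frame))
            for row in matrix]


def play_back(frame, t, periods):
    """Convert a frame with the scheme for time t; pass it through if none applies."""
    period = scheme_for_time(periods, t)
    return list(frame) if period is None else convert_frame(frame, period.matrix)
```

For example, an author could set a lower centre-channel gain for a dialogue-heavy period and a higher one for a music period, and the playback apparatus would pick the matching matrix by time.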
Method and System for Implementing Split and Parallelized Encoding or Transcoding of Audio and Video Content
Novel tools and techniques are provided for implementing split and parallelized encoding or transcoding of audio and video. In various embodiments, a computing system might split an audio-video file that is received from a content source into a single video file and a single audio file. The computing system might encode or transcode the single audio file. Concurrently, the computing system might split the single video file into a plurality of video segments. A plurality of parallel video encoders/transcoders might concurrently encode or transcode the plurality of video segments, each video encoder/transcoder encoding or transcoding one video segment of the plurality of video segments. Subsequently, the computing system might assemble the plurality of encoded or transcoded video segments with the encoded or transcoded audio file to produce an encoded or transcoded audio-video file, which may be output to a display device(s), an audio playback device(s), or the like.
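The split-and-parallelize flow can be sketched with Python's standard `concurrent.futures` module. The encoders here are stand-in functions that only tag their input, and the dictionary layout of the audio-video file is an assumption; a real system would call actual encoders or transcoders in their place.

```python
from concurrent.futures import ThreadPoolExecutor


def split_video(frames, segment_len):
    """Split the single video stream into consecutive segments."""
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]


def encode_segment(segment):
    """Stand-in video encoder: tags each frame instead of compressing it."""
    return ["enc(%s)" % frame for frame in segment]


def encode_audio(audio):
    """Stand-in audio encoder."""
    return "enc(%s)" % audio


def transcode(av_file, segment_len, workers=4):
    """Encode the audio and all video segments concurrently, then reassemble."""
    segments = split_video(av_file["video"], segment_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        audio_future = pool.submit(encode_audio, av_file["audio"])
        # Executor.map returns results in input order, so segments stay in sequence.
        encoded_segments = list(pool.map(encode_segment, segments))
    video = [frame for segment in encoded_segments for frame in segment]
    return {"video": video, "audio": audio_future.result()}
```

Threads are used to keep the example short; CPU-bound encoders would more plausibly run in separate processes or on separate machines.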