Patent classifications
G10L19/173
SYSTEMS AND METHODS FOR WIRELESS SURROUND SOUND
A system for surround sound may comprise a user device, a control module, and a plurality of speakers. The system may receive audio source data via a first interface. The system may transcode the audio source data to generate transcoded audio data comprising a plurality of channels of audio information. The system may extract the plurality of channels of audio information from the transcoded audio data. The system may assign the plurality of channels of audio information to the plurality of speakers on a one-to-one basis. The system may stream, via a second interface, the plurality of channels of audio information to the plurality of speakers, wherein the second interface comprises a standard communication protocol operable on a physical layer protocol of the second interface. The system may apply an effects function to at least one of the plurality of channels of audio information.
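The channel separation and one-to-one assignment steps can be illustrated with a minimal Python sketch. It assumes the transcoded audio arrives as interleaved samples; that layout, the function names, and the speaker names are hypothetical and do not come from the abstract.

```python
def split_channels(interleaved, num_channels):
    """Deinterleave a flat sample list into one list per channel."""
    if len(interleaved) % num_channels != 0:
        raise ValueError("sample count is not a multiple of the channel count")
    return [interleaved[c::num_channels] for c in range(num_channels)]


def assign_channels(channels, speakers):
    """Pair each channel with exactly one speaker (one-to-one)."""
    if len(channels) != len(speakers):
        raise ValueError("one-to-one assignment needs equal counts")
    return dict(zip(speakers, channels))


# Hypothetical 3-channel stream holding two sample frames.
samples = [10, 20, 30, 11, 21, 31]
routing = assign_channels(split_channels(samples, 3), ["left", "right", "center"])
```

Rejecting unequal channel and speaker counts is one simple way to enforce the one-to-one constraint; a real system might instead downmix or leave speakers idle.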
Spatial Audio Representation and Rendering
An apparatus including means configured to: obtain at least one audio stream, wherein the at least one audio stream includes one or more transport audio signals, wherein the one or more transport audio signals are of a defined type of transport audio signal; and convert the one or more transport audio signals to one or more further transport audio signals, the one or more further transport audio signals being of a further defined type of transport audio signal.
CONCEPT FOR SWITCHING OF SAMPLING RATES AT AUDIO PROCESSING DEVICES
Audio decoder device for decoding a bitstream, the audio decoder device including: a predictive decoder for producing a decoded audio frame from the bitstream, wherein the predictive decoder includes a parameter decoder for producing one or more audio parameters for the decoded audio frame from the bitstream and wherein the predictive decoder includes a synthesis filter device for producing the decoded audio frame by synthesizing the one or more audio parameters for the decoded audio frame; a memory device including one or more memories, wherein each of the memories is configured to store a memory state for the decoded audio frame, wherein the memory state for the decoded audio frame of the one or more memories is used by the synthesis filter device for synthesizing the one or more audio parameters for the decoded audio frame; and a memory state resampling device configured to determine the memory state for synthesizing the one or more audio parameters for the decoded audio frame, which has a sampling rate, for one or more of the memories by resampling a preceding memory state for synthesizing one or more audio parameters for a preceding decoded audio frame, which has a preceding sampling rate being different from the sampling rate of the decoded audio frame, for one or more of the memories and to store the memory state for synthesizing of the one or more audio parameters for the decoded audio frame for one or more of the memories into the respective memory.
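The memory state resampling step can be sketched in Python as follows. The abstract does not say how the resampling is performed, so linear interpolation is used here purely as an assumption; the function name, the sample rates, and the state values are hypothetical.

```python
def resample_memory_state(state, old_rate, new_rate):
    """Resample a filter memory state from old_rate to new_rate.

    Assumption: linear interpolation that maps the first and last
    stored samples onto the first and last output samples.
    """
    if not state:
        return []
    new_len = max(1, round(len(state) * new_rate / old_rate))
    if new_len == 1 or len(state) == 1:
        return [state[0]] * new_len
    step = (len(state) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(state) - 1)
        frac = pos - lo
        out.append(state[lo] * (1 - frac) + state[hi] * frac)
    return out


# Hypothetical memory state left over from the preceding frame.
old_state = [0.0, 1.0, 2.0, 3.0]
new_state = resample_memory_state(old_state, 12800, 25600)
```

The point of the sketch is only that the synthesis filter can keep using its history across a rate switch instead of restarting from an empty memory.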
COMMUNICATION TRANSMISSION DEVICE AND VOICE QUALITY DETERMINATION METHOD FOR COMMUNICATION TRANSMISSION DEVICE
[Problem] To appropriately determine quality degradation of voice after arithmetic processing, such as codec conversion processing and echo cancellation processing, that is performed by a telephone on voice data during telephone conversation.
[Solution] A communication transmitting apparatus 20 is connected between IP telephones 11 and 12 and includes: a tone storage unit 21 configured to store unique tone data T; an adding unit 22 configured to add the tone data T to the voice data V transmitted from the IP telephone 11 to generate addition data; an arithmetic processing unit 24 configured to convert a format of the addition data according to a prescribed specification to generate converted data including converted voice data Vc and tone data Tc; a separating unit 25 configured to separate the tone data Tc from the converted data; and a comparison determination unit 26 configured to determine that there is quality degradation in the voice data Vc if the tone data T added to the voice data V before the conversion by the arithmetic processing unit 24 differs from the tone data Tc separated by the separating unit 25 after the conversion.
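The add, convert, separate, and compare sequence can be sketched in Python. This is a simplified illustration: it assumes the tone is appended after the voice samples and that the conversion preserves the sample count, neither of which is stated in the abstract. All names and values are hypothetical.

```python
TONE = [7, -7, 7, -7]  # hypothetical unique tone data T


def add_tone(voice, tone):
    """Adding unit: append the tone to the voice data (assumed layout)."""
    return voice + tone


def separate_tone(converted, tone_len):
    """Separating unit: split converted data into voice Vc and tone Tc."""
    return converted[:-tone_len], converted[-tone_len:]


def is_degraded(voice, process, tone=TONE):
    """Comparison unit: report degradation when Tc differs from T."""
    converted = process(add_tone(voice, tone))
    _, tone_c = separate_tone(converted, len(tone))
    return tone_c != tone


def lossless(data):
    """Stand-in for processing that leaves samples unchanged."""
    return list(data)


def lossy(data):
    """Stand-in for processing that rounds each sample down to an even value."""
    return [s // 2 * 2 for s in data]
```

For example, `is_degraded([1, 2, 3], lossless)` reports no degradation, while `is_degraded([1, 2, 3], lossy)` does, because the odd tone samples are altered.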
Coding of a soundfield representation
A method includes: receiving a representation of a soundfield, the representation characterizing the soundfield around a point in space; decomposing the received representation into independent signals; and encoding the independent signals, wherein a quantization noise for any of the independent signals has a common spatial profile with the independent signal.
Sound file sound quality identification method and apparatus
A sound file sound quality identification method is provided. The method includes converting a format of a to-be-identified sound file into a preset reference audio format; performing framing on the sound file to obtain a plurality of frames; and performing Fourier transformation processing on the to-be-identified sound file to obtain a spectrum of each frame. The method also includes performing model matching according to the spectrum of each frame of the to-be-identified sound file to obtain a preliminary classification result of the to-be-identified sound file; determining an energy change point of the to-be-identified sound file according to the spectrum of each frame; and determining a sound quality of the to-be-identified sound file according to the preliminary classification result of the to-be-identified sound file and the energy change point of the to-be-identified sound file.
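The framing and Fourier transformation steps can be sketched with the Python standard library. The sketch uses non-overlapping frames and a direct discrete Fourier transform for brevity; the frame length, the absence of windowing, and the test signal are assumptions, and the model matching and energy change point steps are not covered.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping any remainder."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 of one real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical test signal: a sinusoid with exactly 2 cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
frames = split_frames(signal, 8)
spectra = [magnitude_spectrum(f) for f in frames]
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectra]
```

A production implementation would normally use an FFT library and overlapping, windowed frames; the direct DFT here only keeps the example dependency-free.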
Multi-channel signal encoding method, multi-channel signal decoding method, encoder, and decoder
A multi-channel signal encoding method includes determining a downmixed signal of a first channel signal and a second channel signal in a multi-channel signal, and reverberation gain parameters corresponding to different subbands of the first channel signal and the second channel signal, determining a target reverberation gain parameter that needs to be encoded in the reverberation gain parameters corresponding to the different subbands of the first channel signal and the second channel signal, generating parameter indication information, where the parameter indication information is used to indicate a subband corresponding to the target reverberation gain parameter, and encoding the target reverberation gain parameter, the parameter indication information, and the downmixed signal to generate a bitstream.
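The relationship between the target reverberation gain parameters and the parameter indication information can be sketched in Python. The abstract does not state how the target parameters are chosen, so the magnitude threshold below is purely an assumption, as are the per-subband flag list used as indication information and all names and values.

```python
def select_target_gains(gains, threshold):
    """Pick the per-subband gains to encode and flag their subbands.

    Assumption: a gain is a target when its magnitude reaches the threshold.
    Returns (indication flags, target gains in subband order).
    """
    flags = [1 if abs(g) >= threshold else 0 for g in gains]
    targets = [g for g, f in zip(gains, flags) if f]
    return flags, targets


def restore_gains(flags, targets, default=0.0):
    """Decoder side: place each target gain back into its flagged subband."""
    remaining = iter(targets)
    return [next(remaining) if f else default for f in flags]


# Hypothetical reverberation gains for four subbands.
subband_gains = [0.8, 0.05, 0.4, 0.0]
flags, targets = select_target_gains(subband_gains, threshold=0.1)
restored = restore_gains(flags, targets)
```

Only the flagged gains and the flags themselves would need to be written to the bitstream alongside the downmixed signal, which is where a bit-rate saving could come from.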
Systems and methods for a two pass diarization, automatic speech recognition, and transcript generation
In one embodiment, a method for transcript generation includes receiving an audio file and dividing it into a plurality of chunks. The method further includes sending each instance of the plurality of chunks to a speech service module. The method further includes converting speech to text for each instance of the plurality of chunks and returning the text for each instance of the plurality of chunks. The method further includes merging the text for each instance of the plurality of chunks to yield an audio file transcript and sending the audio file and chunks to a diarization module. The method further includes performing first pass diarization on the chunks to yield a plurality of diarized chunks and performing second pass diarization on the plurality of diarized chunks and the audio file to yield a diarized audio file. The method further includes merging the files to yield a final transcript.
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
There is provided an information processing device that enables a priority to be set for each piece of acquired object audio data. The information processing device includes a processing unit that sets a priority for each piece of acquired object audio data, determines, on the basis of the priority, the object audio data to be included in a generated segment file from among one or more pieces of the object audio data, and generates, on the basis of the priority, a new priority to be set for the generated segment file as priority information.
Concept for switching of sampling rates at audio processing devices
Audio decoder device for decoding a bitstream, the audio decoder device including: a predictive decoder for producing a decoded audio frame from the bitstream, wherein the predictive decoder includes a parameter decoder for producing one or more audio parameters for the decoded audio frame from the bitstream and wherein the predictive decoder includes a synthesis filter device for producing the decoded audio frame by synthesizing the one or more audio parameters for the decoded audio frame; a memory device including one or more memories, wherein each of the memories is configured to store a memory state for the decoded audio frame, wherein the memory state for the decoded audio frame of the one or more memories is used by the synthesis filter device for synthesizing the one or more audio parameters for the decoded audio frame; and a memory state resampling device configured to determine the memory state for synthesizing the one or more audio parameters for the decoded audio frame, which has a sampling rate, for one or more of the memories by resampling a preceding memory state for synthesizing one or more audio parameters for a preceding decoded audio frame, which has a preceding sampling rate being different from the sampling rate of the decoded audio frame, for one or more of the memories and to store the memory state for synthesizing of the one or more audio parameters for the decoded audio frame for one or more of the memories into the respective memory.