G10L21/0272

Mass media presentations with synchronized audio reactions

Systems and methods of the present disclosure provide a plurality of audio reactions from a plurality of client devices. The audio reactions are captured by microphones on the client devices and are time-stamped. The method also includes mixing the audio reactions by a mixer server to form a mixed audio reaction, and sending the mixed audio reaction to at least one of the client devices. The client device is adapted to play the mixed audio reaction and a mass media presentation. The mixed audio reaction and the mass media presentation are synchronized to create an audience effect for the mass media presentation. The present technology also provides echo removal, volume balancing, compression, and time stamping of an audio stream by the client device. Reactions entered through at least one of buttons and gestures activate synthesized sounds, for example clapping, booing, and cheering, which are mixed into the mixed audio reaction.
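
The mixing step described in this abstract can be illustrated with a minimal sketch. It assumes each reaction arrives as mono samples with a time stamp already converted to a sample offset on a shared timeline; the `Reaction` type, the `mix_reactions` name, and the peak-scaling step are hypothetical choices for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reaction:
    start_sample: int     # time stamp as a sample offset (assumption)
    samples: List[float]  # mono samples in the range [-1.0, 1.0]


def mix_reactions(reactions: List[Reaction]) -> List[float]:
    """Sum time-aligned reactions, then scale down if the sum would clip."""
    if not reactions:
        return []
    length = max(r.start_sample + len(r.samples) for r in reactions)
    mixed = [0.0] * length
    for r in reactions:
        # Place each reaction at its time-stamped position on the timeline.
        for i, s in enumerate(r.samples):
            mixed[r.start_sample + i] += s
    peak = max(abs(s) for s in mixed)
    if peak > 1.0:
        # Simple peak normalization; a real mixer would likely use a limiter.
        mixed = [s / peak for s in mixed]
    return mixed
```

Two overlapping reactions offset by one sample, for example, produce a mix whose middle sample is the sum of both contributions.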

End-to-end multi-talker overlapping speech recognition
11521595 · 2022-12-06

A method for training a speech recognition model with a loss function includes receiving an audio signal including a first segment corresponding to audio spoken by a first speaker, a second segment corresponding to audio spoken by a second speaker, and an overlapping region where the first segment overlaps the second segment. The overlapping region includes a known start time and a known end time. The method also includes generating a respective masked audio embedding for each of the first and second speakers. The method also includes applying a masking loss after the known end time to the respective masked audio embedding for the first speaker when the first speaker was speaking prior to the known start time, or applying the masking loss prior to the known start time when the first speaker was speaking after the known end time.
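
The masking rule in this abstract can be sketched as follows. The abstract does not state the form of the loss, so the mean squared energy used here is an assumption, as are the frame-index representation of the overlap boundaries and the `masking_loss` name. The idea shown is only which frames are penalized: those where the speaker is known to be silent.

```python
from typing import List


def masking_loss(
    embedding: List[List[float]],  # masked audio embedding, one vector per frame
    overlap_start: int,            # known start of the overlap, as a frame index
    overlap_end: int,              # known end of the overlap, as a frame index
    spoke_first: bool,             # True if this speaker spoke before the overlap
) -> float:
    """Penalize energy in frames where this speaker should be silent.

    Assumption: the penalty is the mean squared value of the embedding.
    """
    if spoke_first:
        # Speaker finished during the overlap: silent after the known end time.
        frames = embedding[overlap_end:]
    else:
        # Speaker started during the overlap: silent before the known start time.
        frames = embedding[:overlap_start]
    values = [v for frame in frames for v in frame]
    if not values:
        return 0.0
    return sum(v * v for v in values) / len(values)
```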

Personal hearing device, external acoustic processing device and associated computer program product

Disclosed are a personal hearing device, an external acoustic processing device and an associated computer program product. The personal hearing device includes: a microphone, for receiving an input acoustic signal, wherein the input acoustic signal is a mixture of sounds coming from a first acoustic source and from other acoustic source(s); a speaker; and an acoustic processing circuit, for automatically distinguishing within the input acoustic signal the sound of the first acoustic source from the sound of other acoustic source(s); wherein the acoustic processing circuit further processes the input acoustic signal by applying different modifications to the sound of the first acoustic source and to the sound of other acoustic source(s), whereby the acoustic processing circuit produces an output acoustic signal to be played on the speaker.
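
As a rough illustration of applying different modifications to the two groups of sources, the sketch below assumes the separation has already been done and that the modification is a plain gain. Both are assumptions: the abstract does not say how sources are distinguished or which modifications are applied, and the `remix` name and default gains are hypothetical.

```python
from typing import List


def remix(
    first_source: List[float],
    other_sources: List[float],
    first_gain: float = 1.5,   # hypothetical boost for the first source
    other_gain: float = 0.3,   # hypothetical attenuation for everything else
) -> List[float]:
    """Recombine separated signals with a different gain for each group."""
    return [
        first_gain * f + other_gain * o
        for f, o in zip(first_source, other_sources)
    ]
```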

APPARATUS FOR OUTPUTTING AN AUDIO SIGNAL IN A VEHICLE CABIN
20220375470 · 2022-11-24

Apparatus for outputting an audio signal in a vehicle cabin comprising: at least one audio outputting device configured to output an audio signal comprising at least one audio signal component containing a human voice, particularly a singer's voice, and/or a musical instrument in a vehicle cabin; at least one audio processing device configured to process at least one audio signal output by the at least one audio outputting device so as to suppress the audio signal component in a suppression mode; at least one audio receiving device configured to receive an acoustic human voice signal of at least one person located in the vehicle cabin whilst the audio outputting device outputs the audio signal in the vehicle cabin; and a control device configured to control operation of the audio processing device based on at least one acoustic human voice signal received by the audio receiving device.

SIGNAL PROCESSING APPARATUS, SIGNAL PROCESSING METHOD, AND PROGRAM
20220375485 · 2022-11-24

A signal processing apparatus is provided that includes a sound source separation section configured to apply sound source separation processing to a mixed sound signal including a mixture of signals of a plurality of sound sources, and band extension sections configured to apply frequency band extension processing to respective sound source separation signals obtained by separation by the sound source separation section.

METHOD AND SYSTEM FOR PROTECTING USER PRIVACY DURING AUDIO CONTENT PROCESSING
20220375458 · 2022-11-24

A method and system for protecting user privacy in audio content is disclosed. An audio content including private information related to at least one user is received. The audio content is segmented to generate a plurality of audio blocks. Each audio block is associated with a sequence number based on a respective chronological position in the audio content. A random key of predefined length is generated for each audio block. The plurality of audio blocks are randomly distributed to a plurality of agents for audio-to-text transcription. The random distribution is configured to scramble a data context for protecting the user privacy of the at least one user during the audio-to-text transcription. A textual transcript corresponding to the audio content is generated based on the audio-to-text transcription, the sequence number and the random key generated for each audio block.
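
The flow in this abstract (segment, number, key, scatter, reassemble) can be sketched as below. This is a minimal illustration under stated assumptions: the agents are modeled as plain callables, the key is a 16-character hex string, and the names `split_blocks` and `transcribe_scrambled` are hypothetical rather than taken from the application.

```python
import random
import secrets
from typing import Callable, Dict, List, Optional


def split_blocks(audio: List[float], block_len: int) -> List[dict]:
    """Segment audio into blocks, each with a sequence number and a random key."""
    return [
        {
            "seq": seq,                    # chronological position
            "key": secrets.token_hex(8),   # random key of fixed length
            "samples": audio[start:start + block_len],
        }
        for seq, start in enumerate(range(0, len(audio), block_len))
    ]


def transcribe_scrambled(
    blocks: List[dict],
    agents: List[Callable[[List[float]], str]],
    rng: Optional[random.Random] = None,
) -> str:
    """Send blocks to agents in random order, then rebuild the transcript."""
    rng = rng or random.Random()
    shuffled = blocks[:]
    rng.shuffle(shuffled)  # agents see blocks out of context
    results: Dict[str, str] = {}
    for block in shuffled:
        agent = rng.choice(agents)
        # The agent receives only the samples, never the sequence number.
        results[block["key"]] = agent(block["samples"])
    # Reassemble using the key to look up each result and seq to order them.
    ordered = sorted(blocks, key=lambda b: b["seq"])
    return " ".join(results[b["key"]] for b in ordered)
```

Because each agent sees only isolated blocks, and never their order, no single agent can reconstruct the full context of the recording.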

ELECTRONIC DEVICE, METHOD AND COMPUTER PROGRAM

An electronic device having circuitry configured to perform source separation on an audio signal to obtain a separated source and a residual signal, to perform feature extraction on the separated source to obtain one or more processing parameters, and to perform audio processing on a captured audio signal based on the one or more processing parameters to obtain an adjusted separated source.

AUDIO DEVICE AND OPERATION METHOD THEREOF

An audio device capable of inhibiting malfunction of an information terminal is provided. The audio device includes a sound sensor portion, a sound separation portion, a sound determination portion, and a processing portion. The sound sensor portion has a function of sensing sound. The sound separation portion has a function of separating the sound sensed by the sound sensor portion into a voice and sound other than a voice. The sound determination portion has a function of storing the feature quantity of the sound. The sound determination portion has a function of determining, with a machine learning model such as a neural network model, whether the feature quantity of the voice separated by the sound separation portion is the stored feature quantity. The processing portion has a function of analyzing an instruction contained in the voice and generating an instruction signal representing the content of the instruction in the case where the feature quantity of the voice is the stored feature quantity. The processing portion has a function of performing, on the sound other than a voice separated by the sound separation portion, processing for canceling the sound other than a voice. Specifically, the processing portion has a function of performing, on the sound other than a voice, processing for inverting the phase thereof.
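
The phase-inversion step named at the end of this abstract can be illustrated with a short sketch. It assumes the non-voice component has already been separated and is sample-aligned with the sensed mixture; the `invert_phase` and `cancel_non_voice` names are hypothetical.

```python
from typing import List


def invert_phase(samples: List[float]) -> List[float]:
    """Invert the phase of a signal by negating every sample."""
    return [-s for s in samples]


def cancel_non_voice(mixture: List[float], non_voice: List[float]) -> List[float]:
    """Add the phase-inverted non-voice signal to the mixture to cancel it."""
    anti = invert_phase(non_voice)
    return [m + a for m, a in zip(mixture, anti)]
```

With perfect separation and alignment the sum leaves only the voice; in practice any estimation error or delay leaves a residual.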