Patent classifications
G10L21/0272
Pre-voice separation/recognition synchronization of time-based voice collections based on device clockcycle differentials
Methods and devices are disclosed for conducting a synchronization process on voice information collected by a plurality of voice collection devices, based on the clock difference among those devices. After the synchronization process is performed, a voice separation and recognition process is conducted on the synchronized voice information.
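A minimal sketch of the synchronization step, assuming each device reports the local-clock time of its first sample (`local_start_s`) and a known offset between its clock and a reference clock (`clock_offset_s`, device clock minus reference clock). All names are hypothetical, and the separation/recognition stage that follows is out of scope. Each stream is trimmed so that every aligned index refers to the same reference-clock instant.

```python
from typing import Dict, List


def synchronize(
    streams: Dict[str, List[float]],
    local_start_s: Dict[str, float],
    clock_offset_s: Dict[str, float],
    sample_rate: int,
) -> Dict[str, List[float]]:
    """Align per-device streams using assumed per-device clock offsets."""
    # Start time of each stream expressed on the reference clock.
    ref_start = {d: local_start_s[d] - clock_offset_s[d] for d in streams}
    # The latest-starting stream defines the common time origin.
    common_start = max(ref_start.values())

    trimmed: Dict[str, List[float]] = {}
    for d, samples in streams.items():
        # Drop samples recorded before the common origin.
        skip = round((common_start - ref_start[d]) * sample_rate)
        trimmed[d] = samples[skip:]

    # Truncate to a common length so index i means the same instant everywhere.
    n = min(len(s) for s in trimmed.values())
    return {d: s[:n] for d, s in trimmed.items()}
```

For example, with a 4 Hz sample rate, a device whose stream starts 0.5 s later in reference time causes the other streams to lose their first 2 samples.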
Audio interactive decomposition editor method and system
A distributed system and a corresponding data processing method are disclosed for decomposing audio signals that include mixed audio sources. The system comprises at least one client terminal, a remote queuing module and at least one remote audio data processing module connected in a network. A client terminal stores source audio signal data, selects at least one signal decomposition type, uploads the source audio signal data together with data representing the decomposition type selection to the queuing module, and downloads decomposed audio signal data. The queuing module queues uploaded source audio data and distributes it to the data processing module(s). The queuing module also queues uploaded decomposed audio signal data and distributes it to the client terminal(s). An audio data processing module processes distributed source audio data into decomposed audio signal data according to the type selection, and uploads the decomposed audio signal data to the remote queuing module.
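A minimal in-process sketch of the upload/queue/process/download flow, using Python's standard `queue` module in place of a networked queuing service. The class and function names are hypothetical, and the only "decomposition type" implemented is a mid/side split, a simple stand-in for real source separation, used purely to show how the type selection travels with the job.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    client_id: str
    decomposition_type: str  # type selected by the client terminal
    left: List[float]
    right: List[float]


@dataclass
class Result:
    client_id: str
    stems: Dict[str, List[float]]


class QueuingModule:
    """Hypothetical queuing module: one queue for source jobs, one per client for results."""

    def __init__(self) -> None:
        self.source_jobs: queue.Queue = queue.Queue()
        self.results: Dict[str, queue.Queue] = {}

    def upload_source(self, job: Job) -> None:
        self.source_jobs.put(job)

    def upload_result(self, result: Result) -> None:
        self.results.setdefault(result.client_id, queue.Queue()).put(result)

    def download_result(self, client_id: str) -> Result:
        return self.results[client_id].get_nowait()


def decompose(job: Job) -> Dict[str, List[float]]:
    # Stand-in decomposition: mid/side split of a stereo signal.
    if job.decomposition_type == "mid_side":
        pairs = list(zip(job.left, job.right))
        return {
            "mid": [(l + r) / 2 for l, r in pairs],
            "side": [(l - r) / 2 for l, r in pairs],
        }
    raise ValueError(f"unsupported decomposition type: {job.decomposition_type}")


def processing_module_step(q: QueuingModule) -> None:
    """One processing-module iteration: take a job, decompose it, upload the result."""
    job = q.source_jobs.get_nowait()
    q.upload_result(Result(job.client_id, decompose(job)))
```

In a deployed system the queues would live in a network service and several processing modules would poll them concurrently. The per-client result queues are what let each terminal download only its own decomposed data.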
Situationally Aware Social Agent
A system for providing a situationally aware social agent includes processing hardware and a memory storing software code. The processing hardware executes the software code to receive radar data and audio data, process the radar data and the audio data to obtain radar-based location data and audio-based location data, each corresponding to a location of one or more users, and process the radar data and the audio data to obtain radar-based venue data and audio-based venue data, each corresponding to an environment surrounding the user(s). The software code further determines the location of the user(s) using the radar-based location data and the audio-based location data, determines the environment surrounding the user(s) using the radar-based venue data and the audio-based venue data, and identifies, based on the location and the environment, an interactive expression for the situationally aware social agent to use when interacting with the user(s).
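The abstract does not say how the radar-based and audio-based estimates are combined. One common choice, assumed here, is inverse-variance weighting for the location and a normalized product of per-modality probabilities for the venue (which treats the two modalities as conditionally independent under a uniform prior). Function names, variances, and venue labels are hypothetical.

```python
from typing import Dict, Tuple


def fuse_location(
    radar_xy: Tuple[float, float], radar_var: float,
    audio_xy: Tuple[float, float], audio_var: float,
) -> Tuple[Tuple[float, float], float]:
    """Inverse-variance weighted average of two 2-D position estimates."""
    w_r, w_a = 1.0 / radar_var, 1.0 / audio_var
    x = (w_r * radar_xy[0] + w_a * audio_xy[0]) / (w_r + w_a)
    y = (w_r * radar_xy[1] + w_a * audio_xy[1]) / (w_r + w_a)
    # The fused estimate is more certain than either input.
    return (x, y), 1.0 / (w_r + w_a)


def fuse_venue(radar_probs: Dict[str, float], audio_probs: Dict[str, float]) -> str:
    """Combine per-modality venue probabilities by normalized product; return the top label."""
    labels = radar_probs.keys() & audio_probs.keys()
    joint = {v: radar_probs[v] * audio_probs[v] for v in labels}
    total = sum(joint.values())
    posterior = {v: p / total for v, p in joint.items()}
    return max(posterior, key=posterior.get)
```

With equal variances the fused location is the midpoint of the two estimates. A noisier audio estimate pulls the result toward the radar estimate.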
ELECTRONIC DEVICE, METHOD AND COMPUTER PROGRAM
An electronic device comprising circuitry configured to analyze the results of a stereo or multi-channel source separation to determine one or more time-varying parameters, and to create spatially dynamic audio objects based on the one or more time-varying parameters.
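As an illustration of a "time-varying parameter" derived from a stereo separation result, the sketch below assumes the parameter is a per-frame panning position computed from the left/right energy of one separated source. That position is then mapped to an azimuth keyframe for a spatial audio object. The frame size, the 45-degree azimuth range, and the sign convention (positive means right) are all assumptions.

```python
from typing import List, Tuple


def frame_energy(x: List[float], start: int, size: int) -> float:
    return sum(v * v for v in x[start:start + size])


def pan_trajectory(
    left: List[float],
    right: List[float],
    frame_size: int,
    sample_rate: int,
    max_azimuth_deg: float = 45.0,
) -> List[Tuple[float, float]]:
    """Return (time_s, azimuth_deg) keyframes for one separated stereo source."""
    keyframes = []
    usable = min(len(left), len(right))
    for start in range(0, usable - frame_size + 1, frame_size):
        el = frame_energy(left, start, frame_size)
        er = frame_energy(right, start, frame_size)
        total = el + er
        # Pan in [-1, 1]: -1 fully left, +1 fully right, 0 for silent frames.
        pan = 0.0 if total == 0 else (er - el) / total
        keyframes.append((start / sample_rate, pan * max_azimuth_deg))
    return keyframes
```

A renderer could interpolate between these keyframes, so the re-spatialized source follows the movement implied by the original stereo mix.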
METHODS, SYSTEMS, AND DEVICES FOR ASSEMBLY OF LIVE AND RECORDED AUDIO CONTENT
Aspects of the subject disclosure may include, for example, receiving first audio content from a first communication device and receiving second audio content from a second communication device, and adjusting the first audio content and the second audio content. The adjusting of the first audio content and the second audio content can comprise: detecting a gap in the first audio content; analyzing the first audio content resulting in an audio analysis; generating filler audio content based on the audio analysis; and inserting the filler audio content into the gap of the first audio content. Further embodiments can include aggregating the first adjusted audio content with the second adjusted audio content resulting in aggregated audio content, and providing the aggregated audio content to a third communication device for playback. The third communication device plays the aggregated audio content. Other embodiments are disclosed.
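A minimal sketch of the gap-filling and aggregation steps. It assumes that a "gap" is a run of at least `min_len` samples whose magnitude falls below `threshold`. The "audio analysis" here is simply the signal immediately preceding the gap, which is repeated to fill it (a waveform-repetition stand-in, not necessarily the disclosure's method). Aggregation is a sample-wise sum. All names and parameters are hypothetical.

```python
from typing import List, Tuple


def find_gaps(x: List[float], threshold: float, min_len: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) runs of near-silent samples."""
    gaps, start = [], None
    for i, v in enumerate(x):
        if abs(v) < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i))
            start = None
    if start is not None and len(x) - start >= min_len:
        gaps.append((start, len(x)))
    return gaps


def fill_gaps(x: List[float], threshold: float = 1e-4, min_len: int = 2) -> List[float]:
    out = list(x)
    for start, end in find_gaps(x, threshold, min_len):
        history = out[:start]
        if not history:
            continue  # a leading gap has no preceding audio to analyse
        # Filler: repeat the most recent audio, up to the gap length.
        period = history[-min(len(history), end - start):]
        for k in range(end - start):
            out[start + k] = period[k % len(period)]
    return out


def aggregate(a: List[float], b: List[float]) -> List[float]:
    """Sum two adjusted contents sample by sample, zero-padding the shorter one."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    return [p + q for p, q in zip(a, b)]
```

A production system would more likely synthesize filler matched to the analysed content (for example, room tone or a continuation model). Repetition is shown here only because it is short and deterministic.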
SPEECH ENHANCEMENT TECHNIQUES THAT MAINTAIN SPEECH OF NEAR-FIELD SPEAKERS
An endpoint selectively enhances a captured audio signal based on an operating mode. The endpoint obtains an audio input signal of multiple users in a physical location. The audio input signal is captured by a microphone. The endpoint separates voice signals from the audio input signal and determines an operating mode for an audio output signal. The endpoint selectively adjusts each of the voice signals based on the operating mode to generate the audio output signal.
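A minimal sketch of mode-dependent adjustment of already separated voice signals. It assumes two hypothetical operating modes, `"all_speakers"` and `"near_field_only"`, and uses each voice's RMS level as a crude proxy for whether the speaker is in the near field (louder means closer). The threshold and the far-field gain are illustrative values, not taken from the source.

```python
import math
from typing import List


def rms(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def mix_by_mode(
    voices: List[List[float]],
    mode: str,
    near_field_rms: float = 0.1,
    far_gain: float = 0.0,
) -> List[float]:
    """Apply a per-voice gain chosen by the operating mode, then sum into one output."""
    if mode not in ("all_speakers", "near_field_only"):
        raise ValueError(f"unknown operating mode: {mode}")
    n = max((len(v) for v in voices), default=0)
    out = [0.0] * n
    for v in voices:
        gain = 1.0
        # In near-field mode, quiet (assumed distant) voices are attenuated.
        if mode == "near_field_only" and rms(v) < near_field_rms:
            gain = far_gain
        for i, s in enumerate(v):
            out[i] += gain * s
    return out
```

Setting `far_gain` between 0 and 1 would duck distant talkers rather than removing them entirely.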
Mass media presentations with synchronized audio reactions
Systems and methods of the present disclosure collect a plurality of audio reactions from a plurality of client devices. The audio reactions are captured by microphones on the client devices and are time-stamped. The method includes mixing the audio reactions on a mixer server to form a mixed audio reaction, and sending the mixed audio reaction to at least one of the client devices. The client device is adapted to play the mixed audio reaction together with a mass media presentation, and the two are synchronized to create an audience effect for the presentation. The client device can also perform echo removal, volume balancing, compression, and time stamping of an audio stream. In addition, buttons or gestures can trigger synthesized sounds, for example clapping, booing, and cheering, which are mixed into the mixed audio reaction.
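A minimal sketch of the mixer-server step, assuming each reaction arrives as a `(timestamp_s, samples)` pair whose timestamp is relative to the start of the presentation. Volume balancing is approximated by normalizing every reaction to the same peak. A final rescale keeps the mix within [-1, 1], a crude stand-in for compression. The function name and `target_peak` value are hypothetical.

```python
from typing import List, Tuple


def mix_reactions(
    reactions: List[Tuple[float, List[float]]],
    sample_rate: int,
    duration_s: float,
    target_peak: float = 0.5,
) -> List[float]:
    """Place time-stamped reactions on a shared timeline and sum them."""
    n = int(round(duration_s * sample_rate))
    mix = [0.0] * n
    for timestamp_s, samples in reactions:
        peak = max((abs(s) for s in samples), default=0.0)
        if peak == 0.0:
            continue  # silent reaction contributes nothing
        gain = target_peak / peak  # balance loud and quiet participants
        offset = round(timestamp_s * sample_rate)
        for i, s in enumerate(samples):
            j = offset + i
            if 0 <= j < n:
                mix[j] += gain * s
    # Prevent clipping when many reactions overlap.
    mix_peak = max((abs(s) for s in mix), default=0.0)
    if mix_peak > 1.0:
        mix = [s / mix_peak for s in mix]
    return mix
```

Because placement uses the capture timestamps rather than arrival order, reactions from clients with different network delays still line up with the moment in the presentation that prompted them.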