Patent classifications
G10L21/028
SYSTEMS AND METHODS FOR VISUALLY GUIDED AUDIO SEPARATION
A system for separating audio based on sound producing objects includes a processor configured to receive video data and audio data. The processor is also configured to perform object detection using the video data to identify a number of sound producing objects in the video data and predict a separation for each sound producing object detected in the video data. The processor is also configured to generate separated audio data for each sound producing object using the separation and the audio data.
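The abstract leaves the video-to-mask step unspecified; a minimal numpy sketch of the final stage, assuming per-object soft masks have already been predicted from the detected sound producing objects (the mask prediction itself is hypothetical here):

```python
import numpy as np

def separate_by_objects(mixture_spec, object_masks):
    """Apply per-object soft masks (assumed to be predicted from video
    object detection) to a mixture magnitude spectrogram, yielding one
    separated spectrogram per detected sound producing object."""
    stack = np.stack(object_masks)                      # (n_objects, freq, time)
    # Normalize so the masks partition the mixture at each bin.
    stack = stack / np.clip(stack.sum(axis=0), 1e-8, None)
    return [m * mixture_spec for m in stack]

# Toy example: two "detected" objects sharing a 4x5 mixture spectrogram.
mix = np.ones((4, 5))
masks = [np.full((4, 5), 0.75), np.full((4, 5), 0.25)]
separated = separate_by_objects(mix, masks)
```

Because the normalized masks sum to one per time-frequency bin, the separated spectrograms reconstruct the mixture exactly.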
System and method for data augmentation for multi-microphone signal processing
A method, computer program product, and computing system for receiving a signal from each microphone of a plurality of microphones, thus defining a plurality of signals. One or more inter-microphone gain-based augmentations may be performed on the plurality of signals, thus defining one or more inter-microphone gain-augmented signals.
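One plausible reading of an inter-microphone gain-based augmentation is a random per-channel rescaling that simulates microphone gain mismatch; a sketch under that assumption (the gain range and distribution are illustrative, not from the patent):

```python
import numpy as np

def inter_mic_gain_augment(signals, max_gain_db=6.0, rng=None):
    """Apply a random gain (in dB) to each microphone channel so the
    channels' relative levels differ, defining gain-augmented signals."""
    rng = rng or np.random.default_rng(0)
    gains_db = rng.uniform(-max_gain_db, max_gain_db, size=len(signals))
    gains = 10.0 ** (gains_db / 20.0)       # dB -> linear amplitude gain
    return [g * s for g, s in zip(gains, signals)], gains

# Three unit-amplitude microphone signals.
mics = [np.ones(8), np.ones(8), np.ones(8)]
augmented, gains = inter_mic_gain_augment(mics)
```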
DEVICE FOR DETECTING MUSIC DATA FROM VIDEO CONTENTS, AND METHOD FOR CONTROLLING SAME
A data processing method according to the present invention comprises the steps of: receiving an input of video contents including a video stream and an audio stream; detecting music data from the audio stream; and filtering the audio stream so that the music data detected from the audio stream is removed.
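The filtering step can be sketched as muting the sample ranges a detector flagged as music; the detector itself is assumed, and segments are given here as (start, end) times in seconds for illustration:

```python
import numpy as np

def remove_music(audio, music_segments, sample_rate):
    """Filter the audio stream by zeroing out sample ranges that a
    (hypothetical) music detector flagged as music."""
    out = audio.copy()
    for start, end in music_segments:
        out[int(start * sample_rate):int(end * sample_rate)] = 0.0
    return out

sr = 10                                  # toy sample rate
audio = np.ones(30)                      # 3 seconds of dummy audio
cleaned = remove_music(audio, [(1.0, 2.0)], sr)
```

A production system would cross-fade or subtract the separated music rather than hard-mute, but this shows the data flow the abstract describes.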
USER INTERFACE FOR SELECTIVE FILTERING OF SPEECH AND NOISE
An audio system can be controlled by a method that includes obtaining a mixture value from a user, where the mixture value has a value in a range from a first value for a first state to a second value for a second state, with the first state corresponding to a desired sound having substantially all of a first content and substantially nil amount of a second content, the second state corresponding to a desired sound having substantially nil amount of the first content and substantially all of the second content, and the mixture value being a selected one among multiple values in the range. The multiple values include an unprocessed mixture value for an unprocessed state corresponding to a desired sound having unprocessed first and second contents. The method can further include generating a control output signal based on the selected mixture value, and processing an audio signal based on the control output signal to generate a sound having the first content and/or the second content according to the selected mixture value.
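One way to realize the described mixture value is a single control in [0.0, 1.0] whose endpoints are the two all-one-content states and whose midpoint is the unprocessed state; the linear fade below is an assumption, since the abstract does not fix the mapping:

```python
def mixture_weights(mixture_value):
    """Map a user mixture value in [0.0, 1.0] to (first, second) content
    weights: 0.0 -> all first content (e.g. speech), 1.0 -> all second
    content (e.g. noise), 0.5 -> the unprocessed mixture (both at full
    level). Assumed linear interpolation on each half of the range."""
    if not 0.0 <= mixture_value <= 1.0:
        raise ValueError("mixture value must lie in [0.0, 1.0]")
    if mixture_value <= 0.5:
        return 1.0, mixture_value / 0.5        # fade the second content in
    return (1.0 - mixture_value) / 0.5, 1.0    # fade the first content out
```

The control output signal would then scale the separated first-content and second-content streams by these weights before summing.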
DEEP SOURCE SEPARATION ARCHITECTURE
A speech separation server comprises a deep-learning encoder with nonlinear activation. The encoder is programmed to take a mixture audio waveform in the time domain, learn generalized patterns from the mixture audio waveform, and generate an encoded representation that effectively characterizes the mixture audio waveform for speech separation.
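A time-domain encoder of this kind is commonly a strided framing of the waveform projected onto a learned filter basis with a nonlinearity; a numpy sketch under that assumption (the basis here is random rather than learned, and the architecture details are not from the patent):

```python
import numpy as np

def encode(waveform, basis, stride):
    """Frame the time-domain mixture waveform, project each frame onto a
    (hypothetically learned) filter basis, and apply a ReLU nonlinearity,
    producing an encoded representation of the mixture."""
    win = basis.shape[1]
    frames = [waveform[i:i + win]
              for i in range(0, len(waveform) - win + 1, stride)]
    z = np.stack(frames) @ basis.T           # (n_frames, n_filters)
    return np.maximum(z, 0.0)                # nonlinear activation

rng = np.random.default_rng(0)
basis = rng.standard_normal((16, 8))         # 16 filters, 8-sample window
codes = encode(rng.standard_normal(64), basis, stride=4)
```

A separation network would then estimate per-source masks over `codes`, and a matching decoder would return the masked codes to the time domain.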
Audio Processing Method, Method for Training Estimation Model, and Audio Processing System
An audio processing method by which input data are obtained that includes first sound data representing first components of a first frequency band, included in a first sound corresponding to a first sound source, second sound data representing second components of the first frequency band, included in a second sound corresponding to a second sound source, and mix sound data representing mix components of an input frequency band including a second frequency band, the mix components being included in a mix sound of the first sound and the second sound. The input data are then input to a trained estimation model, to generate at least one of first output data representing first estimated components within an output frequency band including the second frequency band, included in the first sound, or second output data representing second estimated components within the output frequency band, included in the second sound.
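The input-data assembly can be sketched as stacking the band-limited first-source and second-source components with the wider-band mixture; the shapes and concatenation order below are illustrative assumptions, and the trained estimation model itself is not shown:

```python
import numpy as np

def build_input(first_band, second_band, mix_spec, band_edge):
    """Assemble the estimation-model input described in the abstract:
    first-source and second-source components limited to the first
    frequency band (band_edge bins) plus the mixture over the full input
    frequency band. Layout is an assumption for illustration."""
    assert first_band.shape[0] == band_edge
    assert second_band.shape[0] == band_edge
    return np.concatenate([first_band, second_band, mix_spec], axis=0)

# Toy spectrogram-like arrays: 4-bin first band, 10-bin input band, 3 frames.
first = np.zeros((4, 3))
second = np.ones((4, 3))
mix = np.full((10, 3), 2.0)
features = build_input(first, second, mix, band_edge=4)
```

The model would map these features to estimated components of each source over the output frequency band, i.e. extending both sources into the second frequency band.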