Patent classifications
G10L25/00
Automatic isolation of multiple instruments from musical mixtures
A system, method and computer product for training a neural network system. The method comprises inputting an audio signal to the system to generate plural outputs f(X, Θ). The audio signal includes one or more of vocal content and/or musical instrument content, and each output f(X, Θ) corresponds to a respective one of the different content types. The method also comprises comparing individual outputs f(X, Θ) of the neural network system to corresponding target signals. For each compared output f(X, Θ), at least one parameter of the system is adjusted to reduce a result of the comparing performed for the output f(X, Θ), to train the system to estimate the different content types. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate various different types of vocal and/or instrument components of an audio signal, depending on which type of component(s) the system is trained to estimate.
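The compare-and-adjust loop in this abstract can be illustrated with a toy sketch. It is not the U-Net the abstract mentions: each output here is assumed to be a single learned gain applied to the input, the comparison is assumed to be a mean squared error, and the names `outputs`, `loss`, and `train_step` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def outputs(x, theta):
    """One output per content type: a soft gain in (0, 1) applied to the input."""
    return [[sigmoid(t) * v for v in x] for t in theta]

def loss(output, target):
    """Mean squared difference between one output and its target signal."""
    return sum((o - y) ** 2 for o, y in zip(output, target)) / len(target)

def train_step(x, targets, theta, lr=0.5):
    """Adjust each output's parameter to reduce that output's own loss."""
    new_theta = []
    for t, target in zip(theta, targets):
        s = sigmoid(t)
        # Chain rule: d(loss)/d(gain) times d(gain)/d(parameter).
        grad_gain = sum(2.0 * (s * v - y) * v for v, y in zip(x, target)) / len(x)
        new_theta.append(t - lr * grad_gain * s * (1.0 - s))
    return new_theta

# Stand-in data (assumed): the two targets are 80% and 20% of the mixture.
x = [1.0, 2.0, 3.0, 4.0]
targets = [[0.8 * v for v in x], [0.2 * v for v in x]]
theta = [0.0, 0.0]

before = [loss(o, y) for o, y in zip(outputs(x, theta), targets)]
for _ in range(200):
    theta = train_step(x, targets, theta)
after = [loss(o, y) for o, y in zip(outputs(x, theta), targets)]
```

A real system would replace the single gain with a network that predicts a full time-frequency mask per content type, but the per-output comparison and parameter update follow the same pattern.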
Method, apparatus for blind signal separating and electronic device
Disclosed are a method and an apparatus for blind signal separation and an electronic device. The method includes modeling a sound source with a complex Gaussian distribution to determine a probability density distribution of the sound source; updating a blind signal separation model based on the probability density distribution; and separating an audio signal with the updated blind signal separation model to obtain a plurality of separated output signals. In this way, the blind signal separation model may be updated through the probability density distribution of the sound source obtained based on the complex Gaussian distribution, thereby effectively improving the separation performance of a blind signal separation algorithm in specific scenarios.
Machine learning classifications of aphasia
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing aphasia assessment. One of the methods includes receiving a recording, generating a text transcript of the recording, and generating speech quantifying and comprehension scores which can be used to determine an aphasia classification. Another method includes performing an aphasia assessment on a brain image to obtain an aphasia classification.
Envelope encoding of speech signals for transmission to cutaneous actuators
A haptic communication device includes a speech signal generator configured to receive speech sounds or a textual message and generate speech signals corresponding to the speech sounds or the textual message. An envelope encoder is operably coupled to the speech signal generator to extract a temporal envelope from the speech signals. The temporal envelope represents changes in amplitude of the speech signals. Carrier signals having a periodic waveform are generated. Actuator signals are generated by encoding the changes in the amplitude of the speech signals from the temporal envelope into the carrier signals. One or more cutaneous actuators are operably coupled to the envelope encoder to generate haptic vibrations representing the speech sounds or the textual message using the actuator signals.
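The envelope-to-carrier encoding described above can be sketched as follows. The abstract does not say how the envelope is extracted or which carrier is used, so this sketch assumes rectification followed by a moving average, a 250 Hz sine carrier, and plain amplitude modulation; the function names are hypothetical.

```python
import math

def temporal_envelope(signal, window):
    """Rectify the signal, then smooth it with a moving average."""
    rectified = [abs(s) for s in signal]
    envelope = []
    running = 0.0
    for i, v in enumerate(rectified):
        running += v
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        envelope.append(running / min(i + 1, window))
    return envelope

def actuator_signal(envelope, carrier_hz, sample_rate):
    """Encode the envelope's amplitude changes into a periodic carrier."""
    return [
        e * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        for n, e in enumerate(envelope)
    ]

sample_rate = 8000
# Stand-in for speech (assumed): a 200 Hz tone that ramps up, then goes silent.
speech = [
    (n / 4000) * math.sin(2.0 * math.pi * 200 * n / sample_rate) if n < 4000 else 0.0
    for n in range(8000)
]
env = temporal_envelope(speech, window=160)
drive = actuator_signal(env, carrier_hz=250, sample_rate=sample_rate)
```

The resulting `drive` samples never exceed the envelope in magnitude, so the actuator vibrates at the carrier frequency while its intensity follows the loudness of the speech.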
Method and apparatus for improving call quality in noise environment
A voice signal processing method according to an embodiment of the present disclosure includes: acquiring a real-time near-end noise signal; acquiring a far-end voice signal according to an incoming call; measuring subjective speech quality and perceptual-objective speech quality of test signals generated based on a reference signal and the real-time near-end noise signal; selecting at least one speech quality enhancement method based on the subjective speech quality and the perceptual-objective speech quality, and determining parameters that are to be applied to the selected at least one speech quality enhancement method; and enhancing speech quality of the far-end voice signal by using the selected at least one speech quality enhancement method, based on the determined parameters, wherein the test signals are generated by mixing the acquired real-time near-end noise signal with the reference signal whose speech quality is enhanced by applying a combination of parameter values to speech quality enhancement methods.
Singing voice separation with deep u-net convolutional networks
A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
Singing voice separation with deep U-Net convolutional networks
A system, method and computer product for training a neural network system. The method comprises applying an audio signal to the neural network system, the audio signal including a vocal component and a non-vocal component. The method also comprises comparing an output of the neural network system to a target signal, and adjusting at least one parameter of the neural network system to reduce a result of the comparing, for training the neural network system to estimate one of the vocal component and the non-vocal component. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate vocal or instrumental components of an audio signal, depending on which type of component the system is trained to estimate.
Vocal feedback device and method of use
A vocal feedback device comprises a microphone, a fundamental frequency accentuator electrically connected to the microphone, a delay circuit electrically connected to the fundamental frequency accentuator, and a speaker electrically connected to the delay circuit. The device is configured to convert vocal utterances received at the microphone into an electrical signal, impose a time delay before transmitting the electrical signal, transmit the electrical signal to the speaker after the time delay, and convert the electrical signal to an audio signal using the speaker, the audio signal being a replication of the vocal utterances.
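The delay stage of such a device can be modeled in software as a simple delay line. This sketch assumes a digital, sample-based signal path and leaves out the fundamental frequency accentuator; the abstract describes a circuit, and the name `delayed_playback` is hypothetical.

```python
from collections import deque

def delayed_playback(samples, delay_samples):
    """Return the input shifted later by delay_samples, padded with silence."""
    line = deque([0.0] * delay_samples)  # the delay line starts out silent
    out = []
    for s in samples:
        line.append(s)              # newest microphone sample enters the line
        out.append(line.popleft())  # oldest sample leaves toward the speaker
    return out

mic = [0.1, 0.5, -0.3, 0.2, 0.0, 0.0]
speaker = delayed_playback(mic, delay_samples=2)
```

At a given sample rate, `delay_samples` would be the desired delay in seconds multiplied by that rate.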
Theme detection for object-recognition-based notifications
In certain embodiments, speech is converted to text for theme identification by natural language processing. Notification data is generated based on detected themes, and the notification data may include rules for notification presentation on a client device. The notification data may include parameters for processing image data captured by an augmented reality device to detect one or more objects. The objects may be associated with the theme, and their detection within captured image data, in accordance with other rules, may cause the augmented reality device to present a notification with contextual relevance to a current environment of a user utilizing the augmented reality device.
Systems and methods for switching operational modes based on audio triggers
Systems and methods are provided for enabling different modes of operation based on a detected audio trigger. The systems and methods may generate an audio signature for a detected first sound and compare the audio signature with a plurality of registered audio signatures. In response to determining that the audio signature matches a first registered audio signature, the systems and methods may enable a first operational mode for a device that enables a first plurality of commands. In response to determining that the audio signature matches a second registered audio signature, the systems and methods may enable a second operational mode for a device that enables a second plurality of commands, where the second plurality of commands are different from the first plurality of commands.
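The matching flow described above can be sketched in a few lines. The abstract does not specify how signatures are computed or compared, so the signature here is assumed to be the normalized energy of four equal-length frames, the match is assumed to be a maximum-difference test against a tolerance, and the mode names and command sets are invented for illustration.

```python
def audio_signature(samples, frames=4):
    """Hypothetical signature: share of total energy in each equal-length frame."""
    size = len(samples) // frames
    energies = [
        sum(s * s for s in samples[i * size:(i + 1) * size]) for i in range(frames)
    ]
    total = sum(energies) or 1.0  # avoid dividing by zero on silence
    return [e / total for e in energies]

def match_mode(signature, registered, tolerance=0.1):
    """Return the closest registered mode within tolerance, or None."""
    best_name, best_dist = None, tolerance
    for name, (reference, _commands) in registered.items():
        dist = max(abs(a - b) for a, b in zip(signature, reference))
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name

# Illustrative registrations (assumed): each mode enables a different command set.
registered = {
    "kitchen": ([0.7, 0.1, 0.1, 0.1], {"set timer", "read recipe"}),
    "driving": ([0.1, 0.1, 0.1, 0.7], {"navigate", "call"}),
}

loud_start = [1.0] * 8 + [0.3] * 24   # energy concentrated in the first frame
mode = match_mode(audio_signature(loud_start), registered)
commands = registered[mode][1] if mode else set()
unknown = match_mode(audio_signature([1.0] * 32), registered)
```

A sound whose signature matches no registered entry leaves the device in its current mode, which the sketch represents by returning `None`.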