Patent classifications
G10L21/12
Method and System for Detecting Anomalous Sound
A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
Method and System for Detecting Anomalous Sound
A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
Audio techniques for music content generation
Techniques are disclosed relating to implementing audio techniques for real-time audio generation. For example, a music generator system may generate new music content from playback music content based on different parameter representations of an audio signal. In some cases, an audio signal can be represented by both a graph of the signal (e.g., an audio signal graph) relative to time and a graph of the signal relative to beats (e.g., a signal graph). The signal graph is invariant to tempo, which allows for tempo invariant modification of audio parameters of the music content in addition to tempo variant modifications based on the audio signal graph.
Audio techniques for music content generation
Techniques are disclosed relating to implementing audio techniques for real-time audio generation. For example, a music generator system may generate new music content from playback music content based on different parameter representations of an audio signal. In some cases, an audio signal can be represented by both a graph of the signal (e.g., an audio signal graph) relative to time and a graph of the signal relative to beats (e.g., a signal graph). The signal graph is invariant to tempo, which allows for tempo invariant modification of audio parameters of the music content in addition to tempo variant modifications based on the audio signal graph.
VOICE EVALUATION SYSTEM, VOICE EVALUATION METHOD, AND COMPUTER PROGRAM
A voice evaluation system includes: an acquisition unit that obtains voice uttered by a group of a plurality of persons; a detection unit that detects an element corresponding to a feeling from the obtained voice; and an evaluation unit that evaluates the obtained voice on the basis of the detected element. According to such a voice evaluation system, it is possible to properly evaluate the voice uttered by the group. For example, it is possible to properly evaluate the feelings as a whole group by using the voice of the group.
VOICE EVALUATION SYSTEM, VOICE EVALUATION METHOD, AND COMPUTER PROGRAM
A voice evaluation system includes: an acquisition unit that obtains voice uttered by a group of a plurality of persons; a detection unit that detects an element corresponding to a feeling from the obtained voice; and an evaluation unit that evaluates the obtained voice on the basis of the detected element. According to such a voice evaluation system, it is possible to properly evaluate the voice uttered by the group. For example, it is possible to properly evaluate the feelings as a whole group by using the voice of the group.
Method and system for learning and using latent-space representations of audio signals for audio content-based retrieval
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
Method and system for learning and using latent-space representations of audio signals for audio content-based retrieval
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which take care of selecting salient attributes of the signals that represent psychoacoustic differences between the signals.
METHODS AND SYSTEMS FOR COMPUTER-GENERATED VISUALIZATION OF SPEECH
Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.
METHODS AND SYSTEMS FOR COMPUTER-GENERATED VISUALIZATION OF SPEECH
Methods, systems and apparatuses for computer-generated visualization of speech are described herein. An example method of computer-generated visualization of speech including at least one segment includes: generating a graphical representation of an object corresponding to a segment of the speech; and displaying the graphical representation of the object on a screen of a computing device. Generating the graphical representation includes: representing a duration of the respective segment by a length of the object and representing intensity of the respective segment by a width of the object; and placing, in the graphical representation, a space between adjacent objects.