G10L21/14

SOUND PROCESSING METHOD USING DJ TRANSFORM
20230215456 · 2023-07-06 ·

Provided is a sound processing method performed by a computer, the method comprising: generating, for an input sound, a DJ transform spectrogram indicating estimated pure-tone amplitudes at a plurality of time points for respective frequencies corresponding to the natural frequencies of a plurality of springs, by modeling an oscillation motion of the plurality of springs having different natural frequencies and calculating the estimated pure-tone amplitudes for the respective natural frequencies; calculating degrees of fundamental-frequency suitability based on a moving average or a moving standard deviation of the estimated pure-tone amplitudes with respect to each natural frequency of the DJ transform spectrogram; and extracting the fundamental frequency based on local maximum values of the degrees of fundamental-frequency suitability for the respective natural frequencies at each of the plurality of time points.
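Purely as an illustrative sketch (not the patented DJ transform itself), the three steps above might look as follows; the spring-bank integrator, the damping constant, the moving-average form of the suitability, and all function names are assumptions:

```python
import numpy as np

def driven_spring_spectrogram(signal, sample_rate, freqs, damping=30.0):
    """Amplitude envelope of one damped spring per natural frequency,
    driven by the input signal (semi-implicit Euler integration)."""
    dt = 1.0 / sample_rate
    spec = np.zeros((len(freqs), len(signal)))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f              # natural angular frequency of this spring
        x = v = 0.0
        for n, s in enumerate(signal):
            a = s - damping * v - w * w * x   # forced, damped oscillation
            v += a * dt
            x += v * dt
            spec[i, n] = abs(x)          # estimated pure-tone amplitude
    return spec

def suitability(spec, window=64):
    """Degree of fundamental-frequency suitability per natural frequency:
    here, a moving average of the estimated amplitudes over time."""
    kernel = np.ones(window) / window
    return np.array([np.convolve(row, kernel, mode="same") for row in spec])

def extract_fundamental(suit, freqs):
    """At each time point, pick the frequency bin that is a local maximum
    of the suitability with the largest value."""
    n_f, n_t = suit.shape
    out = np.empty(n_t)
    for t in range(n_t):
        col = suit[:, t]
        peaks = [i for i in range(1, n_f - 1)
                 if col[i] > col[i - 1] and col[i] >= col[i + 1]]
        out[t] = freqs[peaks[np.argmax(col[peaks])]] if peaks else freqs[np.argmax(col)]
    return out
```

With a pure-tone input, the spring whose natural frequency matches the tone resonates, so its suitability should dominate once the oscillation has built up.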

AUDIO INFORMATION PROCESSING METHOD, APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM
20220406311 · 2022-12-22 ·

The present disclosure relates to an audio information processing method, an apparatus, an electronic device and a computer-readable storage medium. The audio information processing method includes: determining whether an audio recording start condition is satisfied; collecting audio information associated with an electronic device in response to determining that the audio recording start condition is satisfied; performing word segmentation on text information corresponding to the audio information to obtain word-segmented text information; and displaying the word-segmented text information on a user interface of the electronic device.
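A minimal sketch of the flow above, assuming a trivial start condition and a whitespace/punctuation splitter standing in for a real word segmenter (all names here are hypothetical, and transcription of audio to text is omitted):

```python
import re

def audio_recording_started(user_pressed_record, mic_available):
    # Hypothetical start condition: an explicit user action plus an
    # available input device.
    return user_pressed_record and mic_available

def segment_words(text):
    # Stand-in segmenter: split the transcribed text on anything that is
    # not a word character or apostrophe. A real system (especially for
    # unsegmented scripts) would use a dictionary- or model-based tokenizer.
    return [w for w in re.split(r"[^\w']+", text) if w]

def display_segmented(words):
    # Render the word-segmented text for a user interface as one string.
    return " / ".join(words)
```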

SPECTRUM ALGORITHM WITH TRAIL RENDERER
20220405982 · 2022-12-22 ·

Systems and methods for rendering motion-audio visualizations to a display are described. More specifically, video data and audio data are obtained. A position of a target object in each of one or more video frames of the video data is determined. Additionally, a frequency spectrum of the audio data is determined for a predetermined time period. Audio visualizations for the predetermined time period are determined based on the frequency spectrum. A rendered video is generated by applying the audio visualizations at the position of the target object in the one or more video frames for the predetermined time period.
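One way this pipeline might be sketched, using an FFT band average for the frequency spectrum and a 2-D grayscale array standing in for a video frame (the bar-style visualization and all names are assumptions):

```python
import numpy as np

def frame_spectrum(audio_chunk, n_bars=8):
    # Average FFT magnitude per band: one value per visualization bar.
    mags = np.abs(np.fft.rfft(audio_chunk))
    return np.array([b.mean() for b in np.array_split(mags, n_bars)])

def render_bars(frame, pos, spectrum, max_height=20):
    """Draw vertical bars at the target object's position (x0, y0);
    the frame is a 2-D grayscale array indexed [row, col]."""
    out = frame.copy()
    x0, y0 = pos
    heights = (spectrum / (spectrum.max() + 1e-9) * max_height).astype(int)
    for i, h in enumerate(heights):
        out[max(0, y0 - h):y0, x0 + i] = 1.0   # one column per band
    return out
```

Repeating this per frame, with the bars re-anchored at the tracked object position, yields the moving visualization trail.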

METHODS AND SYSTEMS FOR MANIPULATING AUDIO PROPERTIES OF OBJECTS
20230029775 · 2023-02-02 ·

In one implementation, a method of changing an audio property of an object is performed at a device including one or more processors coupled to non-transitory memory. The method includes displaying, using a display, a representation of a scene including a representation of an object associated with an audio property. The method includes displaying, using the display, in association with the representation of the object, a manipulator indicating a value of the audio property. The method includes receiving, using one or more input devices, a user input interacting with the manipulator. The method includes, in response to receiving the user input, changing the value of the audio property based on the user input and displaying, using the display, the manipulator indicating the changed value of the audio property.
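A toy sketch of such a manipulator, assuming a scalar audio property with a clamped range and a text label standing in for the displayed widget (class and method names are hypothetical):

```python
class AudioPropertyManipulator:
    """A manipulator shown in association with an object's representation
    that displays and edits one audio property (e.g. volume)."""
    def __init__(self, name, value, lo=0.0, hi=1.0):
        self.name, self.value, self.lo, self.hi = name, value, lo, hi

    def label(self):
        # What the display would show next to the object.
        return f"{self.name}: {self.value:.2f}"

    def handle_input(self, delta):
        # In response to user input, change the value (clamped to the
        # property's range) and return the updated display label.
        self.value = min(self.hi, max(self.lo, self.value + delta))
        return self.label()
```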

Method and System for Detecting Anomalous Sound

A system and method for detecting anomalous sound are disclosed. The method includes receiving a spectrogram of an audio signal with elements defined by values in a time-frequency domain of the spectrogram. Each of the values corresponds to an element of the spectrogram that is identified by a coordinate in the time-frequency domain. The time-frequency domain of the spectrogram is partitioned into a context region and a target region. The context region and the target region are processed by a neural network using an attentive neural process to recover values of the spectrogram for elements with coordinates in the target region. The recovered values of the elements of the target region are compared with values of elements of the partitioned target region. An anomaly score is determined based on the comparison. The anomaly score is used for performing a control action.
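A schematic of the scoring step, with a per-frequency context mean standing in for the attentive neural process (the real predictor is a trained neural network; everything here is a simplification):

```python
import numpy as np

def anomaly_score(spectrogram, target_cols):
    """Partition the time axis into context and target regions, 'recover'
    the target values from the context region (here: per-frequency mean),
    and score the discrepancy against the actual target values."""
    mask = np.zeros(spectrogram.shape[1], dtype=bool)
    mask[target_cols] = True                       # target region (time indices)
    context = spectrogram[:, ~mask]                # everything else is context
    recovered = np.repeat(context.mean(axis=1, keepdims=True),
                          mask.sum(), axis=1)      # stand-in prediction
    return float(((recovered - spectrogram[:, mask]) ** 2).mean())
```

A control action would then be triggered when the score exceeds a chosen threshold.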

Method and system for analyzing customer calls by implementing a machine learning model to identify emotions

A method and system are provided for identifying voice emotions contained in the audio of a customer-support call between a customer and a service agent, by implementing an emotion identification application that identifies emotions captured in the voice of the customer from audio received by a media streaming device. The method includes: receiving, by the emotion identification application, an audio stream of a series of voice samples contained in consecutive frames of the received audio; extracting, by the emotion identification application, a set of voice emotion features from each frame of each voice sample by applying a trained machine learning (ML) model that uses a neural network to determine one or more voice emotions from a configured set of voice emotion features captured in each voice sample; and classifying, by the emotion identification application, each emotion determined by the trained ML model based on a set of classifying features, to label one or more types of emotions captured in each voice sample.
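An illustrative stand-in for the feature-extraction and classification steps, using two hand-crafted frame features and a nearest-centroid rule in place of the trained neural network (all names, features, and centroids are assumptions):

```python
import numpy as np

def frame_features(frame):
    # Simple per-frame voice features standing in for a learned feature set:
    # short-time energy and zero-crossing rate.
    energy = float(np.mean(frame ** 2))
    zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return np.array([energy, zcr])

class NearestCentroidEmotion:
    """Stand-in for the trained ML model: label each voice sample with the
    emotion whose feature centroid is closest to the sample's mean features."""
    def __init__(self, centroids):
        self.centroids = centroids   # {emotion label: feature vector}

    def classify(self, frames):
        feats = np.mean([frame_features(f) for f in frames], axis=0)
        return min(self.centroids,
                   key=lambda k: np.linalg.norm(feats - self.centroids[k]))
```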
