G10H2250/235

AUDIO DETECTION METHOD AND APPARATUS, COMPUTER DEVICE, AND READABLE STORAGE MEDIUM
20230050565 · 2023-02-16

This application provides an audio detection method performed by a computer device. The method includes: acquiring a target time point and a reference point of the target time point from target audio data; performing energy evaluation on the target time point according to an audio amplitude value of the target time point to obtain an energy evaluation value of the target time point; performing energy evaluation on the reference point according to an audio amplitude value of the reference point to obtain an energy evaluation value of the reference point; performing accuracy verification on the target time point according to the energy evaluation value of the target time point and the energy evaluation value of the reference point; and if the accuracy verification on the target time point succeeds, adding the target time point as a target stress point into a target stress point set.
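The verification step can be sketched roughly as follows. The abstract does not say how energy is evaluated or what the pass criterion is, so this sketch assumes mean squared amplitude over a small window and a simple ratio test. The names `energy_at`, `verify_stress_points`, `half_window`, and `ratio` are hypothetical.

```python
from typing import List, Sequence, Tuple


def energy_at(samples: Sequence[float], index: int, half_window: int = 2) -> float:
    """Mean squared amplitude in a window centred on `index`.

    Assumption: the abstract only says energy is derived from the amplitude
    value; the window and the squaring are illustrative choices.
    """
    lo = max(0, index - half_window)
    hi = min(len(samples), index + half_window + 1)
    window = samples[lo:hi]
    return sum(x * x for x in window) / len(window)


def verify_stress_points(
    samples: Sequence[float],
    candidates: Sequence[Tuple[int, int]],
    ratio: float = 2.0,
) -> List[int]:
    """Keep each target index whose energy clearly exceeds its reference.

    `candidates` holds (target_index, reference_index) pairs. The ratio test
    is an assumed stand-in for the unspecified accuracy verification.
    """
    stress_points = []
    for target, reference in candidates:
        e_target = energy_at(samples, target)
        e_reference = energy_at(samples, reference)
        if e_target > ratio * e_reference:
            stress_points.append(target)
    return stress_points


audio = [0.0, 0.1, 0.0, 0.1, 0.0, 0.9, 1.0, 0.8, 0.1, 0.0, 0.1, 0.0]
print(verify_stress_points(audio, [(6, 1), (10, 1)]))
```

Here the candidate at index 6 sits on a loud burst and passes, while the candidate at index 10 has the same energy as its reference and is rejected.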

Media content identification on mobile devices

A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.

Audio Generation Methods and Systems

A method of generating audio assets, comprising the steps of: receiving an input audio asset having a first duration, generating an input image representative of the input audio asset, training a generative model on the input image and implementing the trained generative model to generate an output image representative of an output audio asset having a second duration different to the first duration, and generating the output audio asset based on the output image.

Effect addition device, effect addition method and storage medium
11694663 · 2023-07-04

An effect addition device includes at least one processor that executes a time domain convolution process of convolving a first time domain data part of impulse response data of a sound effect with time domain data of an original sound, a frequency domain convolution process of convolving a second time domain data part of the impulse response data with the time domain data of the original sound, a convolution extension process of extending a convolved state(s) of an output signal(s) resulting from the time domain convolution process and/or the frequency domain convolution process by arithmetic processing which corresponds to an all-pass filter and/or arithmetic processing which corresponds to a comb filter, and a synthesized sound effect addition process of adding a sound effect which is synthesized by execution of the time domain convolution process, the frequency domain convolution process and the convolution extension process to the original sound.
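The split between the two convolution processes can be illustrated with a small sketch. It assumes the impulse response is cut at a hypothetical index `split`: the head is convolved directly in the time domain and the tail through a textbook radix-2 FFT. The all-pass and comb extension stage is not modelled, and all function names are illustrative rather than taken from the patent.

```python
import cmath
from typing import List, Sequence


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two (unnormalised)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Time domain convolution by the defining double sum."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def fft_convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Frequency domain convolution: zero-pad, multiply spectra, invert."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2
    fa = fft([complex(v) for v in a] + [0j] * (n - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (n - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    result = fft(product, inverse=True)
    return [v.real / n for v in result[:size]]


def hybrid_convolve(signal: Sequence[float], ir: Sequence[float], split: int) -> List[float]:
    """Head of the impulse response in the time domain, tail via FFT.

    Requires 0 < split < len(ir). The tail output is delayed by `split`
    samples before the two partial results are summed.
    """
    head, tail = ir[:split], ir[split:]
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, v in enumerate(direct_convolve(signal, head)):
        out[i] += v
    for i, v in enumerate(fft_convolve(signal, tail)):
        out[i + split] += v
    return out


dry = [1.0, 2.0, 3.0, 4.0]
impulse_response = [1.0, 0.5, 0.25, 0.125, 0.0625]
print(hybrid_convolve(dry, impulse_response, split=2))
```

Because convolution is linear, the sum of the two partial results equals the full convolution up to floating-point error. A common reason for such a split is that a short head can be processed directly with low latency, while a long tail is cheaper in the frequency domain.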

INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM

Provided is an information processing device including an acquisition unit that acquires subjective evaluation information from a second user about each performance performed by movement of at least part of a body of a first user, a learning unit that performs machine learning on a relationship between each performance and the corresponding subjective evaluation information and generates relationship information between each performance and the corresponding subjective evaluation information, and a presentation unit that presents feedback information to the second user based on the relationship information.

Method for detecting audio signal beat points of bass drum, and terminal
11527257 · 2022-12-13

A method for detecting audio signal beat points of a bass drum, and a terminal. The method comprises: acquiring several intrinsic mode functions based on an inputted audio signal to be detected; calculating instantaneous signals, wherein the instantaneous signals include instantaneous strength signals and instantaneous frequency signals corresponding to the several intrinsic mode functions; acquiring characteristic signals of the bass drum based on the instantaneous strength signals and the instantaneous frequency signals corresponding to the several intrinsic mode functions; performing peak detection on the characteristic signals to acquire a plurality of peak points; and acquiring the beat points of the bass drum based on the plurality of peak points.

METHOD AND ELECTRONIC DEVICE FOR RECOGNIZING SONG, AND STORAGE MEDIUM
20220366880 · 2022-11-17

A method for recognizing a song, including: acquiring a target song segment and transforming the target song segment to generate a corresponding first spectrum map; generating a multi-dimensional first feature vector according to the first spectrum map and a preset neural network model; acquiring second feature vectors of pre-stored songs, wherein one pre-stored song is divided into a plurality of pre-stored song segments, one pre-stored song segment corresponds to one second feature vector, and the first feature vector and the second feature vectors have the same number of dimensions; calculating similarities between the first feature vector and the second feature vectors, and determining a maximum similarity; and determining that the target song segment and a pre-stored song corresponding to the maximum similarity are different versions of the same song in response to the maximum similarity being greater than a preset threshold.
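The matching step at the end of this method can be sketched as below. The abstract does not name the similarity measure, so cosine similarity is an assumption, as are the names `cosine_similarity` and `match_song` and the threshold value. The feature vectors are taken as given rather than produced by a neural network.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_song(
    query: Sequence[float],
    library: Dict[str, List[Sequence[float]]],
    threshold: float = 0.9,
) -> Tuple[Optional[str], float]:
    """Return (song, similarity) for the best segment, or (None, similarity).

    `library` maps a song name to one feature vector per pre-stored segment.
    A match is reported only when the maximum similarity exceeds `threshold`.
    """
    best_song: Optional[str] = None
    best_similarity = -1.0
    for song, segment_vectors in library.items():
        for vector in segment_vectors:
            similarity = cosine_similarity(query, vector)
            if similarity > best_similarity:
                best_song, best_similarity = song, similarity
    if best_similarity > threshold:
        return best_song, best_similarity
    return None, best_similarity


library = {
    "song_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "song_b": [[0.0, 0.0, 1.0]],
}
print(match_song([0.9, 0.1, 0.0], library))
```

Storing one vector per segment means a short query clip only needs to resemble some part of a stored song, which is what lets a cover or live version match the original.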

METHOD AND DEVICE FOR FLATTENING POWER OF MUSICAL SOUND SIGNAL, AND METHOD AND DEVICE FOR DETECTING BEAT TIMING OF MUSICAL PIECE
20220351707 · 2022-11-03

A method for flattening power of a musical sound signal, said method being characterized by comprising: determining second values corresponding to respective first values indicating power at a plurality of time points of a musical sound signal each on the basis of the result of a comparison between the present value of the first value and the present value of the second value; and flattening the plurality of first values using the second values corresponding to the plurality of first values, respectively, wherein the second value changes while drawing a predetermined trajectory when, in the result of the comparison, a state where the present value of the second value is larger than the present value of the first value continues.
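One way to read this abstract is as an envelope follower. The sketch below makes several assumptions the abstract does not state: the second value jumps up to the power when the power reaches it, the "predetermined trajectory" is an exponential decay, and flattening means dividing each power value by its second value. The name `flatten_power` and the parameters `decay` and `floor` are hypothetical.

```python
from typing import List, Sequence


def flatten_power(power: Sequence[float], decay: float = 0.9, floor: float = 1e-9) -> List[float]:
    """Normalise each power value (the 'first value') by a tracked envelope.

    The envelope plays the role of the 'second value'. While it stays above
    the incoming power it follows an assumed exponential decay; otherwise it
    is set to the power itself.
    """
    envelope = 0.0
    flattened = []
    for p in power:
        if p >= envelope:
            # Power has caught up with the envelope: follow it upward.
            envelope = p
        else:
            # Envelope is still larger: move along the decay trajectory,
            # but never drop below the current power.
            envelope = max(envelope * decay, p)
        flattened.append(p / max(envelope, floor))
    return flattened


print(flatten_power([1.0, 0.5, 4.0, 2.0]))
```

Under these assumptions a loud passage and a quiet passage both produce values near 1.0 at their local peaks, which evens out the input to a later beat detection stage.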

AUTOMATIC ISOLATION OF MULTIPLE INSTRUMENTS FROM MUSICAL MIXTURES

A system, method and computer product for training a neural network system. The method comprises inputting an audio signal to the system to generate plural outputs f(X, Θ). The audio signal includes one or more of vocal content and/or musical instrument content, and each output f(X, Θ) corresponds to a respective one of the different content types. The method also comprises comparing individual outputs f(X, Θ) of the neural network system to corresponding target signals. For each compared output f(X, Θ), at least one parameter of the system is adjusted to reduce a result of the comparing performed for the output f(X, Θ), to train the system to estimate the different content types. In one example embodiment, the system comprises a U-Net architecture. After training, the system can estimate various different types of vocal and/or instrument components of an audio signal, depending on which type of component(s) the system is trained to estimate.

Audio techniques for music content generation

Techniques are disclosed relating to implementing audio techniques for real-time audio generation. For example, a music generator system may generate new music content from playback music content based on different parameter representations of an audio signal. In some cases, an audio signal can be represented by both a graph of the signal (e.g., an audio signal graph) relative to time and a graph of the signal relative to beats (e.g., a signal graph). The signal graph is invariant to tempo, which allows for tempo invariant modification of audio parameters of the music content in addition to tempo variant modifications based on the audio signal graph.
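The tempo invariance of a beat-indexed representation can be shown with a small sketch. It assumes a parameter curve stored as (beat, value) breakpoints with linear interpolation between them. The names `beats_to_seconds` and `value_at_time` are hypothetical and are not part of the disclosed system.

```python
from typing import Sequence, Tuple


def beats_to_seconds(beats: float, bpm: float) -> float:
    """Convert a position in beats to seconds at a constant tempo."""
    return beats * 60.0 / bpm


def value_at_time(curve: Sequence[Tuple[float, float]], t_seconds: float, bpm: float) -> float:
    """Evaluate a beat-indexed parameter curve at a wall-clock time.

    `curve` is a list of (beat, value) breakpoints with strictly increasing
    beat positions. Values are held constant outside the breakpoint range.
    """
    beat = t_seconds * bpm / 60.0
    if beat <= curve[0][0]:
        return curve[0][1]
    for (b0, v0), (b1, v1) in zip(curve, curve[1:]):
        if b0 <= beat <= b1:
            fraction = (beat - b0) / (b1 - b0)
            return v0 + fraction * (v1 - v0)
    return curve[-1][1]


# A filter sweep defined over one four-beat bar.
sweep = [(0.0, 0.0), (4.0, 1.0)]
print(value_at_time(sweep, 1.0, bpm=120.0))  # beat 2 at 120 BPM
print(value_at_time(sweep, 2.0, bpm=60.0))   # beat 2 at 60 BPM
```

Both calls land on beat 2 and return the same value, so the curve keeps its musical position when the tempo changes. A curve indexed by seconds would instead drift relative to the beat.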