G10H2250/031

SOUND PROCESSING METHOD, SOUND PROCESSING SYSTEM, AND RECORDING MEDIUM
20240265902 · 2024-08-08

A sound processing method includes: generating with a trained generative model, for each of a plurality of time points including a first time point, a first acoustic feature amount of a target sound to be generated, by sequentially processing input data including condition data representing conditions of the target sound; generating, for each of the plurality of time points, a time-domain waveform signal representing a waveform of the target sound based on the first acoustic feature amount; and generating, for each of the plurality of time points, a second acoustic feature amount based on the time-domain waveform signal. The input data at the first time point includes the second acoustic feature amount generated before the first time point.
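The feedback structure described in this abstract can be sketched as a simple loop. This is only an illustration under assumptions: `model`, `vocoder`, and `analyzer` are hypothetical callables standing in for the trained generative model, the waveform generator, and the feature extractor, and features are plain lists of floats; none of these names come from the patent.

```python
from typing import Callable, List, Sequence

Feature = List[float]


def run_feedback_loop(
    conditions: Sequence[Feature],
    model: Callable[[Feature], Feature],
    vocoder: Callable[[Feature], List[float]],
    analyzer: Callable[[List[float]], Feature],
    initial_feedback: Feature,
) -> List[float]:
    """Generate a waveform step by step, feeding analyzed features back in."""
    feedback = list(initial_feedback)
    waveform: List[float] = []
    for condition in conditions:
        # Input data = condition data plus the second feature amount
        # obtained at an earlier time point (hypothetical concatenation).
        first_feature = model(list(condition) + feedback)
        # Time-domain signal generated from the first feature amount.
        frame = vocoder(first_feature)
        waveform.extend(frame)
        # Second feature amount, re-derived from the time-domain signal.
        feedback = analyzer(frame)
    return waveform
```

The point of the design is that the model at each time point sees features measured from the actual output signal rather than only its own earlier predictions.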

Signal processing method and signal processing apparatus
10134374 · 2018-11-20

A signal processing method includes a first specifying step and a first modifying step. In the first specifying step, a first modification object section for a singing voice of a piece of music is specified based on a temporal change of pitch of singing voice data representing the singing voice or a temporal change of pitch in a score of the music. In the first modifying step, a modifying process is performed on the singing voice data. The modifying process modifies at least one of the temporal change of pitch and the temporal change of volume of the singing voice in the first modification object section specified by the first specifying step.

AUDIO DATA PROCESSING METHOD AND APPARATUS

An audio data processing method and apparatus are provided. The method includes obtaining audio data. An overall spectrum of the audio data is obtained and separated into a singing voice spectrum and an accompaniment spectrum. An accompaniment binary mask of the audio data is calculated according to the audio data. The singing voice spectrum and the accompaniment spectrum are processed using the accompaniment binary mask, to obtain accompaniment data and singing voice data.
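As a rough illustration of what a binary mask does, the sketch below applies generic binary time-frequency masking to one frame of magnitude values. It is not the procedure claimed in the abstract: here the mask is assumed to come from comparing two hypothetical per-bin estimates (`voice_est`, `accomp_est`), and all function names are invented for the example.

```python
from typing import List, Sequence, Tuple


def binary_mask(voice_est: Sequence[float], accomp_est: Sequence[float]) -> List[int]:
    """Return 1 for bins where the accompaniment estimate dominates, else 0."""
    return [1 if a > v else 0 for v, a in zip(voice_est, accomp_est)]


def apply_mask(mixture: Sequence[float], mask: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split mixture magnitudes into (voice, accompaniment) using the mask."""
    accomp = [m * x for m, x in zip(mask, mixture)]
    voice = [(1 - m) * x for m, x in zip(mask, mixture)]
    return voice, accomp
```

Each frequency bin is assigned wholly to one source, so the two outputs always sum back to the mixture.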

MODELING OF THE LATENT EMBEDDING OF MUSIC USING DEEP NEURAL NETWORK
20180276540 · 2018-09-27

Methods and systems are provided for detecting and cataloging qualities in music. Because both the volume and the heterogeneity of digital music content are huge, it has become increasingly important and convenient to build a recommendation or search system that surfaces this content to the user or consumer community. Embodiments use a deep convolutional neural network to imitate how the human brain processes hierarchical structures in auditory signals, such as music and speech, at various timescales. This approach can be used to discover latent factor models of the music based on acoustic hyper-images extracted from the raw audio waves of the music. These latent embeddings can be used as features for subsequent models, such as collaborative filtering, to build similarity metrics between songs, or to classify music based on training labels such as genre, mood, or sentiment.

ELECTRONIC MEASURING DEVICE
20180277083 · 2018-09-27

An electronic measuring device captures a plurality of audio samples, wherein each audio sample corresponds to a different string of a musical instrument. The device further identifies a plurality of frequency components of each of the plurality of audio samples, calculates an optimal tuning curve based on the plurality of frequency components of each of the plurality of audio samples, and determines a deviation of the plurality of frequency components of each of the plurality of audio samples from the optimal tuning curve.
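The deviation step can be illustrated with the standard cents formula, 1200 · log2(f / f_target). The sketch below assumes the target frequencies of the tuning curve are already known and that each string is reduced to a single measured frequency. How the patent derives its optimal tuning curve is not stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import Dict


def deviation_cents(measured_hz: float, target_hz: float) -> float:
    """Signed pitch deviation in cents (100 cents = one equal-tempered semitone)."""
    return 1200 * math.log2(measured_hz / target_hz)


def string_deviations(measured: Dict[str, float], targets: Dict[str, float]) -> Dict[str, float]:
    """Deviation of each string's measured frequency from its target frequency."""
    return {name: deviation_cents(measured[name], targets[name]) for name in measured}
```

A positive value means the string is sharp relative to the target, and a negative value means it is flat.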

Method and system for speech-to-singing voice conversion
10008193 · 2018-06-26

A singing voice conversion system configured to generate a song in the voice of a target singer based on a song in the voice of a source singer is disclosed. The embodiment utilizes two complementary approaches to voice timbre conversion. Both combine the natural prosody of a source singer with the pitch of the target singer (typically the user of the system) to achieve realistic-sounding synthetic singing. The system is able to transpose the key of any song to match the automatically determined or desired pitch range of the target singer, thus allowing the system to generalize to any target singer, irrespective of their gender, natural pitch range, and the original pitch range of the song to be sung.
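Key transposition to fit a singer's range can be sketched as picking a whole-semitone shift. The abstract does not say how the shift is chosen, so the rule below (aligning the geometric centers of the two pitch ranges) is purely an assumption, as are the function names.

```python
import math


def semitone_shift(song_low_hz: float, song_high_hz: float,
                   singer_low_hz: float, singer_high_hz: float) -> int:
    """Whole-semitone shift that moves the song's range center onto the singer's."""
    # Geometric means are used because pitch is perceived on a log-frequency scale.
    song_center = math.sqrt(song_low_hz * song_high_hz)
    singer_center = math.sqrt(singer_low_hz * singer_high_hz)
    return round(12 * math.log2(singer_center / song_center))


def transpose(freq_hz: float, semitones: int) -> float:
    """Shift a frequency by an equal-tempered interval."""
    return freq_hz * 2 ** (semitones / 12)
```

For example, a song spanning 220 to 440 Hz sung by someone whose range is 110 to 220 Hz would be shifted down by one octave (12 semitones) under this rule.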

Systems and methods for audio based synchronization using sound harmonics
09972294 · 2018-05-15

Multiple audio files may be synchronized using harmonic sound included in audio content obtained from audio tracks. Individual audio tracks are partitioned into multiple temporal windows of a first and second temporal window length. Individual audio waveforms for individual temporal windows of the first and second window length are transformed into frequency space in which energy is represented as a function of frequency. Individual pitches and magnitudes of harmonic sound determined for individual temporal windows may be compared using a multi-resolution framework to correlate pitches and harmonic energy of multiple audio tracks to one another.
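The windowing and frequency-transform steps can be sketched with a plain discrete Fourier transform. This is a minimal illustration, assuming non-overlapping windows and taking only the strongest bin per window as the "pitch". The comparison across tracks is omitted, and all names are hypothetical.

```python
import cmath
import math
from typing import Dict, List, Sequence


def windows(samples: Sequence[float], length: int) -> List[Sequence[float]]:
    """Split samples into non-overlapping windows of the given length."""
    return [samples[i:i + length] for i in range(0, len(samples) - length + 1, length)]


def peak_frequency(window: Sequence[float], sample_rate: float) -> float:
    """Frequency (Hz) of the strongest DFT bin, excluding DC and Nyquist."""
    n = len(window)
    best_bin, best_energy = 0, -1.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        energy = abs(coeff) ** 2  # energy as a function of frequency
        if energy > best_energy:
            best_bin, best_energy = k, energy
    return best_bin * sample_rate / n


def multi_resolution_peaks(samples: Sequence[float], sample_rate: float,
                           short_len: int, long_len: int) -> Dict[int, List[float]]:
    """Peak frequency per window at two window lengths."""
    return {length: [peak_frequency(w, sample_rate) for w in windows(samples, length)]
            for length in (short_len, long_len)}
```

Shorter windows give finer time resolution and longer windows give finer frequency resolution, which is the usual motivation for analyzing at more than one window length.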

SIGNAL PROCESSING METHOD AND SIGNAL PROCESSING APPARATUS
20180122346 · 2018-05-03

A signal processing method includes a first specifying step and a first modifying step. In the first specifying step, a first modification object section for a singing voice of a piece of music is specified based on a temporal change of pitch of singing voice data representing the singing voice or a temporal change of pitch in a score of the music. In the first modifying step, a modifying process is performed on the singing voice data. The modifying process modifies at least one of the temporal change of pitch and the temporal change of volume of the singing voice in the first modification object section specified by the first specifying step.

Systems and methods for audio remixing using repeated segments
09916822 · 2018-03-13

A derivative track for an audio track may be generated. An audio track duration of the audio track may be partitioned into partitions of a partition size. A current partition may be compared to remaining partitions of the audio track. Audio information for the current partition may be correlated to audio information for remaining partitions to determine a correlated partition for the current partition from among the remaining partitions of the track duration. The correlated partition determined may be identified as most likely to represent the same sound as the current partition. This comparison process may be performed iteratively for individual ones of the remaining partitions. One or more regions of the audio track may be identified. Individual regions may include multiple correlated partitions that are temporally adjacent along the audio track duration. One or more partitions within one or more regions may be removed to generate the derivative track.
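The partition-and-correlate step can be sketched as follows. This is an illustration only, assuming raw sample lists, fixed-size non-overlapping partitions, and cosine similarity as the correlation measure. The abstract does not specify the measure, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def partition(samples: Sequence[float], size: int) -> List[List[float]]:
    """Split the track into non-overlapping partitions of `size` samples."""
    return [list(samples[i:i + size]) for i in range(0, len(samples) - size + 1, size)]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two partitions (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_matches(parts: Sequence[Sequence[float]]) -> List[int]:
    """For each partition, the index of the most correlated other partition.

    Requires at least two partitions.
    """
    matches = []
    for i, current in enumerate(parts):
        others = [j for j in range(len(parts)) if j != i]
        matches.append(max(others, key=lambda j: correlation(current, parts[j])))
    return matches
```

Runs of adjacent partitions whose matches are also adjacent would then form the repeated regions from which partitions can be removed.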

Customized audio spectrum generation of gaming music

System, process and device configurations are provided for customized audio spectrum generation of gaming music. A method includes detecting at least one audio spectrum input for gaming music and updating a parameter for dynamic music generation based on the at least one audio spectrum input. The method also includes dynamically generating gaming music for the gaming system to include at least one musical motif determined based on the parameter, and outputting the gaming music. Frequencies or tonalities of dynamically generated music that interfere with gameplay sound effects may be adjusted to improve gameplay. In some cases sound elements may be shifted in volume or pitch to indicate the presence of gaming elements or opportunities during gameplay. Similarly, frequency ranges may be customized to avoid one or more frequencies, such as a frequency range identified as being associated with hearing loss or with causing a disturbance, such as vibration.