Patent classifications
G10H2210/031
MUSICAL ANALYSIS METHOD AND MUSIC ANALYSIS DEVICE
A music analysis method realized by a computer includes calculating an evaluation index for each of a plurality of structure candidates, each formed of N analysis points selected in different combinations from K analysis points in an audio signal of a musical piece, and selecting one of the plurality of structure candidates as a boundary of a structure section of the musical piece in accordance with the evaluation index of each of the plurality of structure candidates. N is a natural number greater than or equal to 2, and K is a natural number greater than N.
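As a rough illustration of the combinatorial search the abstract describes, the sketch below enumerates every choice of N points from K scored analysis points and keeps the highest-scoring combination. The novelty scores, the spacing penalty, and all names are assumptions; the patent does not specify its evaluation index.

    # Hypothetical per-point scores stand in for the patent's evaluation index.
    from itertools import combinations

    def select_structure_boundaries(analysis_points, novelty, n):
        """Pick n of the K analysis points whose combination scores highest.

        analysis_points: list of K time positions (seconds), in order
        novelty: list of K per-point scores (assumed stand-in for the index)
        n: number of boundary points to select (2 <= n < K)
        """
        best_score, best_candidate = float("-inf"), None
        for candidate in combinations(range(len(analysis_points)), n):
            # Assumed evaluation index: total novelty of the chosen points,
            # minus a penalty for adjacent boundaries closer than 4 seconds.
            times = [analysis_points[i] for i in candidate]
            gaps = [b - a for a, b in zip(times, times[1:])]
            score = (sum(novelty[i] for i in candidate)
                     - sum(1.0 for g in gaps if g < 4.0))
            if score > best_score:
                best_score, best_candidate = score, candidate
        return [analysis_points[i] for i in best_candidate]

    print(select_structure_boundaries([3.1, 10.2, 18.0, 25.4, 40.9],
                                      [0.9, 0.4, 0.8, 0.2, 0.7], n=3))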
METHOD AND SYSTEM FOR LEARNING AND USING LATENT-SPACE REPRESENTATIONS OF AUDIO SIGNALS FOR AUDIO CONTENT-BASED RETRIEVAL
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which select salient attributes of the signals that represent psychoacoustic differences between the signals.
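One common way to make latent representations comparable for similarity search is metric learning with a triplet loss; the PyTorch sketch below shows that general pattern. The encoder architecture, dimensions, and the triplet loss itself are assumptions, not the patent's disclosed model.

    import torch
    import torch.nn as nn

    class AudioEncoder(nn.Module):
        """Maps a fixed-size feature vector (e.g., a flattened spectrogram
        patch) to a unit-length latent-space representation."""
        def __init__(self, n_features=512, latent_dim=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        def forward(self, x):
            z = self.net(x)
            return z / z.norm(dim=-1, keepdim=True)  # unit norm: cosine-comparable

    encoder = AudioEncoder()
    loss_fn = nn.TripletMarginLoss(margin=0.2)
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

    # Anchor/positive share psychoacoustic attributes; negative differs (toy data).
    anchor, positive, negative = (torch.randn(8, 512) for _ in range(3))
    loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
    opt.zero_grad(); loss.backward(); opt.step()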
Electronic apparatus and control method thereof
An electronic apparatus includes a memory configured to store a first artificial intelligence model, and a processor connected to the memory and configured to: based on receiving an input audio signal, obtain an input frequency spectrum image representing a frequency spectrum of the input audio signal, input the input frequency spectrum image to the first artificial intelligence model, obtain an output frequency spectrum image from the first artificial intelligence model, and obtain an output audio signal based on the output frequency spectrum image. The first artificial intelligence model is trained based on a target learning image, and the target learning image represents a target frequency spectrum of a specific style and is obtained from a second artificial intelligence model based on a random value.
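The round trip the abstract describes (audio to spectrum image, model, spectrum image back to audio) can be sketched with an STFT magnitude and Griffin-Lim phase reconstruction, as below. The trained style model is replaced by an identity placeholder, and all parameter values are assumptions. Requires librosa.

    import numpy as np
    import librosa

    def stylize(audio, sr, model=lambda img: img):
        # 1. Input frequency spectrum image: magnitude of the STFT.
        spec_in = np.abs(librosa.stft(audio, n_fft=1024, hop_length=256))
        # 2. First AI model maps input image to output image (placeholder here).
        spec_out = model(spec_in)
        # 3. Output audio from the output spectrum image (phase via Griffin-Lim).
        return librosa.griffinlim(spec_out, hop_length=256)

    y, sr = librosa.load(librosa.ex("trumpet"), duration=2.0)
    out = stylize(y, sr)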
Music reactive animation of human characters
Example methods for generating an animated character in dance poses to music may include generating, by at least one processor, a music input signal based on an acoustic signal associated with the music, and receiving, by the at least one processor, a model output signal from an encoding neural network. Current generated pose data is generated using a decoding neural network, the current generated pose data being based on previous generated pose data of a previous generated pose, the music input signal, and the model output signal. An animated character is generated based on the current generated pose data, and the animated character is caused to be displayed by a display device.
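A minimal sketch of the autoregressive loop, where each pose is decoded from the previous pose, the music input signal, and the encoder's model output signal. The GRU decoder, dimensions, and toy inputs are assumptions; the patent's networks are not specified in the abstract.

    import torch
    import torch.nn as nn

    POSE_DIM, MUSIC_DIM, STYLE_DIM = 51, 32, 16  # assumed: 17 joints x 3, etc.

    class PoseDecoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.rnn = nn.GRUCell(POSE_DIM + MUSIC_DIM + STYLE_DIM, 128)
            self.out = nn.Linear(128, POSE_DIM)
        def forward(self, prev_pose, music_feat, style_feat, h):
            h = self.rnn(torch.cat([prev_pose, music_feat, style_feat], dim=-1), h)
            return self.out(h), h

    decoder = PoseDecoder()
    pose = torch.zeros(1, POSE_DIM)        # previous generated pose data
    style = torch.randn(1, STYLE_DIM)      # model output signal from the encoder
    h = torch.zeros(1, 128)
    poses = []
    for music_feat in torch.randn(30, 1, MUSIC_DIM):  # 30 frames of music input
        pose, h = decoder(pose, music_feat, style, h)
        poses.append(pose)                 # one generated pose per frame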
Cuepoint determination system
A cuepoint determination system utilizes a convolutional neural network (CNN) to determine cuepoint placements within media content items to facilitate smooth transitions between them. For example, audio content from a media content item is normalized to a plurality of beats, the beats are partitioned into temporal sections, and acoustic feature groups are extracted from each beat in one or more of the temporal sections. The acoustic feature groups include at least downbeat confidence, position in bar, peak loudness, timbre and pitch. The extracted acoustic feature groups for each beat are provided as input to the CNN on a per temporal section basis to predict whether a beat immediately following the temporal section within the media content item is a candidate for cuepoint placement. A cuepoint placement is then determined from among the candidate cuepoint placements predicted by the CNN.
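The per-section input layout might look like the sketch below: one feature vector per beat, stacked over a temporal section and fed to a small 1-D CNN that scores the beat following the section as a cuepoint candidate. The section length, feature dimensions, and network are assumptions beyond the feature groups the abstract names.

    import torch
    import torch.nn as nn

    N_BEATS, N_FEATS = 8, 26  # assumed: downbeat conf + position in bar +
                              # peak loudness + 12-dim timbre + 12-dim pitch

    cnn = nn.Sequential(
        nn.Conv1d(N_FEATS, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(32, 1), nn.Sigmoid(),  # P(beat after section is a cuepoint)
    )

    section = torch.randn(1, N_FEATS, N_BEATS)  # one temporal section of beats
    candidate_prob = cnn(section)
    print(float(candidate_prob))  # threshold, then pick the final cuepoint placement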
SOUND GENERATION METHOD USING MACHINE LEARNING MODEL, TRAINING METHOD FOR MACHINE LEARNING MODEL, SOUND GENERATION DEVICE, TRAINING DEVICE, NON-TRANSITORY COMPUTER-READABLE MEDIUM STORING SOUND GENERATION PROGRAM, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM STORING TRAINING PROGRAM
A sound generation method that is realized by a computer includes receiving a first feature amount sequence in which a musical feature amount changes over time, and processing the first feature amount sequence using a trained model, thereby generating a sound data sequence corresponding to a second feature amount sequence in which the musical feature amount changes at a second fineness. The trained model has learned an input-output relationship between an input feature amount sequence in which the musical feature amount changes over time at a first fineness and a reference sound data sequence corresponding to an output feature amount sequence in which the musical feature amount changes over time at the second fineness, the second fineness being higher than the first fineness.
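Reading the abstract's "fineness" as temporal resolution, the coarse-to-fine mapping can be sketched as an upsampling network. This is a stand-in, not the patent's trained model; the upsampling ratio, feature choice, and layers are assumptions.

    import torch
    import torch.nn as nn

    UPSAMPLE = 8  # assumed ratio of second (fine) to first (coarse) fineness

    model = nn.Sequential(
        nn.Upsample(scale_factor=UPSAMPLE, mode="linear", align_corners=False),
        nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(16, 1, kernel_size=5, padding=2),  # fine-grained output sequence
    )

    coarse = torch.rand(1, 1, 32)    # first feature amount sequence (32 steps)
    fine = model(coarse)             # generated sequence at 8x the fineness
    print(coarse.shape, fine.shape)  # [1, 1, 32] -> [1, 1, 256]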
METHOD FOR ACCOMPANIMENT PURITY CLASS EVALUATION AND RELATED DEVICES
A method for accompaniment purity class evaluation and related devices are provided. Multiple pieces of first accompaniment data are obtained, together with a label for each indicating whether the corresponding first accompaniment data is pure instrumental accompaniment data or instrumental accompaniment data with background noise. An audio feature of each piece of first accompaniment data is extracted. Model training is performed according to the extracted audio features and the corresponding labels to obtain a neural network model for accompaniment purity class evaluation, a model parameter of the neural network model being determined according to the association relationship between the audio feature of each piece of first accompaniment data and its corresponding label.
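A minimal sketch of the training step, assuming a fixed-length audio feature vector per accompaniment and a small binary classifier; the patent's feature extraction and network are not given in the abstract.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
    loss_fn = nn.BCEWithLogitsLoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    features = torch.randn(16, 128)                # one feature vector per accompaniment
    labels = torch.randint(0, 2, (16, 1)).float()  # 1 = pure, 0 = background noise

    loss = loss_fn(model(features), labels)        # learn the feature-label association
    opt.zero_grad(); loss.backward(); opt.step()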
SYSTEM AND METHOD FOR DISTRIBUTED MUSICIAN SYNCHRONIZED PERFORMANCES
A computerized method is provided that enables an interactive multimedia session between a group of geographically distributed musicians. Song arrangements for the interactive multimedia session are specified as a sequence of song parts to be played or sung by each of the participating geographically distributed musicians. Each musician performance on any song part is automatically detected on an instrument track, along with audio and video for that performance, and the timing of each musician performance is automatically captured by the system. The captured performances are transmitted to the musicians participating in the same session to produce the effect of playing with other musicians live in the interactive multimedia session. A computer-implemented system and a computer program product stored on a non-transitory computer-readable storage medium for practice of the method are also provided.
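One way to picture the captured-performance record and its broadcast to the session is the sketch below; every field and function name is illustrative, not from the patent.

    import time
    from dataclasses import dataclass

    @dataclass
    class CapturedPerformance:
        musician_id: str
        song_part: str      # e.g., "verse 1", from the song arrangement
        start_time: float   # capture timestamp, used for synchronization
        audio: bytes = b""
        video: bytes = b""

    arrangement = ["intro", "verse 1", "chorus", "verse 2", "chorus"]

    def capture(musician_id, song_part):
        assert song_part in arrangement
        return CapturedPerformance(musician_id, song_part, start_time=time.time())

    def transmit(performance, session_members):
        # Placeholder: captured performances go to all musicians in the same
        # session to produce the effect of playing together live.
        for member in session_members:
            print(f"-> {member}: {performance.song_part} @ {performance.start_time:.3f}")

    transmit(capture("guitarist", "chorus"), ["drummer", "bassist"])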
Techniques for controlling the expressive behavior of virtual instruments and related systems and methods
Techniques for automatically controlling the expressive behavior of a virtual musical instrument by analyzing an audio recording of a live musician are provided. In some embodiments, an audio recording may be analyzed at various points along the timeline of the recording to derive corresponding values of a parameter that is in some way representative of the musical expression of the live musician. Values of control parameters that control one or more aspects of the audio playback of a virtual instrument may then be generated based on the determined values of the expression parameter. Values of control parameters may be provided to a sample library to control how a digital score selects and/or plays back samples from the library, and/or values of the control parameters may be stored with the digital score for subsequent playback.
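A minimal sketch, assuming short-time RMS loudness as the expression parameter derived along the recording's timeline, mapped to 0-127 controller values such as a virtual instrument's sample library might consume; the patent does not commit to either choice.

    import numpy as np

    def expression_curve(audio, sr, hop=0.05):
        """RMS at points along the recording's timeline (assumed stand-in
        for the expression parameter)."""
        n = int(sr * hop)
        frames = [audio[i:i + n] for i in range(0, len(audio) - n, n)]
        return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

    def to_control_values(expr, cc_max=127):
        """Map expression values to 0-127 control-parameter values."""
        expr = (expr - expr.min()) / (expr.max() - expr.min() + 1e-9)
        return np.round(expr * cc_max).astype(int)

    sr = 22050  # toy "live musician" recording: a sine with a crescendo
    audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr) * np.linspace(0, 1, sr)
    print(to_control_values(expression_curve(audio, sr))[:10])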
System and method for creating a sensory experience by merging biometric data with user-provided content
Systems and methods are provided for using a common “vocabulary,” predefined or dynamically generated based on user-provided content, to transform biometric and/or neurometric data collected from one or more people into a coherent audio and/or visual result. One method comprises receiving a first incoming signal from a bio-generated data sensing device worn by a first user; determining a first set of output values based on the first incoming signal, a common vocabulary comprising a list of possible output values, and a parameter file comprising a set of instructions for applying the common vocabulary to the first incoming signal to derive the first set of output values; generating a first output array comprising the first set of output values; and providing the first output array to an output delivery system configured to render the first output array as a first audio and/or visual output.
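A minimal sketch of applying a common vocabulary to an incoming bio-generated signal under a parameter file; the heart-rate range, note vocabulary, and bucketing rule are all illustrative assumptions.

    vocabulary = ["C4", "D4", "E4", "G4", "A4"]  # list of possible output values
    parameters = {"lo": 50.0, "hi": 110.0}       # parameter file: mapping range

    def to_output_values(incoming_signal):
        """Apply the common vocabulary to one user's incoming signal."""
        span = parameters["hi"] - parameters["lo"]
        out = []
        for sample in incoming_signal:
            x = min(max((sample - parameters["lo"]) / span, 0.0), 1.0)
            out.append(vocabulary[min(int(x * len(vocabulary)), len(vocabulary) - 1)])
        return out  # first output array, ready for the audio/visual renderer

    print(to_output_values([58, 72, 95, 104]))  # e.g., heart-rate samples (bpm)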