G10H2210/036

Chord Identification Method and Chord Identification Apparatus
20190266988 · 2019-08-29 ·

A chord identification method selects from among a plurality of chord identifiers a chord identifier that corresponds to an attribute of a piece of music represented by an audio signal, where the plurality of chord identifiers corresponds to respective ones of a plurality of attributes relating to pieces of music; and identifies a chord for the audio signal by applying a feature amount of the audio signal to the selected chord identifier.
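The selection step described above can be sketched as a lookup from attribute to identifier, with the feature amount then applied to the chosen identifier. The attribute names and identifier functions below are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of attribute-based identifier selection.
# The attribute keys ("pop", "ballad") and the toy identifiers are
# assumptions for illustration only.

def identify_major_biased(feature):
    # Toy identifier tuned for one attribute class.
    return "C" if feature[0] >= feature[1] else "Am"

def identify_minor_biased(feature):
    # Toy identifier tuned for another attribute class.
    return "Am" if feature[1] >= feature[0] else "C"

# One chord identifier per music attribute, as the method describes.
CHORD_IDENTIFIERS = {
    "pop": identify_major_biased,
    "ballad": identify_minor_biased,
}

def identify_chord(attribute, feature_amount):
    # Select the identifier matching the piece's attribute, then
    # apply the audio signal's feature amount to it.
    identifier = CHORD_IDENTIFIERS[attribute]
    return identifier(feature_amount)
```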

System and method for evaluating semantic closeness of data files

The invention provides for the evaluation of semantic closeness of a source data file relative to candidate data files. The system includes an artificial neural network and processing intelligence that derives a property vector from extractable measurable properties of a data file. The property vector is mapped to related semantic properties for that same data file such that, during ANN training, pairwise similarity/dissimilarity in property space is mapped towards corresponding pairwise semantic similarity/dissimilarity in semantic space, preserving semantic relationships. Based on comparisons between generated property vectors in continuous multi-dimensional property space, the system and method assess, rank, and then recommend and/or filter semantically close or semantically disparate candidate files in response to a user query that includes the source data file. Applications of the categorization and recommendation system apply to search tools, including identification of illicit materials or logically progressive associations between disparate files.
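The ranking step over property vectors can be illustrated with a simple distance comparison. The function names and the use of Euclidean distance are assumptions; the patent does not specify a particular metric.

```python
# Illustrative sketch: rank candidate files by distance between
# property vectors in a continuous multi-dimensional space.
# Euclidean distance is an assumed stand-in for the actual metric.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_closeness(source_vector, candidates):
    # candidates: {filename: property_vector}. Smaller distance means
    # semantically closer, given training that aligned pairwise
    # similarity in property space with similarity in semantic space.
    return sorted(candidates,
                  key=lambda name: euclidean(source_vector, candidates[name]))
```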

Audio processing techniques for semantic audio recognition and report generation

Example apparatus, articles of manufacture and methods to determine semantic audio information for audio are disclosed. Example methods include extracting a plurality of audio features from the audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature. Example methods also include comparing the plurality of audio features to a plurality of stored audio feature ranges having tags associated therewith. Example methods further include determining a set of ranges of the plurality of stored audio feature ranges having closest matches to the plurality of audio features, a tag associated with the set of ranges having the closest matches to be used to determine the semantic audio information for the audio.
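The range-matching step can be sketched as a nearest-range lookup: each stored range set carries a tag, and the tag of the set closest to the extracted features becomes the semantic audio information. The feature names, range values, and tags below are illustrative assumptions.

```python
# Hedged sketch: match extracted audio features against stored feature
# ranges and return the tag of the closest-matching range set.
# Feature names, ranges, and tags are assumptions for illustration.

STORED_RANGES = [
    # (tag, {feature_name: (low, high)})
    ("speech", {"tempo": (0.0, 0.2), "harmonicity": (0.0, 0.3)}),
    ("music",  {"tempo": (0.4, 1.0), "harmonicity": (0.5, 1.0)}),
]

def range_distance(value, low, high):
    # Zero if the feature falls inside the range, else distance to it.
    if low <= value <= high:
        return 0.0
    return low - value if value < low else value - high

def closest_tag(features):
    # Sum per-feature distances to each stored range set; the tag of
    # the minimal-distance set supplies the semantic audio information.
    def total(entry):
        _, ranges = entry
        return sum(range_distance(features[k], *ranges[k]) for k in ranges)
    return min(STORED_RANGES, key=total)[0]
```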

Audio matching with semantic audio recognition and report generation

Example articles of manufacture and apparatus disclosed herein for producing supplemental information for audio signature data obtain the audio signature data of a first time period including data relating to at least one of time or frequency components representing a first characteristic of media. Disclosed examples also obtain first semantic audio signature data, for the first time period, that is a measure of generalized information representing characteristics of the media. Disclosed examples further store the audio signature data of the first time period in association with a second time period when it is determined that second semantic audio signature data for the second time period substantially matches the first semantic audio signature data for the first time period.
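The "substantially matches" test and the cross-period association can be sketched as follows. The similarity measure and threshold are assumptions; the patent leaves the matching criterion abstract.

```python
# Sketch under assumptions: associate detailed audio signature data
# from one time period with a later period whose semantic signature
# substantially matches. The element-wise similarity measure and the
# 0.9 threshold are illustrative, not from the patent.

def substantially_matches(sig_a, sig_b, threshold=0.9):
    # Toy similarity: fraction of equal elements.
    same = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return same / len(sig_a) >= threshold

def associate_signatures(store, audio_sig_1, semantic_sig_1,
                         period_2, semantic_sig_2):
    # Store the first period's audio signature data in association
    # with the second period when the semantic signatures match.
    if substantially_matches(semantic_sig_1, semantic_sig_2):
        store[period_2] = audio_sig_1
    return store
```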

Sound signal processing method and sound signal processing apparatus

A method for processing an input sound signal of singing voice, to obtain a sound signal with an impression different from the input sound signal, includes: selecting a genre from among a plurality of tune genres in accordance with a selection operation by a user; setting, to a first unit, a set of first parameters corresponding to the selected genre; displaying a first impression identifier corresponding to the selected genre for a first control of a first user parameter in the set of first parameters; changing the first user parameter in accordance with a change operation on the first control by the user; and strengthening, by the first unit, signal components within a particular frequency band of the sound signal, in accordance with the set of first parameters including the changed first user parameter.
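The genre-to-parameter mapping and the band-strengthening step can be sketched on a simple spectral representation. The genre names, band edges, and gain values are assumptions for illustration.

```python
# Minimal sketch (parameter values assumed): select a genre's parameter
# set, let the user adjust one parameter via a control, then strengthen
# signal components inside the genre's frequency band.

GENRE_PARAMS = {
    # The user parameter here is a band gain; band edges are in Hz.
    "rock":   {"band": (2000, 4000), "gain": 1.5},
    "ballad": {"band": (200, 800),   "gain": 1.2},
}

def process(bins, genre, user_gain=None):
    # bins: {center_frequency_hz: amplitude}. Components within the
    # band are multiplied by the (possibly user-changed) gain.
    params = dict(GENRE_PARAMS[genre])
    if user_gain is not None:          # change operation on the control
        params["gain"] = user_gain
    low, high = params["band"]
    return {f: a * params["gain"] if low <= f <= high else a
            for f, a in bins.items()}
```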

Music Recommendation Based On Wearable Devices
20240192775 · 2024-06-13 ·

Provided are a method, apparatus, computer device, and storage medium for music recommendation based on a wearable device, which relate to the field of computer technologies. The method for music recommendation includes: obtaining one or more physiological parameters of a target user collected by the wearable device; inputting the one or more physiological parameters of the target user into a trained relaxation state assessment model to determine a current relaxation state of the target user; and determining at least one piece of target recommendation music based on at least one of the current relaxation state of the target user or relaxation parameters corresponding to multiple pieces of music to be recommended, wherein the at least one piece of target recommendation music is configured to be played for the target user.
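The pipeline can be sketched with a stub in place of the trained model. The physiological parameter names, the scoring formula, and the nearest-parameter recommendation rule are all assumptions, not from the patent.

```python
# Hedged sketch: the trained relaxation state assessment model is
# replaced by a stub mapping physiological parameters to a score in
# [0, 1]; field names and formula are illustrative assumptions.

def assess_relaxation(heart_rate, skin_conductance):
    # Stand-in for the assessment model: lower heart rate and skin
    # conductance yield a higher relaxation score.
    score = 1.0 - min(1.0, (heart_rate - 50) / 100 + skin_conductance / 10)
    return max(0.0, score)

def recommend(current_relaxation, catalog, k=1):
    # catalog: {track: relaxation_parameter}. Recommend the tracks
    # whose relaxation parameter is closest to the user's state.
    ranked = sorted(catalog,
                    key=lambda t: abs(catalog[t] - current_relaxation))
    return ranked[:k]
```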

COMPLEX EVOLUTION RECURRENT NEURAL NETWORKS

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex evolution recurrent neural networks. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A first vector sequence comprising audio features determined from the audio data is generated. A second vector sequence is generated, as output of a first recurrent neural network in response to receiving the first vector sequence as input, where the first recurrent neural network has a transition matrix that implements a cascade of linear operators comprising (i) first linear operators that are complex-valued and unitary, and (ii) one or more second linear operators that are non-unitary. An output vector sequence of a second recurrent neural network is generated. A transcription for the utterance is generated based on the output vector sequence generated by the second recurrent neural network. The transcription for the utterance is provided.
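The transition matrix's structure, a cascade of complex-valued unitary operators followed by a non-unitary operator, can be illustrated with NumPy. The particular operators chosen below (phase diagonal, permutation, diagonal scaling) are assumptions standing in for the patented architecture's operators.

```python
# Illustrative sketch with NumPy (the specific operators are
# assumptions): a transition built as a cascade of complex-valued
# unitary linear operators followed by one non-unitary operator.
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Unitary operator 1: diagonal matrix of unit-modulus complex phases.
U1 = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, n)))

# Unitary operator 2: a permutation matrix (real, hence also unitary).
U2 = np.eye(n)[rng.permutation(n)]

# Non-unitary operator: a diagonal scaling that changes vector norms.
S = np.diag(rng.uniform(0.5, 1.5, n)).astype(complex)

def transition(h):
    # Apply the cascade; the unitary part preserves the hidden state's
    # norm, while the final non-unitary operator need not.
    return S @ (U2 @ (U1 @ h))

h = rng.standard_normal(n) + 1j * rng.standard_normal(n)
unitary_part = U2 @ (U1 @ h)   # norm-preserving portion of the cascade
```

Norm preservation by the unitary portion is what makes such cascades attractive for stabilizing recurrent dynamics; the non-unitary operator restores expressive freedom.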

COMPLEX LINEAR PROJECTION FOR ACOUSTIC MODELING

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
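The frequency-domain projection step can be sketched with NumPy. The frame size, projection width, random weights, and log-magnitude feature are assumptions; in the actual method the complex weights would be learned jointly with the acoustic model.

```python
# Sketch of the general idea (sizes and weights are assumptions): take
# an audio frame to the frequency domain, then apply a complex-valued
# linear projection before a downstream acoustic model.
import numpy as np

rng = np.random.default_rng(1)
frame = rng.standard_normal(64)          # one frame of audio samples

spectrum = np.fft.rfft(frame)            # frequency domain data (complex)

# Complex linear projection: a complex weight matrix applied to the
# spectrum (here random; in practice learned).
W = (rng.standard_normal((8, spectrum.size))
     + 1j * rng.standard_normal((8, spectrum.size)))
projected = W @ spectrum                 # 8 complex-valued outputs

# A real-valued feature (e.g. log magnitude) could then feed the
# neural network trained as an acoustic model.
features = np.log1p(np.abs(projected))
```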

Intelligent crossfade with separated instrument tracks

A method is provided including separating a first file into a first plurality of instrument tracks and a second file into a second plurality of instrument tracks, wherein each instrument track of each of the first plurality and second plurality corresponds to a type of instrument; selecting a first instrument track from the first plurality of instrument tracks and a second instrument track from the second plurality of instrument tracks based at least on the type of instrument corresponding to the first instrument track and the second instrument track; fading out other instrument tracks from the first plurality of instrument tracks; performing a crossfade between the first instrument track and the second instrument track; and fading in other instrument tracks from the second plurality of instrument tracks.
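The crossfade between matched instrument tracks can be sketched with complementary linear gain ramps. The linear fade shape is an assumption; real implementations often use equal-power curves.

```python
# Toy sketch (linear gain curves assumed): crossfade between matched
# instrument tracks, e.g. the "drums" track of each file, by summing
# with complementary fade-out and fade-in gains.

def linear_fade(n, fade_in):
    # n gain values rising 0 -> 1 (fade in) or falling 1 -> 0 (fade out).
    ramp = [i / (n - 1) for i in range(n)]
    return ramp if fade_in else ramp[::-1]

def crossfade(track_a, track_b):
    # Equal-length matched tracks: energy hands over from a to b.
    n = len(track_a)
    out_gain = linear_fade(n, fade_in=False)
    in_gain = linear_fade(n, fade_in=True)
    return [a * g_out + b * g_in
            for a, b, g_out, g_in in zip(track_a, track_b, out_gain, in_gain)]
```

The surrounding steps of the method, fading out the first file's other instrument tracks beforehand and fading in the second file's afterwards, would reuse the same ramps on those tracks individually.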

Granular User Feedback Tracking for Generative Music Systems
20240290308 · 2024-08-29 ·

Techniques are disclosed that pertain to generating output music content based on musical embeddings. A computer system generates output music content that includes multiple overlapping musical expressions in time. The computer system receives user feedback at a point in time while the output music content is being played. Based on the user feedback and based on characteristics of the output music content associated with the point in time, the computer system determines one or more expression embeddings generated based on expressions selected for inclusion in the output music content and one or more composition embeddings generated based on combined expressions in the output music content. The computer system generates additional output music content based on the expression and composition embeddings.
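One way to picture the embedding step is below: a composition embedding is derived from the expression embeddings active at the feedback instant, then nudged by the feedback sign. The averaging scheme and update rule are assumptions for illustration; the patent does not specify how the embeddings are combined.

```python
# Hedged sketch (embedding scheme assumed): derive a composition
# embedding from the expression embeddings overlapping at the feedback
# instant, then shift it by the feedback sign to steer further output.

def average(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def feedback_update(active_expression_embeddings, feedback, rate=0.1):
    # Composition embedding for the overlapping expressions: here
    # simply the mean of their expression embeddings (an assumption).
    composition = average(active_expression_embeddings)
    # Positive feedback (+1) pulls subsequent generation toward this
    # region of embedding space; negative (-1) pushes away from it.
    return [c + rate * feedback * c for c in composition]
```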