Patent classifications
G10L21/12
Audio Techniques for Music Content Generation
Techniques are disclosed relating to implementing audio techniques for real-time audio generation. For example, a music generator system may generate new music content from playback music content based on different parameter representations of an audio signal. In some cases, an audio signal can be represented both by a graph of the signal relative to time (an audio signal graph) and by a graph of the signal relative to beats (a signal graph). The beat-relative signal graph is invariant to tempo, which allows tempo-invariant modification of audio parameters of the music content in addition to tempo-variant modifications based on the audio signal graph.
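The beat-relative representation described above can be sketched minimally: a parameter automation defined on a beat axis lands at the same musical positions regardless of tempo. The fade-in envelope below is a hypothetical example, not taken from the patent.

```python
def time_to_beats(seconds, bpm):
    """Map a time position (seconds) onto the beat axis at a given tempo."""
    return seconds * bpm / 60.0

def beats_to_time(beats, bpm):
    """Map a beat position back onto the time axis."""
    return beats * 60.0 / bpm

def gain_at_beat(beats):
    """Hypothetical parameter automation defined on the beat axis:
    a linear fade-in over the first four beats."""
    return min(beats / 4.0, 1.0)
```

At 120 BPM the fade completes at 2 s and at 60 BPM at 4 s, but in both cases it completes exactly at beat 4: the automation is tempo invariant because it is defined against the beat axis rather than the time axis.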
ELECTRONIC DEVICE FOR RECOGNIZING SOUND AND METHOD THEREOF
An example sound recognition method may include: sampling input sound at a preset sampling rate; performing a Fast Fourier Transform (FFT) on the sampled input sound using at least one of a randomly selected FFT size or a randomly selected hop length; generating a two-dimensional (2D) feature map, with a time axis and a frequency axis, from the transformed sound; and training a neural network model that recognizes sound on a plurality of such 2D feature maps, from a first 2D feature map through an n-th 2D feature map, as training data.
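The randomized feature-map step can be sketched as follows. This uses a naive pure-Python DFT for self-containment (a real system would use an FFT library), and the candidate FFT sizes and hop lengths are illustrative assumptions, not values from the patent.

```python
import cmath
import random

def stft_magnitudes(samples, n_fft, hop):
    """Naive short-time Fourier transform: a 2D feature map whose rows
    are time frames and whose columns are frequency bins."""
    frames = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        window = samples[start:start + n_fft]
        frames.append([
            abs(sum(window[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames

def random_feature_maps(samples, count, seed=0):
    """Generate `count` feature maps of the same sound, each with a
    randomly drawn FFT size and hop length, as training data."""
    rng = random.Random(seed)
    maps = []
    for _ in range(count):
        n_fft = rng.choice([32, 64, 128])   # assumed candidate FFT sizes
        hop = rng.choice([8, 16, 32])       # assumed candidate hop lengths
        maps.append(stft_magnitudes(samples, n_fft, hop))
    return maps
```

Because each draw changes the time/frequency resolution of the map, the same input sound yields multiple distinct training examples, which is the augmentation effect the abstract describes.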
Audio waveform display using mapping function
The described technology is generally directed towards providing a visible waveform representation of an audio signal, by processing the audio signal with a polynomial (e.g., cubic) mapping function. Coefficients of the polynomial mapping function are predetermined based on constraints (e.g., slope information and desired range of the resultant curve), and whether the plotted audio waveform corresponds to sound field quantities or power quantities. Once the visible representation of the reshaped audio waveform is displayed, audio and/or video editing operations can be performed, e.g., by time-aligning other audio or video with the reshaped audio waveform, and/or modifying the reshaped audio waveform to change the underlying audio data.
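A minimal sketch of the polynomial mapping is below. The endpoint constraints chosen here (y(0)=0, y(1)=1 as the desired range, with prescribed slopes at both ends, solved Hermite-style) are illustrative assumptions; the patent only specifies that the cubic's coefficients are predetermined from slope and range constraints.

```python
def cubic_coefficients(slope0, slope1):
    """Coefficients (a, b, c, d) of y = a*x^3 + b*x^2 + c*x + d on [0, 1],
    constrained so that y(0) = 0 and y(1) = 1 (the desired range) and
    y'(0) = slope0, y'(1) = slope1 (the slope information)."""
    a = slope0 + slope1 - 2.0
    b = 3.0 - 2.0 * slope0 - slope1
    return a, b, slope0, 0.0

def remap(sample, coeffs):
    """Apply the mapping to one normalized sample in [-1, 1], using odd
    symmetry so positive and negative excursions are reshaped alike."""
    a, b, c, d = coeffs
    x = abs(sample)
    y = ((a * x + b) * x + c) * x + d
    return y if sample >= 0 else -y
```

Choosing slope0 > 1 boosts low-amplitude samples, making quiet passages visible in the plotted waveform while the endpoints pin the curve to the display range.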
Computerised systems and methods for detection
Methods and systems for detecting marine mammals. Acoustic data can be received from one or more hydrophones. The acoustic data can be sampled, and the sampled acoustic data can be transformed to time-frequency image data. The image data can be processed into a form suitable for input to a model. The model can be trained to detect the presence or absence of marine mammal vocalizations in the acoustic data, and can output a prediction of whether or not a mammal is present.
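The pipeline can be sketched as below. The trained model is replaced here by a simple band-energy detector as a stand-in, and the band limits and threshold are illustrative assumptions; the patent's model is a trained classifier, not this heuristic.

```python
import cmath

def spectrogram(samples, n_fft=64, hop=32):
    """Transform sampled acoustic data into time-frequency image data
    (rows = time frames, columns = frequency bins; naive DFT for brevity)."""
    image = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        window = samples[start:start + n_fft]
        image.append([
            abs(sum(window[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)
        ])
    return image

def predict_presence(image, lo_bin=2, hi_bin=8, threshold=1.0):
    """Stand-in for the trained model: flag a vocalization when energy in
    a hypothetical call band exceeds a threshold in any frame."""
    return any(sum(frame[lo_bin:hi_bin]) > threshold for frame in image)
```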
METHOD AND SYSTEM FOR LEARNING AND USING LATENT-SPACE REPRESENTATIONS OF AUDIO SIGNALS FOR AUDIO CONTENT-BASED RETRIEVAL
A method and system are provided for extracting features from digital audio signals which exhibit variations in pitch, timbre, decay, reverberation, and other psychoacoustic attributes and learning, from the extracted features, an artificial neural network model for generating contextual latent-space representations of digital audio signals. A method and system are also provided for learning an artificial neural network model for generating consistent latent-space representations of digital audio signals in which the generated latent-space representations are comparable for the purposes of determining psychoacoustic similarity between digital audio signals. A method and system are also provided for extracting features from digital audio signals and learning, from the extracted features, an artificial neural network model for generating latent-space representations of digital audio signals which select salient attributes of the signals that represent psychoacoustic differences between the signals.
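The retrieval step over comparable latent vectors can be sketched as below. The learned encoder is replaced here by plain L2 normalization of the input features as a stand-in; in the described system the latent vectors would come from the trained neural network.

```python
import math

def embed(features):
    """Stand-in for the learned encoder: L2-normalize the feature vector
    so that dot products act as cosine similarities."""
    norm = math.sqrt(sum(x * x for x in features)) or 1.0
    return [x / norm for x in features]

def similarity(a, b):
    """Cosine similarity between two latent vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_features, corpus):
    """Rank (name, features) corpus items by similarity to the query,
    most similar first -- the content-based retrieval step."""
    q = embed(query_features)
    ranked = sorted(corpus,
                    key=lambda item: similarity(q, embed(item[1])),
                    reverse=True)
    return [name for name, _ in ranked]
```

Because all items are compared in the same normalized latent space, the rankings are consistent across queries, which is the comparability property the abstract emphasizes.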
CALL CONTROL METHOD AND RELATED PRODUCT
Provided are a call control method and related product. In the method, during a voice call between a first user of a first terminal and a second user of a second terminal, a three-dimensional face model of the second user is displayed; model-driven parameters are determined according to the call voice of the second user, where the model-driven parameters include expression parameters and posture parameters; and the three-dimensional face model of the second user is driven according to the model-driven parameters to display a three-dimensional simulated call animation of the second user.
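One way the voice-to-parameter step could look is sketched below. The specific feature-to-parameter mappings (RMS loudness to mouth openness, zero-crossing rate to brow raise, a small nod coupled to speech) are hypothetical illustrations; the patent does not specify them.

```python
import math

def model_driven_parameters(frame):
    """Map one frame of call audio to hypothetical expression and posture
    parameters for driving the 3D face model."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (len(frame) - 1)
    expression = {
        "mouth_open": min(rms * 4.0, 1.0),  # assumed: louder voice opens mouth
        "brow_raise": min(zcr, 1.0),        # assumed: brighter sound raises brows
    }
    posture = {
        "head_nod": 0.1 * expression["mouth_open"],  # subtle motion with speech
    }
    return {"expression": expression, "posture": posture}
```

Evaluating this per audio frame yields a stream of parameters that can drive the face model continuously, producing the simulated call animation.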