Patent classifications
G10L25/15
Automated music composition and generation system for spotting digital media objects and event markers using emotion-type, style-type, timing-type and accent-type musical experience descriptors that characterize the digital music to be automatically composed and generated by the system
An autonomous music composition and performance system employing an automated music composition and generation engine configured to receive musical signals from a set of real or synthetic musical instruments being played by a group of human musicians. The system buffers and analyzes musical signals from the set of real or synthetic musical instruments, composes and generates music in real-time that augments the music being played by the band of musicians, and/or records, analyzes and composes music recorded for subsequent playback, review and consideration by the human musicians.
LEARNING ALGORITHM TO DETECT HUMAN PRESENCE IN INDOOR ENVIRONMENTS FROM ACOUSTIC SIGNALS
A system is described that constantly learns the sound characteristics of an indoor environment to detect the presence or absence of humans within that environment. A detection model is constructed and a decision feedback approach is used to constantly learn and update the statistics of the detection features and sound events that are unique to the environment in question. The learning process may not only rely on acoustic signals, but may also make use of signals derived from other sensors such as range sensors, motion sensors, pressure sensors, and video sensors.
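The decision feedback idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the single scalar feature, the z-score test, the exponential update, and all names (`DecisionFeedbackDetector`, `threshold`, `alpha`, `min_var`) are assumptions made for illustration. The detector's own "no human present" decisions are fed back to update the background statistics of the room.

```python
import math


class DecisionFeedbackDetector:
    """Toy presence detector over one scalar acoustic feature (hypothetical)."""

    def __init__(self, mean: float, var: float, threshold: float = 3.0,
                 alpha: float = 0.05, min_var: float = 1e-6) -> None:
        self.mean = mean            # running mean of the background feature
        self.var = var              # running variance of the background feature
        self.threshold = threshold  # z-score above which presence is declared
        self.alpha = alpha          # learning rate of the running statistics
        self.min_var = min_var      # floor that keeps the z-score well defined

    def update(self, feature: float) -> bool:
        """Return True if presence is detected; otherwise adapt the statistics."""
        z = abs(feature - self.mean) / math.sqrt(max(self.var, self.min_var))
        present = z > self.threshold
        if not present:
            # Decision feedback: only frames judged as background
            # are used to update the background model.
            delta = feature - self.mean
            self.mean += self.alpha * delta
            self.var = (1.0 - self.alpha) * self.var + self.alpha * delta * delta
        return present


detector = DecisionFeedbackDetector(mean=0.0, var=1.0)
quiet = detector.update(0.5)   # close to the background model
loud = detector.update(10.0)   # far from the background model
```

A frame judged as background nudges the mean and variance toward the current room conditions, while a frame judged as presence leaves them untouched, so human-made sounds do not contaminate the background model.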
SPEAKER RECOGNITION WITH ASSESSMENT OF AUDIO FRAME CONTRIBUTION
This application describes methods and apparatus for speaker recognition. An apparatus according to an embodiment has an analyzer for analyzing each frame of a sequence of frames of audio data which correspond to speech sounds uttered by a user to determine at least one characteristic of the speech sound of that frame. An assessment module determines, for each frame of audio data, a contribution indicator of the extent to which that frame of audio data should be used for speaker recognition processing based on the determined characteristic of the speech sound. Said contribution indicator comprises a weighting to be applied to each frame in the speaker recognition processing. In this way frames which correspond to speech sounds that are of most use for speaker discrimination may be emphasized and/or frames which correspond to speech sounds that are of least use for speaker discrimination may be de-emphasized.
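As a rough illustration of per-frame weighting, the sketch below combines frame-level speaker scores using a contribution weight per frame. The sound classes, their weights, and the function name `weighted_speaker_score` are hypothetical; the abstract does not state which speech sounds are weighted more or less.

```python
from typing import Sequence

# Hypothetical weights per speech-sound class (assumption, not from the source).
CLASS_WEIGHTS = {"vowel": 1.0, "nasal": 0.8, "plosive": 0.2, "silence": 0.0}


def weighted_speaker_score(frame_scores: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Weighted average of per-frame scores; weights act as contribution indicators."""
    if len(frame_scores) != len(weights):
        raise ValueError("frame_scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one frame must have a positive weight")
    return sum(s * w for s, w in zip(frame_scores, weights)) / total


# Three frames with made-up scores and sound-class labels.
labels = ["vowel", "plosive", "silence"]
scores = [0.9, 0.1, 0.5]
weights = [CLASS_WEIGHTS[label] for label in labels]
overall = weighted_speaker_score(scores, weights)
```

Frames with zero weight drop out of the decision entirely, while low-weight frames still contribute in proportion to their weight.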
ADAPTIVE ENHANCEMENT OF SPEECH SIGNALS
A signal processing apparatus that handles adaptive enhancement of a speech signal receives a first signal and a second signal from a determined source. At least one of a speech signal or at least one noise signal is present in the first signal or the second signal. The first signal and the received second signal are processed to obtain a processed signal for amplification of a gain associated with the speech signal present in the first signal and the second signal by a determined factor. A signal-to-noise ratio (SNR) associated with the processed signal is greater than or equal to a threshold value. A reference noise signal is obtained from the second signal based on subtraction of an estimated speech signal present in the received second signal from the processed signal. A processed speech signal is determined based on filtration of the obtained reference noise signal.
VOICE MODIFICATION DETECTION USING PHYSICAL MODELS OF SPEECH PRODUCTION
A computer may train a single-class machine learning model using normal speech recordings. The machine learning model or any other model may estimate the normal range of parameters of a physical speech production model based on the normal speech recordings. For example, the computer may use a source-filter model of speech production, where voiced speech is represented by a pulse train and unvoiced speech by a random noise and a combination of the pulse train and the random noise is passed through an auto-regressive filter that emulates the human vocal tract. The computer leverages the fact that intentional modification of human voice introduces errors to the source-filter model or any other physical model of speech production. The computer may identify anomalies in the physical model to generate a voice modification score for an audio signal. The voice modification score may indicate a degree of abnormality of human voice in the audio signal.
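The source-filter model mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the filter coefficients, pitch period, and function names are hypothetical, and the sketch shows the model itself (excitation passed through an auto-regressive filter, then recovered by inverse filtering), not the trained single-class detector or the scoring step.

```python
import random
from typing import List, Sequence


def excitation(n: int, pitch_period: int, voiced: bool,
               noise_gain: float = 0.1, seed: int = 0) -> List[float]:
    """Pulse train plus noise: mostly pulses when voiced, pure noise when unvoiced."""
    rng = random.Random(seed)
    out = []
    for i in range(n):
        pulse = 1.0 if voiced and i % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0) * (noise_gain if voiced else 1.0)
        out.append(pulse + noise)
    return out


def _prediction(y: Sequence[float], a: Sequence[float], n: int) -> float:
    """Linear prediction of y[n] from past samples: sum_k a[k] * y[n-1-k]."""
    return sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)


def ar_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Auto-regressive 'vocal tract' filter: y[n] = x[n] + prediction from past y."""
    y: List[float] = []
    for n in range(len(x)):
        y.append(x[n] + _prediction(y, a, n))
    return y


def inverse_filter(y: Sequence[float], a: Sequence[float]) -> List[float]:
    """Residual e[n] = y[n] - prediction; equals the excitation if the model fits."""
    return [y[n] - _prediction(y, a, n) for n in range(len(y))]


coeffs = [0.5, -0.3]  # hypothetical, stable AR(2) coefficients
source = excitation(50, pitch_period=10, voiced=True)
speech = ar_filter(source, coeffs)
residual = inverse_filter(speech, coeffs)
```

When the signal fits the model, the residual matches the excitation. A detector along the lines of the abstract would look for residuals or fitted parameters that fall outside the range seen in normal speech.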
AUTOMATED MUSIC COMPOSITION AND GENERATION SYSTEM EMPLOYING AN INSTRUMENT SELECTOR FOR AUTOMATICALLY SELECTING VIRTUAL INSTRUMENTS FROM A LIBRARY OF VIRTUAL INSTRUMENTS TO PERFORM THE NOTES OF THE COMPOSED PIECE OF DIGITAL MUSIC
An automated music composition and generation system for automatically composing and generating digital pieces of music using an automated music composition and generation engine driven by a set of emotion-type and style-type musical experience descriptors and time and/or space parameters provided by a system user. The automated music composition and generation engine includes an instrument subsystem supporting a library of virtual instruments, wherein each virtual instrument is capable of performing one or more notes of at least a portion of the composed piece of music, in response to the emotion-type and/or style-type musical experience descriptors; an instrument selector subsystem for automatically selecting one or more virtual instruments from the library, so that each selected virtual instrument performs one or more notes of at least a portion of the composed piece of music; and a digital piece creation subsystem for creating the digital piece of composed music by assembling the notes produced from the virtual instruments selected from the library.
METHOD OF AND SYSTEM FOR SPOTTING DIGITAL MEDIA OBJECTS AND EVENT MARKERS USING MUSICAL EXPERIENCE DESCRIPTORS TO CHARACTERIZE DIGITAL MUSIC TO BE AUTOMATICALLY COMPOSED AND GENERATED BY AN AUTOMATED MUSIC COMPOSITION AND GENERATION ENGINE
An automated music composition and generation system and process for scoring a selected media object or event marker, with one or more pieces of digital music, by spotting the selected media object or event marker with musical experience descriptors selected and applied to the selected media object or event marker by the system user during a scoring process, and using said selected musical experience descriptors to drive an automated music composition and generation engine to automatically compose and generate the one or more pieces of digital music.