G10L25/15

EMOTION ESTIMATION SYSTEM AND NON-TRANSITORY COMPUTER READABLE MEDIUM
20200013428 · 2020-01-09

An emotion estimation system includes a feature amount extraction unit, a vowel section specification unit, and an estimation unit. The feature amount extraction unit analyzes recorded speech to extract a predetermined feature amount. The vowel section specification unit uses the extracted feature amount to specify the sections in which a vowel is produced. The estimation unit estimates the speaker's emotion from the feature amount within the specified vowel sections.
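
The claimed pipeline (feature extraction → vowel-section specification → estimation) can be sketched as below. The concrete features (frame energy and zero-crossing rate), the thresholds, and the toy two-class estimator are illustrative assumptions, not details from the disclosure.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame energy and zero-crossing rate: simple stand-ins for the
    patent's unspecified 'predetermined feature amount'."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f = signal[start:start + frame_len]
        energy = float(np.mean(f ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(f)))) / 2)
        frames.append((energy, zcr))
    return np.array(frames)

def vowel_sections(features, energy_thresh=0.01, zcr_thresh=0.3):
    """Vowels are voiced and energetic: high energy, low zero-crossing rate."""
    return (features[:, 0] > energy_thresh) & (features[:, 1] < zcr_thresh)

def estimate_emotion(features, vowel_mask):
    """Toy estimator: mean vowel-frame energy maps to 'aroused' vs 'calm'."""
    if not vowel_mask.any():
        return "unknown"
    mean_energy = features[vowel_mask, 0].mean()
    return "aroused" if mean_energy > 0.1 else "calm"

# Synthetic example: a loud 200 Hz "vowel" followed by low-level noise.
sr = 8000
t = np.arange(sr) / sr
vowel = 0.8 * np.sin(2 * np.pi * 200 * t[: sr // 2])
noise = 0.01 * np.random.default_rng(0).standard_normal(sr // 2)
signal = np.concatenate([vowel, noise])

feats = frame_features(signal)
mask = vowel_sections(feats)
print(estimate_emotion(feats, mask))
```

Only the frames flagged as vowels feed the estimator, mirroring the abstract's restriction of emotion estimation to the specified vowel sections.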

Learning algorithm to detect human presence in indoor environments from acoustic signals
10515654 · 2019-12-24

A system is described that continually learns the sound characteristics of an indoor environment in order to detect the presence or absence of humans within that environment. A detection model is constructed, and a decision-feedback approach is used to continually learn and update the statistics of the detection features and sound events that are unique to the environment in question. The learning process may rely not only on acoustic signals but also on signals derived from other sensors, such as range sensors, motion sensors, pressure sensors, and video sensors.
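
One way to read the decision-feedback idea: background statistics are updated only on frames the detector itself classifies as "absence," so the model adapts to the particular environment. The energy feature, learning rate, and threshold below are illustrative choices, not values from the patent.

```python
import numpy as np

class PresenceDetector:
    """Decision-feedback detector: background statistics are updated only on
    frames the detector itself judges to be background."""

    def __init__(self, alpha=0.05, k=4.0):
        self.alpha = alpha   # learning rate for the running statistics
        self.k = k           # detection threshold, in standard deviations
        self.mean = None
        self.var = 1.0       # wide initial variance avoids early false alarms

    def step(self, frame_energy):
        if self.mean is None:            # bootstrap from the first frame
            self.mean = frame_energy
            return False
        deviation = abs(frame_energy - self.mean) / np.sqrt(self.var)
        present = deviation > self.k
        if not present:                  # decision feedback: learn only from
            d = frame_energy - self.mean # frames classified as background
            self.mean += self.alpha * d
            self.var = (1 - self.alpha) * self.var + self.alpha * d * d
        return present

rng = np.random.default_rng(1)
det = PresenceDetector()
quiet = 1.0 + 0.05 * rng.standard_normal(200)   # empty-room frame energies
frames = np.concatenate([quiet, quiet[:50]])
frames[225] = 3.0                               # a loud transient (footstep)
decisions = [det.step(e) for e in frames]
print(decisions[0], decisions[225])
```

Because detected-presence frames are excluded from the update, the transient does not contaminate the learned background model; the abstract's fusion with range, motion, pressure, or video sensors would add further inputs to the same decision.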

AUTOMATED MUSIC COMPOSITION AND GENERATION SYSTEM EMPLOYING VIRTUAL MUSICAL INSTRUMENT LIBRARIES FOR PRODUCING NOTES CONTAINED IN THE DIGITAL PIECES OF AUTOMATICALLY COMPOSED MUSIC
20240062736 · 2024-02-22

An automated music composition and generation system including a system user interface for enabling system users to review and select one or more musical experience descriptors, as well as time and/or space parameters; and an automated music composition and generation engine, operably connected to the system user interface, for receiving, storing and processing musical experience descriptors and time and/or space parameters selected by the system user, so as to automatically compose and generate one or more digital pieces of music in response to the musical experience descriptors and time and/or space parameters selected by the system user. The automated music composition and generation engine includes: a digital piece creation subsystem for creating and delivering the digital piece of music to the system user interface; and a digital audio sample producing subsystem supported by virtual musical instrument libraries.
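
As an architectural sketch only, the abstract's data flow (user-selected descriptors and time parameters in, a digital piece out, with a note-producing subsystem) might be stubbed as follows; every class name, field, and the placeholder note generator is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MusicalExperienceDescriptors:
    """User-selected inputs named in the abstract; the fields are illustrative."""
    emotions: list = field(default_factory=list)   # e.g. ["uplifting"]
    styles: list = field(default_factory=list)     # e.g. ["cinematic"]
    duration_s: float = 30.0                       # the time parameter

class CompositionEngine:
    """Stub of the engine's data flow: descriptors in, a 'digital piece' out.
    _produce_notes stands in for the digital audio sample producing subsystem
    backed by virtual musical instrument libraries."""

    def compose(self, desc: MusicalExperienceDescriptors) -> dict:
        notes = self._produce_notes(desc)
        return {"descriptors": desc, "notes": notes}   # the digital piece

    def _produce_notes(self, desc: MusicalExperienceDescriptors) -> list:
        # Placeholder: one middle-C note per second of requested duration.
        return [("C4", 1.0)] * int(desc.duration_s)

piece = CompositionEngine().compose(
    MusicalExperienceDescriptors(emotions=["uplifting"], duration_s=4.0))
print(len(piece["notes"]))   # prints 4
```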

COMPUTING DEVICES AND METHODS FOR CONVERTING AUDIO SIGNALS TO TEXT
20190378533 · 2019-12-12

Computer-implemented methods and computing devices for converting spoken language into text. Computer-implemented methods include receiving an input audio signal that includes spoken language uttered by a speaker, analyzing the input audio signal, and determining one or more differences between one or more measured formant values and one or more model formant values. The computer-implemented methods further may include identifying an optimal trained model for processing the input audio signal and/or transforming the input audio signal into a transformed audio signal that more closely matches the trained model. Computing devices for converting spoken language into text include a processing unit and a memory that stores non-transitory computer readable instructions that, when executed by the processing unit, cause the computing devices to perform the computer-implemented methods disclosed herein.
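
The model-selection and transformation steps could look roughly like this. The reference formant table, the nearest-model rule, and the single scalar warp factor are assumptions standing in for the patent's unspecified trained models; a real system would estimate the speaker's formants via LPC analysis of the input audio.

```python
import numpy as np

# Hypothetical per-model reference formants (F1, F2 in Hz for the vowel /a/).
MODEL_FORMANTS = {
    "adult_male":   np.array([730.0, 1090.0]),
    "adult_female": np.array([850.0, 1220.0]),
}

def best_model(measured):
    """Pick the trained model whose reference formants are closest to the
    measured values (the 'optimal trained model' step in the abstract)."""
    measured = np.asarray(measured, dtype=float)
    return min(MODEL_FORMANTS,
               key=lambda m: np.linalg.norm(measured - MODEL_FORMANTS[m]))

def warp_factor(measured, model):
    """Least-squares scaling factor mapping measured formants toward the
    model's: a crude stand-in for transforming the input audio signal so it
    more closely matches the trained model."""
    measured = np.asarray(measured, dtype=float)
    ref = MODEL_FORMANTS[model]
    return float(np.dot(ref, measured) / np.dot(measured, measured))

measured = [780.0, 1130.0]           # speaker's measured F1, F2
model = best_model(measured)
print(model, round(warp_factor(measured, model), 3))
```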

LOW-COMPLEXITY VOICE ACTIVITY DETECTION

Many audio signal processing tasks can benefit from voice activity detection, which aims to detect the presence of speech as opposed to silence or noise. The present disclosure describes, among other things, leveraging energy-based features of voice and insights on the first and second formant frequencies of vowels to provide a low-complexity, low-power voice activity detector. A pair of channels is provided, each channel configured to detect voice activity in a respective frequency band of interest. Simultaneous activity detected in both channels can be a sufficient condition for determining that voice is present. More channels, or pairs of channels, can be used to detect different types of voices to improve detection and/or to detect voices present in different audio streams.
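
The two-channel rule can be sketched with band energies taken from a magnitude spectrum; the band edges and threshold below are illustrative values chosen to cover typical first- and second-formant ranges, not figures from the disclosure.

```python
import numpy as np

def band_energy(frame, sr, lo, hi):
    """Energy of the frame within [lo, hi] Hz, from the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    return float(spectrum[(freqs >= lo) & (freqs <= hi)].sum())

def voice_active(frame, sr, thresh=10.0):
    """Channel 1 watches a typical first-formant band, channel 2 a typical
    second-formant band; simultaneous activity in both is treated as a
    sufficient condition for voice."""
    ch1 = band_energy(frame, sr, 300, 1000)    # F1 region (illustrative edges)
    ch2 = band_energy(frame, sr, 1000, 3000)   # F2 region (illustrative edges)
    return ch1 > thresh and ch2 > thresh

sr = 8000
t = np.arange(1024) / sr
vowel_like = np.sin(2 * np.pi * 700 * t) + np.sin(2 * np.pi * 1200 * t)
hum = np.sin(2 * np.pi * 125 * t)   # mains-like hum on an exact FFT bin
print(voice_active(vowel_like, sr), voice_active(hum, sr))
```

The vowel-like signal excites both bands and is accepted; the hum excites neither band and is rejected, even though its overall energy is comparable: this is the sense in which requiring both channels keeps complexity and false alarms low.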

Method for voice identification and device using same

An electronic device may include: a memory; a sound sensor; and a processor, wherein the processor is configured to: receive, from the sound sensor, sound data including a first piece of data corresponding to a first frequency band and a second piece of data corresponding to a second frequency band different from the first frequency band; receive voice data related to a voice of a registered user from the memory; perform voice identification by comparing the first piece of data and the second piece of data with the voice data related to the voice of the registered user; and determine an output based on a result of the voice identification.
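
A minimal sketch of the claimed comparison, assuming the stored "voice data" reduces to a pair of band measurements and that identification is a tolerance test; the template values, tolerance, and the unlock/reject outputs are all hypothetical.

```python
# Hypothetical registered-user template: characteristic values for the
# first and second frequency bands named in the claim.
REGISTERED = {"low_band": 0.62, "high_band": 0.31}

def identify(first_band, second_band, template=REGISTERED, tol=0.1):
    """Compare the two pieces of sound data against the stored voice data;
    accept when both are within a tolerance of the registered values."""
    return (abs(first_band - template["low_band"]) < tol
            and abs(second_band - template["high_band"]) < tol)

def output_for(result):
    """'Determine an output based on a result of the voice identification.'"""
    return "unlock" if result else "reject"

print(output_for(identify(0.60, 0.35)))   # close to the template: "unlock"
print(output_for(identify(0.20, 0.80)))   # far from the template: "reject"
```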

Automated music composition and generation system for spotting digital media objects and event markers using emotion-type, style-type, timing-type and accent-type musical experience descriptors that characterize the digital music to be automatically composed and generated by the system
10467998 · 2019-11-05

An autonomous music composition and performance system employing an automated music composition and generation engine configured to receive musical signals from a set of real or synthetic musical instruments being played by a group of human musicians. The system buffers and analyzes the musical signals, composes and generates music in real time that augments the music being played by the band of musicians, and/or records, analyzes, and composes music for subsequent playback, review, and consideration by the human musicians.