Patent classifications
G10L25/15
Automated music composition and generation system driven by emotion-type and style-type musical experience descriptors
An automated music composition and generation system for automatically composing and generating digital pieces of music using an automated music composition and generation engine driven by a set of emotion-type and style-type musical experience descriptors and time and/or space parameters supplied by a system user during an automated music composition and generation process. The system includes a system user interface allowing a system user to input (i) linguistic and/or graphical icon based musical experience descriptors, and (ii) a video, audio-recording, image, slide-show, or event marker, as input through the system user interface.
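The abstract describes descriptors and time parameters steering composition. As a minimal sketch of that idea, the hypothetical table below maps emotion-/style-type descriptors to composition parameters and uses a user-supplied duration to size the piece; every descriptor name and parameter value here is an illustrative assumption, not the patented method.

```python
# Hypothetical mapping from (emotion-type, style-type) descriptors to
# composition parameters. Values are illustrative assumptions.
DESCRIPTOR_PARAMS = {
    ("happy", "pop"):      {"tempo_bpm": 120, "mode": "major"},
    ("sad", "cinematic"):  {"tempo_bpm": 70,  "mode": "minor"},
}

def composition_params(emotion, style, length_seconds):
    """Derive composition parameters from descriptors and a time parameter."""
    params = dict(DESCRIPTOR_PARAMS[(emotion, style)])
    # The user-supplied time parameter constrains the piece's length:
    # beats = seconds * bpm / 60, and 4 beats per bar (4/4 assumed).
    params["bars"] = round(length_seconds * params["tempo_bpm"] / 60 / 4)
    return params
```

For example, a 32-second "happy"/"pop" request resolves to a major-mode piece at 120 BPM spanning 16 bars.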
Speech Model-Based Neural Network-Assisted Signal Enhancement
Several embodiments of a digital speech signal enhancer are described that use an artificial neural network that produces clean speech coding parameters based on noisy speech coding parameters as its input features. A vocoder parameter generator produces the noisy speech coding parameters from a noisy speech signal. A vocoder model generator processes the clean speech coding parameters into estimated clean speech spectral magnitudes. In one embodiment, a magnitude modifier modifies an original frequency spectrum of the noisy speech signal using the estimated clean speech spectral magnitudes, to produce an enhanced frequency spectrum, and a synthesis block converts the enhanced frequency spectrum into time domain, as an output speech sequence. Other embodiments are also described.
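The magnitude-modification step described above can be sketched with NumPy: the noisy frame's spectrum is computed, its magnitudes are replaced by the model-estimated clean magnitudes while the noisy phase is retained, and the enhanced spectrum is synthesized back to the time domain. The frame length and the stand-in for the ANN/vocoder output are assumptions for illustration.

```python
import numpy as np

def enhance_frame(noisy_frame, clean_magnitudes):
    """Replace the magnitude of a noisy frame's spectrum with estimated clean
    magnitudes, keep the noisy phase, and synthesize back to the time domain."""
    spectrum = np.fft.rfft(noisy_frame)                # original frequency spectrum
    phase = np.angle(spectrum)                         # noisy phase is retained
    enhanced = clean_magnitudes * np.exp(1j * phase)   # enhanced frequency spectrum
    return np.fft.irfft(enhanced, n=len(noisy_frame))  # synthesis to time domain

# Usage: a 256-sample frame and hypothetical estimated clean magnitudes
# (a stand-in for the neural network / vocoder model generator output).
frame = np.random.randn(256)
est_mag = np.abs(np.fft.rfft(frame)) * 0.5
out = enhance_frame(frame, est_mag)
```

Keeping the noisy phase while swapping in clean magnitudes is a common design choice in spectral enhancement, since the ear is far more sensitive to magnitude errors than phase errors.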
ORAL FUNCTION EVALUATION METHOD, RECORDING MEDIUM, ORAL FUNCTION EVALUATION DEVICE, AND ORAL FUNCTION EVALUATION SYSTEM
An oral function evaluation method includes: obtaining voice data obtained by collecting a voice of an evaluatee uttering a syllable or a fixed phrase that includes (i) two or more morae including a change in a first formant frequency or a change in a second formant frequency or (ii) at least one of a flap, a plosive, a voiceless sound, a double consonant, or a fricative; extracting a prosody feature from the voice data obtained; calculating an estimated value of an oral function of the evaluatee, based on the prosody feature extracted and an oral function estimating equation calculated based on a plurality of training data items; and evaluating an oral function deterioration state of the evaluatee by assessing the estimated value using an oral function evaluation indicator.
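The estimate-then-threshold flow above can be sketched as a linear estimating equation applied to extracted prosody features, followed by comparison against an evaluation indicator. The feature names, weights, and threshold below are illustrative assumptions; the actual equation would be fit from the training data items.

```python
import numpy as np

# Hypothetical prosody features extracted from the voice data, e.g. speaking
# rate, second-formant slope, and plosive burst strength (names assumed).
features = np.array([4.2, 310.0, 0.8])

# An estimating equation fit on training data is often a linear model:
# estimate = w . features + b (weights and bias here are illustrative).
weights = np.array([0.5, 0.01, 1.2])
bias = -1.0
estimate = float(weights @ features + bias)

# Evaluate the deterioration state against an assumed indicator threshold.
THRESHOLD = 5.0
state = "possible deterioration" if estimate < THRESHOLD else "within normal range"
```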
LEARNING ALGORITHM TO DETECT HUMAN PRESENCE IN INDOOR ENVIRONMENTS FROM ACOUSTIC SIGNALS
A system is described that constantly learns the sound characteristics of an indoor environment to detect the presence or absence of humans within that environment. A detection model is constructed, and a decision-feedback approach is used to continually learn and update the statistics of the detection features and sound events that are unique to the environment in question. The learning process may not only rely on the acoustic signal, but may also make use of signals derived from other sensors such as range sensors, motion sensors, pressure sensors, and video sensors.
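The decision-feedback idea can be sketched as follows: running statistics of a background ("absence") model are updated only on frames the detector itself labels as human-free, so the model tracks the environment's own sound characteristics. The learning rate and threshold are illustrative assumptions.

```python
import numpy as np

class PresenceDetector:
    """Decision-feedback sketch: background statistics adapt only on frames
    the detector decides contain no human activity."""

    def __init__(self, n_features, alpha=0.05, k=3.0):
        self.mean = np.zeros(n_features)  # running mean of background features
        self.var = np.ones(n_features)    # running variance of background features
        self.alpha = alpha                # learning rate for the statistics
        self.k = k                        # decision threshold in std deviations

    def step(self, feats):
        # Normalized distance of this frame from the background model.
        z = np.abs(feats - self.mean) / np.sqrt(self.var)
        present = bool(z.mean() > self.k)
        if not present:
            # Decision feedback: adapt the background model on 'absent' frames.
            self.mean = (1 - self.alpha) * self.mean + self.alpha * feats
            self.var = (1 - self.alpha) * self.var + self.alpha * (feats - self.mean) ** 2
        return present
```

A quiet frame updates the background model; a frame far from the learned statistics is flagged as presence and excluded from adaptation, preventing human sounds from contaminating the background estimate.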
SYSTEM AND METHOD FOR ESTIMATING HORMONE LEVEL AND PHYSIOLOGICAL CONDITIONS BY ANALYSING SPEECH SAMPLES
The present disclosure describes a system and method for estimating hormone levels and physiological conditions of a user by analysing speech samples of said user. A user device may record the user's speech and use the recording as a speech sample of the user's utterance. The user device may transmit the speech samples to a backend system. The system may isolate phonation segments from the speech samples and filter the one or more phonation segments. The system may isolate uttered speech segments from the one or more phonation segments. The system may perform an acoustic-phonetic analysis of the uttered speech segments, using a plurality of features for the analysis. Phonemes transcribed in the International Phonetic Alphabet (IPA) may be used to derive speech markers that correspond to specific hormones and levels thereof. The system may generate a hormone level report, which is transmitted to the user.
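The phonation-segment isolation step can be sketched with a simple short-time-energy gate: the recording is split into frames, and contiguous runs of frames whose energy clears a threshold are kept as segments. The frame length and threshold are illustrative assumptions, not parameters from the disclosure.

```python
import numpy as np

def isolate_phonation_segments(signal, frame_len=160, energy_thresh=0.01):
    """Energy-based sketch: return (start, end) sample ranges of contiguous
    frame runs whose short-time energy exceeds a threshold."""
    n_frames = len(signal) // frame_len
    segments, start = [], None
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        voiced = np.mean(frame ** 2) > energy_thresh
        if voiced and start is None:
            start = i * frame_len                  # segment opens
        elif not voiced and start is not None:
            segments.append((start, i * frame_len))  # segment closes
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

# Usage: silence, then a sustained phonation, then silence.
sig = np.concatenate([np.zeros(320), 0.5 * np.ones(320), np.zeros(320)])
segments = isolate_phonation_segments(sig)
```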
System and method of using neural transforms of robust audio features for speech processing
A system and method for processing speech includes receiving a first information stream associated with speech, the first information stream comprising micro-modulation features and receiving a second information stream associated with the speech, the second information stream comprising features. The method includes combining, via a non-linear multilayer perceptron, the first information stream and the second information stream to yield a third information stream. The system performs automatic speech recognition on the third information stream. The third information stream can also be used for training HMMs.
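The stream-combination step can be sketched as a one-hidden-layer perceptron: the two feature streams are concatenated and passed through a non-linearity to yield the third stream. All dimensions and weights below are illustrative assumptions.

```python
import numpy as np

def combine_streams(stream_a, stream_b, W1, b1, W2, b2):
    """Non-linear MLP combination sketch: concatenate the micro-modulation
    features with the second feature stream and pass them through one hidden
    layer (tanh) to yield the third information stream."""
    x = np.concatenate([stream_a, stream_b])
    hidden = np.tanh(W1 @ x + b1)   # non-linear hidden layer
    return W2 @ hidden + b2         # third information stream

# Illustrative dimensions: 4-dim micro-modulation + 6-dim features -> 8 -> 5.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 10)), np.zeros(8)
W2, b2 = rng.standard_normal((5, 8)), np.zeros(5)
third = combine_streams(rng.standard_normal(4), rng.standard_normal(6),
                        W1, b1, W2, b2)
```

In a trained system the weights would be learned jointly with the recognizer; here they are random placeholders to show the data flow only.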
CLINICAL SPEECH ANALYSIS SYSTEM FOR CHILDHOOD SPEECH DISORDERS
A speech analysis system that can accurately determine whether a predetermined sound in a spoken word has been correctly pronounced. The speech analysis system includes a machine learning algorithm that has been trained to consider temporal and spectral information about the frame-by-frame components of a target sound. The speech analysis system is personalized based on previous therapy recordings of a speaker's specific error and on correct/mispronounced exemplars from speakers with matched vocal tracts. The speech analysis system may be integrated into an adaptive therapy program, such as Speech Motor Chaining, to provide an assessment of proper speech and personalized biofeedback in place of a live clinician.
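The frame-by-frame scoring described above can be sketched as aggregating per-frame model outputs into an utterance-level decision: a trained classifier emits a probability per frame that the target sound is produced correctly, and the utterance is judged correct when the aggregate clears a threshold. The mean-pooling rule and threshold are assumptions for illustration.

```python
import numpy as np

def score_target_sound(frame_probs, threshold=0.5):
    """Aggregate per-frame correctness probabilities (from a trained
    frame-level classifier, assumed here) into a pass/fail judgment."""
    return bool(np.mean(frame_probs) >= threshold)
```

A therapy loop could call this after each attempt and use the result to select the next practice target or trigger biofeedback.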