Patent classifications
G10L13/02
SOUND PROCESSING METHOD USING DJ TRANSFORM
Provided is a sound processing method performed by a computer, the method comprising: generating a DJ transform spectrogram by modeling, with respect to an input sound, an oscillation motion of a plurality of springs having different natural frequencies and calculating an estimated pure-tone amplitude for each natural frequency, the spectrogram indicating the estimated pure-tone amplitudes for the frequencies corresponding to the natural frequencies of the springs at a plurality of time points; calculating degrees of fundamental frequency suitability based on a moving average or a moving standard deviation of the estimated pure-tone amplitudes with respect to each natural frequency of the DJ transform spectrogram; and extracting the fundamental frequency based on local maximum values of the degrees of fundamental frequency suitability for the respective natural frequencies at each of the plurality of time points.
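The three steps in this abstract can be illustrated with a minimal sketch. It is not the patented DJ transform, whose exact equations the abstract does not give. It assumes a generic bank of damped, driven springs integrated with a semi-implicit Euler step, a resonance-based rescaling of peak displacement as the amplitude estimate, a moving average taken over time frames as the suitability score, and the strongest local maximum as the extracted fundamental. The function names and the parameters `zeta`, `frame`, and `window` are hypothetical.

```python
import math


def spring_bank_spectrogram(signal, fs, freqs, zeta=0.02, frame=400):
    """Drive one damped spring per natural frequency with the input sound.

    Returns rows[frame_index][spring_index] of estimated input amplitudes.
    Assumption: a generic resonator bank, not the patent's own formulas.
    """
    dt = 1.0 / fs
    omegas = [2.0 * math.pi * f for f in freqs]
    x = [0.0] * len(freqs)     # displacement of each spring
    v = [0.0] * len(freqs)     # velocity of each spring
    peak = [0.0] * len(freqs)  # largest |x| seen in the current frame
    rows = []
    for n, force in enumerate(signal):
        for k, w in enumerate(omegas):
            # Semi-implicit Euler step of x'' + 2*zeta*w*x' + w^2*x = force.
            v[k] += dt * (force - 2.0 * zeta * w * v[k] - w * w * x[k])
            x[k] += dt * v[k]
            peak[k] = max(peak[k], abs(x[k]))
        if (n + 1) % frame == 0:
            # At resonance the steady-state displacement is F / (2*zeta*w^2),
            # so rescaling the peak gives a rough pure-tone amplitude estimate.
            rows.append([p * 2.0 * zeta * w * w for p, w in zip(peak, omegas)])
            peak = [0.0] * len(freqs)
    return rows


def moving_average(rows, window=3):
    """Average each spring's amplitude over the most recent frames."""
    out = []
    for t in range(len(rows)):
        chunk = rows[max(0, t - window + 1): t + 1]
        out.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return out


def extract_f0(scores, freqs):
    """Per time point, the local maximum with the highest score, or None."""
    result = []
    for row in scores:
        peaks = [k for k in range(1, len(row) - 1)
                 if row[k] > row[k - 1] and row[k] > row[k + 1]]
        result.append(freqs[max(peaks, key=lambda k: row[k])] if peaks else None)
    return result
```

Feeding the sketch a pure tone whose frequency matches one of the springs should make that spring dominate once the initial transient has decayed. For harmonic sounds, a real suitability score would also have to weigh the harmonics, which this sketch does not attempt.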
Voice recognition device and method for learning voice data
A voice recognition device and a method for learning voice data using the same are disclosed. The voice recognition device combines feature information for various speakers with a text-to-speech function to generate voice data recognized by a voice recognition unit, and can improve voice recognition efficiency by allowing the voice recognition unit itself to learn various voice data. The voice recognition device can be associated with an artificial intelligence module, a robot, an augmented reality (AR) device, a virtual reality (VR) device, devices related to 5G services, and the like.
Data generation apparatus and data generation method that generate recognition text from speech data
According to one embodiment, the data generation apparatus includes a speech synthesis unit, a speech recognition unit, a matching processing unit, and a dataset generation unit. The speech synthesis unit generates speech data from an original text. The speech recognition unit generates a recognition text by speech recognition from the speech data. The matching processing unit performs matching between the original text and the recognition text. Based on the matching result, the dataset generation unit generates a dataset that associates the original text with the speech data whose recognition text satisfies a certain condition on its matching degree relative to the original text.
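The synthesize, recognize, match, and filter loop described here can be sketched as follows. The abstract does not say how the matching degree is computed or what the condition is, so this sketch assumes a word-level similarity ratio from Python's `difflib.SequenceMatcher` and a simple threshold. The `synthesize` and `recognize` callables and the `threshold` parameter are hypothetical stand-ins for the speech synthesis and speech recognition units.

```python
from difflib import SequenceMatcher


def matching_degree(original, recognized):
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    a = original.lower().split()
    b = recognized.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def build_dataset(texts, synthesize, recognize, threshold=0.9):
    """Keep (speech, original_text) pairs whose recognition text matches well.

    synthesize: text -> speech data (hypothetical speech synthesis unit)
    recognize:  speech data -> text (hypothetical speech recognition unit)
    threshold:  assumed form of the "certain condition" on the matching degree
    """
    dataset = []
    for text in texts:
        speech = synthesize(text)
        hypothesis = recognize(speech)
        if matching_degree(text, hypothesis) >= threshold:
            dataset.append((speech, text))
    return dataset
```

Passing the two units in as callables keeps the filter independent of any particular synthesis or recognition engine.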
Recognition or synthesis of human-uttered harmonic sounds
Two or more fundamental or harmonic components whose frequencies are separated by integer multiples of a fundamental acoustic frequency are identified within each harmonic spectrum of a sequence of spectra derived from analysis of a waveform representing human speech. The highest harmonic frequency that is also greater than 410 Hz is a primary cap frequency, which is used to select a primary phonetic note that corresponds to a subset of phonetic chords from a set of phonetic chords for which acoustic spectral data is available. The spectral data can also include frequencies for primary band, secondary band (or secondary note), basal band, or reduced basal band acoustic components, which can be used to select a phonetic chord from the subset of phonetic chords corresponding to the selected primary note.
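The first two sentences of this abstract can be illustrated with a short sketch. It assumes the spectral peaks and the fundamental frequency are already known, and it treats a peak as a harmonic component when it lies within a relative tolerance of an integer multiple of the fundamental. The tolerance `tol` and both function names are hypothetical. The 410 Hz floor comes from the abstract. Selecting a phonetic note or chord from the result is not sketched, since the abstract gives no data for it.

```python
def harmonic_components(peaks_hz, f0_hz, tol=0.03):
    """Return (harmonic_number, frequency) for peaks near integer multiples of f0.

    tol is an assumed tolerance, expressed as a fraction of f0.
    """
    found = []
    for f in sorted(peaks_hz):
        n = round(f / f0_hz)
        if n >= 1 and abs(f - n * f0_hz) <= tol * f0_hz:
            found.append((n, f))
    return found


def primary_cap_frequency(peaks_hz, f0_hz, floor_hz=410.0, tol=0.03):
    """Highest identified harmonic above floor_hz, or None if there is none.

    At least two components must be identified, as in the abstract.
    """
    components = harmonic_components(peaks_hz, f0_hz, tol)
    above = [f for _, f in components if f > floor_hz]
    if len(components) < 2 or not above:
        return None
    return max(above)
```

A peak such as 530 Hz against a 120 Hz fundamental falls between the fourth and fifth harmonics, so it is ignored rather than treated as a cap candidate.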
Omni-channel orchestrated conversation system and virtual conversation agent for realtime contextual and orchestrated omni-channel conversation with a human and an omni-channel orchestrated conversation process for conducting realtime contextual and fluid conversation with the human by the virtual conversation agent
An omni-channel orchestrated conversation system and virtual conversation agent for realtime contextual and orchestrated omni-channel conversation with a human and an omni-channel orchestrated conversation process for conducting realtime contextual and fluid conversation with a human by a virtual conversation agent in relation to a particular domain are disclosed.
AUTOMATED SESSION PARTICIPATION ON BEHALF OF ABSENT PARTICIPANTS
The technology disclosed herein enables an absent participant to participate in a communication session. In a particular embodiment, a method includes identifying a meeting for an automated attendee to attend on behalf of a user. At a time for the meeting, the method includes joining the automated attendee to a communication session for the meeting. Through the automated attendee, the method monitors, in real time, user communications exchanged between two or more other users over the communication session. During the monitoring, upon identifying a portion of the user communications that is relevant to the user, the method notifies the user about the portion.
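The monitor-and-notify step can be sketched in a few lines. The abstract does not say how relevance is determined, so this sketch assumes a simple rule: an utterance is relevant if it mentions the absent user's name or one of a set of keywords. The `AutomatedAttendee` class, its fields, and the keyword rule are all hypothetical, and joining a real communication session is left out.

```python
from dataclasses import dataclass, field


@dataclass
class AutomatedAttendee:
    """Stands in for an absent user and collects the portions relevant to them."""
    user: str
    keywords: set
    notifications: list = field(default_factory=list)

    def __post_init__(self):
        # Normalize keywords once so matching is case-insensitive.
        self.keywords = {k.lower() for k in self.keywords}

    def is_relevant(self, text):
        """Assumed relevance rule: the user's name or any keyword is mentioned."""
        words = {w.strip(".,!?:;").lower() for w in text.split()}
        return self.user.lower() in words or bool(words & self.keywords)

    def observe(self, speaker, text):
        """Process one utterance from the session as it arrives."""
        if speaker != self.user and self.is_relevant(text):
            # A real system would push this to the user; here it is recorded.
            self.notifications.append((speaker, text))
```

Each call to `observe` handles a single utterance, which mirrors the real-time monitoring in the abstract.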