Patent classifications
G10L25/90
HOWLING SUPPRESSION METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
This application relates to a howling suppression method and apparatus, a computer device, and a storage medium. The method includes: obtaining a current audio signal corresponding to a current time period, and performing frequency-domain transformation on the current audio signal; dividing the frequency-domain audio signal into subbands and determining a target subband; obtaining a current howling detection result and a current voice detection result that correspond to the current audio signal, and determining a subband gain coefficient; obtaining a past subband gain corresponding to an audio signal within a past time period, and calculating a current subband gain for the current audio signal based on the subband gain coefficient and the past subband gain; and suppressing howling on the target subband based on the current subband gain, to obtain a first target audio signal corresponding to the current time period.
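The gain update described above is essentially a first-order recursion: a coefficient chosen from the howling and voice detection results blends the past subband gain toward a target. The sketch below is a hypothetical illustration of that idea; the function names, coefficient values, and the exact blending rule are assumptions, not the patent's formulation.

```python
import numpy as np

def update_subband_gain(past_gain, howling_detected, voice_detected,
                        attack=0.3, release=0.9):
    """Choose a gain coefficient from the detection results and smooth
    the subband gain recursively toward a target gain."""
    # Suppress when howling is detected and no voice activity masks it.
    if howling_detected and not voice_detected:
        coeff, target = attack, 0.1   # pull gain down toward strong attenuation
    else:
        coeff, target = release, 1.0  # recover toward unity gain
    # First-order recursion combining the coefficient with the past gain.
    return coeff * past_gain + (1.0 - coeff) * target

def suppress_subband(spectrum, band_idx, gain):
    """Apply the current gain to the target subband of a frame spectrum."""
    out = spectrum.copy()
    out[band_idx] *= gain
    return out
```

Using the past gain in the recursion keeps the attenuation smooth across frames, which avoids audible gain pumping when the howling detector flickers between frames.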
Harmonicity-dependent controlling of a harmonic filter tool
The coding efficiency of an audio codec using a controllable (switchable or even adjustable) harmonic filter tool is improved by performing the harmonicity-dependent control of this tool using a temporal structure measure in addition to a measure of harmonicity. In particular, the temporal structure of the audio signal is evaluated in a manner that depends on the pitch. This achieves situation-adapted control of the harmonic filter tool: in situations where a control based solely on the measure of harmonicity would decide against, or reduce, the use of the tool even though applying it would increase coding efficiency, the harmonic filter tool is applied; in other situations, where the harmonic filter tool may be inefficient or even destructive, the control reduces its application appropriately.
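One way to read the abstract above: a harmonicity-only threshold is relaxed when a pitch-dependent temporal structure measure indicates a stable signal. The sketch below illustrates that control idea under assumptions of my own; the measure (variation of per-pitch-cycle energies), thresholds, and function names are hypothetical, not taken from the patent.

```python
import numpy as np

def temporal_flatness(signal, pitch_lag):
    """Pitch-dependent temporal structure measure: relative spread of the
    energies of consecutive pitch cycles. Low values mean a stable signal."""
    n_cycles = len(signal) // pitch_lag
    energies = [np.sum(signal[i * pitch_lag:(i + 1) * pitch_lag] ** 2)
                for i in range(n_cycles)]
    return float(np.std(energies) / (np.mean(energies) + 1e-12))

def filter_decision(harmonicity, signal, pitch_lag,
                    h_high=0.8, h_low=0.5, flat_thresh=0.2):
    """Apply the filter when harmonicity is high, and also when harmonicity
    is only moderate but the pitch cycles are temporally stable (a case a
    harmonicity-only control would reject)."""
    if harmonicity >= h_high:
        return True
    if harmonicity >= h_low and temporal_flatness(signal, pitch_lag) < flat_thresh:
        return True
    return False
```

Evaluating the temporal measure over windows tied to the pitch period is what makes the control "pitch-dependent": the same absolute energy fluctuation is judged relative to the signal's own cycle length.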
DIFFICULT AIRWAY EVALUATION METHOD AND DEVICE BASED ON MACHINE LEARNING VOICE TECHNOLOGY
The present disclosure relates to a difficult airway evaluation method and device based on machine learning voice technology. The method includes the following steps: acquiring voice data of a patient; performing feature extraction on the voice data, obtaining the pitch period of the pronunciations, and acquiring voiced and unvoiced sound features based on the pitch period; and constructing a difficult airway evaluation classifier based on the machine learning voice technology, analyzing the received voiced and unvoiced sound features with the trained classifier, and scoring the severity of the difficult airway to obtain an evaluation result.
Pronunciation conversion apparatus, pitch mark timing extraction apparatus, methods and programs for the same
Provided is a system that allows a learner who is a non-native speaker of a given language to intuitively improve pronunciation of that language. A pronunciation conversion apparatus includes a conversion section that converts a first feature value, corresponding to a first speech signal obtained when a first speaker whose native language is a given language speaks another language, so that it approaches a second feature value, corresponding to a second speech signal obtained when a second speaker whose native language is the other language speaks that language. Each of the first feature value and the second feature value is a feature value capable of representing a difference in pronunciation, and a speech signal obtained from the first feature value after the conversion is presented to the first speaker.
Enhanced graphical user interface for voice communications
Enhanced graphical user interfaces for transcription of audio and video messages are disclosed. Audio data may be transcribed, and the transcription may include emphasized words and/or punctuation corresponding to emphasis in the user's speech. Additionally, the transcription may be translated into a second language. A message spoken by a user depicted in one or more images of video data may also be transcribed and provided to one or more devices.