G10L2021/03646

HEARING SYSTEM INCLUDING A HEARING INSTRUMENT AND METHOD FOR OPERATING THE HEARING INSTRUMENT
20230047868 · 2023-02-16 ·

A hearing system includes a hearing instrument for capturing a sound signal from an environment of the hearing instrument. The captured sound signal is processed, and the processed sound signal is output to a user of the hearing instrument. In a speech recognition step, the captured sound signal is analyzed to recognize speech intervals, in which the captured sound signal contains speech. In a speech enhancement procedure performed during recognized speech intervals, the amplitude of the processed sound signal is periodically varied according to a temporal pattern that is consistent with a stress rhythmic pattern of the user. A method for operating the hearing instrument is also provided.

LOW LATENCY AUTOMIXER INTEGRATED WITH VOICE AND NOISE ACTIVITY DETECTION

Systems and methods are disclosed for providing voice and noise activity detection with audio automixers that can reject errant non-voice or non-human noises while maximizing signal-to-noise ratio and minimizing audio latency.

TRAINING APPARATUS, METHOD OF THE SAME AND PROGRAM

A training device changes feedback formant frequencies which are formant frequencies of a picked-up speech signal, applies a lowpass filter, converts the picked-up speech signal, adds high-pass noise to the converted speech signal, feeds back the converted speech signal with the high-pass noise added to a subject, calculates a compensatory response vector by using pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted with change of the feedback formant frequencies to the subject, and pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted without change of the feedback formant frequencies to the subject, and determines an evaluation based on the compensatory response vector and a correct compensatory response vector.

Speech synthesis in noisy environment

Disclosed is speech synthesis in a noisy environment. According to an embodiment of the disclosure, a method of speech synthesis may generate a Lombard effect-applied synthesized speech using a feature vector generated from an utterance feature. According to the disclosure, the speech synthesis method and device may be related to artificial intelligence (AI) modules, unmanned aerial vehicles (UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, and 5G service-related devices.

Sibilance Detection and Mitigation

The present disclosure relates to sibilance detection and mitigation in a voice signal. A method of sibilance detection and mitigation is described. In the method, a predetermined spectrum feature is extracted from a voice signal, the predetermined spectrum feature representing a distribution of signal energy over a voice frequency band. Sibilance is then identified based on the predetermined spectrum feature. Excessive sibilance is further identified from the identified sibilance based on a level of the identified sibilance. Then the voice signal is processed by decreasing a level of the excessive sibilance so as to suppress the excessive sibilance. Corresponding system and computer program products are described as well.

Low latency automixer integrated with voice and noise activity detection

Systems and methods are disclosed for providing voice and noise activity detection with audio automixers that can reject errant non-voice or non-human noises while maximizing signal-to-noise ratio and minimizing audio latency.

Conversation dependent volume control
11211080 · 2021-12-28 · ·

Techniques are described for detecting a conversation between at least two people, and for reducing noise during the conversation. In certain embodiments, at least one speech metric is generated based on spectral analysis of an audio signal and is used to determine that the audio signal represents speech from a first person. Responsive to determining that the speech is part of a conversation between the first person and a second person an operating state of a device in a physical environment is adjusted such that a volume level of sound contributed by or associated with the device is reduced. The sound contributed by or associated with the device corresponds to noise, at least for the duration of the conversation. Therefore, reducing the volume level of sound contributed by or associated with the device reduces the overall noise level in the environment, resulting in a reduction in conversational effort.

Training apparatus, method of the same and program

A training device changes feedback formant frequencies which are formant frequencies of a picked-up speech signal, applies a lowpass filter, converts the picked-up speech signal, adds high-pass noise to the converted speech signal, feeds back the converted speech signal with the high-pass noise added to a subject, calculates a compensatory response vector by using pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted with change of the feedback formant frequencies to the subject, and pickup formant frequencies which are formant frequencies of a speech signal acquired by picking up an utterance made by the subject while feeding back a speech signal that has been converted without change of the feedback formant frequencies to the subject, and determines an evaluation based on the compensatory response vector and a correct compensatory response vector.

TRANSFORMING VOICE SIGNALS TO COMPENSATE FOR EFFECTS FROM A FACIAL COVERING

In one example embodiment, audio characteristics of audio signals are adjusted by a first machine learning model to reduce effects of a facial covering and produce adjusted audio signals. The audio signals correspond to resulting voice signals produced from the facial covering affecting original voice signals. Speech characteristics are predicted for the adjusted audio signals by a second machine learning model. Transformed audio signals corresponding to the original voice signals are produced based on the adjusted audio signals and predicted speech characteristics.

Low latency automixer integrated with voice and noise activity detection

Systems and methods are disclosed for providing voice and noise activity detection with audio automixers that can reject errant non-voice or non-human noises while maximizing signal-to-noise ratio and minimizing audio latency.