G10L21/0272

Detecting and Compensating for the Presence of a Speaker Mask in a Speech Signal
20230005498 · 2023-01-05 ·

Compensating a speech signal for the presence of a speaker mask includes receiving a speech signal, dividing the speech signal into subframes, generating speech parameters for a subframe, and determining whether the subframe is suitable for use in detecting a mask. If the subframe is suitable for use in detecting a mask, the speech parameters for the subframe are used in determining whether a mask is present. If a mask is present, the speech parameters for the subframe are modified to produce modified speech parameters that compensate for the presence of the mask.
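
As an illustration only, the control flow in this abstract can be sketched in Python. The abstract does not say which speech parameters are used or how a mask is detected. The two-band energies, the assumption that a mask lowers the share of high-band energy, and every name and threshold here (`band_energies`, `compensate_for_mask`, `min_energy`, `mask_ratio`, `boost`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def band_energies(subframe: Sequence[float]) -> Tuple[float, float]:
    """Crude two-band split: pairwise averages (low) and differences (high)."""
    low = high = 0.0
    prev = subframe[0]
    for x in subframe[1:]:
        low += ((x + prev) / 2) ** 2
        high += ((x - prev) / 2) ** 2
        prev = x
    n = len(subframe) - 1
    return low / n, high / n


def compensate_for_mask(
    signal: Sequence[float],
    size: int = 4,            # hypothetical subframe length in samples
    min_energy: float = 0.01, # hypothetical suitability threshold
    mask_ratio: float = 0.1,  # hypothetical detection threshold
    boost: float = 4.0,       # hypothetical high-band compensation factor
) -> List[Tuple[bool, Dict[str, float]]]:
    """Return (mask_present, parameters) for each subframe."""
    ratios: List[float] = []
    results = []
    for start in range(0, len(signal) - size + 1, size):
        low, high = band_energies(signal[start:start + size])
        params = {"low": low, "high": high}
        # Only subframes with enough energy are used for mask detection.
        if low + high >= min_energy:
            ratios.append(high / (low + high))
        # Assumption: a mask shows up as a low average high-band share.
        mask_present = bool(ratios) and sum(ratios) / len(ratios) < mask_ratio
        if mask_present:
            # Modify the parameters to compensate for the assumed attenuation.
            params = dict(params, high=params["high"] * boost)
        results.append((mask_present, params))
    return results
```

Silent subframes never contribute to the detection decision in this sketch, which mirrors the suitability test in the abstract.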

Speech-Tracking Listening Device
20220417679 · 2022-12-29 ·

A system (20) includes a plurality of microphones (22), configured to generate different respective signals in response to acoustic waves (36) arriving at the microphones, and a processor (34). The processor is configured to receive the signals, to combine the signals into multiple channels, which correspond to different respective directions relative to the microphones by virtue of each channel representing any portion of the acoustic waves arriving from the corresponding direction with greater weight, relative to others of the directions, to calculate respective energy measures of the channels, to select one of the directions, in response to the energy measure for the channel corresponding to the selected direction passing one or more energy thresholds, and to output a combined signal representing the selected direction with greater weight, relative to others of the directions. Other embodiments are also described.
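
One way to picture the direction selection described here is a delay-and-sum sketch in Python. The abstract does not name a beamforming method, so delay-and-sum, the integer sample delays, the single threshold, and the names `steer`, `mean_energy`, and `select_direction` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def steer(signals: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Delay-and-sum: advance each microphone signal by its delay and average."""
    length = min(len(s) - d for s, d in zip(signals, delays))
    return [
        sum(s[d + n] for s, d in zip(signals, delays)) / len(signals)
        for n in range(length)
    ]


def mean_energy(channel: Sequence[float]) -> float:
    return sum(x * x for x in channel) / len(channel)


def select_direction(
    signals: Sequence[Sequence[float]],
    steering: Dict[str, Sequence[int]],  # hypothetical: direction -> per-mic delays
    threshold: float,                    # hypothetical single energy threshold
) -> Tuple[Optional[str], Optional[List[float]]]:
    """Pick the most energetic channel if its energy passes the threshold."""
    channels = {name: steer(signals, d) for name, d in steering.items()}
    energies = {name: mean_energy(ch) for name, ch in channels.items()}
    best = max(energies, key=energies.get)
    if energies[best] < threshold:
        return None, None
    return best, channels[best]
```

For example, with a pulse that reaches microphone 0 one sample before microphone 1, the steering entry that advances microphone 1 by one sample aligns the two copies, so that channel carries the most energy and is the one selected.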

Voice signal enhancing method and device

The disclosure describes a voice signal enhancing method and device that divide a voice signal from the current scene into multiple frame signals based on a preset time interval; feed the frame signals into a trained neural network based on a preset step size; perform convolution operations on the frame signals through skip-connected convolutional layers to obtain multiple enhanced frame signals; and superpose the enhanced frame signals according to their time-domain positions to obtain an enhanced voice signal. Compared with the prior art, the disclosure enhances voice signals automatically through the neural network without manual intervention, so the effectiveness and application scenarios of voice enhancement need not be limited by a preset method or its designers. This reduces the occurrence of signal distortion and extra noise, which in turn improves the voice signal enhancement.
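
The framing and superposition steps can be sketched in Python as below. This is a minimal sketch, not the disclosed method: the trained network is replaced by an arbitrary `model` callable, overlapping samples are averaged (the abstract only says the frames are superposed), and the names `split_frames`, `overlap_add`, and `enhance` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Tuple[int, List[float]]  # (start index, samples)


def split_frames(signal: Sequence[float], frame_len: int, step: int) -> List[Frame]:
    """Cut the signal into fixed-length frames taken every `step` samples."""
    return [
        (start, list(signal[start:start + frame_len]))
        for start in range(0, len(signal) - frame_len + 1, step)
    ]


def overlap_add(frames: Sequence[Frame], total_len: int) -> List[float]:
    """Superpose frames at their time positions, averaging where they overlap."""
    out = [0.0] * total_len
    count = [0] * total_len
    for start, frame in frames:
        for i, x in enumerate(frame):
            out[start + i] += x
            count[start + i] += 1
    # Samples not covered by any frame stay at zero.
    return [o / c if c else 0.0 for o, c in zip(out, count)]


def enhance(
    signal: Sequence[float],
    frame_len: int,
    step: int,
    model: Callable[[List[float]], List[float]],  # stand-in for the trained network
) -> List[float]:
    frames = [(start, model(frame)) for start, frame in split_frames(signal, frame_len, step)]
    return overlap_add(frames, len(signal))
```

With an identity `model` and frames that cover the whole signal, `enhance` returns the input unchanged, which is a quick check that the framing and superposition are consistent.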

COMPUTING RESOURCE-SAVING VOICE ASSISTANT
20220406312 · 2022-12-22 ·

A voice assistant includes an electronic processor unit connected to at least one microphone and to remote equipment. The electronic processor unit includes both single detection modules, each detecting a respective single keyword from an audio signal supplied by the microphone, and a control unit connected to the single detection modules to select predetermined actions as a function of the detected keywords and to perform those actions. The control unit is also arranged to detect whether actions are doable and to activate or deactivate the single detection modules as a function of the doability of the actions.
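
The activation logic can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: keyword detection is reduced to matching words in a list, each action is a hypothetical pair of callables (a doability check and the action itself), and the class and method names are invented for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical action description: (is_doable, perform).
Action = Tuple[Callable[[], bool], Callable[[], None]]


class VoiceAssistant:
    def __init__(self, actions: Dict[str, Action]) -> None:
        self.actions = actions
        self.active: Set[str] = set()
        self.refresh()

    def refresh(self) -> None:
        """Keep a keyword detector active only while its action is doable."""
        self.active = {kw for kw, (doable, _) in self.actions.items() if doable()}

    def hear(self, words: List[str]) -> List[str]:
        """Run only the active detectors; return the keywords acted upon."""
        handled = []
        for word in words:
            if word in self.active:
                self.actions[word][1]()  # perform the selected action
                handled.append(word)
                self.refresh()           # doability may have changed
        return handled
```

A usage example: with a "play" action that is doable only while nothing is playing and a "pause" action that is doable only during playback, just one of the two detectors is active at any time, which is where the saving in computing resources would come from.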

DEEP SOURCE SEPARATION ARCHITECTURE

A speech separation server comprises a deep-learning encoder with nonlinear activation. The encoder is programmed to take a mixture audio waveform in the time domain, learn generalized patterns from the mixture audio waveform, and generate an encoded representation that effectively characterizes the mixture audio waveform for speech separation.

ELECTRONIC DEVICE AND PERSONALIZED AUDIO PROCESSING METHOD OF THE ELECTRONIC DEVICE
20220406324 · 2022-12-22 ·

According to an embodiment, an electronic device comprises: a microphone configured to receive an audio signal comprising speech of a user; a memory storing instructions; and a processor electrically connected to the memory and configured to execute the instructions, wherein execution of the instructions by the processor causes the processor to perform a plurality of operations comprising: removing noise from the audio signal, thereby generating a first output result; performing speaker separation on the audio signal or the first output result, thereby generating a second output result; and processing a command corresponding to the audio signal based on the first output result and the second output result.
