G10L25/60

AUDIO ENCODING METHOD, AUDIO DECODING METHOD, APPARATUS, COMPUTER DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
20230046509 · 2023-02-16

An audio encoding bit rate prediction model training method is performed by a computer device. The method includes: obtaining a sample audio feature parameter corresponding to each of sample audio frames in a first sample audio; performing encoding bit rate prediction on the sample audio feature parameter through an encoding bit rate prediction model, to obtain a sample encoding bit rate for each of the sample audio frames; performing audio encoding on the sample audio frames based on the corresponding sample encoding bit rates to generate sample audio data corresponding to the sample audio frames; performing audio decoding on the sample audio data, to obtain a second sample audio corresponding to the sample audio data; and training the encoding bit rate prediction model based on the first sample audio and the second sample audio until a sample encoding quality score reaches a target encoding quality score.
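The train-until-target loop described above can be sketched as follows. This is a minimal, hypothetical illustration: the codec, the per-frame feature, the quality score, and the single-parameter "model" are toy stand-ins, not the patent's encoder or its actual prediction model.

```python
def predict_bitrate(feature, weight):
    """Toy per-frame bit rate prediction: linear in the frame feature."""
    return max(8_000.0, weight * feature)

def encode_decode(sample, bitrate):
    """Stand-in codec: a higher bit rate means a finer quantization step."""
    step = 8_000.0 / bitrate
    return round(sample / step) * step

def quality_score(original, reconstructed):
    """Toy encoding quality score in (0, 1]; 1 is a perfect reconstruction."""
    err = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    return 1.0 / (1.0 + err)

def train(frames, features, target_quality=0.99, max_iters=200):
    """Adjust the model until the decoded audio meets the target quality."""
    weight, score = 1.0, 0.0
    for _ in range(max_iters):
        rates = [predict_bitrate(f, weight) for f in features]
        recon = [encode_decode(x, r) for x, r in zip(frames, rates)]
        score = quality_score(frames, recon)
        if score >= target_quality:
            break  # sample quality score reached the target quality score
        weight *= 1.5  # crude update: push predicted bit rates upward
    return weight, score
```

The loop mirrors the abstract's structure: predict per-frame bit rates, encode and decode with them, score the reconstruction against the first sample audio, and update the model until the target score is reached.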
CONTACT AND ACOUSTIC MICROPHONES FOR VOICE WAKE AND VOICE PROCESSING FOR AR/VR APPLICATIONS
20230050954 · 2023-02-16

A method to combine contact and acoustic microphones in a headset for voice wake and voice processing in immersive reality applications is provided. The method includes receiving, from a contact microphone, a first acoustic signal, determining a fidelity and a quality of the first acoustic signal, receiving, from an acoustic microphone, a second acoustic signal, and, when the fidelity and quality of the first acoustic signal exceed a pre-selected threshold, combining the first acoustic signal and the second acoustic signal to provide an enhanced acoustic signal to a smart glass user. A non-transitory, computer-readable medium storing instructions to cause a headset to perform the above method, and the headset, are also provided.
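One plausible reading of the combining step is a threshold-gated mix of the two microphone signals. The fidelity measure and the 50/50 mix below are assumptions for illustration, not the patent's actual combination rule.

```python
def fuse_frames(contact, acoustic, fidelity, threshold=0.5):
    """Average the contact-mic and acoustic-mic frames when the contact
    signal's fidelity exceeds the threshold; otherwise fall back to the
    acoustic microphone alone."""
    if fidelity > threshold:
        return [0.5 * (c + a) for c, a in zip(contact, acoustic)]
    return list(acoustic)
```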

Hearing system comprising a personalized beamformer

A hearing system configured to be located at or in the head of a user, comprises a) at least two microphones providing at least two electric input signals, b) an own voice detector, c) access to a database (O.sub.l, H.sub.l) comprising c1) relative or absolute own voice transfer function(s), and corresponding c2) absolute or relative acoustic transfer functions for a multitude of test-persons, d) a processor connectable to the at least two microphones, to the own voice detector, and to the database. The processor is configured A) to estimate an own voice relative transfer function for sound from the user's mouth to at least one of the at least two microphones, and B) to estimate personalized relative or absolute head related acoustic transfer functions from at least one spatial location other than the user's mouth to at least one of the microphones of the hearing system in dependence of the estimated own voice relative transfer function(s) and the database (O.sub.l, H.sub.l). The hearing system further comprises e) a beamformer configured to receive the at least two electric input signals, or processed versions thereof, and to determine personalized beamformer weights based on the personalized relative or absolute head related acoustic transfer functions or impulse responses. A method of determining personalized beamformer coefficients (w.sub.k) is further disclosed.
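The final beamforming step can be illustrated in isolation. Assuming (this is not stated in the abstract) that the personalized weights are MVDR weights computed from a relative transfer function d and a noise covariance R, the closed form is w = R⁻¹d / (dᴴR⁻¹d); the sketch below writes it out for a two-microphone array, and the personalization of d from the own-voice transfer functions and the database is outside its scope.

```python
def mvdr_weights_2mic(R, d):
    """MVDR weights for 2 mics. R: 2x2 Hermitian noise covariance
    (list of lists of complex); d: length-2 relative transfer function.
    Returns w = R^-1 d / (d^H R^-1 d), the distortionless weights."""
    (a, b), (c, e) = R
    det = a * e - b * c
    # R^-1 d, written out for the 2x2 case: R^-1 = [[e, -b], [-c, a]] / det
    rid = [(e * d[0] - b * d[1]) / det,
           (-c * d[0] + a * d[1]) / det]
    denom = d[0].conjugate() * rid[0] + d[1].conjugate() * rid[1]
    return [rid[0] / denom, rid[1] / denom]
```

By construction the weights satisfy the distortionless constraint wᴴd = 1, so sound arriving with the personalized transfer function d passes undistorted.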
Artificial intelligence device and method of operating artificial intelligence device
11580969 · 2023-02-14

An artificial intelligence device includes a microphone configured to receive a speech command, a speaker, a communication unit configured to communicate with an external artificial intelligence device, and a processor. The processor is configured to receive a wake-up command through the microphone, acquire a first speech quality level of the received wake-up command, and receive, through the communication unit, a second speech quality level of the wake-up command as input to the external artificial intelligence device. When the first speech quality level is larger than the second speech quality level, the processor outputs, through the speaker, a notification indicating that the artificial intelligence device is selected as the object to be controlled, receives an operation command through the microphone, acquires an intention of the received operation command, and transmits, through the communication unit, the operation command to an external artificial intelligence device that will perform the operation corresponding to the acquired intention.
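The arbitration rule reduces to comparing speech quality levels across devices: the local device claims the command only when its level beats every reported remote level. The function name and the behavior when levels tie are assumptions in this sketch.

```python
def select_controlled_device(local_level, remote_levels):
    """Return 'local' when this device heard the wake-up command best,
    i.e., its speech quality level exceeds every remote device's level."""
    if all(local_level > r for r in remote_levels):
        return "local"
    return "remote"
```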
ADJUSTING AUDIO AND NON-AUDIO FEATURES BASED ON NOISE METRICS AND SPEECH INTELLIGIBILITY METRICS

Some implementations involve determining a noise metric and/or a speech intelligibility metric and determining a compensation process corresponding to the noise metric and/or the speech intelligibility metric. The compensation process may involve altering a processing of audio data and/or applying a non-audio-based compensation method. In some examples, altering the processing of the audio data does not involve applying a broadband gain increase to the audio signals. Some examples involve applying the compensation process in an audio environment. Other examples involve determining compensation metadata corresponding to the compensation process and transmitting an encoded content stream that includes encoded compensation metadata, encoded video data and encoded audio data from a first device to one or more other devices.
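A minimal sketch of the metric-to-compensation mapping might look like the following. The thresholds and process names are illustrative assumptions, not values from the patent; note that neither audio path applies a broadband gain increase, matching the abstract.

```python
def choose_compensation(noise_metric, intelligibility_metric):
    """Map a noise metric and a speech intelligibility metric (both
    assumed normalized to [0, 1]) to a compensation process."""
    if intelligibility_metric < 0.5 and noise_metric > 0.7:
        return "non_audio_captions"   # non-audio-based compensation
    if noise_metric > 0.4:
        return "spectral_shaping"     # alter audio processing, no broadband gain
    return "none"
```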
Speech fluency evaluation and feedback

Speech fluency evaluation and feedback tools are described. A computing device such as a smartphone may be used to collect speech (and/or other data). The collected data may be analyzed to detect various speech events (e.g., stuttering) and feedback may be generated and provided based on the detected speech events. The collected data may be used to generate a fluency score or other performance metric associated with speech. Collected data may be provided to a practitioner such as a speech therapist or physician for improved analysis and/or treatment.
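The fluency-score idea can be sketched as a simple per-syllable metric: score the session by the fraction of syllables free of detected speech events. The event representation and the 0-100 scale are assumptions for illustration only.

```python
def fluency_score(detected_events, total_syllables):
    """Toy fluency metric: 100 means no detected speech events
    (e.g., stutters); each affected syllable lowers the score."""
    if total_syllables <= 0:
        return 0.0
    affected = min(len(detected_events), total_syllables)
    return 100.0 * (1.0 - affected / total_syllables)
```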