G10L25/60

Multi-stream target-speech detection and channel fusion

Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal, target-speech detection logic, and an automatic speech recognition engine or VoIP application. An audio processing device includes a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams; a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines, each configured to determine a probability of detecting a specific target speech of interest in its respective stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams; and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.
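
The fusion step can be illustrated with a minimal sketch. It assumes, as an illustration only, that each stream's weight is its detection probability normalized so the weights sum to 1, with uniform weights when no stream detects the target speech; the abstract does not say how the weights are derived. The names `fusion_weights` and `fuse_streams` are hypothetical.

```python
from typing import List, Sequence


def fusion_weights(probs: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Normalize per-stream target-speech probabilities into weights summing to 1.

    Assumption: weight is proportional to detection probability.
    """
    total = sum(probs)
    if total < eps:
        # No stream shows target speech: fall back to uniform weights.
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in probs]


def fuse_streams(streams: Sequence[Sequence[float]],
                 probs: Sequence[float]) -> List[float]:
    """Weighted sum of equal-length enhanced streams (one frame of samples each)."""
    if len(streams) != len(probs):
        raise ValueError("need one probability per stream")
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("streams must have equal length")
    weights = fusion_weights(probs)
    # Each output sample is the weighted combination of the same sample in every stream.
    return [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(n)]
```

For example, `fuse_streams([[1.0, 2.0], [3.0, 4.0]], [0.75, 0.25])` favors the first stream and yields `[1.5, 2.5]`. In practice the weights would be recomputed per frame so the output tracks whichever stream currently carries the target talker.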

VEHICLE QUALITY PROBLEM MANAGEMENT SYSTEM AND METHOD FOR PROCESSING DATA THEREOF
20230005493 · 2023-01-05

The present disclosure relates to a vehicle quality problem management system and a data processing method thereof. The system includes a management server having a server communication device, which performs wireless communication with a mobile device, and a server processing device connected to the server communication device. The server processing device receives from the mobile device a voice signal describing a current quality problem, converts the voice signal to text using speech to text (STT), and registers the resulting text description of the quality problem in a database (DB).
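
The server-side receive, transcribe, and register flow can be sketched as follows, assuming an SQLite table as the database. The table name `quality_problems` and the `transcribe` callable (a stand-in for whatever STT engine the server uses) are hypothetical, not taken from the disclosure.

```python
import sqlite3
from typing import Callable


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per registered quality problem.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quality_problems ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "description TEXT NOT NULL)"
    )


def register_quality_problem(conn: sqlite3.Connection,
                             audio: bytes,
                             transcribe: Callable[[bytes], str]) -> int:
    """Convert a received voice signal to text and store it; return the new row id."""
    text = transcribe(audio).strip()
    if not text:
        raise ValueError("STT returned no text")
    # Parameter substitution avoids SQL injection from transcribed text.
    cur = conn.execute(
        "INSERT INTO quality_problems (description) VALUES (?)", (text,)
    )
    conn.commit()
    return cur.lastrowid
```

A quick check with an in-memory database and a fake STT function: `register_quality_problem(conn, b"noise from rear door", lambda a: a.decode())` returns `1` for the first registered problem.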

Detection and alerting based on room occupancy
11545024 · 2023-01-03

Input data, such as audio and/or video data, may be captured from a first room, for example via microphones and/or cameras within the first room. A first quantity of people within the first room may be determined based at least in part on the input data. An alert may be provided when the first quantity of people exceeds a threshold quantity of people. Locations of people within the room may also be detected based at least in part on the input data, and a first proximity of a first person in the room to a second person in the room may be determined. An alert may also be provided when the first proximity is less than a threshold proximity.
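
The two alert conditions reduce to a count check and a pairwise distance check. This minimal sketch assumes person locations have already been estimated from the audio/video input as 2-D coordinates in meters; the function name `occupancy_alerts` and the alert strings are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) position in meters


def occupancy_alerts(positions: Sequence[Point],
                     max_people: int,
                     min_distance: float) -> List[str]:
    """Return alert messages for over-occupancy and for people standing too close."""
    alerts: List[str] = []
    # Alert 1: quantity of people exceeds the threshold quantity.
    if len(positions) > max_people:
        alerts.append(f"occupancy {len(positions)} exceeds limit {max_people}")
    # Alert 2: any pair of people closer than the threshold proximity.
    for (i, a), (j, b) in combinations(enumerate(positions), 2):
        d = math.dist(a, b)
        if d < min_distance:
            alerts.append(f"persons {i} and {j} are {d:.2f} m apart")
    return alerts
```

With `positions=[(0, 0), (1, 0), (5, 5)]`, `max_people=2` and `min_distance=1.5`, both conditions fire: the room holds three people, and persons 0 and 1 are 1.00 m apart. The pairwise loop is quadratic in the number of people, which is unproblematic at room scale.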

HEARING DEVICE COMPRISING A SPEECH INTELLIGIBILITY ESTIMATOR
20220400349 · 2022-12-15

A hearing device, e.g. a hearing aid, comprises a) an input unit configured to provide at least one time-variant electric input signal representing sound, the at least one electric input signal comprising target signal components and optionally noise signal components, the target signal components originating from a target sound source; b) a signal processing unit for processing the at least one electric input signal and providing a processed signal; c) an output unit for creating output stimuli configured to be perceivable by the user as sound based on the processed signal from the signal processing unit; d) a speech presence probability prediction unit for repeatedly providing a measure of a predicted speech presence probability of the at least one electric input signal, or of a signal originating therefrom; and e) a speech intelligibility prediction unit for repeatedly providing a current measure of a predicted speech intelligibility of the at least one electric input signal, or of a signal originating therefrom. The speech intelligibility prediction unit is configured to determine said current measure of the predicted speech intelligibility in dependence of said measure of the predicted speech presence probability. A method of operating a hearing device is further disclosed. The invention may e.g. be used in hearing aids, headsets, earpieces (ear buds), etc.
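
One simple way an intelligibility measure can depend on speech presence probability is to weight per-frame intelligibility scores by each frame's speech presence probability, so frames without speech do not pull the estimate down. The sketch below assumes per-frame scores are already available; the weighting scheme and the name `spp_weighted_intelligibility` are illustrative assumptions, not the disclosed method.

```python
import math
from typing import Sequence


def spp_weighted_intelligibility(frame_scores: Sequence[float],
                                 spp: Sequence[float],
                                 eps: float = 1e-9) -> float:
    """Average per-frame intelligibility, weighted by speech presence probability.

    Returns NaN when no frame is likely to contain speech, since
    intelligibility is undefined without speech.
    """
    if len(frame_scores) != len(spp):
        raise ValueError("need one speech presence probability per frame")
    total_weight = sum(spp)
    if total_weight < eps:
        return math.nan
    return sum(s * p for s, p in zip(frame_scores, spp)) / total_weight
```

For scores `[0.9, 0.1, 0.8]` with presence probabilities `[1.0, 0.0, 1.0]`, the noise-only middle frame is ignored and the estimate is about 0.85, rather than the unweighted 0.6.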

COMPUTERIZED MONITORING OF DIGITAL AUDIO SIGNALS
20220399945 · 2022-12-15

A digital audio quality monitoring device uses a deep neural network (DNN) to provide accurate estimates of signal-to-noise ratio (SNR) from a limited set of features extracted from incoming audio. Some embodiments improve the SNR estimate accuracy by selecting a DNN model from a plurality of available models based on a codec used to compress/decompress the incoming audio. Each model has been trained on audio compressed/decompressed by a codec associated with the model, and the monitoring device selects the model associated with the codec used to compress/decompress the incoming audio. Other embodiments are also provided.
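
Codec-dependent model selection amounts to a lookup keyed by codec. In this sketch each "model" is any callable mapping a feature vector to an SNR estimate in dB; the toy lambdas stand in for trained DNNs, and the class name `SnrEstimator` and the optional fallback model are hypothetical additions not described in the abstract.

```python
from typing import Callable, Dict, Optional, Sequence

SnrModel = Callable[[Sequence[float]], float]  # features -> SNR estimate (dB)


class SnrEstimator:
    """Pick the SNR model trained on audio passed through the same codec."""

    def __init__(self, models: Dict[str, SnrModel], default: Optional[str] = None):
        # Case-insensitive codec keys.
        self.models = {name.lower(): m for name, m in models.items()}
        self.default = default.lower() if default else None
        if self.default is not None and self.default not in self.models:
            raise ValueError(f"default codec {default!r} has no model")

    def estimate(self, features: Sequence[float], codec: str) -> float:
        model = self.models.get(codec.lower())
        if model is None:
            if self.default is None:
                raise KeyError(f"no SNR model trained for codec {codec!r}")
            model = self.models[self.default]  # assumed fallback behavior
        return model(features)
```

For example, with `SnrEstimator({"G711": lambda f: 10.0 + f[0], "Opus": lambda f: 20.0 + f[0]}, default="g711")`, calling `estimate([1.0], "opus")` uses the Opus-trained stand-in and returns `21.0`, while an unknown codec falls back to the G.711 model.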

System to evaluate dimensions of pronunciation quality
11527174 · 2022-12-13

The present invention provides a system for determining a language proficiency of a user in an evaluated language. A machine learning engine may be trained using audio file variables from a plurality of audio files and human-generated scores for comprehensibility, accentedness, and intelligibility for each audio file. The system may receive an audio file from a user and determine a plurality of audio file variables from the audio file. The system may apply the audio file variables to the machine learning engine to determine comprehensibility, accentedness, and intelligibility scores for the user. The system may determine one or more projects and/or classes for the user based on the user's comprehensibility score, accentedness score, and/or intelligibility score.
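
The final placement step, mapping the three predicted scores to a class, might look like the sketch below. It assumes all three scores share a 0 to 9 scale oriented so that higher is better (note that accentedness is often rated the other way round), and the band thresholds, class names, and `place_user` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PronunciationScores:
    comprehensibility: float
    accentedness: float  # assumed oriented so higher = less accented (better)
    intelligibility: float


# Hypothetical placement bands: (minimum mean score, class name), highest first.
BANDS: List[Tuple[float, str]] = [(7.0, "advanced"), (4.0, "intermediate"), (0.0, "beginner")]


def place_user(scores: PronunciationScores) -> Tuple[str, str]:
    """Return (class level, weakest dimension to focus practice on)."""
    dims: Dict[str, float] = {
        "comprehensibility": scores.comprehensibility,
        "accentedness": scores.accentedness,
        "intelligibility": scores.intelligibility,
    }
    mean = sum(dims.values()) / len(dims)
    # First band whose floor the mean reaches; default to the lowest band.
    level = next((name for floor, name in BANDS if mean >= floor), BANDS[-1][1])
    focus = min(dims, key=dims.get)
    return level, focus
```

For scores of 6, 3, and 6, the mean is 5.0, so the user is placed in "intermediate" with accentedness flagged as the dimension to work on.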

Method, apparatus, device and computer storage medium for generating speech packet

A method, device, and computer storage medium for generating a speech packet, relating to the technical field of speech, are disclosed. The method may include: providing a speech recording interface to a user; after obtaining an event triggering speech recording on the speech recording interface, obtaining speech data entered by the user; uploading the speech data to a server side in response to determining that the speech data meets requirements for training a speech synthesis model; and receiving a download address of the speech packet generated by the server side after it trains the speech synthesis model with the speech data. An ordinary user may thus customize a personalized speech packet through the speech recording interface provided by the client, without professional recording equipment, which may substantially reduce the production cost of the speech packet.
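
The client-side gate ("upload only if the speech data meets the training requirements") can be sketched as a validation function. The abstract does not state the actual requirements, so the minimum recording count, total duration, sample rate, and SNR below, along with the `Recording` type and helper names, are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Recording:
    duration_s: float
    sample_rate_hz: int
    snr_db: float  # assumed to be measured on the client


def training_data_problems(recordings: Sequence[Recording],
                           min_count: int = 20,
                           min_total_s: float = 60.0,
                           min_rate_hz: int = 16000,
                           min_snr_db: float = 15.0) -> List[str]:
    """List reasons the data is not yet fit for training; empty means OK."""
    problems: List[str] = []
    if len(recordings) < min_count:
        problems.append(f"need at least {min_count} recordings, got {len(recordings)}")
    total = sum(r.duration_s for r in recordings)
    if total < min_total_s:
        problems.append(f"need {min_total_s:.0f} s of speech, got {total:.1f} s")
    for i, r in enumerate(recordings):
        if r.sample_rate_hz < min_rate_hz:
            problems.append(f"recording {i}: sample rate {r.sample_rate_hz} Hz below {min_rate_hz} Hz")
        if r.snr_db < min_snr_db:
            problems.append(f"recording {i}: SNR {r.snr_db:.1f} dB below {min_snr_db:.1f} dB")
    return problems


def ready_to_upload(recordings: Sequence[Recording]) -> bool:
    # Only upload to the server side when every requirement is met.
    return not training_data_problems(recordings)
```

Returning a list of problems rather than a bare boolean lets the recording interface tell the user exactly what to re-record before anything is uploaded; the later steps (server-side training and returning the download address) are outside this sketch.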