G10L25/24

DIFFICULT AIRWAY EVALUATION METHOD AND DEVICE BASED ON MACHINE LEARNING VOICE TECHNOLOGY

The present disclosure relates to a difficult airway evaluation method and device based on a machine learning voice technology. The method includes the following steps: acquiring voice data of a patient; carrying out feature extraction on the voice data to obtain a pitch period of the pronunciation, and acquiring voiced sound features and unvoiced sound features based on the pitch period; and constructing a difficult airway evaluation classifier based on the machine learning voice technology, analyzing the received voiced sound features and unvoiced sound features with the trained difficult airway evaluation classifier, and scoring the severity of the difficult airway to obtain an evaluation result of the difficult airway.
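
The pipeline described above (pitch-period estimation, voiced/unvoiced feature extraction, and a trained classifier scoring severity) can be illustrated with a minimal Python sketch. The autocorrelation pitch estimator, the 0.3 periodicity threshold, the four summary features, and the scikit-learn RandomForestClassifier standing in for the trained difficult airway evaluation classifier are all illustrative assumptions, not the patent's actual implementation:

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=200):
    """Split a signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

def pitch_period(frame, fs=16000, fmin=60.0, fmax=400.0):
    """Estimate the pitch period (lag, in samples) of one frame by autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    if ac[0] <= 0 or hi >= len(ac):
        return 0, 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    return lag, float(ac[lag] / ac[0])      # period and a periodicity score

def voice_features(x, fs=16000):
    """Separate voiced/unvoiced frames by periodicity and summarise both."""
    periods, scores, energies = [], [], []
    for f in frame_signal(x):
        p, s = pitch_period(f, fs)
        periods.append(p)
        scores.append(s)
        energies.append(float(np.mean(f ** 2)))
    periods, scores, energies = np.array(periods), np.array(scores), np.array(energies)
    voiced = scores > 0.3                    # assumed periodicity threshold
    f0 = fs / np.maximum(periods[voiced], 1) if voiced.any() else np.array([0.0])
    unvoiced_energy = energies[~voiced].mean() if (~voiced).any() else 0.0
    return np.array([f0.mean(), f0.std(), voiced.mean(), unvoiced_energy])

if __name__ == "__main__":
    from sklearn.ensemble import RandomForestClassifier
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for recorded patient utterances and severity labels.
    X = np.stack([voice_features(rng.standard_normal(16000)) for _ in range(20)])
    y = rng.integers(0, 3, size=20)          # placeholder severity labels 0-2
    clf = RandomForestClassifier(n_estimators=50).fit(X, y)
    print("predicted severity:", clf.predict(X[:1]))
```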

MICROPHONE UNIT COMPRISING INTEGRATED SPEECH ANALYSIS

A microphone unit has a transducer, for generating an electrical audio signal from a received acoustic signal; a speech coder, for obtaining compressed speech data from the audio signal; and a digital output, for supplying digital signals representing said compressed speech data. The speech coder may be a lossy speech coder, and may contain a bank of filters with centre frequencies that are non-uniformly spaced, for example mel frequencies.
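
A minimal sketch of the kind of mel-spaced filter bank the speech coder may contain follows; the filter count, FFT size, sample rate and the use of log filter-bank energies as the "compressed speech data" are illustrative assumptions, not the microphone unit's actual coder:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=20, n_fft=512, fs=16000):
    """Triangular filters whose centre frequencies are uniformly spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def encode_frame(frame, fb):
    """Lossy coding of one audio frame: log energies of the mel filter outputs."""
    spectrum = np.abs(np.fft.rfft(frame, n=(fb.shape[1] - 1) * 2)) ** 2
    return np.log(fb @ spectrum + 1e-10)

# Example: one 32 ms frame of the transducer's audio signal at 16 kHz.
frame = np.random.default_rng(0).standard_normal(512)
codes = encode_frame(frame, mel_filterbank())
print(codes.shape)   # (20,) coded values per frame instead of 512 raw samples
```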

DISEASE PREDICTION DEVICE, PREDICTION MODEL GENERATION DEVICE, AND DISEASE PREDICTION PROGRAM

Provided is a device that performs machine learning by extracting acoustic feature values from conversational voice data and that predicts a disease level of a subject on the basis of a disease prediction model generated by the machine learning. The device includes: a matrix calculation unit 23 that calculates a spatial delay matrix using a relation value of a plurality of types of acoustic feature values; and a matrix decomposition unit 24 that calculates a matrix decomposition value from the spatial delay matrix. By calculating at least one of a DCCA coefficient and a mutual information amount as the relation value of the plurality of types of acoustic feature values, a relation value reflecting a non-linear and non-stationary relationship between the feature values can be obtained, and the disease level of the subject can be predicted on the basis of that relation value.
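
The abstract's relation value, spatial delay matrix and matrix decomposition value can be sketched as follows, under assumptions: the relation value is estimated with a histogram-based mutual information (the DCCA-coefficient alternative is not reproduced), the delay matrix is built by evaluating that relation value over a range of delays, and an SVD supplies the decomposition value; none of these choices are taken from the patent itself:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of mutual information between two feature series."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

def delay_matrix(features, max_delay=8):
    """Relation values between every pair of feature series over a range of delays."""
    n = len(features)
    rows = []
    for d in range(max_delay):
        row = []
        for i in range(n):
            for j in range(n):
                a = features[i][d:]
                b = features[j][:len(features[j]) - d] if d else features[j]
                row.append(mutual_information(a, b))
        rows.append(row)
    return np.array(rows)

# Example with two synthetic acoustic feature tracks (e.g. pitch and energy).
rng = np.random.default_rng(0)
f0 = rng.standard_normal(500)
energy = 0.6 * f0 + 0.4 * rng.standard_normal(500)
m = delay_matrix([f0, energy])
singular_values = np.linalg.svd(m, compute_uv=False)   # "matrix decomposition value"
print(singular_values[:3])
```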

METHOD FOR PROCESSING AN AUDIO STREAM AND CORRESPONDING SYSTEM

A method and a system for processing an audio stream are described, wherein at least one database of classified voices and at least one database of classified background sounds are provided, and the voices and sounds extracted from a suitably re-processed audio stream are compared with these classified voices and background sounds in order to identify possible matches.
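
A minimal sketch of the matching step follows, assuming a simple averaged log-spectrum fingerprint and cosine similarity as the comparison rule; the voice_db and background_db dictionaries, the embedding, and the 0.9 threshold are illustrative assumptions, not the described system's actual databases or matching criterion:

```python
import numpy as np

def embed(segment, n_fft=512):
    """Coarse fingerprint: mean-centred average log power spectrum of a segment."""
    frames = segment[: len(segment) // n_fft * n_fft].reshape(-1, n_fft)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    v = np.log(spec.mean(axis=0) + 1e-10)
    return v - v.mean()

def best_match(query, database, threshold=0.9):
    """Return the label of the closest database entry, or None if nothing matches."""
    q = embed(query)
    best_label, best_sim = None, threshold
    for label, ref in database.items():
        r = embed(ref)
        sim = float(q @ r / (np.linalg.norm(q) * np.linalg.norm(r)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Example: two tiny illustrative databases and one re-processed stream chunk.
rng = np.random.default_rng(1)
voice_db = {"speaker_A": rng.standard_normal(8192)}
background_db = {"traffic": rng.standard_normal(8192)}
chunk = voice_db["speaker_A"] + 0.1 * rng.standard_normal(8192)
print(best_match(chunk, voice_db), best_match(chunk, background_db))
```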

Utilizing machine learning models to provide cognitive speaker fractionalization with empathy recognition

A device may receive audio data identifying a plurality of speakers and may process the audio data, with a plurality of clustering models, to identify a plurality of speaker segments. The device may determine a plurality of diarization error rates for the plurality of speaker segments and may identify a plurality of errors in the plurality of speaker segments. The device may select rectification models to rectify the plurality of errors and may segment and/or re-segment the audio data with the rectification models to generate re-segmented audio data. The device may determine a plurality of modified diarization error rates for the plurality of speaker segments based on the re-segmented audio data and may select one of the plurality of speaker segments based on the plurality of modified diarization error rates. The device may calculate an empathy score based on the selected speaker segment and may perform actions based on the empathy score.
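
The selection of a speaker segmentation by diarization error rate can be sketched as follows, under assumptions: the error rate is approximated at the frame level as the misclassified-frame fraction under the best speaker-label mapping, and the candidate segmentation with the lowest rate is chosen; the clustering models, rectification models and the empathy score are outside this sketch:

```python
import numpy as np
from itertools import permutations

def frame_der(reference, hypothesis):
    """Frame-level diarization error rate under the best speaker-label mapping."""
    labels = sorted(set(hypothesis))
    best = 1.0
    for perm in permutations(sorted(set(reference)), len(labels)):
        mapping = dict(zip(labels, perm))
        mapped = np.array([mapping[h] for h in hypothesis])
        best = min(best, float(np.mean(mapped != np.array(reference))))
    return best

def select_segmentation(reference, candidates):
    """Pick the candidate speaker segmentation with the lowest error rate."""
    rates = [frame_der(reference, c) for c in candidates]
    return int(np.argmin(rates)), rates

# Example: two speakers, three candidate segmentations from different models.
ref = [0] * 50 + [1] * 50
cands = [[0] * 45 + [1] * 55, [0] * 50 + [1] * 50, [1] * 30 + [0] * 70]
idx, rates = select_segmentation(ref, cands)
print("selected candidate", idx, "error rates", [round(r, 2) for r in rates])
```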

HIERARCHICAL GENERATED AUDIO DETECTION SYSTEM

Disclosed is a hierarchical generated audio detection system comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed a limit; the audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain a CQCC feature and an LFCC feature; the CQCC feature or the LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening, which separates first-stage real audio from first-stage generated audio; and the CQCC feature or the LFCC feature of the first-stage generated audio is input into the second-stage fine-level deep identification model, which distinguishes second-stage real audio from second-stage generated audio, the second-stage generated audio being identified as generated audio.
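
A minimal sketch of the two-stage cascade decision follows; the two placeholder scoring functions, the generic feature vector and the 0.5 thresholds are illustrative assumptions standing in for the CQCC/LFCC features and the trained coarse-level and fine-level models:

```python
import numpy as np

def coarse_model(feature):
    """Stand-in for the first-stage lightweight coarse-level detection model."""
    return float(1.0 / (1.0 + np.exp(-feature.mean())))      # pseudo "generated" score

def fine_model(feature):
    """Stand-in for the second-stage fine-level deep identification model."""
    return float(1.0 / (1.0 + np.exp(-(feature.std() - 1.0))))

def detect(feature, coarse_thresh=0.5, fine_thresh=0.5):
    """Hierarchical decision: only clips flagged by stage one reach stage two."""
    if coarse_model(feature) < coarse_thresh:
        return "real (screened out at stage 1)"
    if fine_model(feature) < fine_thresh:
        return "real (identified at stage 2)"
    return "generated (identified at stage 2)"

# Example with a random vector standing in for a CQCC or LFCC feature.
feature = np.random.default_rng(0).standard_normal(60)
print(detect(feature))
```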