SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF DISEASE-ASSOCIATED RESPIRATORY SOUNDS
20220338756 · 2022-10-27
Inventors
CPC classification
G10L21/0308
PHYSICS
A61B5/7282
HUMAN NECESSITIES
International classification
A61B5/08
HUMAN NECESSITIES
A61B5/00
HUMAN NECESSITIES
G10L21/0308
PHYSICS
G10L25/18
PHYSICS
Abstract
A method to detect and analyze cough parameters remotely and in the background through the user's device in everyday life. The method involves consecutive or selective execution of four stages, which solve the following tasks: detection of coughing events in sound amid environmental noise; separation of the sound containing the cough into the cough sound and the remaining sounds, even when extraneous sounds occur during the cough; identification of the user by the sound of the cough, to avoid analyzing cough sounds that do not belong to the user (for example, when another person coughs nearby); and assessment of cough characteristics (wet/dry, severity, duration, number of spasms, etc.). The method can run on wearable devices, smart devices, personal computers, and laptops, either continuously (24/7) or over a selected range of time, and has high energy efficiency.
Claims
1. A method of monitoring respiratory sounds of a user, comprising: receiving an audio stream through a microphone; automatically identifying a commencement and a completion of a sound event cycle in the received audio stream with at least one automated processor; automatically distinguishing a respiratory event from a non-respiratory event in the sound event comprising performing a first analysis of frequency domain characteristics of the sound event with the at least one automated processor; automatically classifying a type of respiratory event of the user from a predetermined set of classifications comprising performing a third analysis of frequency domain characteristics of the sound event with the at least one automated processor; and at least one of outputting and storing the automatically classified type of respiratory event of the user.
2. The method according to claim 1, wherein the respiratory event comprises a cough, and the classified type of respiratory event comprises a distinction between a cough of a COVID-19 positive person and a cough of a COVID-19 negative person.
3. The method according to claim 1, further comprising determining a location of the device during the respiratory event.
4. The method according to claim 1, wherein the classifying is performed contingent upon determination of the commencement of the sound event cycle.
5. The method according to claim 1, wherein the at least one automated processor is part of a smartphone having an operating system, having background execution and foreground execution, wherein at least the automatically identifying and automatically distinguishing a respiratory event from a non-respiratory event, are performed in background execution.
6. The method according to claim 1, wherein the automated processor executes an operating system of a smartphone, the operating system having root execution privileges and user execution privileges, wherein at least the automatically identifying and distinguishing a respiratory event from a non-respiratory event, are performed with root execution privileges.
7. The method according to claim 1, wherein the automatically identifying is performed using a sound event detection model comprising: automatically performing a spectrographic analysis on received sound signals to produce a spectrogram; automatically analyzing the spectrogram with respect to a spectrum model; and automatically applying a frame-wise mask to the analyzed spectrogram.
8. The method according to claim 1, wherein the automatically classifying is performed using a sound source separation model, comprising: automatically performing a spectrographic transform; automatically segmenting the spectrographic transform; automatically comparing the segmented spectrographic transform with a cough sound spectrum and a non-cough sound spectrum; and automatically inverse transforming the segmented spectrographic transform corresponding to the cough sound spectrum.
9. The method according to claim 1, further comprising automatically distinguishing between a respiratory event of a user of the device and a respiratory event of a non-user of the device comprising performing a second analysis of characteristics of the sound event with the at least one automated processor.
10. The method according to claim 9, wherein the automatically distinguishing between the respiratory event of the user of the device and the respiratory event of a non-user of the device is performed using a non-speech source identification model, comprising: automatically performing a spectrographic transform; automatically employing a spectrum model; automatically generating an embedded vector output; and automatically performing a database lookup.
11. The method according to claim 1, wherein the automatically classifying distinguishes between cough, sneeze, and clear throat sounds.
12. A non-transitory computer readable medium for controlling a programmable processor to perform a method of monitoring respiratory sounds of a user, comprising: instructions for receiving an audio stream through a microphone of a device; instructions for identifying a commencement and a completion of a sound event cycle in the received audio stream; instructions for distinguishing a respiratory event from a non-respiratory event in the sound event comprising performing a first analysis of frequency domain characteristics of the sound event; and instructions for classifying a type of respiratory event of the user from a predetermined set of classifications comprising performing a second analysis of frequency domain characteristics of the sound event.
13. The non-transitory computer readable medium according to claim 12, further comprising instructions for distinguishing between a respiratory event of a user of the device and a respiratory event of a non-user of the device comprising performing a third analysis of characteristics of the sound event.
14. The non-transitory computer readable medium according to claim 12, wherein the device is a smartphone having an operating system, having background execution, foreground execution, root execution privileges, and user execution privileges, wherein at least the instructions for identifying and instructions for distinguishing a respiratory event from a non-respiratory event, are performed as background execution of the operating system; and wherein at least the instructions for identifying and instructions for distinguishing a respiratory event from a non-respiratory event, are performed with the root execution privileges of the operating system.
15. The non-transitory computer readable medium according to claim 12, wherein the instructions for identifying are performed using a sound event detection model, and further comprise: instructions for performing a spectrographic analysis on received sound signals to produce a spectrogram; instructions for analyzing the spectrogram with respect to a spectrum model; and instructions for applying a frame-wise mask to the analyzed spectrogram.
16. The non-transitory computer readable medium according to claim 12, wherein the instructions for classifying are performed using a sound source separation model, comprising: instructions for performing a spectrographic transform; instructions for segmenting the spectrographic transform; instructions for comparing the segmented spectrographic transform with a cough sound spectrum and a non-cough sound spectrum; and instructions for inverse transforming the segmented spectrographic transform corresponding to the cough sound spectrum.
17. The non-transitory computer readable medium according to claim 12, further comprising instructions for distinguishing between the respiratory event of the user of the device and the respiratory event of a non-user of the device, performed using a non-speech source identification model, comprising: instructions for performing a spectrographic transform; instructions for employing a spectrum model; instructions for generating an embedded vector output; and instructions for performing a database lookup.
18. A device for monitoring respiratory sounds of a user, comprising: a microphone configured to transduce acoustic waves into an electronic stream of audio information; at least one automated processor, configured to: identify a commencement and a completion of a sound event cycle in the electronic stream of audio information; distinguish a respiratory event from a non-respiratory event in the sound event comprising performing a first analysis of frequency domain characteristics of the sound event; and classify a type of respiratory event of the user from a predetermined set of classifications comprising performing a second analysis of frequency domain characteristics of the sound event.
19. The device according to claim 18, wherein the at least one automated processor is provided within at least one of a smartphone and a smart speaker.
20. The device according to claim 18, further comprising: a geolocation system for determining a location of the device; and a radio frequency transceiver configured to communicate the determined location of the device in conjunction with the classified type of respiratory event to a remote database.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0683] Blocks separated by a dotted line represent a logical combination of operations within the context of one model. Blocks outlined with a solid line can represent an atomic operation, a set of operations, or a machine learning model. The arrowed lines mark the links in the sequence of pipeline operations.
[0684] Block 101 shows detection of Cough, Sneeze, and Clear Throat (CSC) events in incoming sound. Block 112 shows separation of CSC sounds from the background sound environment. Block 118 shows user identification by cough sounds. Block 125 shows detection of cough characteristics. The blocks can work together or separately on wearable devices, smart devices, personal computers, laptops, or on a server.
[0685] The processing pipeline receives raw sound 100 at its input, which can come from a microphone of a mobile device or computer, or from any other sound receiver. Output 128 is a detailed report containing the characteristics of the cough.
[0702] The process of extracting cough characteristics from raw sound is represented by several components. Each component is an independent unit capable of operating on either a mobile device or a server.
[0704] The SED model is designed to detect Cough, Sneeze, Clear Throat (CSC) events in sound. The model has a multi-label output, meaning it can correctly process the situation in which one sound segment contains sounds of several classes (for example, a cough following a sneeze). The architecture of the model allows the use of weakly labeled data for training: it is enough to label whether a CSC sound is present in a sound segment, without detailed information about the beginning and end of the portion containing the CSC sound. The model consists of three separate submodels 103, 105, 108 which are trained independently of each other. These models are then combined into one common model 101.
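The multi-label, weakly supervised setup can be illustrated with a toy frame-wise mask. The values, the max-pooling reduction, and the 0.5 decision threshold below are illustrative choices, not taken from the method itself:

```python
import numpy as np

# Frame-wise probability mask produced by the detector:
# shape (frames, classes) for [cough, sneeze, clear throat].
mask = np.array([[0.1, 0.0, 0.2],
                 [0.9, 0.1, 0.1],
                 [0.8, 0.7, 0.0]])

# Weakly labeled training needs only clip-level targets, so the frame-wise
# mask is pooled to one probability per class (max pooling is one common
# reduction in weakly supervised sound event detection).
clip_probs = mask.max(axis=0)

# Multi-label decision: each class is judged independently, so a segment
# can contain both a sneeze and a cough at the same time.
present = clip_probs >= 0.5
```

Because each class is thresholded independently, overlapping events on one segment do not compete with each other, which is exactly the property the multi-label output provides.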
[0705] Model 103 in
[0706] The model consists of five Conv1D blocks. Each block consists of Conv1D, BatchNormalization, and Activation operations. Any differentiable function can be used as the activation function, e.g., ReLU. Similar to model 103, this model has a Dense 302 layer whose data are converted into a probability mask after applying the Sigmoid activation function. The model is trained using the Predict 304 output. Similar to model 105, the second output represents a mask with the same dimension as the mask of model 103. Identical dimensions are required to perform the Stack 107 operation, which combines the outputs to be sent to the next model. The model generalizes time-domain features without taking frequency-domain features into account; it is time-invariant.
[0707] The mask outputs of the two models 103 and 105 are combined to train the FTM 108. This model consists of only one Conv2D layer, which is fed the masks of models 103 and 105. The model is designed for weighted probability estimation based on the outputs of both models; as a result of this approach, both frequency-domain and time-domain features are taken into account. The output of the FTM is a mask containing the probability of CSC presence over time.
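Since the FTM is a single Conv2D layer over the two stacked masks, a 1×1 kernel reduces to a per-frame weighted sum of the masks followed by an activation. A minimal sketch of that reduction, with illustrative (not trained) weights and bias:

```python
import numpy as np

def combine_masks(freq_mask, time_mask, w_f=0.6, w_t=0.4, bias=-0.1):
    """Weighted probability estimate from the two submodel masks.
    A 1x1 Conv2D over the stacked (freq, time) channels is equivalent
    to this per-frame weighted sum; weights here are illustrative."""
    logits = w_f * freq_mask + w_t * time_mask + bias
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> CSC probability per frame

# Two frames: both submodels agree on the first, disagree weakly on the second.
combined = combine_masks(np.array([0.9, 0.1]), np.array([0.8, 0.2]))
```

Training the single Conv2D layer amounts to learning how much to trust each submodel, which is the "weighted probability estimation" the paragraph describes.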
[0708] The SED output is a three-channel mask containing CSC class probabilities, the mask 110 is applied to the incoming sound to get the beginning and end of CSC segments per
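Recovering segment boundaries from the mask can be sketched as thresholding the frame-wise probabilities and reading off rising and falling edges. The threshold and hop size below are illustrative parameters, not values fixed by the method:

```python
import numpy as np

def mask_to_segments(mask, threshold=0.5, hop_s=0.01):
    """Convert a frame-wise probability mask (one channel of the SED output)
    into a list of (start, end) times in seconds."""
    active = mask >= threshold
    # Pad with inactive frames on both sides so every segment has both edges,
    # then locate rising (+1) and falling (-1) edges.
    edges = np.diff(active.astype(int), prepend=0, append=0)
    starts = np.where(edges == 1)[0]
    ends = np.where(edges == -1)[0]
    return [(s * hop_s, e * hop_s) for s, e in zip(starts, ends)]

segments = mask_to_segments(
    np.array([0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.9, 0.2]))
```

The same routine would be applied per channel of the three-channel mask to obtain separate cough, sneeze, and clear-throat segments.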
[0709] Existing methods of dividing sound into components (vocals, drums, bass) have recently come into active use in music processing. A corresponding procedure is employed here for dividing sound containing a cough into components (the cough sound and the rest of the sound).
[0710] The SSS model is designed to separate the cough sound from other sounds. A sound containing cough sounds is fed to the model input. At the output there are two STFT spectra containing only the cough sound (
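A minimal sketch of this kind of spectral-mask separation is shown below. The mask here is supplied externally (in the pipeline it would come from the SSS model), and the STFT parameters are illustrative:

```python
import numpy as np
from scipy.signal import stft, istft

def separate(audio, mask, fs=16000, nperseg=512):
    """Split audio into (cough, residual) by soft-masking its STFT.
    `mask` holds per-bin probabilities that the energy belongs to the cough."""
    _, _, spec = stft(audio, fs=fs, nperseg=nperseg)
    _, cough = istft(spec * mask, fs=fs, nperseg=nperseg)         # cough spectrum
    _, rest = istft(spec * (1.0 - mask), fs=fs, nperseg=nperseg)  # remaining sound
    return cough, rest

# Round trip with an all-ones mask: all energy is attributed to the cough,
# so the cough channel reconstructs the input and the residual is silent.
t = np.arange(16000) / 16000
audio = np.sin(2 * np.pi * 440 * t)
_, _, spec = stft(audio, fs=16000, nperseg=512)
cough, rest = separate(audio, np.ones_like(spec, dtype=float))
```

In the actual model the mask would be soft and frequency-dependent, so energy in each time-frequency bin is split between the cough spectrum and the rest-of-sound spectrum before inverse transformation.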
[0712] During continuous operation, moments may occur when the detected cough does not belong to the user; for example, a person next to the user coughs. The SED model will detect this cough. However, before extracting the characteristics of the detected cough, the source of the cough needs to be identified. A trained model capable of identifying the user by the sound of the cough is therefore provided. Since voice authorization methods have long been used and studied, similar methods may be employed here, with the cough sound used instead of the voice.
[0714] For the conversion, the same parameters are used as for the SED model. Any popular architecture (MobileNet, ResNet, Inception, EfficientNet, etc.) can be used as the backbone 501 network. The Dense 502 layer is a vector that comes to contain a meta-representation of the cough sound during training. The Predict 505 output, which is used to train the model, contains the probability that the cough sound belongs to a certain user.
[0715] The model, trained to classify the user by cough sounds, produces at the inference stage the Embedding vector 503, which is a feature description of the cough. This vector is compared 123, by a closeness metric, with vectors from the database 122. This database contains user cough episodes transformed into vectors by the NSSI model. The decision on the user ID 124 is made by estimating the distance from the input vector to the vectors in the database. If the closest vectors in the database belong to the user, identification is considered passed and the sound is sent on for extraction of characteristics. If the nearest vectors do not belong to the user, the cough is deemed not to belong to the user.
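The nearest-neighbor decision can be sketched as follows. Cosine similarity, the threshold, and k are illustrative choices; the method fixes only "closeness between vectors", not a specific metric:

```python
import numpy as np

def identify_user(embedding, db_vectors, db_labels, threshold=0.8, k=3):
    """Decide whether a cough embedding belongs to the enrolled user by
    comparing it against database vectors with cosine similarity."""
    emb = embedding / np.linalg.norm(embedding)
    db = db_vectors / np.linalg.norm(db_vectors, axis=1, keepdims=True)
    sims = db @ emb                   # cosine similarity to each entry
    top = np.argsort(sims)[::-1][:k]  # nearest vectors first
    hits = [db_labels[i] for i in top if sims[i] >= threshold]
    # Identification passes only if the close neighbors all belong to the user.
    return len(hits) > 0 and all(lbl == "user" for lbl in hits)

# Toy 2-D embeddings: two enrolled user coughs and one from another person.
db = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = ["user", "user", "other"]
own_cough = identify_user(np.array([1.0, 0.05]), db, labels)
other_cough = identify_user(np.array([0.02, 1.0]), db, labels)
```

Real embeddings would have far more dimensions, but the decision rule is the same: pass identification when the nearest database vectors belong to the user, reject otherwise.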
[0716] If the cough has been identified, features are extracted from it for further analysis. The cough epoch may contain several episodes of cough
[0717] A clean cough is sent to the CCD model 125. The CCD model is a set of methods for constructing cough characteristics by evaluating its coughing episodes. Coughing episodes are separated by an operation comparing the amplitude to a threshold 126. If the amplitude value is higher than the threshold value, then the cough sound
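The episode-splitting step can be sketched as a short-time amplitude envelope compared against a threshold, where each rising edge of the above-threshold flag starts a new episode. Frame length and threshold values below are illustrative:

```python
import numpy as np

def count_episodes(audio, fs=16000, frame_ms=20, threshold=0.1):
    """Count coughing episodes by comparing a short-time RMS envelope
    against an amplitude threshold; each rising edge opens an episode."""
    frame = int(fs * frame_ms / 1000)
    n = len(audio) // frame
    # Per-frame RMS envelope of the (threshold-comparable) amplitude.
    rms = np.sqrt(np.mean(audio[:n * frame].reshape(n, frame) ** 2, axis=1))
    active = rms > threshold
    return int(np.sum(np.diff(active.astype(int), prepend=0) == 1))

# Synthetic cough epoch: two bursts of sound separated by silence.
t = np.arange(16000) / 16000
audio = np.zeros(16000)
audio[2000:4000] = 0.5 * np.sin(2 * np.pi * 300 * t[2000:4000])
audio[9000:11000] = 0.5 * np.sin(2 * np.pi * 300 * t[9000:11000])
episodes = count_episodes(audio)
```

As noted later in the description, this kind of thresholding fails when episodes sit too close together, since the envelope never drops below the threshold between them.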
[0718] Block 127 is a set of methods for assessing the length of coughing episodes and such characteristics as dry/wet cough. This block of operations is repeated iteratively for each episode of cough.
[0719] The architecture of the model determining the dry/wet characteristic is presented at
[0720] Cough characterization methods may not work correctly if the cough episodes are located too close to each other for method 126 to separate them.
[0721] The report 128 can take any convenient format for presenting all collected characteristics, and can be sent to a professional or used for aggregation over a period of time.
[0722] Thus, the illustrative embodiments provide mechanisms for enabling continuous audio monitoring and sound discrimination on a smartphone or smart speaker, along with further actions dependent on detection of trigger sounds, such as a cough or sneeze. This functionality is not limited to cough and sneeze detection, and may advantageously include other or alternate functions consistent with the discussion herein.
[0723] It should be appreciated that the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In one example embodiment, the mechanisms of the illustrative embodiments are implemented in software or program code, which includes but is not limited to firmware, resident software, microcode, etc.
[0724] A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a communication bus, such as a system bus, for example. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. The memory may be of various types including, but not limited to, ROM, PROM, EPROM, EEPROM, DRAM, SRAM, Flash memory, solid state memory, and the like.
[0725] The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.