Automatic change in condition monitoring by passive sensor monitoring and machine learning
11599830 · 2023-03-07
Inventors
- Geoffrey Nudd (San Francisco, CA, US)
- David Cristman (San Francisco, CA, US)
- Jonathan J. Hull (San Carlos, CA)
CPC classification
- G06F18/214 (PHYSICS)
- G08B21/0423 (PHYSICS)
- G08B21/0469 (PHYSICS)
- G16H50/70 (PHYSICS)
- G08B23/00 (PHYSICS)
- G16H50/20 (PHYSICS)
- G16H15/00 (PHYSICS)
International classification
Abstract
A machine learning system passively monitors sensor data from the living space of a patient. The sensor data may include audio data. Audio features are generated. A trained machine learning model is used to detect a change in condition. In some implementations, the machine learning model is trained in a learning phase based on training data that includes questionnaires completed by caregivers and identified audio features.
Claims
1. A method comprising: monitoring an output of a passive sensor monitor of a living space of a patient; providing the monitored sensor output to a machine learning system having a machine learning model including one or more classifiers trained to identify changes in at least one health condition of the patient based at least in part on the monitored sensor output including an audio output, wherein the one or more classifiers are trained by a questionnaire, answered by a caregiver of the patient, having a set of condition questions associated with one or more audio features of the audio output of the monitored sensor output; and detecting, by the machine learning system, a change in at least one health condition of the patient based at least in part on an audio feature classification of the audio output.
2. The method of claim 1, further comprising generating an alert in response to detecting the change in at least one health condition of the patient.
3. The method of claim 1, further comprising generating at least one follow-up action in response to detecting the change in at least one health condition of the patient.
4. A method comprising: in a training phase, monitoring an output of a passive audio monitor of a living space of a patient; generating labels specific to the types of sounds in a patient's living area by having a caregiver answer a health questionnaire to associate particular types of sounds in the patient's living area with specific health condition events; training, based at least in part on the generated labels, a machine learning classification model to generate a trained machine model, the trained machine learning model automatically determining answers to a health condition questionnaire based on audio feature vectors in the patient's living area; in a deployment phase, monitoring an output of a passive audio monitor of a living space of a patient; generating audio feature vectors from the monitored output, classifying the audio feature vectors, and generating an input for the trained machine learning model of a machine learning system to identify changes in at least one health condition of the patient; and detecting, by the machine learning system, a change in at least one health condition of the patient.
5. The method of claim 4, wherein the trained machine learning model includes at least one classifier trained to identify patient condition questions.
6. The method of claim 4, further comprising generating an alert in response to detecting the change in at least one health condition of the patient.
7. The method of claim 4, further comprising generating at least one follow-up action in response to detecting the change in at least one health condition of the patient.
8. The method of claim 4, wherein the trained machine learning model is trained, based at least in part on a medical history of the patient, to identify changes in at least one health condition of the patient.
9. A method, comprising: monitoring a passive audio monitor of a living space of a patient; providing a patient condition change questionnaire to a caregiver to enter annotations for particular sound-related events in the living space of the patient indicative of patient condition change and receiving responses; generating training data from the patient condition change questionnaire and audio feature data generated from monitored audio data; and training a machine learning model to detect a change in at least one health condition of the patient from audio feature data generated from monitored audio data.
10. The method of claim 9, further comprising customizing at least one question of the patient condition change questionnaire based on a medical history of the patient.
11. A method, comprising: training a machine learning model to automatically identify a change in condition of a patient based at least in part on monitored audio data from a living area of patient and information provided by a caregiver, wherein the training comprises using training data generated from monitored audio data and responses of a caregiver to a questionnaire; and using the trained machine learning model to detect a change in condition of the patient based on monitored audio data from the living area of the patient.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
DETAILED DESCRIPTION
(15) The output of the passive sensor monitor 110 is provided to a machine learning system 115. The machine learning system 115 does not have to be located at the home of the patient; it could, for example, be provided as a network-based service or a cloud-based service. A sensor feature generation engine 120 generates sensor feature vectors from the received sensor data. This may include, for example, extracting sensor features that are relevant to monitoring the health of the patient. For example, background noises (e.g., nearby construction noise) may be filtered out. A training data engine 125 may be provided to generate training data to train a machine learning model to perform condition change detection 130. In some implementations, the machine learning model is trained based on training data that associates sensor feature vectors with questions of a health questionnaire. The trained machine learning model is used to detect a change in condition. When a change in condition is detected, a reporting module 135 may generate a report. The report may, for example, include a notification, an alert, or a recommended follow-up action. The machine learning system may, for example, include one or more processors and memory to implement components as computer executable instructions. As one example, the machine learning system may be implemented on a server, as illustrated in
(18) In some implementations, the passive sensor monitor is an audio monitor.
(19) A passive audio monitor 402 (e.g., Amazon Alexa®) is deployed in a senior citizen's living area to provide data that is input to an audio feature analysis and annotation unit 410. In some implementations, a caregiver in charge of caring for a senior enters annotations for particular sound-related events. The annotations may, for example, be entered by voice command, but more generally could be entered in other ways. The annotations are used to generate labels to aid audio classification unit 420 in generating classifications for audio features. For example, in a training phase, a caregiver may enter an annotation when a senior is washing their hands in the bathroom. This generates a label that is specific to the types of sounds in the senior's living area for specific events.
(20) In some implementations the audio features (e.g., cepstral coefficients) are analyzed in comparison with short audio clips, classifications for those clips (e.g., running water), and events associated with the clips (e.g., “washing hands in the bathroom”). Audio classification module 420 is given short audio clips and matches audio feature vectors to examples in the database. Augmented questionnaires are input to a machine learning model training module 430 that associates results produced by audio classification with answers to questions provided by a caregiver. The machine learning model 440 (after training is complete) receives input from the audio classification routine and automatically completes a questionnaire without human intervention.
(21) As an illustrative example, the augmented questionnaires may include questions such as “Did the senior slip or fall while you were there?” as well as sounds associated with that event (e.g., falling sounds). The machine learning model training module 430 learns audio features that correspond to question/answer pairs. For example, falling sounds are associated with the YES answers to “Did the senior slip or fall?”
(22) Audio Feature Analysis
(25) The speech recognition 608 path applies speech recognition to the raw audio data to produce a sequence of text. Annotations are detected by first locating trigger phrases in the text (e.g., the phrase “audio event” is a trigger phrase). If a trigger phrase is found, the text following the trigger phrase is matched to the scripts shown in Table 1. If one or more of the words in each group of words in the left column are found, the corresponding audio event (ae) is asserted to have occurred. The merging 614 routine combines the audio feature information (af.sub.i, f.sub.i, s.sub.i, l.sub.j, t.sub.i−m,i+m) with the information derived from speech recognition (ae.sub.i, a.sub.i, l.sub.j, t.sub.i−p,i+p) and saves both in the database.
(26) For example, suppose the user says “computer, audio event, senior Martha is washing her hands.” Detection of the trigger phrase “audio event” causes the annotation detection routine to match “senior Martha is washing her hands” to a script, such as the illustrative script shown in Table 1. In the example of Table 1, the first row would match the audio event “bathroom 1”, which would be identified from time t.sub.i−p to time t.sub.i+p. If an annotation is not spoken, the speech act is processed as normal. For example, if the user says “computer, what was the score in the Giants game last night?” the system would look up the answer to that question and speak it to the user.
(27) Table 1 shows some of the scripts used for annotation. Each of the words indicated must be present in the text that occurs after the trigger phrase for the event to be associated with the clip. The merging 614 routine tags the feature vectors whose windows overlap with the time interval t.sub.i−p to time t.sub.i+p with the label assigned by the user (e.g., bathroom event 1).
(28) TABLE 1. Example scripts for annotating audio events. Each entry lists the word matches followed by the audio event (ae).
- [wash, washing], [hands, feet, face]: bathroom 1
- [washing, doing], [laundry]: laundry 1
- [going, making], [poddy, kaka]: bathroom 2
- [doing, cleaning], [dishes]: kitchen 1
- [running, starting, using], [dishwasher, range]: kitchen 2
- [sleeping, napping], [bed, couch, recliner]: sleeping 1
- [watching, enjoying], [TV, television]: TV 1
- [talking, chatting], [PROPER NAME]: conversation 1
- [talking, calling], [phone, cell, telephone]: phone call 1
- [phone, cell, telephone], [ringing, buzzing]: phone call 2
- [PROPER NAME], [cooking, baking]: kitchen 3
- [PROPER NAME], [shaving, brushing teeth]: bathroom 3
- [PROPER NAME], [snoring, sleeping, resting]: sleeping 2
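The trigger-phrase detection and script matching described above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name, the subset of Table 1 scripts, and the matching rule (each bracketed word group must contribute at least one word from the recognized text) are assumptions based on the description.

```python
# Hypothetical sketch of annotation detection: find the trigger phrase,
# then match the words that follow it against Table 1-style scripts.

# Each script is a list of word groups plus the audio event it asserts.
SCRIPTS = [
    ([{"wash", "washing"}, {"hands", "feet", "face"}], "bathroom 1"),
    ([{"washing", "doing"}, {"laundry"}], "laundry 1"),
    ([{"doing", "cleaning"}, {"dishes"}], "kitchen 1"),
]

TRIGGER = "audio event"

def detect_annotation(text: str):
    """Return the audio event label if the recognized text contains the
    trigger phrase followed by words matching a script, else None."""
    text = text.lower()
    idx = text.find(TRIGGER)
    if idx < 0:
        return None  # no trigger phrase: process as a normal speech act
    words = set(text[idx + len(TRIGGER):].replace(",", " ").split())
    for groups, event in SCRIPTS:
        # every group of words must contribute at least one match
        if all(group & words for group in groups):
            return event
    return None

print(detect_annotation("computer, audio event, senior Martha is washing her hands"))
# "washing" matches the first group and "hands" the second: "bathroom 1"
```

A matched event would then be tagged onto the feature vectors whose windows overlap the annotation's time interval, as described for the merging 614 routine.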
(29) In one implementation, the feature vector generation 604 process is applied to 25 ms audio frames that are extracted from the input audio every 10 ms (overlapping). The Discrete Fourier Transform (DFT) of the frame is then computed as follows:
(30) S.sub.i(k)=Σ.sub.n=1.sup.N s.sub.i(n)h(n)e.sup.−j2πkn/N, 1≤k≤K
where h(n) is an N sample long analysis window and K is the length of the DFT.
(31) The periodogram estimate of the power spectrum is then computed:
(32) P.sub.i(k)=(1/N)|S.sub.i(k)|.sup.2
(33) The absolute value of the complex Fourier transform is taken and the result is squared. A 512 point fast Fourier transform (FFT) is performed and the first 257 coefficients are kept.
(34) The energy in 26 Mel-spaced filterbanks is calculated by multiplying 26 triangular filters, each of which is tuned for a particular frequency range, with the periodogram power spectral estimate. This gives 26 numbers that represent the amount of energy present in each filterbank. The discrete cosine transform (DCT) of the log of the 26 energy values is taken and the first 12 coefficients retained. These are the MFCC coefficients. A feature vector is generated by concatenating the MFCC coefficients for the frames in an audio clip. For example, a five second (5000 ms) audio clip would produce a feature vector with 500*12=6000 features, assuming the standard 10 ms overlap.
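The MFCC pipeline described above (25 ms frames extracted every 10 ms, 512-point FFT keeping 257 coefficients, 26 mel-spaced triangular filterbanks, DCT of the log energies, first 12 coefficients retained) can be sketched roughly as follows. This is a simplified NumPy illustration, not the patented implementation; the function name, the Hamming window, and the exact filterbank and normalization choices are assumptions.

```python
import numpy as np

def mfcc_frames(signal, sr=16000, frame_ms=25, hop_ms=10,
                n_filters=26, n_ceps=12, n_fft=512):
    """Per-frame MFCCs: 25 ms frames every 10 ms, 512-point FFT (257
    bins kept), 26 mel filterbanks, DCT of log energies, 12 coefficients."""
    flen = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    window = np.hamming(flen)  # analysis window h(n); an assumption
    # Build 26 triangular mel-spaced filters over 0..sr/2.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    ceps = []
    for start in range(0, len(signal) - flen + 1, hop):
        frame = signal[start:start + flen] * window
        spec = np.abs(np.fft.rfft(frame, n_fft))   # first 257 coefficients
        power = (spec ** 2) / flen                 # periodogram estimate
        energies = np.log(fbank @ power + 1e-10)   # 26 filterbank energies
        # DCT-II of the log energies; keep the first 12 coefficients.
        n = np.arange(n_filters)
        dct = np.array([np.sum(energies * np.cos(np.pi * k * (2 * n + 1)
                        / (2 * n_filters))) for k in range(n_ceps)])
        ceps.append(dct)
    return np.array(ceps)

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
feats = mfcc_frames(sig)
print(feats.shape)  # (98, 12): 98 full frames of 12 MFCCs from a 1 s clip
```

Concatenating the rows of the returned array (e.g., `feats.reshape(-1)`) yields the clip-level feature vector described above.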
(35) Feature detection also includes calculation of the spectrogram s.sub.i for each 25 ms audio frame. The spectrogram is comprised of multiple FFTs, one for each audio frame. The X axis is time and the Y axis is the magnitude of the frequency components.
(36) s.sub.i(m)=|Σ.sub.n=0.sup.k−1 x.sub.i(n)e.sup.−j2πmn/k|, 0≤m<k
where k is the window size (number of samples) over which the FFT is calculated. We have found that 1024 samples from a 25 ms audio signal sampled at 44100 Hz provide the best performance.
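A per-frame spectrogram along these lines, with one FFT magnitude column per 25 ms frame computed over the reported 1024 samples at a 44100 Hz sampling rate, might be sketched as follows. The hop size and function name are illustrative assumptions.

```python
import numpy as np

def frame_spectrogram(signal, sr=44100, frame_ms=25, hop_ms=10, k=1024):
    """One FFT magnitude column per 25 ms frame, using the first k=1024
    samples of each frame, per the parameters reported in the text."""
    flen = int(sr * frame_ms / 1000)  # 1102 samples at 44.1 kHz
    hop = int(sr * hop_ms / 1000)
    cols = []
    for start in range(0, len(signal) - flen + 1, hop):
        frame = signal[start:start + flen][:k]
        cols.append(np.abs(np.fft.rfft(frame)))  # frequency magnitudes
    # Rows are frequency bins (Y axis); columns are time (X axis).
    return np.array(cols).T

sig = np.random.default_rng(0).standard_normal(44100)  # 1 s of noise
s = frame_spectrogram(sig)
print(s.shape)  # (513, 98): 513 frequency bins by 98 frames
```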
(39) If not previously performed, the feature generation 805 process described above is applied to each of the background noise clips and known sound clips. The Euclidean distance 810 between the MFCC coefficients is computed for each frame in f.sub.i and the MFCC coefficients for each frame in the background noise and known sound clips. The clip with the frame that most closely matches the frame in f.sub.i votes for that class (background or known sound). This operation is performed for all the frames in f.sub.i and the votes for each class are accumulated. If known sound clips receive the majority vote 815, the algorithm decides that an audio event has been detected. For example, a 5 second audio clip would yield 500 votes, assuming a standard 10 ms overlap value. If the known sound category received more than 250 votes, we decide that an event was detected. Alternatives to the Euclidean distance classifier include support vector machines or deep learning.
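The nearest-frame voting scheme described above might be sketched as follows, using synthetic MFCC-like vectors in place of real audio features; the function name and data are illustrative assumptions.

```python
import numpy as np

def classify_clip(clip_frames, background_frames, known_frames):
    """Majority vote over frames: True if a known-sound audio event is
    detected. Each clip frame votes for the class of its nearest
    reference frame by Euclidean distance over MFCC vectors."""
    refs = np.vstack([background_frames, known_frames])
    labels = np.array([0] * len(background_frames) + [1] * len(known_frames))
    votes = 0
    for f in clip_frames:
        d = np.linalg.norm(refs - f, axis=1)  # distance to every reference
        votes += labels[np.argmin(d)]         # nearest frame casts the vote
    return votes > len(clip_frames) / 2       # e.g., >250 of 500 votes

rng = np.random.default_rng(1)
background = rng.normal(0.0, 0.1, (50, 12))  # synthetic reference frames
known = rng.normal(5.0, 0.1, (50, 12))
clip = rng.normal(5.0, 0.1, (500, 12))       # clip resembling a known sound
print(classify_clip(clip, background, known))  # True: event detected
```

Swapping the nearest-neighbor vote for a support vector machine or a deep network, as the text suggests, would change only the per-frame classifier.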
(40) The merging 614 process in
(41) The clustering and class assignment 618 routine in
(42) Machine Learning Training and Model Application
(43) Augmented Questionnaire
(44) Table 2 shows an example of an augmented questionnaire, although more generally the questionnaire may include a different selection of questions and associated audio features. The questionnaire is augmented in that it has additional fields not found in a conventional health questionnaire, such as fields for audio features. In some implementations, it includes a set of passive keywords. In one implementation, each question can be answered either Yes or No. In some implementations, there are primary and follow-up questions. When a positive response is received to a primary question, the corresponding follow-up questions are asked to clarify the report. “Passive” keywords are also shown that can indicate the corresponding question when the speech-to-text system detects them in the audio stream. Features that can exist in the audio stream are also listed that can corroborate the existence of the condition indicated by the question. For example, when the question “Does the client show any signs of pain?” is asked, the system expects to hear pain sounds in the audio stream.
(45) TABLE 2. Augmented questionnaire for change in condition monitoring. Each entry lists whether the question is primary (P) or follow-up (F), the question, its passive keywords, and the corroborating audio features.
- P: Does the client seem different than usual? (keywords: different; audio features: all features changed)
- F: Does the client show any reduced talking or alertness? (keywords: talkative, talking; audio features: speech utterances by client)
- F: Is the client newly agitated, angry, confused, or sleepy? (keywords: agitated, confused; audio features: speech agitation of client, confused speech utterances by client, sleeping sounds, sleeping location)
- F: Does the client show any signs of pain? (keywords: pain; audio features: pain sounds)
- P: Has there been a change in mobility? (keywords: moving, mobile; audio features: walking sounds by client)
- F: Has there been a change in the ability to stand or walk? (keywords: standing, walking; audio features: walking sounds by client)
- F: Has there been an observed or unobserved fall or slip? (keywords: fall, slip; audio features: falling sounds)
- P: Has there been a change in eating or drinking? (keywords: eating, drinking; audio features: kitchen sounds)
- P: Has there been a change in toileting? (keywords: toileting; audio features: bathroom sounds)
- F: Has there been any discomfort, smell, or change in frequency associated with urination? (keywords: urination; audio features: bathroom sounds)
- F: Has the client had diarrhea or constipation? (keywords: diarrhea, constipation; audio features: bathroom sounds)
- P: Has there been a change in skin condition or increase in swelling? (keywords: skin, swelling; audio features: pain sounds)
- F: Have there been any new skin rashes or wounds? (keywords: rash, wound; audio features: pain sounds)
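Automatic completion of an augmented questionnaire from classified audio features could be sketched as follows. The question text and feature names follow Table 2, but the simple set-intersection matching rule and all identifiers are illustrative assumptions, not the patented method.

```python
# Hypothetical sketch: each question carries the audio features that
# corroborate a YES answer; detected features drive the answers.

QUESTIONNAIRE = [
    ("Does the client show any signs of pain?", {"pain sounds"}),
    ("Has there been a change in mobility?", {"walking sounds by client"}),
    ("Has there been a change in toileting?", {"bathroom sounds"}),
]

def complete_questionnaire(detected_features):
    """Answer YES to each question whose associated audio features were
    detected in the monitored stream, NO otherwise."""
    return {q: ("YES" if feats & detected_features else "NO")
            for q, feats in QUESTIONNAIRE}

answers = complete_questionnaire({"bathroom sounds", "pain sounds"})
print(answers["Does the client show any signs of pain?"])  # YES
print(answers["Has there been a change in mobility?"])     # NO
```

In a full system, the detected-feature set would come from the trained audio classification module, and combinations of YES answers could then be weighed to decide whether an alert or follow-up action is warranted.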
(46) The questionnaire thus may include a number of different questions regarding individual changes in condition. However, it will also be understood that the interpretation of a combination of answers to individual questions may be important in determining the likely significance of a change in condition and the urgency of generating a notification or alert. For example, a combination of changes may correspond to an overall change in condition for which it may be prudent to generate an alert or recommend a follow-up action, such as contacting a nurse to check in on the patient, contacting the patient's doctor, etc.
(49) An example of applying the technique described in
(50) Additional Sensing Modalities
(52) Questionnaire Customization
(53) The questionnaires administered to patients, first manually by caregivers and then automatically by the proposed system, can be customized based on the predicted disease trajectory of the senior.
(54) More generally the customization of the questionnaire could be for other medical conditions besides diabetes.
(55) The collection of questionnaires q.sub.1 . . . q.sub.N shown in
(56) In the above description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it should be understood that the technology described herein can be practiced without these specific details. Further, various systems, devices, and structures are shown in block diagram form in order to avoid obscuring the description. For instance, various implementations are described as having particular hardware, software, and user interfaces. However, the present disclosure applies to any type of computing device that can receive data and commands, and to any peripheral devices providing services.
(57) In some instances, various implementations may be presented herein in terms of algorithms and symbolic representations of operations on data bits within a computer memory. An algorithm is here, and generally, conceived to be a self-consistent set of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
(58) To ease description, some elements of the system and/or the methods are referred to using the labels first, second, third, etc. These labels are intended to help to distinguish the elements but do not necessarily imply any particular order or ranking unless indicated otherwise.
(59) It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout this disclosure, discussions utilizing terms including “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
(60) Various implementations described herein may relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, including, but not limited to, any type of disk including floppy disks, optical disks, CD ROMs, and magnetic disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, flash memories including USB keys with non-volatile memory, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
(61) The technology described herein can take the form of an entirely hardware implementation, an entirely software implementation, or implementations containing both hardware and software elements. For instance, the technology may be implemented in software, which includes but is not limited to firmware, resident software, microcode, etc. Furthermore, the technology can take the form of a computer program object accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any non-transitory storage apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
(62) A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
(63) Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems, storage devices, remote printers, etc., through intervening private and/or public networks. Wireless (e.g., Wi-Fi™) transceivers, Ethernet adapters, and modems are just a few examples of network adapters. The private and public networks may have any number of configurations and/or topologies. Data may be transmitted between these devices via the networks using a variety of different communication protocols including, for example, various Internet layer, transport layer, or application layer protocols. For example, data may be transmitted via the networks using transmission control protocol/Internet protocol (TCP/IP), user datagram protocol (UDP), transmission control protocol (TCP), hypertext transfer protocol (HTTP), secure hypertext transfer protocol (HTTPS), dynamic adaptive streaming over HTTP (DASH), real-time streaming protocol (RTSP), real-time transport protocol (RTP) and the real-time transport control protocol (RTCP), voice over Internet protocol (VOIP), file transfer protocol (FTP), WebSocket (WS), wireless access protocol (WAP), various messaging protocols (SMS, MMS, XMS, IMAP, SMTP, POP, WebDAV, etc.), or other known protocols.
(64) Finally, the structure, algorithms, and/or interfaces presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method blocks. The required structure for a variety of these systems will appear from the description above. In addition, the specification is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the specification as described herein.
(65) The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the specification to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. As will be understood by those familiar with the art, the specification may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the specification or its features may have different names, divisions and/or formats.
(66) Furthermore, the modules, routines, features, attributes, methodologies and other aspects of the disclosure can be implemented as software, hardware, firmware, or any combination of the foregoing. Also, wherever a component, an example of which is a module, of the specification is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future. Additionally, the disclosure is in no way limited to implementation in any specific programming language, or for any specific operating system or environment.