MONITORING SYSTEM FOR MONITORING A PATIENT AND METHOD FOR OPERATING THE MONITORING SYSTEM
20210186339 · 2021-06-24
Inventors
Cpc classification
G16H20/70
PHYSICS
A61M21/00
HUMAN NECESSITIES
A61B5/165
HUMAN NECESSITIES
G16H80/00
PHYSICS
G16H50/30
PHYSICS
A61M2230/005
HUMAN NECESSITIES
G16H40/20
PHYSICS
A61M21/02
HUMAN NECESSITIES
International classification
A61B5/0205
HUMAN NECESSITIES
A61B5/00
HUMAN NECESSITIES
A61B5/16
HUMAN NECESSITIES
A61M21/00
HUMAN NECESSITIES
G16H20/70
PHYSICS
G16H50/30
PHYSICS
Abstract
For particularly good communication even with difficult patients, a monitoring system for monitoring a patient, such as during a medical diagnostic or therapeutic procedure, is provided. The monitoring system includes a voice cloning device having a voice generator. The voice cloning device is configured to replace a natural voice of a person with a cloned synthetic voice different from the voice of the person. At least two synthetic voices may be selected. The monitoring system also includes a measuring unit configured to record at least one physiological parameter of the patient, an evaluation unit configured to evaluate the at least one measured physiological parameter of the patient, and an actuation unit configured to actuate the voice cloning device such that a synthetic voice is selected or rejected depending on a result of the evaluation by the evaluation unit.
Claims
1. A monitoring system for monitoring a patient, the monitoring system comprising: a voice cloning device including a voice generator, the voice cloning device being configured to: replace a natural voice of a person with a cloned synthetic voice different from the natural voice of the person; and output the cloned synthetic voice, wherein at least two synthetic voices are selectable; a measuring unit configured to record at least one physiological parameter of the patient; an evaluation unit configured to evaluate the at least one measured physiological parameter of the patient; and an actuation unit configured to actuate the voice cloning device such that a synthetic voice of the at least two synthetic voices is selected or rejected depending on a result of the evaluation by the evaluation unit.
2. A monitoring system for monitoring a patient, the monitoring system comprising: a deepfake device comprising: an image generator that is configured to replace a natural visual appearance of a person or a natural environment with a synthetic visual appearance or synthetic environment that is different than the natural visual appearance of the person or the natural environment, wherein at least two synthetic visual appearances or two synthetic environments are selectable; and a visualization device for visualizing the synthetic visual appearance of the person or the surrounding synthetic environment; a measuring unit configured to record at least one physiological parameter of the patient; an evaluation unit that is configured to evaluate the at least one measured physiological parameter of the patient; and an actuation unit configured to actuate the deepfake device such that a synthetic visual appearance or synthetic background is selected or rejected depending on a result of the evaluation by the evaluation unit.
3. The monitoring system of claim 2, wherein the evaluation unit is further configured to compare the at least one measured physiological parameter of the patient with at least one threshold value.
4. The monitoring system of claim 3, wherein the actuation unit is further configured to actuate the voice cloning device such that a synthetic voice that does not cause the threshold value to be exceeded is selected or a synthetic voice that causes the threshold value to be exceeded is rejected.
5. The monitoring system of claim 1, wherein the evaluation unit is further configured to compare the at least one measured physiological parameter of the patient with at least one threshold value.
6. The monitoring system of claim 1, wherein the cloned synthetic voice has been generated using a pre-trained algorithm for machine learning.
7. The monitoring system of claim 6, wherein pre-training of the algorithm was carried out by speech samples of persons known to the patient.
8. The monitoring system of claim 5, wherein the actuation unit is further configured to actuate the voice cloning device such that a synthetic voice that does not cause the threshold value to be exceeded is selected or a synthetic voice that causes the threshold value to be exceeded is rejected.
9. The monitoring system of claim 1, wherein the measuring unit is further configured to measure a physiological parameter that is usable as an indicator for a state of mind of the patient, and wherein the evaluation unit is further configured to evaluate the physiological parameter with respect to the state of mind of the patient.
10. The monitoring system of claim 1, wherein the at least one physiological parameter is constituted by blood pressure, pulse, or EEG.
11. The monitoring system of claim 1, wherein the actuation unit is further configured to actuate the voice cloning device such that a synthetic voice that produces a positive state of mind is selected or a synthetic voice that produces a negative state of mind is rejected.
12. A method for operating a monitoring system, the monitoring system comprising a deepfake device, the deepfake device comprising an image generator that is configured to replace a natural visual appearance of a person or a natural environment with a synthetic visual appearance or synthetic environment that is different than the natural visual appearance of the person or the natural environment, wherein at least two synthetic visual appearances or two synthetic environments are selectable, the deepfake device further comprising a visualization device for visualizing the synthetic visual appearance of the person or the surrounding synthetic environment, a measuring unit configured to record at least one physiological parameter of the patient, an evaluation unit that is configured to evaluate the at least one measured physiological parameter of the patient, and an actuation unit configured to actuate the deepfake device such that a synthetic visual appearance or synthetic background is selected or rejected depending on a result of the evaluation by the evaluation unit, the method comprising: activating the deepfake device, wherein a first synthetic visual appearance or a first synthetic environment is set; recording the at least one physiological parameter of the patient during operation of the deepfake device using the first synthetic visual appearance or the first synthetic environment; evaluating the at least one physiological parameter of the patient; and automatically actuating the deepfake device depending on a result of the evaluating.
13. The method of claim 12, wherein, depending on the result of the evaluating by the evaluation unit, the first synthetic visual appearance or the first synthetic environment is selected or rejected by the evaluation unit for further operation of the deepfake device.
14. The method of claim 12, wherein evaluating the at least one physiological parameter of the patient comprises comparing the at least one physiological parameter with at least one threshold value.
15. The method of claim 12, wherein evaluating the at least one physiological parameter comprises evaluating the at least one physiological parameter with respect to a state of mind of the patient.
16. The method of claim 14, wherein actuating the deepfake device takes place such that the first synthetic visual appearance or the first synthetic environment is selected when the evaluating of the at least one physiological parameter indicates that a threshold value for the at least one physiological parameter is not exceeded, and that the first synthetic visual appearance or the first synthetic environment is rejected when the evaluating of the at least one physiological parameter indicates that a threshold value for the at least one physiological parameter is exceeded.
17. The method of claim 15, wherein the actuating takes place such that the first synthetic visual appearance or the first synthetic environment is selected when the first synthetic visual appearance or the first synthetic environment produces a positive state of mind in the patient, and that the first synthetic visual appearance or the first synthetic environment is rejected when the first synthetic visual appearance or the first synthetic environment produces a negative state of mind.
18. The method of claim 12, further comprising: measuring and evaluating at least one physiological parameter for at least one second synthetic visual appearance or one second synthetic environment; and depending on the evaluating of the at least one physiological parameter for the at least one second synthetic visual appearance or the one second synthetic environment, selecting a synthetic voice or synthetic visual appearance or synthetic environment from selection of a first synthetic voice and at least one second synthetic voice for further operation of a voice cloning device.
19. The method of claim 12, further comprising: measuring and evaluating at least one physiological parameter for at least one second synthetic visual appearance or one second synthetic environment in each case; and depending on the evaluating of the at least one physiological parameter for the at least one second synthetic visual appearance or one second synthetic environment, selecting a synthetic visual appearance or synthetic environment from the selection of the first synthetic visual appearance and the at least second synthetic visual appearance or the first synthetic environment and the at least second synthetic environment for further operation of the deepfake device.
20. The method of claim 15, wherein the synthetic visual appearance or the synthetic environment that produces a most positive state of mind in the patient is selected.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0020]
[0021]
[0022]
DETAILED DESCRIPTION
[0023] Particularly in situations such as medical examinations or procedures or operations without general anesthesia, the purpose of the present embodiments is to improve communications, especially with elderly, confused, or dementia patients and also with children, using voice cloning systems or deepfake devices via which the physicians may communicate with the patients using the voices of reference persons (e.g., synthetic voices) or the faces, facial expressions, or gestures of reference persons (e.g., synthetic visual appearance) in order to achieve better cooperation and/or to be able to produce a calming effect.
[0024]
[0025] In a second act 2, at least one physiological parameter of the patient is recorded or measured during operation of the voice cloning device using the first synthetic voice. This is performed by a measuring unit 13 assigned to the monitoring system 15. The measuring unit 13 has, for example, one or more sensors, cameras, or other transducers or is constituted by one of these. The at least one physiological parameter may be used, for example, as an indicator of the patient's state of mind. The at least one physiological parameter may be constituted, for example, by blood pressure, pulse, or EEG, where, for example, pulse sensors, EEG electrodes, or blood pressure sensors may be used. It is also possible to measure any other physiological parameters that may indicate the state of mind or the condition of the patient P. The physiological parameter(s) may be measured once or over a previously set period of time. In a third act 3, the at least one physiological parameter of the patient is evaluated by an evaluation unit 14 assigned to the monitoring system 15. This evaluation may simply involve, for example, a comparison with one or more threshold values or a more complex analysis regarding a positive or negative state of mind of the patient.
[0026] After the evaluation, in a fourth act 4, the voice cloning device 10 is automatically actuated by an actuation unit 12 assigned to the monitoring system 15 as a function of the result of the evaluation. For example, the first synthetic voice F1 is retained or selected if the evaluation shows a positive effect of the first synthetic voice F1 on the patient P (e.g., if the patient's state of mind is or becomes positive, if the patient is or becomes calmer or cooperates better than before). In the simplest case, the first synthetic voice F1 is retained or selected, provided that the parameter does not exceed a threshold value (e.g., pulse, blood pressure, etc.). The first synthetic voice F1 is rejected or terminated if the evaluation shows a negative effect of the first synthetic voice F1 on the patient P (e.g., if the patient's state of mind is or becomes negative, if the patient is or becomes agitated, anxious, or disturbed, if the patient shows symptoms of stress, or if the patient's willingness to cooperate decreases). In the simplest case, the first synthetic voice F1 is rejected or terminated if the evaluation indicates that the threshold value(s) has been exceeded.
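By way of illustration only, the simplest case described above (comparison with threshold values, followed by selecting or rejecting the current synthetic voice) may be sketched as follows. The names `Decision` and `evaluate_and_actuate`, the use of a mean over the measurement period, and the numeric limits in the usage note are assumptions made for this sketch; they are not taken from the embodiments and are not clinical values.

```python
from enum import Enum
from statistics import mean
from typing import Mapping, Sequence


class Decision(Enum):
    SELECT = "select"  # retain the current synthetic voice
    REJECT = "reject"  # terminate the current synthetic voice


def evaluate_and_actuate(
    measurements: Mapping[str, Sequence[float]],
    upper_limits: Mapping[str, float],
) -> Decision:
    """Reject the voice if the mean of any measured parameter exceeds its limit.

    A single measurement is simply a sequence of length one. Parameters
    without a configured limit, or without samples, are ignored.
    """
    for name, samples in measurements.items():
        limit = upper_limits.get(name)
        if limit is None or not samples:
            continue
        if mean(samples) > limit:
            return Decision.REJECT
    return Decision.SELECT
```

For example, `evaluate_and_actuate({"pulse": [72, 75, 74]}, {"pulse": 100.0})` yields `Decision.SELECT`, whereas a pulse window averaging above the assumed limit yields `Decision.REJECT`. A more complex evaluation of the state of mind would replace the mean-versus-limit comparison.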
[0027]
[0028] In an eighth act 8, depending on the evaluation, the synthetic voice with the most positive effect on the patient P is then automatically selected from the two synthetic voices. More than two synthetic voices may be used (e.g., three or four synthetic voices).
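The automatic selection among two or more trialled synthetic voices may, for example, be sketched as below. The sketch assumes, purely for illustration, that a lower mean pulse during a trial stands in for a "more positive effect"; the embodiments leave the actual criterion open. The function name `select_most_calming` and the voice labels are hypothetical.

```python
from statistics import mean
from typing import Mapping, Sequence


def select_most_calming(trials: Mapping[str, Sequence[float]]) -> str:
    """Return the label of the voice whose trial shows the lowest mean pulse.

    Assumption: the same parameter (here, pulse) was measured for every
    voice so that the trials are comparable.
    """
    if not trials:
        raise ValueError("no synthetic voices were trialled")
    if any(len(samples) == 0 for samples in trials.values()):
        raise ValueError("every trialled voice needs at least one sample")
    return min(trials, key=lambda voice: mean(trials[voice]))


# Illustrative pulse readings for two voices; the values are made up.
chosen = select_most_calming({"F1": [88, 92, 90], "F2": [76, 80, 78]})
```

Because the function accepts any number of entries, the same sketch covers three or four synthetic voices without modification.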
[0029] In a preliminary process, the synthetic voices may be cloned by machine learning. This may be carried out, for example, using audio recordings of the voices to be cloned. It is possible to clone the voices of living persons or also the voices of already deceased persons; for this purpose, merely audio recordings (e.g., film, sound recordings, answering machines, etc.) of sufficient length are necessary. Prior to using the actual method, the first synthetic voice was cloned in a ninth act 9.1, and the second synthetic voice was cloned in a tenth act 9.2 in
[0030] The general method may be used in a medical environment (e.g., for examinations, procedures, and operations without general anesthesia), but may also be used in other environments, such as nursing homes. The method may be performed during the described situations or in advance to prepare for the corresponding situation. The synthetic voice selected there may then be used by the monitoring system 15 in the medical environment.
[0031] Provision may also be made for selecting different synthetic voices for different situations. For example, for this purpose, results (e.g., objective results; that the patient follows instructions) may also be included in the evaluations and may be taken into account for selecting or rejecting a synthetic voice. In this context, individual situations are tested using each synthetic voice (e.g., following instructions such as breathing commands and remaining calm). It may happen that a different synthetic voice is more suitable in each situation (e.g., the first synthetic voice is suitable for calming, and the second synthetic voice is suitable for the following of instructions). This may be learned by the monitoring system 15 (e.g., by machine learning) so that the monitoring system 15 then sets the most suitable synthetic voice according to the evaluation of the respective situations by the actuation unit.
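One possible way of recording which synthetic voice suits which situation is sketched below. Simple success counting is used here as a stand-in for the machine learning mentioned above; it is not the learning method of the embodiments. The class `SituationVoiceSelector`, its methods, and the situation and voice labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class SituationVoiceSelector:
    """Tracks, per situation, how often each synthetic voice succeeded."""

    def __init__(self) -> None:
        # (situation, voice) -> [number of successes, number of trials]
        self._counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, situation: str, voice: str, success: bool) -> None:
        """Store one objective result, e.g. whether an instruction was followed."""
        entry = self._counts[(situation, voice)]
        entry[0] += int(success)
        entry[1] += 1

    def best_voice(self, situation: str) -> Optional[str]:
        """Return the voice with the highest success rate, or None if untested."""
        rates = {
            voice: successes / trials
            for (sit, voice), (successes, trials) in self._counts.items()
            if sit == situation
        }
        if not rates:
            return None
        return max(rates, key=rates.get)


selector = SituationVoiceSelector()
selector.record("calming", "voice_1", True)
selector.record("calming", "voice_2", False)
selector.record("instructions", "voice_1", False)
selector.record("instructions", "voice_2", True)
```

With the recorded results above, `selector.best_voice("calming")` returns `"voice_1"` and `selector.best_voice("instructions")` returns `"voice_2"`, mirroring the case in which a different synthetic voice is more suitable in each situation.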
[0032] The method described in
[0033] With deepfake methods (a portmanteau of “deep learning” and “fake”), media content (e.g., videos) may be modified and falsified largely autonomously by artificial intelligence techniques. The resulting fake videos are largely realistic and difficult to distinguish from an original. Deepfake offers various possibilities. One is “face swapping”, where the face of one person is swapped with a generated face of another person in videos or photos. Alternatively, the environment of the video may be swapped so that people are placed in a new context. Another application is the transfer of body movements to other persons in video material, known as “body puppetry”.
[0034] Virtual reality (VR) is the representation and simultaneous perception of reality and corresponding physical properties in a real-time computer-generated, interactive virtual environment. A mixing of virtual reality and physical reality is referred to as mixed or augmented reality (AR).
[0035] With regard to the AR/VR visualization devices used, there are two categories that may be used within the scope of the present embodiments to make synthetic visual appearances or synthetic environments visible to the patient: so-called optical see-through devices such as HoloLens2, in which virtual objects are additionally superimposed on the environment via optical lenses, and video see-through devices such as Varjo XR1, which are based on VR headsets but also feed video signals into the headset from outside.
[0036] Optical see-through devices are small and portable, but limited in the size of the visible area and may only partially “mask out” real objects. Video see-through devices create a complete mask-out and have a large visible range, but are often large, heavy, and typically also hard-wired.
[0037] Within the scope of the method according to the present embodiments, the patient wears one of the visualization devices described (e.g., AR headset, VR headset, HoloLens) or looks into one (e.g., screen).
[0038] The method is used, for example, for communication between a physician A and a patient P. First, a deepfake device of the monitoring system is activated, where a first synthetic visual appearance (e.g., “animated avatar”) is set. As will be described later, the first synthetic visual appearance has been previously generated by a machine learning algorithm and represents the face, facial expressions, or gestures of a person known to or trusted by the patient P (e.g., husband/wife, child, friend or relative, parent, or caregiver). If the deepfake device is activated, the incorporated image generator provides that the original visual appearance of the physician A is replaced by the cloned synthetic visual appearance.
[0039] The deepfake device has at least two cloned synthetic visual appearances from which to choose. These may be pre-selected and set automatically or manually. At least one physiological parameter of the patient is then recorded or measured during operation of the deepfake device. This is performed by a measuring unit 13 that is assigned to the monitoring system 15 and has, for example, one or more sensors, cameras, or other sensing elements or is constituted by one of these. The at least one physiological parameter may be used, for example, as an indicator of the patient's state of mind. The at least one physiological parameter may be constituted, for example, by one of the following measured values: blood pressure, pulse, and EEG. For example, pulse sensors, EEG electrodes, or blood pressure sensors may be used for this purpose. It is also possible to measure any other physiological parameters that are indicative of the state of mind or condition of the patient P. The physiological parameter(s) may be measured once or over a previously selected period of time. The at least one physiological parameter of the patient is then evaluated by an evaluation unit assigned to the monitoring system. The evaluation may, for example, simply include a comparison with one or more threshold values or involve a more complex evaluation with respect to a positive or negative state of mind of the patient.
[0040] Following the evaluation, the deepfake device is automatically actuated by an actuation unit 12 assigned to the monitoring system 15 according to the result of the evaluation. For example, the first synthetic visual appearance is retained or selected if the evaluation shows a positive effect on the patient P (e.g., if the patient's state of mind is or becomes positive, if the patient is or becomes calmer or more cooperative than before). In the simplest case, the first synthetic visual appearance is retained or selected if the parameter does not exceed a threshold value (e.g., pulse, blood pressure, etc.). The first synthetic visual appearance is rejected or terminated if the evaluation indicates a negative effect on the patient P (e.g., if the patient's state of mind is or becomes negative, if the patient becomes agitated, anxious, or disturbed, if the patient shows symptoms of stress or if the patient's willingness to cooperate decreases). In the simplest case, the first synthetic visual appearance is discarded or terminated if the evaluation shows that the threshold value(s) has/have been exceeded.
[0041] In an expanded embodiment, a further, second synthetic visual appearance may then be activated. Then, in a sixth act 6, at least one physiological parameter of the patient is also recorded or measured by the measuring unit 13 (e.g., the same physiological parameter(s) in order to provide comparability). The measured parameters are then evaluated by the evaluation unit. The evaluation may be carried out in a comparable manner to the previous evaluation (e.g., simply by comparison with one or more threshold values or, with greater complexity, with respect to a positive or negative state of mind of the patient). Subsequently, depending on the evaluation, the synthetic visual appearance that has the most positive effect on the patient P with regard to the patient's state of mind is automatically selected. More than two synthetic visual appearances (e.g., three or four) may also be used.
[0042] As part of the method, the scenery may be recorded from the patient's point of view (e.g., typically using RGB or RGB-D cameras). Alternatively or in addition, various other room cameras may be used. In the video recordings, the OR staff are selectively replaced by “more comfortable faces” (e.g., avatars) using deepfake algorithms, and the patient is presented with the altered images live via the AR/VR visualization device worn by the patient (e.g., AR headset, VR headset, HoloLens, screen). For example, face masks may also be eliminated, making the situation much more pleasant for the patient. Also, calming emotions and facial expressions that were otherwise not visible may be made visible to the patient. During the intervention, the physician may now speak to the patient as one of the previously trained reference persons in the form of an avatar. In addition, the environment within the operating room may be transformed, which also has a calming effect on the patient. For example, the scenery may be relocated virtually to a Caribbean beach, a sunbathing lawn, or into the patient's own living room or bedroom.
[0043] In the course of the intervention or in a trial conversation with the patient, the system is able to “learn” which avatar was successful in which situation (e.g., by determining the success of an instruction). The system may then independently use avatar X for “calming actions”, avatar Y for instructions, etc. Learning may also take place pre-operatively using test sequences.
[0044] The present embodiments may be briefly summarized as follows: for particularly good communication even with difficult patients, a monitoring system for monitoring a patient, especially during a medical diagnostic or therapeutic procedure, is provided. The monitoring system includes a voice cloning device with a voice generator that is configured to replace a natural voice of a person with a cloned synthetic voice different from the voice of the person, where at least two synthetic voices may be selected. The monitoring system also includes a measuring unit that is configured to record at least one physiological parameter of the patient, an evaluation unit that is configured to evaluate the at least one measured physiological parameter of the patient, and an actuation unit that is configured to actuate the voice cloning device such that a synthetic voice is selected or rejected (e.g., for the further operation of the voice cloning device) depending on the result of the evaluation by the evaluation unit.
[0045] The elements and features recited in the appended claims may be combined in different ways to produce new claims that likewise fall within the scope of the present invention. Thus, whereas the dependent claims appended below depend from only a single independent or dependent claim, it is to be understood that these dependent claims may, alternatively, be made to depend in the alternative from any preceding or following claim, whether independent or dependent. Such new combinations are to be understood as forming a part of the present specification.
[0046] While the present invention has been described above by reference to various embodiments, it should be understood that many changes and modifications can be made to the described embodiments. It is therefore intended that the foregoing description be regarded as illustrative rather than limiting, and that it be understood that all equivalents and/or combinations of embodiments are intended to be included in this description.