Audio-video analytics for simulation-based training
10665268 · 2020-05-26
Assignee
Inventors
CPC classification
G06V20/41
PHYSICS
G09B5/065
PHYSICS
G06V10/774
PHYSICS
International classification
H04N9/80
ELECTRICITY
G09B19/00
PHYSICS
H04N5/92
ELECTRICITY
Abstract
Implementations generally relate to audio-video analytics for simulation-based training. In some implementations, a method includes obtaining a video. The method further includes detecting one or more observed actions of a user in the video. The method further includes matching the one or more observed actions with one or more predetermined key actions. The method further includes annotating the video with annotations based on the matching of the one or more observed actions with the one or more predetermined key actions.
Claims
1. A system comprising: one or more processors; and logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors and when executed operable to cause the one or more processors to perform operations comprising: obtaining a video; detecting one or more observed actions of a user in the video; matching the one or more observed actions with one or more predetermined key actions; and annotating the video with annotations based on the matching of the one or more observed actions with the one or more predetermined key actions.
2. The system of claim 1, wherein the one or more predetermined key actions comprise one or more of key movements and key words.
3. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising characterizing the one or more observed actions of the user.
4. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising detecting, from the one or more observed actions, one or more of observed movements of the user and observed words of the user.
5. The system of claim 1, wherein the annotations characterize the one or more observed actions.
6. The system of claim 1, wherein the annotations comprise a timeline and one or more markers.
7. The system of claim 1, wherein the logic when executed is further operable to cause the one or more processors to perform operations comprising enabling the video to be played with the annotations.
8. A non-transitory computer-readable storage medium with program instructions stored thereon, the program instructions when executed by one or more processors are operable to cause the one or more processors to perform operations comprising: obtaining a video; detecting one or more observed actions of a user in the video; matching the one or more observed actions with one or more predetermined key actions; and annotating the video with annotations based on the matching of the one or more observed actions with the one or more predetermined key actions.
9. The computer-readable storage medium of claim 8, wherein the one or more predetermined key actions comprise one or more of key movements and key words.
10. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising characterizing the one or more observed actions of the user.
11. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising detecting, from the one or more observed actions, one or more of observed movements of the user and observed words of the user.
12. The computer-readable storage medium of claim 8, wherein the annotations characterize the one or more observed actions.
13. The computer-readable storage medium of claim 8, wherein the annotations comprise a timeline and one or more markers.
14. The computer-readable storage medium of claim 8, wherein the instructions when executed are further operable to cause the one or more processors to perform operations comprising enabling the video to be played with the annotations.
15. A computer-implemented method comprising: obtaining a video; detecting one or more observed actions of a user in the video; matching the one or more observed actions with one or more predetermined key actions; and annotating the video with annotations based on the matching of the one or more observed actions with the one or more predetermined key actions.
16. The method of claim 15, wherein the one or more predetermined key actions comprise one or more of key movements and key words.
17. The method of claim 15, further comprising characterizing the one or more observed actions of the user.
18. The method of claim 15, further comprising detecting, from the one or more observed actions, one or more of observed movements of the user and observed words of the user.
19. The method of claim 15, wherein the annotations characterize the one or more observed actions.
20. The method of claim 15, wherein the annotations comprise a timeline and one or more markers.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION
(7) Implementations generally relate to audio-video analytics for simulation-based training. As described in more detail below, in various implementations, a system obtains a video. The system detects one or more observed actions of a target user in the video. The target user may be a student being evaluated by a teacher, for example. The system matches observed actions of the target user with predetermined key actions that the target user may be expected to perform. The system annotates the video with annotations based on the matching of the observed actions with the predetermined key actions. This enables an evaluator or teacher to assess from the video the quality of the observed actions performed by the target user.
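The overall pipeline of this paragraph (obtain video, detect observed actions, match against predetermined key actions, annotate) could be sketched as follows. This is a minimal illustrative sketch; the function names and the simple label-matching "detector" are assumptions, not the claimed implementation.

```python
# Hypothetical sketch of the annotation pipeline: detect observed actions,
# match them against predetermined key actions, emit timeline annotations.

def detect_observed_actions(video_events):
    """Stand-in detector: each event is a (timestamp_s, label) pair."""
    return [(t, label) for t, label in video_events]

def match_key_actions(observed, key_actions):
    """Keep only observed actions whose label is a predetermined key action."""
    return [(t, label) for t, label in observed if label in key_actions]

def annotate(matched):
    """Produce annotations: one marker per matched key action."""
    return [{"time": t, "marker": label} for t, label in matched]

events = [(3.0, "wash hands"), (12.5, "check phone"), (40.0, "measure vitals")]
key_actions = {"wash hands", "measure vitals", "interview patient"}

annotations = annotate(match_key_actions(detect_observed_actions(events), key_actions))
print(annotations)
```

Here only the two events that match predetermined key actions receive markers; the unmatched "check phone" event is dropped.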
(9) In this example scenario, there are two people or users in work area 106. The video may be a video recorded by camera 104 for the purposes of training one of users U1 and U2 or otherwise observing behavior of users U1 and U2. For ease of illustration, the user being evaluated is user U1, and may also be referred to as the target user, evaluated user, or observed user.
(10) System 102 causes camera 104 to capture video of the activity in work area 106. Camera 104 may send the raw video to content management system 108 via a network 110. The video may be a video file or a live video stream. System 102 may obtain the raw video from content management system 108 via network 110. Alternatively, system 102 may obtain the raw video directly from camera 104 via network 110. As described in more detail herein, the system annotates the video based on observed actions compared against key actions.
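The source handling in this paragraph (file versus live stream, camera versus content management system) could be abstracted as below. The class, the URI schemes, and the example paths are illustrative assumptions only.

```python
# Illustrative sketch: the system may obtain raw video either as a stored
# file or as a live stream, directly from the camera or via the content
# management system. All names here are assumptions.

class VideoSource:
    def __init__(self, uri):
        self.uri = uri
        # A network URI suggests a live stream or a remote CMS object;
        # anything else is treated as a local video file.
        self.is_stream = uri.startswith(("rtsp://", "http://", "https://"))

    def describe(self):
        kind = "live stream" if self.is_stream else "video file"
        return f"{kind}: {self.uri}"

cms_copy = VideoSource("https://cms.example/videos/session-42.mp4")
local_copy = VideoSource("/recordings/session-42.mp4")
print(cms_copy.describe())
print(local_copy.describe())
```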
(11) In various implementations, the annotated video may be used to facilitate various activities such as simulation-based training. As described in more detail herein, the system annotates a given video in order to help a teacher and/or student study particular activities. For example, a teaching nurse may observe the interactions between a student nurse and a patient. The annotated video may guide the teacher and/or student in analyzing and assessing the performance of the student in the video. For example, the annotated video may indicate critical or key actions performed by the student (e.g., washing hands, measuring vitals of the patient, interviewing the patient, etc.). The system may assist the evaluator or teacher during a debriefing session in determining whether the target key actions were properly performed. This accelerates the process of evaluation by highlighting the detected key actions. While some implementations are described in the context of healthcare, these implementations and others may also be applied to other fields such as education generally, law enforcement, aviation, etc.
(13) At block 204, the system detects one or more observed actions of a user in the video. In various implementations, the video may include raw video data and raw audio data. As such, for ease of illustration, the term video may be used to refer to all content of a video including both video data and audio data. For clarity, video data alone without an audio component may be referred to as video data or raw video data, and audio data alone may be referred to as audio data or raw audio data.
(15) User interface 300 also includes video marker 308, video marker 310, and audio markers 312. These aspects of
(16) At block 206, the system matches the one or more observed actions with one or more predetermined key actions. In various implementations, the predetermined key actions or behaviors may include key movements performed by the user and/or key words spoken by the user. For example, key movements may include movements associated with cardiopulmonary resuscitation, mouth-to-mouth resuscitation, etc. These particular example key movements are associated with the healthcare field. Other key actions and associated movements may vary depending on the particular implementation. For example, key movements may be associated with fields in law enforcement, military, food service, finance, etc.
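The matching step of block 206 could be sketched as follows, with key actions defined by key movements and/or key words. The representation of a key action here is an assumption made for illustration.

```python
# Hedged sketch of block 206: a key action matches if any of its key
# movements or key words appear among the observations.

KEY_ACTIONS = {
    "cpr": {"movements": {"chest compression"}, "words": set()},
    "interview": {"movements": set(), "words": {"pain", "breath", "feeling"}},
}

def match(observed_movements, observed_words):
    """Return the names of key actions supported by the observations."""
    matched = []
    for name, spec in KEY_ACTIONS.items():
        if (spec["movements"] & set(observed_movements)
                or spec["words"] & set(observed_words)):
            matched.append(name)
    return sorted(matched)

print(match(["chest compression"], ["pain", "hello"]))
```

With those inputs, "cpr" matches via the observed movement and "interview" matches via the key word "pain".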
(17) In various implementations, the system characterizes the observed actions of the user that the system observes in the video. As indicated above, such key actions may include key movements and key words, which the system analyzes, characterizes, and marks or flags automatically without user intervention. For example, such key movements may include finer movements such as hand motions (e.g., waving or other gestures, etc.), facial expressions (e.g., indicating thinking, confusion, happiness, etc.), activities (e.g., writing, sketching, sitting, standing, walking, etc.).
(18) Key words may include words associated with a particular key activity. For example, if a student nurse is consulting with a patient, key words may include words expected in an interview between the nurse and the patient (e.g., feeling, pain, breath, etc.). Together, such key movements and key words may constitute broader actions, for example, a series of movements related to a particular activity such as cardiopulmonary resuscitation, mouth-to-mouth resuscitation, etc.
(19) In various implementations, the system detects movements performed by the user (referred to as observed movements) and/or observed words spoken by the user. For example, referring to
(20) In another example, if the target user under observation administers a test for vitals (e.g., taking blood pressure, etc.), the system may detect a blood pressure monitor, may detect the target user handling the blood pressure monitor, and may detect a blood pressure cuff that is placed on the arm of another user or patient. The taking of blood pressure may also be a key action among several actions to be observed during the session. The system may then compare the observed behavior of the target user taking the blood pressure of a patient and match the observed behavior to the key action of taking blood pressure. The particular key actions may vary, depending on the particular application. For example, particular key actions may be specific to a given medical field, school, or other setting.
(21) At block 208, the system annotates the video with annotations based on the matching of one or more observed actions with one or more predetermined key actions. In various implementations, the annotations characterize the one or more observed actions. For example, referring to
(22) As indicated above, the annotations include timeline 304 and one or more markers such as video markers 308 and 310. In various implementations, the system enables the video to be played with the annotations. This enables a user such as an evaluator or teacher to evaluate the quality of actions and/or quality of communication of the target user. As such, the evaluator or teacher may moderate or provide guidance to the target user or student as needed.
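A timeline carrying markers, as described in this paragraph, could be modeled as below. The field names and the interval query are illustrative assumptions; the patent does not prescribe a data structure.

```python
# Sketch of a timeline with markers: each marker pins an annotation to a
# point on the video timeline, so playback can surface it at the right time.

class Timeline:
    def __init__(self, duration_s):
        self.duration_s = duration_s
        self.markers = []  # list of (time_s, kind, label), kept sorted

    def add_marker(self, time_s, kind, label):
        if not 0 <= time_s <= self.duration_s:
            raise ValueError("marker outside timeline")
        self.markers.append((time_s, kind, label))
        self.markers.sort()

    def markers_between(self, start_s, end_s):
        """Markers to surface while playback passes through [start_s, end_s]."""
        return [m for m in self.markers if start_s <= m[0] <= end_s]

tl = Timeline(duration_s=600)
tl.add_marker(45.0, "video", "washing hands")
tl.add_marker(120.0, "audio", "key word: pain")
print(tl.markers_between(0, 60))
```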
(23) As shown in
(24) User interface 300 also includes audio markers 312, which may include vertical bars corresponding to points in timeline 304 when the microphone associated with the camera captures audio. For example, the microphone may record a conversation of the users in the video. In some implementations, audio markers 312 may include volume indications, indicated by the length of the vertical bars. For example, a longer bar may indicate more volume and a shorter bar may indicate less volume.
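The volume bars described above could be computed per time window from the raw audio samples. Using RMS amplitude as the loudness measure is an assumption for illustration; the patent does not specify the measure.

```python
import math

# Sketch of the volume-bar idea: one bar per fixed-size window of audio
# samples, with bar length proportional to the window's RMS amplitude.

def volume_bars(samples, window):
    bars = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        bars.append(rms)
    return bars

quiet = [0.1, -0.1] * 50   # 100 low-volume samples -> short bar
loud = [0.9, -0.9] * 50    # 100 high-volume samples -> long bar
bars = volume_bars(quiet + loud, window=100)
print(bars)
```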
(25) In some implementations, in addition to enabling a user or users to listen to particular audio, the system may also list observed words spoken at the time that match key words associated with the session. For example, the evaluator and/or teacher may expect the student to discuss a particular topic, and the system may detect when key words are uttered. In other words, the system determines when observed words that are spoken match predetermined key words on a list.
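Matching observed spoken words against a predetermined key-word list, as described here, could look like the sketch below. The timestamped-transcript format is an assumption; how the words are transcribed is left open by the text.

```python
# Illustrative sketch: find the points in a transcript where observed
# words match predetermined key words (case- and punctuation-insensitive).

KEY_WORDS = {"feeling", "pain", "breath"}

def keyword_hits(transcript):
    """transcript: list of (time_s, word). Return matching (time_s, word)."""
    return [(t, w) for t, w in transcript
            if w.lower().strip(".,?!") in KEY_WORDS]

transcript = [(10.2, "How"), (10.4, "are"), (10.6, "you"), (10.8, "feeling?"),
              (15.1, "Any"), (15.3, "pain?")]
print(keyword_hits(transcript))
```

Each hit carries its timestamp, so it can be attached to the corresponding audio marker on the timeline.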
(26) In various implementations, the system may include key words in association with audio markers 312. For example, in some implementations, the system may show a list of key words over a particular portion of audio markers 312. Such words may appear by default or alternatively when a user hovers a mouse or other input device. In some implementations, the system may enable the user to select a portion (e.g., bar, etc.) of audio markers 312, after which the system displays a list of key words spoken.
(27) The particular annotations may vary, depending on the particular implementation. For example, the system may determine and annotate where questions were raised during a conversation, whether the conversation was balanced, the emotional tone of the conversation, where sound is directed, etc. In some implementations, the system may provide fields to enable another user such as an evaluator or teacher to add notes to the system-generated annotations. Such notes may be helpful to the target user or to another evaluator or teacher during a subsequent viewing of the video.
(28) Such annotations may be displayed in appropriate locations along the timeline, which makes searching for particular key actions quick and convenient for the evaluator user. For example, in various implementations, the video markers enable a user such as an evaluator and/or teacher to quickly and conveniently navigate the video footage and learn about various aspects of the observed actions. For example, in some implementations, video marker 308 enables a user to conveniently view the video footage of particular actions such as the video footage of observed behavior 302. As described in more detail below in connection with
(30) In some implementations, the system may color code different video markers or different audio markers in order to highlight key actions or key words. In some implementations, the system may rank the importance or urgency of particular key actions and differentially highlight different key actions accordingly. For example, a video marker associated with a critical key action such as CPR may have red color coding. In contrast, a video marker associated with a non-critical key action such as general interviewing may have no color coding or more subdued color coding (e.g., grey, brown, blue, etc.). The particular color coding or other labels may vary, depending on the particular implementation.
(31) In the examples above, observed actions captured as video data may include various user behaviors and actions including CPR, mouth-to-mouth resuscitation, etc. Other user behaviors or actions are possible, depending on the particular implementation. For example, in some implementations, the system may observe user-associated behaviors such as heartbeat, eye gaze, head movement, body movement, etc. Such specific behaviors may constitute broader or more general actions and/or series of actions such as CPR, mouth-to-mouth resuscitation, interviewing, etc. Also, a particular movement or spoken word may be associated with different broader key actions, and the system determines such key actions.
(32) The system may use any suitable facial and/or pattern recognition techniques to observe visual behavior. Observed actions captured as audio data may include various user behaviors including particular words uttered by a user, the frequency of words uttered by a user, the volume of the user, etc. The system may use any suitable natural language processing, speech recognition, and/or pattern recognition techniques to observe auditory behavior.
(33) In some implementations, the system may detect discussion contents, including key words or key themes that arise in the video data. For example, if any words are printed on paper and/or written down on paper or a whiteboard, the system may detect some of the words for analysis. Furthermore, the system may detect discussion contents, including key words or key themes that arise in the audio data. For example, if any words are spoken by a user or any participant in the work area, the system may detect some of the words for analysis.
(34) In some implementations, the system enables an evaluator user to alter aspects of the work area and/or simulation environment in order to more effectively teach the target user. For example, if the target user were measuring vitals of a patient that is actually a mannequin, the evaluator may increase or decrease the heart rate in order to observe how the target user responds. In some implementations, the system may include timing information in connection with a given key action. This enables evaluation of the timing of a particular action (e.g., responding to an emergency situation in a timely manner, washing hands for a sufficient length of time, etc.).
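The timing evaluation mentioned here (e.g., washing hands for a sufficient length of time) could be checked against per-action minimum durations. The threshold values below are illustrative assumptions, not values from the patent.

```python
# Sketch of timing evaluation: compare a detected key action's observed
# duration against its minimum expected duration (assumed thresholds).

MIN_DURATION_S = {"wash hands": 20.0, "cpr cycle": 120.0}

def timing_ok(action, start_s, end_s):
    """True if the action lasted at least its minimum expected duration."""
    required = MIN_DURATION_S.get(action, 0.0)
    return (end_s - start_s) >= required

print(timing_ok("wash hands", 5.0, 30.0))   # 25 s of washing
print(timing_ok("wash hands", 5.0, 15.0))   # only 10 s of washing
```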
(35) In various implementations, the system sends the video with annotations to the content management system. The content management system may then use the video and annotations for facilitating a simulation-based training system. In various implementations, the video and associated data may be stored in any suitable file and may include interchangeable metadata and/or other forms of metadata.
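One plausible way to store the annotations alongside the video, as described above, is a JSON "sidecar" document. This format and the field names are assumptions; the text only says any suitable file with metadata may be used.

```python
import json

# Sketch: serialize the timeline markers to a JSON sidecar so the content
# management system can store and later replay them with the video.

annotations = {
    "video": "session-42.mp4",
    "markers": [
        {"time_s": 45.0, "kind": "video", "label": "washing hands"},
        {"time_s": 120.0, "kind": "audio", "label": "key word: pain"},
    ],
}

sidecar = json.dumps(annotations, indent=2)
restored = json.loads(sidecar)
print(restored["markers"][0]["label"])
```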
(36) Although the steps, operations, or computations may be presented in a specific order, the order may be changed in particular implementations. Other orderings of the steps are possible, depending on the particular implementation. In some particular implementations, multiple steps shown as sequential in this specification may be performed at the same time. Also, some implementations may not have all of the steps shown and/or may have other steps instead of, or in addition to, those shown herein.
(37) Implementations described herein provide various benefits. For example, implementations facilitate in simulation-based training. Implementations described herein enable an evaluator or teacher to view and study video footage and associated annotations in order to determine the quality of actions of a target user or student.
(39) In various implementations, system 502 may be used for providing audio-video analytics for simulation-based training. Also, client devices 510, 520, 530, and 540 may be used to implement various devices for simulation-based training. For example, some client devices may function as cameras for capturing observed behavior. Some client devices may be used to show annotated videos for debriefing, etc.
(40) For ease of illustration,
(41) While server 504 of system 502 performs embodiments described herein, in other embodiments, any suitable component or combination of components associated with system 502 or any suitable processor or processors associated with system 502 may facilitate performing the embodiments described herein.
(42) Implementations may apply to any network system and/or may apply locally for an individual user. For example, implementations described herein may be implemented by system 502 and/or any client device 510, 520, 530, and 540. System 502 may perform the implementations described herein on a stand-alone computer, tablet computer, smartphone, etc. System 502 and/or any of client devices 510, 520, 530, and 540 may perform implementations described herein individually or in combination with other devices.
(43) In the various implementations described herein, a processor of system 502 and/or a processor of any client device 510, 520, 530, and 540 causes the elements described herein (e.g., information, etc.) to be displayed in a user interface on one or more display screens.
(45) Computer system 600 also includes a software application 610, which may be stored on memory 606 or on any other suitable storage location or computer-readable medium. Software application 610 provides instructions that enable processor 602 to perform the implementations described herein and other functions. Software application 610 may also include an engine such as a network engine for performing various functions associated with one or more networks and network communications. The components of computer system 600 may be implemented by one or more processors or any combination of hardware devices, as well as any combination of hardware, software, firmware, etc.
(46) For ease of illustration,
(47) Although the description has been described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive. Concepts illustrated in the examples may be applied to other examples and implementations.
(48) In various implementations, software is encoded in one or more non-transitory computer-readable media for execution by one or more processors. The software when executed by one or more processors is operable to perform the implementations described herein and other functions.
(49) Any suitable programming language can be used to implement the routines of particular embodiments including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps shown as sequential in this specification can be performed at the same time.
(50) Particular embodiments may be implemented in a non-transitory computer-readable storage medium (also referred to as a machine-readable storage medium) for use by or in connection with the instruction execution system, apparatus, or device. Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both. The control logic when executed by one or more processors is operable to perform the implementations described herein and other functions. For example, a tangible medium such as a hardware storage device can be used to store the control logic, which can include executable instructions.
(51) Particular embodiments may be implemented by using a programmable general purpose digital computer, and/or by using application specific integrated circuits, programmable logic devices, field programmable gate arrays, optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms. In general, the functions of particular embodiments can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
(52) A processor may include any suitable hardware and/or software system, mechanism, or component that processes data, signals or other information. A processor may include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor may perform its functions in real-time, offline, in a batch mode, etc. Portions of processing may be performed at different times and at different locations, by different (or the same) processing systems. A computer may be any processor in communication with a memory. The memory may be any suitable data storage, memory and/or non-transitory computer-readable storage medium, including electronic storage devices such as random-access memory (RAM), read-only memory (ROM), magnetic storage device (hard disk drive or the like), flash, optical storage device (CD, DVD or the like), magnetic or optical disk, or other tangible media suitable for storing instructions (e.g., program or software instructions) for execution by the processor. For example, a tangible medium such as a hardware storage device can be used to store the control logic, which can include executable instructions. The instructions can also be contained in, and provided as, an electronic signal, for example in the form of software as a service (SaaS) delivered from a server (e.g., a distributed system and/or a cloud computing system).
(53) It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
(54) As used in the description herein and throughout the claims that follow, "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of "in" includes "in" and "on" unless the context clearly dictates otherwise.
(55) Thus, while particular embodiments have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular embodiments will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.