Automatic visual remote assessment of movement symptoms in people with Parkinson's disease for MDS-UPDRS finger tapping task
09687189 · 2017-06-27
Assignee
Inventors
- Faezeh Tafazzoli (Louisville, KY, US)
- Beilei Xu (Penfield, NY)
- Hao Wu (Pittsford, NY, US)
- Robert P Loce (Webster, NY, US)
CPC classification
A61B5/4082
HUMAN NECESSITIES
G06V20/46
PHYSICS
G06V10/25
PHYSICS
G16H50/20
PHYSICS
A61B5/7264
HUMAN NECESSITIES
International classification
A61B5/00
HUMAN NECESSITIES
A61B5/11
HUMAN NECESSITIES
Abstract
A system and method for assessing patient movement for Parkinson's disease includes capturing a video of a subject performing a finger tapping sequence comprising a predetermined number of open and close periods. According to an exemplary embodiment, a system and method includes extracting a region of interest for each frame of the video and generating a projection of the region of interest for each frame of the video using perpendicular vector projections in a direction or plurality of directions.
Claims
1. A computer implemented method for assessing patient movement, comprising: capturing a video including a plurality of frames of a subject performing a finger tapping sequence including a predetermined number of open and close periods; extracting a region of interest for each frame of the video; generating one or more projections of the region of interest for a set of frames; and extracting maximums and minimums over time from the generated projections.
2. The computer implemented method according to claim 1, further comprising: extracting temporal features from the extracted maximums and minimums; and mapping the extracted temporal features to a trained model to attach an assessment to the subject.
3. The computer implemented method according to claim 2, wherein the extracted temporal features is one of number of interruptions, pace of slowing the task, the time at which the amplitude starts to decrement.
4. The computer implemented method according to claim 2, wherein the trained model represents a UPDRS score for diagnosing Parkinson's disease.
5. The computer implemented method according to claim 1, wherein generating a projection of the region of interest includes using perpendicular vector projections for a border of the region of interest facing a direction.
6. The computer implemented method according to claim 1, wherein extracting maximums and minimums includes: constructing a spatio-temporal representation of the video based on the projections for the direction; extracting temporal data from the spatio-temporal representation to generate a time stamp graph having local minima and maxima indicating a start and end time of each open/close period; and extracting spatial data to determine one of an open or closed position depending on the local minima and maxima of the time stamp graph.
7. The computer implemented method according to claim 6, wherein the spatio-temporal representation represents the spatial position of a pixel of the projection along the vertical axis and the time of the frame of the video along the horizontal axis.
8. The computer implemented method according to claim 6, wherein constructing the spatio-temporal representation includes assigning a visual indicator value corresponding to the vertical spatial coordinates of the projection.
9. The computer implemented method according to claim 6, wherein extracting the spatial data includes: combining the temporal data with another spatio-temporal representation to determine an open or closed state; selecting a window around each local minimum and maximum representing the height of the spatio-temporal representation; and determining the position of the thumb and index finger using an average location in the window.
10. A computer program product comprising tangible media with encoded instructions for performing the method of claim 1.
11. A system for assessing movement symptoms in a video comprising: a video processing system including memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory for executing the instructions.
12. A motion assessment system, comprising: one or more image capturing devices configured to capture a video of a finger tapping sequence comprising a predetermined number of open and close periods; a ROI module for tracking extracting a region of interest for each frame of the video; and a memory having instructions to be performed by one or more processors, the instructions including the steps of: extracting a region of interest for each frame of the video; generating one or more projections of the region of interest for a set of frames; and extracting maximums and minimums over time from the generated projections.
13. The motion assessment system according to claim 12, the instructions further comprising: extracting temporal features from the extracted maximums and minimums; and mapping the extracted temporal features to a trained model to attach an assessment to the subject.
14. The motion assessment system according to claim 13, wherein the extracted temporal features is one of number of interruptions, pace of slowing the task, the time at which the amplitude starts to decrement.
15. The motion assessment system according to claim 13, wherein the trained model represents a UPDRS score for diagnosing Parkinson's disease.
16. The motion assessment system according to claim 12, the instructions for generating a projection of the region of interest includes using perpendicular vector projections for a border of the region of interest facing a direction.
17. The motion assessment system according to claim 12, the instructions for extracting maximums and minimums comprises: constructing a spatio-temporal representation of the entire video based on the projections for the direction; extracting temporal data from the spatio-temporal representation to generate a time stamp graph having local minima and maxima indicating a start and end time of each open/close period; and extracting spatial data to determine one of an open or closed position depending on the local minima and maxima of the time stamp graph.
18. The motion assessment system according to claim 17, wherein the spatio-temporal representation represents the spatial position of a pixel of the projection along the vertical axis and the time of the frame of the video along the horizontal axis.
19. The motion assessment system according to claim 17, wherein the instructions for constructing the spatio-temporal representation includes assigning a visual indicator value corresponding to the vertical spatial coordinates of the projection.
20. The motion assessment system according to claim 17, wherein the instructions for extracting the spatial data includes: combining the temporal data with another spatio-temporal representation to determine an open or closed state; selecting a window around each local minimum and maximum representing the height of the spatio-temporal representation; and determining the position of the thumb and index finger using an average location in the window.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
DETAILED DESCRIPTION
(10) An exemplary method and system according to this disclosure is now described with reference to
(11) With reference to
(12) a) capturing video 102 including a clear view of a patient's hand movement using a video imaging system;
(13) b) extracting a hand silhouette 104, i.e., a region of interest around the hand, and tracking the hand during the sequence;
(14) c) creating spatio-temporal representations 106 of the task for different projections of the silhouette extracted from various directions;
(15) d) extracting temporal information 108 for determining the times when the index finger and thumb have the maximum and minimum distance;
(16) e) extracting spatial information 110 for determining the location of the index finger and thumb at frames in which the fingers have the maximum and minimum distance, by fusing information from the spatio-temporal data of the projection that displays the maximum change in the index finger's location with that of the projection displaying major changes in both fingers;
(17) f) extracting features 112 such as number of interruptions, pace of slowing and index of amplitude decrement from the information extracted in the previous two steps; and
(18) g) mapping features 114 to an equivalent 0-4 UPDRS score.
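Steps c) and d) above can be illustrated with a minimal sketch. It assumes each frame's hand silhouette is already available as a binary mask, and it uses only one projection direction (the top border of the region of interest). The names `top_border_projection` and `spatio_temporal_image` are hypothetical and are not taken from the disclosure.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary silhouette: mask[row][col], 1 = hand pixel


def top_border_projection(mask: Mask) -> List[int]:
    """For every column, the number of rows between the top border of the
    ROI and the first silhouette pixel; len(mask) if the column is empty."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    profile = []
    for col in range(width):
        depth = height  # sentinel: no silhouette pixel found in this column
        for row in range(height):
            if mask[row][col]:
                depth = row
                break
        profile.append(depth)
    return profile


def spatio_temporal_image(masks: Sequence[Mask]) -> List[List[int]]:
    """Stack one projection per frame so that the result is indexed as
    result[position][frame]: space on the vertical axis, time on the
    horizontal axis. All masks are assumed to share the same size."""
    profiles = [top_border_projection(m) for m in masks]
    if not profiles:
        return []
    return [[p[pos] for p in profiles] for pos in range(len(profiles[0]))]
```

Each cell of the stacked array holds the projected depth for one position in one frame, so a finger moving toward or away from the chosen border appears as a band that rises and falls over time.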
(19) With reference to
(20) The video capturing and processing system 200 includes a video processing device 202, a video capture device 204, and a storage device 206, which may be linked together by communication links, referred to herein as a network. In one exemplary embodiment, the system 200 may be in further communication with a user device 208. These components are described in greater detail below.
(21) Video processing device 202 illustrated in
(22) Memory 214 may represent any type of tangible computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, or holographic memory. In one embodiment, the memory 214 includes a combination of random access memory and read only memory. The digital processor 210 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The digital processor, in addition to controlling the operation of the video processing device 202, executes instructions stored in memory 214 for performing the parts of a method discussed herein. In some embodiments, the processor 210 and memory 214 may be combined in a single chip.
(23) The video processing device 202 may be embodied in a networked device, such as the video capture device 204, although it is also contemplated that the video processing device 202 may be located elsewhere on a network to which the system 200 is connected, such as on a central server, a networked computer, or the like, or distributed throughout the network or otherwise accessible thereto. The functions of the video processing device 202 can be performed remotely, away from the subject of the video. The video data analysis phases disclosed herein, i.e., movement symptom analysis including spatio-temporal analysis, are performed by the processor 210 according to the instructions contained in the memory 214. In particular, the memory 214 stores a video capture module 216, which captures video data of a finger tapping test; an initialization module 218, which initializes the system; and a ROI module 220, which detects and tracks objects that are moving in the area of interest. Embodiments are contemplated wherein these instructions can be stored in a single module or as multiple modules embodied in the different devices.
(24) The software modules, as used herein, are intended to encompass any collection or set of instructions executable by the video processing device 202 or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term software as used herein is intended to encompass such instructions stored in a storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called firmware, that is, software stored on a ROM or so forth. Such software may be organized in various ways, and may include software components organized as libraries, Internet-based programs stored on a remote server or so forth, source code, interpretive code, object code, directly executable code, and so forth. It is contemplated that the software may invoke system-level code or calls to other software residing on a server (not shown) or other location to perform certain functions. The various components of the video processing device 202 may all be connected by a bus 228.
(25) With continued reference to
(26) The video processing device 202 may include one or more special purpose or general purpose computing devices, such as a server computer or digital front end (DFE), or any other computing device capable of executing instructions for performing the exemplary method.
(27)
(28) In one embodiment, the video source 204 can be a device adapted to relay and/or transmit the video captured by the camera to the video processing device 202. For example, the video source 204 can include a scanner, a computer, or the like. In another embodiment, the video data 232 may be input from any suitable source, such as a workstation, a database, a memory storage device, such as a disk, or the like. The video source 204 is in communication with the controller containing the processor 210 and memory 214.
(29) With continued reference to
(30) Various aspects of the method and system of detecting movement symptoms are now described in further detail.
(31) With continuing reference to
(32) In each video frame, the processor 210 extracts a region of interest (ROI) that includes the hand 104. In one embodiment, the region of interest is extracted by detecting the face of the patient, i.e., the subject, and using the size, location, and color of the face bounding box. The size of the ROI is set in proportion to the size of the subject's face because the length of an adult's hand is approximately equal to the height of that person's face. See T. Khan, D. Nyholm, J. Westin and M. Dougherty, A computer vision framework for finger-tapping evaluation in Parkinson's disease. Artificial Intelligence in Medicine, Vol. 60, No. 1, pp. 27-40, 2014. In another embodiment, the size of the ROI is determined by detecting skin of the subject directly from each frame.
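The face-proportional sizing described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the face bounding box is assumed to come from any face detector, `hand_center` is a hypothetical estimate of the hand position (for example the centroid of detected skin pixels), and the `scale` factor of 1.5 is an arbitrary margin chosen for the example.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels


def hand_roi(face: Box, hand_center: Tuple[int, int],
             frame_size: Tuple[int, int], scale: float = 1.5) -> Box:
    """Square ROI whose side is `scale` times the face height, centred on
    `hand_center` and clipped to the frame boundaries."""
    frame_w, frame_h = frame_size
    side = max(1, round(face.h * scale))  # hand length ~ face height
    cx, cy = hand_center
    x0 = max(0, cx - side // 2)
    y0 = max(0, cy - side // 2)
    x1 = min(frame_w, x0 + side)
    y1 = min(frame_h, y0 + side)
    return Box(x0, y0, x1 - x0, y1 - y0)
```

For a 640x480 frame, a face 100 pixels tall and a hand centred at (200, 300), this yields a 150x150 ROI with its top-left corner at (125, 225).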
(33) The processor 210 uses skin color and motion data to detect the hand in each ROI and to track it throughout the entire sequence. See S. N. Karishma and V. Lathasree, Fusion of Skin Color Detection and Background Subtraction for Hand Gesture Segmentation, International Journal of Engineering Research & Technology, Vol. 3, No. 2, 2014.
(34) For detecting the skin, a face-based adaptive threshold using a personalized Gaussian skin color model can be employed to account for illumination variations. See C. Hsieh, D. Liou and W. Lai, Enhanced Face-Based Adaptive Skin Color Model, Journal of Applied Science and Engineering, Vol. 15, No. 2, pp. 167-176, 2012.
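As a rough illustration of a personalized Gaussian skin color model, the sketch below fits a per-channel Gaussian to chroma samples taken from the subject's face and classifies other pixels by normalized distance. It is a simplified stand-in, not the model of the cited reference: the use of (Cb, Cr) chroma pairs, the independent-channel (diagonal covariance) assumption, the deviation floor of 1.0, and the `max_dist` threshold of 2.5 are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Sequence, Tuple

Chroma = Tuple[float, float]  # (Cb, Cr) of one pixel
Model = Tuple[Chroma, Chroma]  # ((mean_cb, mean_cr), (sd_cb, sd_cr))


def fit_skin_model(face_pixels: Sequence[Chroma]) -> Model:
    """Per-channel mean and standard deviation of the face chroma samples."""
    cb = [p[0] for p in face_pixels]
    cr = [p[1] for p in face_pixels]
    # Floor the deviation so a perfectly uniform sample cannot divide by zero.
    return (mean(cb), mean(cr)), (max(pstdev(cb), 1.0), max(pstdev(cr), 1.0))


def is_skin(pixel: Chroma, model: Model, max_dist: float = 2.5) -> bool:
    """True if the pixel lies within `max_dist` standard deviations of the
    subject-specific skin color, treating the two channels as independent."""
    (mu_cb, mu_cr), (sd_cb, sd_cr) = model
    d = sqrt(((pixel[0] - mu_cb) / sd_cb) ** 2 + ((pixel[1] - mu_cr) / sd_cr) ** 2)
    return d <= max_dist
```

Because the model is refitted from the face in the current video, the threshold adapts to the subject and the lighting rather than relying on a fixed global skin color range.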
(35) The processor 210 extracts spatio-temporal representations 106 for the entire video sequence from the silhouette of the hand detected in each frame of the video. With reference to
(36) With reference to
(37) The processor 210 determines the location of fingers at their extreme positions, i.e., either opened or closed, in time. The processor 210 extracts temporal information of the points where the fingers are in the full open or closed position. The processor 210 finds the spatio-temporal data that displays the maximum vertical change in the index finger. From this spatio-temporal pattern, i.e.,
(38) From the temporal information indicating the start and end of each finger tapping cycle, the processor 210 extracts the location of the fingertips 110, where the pattern displays a wide open or completely closed state of the fingers depending on a peak or valley in the time stamp graph 1202. The processor 210 combines the temporal information with other spatio-temporal data that represents a horizontal projection of the silhouette displaying the maximum change in location of both the thumb and the index finger.
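Locating the peaks and valleys of a time stamp graph can be sketched as a simple local-extremum search. The function name `local_extrema` is hypothetical, and the signal is assumed to be a one-dimensional sequence with one value per frame. Real signals would normally need smoothing first, which is omitted here.

```python
from typing import List, Sequence, Tuple


def local_extrema(signal: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return (peaks, valleys): the frame indices of strict local maxima
    and strict local minima. End points and flat plateaus are ignored."""
    peaks: List[int] = []
    valleys: List[int] = []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            valleys.append(i)
    return peaks, valleys
```

Consecutive peak and valley indices, divided by the frame rate, give the start and end times of each open/close period.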
(39) The processor 210 selects a window 1302, 1304 around each point that has the height of the spatio-temporal pattern and a varying width depending on the distance between two consecutive peaks. With reference to
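The window-based averaging can be sketched as below. The example assumes a non-empty binary spatio-temporal array `st[row][frame]` in which the index finger is the topmost and the thumb the bottommost foreground pixel of each column. That layout, the function names, and the `fraction` of 0.25 relating window width to peak spacing are assumptions for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def window_half_width(peak_a: int, peak_b: int, fraction: float = 0.25) -> int:
    """Half-width in frames, scaled by the gap between two consecutive peaks."""
    return max(1, int(abs(peak_b - peak_a) * fraction))


def average_finger_rows(st: Sequence[Sequence[int]], t: int,
                        half_width: int) -> Optional[Tuple[float, float]]:
    """Mean topmost and bottommost foreground row over a full-height window
    centred on frame t; None if the window contains no foreground pixels."""
    n_frames = len(st[0])
    tops, bottoms = [], []
    for col in range(max(0, t - half_width), min(n_frames, t + half_width + 1)):
        rows = [r for r in range(len(st)) if st[r][col]]
        if rows:
            tops.append(rows[0])      # assumed index finger position
            bottoms.append(rows[-1])  # assumed thumb position
    if not tops:
        return None
    return mean(tops), mean(bottoms)
```

Averaging over several frames around each extreme makes the fingertip estimate less sensitive to a single noisy silhouette.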
(40) The processor 210 extracts features from the finger tapping task 112 based on a discrete scale of the UPDRS evaluation which can be assigned to the patient's current status. The extracted features can include number of interruptions, pace of slowing the task, the time at which the amplitude starts to decrement and the like.
(41) The processor 210 determines the location of the index finger and thumb at specific time intervals during the whole sequence and analyzes the variation of the distance between them. The temporal information of the extreme points is used by the processor 210 to determine the amount of time the subject has kept the fingers in the closed or open state. This gives the duration of each state, from which the number of interruptions that occurred can be inferred. In one embodiment, with reference to
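Two of the features named above can be sketched from the extracted extremes. Both rules are illustrative heuristics rather than the disclosed method: an interruption is assumed to be an interval between consecutive extremes longer than `factor` times the median interval, and the amplitude decrement is assumed to begin at the first tap whose amplitude drops below `drop` times the first tap's amplitude. The function names and default thresholds are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def count_interruptions(extreme_times: Sequence[float], factor: float = 2.0) -> int:
    """Count intervals between consecutive extremes that are longer than
    `factor` times the median interval."""
    gaps = [b - a for a, b in zip(extreme_times, extreme_times[1:])]
    if not gaps:
        return 0
    typical = median(gaps)
    return sum(1 for g in gaps if g > factor * typical)


def decrement_onset(amplitudes: Sequence[float], drop: float = 0.8) -> Optional[int]:
    """Index of the first tap whose amplitude falls below `drop` times the
    amplitude of the first tap; None if the amplitude never falls that far."""
    if not amplitudes:
        return None
    limit = drop * amplitudes[0]
    for i, amplitude in enumerate(amplitudes):
        if amplitude < limit:
            return i
    return None
```

Here `extreme_times` would hold the times of the open and closed extremes, and `amplitudes` the thumb-to-index distance at each fully open position.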
(42) The processor 210 inputs the extracted feature data into a trained model to classify each graph to a discrete number/value for each patient 114. The values for each feature are presented in the table below.
(43) TABLE-US-00001
0: Normal. No problems.
1: Slight. a) 1 or 2 interruptions or hesitations; b) Slight slowing; c) Amplitude decrements near the end of the 10 taps.
2: Mild. a) 3 to 5 interruptions; b) Mild slowing; c) Amplitude decrements midway in the task.
3: Moderate. a) More than 5 interruptions or at least one longer arrest (freeze) in ongoing movement; b) Moderate slowing; c) Amplitude decrements starting after the 1st tap.
4: Severe. a) Cannot or can only barely perform the task because of slowing, interruptions or decrements.
(44) The number indicating the stage of the disease should be assigned if any of the conditions are met. The numbers of the table indicate the patient's UPDRS score for determination of Parkinson's disease.
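The table's scoring rule can be written out as a direct lookup, shown here as a hedged sketch. The disclosure maps features through a trained model, so this rule-based function is only a stand-in that encodes the table criteria. It interprets "if any of the conditions are met" as taking the highest score whose condition holds, and it assumes slowing and decrement have already been reduced to the categorical levels below. All names are hypothetical.

```python
from enum import IntEnum


class Slowing(IntEnum):
    NONE = 0
    SLIGHT = 1
    MILD = 2
    MODERATE = 3


class Decrement(IntEnum):
    NONE = 0
    NEAR_END = 1         # near the end of the 10 taps
    MIDWAY = 2           # midway in the task
    AFTER_FIRST_TAP = 3  # starting after the 1st tap


def interruption_score(count: int, freeze: bool = False) -> int:
    """Score contributed by interruptions alone, per the table above."""
    if freeze or count > 5:
        return 3
    if count >= 3:
        return 2
    if count >= 1:
        return 1
    return 0


def finger_tapping_score(interruptions: int, slowing: Slowing,
                         decrement: Decrement, freeze: bool = False,
                         unable: bool = False) -> int:
    """Highest 0-4 score for which at least one table condition is met."""
    if unable:  # cannot or can only barely perform the task
        return 4
    return max(interruption_score(interruptions, freeze), int(slowing), int(decrement))
```

For example, two interruptions with slight slowing but an amplitude decrement starting midway in the task would score 2, because the decrement condition alone meets the Mild row.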
(45) Some portions of the detailed description herein are presented in terms of algorithms and symbolic representations of operations on data bits performed by conventional computer components, including a central processing unit (CPU), memory storage devices for the CPU, and connected display devices. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is generally perceived as a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
(46) It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the discussion herein, it is appreciated that throughout the description, discussions utilizing terms such as processing or computing or calculating or determining or displaying or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
(47) The exemplary embodiment also relates to an apparatus for performing the operations discussed herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
(48) The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the methods described herein. The structure for a variety of these systems is apparent from the description above. In addition, the exemplary embodiment is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the exemplary embodiment as described herein.
(49) A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For instance, a machine-readable medium includes read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; and electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), just to mention a few examples.
(50) The methods illustrated throughout the specification may be implemented in a computer program product that may be executed on a computer. The computer program product may comprise a non-transitory computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or the like. Common forms of non-transitory computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other tangible medium from which a computer can read and use.
(51) Alternatively, the method may be implemented in transitory media, such as a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.
(52) It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.