In-vehicle monitoring

10748016 · 2020-08-18

Abstract

In a method of video monitoring of a subject, for example a driver of a vehicle, the video image is motion compensated by image registration techniques so that the subject's position in each frame of the video image is stable. A region of interest is defined on the skin of the subject and used to obtain a PPG signal. To compensate for variations in illumination of the subject caused by the subject's movement in the vehicle, the parameters of the calculated motion transformation used in the image registration are used to form an illumination model representing how the illumination of the subject would have changed because of the subject's motion. The illumination model is a linear or quadratic function fitted to the image intensity in the region of interest. Residuals between the fitted model and the image intensity form an illumination-compensated signal in which the photoplethysmographic signal is more clearly present. The illumination-compensated signal is analysed to obtain a PPG signal and from this estimates of one or more vital signs such as heart rate or breathing rate are obtained.

Claims

1. A method of processing a sequence of image frames forming a video image of a human subject to estimate a vital sign of the subject comprising the steps of: analysing the video image to obtain a motion transformation whose parameters represent the movement of the subject; defining at least one region of interest on the subject in the video image and obtaining a signal representing the image intensity in the at least one region of interest; obtaining a predicted illumination signal by fitting a function of the motion transformation parameters to the signal representing the image intensity; obtaining a residuals signal representing the difference between the signal representing the image intensity and the predicted illumination signal after said fitting; and analysing the residuals signal to obtain an estimate of at least one vital sign of the subject.

2. A method according to claim 1 wherein the step of analysing the video image to obtain a motion transformation comprises performing a motion correction process which produces a motion corrected video image and the motion transformation.

3. A method according to claim 1 wherein the step of analysing the video image to obtain a motion transformation comprises performing an image registration process.

4. A method according to claim 1 wherein the region of interest is defined in the video image before the step of analysing the video image to obtain a motion transformation, and the step of analysing the video image to obtain a motion transformation comprises tracking the region of interest through the sequence of image frames.

5. A method according to claim 1 wherein the motion transformation is a projective transformation.

6. A method according to claim 1 wherein the step of fitting a function of the motion transformation parameters to the signal representing the image intensity comprises linear or quadratic regression.

7. A method according to claim 1 wherein a plurality of regions of interest are defined on the subject.

8. A method according to claim 1 wherein the intensity signal is formed for the or each at least one region of interest by averaging or summing pixel values in the at least one region of interest.

9. A method according to claim 1 wherein the intensity signal contains a photoplethysmographic signal.

10. A method according to claim 1 wherein the video image is of a subject in a vehicle.

11. Apparatus for video image processing comprising an input for receiving a video image signal, an image processor adapted to process a video image in accordance with the method of claim 1, and an output for outputting the estimate of the at least one vital sign of the subject.

12. A video monitoring system comprising a video camera for capturing a video image of a subject and outputting a video image and apparatus for video image processing as defined in claim 11.

13. A vehicle having installed therein a video monitoring system according to claim 12, the video camera being positioned to capture an image of the operator of the vehicle.

Description

DRAWINGS

(1) FIG. 1 schematically illustrates a vehicle including an embodiment of the invention;

(2) FIG. 2 schematically illustrates the main parts of a video monitoring system in accordance with an embodiment of the invention;

(3) FIG. 3 illustrates two image frames from a motion-compensated video image;

(4) FIG. 4 is a flow diagram of one embodiment of the invention;

(5) FIG. 5 shows the results of applying the method of an embodiment of the invention to a 20-second segment of video;

(6) FIG. 6 is a flow diagram of an alternative embodiment of the invention;

(7) FIGS. 7(A) and 7(B) illustrate regions of interest and resulting signals according to the alternative embodiment of the invention;

(8) FIG. 8 is a flow diagram illustrating a way of defining regions of interest for the alternative embodiment of the invention.

DETAILED DESCRIPTION

(9) FIG. 1 schematically illustrates a vehicle 3 occupied by an operator 1 who is the subject of a video image captured by video camera 5. The video camera 5 may be a standard colour or monochrome digital video camera producing a monochrome or colour digital video signal consisting of a sequence of image frames, for example at a frame rate of 20 frames per second.

(10) As shown in FIG. 2, the camera 5 supplies its output (a video image) to an image processor 10 which processes the video image as will be described later to obtain an estimate of a vital sign of the subject 1. This estimate may be output via a display 12 and/or recorded in a data store 14. Optionally the image processor may also supply an output to the vehicle controls 7 if it is desired to provide some form of alarm or alert through the vehicle controls or to operate the vehicle to bring it into a safe condition.

(11) The analysis of the video image by the image processor 10 to obtain a PPG signal relies upon analysing the image in a skin region of the subject 1. This is achieved by locating one or more areas in each video image frame which are images of the subject's skin and defining one or more regions of interest in such skin regions. Such regions of interest are usually square or rectangular and FIG. 3 illustrates two image frames from a video image with a region of interest shown on the forehead of the subject as a white rectangle. Because the operator of the vehicle is subject to the vehicle's movements, the subject moves considerably within the field of view of the camera and so appears in different positions in each frame of the image. In order to obtain a good PPG signal it is necessary for the region of interest to remain consistently positioned on the subject. Therefore it is normal to perform an image registration process on the video image so that the subject appears to remain still, while the background moves. Such image registration processes are well-known.

(12) For each pair of adjacent frames, the image registration process applies to the second frame the projective transformation that minimizes the differences between the overlapping region of the second frame and a pre-specified sub-region of the first frame. Each such projective transformation can be specified as a set of 8 parameters, with $t_{ni}$ taken to denote the value of the $i$th such parameter corresponding to the transformation between the $(n-1)$th and $n$th frames. For $n=1$ no such transformation exists, so $t_{1i}$ is instead set to zero for all $i$. The cumulative transformation at frame $n$ is defined as the sum of the transformations up to that frame:
$T_{ni} = \sum_{m=1}^{n} t_{mi}$
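The cumulative sum above can be sketched in a few lines of Python. This is only an illustration: the function name `cumulative_transform` and the array layout (one row of 8 parameters per frame, with the first row all zeros) are assumptions made for the example and are not taken from the description.

```python
import numpy as np

def cumulative_transform(t):
    """Return T where T[n, i] is the sum of t[m, i] over all frames m up to n.

    t is assumed to have shape (n_frames, 8): one row of 8 projective
    transformation parameters per frame, with row 0 set to zero because
    the first frame has no preceding frame.
    """
    t = np.asarray(t, dtype=float)
    return np.cumsum(t, axis=0)

# Toy data: only the first parameter changes between frames.
t = np.zeros((4, 8))
t[1:, 0] = [0.5, -0.2, 0.1]
T = cumulative_transform(t)
```

Each row of `T` then holds the 8 cumulative parameters for one frame, which is the form used when the illumination model is fitted.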

(13) The motion transformations can be applied to the video images to produce a set of motion-compensated video images in which the subject remains apparently still, but the background appears to move. This allows the region of interest to be positioned at the same position in each frame and thus to represent the same area of skin of the subject throughout the video image.

(14) However, as mentioned above, if the subject is in a non-uniform illumination field, then although the image registration process corrects the images for movement, the variation in illumination on the face of the subject will still be present. Thus the brightness of the subject will appear to flicker or vary and such a variation in illumination is a significant source of noise for the PPG signal analysis.

(15) In accordance with the invention, therefore, after the image registration process, the calculated motion transformation is used to form an illumination model of how the illumination on the face has changed because of the motion of the subject relative to the camera. The difference between the modeled illumination and the intensity variation detected in the region of interest on the skin should then have a clearer PPG component. Conventional methods for estimating the vital signs of the subject, such as heart rate or breathing rate from the signal will then produce a better estimate because of the reduction of noise from varying illumination.

(16) One embodiment of the method is shown in more detail in FIG. 4.

(17) In step 100 a video image consisting of successive image frames is captured and in step 102 a time window of n adjacent frames is selected (for example n may be 200). Then in step 104 a conventional image registration process is executed. This process involves calculating in step 106 a motion transformation representing the movement of the subject between adjacent image frames of the video image and in step 108 this transformation is used to compensate the video images for the subject motion to place the subject in the same position in each frame.

(18) In step 110 one or more regions of interest on skin areas of the subject in the motion-compensated video image are defined and in step 112 an intensity-representative signal is obtained from the region or each region of interest. Such signals may be obtained by averaging or summing the pixel values in the region of interest and this may be conducted on a single monochrome or colour channel or on a function, such as a ratio, of multiple colour channels of a video signal.
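As an illustration of step 112, the following sketch forms an intensity signal from a rectangular region of interest by averaging or summing pixel values in each frame. The function name `roi_intensity`, the `(top, left, height, width)` layout of the region and the single-channel frame array are assumptions made for the example.

```python
import numpy as np

def roi_intensity(frames, roi, reduce="mean"):
    """Return one intensity value per frame for a rectangular region.

    frames: array of shape (n_frames, height, width), a single channel
            of the motion-compensated video (assumed layout).
    roi:    (top, left, height, width) in pixels (assumed layout).
    reduce: "mean" to average the pixel values, "sum" to add them.
    """
    top, left, h, w = roi
    patch = np.asarray(frames, dtype=float)[:, top:top + h, left:left + w]
    if reduce == "mean":
        return patch.mean(axis=(1, 2))
    if reduce == "sum":
        return patch.sum(axis=(1, 2))
    raise ValueError("reduce must be 'mean' or 'sum'")
```

Because the frames are motion compensated, the same `roi` can be used for every frame in the time window.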

(19) In step 114 the motion transformation calculated in step 106 is used to form an illumination model of how the illumination on the subject has changed as a result of the subject's movement. One example of such an illumination model is a linear combination of the cumulative transformation parameters, that is to say, each parameter is multiplied by a coefficient and added together.
Illumination $L_n = k_0 + \sum_{i=1}^{8} k_i T_{ni}$

(20) The coefficients $k_0, k_1, \ldots, k_8$ are obtained through the minimization of $\sum_n (I_n - L_n)^2$, where $I_n$ is the mean intensity of the or each region of interest in the $n$th transformed frame.
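The least-squares fit of the linear illumination model can be sketched as follows. This is a minimal example that assumes the cumulative parameters are held in an array `T` of shape (n_frames, 8) and that ordinary least squares via `numpy.linalg.lstsq` is an acceptable solver; the function name `fit_linear_illumination` is hypothetical.

```python
import numpy as np

def fit_linear_illumination(intensity, T):
    """Fit L_n = k_0 + sum_i k_i * T[n, i] to the intensity signal.

    Returns the 9 coefficients (k_0 first), the predicted illumination
    and the residuals (intensity minus predicted illumination).
    """
    intensity = np.asarray(intensity, dtype=float)
    T = np.asarray(T, dtype=float)
    # Design matrix: a column of ones for k_0, then the 8 cumulative parameters.
    A = np.column_stack([np.ones(len(intensity)), T])
    k, *_ = np.linalg.lstsq(A, intensity, rcond=None)
    predicted = A @ k
    return k, predicted, intensity - predicted
```

The residuals returned here correspond to the illumination-compensated signal that is passed on for vital sign analysis.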

(21) Rather than a simple linear function, a higher-order function such as a quadratic function of the cumulative transformation parameters may be used:
Illumination $L_n = k_0 + \sum_{i=1}^{8} k_i T_{ni} + \sum_{i=1}^{8} k_{(i+8)} (T_{ni})^2$
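A sketch of the quadratic variant follows. It extends the design matrix with the squared cumulative parameters so that the fitted coefficient vector has 17 entries, ordered as the constant term, the 8 linear terms and the 8 squared terms. The function name, the array layout and the use of `numpy.linalg.lstsq` are assumptions made for illustration.

```python
import numpy as np

def fit_quadratic_illumination(intensity, T):
    """Fit L_n = k_0 + sum_i k_i*T[n,i] + sum_i k_(i+8)*T[n,i]**2.

    T is assumed to have shape (n_frames, 8). Returns the 17 coefficients,
    the predicted illumination and the residuals.
    """
    intensity = np.asarray(intensity, dtype=float)
    T = np.asarray(T, dtype=float)
    # Columns: constant term, the 8 parameters, then the 8 squared parameters.
    A = np.column_stack([np.ones(len(intensity)), T, T ** 2])
    k, *_ = np.linalg.lstsq(A, intensity, rcond=None)
    predicted = A @ k
    return k, predicted, intensity - predicted
```

With 17 coefficients the model can absorb more of the signal than the linear version, so the time window should contain many more frames than coefficients (for example the 200 frames mentioned above) if a small periodic component is to survive in the residuals.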

(22) Once the illumination model has been fitted to the intensity signal, the residuals, i.e. the remaining differences between the fitted function and the intensity signal can be taken as a new residuals signal which still includes the PPG signal. The PPG signal will be stronger in this residuals signal because the main variations in illumination resulting from the subject's movement have been removed. In step 116 these residuals are taken as an illumination-compensated signal, one for each region of interest. In step 118 any of the known techniques for analysing a video image signal to derive a PPG signal may be applied to the illumination-compensated residuals signal(s) to derive an estimate of a vital sign such as heart rate or breathing rate. Where there are plural regions of interest, this may involve combining the illumination-compensated residuals signal(s) and then analysing the result to find the PPG signal and a vital sign estimate, or analysing them individually and combining the resulting estimates. In step 120 the estimated vital sign may be displayed and/or output and/or stored. Possible actions based on the vital sign are alarming or alerting the operator if the vital signs are abnormal or executing some control of the vehicle.

(23) In step 122 the time window of n frames is shifted by a time step, such as one second, to obtain a new time window of n frames and the process is repeated to obtain a new vital sign estimate.
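The window shifting of step 122 can be illustrated with a small generator. The example values (a window of 200 frames stepped by 20 frames, i.e. one second at 20 frames per second) follow the figures given in the description; the function name `sliding_windows` is hypothetical.

```python
def sliding_windows(n_total, window, step):
    """Yield (start, stop) frame index pairs for successive time windows.

    Only complete windows are produced; stop is exclusive.
    """
    start = 0
    while start + window <= n_total:
        yield start, start + window
        start += step

# 260 frames of video, 200-frame window, shifted by 20 frames each time.
windows = list(sliding_windows(260, 200, 20))
```

Each `(start, stop)` pair selects the frames on which registration, fitting and analysis are repeated to give a new vital sign estimate.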

(24) FIG. 5 illustrates the results of applying the first embodiment of the invention to a video image sequence of a subject under spatially variable illumination. The top plot in FIG. 5 shows the original motion-compensated intensity signal in blue (solid) and the predicted illumination formed by fitting a quadratic function of the cumulative motion transformation parameters to that intensity signal. The residuals, i.e. the difference between the two, are illustrated in the middle plot and form an illumination-compensated signal which includes a PPG component. The bottom plot shows a Fast Fourier Transform (FFT) analysis of that signal with the realistic physiological frequency bands for heart rate illustrated between the two pairs of vertical lines. The FFT representation of the residuals in the bottom plot shows strong signals in the two physiologically-possible ranges for heart beat.
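One simple way to read a rate from such a spectrum is to take the strongest FFT peak inside a physiologically plausible band. The sketch below does this with NumPy. The band limits of 0.7 to 3.0 Hz (42 to 180 beats per minute) and the function name `dominant_rate_bpm` are assumptions chosen for the example and are not values stated in the description.

```python
import numpy as np

def dominant_rate_bpm(signal, fps, band_hz=(0.7, 3.0)):
    """Return the frequency of the largest spectral peak in band_hz, in cycles per minute."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the constant offset
    spectrum = np.abs(np.fft.rfft(x))     # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    peak_hz = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_hz

# Synthetic 20-second residuals signal at 20 frames per second:
# a 1.2 Hz pulse component plus a larger, slower 0.25 Hz component.
fps = 20
time_s = np.arange(400) / fps
residuals = np.sin(2 * np.pi * 1.2 * time_s) + 3.0 * np.sin(2 * np.pi * 0.25 * time_s)
rate = dominant_rate_bpm(residuals, fps)
```

Restricting the search to a band keeps the slower, larger component from being mistaken for the heart rate; a different band would be used for breathing rate.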

(25) FIG. 6 schematically illustrates an alternative embodiment of the invention in which instead of motion correcting the image and then defining the region of interest, the region of interest is defined first and then tracked through the video image, its movement representing the movement of the subject and thus providing the motion transformation signal and motion transformation parameters. As illustrated in FIG. 6, in step 102 a time window of n adjacent frames of the video image captured in step 100 is selected and in step 604 one or more regions of interest are defined on a skin area of the subject in the image. One automatic way of defining such regions of interest will be described below with reference to FIGS. 7 and 8, but other ways, such as detecting areas of skin by their colour or by recognising human shapes in the image are known and explained in the art. Having defined one or more regions of interest, the position of the or each region of interest is tracked through the time window in step 606 and in step 608 the movement is taken as the motion transformation signal whose parameters represent subject movement in the image.

(26) In step 112 an intensity signal is obtained from the or each region of interest in the same way as in the first embodiment, for example by averaging or summing the pixel values in the region of interest, and then the method proceeds as in the first embodiment by fitting a function of the motion transformation parameters to the intensity signal for the or each region of interest and taking the residuals of the fitting process as an illumination-compensated PPG signal (step 116). This illumination-compensated PPG signal is analysed in step 118 to obtain an estimate of at least one vital sign, which is displayed and output or stored in step 120. The process then moves the time window by a time step, for example one second, in step 122 and repeats. FIG. 8 illustrates in more detail the signal processing for one way of defining regions of interest and obtaining the intensity and motion transformation signals of steps 604, 606, 608 and 112. Optionally, as a first step 800, the average frame intensity of each frame is set to a constant value to reduce image flicker, e.g. by multiplying each pixel value by the mean pixel value over the whole sequence and dividing by a constant to scale the values as desired (e.g. 0-255 for 8 bit values).

(27) In step 801, feature points in the video sequence are detected. There are many ways of detecting feature points in a video sequence using off-the-shelf video processing algorithms. For example, feature points consisting of recognisable geometrical shapes such as corners or edges can be detected based, for example, on the gradient of intensity variation in one or two dimensions, and any such conventional algorithm which identifies image feature points can be used in this invention. The feature points are tracked through the whole batch of video frames under consideration, e.g. by using a conventional tracking algorithm such as KLT tracking, to form tracks consisting of the x and y coordinates of each feature point in each image frame through the sequence.

(28) In step 803 a time window (e.g. 9 seconds=180 frames at twenty frames per second) is taken. Thus the next steps of the process are conducted on a time window of the video sequence, and then the process will be repeated for another time window shifted along by some time increment. The successive windows may overlap, for example if a nine second window is stepped forwards by one second each time the overlap will be eight seconds.

(29) Any tracks which do not exist in all the frames of the window are discarded.
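The discarding of incomplete tracks can be sketched as a simple filter. The data structure used here, a dictionary mapping a track identifier to a dictionary of frame index to (x, y) position, is an assumption made for the example, as is the function name `tracks_spanning_window`.

```python
def tracks_spanning_window(tracks, start, stop):
    """Keep only tracks that have a position in every frame of [start, stop).

    tracks: dict mapping track id -> dict mapping frame index -> (x, y).
    Returns a dict mapping track id -> list of (x, y), one per frame.
    """
    frames = range(start, stop)
    return {
        track_id: [points[f] for f in frames]
        for track_id, points in tracks.items()
        if all(f in points for f in frames)
    }

# Track "b" is missing frame 1, so it is discarded for the window 0..2.
tracks = {
    "a": {0: (1, 1), 1: (2, 2), 2: (3, 3)},
    "b": {0: (5, 5), 2: (6, 6)},
}
kept = tracks_spanning_window(tracks, 0, 3)
```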

(30) In step 806, the central frame of the time window is taken and Delaunay triangulation is performed on the feature points. Delaunay triangulation is a process which creates triangles favouring large internal angles. FIG. 7(A) illustrates schematically three successive frames at times n-1, n and n+1 with the central frame n having five feature points 20 connected to form triangles. As can be seen in FIG. 7(A), the position of the feature points varies from frame-to-frame. Having formed the triangles in the central frame of the sequence, the same triangles are formed in each other frame of the sequence (i.e. the same feature points are connected together) so that each triangle is defined throughout the whole nine second time window by three KLT tracks specifying the positions of its vertices. In step 808, the in-circle 22 of each triangle is formed and then a square 24 concentric with the in-circle 22 is formed, aligned with the x and y axes of the image frame and with a side length equal to the diameter of the in-circle. Each of these squares 24 then constitutes a region of interest from which a signal will be obtained for further processing.
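The geometry of step 808 can be sketched for a single triangle as follows. The triangulation itself is assumed to have been done already (for example by a library routine), and the function name `incircle_square` and its return layout are hypothetical. The in-centre is the side-length-weighted average of the vertices, and the in-radius is the triangle area divided by half the perimeter.

```python
import math

def incircle_square(p1, p2, p3):
    """Return (left, top, side) of the axis-aligned square that is concentric
    with the in-circle of triangle p1-p2-p3 and whose side equals the
    in-circle diameter. Points are (x, y) tuples.
    """
    # Side lengths opposite each vertex.
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    perimeter = a + b + c
    # In-centre: vertices weighted by the length of the opposite side.
    cx = (a * p1[0] + b * p2[0] + c * p3[0]) / perimeter
    cy = (a * p1[1] + b * p2[1] + c * p3[1]) / perimeter
    # Triangle area from the cross product, then in-radius = area / (perimeter / 2).
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1])
               - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0
    radius = 2.0 * area / perimeter
    return cx - radius, cy - radius, 2.0 * radius

# A 3-4-5 right triangle has in-centre (1, 1) and in-radius 1.
left, top, side = incircle_square((0, 0), (3, 0), (0, 4))
```

Applying the function to the three tracked vertex positions in each frame gives a square region of interest that follows the triangle through the time window.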

(31) As illustrated in step 810 the intensity in each region of interest in each frame is calculated (the sum of all the pixel intensity values) and the intensity for each square region of interest (ROI) through the time window corresponds to a signal (i1 to im) to be processed. In visible light, for a camera outputting three R, G, B colour channels, only the green channel is used. However if the scene is illuminated by infra-red light, the mean of the three colour channels is used. The image intensity of each ROI through the frame sequence will typically vary as schematically illustrated in FIG. 7(B). The location (x, y) of each region of interest (for example the centre or a specified corner of each square) is also obtained and represents the movement of the subject. The intensity signals and movements of each region of interest are then output as time signals as illustrated in step 112.
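The channel selection described for step 810 can be sketched as follows: the green channel is summed under visible light, and the mean of the three channels is summed under infra-red illumination. The function name `roi_signal`, the `(left, top, side)` layout of the square region and the R, G, B channel order of the frame array are assumptions made for the example.

```python
import numpy as np

def roi_signal(frames_rgb, roi, infrared=False):
    """Return the summed intensity of a square region for every frame.

    frames_rgb: array of shape (n_frames, height, width, 3), channels
                assumed to be in R, G, B order.
    roi:        (left, top, side) of the square region in pixels.
    infrared:   if True use the mean of the three channels,
                otherwise use the green channel only.
    """
    left, top, side = roi
    frames = np.asarray(frames_rgb, dtype=float)
    patch = frames[:, top:top + side, left:left + side, :]
    channel = patch.mean(axis=-1) if infrared else patch[..., 1]
    return channel.sum(axis=(1, 2))
```

Calling the function once per region of interest yields the signals i1 to im for the time window.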

(32) The intensity and movement signals output from the process of FIG. 8 are input to step 114 of FIG. 6.