Device, system and method for skin detection

10818010 · 2020-10-27

Abstract

The present invention relates to a device, system and method for skin detection. To enable a reliable, accurate and fast detection the proposed device comprises an input interface (30) for obtaining image data of a scene, said image data comprising a time sequence of image frames, an extraction unit (31) for extracting a photoplethysmography (PPG) signal from a region of interest of said image data, a transformation unit (32) for transforming said PPG signal into a spectral signal, a sorting unit (33) for sorting said spectral signal to obtain a sorted spectral signal representing a descriptor, and a classifier (34) for classifying said region of interest as skin region of a living being or as non-skin region based on the descriptor.

Claims

1. A device for skin detection comprising: an input interface for obtaining image data of a scene, said image data comprising a time sequence of image frames, an extraction unit for extracting a photoplethysmography, PPG, signal from a region of interest of said image data, a transformation unit for transforming said PPG signal into a spectral signal, a sorting unit configured to divide said spectral signal into an in-band sub-signal covering a first frequency range of said spectral signal and an out-band sub-signal covering the remaining frequency range of said spectral signal and to separately sort said in-band sub-signal and said out-band sub-signal to obtain a sorted in-band sub-signal and a sorted out-band sub-signal representing the descriptor, and a classifier for classifying said region of interest as one of a skin region of a living being or a non-skin region based on the descriptor.

2. The device as claimed in claim 1, wherein said transformation unit is configured to transform said PPG signal into a spectral signal without phase information, in particular into one of a power spectrum or an absolute spectrum.

3. The device as claimed in claim 1, wherein said sorting unit is configured to divide said spectral signal such that the in-band sub-signal covers one of a lower portion of the frequency range of said spectral signal or a portion of the frequency range around a highest frequency peak of said spectral signal.

4. The device as claimed in claim 1, wherein said transformation unit is configured to normalize the spectral signal.

5. The device as claimed in claim 1, further comprising a control unit for controlling said transformation unit and said sorting unit to perform two or more iterations, wherein the sorted spectral signal output from said sorting unit is used as input PPG signal for the transformation unit in the next iteration.

6. The device as claimed in claim 1, wherein said classifier is configured to concatenate the sorted spectral signals output from said sorting unit in each iteration and use said concatenated sorted spectral signal as descriptor for classifying said region of interest as one of a skin region of a living being or a non-skin region.

7. The device as claimed in claim 1, wherein said extraction unit is configured to combine, in particular to average, image data values of a group of pixels of said image data per image frame to obtain said PPG signal from said combined image data values.

8. The device as claimed in claim 1, wherein said extraction unit is configured to combine, in particular to average, image data values of a group of pixels of said image data per image frame at a wavelength or in a wavelength range to obtain said PPG signal from said combined image data values.

9. The device as claimed in claim 1, wherein said extraction unit is configured to combine, per pixel or group of pixels and per image frame, image data values of at least two different wavelength channels as a weighted average to obtain said PPG signal from said combined image data values.

10. The device as claimed in claim 9, wherein said extraction unit is configured to compute said weights using a normalized blood volume pulse vector signature based method, a chrominance based method, a blind source separation method, a principal component analysis or an independent component analysis.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter. In the following drawings:

(2) FIG. 1 shows a schematic diagram of a first embodiment of a system according to the present invention,

(3) FIG. 2 shows a schematic diagram of a first embodiment of a device according to the present invention,

(4) FIGS. 3A and 3B show diagrams illustrating exemplary PPG signals from skin and non-skin regions for different living beings,

(5) FIGS. 4A, 4B, 4C, and 4D show diagrams illustrating exemplary signals at the various steps of a method according to an embodiment of the present invention,

(6) FIG. 5 shows a schematic diagram of a second embodiment of a device according to the present invention,

(7) FIG. 6 shows a diagram illustrating descriptors related to the processing according to the second embodiment, and

(8) FIGS. 7A and 7B show diagrams illustrating descriptors related to the processing according to a third embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

(9) FIG. 1 shows a schematic diagram of a system 10 according to the present invention including a device 12 for skin detection. The system 10 and device 12 may preferably be used in a device and method for detecting vital signs of a subject 14 from image data including a time sequence of image frames of the subject. The subject 14, in this example a patient, lies in a bed 16, e.g. in a hospital or other healthcare facility, but may also be a neonate or premature infant, e.g. lying in an incubator, or a person at home or in a different environment, such as an athlete doing sports.

(10) The imaging unit 18 may include a camera (also referred to as detection unit or as camera-based or remote PPG sensor) for acquiring image data (also called RGB images, which shall be understood as images in the wavelength range of visible and/or infrared light) of the scene, in particular for acquiring a sequence of image frames of the subject 14 over time, preferably including skin areas 15 of the subject 14 from which PPG signals can be derived. In an application of the device 12 for obtaining vital signs of the subject 14, the skin area 15 is preferably an area of the face, such as the cheeks or the forehead, but may also be another area of the body with visible skin surface, such as the hands or the arms.

(11) The image frames captured by the imaging unit 18 may particularly correspond to a video sequence captured by means of an analog or digital photosensor, e.g. in a (digital) camera. Such a camera usually includes a photosensor, such as a CMOS or CCD sensor, which may also operate in a specific spectral range (visible, nIR) or provide information for different spectral ranges, particularly enabling the extraction of PPG signals. The camera may provide an analog or digital signal. The image frames include a plurality of image pixels having associated pixel values. Particularly, the image frames include pixels representing light intensity values captured with different photosensitive elements of a photosensor. These photosensitive elements may be sensitive in a specific spectral range (i.e. representing a specific color). The image frames include at least some image pixels being representative of a skin portion of the person. Thereby, an image pixel may correspond to one photosensitive element of a photo-detector and its (analog or digital) output or may be determined based on a combination (e.g. through binning) of a plurality of the photosensitive elements.

(12) When using a camera 18 the system 10 may further optionally comprise an illumination unit 22 (also called illumination source or light source or electromagnetic radiator), such as a lamp or LED, for illuminating/irradiating a region of interest 24, such as the skin of the patient's face (e.g. part of the cheek or forehead), with light, for instance in a predetermined wavelength range or ranges (e.g. in the red, green and/or infrared wavelength range(s)). The light reflected from said region of interest 24 in response to said illumination is detected by the camera 18. In another embodiment no dedicated light source is provided, but ambient light is used for illumination of the subject 14. Of the reflected light, only light in the desired wavelength range(s) (e.g. green and red or infrared light, or light in a sufficiently large wavelength range covering at least two wavelength channels) may be detected and/or evaluated.

(13) The device 12 is further connected to an interface 20 for displaying the determined information and/or for providing medical personnel with an interface to change settings of the device 12, the camera 18, the illumination unit 22 and/or any other parameter of the system 10. Such an interface 20 may comprise different displays, buttons, touchscreens, keyboards or other human machine interface means.

(14) A system 10 as illustrated in FIG. 1 may, e.g., be located in a hospital, healthcare facility, elderly care facility or the like. Apart from the monitoring of patients, the present invention may also be applied in other fields such as neonate monitoring, general surveillance applications, security monitoring or so-called lifestyle environments, such as fitness equipment, a wearable, a handheld device like a smartphone, or the like. The uni- or bidirectional communication between the device 12, the camera 18 and the interface 20 may work via a wireless or wired communication interface. Other embodiments of the present invention may include a device 12, which is not provided stand-alone, but integrated into the camera 18 or the interface 20.

(15) FIG. 2 shows a schematic diagram of a first embodiment of a device 12a according to the present invention, which may be used as device 12 in the system 10 shown in FIG. 1. For deriving one or more vital signs of the subject 14 a skin area of the subject has to be found in the image data. For this purpose, the proposed device 12a comprises an input interface 30 for obtaining image data 40 of a scene, said image data comprising a time sequence of image frames acquired by the imaging unit 18. An extraction unit 31 extracts a PPG signal 41 from a region of interest of said image data, wherein said region of interest may be a single pixel or a group of pixels or an area resulting from a segmentation of one or more image frames. A transformation unit 32 transforms said PPG signal 41 into a spectral signal 42. A sorting unit 33 sorts said spectral signal 42 to obtain a sorted spectral signal 43 representing a descriptor. Finally, a classifier 34 classifies said region of interest as skin region of a living being or as non-skin region based on the descriptor and issues a corresponding classification result 44, which may be a binary decision (e.g. an indication whether or not the region of interest is a skin area) or a likelihood that the region of interest is a skin area.
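As an illustration of the extraction step, the following is a minimal sketch that averages the pixel values of a rectangular region of interest per image frame, which is one of the combinations the claims mention. The function name `extract_ppg`, the `(T, H, W)` frame layout, the rectangular ROI convention and the synthetic clip are assumptions made for this example only.

```python
import numpy as np

def extract_ppg(frames, roi):
    """Average the pixel values inside a rectangular ROI for every frame.

    frames: array of shape (T, H, W), single channel (assumed layout).
    roi: (top, bottom, left, right) pixel bounds (hypothetical convention).
    Returns a 1D signal with one sample per frame.
    """
    top, bottom, left, right = roi
    patch = np.asarray(frames, dtype=float)[:, top:bottom, left:right]
    return patch.mean(axis=(1, 2))

# Synthetic clip: 64 frames of sensor noise, with a weak periodic
# brightness change (4 cycles per 64 frames) inside an 8x8 region.
rng = np.random.default_rng(0)
t = np.arange(64)
frames = 100.0 + rng.normal(0.0, 1.0, size=(64, 32, 32))
frames[:, 8:16, 8:16] += 2.0 * np.sin(2 * np.pi * 4 * t / 64)[:, None, None]

ppg = extract_ppg(frames, (8, 16, 8, 16))
spectrum = np.abs(np.fft.rfft(ppg - ppg.mean()))
peak_bin = int(np.argmax(spectrum))
```

Averaging over the group of pixels reduces the per-pixel sensor noise, so the periodic component dominates the spectrum of the extracted signal.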

(16) The units 30 to 34 may be configured as dedicated hardware elements, but may also be configured as processor or computer, which is programmed accordingly. The device 12a may be configured as integrated device including all its elements and units, e.g. in a common housing (e.g. in a common housing of the imaging unit 18) or as distributed device, as shown in FIG. 1, in which the elements and units may be distributed, i.e. implemented as separate elements and units arranged at different positions.

(17) FIGS. 3A and 3B show diagrams illustrating exemplary PPG signals 41 from skin and non-skin regions for different living beings. In particular, FIG. 3A shows a pulse signal 41a from a first subject (e.g. an adult) and a pulse signal 41c from a second subject (e.g. a neonate). As can be seen, the pulse signals from different subjects are different. Further, even the pulse signal from a single subject is time-varying (e.g. in phase). Although the pulse generally has a clear periodic component, there may be variations in amplitude, phase and even frequency (e.g. heart rate variability), and typically the signal will also suffer from sensor noise and may be distorted by subject motion. FIG. 3B shows a noise signal 41b from the first subject (e.g. an adult) and a noise signal 41d from the second subject (e.g. a neonate). As can be seen, noise signals are irregular/erratic signals that cannot be learned, while pulse signals also exhibit significant variability. Hence, one idea is to transform obtained PPG signals into a different representation that allows supervised learning.

(18) Considering pulse and noise as two classes, the transformed representation (e.g. the descriptor) should eliminate three properties in PPG signals. The descriptor should be invariant to phase changes of pulse/noise, i.e. pulse at different moments. Further, the descriptor should not depend on the amplitude of pulse/noise. Still further, the descriptor should be independent of varying frequencies in pulse/noise, i.e. different subjects.

(19) Given the above requirements, the following exemplary approach is applied to the PPG signals 41, which is illustrated by use of FIGS. 4A to 4D showing diagrams illustrating exemplary signals at the various steps of the method according to the present invention. In a first step, spectrum boosting is applied to the PPG signals 41 (for illustration, a pulse signal 41a and a noise signal 41b are shown in FIG. 4A) by the transformation unit 32. Based on the valid assumption that pulse is a periodic signal, the transformation unit 32 transforms the PPG signal 41 from time domain to frequency domain for analysis, for instance by use of a Fourier Transform (FT). The transformed pulse 42a shown in FIG. 4B presents a significant peak in the frequency spectrum, whereas the transformed noise 42b shown in FIG. 4B is an irregular signal that does not show such a pattern. Hereby, the Fourier Transform (or Fast Fourier Transform; FFT) can be replaced by a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a Wavelet Transform, or a Haar Transform, etc.

(20) Using the FT, the transformation can be written as:
$$\vec{F}^{L} = \mathcal{F}(\vec{P}^{L}), \tag{1}$$
where $\vec{P}^{L}$ is the PPG signal with length $L = 2^{n}$, $\{n \mid n \in \mathbb{Z}, n \geq 2\}$, for the Fourier transform (e.g., L=64), and $\mathcal{F}(\cdot)$ denotes the FT operation. The real and imaginary parts of $\vec{F}^{L}$ contain varying phase information, which can be eliminated by just using the amplitude or power spectrum. Preferably, the power spectrum is used because it generally boosts the frequency peak of pulse and suppresses the noise. Since $\vec{F}^{L}$ is a mirrored spectrum with half redundancy, it is halved before deriving the power spectrum:
$$S^{L/2} = \vec{F}^{1 \to L/2} \odot \mathrm{conj}(\vec{F}^{1 \to L/2}), \tag{2}$$
where $\mathrm{conj}(\cdot)$ denotes the conjugation and $\odot$ denotes the element-wise product. In $S^{L/2}$, the phase information disappears, while the frequency peak of pulse is boosted as compared to that of noise, as shown in FIG. 4B.
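The boosting step of equations (1) and (2) can be sketched as follows. This is an illustrative NumPy version, not the implementation of the transformation unit 32; the function name `boost_spectrum` and the test signals are assumptions for the example.

```python
import numpy as np

def boost_spectrum(ppg):
    """Equations (1) and (2): FFT, keep the first half, take the power."""
    ppg = np.asarray(ppg, dtype=float)
    L = ppg.size
    F = np.fft.fft(ppg)        # Eq. (1): time domain -> frequency domain
    half = F[: L // 2]         # drop the mirrored, redundant half
    # Eq. (2): element-wise product with the conjugate. The result is
    # real-valued, so only the (zero) imaginary part is discarded.
    return (half * np.conj(half)).real

L = 64
t = np.arange(L)
pulse_a = np.sin(2 * np.pi * 5 * t / L)         # 5 cycles per window
pulse_b = np.sin(2 * np.pi * 5 * t / L + 1.0)   # same pulse, shifted phase

S_a = boost_spectrum(pulse_a)
S_b = boost_spectrum(pulse_b)
```

Both signals yield the same power spectrum with a single peak at bin 5, which illustrates that the phase information no longer influences the result.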

(21) In a next step, spectrum normalization may be performed by the transformation unit 32. This normalization makes the spectrum substantially independent of pulse and noise amplitudes in skin/non-skin areas, respectively. The spectrum amplitudes are still variant in $S^{L/2}$, which is therefore normalized as:

(22) $$\bar{S}^{L/2} = \frac{S^{L/2}}{\lVert S^{L/2} \rVert_{p}}, \tag{3}$$
where $\lVert \cdot \rVert_{p}$ denotes the Lp-norm. It can either be the L1-norm or the L2-norm. Normalization by the standard deviation is not preferred, since only the absolute energy information shall be eliminated, while the variation within the spectrum should remain for distinguishing pulse and noise. In an exemplary embodiment the L2-norm is used, because it can suppress noise with respect to the total energy. The normalized $\bar{S}^{L/2}$ is independent of spectrum amplitude, whereas the relative energy distribution of its entries is retained, as shown in FIG. 4C for a normalized spectral pulse signal 42a and a normalized spectral noise signal 42b.
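A minimal sketch of the normalization of equation (3) is given below. The function name `normalize_spectrum` and the guard against an all-zero spectrum are assumptions added for the example; the text itself only specifies division by the L1-norm or L2-norm.

```python
import numpy as np

def normalize_spectrum(S, p=2):
    """Equation (3): divide the spectrum by its Lp-norm (p = 1 or 2)."""
    S = np.asarray(S, dtype=float)
    norm = np.linalg.norm(S, ord=p)
    # Assumption: an all-zero spectrum is returned unchanged to avoid
    # a division by zero.
    return S / norm if norm > 0 else S

S = np.array([1.0, 4.0, 2.0, 1.0])
weak = normalize_spectrum(S)
strong = normalize_spectrum(10.0 * S)   # same shape, ten times the energy
```

The two normalized spectra coincide, so the absolute energy is removed while the relative distribution over the entries is kept.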

(23) In a next step, spectrum sorting is performed by the sorting unit 33. Due to the frequency variance, $\bar{S}^{L/2}$ cannot be used for classification yet. However, although different individuals have different heart rates, their pulse frequencies are mostly peaked and concentrated in a certain (lower) band, i.e., [40, 240] beats per minute (bpm), whereas the background (non-skin) signals are usually white noise that spreads into the high-frequency band. To this end, $\bar{S}^{L/2}$ is divided again, e.g. into two halves (also called sub-signals), preferably into lower and upper parts to approximate the in-band and out-band frequencies, so that the pulse-related property is implicitly exploited.

(24) To eliminate the frequency dependency, the divided spectra are sorted and then concatenated as:
$$\hat{S}^{L/2} = [\mathrm{sort}(\bar{S}^{1 \to L/4}), \mathrm{sort}(\bar{S}^{L/4 \to L/2})], \tag{4}$$
where $\mathrm{sort}(\cdot)$ denotes sorting the spectrum entries, for example in descending order of amplitude/energy. In $\hat{S}^{L/2}$, the frequency variance in pulse and noise is eliminated, but their essential differences in the lower band and upper band are preserved, as shown in FIG. 4D showing a sorted spectral pulse signal 43a and a sorted spectral noise signal 43b. Hence, in this step a ranking (sorting) procedure is essentially performed acting on the frequency bins.
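The split-and-sort step of equation (4) can be sketched as follows, assuming the simple variant in which the in-band part is the lower half and the out-band part is the upper half. The function name `sort_spectrum` and the two toy spectra are illustrative assumptions.

```python
import numpy as np

def sort_spectrum(S):
    """Equation (4): sort lower and upper halves separately, then join."""
    S = np.asarray(S, dtype=float)
    mid = S.size // 2
    in_band = np.sort(S[:mid])[::-1]    # descending order, lower half
    out_band = np.sort(S[mid:])[::-1]   # descending order, upper half
    return np.concatenate([in_band, out_band])

# Two toy spectra with the same shape but the peak at different bins,
# standing in for two subjects with different pulse rates.
spec_slow = np.array([0.1, 0.9, 0.2, 0.1, 0.05, 0.02, 0.03, 0.01])
spec_fast = np.array([0.1, 0.2, 0.1, 0.9, 0.02, 0.05, 0.01, 0.03])

d_slow = sort_spectrum(spec_slow)
d_fast = sort_spectrum(spec_fast)
```

Both spectra map to the same descriptor, because sorting discards the bin position of the peak while sorting the halves separately keeps the in-band and out-band energies apart.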

(25) An essential difference between the proposed approach and known approaches is that known approaches only use a single value (e.g., normalized spectrum peak) to separate pulse and noise, whereas the proposed approach exploits all the entries in a sorted spectrum for classification, which is in fact a shape descriptor. Essentially, according to the proposed approach the phenomenon is exploited that the energy of a pulse signal (in the ranked frequency spectrum) drops faster than the energy of the ranked noise spectrum. With a trained classifier using these ranked spectra as input an optimal decision can be obtained. In other words, in the above illustrated first embodiment the sorted spectral signals 43a, 43b are used by the classifier 34 to decide if the respective region of interest in the original image data is a skin region of a living being or is a non-skin region.

(26) FIG. 5 shows a schematic diagram of a second embodiment of a device 12b according to the present invention. In this embodiment a control unit 35 is provided for controlling said transformation unit 32 and said sorting unit 33 to perform two or more iterations, wherein the sorted spectral signal 43 output from said sorting unit 33 is used as input PPG signal 41 for the transformation unit 32 in the next iteration. Thus, according to this embodiment a multiscale iteration may be performed as will be explained in the following.

(27) With the first embodiment of the device 12a a transformed signal $\hat{S}^{L/2}$ is obtained given the input PPG signal $\vec{P}^{L}$, where pulse and noise have self-unified but mutually different interpretations. If the descriptors for pulse and noise are compared, the pulse descriptor has a salient feature (e.g., a peak at the first location), whereas the noise descriptor does not. To obtain better classification performance, the descriptors from different classes require large between-class variance, i.e. pulse and noise should be easily distinguishable. This can be improved by iterating the procedure (boost, normalize, sort). In the second iteration, the relatively flat noise spectrum translates into a clear peak, while the peaked pulse spectrum translates into a relatively flat result. The two iterations combined provide an anti-phase pattern between the two classes, which leads to easier separation.

(28) Similarly, the same transformation is further iterated on the transformed signals one or more times. The newly generated patterns in pulse and noise occur in an opposite order, i.e., peak-flat-peak versus flat-peak-flat, as shown in FIG. 6 for an obtained pulse signal descriptor 43a and a noise signal descriptor 43b. In the illustrated example the sorted spectral signals of five iterations are concatenated.

(29) In this way, a longer descriptor X is created to collect/concatenate the iteratively transformed signals in different scales:
$$X_{i+1} = [X_{i}, \hat{S}_{i}^{L/(2i)}], \quad \{i \mid i \in \mathbb{N}, 1 \leq i \leq \log_{2}(L)\}, \tag{5}$$
where $\hat{S}_{i}^{L/(2i)}$ is the transformed signal in the i-th iteration with length $L/(2i)$. When the iteration is finished, the complete descriptor may further be normalized by the L2-norm. In fact, the proposed descriptor is built on the hypothesis that multiscale iterations can improve the discriminativity of the descriptor. This hypothesis has been experimentally verified.
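The iteration can be sketched as below, as an illustration under stated assumptions rather than the implementation of the control unit 35. Each pass applies boost, normalize and sort to the output of the previous pass, and the outputs are concatenated and finally L2-normalized. The names `transform_once` and `multiscale_descriptor`, the default of five passes (taken from the illustrated example) and the stopping rule for very short signals are choices made for this example. In this sketch every pass halves the signal length, because the halved spectrum of one pass is the input of the next.

```python
import numpy as np

def transform_once(x, p=2):
    """One pass: power spectrum of the half FFT, Lp-norm, split sort."""
    x = np.asarray(x, dtype=float)
    half = np.fft.fft(x)[: x.size // 2]
    S = (half * np.conj(half)).real
    norm = np.linalg.norm(S, ord=p)
    if norm > 0:                      # assumption: skip all-zero spectra
        S = S / norm
    mid = S.size // 2
    return np.concatenate([np.sort(S[:mid])[::-1], np.sort(S[mid:])[::-1]])

def multiscale_descriptor(ppg, n_iter=5):
    """Concatenate the outputs of successive passes, then L2-normalize."""
    x = np.asarray(ppg, dtype=float)
    parts = []
    for _ in range(n_iter):
        if x.size < 2:                # assumption: stop when too short
            break
        x = transform_once(x)         # output feeds the next pass
        parts.append(x)
    X = np.concatenate(parts)
    norm = np.linalg.norm(X)
    return X / norm if norm > 0 else X

L = 64
t = np.arange(L)
pulse = np.sin(2 * np.pi * 5 * t / L)
X_pulse = multiscale_descriptor(pulse)
```

For L=64 and five passes the parts have lengths 32, 16, 8, 4 and 2, so the descriptor has 62 entries. For this clean pulse the parts alternate between peaked and flat.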

(30) Thus, the iteration acting on the output of the previous iteration is preferably started after halving the length of the signal. In this case, the iteration makes a multi-scale representation of the spectrum available to the classifier. Particularly the first iteration leads to a relatively peaked signal representing the non-tissue, due to the elimination of the phase in the noise frequency components. For this reason, at least two sequential transforms may be performed: FFT → delete phase → normalize → rank → FFT → delete phase → normalize → rank, where for efficiency the second transform may act on the half spectrum obtained from the first iteration.

(31) Furthermore, the discriminativity between pulse and noise representations may be further improved. In an embodiment the flat/peaked patterns in the transformed descriptor (43a, 43b) can be made even more flat/peaked. Equation (2) uses a single signal to derive the power spectrum. This may be improved by using two temporally adjacent signals (with one frame shifted). It mainly benefits the noise class: the conjugation of two noise signals induces negative entries in the real part of the power spectrum. This is due to the high-frequency components in noise signals, i.e., background (non-skin) signals are mostly white noise and thus exploited here. Subtracting the minimal negative value in the spectrum can make the noise descriptor more flat in the first iteration, as shown in FIG. 7B showing the thus obtained noise descriptor 43c compared to the original noise descriptor 43b shown in FIG. 7A (and FIG. 4D). Therefore, in such an embodiment the boosting step may be modified by conjugating two temporally adjacent signals (with one frame shifted) instead of a single signal.
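The modified boosting step can be sketched as follows, assuming that the two temporally adjacent signals are two windows of the same PPG trace shifted by one frame. The function names `cross_spectrum` and `lift_negative` and the synthetic signals are assumptions for this example.

```python
import numpy as np

def cross_spectrum(signal, L=64):
    """Real part of the product of two half spectra, one frame apart."""
    signal = np.asarray(signal, dtype=float)   # needs at least L + 1 samples
    first = np.fft.fft(signal[:L])[: L // 2]
    second = np.fft.fft(signal[1 : L + 1])[: L // 2]
    return (first * np.conj(second)).real

def lift_negative(S):
    """Subtract the minimum if it is negative, so no entry is below zero."""
    low = S.min()
    return S - low if low < 0 else S

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=65)
raw_noise = cross_spectrum(noise)
flat_noise = lift_negative(raw_noise)

t = np.arange(65)
pulse = np.sin(2 * np.pi * 5 * t / 64)
raw_pulse = cross_spectrum(pulse)
```

For the low-frequency pulse the one-frame shift changes the phase only slightly, so the peak stays positive. For white noise the high-frequency bins turn negative, and subtracting the minimum raises every entry by the same offset.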

(32) As illustrated above, in a preferred embodiment, it is proposed to disregard the phase information, but to take the absolute spectrum or the power spectrum.

(33) Preferably, the ranking is done twice: an in-band ranking and an out-band ranking. In-band can simply be the lower half of an oversampled signal, and out-band the upper-half. However, in a more sophisticated version, in-band may be defined as a window around the highest frequency peak, e.g. with half of the total bin-number, out-band then is formed by the remaining frequency bins.
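The more sophisticated in-band definition can be sketched as follows: a window with half of the total number of bins is placed around the highest peak, and the remaining bins form the out-band. Centering the window on the peak and shifting it inward at the spectrum borders are assumptions of this example, as is the function name `peak_window_descriptor`.

```python
import numpy as np

def peak_window_descriptor(S):
    """Rank an in-band window around the highest peak and the remainder."""
    S = np.asarray(S, dtype=float)
    n = S.size
    width = n // 2                       # half of the total bin number
    peak = int(np.argmax(S))
    # Assumption: center the window on the peak, clamped to the spectrum.
    start = min(max(peak - width // 2, 0), n - width)
    in_mask = np.zeros(n, dtype=bool)
    in_mask[start : start + width] = True
    in_band = np.sort(S[in_mask])[::-1]    # in-band ranking
    out_band = np.sort(S[~in_mask])[::-1]  # out-band ranking
    return np.concatenate([in_band, out_band])

# Toy spectrum whose peak lies in the upper half of the bins.
S = np.array([0.02, 0.01, 0.03, 0.05, 0.1, 0.2, 0.9, 0.1])
d = peak_window_descriptor(S)
```

With a fixed lower-half split the peak of this toy spectrum would land in the out-band part. The peak-centered window keeps it in the in-band part.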

(34) The classification may use a classifier obtained from supervised learning (e.g. AdaBoost, SVM, etc.), taking the samples of the transformed signal as input (e.g. ranked, normalized frequency bins without phase information) and outputting a signal (hard (binary) label, or regression values) identifying the likelihood of an image segment to be alive-human-tissue or not. Although the supervised learning may use actual data obtained from skin and non-skin surfaces, good performance has been obtained by training the classifier using a dataset of 1D time signals including sinusoids with varying amplitudes, levels of noise, and frequencies in the pulse-rate band to represent the segments containing alive human tissue and noise signals (zero-mean Gaussian, or uniform, etc.) representing segments that do not contain alive-human-tissue.
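Training on synthetic 1D signals can be sketched as follows. The text names AdaBoost and SVM as example classifiers. To keep the example self-contained, a nearest-centroid rule is swapped in here, which is not one of the classifiers named above. The frame rate `FS`, the amplitude and noise ranges, the dataset sizes and all function names are assumptions for this example.

```python
import numpy as np

FS = 20.0   # assumed camera frame rate in Hz (not given in the text)
L = 64      # window length, as in the L=64 example

def descriptor(x):
    """One boost-normalize-sort pass (power spectrum, L2-norm, split sort)."""
    half = np.fft.fft(x)[: x.size // 2]
    S = (half * np.conj(half)).real
    S = S / max(np.linalg.norm(S), 1e-12)
    mid = S.size // 2
    return np.concatenate([np.sort(S[:mid])[::-1], np.sort(S[mid:])[::-1]])

def make_dataset(n, rng):
    """n noisy sinusoids (label 1) and n Gaussian noise signals (label 0)."""
    t = np.arange(L) / FS
    X, y = [], []
    for _ in range(n):
        amp = rng.uniform(0.5, 2.0)
        freq = rng.uniform(40.0, 240.0) / 60.0   # pulse-rate band in Hz
        phase = rng.uniform(0.0, 2 * np.pi)
        sigma = amp * rng.uniform(0.0, 0.3)      # varying noise level
        pulse = amp * np.sin(2 * np.pi * freq * t + phase)
        pulse = pulse + rng.normal(0.0, sigma, L)
        X.append(descriptor(pulse))
        y.append(1)
        X.append(descriptor(rng.normal(0.0, 1.0, L)))  # zero-mean Gaussian
        y.append(0)
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
X_train, y_train = make_dataset(200, rng)
X_test, y_test = make_dataset(100, rng)

# Stand-in classifier: assign each sample to the nearest class centroid.
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
dist = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
pred = dist.argmin(axis=1)
accuracy = float((pred == y_test).mean())
```

The sketch only shows how ranked, normalized, phase-free spectra can serve as classifier input. A classifier trained this way would still have to be validated on real skin and non-skin data.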

(35) Still further, the proposed method may be applied to classify image regions obtained from segmentation, where possibly motion tracking may be used to track individual segments over time in successive images.

(36) The present invention is preferably applied in the field of rPPG for the acquisition of vital signs of the person. Thus, images obtained by an imaging unit are not only used for detecting skin areas as explained above, but from detected (and preferably tracked, also by use of the present invention) skin areas PPG signals are derived, which are used for deriving vital signs of the person, such as heartbeat, SpO2, etc. The imaging unit 18 is at least sensitive at the wavelength(s) or wavelength ranges, in which the scene is illuminated (by ambient light and/or by illumination), but may be sensitive for other wavelengths as well, in particular if required for obtaining the desired vital signs.

(37) In another embodiment of the present invention, the proposed analysis for skin detection can be combined with another method for skin detection, e.g. the analysis of chrominance or temporal pulsatility of structured light reflected from the skin area as generally known. The method may comprise further steps and may be modified as explained above for the various embodiments of the device and as disclosed herein.

(38) The proposed device and method can be used for continuous unobtrusive monitoring of PPG related vital signs (e.g. heartbeat, SpO2, respiration), and can be used in NICU, Operation Room, or General Ward. The proposed device and method can also be used for personal health monitoring. Generally, the present invention can be used in all applications where skin needs to be detected in an image of a scene and, in particular, needs to be distinguished from non-skin.

(39) While the invention has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims.

(40) In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

(41) A computer program may be stored/distributed on a suitable non-transitory medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems.

(42) Any reference signs in the claims should not be construed as limiting the scope.