ULTRASONIC-BASED PERSON DETECTION SYSTEM AND METHOD

20220373666 · 2022-11-24

    Abstract

    An ultrasonic-based person detection method, comprising the steps of: (a) emitting, from an emitter, an ultrasonic signal, the ultrasonic signal including a component at a first frequency; (b) receiving reflections of the ultrasonic signal, the received signal including components at frequencies greater than and less than the first frequency; (c) determining a difference between an upper portion of the received signal containing a frequency higher than the first frequency, and a lower portion of the received signal containing a frequency lower than the first frequency; and (d) determining, based on the difference between the upper portion and the lower portion, whether a person is present.

    Claims

    1. An ultrasonic-based person detection method, comprising the steps of: (a) emitting, from an emitter, an ultrasonic signal, the ultrasonic signal including a component at a first frequency, f.sub.0; (b) receiving reflections of the ultrasonic signal, the received signal including components at frequencies greater than and less than the first frequency; (c) determining a difference between an upper portion of the received signal containing a frequency higher than the first frequency, and a lower portion of the received signal containing a frequency lower than the first frequency; and (d) determining, based on the difference between the upper portion and the lower portion, whether a person is present.

    2. The method of claim 1, wherein the determination is based on a difference between the upper frequency portion and the lower frequency portion.

    3. The method of any preceding claim, wherein the upper portion of the received signal contains higher frequencies immediately adjacent to the first frequency, and the lower portion of the received signal contains lower frequencies immediately adjacent to the first frequency.

    4. The method of any of claims 1 to 3, wherein the method includes dividing the received signal into a plurality of bins, each bin representing a range of frequencies in the received signal, and wherein the upper portion is an upper frequency bin, containing portions of the received signal which are higher in frequency than the first frequency, and the lower portion is a lower frequency bin, containing portions of the received signal which are lower in frequency than the first frequency.

    5. The method of claim 4, wherein the determination is performed based on a difference between a normalised power estimate of the upper frequency bin and a normalised power estimate of the lower frequency bin.

    6. The method of claim 5, wherein the normalisation factor is the sum of the power estimates of the upper frequency bin and the lower frequency bin.

    7. The method of any of claims 4-6, wherein determining the presence of a person includes determining a logit function of a normalised power of the upper frequency bin.

    8. The method of claim 7, wherein the logit function takes the form: $L(t) = \operatorname{logit}\left(\frac{|X(t, f_0+1)|^2}{|X(t, f_0-1)|^2 + |X(t, f_0+1)|^2}\right)$ where X(t, f.sub.0+1) is a coefficient representing the upper frequency bin at time t, and X(t, f.sub.0−1) is a coefficient representing the lower frequency bin at time t.

    9. The method of any preceding claim, wherein steps (b)-(d) are repeated at a predetermined rate.

    10. The method of any preceding claim, further comprising a step, after it has been determined that a person is present, of determining whether the person is moving towards or away from the receiver.

    11. The method of claim 10, wherein determining whether the person is moving towards or away from the receiver is further based on a first likelihood ratio test, for determining whether the person is moving towards the receiver; and a second likelihood ratio test, for determining whether the person is moving away from the receiver.

    12. The method of claim 11, wherein a log-likelihood ratio is derived for each likelihood ratio, and is computed recursively from a previous value of the respective log-likelihood ratio.

    13. The method of any preceding claim, wherein when it has been determined that a person is present, the method includes taking a video conferencing device out of standby mode.

    14. A system for detecting a person, the system including: an emitter, configured to emit an ultrasonic signal including a component at a first frequency, f.sub.0; one or more receivers, configured to receive reflections of the ultrasonic signal; and one or more processors, configured, in response to the receiver receiving a received signal including components at frequencies greater than and less than the first frequency, to: (a) determine a difference between an upper portion of the received signal containing a frequency higher than the first frequency, and a lower portion of the received signal containing a frequency lower than the first frequency; and (b) determine, based on the difference between the upper portion and the lower portion, whether a person is present.

    15. The system of claim 14, wherein the determination is based on a difference between the upper portion of the received signal and the lower portion of the received signal.

    16. The system of either of claims 14 or 15, wherein the upper portion of the received signal contains higher frequencies immediately adjacent to the first frequency, and the lower portion of the received signal contains lower frequencies immediately adjacent to the first frequency.

    17. The system of any of claims 14-16, wherein the processor(s) are further configured to divide the received signal into a plurality of bins, each bin representing a range of frequencies in the received signal, and wherein the upper portion is an upper frequency bin, containing portions of the received signal which are higher in frequency than the first frequency, and the lower portion is a lower frequency bin, containing portions of the received signal which are lower in frequency than the first frequency.

    18. The system of claim 17, wherein the determination is performed based on a difference between a normalised power estimate of the upper frequency bin and a normalised power estimate of the lower frequency bin.

    19. The system of claim 18, wherein the normalisation factor is the sum of the power estimates of the upper frequency bin and the lower frequency bin.

    20. The system of any of claims 17 to 19, wherein determining the presence of a person includes determining a logit function of a normalised power of the upper frequency bin.

    21. The system of claim 20, wherein the logit function takes the form: $L(t) = \operatorname{logit}\left(\frac{|X(t, f_0+1)|^2}{|X(t, f_0-1)|^2 + |X(t, f_0+1)|^2}\right)$ where X(t, f.sub.0+1) is a coefficient representing the upper frequency bin at time t, and X(t, f.sub.0−1) is a coefficient representing the lower frequency bin at time t.

    22. The system of any of claims 14 to 21, wherein the processor is configured to repeat steps (a)-(b) at a predetermined rate.

    23. The system of any of claims 14 to 22, wherein the processor is further configured to determine, after it has been determined that a person is present, whether the person is moving towards or away from the receiver.

    24. The system of claim 23, wherein determining whether the person is moving towards or away from the receiver is further based on a first likelihood ratio test, for determining whether the person is moving towards the receiver; and a second likelihood ratio test, for determining whether the person is moving away from the receiver.

    25. The system of any of claims 14 to 24, wherein when it has been determined that a user is present, the processor is configured to take a video conferencing device out of standby mode.

    Description

    BRIEF DESCRIPTION OF THE DRAWINGS

    [0046] Embodiments of the invention will now be described by way of example with reference to the accompanying drawings in which:

    [0047] FIG. 1 shows a system according to embodiments of the present invention;

    [0048] FIG. 2 shows a spectrogram (time-frequency plot) of a point in space near the microphone of the system in FIG. 1;

    [0049] FIG. 3 is a flow diagram of a method according to embodiments of the present invention;

    [0050] FIG. 4 shows plots of |X(t, f.sub.0−1)|.sup.2, |X(t, f.sub.0+1)|.sup.2, and

    [00003] $$L(t) = \operatorname{logit}\left(\frac{|X(t, f_0+1)|^2}{|X(t, f_0-1)|^2 + |X(t, f_0+1)|^2}\right)$$

    when a wideband signal is received;

    [0051] FIG. 5 shows a plot of L(t) together with a corresponding histogram when no motion is occurring;

    [0052] FIG. 6 shows plots of |X(t, f.sub.0−1)|.sup.2, |X(t, f.sub.0+1)|.sup.2, and L(t) when motion is occurring;

    [0053] FIG. 7 shows a plot of L(t), and a plot of corresponding log-likelihood ratios, and detection threshold; and

    [0054] FIG. 8 is a flow diagram of a method according to a variant embodiment of the present invention.

    DETAILED DESCRIPTION AND FURTHER OPTIONAL FEATURES

    [0055] Aspects and embodiments of the present invention will now be discussed with reference to the accompanying figures. Further aspects and embodiments will be apparent to those skilled in the art.

    [0056] FIG. 1 shows a room including the system of the present invention. The system includes an ultrasonic emitter 101, which emits an ultrasonic signal 102 at a first frequency, f.sub.0 (also referred to as a tone). In this example, f.sub.0 is 22000 Hz, but it may take any ultrasonic frequency value (e.g. at least 20 kHz and no more than 24 kHz). The emitter continuously emits the tone. In this example, the emitter is a speaker also used in a video conferencing device. The system also includes a receiver 103, in this example a microphone also part of the video conferencing device. The receiver is configured to detect not only the ultrasonic signal at f.sub.0 but also reflections of the signal which have been Doppler shifted. The system also includes one or more processors (not shown), which are configured to use the signal received from the receiver 103 to determine if a person is present in the room.

    [0057] As the ultrasonic signal propagates through the room, it reflects from various objects and/or interfaces. For example, after reflecting from a wall, an un-shifted reflection 104, i.e. one still at f.sub.0, is returned to the receiver 103. This un-shifted reflection is ignored, as it provides little information on the presence of people (indicated by movement) within the room. By contrast, after reflecting from person 105, who is moving towards the receiver 103, upshifted reflection 106 is returned to the receiver. The upshifted reflection 106 has a frequency higher than f.sub.0. This upshifted reflection provides information relating to the presence of a person within the room, and particularly that the person is moving towards the receiver 103. Similarly, after reflecting from person 107, who is moving away from the receiver 103, downshifted reflection 108 is returned to the receiver. The downshifted reflection 108 has a frequency lower than f.sub.0. This downshifted reflection also provides information relating to the presence of a person within the room, and particularly that the person is moving away from the receiver 103.
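
    As a rough illustration of the size of these shifts, the sketch below computes the frequency of a reflection from a target moving directly towards or away from a stationary, co-located emitter and receiver. The textbook relation f.sub.0·(c+v)/(c−v), the speed of sound of 343 m/s, the walking speed of 0.5 m/s and the function name are assumptions made for illustration; none of them is taken from this disclosure.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def reflected_frequency_hz(f0_hz: float, radial_speed_m_s: float,
                           c: float = SPEED_OF_SOUND_M_S) -> float:
    """Frequency of a reflection from a target moving at radial_speed_m_s.

    Positive speed means motion towards the receiver (upshift),
    negative speed means motion away from it (downshift).
    Assumes a stationary, co-located emitter and receiver.
    """
    return f0_hz * (c + radial_speed_m_s) / (c - radial_speed_m_s)


f0 = 22000.0
towards = reflected_frequency_hz(f0, 0.5) - f0   # shift for an approaching person
away = reflected_frequency_hz(f0, -0.5) - f0     # shift for a receding person
print(f"towards: {towards:+.1f} Hz, away: {away:+.1f} Hz")
```

    Under these assumptions a slow walk shifts the 22000 Hz tone by a few tens of hertz in either direction, so the shifted reflections fall close to the emitted tone.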

    [0058] However, as discussed previously, transient noises 110 such as those generated by a door 109 slamming or hands clapping (which may originate from outside of the room) have a relatively broad frequency range and may contain components which have the same or similar frequency to the upshifted or downshifted components. These transient noises, which do not originate from emitter 101, can be interpreted by the receiver (or the processors connected thereto) as indicating the presence of a person.

    [0059] FIG. 2 is a spectrogram (time-frequency plot) of a region of space near the receiver which illustrates this principle. A tone is emitted by the emitter at 22000 Hz, and so provides a narrow band of signal which extends over a long period of time. At t.sub.0, a person walks towards the receiver at a first speed, and so an upshifted signal 201 is received by the receiver. In this example, the person then increases their speed towards the receiver, resulting in a further upshifted signal 202 which is received by the receiver. The person then halts, and no upshifted signal is received. At t.sub.1, the person then walks away from the receiver, and so a downshifted signal 203 is received by the receiver.

    [0060] Next, at time t.sub.2, a transient signal 204 is received by the receiver. The signal is transient in that it has a limited extent along the time (‘x’) axis. However, the transient signal includes components at the same frequencies as the upshifted signal 201, further upshifted signal 202, and downshifted signal 203. There is a risk, then, that a processor connected to the receiver may interpret transient signal 204 as being indicative of a person being present.

    [0061] FIG. 3 is a flow diagram of a method according to embodiments of the present invention. In a first step, 301, the ultrasonic tone is emitted at frequency f.sub.0. Next, in step 302, the signal received by the one or more receivers is transformed from a microphone frame, i.e. a short time window of the microphone signal (e.g. 20 ms), into the time-frequency domain using a filter bank. This results in a plurality of coefficients describing a plurality of time-frequency bins denoted as X(t, f), where t=0, 1, 2, . . . is the time frame index, f=0, 1, 2, . . . , K−1 is the frequency bin index, and K is the discrete Fourier transform (DFT) size. The filter bank is designed so that the filters are sharp (with little leakage from neighbouring frequency bins) and have a sufficiently narrow bandwidth to detect slow walking speeds, e.g. 65 Hz.
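
    Step 302 might be sketched as follows. This is a minimal illustration, not the filter bank of the embodiment: it assumes a 48 kHz sample rate, non-overlapping 20 ms frames, and a Hann-windowed FFT as a simple stand-in for a sharp filter bank. The function names are hypothetical. A Hann window leaks into the bins adjacent to the tone, so a practical design would likely need sharper filters, as the paragraph above requires.

```python
import numpy as np

SAMPLE_RATE_HZ = 48000                           # assumed sample rate
FRAME_MS = 20                                    # frame length from the example in the text
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000    # 960 samples, i.e. 50 Hz bin spacing


def time_frequency_bins(signal: np.ndarray) -> np.ndarray:
    """Return X[t, f] for consecutive, non-overlapping frames.

    Only the non-negative frequency bins are returned (rfft), so the
    second axis has FRAME_LEN // 2 + 1 entries rather than K.
    """
    n_frames = len(signal) // FRAME_LEN
    frames = signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)
    window = np.hanning(FRAME_LEN)               # assumed window choice
    return np.fft.rfft(frames * window, axis=1)


def tone_bin_index(f0_hz: float) -> int:
    """Index of the frequency bin containing the emitted tone."""
    return int(round(f0_hz * FRAME_LEN / SAMPLE_RATE_HZ))


# Usage: one second of a pure 22 kHz tone.
t = np.arange(SAMPLE_RATE_HZ) / SAMPLE_RATE_HZ
X = time_frequency_bins(np.sin(2 * np.pi * 22000.0 * t))
k0 = tone_bin_index(22000.0)
```

    With these assumed parameters the tone falls exactly on one bin, and the bins on either side of it are the ones compared in the following step.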

    [0062] Next, in step 303, a logit of the normalised Doppler shift power is computed. Let f.sub.0 denote the frequency bin index that contains the emitted tone's frequency (e.g. 22000 Hz). The logit of the normalised power of the Doppler shift is then defined as:

    [00004] $$L(t) = \operatorname{logit}\left(\frac{|X(t, f_0+1)|^2}{|X(t, f_0-1)|^2 + |X(t, f_0+1)|^2}\right)$$

    [0063] Where |·| denotes the absolute value, and

    [00005] $$\operatorname{logit}(p) = \ln\left(\frac{p}{1-p}\right)$$

    is the logit function for p. The argument of the logit function, i.e.

    [00006] $$p = \frac{|X(t, f_0+1)|^2}{|X(t, f_0-1)|^2 + |X(t, f_0+1)|^2}$$

    is the normalised power estimate of the frequency bin above f.sub.0 and the normalisation factor is the sum of the power estimates of the frequency bins above and below f.sub.0.

    [0064] This means that p is a number between zero and one, and can be likened to a probability. The logit function then maps this value onto the whole real line, so that L(t) can take any value between −∞ and +∞.
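
    The computation of step 303 can be sketched as below, assuming X is a NumPy array of complex coefficients indexed as X[t, f] and k0 is the index of the bin containing the tone. The function name and the small constant eps, which guards against division by zero and log(0), are assumptions and not part of the formula above.

```python
import numpy as np


def doppler_logit(X: np.ndarray, k0: int, eps: float = 1e-12) -> np.ndarray:
    """L(t) = logit(p), p = |X(t,k0+1)|^2 / (|X(t,k0-1)|^2 + |X(t,k0+1)|^2)."""
    upper = np.abs(X[:, k0 + 1]) ** 2 + eps   # power just above the tone
    lower = np.abs(X[:, k0 - 1]) ** 2 + eps   # power just below the tone
    p = upper / (upper + lower)               # normalised upper-bin power
    q = lower / (upper + lower)               # equals 1 - p, computed directly
    return np.log(p) - np.log(q)              # ln(p / (1 - p))


# Usage: three frames with equal, lower-heavy and upper-heavy power.
X = np.zeros((3, 8), dtype=complex)
X[:, 3] = [1.0, 2.0, 1.0]   # bin k0 - 1
X[:, 5] = [1.0, 1.0, 2.0]   # bin k0 + 1
L = doppler_logit(X, k0=4)
```

    Equal power in the two bins gives a value of zero, more power below the tone gives a negative value, and more power above the tone gives a positive value.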

    [0065] After this has been calculated for a given time-window, the method moves to steps 304 and 307 which are performed simultaneously. In step 304, a first log-likelihood ratio, log-likelihood ratio 0, is updated based on the computed logit, to indicate how likely it is that there is movement towards the receiver. At the same time, in step 307, a second log-likelihood ratio, log-likelihood ratio 1, is updated based on the computed logit, to indicate how likely it is that there is movement away from the receiver.

    [0066] In general, likelihood ratios do not have a closed-form expression, and so they can be computationally expensive to compute. However, since the values of L(t) have been found to be approximately independent and normally distributed, simple expressions for the log-likelihood ratios can be derived.

    [0067] Log-likelihood ratios, of the type known per se in the art, have the general expression:

    [00007] $$\mathrm{LLR}_x = \ln\left(\frac{p_{x|h_1}}{p_{x|h_0}}\right)$$

    [0068] Where p.sub.x|h.sub.1 is the likelihood of there being motion towards or away from the receiver, and p.sub.x|h.sub.0 is the likelihood of there being no motion. See, for example, The CuSum Algorithm—a small review, Pierre Granjon, the contents of which are incorporated herein by reference.

    [0069] Further, the log-likelihood ratios can be computed recursively, using the previous value and the new value of L(t). The initialisation of the log-likelihood ratios may include initialising them to zero, meaning that the initial likelihood ratio is one. This means that, at initialisation, the likelihood for motion is the same as the likelihood for no motion. Letting LLR.sub.0(t) denote the log-likelihood ratio of motion towards the receiver and LLR.sub.1(t) denote the log-likelihood ratio of motion away from the receiver, the update equations of the log-likelihood ratios can be specified as:

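    [00008] $$\mathrm{LLR}_0(t) = \max\left(\mathrm{LLR}_0(t-1) + \frac{\delta}{\mathrm{var}} \times \left(L(t) - \frac{\delta}{2}\right),\ 0\right)$$

    $$\mathrm{LLR}_1(t) = \max\left(\mathrm{LLR}_1(t-1) - \frac{\delta}{\mathrm{var}} \times \left(L(t) + \frac{\delta}{2}\right),\ 0\right)$$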

    [0070] In these expressions, δ is the expected change in magnitude, i.e. the expected deviation in the mean of L(t) from zero mean upon motion. This is a constant which is set during an initialisation stage. The variance of L(t) is denoted as var. This is either set to a fixed value during the initialisation stage, or estimated as the values of L(t) are computed.
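
    A minimal sketch of one recursive update is given below. The function and variable names are hypothetical, and the value var=2.0 in the usage lines is an arbitrary assumption chosen only to make the arithmetic easy to follow.

```python
from typing import Tuple


def update_llrs(llr_towards: float, llr_away: float, logit_value: float,
                delta: float, var: float) -> Tuple[float, float]:
    """One recursive update of LLR_0 (towards) and LLR_1 (away).

    Both ratios are clamped at zero, as in the update equations.
    """
    gain = delta / var
    llr_towards = max(llr_towards + gain * (logit_value - delta / 2.0), 0.0)
    llr_away = max(llr_away - gain * (logit_value + delta / 2.0), 0.0)
    return llr_towards, llr_away


# Usage: both ratios start at zero; feed a short run of mostly positive logits.
llr0, llr1 = 0.0, 0.0
for value in [0.1, 4.0, 6.0, 5.5]:
    llr0, llr1 = update_llrs(llr0, llr1, value, delta=5.0, var=2.0)
```

    In this run the first, near-zero value leaves both ratios at zero, the later positive values accumulate in the "towards" ratio, and the "away" ratio stays clamped at zero throughout.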

    [0071] Once the log-likelihood ratios are calculated using some or all of the information from the computed logit, each log-likelihood ratio is compared to a threshold in steps 305 and 308. If one of the log-likelihood ratios exceeds its threshold, ‘Yes’ in steps 305 and/or 308, then motion towards or away from the receiver can be determined in steps 306 and 309 respectively.

    [0072] Once motion has been detected, or not (‘No’ in steps 305 and 308), the method returns to step 302 for a new time-window. In this way, the motion detection method can operate continuously. In the example discussed below, the value of δ was selected as 5, and var was estimated from the values of L(t). In one example, an estimated value for var is obtained using the maximum likelihood estimator for L(t) in a time window when it was known that no motion was present. The maximum likelihood estimate can be calculated as the average of L(t).sup.2, for t in the time window when it is known that there is no motion.
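
    The variance estimate described above can be sketched as follows. The synthetic samples stand in for values of L(t) recorded in a room known to be empty; their standard deviation of 1.5 and the function name are assumptions for illustration only.

```python
import numpy as np


def estimate_variance(no_motion_logits: np.ndarray) -> float:
    """Zero-mean maximum likelihood variance estimate: the average of L(t)^2."""
    return float(np.mean(np.square(no_motion_logits)))


# Usage with synthetic, zero-mean samples standing in for a quiet room.
rng = np.random.default_rng(0)
quiet = rng.normal(loc=0.0, scale=1.5, size=5000)
var = estimate_variance(quiet)
```

    Because the mean of L(t) is taken to be zero when there is no motion, the estimator averages the squared values directly rather than subtracting a sample mean first.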

    [0073] The logit function discussed above is particularly well suited for motion detection, for three reasons: (1) transient noise immunity; (2) normally distributed values; and (3) indicative of the direction of motion.

    [0074] Taking point (1) first, FIG. 4 shows plots of the lower frequency bin |X(t, f.sub.0−1)|.sup.2, the upper frequency bin |X(t, f.sub.0+1)|.sup.2, and

    [00009] $$L(t) = \operatorname{logit}\left(\frac{|X(t, f_0+1)|^2}{|X(t, f_0-1)|^2 + |X(t, f_0+1)|^2}\right),$$

    when a transient, broadband signal is received.

    [0075] The upper graph in FIG. 4 is a plot of |X(t, f.sub.0−1)|.sup.2 against time, and so a plot of the power of the frequency bin immediately below f.sub.0. As can be seen by the two peaks, approximately at 3 seconds and 5 seconds, this frequency bin encapsulates components of the transient, broadband noise. The middle graph in FIG. 4 is a plot of |X(t, f.sub.0+1)|.sup.2 against time, and so a plot of the power of the frequency bin immediately above f.sub.0. Again, two peaks can be seen at approximately 3 and 5 seconds. It can be determined then that the transient, broadband signal adds approximately equally to both the upper and lower portions of the signal.

    [0076] Thus, as seen in the lower graph in FIG. 4 which is a plot of L(t), the approximately equal contributions in the upper and lower portions of the signal are cancelled out by the ratio in L(t) and so the logit function has noise immunity to transient, broadband, noises or signals.
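
    One way to see this cancellation is to note that the logit of the normalised power reduces to a log power ratio. The shorthand a = |X(t, f.sub.0+1)|.sup.2 and b = |X(t, f.sub.0−1)|.sup.2 is introduced here for brevity and does not appear elsewhere in this disclosure.

```latex
\begin{aligned}
L(t) &= \operatorname{logit}\!\left(\frac{a}{a+b}\right)
      = \ln\!\left(\frac{a/(a+b)}{1 - a/(a+b)}\right) \\
     &= \ln\!\left(\frac{a/(a+b)}{b/(a+b)}\right)
      = \ln\!\left(\frac{a}{b}\right).
\end{aligned}
```

    If a broadband transient adds roughly the same power n to both bins (a simplifying assumption that the powers add), the ratio becomes (a+n)/(b+n). When a and b are approximately equal before the transient, this ratio stays close to one, and so L(t) stays close to zero.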

    [0077] Turning next to point (2), the normally distributed values, FIG. 5 shows a plot of L(t) together with a corresponding histogram when no motion is occurring. If there is no motion, and hence no Doppler shift, the values of L(t) for t=0, 1, 2, . . . follow a distribution which is similar to the normal distribution. This was validated by experiments, the results of which are shown in the histogram which is the lower plot in FIG. 5. It is also expected that the values of L(t) are almost independent of each other. As was discussed above, because these values are independent of each other and normally distributed, simple expressions for the log-likelihood ratios can be derived.

    [0078] Next, and with relation to point (3) the detection of motion, FIG. 6 shows plots of |X(t, f.sub.0−1)|.sup.2, |X(t, f.sub.0+1)|.sup.2, and L(t) when motion is occurring. The values of L(t), t=1, 2, . . . contain information about the direction of motion as has been discussed before. Where there is no motion, L(t) is close to zero. When there is motion towards the receiver, L(t) is generally positive, for example a few decibels above zero. Conversely, when there is motion away from the receiver, L(t) is generally negative, for example a few decibels below zero. The upper plot in FIG. 6 is a plot of the frequency bin below the frequency bin containing f.sub.0, and shows, through an increase in amplitude between 6 and 10 seconds, that a person is walking away from the receiver. The middle plot in FIG. 6 is a plot of the frequency bin above the frequency bin containing f.sub.0, and shows, through an increase in amplitude between 4 and 6 seconds, that a person is walking towards the receiver. The lower plot is a plot of the logit function L(t), and shows that it takes positive values between 4 and 6 seconds, and negative values between 6 and 10 seconds, which demonstrates that L(t) can be used to determine the direction of motion relative to the receiver.

    [0079] FIG. 7 shows a plot of L(t), and a plot of corresponding log-likelihood ratios, and detection threshold. The upper plot in FIG. 7 is of L(t) and corresponds to the lower plot in FIG. 6. The lower plot in FIG. 7 is a plot of the log-likelihood ratios, and detection threshold for detecting that there is motion towards or away from the receiver. Line 701 shows the value of LLR.sub.0(t) discussed above, and line 702 shows the value of LLR.sub.1(t) discussed above. Dashed line 703 is the threshold, taken in this example to be 100.

    [0080] As can be seen, line 701 rises above threshold 703 between 4 and 5 seconds, and gives an indication that there is motion towards the receiver. At around 7 seconds, line 702 rises above the threshold 703 whilst line 701 falls below it, and gives an indication that there is motion away from the receiver.

    [0081] FIG. 8 is a flow diagram of a variant method according to embodiments of the present invention. Where it shares features with the flow diagram shown in FIG. 3, like features are indicated by like reference numerals. In contrast to the method shown in FIG. 3, the method of FIG. 8 utilises two logit functions: a first logit function, L.sub.1(t), which is tuned to better detect motion towards the video system, and a second logit function, L.sub.2(t), which is tuned to better detect motion away from the video system.

    [0082] The logit function discussed with respect to FIG. 3 can be improved based on the following observations. During motion towards a video conferencing device in a room, reflections from the moving object will cause a higher received frequency. However, a reflection hitting a back wall, then the moving object, and then the back wall again before being received by the receiver in the video conferencing device will have a lower received frequency. It is noted, then, that the received frequencies constitute a range of Doppler shifts. With motion towards the video conferencing device, most of these Doppler shifts will be of a higher frequency, but some will be of a lower frequency.

    [0083] Accordingly, L.sub.1(t) can be formulated as:

    [00010] $$L_1(t) = \operatorname{logit}\left(\frac{|X(t, f_0+1)|^2}{|X(t, f_0-3)|^2 + |X(t, f_0+1)|^2}\right)$$

    [0084] i.e. replacing X(t, f.sub.0−1) in L(t) with X(t, f.sub.0−3). This results in a more robust signal for detection of motion towards the video conferencing device, since with normal walking speeds few Doppler shifts are received as low as f.sub.0−3. Further, noise immunity is still good, as broadband noises such as a door slamming or hands clapping have a very similar amount of energy in both frequency bins f.sub.0−3 and f.sub.0+1. However, logit function L.sub.1(t) does not perform as well when motion is directed away from the video conferencing device. Therefore the second logit function, L.sub.2(t), is employed which is formulated as:

    [00011] $$L_2(t) = \operatorname{logit}\left(\frac{|X(t, f_0-1)|^2}{|X(t, f_0+3)|^2 + |X(t, f_0+1)|^2}\right)$$

    [0085] This is shown in Steps 303a-309a and 303b-309b for both logit functions, which are executed in parallel.
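
    The two directional logits might be sketched with a single helper that compares a signal bin against a reference bin further from the tone. The helper and function names are hypothetical, and eps is an assumed guard against log(0). Note one further assumption: the formula for L.sub.2(t) above shows bin f.sub.0+1 in the denominator, whereas this sketch uses the mirror image of L.sub.1(t), i.e. bins f.sub.0−1 and f.sub.0+3, so that the argument of the logit stays between zero and one.

```python
import numpy as np


def offset_logit(X: np.ndarray, signal_bin: int, reference_bin: int,
                 eps: float = 1e-12) -> np.ndarray:
    """Logit of |X[:, signal_bin]|^2 normalised by the sum of both bin powers."""
    s = np.abs(X[:, signal_bin]) ** 2 + eps
    r = np.abs(X[:, reference_bin]) ** 2 + eps
    p = s / (s + r)
    q = r / (s + r)                  # equals 1 - p, computed directly
    return np.log(p) - np.log(q)


def l1(X: np.ndarray, k0: int) -> np.ndarray:
    """L_1(t): bin k0+1 against bin k0-3, as in the formula for L_1(t)."""
    return offset_logit(X, k0 + 1, k0 - 3)


def l2_symmetric(X: np.ndarray, k0: int) -> np.ndarray:
    """Assumed mirror image of L_1(t): bin k0-1 against bin k0+3."""
    return offset_logit(X, k0 - 1, k0 + 3)


# Usage: a single frame with extra power just above the tone bin k0 = 8.
X = np.zeros((1, 16), dtype=complex)
X[0, 9] = 2.0    # bin k0 + 1 (upshifted reflections)
X[0, 5] = 1.0    # bin k0 - 3 (reference for L_1)
X[0, 7] = 1.0    # bin k0 - 1
X[0, 11] = 1.0   # bin k0 + 3 (reference for the assumed L_2)
towards_logit = l1(X, k0=8)
away_logit = l2_symmetric(X, k0=8)
```

    Each output would then feed its own pair of log-likelihood ratio updates and threshold tests, corresponding to the two parallel branches of FIG. 8.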

    [0086] While the invention has been described in conjunction with the exemplary embodiments described above, many equivalent modifications and variations will be apparent to those skilled in the art when given this disclosure. Accordingly, the exemplary embodiments of the invention set forth above are considered to be illustrative and not limiting. Various changes to the described embodiments may be made without departing from the spirit and scope of the invention.