Monitoring the performance of physical exercises
11328534 · 2022-05-10
Assignee
Inventors
CPC classification
G16H20/30
PHYSICS
G06T7/246
PHYSICS
A61B5/4833
HUMAN NECESSITIES
A63B2024/0068
HUMAN NECESSITIES
G06V40/23
PHYSICS
A63B24/0062
HUMAN NECESSITIES
G06V10/62
PHYSICS
A61B5/725
HUMAN NECESSITIES
International classification
A61B5/11
HUMAN NECESSITIES
A63B24/00
HUMAN NECESSITIES
G06T7/246
PHYSICS
Abstract
A method for monitoring a person performing a physical exercise based on a sequence of image frames showing an exercise activity of the person. The method includes extracting, based on the sequence of image frames, for each image frame a set of body key points using a neural network, the set of body key points being indicative of a posture of the person in the image frame; deriving, based on a subset of the body key points in each image frame, at least one characteristic parameter indicating a progression of a movement of the person; detecting a start loop condition by evaluating the time progression of the at least one characteristic parameter, said start loop condition indicating a transition from a start posture of the person to the movement of the person when performing the physical exercise, wherein a loop of exercising encompasses one single repetition of the physical exercise; detecting an end loop condition by evaluating the time progression of at least one of the characteristic parameters, said end loop condition indicating a transition from the movement of the person when performing the physical exercise to an intermediate posture, wherein, as a result, the start of the loop and the end of the loop are determined; and deriving the time period for a single loop of the physical exercise based on the start of the loop and the end of the loop and evaluating the time period.
Claims
1. A method for monitoring a person performing a physical exercise based on a sequence of image frames showing an exercise activity of the person, the method comprising: extracting, based on the sequence of image frames, for each image frame a set of body key points using a neural network, the set of body key points being indicative of a posture of the person in the image frame; deriving, based on a subset of the body key points in each image frame, characteristic parameters indicating a progression of a movement of the person; detecting a start loop condition by evaluating a time progression of at least one of the characteristic parameters, said start loop condition indicating a transition from a start posture of the person to the movement of the person when performing the physical exercise, wherein a loop of exercising encompasses one single repetition of the physical exercise; detecting an end loop condition by evaluating the time progression of at least one of the characteristic parameters, said end loop condition indicating a transition from the movement of the person when performing the physical exercise to an intermediate posture, wherein, as a result, the start of the loop and the end of the loop are determined; and deriving a time period for a single loop of the physical exercise based on the start of the loop and the end of the loop and evaluating the time period.
2. The method according to claim 1, wherein at least one of the characteristic parameters for a respective image frame is derived from coordinate values of the body key points of the respective image frame.
3. The method according to claim 1, wherein for each of the image frames, at least one of the characteristic parameters is the Procrustes distance between the subset of body key points in a respective frame and the same subset of body key points in a reference frame.
4. The method according to claim 1, further comprising detecting the start posture of the person by comparing the person's posture in at least one image frame of the sequence of image frames with at least one predefined criterion.
5. The method according to claim 1, wherein an image frame in which the start loop condition is detected defines the start of the person's exercising activity.
6. The method according to claim 1, wherein the start loop condition is detected at an image frame in which at least one of the characteristic parameters leaves a predetermined value range and changes with at least a minimum rate of change.
7. The method according to claim 1, wherein detecting the start loop condition comprises detecting when at least one of the characteristic parameters leaves a predetermined value range corresponding to the person's start posture.
8. The method according to claim 1, further comprising detecting at least one evaluation point in the person's movement by evaluating the time progression of at least one characteristic parameter indicating the person's movement.
9. The method according to claim 8, further comprising evaluating the person's posture at the at least one evaluation point.
10. The method according to claim 1, wherein evaluating the person's posture comprises comparing the person's posture with a set of predefined conditions.
11. The method according to claim 1, wherein, based on the result of comparison between the person's posture and a set of predetermined feedback trigger conditions, feedback is provided to the person.
12. A mobile device comprising: a camera configured to capture a sequence of image frames showing an exercise activity of a person using the mobile device; and a controller configured to: extract a set of body key points using a neural network for each image frame among the sequence of image frames, the set of body key points being indicative of a posture of the person in each image frame, derive, based on a subset of the body key points in each image frame, characteristic parameters indicating a progression of a movement of the person, detect a start loop condition by evaluating a time progression of at least one of the characteristic parameters, said start loop condition indicating a transition from a start posture of the person to the movement of the person when performing the physical exercise, wherein a loop of exercising encompasses one single repetition of the physical exercise; detect an end loop condition by evaluating the time progression of at least one of the characteristic parameters, said end loop condition indicating a transition from the movement of the person when performing the physical exercise to an intermediate posture, wherein, as a result, the start of the loop and the end of the loop are determined, and derive a time period for a single loop of the physical exercise based on the start of the loop and the end of the loop and evaluate the time period.
13. A non-transitory computer-readable storage medium comprising computer executable program code configured to perform the method according to claim 1.
14. The method according to claim 1, wherein at least one of the characteristic parameters is derived from a subset of body key points in the sequence of image frames using machine learning.
15. The method according to claim 2, wherein at least one of the characteristic parameters is derived from a subset of body key points in the sequence of image frames using machine learning.
16. The method according to claim 3, wherein at least one of the characteristic parameters is derived from a subset of body key points in the sequence of image frames using machine learning.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The invention is illustrated in greater detail with the aid of schematic drawings.
DETAILED DESCRIPTION OF THE EMBODIMENTS
(9) In the following description of preferred embodiments of the present invention, identical reference numerals denote identical or comparable components.
(11) Further alternatively, a mobile device may comprise a camera for acquiring a sequence of image frames and an interface for transmitting the image data to a remote computer or to a cloud server. Preferably, the interface is an interface for wireless data transmission. Processing of the sequence of image frames and the extraction of body key points may be performed on the remote computer or on the cloud server, and at least some of the results of these computations and/or feedback for the user may be transmitted from the remote computer or cloud server back to the mobile device.
(12) According to yet another alternative example, a camera may be connected to a transmitter configured for transmitting a sequence of image data to a remote computer or to a cloud server. In this case, processing of the sequence of image frames is performed on the remote computer or on the cloud server. Optionally, feedback for the user may be transmitted from the remote computer or the cloud server back to the transmitter and the transmitter may be configured for providing feedback to the person performing a physical exercise.
(13) The evaluation unit 4 is configured for extracting, for each of the acquired image frames, respective positions of a predefined set of body key points. The body key points may for example be assigned to the joints of the body and to body features like for example the forehead, the chin, the breastbone, the hip, etc. The extraction of the body key points is performed using a neural network, preferably a convolutional neural network (CNN). The image data of an image frame is fed to the input layer of the convolutional neural network, which processes the image data in several consecutive processing layers. The convolutional neural network has been trained to recognise the respective position of body key points in the image data. For each predefined body key point, an associated two-dimensional output matrix is generated, with the respective position of the body key point being indicated in the two-dimensional output matrix. Preferably, the two-dimensional output matrix indicates respective probabilities for each point that the body key point is located at that point. The point having the highest probability is taken as the body key point's position. For each of the predefined body key points, the convolutional neural network provides a separate output matrix indicating the position of one specific body key point. In addition to the position of the body key point, the probability associated with the body key point's position may be considered during further computation. For example, if a particular joint or limb is not visible, the associated probability will be comparatively low. In this regard, the probability indicates a level of confidence of the obtained results.
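As an illustrative sketch of this read-out step (not the patented implementation; `numpy` and the toy output matrix are assumptions), the position of a body key point and its associated probability can be taken from a two-dimensional output matrix as follows:

```python
import numpy as np

def keypoint_from_heatmap(heatmap):
    """Return the (row, col) position of the most likely body key
    point and the associated probability from one 2D output matrix
    of the convolutional neural network."""
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return idx, float(heatmap[idx])

# Toy 4x4 output matrix with a clear probability peak at (1, 2).
hm = np.zeros((4, 4))
hm[1, 2] = 0.9
pos, prob = keypoint_from_heatmap(hm)
```

The probability `prob` can then serve as the confidence level mentioned above, e.g. to discount key points belonging to occluded joints.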
(14) In a preferred embodiment, a sequence model such as a Recurrent Neural Network or a Long Short-Term Memory (LSTM) network may take in a sequence of image frames, wherein for each new image frame, the body key points for the new image frame are output based on latent temporal information of at least one of the past image frames. More information related to latent temporal information in neural networks can be found in the article by M R Hossain and J J Little, “Exploiting temporal information for 3D pose estimation”, 2017, CoRR, https://arxiv.org/abs/1711.08585. As far as latent temporal information in neural networks is concerned, this article is herewith incorporated by reference.
(15) The neural network may be configured for extracting body key points in 2D from the sequence of image frames. Alternatively, 3D body key points may be derived for a 2D (or 2.5D) image frame or for a sequence of 2D (or 2.5D) image frames, wherein the 2D or 2.5D image frames are acquired using a 2D or 2.5D camera. Using the techniques of machine learning, it is possible to derive additional depth information even for a 2D image frame. Mainly because of body constraints, it is possible to estimate the additional depth information for each body key point. For determining the additional depth information, the neural network may for example comprise an additional depth regression module. Further alternatively, the neural network may be configured for extracting body key points in 3D from the sequence of 3D image frames.
(16) For implementing a convolutional neural network (CNN) capable of extracting body key points from the sequence of image frames, a stacked hourglass architecture as described in the article by A Newell, K Yang and J Deng “Stacked hourglass networks for human pose estimation”, European Conference on Computer Vision, October 2016, pp 483-499, Springer International Publishing, https://arxiv.org/abs/1603.06937 is used. The input layer is a 256×256×3 layer comprising 256×256 pixels and 3 colour channels per pixel, for example RGB colour channels. In the present implementation, the convolutional neural network comprises four hourglass modules. As an output of the convolutional neural network, 16 matrices corresponding to the 16 body key points are obtained, with each matrix comprising 64×64 pixels. Each point of the matrix indicates a probability that the respective body key point is located at that point. Regarding the implementation and structure of the hourglass modules, the above referenced article “Stacked hourglass networks for human pose estimation” is herewith incorporated by reference. The stacked hourglass architecture has been adapted to the limitations imposed by the limited processing resources on a smartphone. In this respect, reference is made to the article by A G Howard, M Zhu, B Chen, D Kalenichenko, W Wang, T Weyand, M Andreetto and H Adam, “Mobilenets: Efficient convolutional neural networks for mobile vision applications”, 2017, arXiv preprint arXiv:1704.04861, https://arxiv.org/abs/1704.04861. With regard to implementation of a convolutional neural network with a stacked hourglass architecture on a smartphone, this article is herewith incorporated by reference. Based on the 2D hourglass approach, for example a depth regression module may be added after a stack of hourglass modules to output a vector of size 16 (for 16 key points) which encodes the depth information in addition to the 64×64×16 shaped matrices that have been described so far. 
Details of the depth regression module can be found in the article by X Zhou, Q Huang, X Sun, X Xue and Y Wei, “Towards 3d human pose estimation in the wild: a weakly-supervised approach”, October 2017, IEEE International Conference on Computer Vision, https://arxiv.org/abs/1704.02447. A further approach to determining depth information is presented in the article by J Martinez, R Hossain, J Romero and J J Little, “A simple yet effective baseline for 3d human pose estimation”, May 2017, IEEE International Conference on Computer Vision, Vol. 206, p. 3, https://arxiv.org/abs/1705.03098. As far as the addition of depth information is concerned, these two articles are herewith incorporated by reference.
(18) Initially, before exercising is started, the person is asked by the mobile device 2 to assume a predefined setup position. In the setup position, the person stands straight, with the feet spaced shoulder-width apart. The evaluation unit 4 extracts the respective positions of the body key points 5-1 to 5-16. In addition, the evaluation unit 4 detects if the person's posture corresponds to the predefined setup position. For this purpose, the evaluation unit 4 analyzes at least one of ratios, proportions, positions, distances and angles of the obtained posture, in order to detect whether the person is in the setup position. For example, the upright stand of the person in the setup position may be identified by relating the distance between the two shoulders to the distance between thorax and pelvis. Based on the evaluation of predefined geometric properties, the evaluation unit 4 recognises that a front view of a person in the setup position is captured. The respective positions of the body key points in the person's setup position may then for example be used for calibrating the dimensions and proportions of the person's body.
(19) Optionally, the person may be asked by the mobile device 2 to turn sideways by 90°, such that a side view of the person can be acquired as a second setup position. In the second setup position, acquiring a side view of the person may yield additional information on the properties and proportions of the person's body.
(20) After the person's posture in the respective setup positions has been detected and acquired, the person is asked by the evaluation unit 4 to start performing a specific physical exercise like for example a squat. The person may either perform a single pass of the physical exercise or a number of repetitions of the physical exercise. In the following, a single pass of the physical exercise will be referred to as a “loop” of physical exercising. In the present case, the person is in the second setup position oriented sideways to the camera when exercising starts. Accordingly, the second setup position will be the start posture for exercising. The start posture is the posture from which the respective physical exercise is started. When performing the physical exercise, the person starts at a start posture, performs the physical exercise and comes to an intermediate posture. Then, further repetitions may be performed.
(21) Determining a Characteristic Parameter for Tracking Exercising Activities
(22) In order to track and evaluate the person's exercising activity, at least one characteristic parameter indicating a time progression of the person's movement is derived from the respective positions of a subset of the body key points in the image frames of the sequence of image frames. By analyzing the time progression of a respective characteristic parameter in the course of exercising, it is possible to detect a start of the loop, wherein the start loop condition indicates the transition from the person's start posture to the person's movement when performing the exercise. Furthermore, the time progression of the characteristic parameter allows detecting an end loop condition, with the end loop condition denoting a transition between the person's movement during exercising and an intermediate posture after the first repetition of the exercise has been performed. In the following, different ways of determining a characteristic parameter for tracking the motion will be explained.
(23) A first option is to use a coordinate value of a specific body key point as a characteristic parameter for tracking the person's movements. For example, the vertical coordinate value of the person's hip may be taken as a characteristic parameter for tracking the execution of squats. Alternatively, a characteristic parameter may be derived from coordinate values of a plurality of different body key points. For example, the coordinate values of a subset of the body key points may be taken as a basis for calculating the characteristic parameter. For example, the characteristic parameter may be derived by determining an average value of several different body key points. In addition to the coordinate values, probabilities for each body key point obtained as an output of the neural network may be taken into account, for example as a sanity criterion.
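A minimal sketch of this first option (the function name and the toy frame of (x, y) coordinates are illustrative assumptions; `numpy` is assumed available): the characteristic parameter is simply the average coordinate value of a chosen subset of body key points in one frame.

```python
import numpy as np

def characteristic_parameter(keypoints, subset, axis=1):
    """Average coordinate value of a subset of body key points
    in one image frame.

    keypoints: (N, 2) array of (x, y) key-point positions
    subset: indices of the key points used (e.g. left/right hip)
    axis: 1 selects the vertical (y) coordinate
    """
    return float(np.mean(keypoints[subset, axis]))

# Toy frame with three key points; the parameter is the mean
# vertical coordinate of key points 1 and 2.
frame = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
p = characteristic_parameter(frame, [1, 2])
```

Tracking `p` over consecutive frames yields the time progression used below for detecting the start and end of a loop.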
(24) A second option for determining a characteristic parameter indicative of the person's movement is based on the evaluation of the Procrustes distance. The Procrustes distance of a subset of body key points relative to the same subset of body key points in a reference frame, for example in an image frame showing the person's start posture, is used as a characteristic parameter indicating the course of the person's movement when performing the physical exercise. In a first step, a subset of body key points is selected in dependence on the respective physical exercise to be monitored. For example, when performing a squat, the seven body key points related to the lower back, the left and right hip, the left and right knee and the left and right ankle may be used as a suitable subset of body key points.
(25) If X denotes the positions of the subset of body key points in the person's start posture, which is used as a reference, and Y denotes the positions of the body key points at an arbitrary evaluation point in the course of exercising, X and Y can be brought into an alignment by scaling, rotating, translating and reflecting the two subsets X and Y relative to one another. For performing these transformations, the following expression is minimised:
‖Y − (1cᵀ + ρXA)‖
(26) where X and Y are the input matrices, 1 is a column vector of ones (so that 1cᵀ replicates the translation across all key points), c is a column vector representing the translation, ρ is the scalar “dilation factor”, A is the rotation and reflection matrix (orthogonal, oblique or unrestricted) and ‖·‖ denotes the L2 norm. By minimising the above expression, the translation vector c, the scalar dilation factor ρ and the rotation and reflection matrix A are obtained. Furthermore, by performing the minimising process, the Procrustes distance between the subsets X and Y is obtained, because the minimised expression
(27) ‖Y − (1cᵀ + ρXA)‖ is the Procrustes distance between the two subsets X and Y. The Procrustes distance can be determined in 2D between two configurations of 2D body key points, but it can also be determined in 2.5D or 3D between two configurations of 2.5D or 3D body key points. Accordingly, a characteristic parameter based on the Procrustes distance may be used based on 2D body key points for indicating a progression of the person's movement, but it can also be used based on 2.5D or 3D body key points for indicating a progression of the person's movement.
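A compact numpy-only sketch of this minimisation (the function name is an illustrative assumption): both configurations are centred and scaled to unit norm, the optimal rotation/reflection A is obtained in closed form from an SVD of the cross-covariance, and the residual L2 norm is the Procrustes distance.

```python
import numpy as np

def procrustes_distance(X, Y):
    """Procrustes distance between two key-point configurations
    (one key point per row). Both configurations are centred and
    scaled to unit Frobenius norm; Y is then optimally rotated or
    reflected onto X and the residual norm is returned."""
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)
    Y0 = Y0 / np.linalg.norm(Y0)
    # Optimal orthogonal A minimising ||X0 - Y0 A|| is A = U Vt,
    # where U S Vt is the SVD of the cross-covariance Y0.T @ X0.
    U, _, Vt = np.linalg.svd(Y0.T @ X0)
    A = U @ Vt
    return float(np.linalg.norm(X0 - Y0 @ A))

# Y is X scaled, rotated and translated, so the distance is ~0.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
d = procrustes_distance(X, 3.0 * (X @ R.T) + np.array([5.0, 2.0]))
```

When Y is the subset of key points in the current frame and X is the same subset in the start-posture reference frame, `d` is the characteristic parameter described above: it stays near zero in the start posture and grows as the posture departs from it.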
(28) A third option is to apply a filtering operation to a subset of the body key points in the sequence of image frames and to obtain, as a result of the filtering operation, at least one of the characteristic parameters. For example, a Kalman filter may be applied to a subset of body key points for determining at least one of the characteristic parameters. Kalman filtering is an algorithm that uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each time frame. The Kalman filter is an efficient recursive filter that estimates the internal state of a linear dynamic system from a series of noisy measurements. The algorithm works in a two-step process. In the prediction step, the Kalman filter produces estimates of the current state variables, along with their uncertainties. Once the outcome of the next measurement is observed, which is necessarily corrupted with some amount of error, including random noise, these estimates are updated using a weighted average, with more weight being given to estimates with higher certainty. By subjecting a subset of body key points in the sequence of image frames to Kalman filtering, estimates of at least one of velocity and acceleration of at least one of the body key points are determined as characteristic parameters indicating a progression of the person's movement. For example, linear velocity or angular velocity of a respective body key point or an angle enclosed by specific body key points may be obtained as a result of Kalman filtering.
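A compact sketch of this option for a single key-point coordinate (the constant-velocity state model and the noise parameters `q` and `r` are illustrative assumptions): the filter observes only positions and additionally estimates the velocity as a characteristic parameter.

```python
import numpy as np

def kalman_velocity(measurements, dt=1.0, q=1e-3, r=1e-2):
    """1D constant-velocity Kalman filter applied to one key-point
    coordinate. Returns (position, velocity) estimates per frame."""
    F = np.array([[1.0, dt], [0.0, 1.0]])     # state transition
    H = np.array([[1.0, 0.0]])                # only position is observed
    Q = q * np.eye(2)                         # process noise covariance
    R = np.array([[r]])                       # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])  # initial state [pos, vel]
    P = np.eye(2)                             # initial state covariance
    estimates = []
    for z in measurements:
        # Prediction step: propagate state and uncertainty.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: weighted average of prediction and measurement.
        innovation = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(2) - K @ H) @ P
        estimates.append((float(x[0, 0]), float(x[1, 0])))
    return estimates

# A key point moving at ~2 px/frame: the velocity estimate
# converges towards 2.
track = kalman_velocity([0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0])
```

The velocity component of the final state is the kind of derived characteristic parameter mentioned above; angular velocities can be obtained analogously by filtering an angle instead of a coordinate.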
(29) As a fourth option, at least one of the characteristic parameters is derived from a subset of body key points in the sequence of image frames using machine learning. For example, at least one characteristic parameter may be derived that is suitable for detecting a transition from the person's start posture to exercising. As a further example, at least one characteristic parameter may be derived that is suitable for detecting at least one evaluation point in the course of the person's movement. For example, at least one of the characteristic parameters may be learned using a statistical model. The model could for example map a subset of body key points to the interval [0, 1] as a probability to indicate the start of the loop or as a probability to indicate a respective evaluation point.
(30) Detecting a Start Loop Condition
(31) For evaluating the person's movement when performing the physical exercise, it is essential to detect the transition from the start posture to exercising activity. The detection of this transition is performed based on the time progression of at least one of the characteristic parameters. For detecting this transition, a start loop condition is used, wherein this start loop condition is configured such that the transition from the start posture to physical exercising can be detected. The start loop condition is applied to the time progression of at least one of the characteristic parameters. If it is detected that the start loop condition is fulfilled for a certain image frame, this means that the person's exercising activity starts at this image frame.
(34) During a time interval 8, the person is in a start posture. Therefore, during the time interval 8, the first Procrustes distance remains approximately constant. Then, at the point of time 9, the person starts performing a physical exercise like for example a squat. When the person starts bending the knees and lowering the body, the first Procrustes distance increases, because the first Procrustes distance indicates the change of the person's posture relative to the start posture. During the time interval 10, the person performs the physical exercise. At the time point 11, the physical exercise is finished and the person is in an intermediate posture, for example in a rest position. Therefore, during the time interval 12, the first Procrustes distance remains approximately constant.
(35) The point of time 9, which indicates the start of the physical exercise, is detected by means of a start loop condition. The start loop condition is configured for detecting the transition from the person's start posture to exercising activity based on the time progression of the characteristic parameter. When the person is in the start posture, it is detected for each image frame whether or not the start loop condition is fulfilled. Detecting the start loop condition may for example comprise detecting the characteristic change in slope of the curve 7 at the point of time 9. In particular, a change of the slope from a nearly horizontal slope to a slope 13 that exceeds a predefined threshold may be detected. Furthermore, the characteristic parameter remains nearly constant during the time interval 8, with the characteristic parameter being confined to a value range 14. Hence, detecting the start loop condition may comprise determining when the characteristic parameter leaves the predefined value range 14. In this regard, when the characteristic parameter leaves the value range 14, this indicates the start of the person's exercising activity.
(36) In a preferred example, the evaluation unit 4 detects in a first step if the characteristic parameter is within the predefined value range 14. As soon as the characteristic parameter leaves the predefined value range 14, the evaluation unit 4 determines if the rate of change exceeds a predefined threshold. If this is the case, it is detected that the start loop condition is fulfilled for the image frame at the point of time 9. Hence, the start of loop is detected. The start loop condition may alternatively be defined by specifying a transition template describing a transition from the person's start posture to exercising activity. For example, the template may model the typical time behaviour of the characteristic parameter at the transition from the start posture to exercising activity at the start of loop. When a match between the time progression of the characteristic parameter and the time behaviour described by the predefined template is detected, the start of loop is detected.
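The two-step check described above can be sketched as follows (function name, toy value range and rate threshold are illustrative assumptions): the detector fires at the first frame where the characteristic parameter has left the start-posture value range and is changing faster than a minimum rate per frame.

```python
def detect_loop_start(values, value_range, min_rate):
    """Return the index of the first frame at which the start loop
    condition is fulfilled: the characteristic parameter is outside
    the start-posture value range AND its per-frame rate of change
    exceeds min_rate. Returns None if no such frame exists."""
    lo, hi = value_range
    for i in range(1, len(values)):
        outside = not (lo <= values[i] <= hi)
        rate = abs(values[i] - values[i - 1])
        if outside and rate >= min_rate:
            return i
    return None

# The parameter stays flat within [0, 0.1] while the person holds
# the start posture, then rises steeply as the squat begins.
series = [0.02, 0.03, 0.02, 0.05, 0.30, 0.60, 0.90]
start = detect_loop_start(series, (0.0, 0.1), 0.1)
```

The template-matching alternative mentioned above would replace the rate check with a comparison of a sliding window of `values` against a predefined transition template.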
(37) The definition of the start loop condition is essential for monitoring the person's physical activity, because it allows detecting a time frame related to a start of the loop of exercising, said loop corresponding to one single pass of the physical exercise.
(38) Preferably, the start loop condition is adjusted and optimised in dependence on real video sequences of persons performing the exercises. For example, for a large number of video sequences, the optimum transition point may be specified manually, and this large amount of reference data may be used for optimising the start loop condition. For example, machine learning using a neural network may be used for adjusting the start loop condition. In this way, the start loop condition can be adapted to real data showing persons performing the exercise.
(39) Evaluating the Person's Posture at at Least One Evaluation Point
(40) When the person performs the physical exercise, the person's posture is evaluated at one or more predefined evaluation points in the course of exercising. These evaluation points are detected by evaluating a time progression of at least one characteristic parameter indicating the person's movement. At the one or more evaluation points, the person's posture is evaluated. One or more of the at least one characteristic parameters used for detecting the at least one evaluation point may be identical to the at least one characteristic parameter used for detecting the start loop condition. In particular, the at least one evaluation point may for example be detected in dependence on the same characteristic parameters that are used for detecting the start loop condition.
(41) Returning to the above example of a person doing squats, a relevant evaluation point is the point where the person's body reaches the lowest position and the person's knees are bent. In this position, the person's hands are approximately on the same level as the knees. In the diagram shown in
(42) At the evaluation point 15, the person's posture is evaluated. Evaluating the person's posture comprises evaluating respective positions of a subset of the body key points in a respective image frame. Depending on the result of this evaluation, suitable feedback is provided to the person performing the exercise. For example, typical errors and shortcomings when performing the exercise may be detected. In dependence on the respective deficiencies, a prerecorded audio message with comments on the person's posture may be reproduced.
(43) The progression of the person's movement may additionally be monitored by tracking a second characteristic parameter, wherein analysis of the second characteristic parameter complements analysis of the first characteristic parameter. In
(46) For evaluating the posture shown in
(47) For example, for the above example of a squat, a first feedback trigger condition may define that the head is oriented at an angle of 0° relative to the vertical, in order to make sure that the line of sight is straight. For this condition, an allowable deviation of 5° may be specified. A second feedback trigger condition may specify that the leg is oriented at an angle of less than 45° relative to the vertical. This ensures that the knee does not dodge to the front. The third feedback trigger condition relates to the vertical position of the wrist relative to the knee. The movement should not be too deep and therefore, the wrist has to be located above the knee. In a fourth feedback trigger condition, the correct orientation of the spine is defined. When doing a squat, the spine must not be crooked. Accordingly, the angle of the spine relative to the vertical should be below 40°, with an allowable deviation being set to 5°.
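The four example conditions can be sketched as a simple rule check (function name, message texts and the image-coordinate convention are illustrative assumptions; with image coordinates, "above the knee" means a smaller y value):

```python
def check_feedback_triggers(angles, wrist_y, knee_y):
    """Evaluate the four example feedback trigger conditions for a
    squat. Angles are given in degrees relative to the vertical;
    wrist_y and knee_y are vertical image coordinates (y grows
    downwards). Returns messages for all violated conditions."""
    messages = []
    # 1: head at 0° +/- 5° so the line of sight is straight.
    if abs(angles["head"]) > 5.0:
        messages.append("keep the line of sight straight")
    # 2: leg at less than 45° so the knee does not dodge to the front.
    if angles["leg"] >= 45.0:
        messages.append("knee dodges to the front")
    # 3: wrist must stay above the knee (movement not too deep).
    if wrist_y >= knee_y:
        messages.append("the movement should not be that deep")
    # 4: spine below 40° +/- 5° so the spine is not crooked.
    if angles["spine"] > 45.0:
        messages.append("keep the spine straight")
    return messages

# Posture with the knee too far forward and the wrist below the knee.
issues = check_feedback_triggers(
    {"head": 2.0, "leg": 50.0, "spine": 30.0},
    wrist_y=120.0, knee_y=100.0)
```

Each returned message would select the corresponding prerecorded audio feedback described above.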
(48) In case one of the feedback trigger conditions is not fulfilled, for example in case the person's wrist is located below the knee, a corresponding audio message may be reproduced. In this example, the audio message would suggest that the movement should not be that deep. The feedback provided to the person can also depend on the previously given feedback. For example, if the user has improved since the last loop but the movement is still not correct, there might be a different feedback like for example “better, but still a bit too deep”. In case the person overcorrected the movement based on the last feedback, a suitable audio message may address this overcorrection.
(49) Also with regard to the feedback trigger conditions, the limit values of these conditions may be adjusted in dependence on real data showing persons performing an exercise. For example, a physiotherapist or a physician may classify postures in a large number of video sequences, with the postures being rated as favourable or as not favourable. Depending on these ratings, the limit values and thresholds of the feedback trigger conditions may be set or adjusted automatically. Also here, suitable limit values may be either obtained as a result of calculation or by machine learning.
(50) Detection of the End Loop Condition
(51) At the end of the loop, there is a transition from the person's exercising activity to an intermediate posture. This transition occurs at the point of time 11. As shown in
(52) By detecting the start loop condition and the end loop condition, the loop of exercising can be detected. The loop provides a reference frame for analyzing and evaluating the person's movement. Preferably, the time period for a single loop may be evaluated and compared with at least one of a lower limit and an upper limit. If the time period for performing a single repetition of the physical exercise is too short, a suitable feedback may be provided to the person performing the exercise. For example, the person may be asked to slow down when performing the exercise. If the time period for a single loop is too long, the person may be asked to perform the exercise faster.
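This evaluation reduces to a small calculation (function name, limits and feedback strings are illustrative assumptions): the loop duration follows from the start and end frame indices and the frame rate, and is compared with the lower and upper limits.

```python
def evaluate_loop_duration(start_frame, end_frame, fps,
                           lower_s, upper_s):
    """Duration of one loop in seconds, derived from the frame
    indices of the detected start and end of the loop, plus an
    optional pacing feedback message."""
    duration = (end_frame - start_frame) / fps
    if duration < lower_s:
        return duration, "please slow down"
    if duration > upper_s:
        return duration, "please perform the exercise faster"
    return duration, None

# A squat spanning frames 30..90 at 30 fps takes 2 s, which lies
# within the example limits of 1.5 s and 4 s.
d, msg = evaluate_loop_duration(30, 90, fps=30.0,
                                lower_s=1.5, upper_s=4.0)
```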
(53) The features described in the above description, claims and figures can be relevant to the invention in any combination. Their reference numerals in the claims have merely been introduced to facilitate reading of the claims. They are by no means meant to be limiting.