Method for estimating force and pressure of collision in vocal cords from high-speed laryngeal videos

10783630 · 2020-09-22

Assignee

Inventors

Cpc classification

International classification

Abstract

The present invention relates to a collision force analysis (CFA) method for obtaining in vivoestimates of contact force and pressure in the vocal cords directly from laryngeal videoendoscopies. The method comprises the steps of: providing at least a high-speed laryngeal videoendoscopy (also called High Speed Videoendoscopy or HSV) to generate at least one image and videos of the vocal cords; pre-processing the image in a processing unit, to define a region of interest (ROI) of the location of the glottis; detecting in the processing unit, the edges of the vocal folds in the images obtained by means of the HSV; recording in the processing unit, the points of the edges detected by means of a sequence of images; estimating in the processing unit, the path of the vocal cord edge during collision throughout time; and estimating the values of contact and impact of the vocal cords by means of a collision model.

Claims

1. A method of analysis of collision force or CFA to obtain in vivo estimates of force and contact pressure on the vocal cords directly from laryngeal videoendomates, CHARACTERISED in that it comprises the steps of: a. providing at least high-speed laryngeal videoendoscopy (also called High Speed Videoendoscopy or HSV) to generate at least one image and videos of the vocal cords; b. pre-processing the image in a processing unit, to define a region of interest (ROI) of the location of the glottis; c. detecting in said processing unit, the edges on the vocal cords in the images obtained by means of the HSV; d. recording in said processing unit, the detected points of the edges by means of a sequence of images; e. estimating in said processing unit, the path of the vocal cord edge during collision throughout time; and f. estimating the values of contact and impact of the vocal cords by means of a contact model.

2. The method according to claim 1, CHARACTERIZED in that in the step of recording the edges, the location of each edge of the vocal cord is segmented and a polynomial adjustment is applied to record the set of points found for each edge.

3. The method according to claim 1, CHARACTERIZED in that to detect the path according to the detected points of the edges, a Kalman filter is used.

4. The method according to claim 1, CHARACTERIZED in that the contact model is a Hertzian model used to calculate impact estimates from penetration or overlapping values between the edges.

5. The method according to claim 1, CHARACTERIZED in that the pre processing step comprises a correction of the rotation of the endoscopic image by selecting the anterior and posterior endpoints in the glottis to establish the necessary angle of compensation.

6. The method according to claim 5, CHARACTERIZED in that in the pre processing step at least a pair of attachment points is defined, which in turn define where the resting positions of the folds observed in the video.

7. The method according to claim 1, CHARACTERIZED in that in the detection step the left and right vocal edges are determined from the gradient images of the vocal cords.

8. The method according to claim 1, CHARACTERIZED in that in the recording step, the points found in the detection step are used to adjust a p order polynomial by applying a least squares estimator (LS) over the detected points.

9. The method according to claim 3, CHARACTERIZED in that the Kalman filter is applied to perform predictions on the value and rate of change that the positions should have along the contact period.

10. The method according to claim 9, CHARACTERIZED in that the main mode of vibration of the vocal cords may be represented by a mass spring configuration.

11. The method according to claim 1, CHARACTERIZED in that in the step of estimating the contact, the apparent penetration between the tissues and the contact section is extracted from the path previously estimated.

12. The method according to claim 1, CHARACTERIZED in that in the step of estimating the contact, the penetration and degree of contact are evaluated in the Hertzian contact equations in order to obtain the predictions of contact force and pressure suffered by the tissue displayed on a HSV recording to be analyzed.

Description

BRIEF DESCRIPTION OF THE FIGURES

(1) FIG. 1 corresponds to a block diagram with the steps of the collision force analysis method, according to an embodiment of the invention.

(2) FIG. 2 corresponds to an image of a vocal cord with their respective edges and predefined attachment points, according to an embodiment of the invention.

(3) FIG. 3 corresponds to an image of a vocal cord in the step of detecting vocal edges, according to an embodiment of the invention.

(4) FIG. 4 corresponds to a temporal representation of the detection step, according to an embodiment of the invention.

(5) FIG. 5 corresponds to an image of the recording stage of the vocal cords, according to an embodiment of the invention.

(6) FIG. 6 corresponds to a temporary representation of the recording step, according to an embodiment of the invention.

(7) FIG. 7 corresponds to an estimate of coefficients during the collision at the tracking stage, according to an embodiment of the invention.

(8) FIG. 8 corresponds to a temporary representation of the tracking step according to an embodiment of the invention.

(9) FIG. 9 corresponds to the step of estimating the collision in the vocal cords, according to an embodiment of the invention.

(10) FIG. 10 corresponds to an outline of a sequence of movement of the vocal cords in a typical videoendoscopy recording.

DETAILED DESCRIPTION OF THE INVENTION

(11) The present invention relates to a method for the Collision Force Analysis or CFA. Said method comprises at least 5 steps, as shown in FIG. 1.

(12) First, at least one High Speed Laryngoscopy (1) (referred to as High Speed Videoendoscopy or HSV) is presented as input to the method, so that to generate at least one image and videos of vocal cords (2). Thereafter, the images and videos are sent to a processing unit (not shown in the figures), wherein a pre-processing step (100) is applied to correct the orientation of the glottis, defining a region of interest (ROI) in its location. Then, edge detection (200) is performed on the vocal folds, which is processed by a sequence of operators (300) which analyzes the gradient information in the image. The location of each edge (2a, 2b) of the vocal cord (right and left) is segmented and a polynomial adjustment is applied to record the set of points found for each edge (2a, 2b). The recorded coefficients are provided to a Kalman filter that provides an estimate of the path of the vocal edge during the collision over time (400), or tracking. A mass-spring model is used to follow the edge trajectory during the collision phase. Finally, the penetration or overlap values between the edges and the contact section between them are extracted to calculate the impact estimates through the Hertzian (500) model.

(13) During the pre processing step (100) of the videos obtained through HSV (1) there is a correction of the rotation of the endoscopic image by the user comprised, which is carried out by selecting the anterior and posterior endpoints in the glottis to establish the necessary angle for compensation. A reference image of the sequence during glottic closure is used to view these points. The user then defines a region of interest (ROI) and a M.sub.ROI mask centered on the glottis to establish what section of the video will be processed. Typically, a HSV recording has undesired low frequency movements related to the usual manipulation of the endoscope. A motion compensation algorithm is pre applied to the video in the event that cleaning the low frequency movements present is necessary, this being why the location of the ROI can be considered fixed and not requiring updating.

(14) Additionally, a pair of points on each vocal cord are defined by user input, which are referred to attachment points (2c, 2d), which are referenced as (x.sub.a; y.sub.a) and (x.sub.b; y.sub.b). As noted in FIG. 2, these attachment points (2c, 2d) define where the resting positions of the folds observed in the video are found, assuming a straight line between them as the central location of the oscillation of each tissue during phonation. Under this assumption, these attachment points (2c, 2d) are considered close to the glottis endpoints (both anterior and posterior) under complete closing of the glottis. However, these points (2c, 2d) may differ from this respective glottal midline (a line formed by the joining of the anterior or posterior spaces of the glottal area) especially in the cases of patients with incomplete glottal closure. When contact between tissues is partial, an opening appears at the back of the glottis, which induces a more distant location of these upper attachment points by the user. For CFA, the attachment points are necessary to grip a curve representing the vocal edge. These are constraints for a problem of polynomial adjustment used to represent each fold.

(15) In the step of detection (200), the HSV passes through a sequence of the basic image processing operations by the processing unit. Each frame I is converted into a grey scale image I.sub.g and a morphological reconstruction operation is applied on its reverse to clean the specular reflection generated by the mucosa of the vocal cords. Next, a Prewitt operator is applied to obtain the magnitude and phase of the gradient, G.sub.A and G.sub. (in degrees) respectively. G.sub.A is masked with the M.sub.ROI obtained in the previous stage (G=GA.Math.M.sub.ROI) and used to segment the edges, separating G into two gradient images as follows:

(16) G right ( x , y ) = { G ( x , y ) B r ( x , y ) G ( x , y ) > t h , 0 i . o . c . ( 1 ) G left ( x , y ) = { G ( x , y ) B l ( x , y ) G ( x , y ) > t h , 0 i . o . c . ( 2 ) B r ( x , y ) = { 1 G ( x , y ) > 90 G ( x , y ) < - 90 , 0 i . o . c . ( 3 ) B l ( x , y ) = { 1 G ( x , y ) < 90 G ( x , y ) > - 90 , 0 i . o . c . ( 4 )

(17) where t.sub.h is a threshold parameter. From these gradient images G.sub.right y G.sub.left, the location of the edge is calculated on axis x for each horizontal line of the ROI, forming pairs is calculated (x; y) of points located in the centroid of the gradient found:

(18) ( x _ j , y _ j ) = ( .Math. i = 1 w i .Math. G s ( i , j ) .Math. i = 1 w G s ( i , j ) , j ) ( 5 )

(19) j[1, h], s[left, right]. Where w and h are respectively the width and height of the ROI. Only up to the endpoints of glottis are taken into account. The upper and lower points outside the range defined by the attachment points are omitted. Finally, a temporary average mobile filter is applied at each X.sub.j position the invention in order to obtain a smooth variation of the fold movement, reducing the detection error in the local position of the edge.

(20) ( x j , y j ) k = ( [ 1 N .Math. i = - .Math. ( N - 1 ) / 2 .Math. .Math. ( N - 1 ) / 2 .Math. x _ j , k - i ] , y _ j , k ) ( 6 )

(21) k[1, N.sub.frames], wherein N=5. In FIG. 3, an example is shown of this detection step applied on a HSV recording only as example and in FIG. 4 a temporary representation of the medial portion of the glottis with a chemogram is seen. As it can be seen, the gradient information is used to find the left and right vocal edges, but the detected points are lost when the flods collide each other (time C in the temporary sequence of FIG. 4). The smoothing performed by the temporary filter reduces the detection error during the glottal opening and closing phase, but when the impact starts, the gradient does not exceed the t.sub.h threshold set and the edge location is lost. The task of the following steps will be to establishing a framework allowing estimating the projection of these edges during the times of impact.

(22) In the recording step (300), conducted on the processing unit, the points (x; y) found in the detection step (200) are used to adjust a polynomial of the p order by applying the Least Squares (LS) estimator to the detected points along a line (or coordinate axis) defined by the attachment points (x.sub.a; y.sub.a) y (x.sub.b; y.sub.b). The points of attachment are taken into account as fixed roots of the solution, thereby determining constraints to the problem. The M polynomial to be adjusted when the line of attachment is vertical (x0=xa=xb) is defined as:

(23) M p ( y ) = ay p + by p - 1 + cy p - 2 + dy p - 3 + .Math. = ( .Math. i = 0 p - 2 i y i ) ( y - y a ) ( y - y b ) + x 0 ( 7 )

(24) Wherein the coefficients of the M polynomial written are in general as follows:
=(a b c d . . . ).sup.T(8)

(25) The value of these coefficients is constrained by the roots y.sub.a y y.sub.b. By factoring these constraints on M, the unknown parameters of the curve to be adjusted can be cleared, said set being defined as:
=custom character(1(y.sub.a+y.sub.b)y.sub.ay.sub.b).sup.T(9)
If the attachment points do not define a vertically oriented line, rotating all of the set of points detected is previously required to view the problem from the coordinate axis determined by these restrictive points. If the angle of inclination of this line is , then the points detected in the new coordinate system can be obtained with the following transformation:

(26) ( u v ) = ( cos - sin sin cos ) ( x y ) ( 10 )

(27) And the M curve to be adjusted is rewritten as:
M.sub.p(v)=(.sub.i=0.sup.p2.sub.iv.sup.i)(vv.sub.a)(vv.sub.b)+u.sub.0(11)

(28) With this, the LS solution used to compute the parameters in equation 1 corresponds to:

(29) = ( A T A ) - 1 A T U ( 12 ) A = ( v ~ 1 v 1 p - 2 .Math. v ~ 1 v 1 v ~ 1 v ~ 2 v 2 p - 2 .Math. v ~ 2 v 2 v ~ 2 .Math. .Math. v ~ D v D p - 2 .Math. v ~ D v D p - 2 v ~ D ) U = ( u ~ 1 u ~ 2 .Math. u ~ D ) u ~ l = u l - u 0 u ~ l = ( v l - v a ) ( v l - v b ) l [ 1 , D ] ( 13 )

(30) Where the pairs (u.sub.l; v.sub.l) are the points obtained in the detection step with the equation (6) and previously transformed with equation (10), and D is the number of points found in the detection step. This regression is applied for each set of points of vocal cords, both left and right, and after being applying to the equation 12, their values .sub.k are recorded along the sequence of the video. In this step, also the rate of change of the coefficients ({dot over ()}.sub.k) is estimated:

(31) . k = 1 t ( k - k - 1 ) ( 14 )

(32) Both the value or location of the coefficients .sub.k and their respective velocities .sub.k are the input records to the next tracking stage. These values are considered as observations of a process describing the dominant oscillating mode of the vocal folds. The recording process can be seen in FIGS. 5 and 6.

(33) As can be seen in FIG. 6, the values of .sub.k tend to show poor adjustment solutions during the collision phases of the tissue. This is basically because the least squares estimate is not well conditioned when the amount of detected points D suddenly decreases, which occurs by the reach of the t.sub.h gradient threshold in the detection step. At this point, the values of the record obtained during the collision are invalid and do not represent useful information during the impact. Thereby, they can be regarded as an occlusion problem of the vocal edge, whose handling will be performed in the next follow-up step (400).

(34) In the tracking step (400) the occlusion of the vocal cords is regarded as a problem of estimating the status variables in the presence of noise and data loss. Here a Kalman filter is applied to perform predictions on the value and rate of change that the coefficients should have over the contact period. In order to describe these occlusion periods with a linear process, it is assumed that the main mode of vibration in the vocal cords can be represented by a mass-spring configuration, that is, a pair of springs fixed in their respective attachment lines (defined in the pre processing stage (100)).

(35) Under this assumption, the model used to describe the vibratory process of a vocal cord corresponds to:

(36) X i , k + 1 = AX i , k + V k ( 15 ) Y i , k = CX i , k + E k Y i , k = ( i , k . i , k ) A = ( 1 t - k t 1 - b t ) C = ( 1 0 0 1 ) ( 16

(37) Where X.sub.i,k is the particular state of the coefficient .sub.i in .sub.k at time k, Y.sub.i,k are observations of the state of the process, which we assume as available with matrix C as identity. V.sub.k an E.sub.k are the processing noise and measurement noise, cConsidered Gaussian and noncorrelated with variances .sub.v and .sub.e respectively. t=1/f.sub.s is the sampling time, k the stiffness of the spring, and b the damping value of the process. The mass of the coefficient is not present, since the interest lies in representing the kinematic of the vocal edge and this only translates into a scaling factor for the solution. Therefore, the mMass parameter will be considered as unit in this process. Tuning this process to a particular w.sub.r resonance is sought, which enables to describe the path of .sub.k during the occlusion. Thus, w.sub.r and are defined as cControl parameters for the dynamic response of the process.

(38) k = w r 2 1 - 2 b = 2 k ( 17 )

(39) The parameter is thought only to avoid possible unstable solutions and low values near zero are usually considered (0-0.03). This offsets possible instabilities of the process due to discretization of the system (high values of K are prone to generate poles slightly outside of the unit circle). The stiffness k is automatically calculated by estimating the resonance frequency w.sub.r, using the kinematic information from the recording stage.

(40) For establishing the resonance value, the analytical solution of the mass-spring model is considered at initial conditions as a target function of a minimization problem. The values of .sub.i,k.sub.o and {dot over ()}.sub.i,k.sub.o are considered at k.sub.0 previous time to the impact as initial condition values, so that the analytical solution of the mass-spring model for these conditions coincides with a similar position of return, but at the end of the contact at k.sub.1 time. Thereby to determine a value of w.sub.r meeting the following is interesting:

(41) 0 w r = arg min w .Math. F i , k 1 ( w ) - i , k 1 .Math. ( 18 ) F i , k 1 ( w ) = i , k 0 cos ( wt k 1 ) + i , k 0 w sin ( wt k 1 ) ( 19 )

(42) Where t.sub.k.sub.1=(k.sub.1k.sub.0) t. As can be seen in FIG. 7, the solution of equation 18 is not necessarily unique and the method of resolution thereof may fall into local minimums. However, it is expected that the target resonance frequency is maintained close to the fundamental frequency executed by the patient during recording. In a preferred embodiment of the invention, the resolution method used is a standard Nelder-Mead and its starting condition is set to an expected fundamental frequency of oscillation in the order of 200 [Hz].

(43) This w.sub.r resonance value controls the necessary k stiffness for the process to synchronize a simple harmonic motion on the temporal evolution of each vocal cord, but it is only intended to complete the sequence during the collision times. When the vocal edges are visible, there is no priority for the use of the process for the estimation of the trajectory, since there is no occlusion. In order to define when the process predictions will be required, the following amounts are defined:

(44) K = D T - D k D T k = 1 1 + e - ( K - ) ( 20 )

(45) which are respectively referred to as the ratio of undetected points .sub.K and its associated uncertainty factor .sub.k. D.sub.T represents the maximum possible amount of points detected at the edge, D.sub.k the current amount of detected points, a gain factor, and an uncertainty threshold. The uncertainty factor determines how much mistrust we have on recorded values of .sub.k. When D.sub.k is very small, for example, .sub.k increases above the uncertainty threshold and .sub.k tends to unity, which means that there are many points lost in the detection step and the adjustment of the polynomial in the recording step is bad. This indicator states that the estimates of the Kalman filter are necessary in such a circumstance and require higher priority. The following equations describe the implementation of the Kalman filter developed (the indices of i coefficients are omitted for simplicity):
{circumflex over (X)}.sub.k+1|k+A{circumflex over (X)}.sub.k|k(21)
{circumflex over (X)}.sub.k|k=(IJ.sub.kC){circumflex over (X)}.sub.k|k1+J.sub.kY.sub.k(22)
J.sub.k=P.sub.k|k1C.sup.T[CP.sub.k|k1C.sup.T+P.sub.E].sup.1(23)
P.sub.k+1|k=AP.sub.k|kA.sup.TP.sub.V(24)
P.sub.k|k=P.sub.k|k1(1.sub.k)J.sub.kCP.sub.k|k1(25)
.sub.k=C{circumflex over (X)}.sub.k|k1(26)

(46) Kalman considers this uncertainty factor .sub.k as a quantifier of the degree of mistrust or loss of kinematic information in the observation. This is internally controlled by modifying the J.sub.k gain matrix of the filter, adjusting the weight of the second term in the equation (25) which updates the covariance of the estimate error P.sub.k|k.

(47) The output estimate is defined as Y.sub.k, which is a linear combination between the Y.sub.k observations of the status obtained in the recording step and the predictions made of the .sub.k status.
Y.sub.k=(1.sub.k)Y.sub.k+.sub.k.sub.k(27)

(48) It should be noted that by controlling the .sub.k factor, the filter selects the best set of available coefficients to represent the curve describing the vocal fold. Finally, the first value of the Y.sub.k vectors (the value estimated of the position of the coefficient .sub.i,k) is grouped into a .sub.k vector and then by the expression (9) the resulting .sub.k coefficient vector is calculated for the final representation of the edge.

(49) In the example illustrated in FIG. 7, the response of the filter against changes in uncertainty in the variation of the recorded coefficients can be observed. The filter handles the loss of points detected by increasing .sub.k and switches to the internal predictions of these values if necessary. The last position and speed achieved at the previous time of the impact is taken into consideration to previously estimate the k parameter of the process. During the collision, the Kalman filter continues the sequence with the predictions, ignoring the values of misconditioned coefficients. When the collision is terminated and the occlusion of the vocal fold is no longer a problem, the estimate returns to the edges previously detected in the previous step. This allows for the complete representation of the entire cycle, which is possible to note in FIG. 8. By gently completing the temporary evolution of vibration of each vocal cord, regardless of the deformation thereof at the moment of impacting, the apparent penetration .sub.k between the overlapping cords is now visible and can be used to estimate the collision of the tissue.

(50) In the step of estimating the contact (500), carried out at the processing unit, the apparent .sub.k penetration is drawn between the tissues and the contact section .sub.c from the previously estimated trajectory. The difference between the left and right polynomials evaluated at their respective .sub.k coefficients is used to compute this pair of values as follows:
{circumflex over (x)}.sub.j,k=M.sub.p(y.sub.i,k;.sub.k.sup.left)M.sub.p(y.sub.j,k;.sub.k.sup.right)(28)
.sub.k=.Math.mx{{circumflex over (x)}.sub.j,k,.sub.j}(29)
.sub.k=.Math.j{{circumflex over (x)}.sub.j,k>0}(30)

(51) The gain is a video calibration factor to turn the spatial dimension of pixels into meters, which will be assumed to be known. Finally, the penetration and degree of contact are evaluated in the Hertzian contact equations to obtain the predictions of force and pressure of contact suffered by the tissue displayed on a HSV recording to be analyzed, as shown in FIG. 9. In this example, the values of force and pressure are only suggested since the parameters, T, L, and E* used herein are not calibrated for this case.

(52) For the cylindrical contact it is fulfilled with

(53) P c = 4 E * c L c .

(54) Where c is the penetration, Lc is the length of the contact, =1.679 is a correction factor and E* is the effective Young modulus define by

(55) E * = E 2 ( 1 - v 2 ) .

(56) Outline of a sequence of movement of the vocal cords in a typical videoendoscopy recording is shown in FIG. 10. The colored lines represent the estimation of the edge of each vocal fold (left in red, right in blue). The superposition of fictitious edges during the collision shows the estimated depth of penetration (delta_c) and contact length (L_c).