SYSTEM AND METHOD FOR RETINA TEMPLATE MATCHING IN TELEOPHTHALMOLOGY
20240315553 · 2024-09-26
Assignee
Inventors
- Eric J. Seibel (Seattle, WA)
- Chen Gong (Seattle, WA, US)
- Steven L. Brunton (Seattle, WA, US)
- Nils Benjamin Erichson (Seattle, WA, US)
- Laura Trutoiu (Seattle, WA, US)
- Brian T. Schowengerdt (Seattle, WA, US)
CPC classification
A61B2576/02
HUMAN NECESSITIES
A61B3/0025
HUMAN NECESSITIES
G06T3/4038
PHYSICS
A61B3/14
HUMAN NECESSITIES
G06V10/76
PHYSICS
G06T2207/10101
PHYSICS
A61B3/12
HUMAN NECESSITIES
International classification
A61B3/12
HUMAN NECESSITIES
A61B3/00
HUMAN NECESSITIES
A61B3/14
HUMAN NECESSITIES
Abstract
A retina image template matching method is based on registration and comparison between images captured with portable low-cost fundus cameras (e.g., a consumer-grade camera of the kind typically incorporated into a smartphone or tablet computer) and a baseline image. The method addresses the challenges posed by registering the small, low-quality retinal template images captured with such cameras. Our method combines dimension reduction methods with a mutual information (MI) based image registration technique. In particular, principal component analysis (PCA) and optionally block PCA are used as a dimension reduction method to localize the template image coarsely on the baseline image; the resulting displacement parameters then initialize the MI metric optimization for registration of the template image with the closest region of the baseline image.
Claims
1. A computer-implemented method for registering a narrow field of view template image to a wide field of view, previously obtained, baseline image, the method comprising: cropping the baseline image into a multitude of smaller offset target images; applying a dimension reduction method to map the offset target images to a representation in a lower dimensional space; mapping the template image into the lower dimensional space using the dimension reduction method; finding the corresponding nearest target image for the template image in the lower dimensional space; registering the template image to the nearest target image; identifying the location of the template image on the baseline image based on the position of the nearest target image; and registering the template image to the baseline image at the identified location.
2. The method of claim 1, wherein the baseline image comprises a fundus image.
3. The method of claim 2, wherein the template image comprises an image captured by a portable fundus camera.
4. The method of claim 3, wherein the portable fundus camera comprises a camera embodied in a smartphone or tablet computer configured with apparatus to assist in taking a photograph of the eye.
5. The method of claim 4, wherein the cropping, applying, mapping, finding, registering, identifying, and registering are performed in a processing unit in the smartphone or tablet computer.
6. The method of claim 2, wherein the fundus image is obtained without chemical dilation of the pupil of the subject.
7. The method of claim 1, wherein registering the template image to the nearest target image employs a mutual information procedure.
8. The method of claim 1, wherein applying a dimension reduction method to map the offset target images to a representation in a lower dimensional space and mapping the template image into the lower dimensional space using the dimension reduction method comprises Principal Component Analysis.
9. The method of claim 1, wherein finding the corresponding nearest target image for the template image in the lower dimensional space is performed using block Principal Component Analysis.
10. The method of claim 1, further comprising determining the gaze position of the subject.
11. The method of claim 1, further comprising locating a surgical tool in the eye from the registered template images.
12. An extended reality device, comprising: an imaging device; a processor operatively coupled to the imaging device; and a memory storing therein a sequence of instructions which, when executed by the processor, causes the processor to perform a set of acts for registering a narrow field of view template image to a wide field of view, previously obtained, baseline image, the set of acts comprising: cropping the baseline image into a multitude of smaller offset target images; applying a dimension reduction method to map the offset target images to a representation in a lower dimensional space; mapping the template image into the lower dimensional space using the dimension reduction method; finding the corresponding nearest target image for the template image in the lower dimensional space; registering the template image to the nearest target image; identifying the location of the template image on the baseline image based on the position of the nearest target image; and registering the template image to the baseline image at the identified location.
13. The device of claim 12, wherein the baseline image comprises a fundus image.
14. The device of claim 13, wherein the template image comprises an image captured by a portable fundus camera.
15. The device of claim 14, wherein the portable fundus camera comprises a camera embodied in a smartphone or tablet computer configured with apparatus to assist in taking a photograph of the eye.
16. The device of claim 15, wherein the cropping, applying, mapping, finding, registering, identifying, and registering are performed in a processing unit in the smartphone or tablet computer.
17. The device of claim 13, wherein the fundus image is obtained without chemical dilation of the pupil of the subject.
18. The device of claim 12, wherein registering the template image to the nearest target image employs a mutual information procedure.
19. The device of claim 12, wherein applying a dimension reduction method to map the offset target images to a representation in a lower dimensional space and mapping the template image into the lower dimensional space using the dimension reduction method comprises Principal Component Analysis.
20. A non-transitory machine accessible storage medium having stored thereupon a sequence of instructions which, when executed by a processor of a mixed reality device, causes the processor to perform a set of acts for registering a narrow field of view template image to a wide field of view, previously obtained, baseline image, the set of acts comprising: cropping the baseline image into a multitude of smaller offset target images; applying a dimension reduction method to map the offset target images to a representation in a lower dimensional space; mapping the template image into the lower dimensional space using the dimension reduction method; finding the corresponding nearest target image for the template image in the lower dimensional space; registering the template image to the nearest target image; identifying the location of the template image on the baseline image based on the position of the nearest target image; and registering the template image to the baseline image at the identified location.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] The appended drawing figures are offered by way of example and not limitation of currently preferred embodiments of this disclosure.
DETAILED DESCRIPTION
[0041] This document discloses an efficient and accurate retinal matching system and method combining dimension reduction and mutual information (MI); we refer to the technique here as RetinaMatch. By way of overview, the dimension reduction initializes the MI optimization as a coarse localization process, which narrows the optimization domain and avoids local optima. The disclosed system and method outperforms existing template matching solutions. In addition, we disclose a system and method for image mosaicking with area-based registration, providing a robust approach which may be used when feature-based methods fail. To the best of our knowledge, this is the first template matching technique for retina images that handles small template images from unconstrained retinal areas.
[0042] Our approach improves on area-based matching methods that use the MI metric, since those methods achieve accurate and robust template matching reliably only near the alignment position. One unique aspect of our approach is that we combine dimension reduction methods with MI-based registration to reduce the interference of local extrema and improve matching efficiency.
[0043] An example of the practical use of our method in monitoring the retina in a teleophthalmology setting is shown in
[0044] It is also contemplated that the template images captured by the smartphone 12 could be sent over the network 16, 18 to a computing system 22 in the eye clinic and the processing steps of
[0045] Specific examples of the applications of the retinal template matching in a teleophthalmology setting will be discussed at length later in this document.
[0046] With the above description in mind, one of the principal aspects of this disclosure is the application of dimension reduction methods with MI-based registration to reduce the interference of local extrema and improve matching efficiency.
[0048] The process of
[0049] The procedures shown in panels (a) and (b) of
[0050] With the above explanation in mind, attention will now be directed to
[0052] A specific embodiment of our procedure of
1. FIG. 2, Panel (a): Create Target Images from the Full (Baseline) Image
[0053] We define the full image and the template as F and S, respectively. The full image F is split into target images I_1, I_2, . . . , I_N:

I_i = φ(b_i, F).

[0054] The function φ crops the target image I_i from F at b_i, where b_i = [x_i, y_i, h, w], (x_i, y_i) denotes the center position, and (h, w) denotes the height and width of the cropped image. There is a certain displacement, f, of neighboring target images in the x and y axes. As shown in
[0055] Target images are resized to vectors and form the matrix X ∈ ℝ^(n×d).
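By way of illustration and not limitation, the target-image construction of this step may be sketched in Python as follows; the function name, crop sizes, and stride are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

def crop_targets(F, h, w, f):
    """Crop overlapping target images of size (h, w) from the full
    image F, with neighboring crop centers offset by f pixels in the
    x and y axes, and stack the vectorized crops into a matrix X."""
    H, W = F.shape
    centers, patches = [], []
    for y in range(h // 2, H - h // 2, f):
        for x in range(w // 2, W - w // 2, f):
            patch = F[y - h // 2:y + h // 2, x - w // 2:x + w // 2]
            centers.append((x, y))          # b_i center position
            patches.append(patch.ravel())   # resize crop to a vector
    X = np.stack(patches)                   # n x d data matrix
    return X, centers
```

Each row of X is one vectorized target image I_i, and `centers[i]` records its position on the baseline image for later coarse localization.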
2. FIG. 2 Panel (b): Create Low Dimensional Representations of the Target Images with PCA
[0056] Dimension reduction methods allow the construction of low-dimensional summaries while eliminating redundancies and noise in the data. For estimating the template location in the 2D space, the full image dimension is redundant; thus we apply dimension reduction methods for the coarse localization of the template.
[0057] Generally, we can distinguish between linear and nonlinear dimension reduction techniques. The most prominent linear technique is PCA. PCA is selected as the dimension reduction method in RetinaMatch since PCA is simple and versatile. Specifically, PCA forms a set of new variables as weighted linear combinations of the input variables. Consider a matrix X = [x_1, x_2, . . . , x_d] of dimension n×d, where n denotes the number of observations and d is the number of variables. Further, we assume that the matrix X is column-wise mean centered. The idea of PCA is to form a set of uncorrelated new variables (referred to as principal components) as linear combinations of the input variables:

z_i = Xw_i,

where z_i is the ith principal component (PC) and w_i is the corresponding weight vector. The first PC explains most of the variation in the data; the subsequent PCs then account for the remaining variation in descending order. Thereby, PCA imposes the constraint that the weight vectors are orthogonal. This problem can be expressed compactly as the following minimization:

minimize over W: ‖X − XWW^T‖_F subject to W^T W = I,

where ‖·‖_F is the Frobenius norm. The weight matrix W that maps the input data to a subspace turns out to be the matrix of right singular vectors of the input matrix X. Often a low-rank approximation is desirable, e.g., we compute the k dominant PCs using a truncated weight matrix W_k = [w_1, w_2, . . . , w_k], where k is some integer, such as 20.
[0058] PCA is generally computed by the singular value decomposition (SVD). Many algorithms have been developed to streamline the computation of the SVD or PCA for high-dimensional data that exhibits low-dimensional patterns; see J. N. Kutz, et al., Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, 2016, vol. 149. In particular, tremendous strides have been made in accelerating the SVD and related computations using randomized methods for linear algebra. See references 24-31 cited in the manuscript portion of the priority U.S. provisional application. Since we have demonstrated high performance with fewer than 20 principal components, the randomized SVD is used to compute the principal components, improving efficiency in this retinal mapping application for mobile device platforms (e.g., smartphone, tablet). The randomized algorithm proceeds by forming a sketch Y of the input matrix:

Y = XΩ,

where Ω is a d×l random test matrix, say with independent and identically distributed standard normal entries. Thus, the l columns of Y are formed as randomly weighted linear combinations of the columns of the input matrix, providing a basis for the column space of X. Note that l is chosen to be slightly larger than the desired number of principal components. Next, we form an orthonormal basis Q using the QR decomposition Y = QR. Now, we use this basis matrix to project the input data matrix to a low-dimensional space:

B = Q^T X.

[0059] This smaller matrix B of dimension l×d can then be used to efficiently compute the low-rank SVD and subsequently the dominant principal components. Given the SVD of B = UΣV^T, we obtain the approximate principal components as

Z = QUΣ = XV.

[0060] Here, U and V are the left and right singular vectors of B and the diagonal elements of Σ are the corresponding singular values. The approximation accuracy can be controlled via additional oversampling and power iterations.
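The randomized PCA computation described above may be sketched as follows; the oversampling amount, power-iteration count, and function name are illustrative assumptions:

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=2):
    """Approximate the k dominant principal components of the
    (column-wise mean-centered) matrix X via a randomized SVD."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    l = k + oversample                    # sketch slightly larger than k
    Omega = rng.standard_normal((d, l))   # random test matrix
    Y = X @ Omega                         # sketch of the column space of X
    for _ in range(n_iter):               # power iterations sharpen accuracy
        Y = X @ (X.T @ Y)
    Q, _ = np.linalg.qr(Y)                # orthonormal basis, Y = QR
    B = Q.T @ X                           # small l x d projected matrix
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    W = Vt[:k].T                          # truncated weight matrix W_k
    Z = X @ W                             # principal components Z = XV
    return Z, W, s[:k]
```

On data with a genuinely low-dimensional structure (as assumed for the target-image matrix), the recovered singular values agree closely with those of a full SVD at a fraction of the cost.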
[0061] Referring again to panel (b) of

Z = XW,

where Z = [z_1, z_2, z_3, . . . , z_N]^T ∈ ℝ^(n×l), W ∈ ℝ^(d×l), and l ≪ d. The image space Ω_1 is mapped to a low-dimensional space Ω_2 with the mapping W. W and Z are saved in memory, in what we have called a dictionary, D.
[0062] It is important to note that PCA is sensitive to outliers, occlusions, and corruption in the data. In ophthalmological imaging applications, there are several potential sources of corruption and outliers when imaging the full image, including blur, uncorrected astigmatism, inhomogeneous illumination, glare from crystalline lens opacity, internal reflections (e.g., from the vitreoretinal interface and lens), transient floaters in the vitreous, and shot noise in the camera. Further, there is often a trade-off between illumination and image quality, and there is strong motivation to introduce as little light as necessary for patient comfort and health. The robust principal component analysis (RPCA) was introduced specifically to address this issue, decomposing a data matrix into the sum of a matrix containing low-rank coherent structure and a sparse matrix of outliers and corrupt entries. In general, RPCA is more expensive than PCA, requiring an iterative optimization to decompose the original matrix into sparse and low-rank components. Each step of the iteration is as expensive as regular PCA, and typically on the order of tens of iterations are required; however, PCA may be viewed as an offline step in our procedure, so this additional computational cost is manageable. RPCA has been applied with success in retinal imaging applications to improve image quality. In the examples presented in this work, the data appears to have few enough outliers that RPCA is not necessary, although it is important to keep RPCA as an option for data with outliers and corruption. Further details on RPCA are contained in the references cited in the manuscript portion of our prior provisional application.
3. FIG. 2, Panel (c): Coarse Localization: Find the Nearest Target Image in the Low-Dimensional Space
[0063] Given a template S, its coarse position can be estimated by recognizing its nearest target image. The nearest target image in the image space Ω_1 should also be the nearest representation of S in the lower dimensional space Ω_2. Accordingly, we obtain the low-dimensional feature z_s of the template in Ω_2:

z_s = s̃W,

where s̃ ∈ ℝ^d is the reshaped (vectorized) template S. Let d(z_s, z) be the Euclidean distance between z_s and a feature z in Z. Then z* is the nearest target feature of the source image S in Ω_2:

z* = arg min_z d(z_s, z).

[0064] The corresponding target image location is used as the coarse location of S. Ideally, the difference between the coarse location and the ground truth in the x and y axes should be less than f/2 pixels.
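The coarse localization step above may be sketched as follows; the function name and the optional mean-centering argument are illustrative assumptions:

```python
import numpy as np

def coarse_localize(template, W, Z, centers, X_mean=None):
    """Map the vectorized template into the low-dimensional space
    Omega_2 and return the center of the nearest target image."""
    s = template.ravel().astype(float)       # reshaped vector s~
    if X_mean is not None:                   # apply the same centering as PCA
        s = s - X_mean
    z_s = s @ W                              # z_s = s~ W
    dists = np.linalg.norm(Z - z_s, axis=1)  # Euclidean distances d(z_s, z)
    i_star = int(np.argmin(dists))           # index of nearest feature z*
    return centers[i_star], i_star
```

The returned center is the coarse location of the template S on the baseline image, read out of the dictionary D built in the previous step.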
[0065] In one of the experiments we performed, PCA outperformed other, nonlinear dimension reduction methods, but the error was still larger than f/2. The main reason is that image degradation creates spurious features that contribute to the final classification. To reduce the influence of local features, we implement block PCA to further improve the accuracy of the coarse localization. By computing the PCA of different local patches in the template, the effect of local features, which cannot be located correctly, is reduced. This procedure is shown in
[0066] Having obtained the nearest target image, we crop a larger image at the same position from the full image as the new target image I. In this way, the template can have more overlap with the new target image when there is a large offset between the two images. We segment I and the template S into small patches with the cropping function φ̃, where the patch size is smaller than the source image and the axial displacement of neighboring patches is f̃. Similarly, all image patches from I are mapped into the low-dimensional space Ω_3 with W. Let Z denote the low-dimensional representation of the target image distribution. Each template patch is then mapped into the same space with W. The nearest target patch for each template patch is determined with the Euclidean distance as described before. We use the same weight for each region of the template for localization. Let b_m be the mean of the coordinates of the selected nearest target patches, which then represents the center of the template on I. Accordingly, the template location on the full image can be estimated and the region cropped as the image Ĩ. We store the representation of each of the target image patches in the lower dimensional space in memory, in what we refer to as dictionary T. The accurate registration is then applied to the template S and image Ĩ. In this way the coarse localization provides a good initial point for the accurate registration (panel (d) of
[0067] In the implementation of the proposed coarse localization, the full (baseline) image is assumed to exist so the dictionary D and dictionary T for each target image can be built in advance. This is the pre-computed part as shown in
Example Processing Instructions for Coarse Localization:
[0069]
1. Map template S into space Ω_2: z_s = s̃W.
2. Determine the closest target image I with corresponding z*: z* = arg min_z d(z_s, z), z* ∈ Z.
3. Segment S into [S_p^1, S_p^2, . . . , S_p^n]: S_p^i = φ̃(b_i, S); segment I into [I_p^1, I_p^2, . . . , I_p^n]: I_p^i = φ̃(b_i, I).
4. Map the target patches I_p^i into space Ω_3: Z = I_p W, where I_p is formed from the vectorized I_p^i.
5. For each template patch S_p^i:
6. (i) Map S_p^i into space Ω_3: z̃_s^i = S_p^i W.
7. (ii) Determine its closest target patch I_p^Idx(i) with index Idx(i).
8. Compute b_m, the mean of the coordinates of the selected nearest target patches, as the estimated center of the template on I.
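The block PCA refinement described above may be sketched as follows; the patch size, stride, number of components, and function names are illustrative assumptions, and for simplicity the PCA basis here is learned from the target patches themselves:

```python
import numpy as np

def block_localize(template, target, patch, step, k=8):
    """Refine the coarse location by block PCA: match each template
    patch to its nearest target patch in a low-dimensional space and
    average the matched patch coordinates (the b_m of the text)."""
    def patches(img):
        ph, pw = patch
        out, pos = [], []
        H, W = img.shape
        for y in range(0, H - ph + 1, step):
            for x in range(0, W - pw + 1, step):
                out.append(img[y:y + ph, x:x + pw].ravel())
                pos.append((x + pw // 2, y + ph // 2))  # patch center
        return np.stack(out).astype(float), pos

    Ip, ipos = patches(target)            # target patches and centers
    Sp, spos = patches(template)          # template patches
    Ic = Ip - Ip.mean(axis=0)
    _, _, Vt = np.linalg.svd(Ic, full_matrices=False)
    W = Vt[:k].T                          # PCA mapping into Omega_3
    Zt = Ic @ W                           # target patch features
    Zs = (Sp - Ip.mean(axis=0)) @ W       # template patch features
    # nearest target patch for every template patch (Euclidean distance)
    idx = [int(np.argmin(np.linalg.norm(Zt - z, axis=1))) for z in Zs]
    bm = np.mean([ipos[i] for i in idx], axis=0)  # estimated center b_m
    return bm
```

Averaging the matched patch centers reduces the influence of individual mislocated patches, which is the stated purpose of the block PCA step.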
4. FIG. 2, Panel (d): Accurate Registration and Location of the Template Onto the Baseline
[0070] Panel (d) of
(1) Image Registration Between Template and Nearest Target Image Using Mutual Information (MI) (FIG. 4 step 310)
[0071] In this section, we describe the maximization of MI for multimodal image registration. We define images S and Ĩ as the template and target images, respectively. A transform u is defined to map pixel locations x ∈ S to pixel locations in Ĩ.
[0072] The main idea of the registration is to find a deformation u at each pixel location x that maximizes the MI between the deformed template image S(u(x)) and the target image Ĩ(x). Accordingly,

u* = arg max_u MI(S(u(x)), Ĩ(x)), where MI = Σ_(i_1, i_2) p(i_1, i_2) log [ p(i_1, i_2) / (p(i_1) p(i_2)) ].

Here, i_1 and i_2 are the image intensity values in S(u(x)) and Ĩ(x), respectively; p(i_1) and p(i_2) are their marginal probability distributions, while p(i_1, i_2) is their joint probability distribution. The joint probability p(i_1, i_2) reflects the degree to which the grayscale (image intensity) values of corresponding pixels in S(u(x)) and Ĩ(x) co-occur: it has a high value when the pixel values frequently occur together, and a low value otherwise. In more detail, for discrete data like images, each pixel has a grayscale value from 0 to 255. (Although examples herein may describe use of grayscale images for the fundus image work, embodiments are not so limited and may also employ color images as appropriate.) We first compute the joint histogram of the two images: the joint histogram is 256×256 and counts the co-occurrences of corresponding pixels' grayscale values from the two images. For example, if at the first pixel one image has a grayscale value of 100 and the other 120, then the joint histogram entry (100, 120) is incremented by one. After the joint histogram is complete, the joint probability p(i_1, i_2) is obtained by normalizing the joint histogram. Then the marginal probabilities are computed according to:

p(i_1) = Σ_(i_2) p(i_1, i_2),  p(i_2) = Σ_(i_1) p(i_1, i_2).
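The joint-histogram computation of MI described above may be sketched as follows; the function name and bin count are illustrative assumptions:

```python
import numpy as np

def mutual_information(img1, img2, bins=256):
    """Mutual information from the normalized 256x256 joint histogram
    of two equally sized grayscale images with values in [0, 255]."""
    h, _, _ = np.histogram2d(img1.ravel(), img2.ravel(),
                             bins=bins, range=[[0, 256], [0, 256]])
    p12 = h / h.sum()                 # joint probability p(i1, i2)
    p1 = p12.sum(axis=1)              # marginal p(i1)
    p2 = p12.sum(axis=0)              # marginal p(i2)
    nz = p12 > 0                      # avoid log(0) on empty bins
    return float((p12[nz] * np.log(p12[nz] / np.outer(p1, p2)[nz])).sum())
```

For identical images the MI equals the image entropy, and it decreases as the two images become statistically independent, which is what the registration optimizer exploits.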
(2) Locate the Template Onto the Full Image (FIG. 4 step 312)
[0073] In this step images S and Ĩ are accurately registered by maximization of mutual information, as per sub-step (d)(1) above. The location of image Ĩ on the full image F becomes the estimated displacement of the template S. In our work, the transform u for alignment is given as an affine transformation:

u(x) = Ax + t,

where A is a 2×2 matrix encoding rotation, scale, and shear, and t is a translation vector.
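For illustration, a minimal registration search over integer translations that maximizes MI may be sketched as follows; a full affine fit would add rotation, scale, and shear parameters to this search, and the function names, bin count, and search range are assumptions:

```python
import numpy as np

def mi(a, b, bins=32):
    """Mutual information of two equally sized images."""
    h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = h / h.sum()
    p1, p2 = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / np.outer(p1, p2)[nz])).sum())

def register_translation(template, target, search=5):
    """Find the integer translation (dx, dy) of the template inside
    the (larger) target window that maximizes mutual information."""
    th, tw = template.shape
    best, best_mi = (0, 0), -np.inf
    for dy in range(search + 1):
        for dx in range(search + 1):
            window = target[dy:dy + th, dx:dx + tw]
            m = mi(template, window)
            if m > best_mi:
                best_mi, best = m, (dx, dy)
    return best
```

In practice the coarse localization of the previous step keeps this search domain small, which is precisely how RetinaMatch avoids the local optima of a global MI optimization.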
[0074] It will be appreciated that the processing to create the target images and map them into lower dimensional space (panels (a) and (b) of
Image Mosaicking
[0075]
[0076] As pointed out previously, the full retina image can be stitched into a panorama from many small templates. Users must capture a series of images in naturally unconstrained eye positions to explore different regions of the retina. It is problematic to determine which images are adjacent before registration when applying area-based registration approaches, because at that stage there are no effective descriptors for matching.
[0077] Related to the dimension reduction in the proposed template matching method, here we present the procedure shown in the table below to learn the positional relationship of images to be stitched. In this way, the adjacent images can be recognized and registered efficiently.
[0078] For a series of small images X_i, we form the matrix X. PCA is applied to X and returns the low-dimensional features for each image in Ω_2. The distance between features in Ω_2 indicates the distance between images. We find the nearest N (e.g., N=3) target neighbors in the low-dimensional space. The nearest neighbor X_j of image X_i is the one with the largest overlap; the image pair is then registered with the MI-based approach. To improve the algorithm's robustness, the N nearest neighbors of each image are first selected to compute MI with, and we keep the one with the largest metric value. The above procedure can be represented in the following pseudocode.
Processing Instructions: Image Stitching (with Reference to FIG. 6)
[0079]
1. Map images into space Ω_2: Z = XW. (step 502)
2. For each image X_i:
3. (i) Find the nearest N (e.g., N=3) neighbors X_j minimizing the feature distance d(Z_i, Z_j). (step 504)
4. (ii) Compute the mutual information between each X_j and X_i and take the adjacent image with the highest MI. (steps 506, 508)
5. Panorama mosaicking: align all the adjacent images with the mutual-information-based registration method. (step 510)
6. Panorama blending. (step 512)
7. Return mosaicked panorama R. (step 514)
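The neighbor-selection step of the stitching procedure above may be sketched as follows; the function name, number of retained components, and neighbor count are illustrative assumptions, and the subsequent MI ranking and blending steps are omitted for brevity:

```python
import numpy as np

def nearest_neighbors(X, n_neighbors=3, k=10):
    """For each vectorized image (row of X), find the N nearest
    neighbors in the PCA space Omega_2; these are the candidate
    adjacent images passed on to MI-based registration."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                       # low-dimensional features
    nbrs = []
    for i in range(len(Z)):
        d = np.linalg.norm(Z - Z[i], axis=1)
        d[i] = np.inf                       # exclude the image itself
        nbrs.append(list(np.argsort(d)[:n_neighbors]))
    return nbrs
```

Because feature distance in Ω_2 tracks image similarity, images cropped from nearby retinal regions surface as each other's nearest neighbors, so only those candidate pairs need the more expensive MI computation.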
APPLICATIONS
[0086] Our method of template matching with baseline images and image mosaicking allows for longitudinal comparisons with previously obtained fundus images of the patient. Such longitudinal comparisons have several applications in the field of ophthalmology as will be described below. Such applications are examples of how the methods of this disclosure can be practiced in a teleophthalmology setting. Other suitable applications are also supported by the embodiments described herein, including options outside of the field of retinal template matching.
Hypertension
[0087] In the retinal symptom of hypertension, the larger arteries constrict and the venous vessels enlarge in diameter. Ophthalmologists can select several detection points on the vessels. With the captured images coming from the patient as per
Abusive Head Trauma
[0088] The biomarkers of abusive head trauma (AHT) are another example. The most common retinal manifestation of AHT is multiple retinal hemorrhages in multiple layers of the retina. By matching the captured images onto the full retina image, the hemorrhagic spots can be easily segmented after subtraction of the current retina regions from their previous status. AHT can then be recognized automatically when such spots are detected. This method permits identification of AHT from images obtained with portable fundus cameras.
Diabetic Retinopathy
[0089] The obvious symptoms of diabetic retinopathy (DR) are retinal hemorrhages and the presence of exudate. They can be monitored following a process similar to that of AHT screening.
Glaucoma
[0090] Glaucoma can cause the optic nerve cup to enlarge. Our matching method can automatically select the images that cover the optic nerve. The following segmentation can be easily implemented and a computation of the optic cup diameter performed. Enlargement of the optic nerve cup over time can be ascertained by comparing the computations from a current image with an image from a previous point in time.
Use RetinaMatch as a General Image Template Matching Method
[0091] Besides retina images, the RetinaMatch technique can be used in other types of image template matching tasks. Note that our method of
Use RetinaMatch for Camera Localization
[0092] Having the image of the full view, our method of
Augmented Reality (AR), Eye Glasses, etc. and Monitoring Changes Over Time
[0093] A retinal imaging system (e.g., a consumer-grade camera with an ancillary imaging device, e.g., D-eye) can be portable and, further, can be worn integrated into, for example, glasses or an Augmented Reality (AR), Virtual Reality (VR) and/or Mixed Reality (MR) headset, allowing a series of images to be taken and analyzed daily, weekly, monthly, or when the user or ophthalmologist requests. These measurements can be discrete or continual, but in either case form a time series that can be analyzed longitudinally over an increasing time period. Change in a retina can be detected by registering and comparing the captured small-FOV images to a full baseline retina image using our template matching method.
[0094] AR, VR and/or MR devices can be used to optically scan the retina to form images and thereby acquire the template images. Even more pragmatically, spectacles or sunglasses can be used because of the smaller size, lower costs, and increasing utility to the user. A scanned light beam entering the pupil of the eye and striking the retina to form video rate images perceived by the user's brain can also be used to acquire images of high contrast structures, such as the vasculature containing blood.
[0095] A device can operate without major changes in performance during its lifetime and can be used as a monitor of the condition of a user's eye. By comparing retinal images from such a device over time, changes in the user's optical system (such as the cornea, intraocular lens, retina, and liquid environments) can be monitored to alert the user to possible health changes. For example, these changes can be gradual, like increasing light scattering from the crystalline or intraocular lens due to cataract formation, or the appearance of and structural changes in the retina due to diabetic retinopathy. In addition, chronic diseases which may produce variations over time in blood vessel size and shape, as in hypertension, are another example. Acute changes such as bleeding within the retina can indicate brain trauma. Relative and repeatable changes in the number, size, and shape of structures in the retinal images may indicate that the measured change is due to a particular disease type and not that the AR, VR, MR, glasses, or other type of monitoring device has slowly or suddenly changed its imaging performance or has become unstable.
[0096] However, in many healthy users the optical system will be unchanging over time. In this case, the vasculature of the retina can be used as a test target for detecting optical misalignments, focus errors, light scanning errors and distortions, non-uniformity and color imbalance in the illumination, and aberrations in the imaging system. This situation can occur if the monitoring device, such as an AR, VR, or MR device, is degraded due to mechanical impact, breakage, applied stresses, applied vibration, thermal changes, or opto-electrical disruption or interference. These changes can be observed as a measurable change in the current retinal images compared to those captured before the changes happened to the AR, VR or MR device. Retinal vasculature images can be used to measure the level of image distortion within an imaging system by resolving a specific pattern of high contrast lines. By processing the retinal images or their panoramic mosaic into binary (black and white) high contrast by intensity thresholding and/or segmentation, the vascular network can be made into a RetinaTest Target.
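By way of example and not limitation, the intensity thresholding step that turns the vascular network into a binary test target may be sketched as follows; the function name and the mean-intensity default threshold are illustrative assumptions:

```python
import numpy as np

def retina_test_target(img, thresh=None):
    """Binarize a retinal image (or panoramic mosaic) by intensity
    thresholding so the high-contrast vascular network can serve as
    a test target; the threshold defaults to the mean intensity."""
    if thresh is None:
        thresh = img.mean()
    # vessels are darker than the surrounding fundus background
    return (img < thresh).astype(np.uint8)
```

More elaborate vessel segmentation could replace this simple threshold, but even a binary map of the vasculature suffices as a repeatable high-contrast pattern for distortion measurement.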
[0097] By measuring the change in the images of the RetinaTest Target before and after a change in performance of the AR, VR or MR device, a calibration measurement of imaging performance can be made dynamically. This calibration measurement can be transmitted to a local computing device or to a remote location for analysis and diagnosis of the change of performance of the AR, VR or MR device. Furthermore, the calibration measurement can be updated when corrective actions are implemented within the AR, VR or MR device, which can be used in a feedback loop as an error signal for the purpose of regaining optimal performance of the AR, VR or MR device. Since blood has a distinct optical absorption spectrum in the arteries and veins and scattering differences can be determined, the calibration of imaging performance should be performed across the spectral range of visible to near-infrared wavelengths being used by the AR, VR or MR device.
Gaze Tracking
[0098] The acquisition of template images and their registration onto a baseline image as described above can further be used to determine the gaze position of the user. In particular, as the user's gaze changes position, the angle between the optical axis of the camera and the fovea or other structures at the back of the eye changes accordingly, and by measuring the shift in this angle the gaze position can be determined.
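As a simplified illustration of this angular readout, the registered template position can be converted to an approximate gaze angle; the function name and the angular-scale calibration constant `deg_per_pixel` are hypothetical assumptions introduced here, not parameters named in the disclosure:

```python
import numpy as np

def gaze_shift(template_center, fovea_center, deg_per_pixel):
    """Convert the registered template position on the baseline image
    into an approximate (horizontal, vertical) gaze angle in degrees,
    given an assumed angular scale of the baseline fundus image."""
    dx, dy = (np.asarray(template_center, dtype=float)
              - np.asarray(fovea_center, dtype=float))
    return dx * deg_per_pixel, dy * deg_per_pixel
```

In a real device the angular scale would come from the camera and eye geometry, and the linear mapping here is only a small-angle approximation.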
[0099] While the above discussion has been directed primarily to detecting changes in the retina and monitoring for change, progression, occurrence etc. of eye disease, more generally the present methods can be used to monitor for other conditions (e.g., diabetes, etc.) that are not retinal conditions per se, but that may be measured in the retina. Furthermore, our methods can be also used to monitor improvement in a condition of the retina, for example, monitor effectiveness of a treatment or therapy, in addition to detecting onset or worsening of disease.
[0100] Other applications are of course possible as would be apparent to one skilled in the art.
[0101] The manuscript portion of our priority U.S. provisional application includes data regarding experiments we conducted using our template matching method, including validation on a set of simulated images from the STARE dataset, and in-vivo template images captured with the D-eye smartphone device matched to full fundus images and mosaicked full images. The interested reader is directed to that portion of the provisional application for further details.
[0102] As used in the claims, the term head-worn retinal imaging device is intended to refer broadly to any device worn or supported by the head which includes a detector or camera and associated optical components designed for imaging the retina, including but not limited to glasses, and augmented, mixed or virtual reality headsets. As another example, devices which include scanned light (from laser or LED) display using a near-infrared (NIR) wavelength can also be a camera with the addition of a fast NIR detector, and such a device could be adapted as a head-worn retinal imaging device.
[0103] The particulars shown herein are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of various embodiments of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for the fundamental understanding of the invention, the description taken with the drawings and/or examples making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
[0104] As used herein and unless otherwise indicated, the terms "a" and "an" are taken to mean "one," "at least one," or "one or more." Unless otherwise required by context, singular terms used herein shall include pluralities and plural terms shall include the singular. Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to." Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words "herein," "above," and "below" and words of similar import, when used in this application, refer to this application as a whole and not to any particular portions of the application.