System and method for automated funduscopic image analysis

Abstract

A system and method of classifying images of pathology. An image is received, normalized, and segmented into a plurality of regions. A disease vector is automatically determined for the plurality of regions with at least one classifier comprising a neural network. Each of the respective plurality of regions is automatically annotated, based on the determined disease vectors. The received image is automatically graded based on at least the annotations. The neural network is trained based on at least an expert annotation of respective regions of images, according to at least one objective classification criterion. The images may be eye images, vascular images, or funduscopic images. The disease may be a vascular disease, vasculopathy, or diabetic retinopathy, for example.

Claims

1. A method of classifying eye images, comprising: normalizing and segmenting a received eye image into a normalized and segmented plurality of regular regions; automatically determining an eye disease vector for each of the plurality of regions with at least one classifier comprising a neural network; automatically annotating each of the respective plurality of regions, based on the determined eye disease vectors; and automatically determining a grade of the received eye image with respect to at least two grades, dependent on a probability of a type I error of the grade, and a probability of a type II error of the grade, based on at least the annotations, wherein the neural network is trained based on at least an expert annotation of respective regions of eye images, according to at least one objective classification criterion.

2. The method according to claim 1, wherein the neural network comprises a plurality of hidden layers.

3. The method according to claim 2, wherein the neural network comprises a deep neural network.

4. The method according to claim 1, wherein the neural network comprises a deep convolutional neural network.

5. The method according to claim 1, wherein the neural network comprises a recurrent neural network, further comprising receiving a second eye image acquired at a time different from the eye image; and outputting information dependent on at least a change in the eye image over time.

6. The method according to claim 1, further comprising receiving at least an expert annotation of a quality of respective regions of a retinal image.

7. The method according to claim 1, wherein the at least one classifier comprises a multi-class support vector machine classifier.

8. The method according to claim 1, wherein the at least one classifier comprises a Gradient Boosting Classifier.

9. The method according to claim 1, wherein the at least one classifier classifies a respective region based on the eye disease vectors of a plurality of respective regions.

10. The method according to claim 1, wherein the at least one classifier classifies a respective region based on the eye disease vectors of a plurality of contiguous regions.

11. The method according to claim 1, wherein the at least one classifier comprises a classifier for determining a region having indicia of diabetic retinopathy.

12. The method according to claim 11, wherein the grade comprises a degree of diabetic retinopathy with respect to at least three different grades.

13. The method according to claim 1, further comprising outputting an image representing the classification of regions of the eye image.

14. The method according to claim 1, wherein the eye images comprise funduscopic images, and the annotations comprise at least an indication of diabetic retinopathy within a respective region of a respective funduscopic image.

15. A system for classifying eye images, comprising: an input configured to receive at least one eye image; a memory configured to store information defining a neural network trained based on at least an expert annotation of respective regions of a plurality of retinal images, according to at least one objective classification criterion; at least one automated processor, configured to: normalize and segment the eye image into a plurality of regular regions; determine an eye disease vector for each of the plurality of regions of the eye image with at least one classifier comprising the defined neural network; annotate each of the respective plurality of regions, based on the determined eye disease vectors; grade the received eye image with respect to at least two grades, based on at least the annotations; and determine a probability of correctness of the grade dependent on a probability of a type I error of the grade and a probability of a type II error of the grade; and an output configured to communicate the grade.

16. The system according to claim 15, wherein the neural network comprises a deep neural network having a plurality of hidden layers.

17. The system according to claim 15, wherein the neural network comprises a recurrent neural network, the input being further configured to receive a second eye image acquired at a time different from the eye image; and the at least one automated processor being further configured to analyze at least a change in the eye image over time.

18. The system according to claim 15, wherein the neural network comprises a recurrent neural network, the grade comprises a severity of diabetic retinopathy, and the grade is selected from at least three different grades.

19. A computer readable medium, storing non-transitory instructions for controlling at least one automated processor, comprising: instructions for determining a pathology vector relating to existence of pathology in an anatomical region, for each of a plurality of regular portions of a segmented normalized image of the anatomical region, with at least one classifier comprising a neural network trained based on at least an expert annotation of respective portions of segmented normalized images of the anatomical region, according to at least one objective classification criterion for the pathology; instructions for automatically annotating each of the respective plurality of portions, based on the determined pathology vectors; and instructions for automatically grading the received image with respect to the pathology based on the automatically annotated respective plurality of portions, with respect to at least two grades and determining a probability of correctness of the grading comprising a probability of a type I error and a probability of a type II error.

20. The computer readable medium according to claim 19, wherein the neural network comprises a recurrent neural network, the pathology comprises diabetic retinopathy, and the grading is with respect to at least three different grades.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows a flowchart according to an embodiment of the invention.

(2) FIGS. 2A-2D show retinal images.

(3) FIGS. 3 and 4 show a retinal image and round and square regions thereof, respectively.

(4) FIGS. 5-23 show various flowcharts according to respective embodiments of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

(5) FIG. 1 illustrates an embodiment of the invention. Given a retinal fundus image I, block 100 performs image normalization to produce a normalized image I. This normalization step involves identifying the retinal disc in I and resizing it to a fixed size, i.e., to a diameter of p pixels in an image of p×p pixels, where, typically, p=1024. This is illustrated in FIGS. 2A-2D, where FIGS. 2A and 2C show pre-normalized retinal fundus images I and FIGS. 2B and 2D show the corresponding normalized retinal fundus images I. Automatic detection of the retinal disc is a straightforward task which entails the detection of a relatively bright disc on a relatively dark and uniform background. As such, a number of well-known image processing techniques may be used, such as edge detection, circular Hough transform, etc. More details of such techniques may be found in Russ, John C. The image processing handbook. CRC press, 2016.
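By way of illustration only, the resizing step may be sketched in Python as follows. The sketch assumes that a separate detection step (for example a circular Hough transform, not shown) has already supplied the disc center and radius; the function names `normalization_params` and `to_normalized` are hypothetical and are not part of the described embodiment.

```python
def normalization_params(cx, cy, radius, p=1024):
    """Return (scale, left, top) mapping a detected disc onto a p x p image.

    cx, cy, radius: disc center and radius in source pixels. These are
    assumed inputs from a prior detection step that is not shown here.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    scale = p / (2.0 * radius)  # the disc diameter becomes p pixels
    left = cx - radius          # origin of the crop window in source pixels
    top = cy - radius
    return scale, left, top


def to_normalized(x, y, params):
    """Map a source pixel coordinate into the normalized p x p image."""
    scale, left, top = params
    return (x - left) * scale, (y - top) * scale
```

In this sketch the disc center always maps to (p/2, p/2), so region coordinates taken from different source images become directly comparable.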

(6) Next, in block 110 of FIG. 1, retinal regions are extracted from normalized image I. This process entails extracting small retinal regions from the retinal disc. In some embodiments of the invention these regions may be fully contained in the retinal disc, while in other embodiments of the invention these regions may be partially contained in the retinal disc, for example containing at least 50% or 75% of retinal pixels. In some embodiments of the invention these regions may be circular and with a diameter of q pixels, where q=32 or q=64, while in other embodiments of the invention these regions may be square and with size q×q pixels, where q=32 or q=64. In alternative embodiments of the invention, a different region geometry may be used, for example a hexagonal region.

(7) More generally, a continuous region of pixels is defined, of arbitrary geometry. Indeed, in some cases, the region may be topologically discontiguous, or the analysis may be of the image information in a domain other than pixels.

(8) FIG. 3 illustrates the extraction of two circular retinal regions R(x,y), centered on (x,y)=(579,209) and (x,y)=(603,226), from a normalized image I, while FIG. 4 illustrates the extraction of two square retinal regions R(x,y), centered on (x,y)=(757,327) and (x,y)=(782,363), from a normalized image I. In all cases, the regions are typically extracted in a regular pattern and with a significant degree of overlap; for example, starting at the top-most left-most region, successive regions are extracted by moving with a stride of s pixels to the right up to the top-most right-most region, then moving s pixels down and extracting successive regions from the left-most to the right-most, again moving right by s pixels, and so on. Typically s takes a low value, i.e., between 1 and a few pixels. Different embodiments of the invention may use a higher value for s and/or may extract non-overlapping regions. In all cases, block 110 produces a set of n retinal regions R.sub.0, . . . , R.sub.n-1, collectively denoted by R and with R.sub.k=R(x,y) denoting a region centered on (x,y) of normalized image I.
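The raster-order extraction described above may be sketched as follows, for square regions only. This is a minimal illustration under stated assumptions: q is taken to be even, the retinal disc is assumed to be centered in the p×p image with diameter p, and the names `region_centers`, `disc_fraction`, `extract_centers` and `min_fraction` are hypothetical.

```python
def region_centers(p, q, s):
    """Yield centers (x, y) of q x q regions in raster order with stride s."""
    half = q // 2
    for y in range(half, p - half + 1, s):      # top to bottom
        for x in range(half, p - half + 1, s):  # left to right
            yield x, y


def disc_fraction(x, y, p, q):
    """Fraction of the q x q region centered on (x, y) lying inside the disc.

    The disc is assumed to be centered at (p/2, p/2) with diameter p.
    """
    half = q // 2
    c = r = p / 2.0
    inside = 0
    for v in range(y - half, y + half):
        for u in range(x - half, x + half):
            # Test the center of pixel (u, v) against the disc boundary.
            if (u + 0.5 - c) ** 2 + (v + 0.5 - c) ** 2 <= r * r:
                inside += 1
    return inside / float(q * q)


def extract_centers(p=1024, q=32, s=4, min_fraction=0.75):
    """Return centers of regions containing at least min_fraction retinal pixels."""
    return [(x, y) for x, y in region_centers(p, q, s)
            if disc_fraction(x, y, p, q) >= min_fraction]
```

Setting `min_fraction=1.0` corresponds to the embodiments in which regions are fully contained in the retinal disc, while 0.5 or 0.75 corresponds to the partially contained case.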

(9) We now consider the operation of block 120 of FIG. 1. Block 120 of FIG. 1 analyzes all the retinal regions independently and for each region R.sub.k produces a DR lesion classification vector D.sub.k=(D.sub.k.sup.0, . . . , D.sub.k.sup.l-1) where l is the number of classes that the classifier of block 120 has been trained on and D.sub.k.sup.c is the class membership probability of region k for class c.

(10) As a first example, in one embodiment of the invention, the classifier in block 120 may be trained to classify regions to l=2 classes, with c=0 the Healthy class label and c=1 the Diseased class label.

(11) As a second example, in a different embodiment of the invention, the classifier in block 120 may be trained to classify regions to l>2 classes, with c=0 the Healthy class label, c=1 the Micro Aneurism class label, c=2 the Hard Exudate class label, . . . , c=l-1 the Laser Scar class label. Thus, in such an embodiment, different types of lesions, artifacts or scars that may be present in the retinal fundus image are assigned separate class labels.

(12) Thus, different embodiments of the invention may employ classifiers trained on different numbers of classes to achieve the highest possible performance in detecting DR lesions and/or identifying the types of detected DR lesions. In all cases, in alternative embodiments of the invention, the classifier in block 120 may be configured to output D.sub.k.sup.c as a binary 0 or 1 value rather than a class membership probability, in which case D.sub.k is not a class probability distribution but a classification decision vector.
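The difference between a class probability distribution and a classification decision vector may be illustrated with a short sketch. The text does not specify how the binary values are obtained; selecting the most probable class (argmax) is an assumption made here for illustration, and the function name `to_decision_vector` is hypothetical.

```python
def to_decision_vector(probs):
    """Convert a class probability vector into a one-hot decision vector.

    Assumption: the decision is the single most probable class. On ties,
    the lowest class index wins.
    """
    if not probs:
        raise ValueError("probability vector must not be empty")
    best = max(range(len(probs)), key=lambda c: probs[c])
    return [1 if c == best else 0 for c in range(len(probs))]
```

For a region with probabilities (0.1, 0.7, 0.2) over Healthy, Micro Aneurism and Hard Exudate, this sketch would yield the decision vector (0, 1, 0).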

(13) We now consider the architecture of block 120 of FIG. 1 in more detail. In one embodiment of the invention, block 120 of FIG. 1 comprises the architecture illustrated in FIG. 5. In block 500, retinal regions are optionally normalized. This optional normalization may, for example, involve mean subtraction, resulting in the mean intensity of each color channel of each retinal region taking a value of 0, or some other mean adjustment, resulting in the mean intensity of each color channel of each retinal region taking a predefined specific value, or some geometric normalization, for example rotation normalization, whereby each region is rotated so that its center of mass lies at a specific angle. Thus, for each retinal region R.sub.k, block 500 produces the normalized region R.sub.k. Then, in block 510 a CNN is used to classify each region R.sub.k and produce a classification vector D.sub.k. The CNN has been previously trained using a large number of training samples, i.e., training labelled regions. As seen earlier, different embodiments of the invention may employ CNNs trained on only two classes, for example Healthy and Diseased, or more classes, for example Healthy, Micro Aneurism, Hard Exudate and so on, and, in all cases, the CNN may output D.sub.k as a class probability distribution or as a classification decision vector.
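The optional mean adjustment of block 500 may be sketched as follows. The representation of a region as a list of rows of per-channel tuples, and the names `subtract_channel_means` and `target`, are assumptions made for illustration; a practical implementation would more likely operate on arrays.

```python
def subtract_channel_means(region, target=0.0):
    """Shift each color channel of a region so that its mean equals target.

    region: list of rows, each row a list of per-channel tuples, e.g. (r, g, b).
    target=0.0 gives plain mean subtraction; another value gives the
    'predefined specific value' variant.
    """
    pixels = [px for row in region for px in row]
    n = float(len(pixels))
    channels = len(pixels[0])
    means = [sum(px[ch] for px in pixels) / n for ch in range(channels)]
    return [[tuple(px[ch] - means[ch] + target for ch in range(channels))
             for px in row]
            for row in region]
```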

(14) FIG. 6 illustrates the architecture of block 510 of FIG. 5 in more detail. As can be seen in FIG. 6, each region R.sub.k is processed independently to produce its classification vector D.sub.k. The CNN of FIG. 6 is a common architecture, employing a number of convolution layers, which convolve their input with learned convolution masks, interleaved with pooling layers, i.e. spatial subsampling layers, followed by a number of fully connected layers to produce the DR lesion classification vector D.sub.k for region R.sub.k. The theory of CNNs (en.wikipedia.org/wiki/Convolutional_neural_network) is not covered here but there are numerous publications on the topic, for example Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems. 2012.

(15) In another embodiment of the invention, block 120 of FIG. 1 comprises the architecture illustrated in FIG. 7. Block 700 is optional and operates in the same fashion as block 500 of FIG. 5. Thus, for each retinal region R.sub.k, block 700 produces the normalized region R.sub.k. Then, in block 710 a CNN is used to produce a high-dimensional vector of distinguishing features V.sub.k for each region R.sub.k.

(16) FIG. 8 illustrates the architecture of block 710. As can be seen in FIG. 8, the CNN used to produce the feature vector V.sub.k is the same as the CNN of FIG. 6 with V.sub.k taken at the output of one of the layers preceding the final fully connected classification layer. Then, back to FIG. 7, in block 720 a classifier is used to classify each feature vector V.sub.k and produce a classification vector D.sub.k. There are various options for the classifier of block 720, such as a binary or multi-class Support Vector Machine (SVM) or Gradient Boosting Classifier (GBC), previously trained on the feature vectors of a set of training labelled regions. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output D.sub.k as a class probability distribution or as a classification decision vector. The theory of SVMs (en.wikipedia.org/wiki/Support_vector_machine) and GBCs (en.wikipedia.org/wiki/Gradient_boosting) is not covered here but there are numerous publications on the topics, for example Burges, Christopher J C. A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery 2.2 (1998): 121-167 and Schapire, Robert E., and Yoav Freund. Boosting: Foundations and algorithms. MIT press, 2012.

(17) In another embodiment of the invention, block 120 of FIG. 1 comprises the architecture illustrated in FIG. 9. Block 900 is optional and operates in the same fashion as block 500 of FIG. 5 or block 700 of FIG. 7. Thus, for each retinal region R.sub.k, block 900 produces the normalized region R.sub.k. Then, in block 910 a dimensionality reduction technique, such as Linear Discriminant Analysis (LDA en.wikipedia.org/wiki/Linear_discriminant_analysis) or Principal Components Analysis (PCA en.wikipedia.org/wiki/Principal_component_analysis), is used to produce a reduced dimensionality representation of each region R.sub.k which is used as a feature vector V.sub.k for that region. Then, in block 920 a classifier is used to classify each feature vector V.sub.k and produce a classification vector D.sub.k. As in the previous embodiment, binary or multi-class SVM or Gradient Boosting Classifier, previously trained on the feature vectors of a set of training labelled regions, are typically good choices for the classifier. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output D.sub.k as a class probability distribution or as a classification decision vector.

(18) We now consider the operation of block 130 of FIG. 1. Block 130 of FIG. 1 analyzes all the retinal regions independently and for each region R.sub.k produces a quality classification vector Q.sub.k=(Q.sub.k.sup.0, . . . , Q.sub.k.sup.m-1) where m is the number of classes that the classifier of block 130 has been trained on and Q.sub.k.sup.z is the class membership probability of region k for class z. As a first example, in one embodiment of the invention, the classifier in block 130 may be trained to classify regions to m=2 classes, with z=0 the Good Quality class label and z=1 the Bad Quality class label. As a second example, in a different embodiment of the invention, the classifier in block 130 may be trained to classify regions to m>2 classes, with z=0 the Good Quality class label, z=1 the Bad Illumination class label, z=2 the Bad Contrast class label, . . . , z=m-1 the Lens Flare class label.

(19) Thus, in such an embodiment, different types of quality artifacts and imaging artifacts that may be present in the retinal fundus image are assigned separate class labels. Various embodiments of the invention may employ classifiers trained on different numbers of classes to achieve the highest possible performance in assessing the quality of retinal regions. It should be noted that the choice of the number of classes for the region quality classifier is independent of the choice of the number of classes for the DR lesion classifier. In all cases, in alternative embodiments of the invention, the classifier in block 130 may be configured to output Q.sub.k.sup.z as a binary 0 or 1 value rather than a class membership probability, in which case Q.sub.k is not a class probability distribution but a classification decision vector.

(20) We now consider the architecture of block 130 of FIG. 1 in more detail. In one embodiment of the invention, block 130 of FIG. 1 comprises the architecture illustrated in FIG. 10. In block 1000, retinal regions are optionally normalized. This optional normalization may, for example, involve intensity or geometric normalizations. Thus, for each retinal region R.sub.k, block 1000 produces the normalized region R.sub.k. Then, in block 1010 a CNN is used to classify each region R.sub.k according to its quality and produce a quality classification vector Q.sub.k. The CNN has been previously trained using a large number of training samples, i.e., training labelled regions. As seen earlier, different embodiments of the invention may employ CNNs trained on only two classes, for example Good Quality and Bad Quality, or more classes, for example Good Quality, Bad Illumination, Bad Contrast and so on, and, in all cases, the CNN may output Q.sub.k as a class probability distribution or as a classification decision vector.

(21) FIG. 11 illustrates the internal architecture of the CNN of block 1010; this is substantially the same as the CNN architecture of FIG. 6 described earlier, although network parameters such as the size, stride and number of filters, the learned convolution masks, the number of convolution layers, the number of fully connected layers and so on may be different.

(22) In another embodiment of the invention, block 130 of FIG. 1 comprises the architecture illustrated in FIG. 12. Block 1200 is optional and operates in the same fashion as block 1000 of FIG. 10. Thus, for each retinal region R.sub.k, block 1200 produces the normalized region R.sub.k. Then, in block 1210 a CNN is used to produce a high-dimensional vector of distinguishing features V.sub.k for each region R.sub.k.

(23) FIG. 13 illustrates the internal architecture of this CNN. As can be seen in FIG. 13, the CNN used to produce the feature vector V.sub.k is the same as the CNN of FIG. 11 with V.sub.k taken at the output of one of the layers preceding the final fully connected classification layer. Then, back to FIG. 12, in block 1220 a classifier is used to classify each feature vector V.sub.k and produce a classification vector Q.sub.k. This classifier may, for example, be a binary or multi-class SVM or GBC, previously trained on the feature vectors of a set of training labelled regions. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output Q.sub.k as a class probability distribution or as a classification decision vector.

(24) In another embodiment of the invention, block 130 of FIG. 1 comprises the architecture illustrated in FIG. 14. As can be seen in FIG. 14, different techniques are employed to assess the quality of each region R.sub.k according to different criteria, including but not limited to illumination, contrast, blur, etc. Various well known image processing techniques may be employed to perform this region quality analysis. The illumination quality metric of block 1400 may, for example, be calculated as min(abs((τ.sub.H+τ.sub.L-2f)/(τ.sub.H-τ.sub.L)), 1)∈[0,1] where f denotes the average region intensity and τ.sub.L and τ.sub.H the lowest and highest acceptable intensity for a region, respectively. The contrast quality metric of block 1410 may be calculated using a suitable technique, for example one of the techniques described in Tripathi, Abhishek Kumar, Sudipta Mukhopadhyay, and Ashis Kumar Dhara. Performance metrics for image contrast. Image Information Processing (ICIIP), 2011 International Conference on. IEEE, 2011, normalized to [0,1]. Similarly, the blur quality metric of block 1420 may be calculated using a suitable technique, for example the technique described in Marziliano, Pina, et al. A no-reference perceptual blur metric. Image Processing. 2002. Proceedings. 2002 International Conference on. Vol. 3. IEEE, 2002, normalized to [0,1].
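The illumination quality metric of block 1400 may be sketched as follows, with `f` standing for the average region intensity and `low` and `high` standing for the lowest and highest acceptable intensities. The function and parameter names are hypothetical. Under this reading the metric is 0 when the average intensity sits midway between the two bounds and saturates at 1 at or beyond either bound, so it behaves as a Bad Illumination score.

```python
def illumination_metric(f, low, high):
    """Illumination metric in [0, 1]: 0 at the midpoint of [low, high],
    rising linearly to 1 at either bound and clipped to 1 beyond them."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(abs((high + low - 2.0 * f) / (high - low)), 1.0)
```

For example, with acceptable intensities between 64 and 192, an average intensity of 128 gives 0 and an average intensity of 96 gives 0.5.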

(25) Finally, block 1430 computes a Good Quality metric based on the illumination, contrast, blur, etc. metrics, for example as Q.sub.k.sup.0=1-max(Q.sub.k.sup.1, Q.sub.k.sup.2, . . . , Q.sub.k.sup.j-1), and produces the final quality assessment vector Q.sub.k for each region R.sub.k.
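The combination performed by block 1430 may be sketched as follows, reading the relation above as one minus the largest of the individual defect metrics. The function name `quality_vector` is hypothetical, and clipping the inputs to [0,1] is a defensive assumption rather than something stated in the text.

```python
def quality_vector(defect_scores):
    """Build (Good Quality, defect_1, defect_2, ...) from defect metrics.

    defect_scores: illumination, contrast, blur, etc. metrics, each expected
    in [0, 1], where a larger value means worse quality.
    """
    defects = [min(max(float(s), 0.0), 1.0) for s in defect_scores]
    good = 1.0 - max(defects)  # the worst defect determines Good Quality
    return [good] + defects
```

With this choice a region is rated only as good as its worst individual criterion, so a single severe defect such as heavy blur is sufficient to mark the region as poor.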

(26) We now consider the operation of block 140 of FIG. 1. Block 140 of FIG. 1 analyzes the DR lesion classification vectors D.sub.k and quality classification vectors Q.sub.k and produces conditioned DR lesion classification vectors D.sub.k. This operation entails three main steps, namely: (1) DR lesion classification vector spatial probability adjustment. This step is optional. Each DR lesion classification vector D.sub.k corresponds to a retinal region R.sub.k and, as seen earlier, each R.sub.k=R(x,y) is a retinal region centered on pixel (x,y) of normalized image I. Thus, for each DR lesion classification vector D.sub.k corresponding to a retinal region R.sub.k centered on pixel (x,y) of I it is possible to adjust the values of D.sub.k based on the DR lesion classification vectors of its surrounding regions, for example those centered within a distance d of (x,y). This adjustment may, for example, take the form of mean filtering, median filtering, etc. and is beneficial in creating more robust and reliable DR lesion classification vectors for a retinal fundus image. (2) Quality classification vector probability adjustment. This step is optional. The principle of this step is identical to that of step (1) above, but the operations are performed on the quality classification vectors Q. (3) DR lesion classification vector adjustment based on quality classification vectors.

(27) We now consider the architecture of block 140 of FIG. 1 in more detail. FIG. 15 illustrates the architecture of block 140 of FIG. 1. Block 1500 implements DR lesion classification vector spatial probability adjustment. That is, for each DR lesion classification vector D.sub.k corresponding to a retinal region R.sub.k centered on pixel (x,y) of I, the values of D.sub.k are adjusted based on the DR lesion classification vectors of its surrounding regions centered within a distance d of (x,y). This adjustment may, for example, take the form of mean, median, min, max or other filtering, to produce adjusted DR lesion classification vector D.sub.k. An alternative way of viewing this is as follows: For image I, l spatial probability maps may be produced, where l is the number of DR classification labels in D. For example, for l=2 classes, with c=0 the Healthy class label and c=1 the Diseased class label, a Healthy probability map .sup.DRP.sup.0 and a Diseased probability map .sup.DRP.sup.1 may be produced. Then, for each DR lesion classification vector D.sub.k corresponding to a retinal region R.sub.k centered on pixel (x,y) of I, .sup.DRP.sup.c(x/s,y/s)=D.sub.k.sup.c where s is the sampling stride used in region extraction. Each .sup.DRP.sup.c may then be spatially processed, e.g., with a mean, median, min, max or other filter, or with morphological operations such as dilation, erosion, etc., to produce adjusted probability maps .sup.DRP.sup.c, which may then be mapped to adjusted DR lesion classification vectors D.sub.k. In a similar fashion to block 1500, block 1510 implements quality classification vector spatial probability adjustment. Finally, block 1520 of FIG. 15 performs DR lesion classification vector adjustment based on quality classification vectors. There are various possibilities in how this adjustment may be performed.
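The probability-map view of block 1500 may be sketched as follows, using median filtering as one of the filtering options mentioned above. The dictionary representation of a map, the use of integer division for the grid index, and the names `build_probability_map` and `median_filter` are assumptions made for illustration.

```python
from statistics import median


def build_probability_map(centers, vectors, c, s):
    """Return {(col, row): probability} for class c.

    centers: region centers (x, y); vectors: the matching classification
    vectors; s: sampling stride. The grid index is (x // s, y // s).
    """
    return {(x // s, y // s): d[c] for (x, y), d in zip(centers, vectors)}


def median_filter(prob_map, radius=1):
    """Replace each entry by the median of its available neighbors
    within `radius` grid cells (the entry itself included)."""
    out = {}
    for (col, row) in prob_map:
        window = [prob_map[(col + dc, row + dr)]
                  for dr in range(-radius, radius + 1)
                  for dc in range(-radius, radius + 1)
                  if (col + dc, row + dr) in prob_map]
        out[(col, row)] = median(window)
    return out
```

In this sketch an isolated high Diseased probability surrounded by low-probability neighbors is suppressed by the filter, which is the kind of robustness the spatial adjustment is intended to provide.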

(28) As one example, for each D.sub.k and for each class label c out of l DR classes, this adjustment may be performed as

(29) $D_{k}^{c} = \begin{cases} D_{k}^{c} & \text{if } Q_{k}^{0} \geq 0.5 \\ 0 & \text{if } Q_{k}^{0} < 0.5 \end{cases}$

(30) where Q.sub.k.sup.0 denotes the Good Quality class probability for region R.sub.k. In essence, with the above relation, if the Good Quality class probability for region R.sub.k is less than 0.5, all DR classification vector probabilities for region R.sub.k are reduced to 0, i.e. the system does not deliver any DR classification probabilities for that region because the quality is too poor.
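The hard cut-off described above may be sketched as follows. The function name `gate_by_quality` and the `threshold` parameter are hypothetical; the default of 0.5 follows the relation above.

```python
def gate_by_quality(d, q_good, threshold=0.5):
    """Keep the DR classification vector d unchanged when the Good Quality
    probability q_good reaches the threshold, otherwise zero every entry."""
    if q_good >= threshold:
        return list(d)
    return [0.0] * len(d)
```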

(31) As another example, for each D.sub.k and for each class label c out of l DR classes, this adjustment may be performed as

(32) $D_{k}^{c} = \begin{cases} D_{k}^{c} & \text{if } Q_{k}^{0} \geq 0.5 \\ (Q_{k}^{0} + 0.25) D_{k}^{c} & \text{if } 0.5 > Q_{k}^{0} \geq 0.25 \\ 0 & \text{if } Q_{k}^{0} < 0.25 \end{cases}$

(33) where Q.sub.k.sup.0 denotes the Good Quality class probability for region R.sub.k. With this relation, the drop off in the DR classification probabilities down to 0 due to poor quality is more gradual.
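The more gradual variant may be sketched as follows. The sketch assumes that the three cases partition the range of the Good Quality probability, so that the zero case applies below 0.25; the function name `soften_by_quality` is hypothetical.

```python
def soften_by_quality(d, q_good):
    """Three-band adjustment of a DR classification vector d.

    q_good >= 0.5         : d is kept unchanged
    0.25 <= q_good < 0.5  : every entry is scaled by (q_good + 0.25)
    q_good < 0.25         : every entry is set to 0 (assumed lower band)
    """
    if q_good >= 0.5:
        return list(d)
    if q_good >= 0.25:
        return [(q_good + 0.25) * p for p in d]
    return [0.0] * len(d)
```

Note that the scale factor lies between 0.5 and 0.75 in the middle band, so the probabilities still step down at both band edges; the drop is more gradual than a single cut-off but is not continuous.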

(34) In the examples above, each D.sub.k is adjusted based on the corresponding Q.sub.k. In alternative embodiments of the invention, each D.sub.k corresponding to a retinal region R.sub.k centered on pixel (x,y) of I may be adjusted based on the corresponding Q.sub.k as well as the quality classification vectors of surrounding regions, for example those centered within a distance d of (x,y).

(35) We now consider the operation of block 150 of FIG. 1. Block 150 of FIG. 1 analyzes the DR lesion classification vectors D and produces a final retinal fundus image-level DR classification vector C=(C.sup.0, . . . , C.sup.w-1) where w is the number of classes and C.sup.v is the class membership probability for class v. In one embodiment of the invention, w=2 with v=0 the Healthy class label and v=1 the Diseased class label. As a second example, in a different embodiment of the invention, w=4 classes, with each class corresponding to a DR medical grade. In all cases, in alternative embodiments of the invention, C.sup.v may take a binary 0 or 1 value, in which case C is not a class probability distribution but a classification decision vector.

(36) We now consider the architecture of block 150 of FIG. 1 in more detail. In one embodiment of the invention, block 150 of FIG. 1 comprises the architecture of FIG. 16. In block 1600 of FIG. 16, the DR lesion classification vector probabilities are thresholded, for example as

(37) $D_{k}^{c} = \begin{cases} 1 & \text{if } D_{k}^{c} \geq \tau_{D}^{c} \\ 0 & \text{if } D_{k}^{c} < \tau_{D}^{c} \end{cases}$ where $\tau_{D}^{c}$ is the decision threshold for class c.

(38) Then, in block 1610, the thresholded D undergoes spatial processing to produce a spatially processed D. This block operates in substantially the same fashion as block 1500 of FIG. 15, performing operations such as median filtering, erosion, dilation, and so on. In alternative embodiments of the invention, block 1610 may be skipped, while in different embodiments of the invention the order of blocks 1600 and 1610 may be reversed. Then, in block 1620, the number of regions for each class label c in D is counted. As seen earlier, in some embodiments of the invention, l=2 classes, with c=0 the Healthy class label and c=1 the Diseased class label. In different embodiments of the invention, l>2 classes, with c=0 the Healthy class label, c=1 the Micro Aneurism class label, c=2 the Hard Exudate class label, . . . , c=l-1 the Laser Scar class label. Thus, block 1620 produces a vector G=(G.sup.0, . . . , G.sup.l-1) where each element G.sup.c is a region count for the corresponding label c. Then, based on the region count vector G, block 1630 produces a final retinal fundus image classification decision vector C=(C.sup.0, . . . , C.sup.w-1) where w is the number of classes and C.sup.v is the decision for class v. As seen earlier, in one embodiment of the invention, w=2 with v=0 the Healthy class label and v=1 the Diseased class label. In a different embodiment of the invention, w=4 classes, with each class corresponding to a DR medical grade, which can be established based on the numbers of different types of DR lesions in the retinal fundus image.
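Blocks 1600, 1620 and 1630 may be sketched as follows for the w=2 case. The text does not specify the rule by which block 1630 maps the region counts to a decision; the rule used here (Diseased when any non-Healthy class reaches a minimum region count) and the names `threshold_vectors`, `count_regions`, `binary_image_decision` and `min_lesion_regions` are assumptions made for illustration. The optional spatial processing of block 1610 is omitted.

```python
def threshold_vectors(vectors, thresholds):
    """Block 1600: binarize each class probability against its class threshold."""
    return [[1 if p >= t else 0 for p, t in zip(d, thresholds)]
            for d in vectors]


def count_regions(decisions):
    """Block 1620: count, for each class label c, the regions flagged with c."""
    num_classes = len(decisions[0])
    return [sum(d[c] for d in decisions) for c in range(num_classes)]


def binary_image_decision(counts, min_lesion_regions=1):
    """Block 1630 for w=2, with class 0 being Healthy.

    Assumed rule: the image is Diseased when any non-Healthy class has at
    least min_lesion_regions flagged regions.
    """
    diseased = any(g >= min_lesion_regions for g in counts[1:])
    return [0, 1] if diseased else [1, 0]
```

A w=4 grading would replace `binary_image_decision` with a rule over the per-lesion-type counts; no such rule is given in the text, so none is sketched here.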

(39) In another embodiment of the invention, block 150 of FIG. 1 comprises the architecture of FIG. 17. In block 1700 of FIG. 17, the DR lesion classification vectors D are converted into l 2D probability maps, where l is the number of DR classification labels in D. The process of converting a vector of DR classification vectors into 2D probability maps is substantially the same as described earlier for block 1500 of FIG. 15. In creating the 2D probability maps, block 1700 may optionally (i) change the dynamic range of the probabilities, for example from a real range of [0,1] to an integer range [0,255], and (ii) subsample the 2D probability maps to a fixed resolution of t×t pixels, for example t=256 pixels. Thus, block 1700 generates l 2D probability maps P=(P.sup.0, . . . , P.sup.l-1). Then in block 1710 a CNN is used to classify the probability maps P and produce a final retinal fundus image-level DR classification vector C=(C.sup.0, . . . , C.sup.w-1) where w is the number of classes and C.sup.v is the class membership probability for class v. The CNN has been previously trained using a large number of training samples. As seen earlier, different embodiments of the invention may employ CNNs trained on different numbers of classes, for example w=2 with v=0 the Healthy class label and v=1 the Diseased class label, or w=4 classes, with each class corresponding to a DR medical grade. In all cases, in alternative embodiments of the invention, C.sup.v may take a binary 0 or 1 value, in which case C is not a class probability distribution but a classification decision vector. FIG. 18 illustrates the internal architecture of the CNN of block 1710; this is substantially the same as the CNN architectures of FIG. 6 and FIG. 11 described earlier, although network parameters such as the size, stride and number of filters, the learned convolution masks, the number of convolution layers, the number of fully connected layers and so on may be different.
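The two optional preparation steps of block 1700 may be sketched as follows. The list-of-rows map representation, the use of nearest-neighbor sampling for the subsampling step, and the names `to_uint8_range` and `subsample` are assumptions made for illustration; the text does not specify the resampling method.

```python
def to_uint8_range(prob_map):
    """Rescale probabilities in [0, 1] to integers in [0, 255]."""
    return [[int(round(p * 255)) for p in row] for row in prob_map]


def subsample(prob_map, t):
    """Resample a 2D map (list of rows) to t x t by nearest-neighbor picking."""
    rows, cols = len(prob_map), len(prob_map[0])
    return [[prob_map[(i * rows) // t][(j * cols) // t] for j in range(t)]
            for i in range(t)]
```

Bringing every map to the same t×t resolution gives the CNN of block 1710 a fixed input size regardless of the stride used during region extraction.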

(40) In another embodiment of the invention, block 150 of FIG. 1 comprises the architecture illustrated in FIG. 19. Block 1900 operates in the same fashion as block 1700 of FIG. 17. Then, in block 1910, a CNN is used to produce a high-dimensional vector of distinguishing features V. FIG. 20 illustrates the internal architecture of this CNN. As can be seen in FIG. 20, the CNN used to produce the feature vector V is the same as the CNN of FIG. 18, with V taken at the output of one of the layers preceding the final fully connected classification layer. Then, back to FIG. 19, in block 1920 a classifier is used to classify the feature vector V and produce the classification vector C. This classifier may, for example, be a binary or multi-class SVM or Gradient Boosting Classifier, previously trained on the feature vectors of a set of training samples. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output C as a class probability distribution or as a classification decision vector.
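The idea of taking V from a layer that precedes the final classification layer, and then handing V to a separate classifier, can be illustrated with a deliberately tiny stand-in. The class `TinyNet` below is a hypothetical two-layer fully connected network, not the CNN of FIG. 18 or FIG. 20, and `linear_decision` is a plain linear decision rule standing in for a trained SVM; all weights are made-up values chosen for the example.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyNet:
    """Hypothetical stand-in network: one hidden layer plus a final classification layer."""

    def __init__(self, w_hidden, b_hidden, w_out, b_out):
        self.w_hidden, self.b_hidden = w_hidden, b_hidden
        self.w_out, self.b_out = w_out, b_out

    def features(self, x):
        """Feature vector V: the activations feeding the final layer."""
        return relu(dense(x, self.w_hidden, self.b_hidden))

    def classify(self, x):
        """End-to-end class probabilities, as in the FIG. 17 variant."""
        return softmax(dense(self.features(x), self.w_out, self.b_out))


def linear_decision(V, weights, bias):
    """Binary linear rule on V; returns a decision vector C = (C^0, C^1)."""
    score = sum(w * v for w, v in zip(weights, V)) + bias
    return [0, 1] if score > 0 else [1, 0]


net = TinyNet([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
V = net.features([2.0, 1.0])
print(V)                                       # [1.0, 1.5]
print(linear_decision(V, [1.0, -1.0], 0.0))    # [1, 0]
```

The same network therefore serves two roles: `classify` uses every layer, while `features` stops one layer early so that a separately trained classifier can make the final decision.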

(41) FIG. 21 illustrates an alternative embodiment of the invention. There, blocks 2100 and 2110 operate in substantially the same fashion as blocks 100 and 110 of FIG. 1, performing image normalization and region extraction.

(42) We now consider the operation of block 2120 of FIG. 21. Block 2120 of FIG. 21 analyzes all the retinal regions independently and for each region R.sub.k produces a joint DR lesion/quality classification vector A.sub.k=(D.sub.k, Q.sub.k)=(D.sub.k.sup.0, . . . , D.sub.k.sup.l-1, Q.sub.k.sup.0, . . . , Q.sub.k.sup.m-1) where l is the number of DR lesion classes as described previously and m is the number of region quality classes as described previously. Thus, in the training of the classifier of block 2120, each training sample is assigned both a DR lesion class label and a quality class label, allowing the classifier to produce the joint DR lesion/quality classification vector A.sub.k for each region R.sub.k. As seen previously, the classifier in block 2120 may be configured to output binary 0 or 1 values rather than class membership probabilities.
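The layout of the joint vector A.sub.k can be made concrete with a short Python sketch that splits it back into its lesion part D.sub.k and quality part Q.sub.k. The function names are hypothetical, and converting probabilities to binary values by selecting the most probable class is an assumption made for illustration; the disclosure does not state how the binary output is obtained.

```python
def split_joint_vector(A_k, num_lesion_classes, num_quality_classes):
    """Split A_k = (D_k, Q_k) into its lesion and quality parts."""
    if len(A_k) != num_lesion_classes + num_quality_classes:
        raise ValueError("A_k must have l + m entries")
    return A_k[:num_lesion_classes], A_k[num_lesion_classes:]


def to_decision_vector(probabilities):
    """Return a 0/1 vector marking the most probable class (assumed rule)."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return [1 if i == best else 0 for i in range(len(probabilities))]


# l = 3 lesion classes followed by m = 2 quality classes.
A_k = [0.7, 0.2, 0.1, 0.9, 0.1]
D_k, Q_k = split_joint_vector(A_k, num_lesion_classes=3, num_quality_classes=2)
print(D_k, Q_k)                 # [0.7, 0.2, 0.1] [0.9, 0.1]
print(to_decision_vector(D_k))  # [1, 0, 0]
```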

(43) We now consider the architecture of block 2120 of FIG. 21 in more detail. Block 2120 of FIG. 21 comprises the architecture illustrated in FIG. 22. Block 2200 performs retinal region normalization in substantially the same fashion as block 500 of FIG. 5 and block 700 of FIG. 7. Thus, for each retinal region R.sub.k, block 2200 produces a normalized version of R.sub.k. Then, in block 2210, a CNN is used to classify each normalized region and produce the classification vector A.sub.k. FIG. 23 illustrates the architecture of block 2210 of FIG. 22 in more detail. This is substantially the same as the CNN architecture of FIG. 6 and FIG. 11 described earlier, although network parameters such as the size, stride and number of filters, the learned convolution masks, the number of convolution layers, the number of fully connected layers and so on may be different.

(44) Then, back to FIG. 21, block 2130 performs regional DR lesion probability conditioning in substantially the same fashion as block 140 of FIG. 1, and block 2140 performs retinal fundus image classification in substantially the same fashion as block 150 of FIG. 1.

(45) In some embodiments, the process of imaging is performed by a computing system. In some embodiments, the computing system includes one or more computing devices, for example, a personal computer that is IBM, Macintosh, Microsoft Windows or Linux/Unix compatible, or a server or workstation. In one embodiment, the computing device comprises a server, a laptop computer, a smart phone, a personal digital assistant, a kiosk, or a media player, for example. In one embodiment, the computing device includes one or more CPUs, which may each include a conventional or proprietary microprocessor. The computing device further includes one or more memories, such as random access memory (RAM) for temporary storage of information, one or more read only memories (ROM) for permanent storage of information, and one or more mass storage devices, such as a hard drive, diskette, solid state drive, or optical media storage device. Typically, the modules of the computing device are connected to the computer using a standards-based bus system. In different embodiments, the standards-based bus system could be implemented in Peripheral Component Interconnect (PCI), Microchannel, Small Computer System Interface (SCSI), Industrial Standard Architecture (ISA) and Extended ISA (EISA) architectures, for example. In addition, the functionality provided for in the components and modules of the computing device may be combined into fewer components and modules or further separated into additional components and modules.

(46) The computing device is generally controlled and coordinated by operating system software, such as Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Embedded Windows, Unix, Linux, Ubuntu Linux, SunOS, Solaris, iOS, Blackberry OS, Android, or other compatible operating systems. In Macintosh systems, the operating system may be any available operating system, such as Mac OS X. In other embodiments, the computing device may be controlled by a proprietary operating system. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide a user interface, such as a graphical user interface (GUI), among other things.

(47) The exemplary computing device may include one or more commonly available I/O interfaces and devices, such as a keyboard, mouse, touchpad, touchscreen, and printer. In one embodiment, the I/O interfaces and devices include one or more display devices, such as a monitor or a touchscreen monitor, that allow the visual presentation of data to a user. More particularly, a display device provides for the presentation of GUIs, application software data, and multimedia presentations, for example. The computing device may also include one or more multimedia devices, such as cameras, speakers, video cards, graphics accelerators, and microphones, for example. The I/O interfaces and devices provide a communication interface to various external devices. The computing device is electronically coupled to a network, which comprises one or more of a LAN, WAN, and/or the Internet, for example, via a wired, wireless, or combination of wired and wireless, communication link. The network communicates with various computing devices and/or other electronic devices via wired or wireless communication links.

(48) Images to be processed according to the methods and systems described herein may be provided to the computing system over the network from one or more data sources. The data sources may include one or more internal and/or external databases, data sources, and physical data stores. The data sources may include databases storing data to be processed with the imaging system according to the systems and methods described above, or the data sources may include databases for storing data that has been processed with the imaging system according to the systems and methods described above. In some embodiments, one or more of the databases or data sources may be implemented using a relational database, such as Sybase, Oracle, CodeBase, MySQL, SQLite, and Microsoft SQL Server, as well as other types of databases such as, for example, a flat file database, an entity-relationship database, an object-oriented database, a NoSQL database, and/or a record-based database.

(49) The computing system includes an imaging system module that may be stored in the mass storage device as executable software codes that are executed by the CPU. These modules may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The computing system is configured to execute the imaging system module in order to perform, for example, automated low-level image processing, automated image registration, automated image assessment, automated screening, and/or to implement new architectures described above.

(50) In general, the word "module," as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Python, Java, Lua, C and/or C++. Software modules may be provided on a computer readable medium, such as an optical storage medium, flash drive, or any other tangible medium. Such software code may be stored, partially or fully, on a memory device of the executing computing device, such as the computing system, for execution by the computing device. Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The code modules may be stored on any type of non-transitory computer-readable medium or computer storage device, such as hard drives, solid state memory, optical disc, and/or the like. The systems and modules may also be transmitted as generated data signals (for example, as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission mediums, including wireless-based and wired/cable-based mediums, and may take a variety of forms (for example, as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). The processes and algorithms may be implemented partially or wholly in application-specific circuitry. The results of the disclosed processes and process steps may be stored, persistently or otherwise, in any type of non-transitory computer storage such as, for example, volatile or non-volatile storage.

(51) The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.

(52) Conditional language, such as, among others, "can," "could," "might," or "may," unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The term "including" means "included but not limited to." The term "or" means "and/or."

(53) Any process descriptions, elements, or blocks in the flow or block diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein in which elements or functions may be deleted, executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.

(54) All of the methods and processes described above may be embodied in, and partially or fully automated via, software code modules executed by one or more general purpose computers. For example, the methods described herein may be performed by the computing system and/or any other suitable computing device. The methods may be executed on the computing devices in response to execution of software instructions or other executable code read from a tangible computer readable medium. A tangible computer readable medium is a data storage device that can store data that is readable by a computer system. Examples of computer readable mediums include read-only memory, random-access memory, other volatile or non-volatile memory devices, CD-ROMs, magnetic tape, flash drives, and optical data storage devices.

(55) It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure. The foregoing description details certain embodiments. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the systems and methods can be practiced in many ways. For example, a feature of one embodiment may be used with a feature in a different embodiment. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the systems and methods should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the systems and methods with which that terminology is associated.

CPC classification

G06V40/193 (PHYSICS)
A61B3/0025 (HUMAN NECESSITIES)
G06N3/08 (PHYSICS)
G06F18/2148 (PHYSICS)
G06V40/197 (PHYSICS)
G06T2207/30041 (PHYSICS)
A61B3/14 (HUMAN NECESSITIES)
G01N2800/16 (PHYSICS)
G06T7/11 (PHYSICS)
A61B3/145 (HUMAN NECESSITIES)
G06V10/462 (PHYSICS)
G06T2207/20084 (PHYSICS)
A61B3/12 (HUMAN NECESSITIES)
G06T7/0016 (PHYSICS)
G06T2207/20081 (PHYSICS)
G06V2201/03 (PHYSICS)
G06F18/2411 (PHYSICS)

International classification

G06T7/00 (PHYSICS)
A61B3/12 (HUMAN NECESSITIES)
G06K9/46 (PHYSICS)
G06N3/08 (PHYSICS)
G06K9/62 (PHYSICS)
A61B3/00 (HUMAN NECESSITIES)
G06K9/42 (PHYSICS)
G06K9/00 (PHYSICS)