METHOD OF ANALYZING IRIS IMAGE FOR DIAGNOSING DEMENTIA IN ARTIFICIAL INTELLIGENCE
20200327663 · 2020-10-15
Inventors
Cpc classification
A61B5/0077
HUMAN NECESSITIES
A61B5/7264
HUMAN NECESSITIES
A61B5/4088
HUMAN NECESSITIES
G06T3/40
PHYSICS
A61B5/0015
HUMAN NECESSITIES
International classification
A61B5/00
HUMAN NECESSITIES
G06T3/40
PHYSICS
Abstract
A method of analyzing an iris image with artificial intelligence to diagnose dementia in real time with a smart phone according to an embodiment of the present invention includes receiving an input image of a user's eye from user equipment; extracting a region of interest (RoI) from the input image to extract an iris; resizing the extracted RoI to a square shape and scaling the RoI; applying a deep neural network to the resized and scaled RoI; detecting a lesional area by applying detection and segmentation to an image acquired by applying the deep neural network; and diagnosing dementia by determining a position of the lesional area through the detection and by determining a shape of the lesional area through the segmentation.
Claims
1. A method of analyzing an iris image with artificial intelligence to diagnose dementia in real time with a smart phone, the method comprising: receiving by the server of an input image of a user's eye from user equipment; extracting a region of interest (RoI) by the server from the input image to extract an iris; resizing the extracted RoI to a square shape and scaling the RoI by the server; applying a deep neural network by the server to the resized and scaled RoI; detecting a lesional area by the server by applying detection and segmentation to an image acquired by applying the deep neural network; and diagnosing dementia by the server by determining a position of the lesional area through the detection and by determining a shape of the lesional area through the segmentation, wherein the extracting of the RoI further comprises extracting the RoI which is a minimum area required to extract an iris by excluding an area not used for diagnosing dementia from the input image, wherein the applying of the deep neural network further comprises resizing the extracted RoI in the input image to a square shape and compressing and optimizing pixel information values into one piece of data by normalizing the pixel information values into values between 0 and 1 and converting the normalized pixel information values into bytes, and wherein the diagnosing of dementia further comprises diagnosing a type of dementia based on the position and shape of the lesional area, wherein the diagnosing the type of dementia based on the position and shape of the lesional area comprises: accumulating big data representing a probability of dementia and a degree of development of dementia according to a position and shape of a lesional area; determining a probability of dementia and a degree of development of dementia according to the position and shape of the lesional area based on the big data; and notifying the user equipment in real time that an additional test including an interview test and a laboratory test is required according to the probability of dementia and the degree of development of dementia, wherein the type of dementia includes Alzheimer's disease, vascular dementia, Lewy body dementia, and frontal lobe dementia, wherein the probability of dementia is classified by percentage, and wherein the degree of development of dementia is classified as an early stage, an intermediate stage, and an end stage.
2. The method of claim 1, wherein the extracting of the RoI further comprises, when the input image is tilted with respect to a vertical direction, aligning the input image by an angle at which the input image is tilted with respect to the vertical direction using a preset virtual axis and then extracting the RoI.
3. The method of claim 1, wherein the resizing and scaling of the RoI comprises optimizing data of the iris image by resizing the RoI to the square shape, normalizing pixel information values into values between 0 and 1, converting the pixel information values into bytes, and compressing the RoI into one piece of data.
4. The method of claim 1, wherein the deep neural network includes a convolutional neural network (CNN) to prevent spatial information of the iris image from being lost.
5. The method of claim 1, wherein the user equipment includes a camera unit, and the camera unit includes a general mobile camera and an iris recognition camera, or an iris recognition lens is attached to the camera unit.
6. The method of claim 4, wherein the applying of the deep neural network further comprises using separable convolution and atrous convolution.
7. The method of claim 1, further comprising generating a visualized image, which is a basis for dementia diagnosis, based on the position and shape of the lesional area.
8. The method of claim 1, further comprising diagnosing signs of dementia based on the position and shape of the lesional area.
9. (canceled)
10. The method of claim 4, wherein an activation function and a focal loss method are used in the CNN.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
DETAILED DESCRIPTION
[0041] Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. A detailed description to be disclosed below together with the accompanying drawings is to describe the exemplary embodiments of the present invention, and various modifications and alterations can be made from the embodiments. The detailed description does not represent the sole embodiment for carrying out the present invention.
[0042] The embodiments are provided merely to fully disclose the present invention and completely inform those of ordinary skill in the art of the scope of the present invention. The present invention is defined by only the scope of the claims.
[0043] In some cases, known structures and devices may be omitted or block diagrams mainly illustrating key functions of the structures and devices may be provided so as to not obscure the concept of the present invention. Throughout the specification, like reference numerals will be used to refer to like elements.
[0044] Throughout the specification, when a part is referred to as comprising or including a component, this indicates that the part may further include another element instead of excluding another element unless particularly stated otherwise.
[0045] The term "... unit" used herein refers to a unit that performs at least one function or operation and may be implemented in hardware, software, or a combination thereof. Further, "a" or "an," "one," and the like may be used to include both the singular form and the plural form unless indicated otherwise in the context of the present invention or clearly denied in the context.
[0046] In addition, specific terms used in the embodiments of the present invention are provided only to aid in understanding of the present invention. Unless otherwise defined, all terms including technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the present invention pertains. The use of the specific terms may be modified in a different form without departing from the technical spirit of the present invention.
[0047] Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. A detailed description to be disclosed below together with the accompanying drawings is to describe the exemplary embodiments of the present invention and does not represent the sole embodiment for carrying out the present invention.
[0048]
[0049] Referring to
[0050]
[0051] Referring to
[0052] The extraction unit 110 may acquire an image from even a low-performance mobile device.
[0053] The extraction unit 110 may acquire images which will be used to diagnose dementia and then remove an image part unnecessary for diagnosis to increase a processing rate. According to an embodiment, the iris images may include tilted images. Therefore, it is possible to align the tilted images vertically using a preset virtual axis and then extract only RoIs from the eye images. An RoI refers to the minimum area required to extract an iris for dementia diagnosis, excluding any unnecessary area.
[0054] An RoI is extracted to reduce the amount of calculation as much as possible, keeping the processing lightweight, by removing an area unnecessary for dementia diagnosis. After an RoI is extracted, an unnecessary area may be colored in grey, and the RoI may be transmitted to the preprocessing unit 120.
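By way of illustration only, the RoI extraction and grey fill described above might be sketched as follows. The circular iris model, the function name extract_roi, and the fill value GREY are assumptions made for this sketch and are not specified in the embodiment; the sketch also assumes the circle lies fully inside the image.

```python
GREY = 128  # hypothetical fill value for the unnecessary area


def extract_roi(image, center_row, center_col, radius):
    """Crop the bounding square of a circular iris and grey out pixels outside the circle.

    `image` is a 2-D list of grayscale values; the circle is assumed to fit inside it.
    """
    top, left = center_row - radius, center_col - radius
    size = 2 * radius + 1
    roi = []
    for r in range(size):
        row = []
        for c in range(size):
            inside = (r - radius) ** 2 + (c - radius) ** 2 <= radius ** 2
            row.append(image[top + r][left + c] if inside else GREY)
        roi.append(row)
    return roi


# 5 x 5 test image whose pixel value encodes its position (row * 5 + col).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
roi = extract_roi(image, center_row=2, center_col=2, radius=1)
```

Only the pixels inside the assumed circle keep their values, so later stages need not spend calculation on the greyed corners.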
[0055] The preprocessing unit 120 may resize the iris image obtained from the extraction unit 110 to a square (N×N) size. When an image increases in size, the amount of calculation is multiplied by the increment. Therefore, the image is adjusted to an appropriate size for lightweight artificial intelligence, and pixel information values of 0 to 255 are normalized into values between 0 and 1 so that the values may not have errors or may not deviate from expected values.
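A minimal sketch of the preprocessing described above is given below, assuming a grayscale image stored as a list of rows. Nearest-neighbour resizing and float32 packing are illustrative choices, and the function names are hypothetical; the embodiment does not specify the interpolation method or the byte layout.

```python
import struct


def resize_nearest(image, n):
    """Resize a 2-D grayscale image (list of rows) to n x n by nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[(r * height) // n][(c * width) // n] for c in range(n)]
        for r in range(n)
    ]


def normalize(image):
    """Map 8-bit pixel values 0..255 to floats between 0 and 1."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def to_bytes(image):
    """Pack the normalized pixels into one contiguous little-endian float32 buffer."""
    flat = [pixel for row in image for pixel in row]
    return struct.pack(f"<{len(flat)}f", *flat)


roi = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
]
square = resize_nearest(roi, 2)   # 2 x 4 input becomes 2 x 2
scaled = normalize(square)        # every value now lies in [0, 1]
payload = to_bytes(scaled)        # one piece of data, 4 bytes per pixel
```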
[0056] In the case of a general neural network, it is necessary to flatten an iris image, which is three-dimensional data, into one-dimensional data, and thus spatial information of the image is lost during the process. Therefore, the learning unit 130 may include a convolutional neural network (CNN) which can learn an iris image while maintaining spatial information of the iris image.
[0057] The learning unit 130 and the detection unit 140 will be described in detail with reference to
[0058] The detection unit 140 detects a detailed position and shape of a lesional area on the basis of overall iris characteristics extracted through the learning unit 130, and the diagnosis unit 150 analyzes and classifies dementia respectively on the basis of segmentation and detection so that a type of dementia may be determined.
[0059]
[0060] Referring to
[0061] The extracted RoI may be resized into a square shape and scaled to various sizes (S13), and a deep neural network may be applied to the resized and scaled RoI (S14). The various sizes may include a reduction, increase, etc. in image size while the square shape is maintained. When the deep neural network is applied, only an RoI may be extracted by coloring an area, which is not extracted as the RoI, in grey because the area is an unnecessary area. Accordingly, an unnecessary amount of calculation may be reduced, which may be a basis for a lightweight neural network concentrated on a processing rate.
[0062] Further, a lesional area may be detected by applying detection and segmentation to an image acquired by applying the deep neural network (S15). It is possible to diagnose dementia by determining a position of the lesional area through the detection and determining a shape of the lesional area through the segmentation (S16). A type of dementia may be determined on the basis of the position and shape of the lesional area. The types of dementia include Alzheimer's disease, vascular dementia, Lewy body dementia, and frontal lobe dementia by way of example. When the probability of dementia is determined on the basis of the position and shape of the lesional area, for example, a higher color strength of the lesional area may represent a higher probability of dementia, expressed as a percentage or the like. Further, the degree of development of dementia may be represented on the basis of the position and shape of the lesional area. For example, the degree of development of dementia may be classified and represented as an early stage, an intermediate stage, or an end stage.
[0063]
[0064] Referring to
[0065] Key characteristics of a deep neural network are as follows. First, convolutional layers generate feature maps by applying various filters to an input image. In other words, convolutional layers serve as templates which extract features of a high-dimensional input image. Second, downsampling layers reduce the spatial resolution of a generated feature map. Third, an activation function receives an input signal, generates an output signal in response when the input signal satisfies a specific threshold value, and transmits the output signal to the next layer. This is modeled on the neuroscientific observation that a neuron transmits a signal to the next neuron when a strong stimulus (=a specific threshold value) is received. In general, a rectified linear unit (ReLU) function is used and is represented as y=max(0, x).
[0066] However, the ReLU has a demerit that when x of an input signal is 0 or less, the signal is transmitted with a value of 0, that is, all signals of negative values are ignored. To complement the ReLU, according to the present invention, the function is corrected into $y = \frac{1}{1 + \exp(-x)} \cdot x$ so that even a signal having x of 0 or less may be transmitted with a certain degree of stimulus. The corrected function is referred to as softX and will be described in further detail with reference to
[0067] When the aforementioned layers are stacked consecutively, extraction begins with local features of the image at a frontend layer, and by the time a backend layer is reached, only a global feature which represents the overall image remains.
[0068] The amount of calculation of each general convolutional layer is $F^2 N K^2 M$ (F=feature-map, N=input channels, K=kernel, and M=output channels). However, according to the present invention, the amount of calculation of a convolutional layer is reduced by separating the expression into $F^2 N K^2 + F^2 M N$, which will be described in detail with reference to
[0069] Meanwhile, factorization methods may also be used to reduce the amount of calculation as described above. According to a factorization method, a 5×5 filter is factorized into 1×5+5×1 filters. In this case, the amount of calculation may be reduced by a ratio of 25:10, that is, to 40% of the original. Assuming that an input image is 7×7, a 1×1 value is output when a 7×7 filter is used. On the other hand, when a 3×3 filter is used, a 5×5 value is output. When a 3×3 filter is used again, a 3×3 value is output, and when a 3×3 filter is used again, a 1×1 value is output. As a result, to obtain a 1×1 value as an output, using one 7×7 filter is equivalent to using three 3×3 filters. In terms of the amount of calculation, 49:9+9+9=49:27, and thus it is possible to reduce the amount of calculation by about 45%. On the basis of this idea, a combination of a separable convolution method and a factorization method is used in the present invention.
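The arithmetic in the preceding paragraph can be checked with a short sketch. Stride 1 and no padding are assumed, and the helper names are hypothetical.

```python
def output_size(input_size, kernel_sizes):
    """Spatial size after successive stride-1, no-padding convolutions."""
    size = input_size
    for k in kernel_sizes:
        size = size - k + 1
    return size


def weights_per_position(kernel_shapes):
    """Total multiplications per output position for a list of (height, width) kernels."""
    return sum(h * w for h, w in kernel_shapes)


# One 7x7 filter versus three stacked 3x3 filters on a 7x7 input.
single_7x7 = weights_per_position([(7, 7)])
three_3x3 = weights_per_position([(3, 3)] * 3)
saving = 1 - three_3x3 / single_7x7

# One 5x5 filter versus a 1x5 filter followed by a 5x1 filter.
full_5x5 = weights_per_position([(5, 5)])
factorized = weights_per_position([(1, 5), (5, 1)])
```

Both filter stacks reduce a 7×7 input to a 1×1 output, while the stacked version uses 27 rather than 49 multiplications per position.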
[0070] Meanwhile, in the operation process of the learning unit 130, a lesional area has little pixel information compared to an overall iris image. Therefore, it may be difficult to extract a feature of a lesional area having a local feature due to the characteristic of a general deep neural network having a global feature representing an overall image when many layers overlap.
[0071] According to the present invention, however, during training of a deep neural network, it is possible to change a loss function into a specific loss function by giving a greater weight to a local feature than a global feature.
[0072] Here, a loss function refers to a process in which a deep neural network calculates an error between an answer predicted with a feature extracted by a last convolutional layer and a correct answer and updates a weight by backpropagating a variation of the error. Repeating this process is referred to as training a neural network.
[0073] In this regard, while a loss function which is generally used to train a neural network frequently employs cross-entropy (CE) loss, the present invention employs a focal loss method as the aforementioned specific loss function.
[0074] Specifically, when a deep neural network extracts feature maps, it is easier to extract a global feature than a local feature as described above. Therefore, during training of a deep neural network, an area (a global feature) other than a lesional area (a local feature), which is to be detected, is learned more than the lesional area.
[0075] However, according to the present invention, it is necessary to learn a feature of a very small lesional area better than a feature of an overall iris image. To this end, a focal loss method is used.
[0076] A CE function which is a generally used loss function is $CE(p_t) = -\log(p_t)$, and the focal loss function used in the present invention is defined as $F(p_t) = -(1 - p_t)^{\gamma} \log(p_t)$ ($p_t$ = probability of correct answer).
[0077] In brief, in the focal loss function used in the present invention, $p_t$ indicates the probability of a correct answer, and thus $(1 - p_t)$ indicates the probability of an incorrect answer.
[0078] Therefore, an overall value of the loss function is reduced with an increase in probability of correct answer and is increased with a reduction therein.
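As a hedged illustration, the behaviour described above can be sketched with the commonly used form of focal loss, in which cross-entropy is scaled by the probability of an incorrect answer raised to a focusing exponent. The exponent name gamma and its default of 2.0 are assumptions for this sketch; the embodiment does not state a value.

```python
import math


def cross_entropy(p_correct):
    """Cross-entropy loss for a sample whose correct class has probability p_correct."""
    return -math.log(p_correct)


def focal_loss(p_correct, gamma=2.0):
    """Cross-entropy scaled by (1 - p)^gamma, so confident correct answers contribute little."""
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)


for p in (0.9, 0.5, 0.1):
    print(f"p={p:.1f}  CE={cross_entropy(p):.4f}  focal={focal_loss(p):.4f}")
```

With gamma set to 0 the scaling factor becomes 1 and the focal loss coincides with cross-entropy; larger values shift the training signal toward samples that are still predicted poorly, such as small lesional areas.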
[0079] Also, when data is biased to a specific class during the training, it is difficult to learn features of classes having little data, and thus a correct answer rate is low. In this case, it is possible to use a focal loss method.
[0080] The size of a lesion to be detected in an iris image is very small as compared to the overall image. For this reason, in the case of classifying the extracted lesional area, training is performed with less data than that of the overall image. Therefore, training is not performed appropriately, and accuracy is low.
[0081] In this regard, it is possible to obtain high accuracy using the focal loss method proposed in the present invention.
[0082] Meanwhile, details of a feature are supplemented by upsampling an image of a feature map extracted from a last convolutional layer to a double size image and combining the double size image with a feature map of a layer of m_conv (see
[0083] The feature map whose details have been supplemented is input to the detection unit 140, which detects detection and segmentation areas, and will be described in detail with reference to
[0084] Finally, a weight learned on the basis of the position and shape of the lesional area detected by the detection unit 140 is loaded to make a calculation ($M_c(x, y) = \sum_k w_k^c f_k(x, y)$; $M_c$=class activation map (CAM), $w$=weight, $f_k$=unit of activation function). Then, the deep neural network determines a position and shape of the lesional area and generates a visualized image indicating whether the user has been diagnosed with dementia.
[0085] Referring to
[0086] In other words, when downsampling is performed in a pooling layer, a spatial resolution is reduced, and an image more indistinct than the original image is obtained. However, when the atrous convolution is used, it is possible to extract a high-resolution image similar to the original image, and the atrous convolution may replace a pooling layer.
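The effect of atrous convolution on the receptive field can be illustrated in one dimension. This is a simplified sketch, not the two-dimensional operation of the embodiment; the function name and the dilation parameter rate are chosen for illustration.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D convolution (correlation form) with `rate - 1` gaps between kernel taps."""
    span = (len(kernel) - 1) * rate + 1  # effective receptive field of the kernel
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]
dense = atrous_conv1d(signal, kernel, rate=1)    # receptive field of 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # receptive field of 5 samples, same 3 weights
```

The dilated kernel sees a wider context with the same number of weights and without first reducing the resolution of the signal, which is why it may stand in for a pooling layer.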
[0087] Referring to
[0088] As a result, due to the characteristic of convolution of extracting a feature through a filter and generating several features, even different filters calculate the same R, G, and B values and may extract identical features. Therefore, it may be difficult to extract various features.
[0089] On the other hand, in separable convolution, R, G, and B values are separately calculated, that is, filters are separately generated for R, G, and B values. Therefore, a color feature may be extracted in further detail, and then it is possible to extract various features.
[0090] Also, general convolution has the amount of calculation of $F^2 N K^2 M$ because the calculation is performed at once and a feature is extracted. However, in separable convolution, calculation of R, G, and B values is separated from generation of a filter for extracting a feature through the calculation, and the amount of calculation is $F^2 N K^2 + F^2 M N$. Therefore, it is possible to increase a processing rate to eight to nine times a conventional processing rate.
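The two cost expressions above can be compared numerically. The layer dimensions below are hypothetical values chosen for illustration and do not come from the embodiment.

```python
def standard_cost(f, n, k, m):
    """Multiplications for a standard convolution: F^2 * N * K^2 * M."""
    return f * f * n * k * k * m


def separable_cost(f, n, k, m):
    """Per-channel filtering (F^2 * N * K^2) plus channel mixing (F^2 * M * N)."""
    return f * f * n * k * k + f * f * m * n


f, n, k, m = 56, 64, 3, 128  # hypothetical feature-map size, input channels, kernel, output channels
ratio = standard_cost(f, n, k, m) / separable_cost(f, n, k, m)
```

The ratio simplifies to K²M/(K²+M), so for a 3×3 kernel it approaches 9 as the number of output channels grows, which is consistent with the eight-to-nine-fold figure given above.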
[0091] Referring to
[0092] Specifically, assuming that fully-connected (FC) layers are replaced with global average pooling (GAP) and there are two 16×16 feature maps (=16×16×2) by way of example, the feature maps are mapped to 1×1×256 vectors or so on. In other words, all the feature maps are pooled and mapped to several neurons. As a result, in the above example, the feature maps are mapped to 256 neurons.
[0093] Therefore, each feature map becomes several neurons through GAP, and the neurons are given appropriate weights and classified. The weights are used to generate a CAM with the neurons overlapping the original iris image. As a result, when a weight increases, grey becomes darker in a CAM. This is a major basis for classifying dementia.
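A minimal sketch of how pooled feature maps and their weights may yield a class activation map is shown below. It follows the common formulation in which the CAM is the weighted sum of the feature maps; the feature-map values and weights are hypothetical.

```python
def global_average_pool(feature_map):
    """Collapse one 2-D feature map to a single value (its mean)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def class_activation_map(feature_maps, weights):
    """Weighted sum of the feature maps, evaluated at every spatial position."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fm[r][c] for w, fm in zip(weights, feature_maps)) for c in range(cols)]
        for r in range(rows)
    ]


feature_maps = [
    [[0.0, 1.0], [0.0, 3.0]],  # responds on the right-hand column
    [[2.0, 0.0], [2.0, 0.0]],  # responds on the left-hand column
]
weights = [0.5, 0.25]          # hypothetical learned class weights

pooled = [global_average_pool(fm) for fm in feature_maps]
score = sum(w * p for w, p in zip(weights, pooled))
cam = class_activation_map(feature_maps, weights)
```

In this formulation the class score equals the mean of the CAM, so positions with large CAM values are the ones that contributed most to the classification.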
[0094]
[0095] Referring to
[0096] A region proposal network (RPN) is used to extract a lesional area for dementia diagnosis from a feature map. In the network, candidate RoIs in which an object may exist are first detected through a preset anchor box, and the candidate RoIs are classified according to objects by a classifier.
[0097] Since the extracted RoIs have different sizes, it is difficult to process the extracted RoIs in a general deep neural network which requires a fixed size. Therefore, the RoIs of different sizes are converted to the same size through an RoI pooling layer.
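The conversion of differently sized RoIs to one fixed size might be sketched as follows. Max pooling over integer-aligned bins is assumed here as one common form of RoI pooling; the function name and bin rule are illustrative and are not taken from the embodiment.

```python
def roi_max_pool(region, out_size):
    """Max-pool a 2-D region of any size into an out_size x out_size grid."""
    rows, cols = len(region), len(region[0])

    def edges(i, length):
        start = (i * length) // out_size
        end = -((-(i + 1) * length) // out_size)  # ceiling division
        return start, end

    pooled = []
    for i in range(out_size):
        r0, r1 = edges(i, rows)
        row = []
        for j in range(out_size):
            c0, c1 = edges(j, cols)
            row.append(max(region[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]                 # 3 x 3 RoI
large = [[c + 10 * r for c in range(6)] for r in range(4)]  # 4 x 6 RoI
small_fixed = roi_max_pool(small, 2)
large_fixed = roi_max_pool(large, 2)
```

Both regions come out as 2×2 grids, so a later layer that requires a fixed input size can process them alike.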
[0098] The size of converted RoIs in which an object has been detected is segmented in units of pixels. However, according to a conventional method, alignment is not taken into consideration.
[0099] In other words, even when the size to which conversion has been performed in an RoI pooling layer, that is, a detected size of an object, has a fractional part, the fractional part is discarded by rounding the size to the nearest integer. Therefore, the object is in a poor alignment state.
[0100] Also, pixel units are calculated through FC layers. As described above regarding FC layers, FC layers lose spatial information of an original three-dimensional image by converting features extracted in convolution layers into one-dimensional vectors. Therefore, accuracy is degraded.
[0101] On the other hand, according to the present invention, the fractional part is kept intact, and bilinear interpolation is used for accurate alignment. Therefore, according to the present invention, it is possible to determine the position and shape of a lesional area accurately.
[0102] Further, according to the present invention, FC layers are changed to 1×1 convolution layers to solve the problem of FC layers. FC layers are called fully-connected because they connect all neurons between layers and calculate correlations. Since 1×1 convolution is performed through a 1×1 filter, correlations of each pixel may be calculated, and also spatial information may be maintained. Therefore, it is possible to increase accuracy.
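The difference between rounding a coordinate and interpolating at it can be illustrated with a small sketch of bilinear interpolation. The function name and the sample grid are hypothetical.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Sample a 2-D grid at fractional (y, x) without rounding the coordinates."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(feature_map) - 1)
    x1 = min(x0 + 1, len(feature_map[0]) - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0][x0] + dx * feature_map[y0][x1]
    bottom = (1 - dx) * feature_map[y1][x0] + dx * feature_map[y1][x1]
    return (1 - dy) * top + dy * bottom


grid = [
    [0.0, 10.0],
    [20.0, 30.0],
]
rounded = grid[round(0.4)][round(0.6)]          # coordinate snapped to the nearest cell
interpolated = bilinear_sample(grid, 0.4, 0.6)  # value blended from the four neighbours
```

The rounded lookup returns the value of a single neighbouring cell, whereas the interpolated value reflects the exact fractional position, which is what preserves alignment.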
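To illustrate why spatial information is retained, a 1×1 convolution can be sketched as a per-pixel mixing of channels. The function name, channel counts, and weights below are hypothetical.

```python
def conv1x1(feature_maps, weights):
    """Apply a 1x1 convolution: mix input channels independently at each pixel.

    feature_maps: list of C_in 2-D maps; weights: C_out rows of C_in coefficients.
    """
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            [sum(w * fm[r][c] for w, fm in zip(w_row, feature_maps)) for c in range(cols)]
            for r in range(rows)
        ]
        for w_row in weights
    ]


feature_maps = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
]
weights = [
    [1, 1],   # output channel 0: sum of the two input channels
    [1, -1],  # output channel 1: difference of the two input channels
]
mixed = conv1x1(feature_maps, weights)
```

Each output map has the same height and width as the inputs, unlike an FC layer, which would first flatten the maps into a one-dimensional vector.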
[0103] Finally, the probability of dementia is estimated using a weight obtained by training the deep neural network on the basis of the position and shape of the lesional area extracted through segmentation and detection.
[0104] Referring to
[0105] As described above, an activation function receives an input signal, generates an output signal in response when the input signal satisfies a specific threshold value, and transmits the output signal to the next layer. Generally, the ReLU function is used.
[0106] However, since the ReLU has a demerit that when x of an input signal is 0 or less, the signal is transmitted with a value of 0, learning is not performed smoothly by the deep neural network learning unit 130. The reason is as follows. When a deep neural network has deep layers, it is possible to extract detailed features. Meanwhile, the neural network calculates an error due to a loss function and learns the error by backpropagating the error. The error is calculated with a differential value, that is, a variation. When an error is backpropagated while being multiplied by a differential value in each layer, a variation becomes very small going toward a frontend layer and converges without being transmitted. When this is applied to a ReLU, all values of 0 or less are processed as 0. As a result, 0 is obtained by differentiating 0. For this reason, learning is not smoothly performed due to the characteristic of a deep neural network performing learning while a weight is being updated.
[0107] To solve this problem, even values of 0 or less can be learned according to the present invention. Therefore, learning can be smoothly performed, and accuracy can be increased accordingly.
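The contrast with the ReLU can be sketched as follows, assuming that softX denotes the input multiplied by its sigmoid, y = x / (1 + exp(-x)). That reading of the corrected function is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: zero for every non-positive input."""
    return 1.0 if x > 0 else 0.0


def soft_x(x):
    """Assumed softX form: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def soft_x_grad(x):
    """Derivative of x * sigmoid(x): s + x * s * (1 - s), with s = sigmoid(x)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s + x * s * (1.0 - s)


for x in (-2.0, -0.5, 0.0, 1.0):
    print(x, relu(x), relu_grad(x), round(soft_x(x), 4), round(soft_x_grad(x), 4))
```

For negative inputs the ReLU gradient is exactly zero, while the assumed softX gradient is generally nonzero, so an error signal can still reach the weights of earlier layers.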
[0108] In addition to the above-described embodiment, according to the present invention, it is possible to store big data, which represents the probability of dementia and the degree of development of dementia according to a position and shape of a lesional area, to learn and determine the probability of dementia and the degree of development of dementia according to a position and shape of a lesional area on the basis of the big data, and to notify user equipment in real time that an additional test including an interview test and a laboratory test is required according to the probability of dementia and the degree of development of dementia. Also, when the degree of development of dementia has recently increased drastically, it is possible to notify the user equipment of such development in real time. The real-time notification through the user equipment may be a popup, a push alarm, and the like.
[0109] Meanwhile, the above-described method can be written as a program executable in a computer and may be implemented by a general-use digital computer which runs the program using a computer-readable medium. The structure of data used in the above-described method may be recorded in the computer-readable medium in several ways. The computer-readable medium which stores executable computer code for executing various methods of the present invention includes storage media, such as magnetic storage media (e.g., a read-only memory (ROM), a floppy disk, a hard disk, etc.,) and optical reading media (e.g., a compact disc (CD)-ROM, a digital versatile disc (DVD), etc.).
[0110] Those of ordinary skill in the technical field related to embodiments of the present invention will appreciate that the present invention may be implemented in modified forms without departing from the essential characteristics of this disclosure. Therefore, the disclosed methods should be considered from a descriptive point of view rather than a limiting point of view. The scope of the present invention is disclosed not in the detailed description of the present invention but in the claims, and all differences lying within the range of equivalents should be interpreted as being included in the scope of the present invention.