Method for detecting image of object using convolutional neural network

11580637 · 2023-02-14

Abstract

The present application relates to a method for detecting an object image using a convolutional neural network. First, feature images are obtained with convolution kernels, and the image of an object under detection is then positioned in the feature images by a default box and a boundary box. By comparison with sample images, the detected object image is classified as an esophageal cancer image or a non-esophageal cancer image. An input image from an image capturing device is thus examined by the convolutional neural network to judge whether it is an esophageal cancer image, helping the doctor interpret the detected object image.

Claims

1. A method for detecting an object image using a convolutional neural network, the steps including: providing an input image to a host by an image capture unit, the input image including at least one detected object image and one background image; converting the input image into a plurality of characteristic values and comparing the characteristic values with a plurality of convolution kernels to obtain at least one partial or full object image corresponding to some of the characteristic values by using the host, the convolution kernels containing the characteristic values of plural partial images and the adjacent background image in at least one object image; capturing at least one regional image according to the region where the characteristic values corresponding to the partial or full detected object image are located, generating at least one default frame based on the edge of the at least one regional image, and overlapping the default frame on the input image, by using the host; capturing and comparing a first center point of the default frame with a second center point of a boundary frame on the input image to obtain a center offset between the default frame and the boundary frame by using the host; performing a regression operation according to the center offset to position the object image in the default frame on the input image by using the host, where the host performs the regression operation with a first position of the default frame, a second position of the boundary frame and a zooming factor to position the detected object image; comparing the object image with at least one sample image to produce a comparison result by using the host; and classifying the input image as a target object image or a non-target object image according to the comparison result by using the host; wherein the boundary frame corresponds to the boundary of the input image, and thereby encompasses the entirety of the input image, and the default frame corresponds to the boundary of the detected object.

2. The method for detecting an object image with a convolutional neural network of claim 1, wherein in the step of converting the input image into a plurality of characteristic values and comparing the characteristic values with a plurality of convolution kernels to obtain at least one partial or full object image corresponding to some of the characteristic values by using the host, the host sets the detection boundary of the convolution kernels to 3×3×p and normalizes a plurality of pixel values of the input image to a plurality of pixel normal values; the host obtains the characteristic values in a convolution layer by multiplying the pixel normal values by the convolution kernels.

3. The method for detecting an object image with a convolutional neural network of claim 1, wherein in the step of capturing at least one regional image according to the region where the characteristic values corresponding to the partial or full detected object image are located, the host integrates the regions where the characteristic values are located, obtains the regional image of the input image, and establishes the default frame with the regional image.

4. The method for detecting an object image with a convolutional neural network of claim 1, wherein in the step of converting the input image into a plurality of characteristic values and comparing the characteristic values with a plurality of convolution kernels to obtain at least one partial or full object image corresponding to some of the characteristic values by using the host, the host convolves each pixel of the input image according to a single shot multibox detector model to detect the characteristic values.

5. The method for detecting an object image with a convolutional neural network of claim 1, wherein in the step of comparing the detected object image with at least one sample image by using the host, the host performs the classification comparison at a fully connected layer.

6. The method for detecting an object image with a convolutional neural network of claim 1, wherein in the step of classifying the input image as a target object image or a non-target object image based on a comparison result, when the host fails to identify the object image in the default frame that matches at least one sample image, the host classifies the input image as the non-target object image, else, the host classifies the input image as the target object image.

7. The method for detecting an object image with a convolutional neural network of claim 1, wherein in the step of classifying the input image as a target object image or a non-target object image according to a comparison result, when the host classifies the input image as the non-target object image, the host performs a second comparison of at least one sample image with the object image; when the host judges that an approximation of one of the detected object images is greater than an approximation threshold, it classifies the input image as the target object image; else, the host classifies the input image as the non-target object image.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 is a flowchart of convolutional image detection in an embodiment of the present application.

(2) FIG. 2A to FIG. 2G are schematic diagrams of partial steps in the first embodiment of the present application.

(3) FIG. 3 is a schematic diagram of the convolution kernels and the input image in an embodiment of the present application.

(4) FIG. 4 is a schematic diagram of the center point in an embodiment of the present application.

(5) FIG. 5 is a schematic diagram of the superimposed default frame in an embodiment of the present application.

(6) FIG. 6 is a schematic diagram of the center point offset in an embodiment of the present application.

(7) FIG. 7 is a schematic diagram of superimposing the center points of the default frame and the boundary frame in an embodiment of the present application.

(8) FIG. 8 is a schematic diagram of superimposing the center points of the default frame and the boundary frame in the practical operation of the present application.

DETAILED DESCRIPTION

(9) Manual operation is prone to negligence, and the complex operation of endoscopy makes image identification difficult. The present application therefore proposes a method for detecting an object image using a convolutional neural network to address both problems.

(10) The following description sets out the features of the method for detecting an object image using a convolutional neural network, together with the system it works with:

(11) First, refer to FIG. 1, which is the flowchart of detecting an object image according to an embodiment of the present application. As shown in the figure, the steps of the method for detecting an object image with a convolutional neural network include:

(12) Step S10: providing an input image to the host by the image capture unit;

(13) Step S20: converting the input image into characteristic values and comparing them with convolution kernels to obtain a partial or full detected object image corresponding to part of the characteristic values, by using the host;

(14) Step S30: capturing a regional image according to the region of the characteristic values corresponding to the partial or full detected object image, generating a default frame based on the edge of the regional image, and overlapping it on the input image, by using the host;

(15) Step S40: capturing and comparing the first center point of the default frame with the second center point of the boundary frame of the input image to obtain the center offset between the default frame and the boundary frame, by using the host;

(16) Step S50: performing a regression operation according to the center offset to position the detected object image in the default frame on the input image;

(17) Step S60: comparing the detected object image with a sample image to produce a comparison result, by using the host; and

(18) Step S70: classifying the input image as a target object image or a non-target object image according to the comparison result, by using the host.

(19) Refer to FIG. 2A to FIG. 2D, which show the detection system 1 used in the method for detecting an object image using a convolutional neural network of the present application; the system includes a host 10 and an image capture unit 20. In this embodiment, the host 10 is, as an example, a computer with a processing unit 12, a memory 14 and a storage unit 16; however, the present application is not limited to this embodiment, and the host 10 can also be a server, a notebook computer, a tablet computer or any electronic device with basic computing capability. In this embodiment, the database 30 is built in the storage unit 16, but is not limited to this arrangement; the database 30 can also be set in an external storage unit of the host 10. A convolution program P is executed by the processing unit 12 in the host 10, and a convolutional neural network (CNN) is established correspondingly. Moreover, the image capture unit 20 is an endoscope in this embodiment, which is used to examine internal organs and tissues; examples include the cystoscope, gastroscope, colonoscope, bronchoscope and laparoscope.

(20) In Step S10, as shown in FIG. 2A, the host 10 receives an input image IMG captured by the image capture unit 20, which includes at least one detected object image O1 and a background image BG. In Step S20, shown in FIG. 2B and FIG. 3, the host 10 converts the input image IMG into pixel units, each with its own characteristic value, in particular a value between 0 and 1. A plurality of convolution kernels C are used to detect the plural characteristic values of the input image IMG. The convolution kernels C contain the characteristic values corresponding to the plural partial images O2 of at least one detected object image O1 and the characteristic values corresponding to the adjacent background image BG of at least one detected object, and are used to filter out the background image BG that does not contain the detected object image O1. Each pixel unit of the input image is convolved using the Single Shot Multibox Detector (SSD) model to detect the characteristic values. In other words, each convolution kernel C corresponds to the characteristic values of the plural partial images O2 in the detected object image O1 and of the background image BG at the adjacent edge.

(21) As shown in FIG. 3, the input image IMG is M×N pixel units and the convolution kernel C is 3×3×P units. The convolution kernel C is therefore used to detect the detected object image O1 and the background image BG on the input image IMG, which reduces the processing of the background image BG in subsequent steps. The input image IMG is converted into the corresponding characteristic values by the processing unit 12, and the processing unit 12 multiplies the input image IMG by the convolution kernel C to obtain different convolution results, where “1” represents a match and “−1” represents a mismatch, and thus filters out the irrelevant background image BG. As shown in FIG. 4, the partial or full detected object image corresponding to part of the characteristic values is obtained from the input image IMG, which in turn gives the location area A where the partial or full detected object image is located.
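The kernel matching described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation of the present application: it assumes 8-bit pixel values (0 to 255), no padding and a stride of 1, and the function names `normalize` and `convolve_3x3` are hypothetical. As in most CNN libraries, the sliding product is computed without flipping the kernel.

```python
def normalize(image, max_value=255.0):
    """Scale raw pixel values into [0, 1].

    `image` is an M x N x P nested list; max_value=255 assumes 8-bit pixels.
    """
    return [[[v / max_value for v in pixel] for pixel in row] for row in image]


def convolve_3x3(image, kernel):
    """Slide a 3 x 3 x P kernel over an M x N x P image (no padding, stride 1).

    Each output entry is the sum of element-wise products between the kernel
    and the 3 x 3 x P window whose top-left corner is at (i, j).
    """
    rows, cols, channels = len(image), len(image[0]), len(image[0][0])
    result = []
    for i in range(rows - 2):
        out_row = []
        for j in range(cols - 2):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    for c in range(channels):
                        acc += image[i + di][j + dj][c] * kernel[di][dj][c]
            out_row.append(acc)
        result.append(out_row)
    return result


# Illustrative data only: a 4 x 4 single-channel image and an all-ones kernel.
raw = [[[255] for _ in range(4)] for _ in range(4)]
ones_kernel = [[[1.0] for _ in range(3)] for _ in range(3)]
feature_map = convolve_3x3(normalize(raw), ones_kernel)
```

A 4×4 input yields a 2×2 map of characteristic values here; a real SSD backbone would stack many such layers with learned kernels.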

(22) As shown in FIG. 2C and FIG. 5, in Step S30, the host 10 uses the location area A where the partial or full detected object image O1 is located to obtain at least one regional image, builds the corresponding at least one default frame D, and overlaps it on the input image IMG. Initially, the boundary of the input image IMG is the boundary frame B. For the size of the default frame D, min_size = $s_k$, max_size = $s_{k+1}$, and the maximum size is $\sqrt{\text{min\_size} \times \text{max\_size}}$; the frame size $s_k$ is calculated by the following Equation (1):

(23) $s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]$  Equation (1)

(24) The frame height and width are calculated from the frame size $s_k$ using Equations (2) and (3) below:
$h_k = s_k \sqrt{a_r}$  Equation (2)
$w_k = s_k / \sqrt{a_r}$  Equation (3)

(25) Here, $h_k$ represents the height of the prior frame of the rectangle in the $k$-th feature map, $w_k$ represents the width of the prior frame, and $a_r$ represents the aspect ratio of the default frame D, with $a_r > 0$.
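As a worked illustration of Equations (1) to (3), the following Python sketch computes the frame size for each feature map and the resulting frame height and width. The default values `s_min=0.2` and `s_max=0.9` are assumptions chosen for illustration and are not stated in the text; the function names are hypothetical. Height and width follow Equations (2) and (3) exactly as written here.

```python
import math


def frame_scale(k, m, s_min=0.2, s_max=0.9):
    """Equation (1): frame size s_k for the k-th feature map, k in [1, m].

    s_min and s_max are illustrative assumptions, not values from the text.
    """
    if not 1 <= k <= m:
        raise ValueError("k must lie in [1, m]")
    if m == 1:
        return s_min  # avoids division by zero when there is a single map
    return s_min + (s_max - s_min) / (m - 1) * (k - 1)


def frame_height_width(s_k, a_r):
    """Equations (2) and (3): h_k = s_k * sqrt(a_r), w_k = s_k / sqrt(a_r)."""
    if a_r <= 0:
        raise ValueError("aspect ratio a_r must be positive")
    root = math.sqrt(a_r)
    return s_k * root, s_k / root


def geometric_mean_size(min_size, max_size):
    """sqrt(min_size * max_size), with min_size = s_k and max_size = s_(k+1)."""
    return math.sqrt(min_size * max_size)


# Frame sizes for m = 6 feature maps, then height and width for a_r = 4.
scales = [frame_scale(k, 6) for k in range(1, 7)]
h, w = frame_height_width(0.5, 4.0)
```

With `a_r = 1` the frame is square, and `h * w` always equals `s_k ** 2`, so the aspect ratio changes the shape of the frame but not its area.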

(26) As shown in FIG. 2D and FIG. 6, in Step S40, the host 10 executes the convolution program P on the processing unit 12, takes the first center point Dc of the default frame D and the second center point Bc of the boundary frame B of the input image IMG, and obtains the center offset from the displacement between Dc and Bc. Continuing to Step S50, shown in FIG. 2E and FIG. 7, the host 10 uses the processing unit 12 to perform the regression operation in a loop according to the center point displacement DIS between the default frame D and the boundary frame B; the loop runs as follows:
Location of default frame D: $d = (d^{cx}, d^{cy}, d^{w}, d^{h})$  Equation (4)
Location of boundary frame B: $b = (b^{cx}, b^{cy}, b^{w}, b^{h})$  Equation (5)
Zooming factor: $l = (l^{cx}, l^{cy}, l^{w}, l^{h})$  Equation (6)
$b^{cx} = d^{w} l^{cx} + d^{cx}$  Equation (7)
$b^{cy} = d^{h} l^{cy} + d^{cy}$  Equation (8)
$b^{w} = d^{w} \exp(l^{w})$  Equation (9)
$b^{h} = d^{h} \exp(l^{h})$  Equation (10)

(27) First, the central coordinates of the boundary frame B are aligned with the central coordinates of the predicting detection frame D (that is, the default frame D); in other words, the center point of the boundary frame B is “moved” to the center point of the predicting detection frame D so that the first center point Dc and the second center point Bc shown in FIG. 6 overlap, as expressed in Equations (7) and (8). Next, the size of the boundary frame B is “zoomed” to be close to that of the predicting detection frame D, as expressed in Equations (9) and (10). After the moving and zooming, the boundary frame B can be brought arbitrarily close to the predicting detection frame D. The host 10 therefore uses the convolutional neural network (CNN), run by the convolution program P executed by the processing unit 12, to perform the regression operation repeatedly until the boundary frame B is arbitrarily close to the predicting detection frame D in size and position, and thus locates the relative position of the detected object image O1 on the input image IMG.
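Equations (4) to (10) can be illustrated with a short Python sketch. The tuple layout (cx, cy, w, h) follows Equations (4) to (6); the function names `center_offset` and `apply_zooming_factor` and the sample numbers are assumptions for illustration only, and the sketch shows a single pass of the regression rather than the full trained loop.

```python
import math


def center_offset(d, b):
    """Displacement from the default-frame center Dc to the boundary-frame center Bc."""
    return b[0] - d[0], b[1] - d[1]


def apply_zooming_factor(d, l):
    """Equations (7) to (10): compute frame b from default frame d and factor l.

    d and l are tuples laid out as (cx, cy, w, h).
    """
    d_cx, d_cy, d_w, d_h = d
    l_cx, l_cy, l_w, l_h = l
    b_cx = d_w * l_cx + d_cx      # Equation (7): shift center along x
    b_cy = d_h * l_cy + d_cy      # Equation (8): shift center along y
    b_w = d_w * math.exp(l_w)     # Equation (9): scale width
    b_h = d_h * math.exp(l_h)     # Equation (10): scale height
    return b_cx, b_cy, b_w, b_h


# Illustrative values in normalized image coordinates.
default_frame = (0.5, 0.5, 0.2, 0.4)
unchanged = apply_zooming_factor(default_frame, (0.0, 0.0, 0.0, 0.0))
shifted = apply_zooming_factor(default_frame, (1.0, -0.5, 0.0, 0.0))
```

A zero zooming factor returns the default frame unchanged, which corresponds to the state in which the two center points overlap and the sizes already agree.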

(28) In order to define the position of the detected object image O1 more accurately, a localization loss is further applied, as given in Equation (11) below:
$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}(l_i^m - \hat{g}_j^m)$  Equation (11)

(29) The error between the position of the predicting detection frame D and the detected object image O1 can thus be verified.
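The localization loss can be sketched in Python as follows. The text does not spell out the smooth L1 function, so this sketch assumes the commonly used definition (0.5x² for |x| < 1, and |x| − 0.5 otherwise). The names `matches`, `predicted` and `targets` are hypothetical: `matches` lists the (i, j) pairs for which the indicator x_ij^k equals 1, and each entry of `predicted` and `targets` is a (cx, cy, w, h) tuple.

```python
def smooth_l1(x):
    """Assumed smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(matches, predicted, targets):
    """Sum smooth L1 over cx, cy, w, h for every matched (i, j) pair.

    Only positive matches contribute, which mirrors the indicator x_ij^k
    being 1 for matched pairs and 0 otherwise.
    """
    total = 0.0
    for i, j in matches:
        for m in range(4):  # cx, cy, w, h
            total += smooth_l1(predicted[i][m] - targets[j][m])
    return total


# Illustrative values: one predicted frame matched to one target.
loss = localization_loss(
    matches=[(0, 0)],
    predicted=[(0.5, 0.0, 0.0, 0.0)],
    targets=[(0.0, 0.0, 0.0, 2.0)],
)
```

The quadratic region keeps gradients small when the frames nearly coincide, while the linear region limits the influence of badly placed frames.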

(30) In Step S60, as shown in FIG. 2F, after the processing unit 12 has located the position of the detected object image O1, the host 10 compares the detected object image O1 with the sample image SA in the database 30 and obtains a comparison result R. Continuing to Step S70, as shown in FIG. 2G, the host 10, by running the convolution program P on the processing unit 12, classifies the input image IMG as a target object image TA (for example, an image of a malignant tumor) or a non-target object image NTA according to the comparison result R. When the convolution program P fails to match the detected object image O1 in the default frame D with at least one sample image SA, the host 10 classifies the input image IMG as the non-target object image NTA; otherwise, it classifies the input image IMG as the target object image TA. Furthermore, when the convolution program P classifies the input image IMG as the non-target object image NTA, it goes on to perform a second comparison between the sample image SA and the detected object image O1. When the convolution program P determines that the approximation degree in the comparison result R of the detected object image O1 is greater than the approximation threshold for one of the target object images TA (for example, the approximation degree lies between 0 and 1 and 0.5 is taken as the approximation threshold), it classifies the input image IMG as the target object image TA; otherwise, it classifies the input image IMG as the non-target object image NTA.
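The two-stage decision in Step S70 can be summarized with a small Python sketch. It only mirrors the branching described above; how the first match and the approximation degrees are actually computed by the network is outside its scope, and the names `classify_input_image`, `first_match_found` and `approximations` are hypothetical. The threshold of 0.5 is the example value given in the text.

```python
def classify_input_image(first_match_found, approximations, threshold=0.5):
    """Return "TA" (target object image) or "NTA" (non-target object image).

    first_match_found: result of the first comparison with the sample images.
    approximations: approximation degrees in [0, 1] from the second comparison,
                    one per detected object image.
    """
    if first_match_found:
        return "TA"
    # Second comparison: runs only after a provisional NTA result.
    if any(a > threshold for a in approximations):
        return "TA"
    return "NTA"


# Illustrative calls with made-up approximation degrees.
direct_hit = classify_input_image(True, [])
second_pass_hit = classify_input_image(False, [0.30, 0.94])
rejected = classify_input_image(False, [0.30])
```

Keeping the second comparison as a separate branch makes it easy to adjust the approximation threshold without touching the first comparison.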

(31) FIG. 8 is a schematic diagram of the practical operation of the present application, in which the input image IMG is an esophageal endoscope image. The convolutional neural network (CNN) of the present application detects the detected object image in the input image IMG by superimposing the default frame D and the boundary frame B, and compares it with the sample image to obtain the comparison result R. The sample image is an esophageal endoscopic image of a dysplasia area, and the resulting approximation degree is 94.0%. For a detailed diagnosis, doctors still need to apply other medical diagnosis methods to the patient. The present application therefore provides doctors with auxiliary evidence for judging the symptoms.

(32) To sum up, the method for detecting an object image using a convolutional neural network provides a host that executes the convolution program and builds a convolutional neural network, which convolves the input image taken by the image capture unit to screen out the area under detection. A predicting detection frame is set up on the input image, and the position of the detected object image is located with the boundary frame through the regression operation. Finally, the sample image comparison is performed, and the comparison result is used to classify the input image as a target object image or a non-target object image.