ANALYSIS DEVICE AND COMPUTER-READABLE RECORDING MEDIUM STORING ANALYSIS PROGRAM

20230005255 · 2023-01-05

Abstract

An analysis device includes a processor configured to: execute a first learning process on a generative model for images such that images that bring a recognition result of an image recognition process into a preassigned state are generated; execute a second learning process on the generative model on which the first learning process has been executed, while gradually changing recognition accuracy of the images generated by that generative model to desired recognition accuracy; acquire each piece of information on back-error propagation calculated by executing the image recognition process on the images with each level of the recognition accuracy generated through the course of the second learning process; and generate evaluation information indicating each of the image parts that cause erroneous recognition at each level of the recognition accuracy, based on each acquired piece of the information on the back-error propagation.

Claims

1. An analysis device comprising: a memory; and a processor coupled to the memory and configured to: execute a first learning process on a generative model for images such that the images that bring a recognition result of an image recognition process into a preassigned state are generated; execute a second learning process on the generative model on which the first learning process has been executed, while gradually changing recognition accuracy of the images generated by the generative model on which the first learning process has been executed, to desired recognition accuracy; acquire each piece of information on back-error propagation calculated by executing the image recognition process, for the images with each level of the recognition accuracy generated through a course of the second learning process; and generate evaluation information that indicates each of image parts that cause erroneous recognition at each level of the recognition accuracy, based on each acquired piece of the information on the back-error propagation.

2. The analysis device according to claim 1, wherein the processor: executes the first learning process on the generative model for the images such that the images in a same state as input images are generated, and executes the second learning process on the generative model on which the first learning process has been executed, while gradually raising the recognition accuracy of the images generated by the generative model on which the first learning process has been executed, to the desired recognition accuracy.

3. The analysis device according to claim 2, wherein the processor: separately generates important feature maps that visualize feature portions that reacted during the image recognition process, based on each acquired piece of the information on the back-error propagation; generates a plurality of difference maps by calculating differences between the separately generated important feature maps; and among the separately generated important feature maps, generates a predetermined important feature map and each of added important feature maps obtained by sequentially adding the plurality of difference maps to the predetermined important feature map, as the evaluation information.

4. The analysis device according to claim 3, wherein the processor generates an important feature index map in which a deterioration scale map obtained by calculating the differences between the input images or the images generated by executing the first learning process, and the images that are generated by executing the second learning process and have the desired recognition accuracy is superimposed on the predetermined important feature map, and each of added important feature index maps obtained by sequentially adding the plurality of difference maps to the important feature index map, as the evaluation information.

5. The analysis device according to claim 4, wherein the processor: divides the input images or the images generated by executing the first learning process for each of superpixels; and adds a value of each pixel of the important feature index map for each of the superpixels, and generates areas indicated by combinations of the superpixels whose additional values are equal to or higher than a predetermined threshold value, as the evaluation information.

6. The analysis device according to claim 5, wherein the processor composites the input images or the images generated by executing the first learning process, and the images generated by executing the second learning process, based on the combinations of the superpixels whose additional values are equal to or higher than the predetermined threshold value, and specifies the combinations of the superpixels, based on a result of the image recognition process executed on composite images.

7. The analysis device according to claim 6, wherein the processor calculates the differences in pixel units between the input images or the images generated by executing the first learning process, and the images generated by executing the second learning process, which are the images included in the areas indicated by the specified combinations of the superpixels, and generates the images obtained from the calculated differences in pixel units, as the evaluation information.

8. A non-transitory computer-readable recording medium storing an analysis program for causing a computer to execute a process of: executing a first learning process on a generative model for images such that the images that bring a recognition result of an image recognition process into a preassigned state are generated; executing a second learning process on the generative model on which the first learning process has been executed, while gradually changing recognition accuracy of the images generated by the generative model on which the first learning process has been executed, to desired recognition accuracy; acquiring each piece of information on back-error propagation calculated by executing the image recognition process, for the images with each level of the recognition accuracy generated through a course of the second learning process; and generating evaluation information that indicates each of image parts that cause erroneous recognition at each level of the recognition accuracy, based on each acquired piece of the information on the back-error propagation.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0008] FIG. 1 is a diagram illustrating an example of the functional configuration of an analysis device;

[0009] FIG. 2 is a diagram illustrating an example of the hardware configuration of the analysis device;

[0010] FIG. 3 is a diagram illustrating an example of the functional configuration of an image refiner initialization unit;

[0011] FIG. 4 is a first diagram illustrating an example of the functional configuration of a refined image generation unit;

[0012] FIG. 5 is a first diagram illustrating an example of the functional configuration of a map generation unit;

[0013] FIG. 6 is a first flowchart illustrating a flow of an erroneous recognition cause extraction process;

[0014] FIG. 7 is a second diagram illustrating an example of the functional configuration of a refined image generation unit;

[0015] FIG. 8 is a second diagram illustrating an example of the functional configuration of a map generation unit;

[0016] FIG. 9 is a second flowchart illustrating a flow of an erroneous recognition cause extraction process;

[0017] FIG. 10 is a second diagram illustrating an example of the functional configuration of an analysis device;

[0018] FIG. 11 is a first diagram illustrating an example of the functional configuration of a specifying unit;

[0019] FIG. 12 is a diagram illustrating a specific example of processing of a superpixel dividing unit;

[0020] FIG. 13 is a diagram illustrating a specific example of processing of an important superpixel designation unit;

[0021] FIG. 14 is a diagram illustrating a specific example of processing of an area extraction unit and a compositing unit;

[0022] FIG. 15 is a third flowchart illustrating a flow of an erroneous recognition cause extraction process;

[0023] FIG. 16 is a flowchart illustrating a flow of a changeable area specifying process;

[0024] FIG. 17 is a second diagram illustrating an example of the functional configuration of a specifying unit;

[0025] FIG. 18 is a first diagram illustrating an example of the functional configuration of a detailed cause analysis unit;

[0026] FIG. 19 is a first diagram illustrating a specific example of processing of the detailed cause analysis unit;

[0027] FIG. 20 is a first flowchart illustrating a flow of a detailed cause analysis process;

[0028] FIG. 21 is a second diagram illustrating an example of the functional configuration of a detailed cause analysis unit;

[0029] FIG. 22 is a second diagram illustrating a specific example of processing of the detailed cause analysis unit;

[0030] FIG. 23 is a second flowchart illustrating a flow of a detailed cause analysis process;

[0031] FIG. 24 is a third diagram illustrating an example of the functional configuration of a detailed cause analysis unit;

[0032] FIG. 25 is a third diagram illustrating a specific example of processing of the detailed cause analysis unit; and

[0033] FIG. 26 is a third flowchart illustrating a flow of a detailed cause analysis process.

DESCRIPTION OF EMBODIMENTS

[0034] According to the score maximization method, by changing the input image such that the score is maximized and generating a refined image, the portion of the generated refined image that has changed from the input image can be visualized as an image part that causes the erroneous recognition.

[0035] However, in the case of the score maximization method, the image part after the change is completed is clearly indicated, but the image parts partway through the course of the change are not. Therefore, a user can grasp the image part affecting the maximum score, but cannot grasp which image part has influence at intermediate scores (intermediate levels of recognition accuracy), for example, the degree of influence of each image part partway through the course.

[0036] One aspect aims to visualize the degree of influence of each image part that causes erroneous recognition.

[0037] Hereinafter, each embodiment will be described with reference to the accompanying drawings. Note that, in the present specification and the drawings, constituent elements having substantially the same functional configuration are denoted by the same reference sign, and redundant description will be omitted.

First Embodiment

[0038] <Functional Configuration of Analysis Device>

[0039] First, a functional configuration of an analysis device according to a first embodiment will be described. FIG. 1 is a first diagram illustrating an example of the functional configuration of the analysis device. An analysis program is installed in the analysis device 100, and when the program is executed, the analysis device 100 functions as an image recognition unit 110, an erroneous recognition image extraction unit 120, and an erroneous recognition cause extraction unit 140.

[0040] The image recognition unit 110 performs an image recognition process using a trained CNN. For example, the image recognition unit 110 executes the image recognition process in response to the input of an input image 10 and outputs a recognition result (for example, a label) indicating the type of an object (in the present embodiment, the type of vehicle) included in the input image 10.

[0041] The erroneous recognition image extraction unit 120 determines whether or not the recognition result attached to the input image 10 (for example, a known label indicating the type of the object) and the recognition result output by the image recognition unit 110 (for example, a label) coincide with each other. In addition, when it is determined that the recognition results do not coincide (when an erroneous recognition result is output), the erroneous recognition image extraction unit 120 extracts the input image as an “erroneous recognition image” and stores the extracted erroneous recognition image in an erroneous recognition image storage unit 130.
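The mismatch check performed by the erroneous recognition image extraction unit 120 can be sketched as follows. The function names, the toy recognizer, and the (image, label) data format are illustrative assumptions, not part of this disclosure.

```python
def extract_erroneous_images(samples, recognize):
    """Collect inputs whose known label disagrees with the recognizer's output.

    `samples` is a list of (image, known_label) pairs; `recognize` is any
    callable returning a predicted label. Both are hypothetical stand-ins
    for the image recognition unit 110 and its inputs.
    """
    erroneous = []
    for image, known_label in samples:
        predicted = recognize(image)
        if predicted != known_label:      # recognition results do not coincide
            erroneous.append(image)       # store as "erroneous recognition image"
    return erroneous

# Toy recognizer: labels an image by its mean pixel value.
recognize = lambda image: "bright" if sum(image) / len(image) > 0.5 else "dark"
samples = [([0.9, 0.8], "bright"), ([0.2, 0.1], "bright"), ([0.1, 0.3], "dark")]
print(extract_erroneous_images(samples, recognize))  # → [[0.2, 0.1]]
```

Only the second sample is misrecognized (mean 0.15 yields "dark" against the known label "bright"), so it alone is extracted.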

[0042] The erroneous recognition cause extraction unit 140 specifies each image part that causes erroneous recognition at each level of recognition accuracy for the erroneous recognition image and, by outputting erroneous recognition cause information (an example of evaluation information) indicating each specified image part at each level of recognition accuracy, visualizes the degree of influence of each image part.

[0043] For example, the erroneous recognition cause extraction unit 140 includes an image refiner initialization unit 141, a refined image generation unit 142, and a map generation unit 143.

[0044] The image refiner initialization unit 141 is an example of a first learning unit. The image refiner initialization unit 141 reads the erroneous recognition image stored in the erroneous recognition image storage unit 130 and executes a first learning process for initializing an image refiner unit, by inputting the read erroneous recognition image.

[0045] The image refiner unit is a generative model that uses a CNN to change the erroneous recognition image and generate a refined image with a predetermined level of recognition accuracy. The image refiner initialization unit 141 initializes the image refiner unit by executing the first learning process and updating model parameters of the generative model.

[0046] The refined image generation unit 142 is an example of a second learning unit, to which the image refiner unit initialized by the image refiner initialization unit 141 is applied. The refined image generation unit 142 reads the erroneous recognition image stored in the erroneous recognition image storage unit 130, executes a second learning process on the image refiner unit such that the recognition results have each level of recognition accuracy, and generates refined images with each level of recognition accuracy. The refined image generation unit 142 generates the refined images with each level of recognition accuracy while gradually raising the recognition accuracy to the desired recognition accuracy. Note that, among the refined images with each level of recognition accuracy, the refined image with the maximized recognition accuracy (the refined image with the desired recognition accuracy) will be referred to as the “recognition accuracy-maximized refined image”.

[0047] The map generation unit 143 is an example of a generation unit. The map generation unit 143 uses a traditional analysis technique for analyzing the cause of erroneous recognition, and the like to separately generate maps indicating each image part that causes erroneous recognition at each level of recognition accuracy. The map generation unit 143 visualizes the degree of influence of each image part by outputting each generated map as the erroneous recognition cause information.

[0048] In this manner, the analysis device 100 visualizes the degree of influence of each image part that causes erroneous recognition by separately generating and outputting maps indicating each image part that causes erroneous recognition at each level of recognition accuracy.

[0049] <Hardware Configuration of Analysis Device>

[0050] Next, a hardware configuration of the analysis device 100 will be described. FIG. 2 is a diagram illustrating an example of the hardware configuration of the analysis device. As illustrated in FIG. 2, the analysis device 100 includes a central processing unit (CPU) 201, a read only memory (ROM) 202, and a random access memory (RAM) 203. The CPU 201, the ROM 202, and the RAM 203 form a so-called computer.

[0051] In addition, the analysis device 100 includes an auxiliary storage device 204, a display device 205, an operation device 206, an interface (I/F) device 207, and a drive device 208. Note that the respective pieces of hardware of the analysis device 100 are interconnected via a bus 209.

[0052] The CPU 201 is an arithmetic device that executes various programs (such as the analysis program as an example) installed in the auxiliary storage device 204. Note that, although not illustrated in FIG. 2, an accelerator (such as a graphics processing unit (GPU) as an example) may be combined as an arithmetic device.

[0053] The ROM 202 is a nonvolatile memory. The ROM 202 functions as a main storage device that stores various programs, data, and the like that the CPU 201 needs in order to execute the various programs installed in the auxiliary storage device 204. For example, the ROM 202 stores a boot program such as the Basic Input/Output System (BIOS) or the Extensible Firmware Interface (EFI).

[0054] The RAM 203 is a volatile memory such as a dynamic random access memory (DRAM) or a static random access memory (SRAM). The RAM 203 functions as a main storage device that provides a work area into which various programs installed in the auxiliary storage device 204 are loaded when executed by the CPU 201.

[0055] The auxiliary storage device 204 is an auxiliary storage device that stores various programs and information used when various programs are executed. For example, the erroneous recognition image storage unit 130 is implemented in the auxiliary storage device 204.

[0056] The display device 205 is a display device that displays various display screens containing the erroneous recognition cause information and the like. The operation device 206 is an input device for a user of the analysis device 100 to input various instructions to the analysis device 100.

[0057] The I/F device 207 is, for example, a communication device for connecting to a network (not illustrated).

[0058] The drive device 208 is a device to which a recording medium 210 is set. The recording medium 210 mentioned here includes a medium that optically, electrically, or magnetically records information, such as a compact disc read only memory (CD-ROM), a flexible disk, or a magneto-optical disk. In addition, the recording medium 210 may include a semiconductor memory or the like that electrically records information, such as a ROM or a flash memory.

[0059] Note that various programs to be installed in the auxiliary storage device 204 are installed, for example, when the distributed recording medium 210 is set to the drive device 208, and the various programs recorded in the recording medium 210 are read by the drive device 208. Alternatively, various programs to be installed in the auxiliary storage device 204 may be downloaded from a network (not illustrated) to be installed.

[0060] <Functional Configuration of Erroneous Recognition Cause Extraction Unit>

[0061] Next, among the functions implemented in the analysis device 100 according to the first embodiment, details of each unit (the image refiner initialization unit 141, the refined image generation unit 142, and the map generation unit 143) of the erroneous recognition cause extraction unit 140 will be described. Note that, hereinafter, in explaining the details of each unit, the recognition accuracy is assumed to be a “score”, and the refined images with each level of recognition accuracy are

[0062] assumed to be:

[0063] a refined image with a target score of 70%,

[0064] a refined image with a target score of 80%,

[0065] a refined image with a target score of 90%, and

[0066] a refined image with a target score of 100% (score-maximized refined image).

However, the recognition accuracy is not limited to a “score” (a measure other than a “score” may be used as long as it represents the recognition result). In addition, setting the target scores in the range of 70% to 100% with an increment of 10% is also merely an example, and any range and any increment may be set.

[0067] (1) Details of Image Refiner Initialization Unit

[0068] First, the details of the image refiner initialization unit 141 will be described. FIG. 3 is a diagram illustrating an example of the functional configuration of the image refiner initialization unit. As illustrated in FIG. 3, the image refiner initialization unit 141 includes an image refiner unit 301 and a comparison/change unit 302.

[0069] Among these, as described above, the image refiner unit 301 is a generative model that uses the CNN to change the erroneous recognition image and generate a refined image with a predetermined level of recognition accuracy. The image refiner initialization unit 141 executes the first learning process on the image refiner unit 301.

[0070] For example, the image refiner initialization unit 141 inputs the erroneous recognition image to the image refiner unit 301 and the comparison/change unit 302. This prompts the image refiner unit 301 to output a refined image. In addition, the refined image output from the image refiner unit 301 is input to the comparison/change unit 302.

[0071] The comparison/change unit 302 calculates the difference (image difference value) between the refined image output from the image refiner unit 301 and the erroneous recognition image input by the image refiner initialization unit 141. In addition, the comparison/change unit 302 updates the model parameters of the image refiner unit 301 by back-error propagation of the calculated image difference value.

[0072] In this manner, by executing the first learning process on the image refiner unit 301, the model parameters of the image refiner unit 301 are updated such that an erroneous recognition image in the same state as the input erroneous recognition image is output.
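As an illustration of this first learning process, the sketch below trains a deliberately tiny "refiner" (a single additive offset, standing in for the CNN of the image refiner unit 301) until its output reproduces the input image, mimicking how the comparison/change unit 302 back-propagates the image difference value. The model, loss, and learning rate are simplified assumptions, not this disclosure's implementation.

```python
def first_learning(image, steps=200, lr=0.1):
    """Drive a toy refiner f(x) = x + offset toward the identity mapping
    by gradient descent on the mean squared image difference value."""
    offset = 1.0  # arbitrary initial model parameter
    for _ in range(steps):
        refined = [p + offset for p in image]
        # gradient of the mean squared difference w.r.t. the offset
        grad = sum(2 * (r - p) for r, p in zip(refined, image)) / len(image)
        offset -= lr * grad
    return offset

offset = first_learning([0.2, 0.5, 0.9])
print(round(offset, 6))  # → 0.0: the refiner now reproduces its input
```

After training, the offset has decayed to (numerically) zero, i.e. the refiner outputs an image in the same state as its input, which is the initialized state passed on to the refined image generation unit 142.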

[0073] In the description of the present embodiment, the erroneous recognition image in the same state is assumed to refer to the same image as the input erroneous recognition image. However, the whole image does not necessarily have to be the same, and an image that yields the same recognition result when the image recognition process is executed may be adopted.

[0074] For example, the image refiner unit 301 is initialized by updating the model parameters such that the erroneous recognition image in the same state as each erroneous recognition image is output even when any kind of erroneous recognition image is input.

[0075] Note that the image refiner unit whose model parameters have been updated by executing the first learning process (first trained generative model) is applied to the refined image generation unit 142. This allows the second learning process to be executed using the image refiner unit in a predetermined state, without using the image refiner unit in a state in which the model parameters are initialized by random numbers and the history is unknown as in the traditional case.

[0076] (2) Details of Refined Image Generation Unit

[0077] Next, the details of the refined image generation unit 142 will be described. FIG. 4 is a first diagram illustrating an example of the functional configuration of the refined image generation unit.

[0078] As illustrated in FIG. 4, the refined image generation unit 142 includes an image refiner unit 401, an image error calculation unit 402, an image recognition unit 403, and a recognition error calculation unit 404.

[0079] The image refiner unit 401 is a first trained generative model in which the model parameters have been updated by the image refiner initialization unit 141 when the first learning process was executed. The refined image generation unit 142 executes the second learning process on the image refiner unit 401 and generates refined images with each target score from the erroneous recognition image.

[0080] For example, the refined image generation unit 142 inputs the erroneous recognition image to the image refiner unit 401 and the image error calculation unit 402. This prompts the image refiner unit 401 to generate a refined image. In addition, the image refiner unit 401 changes the erroneous recognition image such that the scores of the correct answer labels match each target score when the image recognition process is executed using the generated refined images. Furthermore, the image refiner unit 401 generates a refined image such that the amount of change from the erroneous recognition image (the difference between the generated refined image and the erroneous recognition image) becomes smaller. Consequently, according to the image refiner unit 401, an image (refined image) that is visually close to the image (erroneous recognition image) before the change may be generated.

[0081] For example, the refined image generation unit 142 executes the second learning process at each target score and updates the model parameters of the image refiner unit 401 such that

[0082] the error (score error) between the score when the image recognition process is executed using the generated refined image and the target score of the correct answer label, and

[0083] the image difference value, which is the difference between the generated refined image and the erroneous recognition image,

[0084] are minimized.
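The two quantities minimized in the second learning process can be combined into a single scalar loss, as in the sketch below. The quadratic score error, the L1 image term, and the weights alpha and beta are illustrative assumptions, not values given in this disclosure.

```python
def second_learning_loss(score, target_score, refined, original,
                         alpha=1.0, beta=1.0):
    """Weighted sum of the two quantities paragraph [0081] says are
    minimized: the score error against the target score, and the image
    difference value between the refined and erroneous recognition images."""
    score_error = (score - target_score) ** 2
    image_diff = sum(abs(r - o) for r, o in zip(refined, original)) / len(original)
    return alpha * score_error + beta * image_diff

loss = second_learning_loss(0.65, 0.70, [0.21, 0.49], [0.20, 0.50])
print(round(loss, 4))  # → 0.0125: small score error plus small image change
```

A small loss indicates a refined image that both approaches the target score and stays visually close to the original erroneous recognition image.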

[0085] The image error calculation unit 402 calculates the difference between the erroneous recognition image and the refined image generated by the image refiner unit 401 through the course of the second learning process, and inputs the image difference value to the image refiner unit 401. The image error calculation unit 402 calculates the image difference value by performing, for example, a difference (L1 difference) or structural similarity (SSIM) calculation for each pixel, and inputs the calculated image difference value to the image refiner unit 401.
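Paragraph [0085] names a per-pixel difference (L1) or structural similarity (SSIM) as candidate difference measures. A minimal single-window SSIM over a flattened grayscale image can be sketched as follows; the flat-list representation and the constants (standard SSIM defaults for values in [0, 1]) are illustrative assumptions rather than this disclosure's exact computation.

```python
def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified global (single-window) SSIM index of two equally sized
    grayscale images flattened to lists, pixel values in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                      # means
    vx = sum((a - mx) ** 2 for a in x) / n               # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

print(ssim([0.2, 0.5, 0.8], [0.2, 0.5, 0.8]))  # → 1.0 for identical images
```

SSIM equals 1 only for identical images and drops toward 0 as the refined image diverges, so 1 - SSIM can serve as an image difference value in place of the L1 distance.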

[0086] The image recognition unit 403 is a trained CNN that performs the image recognition process with the refined image generated by the image refiner unit 401 as an input, and outputs the recognition result (the score of the label). Note that the recognition error calculation unit 404 is notified of the score output by the image recognition unit 403.

[0087] The recognition error calculation unit 404 calculates the error between the score notified by the image recognition unit 403 and the target score and notifies the image refiner unit 401 of the recognition error (score error).

[0088] The second learning process for the image refiner unit 401 is

[0089] performed:

[0090] a preassigned number of times of learning (for example, the maximum number of times of learning = N times), or

[0091] until the score of the correct answer label exceeds a predetermined threshold value with respect to the target score, or

[0092] until the score of the correct answer label exceeds the predetermined threshold value with respect to the target score and the image difference value becomes smaller than a predetermined threshold value.
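The alternative stopping conditions above might be checked as in the following sketch. The concrete threshold values and the `check_diff` switch selecting between the second and third conditions are illustrative assumptions.

```python
def should_stop(iteration, score, image_diff, target_score,
                max_iters=100, diff_threshold=0.05, check_diff=False):
    """Evaluate the stopping conditions [0090]-[0092] for the second
    learning process; all numeric defaults are hypothetical."""
    if iteration >= max_iters:        # [0090] preassigned number of times of learning
        return True
    if score <= target_score:         # score has not yet exceeded the target threshold
        return False
    # [0091] score exceeds the threshold; [0092] optionally also require
    # that the image difference value be small enough.
    return image_diff < diff_threshold if check_diff else True

print(should_stop(10, 0.72, 0.10, 0.70))                   # → True (score exceeded)
print(should_stop(10, 0.72, 0.10, 0.70, check_diff=True))  # → False (image diff too large)
```

The stricter variant ([0092]) keeps training until the refined image is not only recognized at the target score but also remains close to the original erroneous recognition image.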

[0093] Note that the map generation unit 143 is notified of structural information of the image recognition unit 403 obtained when the image recognition process was performed by the image recognition unit 403 on the refined images with each target score generated by the image refiner unit 401. In the present embodiment, the structural information of the image recognition unit 403

[0094] includes:

[0095] image recognition unit structural information when the image recognition process was performed on the refined image with a target score of 70%,

[0096] image recognition unit structural information when the image recognition process was performed on the refined image with a target score of 80%,

[0097] image recognition unit structural information when the image recognition process was performed on the refined image with a target score of 90%, and

[0098] image recognition unit structural information when the image recognition process was performed on the refined image with a target score of 100%.

[0099] (3) Details of Map Generation Unit

[0100] Next, the details of the map generation unit 143 will be described. FIG. 5 is a first diagram illustrating an example of the functional configuration of the map generation unit.

[0101] As illustrated in FIG. 5, the map generation unit 143 includes an important feature map generation unit 511 and a difference map generation unit 512.

[0102] The important feature map generation unit 511 acquires the structural information of the image recognition unit 403 from the refined image generation unit 142. In addition, the important feature map generation unit 511 generates an “important feature map”, based on the structural information of the image recognition unit 403, by using a back propagation (BP) method, a guided back propagation (GBP) method, or a selective BP method. The important feature map is a map that visualizes the feature portion that reacted during the image recognition process.

[0103] Note that the BP method is a method in which the error of each label with respect to the target score is computed from a classification probability obtained by performing the image recognition process on the refined images with the target scores, and the feature portion is visualized by forming an image of the magnitude of a gradient obtained by back-error propagation to the input layer. In addition, the GBP method is a method in which the feature portion is visualized by forming an image of only the positive values of the gradient information as the feature portion.

[0104] Furthermore, the selective BP method is a method in which the error between the score of the correct answer label and the target score is computed and the processing is performed using the BP method or the GBP method. In the case of the selective BP method, the feature portion to be visualized is the feature portion that affects only the target score of the correct answer label.
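As a rough numerical analogue of these back-propagation-based visualizations, one can perturb each input pixel, measure how the score changes, and keep either the gradient magnitude (BP-like) or only its positive part (GBP-like). The finite-difference approach and the toy linear scorer standing in for the trained CNN are purely assumptions for illustration.

```python
def saliency(image, score_fn, eps=1e-4, positive_only=False):
    """Finite-difference stand-in for back-error propagation to the input
    layer: per-pixel gradient of the score, imaged as its magnitude
    (BP-like) or clipped to positive values (GBP-like)."""
    base = score_fn(image)
    grads = []
    for i in range(len(image)):
        bumped = list(image)
        bumped[i] += eps                       # perturb one pixel
        g = (score_fn(bumped) - base) / eps    # approximate d(score)/d(pixel)
        grads.append(max(g, 0.0) if positive_only else abs(g))
    return grads

# Toy scorer: only the first two pixels influence the score.
score_fn = lambda img: 0.6 * img[0] - 0.3 * img[1]
print(saliency([0.5, 0.5, 0.5], score_fn))
print(saliency([0.5, 0.5, 0.5], score_fn, positive_only=True))
```

The first map highlights both influential pixels (magnitudes 0.6 and 0.3); the GBP-like variant suppresses the negatively contributing pixel, matching the idea of imaging only the positive gradient values.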

[0105] The important feature map generation unit 511 outputs an important feature map 520 corresponding to a target score of 70% among the generated important feature maps, as one piece of the erroneous recognition cause information. In addition, the important feature map generation unit 511 notifies the difference map generation unit 512 of the generated important feature maps.

[0106] The difference map generation unit 512 generates a plurality of difference maps by calculating the differences between the important feature maps generated by the important feature map generation unit 511. For example, the difference map generation unit 512:

[0107] generates a difference map 521 by calculating the image difference value between the important feature map corresponding to a target score of 70% and the important feature map corresponding to a target score of 80%;

[0108] generates a difference map 522 by calculating the image difference value between the important feature map corresponding to a target score of 80% and the important feature map corresponding to a target score of 90%; and

[0109] generates a difference map 523 by calculating the image difference value between the important feature map corresponding to a target score of 90% and the important feature map corresponding to a target score of 100%.

[0110] In addition, the difference map generation unit 512:

[0111] outputs an important feature map obtained by adding the difference map 521 to the important feature map 520 corresponding to a target score of 70%, as one piece of the erroneous recognition cause information;

[0112] outputs an important feature map obtained by adding the difference map 521 and the difference map 522 to the important feature map 520 corresponding to a target score of 70%, as one piece of the erroneous recognition cause information; and

[0113] outputs an important feature map obtained by adding the difference map 521, the difference map 522, and the difference map 523 to the important feature map 520 corresponding to a target score of 70%, as one piece of the erroneous recognition cause information.
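Treating each important feature map as a flat array of pixel values, the difference maps 521 to 523 and the added important feature maps can be sketched as follows (the flat-list representation and the tiny two-pixel maps are illustrative assumptions):

```python
def diff_map(map_a, map_b):
    """Per-pixel difference between two consecutive important feature maps."""
    return [b - a for a, b in zip(map_a, map_b)]

def add_maps(base, *diffs):
    """Sequentially add difference maps to a base important feature map."""
    out = list(base)
    for d in diffs:
        out = [o + v for o, v in zip(out, d)]
    return out

# Hypothetical two-pixel important feature maps at the four target scores.
m70, m80, m90, m100 = [0.1, 0.2], [0.3, 0.2], [0.3, 0.5], [0.6, 0.5]
d521, d522, d523 = diff_map(m70, m80), diff_map(m80, m90), diff_map(m90, m100)
# Adding all three difference maps to the 70% map recovers the 100% map.
print(add_maps(m70, d521, d522, d523))
```

Each partially added map thus shows which image parts gained influence between one target score and the next, which is exactly the per-level information the score maximization method alone does not expose.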

[0114] <Flow of Erroneous Recognition Cause Extraction Process>

[0115] Next, the flow of an erroneous recognition cause extraction process by the erroneous recognition cause extraction unit 140 will be described. FIG. 6 is a first flowchart illustrating a flow of the erroneous recognition cause extraction process. When the erroneous recognition image is newly stored in the erroneous recognition image storage unit 130, the erroneous recognition cause extraction process illustrated in FIG. 6 is started.

[0116] In step S601, the erroneous recognition cause extraction unit 140 acquires the erroneous recognition image from the erroneous recognition image storage unit 130.

[0117] In step S602, the image refiner initialization unit 141 executes the first learning process in order to initialize the image refiner unit 301 (generative model) and generates the first trained generative model.

[0118] In step S603, the refined image generation unit 142 sets the initial target score (70%) and the incremental margin (10%) of the target score.

[0119] In step S604, the refined image generation unit 142 executes the second learning process on the image refiner unit 401 (first trained generative model) such that the current target score is reached. This causes the image refiner unit 401 to generate a refined image with the current target score.

[0120] In step S605, the map generation unit 143 acquires the structural information of the image recognition unit 403 when the image recognition unit 403 performed the image recognition process by inputting the refined image with the current target score.

[0121] In step S606, the refined image generation unit 142 determines whether or not the current target score has reached the maximum score (100%). When it is determined in step S606 that the current target score has not reached the maximum score (in the case of NO in step S606), the process proceeds to step S607.

[0122] In step S607, the refined image generation unit 142 adds the incremental margin to the current target score and returns to step S604.

[0123] On the other hand, when it is determined in step S606 that the current target score has reached the maximum score (in the case of YES in step S606), the process proceeds to step S608.

[0124] In step S608, the map generation unit 143 generates the important feature maps corresponding to each target score, based on the structural information of the image recognition unit 403 corresponding to each target score.

[0125] In step S609, the map generation unit 143 generates the difference maps based on the important feature maps corresponding to each target score.

[0126] In step S610, the map generation unit 143 outputs the important feature map corresponding to the initial target score as one piece of the erroneous recognition cause information. In addition, the map generation unit 143 sequentially adds the difference maps to the important feature map corresponding to the initial target score and outputs each of the added important feature maps as one piece of the erroneous recognition cause information.
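A minimal sketch of the loop of steps S603 to S610 follows, with the learning and acquisition steps abstracted as callables (`train_to_score` and `collect_structural_info` are hypothetical stand-ins for the processing of the refined image generation unit 142 and the map generation unit 143):

```python
def erroneous_recognition_cause_extraction(train_to_score, collect_structural_info,
                                           initial_score=70, margin=10, max_score=100):
    """Loop of steps S603-S610: run the second learning process toward each
    target score and collect the structural information at each level."""
    structural_info = {}
    score = initial_score                                    # S603
    while True:
        train_to_score(score)                                # S604
        structural_info[score] = collect_structural_info(score)  # S605
        if score >= max_score:                               # S606 (YES)
            break
        score += margin                                      # S607
    return structural_info   # consumed by map generation, steps S608-S610
```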

[0127] As is clear from the above description, the analysis device 100 according to the first embodiment executes the first learning process for initializing the image refiner unit, by inputting the erroneous recognition image, and generates the first trained generative model. In addition, the analysis device 100 according to the first embodiment generates the refined images with each level of recognition accuracy (each target score), using the first trained generative model, and generates the important feature maps based on the structural information when the image recognition process was performed on the refined images with each level of recognition accuracy. Furthermore, the analysis device 100 according to the first embodiment outputs the important feature map corresponding to the initial recognition accuracy, as one piece of the erroneous recognition cause information. Additionally, the analysis device 100 according to the first embodiment sequentially adds the difference maps between the important feature maps corresponding to each level of recognition accuracy to the important feature map corresponding to the initial recognition accuracy and outputs each of the added important feature maps, as one piece of the erroneous recognition cause information.

[0128] As described above, the analysis device according to the first embodiment outputs the important feature maps corresponding to each level of recognition accuracy, which may make it possible to visualize which of the image parts that cause erroneous recognition has influence, and to what degree, at intermediate levels of recognition accuracy.

Second Embodiment

[0129] In the above first embodiment, each of the important feature maps generated based on the structural information when the image recognition process was performed on the refined images with each level of recognition accuracy is output as the erroneous recognition cause information. However, the map output as the erroneous recognition cause information is not limited to the important feature map. A second embodiment will be described below focusing on differences from the first embodiment described above.

[0130] <Functional Configuration of Erroneous Recognition Cause Extraction Unit>

[0131] (1) Details of Refined Image Generation Unit

[0132] FIG. 7 is a second diagram illustrating an example of the functional configuration of a refined image generation unit. The difference from the refined image generation unit 142 described with reference to FIG. 4 in the above first embodiment is that, in the case of FIG. 7, a score-maximized refined image storage unit 710 is included.

[0133] The score-maximized refined image storage unit 710 stores the refined image with a target score of 100% (score-maximized refined image) among the refined images generated by an image refiner unit 401.

[0134] (2) Details of Map Generation Unit

[0135] Next, the details of a map generation unit 143 will be described. FIG. 8 is a second diagram illustrating an example of the functional configuration of the map generation unit.

[0136] As illustrated in FIG. 8, the map generation unit 143 includes a deterioration scale map generation unit 801 and a superimposition unit 802 in addition to an important feature map generation unit 511 and a difference map generation unit 512.

[0137] The deterioration scale map generation unit 801 acquires the score-maximized refined image stored in the score-maximized refined image storage unit 710. In addition, the deterioration scale map generation unit 801 acquires the erroneous recognition image. Furthermore, the deterioration scale map generation unit 801 calculates the difference between the score-maximized refined image and the erroneous recognition image and generates a deterioration scale map 810.

[0138] For example, the deterioration scale map is a map indicating changed portions and the extent of change of each changed portion when the score-maximized refined image is generated from the erroneous recognition image.
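As one plausible reading of paragraphs [0137] and [0138], the deterioration scale map may be sketched as the per-pixel absolute difference between the two images (the exact difference measure is not limited to this):

```python
import numpy as np

def deterioration_scale_map(score_maximized_refined, erroneous_image):
    """Per-pixel magnitude of change between the erroneous recognition image
    and the score-maximized refined image; larger values mark image parts
    that had to change more for the score to reach 100%."""
    a = score_maximized_refined.astype(float)
    b = erroneous_image.astype(float)
    return np.abs(a - b)
```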

[0139] The superimposition unit 802 generates an important feature index map 820 corresponding to a target score of 70%, by superimposing an important feature map 520 generated by the important feature map generation unit 511 and the deterioration scale map 810 generated by the deterioration scale map generation unit 801. In addition, the superimposition unit 802 outputs the generated important feature index map 820 corresponding to a target score of 70%, as one piece of the erroneous recognition cause information.

[0140] Furthermore, the superimposition unit 802 sequentially adds difference maps 521, 522, and 523 to the important feature index map 820 corresponding to a target score of 70% and outputs each of a plurality of important feature index maps including [0141] an important feature index map 821 corresponding to a target score of 80%, [0142] an important feature index map 822 corresponding to a target score of 90%, and [0143] an important feature index map 823 corresponding to a target score of 100%,

[0144] as one piece of the erroneous recognition cause information.
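The superimposition of paragraph [0139] may be sketched, under the assumption that it is an element-wise product (a weighted sum would be an equally valid reading):

```python
import numpy as np

def important_feature_index_map(important_feature_map, deterioration_scale_map):
    """Assumed superimposition: element-wise product of the two maps."""
    return np.asarray(important_feature_map, dtype=float) * \
           np.asarray(deterioration_scale_map, dtype=float)
```

With a product, a pixel is emphasized only when it is both important to the recognition result and was actually changed in reaching the maximum score.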

[0145] <Flow of Erroneous Recognition Cause Extraction Process>

[0146] Next, the flow of an erroneous recognition cause extraction process by an erroneous recognition cause extraction unit 140 will be described. FIG. 9 is a second flowchart illustrating a flow of the erroneous recognition cause extraction process. The differences from the erroneous recognition cause extraction process described with reference to FIG. 6 in the above first embodiment are steps S901 to S904.

[0147] In step S901, the map generation unit 143 acquires the score-maximized refined image generated by the image refiner unit 401.

[0148] In step S902, the map generation unit 143 calculates the difference between the score-maximized refined image and the erroneous recognition image and generates the deterioration scale map.

[0149] In step S903, the map generation unit 143 generates the important feature index map corresponding to the initial target score, by superimposing the important feature map corresponding to the initial target score onto the deterioration scale map, and outputs the generated important feature index map, as one piece of the erroneous recognition cause information.

[0150] In step S904, the map generation unit 143 sequentially adds the difference maps to the important feature index map corresponding to the initial target score and generates the important feature index maps corresponding to each target score. In addition, the map generation unit 143 outputs each of the important feature index maps corresponding to each target score, as one piece of the erroneous recognition cause information.

[0151] As is clear from the above description, an analysis device 100 according to the second embodiment further includes the deterioration scale map generation unit, in addition to the functions provided in the analysis device 100 according to the above first embodiment, and generates the deterioration scale map. In addition, the analysis device 100 according to the second embodiment further includes the superimposition unit, generates the important feature index map by superimposing the important feature map corresponding to the initial recognition accuracy onto the deterioration scale map, and outputs the generated important feature index map as one piece of the erroneous recognition cause information. Furthermore, the analysis device 100 according to the second embodiment sequentially adds the difference maps between the important feature maps corresponding to each level of recognition accuracy to the important feature index map corresponding to the initial recognition accuracy and outputs each of the added important feature index maps, as one piece of the erroneous recognition cause information.

[0152] As described above, the analysis device according to the second embodiment outputs the important feature index maps corresponding to each level of recognition accuracy, which may make it possible to visualize which of the image parts that cause erroneous recognition has influence, and to what degree, at intermediate levels of recognition accuracy.

Third Embodiment

[0153] In the above first and second embodiments, the important feature maps corresponding to each level of recognition accuracy or the important feature index maps corresponding to each level of recognition accuracy are output as the erroneous recognition cause information. In contrast to this, in a third embodiment, the combinations of superpixels (changeable areas) at each level of recognition accuracy specified based on the important feature index maps corresponding to each level of recognition accuracy are output as the erroneous recognition cause information. The third embodiment will be described below focusing on differences from the first and second embodiments described above.

[0154] <Functional Configuration of Analysis Device>

[0155] FIG. 10 is a second diagram illustrating an example of the functional configuration of an analysis device. The difference from the functional configuration of the analysis device 100 described with reference to FIG. 1 in the above first embodiment is that, in the case of FIG. 10, an erroneous recognition cause extraction unit 140 includes a specifying unit 1001.

[0156] The specifying unit 1001 replaces a changeable area in the erroneous recognition image, defined based on the generated important feature index map, with the corresponding portion of the generated refined image. In addition, the specifying unit 1001 executes an image recognition process by inputting the erroneous recognition image in which the changeable area has been replaced with the refined image, and determines the effect of the replacement from the output recognition result (the score of the label).

[0157] Furthermore, the specifying unit 1001 repeats the image recognition process while modifying the dimensions of the changeable area and specifies, from the recognition result (the score of the label), a combination of superpixels (changeable area) that causes erroneous recognition at each level of recognition accuracy (each target score). Additionally, the specifying unit 1001 outputs the combinations of superpixels (changeable areas) that cause erroneous recognition, which have been specified at each level of recognition accuracy, as the erroneous recognition cause information.

[0158] In this manner, by referring to the effect of the replacement when the changeable area is replaced with the refined image, each image part that causes erroneous recognition at each level of recognition accuracy (each target score) may be accurately specified.

[0159] <Functional Configuration of Specifying Unit>

[0160] Next, a functional configuration of the specifying unit 1001 will be described. FIG. 11 is a first diagram illustrating an example of the functional configuration of the specifying unit. As illustrated in FIG. 11, the specifying unit 1001 includes a superpixel dividing unit 1101, an important superpixel designation unit 1102, an image recognition unit 1103, and an important superpixel evaluation unit 1104.

[0161] The superpixel dividing unit 1101 divides the erroneous recognition image into “superpixels”, which are areas for each component of the object (a vehicle in the present embodiment) included in the erroneous recognition image, and outputs superpixel division information. Note that the erroneous recognition image may be divided into superpixels using an existing dividing function, or using a CNN or the like trained to divide the image for each component of the vehicle.

[0162] The important superpixel designation unit 1102 separately adds, for each superpixel, [0163] the value of each pixel of the important feature index map corresponding to a target score of 70%, [0164] the value of each pixel of the important feature index map corresponding to a target score of 80%, [0165] the value of each pixel of the important feature index map corresponding to a target score of 90%, and [0166] the value of each pixel of the important feature index map corresponding to a target score of 100%,

[0167] which have been generated by a superimposition unit 802 based on the superpixel division information output by the superpixel dividing unit 1101.

[0168] In addition, from among the respective superpixels, the important superpixel designation unit 1102 extracts, for each target score, each superpixel whose additional value (the sum of the added pixel values) is equal to or higher than a predetermined threshold value (important feature index threshold value). Furthermore, the important superpixel designation unit 1102 defines superpixels selected from among the superpixels extracted for each target score and combined, as a changeable area, and defines the superpixels other than the combined superpixels as an unchangeable area.

[0169] Additionally, the important superpixel designation unit 1102 extracts the image portion corresponding to the unchangeable area from the erroneous recognition image, extracts the image portion corresponding to the changeable area from the refined image, and generates a composite image by compositing the two extracted image portions. Since [0170] the refined image with a target score of 70%, [0171] the refined image with a target score of 80%, [0172] the refined image with a target score of 90%, and [0173] the refined image with a target score of 100%,

[0174] are output from an image refiner unit 401, the important superpixel designation unit 1102 generates [0175] the composite image corresponding to a target score of 70%, [0176] the composite image corresponding to a target score of 80%, [0177] the composite image corresponding to a target score of 90%, and [0178] the composite image corresponding to a target score of 100%,

[0179] for each of the refined images.

[0180] Note that the important superpixel designation unit 1102 increases the number of superpixels to be extracted (expands the changeable area and narrows down the unchangeable area) by gradually lowering the important feature index threshold value used when defining the changeable area and the unchangeable area. In addition, the important superpixel designation unit 1102 updates the changeable area and the unchangeable area while modifying the combination of superpixels selected from among the extracted superpixels.

[0181] The image recognition unit 1103, which has the same function as the function of the image recognition unit 403 in FIG. 4, performs the image recognition process by inputting each composite image generated by the important superpixel designation unit 1102 and outputs the recognition result (the score of the label).

[0182] The important superpixel evaluation unit 1104 acquires the recognition result (the score of the label) output from the image recognition unit 1103. As described above, for each of the target scores, the important superpixel designation unit 1102 generates composite images whose number depends on the number of times the important feature index threshold value is lowered and on the number of combinations of superpixels. Therefore, the important superpixel evaluation unit 1104 acquires a corresponding number of scores for each of the target scores. In addition, the important superpixel evaluation unit 1104 specifies the combination of superpixels (changeable area) that causes erroneous recognition at each of the target scores, based on the recognition results, and outputs the specified combination as the erroneous recognition cause information.

[0183] <Specific Example of Processing of Each Unit of Specifying Unit>

[0184] Next, a specific example of processing of each unit (here, the superpixel dividing unit 1101 and the important superpixel designation unit 1102) of the specifying unit 1001 will be described.

[0185] (1) Specific Example of Processing of Superpixel Dividing Unit

[0186] First, a specific example of processing of the superpixel dividing unit 1101 will be described. FIG. 12 is a diagram illustrating a specific example of processing of the superpixel dividing unit. As illustrated in FIG. 12, the superpixel dividing unit 1101 includes, for example, a simple linear iterative clustering (SLIC) unit 1210 that performs SLIC processing. The SLIC unit 1210 divides the erroneous recognition image into superpixels, which are partial images for each component of the vehicle included in the erroneous recognition image. In addition, the superpixel dividing unit 1101 outputs the superpixel division information about the erroneous recognition image generated by the SLIC unit 1210 dividing the erroneous recognition image into superpixels.
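For illustration only, the rest of the pipeline can be exercised with a trivial grid partition in place of SLIC; a real system would use an SLIC implementation (for example, `skimage.segmentation.slic`) or a CNN trained per component, as described above:

```python
import numpy as np

def grid_superpixels(image, cell=16):
    """Stand-in for SLIC: label each pixel by a regular grid cell, producing
    superpixel division information as an integer label per pixel."""
    h, w = image.shape[:2]
    rows = np.arange(h) // cell
    cols = np.arange(w) // cell
    n_cols = (w + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]
```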

[0187] (2) Specific Example of Processing of Important Superpixel Designation Unit

[0188] Next, a specific example of processing of the important superpixel designation unit 1102 will be described. FIG. 13 is a diagram illustrating a specific example of processing of the important superpixel designation unit.

[0189] As illustrated in FIG. 13, the important superpixel designation unit 1102 includes an area extraction unit 1310 and a compositing unit 1311.

[0190] The important superpixel designation unit 1102 [0191] overlays [0192] the important feature index maps corresponding to a target score of 70% to a target score of 100% output from the superimposition unit 802 (here, the important feature index map corresponding to a target score X % is assumed for simplification of explanation), and [0193] the superpixel division information output from the superpixel dividing unit 1101. This causes the important superpixel designation unit 1102 to generate an important superpixel image 1301 corresponding to the target score X %.

[0194] In addition, the important superpixel designation unit 1102 adds the value of each pixel of the important feature index map corresponding to the target score X % for each of the superpixels in the generated important superpixel image 1301.

[0195] Furthermore, the important superpixel designation unit 1102 determines whether or not the additional value for each superpixel is equal to or higher than the important feature index threshold value and extracts the superpixels determined to have an additional value equal to or higher than the important feature index threshold value. Note that, in FIG. 13, an important superpixel image 1302 corresponding to the target score X % explicitly shows an example of the additional value for each superpixel.

[0196] In addition, the important superpixel designation unit 1102 defines superpixels selected from among the extracted superpixels and combined, as a changeable area, and defines the superpixels other than the combined superpixels as an unchangeable area. Furthermore, the important superpixel designation unit 1102 notifies the area extraction unit 1310 of the defined changeable area and unchangeable area.

[0197] The area extraction unit 1310 extracts the image portion corresponding to the unchangeable area from the erroneous recognition image.

[0198] In addition, the area extraction unit 1310 extracts the image portions corresponding to the changeable area from the refined images with a target score of 70% to a target score of 100% (here, the refined image with the target score X % is assumed for simplification of explanation).

[0199] The compositing unit 1311 composites the image portion corresponding to the changeable area extracted from the refined image with the target score X % and the image portion corresponding to the unchangeable area extracted from the erroneous recognition image, and generates a composite image corresponding to the target score X %.
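A minimal sketch of the per-superpixel addition, threshold extraction, and compositing described in paragraphs [0194] to [0199] follows, assuming single-channel images and a hypothetical integer label array as the superpixel division information:

```python
import numpy as np

def composite_for_threshold(err_img, refined_img, index_map, labels, threshold):
    """Sum the important feature index map per superpixel, take the superpixels
    whose additional value is at or above the threshold as the changeable area,
    and composite: changeable area from the refined image, the remaining
    (unchangeable) area from the erroneous recognition image."""
    changeable = np.zeros(labels.shape, dtype=bool)
    for sp in np.unique(labels):
        mask = labels == sp
        if index_map[mask].sum() >= threshold:   # additional value test
            changeable |= mask
    out = err_img.copy()
    out[changeable] = refined_img[changeable]
    return out, changeable
```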

[0200] FIG. 14 is a diagram illustrating a specific example of processing of the area extraction unit and the compositing unit. In FIG. 14, the upper part illustrates a situation in which the area extraction unit 1310 extracts the image portion (the white portion of an image 1402) corresponding to the changeable area from a refined image 1401 with the target score X %.

[0201] Meanwhile, in FIG. 14, the lower part illustrates a situation in which the area extraction unit 1310 extracts the image portion (the white portion of an image 1402′) corresponding to the unchangeable area from an erroneous recognition image 1411. Note that the image 1402′ is an image obtained by inverting the white portion and the black portion of the image 1402 (for convenience of explanation, the white portion in the lower part of FIG. 14 is assumed as the image portion corresponding to the unchangeable area).

[0202] As illustrated in FIG. 14, the compositing unit 1311 composites [0203] an image portion 1403 corresponding to the changeable area of the refined image 1401 with the target score X %, and [0204] an image portion 1413 corresponding to the unchangeable area of the erroneous recognition image 1411,

[0205] which have been output from the area extraction unit 1310, and generates a composite image 1420 corresponding to the target score X %.

[0206] In this manner, when generating the composite image 1420, the specifying unit 1001 adds the value of each pixel of the important feature index map corresponding to the target score X % in superpixel units. Consequently, according to the specifying unit 1001, the area to be replaced with the refined image with the target score X % may be specified in superpixel units.

[0207] <Flow of Erroneous Recognition Cause Extraction Process>

[0208] Next, the flow of an erroneous recognition cause extraction process by the erroneous recognition cause extraction unit 140 will be described. FIG. 15 is a third flowchart illustrating a flow of the erroneous recognition cause extraction process. The differences from the erroneous recognition cause extraction process described with reference to FIG. 9 in the above second embodiment are steps S1501 and S1502.

[0209] In step S1501, a map generation unit 143 sequentially adds the difference maps to the important feature index map corresponding to the initial target score and generates the important feature index maps corresponding to each target score.

[0210] In step S1502, the specifying unit 1001 executes a changeable area specifying process that outputs the changeable areas at each level of recognition accuracy specified based on [0211] the erroneous recognition image, [0212] the refined images with each target score, and [0213] the important feature index maps corresponding to each target score,

[0214] as the erroneous recognition cause information. Note that the details of the changeable area specifying process will be described later.

[0215] <Flow of Changeable Area Specifying Process>

[0216] Next, the flow of the changeable area specifying process (step S1502 in FIG. 15) will be described. FIG. 16 is a flowchart illustrating a flow of the changeable area specifying process.

[0217] In step S1601, the superpixel dividing unit 1101 divides the erroneous recognition image into superpixels and generates the superpixel division information.

[0218] In step S1602, the important superpixel designation unit 1102 adds the value of each pixel of the important feature index map corresponding to the current target score in superpixel units. Note that, at the start of the changeable area specifying process, it is assumed that the initial target score (70%) is set as the default value for “current target score”.

[0219] In step S1603, the important superpixel designation unit 1102 extracts a superpixel whose additional value is equal to or higher than the important feature index threshold value and defines the changeable area by combining superpixels selected from among the extracted superpixels. In addition, the important superpixel designation unit 1102 defines the superpixels other than the combined superpixels as the unchangeable area.

[0220] In step S1604, the important superpixel designation unit 1102 reads the refined image with the current target score.

[0221] In step S1605, the important superpixel designation unit 1102 extracts the image portion corresponding to the changeable area from the refined image with the current target score.

[0222] In step S1606, the important superpixel designation unit 1102 extracts the image portion corresponding to the unchangeable area from the erroneous recognition image.

[0223] In step S1607, the important superpixel designation unit 1102 composites the image portion corresponding to the changeable area extracted from the refined image and the image portion corresponding to the unchangeable area extracted from the erroneous recognition image, and generates a composite image corresponding to the current target score.

[0224] In step S1608, the image recognition unit 1103 performs the image recognition process by inputting the composite image corresponding to the current target score and calculates the score of the correct answer label. In addition, the important superpixel evaluation unit 1104 acquires the score of the correct answer label calculated by the image recognition unit 1103.

[0225] In step S1609, the important superpixel designation unit 1102 determines whether or not the important feature index threshold value has reached a lower limit value. When it is determined in step S1609 that the lower limit value has not been reached (in the case of NO in step S1609), the process proceeds to step S1610.

[0226] In step S1610, the important superpixel designation unit 1102 lowers the important feature index threshold value and then returns to step S1603.

[0227] On the other hand, when it is determined in step S1609 that the lower limit value has been reached (in the case of YES in step S1609), the process proceeds to step S1611.

[0228] In step S1611, the important superpixel evaluation unit 1104 specifies the combination of superpixels (changeable area) that causes erroneous recognition at the current target score, based on the acquired score of the correct answer label, and outputs the specified combination of superpixels (changeable area) as one piece of the erroneous recognition cause information.

[0229] In step S1612, the specifying unit 1001 determines whether or not the current target score has reached the maximum score (100%). When it is determined in step S1612 that the current target score has not reached the maximum score (in the case of NO in step S1612), the process proceeds to step S1613.

[0230] In step S1613, the specifying unit 1001 adds the incremental margin to the current target score and returns to step S1602.

[0231] On the other hand, when it is determined in step S1612 that the current target score has reached the maximum score (in the case of YES in step S1612), the changeable area specifying process is ended.
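The nested loops of FIG. 16 may be sketched as follows; the criterion used in step S1611 to pick the causing combination is not spelled out above, so selecting the changeable area whose composite yields the lowest correct-answer-label score is an assumption (`make_composite` and `recognize` are hypothetical stand-ins for steps S1603 to S1608):

```python
def changeable_area_specifying(target_scores, thresholds, make_composite, recognize):
    """For each target score (S1612-S1613), sweep the important feature index
    threshold downward (S1603-S1610) and keep the changeable area whose
    composite image yields the lowest correct-answer-label score, as one
    plausible reading of the specification in step S1611."""
    causes = {}
    for score in target_scores:
        best = None
        for th in sorted(thresholds, reverse=True):   # threshold lowered stepwise
            composite, area = make_composite(score, th)   # S1603-S1607
            label_score = recognize(composite)            # S1608
            if best is None or label_score < best[0]:
                best = (label_score, area)
        causes[score] = best[1]                           # S1611
    return causes
```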

[0232] As is clear from the above description, the analysis device 100 according to the third embodiment further includes the specifying unit 1001, in addition to the functions provided in the analysis device 100 according to the above second embodiment. In addition, the analysis device 100 according to the third embodiment outputs the combinations of superpixels (changeable areas) at each level of recognition accuracy specified by the specifying unit 1001 based on the important feature index maps corresponding to each level of recognition accuracy, as the erroneous recognition cause information.

[0233] As described above, the analysis device according to the third embodiment outputs the changeable areas corresponding to each level of recognition accuracy, which may make it possible to visualize which of the image parts that cause erroneous recognition has influence, and to what degree, at intermediate levels of recognition accuracy.

Fourth Embodiment

[0234] In the above third embodiment, the combinations of superpixels (changeable areas) corresponding to each level of recognition accuracy have been described as being output as the erroneous recognition cause information. However, the method of outputting the erroneous recognition cause information is not limited to this, and for example, an important portion in the changeable area may be output in pixel units. A fourth embodiment will be described below focusing on differences from the third embodiment described above.

[0235] <Functional Configuration of Specifying Unit>

[0236] First, a functional configuration of a specifying unit in an analysis device 100 according to the fourth embodiment will be described. FIG. 17 is a second diagram illustrating an example of the functional configuration of the specifying unit 1001. The difference from the functional configuration of the specifying unit 1001 illustrated in FIG. 11 is that a detailed cause analysis unit 1701 is included.

[0237] The detailed cause analysis unit 1701 calculates an important portion in the changeable area, using the erroneous recognition image and the refined images with each target score, and outputs the calculated important portion as an action result image.

[0238] <Functional Configuration of Detailed Cause Analysis Unit>

[0239] Next, a functional configuration of the detailed cause analysis unit 1701 will be described. FIG. 18 is a first diagram illustrating an example of the functional configuration of the detailed cause analysis unit. As illustrated in FIG. 18, the detailed cause analysis unit 1701 includes an image difference calculation unit 1801, an SSIM calculation unit 1802, a cutout unit 1803, and an action unit 1804.

[0240] The image difference calculation unit 1801 calculates the differences in pixel units between the erroneous recognition image and the refined images with each target score (here, the refined image with the target score X % is assumed for simplification of explanation), and outputs a difference image.

[0241] The SSIM calculation unit 1802 outputs an SSIM image by performing an SSIM calculation using the erroneous recognition image and the refined image with the target score X %.

[0242] The cutout unit 1803 cuts out the image portion for the changeable area corresponding to the target score X % from the difference image. In addition, the cutout unit 1803 cuts out the image portion for the changeable area corresponding to the target score X % from the SSIM image. Furthermore, the cutout unit 1803 generates a multiplied image by multiplying the difference image and the SSIM image obtained by cutting out the image portions for the changeable area at the target score X %.

[0243] The action unit 1804 generates the action result image corresponding to the target score X %, based on the erroneous recognition image and the multiplied image.

[0244] <Specific Example of Processing of Detailed Cause Analysis Unit>

[0245] Next, a specific example of processing of the detailed cause analysis unit 1701 will be described. FIG. 19 is a diagram illustrating a specific example of processing of the detailed cause analysis unit.

[0246] As illustrated in FIG. 19, first, in the image difference calculation unit 1801, the difference between the erroneous recognition image (A) and the refined image (B) with the target score X % (=(A)−(B)) is calculated, and the difference image is output. The difference image contains pixel correction information at each image part that causes erroneous recognition at the target score X %.
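The difference calculation of paragraph [0246] can be sketched as follows; this is an illustrative sketch only, assuming grayscale images stored as 8-bit numpy arrays, and the function name is an assumption rather than anything from the specification. A signed type is used so that negative corrections are preserved.

```python
import numpy as np

def difference_image(err_img, refined_img):
    """(A) - (B): signed per-pixel correction information.

    Illustrative sketch; assumes 8-bit grayscale images as numpy arrays.
    """
    # widen to a signed type so negative differences are not wrapped
    return err_img.astype(np.int16) - refined_img.astype(np.int16)
```

Each nonzero value then indicates how much, and in which direction, the corresponding pixel differs between the erroneous recognition image and the refined image.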

[0247] Subsequently, in the SSIM calculation unit 1802, the SSIM calculation is performed based on the erroneous recognition image (A) and the refined image (B) with the target score X % (y=SSIM((A), (B))). Furthermore, in the SSIM calculation unit 1802, the result of the SSIM calculation is inverted (y′=255−(y×255)), whereby the SSIM image is output. The SSIM image locates, with high accuracy, each image part that causes erroneous recognition at the target score X %; a higher pixel value represents a larger difference, and a lower pixel value represents a smaller difference. Note that the process of inverting the result of the SSIM calculation may instead be performed, for example, by calculating y′=1−y.
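The SSIM calculation and inversion described above can be sketched as follows. This is a simplified illustration, not the specification's implementation: it uses a small box filter for the local statistics (standard SSIM implementations typically use a Gaussian window), and all function names are assumptions.

```python
import numpy as np

def local_mean(img, k=3):
    # box-filter local mean with edge padding (a simplification of the
    # Gaussian window used by standard SSIM implementations)
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def ssim_map(a, b, data_range=255.0, k=3):
    """Per-pixel SSIM map y between grayscale images a and b."""
    a = a.astype(float)
    b = b.astype(float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = local_mean(a, k), local_mean(b, k)
    var_a = local_mean(a * a, k) - mu_a ** 2
    var_b = local_mean(b * b, k) - mu_b ** 2
    cov = local_mean(a * b, k) - mu_a * mu_b
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def inverted_ssim_image(a, b):
    # y' = 255 - (y x 255): high pixel values mark large differences
    return 255.0 - ssim_map(a, b) * 255.0
```

For identical images the SSIM map is 1 everywhere, so the inverted SSIM image is all zeros; pixels where the refined image was corrected receive higher values.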

[0248] Subsequently, in the cutout unit 1803, the image portion is cut out from the difference image for the changeable area corresponding to the target score X %, and a cutout image (C) is output. Similarly, in the cutout unit 1803, the image portion is cut out from the SSIM image for the changeable area corresponding to the target score X %, and a cutout image (D) is output.

[0249] Here, the changeable area corresponding to the target score X % is obtained by specifying an area of the image portion that causes erroneous recognition at the target score X %, and the detailed cause analysis unit 1701 aims to further analyze the cause at the granularity of pixels in the specified area.

[0250] Therefore, the cutout unit 1803 multiplies the cutout image (C) and the cutout image (D) and generates a multiplied image (G). The multiplied image (G) is pixel correction information in which the pixel correction information at each image part that causes erroneous recognition at the target score X % is localized with higher accuracy.
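Paragraphs [0248] to [0250], cutting the changeable area out of the difference image and the SSIM image and multiplying the results, can be sketched as follows; a minimal illustration assuming the changeable area is supplied as a boolean mask, with all function names being assumptions:

```python
import numpy as np

def cut_out(image, mask):
    # keep only pixels inside the changeable area; zero elsewhere
    return np.where(mask, image, 0.0)

def multiplied_image(diff_img, ssim_img, mask):
    """Multiplied image (G): element-wise product of the cut-out
    difference image (C) and the cut-out SSIM image (D)."""
    return cut_out(diff_img, mask) * cut_out(ssim_img, mask)
```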

[0251] In addition, the cutout unit 1803 performs an enhancement process on the multiplied image (G) and outputs an enhanced multiplied image (H). Note that the cutout unit 1803 calculates the enhanced multiplied image (H) based on the following formula.


Enhanced Multiplied Image (H)=255×(G)/(max(G)−min(G))  (Formula 3)

[0252] Subsequently, the action unit 1804 visualizes the important portion by subtracting the enhanced multiplied image (H) from the erroneous recognition image (A) and generates an action result image corresponding to the target score X %.
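Formula 3 and the subtraction performed by the action unit 1804 can be sketched as follows; a minimal illustration assuming G is a non-constant float array and 8-bit pixel values in [0, 255], with function names being assumptions:

```python
import numpy as np

def enhanced_multiplied_image(G):
    """Formula 3: H = 255 x (G) / (max(G) - min(G)).
    Assumes G is not constant, so the denominator is nonzero."""
    g = np.asarray(G, dtype=float)
    return 255.0 * g / (g.max() - g.min())

def action_result_image(err_img, G):
    # subtract the enhanced multiplied image (H) from the erroneous
    # recognition image (A), clipping to the valid 8-bit pixel range
    return np.clip(err_img.astype(float) - enhanced_multiplied_image(G),
                   0.0, 255.0)
```

Pixels with large values in (G) are darkened most strongly in the action result image, which is what makes the important portion visible.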

[0253] Note that the method for the enhancement process illustrated in FIG. 19 is merely an example, and the enhancement process may be performed by another method as long as the method makes it easier to identify the important portion when visualized.

[0254] <Flow of Detailed Cause Analysis Process>

[0255] Next, the flow of a detailed cause analysis process by the detailed cause analysis unit 1701 will be described. FIG. 20 is a first flowchart illustrating a flow of the detailed cause analysis process.

[0256] In step S2001, the image difference calculation unit 1801 calculates the difference image between the erroneous recognition image and the refined image with the target score X %.

[0257] In step S2002, the SSIM calculation unit 1802 calculates the SSIM image based on the erroneous recognition image and the refined image with the target score X %.

[0258] In step S2003, the cutout unit 1803 cuts out the difference image for the changeable area corresponding to the target score X %.

[0259] In step S2004, the cutout unit 1803 cuts out the SSIM image for the changeable area corresponding to the target score X %.

[0260] In step S2005, the cutout unit 1803 multiplies the cut-out difference image and the cut-out SSIM image and generates the multiplied image.

[0261] In step S2006, the cutout unit 1803 performs the enhancement process on the multiplied image. In addition, the action unit 1804 subtracts the multiplied image that has undergone the enhancement process from the erroneous recognition image, and outputs the action result image corresponding to the target score X %.

[0262] As is clear from the above description, the analysis device 100 according to the fourth embodiment generates the difference images and the SSIM images based on the erroneous recognition image and the refined images with each level of recognition accuracy and outputs the important portions by cutting out and multiplying the changeable areas corresponding to each level of recognition accuracy.

[0263] As described above, in the analysis device according to the fourth embodiment, by outputting the important portion in the changeable area in pixel units, the degree of influence of each image part that causes erroneous recognition may be visualized in pixel units.

Fifth Embodiment

[0264] In the above fourth embodiment, a case has been described in which the degree of influence of each image part that causes erroneous recognition is visualized in pixel units, using the difference images and the SSIM images generated based on the erroneous recognition image and the refined images with each level of recognition accuracy.

[0265] In contrast to this, in a fifth embodiment, the degree of influence of each image part that causes erroneous recognition is visualized in pixel units, by further using important feature maps corresponding to each level of recognition accuracy. The fifth embodiment will be described below focusing on differences from the fourth embodiment described above.

[0266] <Functional Configuration of Detailed Cause Analysis Unit>

[0267] First, a functional configuration of a detailed cause analysis unit in an analysis device 100 according to the fifth embodiment will be described. FIG. 21 is a second diagram illustrating an example of the functional configuration of the detailed cause analysis unit. The difference from the functional configuration of the detailed cause analysis unit illustrated in FIG. 18 is that, in the case of FIG. 21, an important feature map generation unit 2101 is included.

[0268] The important feature map generation unit 2101 acquires image recognition unit structural information corresponding to each target score (here, for simplification of explanation, the image recognition unit structural information corresponding to the target score X %) from an image recognition unit 403. In addition, the important feature map generation unit 2101 generates an important feature map corresponding to the target score X %, based on the image recognition unit structural information corresponding to the target score X % by using the selective BP method.

[0269] In the present embodiment, using the difference image, the SSIM image, and the important feature map corresponding to the target score X % generated based on
[0270] the erroneous recognition image,
[0271] the refined image with the target score X %, and
[0272] the image recognition unit structural information corresponding to the target score X %,

[0273] the detailed cause analysis unit 1701 visualizes the important portion in the changeable area and outputs the visualized important portion as the action result image corresponding to the target score X %.

[0274] Note that, in the present embodiment, the difference image, the SSIM image, and the important feature map corresponding to the target score X %, which are used by the detailed cause analysis unit 1701 to output the action result image corresponding to the target score X %, have the following attributes.
[0275] Difference image: difference information for each pixel, which has positive and negative values indicating how much each pixel is supposed to be corrected in order to raise the classification probability of the located label from the erroneous recognition state.
[0276] SSIM image: difference information that takes into account the shift statuses of the entire image and local areas, and contains fewer artifacts (unintended noise) than the per-pixel difference information. In other words, it is more accurate difference information (however, it has only positive values).
[0277] Important feature map corresponding to the target score X %: a map that visualizes a feature portion of the correct answer label that affects the image recognition process.

[0278] <Specific Example of Processing of Detailed Cause Analysis Unit>

[0279] Next, a specific example of processing of the detailed cause analysis unit 1701 will be described. FIG. 22 is a second diagram illustrating a specific example of processing of the detailed cause analysis unit. Note that the differences from the specific example of processing of the detailed cause analysis unit 1701 in FIG. 19 are that the important feature map generation unit 2101 performs an important feature map generation process based on image recognition unit structural information (I) corresponding to the target score X % to generate the important feature map. In addition, a cutout unit 1803 cuts out the image portion for the changeable area corresponding to the target score X % from the important feature map corresponding to the target score X % and outputs a cutout image (J). Furthermore, the cutout unit 1803 multiplies a cutout image (C), a cutout image (D), and the cutout image (J) to generate a multiplied image (G).

[0280] <Flow of Detailed Cause Analysis Process>

[0281] Next, the flow of a detailed cause analysis process by the detailed cause analysis unit 1701 will be described. FIG. 23 is a second flowchart illustrating a flow of the detailed cause analysis process. The differences from the flowchart illustrated in FIG. 20 are steps S2301, S2302, and S2303.

[0282] In step S2301, the important feature map generation unit 2101 acquires, from the image recognition unit 403, the image recognition unit structural information corresponding to the target score X % when the image recognition process was performed with the refined image with the target score X % as an input. In addition, the important feature map generation unit 2101 generates the important feature map corresponding to the target score X %, based on the image recognition unit structural information corresponding to the target score X % by using the selective BP method.

[0283] In step S2302, the cutout unit 1803 cuts out the image portion for the changeable area corresponding to the target score X % from the important feature map corresponding to the target score X %.

[0284] In step S2303, the cutout unit 1803 multiplies the difference image, the SSIM image, and the important feature map corresponding to the target score X %, which have been obtained by cutting out the image portions for the changeable area corresponding to the target score X %, and generates the multiplied image.
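Step S2303 can be sketched as follows; a minimal illustration assuming the changeable area is a boolean mask and the three inputs are same-shaped float arrays, with all names being assumptions:

```python
import numpy as np

def multiplied_image_with_map(diff_img, ssim_img, feat_map, mask):
    """Step S2303 sketch: cut the difference image, the SSIM image, and
    the important feature map to the changeable area, then take their
    element-wise product."""
    cut = lambda img: np.where(mask, img, 0.0)
    return cut(diff_img) * cut(ssim_img) * cut(feat_map)
```

Compared with the fourth embodiment, the extra factor from the important feature map further suppresses pixels that do not affect the image recognition process.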

[0285] As is clear from the above description, the analysis device 100 according to the fifth embodiment generates the difference images, the SSIM images, and the important feature maps corresponding to each level of recognition accuracy, based on
[0286] the erroneous recognition image,
[0287] the refined images with each level of recognition accuracy, and
[0288] the image recognition unit structural information corresponding to each level of recognition accuracy, and

[0289] outputs the important portions by cutting out and multiplying the changeable areas corresponding to each level of recognition accuracy.

[0290] As described above, in the analysis device according to the fifth embodiment, by outputting the important portion in the changeable area in pixel units, the degree of influence of each image part that causes erroneous recognition may be visualized in pixel units.

Sixth Embodiment

[0291] In a sixth embodiment, the degree of influence of each image part that causes erroneous recognition is visualized in pixel units, using difference images generated based on the erroneous recognition image and the refined images with each level of recognition accuracy (in a manner different from that of the above fourth embodiment). The sixth embodiment will be described below focusing on differences from the fourth embodiment described above.

[0292] <Functional Configuration of Detailed Cause Analysis Unit>

[0293] First, a functional configuration of a detailed cause analysis unit in an analysis device 100 according to the sixth embodiment will be described. FIG. 24 is a third diagram illustrating an example of the functional configuration of the detailed cause analysis unit. The difference from the functional configuration of the detailed cause analysis unit 1701 illustrated in FIG. 18 is that, in the case of FIG. 24, the SSIM calculation unit 1802 is not included.

[0294] In the present embodiment, a detailed cause analysis unit 1701 visualizes the important portion in the changeable area, using the difference image generated based on
[0295] the erroneous recognition image, and
[0296] the refined image with the target score X %,

[0297] and outputs the visualized important portion as the action result image corresponding to the target score X %.

[0298] Note that, in the present embodiment, the difference image used by the detailed cause analysis unit 1701 to output the action result image corresponding to the target score X % has the following attribute.
[0299] Difference image: difference information for each pixel, which has positive and negative values indicating how much each pixel is supposed to be corrected in order to raise the classification probability of the located label from the erroneous recognition state.

[0300] <Specific Example of Processing of Detailed Cause Analysis Unit>

[0301] Next, a specific example of processing of the detailed cause analysis unit 1701 will be described. FIG. 25 is a third diagram illustrating a specific example of processing of the detailed cause analysis unit. Note that the differences from the specific example of processing of the detailed cause analysis unit 1701 in FIG. 19 are that there is no description regarding the cutout image (D) derived from the output of the SSIM calculation unit 1802, and there is no description regarding the multiplication process with the cutout image (C).

[0302] <Flow of Detailed Cause Analysis Process>

[0303] Next, the flow of a detailed cause analysis process by the detailed cause analysis unit 1701 will be described. FIG. 26 is a third flowchart illustrating a flow of the detailed cause analysis process. The differences from the flowchart illustrated in FIG. 20 are that the respective processes in steps S2002, S2004, and S2005 are not provided, and the process in step S2401 is executed instead of step S2006.

[0304] As illustrated in FIG. 26, in step S2001, an image difference calculation unit 1801 calculates the difference image between the erroneous recognition image and the refined image with the target score X %.

[0305] In step S2003, a cutout unit 1803 cuts out the changeable area corresponding to the target score X % from the difference image.

[0306] In step S2401, the cutout unit 1803 performs an enhancement process on the cut-out difference image. In addition, an action unit 1804 subtracts the difference image that has undergone the enhancement process from the erroneous recognition image, and outputs the action result image corresponding to the target score X %.
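The sixth-embodiment flow (steps S2001, S2003, and S2401) can be sketched end to end as follows; a minimal illustration assuming grayscale float images, a boolean changeable-area mask, and an enhancement coefficient of 255, all of which are assumptions:

```python
import numpy as np

def detailed_cause_analysis_v3(err_img, refined_img, mask, coeff=255.0):
    """Sixth-embodiment sketch (FIG. 26): difference image only, no SSIM
    step. The function name and coefficient are assumptions."""
    a = err_img.astype(float)
    diff = a - refined_img.astype(float)      # step S2001: difference image
    cut = np.where(mask, diff, 0.0)           # step S2003: cut out area
    rng = cut.max() - cut.min()
    enhanced = coeff * cut / rng if rng else cut   # step S2401: enhancement
    return np.clip(a - enhanced, 0.0, 255.0)  # step S2401: subtraction
```

Pixels outside the changeable area are left unchanged, since the enhanced difference is zero there.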

[0307] As is clear from the above description, the analysis device 100 according to the sixth embodiment generates the difference images based on the erroneous recognition image and the refined images with each level of recognition accuracy and outputs the important portions by cutting out and enhancing the changeable areas corresponding to each level of recognition accuracy.

[0308] As described above, in the analysis device according to the sixth embodiment, by outputting the important portion in the changeable area in pixel units, the degree of influence of each image part that causes erroneous recognition may be visualized in pixel units.

OTHER EMBODIMENTS

[0309] In each of the above embodiments, a case where the refined image generation unit 142, the map generation unit 143, and the specifying unit 1001 perform processing using the erroneous recognition image has been described. However, the refined image generation unit 142, the map generation unit 143, and the specifying unit 1001 may perform processing using the refined image generated by the image refiner initialization unit 141 executing the first learning process, instead of the erroneous recognition image.

[0310] In addition, in each of the above embodiments, the recognition accuracy has been described as a score, but recognition accuracy other than the score may be used. For example, the recognition accuracy other than the score mentioned here includes the position and dimensions, existence probability, intersection over union (IoU), segment, other information regarding the output of deep learning, and the like.
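Of the alternatives listed above, intersection over union (IoU) is straightforward to illustrate; a minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2) tuples, where the function name is an assumption:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```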

[0311] Furthermore, in each of the above embodiments, a case where one object is included in the erroneous recognition image has been described, but a plurality of objects may be included. In this case, the erroneous recognition cause information may be output for each object, or the erroneous recognition cause information including a plurality of objects may be output.

[0312] In addition, in each of the above embodiments, it has been described that the first learning process is executed such that the erroneous recognition image in the same state as the input erroneous recognition image is generated. However, the method for the first learning process is not limited to this.

[0313] The purpose of executing the first learning process on the image refiner unit 301 is to bring the model parameters into a predefined initial state, rather than an unknown initial state, before performing the second learning process. Accordingly, in the first learning process, apart from the method of updating the model parameters such that the erroneous recognition image in the same state as the input erroneous recognition image is generated, a predetermined target score may be defined in advance, and initialization may be performed such that an image that yields that score is generated.

[0314] In this case, the score of the first learning process does not necessarily have to be lower than the score obtained when the image recognition process is executed on the refined image generated by executing the second learning process. For example, the first learning process may be executed on the image refiner unit 301 such that an image that gives a score of 100% is generated, and the refined images that give scores of 90%, 80%, and 70% may be generated in the second learning process. Alternatively, the first and second learning processes may be executed in accordance with other fluctuation patterns of the score.

[0315] In addition, the coefficient for performing the enhancement process in the above fourth to sixth embodiments may be selected so as to adjust the action result image or the strength of the action on the refined image. For example, when it is difficult to distinguish the magnitude of the pixel value indicating the cause of erroneous recognition, the coefficient may be selected so as to promote the enhancement. Alternatively, the coefficient may be selected such that the scale of the pixel value changed by the action of multiplication is optimally adjusted, or the coefficient may be selected so as not to perform the enhancement process.

[0316] In addition, in the first learning process of learning such that the recognition accuracy of the image generated by the generative model matches the desired recognition accuracy, the output of a hidden layer of deep learning may be used together with the information regarding the output of deep learning mentioned above (or may be used alone).

[0317] For example, when a feature map is also used together as the output of the hidden layer, the first learning process may be executed such that the information regarding the output of deep learning (image recognition unit) to be analyzed and the information regarding the output of the hidden layer of deep learning (image recognition unit) to be analyzed
[0318] have the same state
[0319] when the input erroneous recognition image is processed, and
[0320] when the image generated by the first learning process is processed.

[0321] When the information regarding the output of the hidden layer of deep learning (image recognition unit) to be analyzed is evaluated, for example,
[0322] evaluation may be made by executing some processing for evaluating whether the same state is achieved, such as
[0323] L1/L2/SSIM,
[0324] Neural Style Transfer loss, or
[0325] Max Pooling or Average Pooling.
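The listed evaluation options can be illustrated for two hidden-layer feature maps; a minimal numpy sketch in which the L1/L2 losses are means over all elements and the pooling is global, both simplifying assumptions, with all function names being illustrative:

```python
import numpy as np

def l1_loss(f1, f2):
    # mean absolute difference between two feature maps
    return np.abs(f1 - f2).mean()

def l2_loss(f1, f2):
    # mean squared difference between two feature maps
    return ((f1 - f2) ** 2).mean()

def pooled_match(f1, f2):
    # compare global max-pooled and average-pooled summaries
    return bool(np.isclose(f1.max(), f2.max())
                and np.isclose(f1.mean(), f2.mean()))
```

Two hidden-layer outputs in "the same state" would give near-zero losses and matching pooled summaries.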

[0326] Note that the embodiments are not limited to the configurations described here and may include, for example, combinations of the configurations or the like described in the above embodiments with other elements. These points may be changed without departing from the spirit of the embodiments and may be applied appropriately according to application modes thereof.

[0327] All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.