LEARNING FEATURE IMPORTANCE FOR IMPROVED VISUAL EXPLANATION
20230026787 · 2023-01-26
Assignee
Inventors
CPC classification
G06V10/454
PHYSICS
G06V10/80
PHYSICS
International classification
G06V10/80
PHYSICS
Abstract
Systems, methods and computer readable media provide technology to perform image classification and produce visualization using a machine learning architecture. The disclosed image classification and visualization technology includes a feature extraction network to generate a feature map, a feature importance network to generate a feature importance vector, an attention map generated based on a weighted sum of the feature importance vector and the feature map, a classification output determined based on a combination of the attention map and the feature map, and a feature visualization image generated by overlaying the attention map onto an input image. Each of the feature extraction network and the feature importance network can include a neural network.
Claims
1. A computing system comprising: a processor; and a memory coupled to the processor, the memory storing instructions which, when executed by the processor, cause the computing system to perform operations comprising: generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map; generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map; generating an attention map based on a weighted sum of the feature importance vector and the first feature map; determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map; and generating a feature visualization image by overlaying the attention map onto the input image.
2. The computing system of claim 1, wherein the feature extraction network comprises a first neural network including a plurality of convolution layers, wherein the first feature map is obtained from a last layer of the plurality of convolution layers, and wherein the feature importance network comprises a second neural network.
3. The computing system of claim 2, wherein the second feature map is obtained from an intermediate layer, other than the last layer, of the plurality of convolution layers.
4. The computing system of claim 2, wherein combining the input image and the first feature map comprises: generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image; generating an intermediate feature map by applying a normalize function to the first feature map; and generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
5. The computing system of claim 3, wherein combining the attention map and one or more of the first feature map or the second feature map comprises: generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map; and applying an activation function to the output map.
6. The computing system of claim 5, wherein generating an attention map based on a weighted sum of the feature importance vector and the first feature map comprises: computing a specific weighted sum Σ_{k=1}^N w_k F_M^k, wherein weights w_k are derived from respective coefficients of the feature importance vector, and F_M^k is a k-th channel of the first feature map; and applying an activation function to a result of the specific weighted sum; and wherein the attention mechanism comprises an equation F_O = F_L ⊙ (1 + A_M), wherein F_O is the output map, F_L is the one or more of the first feature map or the second feature map, A_M is the attention map, and ⊙ denotes an element-wise multiplication function.
7. The computing system of claim 2, wherein the input image comprises an image of at least a portion of an aircraft or an aircraft component, and wherein the classification output comprises a determination of at least one of an identification of or a state of the aircraft or the aircraft component.
8. The computing system of claim 2, wherein at least one of the first neural network or the second neural network is implemented by an artificial intelligence (AI) accelerator.
9. A method comprising: generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map; generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map; generating an attention map based on a weighted sum of the feature importance vector and the first feature map; determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map; and generating a feature visualization image by overlaying the attention map onto the input image.
10. The method of claim 9, wherein the feature extraction network comprises a first neural network including a plurality of convolution layers, wherein the first feature map is obtained from a last layer of the plurality of convolution layers, and wherein the feature importance network comprises a second neural network.
11. The method of claim 10, wherein the second feature map is obtained from an intermediate layer, other than the last layer, of the plurality of convolution layers.
12. The method of claim 10, wherein combining the input image and the first feature map comprises: generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image; generating an intermediate feature map by applying a normalize function to the first feature map; and generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
13. The method of claim 11, wherein combining the attention map and one or more of the first feature map or the second feature map comprises: generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map; and applying an activation function to the output map.
14. The method of claim 10, wherein the input image comprises an image of at least a portion of an aircraft or an aircraft component, and wherein the classification output comprises a determination of at least one of an identification of or a state of the aircraft or the aircraft component.
15. At least one non-transitory computer readable medium comprising instructions which, when executed by a computing system, cause the computing system to perform operations comprising: generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map; generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map; generating an attention map based on a weighted sum of the feature importance vector and the first feature map; determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map; and generating a feature visualization image by overlaying the attention map onto the input image.
16. The at least one non-transitory computer readable medium of claim 15, wherein the feature extraction network comprises a first neural network including a plurality of convolution layers, wherein the first feature map is obtained from a last layer of the plurality of convolution layers, and wherein the feature importance network comprises a second neural network.
17. The at least one non-transitory computer readable medium of claim 16, wherein the second feature map is obtained from an intermediate layer, other than the last layer, of the plurality of convolution layers.
18. The at least one non-transitory computer readable medium of claim 16, wherein combining the input image and the first feature map comprises: generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image; generating an intermediate feature map by applying a normalize function to the first feature map; and generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
19. The at least one non-transitory computer readable medium of claim 17, wherein combining the attention map and one or more of the first feature map or the second feature map comprises: generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map; and applying an activation function to the output map.
20. The at least one non-transitory computer readable medium of claim 16, wherein the input image comprises an image of at least a portion of an aircraft or an aircraft component, and wherein the classification output comprises a determination of at least one of an identification of or a state of the aircraft or the aircraft component.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The various advantages of the examples of the present disclosure will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
[0017] Accordingly, it is to be understood that the examples herein described are merely illustrative of the application of the principles disclosed. Reference herein to details of the illustrated examples is not intended to limit the scope of the claims, which themselves recite those features regarded as essential to the disclosure.
DESCRIPTION
[0018] Disclosed herein are systems, methods and computer readable media to perform image classification and provide visualization using a machine learning architecture. The system includes a perception module to generate a feature map, and an attention module to learn the importance of features and generate an attention map. The attention map is combined with the feature map by the perception module to provide a classification output. The attention map is also overlaid onto the input image to provide a visualization result that highlights the most important features identified by the system. As disclosed herein, the image classification and visualization technology provides advantages including improved classification results and stable visualization mappings without the need for retraining. For example, the disclosed attention module generates an attention map for visual explanation by learning feature importance from a feature map and the input image, while the disclosed perception module leverages the attention map to improve the classification performance through an attention mechanism.
[0021] The attention module 230 includes a combination unit 231, a feature importance network 234, an activation function 236 and a weighted sum unit 237. The attention module 230 corresponds to the attention module 130.
[0022] The overlay module 260 receives as input the input image 110 and the attention map 140, and combines them by overlaying the attention map 140 over the input image 110 to generate the feature visualization image 170. In one or more examples, the processing by the overlay module 260 includes adjusting the respective sizes of and/or re-scaling the input image 110 and/or the attention map 140 to produce a feature visualization image 170 suitable for showing which features are most important. In examples, the input image 110 and attention map 140 are blended together with a ratio of 1:1 (i.e., a contribution of 50% for each of the input image 110 and attention map 140); other ratios can be applied. The feature visualization image 170 provides a visualization of the importance of features derived from the input image 110 (by the system 200) for purposes of classification. The overlay module 260 corresponds to the overlay module 160.
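The 1:1 blending performed by the overlay module can be sketched as follows. This is a minimal illustration assuming images represented as plain nested lists of floats in [0, 1]; the function name `blend` and the toy values are illustrative assumptions, not part of the disclosure.

```python
# A minimal sketch of the overlay step: blending an input image with an
# attention map at a 1:1 ratio (50% contribution each). Images are plain
# nested lists of floats; names and values are illustrative assumptions.

def blend(image, attention_map, alpha=0.5):
    """Return alpha*image + (1 - alpha)*attention_map, element-wise."""
    return [
        [alpha * i + (1.0 - alpha) * a for i, a in zip(img_row, att_row)]
        for img_row, att_row in zip(image, attention_map)
    ]

image = [[0.2, 0.4], [0.6, 0.8]]       # toy 2x2 greyscale input
attention = [[1.0, 0.0], [0.5, 0.5]]   # toy 2x2 attention map
overlay = blend(image, attention)      # 1:1 blend of the two inputs
```

Other ratios correspond to other values of `alpha`; the default of 0.5 matches the 1:1 blend described above.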
[0024] The feature extraction network 321 includes a neural network 322 such as a convolutional neural network (CNN) having a plurality of layers. The neural network 322 can employ machine learning (ML) and/or deep neural network (DNN) techniques. In one or more examples, the neural network 322 can include other types of neural networks. As an example, for image sequences (e.g., video) the neural network 322 can include a recurrent neural network (RNN). In one or more examples, the neural network 322 can include a residual block (not shown).
[0025] Upon processing the input image 110, the neural network 322 generates a feature map 328. The feature map 328 as provided to the attention module 230 is a first feature map 322a obtained from the last convolutional layer of the neural network 322. In some examples, the feature map F_L provided to the attention mechanism 324 is also the first feature map 322a obtained from the last convolutional layer of the neural network 322. In one or more examples, the feature map F_L provided to the attention mechanism 324 is a second feature map 322b obtained from an intermediate convolutional layer, which is a layer (e.g., an internal layer), other than the last convolutional layer, of the neural network 322. In one or more examples, the feature map F_L provided to the attention mechanism 324 is obtained from a combination of convolutional layers of the neural network 322, such as the last convolutional layer and/or the intermediate convolutional layer; a combination of convolutional layers can include a weighted sum of the convolutional layers. The last convolutional layer typically provides higher-level features, while the intermediate convolutional layer typically provides lower-level features. Each of the feature map 328 and the feature map F_L is generally a three-dimensional matrix, where two dimensions represent the height and width (h×w) of the respective map and the third dimension represents the number of channels in the respective map. The number of channels in the respective map is the same as the number of channels of the convolution layer from which the respective map is obtained.
[0026] The attention mechanism 324 operates to combine the feature map F_L and the attention map 140 (e.g., the attention map 140 generated by the attention module 230) as follows:

F_O = F_L ⊙ (1 + A_M)   EQ. (1)

[0027] where F_O is the output map generated as an output of the attention mechanism 324, F_L is the first or second feature map from the neural network 322, A_M is the attention map 140, and ⊙ denotes an element-wise multiplication function. In some examples, the attention map A_M is normalized to the range [0, 1] before being input to the attention mechanism 324. In some examples, the attention mechanism 324 can combine the feature map 328 and the attention map using other operations.
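The attention mechanism of EQ. (1) can be sketched as follows for a single-channel 2×2 feature map. The nested-list representation and the name `apply_attention` are illustrative assumptions; a practical implementation would operate on full multi-channel tensors.

```python
# A minimal sketch of the attention mechanism of EQ. (1),
# F_O = F_L * (1 + A_M), applied element-wise to a single-channel
# 2x2 feature map. Names and toy values are illustrative assumptions.

def apply_attention(feature_map, attention_map):
    """Element-wise F_L * (1 + A_M)."""
    return [
        [f * (1.0 + a) for f, a in zip(f_row, a_row)]
        for f_row, a_row in zip(feature_map, attention_map)
    ]

F_L = [[1.0, 2.0], [3.0, 4.0]]   # toy feature map
A_M = [[0.0, 1.0], [0.5, 0.0]]   # attention map normalized to [0, 1]
F_O = apply_attention(F_L, A_M)  # [[1.0, 4.0], [4.5, 4.0]]
```

Note that the (1 + A_M) term passes the original features through unchanged where the attention map is zero, so the attention can only emphasize features rather than suppress them entirely.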
[0028] To determine the classification output 150, the activation function 326 is applied to the output map F_O generated by the attention mechanism 324. The activation function 326 produces a vector output. In embodiments, the Softmax function is selected as the activation function 326 because, in image classification operations, the Softmax function vector output represents the respective probabilities (which sum to 1) that the input is in one of the respective classes. For example, if the classification operation is used for classifying a type of animal in an image, and if the universe of animal types (for which the classifier is trained) is a list of four animals, such as {dog; cat; duck; bear}, then the classification output, as provided by the vector output of the Softmax function, would represent the respective probabilities that the subject image contained a dog, cat, duck or bear. In an example, if the vector output of the Softmax function is {0.1, 0.1, 0.7, 0.1}, this would represent as a classification output the probabilities that the subject image contains a dog: 10%, cat: 10%, duck: 70%, and bear: 10%. Other activation functions that serve to provide respective probabilities can be substituted for the Softmax function as the activation function 326.
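The Softmax activation described above can be sketched as follows. The input scores here are chosen as log-probabilities so that the output reproduces the duck example, approximately {0.1, 0.1, 0.7, 0.1}; the function name and values are illustrative assumptions.

```python
import math

# A minimal sketch of the Softmax activation: it maps a vector of raw
# scores to class probabilities that sum to 1. The scores are chosen so
# the output matches the example above; values are illustrative.

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [math.log(0.1), math.log(0.1), math.log(0.7), math.log(0.1)]
probs = softmax(scores)  # approximately [0.1, 0.1, 0.7, 0.1]
```

Subtracting the maximum score before exponentiating does not change the result but avoids overflow for large scores, which is the standard numerically stable formulation.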
[0030] The combination unit 431 combines the input image 110 with the feature map 228 through a multiplication function, resulting in a masked image M that is provided to the feature importance network 434. In some examples, the input image 110 is first processed through a downsize function 432a and/or a greyscale function 432b (shown in dotted lines). The downsize function 432a reduces the size of the image; when the image is a color image, the greyscale function 432b converts color to greyscale:
Î = D_S(G_S(I))   EQ. (2)
[0031] where Î is the resulting image, D_S(·) is a downsize function, and G_S(·) is a color-to-greyscale conversion function. The downsize function reduces the size of the image to the two-dimensional size (h×w) of the feature map 228 (ignoring the depth of the feature map 228). In some examples, the feature map 228 is processed through a normalize function 433 that maps each element of the feature map(s) to the range [0, 1]; the resulting normalized feature map is denoted F̂. The multiplication function of the combination unit 431 provides a masked image M as follows:
M = Î ⊙ F̂   EQ. (3)
[0032] where Î is the resulting image (from EQ. 2), F̂ is the normalized feature map, and ⊙ denotes an element-wise multiplication function. The masked image M is a concatenated multi-layer set of images {M_1, M_2, …, M_N} with the number of layers (N) equal to the number of channels (N) in the normalized feature map F̂. The masked image M is provided as input for processing by the feature importance network 434.
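EQs. (2)-(3) can be sketched as follows for a single-channel toy example, assuming the downsize/greyscale step has already produced Î. The min-max `normalize` below is one plausible choice for the normalize function 433; the exact normalization is not specified above, so treat it as an assumption.

```python
# A minimal single-channel sketch of EQs. (2)-(3): normalize the feature
# map to [0, 1], then element-wise multiply with the (already downsized
# and greyscaled) input image to produce the masked image M. The min-max
# normalization is an assumed choice; names and values are illustrative.

def normalize(fmap):
    """Min-max normalize a 2-D map so its elements fall in [0, 1]."""
    lo = min(min(row) for row in fmap)
    hi = max(max(row) for row in fmap)
    span = (hi - lo) or 1.0              # avoid dividing by zero
    return [[(v - lo) / span for v in row] for row in fmap]

def mask(image_hat, fmap_hat):
    """Element-wise product M = I_hat * F_hat (EQ. (3))."""
    return [
        [i * f for i, f in zip(i_row, f_row)]
        for i_row, f_row in zip(image_hat, fmap_hat)
    ]

I_hat = [[0.5, 1.0], [0.25, 0.0]]   # toy downsized/greyscaled image
F = [[2.0, 4.0], [0.0, 1.0]]        # toy single-channel feature map
F_hat = normalize(F)                # [[0.5, 1.0], [0.0, 0.25]]
M = mask(I_hat, F_hat)              # [[0.25, 1.0], [0.0, 0.0]]
```

For an N-channel feature map, the same masking would be repeated per channel to build the multi-layer set {M_1, M_2, …, M_N} described above.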
[0033] The feature importance network 434 includes a neural network 435 such as a convolutional neural network (CNN) having a plurality of layers. The neural network 435 can employ machine learning (ML) and/or deep neural network (DNN) techniques. In some examples, the neural network 435 is a 3-layer CNN. In one or more examples, the neural network 435 can include other types of neural networks, such as, e.g., a recurrent neural network (RNN) or a multilayer perceptron.
[0034] When operating in inference mode, the neural network 435 operates on the masked image M, and the activation function 436 is applied to the output of the neural network 435 to generate a feature importance vector V_F. The feature importance vector V_F is a 1×N vector (where N is the number of channels) which includes a set of weights w_k, each weight w_k representing a feature importance score for the k-th channel of the feature map. In one or more examples, a batch normalization process (not shown) can also be applied.
[0035] The feature importance vector V_F is then combined with the feature map 228 in the weighted sum unit 437. The weighted sum unit 437 applies a weighted sum function to generate the attention map 140 (A_M) via an activation function 439 as follows:
A_M = ReLU(Σ_{k=1}^N w_k F_M^k)   EQ. (4)
[0036] where A_M is the generated attention map 140, w_k is the k-th weight of the feature importance vector V_F, F_M^k is the k-th channel of the feature map 228, and ReLU(·) is the rectified linear unit function. The attention map 140 is provided to the perception module 220 and to the overlay module 260 as described above. In embodiments, the rectified linear unit function (ReLU) is selected as the activation function 439 applied to the output of the weighted sum unit 437 because it removes features with negative influence. Other activation functions that serve to remove features with negative influence can be substituted for ReLU as the activation function 439.
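The weighted sum and ReLU of EQ. (4) can be sketched as follows for N = 2 channels of a 2×2 feature map. The names and toy values are illustrative assumptions.

```python
# A minimal sketch of EQ. (4): the attention map is the ReLU of the
# channel-wise weighted sum of the feature map, with weights taken from
# the feature importance vector V_F. Names and values are illustrative.

def attention_map(weights, channels):
    """Compute ReLU(sum_k w_k * F_M^k) over the channel axis."""
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * w for _ in range(h)]
    for w_k, ch in zip(weights, channels):
        for i in range(h):
            for j in range(w):
                out[i][j] += w_k * ch[i][j]
    # ReLU removes features with negative influence
    return [[max(0.0, v) for v in row] for row in out]

V_F = [0.5, -1.0]                    # feature importance weights w_k
F_M = [[[2.0, 0.0], [4.0, 2.0]],     # channel k = 1
       [[0.0, 1.0], [1.0, 2.0]]]     # channel k = 2
A_M = attention_map(V_F, F_M)        # [[1.0, 0.0], [1.0, 0.0]]
```

Here the negative weight on channel 2 drives two positions negative, and the ReLU zeroes them out, which is the "remove features with negative influence" behavior described above.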
[0037] Each of the system 100 and the system 200 is trained with a set of input training images containing examples of the types of objects for which classification is desired. In some examples, the system is trained end-to-end. In the end-to-end training scenario, the neural network in the perception module (e.g., the neural network 322) and the neural network in the attention module (e.g., the neural network 435) are trained at the same time. The system is trained in an end-to-end manner using a training loss calculated as the combination of the Softmax function and cross-entropy at the perception module in an image classification task. The attention module is optimized via the attention mechanism of the perception module to improve the classification accuracy without any additional loss function. In some examples, the neural network 322 and the neural network 435 are trained separately. In this scenario, the neural network 322 is trained first, and then the neural network 435 is trained. In one or more examples, the neural network 322 is a pre-trained neural network model, and the neural network 435 is trained using the pre-trained neural network model as the neural network 322.
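The end-to-end training loss described above (Softmax at the perception module followed by cross-entropy against the true label) can be sketched as a scalar computation. This is a minimal illustration, not the disclosed training code; the function name and logits are illustrative assumptions.

```python
import math

# A minimal scalar sketch of the training loss: Softmax over the
# classifier logits followed by cross-entropy against the true class
# label. Names and values are illustrative assumptions.

def softmax_cross_entropy(logits, label):
    """Return -log(softmax(logits)[label])."""
    m = max(logits)                          # stabilize the exponentials
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[label])

loss = softmax_cross_entropy([2.0, 0.5, 0.1], label=0)
# The loss shrinks toward zero as the true-class logit dominates.
```

In the end-to-end scenario, the gradient of this loss flows back through the attention mechanism into both networks, which is why no separate loss term is needed for the attention module.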
[0039] The method 500 begins at illustrated processing block 510 by generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map. In examples, the feature extraction network corresponds to the feature extraction network 221.
[0040] Illustrated processing block 520 provides for generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map. In examples, the feature importance network corresponds to the feature importance network 234.
[0041] Illustrated processing block 530 provides for generating an attention map based on a weighted sum of the feature importance vector and the first feature map. Illustrated processing block 540 provides for determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map. Illustrated processing block 550 provides for generating a feature visualization image by overlaying the attention map onto the input image.
[0043] The method 560 includes illustrated processing block 562, which provides for generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image. Illustrated processing block 564 provides for generating an intermediate feature map by applying a normalize function to the first feature map. Illustrated processing block 566 provides for generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
[0045] The method 570 includes, at illustrated processing block 572, computing the specific weighted sum Σ_{k=1}^N w_k F_M^k, wherein the weights w_k are derived from respective coefficients of the feature importance vector, and F_M^k is a k-th channel of the first feature map. Illustrated processing block 574 provides for applying an activation function to a result of the specific weighted sum. In some embodiments, the rectified linear unit function (ReLU) is used as the activation function.
[0047] The method 580 includes, at illustrated processing block 582, generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map. More particularly, in examples the attention mechanism at processing block 584 includes computing the equation F_O = F_L ⊙ (1 + A_M), wherein F_O is the output map, F_L is the one or more of the first feature map or the second feature map, A_M is the attention map, and ⊙ denotes an element-wise multiplication function. The method 580 then continues at illustrated processing block 586, which provides for applying an activation function to the output map. In some embodiments, the Softmax function is used as the activation function.
[0049] The image classification and visualization system as described herein can be used in a variety of image classification applications, including applications involving the aircraft industry. In one example aircraft application, the image classification and visualization system can be used to review images of an aircraft or its components and make determinations of a state of the aircraft or the components—such as, e.g., whether a defect (e.g., surface defect such as scratch, bubble, dent, etc.) is present. As another example aircraft application, the image classification and visualization system can be used to review images of aircraft and make determinations of an identification of the aircraft or its components—such as, e.g., whether the aircraft is a Boeing 737, a Boeing 747, a Boeing 757, etc. As another example aircraft application, the image classification and visualization system can be used to review images of the ground or airspace surrounding an aircraft and make determinations for autonomous piloting or to assist piloting of the aircraft—such as, e.g., identification of nearby objects, landing strips, etc. The foregoing examples are described for illustrative purposes only, and the disclosed technology is not limited in application to the examples described herein.
[0051] The processor 702 can include one or more processing devices such as a microprocessor, a central processing unit (CPU), a fixed application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a field-programmable gate array (FPGA), etc., along with associated circuitry, logic, and/or interfaces. The processor 702 can include, or be connected to, a memory (such as, e.g., the memory 708) storing executable instructions and/or data, as necessary or appropriate. The processor 702 can execute such instructions to implement, control, operate or interface with any components or features of the system 100, the system 200, and/or any of the components or methods described herein.
[0052] The I/O subsystem 704 includes circuitry and/or components suitable to facilitate input/output operations with the processor 702, the memory 708, and other components of the computing system 700.
[0053] The network interface 706 includes suitable logic, circuitry, and/or interfaces to transmit and receive data over one or more communication networks using one or more communication network protocols. The network interface 706 can operate under the control of the processor 702, and can transmit/receive various requests and messages to/from one or more other devices. The network interface 706 can include wired or wireless data communication capability to support data communication with a wired or wireless communication network. The network interface 706 can support communication via a short-range wireless protocol, such as Bluetooth, NFC, or RFID. Examples of the network interface 706 include, but are not limited to, one or more of an antenna, a radio frequency transceiver, a wireless transceiver, a Bluetooth transceiver, an Ethernet port, a universal serial bus (USB) port, or any other device configured to transmit and receive data.
[0054] The memory 708 includes suitable logic, circuitry, and/or interfaces to store executable instructions and/or data that, when executed, implement, control, operate or interface with any components or features of the system 100, the system 200, and/or any of the components or methods described herein.
[0055] The data storage 710 can include any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, non-volatile flash memory, or other data storage devices. The data storage 710 can include or be configured as a database, such as a relational or non-relational database, or a combination of more than one database. In some examples, a database or other data storage can be physically separate and/or remote from the computing system 700, and/or can be located in another computing device, a database server, on a cloud-based platform, or in any storage device that is in data communication with the computing system 700.
[0056] The artificial intelligence (AI) accelerator 712 includes suitable logic, circuitry, and/or interfaces to accelerate artificial intelligence applications, such as, e.g., artificial neural networks, machine vision and machine learning applications, including through parallel processing techniques. In one or more examples, the AI accelerator 712 can include a graphics processing unit (GPU). The AI accelerator 712 can implement one or more components or features of the system 100, the system 200, and/or components or methods described herein.
[0057] The user interface 716 includes code to present, on a display, information or screens for a user and to receive input (including commands) from a user via an input device. The display 720 can be any type of device for presenting visual information, such as a computer monitor, a flat panel display, or a mobile device screen, and can include a liquid crystal display (LCD), a light-emitting diode (LED) display, a plasma panel, or a cathode ray tube display, etc. The display 720 can include a display interface for communicating with the display. In some examples, the display 720 can include a display interface for communicating with a display external to the computing system 700.
[0058] In some examples, one or more of the illustrative components of the computing system 700 can be incorporated (in whole or in part) within, or otherwise form a portion of, another component. For example, the memory 708, or portions thereof, can be incorporated within the processor 702. As another example, the user interface 716 can be incorporated within the processor 702 and/or code in the memory 708. In some examples, the computing system 700 can be embodied as, without limitation, a mobile computing device, a smartphone, a wearable computing device, an Internet-of-Things device, a laptop computer, a tablet computer, a notebook computer, a computer, a workstation, a server, a multiprocessor system, and/or a consumer electronic device. In some examples, the computing system 700, or a portion thereof, is implemented in one or more modules as a set of logic instructions stored in at least one non-transitory machine- or computer-readable storage medium such as random access memory (RAM), read only memory (ROM), programmable ROM (PROM), firmware, flash memory, etc.; in configurable logic such as, for example, programmable logic arrays (PLAs), field programmable gate arrays (FPGAs), or complex programmable logic devices (CPLDs); in fixed-functionality logic hardware using circuit technology such as, for example, application specific integrated circuit (ASIC), complementary metal oxide semiconductor (CMOS) or transistor-transistor logic (TTL) technology; or any combination thereof.
Additional Notes and Examples
[0059] Further, the disclosure comprises additional examples as detailed in the following clauses.
[0060] Clause 1: A computing system comprising a processor, and a memory coupled to the processor, the memory storing instructions which, when executed by the processor, cause the computing system to perform operations comprising generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map, generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map, generating an attention map based on a weighted sum of the feature importance vector and the first feature map, determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map, and generating a feature visualization image by overlaying the attention map onto the input image.
[0061] Clause 2: The computing system of clause 1, wherein the feature extraction network comprises a first neural network including a plurality of convolution layers, wherein the first feature map is obtained from a last layer of the plurality of convolution layers, and wherein the feature importance network comprises a second neural network.
[0062] Clause 3: The computing system of clause 1 or 2, wherein the second feature map is obtained from an intermediate layer, other than the last layer, of the plurality of convolution layers.
[0063] Clause 4: The computing system of clause 1, 2 or 3, wherein combining the input image and the first feature map comprises generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image, generating an intermediate feature map by applying a normalize function to the first feature map, and generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
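The masked-image construction in Clause 4 can be sketched in numpy as below. The block-average downsizing, the channel-mean greyscale conversion, and the min-max normalization are illustrative assumptions; the clause itself leaves the downsize, greyscale, and normalize functions unspecified.

```python
import numpy as np

def make_masked_image(input_image, feature_map):
    """Sketch of Clause 4: combine the input image with the first
    feature map via element-wise multiplication to form a masked image.
    Function and variable names are illustrative, not from the patent."""
    fh, fw = feature_map.shape[:2]
    h, w, _ = input_image.shape
    # Downsize the (H, W, 3) image to the feature map's spatial size
    # by block averaging (assumes H, W are multiples of fh, fw).
    small = input_image.reshape(fh, h // fh, fw, w // fw, 3).mean(axis=(1, 3))
    grey = small.mean(axis=-1)              # intermediate (greyscale) image
    # Normalize the feature map: channel-averaged, scaled to [0, 1].
    fmap = feature_map.mean(axis=-1)
    fmap = (fmap - fmap.min()) / (fmap.max() - fmap.min() + 1e-8)
    # Element-wise multiplication yields the masked image.
    return grey * fmap
```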
[0064] Clause 5: The computing system of any of clauses 1-4, wherein generating an attention map based on a weighted sum of the feature importance vector and the first feature map comprises computing a specific weighted sum Σ_{k=1}^{N} w_k F_M^k, wherein weights w_k are derived from respective coefficients of the feature importance vector, and F_M^k is a k-th channel of the first feature map, and applying an activation function to a result of the specific weighted sum.
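The weighted sum in Clause 5 is a per-pixel sum over the N channels of the first feature map. A minimal numpy sketch, assuming ReLU as the activation function (the clause does not name one):

```python
import numpy as np

def attention_map(feature_map, importance_vector):
    """Sketch of Clause 5: A_M = act(sum_{k=1}^{N} w_k * F_M^k),
    with weights w_k taken from the feature importance vector.
    ReLU is an assumed choice of activation."""
    # feature_map: (H, W, N); importance_vector: (N,)
    weighted = np.tensordot(feature_map, importance_vector, axes=([-1], [0]))
    return np.maximum(weighted, 0.0)    # ReLU activation
```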
[0065] Clause 6: The computing system of any of clauses 1-5, wherein combining the attention map and one or more of the first feature map or the second feature map comprises generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map, and applying an activation function to the output map.
[0066] Clause 7: The computing system of any of clauses 1-6, wherein the attention mechanism comprises computing an equation F_O = F_L ⊙ (1 + A_M), wherein F_O is the output map, F_L is the one or more of the first feature map or the second feature map, A_M is the attention map, and ⊙ denotes an element-wise multiplication function.
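The residual form F_O = F_L ⊙ (1 + A_M) of Clause 7 leaves the original features intact where attention is zero and amplifies them elsewhere. A one-line numpy sketch, assuming a (H, W) attention map broadcast across the channels of a (H, W, C) feature map:

```python
import numpy as np

def apply_attention(feature_map, attn_map):
    """Sketch of Clause 7's attention mechanism:
    F_O = F_L ⊙ (1 + A_M), broadcasting the (H, W) attention map
    across the C channels of the (H, W, C) feature map."""
    return feature_map * (1.0 + attn_map[..., None])
```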
[0067] Clause 8: The computing system of any of clauses 1-7, wherein the input image comprises an image of at least a portion of an aircraft or an aircraft component, and wherein the classification output comprises a determination of at least one of an identification of or a state of the aircraft or the aircraft component.
[0068] Clause 9: The computing system of any of clauses 1-8, wherein at least one of the first neural network or the second neural network is implemented by an artificial intelligence (AI) accelerator.
[0069] Clause 10: A method comprising generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map, generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map, generating an attention map based on a weighted sum of the feature importance vector and the first feature map, determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map, and generating a feature visualization image by overlaying the attention map onto the input image.
[0070] Clause 11: The method of clause 10, wherein the feature extraction network comprises a first neural network including a plurality of convolution layers, wherein the first feature map is obtained from a last layer of the plurality of convolution layers, and wherein the feature importance network comprises a second neural network.
[0071] Clause 12: The method of clause 10 or 11, wherein the second feature map is obtained from an intermediate layer, other than the last layer, of the plurality of convolution layers.
[0072] Clause 13: The method of clause 10, 11 or 12, wherein combining the input image and the first feature map comprises generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image, generating an intermediate feature map by applying a normalize function to the first feature map, and generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
[0073] Clause 14: The method of any of clauses 10-13, wherein generating an attention map based on a weighted sum of the feature importance vector and the first feature map comprises computing a specific weighted sum Σ_{k=1}^{N} w_k F_M^k, wherein weights w_k are derived from respective coefficients of the feature importance vector, and F_M^k is a k-th channel of the first feature map, and applying an activation function to a result of the specific weighted sum.
[0074] Clause 15: The method of any of clauses 10-14, wherein combining the attention map and one or more of the first feature map or the second feature map comprises generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map, and applying an activation function to the output map.
[0075] Clause 16: The method of any of clauses 10-15, wherein the attention mechanism comprises computing an equation F_O = F_L ⊙ (1 + A_M), wherein F_O is the output map, F_L is the one or more of the first feature map or the second feature map, A_M is the attention map, and ⊙ denotes an element-wise multiplication function.
[0076] Clause 17: The method of any of clauses 10-16, wherein the input image comprises an image of at least a portion of an aircraft or an aircraft component, and wherein the classification output comprises a determination of at least one of an identification of or a state of the aircraft or the aircraft component.
[0077] Clause 18: At least one non-transitory computer readable medium comprising instructions which, when executed by a computing system, cause the computing system to perform operations comprising generating, via a feature extraction network, based on an input image, one or more of a first feature map or a second feature map, generating, via a feature importance network, a feature importance vector based on combining the input image and the first feature map, generating an attention map based on a weighted sum of the feature importance vector and the first feature map, determining a classification output based on combining the attention map and one or more of the first feature map or the second feature map, and generating a feature visualization image by overlaying the attention map onto the input image.
[0078] Clause 19: The at least one non-transitory computer readable medium of clause 18, wherein the feature extraction network comprises a first neural network including a plurality of convolution layers, wherein the first feature map is obtained from a last layer of the plurality of convolution layers, and wherein the feature importance network comprises a second neural network.
[0079] Clause 20: The at least one non-transitory computer readable medium of clause 18 or 19, wherein the second feature map is obtained from an intermediate layer, other than the last layer, of the plurality of convolution layers.
[0080] Clause 21: The at least one non-transitory computer readable medium of clause 18, 19 or 20, wherein combining the input image and the first feature map comprises generating an intermediate image by applying one or more of a downsize function or a greyscale function to the input image, generating an intermediate feature map by applying a normalize function to the first feature map, and generating a masked image by multiplying, via element-wise multiplication, the intermediate image and the intermediate feature map.
[0081] Clause 22: The at least one non-transitory computer readable medium of any of clauses 18-21, wherein generating an attention map based on a weighted sum of the feature importance vector and the first feature map comprises computing a specific weighted sum Σ_{k=1}^{N} w_k F_M^k, wherein weights w_k are derived from respective coefficients of the feature importance vector, and F_M^k is a k-th channel of the first feature map, and applying an activation function to a result of the specific weighted sum.
[0082] Clause 23: The at least one non-transitory computer readable medium of any of clauses 18-22, wherein combining the attention map and one or more of the first feature map or the second feature map comprises generating an output map by combining, via an attention mechanism, the attention map and one or more of the first feature map or the second feature map, and applying an activation function to the output map.
[0083] Clause 24: The at least one non-transitory computer readable medium of any of clauses 18-23, wherein the attention mechanism comprises computing an equation F_O = F_L ⊙ (1 + A_M), wherein F_O is the output map, F_L is the one or more of the first feature map or the second feature map, A_M is the attention map, and ⊙ denotes an element-wise multiplication function.
[0084] Clause 25: The at least one non-transitory computer readable medium of any of clauses 18-24, wherein the input image comprises an image of at least a portion of an aircraft or an aircraft component, and wherein the classification output comprises a determination of at least one of an identification of or a state of the aircraft or the aircraft component.
[0085] Clause 26: The computing system of any of clauses 1-4, wherein generating an attention map based on a weighted sum of the feature importance vector and the first feature map comprises computing a specific weighted sum Σ_{k=1}^{N} w_k F_M^k, wherein weights w_k are derived from respective coefficients of the feature importance vector, and F_M^k is a k-th channel of the first feature map; and applying an activation function to a result of the specific weighted sum; and wherein the attention mechanism comprises computing an equation F_O = F_L ⊙ (1 + A_M), wherein F_O is the output map, F_L is the one or more of the first feature map or the second feature map, A_M is the attention map, and ⊙ denotes an element-wise multiplication function.
[0086] Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SoCs), SSD (solid state drive)/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some can be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail can be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, can actually comprise one or more signals that can travel in multiple directions and can be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
[0087] Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the platform or computing system within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
[0088] The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and applies to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections, including logical connections via intermediate components (e.g., device A can be coupled to device C via device B). In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
[0089] As used in this application and in the claims, a list of items joined by the term “one or more of” can mean any combination of the listed terms. For example, the phrases “one or more of A, B or C” can mean A; B; C; A and B; A and C; B and C; or A, B and C.
[0090] Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments described herein can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.