Attack-less adversarial training for robust adversarial defense

Abstract

Disclosed herein is attack-less adversarial training for robust adversarial defense. The attack-less adversarial training for robust adversarial defense includes the steps of: (a) generating individual intervals (c.sub.i) by setting the range of color (C) and then discretizing the range of color (C) by a predetermined number (k); (b) generating one batch from an original image (X) and training a learning model with the batch; (c) predicting individual interval indices (ŷ.sub.i.sup.alat) from respective pixels (x.sub.i) of the original image (X) by using an activation function; (d) generating a new image (X.sup.alat) through mapping and randomization; and (e) training a convolutional neural network with the image (X.sup.alat) generated in step (d) and outputting a predicted label (Ŷ).

Claims

1. A method of attack-less adversarial training for robust adversarial defense, comprising the steps of: (a) generating individual intervals (c.sub.i) by setting a range of color (C) and then discretizing the range of color (C) by a predetermined number (k); (b) generating one batch from an original image (X) and training a learning model with the batch; (c) predicting individual interval indices (ŷ.sub.i.sup.alat) from respective pixels (x.sub.i) of the original image (X) by using an activation function; (d) generating a new image (X.sup.alat) through mapping and randomization; and (e) training a convolutional neural network with the image (X.sup.alat) generated in step (d) and outputting a predicted label (Ŷ), wherein the step (b) comprises: (b-1) generating individual accurate interval indices (y.sub.i) by randomly extracting a plurality of pixels (x.sub.i) from the original image (X) and then allowing the extracted pixels (x.sub.i) to the respective intervals (c.sub.i) generated in step (a); (b-2) generating a plurality of instances each including each of the pixels (x.sub.i) and a corresponding one of the accurate interval indices (y.sub.i); (b-3) generating one batch including the plurality of instances generated in step (b-2); and (b-4) training a learning model with the batch generated in step (b-3).

2. The method of claim 1, wherein the step (d) comprises: (d-1) mapping the individual predicted interval indices (ŷ.sub.i.sup.alat) and returning corresponding intervals (c.sub.i); (d-2) randomly generating individual new pixels (x.sub.i.sup.alat) within a range of the individual intervals (c.sub.i) returned in step (d-1); and (d-3) generating a new image (X.sup.alat) by allocating the individual new pixels (x.sub.i.sup.alat), generated in step (d-2), to respective locations of the individual pixels (x.sub.i) of the original image (X).

3. The method of claim 1, wherein the activation function used in the step (c) is a softmax function.

4. A method of attack-less adversarial training for robust adversarial defense, comprising the steps of: (a) generating individual intervals (c.sub.i) by setting a range of color (C) and then discretizing the range of color (C) by a predetermined number (k); (b) generating one batch from an original image (X) and training a learning model with the batch; (c) predicting individual interval indices (ŷ.sub.i.sup.alat) from respective pixels (x.sub.i) of the original image (X) by using an activation function; (d) generating a new image (X.sup.alat) through mapping and randomization; and (e) training a convolutional neural network with the image (X.sup.alat) generated in step (d) and outputting a predicted label (Ŷ), wherein the step (d) comprises: (d-1) mapping the individual predicted interval indices (ŷ.sub.i.sup.alat) and returning corresponding intervals (c.sub.i); (d-2) randomly generating individual new pixels (x.sub.i.sup.alat) within a range of the individual intervals (c.sub.i) returned in step (d-1); and (d-3) generating a new image (X.sup.alat) by allocating the individual new pixels (x.sub.i.sup.alat), generated in step (d-2), to respective locations of the individual pixels (x.sub.i) of the original image (X).

5. The method of claim 4, wherein the activation function used in the step (c) is a softmax function.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) The above and other objects, features, and advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

(2) FIG. 1 is a flow chart illustrating the concept of ALAT;

(3) FIG. 2 is a block diagram illustrating the concept of ALAT;

(4) FIG. 3 is a view showing comparisons between original images and ALAT images;

(5) FIG. 4 is a view showing comparisons between images perturbed by FGSM, BIM, MIM and L2-CW attack techniques from an MNIST dataset for individual cases;

(6) FIG. 5 shows comparisons in the performance of the ALAT method applied to different attack scenarios in terms of accuracy;

(7) FIG. 6 shows comparisons in the performance of the ALAT method applied to different attack scenarios in terms of distortion;

(8) FIG. 7 shows comparisons in accuracy between a case using one ALAT model and a case using 20 ALAT models;

(9) FIG. 8 is a view showing comparisons among original images, RNI images, and ALAT images;

(10) FIG. 9 shows comparisons between an ALAT method and an RNI method; and

(11) FIG. 10 shows comparisons between the ALAT method and an adversarial training method.

DETAILED DESCRIPTION

(12) Embodiments of the present invention will be described in detail below with reference to the accompanying drawings so that those having ordinary skill in the art to which the present invention pertains can easily practice the technical spirit of the present invention.

(13) However, the following embodiments are merely examples intended to help an understanding of the present invention, and thus the scope of the present invention is not reduced or limited by the embodiments. Furthermore, the present invention may be embodied in many different forms, and is not limited to the embodiments set forth herein.

(14) Attack-less adversarial training is a defense technique that generates a new image from an original image through mapping and randomization and trains a neural network with the generated new image, thereby robustly defending the neural network against state-of-the-art attack techniques.

Embodiment 1: Main Steps of Attack-Less Adversarial Training

(15) Hereinafter, attack-less adversarial training according to the present invention will be referred to as “ALAT.”

(16) The main steps of ALAT will be described below.

(17) FIG. 1 is a flow chart illustrating the concept of ALAT.

(18) Referring to FIG. 1, a first step is step 1010 of generating intervals c.sub.i by setting the range of color C and then discretizing the range of color C by a predetermined number.

(19) When the range of color C is discretized into k intervals, the resulting set of intervals is {c.sub.i|c.sub.i⊂C}, where i=1, 2, . . . , k. In this case, the minimum value of each interval c.sub.i is s.sub.i.sub.min.sup.alat, and the maximum value thereof is s.sub.i.sub.max.sup.alat.

(20) For example, when the range of color C=[0,255] is discretized into five intervals of approximately equal width, c.sub.1=[s.sub.1.sub.min.sup.alat, s.sub.1.sub.max.sup.alat]=[0,51], c.sub.2=[s.sub.2.sub.min.sup.alat, s.sub.2.sub.max.sup.alat]=[52,102], c.sub.3=[s.sub.3.sub.min.sup.alat, s.sub.3.sub.max.sup.alat]=[103,153], c.sub.4=[s.sub.4.sub.min.sup.alat, s.sub.4.sub.max.sup.alat]=[154,204], and c.sub.5=[s.sub.5.sub.min.sup.alat, s.sub.5.sub.max.sup.alat]=[205,255] are obtained.
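The discretization in the example above can be sketched in Python as follows. This is a minimal sketch, assuming integer color values and a split that is as even as possible, with any leftover value assigned to the earliest intervals. The helper name make_intervals and the use of NumPy are illustrative assumptions and are not part of the disclosure.

```python
import numpy as np

def make_intervals(c_min, c_max, k):
    """Split the integer color range [c_min, c_max] into k contiguous intervals.

    Returns a list of (s_min, s_max) pairs, one per interval c_i.
    Hypothetical helper: the near-even split is an assumption for illustration.
    """
    values = np.arange(c_min, c_max + 1)
    # array_split tolerates ranges that do not divide evenly by k;
    # the first (len(values) % k) chunks receive one extra value.
    chunks = np.array_split(values, k)
    return [(int(chunk[0]), int(chunk[-1])) for chunk in chunks]

intervals = make_intervals(0, 255, 5)
print(intervals)  # [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]
```

With 256 color values and k=5, the first interval holds 52 values and the remaining four hold 51 each, which reproduces the intervals listed above.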

(21) A second step is step 1020 of generating one batch from an original image X and then training a learning model.

(22) First, a plurality of pixels x.sub.i is randomly extracted from the original image X, and individual accurate interval indices y.sub.i are generated by mapping the extracted individual pixels x.sub.i to the respective intervals c.sub.i generated in the above-described first step.

(23) Next, a plurality of instances is generated, each including a pixel x.sub.i and the accurate interval index y.sub.i corresponding to the pixel x.sub.i. Each generated instance may be represented as (x.sub.i, y.sub.i), where x.sub.i is a randomly extracted pixel and y.sub.i is the accurate interval index corresponding to the randomly extracted pixel.

(24) Then, one batch including the plurality of generated instances is generated.

(25) Finally, the learning model is trained by inputting the generated one batch to the learning model.

(26) For example, when the randomly extracted pixel x.sub.1 is 38 in the previous example, the accurate interval index y.sub.1 corresponding to the randomly extracted pixel x.sub.1 is 1. The reason for this is that 38 is a number that belongs to the interval c.sub.1=[0,51].

(27) In this case, the generated instance is (38,1). Furthermore, the instances (113,3), (204,4), and (3,1) may be generated in the same manner. Furthermore, there is generated one batch including a plurality of generated instances (38,1), (113,3), (204,4), and (3,1). Finally, a learning model may be trained by inputting the generated batch to the learning model.
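The construction of instances and of one batch can be sketched as follows. This is a minimal sketch under stated assumptions: the function names interval_index and make_batch are hypothetical, the intervals are those of the running example, and pixels are sampled without replacement, which the description does not specify.

```python
import random

# Intervals c_1..c_5 from the running example (C = [0, 255], k = 5).
INTERVALS = [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]

def interval_index(pixel, intervals=INTERVALS):
    """Return the 1-based accurate interval index y_i of the interval containing the pixel."""
    for index, (s_min, s_max) in enumerate(intervals, start=1):
        if s_min <= pixel <= s_max:
            return index
    raise ValueError(f"pixel {pixel} lies outside the color range")

def make_batch(image_pixels, batch_size, rng=random):
    """Randomly extract pixels and pair each with its accurate interval index."""
    # Sampling without replacement is an assumption made for this sketch.
    sampled = rng.sample(list(image_pixels), batch_size)
    return [(x, interval_index(x)) for x in sampled]

# The instances from the example above.
example = [(x, interval_index(x)) for x in (38, 113, 204, 3)]
print(example)  # [(38, 1), (113, 3), (204, 4), (3, 1)]
```

A batch produced by make_batch would then be passed to whatever learning model is being trained as the ALAT model.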

(28) Hereinafter, the learning model trained in the second step of the embodiment 1 of the present invention will be referred to as the “ALAT model.”

(29) A third step is step 1030 of outputting each interval index ŷ.sub.i.sup.alat predicted from each pixel x.sub.i of the original image X by using an activation function.

(30) The equation for predicting the interval index is as follows:
ŷ.sub.i.sup.alat=a(wx.sub.i+b)

(31) where ŷ.sub.i.sup.alat is a predicted interval index, w is a weight, x.sub.i is a pixel of an original image, b is a bias, and a(⋅) is a softmax function, which is an activation function.
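The prediction equation can be illustrated with a small NumPy sketch. The parameters w and b below are hand-set stand-ins for trained values, chosen only so that the example is reproducible, and taking the arg max of the softmax output as the predicted index is an assumption, since the description states only that a softmax activation is used.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def predict_interval_index(x, w, b):
    """Evaluate a(w * x_i + b) per pixel and return 1-based interval indices."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)  # one row per pixel
    probs = softmax(x * w + b)                     # shape (num_pixels, k)
    return np.argmax(probs, axis=-1) + 1, probs

# Hand-set stand-ins for trained parameters (assumption): with slopes 0..k-1,
# consecutive logits cross exactly at the thresholds between the intervals.
thresholds = np.array([51.5, 102.5, 153.5, 204.5])
w = np.arange(5, dtype=float)
b = -np.concatenate(([0.0], np.cumsum(thresholds)))

indices, probs = predict_interval_index([38, 75, 113, 204, 255], w, b)
print(indices)  # [1 2 3 4 5]
```

In practice w and b would be obtained by training on the batches of (x.sub.i, y.sub.i) instances rather than set by hand.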

(32) In this case, an accurate interval index is represented by y.sub.i, and a predicted interval index is represented by ŷ.sub.i.sup.alat. These symbols are distinctively used because a predicted interval index is not an accurate interval index value but an interval index value that can be predicted by a trained ALAT model.

(33) A fourth step is step 1040 of generating a new image X.sup.alat through mapping and randomization.

(34) First, each predicted interval index ŷ.sub.i.sup.alat is mapped, and the interval c.sub.i corresponding to each predicted interval index ŷ.sub.i.sup.alat is returned. The function that returns the interval c.sub.i is as follows:
c.sub.i=colorset(ŷ.sub.i.sup.alat)

(35) where colorset(⋅) is a function that returns each interval c.sub.i from the predicted interval index ŷ.sub.i.sup.alat.

(36) Furthermore, each new pixel x.sub.i.sup.alat is randomly generated within the range of the individual mapped intervals c.sub.i.

(37) A function that generates the new pixel x.sub.i.sup.alat is defined as follows:
x.sub.i.sup.alat=random.sub.ci(s.sub.i.sub.min.sup.alat,s.sub.i.sub.max.sup.alat)

(38) where random.sub.ci is a random function that generates a random value within the range of the minimum value s.sub.i.sub.min.sup.alat of c.sub.i to the maximum value s.sub.i.sub.max.sup.alat of c.sub.i.

(39) Finally, an ALAT image X.sup.alat is generated by allocating each new pixel x.sub.i.sup.alat to the location of each pixel x.sub.i of the original image X.

(40) For example, in the previous example, when a pixel x.sub.2 of the original image is 75, the interval index ŷ.sub.2.sup.alat predicted by the ALAT model may be 2. The predicted interval index ŷ.sub.2.sup.alat=2 is then mapped by the colorset function, and the interval c.sub.2=[52,102] is returned. Next, the new pixel x.sub.2.sup.alat=85 may be randomly generated within the range of the minimum value 52 of the mapped interval c.sub.2 to the maximum value 102 of c.sub.2. Finally, a new image X.sup.alat may be generated by allocating the new pixel x.sub.2.sup.alat to the location of the pixel x.sub.2 of the original image X and processing the remaining pixels of the original image in the same manner. In this case, the image X.sup.alat newly generated from the original image is referred to as the “ALAT image.” Furthermore, a method that is applied to the first to fourth steps of embodiment 1 of the present invention is referred to as an ALAT method below.
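The mapping and randomization of the fourth step can be sketched as follows. This is a minimal sketch, assuming that the new pixel is drawn uniformly from the integers of the returned interval (the description says only that it is randomly generated) and that the predicted interval indices are already available as an array. The function name generate_alat_image is hypothetical.

```python
import numpy as np

# Intervals c_1..c_5 from the running example (C = [0, 255], k = 5).
INTERVALS = [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]

def colorset(index, intervals=INTERVALS):
    """Map a 1-based predicted interval index to its interval (s_min, s_max)."""
    return intervals[index - 1]

def generate_alat_image(predicted_indices, rng, intervals=INTERVALS):
    """Build X^alat: one random pixel per location, drawn from the mapped interval."""
    predicted_indices = np.asarray(predicted_indices)
    alat = np.empty(predicted_indices.shape, dtype=np.int64)
    for location, index in np.ndenumerate(predicted_indices):
        s_min, s_max = colorset(int(index), intervals)
        # rng.integers excludes the upper bound by default, hence s_max + 1.
        alat[location] = rng.integers(s_min, s_max + 1)
    return alat

rng = np.random.default_rng(0)
# Predicted indices for a 2x2 original image [[38, 75], [113, 204]].
predicted = np.array([[1, 2], [3, 4]])
x_alat = generate_alat_image(predicted, rng)
```

Each pixel of x_alat stays inside the interval predicted for the corresponding original pixel, so the coarse color structure of the image is preserved while the exact values are randomized.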

(41) A fifth step is step 1050 of training a convolutional neural network (CNN) with the ALAT image X.sup.alat generated in the above-described fourth step and outputting a predicted label Ŷ.

(42) The equation by which the convolutional neural network outputs the predicted label from the ALAT image X.sup.alat is as follows:
Ŷ=F(X.sup.alat)

(43) where the function F(⋅) is a function that generates the predicted label Ŷ for one image.

(44) FIG. 2 is a block diagram illustrating the concept of ALAT.

(45) Referring to FIG. 2, the convolutional neural network includes three convolutional layers and one fully connected layer.

(46) Referring to FIG. 2, there can be seen the process of generating the ALAT image by reproducing each pixel of the original image using an ALAT method and generating a predicted label by inputting the generated ALAT image to the convolutional neural network.

(47) FIG. 3 is a view showing comparisons between original images and ALAT images. Referring to FIG. 3, three pairs of images can be seen. In each pair of images, the left image is an original image, and the right image is an ALAT image. In embodiment 1, ALAT images generated from original images may be represented as shown in FIG. 3.

Experimental Example 1: Comparisons in the Performance of the ALAT Method Applied to Different Attack Scenarios

(48) First, symbols mainly used in experimental example 1 are as follows.

(49) {circumflex over (X)} is an adversarial image of the original image X. Ŷ is a predicted label that may be output by inputting one image to a convolutional neural network (CNN).

(50) Furthermore, the function F(⋅) is a function that generates a predicted label Ŷ for an image. Furthermore, the attack technique A(⋅) is a function that generates the adversarial image {circumflex over (X)} from the original image X with or without the function F(⋅). Furthermore, D(X.sub.1, X.sub.2) is the distance between two images X.sub.1 and X.sub.2.

(51) Attack techniques that were applied to experimental example 1 of the present invention include the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), the Momentum Iterative Method (MIM), the L2-Carlini & Wagner's Attack (L2-CW), the Backward Pass Differentiable Approximation (BPDA), and the Expectation Over Transformation (EOT).

(52) First, FGSM (the Fast Gradient Sign Method) is a fast and simple attack technique that was proposed by Goodfellow et al. and generates an adversarial example.

(53) BIM (the Basic Iterative Method) is an extension of the FGSM that was proposed by Kurakin et al. and applies multiple iterations with a small step size in order to obtain the smallest perturbation of an original image.

(54) MIM (the Momentum Iterative Method) is an attack technique that was proposed by Dong et al. and is more advanced than BIM because it is equipped with a momentum algorithm.

(55) L2-CW is an attack technique that is effective in finding an adversarial example with the smallest perturbation.

(56) BPDA (Backward Pass Differentiable Approximation) is an attack technique that replaces a non-differentiable layer in a neural network with a differentiable approximation function during the back-propagation step.

(57) EOT (Expectation Over Transformation) is an attack technique that allows for the generation of adversarial examples that remain adversarial over a chosen distribution T of transformation functions taking an input.

(58) Furthermore, in experimental example 1 of the present invention, the Modified National Institute of Standards and Technology (MNIST), Fashion MNIST, and Canadian Institute For Advanced Research (CIFAR-10) datasets were used.

(59) For the CIFAR-10 dataset, a grayscale version, CIFAR-10 (grayscale), was additionally generated in order to analyze the effect of the ALAT method on a color image.

(60) MNIST and Fashion MNIST each had 60,000 training images and 10,000 test images associated with 10 class labels. Each image was a 28×28 grayscale image. CIFAR-10 had 50,000 training images and 10,000 test images associated with 10 classes. Each image was a 32×32 color image.

(61) In experimental example 1 of the present invention, when FGSM, BIM and MIM attacks were applied, ∈=77/255, which was the largest perturbation allowed for each pixel, was set for a MNIST dataset and ∈=8/255 was set for Fashion MNIST and CIFAR-10 datasets.

(62) Furthermore, for an L2-CW attack, the number of iterations for the execution of an attack was set to 1,000.

(63) In the present invention, the ALAT method may be evaluated based on individual cases having different attack scenarios. In this case, the individual cases are a normal case, case A, case B, case C, and case D.

(64) The process of generating the ALAT image X.sup.alat by applying the ALAT method to the original image X may be expressed by the following equation:
X.sup.alat=F.sup.alat(X)

(65) In the normal case, a convolutional neural network is evaluated using the ALAT method in the testing phase. The normal case is a case where an attack is not applied, in which case a convolutional neural network may be tested using an original image. A defense mechanism generates the ALAT image by applying the ALAT method to the original image.
X.sup.alat=F.sup.alat(X)

(66) Furthermore, the defense mechanism applies the ALAT image to the trained convolutional neural network.
Ŷ=F(X.sup.alat)=F(F.sup.alat(X))

(67) In case A, the convolutional neural network is evaluated using the ALAT method in the testing phase. An attacker knows the parameters of the trained convolutional neural network, but does not know about the ALAT method. The attacker generates an adversarial image from the original image by using the parameters of the trained convolutional neural network.
{circumflex over (X)}=A(F,X)

(68) The defense mechanism generates the ALAT image by applying the ALAT method to the received adversarial image.
{circumflex over (X)}.sup.alat=F.sup.alat({circumflex over (X)})

(69) Furthermore, the defense mechanism applies the ALAT image to the trained convolutional neural network.
Ŷ=F({circumflex over (X)}.sup.alat)=F(F.sup.alat({circumflex over (X)}))=F(F.sup.alat(A(F,X)))

(70) In case B, the convolutional neural network is evaluated without the ALAT method in the testing phase. An attacker knows the parameters of the trained convolutional neural network, but does not know about the ALAT method. The attacker generates an adversarial image from an image by using the parameters of the trained convolutional neural network.
{circumflex over (X)}=A(F,X)

(71) The trained convolutional neural network uses the adversarial image as an input without undergoing a preprocessing process by the ALAT method.
Ŷ=F({circumflex over (X)})=F(A(F,X))

(72) In case C, the convolutional neural network is evaluated without the ALAT method in the testing phase. An attacker knows both the parameters of the trained convolutional neural network and the parameters of the ALAT model. The attacker generates an adversarial image from the original image by using the parameters of the trained convolutional neural network and the parameters of the ALAT model.
{circumflex over (X)}=A(F,F.sup.alat,X)

(73) The trained convolutional neural network uses the adversarial image as an input without undergoing a preprocessing process by the ALAT method.
Ŷ=F({circumflex over (X)})=F(A(F,F.sup.alat,X))

(74) In case D, the convolutional neural network is evaluated using the ALAT method in the testing phase. An attacker knows both the parameters of the trained convolutional neural network and the parameters of the ALAT model. The attacker generates an adversarial image from the original image by using the parameters of the trained convolutional neural network and the parameters of the ALAT model.
{circumflex over (X)}=A(F,F.sup.alat,X)

(75) The defense mechanism generates the ALAT image by applying the ALAT method to the received adversarial image.
{circumflex over (X)}.sup.alat=F.sup.alat({circumflex over (X)})

(76) Furthermore, the defense mechanism applies the newly generated ALAT image to the trained convolutional neural network.
Ŷ=F({circumflex over (X)}.sup.alat)=F(F.sup.alat({circumflex over (X)}))=F(F.sup.alat(A(F,F.sup.alat,X)))
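The five evaluation scenarios differ only in how F, F.sup.alat, and A are composed. The following sketch makes that composition explicit. The function name evaluate and the calling convention of the attack are hypothetical, and the stand-ins below only record the order of the calls as strings instead of computing anything.

```python
def evaluate(case, F, F_alat, A, X):
    """Return the predicted label for one image under the named scenario.

    F      : trained CNN, image -> label
    F_alat : ALAT preprocessing, image -> ALAT image
    A      : attack, called as A(F, X) or A(F, X, F_alat) -> adversarial image
    """
    if case == "normal":              # no attack, ALAT used at test time
        return F(F_alat(X))
    if case == "A":                   # attacker knows F only, ALAT used
        return F(F_alat(A(F, X)))
    if case == "B":                   # attacker knows F only, ALAT not used
        return F(A(F, X))
    if case == "C":                   # attacker knows F and F_alat, ALAT not used
        return F(A(F, X, F_alat))
    if case == "D":                   # attacker knows F and F_alat, ALAT used
        return F(F_alat(A(F, X, F_alat)))
    raise ValueError(f"unknown case: {case}")

# Symbolic stand-ins that record the call order (illustration only).
F = lambda x: f"F({x})"
F_alat = lambda x: f"F_alat({x})"

def A(model, x, alat_model=None):
    return f"A(F,{x})" if alat_model is None else f"A(F,F_alat,{x})"

for case in ("normal", "A", "B", "C", "D"):
    print(case, evaluate(case, F, F_alat, A, "X"))
```

With these stand-ins, case D prints F(F_alat(A(F,F_alat,X))), which mirrors the last equation above.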

(77) FIG. 4 is a view showing comparisons between images perturbed by FGSM, BIM, MIM and L2-CW attack techniques from an MNIST dataset for the individual cases.

(78) Referring to FIG. 4, the first column of FIG. 4 shows original images. The second column of FIG. 4 shows ALAT images generated from the original images in the normal case. The case A and case D columns of FIG. 4 show ALAT images generated from adversarial images generated by the FGSM, BIM, MIM and L2-CW attack techniques. The case B and case C columns of FIG. 4 show the adversarial images generated by the FGSM, BIM, MIM and L2-CW attacks.

(79) Meanwhile, the adversarial images generated from cases C and D have a large perturbation because it is difficult for an attacker to calculate the derivative of a randomization method used in the ALAT method. In order to mitigate the differential calculation problem of the randomization method and to minimize the high distortion of the adversarial image generated in case D, each attack technique is integrated with the BPDA method or the EOT method.

(80) If the defense system generates an obfuscated gradient, the attack technique cannot obtain appropriate gradient information to generate adversarial examples. Furthermore, when the BPDA method or EOT method is integrated with each attack technique, the conventional defense system is known to be unable to fully defend against the adversarial examples due to the obfuscated gradient.

(81) To evaluate whether or not the ALAT method generates the obfuscated gradient, each attack technique is integrated with the BPDA method or EOT method.

(82) Assuming that the ALAT image X.sup.alat is generated by adding some noise to the original image X, an equation for obtaining X.sup.alat is as follows:
X.sup.alat≈X+γ.sup.alat

(83) In this case, γ.sup.alat is a noise matrix.

(84) In this case, a predicted label may be calculated as follows:
Ŷ=F(X.sup.alat)≈W(X+γ.sup.alat)+B
Ŷ≈WX.sup.alat+B

(85) In this case, W is a weight matrix, and B is a bias matrix.

(86) In the above equation, it can be seen that the derivative of Ŷ with respect to X.sup.alat returns only W. From this, it can be seen that the adversarial examples generated in the attack scenarios of cases C and D have a larger perturbation than the adversarial examples generated from the attack scenarios of cases A and B.

(87) To minimize perturbation in the attack scenarios of cases C and D, adversarial examples are generated using BPDA for the attack scenarios of cases C and D.

(88) First, a preprocessing method of converting the original image into the ALAT image is executed. After the ALAT image has been input to the convolutional neural network, a predicted value and loss value of the convolutional neural network are obtained. Thereafter, during back propagation, the adversarial ALAT image {circumflex over (X)}.sup.alat is generated by adding, to the ALAT image X.sup.alat, the gradient of the loss function with respect to X.sup.alat scaled by ∈.
{circumflex over (X)}.sup.alat=X.sup.alat+∈⋅∇.sub.X.sub.alat L(X.sup.alat,y)

(89) where ∈ is the largest perturbation allowed for each pixel and L is the loss function.

(90) Finally, noise γ.sup.alat is subtracted from adversarial ALAT image {circumflex over (X)}.sup.alat.

(91) The general equation of BPDA used in experimental example 1 is as follows:
{circumflex over (X)}=Clip.sub.X(A(F,X.sup.alat)−(X.sup.alat−X))
{circumflex over (X)}=Clip.sub.X({circumflex over (X)}.sup.alat−γ.sup.alat)

(92) where A(⋅) is an attack technique.
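The BPDA procedure above can be sketched for the linear model Ŷ≈WX+B used in the preceding derivation. This is a minimal sketch under several assumptions that are not in the description: the loss L is taken to be cross-entropy over a softmax output, images are flattened vectors normalized to [0, 1], and the clipping operation is taken to mean clipping to that valid pixel range. The function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def loss_gradient(x, y, W, B):
    """Gradient of the cross-entropy loss w.r.t. the input of the linear model Wx + B."""
    p = softmax(W @ x + B)
    p[y] -= 1.0            # softmax output minus one-hot label
    return W.T @ p

def bpda_adversarial(x, x_alat, y, W, B, eps):
    """Attack the ALAT image, then subtract the ALAT noise and clip."""
    gamma = x_alat - x                                          # noise added by ALAT
    x_alat_adv = x_alat + eps * loss_gradient(x_alat, y, W, B)  # adversarial ALAT image
    return np.clip(x_alat_adv - gamma, 0.0, 1.0)                # adversarial image

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))                      # 3 classes, 4 pixels (toy sizes)
B = np.zeros(3)
x = rng.uniform(0.4, 0.6, size=4)                # original image
x_alat = x + rng.uniform(-0.05, 0.05, size=4)    # stand-in for an ALAT image
x_adv = bpda_adversarial(x, x_alat, y=1, W=W, B=B, eps=0.01)
```

Because the ALAT noise is subtracted again, the resulting perturbation of the original image is only the scaled gradient, which is consistent with the reduced distortion reported for the BPDA variant.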

(93) To evaluate the EOT method, a final ALAT image is generated by generating 10 ALAT images and calculating the average of the images.
X.sub.final.sup.alat=E(X.sub.1 . . . 10.sup.alat)

(94) An adversarial image is generated using the final ALAT image.
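The averaging used for the EOT evaluation can be sketched as follows. This is a minimal sketch in which the true interval of each pixel stands in for the prediction of a trained ALAT model, and the new pixels are drawn uniformly. Both are assumptions made to keep the example self-contained, and the function names are hypothetical.

```python
import numpy as np

# Intervals c_1..c_5 from the running example (C = [0, 255], k = 5).
INTERVALS = [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]

def alat_sample(image, rng, intervals=INTERVALS):
    """One ALAT image; the true interval of each pixel replaces the model prediction."""
    image = np.asarray(image)
    out = np.empty(image.shape, dtype=np.int64)
    for s_min, s_max in intervals:
        mask = (image >= s_min) & (image <= s_max)
        # Upper bound of rng.integers is exclusive, hence s_max + 1.
        out[mask] = rng.integers(s_min, s_max + 1, size=int(mask.sum()))
    return out

def final_alat_image(image, rng, n=10):
    """Average n independently generated ALAT images of the same original image."""
    samples = np.stack([alat_sample(image, rng) for _ in range(n)])
    return samples.mean(axis=0)

rng = np.random.default_rng(0)
image = np.array([[38, 75], [113, 204]])
x_final = final_alat_image(image, rng, n=10)
```

Averaging reduces the variance introduced by the randomization, so an attacker obtains a more stable input on which to compute gradients.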

(95) FIG. 5 shows comparisons in the performance of the ALAT method applied to different attack scenarios in terms of accuracy.

(96) Referring to FIG. 5, it can be seen that the ALAT method exhibited better performance than a non-ALAT method. This means that a convolutional neural network trained with the ALAT method was more robust than a convolutional neural network trained without the ALAT method. Furthermore, it can be seen that the performance of the convolutional neural network was robust even in case B to which MNIST and Fashion MNIST datasets were applied. Furthermore, it can be seen that the convolutional neural network having used the ALAT method in cases A and D exhibited better performance than the convolutional neural network having used the ALAT method in cases B and C.

(97) Referring to FIG. 5, although the performance of the convolutional neural network was slightly degraded in the ALAT method in which BPDA and EOT were integrated with different attack techniques, the accuracy of ALAT (BPDA) and ALAT (10-EOT) was similar to that of ALAT (case A). Accordingly, it can be seen that ALAT effectively defends against methods specialized to attack obfuscated gradients like BPDA and EOT.

(98) FIG. 6 shows comparisons in the performance of the ALAT method applied to different attack scenarios in terms of distortion.

(99) Referring to FIG. 6, the ALAT method exhibited better performance in a grayscale dataset than in a color dataset. This can be seen by comparing CIFAR-10 and CIFAR-10 (grayscale) datasets. As can be seen from ALAT (BPDA) and ALAT (10-EOT) of FIG. 6, the perturbation of the generated adversarial image was considerably reduced in the ALAT method in which BPDA and EOT were integrated with the different attack techniques.

(100) A reduced perturbation matters because, in order to prevent a human from recognizing the perturbation in an adversarial example, the perturbation between the original image and the adversarial image needs to be as low as possible.

Experimental Example 2: Comparisons in Performance Between a Case Using a Single ALAT Model and a Case Using a Plurality of ALAT Models

(101) In experimental example 2, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.

(102) A plurality of ALAT models is trained, and each pixel of an original image may be predicted using one of the plurality of trained ALAT models. In other words, to predict the pixel of the original image, one ALAT model may be randomly selected from among the plurality of ALAT models. Thereafter, the step of randomly selecting one ALAT model from among the plurality of ALAT models is repeated until all the pixels are reproduced.
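The per-pixel random selection among a plurality of ALAT models can be sketched as follows. This is a minimal sketch in which each model is represented as a callable that maps a pixel to a 1-based interval index. The exact-lookup stand-in used for the 20 models and the function names are assumptions made for illustration.

```python
import random

# Intervals c_1..c_5 from the running example (C = [0, 255], k = 5).
INTERVALS = [(0, 51), (52, 102), (103, 153), (154, 204), (205, 255)]

def exact_model(pixel):
    """Stand-in for a trained ALAT model: returns the true 1-based interval index."""
    for index, (s_min, s_max) in enumerate(INTERVALS, start=1):
        if s_min <= pixel <= s_max:
            return index
    raise ValueError(f"pixel {pixel} lies outside the color range")

def reproduce_with_model_pool(pixels, models, intervals=INTERVALS, rng=random):
    """Reproduce every pixel, picking one ALAT model at random for each pixel."""
    new_pixels = []
    for x in pixels:
        model = rng.choice(models)                     # random model for this pixel
        s_min, s_max = intervals[model(x) - 1]         # map predicted index to interval
        new_pixels.append(rng.randint(s_min, s_max))   # randint includes both ends
    return new_pixels

models = [exact_model] * 20        # 20 identical stand-ins; real models would differ
pixels = [38, 75, 113, 204]
new_pixels = reproduce_with_model_pool(pixels, models, rng=random.Random(0))
```

With independently trained models, the selection adds a second source of randomness on top of the per-pixel randomization within each interval.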

(103) As described above in conjunction with experimental example 1, cases C and D are not practical, and are thus excluded from the experiment.

(104) FIG. 7 shows comparisons in accuracy between a case using one ALAT model and a case using 20 ALAT models. Referring to FIG. 7, it can be seen that the case using 20 ALAT models exhibited better performance than the case using one ALAT model.

Experimental Example 3: Comparisons in Performance Between the ALAT Method and the Random Noise Injection (RNI) Method

(105) In experimental example 3, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.

(106) The RNI method is applied at three different stages: (1) the training phase, (2) the testing phase, and (3) both the training and testing phases.

(107) In the RNI method, a uniform distribution is used, and a distribution range extends from −1.0 to +1.0.

(108) A noise value generated from a uniform distribution is added to an original image. Furthermore, the summed output is clipped into a range of 0.0 to 1.0 (a normalized pixel value).

(109) The equation of RNI may be expressed as follows:
X.sub.i.sup.RNI=Clip(X.sub.i+U(−1,+1))

(110) where X.sub.i is the original image and X.sub.i.sup.RNI is an image generated from the RNI method. Furthermore, U(−1, +1) is a uniform distribution ranging from −1 to +1.
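The RNI equation can be sketched in a few lines of NumPy. This is a minimal sketch, assuming that an independent noise value is drawn for every pixel of a normalized image. The function name random_noise_injection is hypothetical.

```python
import numpy as np

def random_noise_injection(image, rng, low=-1.0, high=1.0):
    """Add uniform noise U(low, high) to a normalized image and clip to [0, 1]."""
    image = np.asarray(image, dtype=float)
    noise = rng.uniform(low, high, size=image.shape)   # one noise value per pixel
    return np.clip(image + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(28, 28))   # stand-in for a normalized image
x_rni = random_noise_injection(x, rng)
```

Because the noise range spans the whole normalized pixel range, many pixels saturate at 0.0 or 1.0 after clipping, which is in line with the large perturbation of RNI images noted in this experimental example.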

(111) FIG. 8 is a view showing comparisons among original images, RNI images, and ALAT images.

(112) Referring to FIG. 8, in each dataset, the left image is an original image, the center image is an RNI image, and the right image is an ALAT image. From FIG. 8, it can be seen that the ALAT method may generate an image recognizable by a human, unlike the RNI method.

(113) FIG. 9 shows comparisons between the ALAT method and the RNI method.

(114) Referring to FIG. 9, it can be seen that the ALAT method exhibited better performance than the RNI method when the RNI method was applied in the training phase and in both the training and testing phases. Furthermore, as the number of epochs increased, the ALAT method exhibited increasingly better performance than the RNI method in terms of accuracy.

(115) Referring to FIG. 9, the RNI image had a larger perturbation than the ALAT image. Accordingly, when an image is generated using the RNI method, it is difficult for the convolutional neural network to classify the image correctly.

Experimental Example 4: Comparisons in Performance Between the ALAT Method and the Adversarial Training Method

(116) In experimental example 4, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.

(117) Adversarial training depends on the attack technique that is used in the training phase. When a low-level attack technique such as an FGSM attack is used in adversarial training, the resulting performance is lower than when a high-level attack technique such as a BIM or MIM attack is used. For a realistic experiment, an MIM attack may be set as the attack technique that is used for adversarial training.

(118) FIG. 10 shows comparisons between the ALAT method and the adversarial training method.

(119) Referring to FIG. 10, it can be seen that in most cases, the ALAT method offers better performance than the adversarial training method.

Experimental Example 5: Performance of the ALAT Method According to the Number of Intervals k

(120) In experimental example 5, the same benchmark datasets and attack techniques used in experimental example 1 are also applied.

(121) In experimental example 5, the effect of the number of intervals k on the convolutional neural network is analyzed. In experimental example 5, k=3, k=4 and k=10 are set.

(122) As described above in conjunction with experimental example 1, cases C and D are not practical, and are thus excluded from the experiment.

(123) Table 1 shows comparisons in the performance of the ALAT method among a normal case, case A, and case B to which different numbers (k) of intervals were applied at their 1000th epoch.

(124) Referring to Table 1, the performance of the ALAT method in which k=3 was best overall. In other words, the ALAT method in which k=3 has 19 winning nodes, whereas the ALAT method in which k=4 has 15 winning nodes and the ALAT method in which k=10 has 2 winning nodes.

(125) When an appropriate k is used in the ALAT method, the robustness of the convolutional neural network is improved.

(126) TABLE 1 (accuracy)

Dataset               Attack            Normal                  Case A                  Case B
                                        k = 3   k = 4   k = 10  k = 3   k = 4   k = 10  k = 3   k = 4   k = 10
MNIST                 FGSM, ϵ = 0.300   0.9871  0.9796  0.9609  0.9699  0.8151  0.8209  0.8787  0.8409  0.7500
                      BIM, ϵ = 0.300                            0.9829  0.9306  0.8572  0.7930  0.6446  0.5935
                      MIM, ϵ = 0.300                            0.9686  0.7000  0.5913  0.7504  0.5380  0.4350
                      L2-CW                                     0.9853  0.9902  0.9877  0.6624  0.8423  0.4854
Fashion MNIST         FGSM, ϵ = 0.031   0.8663  0.8941  0.8892  0.8459  0.8494  0.8250  0.7642  0.8212  0.8029
                      BIM, ϵ = 0.031                            0.8435  0.8460  0.8271  0.7271  0.7913  0.7712
                      MIM, ϵ = 0.031                            0.8367  0.8427  0.8157  0.6815  0.7622  0.7241
                      L2-CW                                     0.8450  0.8500  0.8283  0.1048  0.1590  0.1532
CIFAR-10              FGSM, ϵ = 0.031   0.5201  0.5764  0.5918  0.3344  0.2868  0.2026  0.2265  0.2254  0.1906
                      BIM, ϵ = 0.031                            0.3534  0.2872  0.1972  0.2104  0.2072  0.1868
                      MIM, ϵ = 0.031                            0.3040  0.2349  0.1812  0.1905  0.1851  0.1781
                      L2-CW                                     0.4667  0.4668  0.3884  0.1728  0.1742  0.1665
CIFAR-10 (grayscale)  FGSM, ϵ = 0.031   0.4569  0.5210  0.5545  0.3371  0.3676  0.2126  0.2217  0.2079  0.1949
                      BIM, ϵ = 0.031                            0.3451  0.3191  0.2083  0.2079  0.2079  0.1869
                      MIM, ϵ = 0.031                            0.3158  0.2719  0.1841  0.1888  0.1903  0.1757
                      L2-CW                                     0.4290  0.4630  0.4154  0.1845  0.1763  0.1656

(127) The attack-less adversarial training for robust adversarial defense according to the present invention, which is configured as described above, provides the effects of improving the robustness of a neural network and not generating an obfuscated gradient.

(128) Furthermore, the attack-less adversarial training for robust adversarial defense according to the present invention offers better performance than the random noise injection method and the adversarial training method.

(129) Moreover, the attack-less adversarial training for robust adversarial defense according to the present invention does not require any attack technique, unlike conventional adversarial training, and defends against new and state-of-the-art attack techniques.

(130) Although the specific embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

CPC classification

G06N3/08
G06F21/64
G06F21/55
G06N3/045

International classification

G06N3/08
G06F21/64
