Plasma processing apparatus, data processing apparatus and data processing method
11605530 · 2023-03-14
Inventors
- Seiichi Watanabe (Tokyo, JP)
- Satomi Inoue (Tokyo, JP)
- Shigeru Nakamoto (Tokyo, JP)
- Kousuke Fukuchi (Tokyo, JP)
Abstract
According to an embodiment of the present invention, a plasma processing apparatus includes: a processing chamber in which plasma processing is performed to a sample; a radio frequency power source that supplies radio frequency power for generating plasma in the processing chamber; and a data processing apparatus that performs processing to light emission data of the plasma. The data processing apparatus performs the processing to the light emission by using an adaptive double exponential smoothing method for varying a smoothing parameter based on an error between input data and a predicted value of smoothed data. A response coefficient of the smoothing parameter is derived by a probability density function including the error as a parameter.
Claims
1. A plasma processing apparatus comprising: a chamber in which plasma processing is performed to a sample; a radio frequency power source configured to supply radio frequency power for generating plasma in the processing chamber; a data processing apparatus which, when executing a program to perform processing to light emission data of the plasma, configures the data processing apparatus to: perform the processing to the light emission data by using an adaptive double exponential smoothing method, the adaptive double exponential smoothing method processing data, while varying a smoothing parameter based on an error between input data based on monitoring the light emission data and a predicted value of smoothed data; smooth the input data by using a polynomial fitting method at a latest time of receiving the input data; calculate the predicted value of smoothed data by using the smoothed input data; perform a first order differential, by using a polynomial fitting method, to the predicted value of smoothed data; calculate a predicted value of a slope of the smoothed data by using the predicted value of the smoothed data to which the first order differential is performed; and derive a response coefficient of the smoothing parameter by using a probability density function including the error as a parameter; and a system control apparatus configured to receive results of the processing to the light emission data and control the plasma processing apparatus by determining an etching end point.
2. The plasma processing apparatus according to claim 1, wherein the response coefficient of the smoothing parameter is derived by an N power of a value acquired by adding a constant to a predicted value of a relative value of the error, the predicted value being divided by a predicted value of an absolute value of the error, in a case where N is an integer of 0 or more.
3. A data processing apparatus connected to a plasma processing apparatus including a chamber in which plasma processing is performed to a sample and a radio frequency power source for generating plasma in the processing chamber, the data processing apparatus which, when executing a program to perform processing to light emission data of the plasma, configures the data processing apparatus to: perform the processing to the light emission data by using an adaptive double exponential smoothing method, the adaptive double exponential smoothing method processing data while varying a smoothing parameter based on an error between input data based on monitoring the light emission data and a predicted value of smoothed data; smooth the input data by using a polynomial fitting method at a latest time of receiving the input data; calculate the predicted value of the smoothed data by using the smoothed input data; perform a first order differential, by using a polynomial fitting method, to the predicted value of smoothed data; calculate a predicted value of a slope of the smoothed data by using the predicted value of the smoothed data to which the first order differential is performed; and derive a response coefficient of the smoothing parameter by using a probability density function including the error as a parameter; and a system control apparatus configured to receive results of the processing to the light emission data and control the plasma processing apparatus by determining an etching end point.
4. The data processing apparatus according to claim 3, wherein a response coefficient of the smoothing parameter is derived by an N power of a value acquired by adding a constant to a predicted value of a relative value of the error, the predicted value being divided by a predicted value of an absolute value of the error, in a case where N is an integer of 0 or more.
Description
DESCRIPTION OF THE PREFERRED EMBODIMENTS
(17) Embodiments of the present invention will be described below with reference to the drawings.
First Embodiment
(18) First, a data processing apparatus according to a first embodiment of the present invention will be described using
(20) As necessary, a data display apparatus (not illustrated) is disposed in addition to the above apparatuses. The data processing apparatus 1 can input/output data into/from a system 6 to be an object (for example, apparatus and analytical data). Accordingly, the system 6 to be an object is controlled with high precision. According to the present embodiment, the system 6 to be an object is a microwave plasma processing apparatus. The data processing apparatus 1 may be independently used and can be used for data analysis.
(21) The data input/output apparatus 2 can input/output, for example, processing data and a parameter of a data processing program. The data input/output apparatus 2 collectively or sequentially receives data to be processed from, for example, the system 6 to be an object. The data storage apparatus 3, such as a RAM, stores the data received by the data input/output apparatus 2. The data calculation processing apparatus 5 performs data smoothing processing and data differential processing to the data in accordance with a data processing program stored in the data processing program storage apparatus 4, such as a RAM. After the data calculation by the data calculation processing apparatus 5, the data input/output apparatus 2 outputs data smoothing processing result data and data differential processing result data to the system 6 to be an object. Then, the pieces of data are used in order to control the system 6 to be an object.
(23) In this case, the processing is performed by a type I adaptive double exponential smoothing processing method expressed by the following Expressions (5) to (14).
Smoothing of data: S1.sub.t=α1.sub.tY1.sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (5)
Slope of smoothed data: B1.sub.t=γ1.sub.t(S1.sub.t−S1.sub.t−1)+(1−γ1.sub.t)B1.sub.t−1 Expression (6)
Smoothing coefficient: α1.sub.t=(K.sub.α−L.sub.α)F.sub.αt+L.sub.α Expression (7)
Response coefficient: F.sub.αt=(|δα.sub.t/Δα.sub.t|+φ).sup.N Expression (8)
Relative error: δα.sub.t=A1(Y1.sub.t−S1.sub.t)+(1−A1)δα.sub.t−1 Expression (9)
Absolute error: Δα.sub.t=A1|Y1.sub.t−S1.sub.t|+(1−A1)Δα.sub.t−1+φ Expression (10)
Smoothing coefficient: γ1.sub.t=(K.sub.γ−L.sub.γ)F.sub.γt+L.sub.γ Expression (11)
Response coefficient: F.sub.γt=(|δγ.sub.t/Δγ.sub.t|+φ).sup.N Expression (12)
Relative error: δγ.sub.t=A2{(S1.sub.t−S1.sub.t−1)−B1.sub.t}+(1−A2)δγ.sub.t−1 Expression (13)
Absolute error: Δγ.sub.t=A2|(S1.sub.t−S1.sub.t−1)−B1.sub.t|+(1−A2)Δγ.sub.t−1+φ Expression (14)
(24) Here, the input data is defined as, for example, time series data Y1.sub.t (t=1, 2, . . . ). The predicted value S1.sub.t of smoothing of the output data and the predicted value B1.sub.t of the slope of the smoothed output data are acquired by sequential data processing. Symbols K.sub.α, L.sub.α, K.sub.γ, L.sub.γ, N, A1, A2, and φ are arbitrary constants, subject to 1>K.sub.α>L.sub.α>0, 1>K.sub.γ>L.sub.γ>0, 1>A1>0, and 1>A2>0. Symbol φ serves to prevent the absolute errors Δα.sub.t and Δγ.sub.t and the response coefficients F.sub.αt and F.sub.γt from becoming zero. An extremely small value is selected as φ so that it has little effect on ordinary calculation.
(25) In a case of N=0 in Expressions (8) and (12), the response coefficients satisfy F.sub.αt=1 and F.sub.γt=1, so that the smoothing coefficients are constant at α1.sub.t=K.sub.α and γ1.sub.t=K.sub.γ. Therefore, the case of N=0 reduces to an ordinary double exponential smoothing method. In a case of N=1, there is provided the adaptive double exponential smoothing method in which each of the smoothing coefficients is proportional to the corresponding relative error/absolute error. One of the smoothing coefficients varies in a range between K.sub.α and L.sub.α and the other varies in a range between K.sub.γ and L.sub.γ. Similarly, in a case of N=5, there is provided the adaptive double exponential smoothing method in which each of the smoothing coefficients is proportional to the fifth power of the corresponding relative error/absolute error, with the smoothing coefficients varying in the same ranges.
(26) In a case where the corresponding relative error/absolute error is small, the adaptive double exponential smoothing method decreases each of the smoothing coefficients so as to improve data smoothing performance. Meanwhile, in a case where the corresponding relative error/absolute error is large, the adaptive double exponential smoothing method increases each of the smoothing coefficients so as to improve data responding performance. There is a trade-off relationship between the data smoothing performance and the data responding performance. A balance between them varies depending on the above value of N. Thus, the above value of N is selected in accordance with a characteristic of the input data.
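The type I recursion of Expressions (5) to (14) can be sketched in Python as follows. This is a minimal illustration, not the embodiment's program. As written, Expressions (7) to (10) define α1.sub.t in terms of S1.sub.t and S1.sub.t in terms of α1.sub.t; the sketch assumes that each smoothing coefficient is computed from the error averages of the previous step. The function name, the default constants, the starting error averages, and the clipping of the response coefficients to 1 are likewise assumptions made for illustration.

```python
def adaptive_des_type1(y, s0, b0, K_a=0.6, L_a=0.05, K_g=0.6, L_g=0.05,
                       N=1, A1=0.1, A2=0.1, phi=1e-12):
    """Sketch of type I adaptive double exponential smoothing.

    Assumption: the coefficients at step t use the error averages from
    step t-1, which avoids the circular definition in Expressions (7)-(10).
    """
    s, b = s0, b0
    d_a, D_a = 0.0, phi   # relative / absolute error averages for alpha
    d_g, D_g = 0.0, phi   # relative / absolute error averages for gamma
    S, B = [], []
    for y_t in y:
        # Expressions (8) and (7); clipping to 1 is an added safeguard.
        F_a = min((abs(d_a / D_a) + phi) ** N, 1.0)
        alpha = (K_a - L_a) * F_a + L_a
        s_new = alpha * y_t + (1 - alpha) * (s + b)        # Expression (5)
        err_a = y_t - s_new
        d_a = A1 * err_a + (1 - A1) * d_a                  # Expression (9)
        D_a = A1 * abs(err_a) + (1 - A1) * D_a + phi       # Expression (10)

        # Expressions (12) and (11), again from the previous averages.
        F_g = min((abs(d_g / D_g) + phi) ** N, 1.0)
        gamma = (K_g - L_g) * F_g + L_g
        b_new = gamma * (s_new - s) + (1 - gamma) * b      # Expression (6)
        err_g = (s_new - s) - b_new
        d_g = A2 * err_g + (1 - A2) * d_g                  # Expression (13)
        D_g = A2 * abs(err_g) + (1 - A2) * D_g + phi       # Expression (14)

        s, b = s_new, b_new
        S.append(s)
        B.append(b)
    return S, B
```

With N=0 the response coefficients equal 1 and the sketch behaves as ordinary double exponential smoothing with constant coefficients K.sub.α and K.sub.γ, as stated in paragraph (25).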
(27) Next, a method for deriving the initial values will be described. Typically, in double exponential smoothing processing, an initial value of the predicted value S1 of smoothing of data and an initial value of the predicted value B1 of a slope of smoothed data are derived by, for example, the following method. The initial value of the predicted value S1 of smoothing of data is derived by, for example, S1.sub.1=input data Y1.sub.1 (Method A1) or S1.sub.1=an average value of the initial N pieces of input data ({Y1.sub.1+Y1.sub.2+ . . . +Y1.sub.N}/N) (Method A2).
(28) The initial value of the predicted value B1 of a slope of smoothed data is derived by, for example, B1.sub.1=Y1.sub.2−Y1.sub.1 (Method B1) or B1.sub.1={(Y1.sub.2−Y1.sub.1)+(Y1.sub.4−Y1.sub.3)}/2 (Method B2). Typically, the double exponential smoothing processing has a problem that the error is large immediately after data processing starts. One of the reasons is that the initial values given by the above related-art deriving methods differ greatly from the values that the predicted value of smoothing of data and the predicted value of a slope of smoothed data should properly have at the initial time.
(29) Here, a polynomial approximate expression is derived by a least squares method from the initial data Y1.sub.t (t=1, 2, . . . , N), once a predetermined number N of pieces of data have been input. In this example, time series data consisting of the initial ten pieces of data at regular time intervals is used. From the polynomial approximate expression, a predicted value S1.sub.0 of smoothing of data and a predicted value B1.sub.0 of a slope of smoothed data are derived as pieces of virtual data at t=0, just before the input data.
(30) Here, a linear (first-order) expression is used as the polynomial approximate expression. The predicted value S1.sub.0 of smoothing of data and the predicted value B1.sub.0 of a slope of smoothed data are derived by Expressions (15) and (16), respectively.
Predicted value of smoothing of data: S1.sub.0={330Y1.sub.1+275Y1.sub.2+220Y1.sub.3+165Y1.sub.4+110Y1.sub.5+55Y1.sub.6+0Y1.sub.7−55Y1.sub.8−110Y1.sub.9−165Y1.sub.10}/825 Expression (15)
Predicted value of slope of smoothed data: B1.sub.0={−45Y1.sub.1−35Y1.sub.2−25Y1.sub.3−15Y1.sub.4−5Y1.sub.5+5Y1.sub.6+15Y1.sub.7+25Y1.sub.8+35Y1.sub.9+45Y1.sub.10}/825 Expression (16)
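The coefficients in Expressions (15) and (16) agree with an ordinary least squares straight-line fit to ten equally spaced samples at t=1 to 10, evaluated at t=0. The following Python sketch reproduces them with exact rational arithmetic. The function name and its parameters are illustrative and do not appear in the embodiment.

```python
from fractions import Fraction


def linear_fit_weights(n_points, t_eval=0):
    """Weights that map Y1_1..Y1_n to the fitted value at t_eval and to the slope.

    Least squares line through samples at t = 1..n_points:
      slope = sum((t - t_mean) * Y_t) / sum((t - t_mean)^2)
      value = mean(Y) + (t_eval - t_mean) * slope
    """
    ts = range(1, n_points + 1)
    t_mean = Fraction(sum(ts), n_points)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope_w = [(t - t_mean) / sxx for t in ts]
    value_w = [Fraction(1, n_points) + (t_eval - t_mean) * w for w in slope_w]
    return value_w, slope_w


value_w, slope_w = linear_fit_weights(10)
print([int(w * 825) for w in value_w])
# [330, 275, 220, 165, 110, 55, 0, -55, -110, -165]
print([int(w * 825) for w in slope_w])
# [-45, -35, -25, -15, -5, 5, 15, 25, 35, 45]
```

Passing a different t_eval, such as t_eval=1, gives the weights for the alternative of using the fitted value at another time as the initial value.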
(31) An initial value S2.sub.1 of a predicted value of smoothing of data and an initial value B2.sub.1 of a predicted value of a slope of smoothed data are set to satisfy S2.sub.1=S1.sub.1 and B2.sub.1=0, respectively, in second double exponential smoothing processing illustrated in
(32) Here, the polynomial approximate expression has derived the predicted value S1.sub.0 of smoothing of data and the predicted value B1.sub.0 of a slope of smoothed data that are pieces of virtual data at t=0 just before the input data. Each of the predicted values derived by the polynomial approximate expression at arbitrary time, such as t=1, may be used as each of the initial values. Note that, in this case, there is a disadvantage in terms of, for example, data smoothing processing and data differential processing with short time steps when compared to a case where the virtual data at t=0 is used.
(33) As described above, the first double exponential smoothing processing is performed so as to acquire the predicted value S1.sub.t of smoothing of data of first output and the predicted value B1.sub.t of a slope of smoothed data of the first output. Next, a second double exponential smoothing processing is performed by using the following Expressions (17) and (18) with the predicted value B1.sub.t of a slope of smoothed data of the first output as second input data Y2.sub.t. Then, a predicted value S2.sub.t of smoothing of data of second output and a predicted value B2.sub.t of a slope of smoothed data of the second output are acquired.
Smoothing of data: S2.sub.t=α2Y2.sub.t+(1−α2)(S2.sub.t−1+B2.sub.t−1) Expression (17)
Slope of smoothed data: B2.sub.t=γ2(S2.sub.t−S2.sub.t−1)+(1−γ2)B2.sub.t−1 Expression (18)
(34) Data smoothing processing result data S1.sub.t, data first order differential processing result data S2.sub.t, and data second order differential processing result data B2.sub.t are output as collective or sequential data. Here, the smoothing parameter α2 of smoothing of data and the smoothing parameter γ2 of the slope of smoothed data in the second double exponential smoothing are set in advance to arbitrary constants satisfying 0<α2<1 and 0<γ2<1. The predicted value B1.sub.t of a slope of smoothed data of the first output also corresponds to the first order differential processing result and may be used as such. However, its variation is large. Thus, the second double exponential smoothing processing smooths the predicted value B1.sub.t of a slope of smoothed data.
(35) Combining the above type I adaptive double exponential smoothing method with double exponential smoothing processing performed twice can dramatically improve both the S/N ratio and the delay time in each of, for example, the data smoothing processing result, the data first order differential processing result, and the data second order differential processing result. In a case where a change point of a system to be an object is detected based on first differential data and second differential data and an apparatus to be an object is controlled close to real time with higher precision, data processing performance is improved in terms of a shorter time delay, better responsiveness, and a higher S/N ratio together. Thus, control accuracy of the apparatus to be an object can be improved.
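The cascade of two double exponential smoothing passes can be sketched in Python as follows. For brevity the sketch uses constant smoothing coefficients in both passes, whereas the embodiment uses an adaptive method for the first pass. The function names, the default coefficients, and the simple initial values are assumptions made for illustration only.

```python
def double_exp_smooth(y, alpha, gamma, s_init, b_init):
    """One pass of double exponential smoothing, in the form of Expressions (17) and (18)."""
    s, b = s_init, b_init
    S, B = [], []
    for y_t in y:
        s_new = alpha * y_t + (1 - alpha) * (s + b)   # smoothing of data
        b = gamma * (s_new - s) + (1 - gamma) * b     # slope of smoothed data
        s = s_new
        S.append(s)
        B.append(b)
    return S, B


def smooth_and_differentiate(y, alpha1=0.5, gamma1=0.5, alpha2=0.5, gamma2=0.5):
    """Return (S1, S2, B2): smoothed data, first derivative, second derivative.

    Assumption: both passes use fixed coefficients and start from
    the first sample with zero slope.
    """
    S1, B1 = double_exp_smooth(y, alpha1, gamma1, s_init=y[0], b_init=0.0)
    # The slope B1 of the first pass is the input Y2 of the second pass.
    S2, B2 = double_exp_smooth(B1, alpha2, gamma2, s_init=B1[0], b_init=0.0)
    return S1, S2, B2


# For a noiseless quadratic input t^2 the second derivative output settles at 2.
S1, S2, B2 = smooth_and_differentiate([float(t * t) for t in range(200)])
print(round(B2[-1], 6))
```

The derivative values are per sample interval; dividing by the sampling period converts them to physical units.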
(36) Accordingly, in the first double exponential smoothing processing illustrated in
Smoothing of data: S1.sub.t=α1.sub.tY1.sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (19)
Slope of smoothed data: B1.sub.t=α1.sub.t(S1.sub.t−S1.sub.t−1)+(1−α1.sub.t)B1.sub.t−1 Expression (20)
Smoothing coefficient: α1.sub.t=(K−L)F.sub.t+L Expression (21)
Response coefficient: F.sub.t=1−Exp[−δα.sub.t.sup.2/(2σ.sub.t.sup.2)] Expression (22)
Relative error: δα.sub.t=A1(Y1.sub.t−S1.sub.t)+(1−A1)δα.sub.t−1 Expression (23)
Predicted error variance: σ.sub.t.sup.2=A1(Y1.sub.t−S1.sub.t).sup.2+(1−A1)σ.sub.t−1.sup.2 Expression (24)
(37) where K, L, and A1 are arbitrary constants satisfying 1>K>L>0 and 1>A1>0. An initial value of σ.sub.t is calculated from the standard deviation of the error between the input data and a polynomial approximation of the initial several pieces of the input data, as in the above initial value deriving method. The response coefficient F.sub.t is acquired by subtracting a probability density function (normal distribution function, Gaussian distribution function) from 1. The error often follows a normal distribution. Accordingly, the response coefficient F.sub.t uses the probability density function representing a normal distribution, and the smoothing coefficient suitable for minimizing the error can be set in accordance with a data change.
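Expressions (19) to (24) can be sketched in Python as follows. As in any sequential implementation, the smoothing coefficient at step t must be available before S1.sub.t is computed; the sketch assumes that it is derived from the error statistics of the previous step. The function name, the default constants, and the guard for a zero variance are illustrative assumptions.

```python
import math


def adaptive_des_type2(y, s0, b0, sigma2_0, K=0.6, L=0.05, A1=0.1):
    """Sketch of the Gaussian-response variant, Expressions (19)-(24).

    Assumption: alpha at step t uses delta and sigma^2 from step t-1.
    Returns the smoothed data, its slope, and the alpha used at each step.
    """
    s, b = s0, b0
    delta, sigma2 = 0.0, sigma2_0
    S, B, alphas = [], [], []
    for y_t in y:
        # Expression (22): 1 minus a Gaussian of the relative error.
        F = 1.0 - math.exp(-delta ** 2 / (2.0 * sigma2)) if sigma2 > 0 else 0.0
        alpha = (K - L) * F + L                             # Expression (21)
        s_new = alpha * y_t + (1 - alpha) * (s + b)         # Expression (19)
        b = alpha * (s_new - s) + (1 - alpha) * b           # Expression (20)
        err = y_t - s_new
        delta = A1 * err + (1 - A1) * delta                 # Expression (23)
        sigma2 = A1 * err * err + (1 - A1) * sigma2         # Expression (24)
        s = s_new
        S.append(s)
        B.append(b)
        alphas.append(alpha)
    return S, B, alphas


# A step in the input raises alpha above its floor L shortly after the step.
S, B, alphas = adaptive_des_type2([5.0] * 30 + [15.0] * 30, 5.0, 0.0, 1.0)
print(alphas[0], max(alphas[30:]) > alphas[0])
```

While the input matches the prediction the relative error stays at zero and the coefficient sits at L, which favors smoothing; a sustained one-sided error increases F.sub.t and moves the coefficient toward K, which favors responsiveness.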
(38) As illustrated in
(39) Here, PDF represents a probability density function. The graph with 1-PDF has a response coefficient characteristic similar to that of the graph with N=5. In a case where the deviation/the standard deviation σ<approximately 1 is satisfied, the response coefficient is larger than that of the graph with N=5. In a case where the deviation/the standard deviation σ>approximately 1 is satisfied, the response coefficient is smaller than that of the graph with N=5. According to the response characteristic, there is an effect that data smoothing processing and data differential processing can be performed with data smoothing performance (S/N ratio) and data responsiveness (shortening of delay time) compatible with each other.
(40) In the above first double exponential smoothing processing, in a case of type III according to the present embodiment, data processing is performed by using the following Expressions (25) to (32).
Smoothing of data: S1.sub.t=α1.sub.tY1.sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (25)
Slope of smoothed data: B1.sub.t=α1.sub.t(S1.sub.t−S1.sub.t−1)+(1−α1.sub.t)B1.sub.t−1 Expression (26)
Smoothing coefficient: α1.sub.t=(K−L)F.sub.t+L Expression (27)
Response coefficient: F.sub.t=1−Exp[−δα.sub.t.sup.2/(2σ.sub.t.sup.2)] Expression (28)
Relative error: δα.sub.t=A1.sub.t(Y1.sub.t−S1.sub.t)+(1−A1.sub.t)δα.sub.t−1 Expression (29)
Predicted error variance: σ.sub.t.sup.2=A1.sub.t(Y1.sub.t−S1.sub.t).sup.2+(1−A1.sub.t)σ.sub.t−1.sup.2 Expression (30)
Adaptive addition coefficient: A1.sub.t=MAX(A1, β.sub.tA1.sub.max) Expression (31)
Slope coefficient: β.sub.t=1−Exp[−B.sub.t.sup.2/(2 NN C).sup.2] Expression (32)
where A1.sub.max is an upper limit value of the addition coefficient (0<A1.sub.max<1), and NN is a sensitivity coefficient and an arbitrary constant. Symbol C is a slope calculated from a result of a linear approximate expression using initial several pieces of data in input data like the above initial value deriving method. The type III is achieved by introducing the adaptive addition coefficient into the type II. Expressions (25) to (30) in the type III are similar to Expressions (19) to (24) in the type II.
(41) According to the present embodiment, the addition coefficient is changed in accordance with the change of the slope. In Expressions (29) and (30), the addition coefficient A1.sub.t acts as the coefficient of exponentially weighted average processing. That coefficient is therefore changed according to how the slope compares with the initial slope C. In a case where the slope B1.sub.t of smoothed data increases compared to the initial slope C, the adaptive addition coefficient A1.sub.t is increased, so that the weight of the latest data is effectively increased and responsiveness is improved. To obtain the adaptive addition coefficient A1.sub.t, the slope coefficient β.sub.t is first calculated by Expression (32).
(42) The slope coefficient β.sub.t is calculated by using a probability density function (normal distribution function, Gaussian distribution function) like the above response coefficient F.sub.t. In Expression (31), the set value A1 is used as the adaptive addition coefficient A1.sub.t in a normal state. When the slope of smoothed data increases and β.sub.tA1.sub.max becomes larger than the set value A1, the value of β.sub.tA1.sub.max is used as the addition coefficient. Note that the upper limit value A1.sub.max is set in order to keep the responsiveness from becoming excessive.
(43) With the adaptive addition coefficient A1.sub.t introduced, in a case where the slope of smoothed data is similar to the initial slope, the addition coefficient remains at the set value. In a case where the slope of smoothed data, which indicates a change of a data state, becomes large, the adaptive addition coefficient A1.sub.t is increased and the weight of the latest data becomes large. Thus, there is an effect that the responsiveness of data processing further improves.
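The selection of the adaptive addition coefficient in Expressions (31) and (32) can be sketched in Python as follows. The exponent is implemented literally as printed, with denominator (2 NN C) squared, and the slope argument is assumed to be the current slope of smoothed data. The function name and the default values of A1, A1.sub.max, and NN are illustrative assumptions, and C is assumed to be nonzero.

```python
import math


def adaptive_addition_coefficient(slope, C, A1=0.1, A1_max=0.5, NN=2.0):
    """Return (A1_t, beta_t) following Expressions (31) and (32).

    slope : current slope of smoothed data (assumed to be B_t in Expression (32))
    C     : initial slope from a linear fit of the first samples, assumed nonzero
    """
    beta = 1.0 - math.exp(-slope ** 2 / (2.0 * NN * C) ** 2)   # Expression (32)
    return max(A1, beta * A1_max), beta                         # Expression (31)


# Slope comparable to the initial slope: the set value A1 is kept.
print(adaptive_addition_coefficient(1.0, C=1.0)[0])
# Slope far larger than the initial slope: the coefficient approaches A1_max.
print(adaptive_addition_coefficient(100.0, C=1.0)[0])
```

The resulting A1.sub.t replaces the constant A1 in the relative error and predicted error variance updates of Expressions (29) and (30).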
(44) In the above first double exponential smoothing processing, in a case of type IV according to the present embodiment, data processing is performed by using the following Expressions (33) to (42).
Smoothing of data: S1.sub.t=α1.sub.tSGB0D.sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (33)
Slope of smoothed data: B1.sub.t=α1.sub.tSGB1D.sub.t+(1−α1.sub.t)B1.sub.t−1 Expression (34)
SGB0D.sub.t=(83Y1.sub.t+54Y1.sub.t−1+30Y1.sub.t−2+11Y1.sub.t−3−3Y1.sub.t−4−12Y1.sub.t−5−16Y1.sub.t−6−15Y1.sub.t−7−9Y1.sub.t−8+2Y1.sub.t−9+18Y1.sub.t−10)/143 Expression (35)
SGB1D.sub.t=(945S1.sub.t+456S1.sub.t−1+67S1.sub.t−2−222S1.sub.t−3−411S1.sub.t−4−500S1.sub.t−5−489S1.sub.t−6−378S1.sub.t−7−167S1.sub.t−8+144S1.sub.t−9+555S1.sub.t−10)/4290 Expression (36)
Smoothing coefficient: α1.sub.t=(K−L)F.sub.t+L Expression (37)
Response coefficient: F.sub.t=1−Exp[−δα.sub.t.sup.2/(2σ.sub.t.sup.2)] Expression (38)
Relative error: δα.sub.t=A1.sub.t(Y1.sub.t−S1.sub.t)+(1−A1.sub.t)δα.sub.t−1 Expression (39)
Predicted error variance: σ.sub.t.sup.2=A1.sub.t(Y1.sub.t−S1.sub.t).sup.2+(1−A1.sub.t)σ.sub.t−1.sup.2 Expression (40)
Adaptive addition coefficient: A1.sub.t=MAX(A1, β.sub.tA1.sub.max) Expression (41)
Slope coefficient: β.sub.t=1−Exp[−B.sub.t.sup.2/(2 NN C).sup.2] Expression (42)
(45) Here, SGB0D.sub.t is the data smoothing processing result at the point in time of the latest data Y1.sub.t, obtained by a backward Savitzky-Golay method described in James W Taylor, Journal of Forecasting, 2004, (23), pp 385-394, from the eleven successive pieces of input data Y1.sub.t to Y1.sub.t−10. Similarly, SGB1D.sub.t is the first order differential processing result at the point in time of the latest predicted value S1.sub.t of smoothing of data, obtained by the backward Savitzky-Golay method from the eleven successive predicted values S1.sub.t to S1.sub.t−10 of smoothing of data. Note that James W Taylor, Journal of Forecasting, 2004, (23), pp 385-394 may partially include an error. Therefore, the error has been corrected before use.
(46) Expressions (33), (34), (37) to (42) are substantially the same as Expressions (25) to (32) in the above type III. Expressions (5) and (6) that are basic parts of the double exponential smoothing method and the type I adaptive double exponential smoothing method, mean the following Expressions (43) and (44), respectively.
Smoothing of data: S1.sub.t=α1.sub.t(input data).sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (43)
Slope of smoothed data: B1.sub.t=γ1.sub.t(slope of a predicted value of smoothing of data).sub.t+(1−γ1.sub.t)B1.sub.t−1 Expression (44)
(47) In the type IV, a result to which data smoothing processing has been performed by using the backward Savitzky-Golay method, is used as pre-processing instead of input data in Expression (43). A result to which a first order differential processing has been performed by using the backward Savitzky-Golay method, is used as pre-processing instead of the slope of a predicted value of smoothed data in Expression (44). Typically, a polynomial fitting method (Savitzky-Golay method) sometimes uses a series of a plurality of pieces of data so as to derive a data processing result at the center point of a period of the series of the plurality of pieces of data. Here, this is defined as a center Savitzky-Golay method. In contrast, a case where a data processing result at a point in time of the latest data is derived, is defined as the backward Savitzky-Golay method.
(48) In a case of the center Savitzky-Golay method, the data processing result is derived at the center point of a series of data. Therefore, occurrence of delay time cannot be avoided in the data processing. In a Savitzky-Golay method, as the number of data to be used increases, an S/N ratio improves. In the center method, as the number of data to be used increases, the delay time increases. In a case of the backward Savitzky-Golay method, since the data processing result is derived at a point in time of the latest data, a delay of the data processing hardly occurs. However, the backward method decreases an S/N ratio when compared to the center method.
(49) On examination, comparing the center method with five pieces of data against the backward method with eleven pieces of data showed that substantially the same S/N ratios can be acquired. The Savitzky-Golay method uses a plurality of pieces of data so as to perform a polynomial approximation with a quadratic curve or a cubic curve. Thus, as the number of data increases, the Savitzky-Golay method can no longer follow a change of cubic or higher order occurring within the period spanned by the data used. That is, when the number of data to be used increases, there is a risk that high-frequency components are lost.
(50) In consideration of the above problems, in the type IV, data smoothing processing of the backward Savitzky-Golay method including eleven pieces of data is used for the input data in Expression (33). A data first order differential processing value of the backward Savitzky-Golay method including eleven pieces of data is used for the slope of a predicted value of smoothing of data in Expression (34). Accordingly, according to the present embodiment, there is an effect that S/N ratios of data smoothing processing and data differential processing improve without increasing data delay time, while keeping degradation of the frequency characteristic of the data processing as small as possible. In the type IV, the backward Savitzky-Golay method including eleven pieces of data has been used, but the number of data may be changed in accordance with the data processing performance required. Alternatively, a Savitzky-Golay method in which the data processing result is derived at a point in time between those of the center method and the backward method may be used.
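The weights in Expressions (35) and (36) agree with a least squares quadratic fit to eleven equally spaced samples, evaluated (for the value and for the first derivative) at the position of the latest sample. The following Python sketch derives them with exact rational arithmetic using polynomials that are orthogonal on a symmetric window. The function name is illustrative, and the window length is assumed to be odd.

```python
from fractions import Fraction


def backward_sg_weights(n_points=11):
    """Quadratic-fit weights evaluated at the newest sample of the window.

    Returns (smooth_w, deriv_w), each ordered newest sample first.
    Assumption: n_points is odd, so positions are x = -m..m with newest at x = m.
    """
    m = (n_points - 1) // 2
    xs = range(-m, m + 1)
    mean_x2 = Fraction(sum(x * x for x in xs), n_points)
    # Basis orthogonal over x = -m..m: 1, x, x^2 - mean(x^2), with derivatives.
    basis = [lambda x: Fraction(1),
             lambda x: Fraction(x),
             lambda x: Fraction(x * x) - mean_x2]
    dbasis = [lambda x: Fraction(0),
              lambda x: Fraction(1),
              lambda x: Fraction(2 * x)]
    norms = [sum(p(x) ** 2 for x in xs) for p in basis]
    x0 = m  # evaluate at the latest data point
    smooth_w = [sum(p(x0) * p(x) / n for p, n in zip(basis, norms))
                for x in reversed(xs)]
    deriv_w = [sum(dp(x0) * p(x) / n for p, dp, n in zip(basis, dbasis, norms))
               for x in reversed(xs)]
    return smooth_w, deriv_w


smooth_w, deriv_w = backward_sg_weights(11)
print([int(w * 143) for w in smooth_w])
# [83, 54, 30, 11, -3, -12, -16, -15, -9, 2, 18]
print([int(w * 4290) for w in deriv_w])
# [945, 456, 67, -222, -411, -500, -489, -378, -167, 144, 555]
```

Changing x0 to a position between 0 and m gives weights for a result derived at a point in time between those of the center method and the backward method.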
(51) Furthermore, as pre-processing other than the Savitzky-Golay method, a result in which data smoothing processing has been performed to input data, may be used for the input data in Expression (43). A result in which data first order differential processing has been performed to a predicted value of smoothing of data, may be used for the slope of a predicted value of smoothing of data in Expression (44).
(53) The processing chamber 11 is arranged in a region in which coils 17 and 18 and a yoke 19 generate a magnetic field. A microwave having a frequency of 2.45 GHz oscillated by a magnetron 20 propagates, in a rectangular TE10 mode, into a rectangular waveguide 22 through an isolator (not illustrated), a power monitor (not illustrated), and a matching unit 21. Then, the microwave propagates, in a circular TE11 mode, into a circular waveguide 24 through a circle/rectangle converter 23. After that, the microwave is introduced to a cavity resonator 25 and passes through the quartz plate 9 and the quartz shower plate 15 so as to enter into the processing chamber 11. A magnetic field region with a magnetic flux density of 875 Gauss causes electron cyclotron resonance, together with the microwave having a frequency of 2.45 GHz to be introduced. Inside the processing chamber 11, the magnetic field region is formed perpendicular to both the center axis of the processing chamber 11 and a direction in which the microwave is introduced. In addition, the magnetic field region is formed on an entire surface in a cross-sectional direction with respect to the center axis of the processing chamber 11.
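The pairing of 2.45 GHz with 875 Gauss follows from the electron cyclotron frequency f=eB/(2πm). The following Python sketch checks the arithmetic. The function name is illustrative, and the physical constants are the standard SI values for the elementary charge and the electron mass.

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge in coulombs
ELECTRON_MASS_KG = 9.1093837015e-31   # electron rest mass in kilograms


def electron_cyclotron_frequency_hz(b_gauss):
    """Electron cyclotron frequency f = e * B / (2 * pi * m) for B given in Gauss."""
    b_tesla = b_gauss * 1e-4  # 1 Gauss = 1e-4 Tesla
    return ELECTRON_CHARGE_C * b_tesla / (2.0 * math.pi * ELECTRON_MASS_KG)


# 875 Gauss gives a frequency close to the 2.45 GHz microwave.
print(electron_cyclotron_frequency_hz(875.0) / 1e9)
```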
(54) Etching processing is performed to a wafer 27 disposed on a wafer disposing electrode 26 that is a sample stage, by using plasma mainly generated by interaction between the microwave having a frequency of 2.45 GHz and a magnetic field having a magnetic flux density of 875 Gauss. In order to control an etching shape of the wafer 27 that is a sample, the wafer disposing electrode 26 is coupled to a radio frequency power source 28 through a matching unit (not illustrated) so that a radio frequency voltage can be applied. A chiller unit (not illustrated) is coupled to the wafer disposing electrode 26 so that a temperature of the wafer 27 can be controlled.
(55) Each of the processing chamber 11, the wafer 27, and the wafer disposing electrode 26 is coaxially disposed. Each of a gas hole region of the quartz shower plate 15 introducing the etching gas, the on-off valve 12 for exhaust that is an evacuation unit, the exhaust speed variable valve 16, and the evacuation device 13 is also coaxially disposed with respect to the processing chamber 11. Accordingly, a gas flow is coaxially symmetric on the wafer 27. The coils 17 and 18 and the yoke 19 that generate a magnetic field are also coaxially disposed with respect to the processing chamber 11. Thus, a magnetic field profile and an electron cyclotron resonance region having a magnetic flux density of 875 Gauss in the processing chamber 11 are coaxially formed with respect to the processing chamber 11. The circular waveguide 24 and the cavity resonator 25 are also coaxially disposed with respect to the processing chamber 11. Thus, the microwave to be introduced into the processing chamber 11 is also coaxially introduced with respect to the processing chamber 11.
(56) The magnetic field is coaxially generated with respect to the processing chamber 11 and the microwave is also coaxially introduced with respect to the processing chamber 11. Thus, the plasma formed by the interaction between the magnetic field and the microwave is coaxially generated with respect to the processing chamber 11. Accordingly, electrons and ions in the plasma are coaxially transported with respect to the wafer 27. A flow of the etching gas is also coaxial with respect to the processing chamber 11. Thus, radicals generated by the plasma and a reaction product due to etching of the wafer 27 are also coaxially introduced and discharged with respect to the wafer 27. Therefore, etching processing can be performed with uniform etching performance, such as an etching rate, a material selection ratio, and an etching shape, across the surface of the wafer.
(57) Light emission, from the side of the processing chamber 11, from the plasma generated in the processing chamber 11, passes through the quartz window 10 and an optical fiber 29. Then, the light emission is introduced into a spectroscope 30 so as to be output as time series data of wavelength dependency of light intensity. Light emission from the plasma from an upper part of the processing chamber 11 passes through the quartz shower plate 15, the quartz plate 9, the cavity resonator 25, the circular waveguide 24, the circle/rectangle converter 23, and an optical fiber 31. Then, the plasma light emission is introduced into a spectroscope 32 so as to be output as time series data of wavelength dependency of light intensity.
(58) The etching gas and the etching reaction product from the wafer 27 are introduced into the processing chamber 11. The interaction between the microwave and the magnetic field dissociates them so as to generate the plasma. Accordingly, light emission from the plasma generated in the processing chamber 11 includes information on atoms, molecules, and radicals included in the etching gas and the etching reaction product, and on reactants of the atoms, the molecules, and the radicals.
(59) For example, in typical poly-Si etching of an Si substrate on which a poly-Si film and an SiO.sub.2 film are disposed below a pattern mask, the poly-Si is required to be etched with a high selection ratio with respect to the lower layer SiO.sub.2. A halogen based gas is used as the etching gas. The etching reaction product includes Si, which is the material to be etched, and a halogen. Since the etching reaction product is dissociated again by the plasma, the spectroscope 30 or the spectroscope 32 monitors the intensity of light emitted from the plasma at a wavelength of 288 nm resulting from the Si.
(60) In this case, when the etching of the poly-Si film has been completed and the lower layer SiO.sub.2 appears, the plasma light emission intensity at a wavelength of 288 nm resulting from the Si decreases sharply and then approaches a constant value, since the etching rate of the lower layer SiO.sub.2 is small. This change of the plasma light emission is monitored so that an end point of the etching processing is detected.
(61) The light emission of the plasma from the side of the processing chamber 11 includes information on the etching gas and the etching reaction product. Meanwhile, the light emission of the plasma from the upper part of the processing chamber 11 includes, in addition to the above information, information on a film structure and a step structure of the wafer 27 since the plasma light causes interference due to the film structure and the step structure of the wafer 27. Analyzing the light emission data of the plasma can monitor the thickness of the film and the depth of etching during the etching. According to the present embodiment, for simplicity, light emission data of the plasma from the side of the processing chamber 11 is used for monitoring an etching end point.
(62)
Y1.sub.t=H/[1+exp{−A(t−T)}]+Ct+D+F(R−0.5) Expression (45)
(63) where H, A, T, C, D, and F are arbitrary constants, and R is a random number between 0 and 1. Since the analytical true values of the data smoothing processing, the first order differential processing, and the second order differential processing are known, the above evaluation function can be used to compare and evaluate data processing performance, such as an absolute error, delay time caused by the data processing, and an S/N ratio (signal/noise ratio), with respect to the true values for various data processing methods.
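As an illustrative sketch only, Expression (45) and its analytical first and second derivatives (the "true values" against which a data processing method may be compared) can be written in Python as follows. The function names, the separate noiseless helper, and the use of Python's random module for R are assumptions introduced here for illustration and do not appear in the embodiment.

```python
import math
import random


def noiseless_signal(t, H, A, T, C, D):
    """Expression (45) without the noise term F(R - 0.5)."""
    return H / (1.0 + math.exp(-A * (t - T))) + C * t + D


def evaluation_signal(t, H, A, T, C, D, F, rng=random):
    """Expression (45): sigmoid step + linear drift + offset + uniform noise."""
    R = rng.random()  # random number in [0, 1)
    return noiseless_signal(t, H, A, T, C, D) + F * (R - 0.5)


def true_first_derivative(t, H, A, T, C):
    """Analytical d/dt of the noiseless signal."""
    s = 1.0 / (1.0 + math.exp(-A * (t - T)))
    return H * A * s * (1.0 - s) + C


def true_second_derivative(t, H, A, T):
    """Analytical second derivative; it is zero at t = T (the change point)."""
    s = 1.0 / (1.0 + math.exp(-A * (t - T)))
    return H * A * A * s * (1.0 - s) * (1.0 - 2.0 * s)


if __name__ == "__main__":
    # Hypothetical constants: a falling step centred at t = 30 with mild noise.
    rng = random.Random(0)
    data = [evaluation_signal(t, -2.0, 0.5, 30, 0.01, 5.0, 0.2, rng)
            for t in range(60)]
    print(len(data), true_first_derivative(30, -2.0, 0.5, 30, 0.01))
```

With these definitions the first derivative has its extremum at t=T and the second derivative crosses zero there, which is what makes the function convenient for measuring delay time.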
(64) In the typical data processing flow for detecting the etching end point illustrated in
(65) The change point can be determined progressively more clearly and simply from the peak of the first order differential processing and then from the zero cross of the second order differential processing. However, the absolute value of the signal intensity decreases at each successive differential. Accordingly, data processing with a high S/N ratio is important. In particular, in a case where a mask pattern having a small area to be etched, with a low aperture ratio, is etched, the change of the plasma light emission intensity before and after the etching end point is small. Thus, data processing with an even higher S/N ratio is required.
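The following Python sketch illustrates, under simplifying assumptions, how a change point may be located from the peak of a first order differential and the zero cross of a second order differential. It uses a plain central difference on noiseless data rather than the smoothing and differential processing of the embodiment, and the function names are hypothetical.

```python
import math


def central_difference(y, dt=1.0):
    """Central-difference derivative; the two end values copy their neighbours."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 samples")
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (y[i + 1] - y[i - 1]) / (2.0 * dt)
    d[0] = d[1]
    d[-1] = d[-2]
    return d


def find_change_point(y, dt=1.0):
    """Return (index of the first-derivative peak, index of the nearest
    second-derivative zero cross). The zero cross index i means the sign
    changes between samples i - 1 and i."""
    d1 = central_difference(y, dt)
    d2 = central_difference(d1, dt)
    peak = max(range(len(d1)), key=lambda i: abs(d1[i]))
    crossings = [i for i in range(1, len(d2))
                 if d2[i - 1] * d2[i] <= 0.0 and d2[i - 1] != d2[i]]
    zero_cross = min(crossings, key=lambda i: abs(i - peak)) if crossings else None
    return peak, zero_cross


if __name__ == "__main__":
    # Hypothetical noiseless emission trace that falls around t = 50.
    trace = [1.0 - 1.0 / (1.0 + math.exp(-0.5 * (t - 50))) for t in range(101)]
    print(find_change_point(trace))
```

On noisy data a plain difference of this kind amplifies the noise at each differentiation step, which is the reason the text calls for processing with a high S/N ratio.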
(66) Typically, in data smoothing processing and data differential processing, as the S/N ratio increases, the delay time lengthens and the absolute value error with respect to the true value increases. That is, there is a trade-off relationship between the S/N ratio on one hand and the delay time and the absolute value error on the other. Data smoothing processing and data differential processing that simultaneously satisfy requirements on the S/N ratio, the delay time, and the absolute value error are required.
(67) According to the present embodiment, data processing in
(68) Pieces of output data from the spectroscope 30 and the spectroscope 32 are transmitted to the data processing apparatus 1. A data smoothing processing result, a data first order differential result, and a data second order differential result are transmitted to the system control apparatus 33 that is a control apparatus. Based on the data smoothing processing result, the data first order differential result, and the second order differential result, the system control apparatus 33 determines the etching end point so as to control the microwave plasma etching apparatus with magnetic field as a system. Plasma production is mainly controlled in the determination of the etching end point. In
(69)
(70) In
(71)
(72) Regarding the second order differential processing of Ds, a result in
(73) Delay time at the second order differential wave form zero cross time in
(74) Therefore, it can be found that the first order differential peak point (time) and the second order differential zero cross point (time) to be reference for determining an etching end point can be clearly detected. Accordingly, according to the embodiment of the present invention, there is an effect that the data smoothing processing and the data differential processing can be sequentially performed in real time with the small absolute value error, the high S/N ratio, and the short delay time.
(75) Regarding the first order differential processing of Bs, a result in
(76) In a case where only the data smoothing processing result and the data first order differential result are required, according to the present embodiment, the first double exponential smoothing processing acquires a first order differential smooth wave form due to the pre-processing effect by the backward Savitzky-Golay method. Thus, the pieces of first output data (S1.sub.t) and (B1.sub.t) may be used without performing the second double exponential smoothing processing. In this case, there is an effect that a data processing program becomes simple and data processing speed is improved.
(77) Type V is defined as a case where the backward Savitzky-Golay method is applied to the type I adaptive double exponential smoothing method illustrated in Expressions (5) to (14). The type V is illustrated by the following Expressions (46) to (57).
Smoothing of data: S1.sub.t=α1.sub.tSGB0D.sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (46)
Slope of smoothed data: B1.sub.t=γ1.sub.tSGB1D.sub.t+(1−γ1.sub.t)B1.sub.t−1 Expression (47)
SGB0D.sub.t=(83Y1.sub.t+54Y1.sub.t−1+30Y1.sub.t−2+11Y1.sub.t−3−3Y1.sub.t−4−12Y1.sub.t−5−16Y1.sub.t−6−15Y1.sub.t−7−9Y1.sub.t−8+2Y1.sub.t−9+18Y1.sub.t−10)/143 Expression (48)
SGB1D.sub.t=(945S1.sub.t+456S1.sub.t−1+67S1.sub.t−2−222S1.sub.t−3−411S1.sub.t−4−500S1.sub.t−5−489S1.sub.t−6−378S1.sub.t−7−167S1.sub.t−8+144S1.sub.t−9+555S1.sub.t−10)/4290 Expression (49)
Smoothing coefficient: α1.sub.t=(K.sub.α−L.sub.α)F.sub.α+L.sub.α Expression (50)
Response coefficient: F.sub.α=(|δα.sub.t/Δα.sub.t|+φ).sup.N Expression (51)
Relative error: δα.sub.t=A1(Y1.sub.t−S1.sub.t)+(1−A1)δα.sub.t−1 Expression (52)
Absolute error: Δα.sub.t=A1|Y1.sub.t−S1.sub.t|+(1−A1)Δα.sub.t−1+φ Expression (53)
Smoothing coefficient: γ1.sub.t=(K.sub.γ−L.sub.γ)F.sub.γ+L.sub.γ Expression (54)
Response coefficient: F.sub.γ=(|δγ.sub.t/Δγ.sub.t|+φ).sup.N Expression (55)
Relative error: δγ.sub.t=A2{(S1.sub.t−S1.sub.t−1)−B1.sub.t}+(1−A2)δγ.sub.t−1 Expression (56)
Absolute error: Δγ.sub.t=A2|(S1.sub.t−S1.sub.t−1)−B1.sub.t|+(1−A2)Δγ.sub.t−1+φ Expression (57)
(78) Responsiveness of the type V is worse than that of the type IV. However, there is an effect that a data processing program becomes simple. A form may be selected with a level and complexity necessary in accordance with performance of data processing to be required.
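The backward Savitzky-Golay pre-processing of Expressions (48) and (49) can be sketched in Python as below. The weights and divisors are copied from those expressions; the function names and the list-based history are assumptions made for illustration. The weights reproduce constants, ramps, and parabolas exactly at the newest sample, which appears consistent with a second order polynomial fit over the latest 11 points, and the slope is expressed per sampling interval.

```python
# Weights copied from Expressions (48) and (49).
# Index 0 multiplies the newest sample (time t), index 10 the sample at t-10.
SGB0_WEIGHTS = (83, 54, 30, 11, -3, -12, -16, -15, -9, 2, 18)
SGB0_DIVISOR = 143
SGB1_WEIGHTS = (945, 456, 67, -222, -411, -500, -489, -378, -167, 144, 555)
SGB1_DIVISOR = 4290


def backward_filter(history, weights, divisor):
    """Apply a one-sided filter to the latest len(weights) samples.

    history is ordered oldest first, so history[-1] is the sample at time t.
    """
    n = len(weights)
    if len(history) < n:
        raise ValueError("need at least %d samples" % n)
    newest_first = history[-1:-n - 1:-1]
    return sum(w * v for w, v in zip(weights, newest_first)) / divisor


def sgb0(history):
    """Smoothed value at the newest sample, as in Expression (48)."""
    return backward_filter(history, SGB0_WEIGHTS, SGB0_DIVISOR)


def sgb1(history):
    """Slope at the newest sample per sampling interval, as in Expression (49)."""
    return backward_filter(history, SGB1_WEIGHTS, SGB1_DIVISOR)


if __name__ == "__main__":
    ramp = [2.0 * t + 1.0 for t in range(11)]
    print(sgb0(ramp), sgb1(ramp))
```

In Expression (49) the slope filter is applied to the smoothed series S1 rather than to the raw input Y1, so in a full implementation the history passed to sgb1 would hold S1 values.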
(79) Furthermore, type VI is defined as a case where the backward Savitzky-Golay method is applied to the type II adaptive double exponential smoothing method illustrated in Expressions (19) to (24). The type VI is illustrated by the following Expressions (58) to (65).
Smoothing of data: S1.sub.t=α1.sub.tSGB0D.sub.t+(1−α1.sub.t)(S1.sub.t−1+B1.sub.t−1) Expression (58)
Slope of smoothed data: B1.sub.t=α1.sub.tSGB1D.sub.t+(1−α1.sub.t)B1.sub.t−1 Expression (59)
SGB0D.sub.t=(83Y1.sub.t+54Y1.sub.t−1+30Y1.sub.t−2+11Y1.sub.t−3−3Y1.sub.t−4−12Y1.sub.t−5−16Y1.sub.t−6−15Y1.sub.t−7−9Y1.sub.t−8+2Y1.sub.t−9+18Y1.sub.t−10)/143 Expression (60)
SGB1D.sub.t=(945S1.sub.t+456S1.sub.t−1+67S1.sub.t−2−222S1.sub.t−3−411S1.sub.t−4−500S1.sub.t−5−489S1.sub.t−6−378S1.sub.t−7−167S1.sub.t−8+144S1.sub.t−9+555S1.sub.t−10)/4290 Expression (61)
Smoothing coefficient: α1.sub.t=(K−L)F.sub.t+L Expression (62)
Response coefficient: F.sub.t=1−Exp[−δα.sub.t.sup.2/(2σ.sub.t.sup.2)] Expression (63)
Relative error: δα.sub.t=A1(Y1.sub.t−S1.sub.t)+(1−A1)δα.sub.t−1 Expression (64)
Predicted error variance: σ.sub.t.sup.2=A1(Y1.sub.t−S1.sub.t).sup.2+(1−A1)σ.sub.t−1.sup.2 Expression (65)
(80) Responsiveness of the type VI is also worse than that of the type IV. However, there is an effect that a data processing program becomes simple. In a case where N=0 is satisfied in the type V, a form in which the backward Savitzky-Golay method has been applied to the double exponential smoothing method with a fixed smoothing coefficient, is acquired. Thus, the simplest form is made. A form may be selected with a level and complexity necessary in accordance with performance of data processing to be required.
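The way the response coefficient of Expressions (62) to (65) reacts to the error history can be sketched in Python as follows. The sketch takes the error Y1.sub.t−S1.sub.t as an input and does not fix the order in which S1.sub.t and α1.sub.t are evaluated within one time step. The function names, the zero initial values, and the handling of a zero variance are assumptions made for illustration.

```python
import math


def update_error_stats(err, delta_prev, var_prev, A1):
    """Exponentially weighted relative error and predicted error variance."""
    delta = A1 * err + (1.0 - A1) * delta_prev      # Expression (64)
    var = A1 * err * err + (1.0 - A1) * var_prev    # Expression (65)
    return delta, var


def response_coefficient(delta, var):
    """Expression (63): F = 1 - exp(-delta^2 / (2 * var))."""
    if var <= 0.0:
        return 0.0  # assumption: no error history yet, so no response
    return 1.0 - math.exp(-(delta * delta) / (2.0 * var))


def smoothing_coefficient(F, K, L):
    """Expression (62): maps F onto the range between L and K."""
    return (K - L) * F + L


def response_after(errors, A1):
    """Hypothetical helper: feed a sequence of errors and return the final F."""
    delta, var = 0.0, 0.0
    for err in errors:
        delta, var = update_error_stats(err, delta, var, A1)
    return response_coefficient(delta, var)


if __name__ == "__main__":
    trend = [1.0] * 50                       # errors of one sign: a real change
    noise = [(-1.0) ** k for k in range(50)]  # alternating errors: noise
    print(response_after(trend, 0.2), response_after(noise, 0.2))
```

Errors that keep the same sign leave a large relative error compared with the variance, so F and hence the smoothing coefficient rise; alternating errors cancel in the relative error, so F stays small and the smoothing stays strong.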
(81) The type II is made by introducing a data adaptive smoothing coefficient using a probability density function into the type I adaptive double exponential smoothing method. The type III is made by introducing a data adaptive addition coefficient into the type II. The type IV is made by introducing pre-processing of the backward Savitzky-Golay method into the type III. Like the type V, the data adaptive smoothing coefficient using the probability density function and the data adaptive addition coefficient that are improvement elements in the type II and the type III, respectively, may be individually applied to the type I adaptive double exponential smoothing method. The data adaptive smoothing coefficient using the probability density function, the data adaptive addition coefficient, and the backward Savitzky-Golay method may be appropriately combined and may be applied to the type I adaptive double exponential smoothing method. As described above, a form may be selected with a level and complexity necessary in accordance with performance of data processing to be required.
Second Embodiment
(82) As described above, in the data smoothing processing and the data differential processing, there is a trade-off relationship between the S/N ratio performance improvement and the data responding performance improvement (shortening of delay time). Accordingly, in the above type I adaptive double exponential smoothing method, or the data smoothing processing methods and the data differential processing methods of, for example, the type II, the type III, the type IV, the type V, and the type VI described in the first embodiment, a parameter for each of the data processing methods is required to be optimized in accordance with input data to which data processing is performed.
(83) In the related art, the data processing is repeated while a parameter of the data processing is sequentially changed. The data smoothing wave form, the data differential wave forms, and numerical data, such as an S/N ratio and delay time, are then reviewed together so as to find an optimum parameter. However, this method takes a long time to find the optimum parameter. There is also a problem that, for example, knowledge and experience of data processing are required in order to shorten the time taken for finding the optimum parameter.
(84) James W Taylor, Journal of Forecasting, 2004, (23), pp 385-394 discloses the method for estimating an optimum smoothing parameter by minimizing a total sum of errors of a one-period predicted value by the simple exponential smoothing method. However, this method does not take smoothness of a curve of the predicted values into consideration. Thus, there is a problem that noise is large and an S/N ratio is not excellent in first order differential processing and second order differential processing that detect a change point.
(85) Therefore, a method for simply and automatically, in a short time, finding an optimum parameter necessary for performing data smoothing processing and data differential processing to input data, will be described.
(86) First, an evaluation function W in the following Expression (66) is used.
Evaluation function W=mean square error E+coefficient λ×second order differential mean square D Expression (66)
(87) Here, the mean square error E is the mean square error between the input data and the data smoothing processing result, and evaluates fitness of the data smoothing processing result with respect to the input data. The second order differential mean square D evaluates curve smoothness of the data smoothing processing wave form. The coefficient λ is an arbitrary numerical value, and adjusts the relative importance of the above fitness evaluation with respect to the input data and the above curve smoothness evaluation.
(88) As the fitness of the data smoothing processing result with respect to the input data increases, for example, overshoot no longer occurs. In addition, responsiveness of the data processing is excellent and delay time of the data processing is small. As the curve of the data smoothing processing wave form becomes smoother, each of the data first order differential processing wave form and the data second order differential processing wave form also becomes a smooth curve. As a result, the S/N ratio of the data first order differential processing and the S/N ratio of the data second order differential processing improve. The curve smoothness is evaluated by the second order differential mean square D of the data smoothing processing wave form. Here, the second order differential is calculated by a difference method.
(89) As the data smoothing processing wave form comes close to a straight line, the value of the second order differential mean square D decreases. Therefore, in a case where not only the second order differential mean square D is small but the value of the mean square error E is also small, that is, where curve smoothness and fitness of the data smoothing processing result with respect to the input data are compatible with each other, the optimum processing has been performed with an excellent S/N ratio and excellent responsiveness of the data processing (small delay time). The evaluation function W used in the present embodiment is given by the following Expression (67).
W=Σ(Y1.sub.t−S1.sub.t).sup.2/N+λ×Σ{(S1.sub.t+1−2×S1.sub.t+S1.sub.t−1)/ΔT.sup.2}.sup.2/N Expression (67)
(90) where N is the number of data, and ΔT is sampling time (time interval) of the input data. The optimum parameter for each of the data smoothing processing and the data differential processing is derived by using a gradient method, such as a steepest descent method, so that the evaluation function W becomes a minimum value. In order to illustrate a characteristic of the evaluation function W in a two-dimensional graph, an example of the simple type I adaptive double exponential smoothing method will be described using
(91)
(92)
(93) As illustrated in
(94) The optimum value of the parameter with the x mark illustrated in
(95) In the present embodiment, the description has been given, for convenience, for a case in which one parameter is changed. A plurality of parameters can be derived by using a gradient method, such as a steepest descent method. In this case, the initial values of the parameters and the searching range are required to be examined in consideration of whether a local minimum value (optimum value) exists. Also for convenience, the description has been given for the simple type I adaptive double exponential smoothing method. For the above data processing methods, such as the type II, the type III, the type IV, the type V, and the type VI, or other data smoothing processing, such as the simple exponential smoothing method (exponential weighted moving average: EWMA), an adaptive simple exponential smoothing method, a low pass filter, and a Kalman filter, or data differential processing, such as the difference method, an optimum parameter at which an S/N ratio and responsiveness are compatible with each other can likewise be found by minimizing an evaluation function W.
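A minimal Python sketch of the evaluation function W of Expression (67), with the second order difference taken as S1.sub.t+1−2×S1.sub.t+S1.sub.t−1, is given below together with a one-parameter search. Several simplifications are assumptions made for illustration: a fixed-coefficient double exponential smoother stands in for the adaptive methods, a coarse grid scan stands in for the gradient method, and all function names and data are hypothetical.

```python
def evaluation_w(y, s, lam, dt=1.0):
    """W = mean square error E + lam * second order differential mean square D."""
    n = len(y)
    e = sum((yi - si) ** 2 for yi, si in zip(y, s)) / n
    d = sum(((s[i + 1] - 2.0 * s[i] + s[i - 1]) / dt ** 2) ** 2
            for i in range(1, n - 1)) / n
    return e + lam * d


def double_exp_smooth(y, alpha, gamma):
    """Fixed-coefficient double exponential smoothing ("current estimation")."""
    s = [y[0]]
    b = 0.0
    for t in range(1, len(y)):
        s_new = alpha * y[t] + (1.0 - alpha) * (s[-1] + b)
        b = gamma * (s_new - s[-1]) + (1.0 - gamma) * b
        s.append(s_new)
    return s


def scan_alpha(y, lam, gamma=0.1, steps=20):
    """Grid scan over alpha in (0, 1]; returns (best alpha, W at that alpha)."""
    grid = [k / steps for k in range(1, steps + 1)]
    best = min(grid,
               key=lambda a: evaluation_w(y, double_exp_smooth(y, a, gamma), lam))
    return best, evaluation_w(y, double_exp_smooth(y, best, gamma), lam)


if __name__ == "__main__":
    noisy = [float(t % 2) for t in range(40)]   # hypothetical pure-noise input
    print(scan_alpha(noisy, 0.0))    # fitness only: alpha = 1 reproduces the input
    print(scan_alpha(noisy, 10.0))   # smoothness weighted in: a smaller alpha wins
```

A grid scan was chosen here only because it is short and has no convergence settings; for several parameters a gradient method with examined initial values and searching range, as described above, would be the natural replacement.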
(96) According to the present embodiment, a second order differential value of the data smoothing processing result has been used for the second order differential mean square D in Expressions (66) and (67). A second order differential mean square of a first order differential processing result or a second order differential processing result may be used instead. In this case, curve smoothness of a first order differential wave form and curve smoothness of a second order differential wave form are individually and directly evaluated. Compared to the data smoothing processing result, however, the first order differential processing result and the second order differential processing result successively decrease in absolute value. Thus, the coefficient λ in Expressions (66) and (67) is required to be increased accordingly.
(97) In the adaptive simple exponential smoothing method by Expressions (1) to (4) described in James W Taylor, Journal of Forecasting, 2004, (23), pp 385-394, the input data Y1.sub.t at current time t and the predicted value S1.sub.t of smoothing of data at current time t are used so as to derive the predicted value S1.sub.t+1 of smoothing of data at one-period ahead time t+1. Meanwhile, in the type I to type VI embodiments, the input data Y1.sub.t at current time t and the predicted value S1.sub.t−1 of smoothing of data at one-period previous time t−1 are used so as to derive the predicted value S1.sub.t of smoothing of data at current time t. The former can be referred to as “one-period prediction” and the latter can be referred to as “current estimation”. Even in a case of the type I to type VI embodiments, the “current estimation” may be changed to the “one-period prediction” by, for example, conversion from S1.sub.t−1 to S1.sub.t or from S1.sub.t to S1.sub.t+1.
(98) Note that typically the “current estimation” is better than the “one-period prediction” in terms of accuracy of a predicted value of smoothing of data. James W Taylor, Journal of Forecasting, 2004, (23), pp 385-394 discloses the method for estimating an optimum smoothing parameter by minimizing a total sum of errors of a one-period predicted value by the “one-period prediction” simple exponential smoothing method.
(99) In a case of the “current estimation”, when a smoothing coefficient is set to be 1, Y1.sub.t=S1.sub.t is satisfied. An error or a mean square error becomes zero. Thus, an optimum smoothing parameter cannot be estimated by using, for example, a steepest descent method. However, even when the “current estimation” is used, as illustrated in the embodiment, using the evaluation function W in Expression (66) taking a mean square error and curve smoothness into account can derive an optimum smoothing parameter.
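The difference between the two forms, and the reason a mean square error alone cannot select the smoothing coefficient under "current estimation", can be illustrated with the Python sketch below. It uses simple exponential smoothing with a fixed coefficient rather than the adaptive method of the cited reference; the function names, the initial values, and the sample data are assumptions made for illustration.

```python
def current_estimation(y, alpha):
    """s[t] estimates y[t] using y[t] itself and s[t-1]."""
    s = [y[0]]
    for t in range(1, len(y)):
        s.append(alpha * y[t] + (1.0 - alpha) * s[-1])
    return s


def one_period_prediction(y, alpha):
    """f[t] is the forecast of y[t] made at time t-1 from y[t-1] and f[t-1]."""
    f = [y[0]]  # assumption: the first forecast equals the first sample
    for t in range(1, len(y)):
        f.append(alpha * y[t - 1] + (1.0 - alpha) * f[-1])
    return f


def mean_square_error(y, s):
    return sum((yi - si) ** 2 for yi, si in zip(y, s)) / len(y)


if __name__ == "__main__":
    y = [1.0, 3.0, 2.0, 5.0]  # hypothetical input data
    # With alpha = 1 the current estimate copies the input, so the error is zero
    # and minimizing the error alone would always select alpha = 1.
    print(mean_square_error(y, current_estimation(y, 1.0)))     # 0.0
    # The one-period forecast lags by one sample, so its error stays positive.
    print(mean_square_error(y, one_period_prediction(y, 1.0)))  # 3.5
```

Adding the curve smoothness term of Expression (66) removes this degenerate minimum, because a smoothed series that copies noisy input has a large second order differential mean square.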
(100) According to the present embodiment, there is an effect that the optimum parameter for performing the data smoothing processing and the data differential processing to the input data can be automatically found in a short time without depending on knowledge and experience of data processing. Accordingly, there is an effect that a processing apparatus including the data processing apparatus, the data processing method, and the control apparatus for controlling the processing chamber, can be provided, the processing apparatus being easily used by general users, namely, having excellent usability.
(101) As described above, the embodiment of the present invention is particularly effective for, for example, an etching process performed in a short time and for detecting an end point of etching accompanied by a change in a short time. The number of processes for etching multilayer thin films is increasing in semiconductor etching with high integration and miniaturization of semiconductor devices. Detecting an etching end point in an etching process performed in a short time and in an etching step accompanied by a change in a short time has therefore become important.
(102) For the short-time process and the short-time change process, an embodiment of the present invention improves overall performance so that an S/N ratio improvement and a responsiveness improvement are compatible with each other, which allows a clear first order differential wave form and a clear second order differential wave form to be detected. Accordingly, an end point of etching can be determined with high precision. Based on this, a process in a plasma processing chamber is controlled so that micromachining can be performed to a semiconductor wafer with stable performance and high precision.
(103) In the above embodiments, detailed descriptions have been given of cases in which the data processing apparatus and the data processing method according to the embodiments of the present invention are applied to detection of an etching end point in a microwave plasma etching apparatus so that the etching is performed with high precision. The data processing apparatus and the data processing method according to the embodiments of the present invention may also be applied to, for example, etching apparatuses and deposition apparatuses using other plasma generating methods (for example, inductive coupling type or parallel plate type), or processing apparatuses and other apparatuses in other fields, with numerical data acquired from, for example, the apparatuses as input data. As a result, a state of each of the apparatuses can be monitored and a change of the state can be detected with high precision. Accordingly, there is an effect that an apparatus to be an object can be controlled with high precision. There is a similar functional effect in control of other apparatuses.
(104) Applying the data processing apparatus and the data processing method according to the embodiments of the present invention to economic and financial fields, such as supply and demand forecasting, causes an effect that data can be analyzed with high precision.
(105) According to the embodiments of the present invention, in sequential data processing, data smoothing processing and data differential processing can be performed with a high S/N ratio and less data delay. During an initial period after the start of data processing, data processing can also be performed with high reliability.
(106) According to the embodiments of the present invention, a data smoothing value, a first order differential value, and a second order differential value can be sequentially acquired in real time with a high S/N ratio and short delay time, or with high reliability immediately after the start of data processing. According to the embodiments of the present invention, a system to be an object can be controlled with high precision by using the data smoothing value, the first order differential value, and the second order differential value.
(107) The present invention is not limited to each of the above embodiments, and includes various modifications. For example, the above embodiments have been described in detail in order to easily understand the present invention. The present invention is not necessarily limited to including all the configurations having been described above. A part of a configuration in one of the embodiments can be replaced with a configuration in another embodiment. In addition, a configuration in one embodiment can be added to a configuration in another embodiment. With respect to a part of the configuration in each of the embodiments, additions, deletions, and replacements of the other configurations may be made.