MACHINE LEARNING VARIABLE SELECTION AND ROOT CAUSE DISCOVERY BY CUMULATIVE PREDICTION
20210117861 · 2021-04-22
Assignee
Inventors
- Richard Burch (McKinney, TX, US)
- Qing Zhu (Rowlett, TX, US)
- Jonathan Holt (Sachse, TX, US)
- Tomonori Honda (Santa Clara, CA, US)
CPC classification
G06F18/2113
PHYSICS
G06F17/18
PHYSICS
International classification
G06F17/18
PHYSICS
Abstract
A sequence of models accumulates r-squared values for an increasing number of variables in order to quantify the importance of each variable to the prediction of a targeted yield or parametric response.
Claims
1. A method, comprising: configuring a machine learning model to predict a target feature based on a plurality of process parameters provided as inputs to the machine learning model; selecting a first one of the process parameters and determining a first r-squared value for predicting the target feature based on the first selected process parameter; selecting a second one of the process parameters and determining a second r-squared value for predicting the target feature based on the second selected process parameter; selecting additional ones of the process parameters and determining corresponding additional r-squared values for predicting the target feature based on the additional selected process parameters until an accumulation of the first, second and additional r-squared values increases by less than a threshold value; and ranking from high to low the process parameters on the basis of r-squared values; and identifying as key inputs to the machine learning model a set of high ranking process parameters.
Description
DESCRIPTION OF DRAWINGS
[0008]
[0009]
[0010]
[0011]
[0012]
[0013]
[0014]
[0015]
DETAILED DESCRIPTION
[0016] Machine learning (ML) is playing a more active role in modeling semiconductor processes, especially for prediction and root cause analysis. Some of the principles of stepwise regression can be used to optimize the input parameters for any complex ML approach (e.g., neural networks, extreme gradient boosting, etc.). In particular, an ML model can use a cross-validated r-squared approach to remove variables that do not improve prediction on the training data and keep only those variables that are important to prediction.
[0017] However, that concept can be taken further by first building the best one-variable model possible. Next, build the best two-variable model possible, which by definition must include the variable from the best one-variable model; and then build the best three-variable model possible, which by definition must include the variables from the best two-variable model. This process is repeated until all variables are rank-ordered in terms of most important to least important or until the best n-variable model predicts a desired or acceptable percentage of the cross-validated r-squared values for all variables selected by the stepwise variable selection process.
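The nested sequence of models described above can be sketched in a few lines. The following is a minimal illustration only, not the implementation of the disclosure: it assumes ordinary least squares as a stand-in for the ML model, a simple k-fold split for cross-validation, and synthetic data. The names fit_ols, cv_r2, and forward_rank are hypothetical helpers introduced for this example.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]; a tiny ridge term guards against a zero pivot.
    M = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda k: abs(M[k][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def cv_r2(X, y, cols, k=5):
    """k-fold cross-validated r-squared using only the columns listed in cols."""
    n = len(y)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        beta = fit_ols([[X[i][c] for c in cols] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = beta[0] + sum(b * X[i][c] for b, c in zip(beta[1:], cols))
    mean_y = sum(y) / n
    ss_res = sum((t - q) ** 2 for t, q in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def forward_rank(X, y, names):
    """Each step keeps the earlier picks and adds the best remaining variable."""
    chosen, steps, prev = [], [], 0.0
    remaining = list(range(len(names)))
    while remaining:
        best = max(remaining, key=lambda c: cv_r2(X, y, chosen + [c]))
        score = cv_r2(X, y, chosen + [best])
        steps.append((names[best], score - prev, score))  # (name, incremental, cumulative)
        chosen.append(best)
        remaining.remove(best)
        prev = score
    return steps

# Synthetic data (assumption): x0 and x2 drive the response, x1 is pure noise.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * r[0] + 1.5 * r[2] + random.gauss(0, 0.5) for r in X]
steps = forward_rank(X, y, ["x0", "x1", "x2"])
for name, inc, cum in steps:
    print(f"{name}: incremental={inc:.3f} cumulative={cum:.3f}")
```

In this sketch the noise variable is ranked last with an incremental cross-validated r-squared near zero, which is the signal a stopping rule would use to end the sequence early.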
[0018] Thus, a cumulative sequence of models can be used to quantify the importance of each variable to the overall prediction of a targeted yield (or continuous parametric) response based on an improvement in cross-validated r-squared values.
[0019] For example,
[0020] The first parameter listed in row 111 is PCM-32, its source (column 122) is a continuous parametric measurement of a first physical test structure on the wafer, and its incremental contribution (column 124) to an identified variance or excursion is 15.4% based on the r-squared value from a statistical analysis. Adding the second parameter PCM-1 (row 112), a continuous parametric measurement of a second, different test structure on the same wafer, has an incremental impact of 12.4% and increases the explained cumulative variance to 27.8%. Adding the third parameter PE-1 (row 113), a process tool identified in the wafer equipment history (WEH), has an incremental impact of 9.9% and raises the cumulative variance total to 37.7%. Based on this cross-validated r-squared statistical analysis, these are the only three variables important enough, e.g., approximately 10% or more incremental contribution to the variance in this example, to include in the machine learning model.
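The cumulative totals quoted in this example follow directly from the incremental contributions. A short check, using only the three values stated above (the variable names are illustrative):

```python
from itertools import accumulate

# Incremental contributions (percent of variance) as stated in the example.
contributions = [("PCM-32", 15.4), ("PCM-1", 12.4), ("PE-1", 9.9)]

# Running total, rounded to one decimal to match the precision of the quoted figures.
cumulative = [round(total, 1) for total in accumulate(pct for _, pct in contributions)]

for (name, pct), total in zip(contributions, cumulative):
    print(f"{name}: incremental {pct}% -> cumulative {total}%")
```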
[0021] Understanding how important each parameter is to the overall model is valuable, but may not be as valuable as understanding how important each parameter is to a particular observation. For example, several low yielding wafers might be best predicted by contact resistance changes, giving the user information that allows a more rapid diagnosis of the root cause. Consider how this works for a simple example data set.
[0022]
[0023] First, the measures of accuracy are defined. Since this particular model was focused on predicting die and wafer yield, the two measures of accuracy are (i) the difference between predicted wafer yield and actual wafer yield, and (ii) the r-squared value in the die level correlation of predicted yield and actual yield across the wafer. For the die level correlation, the data is smoothed by averaging each die with the immediately surrounding die, as represented by line 230, to get a smoother representation of yield and yield prediction. The accuracy of wafer prediction is considered for each of the three cumulative predictions identified in
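The two accuracy measures and the neighbor smoothing can be illustrated with a small sketch. This is an assumption-laden example rather than the method of the disclosure: the wafer is modeled as a rectangular grid with None where no die exists, "immediately surrounding die" is taken to mean the 3x3 window around each die, r-squared is computed as the squared Pearson correlation, and the grid values are invented for illustration.

```python
def smooth_wafer(grid):
    """Average each die with its immediate neighbours (3x3 window); None marks no die."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] is None:
                continue
            window = [grid[i][j]
                      for i in range(max(r - 1, 0), min(r + 2, n_rows))
                      for j in range(max(c - 1, 0), min(c + 2, n_cols))
                      if grid[i][j] is not None]
            out[r][c] = sum(window) / len(window)
    return out

def r_squared(pred, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(pred)
    mp, ma = sum(pred) / n, sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_a = sum((a - ma) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)

def flatten(grid):
    """List the die values in row order, skipping positions with no die."""
    return [v for row in grid for v in row if v is not None]

# Hypothetical 3x3 wafer maps: 1 = passing die, 0 = failing die; predictions are probabilities.
actual = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
predicted = [[0.9, 0.8, 0.9], [0.7, 0.3, 0.9], [0.9, 0.9, 0.8]]

smoothed_actual = smooth_wafer(actual)
smoothed_pred = smooth_wafer(predicted)

# Measure (i): predicted wafer yield minus actual wafer yield.
wafer_error = sum(flatten(predicted)) / 9 - sum(flatten(actual)) / 9
# Measure (ii): die-level r-squared on the smoothed maps.
die_r2 = r_squared(flatten(smoothed_pred), flatten(smoothed_actual))
print(f"wafer yield error: {wafer_error:+.3f}, die-level r-squared: {die_r2:.3f}")
```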
[0024] The wafer level prediction error for each cumulative model is shown in bar graph 300 in
[0025] Similarly, the predicted error for wafer 220 (Lot143_W09) is driven primarily by PCM-2 since the predicted error 322 does not change much when other parameters (324, 326, 328) are also considered.
[0026]
[0027]
[0028] As shown in
[0029] For wafer 220 (Lot143_W09), the first cumulative predicted model shown in panel 620 matches fairly well with the actual map shown in panel 621. However, the predicted pattern for wafer 210 (Lot134_W06) in panel 610 does not match well with the actual pattern shown in panel 611.
[0030]
[0031] This approach can be used in this example for die level yield prediction, as described herein, or for any complex multi-variate machine learning problem, provided that there are separable identifiable root causes.
[0032] A flowchart is presented in
[0033] Once the cumulative total is no longer increasing significantly in step 808, i.e., the increase is less than the threshold, then the process parameters will be ranked in step 812 by r-squared values. Finally, in step 814, the highest-ranking process parameters, i.e., those parameters that have a demonstrated impact on the target features by virtue of the cross-validated r-squared approach, will be identified as key variables to the ML model.
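Steps 808, 812, and 814 can be sketched as a short routine operating on cumulative r-squared values that have already been computed. This is one possible reading of the flow, offered as an illustration only. The function name key_inputs, the threshold of 0.05, and the fourth parameter PARAM-X with its value are hypothetical; the first three cumulative values are the ones from the example above, expressed as fractions.

```python
def key_inputs(steps, threshold):
    """steps: (parameter, cumulative r-squared) pairs in the order parameters were added."""
    kept, previous = [], 0.0
    for name, cumulative in steps:
        gain = cumulative - previous
        if gain < threshold:  # step 808: cumulative total no longer increasing significantly
            break
        kept.append((name, gain))
        previous = cumulative
    # Step 812: rank the retained parameters from high to low by incremental r-squared.
    ranked = sorted(kept, key=lambda item: item[1], reverse=True)
    # Step 814: the ranked parameters are reported as key inputs to the model.
    return [name for name, _ in ranked]

# Cumulative values for the first three parameters come from the example;
# PARAM-X and its value are invented to show the stopping condition.
steps = [("PCM-32", 0.154), ("PCM-1", 0.278), ("PE-1", 0.377), ("PARAM-X", 0.385)]
selected = key_inputs(steps, threshold=0.05)
print(selected)
```

With these inputs the loop stops at PARAM-X, whose incremental gain falls below the assumed threshold, and the three earlier parameters are returned in rank order.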