METHOD FOR PREDICTING ELECTRICAL CHARACTERISTICS OF SEMICONDUCTOR ELEMENT
20220252658 · 2022-08-11
Inventors
- Seiko INOUE (Atsugi, Kanagawa, JP)
- Yusuke KOUMURA (Atsugi, Kanagawa, JP)
- Takahiro FUKUTOME (Atsugi, Kanagawa, JP)
CPC classification
H01L21/02
ELECTRICITY
H01L22/14
ELECTRICITY
G01R31/2832
PHYSICS
H01L29/7869
ELECTRICITY
H01L21/00
ELECTRICITY
H01L29/78696
ELECTRICITY
International classification
Abstract
The electrical characteristics of a semiconductor element are predicted from a process list. A feature-value calculation portion and a feature prediction portion are used to predict the electrical characteristics of the semiconductor element. The feature-value calculation portion includes a first learning model and a second learning model, and the feature prediction portion includes a third learning model. The first learning model includes a step of learning the process list for generating the semiconductor element and a step of generating a first feature value. The second learning model includes a step of learning the electrical characteristics of the semiconductor element generated in accordance with the process list and a step of generating a second feature value. The third learning model includes a step of performing multimodal learning with use of the first feature value and the second feature value and a step of outputting a value of a variable used in a formula representing the electrical characteristics of the semiconductor element. The first to third learning models include neural networks different from each other.
Claims
1. A method for predicting electrical characteristics of a semiconductor element comprising a feature-value calculation portion and a feature prediction portion, wherein the feature-value calculation portion comprises a first learning model and a second learning model, wherein the feature prediction portion comprises a third learning model, and wherein the method comprises steps of: learning a process list for generating the semiconductor element, in the first learning model; learning the electrical characteristics of the semiconductor element generated in accordance with the process list, in the second learning model; generating a first feature value in the first learning model; generating a second feature value in the second learning model; performing multimodal learning in the third learning model with use of the first feature value and the second feature value; and outputting a value of a variable used in a formula representing the electrical characteristics of the semiconductor element, from the third learning model.
2. The method for predicting electrical characteristics of a semiconductor element according to claim 1, wherein the feature-value calculation portion comprises a fourth learning model, and wherein the method comprises the steps of: learning a schematic cross-sectional view generated with use of the process list, in the fourth learning model; generating a third feature value in the fourth learning model; performing multimodal learning in the third learning model with use of the first feature value, the second feature value, and the third feature value; and outputting the value of the variable used in the formula representing the electrical characteristics of the semiconductor element, from the third learning model.
3. The method for predicting electrical characteristics of a semiconductor element according to claim 1, wherein the first learning model comprises a first neural network, wherein the second learning model comprises a second neural network, and wherein the method comprises a step of updating a weight coefficient of the second neural network by the first feature value generated by the first neural network.
4. The method for predicting electrical characteristics of a semiconductor element according to claim 1, wherein when the first learning model is supplied with a process list for inference and the second learning model is supplied with a value of a voltage applied to a terminal of the semiconductor element, the method comprises a step of outputting a value of current corresponding to the value of the voltage, from the second learning model.
5. The method for predicting electrical characteristics of a semiconductor element according to claim 1, wherein when the first learning model is supplied with a process list for inference and the second learning model is supplied with a value of a voltage applied to a terminal of the semiconductor element, the method comprises a step of outputting the value of the variable used in the formula representing the electrical characteristics of the semiconductor element, from the third learning model.
6. The method for predicting electrical characteristics of a semiconductor element according to claim 1, wherein the semiconductor element is a transistor.
7. The method for predicting electrical characteristics of a semiconductor element according to claim 6, wherein the transistor comprises a metal oxide in a semiconductor layer.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
MODE FOR CARRYING OUT THE INVENTION
[0029] An embodiment is described in detail with reference to the drawings. Note that the present invention is not limited to the following description, and it will be readily appreciated by those skilled in the art that modes and details of the present invention can be modified in various ways without departing from the spirit and scope of the present invention. Therefore, the present invention should not be interpreted as being limited to the descriptions of the embodiment below.
[0030] Note that in structures of the invention described below, the same portions or portions having similar functions are denoted by the same reference numerals in different drawings, and a description thereof is not repeated. Furthermore, the same hatch pattern is used for the portions having similar functions, and the portions are not especially denoted by reference numerals in some cases.
[0031] In addition, the position, size, range, or the like of each structure shown in drawings does not represent the actual position, size, range, or the like in some cases for easy understanding. Therefore, the disclosed invention is not necessarily limited to the position, size, range, or the like disclosed in the drawings.
[0032] Furthermore, it is noted that ordinal numbers such as “first”, “second”, and “third” used in this specification are used in order to avoid confusion among components, and the terms do not limit the components numerically.
Embodiment
[0033] In one embodiment of the present invention, a method for predicting electrical characteristics of a semiconductor element will be described. For the method for predicting electrical characteristics of a semiconductor element, a feature-value calculation portion and a feature prediction portion are used as an example. The feature-value calculation portion includes a first learning model and a second learning model, and the feature prediction portion includes a third learning model. The first learning model includes a first neural network, the second learning model includes a second neural network, and the third learning model includes a third neural network. Note that the first to third neural networks are preferably different from each other.
[0034] First, a learning method for predicting electrical characteristics of a semiconductor element will be described.
[0035] As an example, the case where the first learning model learns a process list for generating a semiconductor element is described. The first learning model is supplied with the process list for generating a semiconductor element, thereby updating a weight coefficient of the first neural network. In other words, the first neural network is a neural network that performs learning using the process list as teacher data. Hereinafter, a transistor is used as an example of the semiconductor element in the description. Note that the semiconductor element is not limited to a transistor; it may be a diode, a thermistor, a gyroscope sensor, an acceleration sensor, a light-emitting element, a light-receiving element, or the like. Note that a semiconductor element can also include a resistor, a capacitor, or the like.
[0036] Note that the above-described process list corresponds to information that is a combination of a plurality of steps needed to form a transistor. Next, one process item described in the process list is described. The process item preferably includes at least a process ID, an equipment ID, and conditions. The process includes one or more kinds of steps such as a film deposition step, a cleaning step, a resist application step, an exposure step, a development step, a shaping step, a baking step, a separation step, and a doping step. The conditions include setting conditions of equipment for each step and the like.
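As an illustration of the structure just described, a process list can be held as an ordered collection of process items, each carrying a process ID, an equipment ID, and conditions. The field names and values below are hypothetical, not taken from the specification:

```python
# A hypothetical in-memory representation of a process list.
# Each process item carries a process ID, an equipment ID, and the
# equipment setting conditions; all concrete values are illustrative.
process_list = [
    {"process_id": "film_deposition", "equipment_id": "CVD1",
     "conditions": {"temperature_C": 350, "pressure_Pa": 200}},
    {"process_id": "cleaning", "equipment_id": "WAS1",
     "conditions": {"time_s": 60}},
    {"process_id": "resist_application", "equipment_id": "REG1",
     "conditions": {"spin_rpm": 3000}},
]

# Every process item provides the three fields named in the text.
for item in process_list:
    assert {"process_id", "equipment_id", "conditions"} <= set(item)
```

The ordering of the list matters: process items are later supplied to the first learning model in the order of the steps.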
[0037] Each step listed as the process ID is conducted with equipment units with different functions in some cases. For example, in the film deposition step, a metal organic chemical vapor deposition method (MOCVD), a chemical vapor deposition method (CVD), a sputtering method, or the like is used. Thus, as the information supplied to the first learning model, the process ID and the equipment ID are represented as one code, whereby two-dimensional information can be managed as one-dimensional information. With use of the code representing the process ID and the equipment ID, the number of learning items is reduced, so that the computational complexity is reduced. Note that a method for generating a code is described in detail with reference to
[0038] Furthermore, in the first learning model, a first feature value is generated by the first neural network which has done the learning according to the process list.
[0039] In one embodiment of the present invention, the second learning model performs learning of the electrical characteristics of the transistor generated in accordance with the process list, concurrently with the learning in the first learning model. Specifically, the second learning model learns the electrical characteristics of the transistor generated in accordance with the process list supplied to the first learning model. The second learning model is supplied with the electrical characteristics of the transistor, thereby updating a weight coefficient of the second neural network. In other words, the second neural network is a neural network that performs learning using the electrical characteristics of the transistor as teacher data. As the electrical characteristics of the transistor, for example, I.sub.d-V.sub.gs characteristics for evaluating the temperature characteristics, threshold voltage, or the like of the transistor and I.sub.d-V.sub.ds characteristics for evaluating the saturation characteristics of the transistor can be used.
[0040] The drain current I.sub.d indicates the magnitude of current flowing, in the transistor, through a drain terminal at the time of applying voltages to a gate terminal, the drain terminal, and a source terminal. Note that the I.sub.d-V.sub.gs characteristics correspond to a change in drain current I.sub.d caused by applying different voltages to the gate terminal of the transistor. The I.sub.d-V.sub.ds characteristics correspond to a change in values of drain current I.sub.d caused by applying different voltages to the drain terminal of the transistor.
[0041] Furthermore, in the second learning model, a second feature value is generated by the second neural network which has done the learning of the electrical characteristics of the transistor generated in accordance with the process list.
[0042] Next, the third learning model performs multimodal learning with use of the first feature value and the second feature value. The third learning model is supplied with the first feature value and the second feature value, thereby updating a weight coefficient of the third neural network. In other words, the third neural network is a neural network that performs learning using the process list and the electrical characteristics of the transistor corresponding to the process list as teacher data.
[0043] Note that the multimodal learning is learning using different types of information such as the first feature value generated from the process list for generating a semiconductor element and the second feature value generated from the electrical characteristics of the semiconductor element generated in accordance with the process list. For example, the neural network in which feature values generated from a plurality of different types of information are used as input information can be called a neural network having a multimodal interface. In this embodiment, the third neural network corresponds to a neural network having a multimodal interface.
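The multimodal interface described above can be sketched as a concatenation of the two feature values ahead of a fully connected layer. The dimensions and weights below are arbitrary placeholders, not values from the specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature vectors (dimensions chosen arbitrarily):
feat_process = rng.normal(size=16)     # first feature value (process list)
feat_electrical = rng.normal(size=8)   # second feature value (I-V data)

# The connected layer with a multimodal interface concatenates the two
# different-modality feature vectors into one input vector.
multimodal_input = np.concatenate([feat_process, feat_electrical])  # dim 24

# One fully connected layer mapping the joint vector to 4 output
# variables (weights are random placeholders, not trained values).
W = rng.normal(size=(4, multimodal_input.size))
b = np.zeros(4)
output = W @ multimodal_input + b
```

In the actual method, `W` and `b` would be the learned weight coefficients of the third neural network, and the outputs would correspond to variables such as V.sub.th.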
[0044] For example, the third learning model outputs a value of a variable used in a formula representing the electrical characteristics of the transistor. In other words, the variable value is a value predicted by the method for predicting electrical characteristics of a semiconductor element.
[0045] For example, a formula of gradual channel approximation of the transistor is used as the formula representing the electrical characteristics of the transistor. Formula (1) represents electrical characteristics in a saturated region of the transistor. Formula (2) represents electrical characteristics in a linear region of the transistor.
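Formulas (1) and (2) themselves do not survive in this text. Assuming they are the standard gradual-channel-approximation expressions implied by the variables listed in the next paragraph, they would read:

```latex
% Formula (1): saturated region (assumed standard form)
I_d = \frac{1}{2}\,\mu_{FE}\,C_{ox}\,\frac{W}{L}\,\left(V_{gs} - V_{th}\right)^2

% Formula (2): linear region (assumed standard form)
I_d = \mu_{FE}\,C_{ox}\,\frac{W}{L}\left[\left(V_{gs} - V_{th}\right)V_{ds} - \frac{V_{ds}^{\,2}}{2}\right]
```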
[0046] Variables predicted by the method for predicting electrical characteristics of a transistor include the drain current I.sub.d, the field-effect mobility μ.sub.FE, the capacitance per unit area C.sub.ox of a gate insulating film, the channel length L, the channel width W, the threshold voltage V.sub.th, and the like, used in Formula (1) or (2). The gate voltage V.sub.g applied to the gate terminal and the drain voltage V.sub.d applied to the drain terminal are preferably given as inference data described later. The third learning model can output values of all the variables described above or may output a value of any one or more of the variables.
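As a worked sketch, the two operating regions can be combined into one drain-current function. This is a hedged reimplementation of the assumed standard gradual-channel formulas, not code from the specification:

```python
def drain_current(mu_fe, c_ox, w, l, v_gs, v_th, v_ds):
    """Gradual-channel-approximation drain current (sketch).

    mu_fe : field-effect mobility
    c_ox  : gate-insulator capacitance per unit area
    w, l  : channel width and length
    v_th  : threshold voltage
    The specification's Formulas (1)/(2) are assumed to be the
    standard saturation/linear expressions.
    """
    v_ov = v_gs - v_th                 # overdrive voltage
    if v_ov <= 0:
        return 0.0                     # off (subthreshold conduction ignored)
    if v_ds >= v_ov:                   # saturated region, Formula (1)
        return 0.5 * mu_fe * c_ox * (w / l) * v_ov ** 2
    # linear region, Formula (2)
    return mu_fe * c_ox * (w / l) * (v_ov * v_ds - 0.5 * v_ds ** 2)
```

Note that the two branches agree at the region boundary v_ds = v_gs − v_th, which is the usual consistency check for these formulas.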
[0047] Since supervised learning is used in the method for predicting electrical characteristics of a semiconductor element, the first to third neural networks are evaluated on the basis of the output result of the third learning model. For example, the first to third neural networks update their weight coefficients on the basis of the electrical characteristics of the transistor so that the output approaches the results calculated from Formula (1) or (2).
[0048] The feature-value calculation portion includes a fourth learning model. The fourth learning model learns a schematic cross-sectional view of a transistor generated in accordance with the process list. Alternatively, the fourth learning model learns a cross-sectional SEM image of a transistor generated in accordance with the process list. The fourth learning model generates a third feature value through the learning of the schematic cross-sectional view or the cross-sectional SEM image of the transistor. When the fourth learning model generates the third feature value, it is preferable that the first learning model and the second learning model respectively generate the first feature value and the second feature value, concurrently with the generation of the third feature value by the fourth learning model.
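The internal structure of the fourth learning model is not detailed here, but the reference numerals elsewhere in this document (convolutional layers 241a and 241e) suggest a convolutional network. A minimal, dependency-light sketch of extracting a feature vector from a cross-sectional image might look like the following; the image size, kernel, and single-layer depth are invented for illustration:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Minimal 2-D 'valid' convolution, written out explicitly."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(1)
image = rng.random((8, 8))        # stand-in for a cross-sectional image
kernel = rng.normal(size=(3, 3))  # one untrained convolution kernel

# Convolution + ReLU, then flatten into a feature vector (the "third
# feature value" in the text would come from a trained network).
feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)
third_feature = feature_map.flatten()
```

A real implementation would stack several trained convolutional layers and fully connected layers; this sketch only shows the shape of the data flow from image to feature vector.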
[0049] Thus, the third learning model performs multimodal learning with use of the first feature value, the second feature value, and the third feature value. Accordingly, the third learning model outputs a value of a variable used in a formula representing electrical characteristics of the transistor.
[0050] Furthermore, the first feature value updates a weight coefficient of the second neural network. The first feature value corresponds to an output of the first learning model that has done the learning of the process list. In other words, the first feature value is related to the electrical characteristics of the transistor generated in accordance with the process list.
[0051] Next, an inference method using the method for predicting electrical characteristics of a transistor is described. When the first learning model and the second learning model are respectively supplied with a process list for inference and a value of a voltage applied to a terminal of the semiconductor element, the third learning model outputs a value of a variable used in a formula representing the electrical characteristics of the transistor.
[0052] Furthermore, described is an inference method using the method for predicting electrical characteristics of a transistor in the case where the first feature value updates a weight coefficient of the second neural network. The first learning model is supplied with a process list for inference, and the second learning model is supplied with a value of a voltage applied to a terminal (a gate terminal, a drain terminal, or a source terminal) of the transistor. The second learning model outputs a value of current, as a predicted value, flowing through the drain terminal depending on the voltage value.
[0053] Next, a method for predicting electrical characteristics of a semiconductor element is described with reference to
[0054] The method for predicting electrical characteristics of a transistor described with
[0055] The learning model 210 includes a neural network 211 and a neural network 212. Note that the neural network 211 and the neural network 212 are described in detail with reference to
[0056] The learning model 220 includes a neural network 221 and an activation function 222. The neural network 221 preferably includes an input layer, an intermediate layer, and an output layer. Note that the neural network 221 is described in detail with reference to
[0057] The learning model 230 includes a neural network including a connected layer 231, a fully connected layer 232, and a fully connected layer 233. Note that the connected layer 231 includes a multimodal interface. In
[0058] The fully connected layer 233 outputs predicted values of electrical characteristics (e.g., drain current) to output terminals OUT_1 to OUT_w. The values of variables in Formula (1) or Formula (2) described above correspond to the output terminals OUT_1 to OUT_w. In the case where the semiconductor element is a resistor or a capacitor as another example, it is preferable to use a formula calculating a resistance value or a formula calculating the capacitance to obtain variable values output by the fully connected layer 233. Note that w is an integer greater than or equal to 1.
[0062] The equipment ID used in the process can be assigned, for example, as follows: CVD1: the film deposition step, WAS1: the cleaning step, REG1: the resist application step, PAT1: the exposure step, DEV1: the development step, ETC1: the shaping step 1, CMP1: the shaping step 2, OVN1: the baking step, PER1: the separation step, and DOP1: the doping step. The process ID is preferably managed in constant association with the equipment ID. The process ID and the equipment ID can be combined to be represented by one code. For example, when the process ID and the equipment ID are the film deposition step and CVD1, respectively, the code is 0011. Note that a code to be assigned is managed as a unique number. In addition, the conditions set for each equipment unit include a plurality of setting items. In
[0066] Steps for processing a film formed by a film deposition step are described, as an example, using the part of the process list shown in
[0067] Next, the deposited film is coated with a photoresist in a resist application step. Next, a mask pattern of the film is transferred to the photoresist in an exposure step. Next, the photoresist other than a portion to which the mask pattern is transferred is removed with a developer in a development step, so that a mask pattern of the photoresist is formed. A step of baking the photoresist may be included in the development step. Next, the film is shaped using the mask pattern formed in the photoresist in a shaping step 1. Next, the photoresist is separated in a separation step.
[0068] In
[0069] Steps different from those in
[0071] Process items are given to the neural network 211 in the order of steps according to the process list. As shown in
[0072] For example, the neural network 211 preferably vectorizes the process items using Word2Vec (W2V). To vectorize text data, Word2Vec, GloVe (Global Vectors for Word Representation), Bag-of-Words, or the like can be used. Vectorizing text data can be rephrased as conversion into a distributed representation. Furthermore, a distributed representation can be rephrased as an embedded representation (a feature vector or an embedded vector).
[0073] In one embodiment of the present invention, the conditions of the process item are handled as not sentences but aggregates of words. Thus, it is preferable to handle the process list as aggregates of words. For example, the neural network 211 includes an input layer 211a, a hidden layer 211b, and a hidden layer 211c. The neural network 211 outputs a feature vector generated from the process list. At this point, a plurality of feature vectors can be output, or the vectors may be integrated to one. Hereinafter, the case where the neural network 211 outputs a plurality of feature vectors is described. Note that the hidden layer can include one or more hidden layers.
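As a hedged, dependency-free stand-in for the Word2Vec-style vectorization described above (a real model would learn dense embeddings from a corpus), each process item treated as an aggregate of words can be mapped to a simple bag-of-words feature vector:

```python
import numpy as np

# Process items as aggregates of words (contents are illustrative).
process_items = [
    "film_deposition CVD1 temperature=350 pressure=200",
    "cleaning WAS1 time=60",
]

# Build a vocabulary over all words appearing in the process list.
vocab = sorted({w for item in process_items for w in item.split()})

def to_vector(item):
    """One bag-of-words feature vector per process item."""
    words = item.split()
    return np.array([words.count(w) for w in vocab], dtype=float)

# One feature vector per process item, as in the "plurality of feature
# vectors" output by the neural network 211.
feature_vectors = [to_vector(item) for item in process_items]
```

Bag-of-Words is listed in the text as one acceptable vectorization method; swapping in a trained Word2Vec or GloVe model would replace `to_vector` without changing the surrounding data flow.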
[0074] Next, the plurality of feature vectors generated by the neural network 211 are supplied to the neural network 212. For the neural network 212, a DAN (Deep Averaging Network) is preferably used. For example, the neural network 212 includes an AGGREGATE layer 212a, a fully connected layer 212b, and a fully connected layer 212c. The AGGREGATE layer 212a can collectively handle the plurality of feature vectors output from the neural network 211.
[0075] The fully connected layer 212b and the fully connected layer 212c preferably include a sigmoid function, a step function, a ramp function (ReLU: Rectified Linear Unit), or the like as an activation function. A nonlinear activation function is effective in feature vectorization of complicated learning data. Thus, the neural network 212 can average the feature vectors of the process items constituting the process list and integrate them into one. The integrated feature vector is supplied to the learning model 230. Note that one or more fully connected layers can be included.
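The averaging-and-integration step of the Deep Averaging Network can be sketched as follows. Vector sizes and weights are placeholders, and sigmoid stands in for the activation functions mentioned above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)

# Per-item feature vectors from the preceding network (5 process items,
# dimension 7; both numbers are arbitrary for illustration).
item_vectors = rng.normal(size=(5, 7))

# AGGREGATE layer: average the per-item vectors into a single vector.
aggregated = item_vectors.mean(axis=0)

# Two fully connected layers with a nonlinear activation, as in a DAN
# (weights are random placeholders, not trained coefficients).
W1, b1 = rng.normal(size=(7, 7)), np.zeros(7)
W2, b2 = rng.normal(size=(7, 7)), np.zeros(7)
h = sigmoid(W1 @ aggregated + b1)
integrated_feature = sigmoid(W2 @ h + b2)   # one vector for learning model 230
```

The key design point is that averaging makes the output insensitive to the number of process items, so process lists of different lengths yield a fixed-size integrated feature vector.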
[0080] In the neural network 221, as an example, the input layer includes neurons X1 to X4, a hidden layer includes neurons Y1 to Y10, and an output layer includes a neuron Z1. The neuron Z1 performs feature vectorization of the electrical characteristics, and the activation function 222 outputs a predicted value. It is preferable that the number of neurons included in the hidden layer be equal to the number of plots supplied as learning data, and further preferable that it be larger. In the case where the number of neurons included in the hidden layer is larger than the number of plots supplied as learning data, the learning model 220 learns the electrical characteristics of the transistor in detail. Note that the neuron Z1 has a function of the activation function 222.
[0081] As an example, a method with which the neural network 221 learns the electrical characteristics of the transistor is described. First, the neuron X1 is supplied with the voltage V.sub.d applied to the drain terminal of the transistor, the neuron X2 is supplied with the voltage V.sub.g applied to the gate terminal of the transistor, the neuron X3 is supplied with the voltage V.sub.s applied to the source terminal of the transistor, and the neuron X4 is supplied with the drain current I.sub.d flowing through the drain terminal of the transistor. At this time, the drain current I.sub.d is supplied as teacher data. A weight coefficient of the hidden layer is updated so that the output of the neuron Z1 or the output of the activation function 222 is close to the drain current I.sub.d. In the case where the drain current I.sub.d is not supplied as learning data, learning is performed so that the output of the neuron Z1 or the output of the activation function 222 is close to the drain current I.sub.d.
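A toy version of this learning loop, assuming synthetic (V.sub.d, V.sub.g, V.sub.s) → I.sub.d data in place of measured characteristics, might look like the following. The network size mirrors the example above (10 hidden neurons, one output neuron); the data, learning rate, and iteration count are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic training data standing in for measured transistor plots:
# inputs (V_d, V_g, V_s), target I_d (all values illustrative only).
X = rng.uniform(0.0, 3.0, size=(32, 3))
y = 0.5 * np.maximum(X[:, 1] - 1.0, 0.0) ** 2   # toy I_d(V_g) curve

# One hidden layer with 10 neurons (like Y1 to Y10) and one output
# neuron (Z1), trained by plain gradient descent on squared error.
W1, b1 = rng.normal(scale=0.5, size=(3, 10)), np.zeros(10)
W2, b2 = rng.normal(scale=0.5, size=10), 0.0

def predict(inputs):
    h = np.maximum(inputs @ W1 + b1, 0.0)   # hidden layer (ReLU)
    return h, h @ W2 + b2                   # neuron Z1 output

_, p0 = predict(X)
mse_before = float(np.mean((p0 - y) ** 2))

lr = 0.01
for _ in range(2000):
    h, pred = predict(X)
    err = pred - y                          # distance from teacher data I_d
    # Backpropagation of the mean-squared error through the ReLU layer.
    gW2 = h.T @ err / len(X)
    gb2 = err.mean()
    gh = np.outer(err, W2) * (h > 0)
    gW1 = X.T @ gh / len(X)
    gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2          # update weight coefficients
    W1 -= lr * gW1; b1 -= lr * gb1

_, p1 = predict(X)
mse_after = float(np.mean((p1 - y) ** 2))
```

After training, `mse_after` should be smaller than `mse_before`, mirroring the weight-coefficient updates that bring the output of the neuron Z1 close to the drain current supplied as teacher data.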
[0082] Although
[0083] The learning model 220 preferably performs learning concurrently with the learning in the learning model 210. A process list supplied to the learning model 210 is highly related to the electrical characteristics supplied to the learning model 220. Thus, concurrent learning in the learning model 220 and the learning model 210 is effective in learning for predicting the electrical characteristics of the transistor.
[0084] Next, the feature prediction portion 120 is described. For the description of the feature prediction portion 120,
[0085] The fully connected layer 233 outputs predicted values of the electrical characteristics to an output terminal OUT_1 to an output terminal OUT_w. In one embodiment of the present invention, the predicted values of the electrical characteristics as the outputs correspond to the field-effect mobility μ.sub.FE, the capacitance per unit area C.sub.ox of a gate insulating film, the channel length L, the channel width W, the threshold voltage V.sub.th, or the like in Formula (1) or (2) described above. In addition, it is preferable to output a drain voltage V.sub.d, a gate voltage V.sub.gs, or the like. Each value of the variables calculated from the electrical characteristics of the transistor may be supplied to the connected layer 231 as teacher data. A weight coefficient of the learning model 230 is updated when the teacher data is supplied.
[0087] A connected layer 231A included in the feature prediction portion 120 couples a feature vector generated from the process list, a feature vector generated from the electrical characteristics of the transistor generated in accordance with the process list, and a feature vector generated from the schematic cross-sectional view or a cross-sectional observation image of an actually produced device, and generates data output to the fully connected layer 232.
[0089] The feature-value calculation portion 110A provided with the learning model 240 facilitates the prediction of the electrical characteristics of a semiconductor element with use of three different feature vectors.
[0090] For example, a semiconductor layer, a gate oxide film, and a gate electrode are shown in
[0093] With reference to
[0094] With use of a feature vector generated by the inference data 1 and a feature vector generated by the inference data 2, the feature prediction portion 120 predicts each value of the variables in Formula (1) or (2) described above. The activation function 222 can output an inference result 1 based on the inference data 2. As the inference result 1, the drain current I.sub.d can be predicted from the drain voltage, the gate voltage, and the source voltage respectively supplied to the drain terminal, the gate terminal, and the source terminal of the transistor.
[0096] With reference to
[0097] With use of a feature vector generated by the inference data 1, a feature vector generated by the inference data 2, and a feature vector generated by the inference data 3, the feature prediction portion 120 predicts each value of the variables in Formula (1) or (2) described above. The activation function 222 can output an inference result 1 based on the inference data 2. As the inference result 1, the drain current I.sub.d can be predicted from the drain voltage, the gate voltage, and the source voltage respectively supplied to the drain terminal, the gate terminal, and the source terminal of the transistor.
[0098] The fully connected layer 233 in
[0100] The network includes a local area network (LAN) or the Internet. For the above network, wired or wireless communication or wired and wireless communication can be used. In the case where wireless communication is used in the above network, a variety of communication methods can be used, for example, a near field communication method such as Wi-Fi (registered trademark) or Bluetooth (registered trademark), a communication method satisfying the third generation mobile communication system (3G) such as LTE (also referred to as 3.9G in some cases), a communication method satisfying the fourth generation mobile communication system (4G), and a communication method satisfying the fifth generation mobile communication system (5G).
[0101] In the method for predicting electrical characteristics of a semiconductor element of one embodiment of the present invention, the computer 10 is used for predicting the electrical characteristics of the semiconductor element. A program of the computer 10 is stored in the memory 12 or the storage 15. The program generates a learning model using the arithmetic unit 11. The program can be displayed on the display device 16a through the input/output interface 13. A user can supply learning data such as a process list, electrical characteristics, a schematic cross-sectional view, or a cross-sectional observation image from the keyboard 16b to the program displayed on the display device 16a. The electrical characteristics of the semiconductor element, which are predicted by the method for predicting electrical characteristics of a semiconductor element, are converted into numbers, formulae, or graphs and displayed on the display device 16a.
[0102] Note that the program can also be utilized in the remote computer 22 or the remote computer 23 through the network. Alternatively, the program can be activated by the computer 10 with a program stored in a memory or a storage of the database 21, the remote computer 22, or the remote computer 23. The remote computer 22 may be a portable information terminal or a portable terminal such as a tablet computer or a laptop computer. In the case of a portable information terminal, a portable terminal, or the like, communication can be performed using wireless communication.
[0103] Hence, according to one embodiment of the present invention, a method for predicting electrical characteristics of a semiconductor element, with use of a computer, can be provided. In the method for predicting electrical characteristics of a semiconductor element, multimodal learning can be performed by supply of learning data such as a process list, the electrical characteristics of the semiconductor element generated in accordance with the process list, or a schematic cross-sectional view or cross-sectional observation image of the semiconductor element generated in accordance with the process list. Furthermore, in the method for predicting electrical characteristics of a semiconductor element, the electrical characteristics of the semiconductor element or values of variables in a formula representing the electrical characteristics can be predicted by supply of inference data such as a new process list, conditions of voltages applied to the semiconductor element, a schematic cross-sectional view, or a cross-sectional observation image. For example, in the case of adding a new step to the process list, the electrical characteristics of a transistor can be easily predicted. Therefore, with the method for predicting electrical characteristics of a semiconductor element of one embodiment of the present invention, the number of demonstrations in the development of the semiconductor element can be reduced, and the past demonstration information can be effectively used.
[0104] Parts of this embodiment can be combined as appropriate for implementation.
REFERENCE NUMERALS
[0105] OUT_w: output terminal, OUT_1: output terminal, 10: computer, 11: arithmetic unit, 12: memory, 13: input/output interface, 14: communication device, 15: storage, 16a: display device, 16b: keyboard, 17: network interface, 21: database, 22: remote computer, 23: remote computer, 110: feature-value calculation portion, 110A: feature-value calculation portion, 110B: feature-value calculation portion, 110C: feature-value calculation portion, 120: feature prediction portion, 210: learning model, 211: neural network, 211a: input layer, 211b: hidden layer, 211c: hidden layer, 212: neural network, 212a: AGGREGATE layer, 212b: fully connected layer, 212c: fully connected layer, 220: learning model, 221: neural network, 230: learning model, 231: connected layer, 231A: connected layer, 232: fully connected layer, 233: fully connected layer, 240: learning model, 241: neural network, 241a: convolutional layer, 241e: convolutional layer, 242: fully connected layer, 242a: fully connected layer, 242c: fully connected layer