NONDESTRUCTIVE ESTIMATION OF STRUCTURAL PROPERTIES OF A SPECIMEN VIA X-RAY MODELLING BASED ON GROUND TRUTH MEASUREMENTS

20250231132 · 2025-07-17

Abstract

Disclosed herein is a system for non-destructive characterization of specimens. The system includes an electron beam (e-beam) source for projecting e-beams at one or more e-beam landing energies on a specimen; an X-ray detector for sensing X-rays emitted from the specimen, thereby obtaining measurement data; and processing circuitry. The processing circuitry is configured to: (i) extract from the measurement data key features specified by a vector $\vec{f}_{\mathrm{key}}$; and (ii) estimate values $\vec{p}$ of one or more structural parameters characterizing the specimen, based on $\vec{f}_{\mathrm{key}}$ and a set of key-feature vectors $\{\vec{f}_n\}_{n=1}^{N}$ of ground truth (GT) reference specimens. Each $\vec{f}_n$ is obtained from measurements of the emission of X-rays from a reference specimen due to impinging thereof with e-beams at each of the one or more landing energies.

Claims

1. A system for non-destructive characterization of specimens, the system comprising: an electron beam (e-beam) source for projecting e-beams at one or more e-beam landing energies on a specimen being tested; an X-ray detector for sensing X-rays emitted from the tested specimen; and processing circuitry configured to: receive from the X-ray detector X-ray measurement data pertaining to the one or more e-beam landing energies; extract from the X-ray measurement data a vector $\vec{f}_{\mathrm{key}}$ specifying values of key features of the X-ray measurement data; and estimate values $\vec{p}_s$ of one or more structural parameters of the tested specimen, based on $\vec{f}_{\mathrm{key}}$ and a set of vectors $\{\vec{f}_n\}_{n=1}^{N}$ comprising vectors of key features corresponding to ground truth (GT) reference specimens, wherein, for each $1 \le n \le N$, $\vec{p}_n$ specifies values of one or more actually-measured structural parameters of an n-th reference specimen, and $\vec{f}_n$ is obtained through actual measurements of emission of X-rays from the n-th reference specimen due to impinging thereof with e-beams at each of the one or more landing energies.

2. The system of claim 1, wherein $\vec{p}_s$ minimizes a loss function, which is a function of at least $\vec{f}_{\mathrm{key}}$ and a vector-valued function $\vec{f}_{\mathrm{ext}}(\vec{p})$ of the key features, which is extrapolated from $\{\vec{f}_n\}_{n=1}^{N}$.

3. The system of claim 2, wherein the processing circuitry is further configured to estimate $\vec{p}_s$ by computing a minimum distance between $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$.

4. The system of claim 1, wherein the processing circuitry is further configured to estimate $\vec{p}_s$ by computing distances between $\vec{f}_{\mathrm{key}}$ and the $\vec{f}_n$.

5. The system of claim 1, wherein the reference specimens include specimens of a same, or a similar, intended design as the tested specimen; and/or the reference specimens include especially prepared samples exhibiting selected variations with respect to the intended design; and/or the reference specimens are selected so as to encompass expected variations of the one or more structural parameters.

6. The system of claim 1, wherein the one or more structural parameters comprise one or more of an overall concentration of at least one material that the tested specimen comprises, and, optionally, when the tested specimen comprises a structure embedded therein or thereon, a width of the embedded structure.

7. The system of claim 1, wherein the tested specimen comprises a plurality of layers, and the one or more structural parameters comprise one or more of (i) at least one thickness of at least one of the layers; (ii) a combined thickness of at least two or more of the layers; (iii) at least one mass density of at least one of the layers; and (iv) at least one relative concentration of at least one material in one or more of the layers.

8. The system of claim 7, wherein the one or more e-beam landing energies induce emission of X-rays originating from at least two of the plurality of layers.

9. The system of claim 1, wherein the one or more e-beam landing energies induce emission of X-rays about one or more characteristic X-ray lines pertaining to one or more target substances which the tested specimen comprises; wherein the X-ray detector is configured to sense at least one measured spectrum of the emitted X-rays in at least one photon energy range, which comprises at least one of the characteristic X-ray lines; and wherein the X-ray measurement data comprises the measured spectra.

10. The system of claim 9, wherein the key features are, comprise, or are functions of intensities of the characteristic X-ray lines and/or intensities of background radiation.

11. The system of claim 2, wherein $\vec{p}_s = \arg\min_{\vec{p}} \lVert \vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p}) \rVert$, with the double vertical bars denoting a vector norm.

12. The system of claim 2, wherein $\vec{f}_{\mathrm{ext}}(\vec{p}) = \vec{f}_{\mathrm{ext}}(\vec{p}_0 + \vec{\delta}) = \vec{f}_0 + A\vec{\delta}$, with $\vec{p}_0$ specifying nominal values of the one or more structural parameters, $\vec{\delta}$ specifying deviations from the nominal values, $\vec{f}_0$ being a vector of values of the key features corresponding to $\vec{p}_0$, and $A$ being a matrix.

13. The system of claim 12, wherein the matrix $A$ equals $\arg\min_{B} \left\lVert \begin{pmatrix} (\vec{f}_1 - \vec{f}_0 - B\vec{\delta}_1)^T \\ (\vec{f}_2 - \vec{f}_0 - B\vec{\delta}_2)^T \\ \vdots \\ (\vec{f}_N - \vec{f}_0 - B\vec{\delta}_N)^T \end{pmatrix} \right\rVert$, with the double vertical bars denoting a matrix norm, and, for each $1 \le n \le N$, $\vec{\delta}_n = \vec{p}_n - \vec{p}_0$.

14. The system of claim 12, wherein $\vec{f}_0$ and the matrix $A$ are obtained as the solution of $\arg\min_{\vec{g}, B} \left\lVert \begin{pmatrix} (\vec{f}_1 - \vec{g} - B\vec{\delta}_1)^T \\ (\vec{f}_2 - \vec{g} - B\vec{\delta}_2)^T \\ \vdots \\ (\vec{f}_N - \vec{g} - B\vec{\delta}_N)^T \end{pmatrix} \right\rVert$, with the double vertical bars denoting a matrix norm and, for each $1 \le n \le N$, $\vec{\delta}_n = \vec{p}_n - \vec{p}_0$.

15. The system of claim 4, wherein the processing circuitry is configured to, as part of estimating $\vec{p}_s$, apply a k-nearest neighbor (k-NN) regression algorithm to $\vec{f}_{\mathrm{key}}$ with respect to $\{\vec{f}_n\}_{n=1}^{N}$ in order to determine the k vectors of the $\vec{f}_n$ that are closest to $\vec{f}_{\mathrm{key}}$.

16. The system of claim 2, wherein the processing circuitry is further configured to obtain $\{\vec{f}_n\}_{n=1}^{N}$ by subjecting $\vec{f}_{\mathrm{key}}$ to a $(k = N)$-NN classifier with respect to a set $\{\vec{f}'_n\}_{n=1}^{N'}$ of vectors of key features, wherein $N' > N$ and $\{\vec{f}'_n\}_{n=1}^{N'}$ comprises $\{\vec{f}_n\}_{n=1}^{N}$, and the additional $N' - N$ vectors are obtained by actual measurements of additional reference specimens.

17. The system of claim 10, wherein, in order to derive intensities of the characteristic X-ray lines, the processing circuitry is configured to fit a free curve onto each interval of the measured spectra, which is approximately centered about a respective characteristic X-ray line and constituted by a vicinity of the characteristic X-ray line, thereby obtaining a respective optimized curve.

18. The system of claim 17, wherein the free curve is a sum of functions, which comprises a bulge-shaped function and a second function, which is a polynomial; and the processing circuitry is configured to, as part of the fitting of the free curve, fit the bulge-shaped function onto a peak about the characteristic X-ray line of the respective measured spectrum, and fit the second function so as to account for a background intensity component of the respective measured spectrum.

19. The system of claim 1, wherein the tested specimen is a patterned wafer, or a part of a patterned wafer, optionally in one of the fabrication stages thereof.

20. A method for non-destructive characterization of specimens, the method comprising: a measurement operation comprising, for each of one or more landing energies, suboperations of: projecting an e-beam on a tested specimen; and obtaining measurement data by measuring intensity of X-rays emitted from the tested specimen due to penetration of the e-beam thereinto; and a measurement data analysis operation comprising suboperations of: extracting from the measurement data a vector $\vec{f}_{\mathrm{key}}$ specifying values of key features; and estimating values $\vec{p}_s$ of one or more structural parameters of the tested specimen based on $\vec{f}_{\mathrm{key}}$ and a set of vectors $\{\vec{f}_n\}_{n=1}^{N}$ comprising vectors of key features corresponding to ground truth (GT) specimens, wherein, for each $1 \le n \le N$, $\vec{p}_n$ specifies values of one or more actually-measured structural parameters of an n-th specimen, and $\vec{f}_n$ is obtained through measurements of emission of X-rays from the n-th specimen due to impinging thereof with e-beams at each of the one or more landing energies.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0063] Some embodiments of the disclosure are described herein with reference to the accompanying figures. The description, together with the figures, makes apparent to a person having ordinary skill in the art how some embodiments may be practiced. The figures are for the purpose of illustrative description and no attempt is made to show structural details of an embodiment in more detail than is necessary for a fundamental understanding of the disclosure. For the sake of clarity, some objects depicted in the figures are not drawn to scale. Moreover, two different objects in the same figure may be drawn to different scales. In particular, the scale of some objects may be greatly exaggerated as compared to other objects in the same figure.

[0064] In the figures:

[0065] FIG. 1 presents a flowchart of a method for non-destructive three-dimensional probing and characterization of tested specimens based on X-ray measurements of the tested specimens, and analyses based on GT data, according to some embodiments;

[0066] FIGS. 2A to 2D schematically depict a specimen being depth-probed as part of characterization thereof in accordance with the method of FIG. 1, according to some embodiments;

[0067] FIG. 3 presents a flowchart of a measurement data analysis operation of the method of FIG. 1, according to some specific embodiments thereof;

[0068] FIG. 4A presents an X-ray emission spectrum of a specimen, which was obtained by implementing a measurement operation of the method of FIG. 1, according to some embodiments thereof;

[0069] FIG. 4B presents an optimized curve which was fitted onto the X-ray emission spectrum of FIG. 4A in accordance with specific embodiments of a measurement data analysis operation of the method of FIG. 1;

[0070] FIG. 4C presents the optimized curve of FIG. 4B superimposed on the X-ray emission spectrum of FIG. 4A;

[0071] FIG. 4D presents a fitted Gaussian included in the optimized curve of FIG. 4B, according to some embodiments;

[0072] FIG. 4E presents a fitted polynomial included in the optimized curve of FIG. 4B, the fitted polynomial accounting for bremsstrahlung, according to some embodiments; and

[0073] FIG. 5 schematically depicts a system for non-destructive three-dimensional probing and characterization of tested specimens based on X-ray measurements of the tested specimens, and analyses based on GT data, according to some embodiments.

DETAILED DESCRIPTION OF THE INVENTION

[0074] The principles, uses, and implementations of the teachings herein may be better understood with reference to the accompanying description and figures. Upon perusal of the description and figures present herein, one skilled in the art will be able to implement the teachings herein without undue effort or experimentation. In the figures, same reference numerals refer to same parts throughout.

[0075] The present application is directed to methods and systems for non-destructive three-dimensional probing and characterization of tested specimens (e.g., semiconductor specimens). In principle, the methods of the invention are based on analyzing X-ray emission profiles measured for a tested specimen (having unknown structural parameters) and a set of reference data including ground truth (GT) emission profiles of GT specimens (having actually measured, i.e., non-simulated, structural parameters), in order to estimate values of the structural parameters of the tested specimen.

[0076] In more detail, according to some embodiments, e-beams at each of a plurality of (e-beam) landing energies are projected on a tested specimen, which is to be depth-profiled. Each e-beam penetrates into the specimen and excites emission of characteristic X-rays therefrom (and accompanying bremsstrahlung, i.e., background radiation). The greater the e-beam landing energy, the greater the depth to which the e-beam penetrates the specimen.

[0077] The spectrum of the emitted X-rays depends on the internal geometry of the specimen and the material composition thereof, in particular, the distribution of each material (i.e., substance) making up the specimen. As an e-beam travels through a specimen, the e-beam probes different regions it traverses. The contribution of each traversed region to the spectrum of the emitted X-rays depends not only on the concentration of each material included in the traversed region but also on the energy of the e-beam on entry thereto, which, in turn, decreases with the depth.

[0078] Since the spectra of emitted X-rays, as well as key features thereof, typically depend on the structural parameters of the tested specimen, the unknown structural parameters of the tested specimen may be estimated by processing values of key features of the spectra of emitted X-rays together with values of key features of emitted X-ray spectra obtained in similar experiments from reference specimens having physically measured structural parameters.

[0079] The term "key features", as used herein, refers to selected features of X-ray measurements (e.g., intensities of characteristic X-ray lines) that are characteristic of, or correlate with, values of structural parameters of the respective specimen.

[0080] Accordingly, the present application teaches how to characterize internal structural parameters of a tested specimen without destroying the specimen, based on values of key features extracted from measured spectra of the tested specimen. The measured spectra may be analyzed using optimization tools based on GT measurements of reference specimens and on the measurement setup employed to obtain the measured spectra.

Methods

[0081] According to an aspect of some embodiments, there is provided a computerized method for non-destructive, three-dimensional characterization of a tested specimen, including estimating values of structural parameters of the tested specimen based on measurement of characteristic X-rays returned from the tested specimen and on processing of the measurement data, as explained in more detail below. FIG. 1 presents a flowchart of an exemplary method 100, according to some embodiments, including:
[0082] A measurement operation 110, which includes, for each of the one or more e-beam landing energies (i.e., landing energies of the e-beams), performing:
[0083] A suboperation 110a, wherein an e-beam is projected on a specimen (also referred to as the tested specimen).
[0084] A suboperation 110b, wherein measurement data is obtained by measuring the intensity of X-rays emitted from the tested specimen due to penetration of the e-beam thereinto.
[0085] A data analysis operation 120, including:
[0086] A suboperation 120a, wherein values of key features of the tested specimen, specified by a vector $\vec{f}_{\mathrm{key}}$, are extracted from the measurement data.
[0087] A suboperation 120b, wherein values $\vec{p}_s$ of one or more structural parameters of the tested specimen are estimated based on $\vec{f}_{\mathrm{key}}$ and a set of reference vectors $\{\vec{f}_n\}_{n=1}^{N}$, including vectors of key features obtained by implementing measurement operation 110 with respect to GT reference specimens.

[0088] In some embodiments, the set of reference vectors $\{\vec{f}_n\}_{n=1}^{N}$ consists of vectors of key features obtained with respect to GT reference specimens. In some embodiments, the set of reference vectors $\{\vec{f}_n\}_{n=1}^{N}$ does not include vectors of key features obtained with respect to computer-simulated reference specimens.

[0089] The methods of the invention may be implemented, for example, by using a system, such as the system described below in the description of FIG. 5, or systems similar thereto.

[0090] The terms "bremsstrahlung" and "background radiation" are used herein interchangeably.

[0091] According to some embodiments, the tested specimen is selected from a patterned wafer, a part of a patterned wafer, and a semiconductor device embedded in or on a patterned wafer (such as a gated stack), optionally in one of the fabrication stages of the patterned wafer. According to some embodiments, the tested specimen is or includes a structure including one or more semiconductor materials. According to some embodiments, the structure may be constructed as part of a manufacturing process of semiconductor devices and/or components of semiconductor devices. According to some embodiments, the structure may be an assist structure, which is constructed as part of a manufacturing process of semiconductor devices and/or components of semiconductor devices. According to some embodiments, the tested specimen may be or include one or more logic components (e.g., a fin FET (FinFET) or a gate-all-around (GAA) FET) and/or memory components (e.g., a dynamic RAM and/or a vertical NAND (V-NAND)), optionally in one of the fabrication stages thereof.

[0092] According to some embodiments, the tested specimen includes a single layer. According to some embodiments, the tested specimen includes two or more layers.

[0093] In some embodiments, the tested specimen and the reference specimens all have the same or similar structure. For example, the tested specimen and the reference specimens may all include the same number of layers, or may all be of the same type of structure, such as a patterned wafer or a semiconductor device. In some embodiments, the tested specimen and the reference specimens are all made from the same materials. By "same or similar" it is meant that the tested specimen and the reference specimens are defined by the same type of structure (such as a patterned wafer, a part of a patterned wafer, a semiconductor device embedded in or on a patterned wafer, such as a gated stack, etc., as defined above), but may have different measurements (values of structural parameters), such as height, length, thickness of layers, concentration of materials in each layer, etc.

[0094] The term "structural parameters", as used herein, refers to physical properties of a specimen. According to some embodiments, the one or more structural parameters may include one or more of an overall concentration of at least one material that the tested specimen includes, and, optionally, when the tested specimen includes a structure embedded therein or thereon, a width of the embedded structure (e.g., the width of a gate, a fin, or a depletion layer).

[0095] More generally, the one or more structural parameters may include any geometrical parameter and/or compositional parameter of the tested specimen whose modification impacts at least some of the components of $\vec{f}_{\mathrm{key}}$ in a measurable manner, so as to allow estimating the values of the one or more structural parameters characterizing the tested specimen, as described above and in more detail below.

[0096] Additionally, or alternatively, according to some embodiments wherein the tested specimen includes one or more layers, the one or more structural parameters may include one or more of (i) at least one thickness of at least one of the layers, respectively, (ii) a combined thickness of at least two or more of the layers, (iii) at least one mass density of at least one of the layers, respectively, and (iv) at least one relative concentration of at least one material, respectively, in one or more of the layers. By way of a non-limiting example, with respect to item (iv), the one or more structural parameters may include the relative concentration of a first material in a subset of the layers, e.g., adjacent layers or layers of a first type (potentially nonadjacent). The relative concentration of the first material in the subset (of the layers) may correspond to the overall number of particles of the first material, included in the subset, divided by the overall number of particles (of all materials, including the first material) included in the subset.
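As a worked illustration of item (iv), the sketch below computes the relative concentration of a first material in a subset of layers directly from the definition above: particles of that material in the subset divided by all particles in the subset. The per-layer counts, the material labels, and the function name `relative_concentration` are hypothetical, chosen only for illustration.

```python
def relative_concentration(layer_counts, subset, material):
    """Relative concentration of `material` in a subset of layers.

    layer_counts: list with one dict per layer, mapping a material name to the
                  number of particles (e.g., atoms) of that material in the layer
    subset:       indices of the layers forming the subset (need not be adjacent)
    Returns (particles of `material` in the subset) / (all particles in the subset).
    """
    selected = [layer_counts[i] for i in subset]
    n_material = sum(layer.get(material, 0) for layer in selected)
    n_total = sum(sum(layer.values()) for layer in selected)
    if n_total == 0:
        raise ValueError("the selected layers contain no particles")
    return n_material / n_total
```

For instance, with hypothetical layers {Si: 60, Ge: 40}, {Si: 100}, and {Si: 70, Ge: 30}, the subset formed by the first and third layers (nonadjacent layers of the same type) holds 70 Ge particles out of 200, so the function returns 0.35.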

[0097] According to some such embodiments, the one or more structural parameters include structural parameters of at least one of the two or more layers.

[0098] The first step of the method (exemplified by measurement operation 110 of FIG. 1) involves measuring the intensity of X-rays emitted from the tested specimen due to penetration of the e-beam thereinto.

[0099] The term "e-beam", as used herein, stands for electron beam. The term "characteristic X-rays regime" refers to a photon energy range (i.e., an energy range of photons or, equivalently, a frequency range) within the X-ray spectrum that includes characteristic X-ray lines.

[0100] Parameters characterizing the e-beam, particularly the e-beam landing energies, are selected so that, in suboperation 110a, the e-beam induces emission of characteristic X-rays by particles (specifically, particles of at least one target substance) in a probed region centered about a respective depth, which depends on the e-beam landing energy. More precisely, each probed region may correspond to a respective volume of the tested specimen wherein electrons from the respective e-beam may eject electrons from the inner shells of atoms in the probed region, leaving each of these atoms with an inner-shell vacancy. The inner-shell vacancy may be filled through the relaxation of an outer-shell electron to the inner shell. The relaxation may be accompanied by emission of a photon having energy equal to the energy lost by the electron in transitioning from the outer shell to the inner shell.

[0101] According to some embodiments, the number of e-beam landing energies, and the minimum and maximum e-beam landing energies, may be selected to ensure that the tested specimen is probed over a range of depths. According to some such embodiments, the number of e-beam landing energies, and the minimum and maximum e-beam landing energies, may be selected to ensure that the tested specimen is probed all along the depth-dimension thereof.

[0102] According to some embodiments, in each implementation of suboperation 110b, at least one measured spectrum is obtained. Each of the measured spectra may correspond to a photon energy range, or a respective photon energy range, which includes at least one characteristic X-ray line pertaining to at least one target substance, respectively. The measurement data include (e.g., are constituted by) the measured spectra.

[0103] As used herein, the term "target substance" refers to a substance (i.e., material) that is included in a tested specimen and whose spectrum, at least about one characteristic X-ray line of the substance, is measured as part of measurement operation 110 (i.e., when method 100 is applied to the tested specimen).

[0104] According to some embodiments, $\vec{f}_{\mathrm{key}}$ (a vector of values of key features of the tested specimen) is, includes, or is obtained or extracted from (i.e., is a function of) the measured spectra of characteristic X-ray lines. More specifically, according to some embodiments, each component of $\vec{f}_{\mathrm{key}}$ may be derived based on a set of extracted parameters characterizing the shape of a spectral peak about the respective characteristic X-ray line. According to some embodiments, the key features are, include, or are functions of intensities of the characteristic X-ray lines and/or intensities of background radiation. According to some embodiments, the key features include (e.g., are constituted by) a so-called energy signature. According to some embodiments, each component of the energy signature may correspond to an absolute, normalized, or relative intensity of a respective characteristic X-ray line. Each possibility corresponds to a separate embodiment. According to some embodiments, each component of the energy signature may correspond to an intensity of a respective characteristic X-ray line normalized by a mean background intensity about the characteristic X-ray line. Various ways in which the energy signature may be derived are described below in the description of FIGS. 4A-4E.
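One way to assemble $\vec{f}_{\mathrm{key}}$ from such an energy signature is sketched below, assuming each component is a characteristic-line intensity divided by the mean background intensity about that line, with components ordered by landing energy first and by line second. The function name `energy_signature` and this ordering convention are illustrative assumptions; whatever ordering is chosen must be applied identically to the reference vectors $\vec{f}_n$ so that vectors can be compared component by component.

```python
import numpy as np


def energy_signature(line_intensities, mean_backgrounds):
    """Build a key-feature vector from per-(landing energy, line) values.

    line_intensities[i][j]: fitted intensity of characteristic line j in the
                            spectrum measured at landing energy i
    mean_backgrounds[i][j]: mean background intensity about that line in the
                            same spectrum
    Each component is their ratio; the result is flattened row-major, i.e.
    landing energy first, then line.
    """
    peaks = np.asarray(line_intensities, dtype=float)
    backgrounds = np.asarray(mean_backgrounds, dtype=float)
    if peaks.shape != backgrounds.shape:
        raise ValueError("intensity and background arrays must have the same shape")
    if np.any(backgrounds <= 0):
        raise ValueError("mean background intensities must be positive")
    return (peaks / backgrounds).ravel()
```

For example, with three landing energies and two lines, hypothetical intensities [[120, 40], [300, 90], [410, 150]] over backgrounds [[10, 8], [20, 10], [25, 12]] give the six-component vector [12, 5, 15, 9, 16.4, 12.5].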

[0105] According to some embodiments, and as expanded on below (for example, in the description of FIGS. 4A-4E), for each of the at least one target substance and for each e-beam landing energy, an X-ray emission spectrum is measured in suboperation 110b in a photon energy range that includes a characteristic X-ray line of the target substance. According to some embodiments, suboperation 110b may be implemented using an energy-dispersive X-ray (EDX) spectrometer and/or a wavelength-dispersive X-ray (WDX) spectrometer. According to some embodiments, and as elaborated on below in the description of suboperation 120a, in order to derive $\vec{f}_{\mathrm{key}}$, a respective curve is fitted onto each of the X-ray emission spectra.
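A minimal sketch of fitting a curve onto a single measured spectrum follows, modelling it as a Gaussian (a bulge-shaped function) plus a low-order polynomial accounting for the background. It assumes NumPy only: the nonlinear peak parameters (center and width) are found by a simple grid scan, and for each candidate pair the amplitude and polynomial coefficients are solved exactly by linear least squares. This grid-plus-least-squares scheme, the quadratic background, and the function name `fit_peak_plus_background` are illustrative choices rather than the document's prescribed procedure; a general nonlinear optimizer could be used instead.

```python
import numpy as np


def fit_peak_plus_background(energy, counts, center_grid, sigma_grid, poly_degree=2):
    """Fit counts ~ a*exp(-(E - mu)^2 / (2 sigma^2)) + sum_k c_k (E - E_mid)^k.

    (mu, sigma) are scanned over the given grids; for each pair, the linear
    parameters (a, c_0..c_d) are solved by linear least squares, and the pair
    with the smallest sum of squared residuals is kept.
    """
    energy = np.asarray(energy, dtype=float)
    counts = np.asarray(counts, dtype=float)
    e_mid = energy.mean()  # center the polynomial for better conditioning
    poly_cols = [(energy - e_mid) ** k for k in range(poly_degree + 1)]
    best = None
    for mu in center_grid:
        for sigma in sigma_grid:
            gauss = np.exp(-((energy - mu) ** 2) / (2.0 * sigma ** 2))
            design = np.column_stack([gauss] + poly_cols)
            coef, *_ = np.linalg.lstsq(design, counts, rcond=None)
            resid = np.sum((design @ coef - counts) ** 2)
            if best is None or resid < best[0]:
                best = (resid, mu, sigma, coef)
    _, mu, sigma, coef = best
    amplitude, bg_coef = coef[0], coef[1:]
    background = np.column_stack(poly_cols) @ bg_coef  # fitted background curve
    return {
        "mu": mu,
        "sigma": sigma,
        "amplitude": amplitude,
        "peak_area": amplitude * sigma * np.sqrt(2.0 * np.pi),  # integrated line intensity
        "mean_background": background.mean(),
    }
```

On a noise-free synthetic spectrum with a Gaussian at 1.74 keV (roughly the Si Kα energy) of width 0.03 keV over a linear background, the fit recovers the center, width, and amplitude when the grids contain the true values. Dividing `amplitude` or `peak_area` by `mean_background` gives a normalized line intensity of the kind usable as an energy-signature component.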

[0106] According to some embodiments, the photon energy range over which the X-ray emission spectra are measured may be narrow in the sense of being limited to a vicinity (e.g., about three times, about five times, or even about ten times the width) of a characteristic X-ray line, or an immediate vicinity of the characteristic X-ray line, of a target substance. Pertinent non-limiting examples include embodiments wherein a WDX spectrometer is used to obtain the measured spectra. Alternatively, according to some embodiments, and as described in more detail below, an X-ray detector and an optical filter may be employed to measure the intensity of the emitted X-rays at or about a characteristic X-ray line of a target substance.

[0107] To facilitate the description, reference is additionally made to FIGS. 2A-2D. FIGS. 2A-2D schematically depict an implementation of measurement operation 110 of method 100, according to some embodiments thereof. FIG. 2A shows a cross-sectional view of a specimen 20 being probed by an e-beam in accordance with measurement operation 110. To render the description more concrete, it is assumed that specimen 20 includes a plurality of lateral (i.e., horizontal) layers 22 with at least some of layers 22 differing from one another in material composition (e.g., in the concentrations of one or more of the target substances). According to some embodiments, at least some of layers 22 may differ from one another in thickness.

[0108] As a non-limiting example, in FIGS. 2A-2D specimen 20 is shown as including three layers disposed one on top of the other: a first layer 22a, a second layer 22b, and a third layer 22c. First layer 22a is disposed above second layer 22b. Second layer 22b is sandwiched between first layer 22a and third layer 22c. The top surface of first layer 22a constitutes an external surface 24 of specimen 20. Also shown are an e-beam source 202 and an e-beam 205 produced thereby, so as to impinge (e.g., normally impinge) on external surface 24. E-beam source 202 may be configured to project e-beams (one at a time) at each of a plurality of e-beam landing energies, thereby implementing suboperation 110a.

[0109] The greater the landing energy of e-beam 205, the greater the depth to which electrons from e-beam 205 will (on average) penetrate into specimen 20. Further, the greater the landing energy of e-beam 205, the greater may be the volume within the specimen wherein electrons from e-beam 205 interact with matter in specimen 20 so as to induce emission of characteristic X-rays. This is exemplified in FIG. 2A via three probed regions 26: A first probed region 26a corresponds to the volume in which substantially all (e.g., at least 80%, at least 90%, or at least 95%) of the characteristic X-ray (i.e., electromagnetic X-ray radiation) emitting interactions will occur due to the penetration into specimen 20 of an e-beam at a first e-beam landing energy $E_1$. A second probed region 26b corresponds to the volume in which substantially all of the characteristic X-ray emitting interactions will occur due to the penetration into specimen 20 of an e-beam at a second e-beam landing energy $E_2$. A third probed region 26c corresponds to the volume in which substantially all of the characteristic X-ray emitting interactions will occur due to the penetration into specimen 20 of an e-beam at a third e-beam landing energy $E_3$. First probed region 26a is centered about a first point $P_1$ at a depth $u_1$, second probed region 26b is centered about a second point $P_2$ at a depth $u_2$, and third probed region 26c is centered about a third point $P_3$ at a depth $u_3$. Since $E_1 < E_2 < E_3$, it follows that $u_1 < u_2 < u_3$. According to some embodiments, and as depicted in FIG. 2A, third probed region 26c is larger than second probed region 26b, which is larger than first probed region 26a.
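The monotonic relation between landing energy and penetration depth can be illustrated with the Kanaya-Okayama electron range, a widely used empirical formula that is not part of this document: $R_{KO} = 0.0276\,A\,E_0^{1.67} / (Z^{0.889}\rho)$, with $R_{KO}$ in micrometres, $A$ in g/mol, $E_0$ in keV, and $\rho$ in g/cm³. The sketch assumes a pure, homogeneous element, and the function name is hypothetical. Note that the depth from which a given characteristic line is generated is shallower than this electron range, because electrons that have slowed below the line's critical excitation energy no longer produce it.

```python
def kanaya_okayama_range_um(e0_kev, atomic_weight, atomic_number, density_g_cm3):
    """Kanaya-Okayama electron range (micrometres) in a pure element.

    e0_kev:        e-beam landing energy in keV
    atomic_weight: A in g/mol
    atomic_number: Z
    density_g_cm3: mass density rho in g/cm^3
    """
    return (0.0276 * atomic_weight * e0_kev ** 1.67
            / (atomic_number ** 0.889 * density_g_cm3))
```

For silicon (A = 28.09 g/mol, Z = 14, ρ = 2.33 g/cm³) the formula gives roughly 1.5 µm at 10 keV, and the range grows steadily as the landing energy increases, consistent with $u_1 < u_2 < u_3$ for $E_1 < E_2 < E_3$.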

[0110] FIG. 2B shows a first e-beam 205a, generated by e-beam source 202 and having the first e-beam landing energy $E_1$, incident on specimen 20. Also delineated is first probed region 26a (in which substantially all the characteristic X-ray emitting interactions induced by first e-beam 205a occur). X-rays may be emitted in all directions, as exemplified by X-rays 215a. X-rays 215a′ indicate those of X-rays 215a that arrive at an X-ray detector 204 (such as the X-ray detector of FIG. 5).

[0111] FIG. 2C shows a second e-beam 205b, generated by e-beam source 202 and having the second e-beam landing energy $E_2$, incident on specimen 20. Also delineated is second probed region 26b (in which substantially all the characteristic X-ray emitting interactions induced by second e-beam 205b occur). X-rays may be emitted in all directions, as indicated by X-rays 215b. X-rays 215b′ indicate those of X-rays 215b that arrive at X-ray detector 204.

[0112] FIG. 2D shows a third e-beam 205c, generated by e-beam source 202 and having the third e-beam landing energy $E_3$, incident on specimen 20. Also delineated is third probed region 26c (in which substantially all the characteristic X-ray emitting interactions induced by third e-beam 205c occur). X-rays may be emitted in all directions, as indicated by X-rays 215c. X-rays 215c′ indicate those of emitted X-rays 215c that arrive at X-ray detector 204.

[0113] While in FIGS. 2B-2D layers 22 are depicted as differing from one another in their respective refractive indices (as evinced by the refraction of the X-rays on transition from one layer to another), it is to be understood that method 100 is equally applicable without such differences being present.

[0114] For each of the e-beam landing energies (e.g., e-beam landing energies $E_1$, $E_2$, and $E_3$), respective measurement data of emitted X-rays may be obtained by X-ray detector 204, thereby implementing suboperation 110b. In particular, for each of the e-beam landing energies, the respective measurement data may include a respective X-ray emission spectrum (which X-ray detector 204 is configured to measure) in a photon energy range which includes at least one characteristic X-ray line pertaining to a respective target substance of the at least one target substance. The intensity of a characteristic X-ray line corresponding to a target substance is indicative of the average concentration (i.e., particle density) of the target substance in the respective probed region. More specifically, each substance is characterized by a unique set of characteristic X-ray lines (i.e., spectral lines in the characteristic X-ray regime) corresponding to the energy differences between orbitals of the elements making up the substance. The greater the concentration of a substance, the greater the measured intensity of each characteristic X-ray line pertaining thereto.

[0115] In the next step (exemplified by suboperation 120b of FIG. 1), values $\vec{p}_s$ of the structural parameters of the tested specimen are estimated, as explained in detail below. The estimate is based on processing the values $\vec{f}_{\mathrm{key}}$ of the key features described above, together with the values $\vec{f}_n$ of key features derived from reference data. The reference data include measured emission profiles of reference specimens whose structural parameters, having respective values $\vec{p}_n$, were actually measured. The reference data are obtained by actual measurements as prescribed by operation 110.

[0116] The processing included in suboperation 120b may generally be conducted by applying an estimator to the values $\vec{f}_{\mathrm{key}}$ of the key features of the tested specimen (extracted in suboperation 120a). The term "estimator", as used herein, refers to an algorithm used to obtain the values of the structural parameters of the tested specimen based on $\vec{f}_{\mathrm{key}}$ and the $\vec{f}_n$. The algorithm may include minimizing a loss function of $\vec{f}_{\mathrm{key}}$ and of a vector-valued function $\vec{f}_{\mathrm{ext}}(\vec{p})$ of the key features, which is extrapolated from the $\vec{f}_n$. According to some embodiments, the loss function may include a term indicative of the distance between $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$. According to some embodiments, the algorithm may include a k-NN algorithm. According to some embodiments, the algorithm may include minimizing a loss function following the application of a k-NN algorithm.

[0117] The term "extrapolation" is employed herein in an expansive sense and refers generally to the derivation of a continuous function from a plurality of data points.

[0118] In some embodiments, the vectors including values of key features of reference data, and optionally the estimator or the extrapolation, are obtained in advance of measurements of the tested specimen (offline). However, it is conceivable that the vectors, the estimator, and/or the extrapolation are obtained following measurement of the tested specimen.

[0119] The terms "ground truth" and "GT", when used herein with respect to reference specimens, relate to actual, or real, specimens, as opposed to simulated specimens. When these terms are used with respect to data, measurements, vectors, etc., they relate to data, measurements, vectors, etc. which are derived from the GT (i.e., actual) specimens by actual, i.e., non-simulated, measurements. In this application, the terms "GT" and "reference" are used interchangeably with respect to specimens or data retrieved therefrom.

[0120] Reference data may include values of structural parameters which are obtained by actual (i.e., non-simulated) measurements, potentially (but not necessarily) involving destruction of the respective specimens. Non-limiting examples of such physical measurements include Time-of-Flight Secondary Ion Mass Spectrometry (ToF-SIMS) and transmission electron microscopy with energy-dispersive X-ray spectroscopy (TEM-EDX).

[0121] Additionally, reference data may include actual X-ray measurements, such as measurements of the X-ray emissions of the reference specimens or of the background radiation detected during actual experiments.

[0122] In some embodiments, the reference specimens have values of structural parameters close to a nominal value or a gold standard. For example, the reference specimens may have values of structural parameters similar to, or the same as, the values of structural parameters of an intended design. In some embodiments, the reference specimens include specimens of the same, or a similar, intended design as the tested specimen.

[0123] In some embodiments, the reference specimens include specially prepared samples exhibiting selected variations with respect to the intended design.

[0124] In some embodiments, characteristic X-rays for each reference specimen are obtained by projecting e-beams on the reference specimen and obtaining measurement data by measuring intensity of X-rays emitted from the reference specimen due to penetration of the e-beam, as described herein for the tested specimen.

[0125] In some embodiments, characteristic X-rays are measured for each reference specimen under the same, or similar, conditions as those used for the tested specimen, such as by using the same or similar equipment under the same or similar settings.

[0126] Accordingly, in some embodiments, the method further includes a measurement operation for each of the reference specimens, which includes, for each of the e-beam landing energies, performing: a suboperation wherein an e-beam is projected on the reference specimen; and a suboperation wherein measurement data is obtained by measuring intensity of X-rays emitted from the reference specimen due to penetration of the e-beam. In some embodiments, the measurement operation for the reference specimens is performed prior to operation 110. In some embodiments, the measurement operation for the reference specimens is performed after operation 110.

[0127] In some embodiments, the measured X-ray intensities of the reference specimens include background radiation intensities similar to those measured for the tested specimen.

[0128] In some embodiments, the e-beam landing energies used for projecting on the reference specimens include the e-beam landing energies used for projecting on the tested specimen. In some embodiments, the e-beam landing energies used for projecting on the reference specimens are the same as the e-beam landing energies used for projecting on the tested specimen.

[0129] In some embodiments, the method of the invention further includes obtaining structural parameters for the reference specimens. As explained above, obtaining structural parameters may be conducted by various methods, destructive or non-destructive, including as non-limiting examples ToF-SIMS and TEM-EDX.

[0130] It is noted that obtaining structural parameters, when conducted by a method which may be destructive to the reference specimen (i.e., causing any change to the specimen), must be conducted after the measurement operation.

[0131] In some embodiments, values of key features are extracted from the measurement data for the reference specimens, as described herein for the tested specimen. In some embodiments, the key features are the same key features used for the tested specimen. In some embodiments, the extracting of values of key features from the measurement data for the reference specimens is conducted by the same methods used for extracting values of key features from the measurement data for the tested specimen.

[0132] Accordingly, in some embodiments, the method further includes a data analysis operation for each of the reference specimens, wherein values of key features, specified by a vector $\vec{f}_n$, are extracted from the reference measurement data. In some embodiments, the reference data analysis operation is performed prior to suboperation 120a. In some embodiments, the reference data analysis operation is performed after suboperation 120a.

[0133] According to some embodiments, the reference specimens may be selected so as to reflect expected variations (e.g., due to manufacturing imperfections) in the values of the one or more structural parameters between specimens of the same intended design. In some embodiments, the values $\vec{p}_n$ of the structural parameters may be selected to sample a selected hypervolume centered about $\vec{p}_0$ in a $K_p$-dimensional vector space defined by the one or more structural parameters. Here $\vec{p}_0$ specifies nominal values of the one or more structural parameters, and $K_p$ is the number of structural parameters. In some embodiments, $K_p$ is about 2-6, or about 3-5. The size and boundaries of the selected hypervolume may be selected so as to encompass the expected variation of the one or more structural parameters. According to some embodiments, the number of reference specimens $N_2$ is greater than $2K_p$ (or at least strictly greater than $K_p$). In some embodiments, the number of reference specimens $N_2$ is between about $3K_p$ and $7K_p$, or between about $5K_p$ and $7K_p$. As a non-limiting example, according to some embodiments, the $\vec{p}_n$ may be selected such that, in terms of the deviations from nominal values $\vec{\delta}_n = \vec{p}_n - \vec{p}_0$, the $\vec{\delta}_n$ include each vector in the set of $2K_p$ vectors $\{\pm\vec{d}_k\}_{k=1}^{K_p}$. For each $1 \le k \le K_p$, $\vec{d}_k$ is about equal to $\sigma_k \hat{e}_k$, wherein $\sigma_k$ is the expected standard deviation of the $k$-th structural parameter, and $\hat{e}_k$ is a unit vector pointing along the $k$-th axis of the $K_p$-dimensional vector space defined by the $K_p$ structural parameters.
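As an illustration of the symmetric design above, the following Python sketch (assuming NumPy; the function name `design_reference_params` is hypothetical) builds the $2K_p$ nominal target vectors $\vec{p}_0 \pm \sigma_k \hat{e}_k$. It only generates design targets; the $\vec{p}_n$ of actually fabricated reference specimens would still be measured afterwards and will generally deviate from these targets.

```python
import numpy as np

def design_reference_params(p0, sigma):
    """Return the 2*K_p parameter vectors p0 + d_k and p0 - d_k, with d_k = sigma_k * e_k.

    p0:    nominal structural-parameter values, shape (K_p,)
    sigma: expected standard deviation of each parameter, shape (K_p,)
    """
    p0 = np.asarray(p0, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    steps = np.diag(sigma)                 # row k is d_k = sigma_k * e_k
    deltas = np.vstack([steps, -steps])    # the set {+d_k} followed by {-d_k}
    return p0 + deltas                     # shape (2*K_p, K_p)

# Hypothetical example with K_p = 3 parameters.
P = design_reference_params([10.0, 2.0, 0.5], [0.5, 0.1, 0.05])
```

Because the deviations come in $\pm$ pairs, the design is centered on $\vec{p}_0$, which keeps a later linear fit around $\vec{p}_0$ well conditioned.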

[0134] In some embodiments, all of the {right arrow over (p)}.sub.n relating to the reference specimens are within the selected hypervolume. In some embodiments, at least one of the {right arrow over (p)}.sub.n relating to the reference specimens is within the selected hypervolume. In some embodiments, at least one of the {right arrow over (p)}.sub.n of the reference specimens is not within the selected hypervolume.

[0135] It is noted that in embodiments wherein the one or more structural parameters are constituted by a single structural parameter, $\vec{p}_s$ and each of the $\vec{p}_n$ are one-dimensional vectors (i.e., scalars). Non-limiting examples include embodiments wherein only the thickness of a single layer, or the overall mass density of a single target substance, is to be determined.

[0136] It is noted that any specific implementation, such as a model or an algorithm used in the present invention, may be applied to the complete set of vectors $\vec{f}_n$ or to a subset of the vectors $\vec{f}_n$.

[0137] According to some embodiments, and as expanded on below in the description of FIG. 3, in suboperation 120b the minimum distance between $\vec{f}_{\mathrm{key}}$ and a vector-valued function $\vec{f}_{\mathrm{ext}}(\vec{p})$ (extrapolated from the $\vec{f}_n$) may be computed, thereby obtaining $\vec{p}_s$. Each component of the vector-valued function quantifies the dependence of a respective (extrapolated) key feature on the values of the one or more structural parameters. The one or more structural parameters are parameterized by $\vec{p}$, a vector of free parameters (i.e., variables corresponding to the respective structural parameters). As detailed below, according to some embodiments, the minimum distance may be obtained by minimizing over $\vec{p}$, with each component of $\vec{p}$ being varied over a respective continuous range of values.

[0138] According to some embodiments, and as described in detail below in the description of FIG. 3, $\vec{p}_s$ minimizes a loss function of $\vec{f}_{\mathrm{key}}$ and of a vector-valued function $\vec{f}_{\mathrm{ext}}(\vec{p})$ of the key features that is extrapolated from $\{\vec{f}_n\}_{n=1}^{N}$. According to some embodiments, apart from a first term which depends on both $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$, the loss function may additionally include one or more regularizing terms, e.g., in order to stabilize the solution, or as constraints reflecting prior knowledge about the one or more structural parameters. According to some embodiments, the first term corresponds to a mathematical distance between $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$. According to some embodiments, standard local and/or global optimization tools may be used to minimize the loss function and thereby estimate $\vec{p}_s$. Alternatively, according to some embodiments wherein the optimization problem defined by minimizing the loss function admits a known analytical solution, $\vec{p}_s$ may be computed directly from the analytical solution (i.e., from the function defining it).

[0139] According to some embodiments, and as expanded on below, $\vec{p}_s$ is estimated by computing distances from $\vec{f}_{\mathrm{key}}$ to the $\vec{f}_n$.

[0140] Alternatively, according to some embodiments, $\vec{p}_s$ may be obtained by applying a k-nearest neighbor (k-NN) regression algorithm (with $k < N$) to $\vec{f}_{\mathrm{key}}$ with respect to the $\vec{f}_n$. According to some embodiments, the $\vec{p}_n$ include $\vec{p}_0$ (e.g., a vector specifying nominal values of the one or more structural parameters). It is noted that the k-NN regression algorithm may be weighted or non-weighted. That is, $\vec{p}_s$ may be taken to equal the average or the weighted average of the $\vec{p}_n$ corresponding to the $k$ closest $\vec{f}_n$.
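A minimal k-NN regression sketch in Python, assuming NumPy, Euclidean distance in key-feature space, and inverse-distance weights for the weighted variant. The specification does not prescribe a particular distance or weighting, so both are illustrative assumptions, and `knn_estimate` is a hypothetical name.

```python
import numpy as np

def knn_estimate(f_key, F_ref, P_ref, k, weighted=False, eps=1e-12):
    """Estimate p_s from the k reference specimens whose f_n are closest to f_key.

    f_key: key features of the tested specimen, shape (K_f,)
    F_ref: reference key-feature vectors f_n as rows, shape (N, K_f)
    P_ref: measured reference parameters p_n as rows, shape (N, K_p)
    """
    f_key = np.asarray(f_key, dtype=float)
    F_ref = np.asarray(F_ref, dtype=float)
    P_ref = np.asarray(P_ref, dtype=float)
    dist = np.linalg.norm(F_ref - f_key, axis=1)   # distance from f_key to each f_n
    nearest = np.argsort(dist)[:k]                  # indices of the k closest f_n
    if not weighted:
        return P_ref[nearest].mean(axis=0)          # plain average of their p_n
    w = 1.0 / (dist[nearest] + eps)                 # inverse-distance weights (assumed choice)
    return (w[:, None] * P_ref[nearest]).sum(axis=0) / w.sum()
```

In practice the key features may need to be scaled to comparable units before computing distances, since otherwise the feature with the largest numeric range dominates the neighbor search.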

[0141] In some embodiments, when the k-NN regression algorithm is weighted, the weights are determined through multiple iterations of the k-NN algorithm. In each iteration, a different reference specimen is left out of the computation and treated as the tested specimen, and the values of its structural parameters are predicted from the remaining GT specimens. The weights which were used for the closest prediction are selected for the estimator.
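One possible reading of this leave-one-out procedure is sketched below in Python with NumPy. It assumes that the candidate weightings are inverse-distance weights $1/d^{\alpha}$ indexed by a hypothetical exponent `alpha` ($\alpha = 0$ reproduces the non-weighted average), and that "closest prediction" is scored as the summed squared error over all left-out specimens. Both are assumptions, not the specification's prescription.

```python
import numpy as np

def weighted_knn(f_key, F_ref, P_ref, k, alpha, eps=1e-12):
    dist = np.linalg.norm(F_ref - f_key, axis=1)
    nearest = np.argsort(dist)[:k]
    w = 1.0 / (dist[nearest] + eps) ** alpha     # alpha = 0 gives equal weights
    return (w[:, None] * P_ref[nearest]).sum(axis=0) / w.sum()

def select_alpha_loo(F_ref, P_ref, k, alphas):
    """Pick the weighting exponent with the smallest leave-one-out error."""
    F_ref = np.asarray(F_ref, dtype=float)
    P_ref = np.asarray(P_ref, dtype=float)
    N = len(F_ref)
    best_alpha, best_err = None, np.inf
    for alpha in alphas:
        err = 0.0
        for n in range(N):
            keep = np.arange(N) != n             # leave the n-th reference specimen out
            p_hat = weighted_knn(F_ref[n], F_ref[keep], P_ref[keep], k, alpha)
            err += np.sum((p_hat - P_ref[n]) ** 2)   # squared error of the prediction
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err
```

Since every reference specimen is predicted once per candidate, the cost grows as (number of candidates) × N k-NN queries, which is small for the reference-set sizes discussed above.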

[0142] In some embodiments, the k-NN algorithm is applied to the complete set of reference data. In some embodiments, the k-NN algorithm is applied to a partial set of reference data.

[0143] Alternatively, $\vec{p}_s$ may be taken to equal the median of the $\vec{p}_n$ corresponding to the $k$ vectors $\vec{f}_n$ closest to $\vec{f}_{\mathrm{key}}$. In general, i.e., when the one or more structural parameters include two or more structural parameters, the term "median" is to be understood as referring to a multivariate extension of the one-dimensional median, such as the marginal median or the geometric median. More generally, $\vec{p}_s$ may be substantially any function of the $\vec{p}_n$ corresponding to the $k$ vectors $\vec{f}_n$ closest to $\vec{f}_{\mathrm{key}}$. In some embodiments, the median is a weighted median, assigning different weights to different data points.
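The two multivariate medians named above can be sketched in Python with NumPy as follows. The marginal median is the component-wise median. For the geometric median, Weiszfeld's iteration, a standard method, is swapped in as one way to compute it; the early return when an iterate coincides with a data point is a simplification. `knn_median` is a hypothetical name.

```python
import numpy as np

def knn_median(f_key, F_ref, P_ref, k, kind="marginal", n_iter=200, tol=1e-10):
    """Median of the p_n belonging to the k f_n closest to f_key."""
    F_ref = np.asarray(F_ref, dtype=float)
    P_ref = np.asarray(P_ref, dtype=float)
    dist = np.linalg.norm(F_ref - np.asarray(f_key, dtype=float), axis=1)
    Q = P_ref[np.argsort(dist)[:k]]              # p_n of the k closest f_n
    if kind == "marginal":
        return np.median(Q, axis=0)              # component-wise median
    # Geometric median via Weiszfeld iterations, started from the mean.
    y = Q.mean(axis=0)
    for _ in range(n_iter):
        d = np.linalg.norm(Q - y, axis=1)
        if np.any(d < tol):                      # iterate sits on a data point (simplified handling)
            return y
        w = 1.0 / d
        y_new = (w[:, None] * Q).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

Unlike the average, both medians are insensitive to a single reference specimen with outlying parameter values among the $k$ neighbors.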

[0144] According to still other embodiments, $\vec{p}_s$ is estimated as the output of a neural network. The neural network is configured to receive as input a vector of key features (i.e., $\vec{f}_{\mathrm{key}}$) and to output $\vec{p}_s$. The neural network is trained using a training set including $N$ pairs of vectors, such that, for each $1 \le n \le N$, $\vec{f}_n$ serves as the training input and $\vec{p}_n$ as the corresponding training output. In some embodiments, the neural network algorithm is configured to treat different data points differently, e.g., by using nonuniform weights in a loss function.
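A deliberately small NumPy sketch of such a regressor is shown below. It is not the network of any particular embodiment: it assumes one tanh hidden layer, standardized inputs and outputs, and full-batch gradient descent on the mean squared error. The hidden size, learning rate, and epoch count are arbitrary assumptions, and `train_mlp` is a hypothetical name.

```python
import numpy as np

def train_mlp(F, P, hidden=16, lr=0.02, epochs=3000, seed=0):
    """Train a one-hidden-layer network mapping key features f_n to parameters p_n."""
    rng = np.random.default_rng(seed)
    F = np.asarray(F, dtype=float)
    P = np.asarray(P, dtype=float)
    # Standardize inputs and outputs so a single learning rate suits all components.
    fm, fs = F.mean(axis=0), F.std(axis=0) + 1e-12
    pm, ps = P.mean(axis=0), P.std(axis=0) + 1e-12
    X, Y = (F - fm) / fs, (P - pm) / ps
    N, Kf = X.shape
    Kp = Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(Kf), (Kf, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, Kp)); b2 = np.zeros(Kp)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden activations, (N, hidden)
        out = H @ W2 + b2                      # standardized predictions, (N, Kp)
        G = 2.0 * (out - Y) / N                # gradient of the loss w.r.t. out
        gW2, gb2 = H.T @ G, G.sum(axis=0)
        GH = (G @ W2.T) * (1.0 - H ** 2)       # backpropagate through tanh
        gW1, gb1 = X.T @ GH, GH.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    def predict(f):
        Xf = (np.atleast_2d(np.asarray(f, dtype=float)) - fm) / fs
        return (np.tanh(Xf @ W1 + b1) @ W2 + b2) * ps + pm

    return predict
```

Nonuniform treatment of data points, as mentioned above, could be added by multiplying each row of `G` by a per-specimen weight.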

[0145] FIG. 3 presents a flowchart of a measurement data analysis operation 300, which corresponds to specific embodiments of measurement data analysis operation 120 of method 100. Measurement data analysis operation 300 includes:

[0146] A suboperation 310, wherein key features, specified by a vector $\vec{f}_{\mathrm{key}}$, are extracted from the measurement data.

[0147] A suboperation 320, wherein $\vec{p}_s$ is obtained as the (numerical or analytical) solution of the minimization of a loss function depending on at least $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$ (that is, $\vec{p}_s$ minimizes the loss function). Here $\vec{f}_{\mathrm{ext}}(\vec{p})$ is a vector-valued function of the key features obtained through extrapolation from $\{\vec{f}_n\}_{n=1}^{N}$.

[0148] Suboperations 310 and 320 correspond to specific embodiments of suboperations 120a and 120b, respectively, of method 100.

[0149] Each component of $\vec{f}_{\mathrm{ext}}(\vec{p})$ is a function quantifying the dependence, as prescribed by a model, of the corresponding key feature on the values $\vec{p}$ of the one or more structural parameters. Thus, for example, $f_{\mathrm{ext}}^{(j)}(\vec{p})$, the $j$-th component of $\vec{f}_{\mathrm{ext}}(\vec{p})$, is a function quantifying the dependence, per the model, of the $j$-th key feature (e.g., the $j$-th component of the energy signature) on $\vec{p}$.

[0150] According to some embodiments, in suboperation 320,

$$\vec{p}_s = \arg\min_{\vec{p}} D\big(\vec{f}_{\mathrm{key}}, \vec{f}_{\mathrm{ext}}(\vec{p})\big). \tag{00011}$$

Here $D(\vec{v}_1, \vec{v}_2)$ denotes a mathematical distance (which may or may not be induced by a norm) between a pair of vectors $\vec{v}_1$ and $\vec{v}_2$, so that $D(\vec{f}_{\mathrm{key}}, \vec{f}_{\mathrm{ext}}(\vec{p}))$ is a distance between $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$. Non-limiting examples of distances include those induced by the $L^1$ norm and the $L^2$ norm. (It is noted that in embodiments wherein the norm is $L^2$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$ is linear in $\vec{p}$, the optimization problem admits an analytical solution.) According to some such embodiments,

$$\vec{p}_s = \arg\min_{\vec{p}} \big\|\vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p})\big\|. \tag{00012}$$

The double vertical bars denote a vector norm, e.g., the Euclidean norm:

$$\big\|\vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p})\big\|^2 = \sum_{j} \big(f_{\mathrm{key}}^{(j)} - f_{\mathrm{ext}}^{(j)}(\vec{p})\big)^2. \tag{00013}$$

More generally, and as elaborated on below, $\vec{p}_s$ may be estimated via

$$\vec{p}_s = \arg\min_{\vec{p}} \Big(\min_{M_1, M_2} D\big(M_1\vec{f}_{\mathrm{key}}, M_2\vec{f}_{\mathrm{ext}}(\vec{p})\big)\Big), \tag{00014}$$

wherein $M_1$ and $M_2$ are matrices having suitably selected properties. In particular, each of $M_1$ and $M_2$ may be a positive definite matrix (optionally, diagonal) with a respective minimum eigenvalue greater than a respective prespecified positive threshold.

[0151] According to some embodiments, one or more regularizing terms may be added to the norm $\|\vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p})\|$ (or, more generally, to $\|M_1\vec{f}_{\mathrm{key}} - M_2\vec{f}_{\mathrm{ext}}(\vec{p})\|$, to $D(M_1\vec{f}_{\mathrm{key}}, M_2\vec{f}_{\mathrm{ext}}(\vec{p}))$, or to a first term of a loss function which depends on $\vec{f}_{\mathrm{key}}$ and $\vec{f}_{\mathrm{ext}}(\vec{p})$), either to stabilize the solution or as constraints reflecting prior knowledge about the one or more structural parameters.
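As a sketch of the minimization in suboperation 320 with a regularizing term, the Python code below (assuming NumPy) minimizes $\|\vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p})\|^2 + \lambda\|\vec{p} - \vec{p}_{\mathrm{prior}}\|^2$ by Gauss-Newton iterations. The Tikhonov term, the parameters `lam` and `p_prior`, and the forward-difference Jacobian are illustrative assumptions, and `estimate_p` is a hypothetical name; any differentiable $\vec{f}_{\mathrm{ext}}$ can be passed in.

```python
import numpy as np

def estimate_p(f_key, f_ext, p_init, lam=0.0, p_prior=None, n_iter=50, h=1e-6, tol=1e-12):
    """Minimize ||f_ext(p) - f_key||^2 + lam * ||p - p_prior||^2 by Gauss-Newton."""
    p = np.asarray(p_init, dtype=float).copy()
    p_prior = p.copy() if p_prior is None else np.asarray(p_prior, dtype=float)
    f_key = np.asarray(f_key, dtype=float)
    for _ in range(n_iter):
        r = f_ext(p) - f_key                             # residual, shape (K_f,)
        # Forward-difference Jacobian of f_ext at p, shape (K_f, K_p).
        J = np.column_stack([(f_ext(p + h * e) - f_ext(p)) / h for e in np.eye(p.size)])
        # Normal equations of the linearized, regularized problem.
        lhs = J.T @ J + lam * np.eye(p.size)
        rhs = -(J.T @ r + lam * (p - p_prior))
        step = np.linalg.solve(lhs, rhs)
        p += step
        if np.linalg.norm(step) < tol:
            break
    return p
```

With `lam=0` this reduces to the unregularized problem; a large `lam` pulls the estimate toward `p_prior`, which is one way of encoding prior knowledge such as nominal values.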

[0152] According to some embodiments, the extrapolation is to a linear function. That is, $\vec{f}_{\mathrm{ext}}(\vec{p}) = \vec{f}_{\mathrm{ext}}(\vec{p}_0 + \vec{\delta}) = \vec{f}_0 + A\vec{\delta} = \vec{f}_0 + A(\vec{p} - \vec{p}_0)$, wherein $\vec{f}_0$ is a vector of values of the key features corresponding to $\vec{p}_0$, and $A$ is a $K_f \times K_p$ matrix. $K_f$ is the number of key features, i.e., the dimensionality of $\vec{f}_{\mathrm{ext}}(\vec{p}_0 + \vec{\delta})$ (and of $\vec{f}_0$ and each of the $\vec{f}_n$). $K_p$ is the dimensionality of $\vec{p}_0$ (and of the $\vec{p}_n$), that is, the number of structural parameters of the tested specimen (and of each of the reference specimens) which are to be estimated. $\vec{f}_0$ specifies values of key features corresponding to a nominal specimen (i.e., one characterized by the nominal values $\vec{p}_0$).

[0153] According to some embodiments, $\vec{f}_0$ includes GT values obtained by actual measurements, in the same way as described above for the tested specimen or for the reference specimens.

[0154] According to some embodiments,

$$A = \arg\min_{B} \left\| \begin{pmatrix} (\vec{f}_1 - \vec{f}_0 - B\vec{\delta}_1)^T \\ (\vec{f}_2 - \vec{f}_0 - B\vec{\delta}_2)^T \\ \vdots \\ (\vec{f}_N - \vec{f}_0 - B\vec{\delta}_N)^T \end{pmatrix} \right\|. \tag{00015}$$

[0155] For each $1 \le n \le N$, $\vec{\delta}_n = \vec{p}_n - \vec{p}_0$. $B$ is a $K_f \times K_p$ matrix. The double vertical bars denote a matrix norm (e.g., the Frobenius norm). Optionally, according to some embodiments, one or more regularizing terms may be added to the matrix norm, in which case the minimization is to be understood as being over the sum of the matrix norm and the regularizing terms.
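For the Frobenius norm, fitting $A$ with $\vec{f}_0$ given is an ordinary least-squares problem: the stacked rows equal $R - \Delta B^T$, where the rows of $R$ are the $(\vec{f}_n - \vec{f}_0)^T$ and the rows of $\Delta$ are the $\vec{\delta}_n^T$. A minimal Python sketch assuming NumPy (`fit_linear_A` is a hypothetical name):

```python
import numpy as np

def fit_linear_A(F_ref, P_ref, p0, f0):
    """Frobenius least-squares fit of A in f_ext(p) = f0 + A (p - p0).

    F_ref: reference key features f_n as rows, shape (N, K_f)
    P_ref: reference structural parameters p_n as rows, shape (N, K_p)
    """
    Delta = np.asarray(P_ref, dtype=float) - np.asarray(p0, dtype=float)  # rows delta_n
    R = np.asarray(F_ref, dtype=float) - np.asarray(f0, dtype=float)      # rows f_n - f0
    # Minimizing ||R - Delta @ B.T||_F over B is solved column by column by lstsq.
    Bt, *_ = np.linalg.lstsq(Delta, R, rcond=None)
    return Bt.T                                                            # A, shape (K_f, K_p)
```

In the column convention used for the analytical solution of this problem, the same result is $A = (F - F_0)\,\tilde{D}^{+}$, where $\tilde{D}^{+}$ is the Moore-Penrose inverse of the matrix whose columns are the $\vec{\delta}_n$; `lstsq` simply avoids forming that inverse explicitly.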

[0156] Alternatively, according to some embodiments, $\vec{f}_0$, similarly to the matrix $A$, is determined through optimization. In particular, according to some such embodiments, both $\vec{f}_0$ and the matrix $A$ are obtained as the solution of

$$\arg\min_{\vec{g}, B} \left\| \begin{pmatrix} (\vec{f}_1 - \vec{g} - B\vec{\delta}_1)^T \\ (\vec{f}_2 - \vec{g} - B\vec{\delta}_2)^T \\ \vdots \\ (\vec{f}_N - \vec{g} - B\vec{\delta}_N)^T \end{pmatrix} \right\| \tag{00016}$$

with the double vertical bars denoting a matrix norm (e.g., the Frobenius norm). Optionally, according to some embodiments, a regularizing term(s) may be added to the matrix norm, in which case the minimization is to be understood as being over the sum of the matrix norm and the regularizing term(s).
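With the Frobenius norm, the joint problem for $\vec{g}$ (i.e., $\vec{f}_0$) and $B$ can be sketched in Python (assuming NumPy) by prepending a column of ones to the deviations, so that the intercept and the matrix are fitted in a single least-squares solve. `fit_affine` is a hypothetical name.

```python
import numpy as np

def fit_affine(F_ref, P_ref, p0):
    """Jointly fit f0 (as g) and A in f_ext(p) = g + A (p - p0), Frobenius norm."""
    Delta = np.asarray(P_ref, dtype=float) - np.asarray(p0, dtype=float)  # rows delta_n
    F = np.asarray(F_ref, dtype=float)
    X = np.hstack([np.ones((len(Delta), 1)), Delta])   # the ones column carries g
    coef, *_ = np.linalg.lstsq(X, F, rcond=None)       # shape (1 + K_p, K_f)
    g = coef[0]                                        # fitted f0
    A = coef[1:].T                                     # fitted A, shape (K_f, K_p)
    return g, A
```

When the $\vec{\delta}_n$ average to zero, as for a symmetric $\pm\vec{d}_k$ design, the fitted $\vec{g}$ reduces to the mean of the $\vec{f}_n$.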

[0157] In some embodiments, the $\vec{f}_n$ in the matrix are not all treated equally. Some non-limiting exemplary embodiments are provided below.

[0158] In some embodiments,

$$\vec{f}_{\mathrm{ext}} = \arg\min_{\vec{f} \in \mathcal{F}} \sum_{i=1}^{N} D_i\big(\vec{f}(\vec{p}_i), \vec{f}_i\big), \tag{00017}$$

where $\mathcal{F}$ is a predefined class of functions, and $D_i: \mathbb{R}^{K_f} \times \mathbb{R}^{K_f} \to \mathbb{R}$, with $K_f$ the number of key features (i.e., the dimensionality of $\vec{f}$), is a general loss function that may or may not depend on $i$. In some embodiments, all data are treated equally, with $D_i$ being the same for all $i$. However, in some embodiments, certain specimens are treated differently from others by using different $D_i$ for different specimens, or for different groups of specimens.

[0159] Non-limiting examples for $D_i$ include the distances induced by the $L^1$ norm and the $L^2$ norm, discussed above with reference to suboperation 320, and

$$D'(\vec{v}_1, \vec{v}_2) = \min_{M_1, M_2} D(M_1\vec{v}_1, M_2\vec{v}_2), \tag{00018}$$

also discussed above, wherein $\vec{v}_1$ and $\vec{v}_2$ are a pair of vectors (so that $D'(\vec{v}_1, \vec{v}_2)$ is a mathematical distance between $\vec{v}_1$ and $\vec{v}_2$), $D$ is a base distance such as those just mentioned, and $M_1$ and $M_2$ are matrices having suitably selected properties as specified in the above discussion of suboperation 320. As above, different $D_i$ may be used for different specimens, or for different groups of specimens.

[0160] A further non-limiting example for $\mathcal{F}$ is the class of linear functions (more precisely, affine functions, which may include a constant term). Here $\mathcal{F}$ is the class of functions of the form $\vec{f}(\vec{v}) = A\vec{v} + \vec{b}$ (or, e.g., $\vec{f}(\vec{v}) = A\vec{v} + \vec{f}_0$). In this case the optimization becomes

$$A, \vec{b} = \arg\min_{B, \vec{g}} \sum_{i=1}^{N} D_i\big(B\vec{p}_i + \vec{g}, \vec{f}_i\big), \tag{00019}$$

and the resulting $\vec{f}_{\mathrm{ext}}$ is $\vec{f}_{\mathrm{ext}}(\vec{p}) = A\vec{p} + \vec{b}$.

[0161] According to some such embodiments, similarly to embodiments mentioned above with respect to suboperation 320, the function is linear and the distance is the Frobenius/$L^2$ norm, with $D_i(\vec{v}_1, \vec{v}_2) = w_i\|\vec{v}_1 - \vec{v}_2\|^2$, $w_i$ being a weight. Accordingly, the matrix $A$ equals

$$\arg\min_{B} \sum_{i=1}^{N} w_i \big\|\vec{f}_i - \vec{f}_0 - B\vec{\delta}_i\big\|^2, \tag{00020}$$

or both $\vec{f}_0$ and the matrix $A$ are obtained as the solution of

$$\arg\min_{\vec{g}, B} \sum_{i=1}^{N} w_i \big\|\vec{f}_i - \vec{g} - B\vec{\delta}_i\big\|^2. \tag{00021}$$

As noted above regarding the use of different $D_i$ for different specimens or groups of specimens, here the $w_i$ may be the same for all data (resulting in equal treatment of all data) or different for different data. Optionally, according to some embodiments, one or more regularizing terms may be added to the sum, in which case the minimization is to be understood as being over the sum of the weighted norms and the regularizing terms.

[0162] According to some embodiments, the extrapolation may be to a non-linear function of $\vec{\delta}$. As a non-limiting example, according to some such embodiments, $\vec{f}_{\mathrm{ext}}(\vec{p}_0 + \vec{\delta})$ is a quadratic function of $\vec{\delta}$. That is, in such embodiments, for each $1 \le c \le K_f$, the $c$-th component of $\vec{f}_{\mathrm{ext}}(\vec{p}_0 + \vec{\delta})$ includes (in addition to a linear contribution and a constant) a quadratic contribution given by $\sum_{b=1}^{K_p}\sum_{a=1}^{K_p} T^{(c)}_{ab}\,\delta_a\delta_b$, wherein $T^{(c)}_{ab}$ denotes the $(a, b)$-th component of a $K_p \times K_p$ matrix $T^{(c)}$.
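A quadratic extrapolation of this kind can be fitted by ordinary least squares on monomial features $[1, \delta_a, \delta_a\delta_b\ (a \le b)]$, as in the Python sketch below (assuming NumPy; function names are hypothetical). Only the symmetric part of each $T^{(c)}$ is identifiable from data, which is why only the products with $a \le b$ are used, and at least $1 + K_p + K_p(K_p+1)/2$ reference specimens in general position are needed.

```python
import numpy as np

def quad_features(Delta):
    """Map each deviation vector (row) to [1, delta_a, delta_a * delta_b for a <= b]."""
    Delta = np.atleast_2d(np.asarray(Delta, dtype=float))
    N, Kp = Delta.shape
    a, b = np.triu_indices(Kp)                        # index pairs with a <= b
    return np.hstack([np.ones((N, 1)), Delta, Delta[:, a] * Delta[:, b]])

def fit_quadratic(F_ref, P_ref, p0):
    """Least-squares fit of a constant + linear + quadratic f_ext around p0."""
    p0 = np.asarray(p0, dtype=float)
    Delta = np.asarray(P_ref, dtype=float) - p0
    coef, *_ = np.linalg.lstsq(quad_features(Delta), np.asarray(F_ref, dtype=float), rcond=None)

    def f_ext(p):
        p = np.asarray(p, dtype=float)
        out = quad_features(p - p0) @ coef            # one row per parameter vector
        return out[0] if p.ndim == 1 else out

    return f_ext
```

The returned `f_ext` can be passed to a general minimizer of $\|\vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p})\|$, since the quadratic model no longer admits the closed-form solution of the linear case.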

[0163] Optimization problems specified throughout the application may be solved using standard local and/or global optimization algorithms, such as gradient descent or quasi-Newton methods. According to some embodiments wherein a specified optimization problem admits a known analytical solution, the quantity sought (e.g., $\vec{p}_s$, the matrix $A$, or $\vec{f}_0$) may be computed directly from the analytical solution. As a non-limiting example, in embodiments wherein $\vec{f}_{\mathrm{ext}}(\vec{p})$ is linear in $\vec{p}$ (that is, $\vec{f}_{\mathrm{ext}}(\vec{p}) = \tilde{A}\vec{p} + \vec{b}$, with $\tilde{A}$ being a matrix) and the norm is $L^2$, the optimization problem assumes the form

$$\arg\min_{\vec{p}} \big\|\tilde{A}\vec{p} + \vec{b} - \vec{f}_{\mathrm{key}}\big\|, \tag{00022}$$

so that $\vec{p}_s = \tilde{M}(\vec{f}_{\mathrm{key}} - \vec{b})$, wherein the matrix $\tilde{M}$ is the Moore-Penrose inverse of $\tilde{A}$. As yet another example, in embodiments wherein

$$\arg\min_{B} \left\| \begin{pmatrix} (\vec{f}_1 - \vec{f}_0 - B\vec{\delta}_1)^T \\ (\vec{f}_2 - \vec{f}_0 - B\vec{\delta}_2)^T \\ \vdots \\ (\vec{f}_N - \vec{f}_0 - B\vec{\delta}_N)^T \end{pmatrix} \right\| \tag{00023}$$

admits a known analytical solution, as is the case, for instance, when the matrix norm is the Frobenius norm, $A$ may be obtained by plugging the $\vec{f}_n$ and the $\vec{\delta}_n$ into the analytical solution. More precisely, the analytical solution is given by $A = (F - F_0)Q$, wherein $Q$ is the Moore-Penrose inverse of a matrix $\tilde{D}$ whose columns are constituted by the $\vec{\delta}_n$, $F$ is a matrix whose columns are constituted by the $\vec{f}_n$, and $F_0$ is a matrix each of whose columns is $\vec{f}_0$. Finally, in embodiments wherein

$$\arg\min_{\vec{g}, B} \left\| \begin{pmatrix} (\vec{f}_1 - \vec{g} - B\vec{\delta}_1)^T \\ (\vec{f}_2 - \vec{g} - B\vec{\delta}_2)^T \\ \vdots \\ (\vec{f}_N - \vec{g} - B\vec{\delta}_N)^T \end{pmatrix} \right\| \tag{00024}$$

admits a known analytical solution, as is the case, for example, when the matrix norm is the Frobenius norm, $\vec{f}_0$ and $A$ may be obtained by plugging the $\vec{f}_n$ and the $\vec{\delta}_n$ into the analytical solution. Through suitable manipulation, the analytical solution may be obtained in essentially the same manner as in the case wherein only $A$ is to be determined (i.e., when $\vec{f}_0$ is given).

[0164] Optionally, according to some embodiments, measurement data analysis operation 300 may further include, prior to suboperation 320, an optional suboperation (not specified in FIG. 3) of obtaining $\{\vec{f}_n\}_{n=1}^{N}$ by subjecting $\vec{f}_{\mathrm{key}}$ to a $k$-NN classifier, with $k = N$, with respect to a larger set of $N' > N$ vectors of key features $\{\vec{f}_i\}_{i=1}^{N'}$ corresponding to $N' > N$ additional reference specimens, which include the $\vec{f}_n$ (optionally, relabeled). The full set of $N'$ vectors $\vec{p}_i$ may be selected as described above in the description of method 100.

[0165] Referring again to method 100, according to some embodiments, in suboperation 120a (and therefore also in suboperation 310), in order to derive $\vec{f}_{\mathrm{key}}$, a respective curve is fitted onto each of the X-ray emission spectra (obtained for each of the e-beams projected in measurement operation 110). This is illustrated by way of example in FIGS. 4A-4E, according to some embodiments, for the case wherein $\vec{f}_{\mathrm{key}}$ is given by the energy signature associated with a single target substance and a single spectral line (i.e., a single characteristic X-ray line). A more general case, wherein the energy signature is associated with a plurality of target substances, and/or wherein a plurality of spectral lines is taken into account for at least some of the target substances, is described later on.

[0166] In some embodiments, in order to derive the $\vec{f}_n$, a respective curve is fitted onto each of the X-ray emission spectra obtained for the reference specimens, by the same methods described herein for deriving $\vec{f}_{\mathrm{key}}$.

[0167] Referring to FIG. 4A, FIG. 4A depicts a measured (X-ray emission) spectrum 400, which was obtained by implementing measurement operation 110 with respect to a tested specimen (e.g., specimen 20). As is also the case in each of FIGS. 4B-4E, the horizontal axis corresponds to the photon energy $\varepsilon$ (or, equivalently, the frequency) of the emitted X-rays and the vertical axis to the intensity $I$ of the emitted X-rays. The graduations on each of the horizontal and vertical axes are linearly spaced apart, with $\varepsilon_i < \varepsilon_{i+1}$ and $I_i < I_{i+1}$. A peak 410 of measured spectrum 400 is substantially centered about a characteristic X-ray line of a target substance, which is included in the tested specimen and whose energy signature is to be obtained. FIG. 4B depicts an optimized curve 450, which was fitted onto measured spectrum 400. FIG. 4C depicts optimized curve 450 superimposed on measured spectrum 400.

[0168] According to some embodiments, the fitting onto measured spectrum 400 involves optimizing over values of one or more adjustable parameters of a curve (also referred to as the free curve), thereby obtaining optimized curve 450. The values of the one or more adjustable parameters are fixed by minimizing (over the one or more adjustable parameters) a distance between the free curve and the measured spectrum.

The one or more adjustable parameters may include a (first) adjustable parameter whose value is indicative of an intensity of the emitted X-rays about the characteristic X-ray line of the target substance. According to some such embodiments, the adjustable parameter is a multiplicative coefficient of a normalized cap-shaped function (e.g., a normalized gaussian), which may be centered about the characteristic X-ray line. According to some embodiments, the one or more adjustable parameters include a plurality of adjustable parameters, which may include, in addition to the first adjustable parameter, an additive bias parameter, at least one parameter governing a shape of the cap-shaped function (e.g., the width of a normalized gaussian), and/or a (characteristic X-ray) line shift parameter governing the location of the center of the cap-shaped function.

[0169] More generally, according to some embodiments, the free curve may be a sum of at least two adjustable functions: an adjustable cap-shaped function, which may be centered about the characteristic X-ray line, and an adjustable second function quantifying the (continuous) spectrum of the bremsstrahlung (i.e., background radiation) component of the respective measured X-ray emission spectrum (e.g., the background radiation in the vicinity of the characteristic X-ray line). As a non-limiting example, the at least one landing energy includes $N_E$ e-beam landing energies $\{E_i\}_{i=1}^{N_E}$, so that $N_E$ X-ray emission spectra $\{s_i(\varepsilon)\}_{i=1}^{N_E}$ are measured. Here $\varepsilon$ denotes a photon energy of the emitted X-rays, and $s_i(\varepsilon)$, the $i$-th measured X-ray emission spectrum, is the spectrum induced by projecting an e-beam at the landing energy $E_i$. According to some embodiments, a set of $N_E$ free curves $\{c_i(\varepsilon)\}_{i=1}^{N_E}$ may be fitted onto the set of measured spectra $\{s_i(\varepsilon)\}_{i=1}^{N_E}$. According to some embodiments, for each $1 \le i \le N_E$, $c_i(\varepsilon) = G_i(\varepsilon) + b_i(\varepsilon)$, wherein $G_i(\varepsilon)$ is the adjustable cap-shaped function and $b_i(\varepsilon)$ is the adjustable second function. $G_i(\varepsilon) = a_i \cdot g_i(\varepsilon)$, wherein $g_i(\varepsilon)$ is a normalized cap-shaped function and $a_i$ is a multiplicative coefficient. According to some embodiments, $g_i(\varepsilon)$ may be a (normalized) gaussian, in which case the width and, optionally, the center of $g_i(\varepsilon)$ may be adjustable parameters (over which the optimization is carried out). According to some alternative embodiments, $g_i(\varepsilon)$ may be a (normalized) gamma distribution or generalized gaussian distribution. According to some embodiments, $b_i(\varepsilon)$ may be a polynomial (e.g., a first-order or second-order polynomial) whose coefficients are adjustable. Alternatively, according to some embodiments, $b_i(\varepsilon)$ may be determined from Kramers' law.

[0170] Since $g_i(\varepsilon)$ is normalized, $a_i$ substantially equals the intensity of the X-rays (or, equivalently, the number of photons) emitted due to transitions corresponding to the characteristic X-ray line of the target substance and collected (detected) by the X-ray measurement module.

[0171] Denoting by $\{g_{i,j}\}_{j=1}^{j_{\max}}$ and $\{b_{i,k}\}_{k=0}^{k_{\max}}$ the adjustable parameters of $g_i(\varepsilon)$ and $b_i(\varepsilon)$, respectively, for each $1 \le i \le N_E$ the optimized values $\hat{a}_i$, $\{\hat{g}_{i,j}\}_{j=1}^{j_{\max}}$, and $\{\hat{b}_{i,k}\}_{k=0}^{k_{\max}}$ of the adjustable parameters may be obtained by minimizing $D(c_i(\varepsilon), s_i(\varepsilon))$ over $a_i$, $\{g_{i,j}\}_{j=1}^{j_{\max}}$, and $\{b_{i,k}\}_{k=0}^{k_{\max}}$, where $D(c_i(\varepsilon), s_i(\varepsilon))$ is a distance between $c_i(\varepsilon)$ and $s_i(\varepsilon)$. More generally, according to some embodiments, the optimized values may be obtained by minimizing a loss function depending at least on $c_i(\varepsilon)$ and $s_i(\varepsilon)$. As a non-limiting example, according to some embodiments wherein $g_i(\varepsilon)$ is gaussian and $b_i(\varepsilon)$ is a second-order polynomial: (i) $\{g_{i,j}\}_{j=1}^{2} = \{g_{i,1}, g_{i,2}\}$, with $g_{i,1}$ and $g_{i,2}$ parameterizing the width and center of the gaussian, respectively; and (ii) $\{b_{i,k}\}_{k=0}^{2} = \{b_{i,0}, b_{i,1}, b_{i,2}\}$, with $b_{i,0}$, $b_{i,1}$, and $b_{i,2}$ being the zeroth-order, first-order, and second-order coefficients of the polynomial. In particular,

[00025] $$\hat{a}_i = \arg\min_{a_i} \min_{\{g_{i,j}\}_{j=1}^{j_{\max}},\ \{b_{i,k}\}_{k=0}^{k_{\max}}} D\big(c_i(\varepsilon), s_i(\varepsilon)\big).$$

According to some embodiments, $D(c_i(\varepsilon), s_i(\varepsilon)) = \int d\varepsilon\, |c_i(\varepsilon) - s_i(\varepsilon)|^2$ (or a discretized equivalent expression). According to some embodiments, a regularization term may be added to $D(c_i(\varepsilon), s_i(\varepsilon))$ to take into account prior knowledge regarding any of the free parameters and/or to stabilize the solution (of the minimization algorithm).
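A minimal sketch of the single-spectrum fit of [0169]-[0171], for a normalized gaussian plus a second-order polynomial background and the discretized squared distance $D$. Instead of a general nonlinear optimizer, it uses a swapped-in technique: because $a_i$ and the $b_{i,k}$ enter $c_i(\varepsilon)$ linearly, they are solved by linear least squares at each point of a grid over the gaussian center and width, and the grid point with the smallest residual is kept. The grids and the function names `gaussian` and `fit_peak` are assumptions for illustration.

```python
import numpy as np


def gaussian(eps, center, width):
    """Normalized gaussian g(eps) with unit area over eps."""
    return np.exp(-0.5 * ((eps - center) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))


def fit_peak(eps, spectrum, centers, widths):
    """Fit c(eps) = a * g(eps; center, width) + b0 + b1*eps + b2*eps**2.

    For every (center, width) on the grid, a, b0, b1, b2 enter c(eps) linearly and
    are obtained by linear least squares; the grid point with the smallest sum of
    squared residuals (a discretized D) is returned.
    """
    eps = np.asarray(eps, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    best = None
    for c0 in centers:
        for w in widths:
            # Columns: normalized peak shape, then the polynomial background basis.
            design = np.column_stack([gaussian(eps, c0, w), np.ones_like(eps), eps, eps**2])
            coef, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
            loss = float(np.sum((design @ coef - spectrum) ** 2))
            if best is None or loss < best[0]:
                best = (loss, coef[0], c0, w, coef[1:])
    _, a_hat, center_hat, width_hat, b_hat = best
    return a_hat, center_hat, width_hat, b_hat
```

Because `gaussian` has unit area in $\varepsilon$, `a_hat` is the peak area, which corresponds to the intensity interpretation of [0170] when the spectrum is recorded per unit photon energy.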

[0172] According to some alternative embodiments, wherein there exists prior knowledge relating at least some of the free parameters to one another, the full set of optimized values, i.e., $\{\hat{a}_i, \{\hat{g}_{i,j}\}_{j=1}^{j_{\max}}, \{\hat{b}_{i,k}\}_{k=0}^{k_{\max}}\}_{i=1}^{N_E}$ (or, equivalently, $\{\hat{a}_i, \hat{g}_i(\varepsilon), \hat{b}_i(\varepsilon)\}_{i=1}^{N_E}$, wherein $\hat{g}_i(\varepsilon)$ and $\hat{b}_i(\varepsilon)$ denote the optimized functions defined by $\{\hat{g}_{i,j}\}_{j=1}^{j_{\max}}$ and $\{\hat{b}_{i,k}\}_{k=0}^{k_{\max}}$, respectively), is obtained by jointly optimizing over all of the adjustable parameters, i.e., over $\{a_i, \{g_{i,j}\}_{j=1}^{j_{\max}}, \{b_{i,k}\}_{k=0}^{k_{\max}}\}_{i=1}^{N_E}$, subject to constraints imposed by the aforementioned prior knowledge. More specifically, in such embodiments,

[00026] $$\{\hat{a}_i\}_{i=1}^{N_E} = \arg\min_{\{a_i\}_{i=1}^{N_E}} \min_{\{\{g_{i,j}\}_{j=1}^{j_{\max}},\ \{b_{i,k}\}_{k=0}^{k_{\max}}\}_{i=1}^{N_E}} \sum_{i=1}^{N_E} D\big(c_i(\varepsilon), s_i(\varepsilon)\big) \quad \text{s.t.}\ \{Q_l\}_{l=1}^{N_c},$$

wherein $\{Q_l\}_{l=1}^{N_c}$ is a set of $N_c$ constraints. (That is, each of the $Q_l$ is an equation, or inequality, relating at least some of the free parameters to one another.)
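To illustrate the constrained joint fit of [00026], the sketch below assumes one hypothetical piece of prior knowledge: that the characteristic line has the same center and width in all $N_E$ spectra. These become shared free parameters (the constraints $Q_l$ equate them across spectra), while each spectrum keeps its own $a_i$ and, here, a first-order background. The shared parameters are scanned on a grid and the summed loss is minimized; the names and grids are hypothetical.

```python
import numpy as np


def gaussian(eps, center, width):
    """Normalized gaussian g(eps) with unit area over eps."""
    return np.exp(-0.5 * ((eps - center) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))


def joint_fit(eps, spectra, centers, widths):
    """Jointly fit N_E spectra that share one line center and one width.

    Each spectrum i keeps its own amplitude a_i and linear background; the loss
    minimized over the shared (center, width) grid is the sum over spectra of the
    squared residuals.
    """
    eps = np.asarray(eps, dtype=float)
    best = None
    for c0 in centers:
        for w in widths:
            design = np.column_stack([gaussian(eps, c0, w), np.ones_like(eps), eps])
            total, amps = 0.0, []
            for s in spectra:
                coef, *_ = np.linalg.lstsq(design, np.asarray(s, float), rcond=None)
                total += float(np.sum((design @ coef - s) ** 2))
                amps.append(coef[0])
            if best is None or total < best[0]:
                best = (total, np.array(amps), c0, w)
    return best[1], best[2], best[3]
```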

[0173] As a non-limiting example, according to some embodiments depicted in FIGS. 4B-4E, the free curve is a sum of three adjustable functions. In addition to $g_i(\varepsilon)$, which is gaussian, and $b_i(\varepsilon)$, which is a second-order polynomial, the sum additionally includes a gaussian $Y_i(\varepsilon)$. Referring to FIG. 4D, a curved line 460 corresponds to $\hat{a}_i \cdot \hat{g}_i(\varepsilon) + \hat{Y}_i(\varepsilon)$. $\hat{Y}_i(\varepsilon)$ (which is also gaussian) was obtained by optimizing over the free parameters of $Y_i(\varepsilon)$. $\hat{g}_i(\varepsilon)$ is centered about the characteristic X-ray line of the target substance, whereas $\hat{Y}_i(\varepsilon)$ is centered about a characteristic X-ray line of a (non-target) second substance present in the tested specimen. The characteristic line of the second substance is close to that of the target substance and was accordingly taken into account in order to improve the accuracy of the determination of $\vec{f}_{\text{key}}$ (and consequently of $\vec{p}_s$). Referring to FIG. 4E, a curved line 470 corresponds to $\hat{b}_i(\varepsilon)$. Curved line 470 is also plotted in FIG. 4C.

[0174] According to some embodiments, wherein the X-ray emission spectrum about a single characteristic X-ray line of a single target substance (included in the tested specimen) is used to determine $\vec{f}_{\text{key}}$, the number $J$ of components of $\vec{f}_{\text{key}}$ is equal to the number of e-beam landing energies. According to some embodiments, for each $1 \le j \le J$, $f_{\text{key}}^{(j)}$, the $j$-th component of the energy signature, is equal to $\hat{a}_j$. More generally, according to some embodiments, for each $1 \le j \le J$, $f_{\text{key}}^{(j)} = \varphi_{\text{key}}(\hat{a}_j, \{\hat{b}_{j,k}\}_{k=0}^{k_{\max}})$, wherein $\varphi_{\text{key}}$ is a function of $\hat{a}_j$ and $\{\hat{b}_{j,k}\}_{k=0}^{k_{\max}}$. That is, for each $1 \le j \le J$, the $j$-th component of the energy signature is a function of both $\hat{a}_j$ and the coefficients of $\hat{b}_j(\varepsilon)$. According to some such embodiments, $\varphi_{\text{key}}(\hat{a}_j, \{\hat{b}_{j,k}\}_{k=0}^{k_{\max}}) = \varphi_{\text{key}}(\hat{a}_j, q(\{\hat{b}_{j,k}\}_{k=0}^{k_{\max}}))$, wherein $q$ is a function of the coefficients of $\hat{b}_j(\varepsilon)$. As a non-limiting example, according to some embodiments, $q(\{\hat{b}_{j,k}\}_{k=0}^{k_{\max}}) = \langle \hat{b}_j(\varepsilon) \rangle$ and $\varphi_{\text{key}}(\hat{a}_j, q) = \hat{a}_j / \langle \hat{b}_j(\varepsilon) \rangle$, wherein the angle brackets denote averaging about the center of $\hat{g}_j(\varepsilon)$ over an interval equal to the width of $\hat{g}_j(\varepsilon)$.
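For a polynomial background, the normalized energy-signature component $\hat{a}_j / \langle \hat{b}_j(\varepsilon) \rangle$ can be computed exactly by integrating the polynomial over the averaging interval. In the sketch below, "width" is taken to be the fitted gaussian width parameter (an assumption; a FWHM convention would fit the text equally well), and `energy_signature_component` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial import Polynomial


def energy_signature_component(a_hat, b_coeffs, center, width):
    """Return a_hat / <b_hat>, where <b_hat> is the mean of the fitted background
    polynomial over [center - width/2, center + width/2].

    b_coeffs are ordered from the zeroth-order coefficient upward (b0, b1, b2, ...).
    """
    antideriv = Polynomial(b_coeffs).integ()  # exact antiderivative of b_hat
    lo, hi = center - width / 2.0, center + width / 2.0
    mean_bg = (antideriv(hi) - antideriv(lo)) / (hi - lo)
    return a_hat / mean_bg
```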

[0175] According to some embodiments, the key features may be derived based on the dependence, on the e-beam landing energy, of the intensities of the emitted X-rays about each of a plurality of different characteristic X-ray lines. According to some such embodiments, wherein $N_L$ is the number of different characteristic X-ray lines, the key features are specified by a $J = N_E N_L$ component vector with components $f_{\text{key}}^{(n_E, n_L)}$, where $1 \le n_E \le N_E$ and $1 \le n_L \le N_L$ ($N_E$ being the number of landing energies). The first index denotes the e-beam landing energy and the second index denotes the characteristic X-ray line. That is,

$$\vec{f}_{\text{key}} = \big(f_{\text{key}}^{(1,1)}, f_{\text{key}}^{(1,2)}, \ldots, f_{\text{key}}^{(1,N_L)}, f_{\text{key}}^{(2,1)}, f_{\text{key}}^{(2,2)}, \ldots, f_{\text{key}}^{(2,N_L)}, \ldots, f_{\text{key}}^{(N_E,1)}, f_{\text{key}}^{(N_E,2)}, \ldots, f_{\text{key}}^{(N_E,N_L)}\big).$$

In such embodiments, in measurement operation 110, for each e-beam landing energy, the X-ray emission spectrum is measured over a photon energy range or photon energy ranges including the plurality of characteristic X-ray lines. The components of $\vec{f}_{\text{key}}$ pertaining to a same characteristic X-ray line (e.g., $f_{\text{key}}^{(1,2)}, f_{\text{key}}^{(2,2)}, \ldots, f_{\text{key}}^{(N_E,2)}$) may be obtained as described above for the case wherein $N_L = 1$. According to some embodiments, wherein the at least one target substance includes $N_{\text{sub}}$ target substances ($N_{\text{sub}} \le N_L$), the $N_L$ characteristic X-ray lines include characteristic X-ray lines corresponding to each of the $N_{\text{sub}}$ target substances, respectively.
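The component ordering of [0175] (landing-energy index outer, line index inner) is exactly row-major flattening of an $N_E \times N_L$ array. A small NumPy sketch with hypothetical helper names and 0-based indices:

```python
import numpy as np


def stack_key_features(per_line):
    """per_line[n_E][n_L] holds the key feature for landing energy n_E and
    characteristic line n_L. Row-major (C-order) flattening, in which the last
    index varies fastest, yields (1,1), (1,2), ..., (1,N_L), (2,1), ..., (N_E,N_L)."""
    return np.asarray(per_line, dtype=float).reshape(-1)


def line_components(f_key, n_lines, line_index):
    """Return the N_E components of f_key that belong to one characteristic line."""
    return np.asarray(f_key, dtype=float).reshape(-1, n_lines)[:, line_index]
```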

[0176] As mentioned above, according to some embodiments,

[00027] $$\vec{p}_s = \arg\min_{\vec{p}} \Big( \min_{M_1, M_2} D\big(M_1\vec{f}_{\text{key}}, M_2\vec{f}_{\text{ext}}(\vec{p})\big) \Big).$$

$M_1$ and $M_2$ are matrices having suitably selected properties (e.g., positive-definiteness and symmetries, as specified below), and $D(M_1\vec{f}_{\text{key}}, M_2\vec{f}_{\text{ext}}(\vec{p}))$ denotes a mathematical distance between $M_1\vec{f}_{\text{key}}$ and $M_2\vec{f}_{\text{ext}}(\vec{p})$. According to some embodiments, one or more regularizing terms may be added to

[00028] $$\min_{M_1, M_2} D\big(M_1\vec{f}_{\text{key}}, M_2\vec{f}_{\text{ext}}(\vec{p})\big).$$

To render the description more concrete, by way of a non-limiting example, embodiments are addressed in detail wherein the X-ray emission spectrum about a single characteristic X-ray line of a single target substance is used to determine $\vec{f}_{\text{key}}$, such that for each $1 \le j \le N_E$, $f_{\text{key}}^{(j)} = \hat{a}_j$ and $f_{\text{key}}^{(N_E+j)} = \langle \hat{b}_j(\varepsilon) \rangle$ (i.e., the number of components of $\vec{f}_{\text{key}}$ is equal to $2N_E$). That is, the first $N_E$ components of $\vec{f}_{\text{key}}$ are unnormalized energy signature components and the last $N_E$ components are pure bremsstrahlung (i.e., background radiation) components. According to some such embodiments,

[00029] $$\vec{p}_s = \arg\min_{\vec{p}} \Big( \min_{M} D\big(\vec{f}_{\text{key}}, M\vec{f}_{\text{ext}}(\vec{p})\big) \Big)$$

(so that $M_1$ equals the identity matrix and $M_2 = M$). $M$ is a diagonal matrix whose diagonal terms are pairwise equal in the sense that, for each $1 \le j \le N_E$, $M_{N_E+j,\,N_E+j} = M_{j,j} \ge T$, wherein $T$ is a prespecified (positive) threshold. That is, for each $1 \le j \le N_E$, the $(N_E+j)$-th component along the diagonal of $M$ equals the $j$-th component. Accordingly, for each $1 \le j \le N_E$, the components of $M\vec{f}_{\text{ext}}(\vec{p})$ that are compared with $\hat{a}_j$ and with $\langle \hat{b}_j(\varepsilon) \rangle$ are scaled by the same respective factor. The inclusion of $M$, and the minimization thereover, may account for potentially different scaling of components of $\vec{f}_{\text{ext}}(\vec{p})$ and corresponding components of $\vec{f}_{\text{key}}$, whereby, for at least some $1 \le j \le N_E$, the scale of $f_{\text{ext}}^{(j)}(\vec{p})$ and $f_{\text{ext}}^{(N_E+j)}(\vec{p})$ differs from that of $f_{\text{key}}^{(j)}$ and $f_{\text{key}}^{(N_E+j)}$. According to some such embodiments,

[00030] $$\vec{p}_s = \arg\min_{\vec{p}} \Big( \min_{M} \big\| \vec{f}_{\text{key}} - M\vec{f}_{\text{ext}}(\vec{p}) \big\| \Big).$$
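For the diagonal, pairwise-equal $M$ of [00029]-[00030], the inner minimization over $M$ decouples: each $m_j = M_{j,j} = M_{N_E+j,N_E+j}$ affects only components $j$ and $N_E + j$, giving a one-dimensional convex quadratic whose unconstrained minimizer is clipped at $T$. The sketch below uses that closed form and, as a swapped-in simplification, a plain grid search over a scalar $\vec{p}$ for the outer minimization. `f_ext_fn` stands in for whatever extrapolated model $\vec{f}_{\text{ext}}(\vec{p})$ is used; it and the other names are hypothetical.

```python
import numpy as np


def best_scaling(f_key, f_ext, threshold):
    """min over diagonal M with M[j,j] = M[N_E+j, N_E+j] = m_j >= threshold of
    ||f_key - M f_ext||. Each m_j affects only components j and N_E+j, so the
    problem splits into N_E scalar convex quadratics; the unconstrained
    minimizer (u.v)/(v.v) is clipped at the threshold."""
    f_key = np.asarray(f_key, dtype=float)
    f_ext = np.asarray(f_ext, dtype=float)
    n_e = f_key.size // 2
    u = np.stack([f_key[:n_e], f_key[n_e:]])  # shape (2, N_E)
    v = np.stack([f_ext[:n_e], f_ext[n_e:]])
    m = np.maximum(threshold, np.sum(u * v, axis=0) / np.sum(v * v, axis=0))
    residual = f_key - np.concatenate([m, m]) * f_ext
    return float(np.linalg.norm(residual)), m


def estimate_p(f_key, f_ext_fn, p_grid, threshold):
    """Outer arg min over candidate p values by exhaustive grid search."""
    losses = [best_scaling(f_key, f_ext_fn(p), threshold)[0] for p in p_grid]
    return p_grid[int(np.argmin(losses))]
```

Minimizing the norm or its square gives the same minimizer, so the closed form for the squared problem applies directly.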

According to some embodiments, one or more regularizing terms may be added to

[00031] $$\min_{M} \big\| \vec{f}_{\text{key}} - M\vec{f}_{\text{ext}}(\vec{p}) \big\|.$$

More generally, according to some embodiments,

[00032] $$\vec{p}_s = \arg\min_{\vec{p}} F_0\big(\vec{f}_{\text{key}}, \vec{f}_{\text{ext}}(\vec{p})\big),$$

wherein $F_0(\vec{f}_{\text{key}}, \vec{f}_{\text{ext}}(\vec{p}))$ is a loss function, which depends on $\vec{f}_{\text{key}}$ and $\vec{f}_{\text{ext}}(\vec{p})$ and is equal to

[00033] $$\min_{M} \tilde{F}_0\big(\vec{f}_{\text{key}}, M\vec{f}_{\text{ext}}(\vec{p})\big)$$

(with $M$ having the above-specified properties and symmetries, i.e., pairwise equality). $\tilde{F}_0(\vec{f}_{\text{key}}, M\vec{f}_{\text{ext}}(\vec{p}))$ is a function (e.g., a loss function) depending on $\vec{f}_{\text{key}}$ and $M\vec{f}_{\text{ext}}(\vec{p})$. According to some embodiments,

[00034] $$\vec{p}_s = \arg\min_{\vec{p}} \Big( \min_{M} \tilde{F}_0\big(\vec{f}_{\text{key}}, M\vec{f}_{\text{ext}}(\vec{p})\big) + F_1(\vec{p}) \Big),$$

wherein $F_1(\vec{p})$ is a regularizing function including one or more regularizing terms.
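As one concrete, assumed instance of [00034] (the choices below are illustrative and not taken from the text above), take $\tilde{F}_0$ to be the squared Euclidean distance and $F_1$ a Tikhonov penalty of weight $\lambda \ge 0$ pulling $\vec{p}$ toward the nominal $\vec{p}_0$. With $M$ fixed to the identity and the linear model $\vec{f}_{\text{ext}}(\vec{p}) = \vec{f}_0 + A(\vec{p} - \vec{p}_0)$, setting the gradient with respect to $\vec{p}$ to zero gives the familiar ridge-regression closed form:

```latex
\vec{p}_s = \arg\min_{\vec{p}} \Big( \big\|\vec{f}_{\text{key}} - \vec{f}_0 - A(\vec{p}-\vec{p}_0)\big\|_2^2
            + \lambda \,\|\vec{p}-\vec{p}_0\|_2^2 \Big)
\quad\Longrightarrow\quad
\vec{p}_s = \vec{p}_0 + \big(A^{\mathsf{T}}A + \lambda I\big)^{-1} A^{\mathsf{T}}\big(\vec{f}_{\text{key}} - \vec{f}_0\big)
```

For $\lambda > 0$ the matrix $A^{\mathsf{T}}A + \lambda I$ is always invertible; $\lambda = 0$ recovers the unregularized least-squares solution when $A$ has full column rank.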

[0177] According to some embodiments, wherein a spectrometer is used to obtain the X-ray emission spectra, measurement data analysis operation 120 may include an initial preprocessing suboperation, wherein the X-ray emission spectra may be preprocessed to remove noise. According to some embodiments, a similar preprocessing suboperation may be included for the reference specimens, wherein the X-ray emission spectra are preprocessed to remove noise.

[0178] According to some embodiments of method 100, suboperation 120a includes an initial suboperation wherein difference spectra for the tested specimen are obtained by subtracting control spectra of a control specimen from the measured spectra of the tested specimen, which are obtained in measurement operation 110. The control spectra may be of a gold standard specimen, which is known, or assumed, to closely match the intended design of the tested specimen in the sense that one or more structural parameters (optionally, also structural parameters which are not estimated by method 100) of the gold standard specimen do not deviate by more than 1%, 2%, or 5% from the nominal values thereof. Each possibility corresponds to separate embodiments. In some embodiments, the spectra of the control specimen include background radiation intensities similar to those measured for the tested specimen.

[0179] More specifically, the difference spectra may be processed (in suboperation 120a) to extract therefrom a vector of key features $\vec{f}_{\text{dif}}$ for the tested specimen, essentially as described above in the description of FIGS. 4A-4E, with the difference that (due to the subtraction) any accounting for bremsstrahlung is obviated.
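With difference spectra the background term drops out of the fit, and for a fixed line center and width the single amplitude has the closed form $a = \langle g, d \rangle / \langle g, g \rangle$, where $d$ is the difference spectrum. A minimal sketch, assuming the control spectrum carries essentially the same bremsstrahlung as the tested specimen (as noted in [0178]) and using a grid over center and width; the function names are hypothetical.

```python
import numpy as np


def gaussian(eps, center, width):
    """Normalized gaussian g(eps) with unit area over eps."""
    return np.exp(-0.5 * ((eps - center) / width) ** 2) / (width * np.sqrt(2.0 * np.pi))


def fit_difference_peak(eps, tested, control, centers, widths):
    """Fit a * g(eps) to the difference spectrum tested - control.

    No background term is fitted: it is assumed to cancel in the subtraction.
    The amplitude may come out negative if the tested peak is weaker than the
    control peak.
    """
    eps = np.asarray(eps, dtype=float)
    diff = np.asarray(tested, dtype=float) - np.asarray(control, dtype=float)
    best = None
    for c0 in centers:
        for w in widths:
            g = gaussian(eps, c0, w)
            a = float(g @ diff / (g @ g))  # one-parameter least squares
            loss = float(np.sum((a * g - diff) ** 2))
            if best is None or loss < best[0]:
                best = (loss, a, c0, w)
    return best[1], best[2], best[3]
```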

[0180] In such embodiments, the method further includes an operation in which difference spectra for the reference specimens are also obtained by subtracting control spectra of a control specimen from the measured spectra of the reference specimens by the same methods described herein for the tested specimen.

[0181] In some embodiments, instead of obtaining difference spectra for the tested and reference specimens by subtracting spectra of a control specimen (as described above), certain key features of the tested and reference specimens are obtained by subtracting from the key features initially obtained corresponding key features obtained for a control specimen.

[0182] In such embodiments, the computer algorithm of suboperation 120b is configured to receive $\vec{f}_{\text{dif}}$ as an input and to output $\vec{p}_s$, based on a set of $N$ vectors of reference key features $\{\vec{f}_n\}_{n=1}^{N}$. For each $1 \le n \le N$, $\vec{f}_n$ is obtained in the same way as $\vec{f}_{\text{dif}}$ is obtained for the tested specimen, including obtaining difference spectra by subtracting the control spectra from the GT measured spectra and processing them to extract the vector of key features $\vec{f}_n$.

[0183] According to some such embodiments, method 100 may additionally include obtaining the control spectra, e.g., by implementing measurement operation 110 with respect to the control specimen.

[0184] It is to be understood that the applicability of method 100 is not limited to specimens including nominally flat layers (as depicted by way of a non-limiting example in FIGS. 2A-2D) or, more generally, to layered specimens. Regions differing from one another in material composition may in principle be arbitrarily shaped. In particular, method 100 may be applied to structures including localized embedded (buried) features, such as nanowires, gate-all-around nanosheets, and, more generally, channels. Method 100 may also be applied to specimens characterized by continuously varying densities of substances included therein as a function of the depth coordinate and/or, in the three-dimensional case, as a function of the lateral coordinates. Further, the skilled person will readily perceive that method 100 may be applied to specimens including empty cavities and/or holes.

Systems

[0185] According to an aspect of some embodiments, there is provided a computerized system for non-destructive three-dimensional probing and characterization of specimens (such as semiconductor structures, e.g., included in patterned wafers) based on X-ray measurements and subsequent analysis of the obtained measurement data using actual measurements of GT reference specimens. FIG. 5 schematically depicts such a system, a computerized system 500, according to some embodiments. As will be apparent from the description of system 500, system 500 may be used to implement method 100 (including specific embodiments of method 100, which include measurement data analysis operation 300). In particular, system 500 may be used to estimate values of one or more structural parameters characterizing a tested specimen. Non-limiting examples of structural parameters, which may be estimated using system 500, are listed above in the Methods Subsection in the description of method 100.

[0186] System 500 includes an e-beam source 502, an X-ray detector 504 (or, more generally, an X-ray sensing assembly including two or more X-ray detectors), processing circuitry 506, and a controller 508. According to some embodiments, system 500 may further include a stage 520 (e.g., an xyz stage) configured to accommodate a (tested) specimen 50.

[0187] According to some embodiments, e-beam source 502, X-ray detector 504, and controller 508 form part of a scanning electron microscope. According to some embodiments, specimen 50 may be a patterned wafer or a structure (e.g., a semiconductor structure) included in or on a patterned wafer. According to some such embodiments, specimen 50 may be a preliminary structure in one of the fabrication stages of a patterned wafer or an assist structure employed in one of the fabrication stages of a patterned wafer. According to some embodiments, specimen 50 may be or include one or more memory components and/or logic components (such as a gate stack, for example, a high-k metal gate stack). It is noted that specimen 50 does not form part of system 500.

[0188] Dotted lines between elements indicate functional or communicational association therebetween.

[0189] E-beam source 502 is configured to produce e-beams at each of a plurality of e-beam landing energies, so as to allow probing specimen 50 to a plurality of depths, respectively, essentially as described above in the description of suboperation 110a of method 100.

[0190] The greater the depth to which a tested specimen is to be probed, the greater the required maximum e-beam landing energy and, optionally, the number of e-beam landing energies. According to some embodiments, the plurality of e-beam landing energies may include landing energies up to about 5 keV, about 10 keV, about 15 keV, about 20 keV, or even about 30 keV. Each possibility corresponds to different embodiments. In silicon, an e-beam with a landing energy of about 15 keV may penetrate as deep as about 3 μm.
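The quoted penetration depth in silicon can be sanity-checked with the Kanaya-Okayama electron-range expression, a standard empirical estimate that is not part of the text above: $R\,[\mu\text{m}] = 0.0276\,A\,E^{1.67} / (Z^{0.889}\rho)$, with $A$ the atomic weight in g/mol, $E$ the landing energy in keV, $Z$ the atomic number, and $\rho$ the density in g/cm³. A one-function sketch with a hypothetical name:

```python
def kanaya_okayama_range_um(energy_kev, atomic_weight, atomic_number, density_g_cm3):
    """Kanaya-Okayama electron range in micrometres (empirical estimate):
    R = 0.0276 * A * E**1.67 / (Z**0.889 * rho), E in keV, rho in g/cm^3."""
    return 0.0276 * atomic_weight * energy_kev**1.67 / (
        atomic_number**0.889 * density_g_cm3
    )
```

For silicon ($A \approx 28.09$, $Z = 14$, $\rho \approx 2.33$ g/cm³) at 15 keV this gives roughly 2.9 μm, consistent with the figure of about 3 μm quoted above.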

[0191] The durations of the projections of the e-beams may be dictated by the required precision to which $\vec{p}_s$ (i.e., the values of the one or more structural parameters characterizing specimen 50) is to be estimated.

[0192] An e-beam 505, generated by e-beam source 502, is shown incident on (an external surface 54 of) specimen 50, according to some embodiments. As a result of the impinging of e-beam 505 on specimen 50, and the penetration of e-beam 505 into specimen 50, X-rays, and, in particular, characteristic X-rays, are generated. A portion of these X-rays, constituted by X-rays 515, arrives at X-ray detector 504.

[0193] According to some embodiments, X-ray detector 504 is sensitive to electromagnetic radiation in the X-ray photon energy range (at least over the characteristic X-rays regime or one or more subranges thereof). According to some embodiments, X-ray detector 504 may be an EDX spectrometer or a WDX spectrometer. According to some embodiments, instead of a single X-ray detector, an X-ray detector assembly, which includes both an EDX spectrometer and a WDX spectrometer, may be used. In such embodiments, the X-ray emission spectra may be obtained using both the EDX spectrometer and the WDX spectrometer, with the WDX spectrometer being used to zoom in on the characteristic X-ray lines. In particular, the higher resolution of the WDX spectrometer (which comes at the cost of slower acquisition), as compared to the EDX spectrometer, allows narrower peaks and dips to be obtained. According to some embodiments, wherein the spectrometer is a WDX spectrometer, X-ray detector 504 may be configured to allow scanning over extended photon energy ranges (thereby allowing X-ray emission spectra to be obtained over extended photon energy ranges). X-ray detector 504 is configured to relay (optionally, via controller 508) the measurement data collected thereby (e.g., the spectrum of X-rays incident thereon) to processing circuitry 506.

[0194] According to some embodiments, system 500 may additionally include a window (not shown) positioned between X-ray detector 504 and stage 520, which may be configured to controllably and differentially attenuate the spectrum of the emitted X-rays and/or protect an X-ray sensitive surface of the spectrometer.

[0195] According to some alternative embodiments, X-ray detector 504 is configured to measure the intensity of electromagnetic X-ray radiation (i.e., electromagnetic radiation in the X-ray photon energy range) at or about a characteristic X-ray line of a (target) substance included in specimen 50 without additionally measuring the intensity of the electromagnetic X-ray radiation over an extended photon energy range outside the immediate vicinity of the characteristic X-ray line. According to some such embodiments, system 500 may additionally include an optical filter (not shown). The optical filter is configured to block electromagnetic radiation having a photon energy outside the immediate vicinity of the characteristic X-ray line from reaching X-ray detector 504.

[0196] According to some embodiments, system 500 may include additional elements. The additional elements may include electron optics (not shown; e.g., one or more electrostatic lenses and one or more magnetic deflectors), which may be used to guide and manipulate an e-beam generated by e-beam source 502. Additionally, or alternatively, the additional elements may include collection optics configured to guide onto X-ray detector 504 electromagnetic radiation generated due to the impinging of an e-beam on specimen 50 and the penetration of the e-beam thereinto. According to some embodiments, the additional elements may include a filter configured to block electromagnetic radiation outside the characteristic X-ray regime and/or one or more subranges thereof.

[0197] According to some embodiments, at least e-beam source 502 and stage 520 may be housed within a vacuum chamber 530. While in FIG. 5 X-ray detector 504 is shown positioned inside vacuum chamber 530, according to some alternative embodiments, X-ray detector 504 may be positioned outside vacuum chamber 530.

[0198] Controller 508 may be functionally associated with e-beam source 502 and, optionally, stage 520. More specifically, controller 508 is configured to control and synchronize operations and functions of the above-listed instrumentation and components during probing of a tested specimen (e.g., instruct the e-beam source to change the e-beam landing energy).

[0199] Processing circuitry 506 may include one or more processors and, optionally, RAM and/or non-volatile memory components (not shown). The one or more processors are configured to execute software instructions stored, e.g., in the non-volatile memory components. Through the execution of the software instructions, measurement data (e.g., obtained by X-ray detector 504) of a tested specimen (e.g., specimen 50) are processed to estimate $\vec{p}_s$, essentially as described above in the description of FIGS. 1 and 3.

[0200] More specifically, processing circuitry 506 is configured to process the measurement data to estimate the values (denoted by $\vec{p}_s$) of the one or more structural parameters characterizing the tested specimen, as detailed above in the Methods Subsection. To this end, processing circuitry 506 is configured to extract from the measurement data a vector $\vec{f}_{\text{key}}$ specifying values of key features obtained with respect to the tested specimen, as detailed above in the description of suboperation 120a of method 100. In particular, according to some embodiments, the key features include (e.g., are constituted by) the so-called energy signature.

[0201] According to some embodiments, each component of the energy signature may correspond to an absolute, normalized, or relative intensity of a respective characteristic X-ray line.

[0202] According to some embodiments, wherein the measurement data are X-ray emission spectra, processing circuitry 506 may be configured to fit a respective (free) curve onto each of the X-ray emission spectra, thereby obtaining an optimized curve, from which $\vec{f}_{\text{key}}$ may next be extracted, as described above in the description of suboperation 120a of method 100. According to some such embodiments, processing circuitry 506 may be configured to execute one or more optimization algorithms (e.g., to solve the optimization problems specified above in the description of FIGS. 4A-4E). Examples of relevant optimization algorithms include standard iterative optimization algorithms, such as gradient descent or Newton's method. According to some embodiments, customized iterative optimization algorithms (obtained by tweaking standard iterative optimization algorithms to account for constraints and/or to ensure that a global minimum is attained) may be employed.

[0203] More specifically, in order to estimate $\vec{p}_s$, processing circuitry 506 is configured to additionally take into account a set of reference vectors of measured key features $\{\vec{f}_n\}_{n=1}^{N}$, as defined above for the method aspect. For each $1 \le n \le N$, $\vec{p}_n$ specifies values of the one or more structural parameters characterizing an $n$-th reference specimen, and each of the $\vec{f}_n$ may be obtained by actual measurements in the same way as $\vec{f}_{\text{key}}$ is obtained for the tested specimen, as described above in the Methods Subsection. According to some embodiments, and as expanded on above in the description of method 100, the $\vec{p}_n$ sample a selected hypervolume centered about $\vec{p}_0$ in a $K_p$-dimensional vector space defined by the one or more structural parameters, $K_p$ being the number of the one or more structural parameters.

[0204] According to some embodiments, processing circuitry 506 may be configured to compute $\vec{p}_s = \langle \vec{p}_i \rangle$, wherein the $k$ vectors $\vec{p}_i$ ($k < N$) are those labeling the $k$ vectors $\vec{f}_n$ closest to $\vec{f}_{\text{key}}$, and the angle brackets denote averaging, optionally weighted, over the $k$ vectors $\vec{p}_i$. To this end, according to some embodiments, processing circuitry 506 may be configured to apply a k-nearest neighbor (k-NN) regression algorithm to $\vec{f}_{\text{key}}$ with respect to $\{\vec{f}_n\}_{n=1}^{N}$. According to some embodiments, processing circuitry 506 may be configured to obtain $\vec{p}_s$ by computing the median of the $\vec{p}_n$ corresponding to the $k$ closest $\vec{f}_n$. As explained above for the method aspect, the k-NN averaging or the median may assign weights so as to be biased towards GT data points.
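The k-NN estimate of [0204] (plain mean, weighted mean, or median of the $k$ nearest $\vec{p}_n$) can be sketched as below. Euclidean distance and inverse-distance weighting are assumptions made for illustration, since the text leaves both the metric and the weighting scheme open; `knn_estimate` is a hypothetical name.

```python
import numpy as np


def knn_estimate(f_key, f_ref, p_ref, k, weighted=False, use_median=False):
    """Estimate p_s from the k reference vectors f_n nearest to f_key.

    f_ref: (N, J) reference key features; p_ref: (N, K) reference parameters.
    Returns the mean, inverse-distance-weighted mean, or median of the k
    corresponding p_n.
    """
    f_ref = np.asarray(f_ref, dtype=float)
    p_ref = np.asarray(p_ref, dtype=float)
    dist = np.linalg.norm(f_ref - np.asarray(f_key, dtype=float), axis=1)
    idx = np.argsort(dist, kind="stable")[:k]
    if use_median:
        return np.median(p_ref[idx], axis=0)
    if weighted:
        w = 1.0 / (dist[idx] + 1e-12)  # small offset guards against zero distance
        return (w[:, None] * p_ref[idx]).sum(axis=0) / w.sum()
    return p_ref[idx].mean(axis=0)
```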

[0205] According to some alternative embodiments, $\vec{p}_s$ minimizes a loss function, which is a function of at least $\vec{f}_{\text{key}}$ and a vector-valued function $\vec{f}_{\text{ext}}(\vec{p})$ of the key features that is extrapolated from $\{\vec{f}_n\}_{n=1}^{N}$. (Optionally, in addition to a first term dependent on $\vec{f}_{\text{key}}$ and $\vec{f}_{\text{ext}}(\vec{p})$, the loss function may include one or more regularizing terms.) Accordingly, processing circuitry 506 may be configured to (i) execute an optimization algorithm to minimize the loss function over $\vec{p}$ (e.g., minimize the distance between $\vec{f}_{\text{key}}$ and $\vec{f}_{\text{ext}}(\vec{p})$), and (ii) in embodiments wherein the minimization of the loss function has a known analytical solution, additionally, or alternatively, estimate $\vec{p}_s$ directly from the (function defining the) analytical solution. According to some such embodiments, processing circuitry 506 may be configured to estimate $\vec{p}_s$ by (numerically or analytically) solving the optimization problem

$$\arg\min_{\vec{p}} \left\lVert \vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p}) \right\rVert,$$

or, more generally,

$$\arg\min_{\vec{p}} \Big( \min_{M} \left\lVert \vec{f}_{\mathrm{key}} - M\,\vec{f}_{\mathrm{ext}}(\vec{p}) \right\rVert \Big),$$

wherein M is a positive definite matrix (e.g., a diagonal positive definite matrix with pairwise equal diagonal terms as specified above) whose minimum eigenvalue is greater than a prespecified (positive) threshold, and, even more generally,

$$\arg\min_{\vec{p}} \Big( \min_{M_1, M_2} D\big(M_1 \vec{f}_{\mathrm{key}},\, M_2 \vec{f}_{\mathrm{ext}}(\vec{p})\big) \Big),$$

wherein $M_1$ and $M_2$ are suitably selected matrices and $D(M_1\vec{f}_{\mathrm{key}}, M_2\vec{f}_{\mathrm{ext}}(\vec{p}))$ is a mathematical distance between $M_1\vec{f}_{\mathrm{key}}$ and $M_2\vec{f}_{\mathrm{ext}}(\vec{p})$. Here, $\vec{f}_{\mathrm{ext}}(\vec{p})$ is the vector-valued function of the key features (extrapolated from $\{\vec{f}_n\}_{n=1}^{N}$) that models the dependence of the key features on the values of the one or more structural parameters. As explained above for the method aspect, the vector-valued function $\vec{f}_{\mathrm{ext}}(\vec{p})$, the optimization, and/or the loss function may treat different data points, or different groups of data points, differently.
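The optimization problems above may be solved in many ways. As a minimal, non-limiting sketch, the following Python code performs a brute-force (grid) search over candidate values of $\vec{p}$; this is one deliberately simple optimization algorithm, and gradient-based or other optimizers may equally be used. With `c_min=None` the sketch minimizes the distance $\lVert \vec{f}_{\mathrm{key}} - \vec{f}_{\mathrm{ext}}(\vec{p}) \rVert$. When `c_min` is given, it restricts M to a scalar multiple $cI$ of the identity (a diagonal matrix with pairwise equal diagonal terms) with $c \geq c_{\min}$, for which the inner minimization over M has the closed-form solution $c = \max\big(c_{\min}, \langle \vec{f}_{\mathrm{key}}, \vec{f}_{\mathrm{ext}} \rangle / \lVert \vec{f}_{\mathrm{ext}} \rVert^2\big)$. The strict eigenvalue threshold is approximated by a non-strict bound, and the function names and the model `toy_f_ext` are hypothetical.

```python
import itertools
import math


def best_scale(f_key, f_model, c_min):
    """Minimize ||f_key - c * f_model|| over scalars c >= c_min, i.e. over M = c*I."""
    denom = sum(x * x for x in f_model)
    if denom == 0.0:
        return c_min  # the loss does not depend on c in this degenerate case
    c_star = sum(a * b for a, b in zip(f_key, f_model)) / denom
    # The squared loss is a convex quadratic in c, so clipping the unconstrained
    # minimizer to the lower bound gives the constrained minimizer.
    return max(c_min, c_star)


def grid_search_estimate(f_key, f_ext, grids, c_min=None):
    """Brute-force arg min over p of the distance between f_key and (c *) f_ext(p).

    f_ext: callable mapping a parameter tuple p to a list of key features.
    grids: one iterable of candidate values per structural parameter.
    """
    best_p, best_loss = None, math.inf
    for p in itertools.product(*grids):
        f_model = f_ext(p)
        c = 1.0 if c_min is None else best_scale(f_key, f_model, c_min)
        loss = math.dist(f_key, [c * x for x in f_model])
        if loss < best_loss:
            best_p, best_loss = p, loss
    return best_p, best_loss


def toy_f_ext(p):
    # Hypothetical model of the key features; not taken from the disclosure.
    return [p[0] + p[1], p[0] - p[1], 1.0]


grids = [range(6), range(6)]
print(grid_search_estimate([4.0, 2.0, 1.0], toy_f_ext, grids))              # ((3, 1), 0.0)
print(grid_search_estimate([8.0, 4.0, 2.0], toy_f_ext, grids, c_min=0.5))  # ((3, 1), 0.0)
```

In the second call the measured key features are twice the modelled ones, so the inner minimization selects c = 2 and the correct parameters are still recovered with zero loss.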

[0206] According to some embodiments, processing circuitry 506 may be further configured to perform the extrapolation (and thereby obtain $\vec{f}_{\mathrm{ext}}(\vec{p})$). According to some such embodiments, the extrapolation may be to a linear function, i.e., $\vec{f}_{\mathrm{ext}}(\vec{p}) = \vec{f}_{\mathrm{ext}}(\vec{p}_0 + \vec{\delta}) = \vec{f}_0 + A\vec{\delta} = \vec{f}_0 + A(\vec{p} - \vec{p}_0)$, wherein $\vec{\delta}$ denotes deviations from nominal values $\vec{p}_0$ of the one or more structural parameters, $\vec{f}_0$ is a vector of values of the key features corresponding to $\vec{p}_0$, and A is a matrix determinable through an optimization that takes into account $\{\vec{f}_n\}_{n=1}^{N}$, as detailed above in the description of method 300. As explained above for the method aspect, the extrapolation may treat different data points, or different groups of data points, differently.
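As a non-limiting sketch of the linear extrapolation, the following Python code (which requires NumPy) fits $\vec{f}_0$ and A to the GT pairs $(\vec{p}_n, \vec{f}_n)$ by ordinary least squares, which is only one possible choice of the optimization mentioned above, and then estimates $\vec{p}_s$ from the closed-form least-squares minimizer $\vec{p}_s = \vec{p}_0 + A^{+}(\vec{f}_{\mathrm{key}} - \vec{f}_0)$, where $A^{+}$ is the Moore-Penrose pseudo-inverse. This illustrates a loss minimization that has a known analytical solution. The function names and the synthetic data are hypothetical; unequal treatment of data points could be added, for example, as row weights in the fit.

```python
import numpy as np


def fit_linear_model(P, F, p0):
    """Least-squares fit of f0 and A in f_ext(p) = f0 + A (p - p0).

    P: (N, m) array whose rows are the measured parameter vectors p_n.
    F: (N, d) array whose rows are the key-feature vectors f_n.
    Returns f0 with shape (d,) and A with shape (d, m).
    """
    P = np.asarray(P, dtype=float)
    F = np.asarray(F, dtype=float)
    delta = P - np.asarray(p0, dtype=float)           # deviations from nominal values
    X = np.hstack([np.ones((len(P), 1)), delta])      # design matrix with rows [1, (p_n - p0)^T]
    B, *_ = np.linalg.lstsq(X, F, rcond=None)         # solves X @ B ~= F; B has shape (1 + m, d)
    return B[0], B[1:].T


def estimate_linear(f_key, f0, A, p0):
    """Least-squares minimizer of ||f_key - f0 - A (p - p0)|| over p (via the pseudo-inverse)."""
    residual = np.asarray(f_key, dtype=float) - f0
    return np.asarray(p0, dtype=float) + np.linalg.pinv(A) @ residual


# Synthetic, noise-free GT data generated from a known linear model (illustration only).
p0 = np.array([10.0, 0.5])
A_true = np.array([[2.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
f0_true = np.array([1.0, 2.0, 3.0])
P = np.array([[10.0, 0.5], [11.0, 0.5], [10.0, 0.7], [12.0, 0.4]])
F = f0_true + (P - p0) @ A_true.T
f0, A = fit_linear_model(P, F, p0)
f_key = f0_true + A_true @ (np.array([11.5, 0.6]) - p0)
print(estimate_linear(f_key, f0, A, p0))  # approx. [11.5, 0.6]
```

When A has full column rank (at least as many key features as structural parameters, and informative reference data), the pseudo-inverse solution is the unique minimizer; otherwise it returns the minimum-norm deviation among the minimizers.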

[0207] According to some embodiments, processing circuitry 506 may be configured to select $\{\vec{f}_n\}_{n=1}^{N}$ from a larger set $\{\vec{f}_i\}_{i=1}^{N'}$ (N' > N) of key-feature vectors corresponding to reference specimens by subjecting $\vec{f}_{\mathrm{key}}$ to a (k = N)-NN classifier with respect to $\{\vec{f}_i\}_{i=1}^{N'}$.
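The selection described above amounts to retaining the N reference entries whose key-feature vectors are nearest to $\vec{f}_{\mathrm{key}}$. A minimal Python sketch follows, assuming Euclidean distance and a list of $(\vec{f}_i, \vec{p}_i)$ pairs; the function name `select_reference_subset` and the data layout are hypothetical. Because only the nearest neighbours (and not a class vote) are needed, the sketch uses `heapq.nsmallest`, which avoids fully sorting all N' candidates.

```python
import heapq
import math


def select_reference_subset(f_key, candidates, n):
    """Return the n candidates whose key-feature vectors are nearest to f_key.

    candidates: list of (f_i, p_i) pairs for the N' reference specimens
    (hypothetical data layout). The result is ordered from nearest to farthest.
    """
    if not 0 < n <= len(candidates):
        raise ValueError("n must satisfy 0 < n <= N'")
    return heapq.nsmallest(n, candidates, key=lambda ref: math.dist(f_key, ref[0]))


candidates = [([0.0, 0.0], [1.0]), ([3.0, 4.0], [2.0]),
              ([1.0, 1.0], [3.0]), ([10.0, 10.0], [4.0])]
print(select_reference_subset([0.0, 0.0], candidates, n=2))
# [([0.0, 0.0], [1.0]), ([1.0, 1.0], [3.0])]
```

The returned subset may then serve as the reference set $\{\vec{f}_n\}_{n=1}^{N}$ for the subsequent estimation, for example for the extrapolation of $\vec{f}_{\mathrm{ext}}(\vec{p})$.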

[0208] As used herein, the terms "fitted" and "optimized" are interchangeable when employed in the context of curve fitting.

[0209] According to some embodiments, processing circuitry 506 and controller 508 may be housed in a common housing, for example, when implemented by a single computer.

[0210] In the description and claims of the application, the words "include" and "have", and forms thereof, are not limited to members in a list with which the words may be associated.

[0211] As used herein, the term "about" may be used to specify a value of a quantity or parameter (e.g., the length of an element) to within a continuous range of values in the neighborhood of (and including) a given (stated) value. According to some embodiments, "about" may specify the value of a parameter to be between 80% and 120% of the given value. For example, the statement "the length of the element is equal to about 1 m" is equivalent to the statement "the length of the element is between 0.8 m and 1.2 m". According to some embodiments, "about" may specify the value of a parameter to be between 90% and 110% of the given value. According to some embodiments, "about" may specify the value of a parameter to be between 95% and 105% of the given value.

[0212] As used herein, according to some embodiments, the terms "substantially" and "about" may be interchangeable.

[0213] According to some embodiments, an estimated quantity or estimated parameter may be said to be "about optimized" or "about optimal" when falling within 5%, 10%, or even 20% of the optimal value thereof. Each possibility corresponds to separate embodiments. In particular, the expressions "about optimized" and "about optimal" also cover the case wherein the estimated quantity or estimated parameter is equal to the optimal value of the quantity or the parameter. The optimal value may in principle be obtainable using mathematical optimization software. Thus, for example, an estimated quantity (e.g., an estimated residual) may be said to be "about minimized" or "about minimal/minimum" when the value thereof is no greater than 101%, 105%, 110%, or 120% (or some other pre-defined threshold percentage) of the optimal value of the quantity. Each possibility corresponds to separate embodiments.
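By way of illustration only, the numerical criteria for "about" and "about minimal" given above may be expressed as two simple predicates. The function names and default tolerances below are hypothetical, and `is_about_minimal` assumes a positive optimal value, as is the case, e.g., for a residual.

```python
def is_about(value, stated, rel_tol=0.2):
    """True if value lies within rel_tol (as a fraction) of the stated value.

    With rel_tol = 0.2 and a positive stated value, this is the 80%-120% range.
    """
    return abs(value - stated) <= rel_tol * abs(stated)


def is_about_minimal(value, optimum, max_ratio=1.05):
    """True if value is no greater than max_ratio times the (positive) optimal value."""
    return value <= max_ratio * optimum


print(is_about(1.1, 1.0))             # True: 1.1 m is "about 1 m" at +/-20%
print(is_about(1.3, 1.0))             # False
print(is_about_minimal(1.04, 1.0))    # True at the 105% threshold
print(is_about_minimal(1.2, 1.0))     # False at the 105% threshold
```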

[0214] For ease of description, in some of the figures a three-dimensional Cartesian coordinate system (with orthogonal axes x, y, and z) is introduced. It is noted that the orientation of the coordinate system relative to a depicted object may vary from one figure to another. Further, the symbol ⊙ may be used to represent an axis pointing out of the page, while the symbol ⊗ may be used to represent an axis pointing into the page.

[0215] In flowcharts, optional operations and suboperations are delineated by a dashed line. Similarly, in block diagrams, optional elements may be delineated by a dashed line. Further, in block diagrams, dotted lines connecting elements may be used to represent functional association or at least one-way or two-way communicative association between the connected elements.

[0216] It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination or as suitable in any other described embodiment of the disclosure. No feature described in the context of an embodiment is to be considered an essential feature of that embodiment, unless explicitly specified as such.

[0217] Although operations of methods, according to some embodiments, may be described in a specific sequence, the methods of the disclosure may include some or all of the described operations carried out in a different order. In particular, it is to be understood that the order of operations and suboperations of any of the described methods may be reordered unless the context clearly dictates otherwise, for example, when a latter operation requires as input the output of an earlier operation or when a latter operation requires the product of an earlier operation. A method of the disclosure may include a few of the operations described or all of the operations described. No particular operation in a disclosed method is to be considered an essential operation of that method, unless explicitly specified as such.

[0218] Although the disclosure is described in conjunction with specific embodiments thereof, it is evident that numerous alternatives, modifications, and variations that are apparent to those skilled in the art may exist. Accordingly, the disclosure embraces all such alternatives, modifications, and variations that fall within the scope of the appended claims. It is to be understood that the disclosure is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth herein. Other embodiments may be practiced, and an embodiment may be carried out in various ways.

[0219] The phraseology and terminology employed herein are for descriptive purposes and should not be regarded as limiting. Citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the disclosure. Section headings are used herein to ease understanding of the specification and should not be construed as necessarily limiting.

[0220] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In case of conflict, the patent specification, including definitions, governs. As used herein, the indefinite articles "a" and "an" mean "at least one" or "one or more" unless the context clearly dictates otherwise.

[0221] Unless specifically stated otherwise, as apparent from the disclosure, it is appreciated that, according to some embodiments, terms such as "processing", "computing", "calculating", "determining", "estimating", "assessing", "gauging" or the like, may refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data, represented as physical (e.g., electronic) quantities within the computing system's registers and/or memories, into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.

[0222] Embodiments of the present disclosure may include apparatuses for performing the operations herein. The apparatuses may be specially constructed for the desired purposes or may include a general-purpose computer(s) selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), electrically programmable read-only memories (EPROMs), electrically erasable and programmable read only memories (EEPROMs), magnetic or optical cards, flash memories, solid state drives (SSDs), or any other type of media suitable for storing electronic instructions, and capable of being coupled to a computer system bus.

[0223] The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the desired method(s). The desired structure(s) for a variety of these systems appear from the description below. In addition, embodiments of the present disclosure are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present disclosure as described herein.

[0224] Aspects of the disclosure may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. Disclosed embodiments may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.