ORDER O(1) ALGORITHM FOR FIRST-PRINCIPLES CALCULATION OF TRANSIENT CURRENT THROUGH OPEN QUANTUM SYSTEMS

Abstract

A fast algorithm is used to study the transient behavior due to a step-like pulse. This algorithm consists of two parts: Algorithm I reduces the computational complexity to T.sup.0N.sup.3 for large systems as long as T<N; Algorithm II employs the fast multipole technique and achieves scaling T.sup.0N.sup.3 whenever T<N.sup.2, beyond which it becomes T log.sub.2 N for even longer times. Hence it is of order O(1) if T<N.sup.2. A benchmark calculation has been done on graphene nanoribbons with N=10.sup.4 and T=10.sup.8. This new algorithm allows many large scale transient problems to be solved, including magnetic tunneling junctions and ferroelectric tunneling junctions, that could not be solved before.

Claims

1. A method of transient current evaluation to determine the response time and peak transient current of a nano-chip in nano electronics, comprising the steps of: inputting Hamiltonians, H.sub.eq, H.sub.neq; preparing self-energy of leads by using complex absorbing potential (CAP); generating eigenstates and eigen-energies; expanding Fermi functions from the Pade spectrum decomposition (PSD) method; constructing the exact equation of transient current which is beyond the wideband limit (WBL) from the eigenstates and eigen-energies and the Fermi function based on non-equilibrium Green's function (NEGF) formalism using residue theorem; separating the expression into space dependent and time dependent components respectively so that it is of a general form I(t)=I.sub.t(I.sub.o, V.sub.m(t), M.sub.n), such that it is a function composed of time dependent Vandermonde matrices V.sub.m and other space dependent matrices M.sub.n; computing and constructing space dependent matrices I.sub.o and M.sub.n by an optimized matrix multiplication process; computing the multiplication of V.sub.m and M.sub.n directly; summing up all of the contributing parts of I(t) from the multiplication of V.sub.m and M.sub.n and I.sub.o to form Algorithm I, which has a computational complexity of 50N.sup.3+TN.sup.2, where N is the system size and T is the number of time steps; and outputting the complete transient response I(t) over the user-defined time period.

2. The method of claim 1 further including the steps of: using FMM and FFT methods to compute the multiplication of V.sub.m and M.sub.n; summing up all of the contributing parts of I(t) from the multiplication of V.sub.m and M.sub.n and I.sub.o to form Algorithm II, which has a computational complexity of 50N.sup.3+2N.sup.2 log.sub.2 N for T<N.sup.2 and large N; and outputting the complete transient response I(t) over the user-defined time period.

3. The method of claim 1 wherein the Hamiltonians, H.sub.eq, H.sub.neq are derived from first-principles, a tight-binding method or a similar technique.

4. The method of claim 1 wherein the first principles of the input step are (DFT), TB,

5. The method of claim 1 wherein the self-energy of leads is from CAP.

6. The method of claim 2, wherein T=N and Algorithm IIa is formed.

7. The method of claim 2 wherein T<2N and Algorithm IIb is formed.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] The foregoing and other objects and advantages of the present invention will become more apparent when considered in connection with the following detailed description and appended drawings in which like designations denote like elements in the various views, and wherein:

[0027] FIG. 1A shows the configuration of a gated graphene nanoribbon and a voltage applied to it over time and FIG. 1B shows the transient current of the zigzag graphene nanoribbon for a system of 600 atoms for the present invention and the Zhang method;

[0028] FIG. 2 shows the scaling of computation time versus the number of computations N for the prior art Zhang method and the Algorithm I of the method according to the present invention;

[0029] FIG. 3 shows the scaling of computational time against time t for N equal to 10,200, time step dt=1 fs;

[0030] FIG. 4 shows a graph of the typical behavior of transient current, wherein the inset is the long time behavior between 17 ps to 18 ps;

[0031] FIG. 5 is a graph of the transmission coefficient of a zigzag graphene nanoribbon for a system of 10,000 atoms calculated by 30 layers of CAP and by exact numerical solution;

[0032] FIG. 6 shows the distribution of (1/ω.sub.T).sup.j on the complex plane;

[0033] FIG. 7 shows the distribution of d.sub.j on the complex plane;

[0034] FIG. 8 illustrates the transient current calculated by Algorithm I, Algorithm IIa and Algorithm IIb; and

[0035] FIG. 9 is a flow chart showing the process for the calculation of the algorithm according to the present invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0036] First principles transient current calculation is essential to the study of the response time and to capture the peak transient current for preventing the melt down of nano-chips in nano-electronics. For a period of time T, its calculation is known to be extremely time consuming with the best scaling being TN.sup.3 where N is the dimension of the device. The dynamical response of the system is usually probed by sending a step-like pulse and monitoring its transient behavior.

[0037] The present invention is directed to a new algorithm for calculating the transient current of a system, which requires much less computational time than the prior art. FIG. 9 shows a flow chart of the method for calculating the algorithm of the present invention. It can be carried out on a computer workstation as part of the design of a nano-device.

[0038] Generally speaking, the algorithm provides an O(1) computational method for obtaining the transient response of current I(t) over the entire user-defined time period under suitable conditions. The input parameters for this algorithm include Hamiltonians of the open quantum system before and after the transient process denoted as H.sub.eq and H.sub.neq. These Hamiltonians, whether based on atomistic first-principle density functional theory (DFT) or tight-binding methods, can be adopted at will for the system of interest. Nonetheless, complex absorbing potential (CAP) is required to represent the self-energy of the leads of the device. Subsequently, corresponding eigenstate and energy can be generated and fed into the algorithm. Additionally, Fermi function is expanded using Pade spectrum decomposition (PSD).

[0039] The particular exact equation of transient current, which can be calculated separately in the space and time domains, can only be constructed by using the mentioned form of input ingredients. Once such an equation is constructed, whether for a step-up bias, a step-down bias or other similar classes of transient response, I(t) is a function of I.sub.o, V.sub.m(t) and M.sub.n for some m and n, where V.sub.m(t) are Vandermonde matrices and I.sub.o and M.sub.n are time independent matrices. Once the required Vandermonde matrices and M.sub.n are constructed, the fast multipole method (FMM) and fast Fourier transform (FFT) are used to calculate the matrix multiplication between them. Eventually, I(t) can be obtained by summing up the contributions due to I.sub.o, V.sub.m(t) and M.sub.n.

[0040] In step 901 of FIG. 9 the process begins with input preparation. This includes step 901A, which involves forming Preliminary Input quantities H.sub.eq, H.sub.neq (From first-principle (DFT), TB, etc) and Self energy of leads (from CAP). In step 901B eigenstates and eigen-energy are generated. This input information is then directed to step 902. In step 902 construction is performed on the exact equation of transient current based on NEGF formalism that is organized into the general form of I(t)=I.sub.t(I.sub.o, V.sub.m(t), M.sub.n) such that it is a function composed of time dependent Vandermonde matrices V.sub.m and other space dependent matrices M.sub.n. Also, there is an input from step 903, i.e., Fermi functions (from the PSD method) to step 902.

[0041] The results of step 902 are then reviewed in step 904 to see if they contain time dependent components. If the answer is NO, the process moves to step 905. In step 905 the process computes and constructs space dependent matrices by an optimized matrix multiplication process. From step 905 the process goes to step 906 where the FMM and FFT methods are used for computing the multiplication of V.sub.m and M.sub.n.

[0042] If the result of step 904 is YES, the process moves to step 907 where Vandermonde matrices are prepared before moving to step 906. The results in both step 905 and 906 are summed in Step 908. The summed results are directed to output step 909, where the complete transient response I(t) over the user-defined time period is generated.

[0043] To demonstrate the power of the algorithm according to the present invention, the transient current in a graphene nanoribbon is calculated. Graphene is a well-known intrinsic 2D material with many exotic properties. See, A. H. Castro Neto et al., Rev. Mod. Phys. 81, 109 (2009) and Y. Zhang et al., Nature, 438, 201-204 (2005), which are incorporated herein by reference in their entirety. Studies of its transient behavior in response to a step-like pulse have been reported in the literature. See the Stefanucci article and E. Perfetto et al., Phys. Rev. B 82, 035446 (2010) and Y. O. Klymenko et al., Eur. Phys. J. B 69, 383-388 (2009), which are both incorporated herein by reference in their entirety.

[0044] The algorithm of the present invention was tested on a gated graphene nanoribbon at room temperature using the tight-binding (TB) Hamiltonian given by

[00009] $\begin{matrix} \hat{H} = - h_{0} \sum_{\langle i, j \rangle} {\hat{c}}_{i}^{†} {\hat{c}}_{j} - q \sum_{i} [V_{1} θ (t) + V_{g1i} + V_{g2i}] {\hat{c}}_{i}^{†} {\hat{c}}_{i}, & (8) \end{matrix}$

[0045] where ĉ.sub.i.sup.†(ĉ.sub.i) is the creation (annihilation) operator at site i and h.sub.0=2.7 eV is the nearest-neighbor hopping constant. Here V(x)=V.sub.L+(V.sub.R−V.sub.L)x/L is the potential landscape due to the external bias with V.sub.R=−V.sub.L=0.54 V, and V.sub.g1 and V.sub.g2 are the gate voltages in regions S.sub.1 and S.sub.2, respectively.
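As an illustration of how the inputs H.sub.eq and H.sub.neq of Eq. (8) might be assembled, the following minimal sketch builds them for a one-dimensional chain. The chain geometry, the function name chain_hamiltonian, the choice q=1 (energies in eV, voltages in V) and the default of zero gate voltage are assumptions made for illustration only; the tested system is a zigzag graphene nanoribbon, which this sketch does not reproduce.

```python
import numpy as np

def chain_hamiltonian(n_sites, h0=2.7, v_left=-0.54, v_right=0.54,
                      gate=None, bias_on=True, q=1.0):
    """Tight-binding Hamiltonian (eV) of a 1D chain, a simplified stand-in for the ribbon."""
    x_over_l = np.arange(n_sites) / (n_sites - 1)           # x/L from 0 to 1
    v_bias = v_left + (v_right - v_left) * x_over_l         # V(x) = V_L + (V_R - V_L) x / L
    v_gate = np.zeros(n_sites) if gate is None else np.asarray(gate, dtype=float)
    # theta(t) switches the bias term: off before the pulse, on after it
    onsite = -q * ((v_bias if bias_on else 0.0) + v_gate)
    hop = -h0 * np.ones(n_sites - 1)                        # nearest-neighbor hopping -h0
    return np.diag(onsite) + np.diag(hop, 1) + np.diag(hop, -1)

h_eq = chain_hamiltonian(6, bias_on=False)   # before the step-like pulse
h_neq = chain_hamiltonian(6, bias_on=True)   # after the step-like pulse
delta = h_neq - h_eq                         # change caused by the bias (diagonal here)
```

In this sketch the difference H.sub.neq−H.sub.eq is purely diagonal because only the on-site potential changes when the pulse is applied.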

[0046] First it is confirmed that the transient current calculated using the new method is the same as that of the Zhang reference. Using 30 layers of CAP, the transmission coefficient versus the energy was calculated and it showed good agreement with the exact solution. FIG. 5 provides a comparison for a graphene nanoribbon with N=10,000. In FIG. 5 the solid line (T.sub.S) is the exact numerical result using self-energy of the lead and the dashed line (T.sub.CAP) is from CAP. This also ensures the correct steady state current. For the transient current, excellent agreement is also obtained between the algorithm of the present invention and that of the Zhang reference (See FIG. 1B). It should be noted that in the presence of gates, the “on-off” time of graphene is shortened in comparison to un-gated graphene, which has a long oscillating current as reported in the Stefanucci article. The transient current is also calculated for a graphene nanoribbon with size N=10.sup.4 and T=20,000, with each time step being 1 fs. See FIG. 4.

[0047] The scaling of the algorithm of the present invention can now be tested by calculating the transient current for nanoribbons with different system sizes ranging from 600 to 10,200 atoms. The first test is with algorithm I. The computational time for the transient current for 3 time steps against system size N is shown in FIG. 2. In FIG. 2 the data is fitted using 50N.sup.3+TN.sup.2 with very good agreement, showing TN.sup.2 scaling for the time-dependent part. For comparison, FIG. 2 also shows a plot of the computation time using the method in the Zhang reference. It was found that the number of energy points N.sub.E depends on the spectrum of resonant states of the system. For graphene nanoribbons with 600 atoms, N.sub.E=6000 was used to converge the integral over a Fermi function. FIG. 2 shows that a speed up factor of 1000T is achieved at N=2400. The scaling with the number of time steps is shown in FIG. 3, from which it can be seen that for T<N the computational time is almost independent of the number of time steps.

[0048] Now algorithm II, which reduces the scaling TN.sup.2 further, can be examined. Notice that the scaling TN.sup.2 comes from the matrix multiplication V.sup.tM.sub.1 involving the Vandermonde matrix. Fast algorithms are available to speed up calculations involving a structured matrix, such as the Vandermonde matrix. As discussed in detail below, the FMM as in the Rokhlin and Song articles and FFT can be used to carry out the same matrix multiplication using only c.sub.3N.sup.2 log.sub.2N operations provided that T<N.sup.2. Here the coefficient c.sub.3 is a large constant that depends only on the tolerance τ of the FMM calculation and the setup of FMM. The theoretical estimate of this coefficient is about 40 log.sub.2(1/τ); the calculation used τ=10.sup.−4. See N. Yarvin et al., SIAM J. Numer. Anal. 36, 629 (1999), which is incorporated herein by reference in its entirety. When implementing FMM, this coefficient is in general larger than the theoretical one.

[0049] To test algorithm II, the transient current is calculated for N=10.sup.4 and T=10.sup.8 as explained in detail below, using FMM and FFT. Denote by t.sub.1 the CPU time needed for the spatial calculation (50N.sup.3) and by t.sub.2 the time needed for the temporal part (matrix multiplication in Eq. (7)) using, e.g., a Xeon X5650 workstation with 12 cores and a frequency of 2.67 GHz. The result t.sub.1=3500 s is obtained using 12 cores and t.sub.1=33800 s using a single core, so the efficiency of multithreading is about 80%. For an FMM calculation, multithreading could be very inefficient, so a single core has been used to perform the calculation. It was found that t.sub.2=3400 s for T=10.sup.8 using a single core, as shown below in the detailed computational analysis and numerical calculation. Thus for T=10.sup.8 the time spent in the time dependent part is about one tenth of the time of the time independent calculation. This confirms that the method of the present invention uses an algorithm of order O(1) as long as T<N.sup.2. It should be pointed out that algorithm II is directed to the calculation of the transient current I(t) with T=N.sup.2 time steps in one shot with scaling N.sup.2 log.sub.2N. This scaling remains if I(t) with fewer than N.sup.2 time steps is desired.
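The figures quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the multithreading efficiency and the ratio t.sub.2/t.sub.1 from the reported timings, and compares the nominal operation counts 50N.sup.3, TN.sup.2 and 2N.sup.2 log.sub.2N. The variable names are illustrative, and the operation counts omit the FMM and FFT prefactors (c.sub.3, κ.sub.1, κ.sub.2), so they indicate scaling only, not run time.

```python
import math

N, T = 10**4, 10**8                      # system size and number of time steps
t1_12cores, t1_1core = 3500.0, 33800.0   # reported spatial-part CPU times in seconds
t2_1core = 3400.0                        # reported temporal-part CPU time in seconds

# Parallel efficiency of the spatial part on 12 cores
efficiency = t1_1core / (12 * t1_12cores)
# Temporal part relative to the spatial part on a single core
ratio = t2_1core / t1_1core

# Nominal operation counts (prefactors of FMM and FFT are not included)
ops_spatial = 50 * N**3                  # time independent part
ops_direct = T * N**2                    # Algorithm I temporal part
ops_fast = 2 * N**2 * math.log2(N)       # Algorithm II temporal part for T < N^2

print(f"multithreading efficiency: {efficiency:.3f}")
print(f"t2 / t1 on a single core:  {ratio:.3f}")
print(f"direct temporal / spatial operations: {ops_direct / ops_spatial:.0f}")
print(f"fast temporal / spatial operations:   {ops_fast / ops_spatial:.1e}")
```

With these numbers the direct multiplication would cost about 200 times the spatial part at T=10.sup.8, which is why the structured-matrix route is needed in this regime.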

[0050] FIG. 2 shows the scaling of N against computation time at T=3 (each time step is 1 fs). The fitted curve, which is in the form of 50N.sup.3+TN.sup.2, is in good agreement with the calculated results. The Y-axis is on the right. In order to compare the present invention to the performance reported in the Zhang reference, 6000 energy points were used for integration, using the Y-axis on the left.

[0051] FIG. 3 illustrates the scaling of t against CPU time for N=10,200 (t=100 fs corresponds to T=100) using algorithm I. The left hand side uses an exponential scale in t and the right hand side uses a linear scale in t. FIG. 3 shows that for an extremely large number of data points, e.g., over ten thousand points, the computational time is proportional to T.

[0052] Since the algorithm of the present invention is based on the NEGF-CAP formalism, it can be extended to the NEGF-DFT-CAP formalism which performs the first principles calculation. In fact, the NEGF-DFT-CAP method has already been successfully implemented in the first principles transient current calculation as shown in the Zhang reference, which gives exactly the same result as that of NEGF-DFT. With the fast algorithm at hand, many applications can be envisaged. For instance, the transient spin current (related to spin transfer torque) using the NEGF-DFT-CAP formalism has been carried out for planar structures where k-sampling in the first Brillouin zone is needed. It is straightforward to include k-sampling in the method of the present invention. It is also possible to extend this method to the case where electron-phonon interaction in the Born approximation as well as other dephasing mechanisms are present. Finally, first principles transient photo-induced current on two dimensional layered materials can be calculated using the method of the present invention.

[0053] Some of the details of the calculations presented above are given here.

Pade Approximant

[0054] Brute force integration over the Fermi function along the real energy axis to obtain G.sup.<(t,t) may need thousands of energy points to converge, which is very inefficient. To obtain an accurate result while reducing the cost, fast converging Pade spectrum decomposition (PSD) is used for the Fermi function f in Eq. (4) above, so that the residue theorem can be applied. Using [n−1/n] PSD scheme with the Pade approximant accurate up to O((ε/kT).sup.4n−1), Fermi function f can be expressed as

[00010] $\begin{matrix} f (ε) = \frac{1}{2} - \sum_{j = 1}^{n} \frac{2 η_{j} βε}{(βε)^{2} + ξ_{j}^{2}}, & (9) \end{matrix}$

[0055] where ξ.sub.j and η.sub.j are two sets of constants that can be calculated easily. Using the PSD scheme, the analytic form of G.sup.< in Eq. (4) can be obtained using the residue theorem. See J. Hu et al., J. Chem. Phys. 133, 101106 (2010), which is incorporated herein by reference in its entirety.
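The pole-sum form of Eq. (9) can be exercised numerically. The sketch below does not compute the PSD constants; as a plainly different stand-in it uses the Matsubara expansion, which has the same form with η.sub.j=1 and ξ.sub.j=(2j−1)π but converges much more slowly. The function names, the thermal energy 0.02585 eV (roughly 300 K) and the test energy are assumptions for illustration.

```python
import numpy as np

def fermi_pole_sum(energy, beta, eta, xi):
    """Evaluate 1/2 - sum_j 2*eta_j*x / (x^2 + xi_j^2) with x = beta*energy, as in Eq. (9)."""
    x = beta * energy
    return 0.5 - np.sum(2.0 * eta * x / (x**2 + xi**2))

def matsubara_constants(n):
    """Matsubara stand-in (NOT the PSD constants): eta_j = 1, xi_j = (2j - 1)*pi."""
    j = np.arange(1, n + 1)
    return np.ones(n), (2 * j - 1) * np.pi

beta = 1.0 / 0.02585          # 1/(kT) in 1/eV near room temperature (assumed value)
energy = 0.05                 # energy in eV measured from the chemical potential
eta, xi = matsubara_constants(200_000)

approx = fermi_pole_sum(energy, beta, eta, xi)
exact = 1.0 / (np.exp(beta * energy) + 1.0)
print(approx, exact)
```

The large number of terms needed by the stand-in illustrates why a fast converging decomposition is preferred: a PSD scheme supplies optimized η.sub.j and ξ.sub.j so that far fewer poles enter the residue sums.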

Calculation of the Spectral Function

[0056] The terms G̃.sup.r(ε) and Ḡ.sup.r(ε), the equilibrium and non-equilibrium retarded Green's functions, respectively, can be expressed in terms of their eigen-functions by solving the following eigen-equations for H.sub.eq and H.sub.neq, i.e.,

(H.sub.eq−iW)ψ.sub.n.sup.0=ε.sub.n.sup.0ψ.sub.n.sup.0,

(H.sub.eq+iW.sup.†)φ.sub.n.sup.0=ε.sub.n.sup.0φ.sub.n.sup.0, (10)

[0058] where

[00011] $W = \sum_{α} W_{α}$

and similar equations can be defined for H.sub.neq. See the Zhang reference. Using the eigen-functions of H.sub.eq−iW and H.sub.neq−iW, we have

[00012] $\begin{matrix} {\tilde{G}}^{r} (ε) = [ε - H_{eq} + iW]^{- 1} = \sum_{n} \frac{| ψ_{n}^{0} \rangle \langle φ_{n}^{0} |}{(ε - ε_{n}^{0} + i 0^{+})}, & (11) \\ {\overline{G}}^{r} (ε) = [ε - H_{neq} + iW]^{- 1} = \sum_{n} \frac{| ψ_{n} \rangle \langle φ_{n} |}{(ε - ε_{n} + i 0^{+})} . & (12) \end{matrix}$
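The eigen-expansions of Eqs. (11) and (12) can be verified on a small example. In the sketch below a random Hermitian matrix stands in for the Hamiltonian and a positive diagonal matrix stands in for the CAP term W; both are assumptions for illustration, not the CAP used in the tests. The right eigenvectors come from numpy.linalg.eig, and the bi-orthonormal left vectors are taken as the rows of the inverse eigenvector matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical inputs: Hermitian H and a positive diagonal absorbing potential W
h = rng.normal(size=(n, n))
h = (h + h.T) / 2
w = np.diag(rng.uniform(0.1, 0.5, size=n))
h_eff = h - 1j * w                      # non-Hermitian operator H - iW

# Columns of psi are right eigenvectors |psi_n>; rows of phi_dag are <phi_n|,
# normalized so that <phi_n|psi_m> = delta_nm
eps, psi = np.linalg.eig(h_eff)
phi_dag = np.linalg.inv(psi)

def green_spectral(energy):
    """G^r(energy) = sum_n |psi_n><phi_n| / (energy - eps_n)."""
    return (psi / (energy - eps)) @ phi_dag   # divides column n by (energy - eps_n)

energy = 0.3
g_direct = np.linalg.inv(energy * np.eye(n) - h_eff)   # [energy - H + iW]^{-1}
g_spectral = green_spectral(energy)
print(np.max(np.abs(g_direct - g_spectral)))
```

Because W is positive, every eigenvalue has a negative imaginary part, so the denominators never vanish for real energies and the infinitesimal i0.sup.+ is not needed numerically. The eigen-decomposition is done once and reused for every energy.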

[0059] Performing an integral over ω using the residue theorem, the analytic solution of A.sub.α is obtained

[00013] $\begin{matrix} A_{α} (ε, t) = \sum_{n} \frac{| ψ_{n} \rangle \langle φ_{n} |}{ε + Δ_{α} - ε_{n} + i 0^{+}} + \sum_{n} \frac{e^{i (ε + Δ_{α} - ε_{n}) t} | ψ_{n} \rangle \langle φ_{n} |}{ε - ε_{n} + i 0^{+}} \times [\frac{Δ_{α}}{ε + Δ_{α} - ε_{n} + i 0^{+}} - Δ \sum_{l} \frac{| ψ_{l}^{0} \rangle \langle φ_{l}^{0} |}{(ε - ε_{l}^{0} + i 0^{+})}], & (13) \end{matrix}$

[0060] where Δ=H.sub.neq−H.sub.eq.

Calculation of the Lesser Green's Function

[0061] In Eq. (5) using residue theorem the involved terms are defined as

[00014] $\begin{matrix} {\overline{B}}_{2} = - 4 π^{2} [B_{2} (ω, ω^{'}) (ω - ε_{n}) (ω^{'} - ε_{m}^{*})] |_{ω = ε_{n}, ω^{'} = ε_{m}^{*}} \\ {\overline{B}}_{3α} = - 2 π i A_{1α} ({\tilde{ε}}_{l}) W_{α} [A_{2α}^{†} (ω^{'}, {\tilde{ε}}_{l}) (ω^{'} - ε_{m}^{*})] |_{ω^{'} = ε_{m}^{*}} \\ {\overline{B}}_{4α} = - 4 π^{2} [A_{1α} (ε) W_{α} A_{2α}^{†} (ω^{'}, ε) (ω^{'} - ε_{m}^{*}) \times (ε - ε_{n} + Δ_{α})] |_{ε = ε_{n} - Δ_{α}, ω^{'} = ε_{m}^{*}} \\ \overline{f} = 2 π i (f (ε) (ε - {\tilde{ε}}_{l})) |_{ε = {\overline{ε}}_{l}} \end{matrix}$

Calculation of the Transient Current

[0062] Starting from Eq. (1) and in analogy to Eq. (6), the expressions of the current in Eq. (7) can be obtained as follows:

[00015] $I_{0L} (t_{j}) = 2 Re Tr [\frac{i}{π} {\overline{Γ}}_{L} H_{neq} B_{1} {\overline{Γ}}_{L}]$
$M_{1} = Re Tr [\frac{i}{π} {\overline{Γ}}_{L} (2 H_{neq} - (ε_{n} - ε_{m}^{*})) ({\overline{B}}_{2} + \sum_{α} f (ε_{n} - Δ_{α}) {\overline{B}}_{4α}) {\overline{Γ}}_{L}]$
$M_{2α} = Re Tr [\frac{i}{π} {\overline{Γ}}_{L} (2 H_{neq} - ({\overline{ε}}_{l} - ε_{m}^{*} + Δ_{o})) (f (ε_{m}^{*}) {\overline{B}}_{3α}) {\overline{Γ}}_{L}] .$

[0063] The expression of transient current I.sub.R(t) is similar to Eq. (7).

Transient Current for a Graphene Nanoribbon of System Size N=10.sup.4

[0064] A calculation was performed on the transient current through a zigzag graphene nanoribbon of 10,000 atoms with T=20,000 time steps (each time step is 1 fs). The width of the system is two unit cells (16 atoms) while the length of the system is 625 unit cells. Two gate voltages of 2.2 V were applied so that the system is in the tunneling regime. The bias voltage is v.sub.L=−v.sub.R=0.5 V. FIG. 4 shows the typical behavior of the transient current: the current shoots up initially and then oscillates with decreasing amplitude toward the long time limit (the numerical results using FMM show that 100 ps is needed to reach the dc limit, which is shown as a dashed line). The oscillatory behavior is due to resonant states in the system. This test was at a temperature of 300 K. The inset is the long time behavior between 17 ps and 18 ps.

[0065] FIG. 5 shows the transmission coefficient of the zigzag graphene nanoribbon for a system of 10,000 atoms. The solid line (T.sub.S) is the exact numerical result using self-energy of the lead and the dashed line (T.sub.CAP) is from CAP. Thus, FIG. 5 shows a comparison between the exact numerical result of the transmission coefficient using the self-energy of the lead and that of CAP. There is very close agreement.

Fast Multipole Method

[0066] The fast multipole method has been widely used and has been ranked among the top 10 algorithms of the 20th century. See, V. Rokhlin, J. Comput. Phys. 60, 187-207 (1985); J. Song, C. C. Lu, and W. C. Chew, IEEE Trans. Antennas Propagat. 45, 1488-1493 (1997); and B. A. Cipra, SIAM News, 33(4), 2 (2000), which are incorporated herein by reference in their entirety. It is extremely efficient for large N. The following quantity is to be calculated:

[00016] $\begin{matrix} I (t) = \sum_{n, m} \exp (- i ε_{n} t) M_{nm} \exp (i ε_{m}^{*} t), & (14) \end{matrix}$

[0067] where the matrix M can be expressed in terms of vectors as M=(c.sub.0,c.sub.1, . . . ,c.sub.N−1) and V.sub.nj=exp(−iε.sub.nt.sub.j) is a Vandermonde matrix with t.sub.j=jdt and j=1,2, . . . T. Eq. (14) is of the form V.sup.tMV* where t stands for transpose. In the following, an outline of how to calculate V.sup.tc where c is a vector of N components is given.
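For reference, Eq. (14) can be evaluated directly with dense matrix products, which is the TN.sup.2 route of Algorithm I. The following is a minimal sketch under stated assumptions: the function name, the random complex energies (with negative imaginary parts, as for resonant states) and the random matrix M are hypothetical test inputs, and units with ħ=1 are assumed so that ε.sub.nt is dimensionless.

```python
import numpy as np

def transient_sum_direct(eps, m, dt, n_steps):
    """I(t_j) = sum_{n,m} exp(-i eps_n t_j) M_nm exp(i eps_m^* t_j) for t_j = j*dt, j = 1..T."""
    t = dt * np.arange(1, n_steps + 1)
    v = np.exp(-1j * np.outer(eps, t))        # Vandermonde matrix V[n, j], shape (N, T)
    vm = v.T @ m                              # V^t M, shape (T, N): the T*N^2 step
    # Row j of V^t M contracted with column j of V^*, i.e. the diagonal of V^t M V^*
    return np.sum(vm * v.conj().T, axis=1)

rng = np.random.default_rng(1)
n_states, n_steps, dt = 4, 8, 0.1
eps = rng.uniform(-1.0, 1.0, n_states) - 1j * rng.uniform(0.01, 0.1, n_states)
m = rng.normal(size=(n_states, n_states)) + 1j * rng.normal(size=(n_states, n_states))

current = transient_sum_direct(eps, m, dt, n_steps)
print(current[:3])
```

Only the diagonal of V.sup.tMV* is needed, so the second product is done as an element-wise contraction rather than a full T×T matrix.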

[0068] Set a.sub.j=exp(−iε.sub.jdt) and denote by T the number of time steps. Then b=V.sup.tc is equivalent to

[00017] $b_{n} = \sum_{j = 0}^{N - 1} c_{j} (a_{j})^{n} .$

A direct computation shows that the entries of b=V.sup.tc are the first T coefficients of the Taylor expansion of

[00018] $\begin{matrix} S (x) = \sum_{j = 0}^{N - 1} \frac{c_{j}}{1 - a_{j} x} = \sum_{n = 0}^{\infty} \sum_{j = 0}^{N - 1} c_{j} (a_{j} x)^{n} = \sum_{n} b_{n} x^{n}, & (15) \end{matrix}$

[0069] where b.sub.n=Σ.sub.j=0.sup.N−1 c.sub.j(a.sub.j).sup.n. Denoting by S̄(x)=Σ.sub.n=0.sup.T−1 b.sub.nx.sup.n the series truncated at T terms and setting x=ω.sub.T.sup.l with ω.sub.T=exp(i2π/T), one can calculate S̄(ω.sub.T.sup.l), which is the Fourier transform of b.sub.n,

[00019] $\begin{matrix} \overline{S} (ω_{T}^{l}) = \sum_{j = 0}^{N - 1} \sum_{n = 0}^{T - 1} c_{j} a_{j}^{n} ω_{T}^{nl} \\ = \sum_{j = 0}^{N - 1} c_{j} \frac{1 - (a_{j} ω_{T}^{l})^{T}}{1 - a_{j} ω_{T}^{l}} \\ = ω_{T}^{- l} \sum_{j = 0}^{N - 1} \frac{c_{j} (1 - a_{j}^{T})}{(1 / ω_{T})^{l} - a_{j}}, \end{matrix}$

[0070] where ω.sub.T.sup.T=1 is used. Note that the fast multipole method (FMM) aims to calculate

[00020] $v_{l} = \sum_{j} c_{j} / (x_{l} - a_{j})$

with O(N) operations instead of N.sup.2 operations. Hence S(ω.sub.T.sup.l) can be obtained using FMM, from which b.sub.n can be calculated using FFT.
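The route just described can be sketched as follows. To keep the example self-contained, the Cauchy-type sums v.sub.l are evaluated by a plain dense sum, which costs O(NT) and merely stands in for the FMM; only the algebra (Cauchy sum followed by an FFT) is illustrated, not the speed-up. Function names and the random test data are assumptions.

```python
import numpy as np

def vandermonde_apply_direct(a, c, n_steps):
    """b_n = sum_j c_j a_j**n for n = 0..T-1, using explicit powers."""
    powers = a[None, :] ** np.arange(n_steps)[:, None]      # shape (T, N)
    return powers @ c

def vandermonde_apply_fft(a, c, n_steps):
    """Same b_n obtained from S-bar(omega_T^l) followed by an FFT."""
    x = np.exp(-2j * np.pi * np.arange(n_steps) / n_steps)  # x_l = (1/omega_T)^l
    weights = c * (1.0 - a ** n_steps)                      # c_j (1 - a_j^T)
    # Dense Cauchy sum v_l = sum_j weights_j / (x_l - a_j); an FMM would replace this line
    cauchy = (weights[None, :] / (x[:, None] - a[None, :])).sum(axis=1)
    s_bar = x * cauchy                                      # omega_T^{-l} * v_l
    # S-bar_l = sum_n b_n exp(+2 pi i n l / T), so a forward FFT divided by T recovers b_n
    return np.fft.fft(s_bar) / n_steps

rng = np.random.default_rng(2)
n_states, n_steps, dt = 5, 64, 0.1
eps = rng.uniform(-1.0, 1.0, n_states) - 1j * rng.uniform(0.05, 0.2, n_states)
a = np.exp(-1j * eps * dt)                                  # |a_j| < 1 for decaying states
c = rng.normal(size=n_states) + 1j * rng.normal(size=n_states)

b_direct = vandermonde_apply_direct(a, c, n_steps)
b_fft = vandermonde_apply_fft(a, c, n_steps)
print(np.max(np.abs(b_direct - b_fft)))
```

Since the sample points x.sub.l lie on the unit circle and |a.sub.j|<1, the denominators x.sub.l−a.sub.j never vanish in this sketch.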

[0071] Now the computational complexity for T≦N can be estimated. For FMM the value κ.sub.1max(T,N) operations are needed where κ.sub.1 is about 40 log.sub.2(1/τ) with τ the tolerance. See N. Yarvin and V. Rokhlin, SIAM J. Numer. Anal. 36, 629 (1999), which is incorporated herein by reference in its entirety. For FFT the computational complexity is at most κ.sub.2N log.sub.2N, where κ.sub.2 is a coefficient for FFT calculation. To compute V.sup.tM where M has N vectors, V.sup.tc is calculated N times. Hence the total computational complexity is κ.sub.1N.sup.2+κ.sub.2N.sup.2 log.sub.2N. This algorithm is denoted as algorithm IIa while the algorithm for T<N.sup.2 discussed below is denoted as algorithm IIb.

[0072] For very large T up to T=N.sup.2 (if N=10.sup.4 and T=10.sup.8), it can be shown that the computational complexity is κ.sub.1N.sup.2+2κ.sub.2N.sup.2 log.sub.2N. In fact, it is easy to see that I(t.sub.j) defined in Eq. (6) is the first T coefficients of the Taylor expansion of

[00021] $\begin{matrix} \begin{matrix} S (x) = \sum_{n, m = 0}^{N - 1} \frac{M_{nm}}{1 - a_{n} a_{m}^{*} x} \\ = \sum_{j = 0}^{\infty} \sum_{n, m = 0}^{N - 1} M_{nm} (a_{n} a_{m}^{*})^{j} x^{j} \quad (17) \\ = \sum_{j} I (t_{j}) x^{j} ; \end{matrix} & (16) \end{matrix}$

[0073] where a.sub.n=exp(−iε.sub.ndt). Now two new vectors u and d can be defined which have N.sup.2 components with u.sup.t=(c.sub.0.sup.t,c.sub.1.sup.t, . . . ,c.sub.N−1.sup.t) (recall the definition M=(c.sub.0,c.sub.1, . . . ,c.sub.N−1)) and d.sup.t=(a.sub.0*a.sup.t,a.sub.1*a.sup.t, . . . ,a.sub.N−1*a.sup.t), where once again t stands for transpose. With the new vectors defined, S(x) in Eq. (16) is expressed as

[00022] $\begin{matrix} S (x) = \sum_{j = 0}^{N^{2} - 1} \frac{u_{j}}{1 - d_{j} x}, & (18) \end{matrix}$

[0074] which is exactly the same form as Eq. (15). The only difference is that c and a in Eq. (15) have N components and S has to be calculated N times while u and d in Eq. (18) have N.sup.2 components and S can be calculated according to Eq. (18) just once. Therefore the computational complexity is κ.sub.1N.sup.2+κ.sub.2N.sup.2 log.sub.2N.sup.2. If T=nN with n=1,2, . . . N, it is not difficult to show that the computational complexity is κ.sub.1TN/n+κ.sub.2T(N/n) log.sub.2(nN)=κ.sub.1N.sup.2+κ.sub.2N.sup.2 log.sub.2(nN).
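The flattening behind Eq. (18) can be illustrated on a small example. The sketch below stacks the columns of M into u and forms d from the products a.sub.m*a.sub.n in the same order, then checks that the power sums Σ.sub.j u.sub.jd.sub.j.sup.k reproduce the double sum Σ.sub.n,m M.sub.nm(a.sub.na.sub.m*).sup.k. The function names and random inputs are hypothetical, and the power sums are evaluated naively here; in Algorithm IIb they would be obtained through the FMM and FFT steps.

```python
import numpy as np

def flatten_pairs(a, m):
    """Return u (columns of M stacked) and d with d[m*N + n] = conj(a_m) * a_n."""
    u = m.T.ravel()                        # u[m*N + n] = M[n, m], i.e. (c_0, c_1, ...)
    d = np.outer(a.conj(), a).ravel()      # (a_0^* a, a_1^* a, ...)
    return u, d

def power_sums(u, d, n_steps):
    """I_k = sum_j u_j d_j**k for k = 0..T-1 (naive evaluation, for checking only)."""
    return np.array([np.sum(u * d**k) for k in range(n_steps)])

rng = np.random.default_rng(3)
n_states, n_steps, dt = 5, 12, 0.1
eps = rng.uniform(-1.0, 1.0, n_states) - 1j * rng.uniform(0.05, 0.2, n_states)
a = np.exp(-1j * eps * dt)
m = rng.normal(size=(n_states, n_states)) + 1j * rng.normal(size=(n_states, n_states))

u, d = flatten_pairs(a, m)
flat = power_sums(u, d, n_steps)

# Reference: the double sum written as (a^k)^t M (a^*)^k for each k
reference = np.array([(a**k) @ m @ (a.conj()**k) for k in range(n_steps)])
print(np.max(np.abs(flat - reference)))
```

Any ordering of the N.sup.2 pairs works as long as u and d use the same one; the column-stacked order above follows the definitions of u and d given before Eq. (18).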

[0075] To summarize, the computational complexity of Eq. (14) is κ.sub.1N.sup.2+2κ.sub.2N.sup.2 log.sub.2N for T<N.sup.2. It is easy to show that for T>N.sup.2 the scaling is κ.sub.1N.sup.2+2κ.sub.2T log.sub.2N. However, for large T, the physics comes into play. Since a.sub.j=exp(−iε.sub.jdt) with ε.sub.j the complex energy of a resonant state, a.sub.j.sup.T quickly decays to zero before T=N.sup.2, and hence there is no need to go beyond T=N.sup.2.

[0076] Algorithm II was tested numerically for a system with N=10.sup.4 and T=10.sup.8. The configuration of the system is the same as that which appears in FIG. 1, except that the width W of the system is now 17 times wider, with a total of 10,200≈10.sup.4 atoms. Each time step is 0.012 fs. The transient currents computed using Algorithms I and II are shown in FIG. 8. The purpose of this calculation is to test the computational complexity only. All that needs to be done is to compute

[00023] $\begin{matrix} \overline{S} (ω_{T}^{l}) = ω_{T}^{- l} \sum_{j = 0}^{N^{2} - 1} \frac{u_{j} (1 - d_{j}^{T})}{(1 / ω_{T})^{l} - d_{j}}, & (19) \end{matrix}$

[0077] using FMM and then taking FFT to obtain I(t.sub.j), where u.sub.j and d.sub.j have been defined just before Eq. (18). Note that u.sub.j has been obtained in the time independent calculation. If (1/ω.sub.T).sup.j and d.sub.j in Eq. (19) are uniformly distributed on the complex plane, the FMM can be done much faster. However, as shown in FIG. 6 and FIG. 7, the distributions of (1/ω.sub.T).sup.j and d.sub.j are highly non-uniform in this case. Actually, (1/ω.sub.T).sup.j are distributed non-uniformly along the circle (FIG. 6) while d.sub.j are distributed in a sector of the unit circle (FIG. 7). This makes the calculation more difficult. For N=10.sup.4 and T=10.sup.8, it was determined that the optimum number of levels in FMM is 10. With 10 levels in FMM, over 60% of the CPU time was spent on the direct sum in the FMM calculation. Using a single core, it took 3,400 s to finish the time dependent part of the calculation and about 90% of the CPU time was spent on FMM. For comparison, it took 33,800 s to perform the time independent calculation. This shows that algorithm II is of order O(1) if T<N.sup.2.

[0078] FIG. 8 shows the testing of algorithm IIa, which is suitable for T=N, and algorithm IIb, designed for T=N.sup.2, against algorithm I. For T<N, the results of algorithm I, algorithm IIa and algorithm IIb are on top of each other. For T>N, the calculation was done for T=N.sup.2. FIG. 8 only shows the results for T<40,000. There is no significant feature in the transient current plot beyond that.

[0079] FIG. 7 shows the distribution of d.sub.j on the complex plane.

[0080] FIG. 8 shows the transient current calculated by Algorithm I, Algorithm IIa and Algorithm IIb, IIa and IIb refer to the cases with FMM methods targeting T=N and T=N.sup.2, respectively.

[0081] In summary, the exact solution for the transient current always contains time dependent parts. According to the present invention the expression for the current is separated so that, e.g., a function f=f(t,x) that depends on time t and space x is separated into two parts so that f=g(t)h(x). Thus, h(x), which is a very complicated function involving many multiplications, can be calculated first. As a result, h(x) only needs to be computed once for all times t.

[0082] In addition, the major difference between Algorithm I and II is that for the expression f=g(t)h(x), in algorithm I the multiplication is performed directly; but, for algorithm II, FFT and FMM are adopted to further speed up the multiplication between g(t) and h(x).

[0083] While the present invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Cpc classification

B82Y10/00: PERFORMING OPERATIONS; TRANSPORTING
G01R19/04: PHYSICS
Y10S977/842: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
G01R19/2506: PHYSICS
G01R31/31702: PHYSICS
G01R19/0053: PHYSICS
B82Y35/00: PERFORMING OPERATIONS; TRANSPORTING
Y10S977/734: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S977/936: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
Y10S977/88: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS

International classification

G01R19/25: PHYSICS
G01R19/00: PHYSICS
G01R19/04: PHYSICS
