OPTIMAL METASTABILITY-CONTAINING SORTING VIA PARALLEL PREFIX COMPUTATION
20210349687 · 2021-11-11
Inventors
CPC classification: G06F7/78 (PHYSICS); G06F7/24 (PHYSICS)
International classification: G06F7/24 (PHYSICS)
Abstract
In order to provide smaller, faster and less error-prone circuits for sorting possibly metastable inputs, a novel sorting circuit is provided. According to the invention, the circuit is metastability-containing.
Claims
1. A sorting circuit, characterized in that the circuit is metastability-containing.
2. The circuit of claim 1, comprising one or more sub circuits for comparing each prefix pair of at least two input strings.
3. The circuit of claim 2, further comprising a sub circuit for inferring the output bits, based on the result of the comparison.
4. The circuit of claim 2, wherein the input strings are Gray coded.
5. The circuit of claim 4, wherein the Gray code is a binary reflected Gray code.
6. The circuit of claim 1, wherein the sorting circuit is a 2-sort circuit for sorting two input strings.
7. The circuit of claim 1, wherein the sorting circuit is a sorting network for sorting n strings.
8. The circuit of claim 1, wherein a size of the sorting circuit is linear in the length B of the input strings, i.e., O(B).
9. The circuit of claim 1, wherein buffers are used to bound the fan-out.
10. The circuit of claim 9, wherein the number of buffers used is twice the length (B) of an input string.
11. The circuit of claim 9, wherein a fan-out of the sorting circuit is constant.
12. The circuit of claim 1, wherein a depth of the sorting circuit is within the order of ┌log B┐, wherein B is the number of bits in an input string.
13. A transistor-level implementation of the circuit of claim 1.
Description
DETAILED DESCRIPTION
[0023] We set [N] := {0, . . . , N−1} for N ∈ ℕ and [i, j] := {i, i+1, . . . , j} for i, j ∈ ℕ, i ≤ j. We denote 𝔹 := {0,1} and 𝔹_M := {0,1,M}. For a B-bit string g ∈ 𝔹_M^B and i ∈ [1,B], denote by g_i its i-th bit, i.e., g = g_1 g_2 . . . g_B. We use the shorthand g_{i,j} := g_i . . . g_j, where i, j ∈ [1,B] and i ≤ j. Let par(g) denote the parity of g ∈ 𝔹^B, i.e., par(g) = Σ_{i=1}^B g_i mod 2. For a function f and a set A, we abbreviate f(A) := {f(y) | y ∈ A}.
[0024] A standard binary representation of inputs is unsuitable: uncertainty of the input values may be arbitrarily amplified by the encoding. E.g., a value known only to be 11 or 12, encoded as 1011 resp. 1100, would result in the bit string 1MMM, i.e., a string that is metastable in every position in which the two encodings differ. However, 1MMM may represent any number in the interval from 8 to 15, amplifying the initial uncertainty of lying between 11 and 12. An encoding that does not lose precision for consecutive values is Gray code.
[0025] A B-bit binary reflected Gray code, rg_B : [N] → 𝔹^B, is defined recursively. For simplicity (and without loss of generality) we set N := 2^B. A 1-bit code is given by rg_1(0) = 0 and rg_1(1) = 1. For B > 1, we start with the first bit fixed to 0 and count with rg_{B−1}(⋅) (for the first 2^{B−1} codewords), then toggle the first bit to 1, and finally "count down" rg_{B−1}(⋅) while keeping the first bit fixed, cf. Table 1. Formally, this yields for x ∈ [N]

rg_B(x) = 0 rg_{B−1}(x) if x ∈ [2^{B−1}], and rg_B(x) = 1 rg_{B−1}(2^B − 1 − x) otherwise.
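For illustration (a sketch of ours, not part of the specification), the recursive definition can be transcribed directly; `rg` and `decode` are hypothetical helper names, and the XOR-prefix decode is the standard closed form that coincides with the recursion:

```python
# Sketch of the recursive binary reflected Gray code definition above.
def rg(B, x):
    """B-bit binary reflected Gray code word for x in [0, 2**B)."""
    if B == 1:
        return str(x)
    if x < 1 << (B - 1):
        return "0" + rg(B - 1, x)                 # count up with first bit 0
    return "1" + rg(B - 1, (1 << B) - 1 - x)      # count down with first bit 1

def decode(g):
    """Inverse mapping: Gray code string back to its value."""
    x, bit = 0, 0
    for c in g:
        bit ^= int(c)   # i-th decoded bit is the XOR of the first i code bits
        x = (x << 1) | bit
    return x
```

Since every B-bit string is a codeword, `decode` is total and inverts `rg(B, ·)` exactly, matching the bijection property stated below.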
[0026] As each B-bit string is a codeword, the code is a bijection and the encoding function also defines the decoding function. Denote by ⟨⋅⟩ : 𝔹^B → [N] the decoding function of a Gray code string, i.e., for x ∈ [N], ⟨rg_B(x)⟩ = x.
[0027] For two binary reflected Gray code strings g, h ∈ 𝔹^B, we define their maximum and minimum as

max^rg{g, h} := rg_B(max{⟨g⟩, ⟨h⟩}) and min^rg{g, h} := rg_B(min{⟨g⟩, ⟨h⟩}).
[0028] For example:

max^rg{0011, 0100} = max^rg{rg_4(2), rg_4(7)} = 0100,
min^rg{0111, 0101} = min^rg{rg_4(5), rg_4(6)} = 0111.
[0029] Inputs to the sorting circuit may have some metastable bits, which means that the respective signals behave out-of-spec from the perspective of Boolean logic. However, they are valid strings in the sense of the invention. Valid strings have at most one metastable bit. If this bit resolves to either 0 or 1, the resulting string encodes either x or x+1 for some x, cf. Table 2.
[0030] More formally, if B ∈ ℕ and N = 2^B, the set of valid strings of length B is defined as

S_rg^B := rg_B([N]) ∪ {rg_B(x) * rg_B(x+1) | x ∈ [N−1]}.
[0031] The operator * is called the superposition; for g, h ∈ 𝔹_M^B it is defined bitwise by (g*h)_i := g_i if g_i = h_i, and (g*h)_i := M otherwise.
[0032] The specification of max.sup.rg and min.sup.rg may be extended to valid strings in the above sense by taking all possible resolutions of metastable bits into account. More particularly, in order to extend the specification of max.sup.rg and min.sup.rg to valid strings, the metastable closure (Stephan Friedrichs, Matthias Függer, and Christoph Lenzen. Metastability-Containing Circuits. Transactions on Computers, 67, 2018) is used. The metastable closure of an operator on binary inputs extends it to inputs that may contain metastable bits, by considering all possible stable resolutions of the inputs, applying the operator and taking the superposition of the results.
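The closure as just described can be modeled directly (our sketch, with M represented by the character "M"; helper names are ours): resolve every M bit both ways, apply the operator to each stable resolution, and superpose the results bit by bit.

```python
# Sketch: the metastable closure op_M of a Boolean string operator op.
from itertools import product

def resolutions(g):
    """All stable strings obtained by resolving each M to 0 or 1."""
    out = [""]
    for c in g:
        out = [p + b for p in out for b in ("01" if c == "M" else c)]
    return out

def superpose(strings):
    """Bitwise superposition: keep a bit where all results agree, else M."""
    return "".join(cs[0] if len(set(cs)) == 1 else "M" for cs in zip(*strings))

def closure(op, *args):
    """Apply op to every stable resolution of the inputs, superpose results."""
    return superpose([op(*stable) for stable in product(*map(resolutions, args))])

# Example operand: a 1-bit AND gate.
AND = lambda a, b: str(int(a) & int(b))
```

Note how a stable 0 input masks metastability under the closure, while an M input paired with a 1 propagates.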
[0033] The closure is the best one can achieve w.r.t. containing metastability with clocked logic using standard registers, i.e., when f_M(x)_i = M, no such implementation can guarantee that the i-th output stabilizes in a timely fashion.
[0034] If one wants to construct a circuit computing the maximum and minimum of two valid strings, which allows building sorting networks for valid strings, one also needs to answer what it means to ask for the maximum or minimum of valid strings. To this end, suppose a valid string is rg_B(x)*rg_B(x+1) for some x ∈ [N−1], i.e., the string contains a metastable bit that makes it uncertain whether the represented value is x or x+1. If one waits for metastability to resolve, the string will stabilize to either rg_B(x) or rg_B(x+1). Accordingly, it makes sense to consider rg_B(x)*rg_B(x+1) to lie "in between" rg_B(x) and rg_B(x+1), resulting in the following total order on valid strings (cf. Table 2).
[0035] Definition (<). A total order < is defined on valid strings as follows. For g, h ∈ 𝔹^B, g < h ⇔ ⟨g⟩ < ⟨h⟩. For each x ∈ [N−1], we define rg_B(x) < rg_B(x)*rg_B(x+1) < rg_B(x+1). We extend the resulting relation on S_rg^B × S_rg^B to a total order by taking the transitive closure. Note that this also defines ≤, via g ≤ h ⇔ (g = h ∨ g < h).
[0036] We intend to sort with respect to this order. It turns out that implementing a 2-sort circuit w.r.t. this order amounts to implementing the metastable closures of max^rg and min^rg. In other words, max_M^rg and min_M^rg are the max and min operators w.r.t. the total order on valid strings shown in Table 2, e.g.,

max_M^rg{1001, 1000} = rg_4(15) = 1000,
max_M^rg{0M10, 0010} = rg_4(3)*rg_4(4) = 0M10, and
max_M^rg{0M10, 0110} = rg_4(4) = 0110.

[0037] Hence, our task is to implement max_M^rg and min_M^rg.
[0038] Definition (2-sort(B)). For B ∈ ℕ, a 2-sort(B) circuit is specified as follows.
[0039] Input: g, h ∈ S_rg^B.
[0040] Output: g′, h′ ∈ S_rg^B.
[0041] Functionality: g′ = max_M^rg{g, h}, h′ = min_M^rg{g, h}.
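As a behavioral reference (our brute-force model, not the circuit itself), the 2-sort(B) specification can be checked by taking all resolutions of the two valid strings into account, exactly as the closure prescribes; all helper names here are ours:

```python
# Sketch: reference model of 2-sort(B) over strings in {0,1,M}.
def _decode(g):                       # binary reflected Gray code decode
    x, bit = 0, 0
    for c in g:
        bit ^= int(c)
        x = (x << 1) | bit
    return x

def _resolutions(g):
    out = [""]
    for c in g:
        out = [p + b for p in out for b in ("01" if c == "M" else c)]
    return out

def _superpose(strings):
    return "".join(cs[0] if len(set(cs)) == 1 else "M" for cs in zip(*strings))

def two_sort(g, h):
    """(max_M^rg, min_M^rg) of two valid strings, via all resolutions."""
    pairs = [(a, b) for a in _resolutions(g) for b in _resolutions(h)]
    hi = _superpose([max(a, b, key=_decode) for a, b in pairs])
    lo = _superpose([min(a, b, key=_decode) for a, b in pairs])
    return hi, lo
```

This reproduces the three examples given for max_M^rg above; the circuit described in the remainder of the text computes the same function in hardware.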
[0043] The invention seeks to use standard components and combinational logic only. In particular, the behavior of basic gates on metastable inputs may be specified via the metastable closure of their behavior on binary inputs, cf. Table 3, using the standard notational convention that a + b = OR_M(a, b) and ab = AND_M(a, b). In this logic, most familiar identities hold: AND and OR are associative, commutative, and distributive, and De Morgan's laws hold. However, naturally the law of the excluded middle becomes void: for instance, in general OR_M(x, NOT_M(x)) ≠ 1, as for x = M we get OR_M(M, M) = M.
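The gate behavior under this closure can be sketched as follows (our encoding of M as the character "M"); a stable controlling input masks a metastable one, while the excluded-middle identity indeed fails:

```python
# Sketch: basic gates on {0,1,M} per the metastable closure (cf. Table 3).
def NOT_M(a):
    return {"0": "1", "1": "0", "M": "M"}[a]

def AND_M(a, b):
    if a == "0" or b == "0":
        return "0"                    # a stable 0 masks the other input
    return "1" if (a, b) == ("1", "1") else "M"

def OR_M(a, b):
    if a == "1" or b == "1":
        return "1"                    # a stable 1 masks the other input
    return "0" if (a, b) == ("0", "0") else "M"
```

This coincides with Kleene's strong three-valued logic, which is one way to see why associativity, commutativity, distributivity, and De Morgan's laws survive.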
[0044] It can be shown that the basic CMOS gates shown in
[0045]
[0047] Because the parity keeps track of whether the remaining bits are compared w.r.t. the standard or "reflected" order, the state machine performs the comparison correctly w.r.t. the meaning of the states indicated in the corresponding figure.
[0048] For all i ∈ [1,B], we have that max^rg{g, h}_i min^rg{g, h}_i = out(s^(i−1), g_i h_i).
[0049] In order to extend this approach to potentially metastable inputs, all involved operators are replaced by their metastable closures: for i ∈ [1, B], (i) compute s_M^(i), (ii) determine max_M^rg{g, h}_i and min_M^rg{g, h}_i according to Table 4, and finally (iii) exploit associativity of the operator ⋄_M computing the state s_M^(i) in the PPC framework. Thus, we only need to implement ⋄_M and out_M (both of constant size), plug them into the framework, and immediately obtain an efficient circuit.
[0050] The reader may ask why we compute s_M^(i) for all i ∈ [1, B−1] instead of computing only s_M^(B) with a simple tree of ⋄_M elements, which would yield a smaller circuit. Since s_M^(B) is the result of the comparison of the entire strings, it could be used to compute all outputs, i.e., we could compute the output by out_M(s_M^(B), g_i h_i) instead of out_M(s_M^(i−1), g_i h_i). However, s_M^(B) may be metastable in cases where s_M^(i−1) is stable, so this shortcut would spread metastability to output bits that the closure requires to be stable.
[0051] While it is not obvious that this approach yields correct outputs, it may be formally proven that: (P1) ⋄_M is associative; (P2) repeated application of ⋄_M computes s_M^(i); (P3) applying out_M to s_M^(i−1) and g_i h_i results, for all valid strings, in max_M^rg{g, h}_i min_M^rg{g, h}_i. This yields the desired correctness. Regarding the first point, we note that the statement that ⋄_M is associative does not depend on B. In other words, it can be verified by checking for all possible x, y, z ∈ 𝔹_M^2 whether (x ⋄_M y) ⋄_M z = x ⋄_M (y ⋄_M z).
[0052] While it is tractable to manually verify all 3.sup.6=729 cases (exploiting various symmetries and other properties of the operator), it is tedious and prone to errors. Instead, it was verified that both evaluation orders result in the same outcome by a short computer program, proving the desired associativity of the operator.
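Such a check is a few lines of code. Since Table 6 is not reproduced in this text, the sketch below uses the three-valued AND gate as a stand-in operator over the single-symbol domain {0, 1, M}; with the actual ⋄_M truth table supplied as `op` and the nine-element domain 𝔹_M^2 as `domain`, the same exhaustive loop covers all 9^3 = 729 triples:

```python
# Sketch: brute-force associativity check over a finite domain, in the spirit
# of the short computer program mentioned in the text.
from itertools import product

def is_associative(op, domain):
    """True iff op(op(x,y),z) == op(x,op(y,z)) for all triples in domain."""
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(domain, repeat=3))

def and_m(a, b):
    """Stand-in operator on {'0','1','M'} (closure of Boolean AND)."""
    if a == "0" or b == "0":
        return "0"
    return "1" if (a, b) == ("1", "1") else "M"
```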
[0053] For the convenience of the reader, Table 6 gives the truth table of ⋄_M. It can be shown that repeated application of this operator to the input pairs g_j h_j, j ∈ [1, i], actually results in s_M^(i). This is closely related to the elegant recursive structure of binary reflected Gray code, leading to the important observation that if a valid string has a metastable bit at position m, then the B−m following bits are the maximum codeword of a (B−m)-bit code.
[0054] It may be observed that for g ∈ S_rg^B, if there is an index 1 ≤ m < B such that g_m = M, then g_{m+1,B} = 10^{B−m−1}.
[0055] The reasoning is based on distinguishing two main cases: one is that s_M^(i) contains at most one metastable bit, the other that s_M^(i) = MM. Each of these cases can be proven by technical statements.
[0056] It may further be observed that if |res(s_M^(i))| ≤ 2 for any i ∈ [B+1], then res(s_M^(i)) = ⋄_{j=1}^i res(g_j h_j).
[0057] The operator out : 𝔹^2 × 𝔹^2 → 𝔹^2 is the operator given in Table 4 computing max^rg{g, h}_i min^rg{g, h}_i out of s^(i−1) and g_i h_i. For the convenience of the reader, we provide the truth table of out_M in Table 7.
[0059] In order to derive a small circuit from the above, we make use of the PPC framework by Ladner and Fischer. They described a generic method that is applicable to any finite state machine translating a sequence of B input symbols to B output symbols, to obtain circuits of size O(B) and depth O(log B). They reduce the problem to a parallel prefix computation (PPC) task by observing that each input symbol defines a restricted transition function, whose compositions evaluated on the starting state yield the state of the machine after the corresponding number of steps. This matches our needs, as we need to determine s_M^(i) for each i ∈ [1, B]. However, their generic construction involves large constants. Fortunately, we have established that ⋄_M : 𝔹_M^2 × 𝔹_M^2 → 𝔹_M^2 is an associative operator, permitting us to directly apply the circuit templates for associative operators they provide for computing s_M^(i) = ⋄_{j=1}^i g_j h_j for all i ∈ [1, B]. Accordingly, only these templates are discussed.
[0060] We revisit the part of the framework relevant to our construction, also providing a minor improvement on their results in the process. To this end, we first formally specify the PPC task for the special case of associative operators.
[0061] Definition 5.1 (PPC_⊕(B)). For associative ⊕ : D × D → D and B ∈ ℕ, a PPC_⊕(B) circuit is specified as follows.
[0062] Input: d ∈ D^B,
[0063] Output: π ∈ D^B,
[0064] Functionality: π_i = ⊕_{j=1}^i d_j for all i ∈ [1, B].
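In this spirit, a recursive prefix computation over an arbitrary associative ⊕ can be sketched as follows (our illustration of the divide-and-combine idea behind the recursive patterns, not the patented circuit itself):

```python
# Sketch: recursive parallel prefix computation for an associative op.
def ppc(op, d):
    """Return [d1, d1(+)d2, ..., d1(+)...(+)dB] for associative op."""
    B = len(d)
    if B == 1:
        return d[:]
    mid = (B + 1) // 2
    left = ppc(op, d[:mid])                     # prefixes of the first half
    right = ppc(op, d[mid:])                    # prefixes of the second half
    # combine: every right-half prefix absorbs the last left-half prefix
    return left + [op(left[-1], r) for r in right]
```

In a circuit, each `op` call is one copy of C, the recursion depth governs the circuit depth, and the combine step is where the fan-out of the last left output arises, which is exactly the issue addressed later in the text.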
[0065] In our case, ⊕ = ⋄_M and D = 𝔹_M^2. The method by Ladner and Fischer provides a family of recursive constructions of PPC_⊕ circuits. They are obtained by combining two different recursive patterns.
[0067] More particularly, suppose that C and P are circuits implementing ⊕ and PPC.sub.⊕(┌B/2┐) for some B∈N, respectively. Then applying the recursive pattern given at the left of
[0068] The second recursive pattern, shown in
[0069] Definition. T.sub.0 is a single leaf. T.sub.1 consists of the (right) root and two attached leaves. For b≥2, T.sub.b can be constructed from T.sub.b−1 and T.sub.b−2 by taking a (right) root r, attaching the root of T.sub.b−1 as its right child, a new left node l as the left child of r, and then attaching the root of T.sub.b−2 as (only) child of l.
[0070] The recursive construction is now defined as follows. A right node applies the pattern given in
[0071] In the following, denote by PPC(C,T.sub.b) the circuit that results from applying the recursive construction described above to the base circuit C implementing ⊕. Moreover, we refer to the i.sup.th input and output of the subcircuit corresponding to node ν∈T.sub.b as d.sub.i.sup.ν and π.sub.i.sup.ν, respectively.
[0072] It may be shown that if C implements ⊕, then PPC(C,T_b) is a PPC_⊕(2^b) circuit and PPC(C,T_b) has depth b·d(C).
[0073] It remains to bound the size of the circuit. Denote by F_i, i ∈ ℕ, the i-th Fibonacci number, i.e., F_1 = F_2 = 1 and F_{i+1} = F_i + F_{i−1} for all 2 ≤ i ∈ ℕ. Then it may be shown that PPC(C,T_b) has size (2^{b+2} − F_{b+5} + 1)|C|.
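To make the bound concrete, the stated size factor can be evaluated numerically (a sketch of ours; `fib` follows the convention F_1 = F_2 = 1 used above):

```python
# Sketch: evaluating the size factor 2**(b+2) - F_{b+5} + 1 from the text.
def fib(i):
    """i-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a

def size_factor(b):
    """Number of copies of C in PPC(C, T_b) per the stated bound."""
    return 2 ** (b + 2) - fib(b + 5) + 1
```

For b = 0 (a single input) the factor is 2^2 − F_5 + 1 = 0, i.e., no operator copies at all, which is consistent with T_0 being a single leaf.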
[0074] Asymptotically, the subtractive term of F_{b+5} is negligible, as F_{b+5} ∈ (1/√5 + o(1))·((1+√5)/2)^{b+5} ⊆ O(1.62^b); however, unless B is large, the difference is substantial. We also get a simple upper bound for arbitrary values of B. To this end, we "split" in the recursion such that the left branch is "complete," while applying the same splitting strategy on the right. This is where our construction differs from and improves on the method of Ladner and Fischer. They perform a balanced split and obtain an upper bound of 4B on the circuit size.
[0075] It follows that for B ∈ ℕ and a circuit C implementing ⊕, setting b := ⌈log B⌉, a PPC_⊕(B) circuit of depth ⌈log B⌉·d(C) and size smaller than (5B − 2^b − F_{b+3})|C| ≤ (4B − F_{b+3})|C| exists.
[0076] We remark that one can give more precise bounds by making case distinctions regarding the right recursion, which for the sake of brevity we omit here. Instead, we computed the exact numbers for B≤70.
[0078] The construction derived from iterative application of the above results can be combined with PPC(C,T_b), achieving the following trade-off; note that if B = 2^b for b ∈ ℕ, then F_{⌈log B⌉−k+3} can be replaced by F_{b−k+5}.
[0079] Suppose C implements ⊕. For all k ∈ [⌈log B⌉ + 1] and b ∈ ℕ, there is a PPC_⊕(B) circuit of depth (⌈log B⌉ + k)·d(C) and size at most
[0080]
[0081] The optimal depth construction incurs an excessively large fan-out of Θ(B), as the last output of left recursive calls needs to drive all the copies of C that combine it with each of the corresponding right call's outputs. This entails that, despite its lower depth, it will not result in circuits of smaller physical delay than simply recursively applying the construction from
[0082] We now modify the recursive construction to ensure a constant fan-out, at the expense of a limited increase in size of the circuit. The result is the first construction that has size O(B), optimal depth, and constant fan-out.
[0083] In the following, we denote by f ≥ 3 the maximum fan-out we are trying to achieve, where we assume that gates or memory cells providing the input to the circuit do not need to drive any other components. For simplicity, we consider C to be a single gate.
[0084] We proceed in two steps. First, we insert 2B buffers into the circuit, ensuring that the fan-out is bounded by 2 everywhere except at the gate providing the last output of each subcircuit corresponding to a right node. In the second step, we will resolve this by duplicating such gates sufficiently often, recursively propagating the changes down the tree. Neither of these changes will affect the output of the circuit or its depth, so the main challenges are to show our claim on the fan-out and bounding the size of the final circuit.
[0085] Step 1: Almost Bounding Fan-Out by 2
[0086] Before proceeding to the construction in detail, we need some structural insight on the circuit.
[0087] For node ν ∈ T_b, define its range R_ν and left-count a_ν recursively as follows.
[0088] If ν is the root, then R_ν = [1, 2^b] and a_ν = 0.
[0089] If ν is the left child of p with R_p = [i, i+j], then R_ν = [i, i+(j+1)/2] and a_ν = a_p.
[0090] If ν is the right child of right node p with R_p = [i, i+j], then R_ν = [i+(j+1)/2+1, i+j] and a_ν = a_p.
[0091] If ν is the right child of left node p, then R_ν = R_p and a_ν = a_p + 1.
[0092] Suppose the subcircuit of PPC(C,T.sub.b) represented by node ν∈T.sub.b in depth d∈[b+1] has range R.sub.υ=[i, i+j]. [0093] Then [0094] (i) it has 2.sup.b−d inputs, [0095] (ii) j=2.sup.b−d+α.sup.
[0098] This leads to an alternative representation of the circuit PPC(C,T.sub.b), see
[0099] From this representation, we will derive that the following modifications of PPC(C,T_b) result in a PPC_⊕(2^b) circuit PPC(C,T_b)′, for which a fan-out larger than 2 exclusively occurs on the last outputs of subcircuits corresponding to nodes of T_b. [0100] 1) Add a buffer on each wire connecting a non-root node of any of the aggregation trees to its corresponding subcircuit (see
[0103] With the exception of gates providing the last output of subcircuits corresponding to nodes of T.sub.b (blue in
[0104] It remains to count the inserted buffers. The following helper statement will be useful for this, but also later on.
[0105] Denote by L_b ⊆ T_b the set of leaves of T_b. Then |L_b| = F_{b+2}, and the weighted sum Σ_{ν∈L_b} 2^{a_ν} can be determined as follows.
[0106] Next, consider the recurrence given by L′_0 = 1, L′_1 = 2, and L′_b = L′_{b−1} + 2L′_{b−2} for b ≥ 2; the factor of 2 assigns twice the weight to the subtree rooted at the child of the root's left child, thereby ensuring that each leaf is accounted for with weight 2^{a_ν}.
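A direct transcription of this recurrence (our sketch) makes it easy to check that it solves to L′_b = 2^b, since 2^b = 2^{b−1} + 2·2^{b−2}:

```python
# Sketch: the weighted-leaf recurrence L'_b from the text.
def Lp(b):
    if b == 0:
        return 1          # T_0: a single leaf
    if b == 1:
        return 2          # T_1: root with two leaves
    return Lp(b - 1) + 2 * Lp(b - 2)
```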
[0107] Denote by s the size of a buffer. Then

|PPC(C,T_b)′| = |PPC(C,T_b)| + (2^b + 2^{b−1} − F_{b+3})·s.
[0108] Step 2: Bounding Fan-Out by f
[0109] In the second step, we need to resolve the issue of high fan-out of the last output of each recursively used subcircuit in PPC(C,T_b)′. Our approach is straightforward. Starting at the root of T_b and progressing downwards, we label each node ν with a value a(ν) that specifies a sufficient number of additional copies of the last output of the subcircuit represented by ν to avoid fan-out larger than f. At right nodes, this is achieved by duplicating the gate computing this output sufficiently often, marked blue in the corresponding figure.
[0110] We start by defining a(υ) and then argue how to use these values for modifying the circuit to obtain our fan-out f circuit. Afterwards, we will analyze the increase in size of the circuit compared to PPC(C,T.sub.b)′.
[0111] Definition (a(υ)). Fix b∈N.sub.0. For ν∈T.sub.b in depth d∈[b+1], define
[0112] Suppose that for each leaf ν ∈ T_b, there are ⌊a(ν)⌋ additional copies of the root of the aggregation tree, and for each right node ν ∈ T_b, we add ⌊a(ν)⌋ gates that compute (copies of) the last output of their corresponding subcircuit of PPC(C,T_b)′. Then we can wire the circuit such that all gates that are not in aggregation trees have fan-out at most f, and each output of the circuit is driven by a gate or buffer driving only this output.
[0113] It remains to modify the aggregation trees so that sufficiently many copies of the roots' output values are available.
[0114] Consider an aggregation tree corresponding to leaf ν∈T.sub.b and fix f≥3. We can modify it such that the fan-out of all its non-root nodes becomes at most f, there are └a(ν)┘ additional gates computing the same output as the root, and at most (fa(ν))/(f−2)+(2.sup.a.sup.
[0115] Finally, we need to count the total number of gates we add when implementing these modifications to the circuit.
[0116] For f ≥ 3, define PPC^(f)(C,T_b) by modifying PPC(C,T_b)′ according to Lemmas 5.15 and 5.16. Then, with λ_1 := (1+√5)/4, |PPC^(f)(C,T_b)| is bounded by
[0117] We summarize our findings in the following:
[0118] Suppose that C implements ⊕, buffers have size s and depth at most d(C), and set λ_1 := (1+√5)/4. Then for all k ∈ [b+1], b ∈ ℕ_0, and f ≥ 3, there is a PPC_⊕(2^b) circuit of fan-out f, depth (b+k)·d(C), and size at most
[0119] Due to space constraints, we refrain from analyzing the size of the construction for values of B that are not powers of 2. However, in
[0123] First, we need to specify implementations of the sub circuits computing ⋄.sub.M and out.sub.M.
[0124] From Tables 5a and 5b, for s, b ∈ 𝔹^2 we can extract the Boolean formulas

(s⋄b)_1 = s_1 …,
(s⋄b)_2 = …,
out(s,b)_1 = …,
out(s,b)_2 = s_1 b_2 + s_2 b_1 + b_1 b_2.
[0125] In general, realizing a Boolean formula f by replacing negation, multiplication, and addition by inverters, AND, and OR gates, respectively, does not result in a circuit implementing f_M.¹ However, we can easily verify that the above formulas are disjunctions of all prime implicants of their respective functions. As one can manually verify, these formulas evaluate to the truth tables given in Tables 6 and 7, and in this special case the resulting circuits do implement the closure, provided the gates behave as in Table 3, which the implementations given in the figures do. The following optimized formulas may be used:
(s⋄b)_1 = s_1(…),
(s⋄b)_2 = s_2(…),
out(s,b)_1 = b_1(b_2 + …),
out(s,b)_2 = b_2(b_1 + s_1) + b_1 s_2.

¹ For instance, (s⋄b)_1 = s_1 b_1 + …
[0126] We see that, in fact, a single circuit with suitably wired (and possibly negated) inputs can implement all four operations. To this end, define

XMUX(sel_1, sel_2, x, y) := y(x + sel_2) + x sel_1.
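Evaluated in three-valued logic (our sketch, reusing closure-respecting gate models), the XMUX formula behaves as a multiplexer whose select pair (sel_1, sel_2) = (1, 0) picks x and (0, 1) picks y, while a metastable data input passes through rather than being amplified:

```python
# Sketch: XMUX(sel1, sel2, x, y) = y(x + sel2) + x*sel1 over {0,1,M}.
def AND_M(a, b):
    if a == "0" or b == "0":
        return "0"
    return "1" if (a, b) == ("1", "1") else "M"

def OR_M(a, b):
    if a == "1" or b == "1":
        return "1"
    return "0" if (a, b) == ("0", "0") else "M"

def xmux(sel1, sel2, x, y):
    return OR_M(AND_M(y, OR_M(x, sel2)), AND_M(x, sel1))
```

The reading of the select inputs is our interpretation of the formula as written; Table 8 governs how ⋄_M and out_M are actually mapped onto XMUX instances.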
[0127] Table 8 lists how to map inputs to compute ⋄.sub.M and out.sub.M.
[0128] We note that this circuit is not a particularly efficient XMUX implementation; a transistor-level implementation would be much smaller. However, our goal here is to verify correctness and give some initial indication of the size of the resulting circuits—a fully optimized ASIC circuit is beyond the scope of this article. The size of the implementation may be slightly reduced by moving negations. Due to space limitations, we refrain from detailing this modification here, but note that
[0129] We now have all the pieces in place to assemble a containing 2-sort(B) circuit. As stated above, ⋄.sub.M is associative. Thus, from a given implementation of ⋄.sub.M (e.g., two copies of the circuit from
[0130]
[0131] The correctness of this construction follows from the above explanations, where we can plug in any PPC_{⋄_M} circuit.
[0132] More particularly, we implemented the design given in
[0133]
[0134] For the implementation of PPC.sub.⋄.sub.
[0135] After behavioral simulation we continue with a comparison of our design and a standard sorting approach Bin-comp(B). As mentioned earlier, the 2-sort(B) implementation given in
[0136] Since metastability-containing circuits may include additional gates that are not required in traditional Boolean logic, Boolean optimization may compromise the metastability-containing properties. Accordingly, we were forced to disable optimization during synthesis of the circuits.
[0137]
[0138] As a binary benchmark Bin-comp was used: In short, Bin-comp consists of a simple VHDL statement comparing two binary encoded inputs and outputting the maximum and the minimum, accordingly. It follows the same design process as 2-sort, but then undergoes optimization using a more powerful set of basic gates. For example, the standard cell library provides prebuilt multiplexers. These multiplexers are used by Bin-comp, but not by 2-sort. We stress that these more powerful gates provide optimized implementations of multiple Boolean functions, yet each of them is still counted as a single gate. Thus, comparing our design to the binary design in terms of gate count, area, and delay disfavors our solution. Moreover, we noticed that the optimization routine switches to employing more powerful gates when going from B=8 to B=16 (cf.
[0139] Nonetheless, our design performs comparably to the non-containing binary design in terms of delay, cf.
[0140] We emphasize that we refrained from optimizing the design by making use of all available gates or devising transistor-level implementations, since such an approach is tied to the utilized library or requires design of standard cells.
[0141] In conclusion, we demonstrated that efficient metastability-containing sorting circuits are possible. Our results indicate that optimized implementations can achieve the same delay as non-containing solutions, without a dramatic increase in circuit size. This is of high interest to the intended application motivating us to design MC sorting circuits: fault-tolerant high-frequency clock synchronization. Sorting is a key step in envisioned implementations of the Lynch-Welch algorithm with improved precision of synchronization. The complete elimination of synchronizer delay is possible due to the efficient MC sorting networks presented in this article, enabling an increase in the rate at which clock corrections are applied and significantly reducing the negative impact of phase drift of local clock sources on the precision of the algorithm.
[0142] More generally speaking, MC circuits like those presented here are of interest in mixed signal control loops whose performance depends on very short response times. When analog control is not desirable, traditional solutions incur synchronizer delay before being able to react to any input change. Using MC logic saves the time for synchronization, while metastability of the output corresponds to the initial uncertainty of the measurement; thus, the same quality of the computational result can be achieved in shorter time. Note that our circuits are purely combinational, so they can be used in both clocked and asynchronous control logic.
[0143] Examples of such control loops are clock synchronization circuits, but MC has been shown to be useful for adaptive voltage control and fast routing with an acceptable low probability of data corruption as well.