Analog Computing System For Accelerating Combinatorial Optimization

20260023941 ยท 2026-01-22

Abstract

A hybrid analog-digital architecture is presented. The hybrid analog-digital architecture comprises an adjacency memory, a global controller, an array of memory cells and a computing circuit. The adjacency memory is configured to store a graph representing an optimization problem. The array of memory cells is arranged in columns and rows. Each node of the graph is assigned to a memory cell in the array of memory cells and each memory cell is configured to store an electric charge representing a spin state of an Ising model. The computing circuit is interfaced with the array of memory cells. The computing circuit is configured to read current from a given pair of memory cells in the array of memory cells, compute a differential current between the currents read from the given pair of memory cells, compute an update charge for one of the memory cells in the given pair of memory cells using the differential current, and transfer the update charge to the one memory cell in the given pair of memory cells. The global controller is interconnected between the adjacency memory and the array of memory cells.

Claims

1. A hybrid analog-digital architecture, comprising: an adjacency memory configured to store a graph representing an optimization problem; an array of memory cells arranged in columns and rows, each node of the graph is assigned to a memory cell in the array of memory cells and each memory cell is configured to store an electric charge representing a spin state of an Ising model; a computing circuit interfaced with the array of memory cells, wherein the computing circuit is configured to read current from a given pair of memory cells in the array of memory cells, compute a differential current between the currents read from the given pair of memory cells, compute an update charge for one of the memory cells in the given pair of memory cells using the differential current, and transfer the update charge to the one memory cell in the given pair of memory cells; and a global controller interconnected between the adjacency memory and the array of memory cells, wherein the global controller operates to select the given pair of memory cells in the array of memory cells in accordance with the graph and coordinate an update to one of the memory cells in the given pair of memory cells using the computing circuit.

2. The hybrid analog-digital architecture of claim 1 wherein each memory cell in the array of memory cells stores the electric charge on a capacitor.

3. The hybrid analog-digital architecture of claim 1 wherein each memory cell in the array of memory cells is comprised of a transconductance cell and a current inverter.

4. The hybrid analog-digital architecture of claim 1 wherein the global controller is configured to initialize spin states of the memory cells in the array of memory cells.

5. The hybrid analog-digital architecture of claim 1 wherein the global controller iteratively updates the electric charge stored in each memory cell of the array of memory cells until a stopping condition is met.

6. The hybrid analog-digital architecture of claim 5 wherein the global controller iteratively updates the electric charge for a given memory cell by identifying nodes adjacent to a given node in the graph; for each node adjacent to the given node, determining a contribution to the given node by the adjacent node; accumulating the contributions for the given node; and updating the electric charge for the given memory cell with the accumulated contribution, where the given node is assigned to the given memory cell.

7. The hybrid analog-digital architecture of claim 6 further comprises determining a contribution to the given node according to a piece-wise linear periodic function.

8. The hybrid analog-digital architecture of claim 6 further comprises determining a contribution to the given node in proportion with the coupling function Φ(Δv), where Φ(Δv) = −Δv for Δv ∈ (−P/4, P/4] and Φ(Δv) = Δv − P/2 for Δv ∈ (P/4, 3P/4], where Δv is the difference between stored voltages on capacitors of a pair of memory cells and P is a predetermined circuit parameter.

9. The hybrid analog-digital architecture of claim 5 wherein, for each memory cell in the array of memory cells and after meeting the stopping condition, the global controller compares the electric charge stored by a particular memory cell to a referential charge v_y stored in a designated memory cell and evaluates the sign of a coupling function for the difference between the charges; the obtained value, +1 or −1, is taken as the rounded value of the charge in the particular cell.

10. A method for solving an optimization problem, comprising: constructing a problem graph for an optimization problem, where the nodes of the problem graph represent variables in the optimization problem; mapping nodes of the problem graph to spin states of an Ising model; assigning each spin state of the Ising model to a memory cell in an array of memory cells, where each memory cell is configured to store an electric charge representing the spin state; and determining a minimum energy state for the array of memory cells in accordance with the problem graph.

11. The method of claim 10 further comprises randomly initializing spin states of the Ising model stored in the array of memory cells.

12. The method of claim 10 further comprises determining a minimum energy state by iteratively updating the electric charge stored in the array of memory cells.

Description

DRAWINGS

[0013] The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.

[0014] FIGS. 1A and 1B are graphs comparing the Hamiltonian kernel and the dynamic coupling function, respectively, for the rank-2 relaxation to the triangular model.

[0015] FIG. 2A depicts an example graph with twenty-five vertices.

[0016] FIG. 2B is a graph showing state evolution based on the relaxed dynamics with the period of Φ set to 2π, for 50 time steps.

[0017] FIGS. 2C and 2D are polar plots of the states at T=0 and T=50, respectively.

[0018] FIGS. 2E and 2F are graphs showing cut versus polar angle at T=0 and T=50, respectively, with the dashed line indicating the max-cut.

[0019] FIG. 2G is a graph showing the average cut evolution from 20 random initial conditions as obtained using the relaxed heuristic and the local search procedure.

[0020] FIG. 3A is a diagram illustrating evaluating the modulo for computing Φ(v_i − v_j) on a 3×3 vertex array.

[0021] FIG. 3B is a diagram illustrating a temporally multiplexed multiply-accumulate on a 3×3 vertex array.

[0022] FIG. 4 is a block diagram of a hybrid analog-digital architecture.

[0023] FIG. 5 is a diagram depicting an example embodiment for the array of memory cells.

[0024] FIG. 6 is a block diagram depicting a memory cell in the array of memory cells.

[0025] FIG. 7 depicts an example implementation of the computing circuit.

[0026] FIG. 8 is a schematic for an example implementation of a memory cell.

[0027] FIG. 9 is a flow diagram showing the operation of the global controller.

[0028] FIG. 10 is a schematic of a read-write amplifier.

[0029] Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.

DETAILED DESCRIPTION

[0030] Example embodiments will now be described more fully with reference to the accompanying drawings.

[0031] The Ising model is a set of coupled binary variables (commonly represented by ±1) or spins (symbolically by σ). Each spin is coupled to a subset of other spins, which makes one spin-state more energetically favorable than the other. Let the coupling between the spins of the Ising model be represented by an undirected graph G with N vertices, M edges and adjacency matrix A = {A_ij}. For spin-state σ = {σ_i} with σ_i ∈ {−1, +1}, the corresponding Hamiltonian H is

[00001] H(σ) = ½ Σ_{i,j} A_ij σ_i σ_j   (1)

The ground state of the Ising model is the spin-state yielding the minimum value of H(σ).

[0032] For generic graphs, finding the ground state is an NP-complete problem. For example, all of Karp's NP-complete problems, described by R. M. Karp in Reducibility among Combinatorial Problems, published in Complexity of Computer Computations (Springer, 1972), can be re-formulated as ground state search problems of specially constructed Ising models.

[0033] Finding the ground state of the Ising model based on G is equivalent to determining the max-cut of G. Indeed, any spin-state σ partitions the vertices into two sub-sets: one with only positive spins and another with only negative spins. The corresponding cut-size is defined as the number (or the net weight) of the edges from one partition to the other:

[00002] C_1(σ) = ¼ Σ_{i,j} A_ij (1 − σ_i σ_j) = M/2 − ½ H(σ)   (2)

[0034] Thus, the minimum of H(σ) corresponds to the maximum of C_1(σ) and vice-versa. Alternatively, the max-cut problem can be presented in terms of an N×N positive semi-definite matrix S:

[00003] C_G = max_S [ M/2 − ¼ (A·S) ]   (3)
subject to: rank(S) = 1, diag(S) = {1}^N, S ⪰ 0

where A·S = Σ_{i,j} A_ij S_ij and the last condition denotes positive semi-definiteness. The spin-state σ and the positive semidefinite matrix S are related by S = σσ^T.
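The correspondence between equations (1) and (2) can be checked numerically. The sketch below (Python, using an illustrative 4-cycle graph as an assumed example) verifies the identity C_1 = M/2 − H/2 for every spin-state by brute force.

```python
from itertools import product

# Toy 4-cycle: adjacency matrix of an unweighted graph with N = 4, M = 4
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
N = 4
M = sum(A[i][j] for i in range(N) for j in range(i + 1, N))

def hamiltonian(s):
    # H(sigma) = 1/2 * sum_{i,j} A_ij sigma_i sigma_j, per equation (1)
    return 0.5 * sum(A[i][j] * s[i] * s[j] for i in range(N) for j in range(N))

def cut(s):
    # C1(sigma) = 1/4 * sum_{i,j} A_ij (1 - sigma_i sigma_j), per equation (2)
    return 0.25 * sum(A[i][j] * (1 - s[i] * s[j]) for i in range(N) for j in range(N))

for s in product((-1, +1), repeat=N):
    # Equation (2): C1 = M/2 - H/2 holds for every spin-state
    assert abs(cut(s) - (M / 2 - 0.5 * hamiltonian(s))) < 1e-9

best = max(product((-1, +1), repeat=N), key=cut)
print(cut(best))  # 4.0: the 4-cycle is bipartite, so its max-cut equals M
```

As expected, the maximizer of C_1 is the minimizer of H, the alternating spin assignment on the cycle.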

[0035] The NP-hardness of the max-cut problem prompted the development of algorithms that guarantee a lower bound on their approximate solution in polynomial time. Among these, the algorithm described by Goemans and Williamson in Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming, Journal of the ACM, vol. 42, no. 6 (November 1995), provides the tightest bound on the expected solution. Their algorithm has two stages. The first stage replaces each of the N spins, σ_i, with an N-dimensional unit vector s⃗_i and the product of spins with their dot-product. Then, an analogue of the cut, C_n, is defined:

[00004] C_n(s⃗) = ¼ Σ_{i,j} A_ij (1 − s⃗_i · s⃗_j)   (4)

For S = {s⃗_i · s⃗_j}, the following rank-N SDP is solved:

[00005] C_G = max_S [ M/2 − ¼ (A·S) ]   (5)
subject to: diag(S) = {1}^N, S ⪰ 0

[0036] The solution vectors of the SDP problem, s⃗_i, distributed over the N-dimensional unit sphere, are mapped to +1 or −1 by comparing their orientation with respect to a random hyperplane through the origin. The cut so obtained was shown to be within 87% of the maximum cut for random graphs with non-negative weights.
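The random-hyperplane rounding of the second stage can be sketched as follows. The solution vectors here are illustrative stand-ins (2-D for brevity) rather than the output of an actual SDP solver; this only demonstrates the rounding step itself.

```python
import math
import random

# Stand-in unit vectors; in the full algorithm these come from the SDP (5)
vecs = [(math.cos(t), math.sin(t)) for t in (0.1, 1.7, 3.2, 4.9)]

def round_by_hyperplane(vectors, rng):
    # Draw a random hyperplane through the origin (with normal r) and map
    # each vector to +1 or -1 by the side of the hyperplane it falls on
    r = [rng.gauss(0, 1) for _ in range(len(vectors[0]))]
    return [1 if sum(a * b for a, b in zip(v, r)) >= 0 else -1 for v in vectors]

spins = round_by_hyperplane(vecs, random.Random(0))
```

In practice the rounding is repeated for several random hyperplanes and the spin assignment with the largest cut is kept.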

[0037] The Burer-Monteiro-Zhang (BMZ) heuristic uses rank-2 SDP, which is equivalent to limiting the dimensionality of s⃗ to two in equation (4). Each two-dimensional unit vector is represented by a polar coordinate θ, so that s⃗ = [cos(θ), sin(θ)]. The rank-2 cut analogue C_2 is defined as

[00006] C_2(θ) = ¼ Σ_{i,j} A_ij (1 − cos(θ_i − θ_j))   (6)

where θ = {θ_1, θ_2, . . . , θ_N}. This heuristic was implemented in the max-cut solver Circut, where the dynamical realization of rank-2 SDP was followed by finding the optimal rounding and post-processing based on 1-opt and 2-opt local search.

[0038] The dynamical realization of the SDP ensures that C_2(θ) monotonously increases with time. By updating θ according to dθ/dt = ∇C_2(θ), the equations of motion governing the gradient ascent of the system are:

[00007] dθ_i/dt = Σ_j A_ij sin(θ_i − θ_j)   (7)

It can be shown that if the stable steady state θ is such that θ_i − θ_j = k_ij·π, where k_ij is an integer, then θ represents the maximum cut. However, generally the elements of θ are distributed across [0, 2π), and consequently each θ_i (now one-dimensional) is rounded by comparing its orientation with a reference coordinate. Specifically, the following equation may be used to round θ to σ:

[00008] σ(θ_y) = sgn(sin(θ − θ_y))   (8)

where θ_y is the reference coordinate. In practice, several θ_y are chosen from the range [0, 2π). For each σ(θ_y), the corresponding cut is evaluated using equation (2). The configuration leading to the largest cut is selected as the solution of rounding. During post-processing, a local search for the maximal cut is performed, ensuring that the cut cannot be increased by inverting any single spin or a pair of spins.
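As a concrete illustration of equations (7) and (8), the sketch below integrates the gradient dynamics with Euler steps on a toy triangle graph (an assumed example) and rounds with several reference coordinates; the step size and step count are illustrative choices.

```python
import math
import random

A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # toy triangle graph (assumed)
N, dt = 3, 0.05
rng = random.Random(1)
theta = [rng.uniform(0, 2 * math.pi) for _ in range(N)]

# Euler integration of equation (7)
for _ in range(2000):
    grad = [sum(A[i][j] * math.sin(theta[i] - theta[j]) for j in range(N))
            for i in range(N)]
    theta = [t + dt * g for t, g in zip(theta, grad)]

def round_spins(th, theta_y):
    # Equation (8): sigma_i = sgn(sin(theta_i - theta_y))
    return [1 if math.sin(t - theta_y) >= 0 else -1 for t in th]

def cut(s):
    # Equation (2)
    return 0.25 * sum(A[i][j] * (1 - s[i] * s[j])
                      for i in range(N) for j in range(N))

# Several reference coordinates; keep the configuration with the largest cut
best = max((round_spins(theta, y) for y in (0.0, 1.0, 2.0, 3.0)), key=cut)
print(cut(best))  # 2.0: the max-cut of a triangle
```

The triangle is frustrated, so the phases settle roughly 2π/3 apart; any reference coordinate then yields a 2-1 partition, which is the triangle's maximum cut.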

[0039] In contrast with Goemans and Williamson's rank-N algorithm, the rank-2 SDP-based heuristic does not have a proven approximation guarantee. The landscape described by C_2(θ) is non-convex and the solution's quality depends on the quality of the minima of the Hamiltonian. Nevertheless, the algorithm was empirically shown to obtain competitive results in polynomial time.

[0040] The dynamical rank-2 SDP of the BMZ heuristic can be viewed as a system of vertices with sinusoidal coupling: each vertex i has a dynamical state θ_i, whose rate of change depends non-linearly on the states of the other vertices. An exact realization of the nodal coupling dynamics requires computing the sinusoidal function over multiple periods. Since there is a separate sine-evaluation corresponding to each edge, this computational effort may grow sharply for large and/or dense graphs. Instead, we consider a triangular coupling function, a simpler piece-wise linear approximation of the sine. This reduces a complex non-linear coupling function to one composed of basic arithmetic operations: pair-wise additions and sign inversions, which are directly realizable using analog computing methods.

[0041] In equation (6), if the cosine cut-counting function is replaced with a more general but periodic function F and the state vector θ by v, then the new cut C_F is

[00009] C_F(v) = ¼ Σ_{i,j} A_ij (1 − F(v_i − v_j))   (9)

and correspondingly, the new dynamical equations are

[00010] dv_i/dt = Σ_j A_ij Φ(v_i − v_j)   (10)

where Φ(v) = ½ dF(v)/dv. For a piece-wise linear coupling function Φ, F is a periodic function with period P and F(kP/2) = (−1)^k:

[00011] F(v) = 1 − v², for v ∈ (−P/4, P/4]; F(v) = (|v| − P/2)² − 1, for |v| ∈ (P/4, P/2]   (11)

so that

[00012] Φ(v) = −v, for v ∈ (−P/4, P/4]; Φ(v) = v − P/2, for v ∈ (P/4, 3P/4]   (12)

FIGS. 1A and 1B compare F and Φ with the original ones for P = 4 (a.u.).
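The identity Φ(v) = ½ dF(v)/dv can be spot-checked numerically. The sketch below (Python) implements equations (11) and (12) for P = 4, the value for which the two branches of equation (11) join continuously, and compares Φ against a central-difference derivative of F away from the branch boundaries.

```python
P = 4.0

def wrap(v):
    # Reduce v to one period, here [-P/4, 3P/4)
    return (v + P / 4) % P - P / 4

def F(v):
    # Piece-wise quadratic kernel of equation (11) (assuming P = 4)
    w = wrap(v)
    if w > P / 2:          # fold (P/2, 3P/4) onto the symmetric branch
        w -= P
    if -P / 4 < w <= P / 4:
        return 1 - w * w
    return (abs(w) - P / 2) ** 2 - 1

def phi(v):
    # Triangular coupling function of equation (12)
    w = wrap(v)
    return -w if w <= P / 4 else w - P / 2

# Spot-check phi(v) = (1/2) dF/dv by central differences
h = 1e-6
for v in (-0.9, -0.3, 0.2, 0.8, 1.6, 2.7, 5.1):
    num = (F(v + h) - F(v - h)) / (2 * h) / 2
    assert abs(num - phi(v)) < 1e-5
```

With this normalization F(0) = 1 and F(P/2) = −1, matching the condition F(kP/2) = (−1)^k.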

[0042] The questions regarding the capability of the described heuristics to approximately solve the intended problem in polynomial time were addressed in A. Shukla, M. Erementchouk, and P. Mazumder, Scalable almost-linear dynamical Ising machines, Natural Computing (2024) which is incorporated in its entirety by reference herein. The integrality gap due to the random rounding following the new dynamics was shown to be about 85%, versus 87% for the full rank SDP.

[0043] The detailed steps of the proposed heuristic are provided in Algorithms 1 and 2.

Algorithm 1: Nodal Coupling Dynamics (Input: A; Output: x)
1: for i ∈ {1, 2 . . . N} do
2:   x_i ← N(0, P)                          ▷ Normally distributed
3: end for
4: while t < T_max do
5:   for i ∈ {1, 2 . . . N} do
6:     S ← Σ_j A_{i,j} Φ(x_i − x_j)
7:     x_i(t + 1) ← x_i(t) + ηS             ▷ η is time-discretization factor
8:   end for
9:   t ← t + 1
10: end while
In Algorithm 1, the N analog spins are first initialized to normally distributed values, after which the states are allowed to evolve for T_max time-steps.

Algorithm 2: Randomized Rounding (Input: x, K; Output: σ)
1: for i ∈ {1, 2 . . . K} do
2:   y_i ← U(−P/2, P/2)
3: end for
4: C_max ← 0
5: y_max ← y_1
6: for i ∈ {1, 2 . . . K} do
7:   for j ∈ {1, 2 . . . N} do
8:     σ_j ← sgn(Φ(x_j − y_i))
9:   end for
10:  C ← M/2 − (Σ_{m,n} A_{m,n} σ_m σ_n)/4
11:  if C > C_max then
12:    C_max ← C
13:    y_max ← y_i
14:  end if
15:  for j ∈ {1, 2 . . . N} do
16:    σ_j ← sgn(Φ(x_j − y_max))
17:  end for
18: end for
In Algorithm 2, K uniformly distributed rounding centers are chosen. For each choice, the analog spins are rounded to ±1 and the cut evaluated. The algorithm then finds the largest cut and the corresponding spin-state.
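Algorithms 1 and 2 can be sketched in software as follows. The graph (a 3-vertex path), P, T_max, the time-discretization factor η, and K are illustrative choices; the update is applied synchronously for brevity, and the sign of the triangular coupling is chosen here so that each update step increases the cut.

```python
import random

P, T_MAX, ETA, K = 4.0, 600, 0.1, 8
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]            # illustrative 3-vertex path; its max-cut is 2
N = len(A)
M = sum(map(sum, A)) // 2
rng = random.Random(42)

def phi(v):
    # Triangular coupling with period P (sign chosen so +ETA*S raises the cut)
    w = (v + P / 4) % P - P / 4
    return w if w <= P / 4 else P / 2 - w

# Algorithm 1: nodal coupling dynamics (synchronous variant)
x = [rng.gauss(0, P) for _ in range(N)]
for _ in range(T_MAX):
    x = [x[i] + ETA * sum(A[i][j] * phi(x[i] - x[j]) for j in range(N))
         for i in range(N)]

def cut(s):
    # Equation (2), with the double sum over ordered pairs
    return M / 2 - sum(A[m][n] * s[m] * s[n]
                       for m in range(N) for n in range(N)) / 4

# Algorithm 2: randomized rounding with K rounding centers
best_cut, best_s = -1.0, None
for _ in range(K):
    y = rng.uniform(-P / 2, P / 2)
    s = [1 if phi(xj - y) >= 0 else -1 for xj in x]
    if cut(s) > best_cut:
        best_cut, best_s = cut(s), s
```

After the dynamics settle, neighboring states sit half a period apart, so nearly every rounding center recovers the maximum cut of 2.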

[0044] The heuristic is demonstrated on an example graph with 25 vertices as shown in FIG. 2A. FIG. 2B plots the states of the vertices as they evolve following equation (10) with the period of Φ set to 2π. FIGS. 2C and 2D plot the initial and final states as phasors with period 2π. The solid black diameter at angle α rounds the continuous states to binary spins, ±1. The mapping clearly depends on α, and this dependence is plotted in FIGS. 2E and 2F for the initial and final states. FIG. 2G shows the cut evolution over time. It is divided into two stages: the first shows a coarse optimization from the relaxed heuristics and the second shows a finer increment in the cut from the local search procedure.

[0045] Significant progress has been made towards accelerating the ground-state search problem of Ising models and more generally, solving combinatorial optimization problems via dynamical Ising-like models. Binary Ising machines, in most cases, are designed to parallelize the thermal annealing of a large number of semi-independent Ising models. Most binary spin annealing systems are directly challenged by the need for extremely large samples of the spin-states. On the other hand, analog models are hard to implement in a continuous time fashion due to sensitivity to analog parameters, difficulty in replicating their complex dynamics exactly using analog components, and dependence on alternatives to the widely-used CMOS devices.

[0046] The polynomial scaling and an overall reduced run-time of the heuristic based on the triangular approximation described above was demonstrated on a general-purpose computer. There, the max-cut values for graphs in the G-set, a special benchmarking collection of graphs, were found to be only slightly (about 2%) lower than the max-cut obtained by the solver Circut based on the BMZ heuristic. With a higher level of customization, one expects better scaling performance than is achievable by a general-purpose computer.

TABLE I summarizes the key computational operations in the algorithm presented above and identifies the corresponding input and output types.

  Operation                   Algebraic expression      Input type   Output type
  Nodal coupling dynamics     Σ_j A_ij Φ(x_i − x_j)     ℝ^N          ℝ^N
  Binarization/Thresholding   sgn(Φ(x_i − x_j))         ℝ^N          {0,1}^N
  Cut evaluation              Σ_{i,j} σ_i A_ij σ_j      {0,1}^N      ℝ

The table reveals two key composite operations that need to be efficiently conducted by an accelerator. The first operation is the modulo, defined for any real numbers a and b (b ≠ 0) as mod(a, b) = a − |b|·⌊a/|b|⌋, where ⌊x⌋ is the largest integer less than, or equal to, x. The function Φ in equation (12) may be expressed as the modulo with respect to Φ's period P, followed by conditional sign-inversions. Binarization of analog spins requires a modulo of the analog state with respect to P, followed by 1-bit quantization.
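These two operations can be sketched in software. The decomposition of the triangular function into a modulo plus a conditional sign-inversion below is illustrative (Python), with the fold offset chosen to match the period-P branches of equation (12).

```python
import math

def mod(a, b):
    # mod(a, b) = a - |b| * floor(a / |b|), as defined above
    return a - abs(b) * math.floor(a / abs(b))

P = 4.0

def phi_via_mod(v):
    # Triangular function as a modulo with respect to the period P,
    # followed by a conditional sign-inversion (matching equation (12))
    w = mod(v + P / 4, P) - P / 4      # fold v into [-P/4, 3P/4)
    return -w if w <= P / 4 else w - P / 2

def binarize(x, y):
    # 1-bit quantization of analog state x against rounding center y
    return 1 if phi_via_mod(x - y) >= 0 else -1
```

In the hardware described later, the modulo fold is performed on currents by subtracting DAC increments, and the sign-inversion by a 2:2 multiplexer.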

[0047] The second operation is multiplication-and-accumulation (MAC). Both the cut-evaluation and local search steps involve the product of the spin-vector with the adjacency matrix. The vector output determines a binary spin vector either for the binarization or for Hamiltonian-reducing spin-flips.

[0048] In this disclosure, the operations discussed above are based around a mix of analog-digital computing methodologies for two reasons. First, MAC is efficiently accelerated through a simultaneous multi-element product and sum, where the sub-operations of the MAC (usually a product of a time-variant vector with a time-invariant one) are spatially unfolded and simultaneously conducted. Each analog/digital variable input of the MAC supplies a proportional current signal onto an accumulating wire/bus and thus physical movement of data is avoided. The higher the number of simultaneous summations, the more efficient the accumulation operation via unfolding.

[0049] Second, the utilized heuristic based on the triangular approximation involves the evolution of analog variables via a series of accumulations conducted over time. Charge, as an analog variable, can be stored and naturally accumulated on a capacitor, within the limits of supply voltage and leakage. Hence a capacitor can serve as a stationary analog memory element to store the spin-state, whereas, in a purely digital memory, updating states without the movement of data would require adders in each vertex.

[0050] FIG. 4 depicts a hybrid analog-digital architecture 40 for solving an optimization problem in accordance with this disclosure. The architecture is comprised of an adjacency memory 41, a global controller 42, an array of memory cells 43, and a computing circuit 44. The adjacency memory 41 is configured to store a graph representing the optimization problem and the final solution to the problem. In one embodiment, the adjacency memory 41 is a non-transitory storage medium. The global controller 42 is interconnected between the adjacency memory 41 and the array of memory cells 43. The global controller controls the time progression for updating the spin-representing variables stored in the array of memory cells 43.

[0051] An iterative process for updating the electric charge in each memory cell is further described in relation to FIG. 9. As a starting point, the global controller 42 is configured to initialize spin states of the memory cells in the array of memory cells 43. In one embodiment, the spin state for each memory cell is randomly assigned.

[0052] Next, the global controller 42 iteratively updates the electric charge stored in each memory cell of the array of memory cells 43 at 92 until a stopping condition is met as indicated at 93. To do so, the global controller 42 traverses the nodes in the graph and updates the electric charge associated with each node. For a given node in the graph, the nodes adjacent to the given node are identified at 97. For each node adjacent to the given node, a contribution to the given node by the adjacent node is determined at 98 and then the contributions for the given node are accumulated. Lastly, the electric charge for the given memory cell is updated at 99 with the accumulated contribution, where the given node is assigned to the given memory cell. This process is repeated as each node in the graph is traversed and until the stopping condition is met.
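In software, the controller's update loop (steps 97-99) can be sketched as below. The coupling function, step factor, and fixed step-budget stopping rule are illustrative assumptions, with the coupling sign chosen so that connected nodes are driven half a period apart.

```python
P = 4.0

def phi(v):
    # Illustrative triangular coupling, period P
    w = (v + P / 4) % P - P / 4
    return w if w <= P / 4 else P / 2 - w

def run_controller(adj, charge, eta=0.1, max_steps=200):
    for _ in range(max_steps):           # stopping condition (93): step budget
        for i in range(len(adj)):        # traverse the nodes of the graph
            neighbors = [j for j, a in enumerate(adj[i]) if a]     # step 97
            contrib = sum(phi(charge[i] - charge[j])
                          for j in neighbors)                      # step 98
            charge[i] += eta * contrib                             # step 99
    return charge

# Two coupled nodes: their charges settle half a period (P/2) apart
final = run_controller([[0, 1], [1, 0]], [0.5, 0.7])
```

In the hardware, the same accumulation happens in place: each contribution is converted to an update charge and deposited on the node's capacitor, so no data movement is required.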

[0053] When the stopping criterion is met, the global controller ceases updating the electric charge in the memory cells. In this instance, the electric charges are continuously distributed. To obtain the distribution of binary spins from the terminal state of the memory, the continuously distributed charges must be rounded following the parametric rounding procedure as indicated at 94. The parameter of this procedure is the referential charge stored in a dedicated memory cell. In one embodiment, the value of the referential charge is chosen randomly.

[0054] The parametric rounding procedure follows the modified charge updating without performing the charge update step. After the referential charge is set, the global controller iteratively accesses each memory cell of the array. The global controller connects each chosen cell representing a node in the graph with the dedicated memory cell containing the referential charge. The computing circuit evaluates the update charge due to the established connection. If the update charge is zero or positive, the rounded value of the charge stored in the chosen memory cell is taken to be +1, and if the update charge is negative, the rounded value is taken to be −1. The rounded value so obtained is written in the part of the adjacency memory allocated to store the solution of the optimization problem. Finally, the obtained update charge is discarded to keep the charge in the chosen memory cell unchanged.

[0055] After the global controller finishes processing the last memory cell, the allocated part of the adjacency memory contains the obtained solution of the optimization problem. In one embodiment, the obtained solution is further processed to evaluate the value of the cut. The parametric rounding procedure is repeated with a newly generated value of the referential charge. Only the rounded solution providing the largest cut value is kept and is output at 95 as the final result produced by the architecture.

[0056] A shared current-bus enables a multi-vertex read via superposition of individual currents and establishes the connection with the computing circuit 44. FIG. 3A depicts the modulo operation on a pair of vertices i and j selected from the array of vertices. FIG. 3B shows the time-multiplexed MAC operation, divided into multiple column-wise simpler accumulations due to restrictions on the selectability of the vertices.

[0057] FIG. 5 further depicts an example embodiment for the array of memory cells 43. In this example, the array of memory cells 43 is arranged in columns and rows. Each node of the graph is assigned to a memory cell in the array of memory cells; that is, there is a one-to-one correspondence between each node in the graph and a memory cell in the array. Each memory cell is configured to store an electric charge representing a spin state of an Ising model.

[0058] An example implementation for a memory cell in the array of memory cells is shown in FIG. 6. In this example, each memory cell in the array of memory cells is comprised of a transconductance cell 61 and a 2:2 multiplexer (i.e., current inverter) 62. Other types of implementations for the memory cells are contemplated by this disclosure.

[0059] FIG. 7 further depicts an example embodiment of the computing circuit 44. The computing circuit 44 is interfaced with the array of memory cells. The computing circuit 44 is configured to read current from a given pair of memory cells in the array of memory cells, compute a differential current between the currents read from the given pair of memory cells, compute an update charge for one of the memory cells in the given pair of memory cells using the differential current, and transfer the update charge to the one memory cell in the given pair of memory cells.

[0060] The entire system of equations in (10) is realized by processing one vertex, one edge at a time. Each such processing is essentially divided into two phases: reading and writing. During the reading phase, the given and adjacent vertices are simultaneously read by superposing multiple read currents onto a voltage-pinned current-carrying bus. The net current is stepped up/down by superposing a digitized current from a digital-to-analog converter (DAC) 71 as seen in FIG. 7. This is followed by sign-inversions from a 2-input and 2-output (2:2) multiplexer 72, to get a current proportional to Φ. During the writing phase, a voltage proportional to Φ is buffered onto capacitor C_B. The charge so stored is sent up to the updated vertex, through a separate write-bus, leading to the evolution of the state by long-term accumulation.

[0061] Considering the array of memory cells (i.e., vertices or nodes of the graph) in FIG. 5, let vertex i be chosen for updating, based on an adjacent vertex j. It is updated by first selecting the vertices via the row and column selection multiplexers 51, 52 and closing the read-switch R (by asserting binary signal φ_R), while keeping the common-mode bus voltage pinned. This gives an output differential current through the read bus equal to

[00015] I_i − I_j = g_M (v_{c,i} − v_{c,j})   (13)

The sign of the differential current I_j above is reversed using the 2:2 multiplexer local to the vertex, which reverses the current direction by exchanging the differential components. If the DAC output is represented by n_DAC·I_0, then it is adjusted so that the net current lies within the peak values of Φ:

[00016] −I_0/2 < I_i − I_j − n_DAC·I_0 < I_0/2   (14)

Finally, the input to the multiplexer S is set according to n_DAC to give a net modulated current

[00017] Φ(v_i − v_j) = (−1)^{n_DAC} (g_M (v_i − v_j) − n_DAC·I_0)   (15)
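Numerically, equations (13)-(15) amount to folding the differential read current into the principal range by subtracting n_DAC·I_0 and applying a conditional sign inversion. The Python sketch below uses illustrative values of g_M and I_0 (an assumption, with g_M·P = 2·I_0 so that one period maps onto 2·I_0).

```python
GM, I0, P = 1.0, 2.0, 4.0     # illustrative values with GM * P = 2 * I0

def compute_phi(vi, vj):
    u = GM * (vi - vj)                        # equation (13): differential current
    n_dac = round(u / I0)                     # choose n_DAC so that (14) holds
    return (-1) ** n_dac * (u - n_dac * I0)   # equation (15)

# The result is a triangular, period-P function of (vi - vj), folded into
# the principal range (-I0/2, I0/2]
for d in (-3.0, -0.7, 0.3, 1.4, 2.6):
    assert -I0 / 2 <= compute_phi(d, 0.0) <= I0 / 2
```

Note that with this convention the folded output equals g_M·Δv for small differences, as in equation (15).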

[0062] For writing, once binary signal φ_W is asserted, the voltage on the read-bus stored in C_B is pushed to the vertex being updated, akin to the standard switched-capacitor accumulator. The change in v_{c,i} is given by

[00018] Δv_{c,i} = (C_B/C)·A·R·Φ(v_{c,i} − v_{c,j})   (16)

where C is the vertex's capacitance, A is the voltage gain from read current to C_B, and R is the net resistance seen by the read-current bus. Similarly, the other vertices are updated in sequence. Each such cycle of state updates, for all vertices, constitutes a time-step.

[0063] Algorithms 3-5 provide detailed steps for realizing the proposed heuristics on the system 40 and are set forth in the appendix. The algorithms are suited for sequential code-blocks in a hardware-description language (e.g., Verilog), as these involve reading and asserting real-time signals. All key input and output signals, referenced in Algorithms 3-5, are depicted in FIG. 5. Two key sub-routines are repeatedly used throughout the algorithms, COMPUTE-Φ and READ-WRITE, the exact sequences of steps of which are provided in Algorithms 6 and 7.

Algorithm 3: Nodal coupling dynamics (Input: A; Output: v)
1: for {i, j} : A_{i,j} is 1 do
2:   Select j for read and i for write
3:   Enable global read
4:   COMPUTE-Φ( )
5:   READ-WRITE(Analog)
6: end for

Algorithm 4: Randomized rounding (Input: v; Output: σ)
Require: Vertex Y reserved for rounding
1: for k ∈ {1, 2 . . . K} do               ▷ Each rounding center
2:   Select Y for read
3:   for i ∈ {1, 2 . . . N} do             ▷ Each vertex
4:     Select i for write
5:     φ_R ← 1                             ▷ Enable global read
6:     COMPUTE-Φ( )
7:     READ-WRITE(Binary)                  ▷ Binary write
8:   end for
9:   Evaluate cut                          ▷ See Algorithm 5
10:  n_DAC ← U(0, P/K)                     ▷ Random positive increment
11:  Select Y for write                    ▷ Rounding-center update
12:  READ-WRITE(Analog)
13: end for

Algorithm 5: Cut-evaluation (Input: σ, A; Output: C)
1: for k ∈ {1, 2 . . . K} do
2:   Select the (n_C0 + k)-th vertex for write   ▷ Index for storing cut
3:   for i ∈ {1, 2 . . . N} do
4:     Select vertex i for read
5:     Enable global binary read
6:     if C+/− is 1 then
7:       Invert sign of the following reads
8:     end if
9:     for c ← 1, 2 . . . C do             ▷ Iterate over columns
10:      L ← { }                           ▷ List of vertices simultaneously read
11:      for r ← 1, 2 . . . R do           ▷ Iterate over rows
12:        j ← index of vertex at (r, c)
13:        if A_ij is 1 && i < j then
14:          Push j to L
15:        end if
16:      end for
17:      Select all vertices in L for read
18:      READ-WRITE(Analog)
19:    end for
20:  end for
21:  if k > 1 then
22:    Select vertices k and k_max for read
23:    Enable global read
24:    if C+/− is 1 then
25:      k_max ← k                         ▷ Max. cut index swapping
26:    end if
27:    Disable all read/write
28:  end if
29: end for

Algorithm 6: Key sub-routine COMPUTE-Φ
Require: Vertices enabled for reading and writing
1: procedure COMPUTE-Φ
2:   C_1 ← 0
3:   C_2 ← 0
4:   n_DAC ← 1
5:   while C_1 is C_2 do
6:     C_1 ← C+/−                          ▷ Read comparator
7:     n_DAC ← n_DAC + 2C_1
8:     C_2 ← C+/−
9:   end while
10:  n_DAC ← n_DAC + C_2
11:  if mod(n_DAC, 4) is 0 then
12:    S ← 1
13:  else
14:    S ← 0
15:  end if
16: end procedure

Algorithm 7: Key sub-routine READ-WRITE
1: procedure READ-WRITE(mode = {Binary, Analog})
2:   Toggle φ_R
3:   if mode is Analog then
4:     Enable global write
5:   else
6:     Enable binary write
7:   end if
8:   Toggle φ_W
9:   Disable all read/writes
10: end procedure

[0064] The algorithm for the nodal coupling dynamics is provided in Algorithm 3. The execution time for the nodal coupling dynamics scales as O(M), where M is the number of edges. Though a vertex-stationary array realizes the dynamics of equation (10), a shared bus restricts the number of state updates in a given period of time, as only one edge is processed at a time.

[0065] Detailed algorithms for binarization with K rounding centers and cut evaluation are provided in Algorithms 4 and 5, respectively.

[0066] The vertex is primarily a capacitor-based analog memory with a differential gm-cell that converts the capacitor's voltage to a proportional current as the output of a read. Additionally, the vertex has a 2:2 multiplexer to invert the sign of the current, and a minimal logic to select itself for read/write. FIG. 6 depicts the block diagram of the proposed vertex, with the capacitor C, and the other components accordingly labelled.

[0067] The analog memory is implemented using a metal-insulator-metal (MIM) capacitor. This allows the capacitor to be placed above the substrate containing the actual transistors, thus saving area. The capacitor's minimum allowed size depends on (1) its ability to undergo write cycles without losing substantial existing charge, and (2) the size of the CMOS components below, which it should not significantly exceed. To set the common mode of the state voltages to a pre-determined value, the capacitor is split into a series combination of two identical capacitors C.sub.P and C.sub.N, and their common node is asserted during every read cycle.
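The effect of splitting the state capacitor into two identical series halves can be checked numerically. The sketch below uses illustrative capacitance values that are not from the disclosure.

```python
def series_capacitance(c_p, c_n):
    """Effective capacitance of two capacitors connected in series."""
    return c_p * c_n / (c_p + c_n)

# Two identical halves C_P = C_N present half the capacitance of either
# half, while their common node remains available for setting the
# common mode of the state voltage.
c_half = 200e-15  # hypothetical 200 fF per half
c_eff = series_capacitance(c_half, c_half)  # ~100 fF effective
```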

[0068] The g.sub.M-cell (FIGS. 6 and 7) converts the capacitor's analog charge to a proportional current. A differential amplifier with a tail current-source ensures that the transconductance is largely independent of input common-mode voltage.
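An ideal first-order model of such a g.sub.M-cell makes the common-mode independence explicit. The transconductance value below is a hypothetical placeholder, not a disclosed parameter.

```python
def gm_cell_current(v_p, v_n, gm=50e-6):
    """Ideal differential gm-cell: output current proportional to the
    differential input voltage, independent of the common mode.
    The 50 uS transconductance is illustrative only."""
    return gm * (v_p - v_n)

# Same 0.5 V differential input at two different common modes
# produces the same output current.
i1 = gm_cell_current(0.75, 0.25)    # common mode 0.50 V
i2 = gm_cell_current(0.625, 0.125)  # common mode 0.375 V
```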

[0069] All the analog switches of the cell are implemented using transmission gates. The design requires that all the bus voltages be between 0 and V.sub.DD, even when writing. Read-enable switches connect the gm-cell's current output to the current bus. Write-enable switches allow charge from the write bus to pass through. A 2:2 multiplexer serves as an output sign-selection switch for inverting the output differential current.

[0070] A minimally sized logic determines the following local signals from the global row and column signals: (1) select-enable of the vertex, (2) g.sub.M-cell's one-bit weight, (3) sign of the output current, and (4) binary spin-state.

[0071] The read-write amplifier buffers the net read-bus voltage onto the buffer capacitor C.sub.B and pushes the charge thus stored to the updated vertex. This block may be divided into two components: a read-buffer and a write-buffer (FIG. 10). The read-buffer is a differential amplifier with a current-mirror load and has a small but linear gain. Extra sinking loads are used to adjust the operating point, setting it closer to V.sub.DD/2. Capacitor C.sub.B stores the read-bus voltage and supplies its charge to the capacitor of the vertex being written to. Since the peak increment in the state is generally smaller than the maximum state voltage, the capacitance C.sub.B is smaller than C.sub.P and C.sub.N. For this reason, lower metal layers are used for C.sub.B.
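To first order, the ratio C.sub.B/C.sub.state sets how a read-bus excursion maps to a state-voltage increment. The sketch below is a simplified ideal-transfer model, assuming the charge sampled on C.sub.B is delivered entirely to the state capacitor; the real write stage is an active amplifier, and the capacitance values are illustrative only.

```python
def state_increment(v_read, c_b, c_state):
    """Idealized charge transfer: charge Q = C_B * v_read sampled on the
    buffer capacitor is pushed entirely onto the state capacitor, so the
    state voltage rises by Q / C_state = (C_B / C_state) * v_read."""
    return (c_b / c_state) * v_read

# A buffer capacitor 5x smaller than the state capacitor turns a 0.5 V
# read-bus excursion into a 0.1 V state increment.
dv = state_increment(0.5, c_b=20e-15, c_state=100e-15)
```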

[0072] The write-stage is essentially a high-gain amplifier with differential input and single-ended output (FIGS. 6 and 10). Its primary gain stage is a current-mirror loaded differential amplifier. One can design its gain to be large enough that the amplifier can push charges reliably over the long write-bus, but not so large as to cause any oscillatory side-effect due to positive feedback. Its secondary/output stage is an inverting amplifier with a resistive load to limit the gain, provide high voltage swing, and predictably set the output operating point. Ideally, the charge increments provided by the write-stage must be homogeneous with respect to the read-bus voltage and must cause zero increments for zero inputs. This behavior may be easily disrupted post-fabrication due to the single-endedness of its input (the voltage across C.sub.B). To limit any such non-homogeneity, the amplifier output is balanced with respect to the input. This is done using a three-step process once during the entire IC's operation. First, any differential input to the read bus is removed, effectively ensuring V.sub.i,N=V.sub.i,P (FIG. 10). Then, the voltage across C.sub.B is stored onto C by asserting binary signal ϕ.sub.o. Lastly, all the vertex capacitors (C) are simultaneously preset to the output of the write stage, V.sub.o,N=V.sub.o,P.

[0073] The DAC plays a critical role in the realization of the heuristics by digitally stepping the read-bus current up or down. It is used in the modulo realization, in the random initialization of vertices, and in incrementally shifting the rounding center during binarization. In one example, a 5-bit DAC is used, which provides enough separation between the zeroes assuming a full (0 to V.sub.DD) input range. To implement the DAC, the g.sub.M-cell of the vertex is reused with the multiplicities 1, 2, 4, 8 and 16. This allows direct control over the zeros, e.g., if the first zero is to be placed at 0.1 V, one could set the input of the DAC's g.sub.M-cell to 0.1 V. By tuning the input voltage of the g.sub.M-cell, one may also scale the current-step of the DAC and alter the effective number of bits.
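The binary-weighted structure built from reused unit g.sub.M-cells can be sketched as follows. The per-cell unit current `i_unit` (set in hardware by the g.sub.M-cell input voltage) is a hypothetical value for illustration.

```python
def dac_current(code, i_unit=1e-6):
    """5-bit binary-weighted current DAC built from unit gm-cells with
    multiplicities 1, 2, 4, 8 and 16, as described in the text. Each set
    bit of `code` switches on the corresponding weighted branch."""
    assert 0 <= code < 32, "5-bit code"
    weights = (1, 2, 4, 8, 16)
    return sum(w * i_unit for b, w in enumerate(weights) if (code >> b) & 1)

# Full-scale code 31 turns on all five branches: 31 unit currents.
i_full_scale = dac_current(31)
```

Scaling `i_unit` via the g.sub.M-cell input voltage rescales every branch uniformly, which is the current-step tuning the paragraph above describes.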

[0074] The comparator was implemented using a 5-stage high-gain differential amplifier with an absolute voltage gain of about 1000. Thus, it can compare small but frequently occurring differential voltages on the order of 1 mV. The read-bus pinning circuit pins the common-mode read-bus voltage to V.sub.DD/2. It uses the principle of common-mode feedback, i.e., it feeds current back into the bus depending on the separation of the bus voltage from its desired value of V.sub.DD/2. Lastly, the sign-inverter is essentially a 2:2 multiplexer.

[0075] The presented computing device is designed to accommodate fully connected graphs and can be directly employed for a more general class of problems. Some graph topologies, such as planar ones, are relatively easily accelerated using alternative Ising machines, and many area-efficient accelerators have been proposed that support planar spin lattices. However, assuming planarity of problems limits the applicability of the machine and requires embedding methods that increase the effective number of nodes.

[0076] Many problems of practical importance require weights with multiple bits. This can be achieved by modulating the capacitance that sends the charge back to the state capacitor (C.sub.B). For multi-bit weights, the charge increments can be scaled up by using a larger C.sub.B.
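The proportionality between the feedback capacitance and the delivered increment can be sketched with the same idealized charge-transfer model as above. The function and its parameters are hypothetical illustrations, not disclosed circuit values.

```python
def weighted_increment(v_read, weight, c_b_unit, c_state):
    """Hypothetical multi-bit weighting: the feedback capacitance C_B is
    scaled in proportion to the integer edge weight, which scales the
    charge increment pushed to the state capacitor by the same factor."""
    return (weight * c_b_unit / c_state) * v_read

# A weight-3 edge delivers three times the unit increment.
dv3 = weighted_increment(1.0, 3, c_b_unit=1.0, c_state=4.0)
dv1 = weighted_increment(1.0, 1, c_b_unit=1.0, c_state=4.0)
```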

[0077] The foregoing description of the embodiments has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure. Individual elements or features of a particular embodiment are generally not limited to that particular embodiment, but, where applicable, are interchangeable and can be used in a selected embodiment, even if not specifically shown or described. The same may also be varied in many ways. Such variations are not to be regarded as a departure from the disclosure, and all such modifications are intended to be included within the scope of the disclosure.