Computer vision systems and methods for optimizing correlation clustering for image segmentation using Benders decomposition
11636607 · 2023-04-25
Cpc classification
G06V10/762
Abstract
Computer vision systems and methods for optimizing correlation clustering for image segmentation are provided. The system receives input data and generates a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data. The system optimizes the Benders Decomposition for the generated correlation clustering formulation and performs image segmentation using the optimized Benders Decomposition.
Claims
1. A computer vision system for optimizing correlation clustering comprising: a memory; and a processor in communication with the memory, the processor: receiving input data, generating a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data, optimizing the Benders Decomposition for the generated correlation clustering formulation, and performing image segmentation using the optimized Benders Decomposition.
2. The system of claim 1, wherein the processor generates the correlation clustering formulation to utilize Benders Decomposition by: applying an auxiliary function to a conventional correlation clustering formulation, the auxiliary function being indicative of a cost to alter a vector of the auxiliary function to satisfy cycle inequalities, and mapping the altered vector to a solution that satisfies the cycle inequalities without increasing a cost of the auxiliary function.
3. The system of claim 1, wherein the processor optimizes the Benders Decomposition via a cutting plane algorithm.
4. The system of claim 3, wherein the Benders Decomposition includes a master problem and a set of subproblems and the cutting plane algorithm executes optimization over the variables of the master problem and then executes optimization over the subproblems in parallel.
5. The system of claim 1, wherein the processor accelerates the Benders Decomposition utilizing Benders rows and Magnanti-Wong Benders rows.
6. The system of claim 1, wherein the dataset is a Berkeley Segmentation Data Set (BSDS).
7. A method for optimizing correlation clustering by a computer vision system, comprising the steps of: receiving input data; generating a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data; optimizing the Benders Decomposition for the generated correlation clustering formulation; and performing image segmentation using the optimized Benders Decomposition.
8. The method of claim 7, further comprising the steps of generating the correlation clustering formulation to utilize Benders Decomposition by: applying an auxiliary function to a conventional correlation clustering formulation, the auxiliary function being indicative of a cost to alter a vector of the auxiliary function to satisfy cycle inequalities; and mapping the altered vector to a solution that satisfies the cycle inequalities without increasing a cost of the auxiliary function.
9. The method of claim 7, further comprising the step of optimizing the Benders Decomposition via a cutting plane algorithm.
10. The method of claim 9, wherein the Benders Decomposition includes a master problem and a set of subproblems and the cutting plane algorithm executes optimization over the variables of the master problem and then executes optimization over the subproblems in parallel.
11. The method of claim 7, further comprising the step of accelerating the Benders Decomposition utilizing Benders rows and Magnanti-Wong Benders rows.
12. The method of claim 7, wherein the dataset is a Berkeley Segmentation Data Set (BSDS).
13. A non-transitory computer readable medium having instructions stored thereon for optimizing correlation clustering by a computer vision system which, when executed by a processor, causes the processor to carry out the steps of: receiving input data; generating a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data; optimizing the Benders Decomposition for the generated correlation clustering formulation; and performing image segmentation using the optimized Benders Decomposition.
14. The non-transitory computer readable medium of claim 13, the processor further carrying out the steps of generating the correlation clustering formulation to utilize Benders Decomposition by: applying an auxiliary function to a conventional correlation clustering formulation, the auxiliary function being indicative of a cost to alter a vector of the auxiliary function to satisfy cycle inequalities; and mapping the altered vector to a solution that satisfies the cycle inequalities without increasing a cost of the auxiliary function.
15. The non-transitory computer readable medium of claim 13, the processor further carrying out the step of optimizing the Benders Decomposition via a cutting plane algorithm.
16. The non-transitory computer readable medium of claim 15, wherein the Benders Decomposition includes a master problem and a set of subproblems and the cutting plane algorithm executes optimization over the variables of the master problem and then executes optimization over the subproblems in parallel.
17. The non-transitory computer readable medium of claim 13, the processor further carrying out the step of accelerating the Benders Decomposition utilizing Benders rows and Magnanti-Wong Benders rows.
18. The non-transitory computer readable medium of claim 13, wherein the dataset is a Berkeley Segmentation Data Set (BSDS).
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The foregoing features of the invention will be apparent from the following Detailed Description of the Invention, taken in connection with the accompanying drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DETAILED DESCRIPTION
(10) The present disclosure relates to computer vision systems and methods for optimizing correlation clustering for image segmentation using Benders decomposition, as described in detail below in connection with
(11) The standard formulation for correlation clustering corresponds to a graph partitioning problem with respect to graph G=(D, E). This problem is defined by Equations 1 and 2, as follows:
(12)
(13) Where the variables are defined as:
d∈D: The set of nodes in the graph on which correlation clustering is applied is denoted D and indexed by d.
(d.sub.1, d.sub.2)∈E: The set of undirected edges in the graph on which correlation clustering is applied is denoted E and indexed by nodes d.sub.1, d.sub.2. The graph described by E is sparse for real problems.
x.sub.d1d2∈{0, 1}: x.sub.d1d2=1 indicates that nodes d.sub.1, d.sub.2 are in separate components, and x.sub.d1d2 is zero otherwise. (d.sub.1, d.sub.2) is referred to as an edge, and an edge with x.sub.d1d2=1 is referred to as a cut edge.
ϕ.sub.d1d2∈ℝ: ϕ.sub.d1d2 denotes the weight associated with edge (d.sub.1, d.sub.2).
E.sup.+, E.sup.−: E.sup.+, E.sup.− denote the subsets of E for which ϕ.sub.d1d2 is non-negative and negative, respectively.
c∈C: C denotes the set of (undirected) cycles of edges in E, each of which contains exactly one member of E.sup.−. C is indexed by c.
(d.sup.c.sub.1, d.sup.c.sub.2): (d.sup.c.sub.1, d.sup.c.sub.2) denotes the only edge in E.sup.− associated with cycle c.
E.sub.c.sup.+: E.sub.c.sup.+ denotes the subset of E.sup.+ associated with the cycle c.
(14) The objective in Equation 1 describes the total weight of the cut edges. The constraints described in Equation 2 enforce the standard relaxation of correlation clustering, which requires that transitivity regarding association of nodes with components be respected. Equation 2 can be illustrated as follows. Any cycle of edges c∈C contains exactly one edge (d.sup.c.sub.1, d.sup.c.sub.2)∈E.sup.−. Equation 2 states that if edge (d.sup.c.sub.1, d.sup.c.sub.2) is cut, then at least one other edge on the cycle must be cut. If this constraint is violated, then d.sup.c.sub.1, d.sup.c.sub.2 are in separate components (since x.sub.d.sup.c.sub.1d.sup.c.sub.2=1) while all nodes on the cycle are in the same component (since x.sub.d1d2=0 for all (d.sub.1, d.sub.2)∈E.sub.c.sup.+), creating a contradiction.
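The contradiction described above can be checked on a single cycle. The following Python sketch is illustrative only: the function name, the dictionary-based edge representation, and the triangle example are assumptions introduced here, and the inequality is coded as it is described in the text (the cut indicators on the positive edges of the cycle must sum to at least the cut indicator on its single negative edge).

```python
def cycle_inequality_holds(x, negative_edge, positive_edges):
    """Return True if the positive edges of the cycle carry at least as much
    cut value as the single negative edge of the cycle."""
    return sum(x[e] for e in positive_edges) >= x[negative_edge]


# Hypothetical triangle a-b-c in which (a, c) is the only negative-weight edge.
negative_edge = ("a", "c")
positive_edges = [("a", "b"), ("b", "c")]

# Cutting only (a, c) separates a from c while a-b-c remain joined: inconsistent.
inconsistent = {("a", "b"): 0, ("b", "c"): 0, ("a", "c"): 1}
# Also cutting (b, c) places c in its own component: consistent.
consistent = {("a", "b"): 0, ("b", "c"): 1, ("a", "c"): 1}

print(cycle_inequality_holds(inconsistent, negative_edge, positive_edges))  # False
print(cycle_inequality_holds(consistent, negative_edge, positive_edges))    # True
```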
(15) The constraints in Equation 2 are referred to as cycle inequalities. Solving Equation 1 is intractable due to the large number of cycle inequalities. To attack such problems, prior art systems iterate between solving an integer linear program (“ILP”) over a nascent set of constraints Ĉ (initialized empty), and adding new constraints from the set of currently violated cycle inequalities. Generating constraints corresponds to iterating over (d.sub.1, d.sub.2)∈E.sup.−, and identifying the shortest path between d.sub.1 and d.sub.2 in the graph with edges E and weights equal to the vector x. If the corresponding path has total weight less than x.sub.d1d2, then the corresponding constraint is added to Ĉ. The linear program relaxation of Equations 1 and 2 can be solved instead of the ILP in each iteration until no violated cycle inequalities exist, after which the ILP is solved in each iteration.
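The constraint-generation step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: the function names, the use of Dijkstra's algorithm via the standard `heapq` module, the tolerance `tol`, and the triangle example are all illustrative choices. Edges are given as tuples and x as a dictionary keyed by those tuples.

```python
import heapq


def shortest_path_length(adjacency, weights, source, target):
    """Dijkstra on an undirected graph with non-negative edge weights.
    `weights` maps frozenset({u, v}) to the weight of edge (u, v)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adjacency[u]:
            nd = d + weights[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def violated_negative_edges(edges, negative_edges, x, tol=1e-9):
    """Return the negative edges whose endpoints are joined by a path that is
    shorter (under weights x) than the cut value of the edge itself."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    weights = {frozenset(e): x[e] for e in edges}
    return [
        (u, v)
        for (u, v) in negative_edges
        if shortest_path_length(adjacency, weights, u, v) < x[(u, v)] - tol
    ]


# Hypothetical triangle: (a, c) is negative and cut, the other two edges are uncut.
edges = [("a", "b"), ("b", "c"), ("a", "c")]
negative_edges = [("a", "c")]
x = {("a", "b"): 0.0, ("b", "c"): 0.0, ("a", "c"): 1.0}
print(violated_negative_edges(edges, negative_edges, x))  # [('a', 'c')]
```

Each returned edge corresponds to one cycle inequality that would be added to the nascent set before the next solve.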
(16) It is noted that in prior art systems, correlation clustering for computer vision did not require that cycle inequalities contain exactly one member of E.sup.−, which is on the right hand side of Equation 2. The addition of cycle inequalities that contain edges in E.sup.−, E.sup.+ on the left hand side, right hand side of Equation 2, respectively, does not tighten the ILP in Equations 1 and 2 or its linear program relaxation.
(17) The system of the present disclosure reformulates optimization in the ILP to admit efficient optimization via Benders decomposition. Benders decomposition is an exact mixed integer linear programming (MILP) solver, but can be intuitively understood as a coordinate descent procedure that iterates between the master problem and the subproblems. Solving the subproblems not only provides a solution for their variables, but also a lower bound in the form of a hyper-plane over the master problem's variables. The lower bound is tight at the current solution to the master problem.
(18) This formulation is defined by a minimal vertex cover on E.sup.−, with members N⊂D indexed by n. Each n∈N is associated with a Benders subproblem, and is referred to as the root of that Benders subproblem. Edges in E.sup.− are partitioned arbitrarily between the subproblems, such that each (d.sub.1, d.sub.2)∈E.sup.− is associated with either the subproblem with root d.sub.1 or the subproblem with root d.sub.2. Here, E.sub.n.sup.− is the subset of E.sup.− associated with subproblem n. The subproblem with root n enforces the cycle inequalities C.sub.n, where C.sub.n is the subset of C containing edges in E.sub.n.sup.−. E.sub.n.sup.+ denotes the subset of E.sup.+ adjacent to n. By way of example, the system assumes that N is provided. However, those skilled in the art would understand that N can be produced greedily or using an LP/ILP (linear program/integer linear program) solver.
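One way the greedy option mentioned above could look is sketched below in Python. The max-degree heuristic, the function names, and the example edges are assumptions made for illustration; a greedy cover is valid but not guaranteed to be of minimum size. The second function assigns each negative edge to one of its endpoints in the cover, which yields the per-root edge subsets.

```python
def greedy_vertex_cover(negative_edges):
    """Repeatedly pick the node touching the most uncovered negative edges."""
    uncovered = set(negative_edges)
    cover = []
    while uncovered:
        degree = {}
        for u, v in uncovered:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        # sorted() makes tie-breaking deterministic for this illustration
        best = max(sorted(degree), key=lambda node: degree[node])
        cover.append(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover


def assign_edges_to_roots(negative_edges, cover):
    """Associate each negative edge with one endpoint that lies in the cover."""
    roots = set(cover)
    assignment = {n: [] for n in cover}
    for u, v in negative_edges:
        root = u if u in roots else v  # arbitrary choice when both are roots
        assignment[root].append((u, v))
    return assignment


# Hypothetical negative edges on nodes a, b, c, d.
negative_edges = [("a", "b"), ("a", "c"), ("c", "d")]
cover = greedy_vertex_cover(negative_edges)
print(cover)  # ['a', 'c']
print(assign_edges_to_roots(negative_edges, cover))
```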
(19)
(20) E.sup.+/E.sub.n.sup.−, respectively. Equations 3 and 4 below describe the changes to x using x.sup.n, which is indexed in the same way as x.
(21)
(22) In step 24, the system 10 maps {x, x.sup.n ∀n∈N} to a solution {x*, x.sup.n* ∀n∈N}, where x* satisfies all cycle inequalities by construction, without increasing the cost according to Equation 3. The system 10 defines x* as seen below in Equation 5:
(23)
(24) Given x*, the optimizing solution to each Benders subproblem n is denoted x.sup.n*, and is defined as follows. The system 10 sets x.sup.n*.sub.d1d2=1 if (d.sub.1, d.sub.2)∈E.sub.n.sup.−, and sets it to zero otherwise. It is noted that the cost of {x*, x.sup.n* ∀n∈N} is no greater than that of {x, x.sup.n ∀n∈N} with regard to the objective in Equation 3. It is further noted that Q(ϕ, n, x*)=0 for all n∈N. Thus there always exists an optimizing solution to Equation 3, denoted x, such that Q(ϕ, n, x)=0 for all n∈N. Further, there exists an optimal partition x.sup.n in Equation 4 that is 2-colorable. This is because any partition x.sup.n can be altered, without increasing its cost, by merging adjacent connected components not including the root node n. It is noted that merging any pair of such components does not increase the cost, since those components are not separated by negative weight edges.
(25) In step 26, the system 10 adapts optimization in Q(ϕ, n, x). For example, the system 10 can use the node labeling formulation of min-cut, which is expressed by the following notation:
m.sub.d=1 for d∈D: indicates that node d is not in the component associated with n, and is otherwise zero. To avoid extra notation, m.sub.n is replaced by 0.
f.sub.d1d2=1 for (d.sub.1, d.sub.2)∈E.sup.+: indicates that the edge between d.sub.1, d.sub.2 is cut in x.sup.n, but is not cut in x. Thus a penalty of ϕ.sub.d1d2 is added to Q(ϕ, n, x). It is observed that x.sup.n.sub.d1d2=f.sub.d1d2 for all (d.sub.1, d.sub.2)∈E.sup.+.
f.sub.d1d2=1 for (d.sub.1, d.sub.2)∈E.sub.n.sup.−: indicates that the edge between d.sub.1, d.sub.2 is not cut in x.sup.n, but is cut in x. Thus a penalty of −ϕ.sub.d1d2 is added to Q(ϕ, n, x). It is observed that x.sup.n.sub.d1d2=1−f.sub.d1d2 for all (d.sub.1, d.sub.2)∈E.sub.n.sup.−.
For readability, the edges are re-oriented from (d, n) to (n, d).
(26) The system 10 then expresses Q(ϕ, n, x) (as expressed by Equation 6, below) as a primal/dual linear program, with primal constraints associated with dual variables ψ, λ, which are noted in the primal. Given a binary x, the system 10 need only enforce that the variables f, m are non-negative to ensure that there is an optimizing solution for f, m that is binary. This is a consequence of the optimization being totally unimodular, given that x is binary. Total unimodularity is a known property of the min-cut/max-flow linear program.
(27)
(28) In Equation 7, below, the system 10 uses [·] to denote the binary indicator function, which returns one if the enclosed statement is true and zero otherwise.
(29)
(30) In an example, the system 10 considers the constraint that Q(ϕ, n, x)=0. It is observed that any dual feasible solution (λ, ψ in Equation 7) describes an affine function of x that is a lower bound on Q(ϕ, n, x). The system 10 compacts the terms λ and ψ into ω.sup.z, where ω.sup.z.sub.d1d2 is associated with the term x.sub.d1d2, as expressed below in Equations 8-11:
[Equations 8-11: definitions of ω.sup.z.sub.d1d2 in terms of the dual variables λ, ψ, given separately for (d.sub.1, d.sub.2) in E.sup.+−E.sub.n.sup.+, E.sub.n.sup.+, E.sub.n.sup.−, and E.sup.−−E.sub.n.sup.−, respectively]
(31) In step 28, the system 10 formulates correlation clustering (CC). Specifically, the set of all dual feasible solutions across n∈N is denoted Z, which is indexed by z. It is observed that to enforce Q(ϕ, n, x)=0, it is sufficient to require that 0≥Σ.sub.d1d2∈E x.sub.d1d2ω.sup.z.sub.d1d2 for all z∈Z. As such, the system 10 formulates CC as an optimization using Z, as expressed below in Equation 12:
(32)
(33) It is noted that optimization in Equation 12 is intractable, since |Z| equals the number of dual feasible solutions across subproblems, which is infinite. Since the system 10 cannot consider the entire set Z, in step 30, the system 10 uses a cutting plane approach to construct a set Ẑ⊂Z that is sufficient to solve Equation 12. Specifically, the system 10 initializes Ẑ as the empty set and iterates between solving the LP relaxation of Equation 12 over Ẑ (referred to herein as the master problem), and generating new Benders rows until no violated constraints exist. This ensures that no violated cycle inequalities exist, but may not ensure that x is integral. To enforce integrality, the system 10 iterates between solving the ILP in Equation 12 over Ẑ, and adding Benders rows to Ẑ. By solving the LP relaxation first, the system 10 avoids unnecessary and expensive calls to the ILP solver.
(34) In step 32, the system 10 generates Benders rows. Specifically, in Benders decomposition, the variables of the original problem are divided into two subsets so that a first-stage master problem is solved over the first set of variables, and the values for the second set of variables are determined in a second-stage subproblem for a given first-stage solution. If the subproblem determines that the fixed first-stage decisions are infeasible, then cuts are generated and added to the master problem, which is then re-solved until no cuts can be generated. The new constraints added by Benders decomposition as it progresses towards a solution are called Benders rows.
(35) More specifically, given x, the system 10 iterates over N, and generates one Benders row using Equation 7 if n is associated with a violated cycle inequality. The system 10 determines if n is associated with a violated cycle inequality as follows. Given n, x, the system 10 iterates over (d.sub.1, d.sub.2)∈E.sub.n.sup.−. The system 10 then finds the shortest path from d.sub.1 to d.sub.2 on graph E, with weights equal to vector x. If the length of this path, denoted as Dist(d.sub.1, d.sub.2), is less than x.sub.d1d2, then the system 10 has identified a violated cycle inequality associated with n.
(36) E.sub.n.sup.−, which is executed by iterating over (d.sub.1, d.sub.2)∈E.sub.n.sup.−, and checking if the shortest path from d.sub.1 to d.sub.2 is less than x.sub.d1d2. This distance is defined on graph E with weights equal to x. In lines 8-10, the system 10 generates Benders rows associated with subproblem n, and adds them to nascent set Ẑ. In line 11, the system 10 indicates that a Benders row was added in the iteration. In lines 14-16, the system 10 instructs that if no Benders rows were added in the iteration, then the system 10 enforces integrality on x when solving the master problem for the remainder of the algorithm. Finally, in line 18, the system 10 returns solution x.
(37) Prior to the termination of the algorithm, a feasible integer solution x* can be produced from the current solution x as follows. First, for each (d.sub.1, d.sub.2)∈E, set x**.sub.d1d2=1 if x.sub.d1d2>½, and otherwise set x**.sub.d1d2=0. Second, for each (d.sub.1, d.sub.2)∈E, set x*.sub.d1d2=1 if d.sub.1, d.sub.2 are in separate connected components of the solution described by x**, and otherwise set x*.sub.d1d2=0. The cost of the feasible integer solution x* provides an upper bound on the cost of the optimal solution. A more sophisticated approach for producing feasible integer solutions will be discussed below.
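The two-step procedure above (threshold at ½, then cut exactly the edges that join different connected components) can be sketched in Python as follows. The function name, the union-find helper, and the example values are illustrative assumptions and are not taken from the disclosure.

```python
def round_to_partition(nodes, edges, x):
    """Threshold x at 1/2, then cut exactly the edges whose endpoints fall in
    different connected components of the thresholded solution."""
    parent = {d: d for d in nodes}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # Step 1: edges with x <= 1/2 are uncut in x**, so their endpoints merge.
    for u, v in edges:
        if not x[(u, v)] > 0.5:
            parent[find(u)] = find(v)

    # Step 2: x* cuts an edge only if its endpoints ended up separated.
    return {(u, v): int(find(u) != find(v)) for u, v in edges}


# Hypothetical fractional solution on a triangle.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
x = {("a", "b"): 0.1, ("b", "c"): 0.2, ("a", "c"): 0.9}
print(round_to_partition(nodes, edges, x))
# {('a', 'b'): 0, ('b', 'c'): 0, ('a', 'c'): 0}
```

In this example the thresholded solution cuts only (a, c), which violates transitivity; the second step repairs this by leaving (a, c) uncut, since a and c are connected through b.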
(38) Returning to
(39) The system 10 uses a random negative valued vector (with unit norm) in place of the objective in Equation 7. The random vector is unique each time a Benders subproblem is solved. In an example, the system 10 uses an objective of −1/(0.0001+|ϕ.sub.d1d2|), which encourages the cutting of edges with large positive weight, but it works as well as the random negative objective. Here, 0.0001 is a tiny positive number used to ensure that the terms in the objective do not become infinite.
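The two objective vectors described above can be generated as in the following Python sketch. It is a sketch under stated assumptions: the sampling distribution for the random vector is not specified in the text, so a uniform draw on (0, 1] followed by normalization is assumed here, and the vectors are indexed by edge purely for illustration. The function names and the `rng` and `eps` parameters are hypothetical.

```python
import math
import random


def random_negative_unit_vector(edges, rng):
    """Draw a strictly negative vector over the edges and scale it to unit norm.
    The uniform draw is an assumption; the text only requires a random,
    negative valued, unit norm vector."""
    raw = {e: -(1.0 - rng.random()) for e in edges}  # each entry lies in [-1, 0)
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {e: v / norm for e, v in raw.items()}


def weight_based_objective(phi, eps=0.0001):
    """Alternative objective -1 / (eps + |phi|); eps keeps every term finite."""
    return {e: -1.0 / (eps + abs(w)) for e, w in phi.items()}


# Hypothetical edge weights.
phi = {("a", "b"): 2.0, ("b", "c"): 0.0, ("a", "c"): -1.5}
rng = random.Random(0)  # a fresh draw is taken each time a subproblem is solved
objective = random_negative_unit_vector(list(phi), rng)
print(objective)
print(weight_based_objective(phi))
```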
(40) The system 10 uses Equation 13, below, to enforce that the new Benders row is active at x*, by requiring that the dual cost is within a tolerance v∈[0, 1] of the optimal with regard to the objective in Equation 7 (hereafter the parameter v will be referred to as the optimality parameter).
(41)
(42) Specifically, v=1 requires optimality with respect to the objective in Equation 7, and v=0 ignores optimality. By way of example, v=½ provides strong performance.
(43) Testing and analysis of the above systems and methods will now be discussed in greater detail. The system of the present disclosure was applied to the benchmark Berkeley Segmentation Data Set (“BSDS”). The experiments demonstrate the following: 1) the system solves correlation clustering instances for image segmentation; 2) the system successfully exploits parallelization; and 3) the system dramatically accelerates optimization.
(44) To benchmark performance, the cost terms provided by the OPENGM2 dataset for BSDS are used. This allows for a direct comparison of the results of the system of the present disclosure to a benchmark. The present system used the random unit norm negative valued objective when generating Magnanti-Wong Benders rows (“MWR”). The present system further used the IBM ILOG CPLEX Optimization Studio (“CPLEX”) to solve all linear and integer linear programming problems considered during the course of optimization. A maximum total CPU time of 600 seconds was used for each problem instance (regardless of parallelization).
(45) The selection of N was formulated as a minimum vertex cover problem, where for every edge (d.sub.1, d.sub.2)∈E.sup.−, at least one of d.sub.1, d.sub.2 is in N. The present system solved for the minimum vertex cover exactly as an ILP. Given N, each edge in E.sup.− is assigned arbitrarily to one of its endpoints that was selected in N. It is noted that solving for the minimum vertex cover consumed negligible CPU time for the data set. This can be attributed to the structure of the problem domain, since minimum vertex cover is an NP-hard problem in general. For problem instances where solving for the minimum vertex cover exactly is difficult, the minimum vertex cover problem can be solved approximately or greedily.
(46)
(47) In
(48)
(49)
(50) The following demonstrates a proof that there exists an x that minimizes Equation 3, for which Q(ϕ, n, x)=0. The proof maps an arbitrary solution (x, {x.sup.n∀n∈N}) to one denoted (x*, {x.sup.n*∀n∈N}) where Q(ϕ, n, x*)=0, without increasing the objective in Equation 3. Equation 14, below, is written in terms of x.sup.n:
(51)
(52) The updates in Equation 14 are equivalent to the following updates, in Equation 15, below, using f.sup.n, f.sup.n*, where f.sup.n, f.sup.n* correspond to the optimizing solution for f in subproblem n, given x, x* respectively.
(53)
(54) It is noted that the updates in Equations 14 and 15 preserve the feasibility of the primal LP in Equation 6. It is further noted that since f.sup.n* is a zero valued vector for all n∈N, Q(ϕ, n, x*)=0 for all n∈N.
(55) The total change in Equation 3 corresponding to edge (d.sub.1, d.sub.2)∈E.sup.+, induced by Equation 14, is non-positive. The objective of the master problem increases by ϕ.sub.d1d2 max.sub.n∈N x.sup.n.sub.d1d2, while the total decrease in the objectives of the subproblems is ϕ.sub.d1d2 Σ.sub.n∈N x.sup.n.sub.d1d2. Since ϕ.sub.d1d2 and the terms x.sup.n.sub.d1d2 are non-negative, the maximum is no greater than the sum, so the net change is non-positive. Next, the total change in Equation 3 corresponding to edge (d.sub.1, d.sub.2)∈E.sub.n.sup.−, induced by Equation 14, is considered, which is zero. The objective of the master problem increases by −ϕ.sub.d1d2(1−x.sup.n.sub.d1d2), while the objective of subproblem n decreases by −ϕ.sub.d1d2(1−x.sup.n.sub.d1d2).
(56) The approach for producing feasible integer solutions will now be discussed. Prior to the termination of optimization, it can be valuable to provide feasible integer solutions on demand, so that a user can terminate optimization when the gap between the objectives of the integral solution and the relaxation is small. The production of feasible integer solutions is considered given the current solution x* to the master problem, which may neither obey the cycle inequalities nor be integral. This procedure is referred to as rounding.
(57) Rounding is a coordinate descent approach defined on the graph with weights κ, determined using x* as seen in Equation 16, below:
[Equation 16: definition of κ.sub.d1d2 in terms of x*, given separately for (d.sub.1, d.sub.2)∈E.sup.+ and for (d.sub.1, d.sub.2)∈E.sup.−]
(58) Consider the case where x* is integral and feasible (where feasibility indicates that x* satisfies all cycle inequalities). Let x.sup.n* define the boundaries, in partition x*, of the connected component containing n. Here, x.sup.n*.sub.d1d2=1 if exactly one of d.sub.1, d.sub.2 is in the connected component containing n under cut x*. It is observed that Q(κ, n, x.sup.0n)=0, where x.sup.0n.sub.d1d2=[(d.sub.1, d.sub.2)∈E.sub.n.sup.−], is achieved using x.sup.n* as the solution to Equation 6. Thus x.sup.n* is the minimizer of Equation 6. The union of the edges cut in x.sup.n* across n∈N is identical to x*. It is observed that when x* is integral and feasible, a solution is produced that has cost equal to that of x*, as seen below in Equation 17:
(59)
(60) The procedure of Equation 17 can be used regardless of whether x* is integral or feasible. It is observed that if x* is close to integral and close to feasible, then Equation 17 is biased to produce a solution that is similar to x* by design of κ.
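For an integral and feasible x*, the boundary vector of the connected component containing a root n, as used in the discussion of rounding above, can be computed as in the following Python sketch. It assumes the reading that an edge lies on the boundary when exactly one of its endpoints is in the component of n; the function name and the four-node example are illustrative and not taken from the disclosure.

```python
def component_boundary(nodes, edges, x_star, root):
    """Return, for every edge, 1 if exactly one endpoint lies in the connected
    component of `root` under the integral cut x_star, and 0 otherwise."""
    # Uncut edges (x* = 0) keep their endpoints in the same component.
    neighbors = {d: [] for d in nodes}
    for u, v in edges:
        if x_star[(u, v)] == 0:
            neighbors[u].append(v)
            neighbors[v].append(u)

    # Depth-first search from the root through uncut edges only.
    component, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in neighbors[u]:
            if v not in component:
                component.add(v)
                stack.append(v)

    return {(u, v): int((u in component) != (v in component)) for u, v in edges}


# Hypothetical 4-cycle split into components {a, b} and {c, d}.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
x_star = {("a", "b"): 0, ("b", "c"): 1, ("c", "d"): 0, ("a", "d"): 1}
print(component_boundary(nodes, edges, x_star, "a"))
# {('a', 'b'): 0, ('b', 'c'): 1, ('c', 'd'): 0, ('a', 'd'): 1}
```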
(61) A serial version of Equation 17 will now be discussed, which can provide improved results. A partition x.sup.+ is constructed by iterating over n∈N, producing component partitions as in Equation 17. The term κ is altered by allowing for the cutting of edges previously cut with cost zero.
(62)
(63) The functionality provided by the present disclosure could be provided by computer vision software code 106, which could be embodied as computer-readable program code stored on the storage device 104 and executed by the CPU 112 using any suitable, high or low level computing language, such as Python, Java, C, C++, C#, .NET, MATLAB, etc. The network interface 108 could include an Ethernet network interface device, a wireless network interface device, or any other suitable device which permits the server 102 to communicate via the network. The CPU 112 could include any suitable single-core or multiple-core microprocessor of any suitable architecture that is capable of implementing and running the computer vision software code 106 (e.g., Intel processor). The random access memory 114 could include any suitable, high-speed, random access memory typical of most modern computers, such as dynamic RAM (DRAM), etc.
(64) Having thus described the system and method in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present disclosure described herein are merely exemplary and that a person skilled in the art can make any variations and modifications without departing from the spirit and scope of the disclosure. All such variations and modifications, including those discussed above, are intended to be included within the scope of the disclosure.