CALCULATOR AND MATRIX FACTORIZATION METHOD
20170262411 · 2017-09-14
Assignee
Inventors
Cpc classification
G06F17/16
PHYSICS
International classification
Abstract
For each process unit, a calculator allocates, in a storing unit, a first storage area with a storage capacity corresponding to the total data volume obtained by adding the data volume of rows and columns included in the process unit to the data volume of a predetermined count of rows and columns. Using the first storage area, the calculator performs first factorization on the rows and columns of each target process unit and rows and columns transferred from process units having already undergone factorization processing. The calculator allocates, in the storing unit, a second storage area when there are, among rows and columns determined to be transferred as a result of the first factorization, those that do not fit in and are left out of the first storage area. The calculator performs second factorization on the left-out rows and columns using the second storage area.
Claims
1. A calculator comprising: a memory configured to store, as results of factorization of a matrix, a plurality of matrices including a lower triangular matrix and an upper triangular matrix, the matrix having nonzero elements arranged symmetrically along a diagonal; and a processor configured to perform a procedure including: dividing, based on an arrangement of the nonzero elements, the matrix into a plurality of process units for factorization processing, each of which is a set of rows and columns sharing successive diagonal elements, and allocating, in the memory, a first storage area with a storage capacity with respect to each of the plurality of process units, the storage capacity of the first storage area corresponding to a total data volume obtained by adding a data volume of the set of rows and columns of said each process unit to a data volume of a predetermined count of rows and columns, selecting each of the plurality of process units as a target process unit in a predetermined order, and performing, with use of the first storage area, a first factorization process on the set of rows and columns of the target process unit and rows and columns transferred to the target process unit, the rows and columns being transferred from process units having already undergone the factorization processing because values of diagonal elements shared by the rows and columns are individually less than or equal to a predetermined value, allocating, in the memory, a second storage area with a storage capacity when there are, among rows and columns determined to be transferred as a result of the first factorization process because values of diagonal elements shared by the rows and columns are individually less than Or equal to a predetermined value, rows and columns that do not fit in and are therefore left out of the first storage area, the storage capacity of the second storage area corresponding to a data volume of the rows and columns left out of the first storage area, and performing a second factorization process on the rows and columns left out of the first storage area with use of the second storage area.
2. The calculator according to claim 1, wherein: the performing the first factorization process and the performing the second factorization process include transferring, when i.sup.th row and column (i is an integer greater than or equal to 1) are transfer targets, elements of the i.sup.th column having row numbers equal to or greater than i to a position following the columns of the target process unit and transferring elements of the i.sup.th row having column numbers greater than i to a position following the rows of the target process unit.
3. The calculator according to claim 1, wherein: the procedure further includes determining parent-child relationships among the plurality of process units according to a predetermined rule, and the performing the first factorization process includes transferring, to positions following the rows and columns of the target process unit, rows and columns having been unable to pass through the factorization processing of a process unit corresponding to a child of the target process unit amongst process units having already undergone the factorization processing, and then performing the first factorization process, the rows and columns having been unable to pass through the factorization processing because values of diagonal elements shared by the rows and columns are individually less than or equal to a predetermined value.
4. A matrix factorization method comprising: dividing, by a processor connected to a memory, based on an arrangement of nonzero elements of a matrix, the matrix into a plurality of process units for factorization processing, each of which is a set of rows and columns sharing successive diagonal elements, and allocating, in the memory, a first storage area with a storage capacity with respect to each of the plurality of process units, the storage capacity of the first storage area corresponding to a total data volume obtained by adding a data volume of the set of rows and columns of said each process unit to a data volume of a predetermined count of rows and columns, the matrix having the nonzero elements arranged symmetrically along a diagonal, the memory being configured to store a plurality of matrices including a lower triangular matrix and an upper triangular matrix as results of factoriztation of the matrix; selecting, by the processor, each of the plurality of process units as a target process unit in a predetermined order, and performing, with use of the first storage area, a first factorization process on the set of rows and columns of the target process unit and rows and columns transferred to the target process unit, the rows and columns being transferred from process units having already undergone the factorization processing because values of diagonal elements shared by the rows and columns are individually less than or equal to a predetermined value; allocating, by the processor, in the memory, a second storage area with a storage capacity when there are, among rows and columns determined to be transferred as a result of the first factorization process because values of diagonal elements shared by the rows and columns are individually less than or equal to a predetermined value, rows and columns that do not fit in and are therefore left out of the first storage area, the storage capacity of the second storage area corresponding to a data volume of the rows and columns left out of the first storage area; and performing, by the processor, a second factorization process on the rows and columns left out of the first storage area with use of the second storage area.
5. A non-transitory computer-readable storage medium storing a computer program that causes a computer including a memory to perform a procedure comprising: dividing based on an arrangement of nonzero elements of a matrix, the matrix into a plurality of process units for factorization processing, each of which is a set of rows and columns sharing successive diagonal elements, and allocating, in the memory, a first storage area with a storage capacity with respect to each of the plurality of process units, the storage capacity of the first storage area corresponding to a total data volume obtained by adding a data volume of the set of rows and columns of said each process unit to a data volume of a predetermined count of rows and columns, the matrix having the nonzero elements arranged symmetrically along a diagonal, the memory being configured to store a plurality of matrices including a lower triangular matrix and an upper triangular matrix as results of factorization of the matrix; selecting each of the plurality of process units as a target process unit in a predetermined order, and performing, with use of the first storage area, a first factorization process on the set of rows and columns of the target process unit and rows and columns transferred to the target process unit, the rows and columns being transferred from process units having already undergone the factorization processing because values of diagonal elements shared by the rows and columns are individually less than or equal to a predetermined value; allocating in the memory, a second storage area with a storage capacity when there are, among rows and columns determined to be transferred as a result of the first factorization process because values of diagonal elements shared by the rows and columns are individually less than or equal to a predetermined value, rows and columns that do not fit in and are therefore left out of the first storage area, the storage capacity of the second storage area corresponding to a data volume of the rows and columns left out of the first storage area; and performing a second factorization process on the rows and columns left out of the first storage area with use of the second storage area.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0012]
[0013]
[0014]
[0015]
[0016]
[0017]
[0018]
[0019]
[0020]
[0021]
[0022]
[0023]
[0024]
[0025]
[0026]
[0027]
[0028]
[0029]
[0030]
[0031]
[0032]
[0033]
[0034]
[0035]
[0036]
[0037]
[0038]
[0039]
[0040]
[0041]
[0042]
[0043]
[0044]
[0045]
[0046]
DESCRIPTION OF EMBODIMENTS
[0047] Several embodiments will be described below with reference to the accompanying drawings, wherein like reference numerals refer to like elements throughout. Note that two or more of the embodiments below may be combined for implementation in such a way that no contradiction arises.
(a) First Embodiment
[0048]
[0049] The storing unit 11 stores therein results of the factorization of the matrix 1. The matrix 1 is divided into a plurality of process units 2 to 6. Each of the process units 2 to 6 is a set of columns with identical nonzero-element arrangement and rows sharing diagonal elements with the columns. Each of the process units 2 to 6 undergoes a factorization process.
[0050] The parent-child relationship determining unit 12 determines parent-child relationships among the process units 2 to 6 according to predetermined rules. For example, in the case where the values of elements in the process unit 4 are updated as a result of the factorization process of the process unit 3, the process unit 3 is the child and the process unit 4 is the parent in the parent-child relationship of these two process units 3 and 4.
[0051] The first storage area allocating unit 13 divides the matrix 1 into a plurality of process units 2 to 6 based on the arrangement of the nonzero elements. In this regard, a set of rows and columns sharing successive diagonal elements is referred to as a process unit for factorization. For example, the first storage area allocating unit 13 treats a set of rows and columns with an identical or nearly identical nonzero-element arrangement as a factorization process unit. Such factorization process units are referred to as supernodes in the context of the supernodal approach. Then, with respect to each of the process units, the first storage area allocating unit 13 allocates, in the storing unit 11, a first storage area 11a with a storage capacity corresponding to the total data volume obtained by adding the data volume of rows and columns included in the process unit to the data volume of a predetermined count of rows and columns (the hatched area in
[0052] The first factoring unit 14 selects each of the process units 2 to 6 as a target process unit in a predetermined order. Then, using the first storage area 11a, the first factoring unit 14 performs the factorization process on the rows and columns of the target process unit as well as rows and columns transferred from a process unit having already undergone the factorization process because the values of their diagonal elements are less than or equal to a predetermined value. For example, the first factoring unit identifies, amongst process units having already undergone their factorization processes, a child process unit of the target process unit, and moves rows and columns that were excluded from the factorization process of the child process unit due to the value of each of their diagonal elements being less than or equal to a predetermined value to positions following the rows and columns, respectively, of the target process unit. Such a row and column transfer technique is known as delayed pivoting. The delayed pivoting allowing not only row permutations but also column permutations avoids the nonzero-element arrangement of the matrix 1 from losing its symmetric property and, therefore, prevents increased memory usage.
[0053] Note that only rows and columns enough to fit in the first storage area 11a are transferred to the first storage area 11a in delayed pivoting. If there are so many rows and columns to be moved to the target process unit in delayed pivoting that some rows and columns would be left out of the first storage area 11a of the target process unit, the left-out rows and columns are transferred after allocation of a second storage area 11b. Therefore, the first factoring unit 14 performs the factorization process only on the rows and columns allowed to be stored in the first storage area 11a.
[0054] The second storage area allocating unit 15 determines whether all the following fits in the first storage area 11a: the rows and columns transferred from the preceding process unit having already undergone the factorization process; and rows and columns determined to be moved as a result of the factorization process of the target process unit due to the value of each of their diagonal elements being less than or equal to the predetermined value. Then, if there are rows and columns that do not fit in the first storage area 11a, the second storage area allocating unit 15 allocates, in the storing unit 11, the second storage area 11b with a storage capacity corresponding to the data volume of the rows and columns left out of the first storage area 11a. Using the second storage area 11b, the second factoring unit 16 performs the factorization process on the rows and columns left out of the first storage area 11a.
[0055] According to the above-described calculator 10, in factoring the matrix 1, the first storage area 11a with extra spaces is allocated for each of the process units 2 to 6. Then, in the factorization process of each of the process units 2 to 6, the second storage area 11b is allocated only if transfer of rows and columns whose data volume exceeds the capacity of the first storage area 11a has taken place due to delayed pivoting. Therefore, in the case where all rows and columns transferred in association with delayed pivoting fit in the extra spaces of the first storage area 11a, there is no need to allocate the second storage area 11b, thus achieving efficient memory management.
[0056] In addition, as for the allocation of the second storage area 11b, the second storage area 11b has a collective storage capacity that accommodates, amongst a number of rows and columns to be transferred due to delayed pivoting, those not fitting into and left out of the first storage area 11a. That is, the additional storage area allocating process needs to be carried out only once for one process unit. Therefore, compared to the case of allocating storage area each time row and column transfer due to delayed pivoting takes place, more efficient memory management is achieved.
[0057] In the case where a row-and-column transfer target is the i.sup.th row and column (i is an integer greater than or equal to 1; the i.sup.th row and column are hereinafter denoted by “row and column i”), the first factoring unit 14 and the second factoring unit 16 transfer, amongst all elements in column i, those with row numbers equal to and greater than i to a position following the columns belonging to the target process unit. In addition, the first factoring unit 14 and the second factoring unit 16 transfer, amongst all elements in row i, those with column numbers greater than i to a position following the rows belonging to the target process unit. That is, amongst the elements of row and column i of the matrix 1, only those with both row and column numbers are equal to and greater than i are transferred. This reduces storage capacity requirements at the transfer destination.
[0058] Note that the parent-child relationship determining unit 12, the first storage area allocating unit 13, the first factoring unit 14, the second storage area allocating unit 15; and the second factoring unit 16 may be implemented, for example, by a processor of the calculator 10. The storing unit 11 may be implemented, for example, by memory of the calculator 10 or a storage device. In
(b) Second Embodiment
[0059] A second embodiment is described next. The second embodiment is directed to a computer for efficiently calculating the solution of a set of simultaneous linear equations represented by a sparse matrix in running simulations or solving mathematical programming problems based on mathematical models described by, for example, partial differential equations.
[0060] <Hardware Configuration>
[0061] A hardware configuration of the computer according the second embodiment is described first.
[0062] The memory 102 is used as a main memory device of the computer 100. The memory 102 temporarily stores at least part of an operating system (OS) program and application programs to be executed by the processor 101. The memory 102 also stores therein various types of data to be used by the processor 101 for its processing. As the memory 102, a volatile semiconductor storage device such as random access memory (RAM) may be used.
[0063] The peripherals connected to the bus 109 include a storage device 103, a graphics process unit 104, an input interface 105, an optical drive unit 106, a device connection interface 107, and a network interface 108. The storage device 103 electrically or magnetically writes and reads data to and from a built-in memory medium, and is used as a secondary storage device of the computer 100. The storage device 103 stores therein the OS program, application programs, and various types of data. A hard disk drive (HDD) or solid state drive (SSD), for example, may be used as the storage device 103.
[0064] To the graphics process unit 104, a monitor 21 is connected. According to an instruction from the processor 101, the graphics process unit 104 displays an image on a screen of the monitor 21. A cathode ray tube (CRT) display or a liquid crystal display, for example, may be used as the monitor 21. To the input interface 105, a keyboard 22 and a mouse 23 are connected. The input interface 105 transmits signals sent from the keyboard 22 and the mouse 23 to the processor 101. Note that the mouse 23 is just an example of pointing devices, and a different pointing device such as a touch panel, a tablet, a touch-pad, and a track ball, may be used instead.
[0065] The optical drive unit 106 reads data recorded on an optical disk 24 using, for example, laser light. The optical disk 24 is a portable storage medium on which data is recorded in such a manner as to be read by reflection of light. Examples of the optical disk 24 include a digital versatile disc (DVD), a DVD-RAM, a compact disk read only memory (CD-ROM), a CD recordable (CD-R), and a CD-rewritable (CD-RW).
[0066] The device connection interface 107 is a communication interface for connecting peripherals to the computer 100. To the device connection interface 107, for example, a memory device 25 and a memory reader/writer 26 may be connected. The memory device 25 is a storage medium having a function for communicating with the device connection interface 107. The memory reader/writer 26 is a device for writing and reading data to and from a memory card 27 which is a card-type storage medium. The network interface 108 is connected to a network 20. Via the network 20, the network interface 108 transmits and receives data to and from other computers and communication devices.
[0067] The hardware configuration described above achieves the processing functions of the second embodiment. Note that the calculator of the first embodiment may also be built with the same hardware configuration as the computer 100 of
[0068] <Overview of Matrix Factorization Process>
[0069] The computer 100 of
[0070] The computer 100 calculates the size of a panel or panels for storing results obtained by factoring each supernode. Then, the computer 100 uses a memory area assigned to the supernode according to the calculated panel size to thereby perform numerical factorization. In order to enhance the stability of the factorization to thereby improve its accuracy, a pivot is selected from a diagonal block (a region including one or more diagonal elements) corresponding to the supernode. However, if there is no acceptable pivot, the computer 100 implements delayed pivoting. In implementing delayed pivoting in the supernodal approach, the computer 100 establishes, as a panel/panels for storing the results of the factorization of the supernode, a storage area with extra space added in view of row and column transfer. The count of columns/rows representing the size of the extra space to be added is designated to the computer 100 prior to the symbolic factorization. The computer 100 calculates the size of the panel/panels each with the added space based on the information obtained from the symbolic factorization, and assigns the panel/panels to the supernode before the numerical factorization. The supernode to which the panel/panels are assigned at this point in time is here referred to as the “primary supernode”.
[0071] In the case where the added space is not enough and the storage capacity has run out, the computer 100 creates a secondary supernode in the transfer-destination supernode, and assigns a storage panel/panels to the secondary supernode and transfers pivots thereto (i.e., the pivots are delayed). The panel/panels assigned by the computer 100 to the secondary supernode have a size enough to accommodate delayed pivots left out of the added space as well as delayed pivots newly produced by the LU or LDL.sup.T factorization of the transfer-destination supernode.
[0072] The computer 100 assigns a memory pool with designated size used to assign the panel/panels for the secondary supernode at the same time as assigning the panel/panels for the primary supernode, and dynamically allocates and assigns the panel/panels to the secondary supernode from the memory pool. Thus, in order to incorporate delayed pivoting into the supernodal approach, a space is added to each panel of the supernode, and when the space for delayed pivoting runs out, a secondary supernode is created for delayed pivoting.
[0073] <Delayed Pivoting>
[0074] Delayed pivoting is described next in detail. In performing LU factorization of a matrix using a direct method to solve simultaneous linear equations, it is sometimes the case that the factorization is no longer possible if a diagonal element being the pivot takes a small or zero value. In order to avoid this problem, a technique known as delayed pivoting is employed.
[0075] For example, assume here that a matrix P is a permutation matrix, and an element in row a and column b (a and b are integers greater than or equal to 1) of the matrix P is denoted by p.sub.a,b. Assume also that the matrix P is an orthogonal matrix where p.sub.i,i=p.sub.i+1,i+1=0, p.sub.i+1,i=p.sub.i,i+1=1, and p.sub.j,j=1 (j is different from i and i+1) and the remaining elements are all 0. When a factorization-target matrix A is multiplied with the matrix P from the left, row i and row i+1 are permuted with each other. When the matrix A is multiplied with the matrix P from the right, column i and column i+1 are permuted with each other. The permutation operation is expressed as:
B=PAP.sup.T
where B is the matrix after the permutations.
[0076] This operation corresponds to symmetrically permuting row and column i with row and column i+1. When similar permutation operations are applied successively to (i, i+1), (i+1, i+2), . . . , and (j−1, j), row and column i are permuted to row and column j while permuting i+1, . . . , and j to i, . . . , and j−1, respectively. This corresponds to sequentially multiplying the matrix A with P.sub.i, p.sub.i+1, . . . , and P.sub.j−1 from both sides. Note however that P.sub.i is an orthogonal matrix used to permute row i with row i+1 when being multiplied with the matrix A from the left.
[0077] Continuing factorization with such permutations implements update of the element (j, j) during the subsequent factorization of each row and column, and it is expected that the absolute value of this element would become large enough to meet the requirement to be a pivot. If the element fails to take a sufficiently large value in the factorizations after the permutations, the element is again moved to a later position, and this operation is repeated until the element becomes sufficient to be a pivot. Thus, delayed pivoting is a process which transfers a row and column including a pivot candidate to a later position by symmetric permutations.
[0078] In the case of permuting row and column i with row and column i+1, an element a.sub.i,i in row and column i is permuted with an element a.sub.i+1,i+1 in row and column i+1. Then, the element is updated as follows by LU factorization of row and column i and an outer product update: a.sub.i+1,i+1=a.sub.i+1,i+1−a.sub.i+1,i×(a.sub.i,i+1/a.sub.i,i). By the =a.sub.i+1,i+1 permutations and update, the absolute value of the element a.sub.i+1,i+1 is expected to take a sufficiently large value to be a pivot.
[0079] <Application of Delayed Pivoting to Supernodal Approach>
[0080] Let us consider the case of implementing delayed pivoting during matrix factorization using a supernodal approach, which is a direct method to solve simultaneous linear equations with a sparse matrix. In the supernodal approach, pivots are selected in the diagonal block of each supernode. Therefore, by moving target rows and columns to the end of its parent supernode in dependencies, suitable pivots are selected after appropriate reordering. As a result of transferring the rows and columns in delayed pivoting to the end of its parent supernode, the transferred rows and columns undergo updates based on results of the LU factorization of the parent supernode.
[0081] In the direct method used to solve simultaneous linear equations with a sparse matrix, fill-in and dependencies induced by factoring the sparse matrix are determined before the factorization. This step is called symbolic factorization. Each column composed of nonzero elements whose row numbers are larger than the row number of its diagonal element is referred to as a “node”. Then, using such information, adjacent columns with nearly identical nonzero patterns are grouped together to form each supernode.
[0082]
[0083] The computer 100 performs symbolic factorization to determine the size of a storage area for the results of the LU factorization of each supernode. Then, prior to the numerical factorization, the computer 100 assigns a storage area to the supernode according to the determined size. Subsequently, the numerical factorization is performed. In order to provide stability of the numerical factorization, the computer 100 implements delayed pivoting.
[0084] <Issues Associated with Application of Delayed Pivoting to Supernodal Approach>
[0085] For general matrices, an algorithm has been developed which achieves column permutations in such a manner that elements with large values are aligned on the diagonal. The algorithm converts a matrix to be almost diagonally dominant prior to the LU factorization. There is a technique known as threshold pivoting, which accepts an element as a pivot if the absolute value of the element exceeds a certain threshold. When the threshold pivoting strategy is applied to a matrix being diagonally dominant, it is often the case that, in pivot search, a pivot is found in the vicinity. However, application of this preprocessing to a structurally symmetric matrix or symmetric indefinite matrix undermines the symmetry property of the original matrix, which therefore makes the use of the original symmetry impossible. For this reason, delayed pivoting is regarded as a key technique to provide stability while maintaining the symmetry of the original matrix. The term “structurally symmetric” here refers to nonzero elements being symmetric along the diagonal.
[0086] In the supernodal approach for a general real matrix, pivot search is performed within each supernode. If the pivot search is not successful, static pivoting is employed, which replaces a pivot by a numerical value of an appropriate magnitude (a magnitude of about 1.0D-8 to 1.0D-10 in the case of double precision real numbers, i.e., the precision being reduced by approximately 50%), to compute an approximate factorization. Because the accuracy of the factorization is reduced in the process, the solution is improved through iterative refinement with higher precision (for example, using quadruple-precision numbers in the case of double precision real numbers). One approach to computing LU factorization of a general real matrix A is to build an elimination tree associated with A+A.sup.T containing the matrix A. Incorporating delayed pivoting not undermining the symmetry property of the matrix into this approach is expected to produce positive effects. That is, better performance is expected with a fewer number of refinement iterations without causing accuracy loss in the factorization.
[0087] Application of delayed pivoting to a direct method for solving simultaneous linear equations with a sparse matrix causes dynamic changes in matrix elements constituting each supernode, and moving rows and columns to later positions involves an increase in the area for storing the elements of these rows and columns. Further, the exact increment in the storage area is known only when delayed pivoting is actually implemented. Therefore, the size of the storage area needs to be dynamically calculated, and then allocation or release of storage space is made according to the calculated results. In fact, in a multifrontal method, intermediate results of the factorization are temporarily stored in a dynamically allocated area, and columns and rows to be factored are assembled for each step in the factorization while taking a closer look at dependencies. Therefore, the multifrontal method often needs a large amount of memory and has high computation complexity.
[0088] In the supernodal approach, symbolic factorization is performed before numerical factorization to analyze dependencies among nodes, which are expressed by a tree structure. The tree structure is used in the subsequent numerical factorization. Then, the symbolic factorization computes the size of an area for storing results of the numerical factorization, and assigns the calculated storage area before the numerical factorization takes place. However, application of delayed pivoting involves dynamic changes in the storage area usage, which results in dynamic management of almost all the memory and, therefore, very high computational burden.
[0089] <Dependencies Among Supernodes>
[0090] The second embodiment aims at a viable processing scheme used in running simulations or solving mathematical programming problems to preliminarily assign a storage area by making the best use of information obtained from symbolic factorization and then analyze a tree structure representing dependencies. Before functions of the second embodiment are described, an overview of an elimination tree and row subtrees used to determine dependencies among supernodes will be given next.
[0091] The LL.sup.T factorization of a sparse symmetric positive matrix uses an elimination tree and row subtrees. Note that the use of an elimination tree and row subtrees is applicable also to structurally symmetric real matrices and symmetric indefinite matrices. Here, the elimination tree and row subtrees used in the LL.sup.T factorization of a sparse symmetric positive matrix are described.
[0092] An elimination tree representing dependencies among individual columns of a matrix L obtained by factorization is determined from the nonzero pattern of a sparse symmetric positive matrix P. The matrix P is factored as P=LL.sup.T using Cholesky factorization. When an element in row i and column j of the matrix L is denoted by L.sub.ij, the parent of j satisfies the following relation: parent[j]=min{i|i>j, L.sub.ij≈0}.
[0093] The top-most node having no parent in the elimination tree is referred to as the “root” node. When p is the parent of a node q, the node q is referred to as a child node of the parent p. Numbers obtained through a depth-first search from the root node of the elimination tree are called “postorder” numbers. The depth-first search from the root node arrives at a node not partitioned any further in the elimination tree, which is referred to as “leaf” node. Each leaf node is associated with individual nodes visited when the elimination tree is traversed from the leaf node up to the root node through parent nodes. When a plurality of leaf nodes are associated with a given node, one assigned the smallest postorder number among the leaf nodes is referred to as the “first descendant” of the node.
[0094] When column j in the lower triangular matrix of the original symmetric positive matrix is denoted by b.sub.j, the nonzero pattern of the matrix L is the union of b.sub.j and a column l.sub.k of L, which is a child node of j. Based on this concept, nonzero elements in row i of L are represented as a subtree of the elimination tree. This subtree is a row subtree having the i.sup.th node as its root node.
[0095] Next described is a specific example of an elimination tree and row subtrees obtained as a result of the LL.sup.T factorization of a symmetric positive matrix, with reference to
[0096] In the matrix before the factorization, nonzero elements are denoted by black circles. When the LL.sup.T factorization of the matrix is performed, new nonzero elements are added to the lower triangular matrix L after the factorization. These new nonzero elements created by the factorization are denoted in
[0097]
[0098] The order in which the depth-first search visits the nodes of the elimination tree 91 is as follows: 2.fwdarw.3.fwdarw.1.fwdarw.6.fwdarw.7.fwdarw.4.fwdarw.5.fwdarw.8.fwdarw.9.fwdarw.10.fwdarw.11. Then, postorder numbers are assigned to the individual nodes according to the depth-first search order. Referring back to
[0099] In like fashion, a row subtree 93 with Node 11 is formed. Nonzero elements, except for the diagonal element, are located in {1, 5, 8} on the row with row number “11” in the matrix before the factorization. When these nonzero elements are taken out in a postorder sequence, the order stays the same, i.e., 1.fwdarw.5.fwdarw.8. The first descendants of Nodes 1, 5, 8 are Nodes 1, 4, 2, respectively. Node 1 is first and therefore a leaf node. The postorder number of Node 4, which is the first descendant of Node 5, is larger than the postorder number of Node 1, and therefore Node 4 is located on a different branch below a branch node (a depth-first search first visits leaf nodes before branch nodes). Because the postorder number of Node 2, which is the first descendant of Node 8, is smaller than the postorder number of Node 5, Node 2 is located on the original branch below the branch node, and there is no branching between Nodes 5 and 8.
[0100] Thus, the nonzero elements of each row in the matrix before the factorization are taken out in the postorder sequence, and the postorder number of a node taken out the last time is compared with the postorder number of the first descendant of a node currently taken out. If the postorder number of the first descendant of the current node is larger, the current node is a leaf node of the row subtree. That is, because the postorder numbers have been assigned according to the depth-first search order, the postorder number of the first descendant of the current node becomes larger when branching takes place at a common ancestor of the current node and the previous node. The same results will be obtained by taking out nonzero elements of each row in the postorder sequence, then remembering the previous leaf node instead of the previous node, and making an update when a new leaf node is found.
[0101] The row subtree indicates nonzero elements of each row. In the case of a real matrix with nonzero elements being structurally symmetric, the row subtree also indicates nonzero elements of a column in the matrix U, located symmetrically to each row in the matrix L. It is known that, within the matrix L, columns used to update a column corresponding to each node in the postorder sequence (as well as, within the matrix U, rows used to update a row corresponding to each node in the postorder sequence) are those corresponding to nodes included in the row subtree of the node.
[0102] Such is the case with supernodes, each consisting of nodes corresponding to a contiguous set of columns of the matrix L with nearly identical nonzero patterns. That is, when each supernode being a set of a plurality of nodes is viewed as a single node, tree structures representing dependencies among supernodes are generated by the same procedures described in
[0103] <Functions of Computer>
[0104]
[0105] The analyzing unit 120 executes various analytical processing involving the solution to simultaneous linear equations. For example, the analyzing unit 120 runs simulation for fluid analyses, for example. In the analytical processing, the analyzing unit 120 performs LU or LDL.sup.T factorization of a sparse matrix in order to solve simultaneous linear equations.
[0106] Note that the function of each component illustrated in
[0107] <LU Factorization of Structurally Symmetric Sparse Real Matrix>
[0108] Next described is LU factorization of a structurally symmetric sparse real matrix.
[0109] [Step S101] The analyzing unit 120 receives an input designating the size of a space added to each of the column and row panels. For example, the analyzing unit 120 acquires a value indicating the size of the space, input by the user via an input device, e.g., the keyboard 22.
[0110] [Step S102] The analyzing unit 120 performs symbolic factorization. The symbolic factorization achieves generation of an elimination tree, detection of supernodes, and generation of supernodal row subtrees.
[0111] [Step S103] For each of the column panel and the row panel, the analyzing unit 120 calculates the size of the panel with the added space.
[0112] [Step S104] The analyzing unit 120 allocates, in the memory 102, a memory pool used to assign a column-panel area, a row-panel area, and a secondary supernode area.
[0113] [Step S105] The analyzing unit 120 performs LU factorization. In the LU factorization, the analyzing unit 120 uses the spaces prepared for transferring pivots to be delayed (hereinafter referred to as “delayed pivots” for the sake of simplicity). In the case where the count of delayed pivots to be transferred exceeds the size of the spaces, the analyzing unit 120 creates a secondary supernode and stores rows and columns corresponding to delayed pivots being left out.
[0114] The LU factorization is achieved by the above-described procedure. Next, effective use of the memory 102 in the LU factorization is described in detail. In the case of performing LU factorization of a structurally symmetric sparse real matrix, elements of each supernode are stored in a storage area called panels.
[0115] The analyzing unit 120 adds spaces 33 for delayed pivots to the size of each supernode obtained by the symbolic factorization. Specifically, the analyzing unit 120 provides the space 33 for each of a column panel for collectively storing columns and a row panel for collectively storing rows. The panels are two-dimensional, and each of the spaces 33 is provided both in the first and second dimensions.
[0116] The maximum count of delayed pivots allowed to be transferred to the storage panels of the primary supernode area 31 is denoted by nsp1. The maximum count associated with the second dimensional size of the storage panels of the secondary supernode area 32 to be dynamically allocated when the spaces of nsp1 run out is denoted by nsp2. The size to be dynamically allocated corresponds to the count (nmpssp) obtained by subtracting nsp1 from the count of delayed pivots to be transferred to the back of the primary supernode area 31. That is, when nmpssp is smaller than nsp2, the processing is able to be continued. Note that, within the secondary supernode area 32, the analyzing unit 120 allocates spaces of nsp3=nsp1+nsp2 in the first dimension.
[0117] Information indicating the panel configurations illustrated in
[0118] Next, the area management information 111 and the matrix factorization workspace 112 of the storing unit 110 are described in detail with reference to
[0119] The supernode-based management information 111b, 111c, and so on individually includes primary supernode information 111-1 and secondary supernode information 111-2. The primary supernode information 111-1 includes ipscp, ipsrp, ipslpindx, ipsupindx, ipsrex, ipscex, ndb, nbboff, nmppsp, and npppsp. ipscp is an index indicating the column panel of the primary supernode. ipsrp is an index indicating the row panel of the primary supernode. ipslpindx is an index indicating an index list of the column panel of the primary supernode. ipsupindx is an index indicating an index list of the row panel of the primary supernode. ipsrex is the history of row exchanges of the primary supernode. ipscex is the history of column exchanges of the primary supernode. ndb is the size of the supernode determined in the symbolic factorization. nbboff is the size of the supernode determined in the symbolic factorization, excluding the diagonal elements. nmppsp is the count of delayed pivots transferred to the primary supernode. npppsp is the count of nodes remained in the primary supernode as pivots.
[0120] The secondary supernode information 111-2 includes isscp, issrp, isslpindx, issupindx, issrex, isscex, nmpssp, and nppssp. isscp is an index indicating the column panel of the secondary supernode. issrp is an index indicating the row panel of the secondary supernode. isslpindx is an index indicating an index list of the column panel of the secondary supernode. issupindx is an index indicating an index list of the row panel of the secondary supernode. issrex is the history of row exchanges of the secondary supernode. isscex is the history of column exchanges of the secondary supernode. nmpssp is the count of delayed pivots transferred to the secondary supernode. nppssp is the count of nodes remained in the secondary supernode as pivots.
[0121]
[0122] When not all delayed pivots to be transferred to a supernode fit in the primary supernode, a secondary supernode is generated. Also for the generated secondary supernode, the areas illustrated in
[0123] Data on the secondary supernode is stored in a memory pool (a storage area reserved in the memory 102), separately from data on the primary supernode. The size of the entire memory pool is preliminarily designated by the user. The storage area for the secondary supernode is assigned from the preliminarily allocated memory pool. Each area of
[0124] It is possible to hold information on locations, or the like, of the individual data storage areas for the primary and secondary supernodes, for example, using two-dimensional arrays, in each of which the second dimension designates a supernode and the first dimension is used to store corresponding information.
[0125] Next described are supernodes in dependencies. According to the second embodiment, rows and columns corresponding to delayed pivots are transferred from a child supernode to a parent supernode having a dependency relationship with the child supernode.
[0126] Note here that the maximum count of delayed pivots allowed to be transferred to a supernode determined in symbolic factorization from its child supernode is nsp1+nsp2. In addition, if the count of delayed pivots to be transferred from the child supernode exceeds nsp1, a secondary supernode is generated and rows and columns corresponding to delayed pivots being left out are transferred to the secondary supernode. The size of the secondary supernode corresponds to the size obtained by adding the count of the delayed pivot being left out together with the count of delayed pivots produced in the primary supernode.
[0127] The validity of the space allocation in the above-described manner will be seen from the following. In the example of
[0128] Next, the transfer of delayed pivots is described in detail.
[0129] Next described are movements of delayed pivots containing a plurality of blocks.
[0130]
[0131] Now assume the case where the diagonal block b2 is determined to be a delayed pivot by the factorization of the primary supernode. In this case, the row and column sharing the diagonal block b2 are transferred to the back of the rows and column of the delayed pivot of the diagonal block c3.
[0132] <Specific Procedure for LU Factorization>
[0133] Let us consider the case of performing LU factorization of a structurally symmetric sparse real matrix. Nonzero elements in the factorization target matrix are arranged symmetrically along the main diagonal and, therefore, dependencies in the factorization of individual nodes are represented by an elimination tree. Also as for asymmetric real matrixes, it is possible to see dependencies in the factorization in view of a symmetric matrix where A⊂A+A.sup.T. Therefore, we provide a description here using a structurally symmetric real matrix.
[0134] The analyzing unit 120 forms each supernode by grouping nodes in parent-child relationships with nearly identical nonzero patterns. Then, the analyzing unit 120 generates an elimination tree representing dependencies among the supernodes, which is referred to as the supernodal elimination tree. Further, as in the case of an elimination tree, the analyzing unit 120 assigns postorder numbers to the individual supernodes according to the depth-first search order. Let us consider now a factored matrix L. Supernodes that are sets of columns and correspond to parts where nonzero elements are present form a supernodal row subtree, which is an equivalent of a row subtree.
[0135] The analyzing unit 120 uses row and column panels to store nonzero elements of nodes making up a supernode and nonzero elements of nodes with node numbers larger than those of the nodes making up the supernode. The row panel stores rows in a transposed manner. In order to obtain the numbers associated with the nonzero elements to be stored, one-dimensional array as an index list is provided for each of the row and column panels.
[0136] In the case of factoring each node with no consideration for delayed pivoting, node factorization is performed in the following order: [1] the analyzing unit 120 takes out each supernode in the postorder sequence and repeats the following steps [2] and [3]; [2] the analyzing unit 120 updates the values of elements in the selected supernode according to contributions from the supernodal row subtree; and [3] after the updates, the analyzing unit 120 performs LU factorization of the column panel of the supernode.
[0137] Note that each column panel of the matrix L includes diagonal blocks of the matrix U. The analyzing unit 120 updates the row panel of the updated matrix U using factorization results of the diagonal blocks, obtained from the LU factorization of the column panel of the matrix L.
[0138] On the other hand, the factorization process incorporating delayed pivoting is as follows. The analyzing unit 120 determines pivots in LU factorization of each supernode. Note that the analyzing unit 120 chooses pivots only from the diagonal block elements of the supernode. In the case of, for example, a diagonal pivoting strategy which selects the largest diagonal element of the supernode as a pivot, no appropriate pivot is available when the absolute value of such a diagonal element falls below a predetermined value or becomes zero. In such a case, using results obtained from rows and columns for which the LU factorization is completed, the analyzing unit 120 updates the remaining rows and columns for which no pivots are available. Subsequently, the analyzing unit 120 performs symmetric permutations to transfer the rows and columns with no pivots found to the back of the parent supernode of the supernode currently being factored. Note that the analyzing unit 120 preliminarily provides spaces for the transfer to the parent supernode.
[0139] As a result of selecting a pivot within diagonal blocks of a supernode and permuting rows and columns in the above-described manner, switching of row and column indices takes place with the permuted rows and columns. Therefore, the analyzing unit 120 reflects results of the row and column permutations to the indices. That is, when delayed pivots are transferred, the analyzing unit 120 also transfers the indices of rows and columns corresponding to the transferred pivots. As for an index list representing the nonzero pattern of the child supernode for a part consisting of diagonal blocks and elements below them within the column panel of the parent supernode, the index list at the time of the transfer indicates the same as the nonzero pattern of the parent supernode.
[0140] The analyzing unit 120 performs LU factorization of each supernode in the postorder sequence. In the LU factorization of the supernode, the analyzing unit 120 first updates the row and column panels. In this regard, the analyzing unit 120 calculates contributions from supernodes constituting the supernodal row subtree of the supernode, and updates the values of the elements. Note that the supernodes constituting the supernodal row subtree undergo changes in block width (the count of rows and columns) when delayed pivots are transferred; however, there is no change in the range of node numbers of nodes used to update the panels of the parent supernode.
[0141] In the case of a structurally symmetric real matrix, the analyzing unit 120 calculates updates of a factorization-target supernode in the following procedure.
[0142] [1] The analyzing unit 120 takes out each constituent supernode from the supernodal row subtree of the target supernode and repeats the following steps [1.1] and [1.2] until all the constituent supernodes are taken out. In this regard, the analyzing unit 120 treats a secondary supernode of each constituent supernode of the supernodal row subtree as a constituent of the supernodal row subtree.
[0143] [1.1] The analyzing unit 120 updates the column panel. Details of this step are presented in the column panel update procedure (a.1 to a.4) to be described below.
[0144] [1.2] The analyzing unit 120 updates the row panel, as with [1.1].
[0145] [2] The analyzing unit 120 transfers delayed pivots and performs LU factorization.
[0146] [2.1] The analyzing unit 120 calculates the total count of delayed pivots transferred from a child of the target supernode, “numdp”. Then, the analyzing unit 120 takes delayed pivots to fit in the spaces out of numdp and puts them into the spaces.
[0147] [2.2] The analyzing unit 120 performs LU factorization.
[0148] [2.3] If delayed pivots are produced by the LU factorization of the target supernode, the analyzing unit 120 updates the delayed pivots using results of the LU factorization. For example, rows and columns of nodes corresponding to the delayed pivots are transferred.
[0149] [2.4.1] The analyzing unit 120 allocates a secondary supernode sufficient to accommodate remaining pivots of numdp, “restdp”, which did not fit in the spaces, if any, and the delayed pivots produced in [2.3]. The analyzing unit 120 then transfers the appropriate delayed pivots to the secondary supernode. Subsequently, the analyzing unit 120 updates restdp using the results of the LU factorization of the target supernode.
[0150] [2.4.2] The analyzing unit 120 performs LU factorization of the secondary supernode. If delayed pivots are produced by the factorization, the analyzing unit 120 updates these delayed pivots using results of the factorization. For example, rows and columns of nodes corresponding to the delayed pivots are transferred.
[0151] The procedure for calculating updates of each supernode has been described thus far. Next, the column panel update procedure (a.1 to a.4) is described in detail.
[0152] [a.1] The analyzing unit 120 transposes rows of nodes belonging to the supernode, except for the diagonal blocks, to form a row panel. The analyzing unit 120 stores the row panel in a two-dimensional array, rpanel(nr1, nr2).
[0153] [a.2] The analyzing unit 120 copies, amongst columns stored in the row panel, columns with column numbers of the supernode to be updated to a working matrix B(nr1, len). len represents the count of columns copied. Within the column panel, the analyzing unit 120 finds the top (“ntop”) of nodes to be updated, identified by their node numbers. The column panel is a two-dimensional array, cpanel(nc1, nc2).
[0154] [a.3] The analyzing unit 120 calculates a matrix C(nc1−ntop+1, len)←cpanel(ntop-nc1, nc2)×B(nr1, len). Note here that nc2 is equal to nr1.
[0155] [a.4] The analyzing unit 120 updates each column of the matrix C by subtracting each element of the column from an element of the supernode, having the same row and column numbers as those of the element of the column to be updated.
[0156] As for updates of the row panel of a structurally symmetric real matrix, the calculation is made by swapping the column panel and the row panel.
[0157] Next, details of the process of implementing delayed pivoting with the allocation of extra spaces and performing LU factorization are described with reference to flowcharts of
[0158] [Step S111] The analyzing unit 120 sets “nporder” to 1. nporder is a number indicating a processing-target supernode. That is, supernodes included in a factorization-target matrix are assigned identification numbers in ascending sequence, starting with 1. Then, a supernode with an identification numbers indicated by nporder is the target of the current processing.
[0159] [Step S112] The analyzing unit 120 calls a subroutine called “rsupdate” and then executes the subroutine.
[0160]
[0161] [Step S113] The analyzing unit 120 calls a subroutine called “dpcount” and then executes the subroutine.
[0162]
[0163] [Step S114] The analyzing unit 120 calls a subroutine called “mvtonporder” and then executes the subroutine.
[0164]
[0165] [Step S115] The analyzing unit 120 calls a subroutine called “cpupdate” and then executes the subroutine. Subroutine cpupdate is a process for updating the column panel of the supernode with nporder, including therein delayed pivots transferred thereto.
[0166]
[0167] [Step S115a] The analyzing unit 120 performs LU factorization of the column panel of the supernode with nporder, including therein the delayed pivots transferred to the space.
[0168] [Step S115b] The analyzing unit 120 determines whether nodes to be delayed pivots are produced by the LU factorization. If nodes to be delayed pivots are produced, the process moves to step S115c. If no nodes to be delayed pivots are produced, the process of subroutine cpupdate ends.
[0169] [Step S115c] Based on the results of the LU factorization, the analyzing unit 120 updates the nodes to be delayed pivots. Subsequently, the analyzing unit 120 returns to the LU factorization process (
[0170] [Step S116] The analyzing unit 120 calls a subroutine called “rpupdate” and then executes the subroutine. Subroutine rpupdate is a process for updating the row panel of the supernode with nporder, including therein delayed pivots transferred thereto.
[0171]
[0172] [Step S116a] The analyzing unit 120 updates the row panel of the supernode with nporder using the results of the LU factorization of the column panel.
[0173] [Step S116b] The analyzing unit 120 determines whether nodes to be delayed pivots are produced by the LU factorization. If nodes to be delayed pivots are produced, the process moves to step S116c. If no nodes to be delayed pivots are produced, the process of subroutine rpupdate ends.
[0174] [Step S116c] Based on the results of the LU factorization, the analyzing unit 120 updates the nodes to be delayed pivots. Subsequently, the analyzing unit 120 returns to the LU factorization process (
[0175] [Step S117] The analyzing unit 120 determines whether there are delayed pivots that did not fit in the spaces of the supernode with nporder and are therefore left out. If there are left-out delayed pivots, the process moves to step S118. If there is no left-out delayed pivot, the process moves to step S121.
[0176] [Step S118] The analyzing unit 120 calls a subroutine called “createmv” and then executes the subroutine. Subroutine createmv is a process for creating a supernode for accommodating the left-out delayed pivots that did not fit in the supernode with nporder as well as the delayed pivots produced by the LU factorization of the supernode with nporder.
[0177]
[0178] [Step S118a] The analyzing unit 120 creates a secondary supernode.
[0179] [Step S118b] The analyzing unit 120 transfers the remaining ones of the delayed pivots to be transferred to the supernode with nporder to the first halves of the row and column panels of the secondary supernode. In addition, the analyzing unit 120 transfers delayed pivots included in the supernode with nporder to the second halves of the row and column panels of the secondary supernode. Subsequently, the analyzing unit 120 returns to the LU factorization process (
[0180] [Step S119] The analyzing unit 120 calls a subroutine called “cpupdatenew” and then executes the subroutine. Subroutine cpupdatenew is a process for updating, within the column panel, the left-out delayed pivots (i.e., the first half of the column panel) using the results of the factorization of the supernode with nporder and then performing LU factorization of the entire column panel. Further, if delayed pivots are newly produced, updates are made to nodes corresponding to the delayed pivots.
[0181]
[0182] [Step S119a] The analyzing unit 120 updates the first half of the column panel using the results of the LU factorization of the supernode with nporder.
[0183] [Step S119b] The analyzing unit 120 performs LU factorization of the entire column panel of the secondary supernode, including therein the delayed pivots transferred from the supernode with nporder (i.e., the second half of the column panel).
[0184] [Step S119c] The analyzing unit 120 determines whether nodes to be delayed pivots are produced by the LU factorization. If nodes to be delayed pivots are produced, the process moves to step S119d. If no nodes to be delayed pivots are produced, the process of subroutine cpupdatenew ends.
[0185] [Step S119d] Based on the results of the LU factorization of the secondary supernode, the analyzing unit 120 updates the nodes to be delayed pivots in the column panel. Subsequently, the analyzing unit 120 returns to the LU factorization process (
[0186] [Step S120] The analyzing unit 120 calls a subroutine called “rpupdatenew” and then executes the subroutine. Subroutine rpupdatenew is a process for updating the row panel of the secondary supernode using the results of the LU factorization of the supernode with nporder and then updating the row panel again using results of the LU factorization of the column panel of the secondary supernode. Further, if nodes to be delayed pivots are produced, updates are made to the nodes to the delayed pivots.
[0187]
[0188] [Step S120a] The analyzing unit 120 updates the row panel of the secondary supernode using the results of the LU factorization of the supernode with nporder.
[0189] [Step S120b] The analyzing unit 120 updates the row panel of the secondary supernode using the results of the LU factorization of the column panel of the secondary supernode.
[0190] [Step S120c] The analyzing unit 120 determines whether nodes to be delayed pivots are produced by the LU factorization. If nodes to be delayed pivots are produced, the process moves to step S120d. If no nodes to be delayed pivots are produced, the process of subroutine rpupdatenew ends.
[0191] [Step S120d] Based on results of the LU factorization of the secondary supernode, the analyzing unit 120 updates the nodes to be delayed pivots in the row panel of the secondary supernode. Subsequently, the analyzing unit 120 returns to the LU factorization process (
[0192] [Step S121] The analyzing unit 120 adds 1 to nporder.
[0193] [Step S122] The analyzing unit 120 determines whether the value of the current nporder is larger than the total count of supernodes. If the value of the current nporder exceeds the total count of supernodes, the LU factorization process ends. The value of the current nporder is less than or equal to the total count of supernodes, the process moves to step S112.
[0194] By repeating subroutine calls in the above-described procedure, LU factorization of a structurally symmetric matrix is performed efficiently using delayed pivoting.
[0195] <LDL.sup.T Factorization of Sparse Symmetric Indefinite Matrix>
[0196] Next described is LDL.sup.T Factorization of a sparse symmetric indefinite matrix. As for a symmetric indefinite matrix, only the column panel needs to be considered due to its symmetric structure.
[0197] [Step S201] The analyzing unit 120 receives an input designating the size of a space added to the column panel. For example, the analyzing unit 120 acquires a value indicating the size of the space, input by the user via an input device, e.g., the keyboard 22.
[0198] [Step S202] The analyzing unit 120 performs symbolic factorization. The symbolic factorization achieves generation of an elimination tree, detection of supernodes, and generation of supernodal row subtrees.
[0199] [Step S203] The analyzing unit 120 calculates the size of the column panel with the added space.
[0200] [Step S204] The analyzing unit 120 allocates, in the memory 102, a memory pool used to assign a column-panel area and a secondary supernode area.
[0201] [Step S205] The analyzing unit 120 performs LDL.sup.T factorization. In the LDL.sup.T factorization, the analyzing unit 120 uses the space prepared for transferring delayed pivots. In the case where the count of delayed pivots to be transferred exceed the size of the space, the analyzing unit 120 creates a secondary supernode and stores rows corresponding to the delayed pivots being left out.
[0202] The LDL.sup.T factorization is achieved by the above-described procedure. Next, effective use of the memory 102 in the LDL.sup.T factorization of a symmetric indefinite matrix is described in detail.
[0203]
[0204] <Specific Procedure for LDL.sup.T Factorization>
[0205] Also in a symmetric indefinite matrix, nonzero elements are arranged symmetrically along the main diagonal and, therefore, dependencies in the factorization of individual nodes are represented by an elimination tree. The analyzing unit 120 forms each supernode by grouping nodes in parent-child relationships in the elimination tree, with nearly identical nonzero patterns. For such supernodes, it is possible to construct an elimination tree representing dependencies among the supernodes, which is referred to as a supernodal elimination tree. As in the case of an elimination tree, the analyzing unit 120 assigns postorder numbers to the individual supernodes in the supernodal elimination tree. Let us consider now the factored lower triangular matrix L. Supernodes that are sets of columns and correspond to parts where nonzero elements are present form a supernodal row subtree, which is an equivalent of a row subtree.
[0206] The analyzing unit 120 performs factorization of each node in the following order: [1] the analyzing unit 120 takes out each supernode in the postorder sequence and repeats the following steps [2] and [3]; [2] the analyzing unit 120 updates the values of elements in the selected supernode according to contributions from the supernodal row subtree; and [3] after the updates, the analyzing unit 120 performs LDL.sup.T factorization of the column panel of the supernode.
[0207] In the case of a symmetric indefinite matrix, the analyzing unit 120 calculates updates of a factorization-target supernode in the following procedure.
[0208] [1] The analyzing unit 120 takes out each constituent supernode from the supernodal row subtree of the target supernode and repeats the following steps [1.1] and [1.2] until all the constituent supernodes are taken out. In this regard, the analyzing unit 120 treats a secondary supernode of each constituent supernode of the supernodal row subtree as a constituent of the supernodal row subtree.
[0209] [1.1] The analyzing unit 120 updates the column panel. Details of this step are presented in the column panel update procedure (a.1 to a.4) to be described below.
[0210] [1.2] The analyzing unit 120 updates the row panel, as with [1.1].
[0211] [2] The analyzing unit 120 transfers delayed pivots and performs LDL.sup.T factorization.
[0212] [2.1] The analyzing unit 120 calculates the total count of delayed pivots transferred from a child of the target supernode, “numdp”. Then, the analyzing unit 120 takes delayed pivots to fit in the space out of numdp and puts them in the space.
[0213] [2.2] The analyzing unit 120 performs LDL.sup.T factorization.
[0214] [2.3] If delayed pivots are produced by the LDL.sup.T factorization of the target supernode, the analyzing unit 120 updates these delayed pivots using results of the LDL.sup.T factorization.
[0215] [2.4.1] The analyzing unit 120 allocates a secondary supernode sufficient to accommodate restdp of numdp, which did not fit in the space, if any, and the delayed pivots produced in [2.3]. The analyzing unit 120 then transfers the appropriate delayed pivots to the secondary supernode. Subsequently, the analyzing unit 120 updates restdp using the results of the LDL.sup.T factorization of the target supernode.
[0216] [2.4.2] The analyzing unit 120 performs LDL.sup.T factorization of the secondary supernode. If delayed pivots are produced by the factorization, the analyzing unit 120 updates these delayed pivots using results of the factorization.
[0217] The procedure for calculating updates of each supernode has been described thus far. Next, the column panel update procedure (a.1 to a.4) is described in detail. The calculation for updating the column panel is performed in the following procedure.
[0218] [a.1] The analyzing unit 120 takes out each constituent supernode from the supernodal row subtree and repeats the following steps until all the constituent supernodes are taken out.
[0219] [a.2] Due to symmetric property, the transpose of the column panel(nc1, nc2) corresponds to a factored matrix L.sup.T, and is the row panel. The transpose of a set of rows is stored in the column panel. The analyzing unit 120 copies, amongst columns stored in the transposed column panel, columns with column numbers of the supernode to be updated to the working matrix B(nc2, len). len represents the count of columns copied. Within the column panel, the analyzing unit 120 finds the top (“ntop”) of nodes to be updated, identified by their node numbers. The column panel is a two-dimensional array, cpanel(nc1, nc2).
[0220] [a.3] The analyzing unit 120 calculates a matrix C(nc1−ntop+1, len)←cpanel(ntop-nc1, nc2)×D×B(nc2, len).
[0221] [a.4] The analyzing unit 120 updates each column of the matrix C by subtracting each element of the column from an element of the supernode, having the same row and column numbers as those of the element of the column to be updated.
[0222] At this point, delayed pivots are transferred and, then, LDL.sup.T factorization with 1-by-1 or 2-by-2 diagonal blocks is performed. If the space runs out by the transfer of the delayed pivots from its child supernode, the analyzing unit 120 creates a secondary supernode. The 1-by-1 or 2-by-2 diagonal blocks are denoted by D. The analyzing unit 120 stores the diagonal blocks in the diagonal and subdiagonal of the column panel.
[0223] In the factorization of a symmetric indefinite matrix, the analyzing unit 120 uses diagonal pivoting enabling each 1-by-1 or 2-by-2 submatrix. The matrix A is factored as follows, using the matrix P as a 1-by-1 or 2-by-2 sub-matrix. This is performed recursively.
[0224] By applying this strategy to the column panel, LDL.sup.T factorization of the column panel is performed. The matrix D is a 1-by-1 or 2-by-2 symmetric block diagonal matrix. When the diagonal element is 1 and D is a 2-by-2 block, the subdiagonal elements are 0.
[0225]
[0226] [Step S211] The analyzing unit 120 sets nporder to 1.
[0227] [Step S212] The analyzing unit 120 calls a subroutine called “symrsupdate” and then executes the subroutine.
[0228]
[0229] [Step S213] The analyzing unit 120 calls a subroutine called “symdpcount” and then executes the subroutine.
[0230]
[0231] [Step S214] The analyzing unit 120 calls a subroutine called “symmvtonporder” and then executes the subroutine.
[0232]
[0233] [Step S215] The analyzing unit 120 calls a subroutine called “symcpupdate” and then executes the subroutine. Subroutine cpupdate is a process for updating the column panel of the supernode with nporder, including therein delayed pivots transferred thereto.
[0234]
[0235] [Step S215a] The analyzing unit 120 performs LDL.sup.T factorization of the column panel of the supernode with nporder, including therein the delayed pivots transferred to the space.
[0236] [Step S215b] The analyzing unit 120 determines whether nodes to be delayed pivots are produced by the LDL.sup.T factorization. If nodes to be delayed pivots are produced, the process moves to step S215c. If no nodes to be delayed pivots are produced, the process of subroutine symcpupdate ends.
[0237] [Step S215c] Based on the results of the LDL.sup.T factorization, the analyzing unit 120 updates the nodes to be delayed pivots. Subsequently, the analyzing unit 120 returns to the LDL.sup.T factorization process (
[0238] [Step S216] The analyzing unit 120 determines whether there are delayed pivots that did not fit in the space of the supernode with nporder and are therefore left out. If there are left-out delayed pivots, the process moves to step S217. If there is no left-out delayed pivot, the process moves to step S219.
[0239] [Step S217] The analyzing unit 120 calls a subroutine called “symcreatemv” and then executes the subroutine. Subroutine symcreatemv is a process for creating a supernode for accommodating the left-out delayed pivots that did not fit in the supernode with nporder as well as the delayed pivots produced by the LDL.sup.T factorization of the supernode with nporder.
[0240]
[0241] [Step S217a] The analyzing unit 120 creates a secondary supernode.
[0242] [Step S217b] The analyzing unit 120 transfers the remaining ones of the delayed pivots to be transferred to the supernode with nporder to the first half of the column panel of the secondary supernode. In addition, the analyzing unit 120 transfers delayed pivots included in the supernode with nporder to the second half of the column panel of the secondary supernode. Subsequently, the analyzing unit 120 returns to the LDL.sup.T factorization process (
[0243] [Step S218] The analyzing unit 120 calls a subroutine called “symcpupdatenew” and then executes the subroutine. Subroutine symcpupdatenew is a process for updating, within the column panel, the left-out delayed pivots (i.e., the first half of the column panel) using the results of the factorization of the supernode with nporder and then performing LDL.sup.T factorization of the entire column panel. Further, if delayed pivots are newly produced, updates are made to nodes corresponding to the delayed pivots.
[0244]
[0245] [Step S218a] The analyzing unit 120 updates the first half of the column panel using the results of the LDL.sup.T factorization of the supernode with nporder.
[0246] [Step S218b] The analyzing unit 120 performs LDL.sup.T factorization of the entire column panel of the secondary supernode, including therein the delayed pivots transferred from the supernode with nporder (i.e., the second half of the column panel).
[0247] [Step S218c] The analyzing unit 120 determines whether nodes to be delayed pivots are produced by the LDL.sup.T factorization. If nodes to be delayed pivots are produced, the process moves to step S218d. If no nodes to be delayed pivots are produced, the process of subroutine symcpupdatenew ends.
[0248] [Step S218d] Based on the results of the LDL.sup.T factorization of the secondary supernode, the analyzing unit 120 updates the nodes to be delayed pivots in the column panel. Subsequently, the analyzing unit 120 returns to the LDL.sup.T factorization process (
[0249] [Step S219] The analyzing unit 120 adds 1 to nporder.
[0250] [Step S220] The analyzing unit 120 determines whether the value of the current nporder is larger than the total count of supernodes. If the value of the current nporder exceeds the total count of supernodes, the LDL.sup.T factorization process ends. The value of the current nporder is less than or equal to the total count of supernodes, the process moves to step S212.
[0251] By repeating subroutine calls in the above-described procedure, LDL.sup.T factorization of a structurally symmetric matrix is performed efficiently using delayed pivoting.
[0252] According to one aspect, it is possible to improve the efficiency of memory management in matrix factorization.
[0253] All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.