CONSTRUCTION METHOD FOR HEURISTIC METABOLIC CO-EXPRESSION NETWORK AND THE SYSTEM THEREOF

Abstract

The present invention discloses a construction method for heuristic metabolic co-expression network and the system thereof. Based on the max-dependent criteria, the present invention treats the characterized multivariate mutual information of a plurality of metabolites as mutual function value, and applies an optimization searching for the best feature subset, with a heuristics computational intelligence multimodal optimization algorithm. And by running the optimization process in a plurality of times, combining and studying the results in each time running, a co-expression network structure is built. Finally, a threshold for segmentations is calculated through probability models, and an exact and stable metabolic co-expression network is obtained.

Claims

1. A construction method for heuristic metabolic co-expression network, wherein, it comprises the following steps: A. executing preprocess for standardization to the original metabolic features dataset F*, and making all the M's metabolic feature vectors have a zero mean and a unit variance in each dimension: $F_{m} = \frac{F_{m}^{*} -_{m}}{_{m}}, F_{m}^{*} F^{*};$ wherein, F={F.sub.m; m=1, 2, . . . , M} is a pre-treated metabolic features dataset, and .sub.m are the mean and deviance of the m-th original metabolic feature vector F*.sub.m, respectively; B. setting a total running times of K for FSS, and initializing a running counter k=1; C. constructing a multimodal optimized evolutionary population ps, initializing each contained individual for optimization X.sub.ips into an M-dimensional random vector uniformly distributed in the range of R=[0.1]; D. setting a total number G for an iteration algorithm, and initializing an iteration counter g=1; E. calculating a shared fitness function value of each individual for optimization in the evolutionary population ps; F. after calculating all the shared fitness function values of all individuals for optimization, a heuristic computational intelligence algorithm being applied to optimize the evolutionary population ps; G. updating the iteration counter g=g+1, and, if g<G, returning to step E; otherwise, ending the specific optimization process and entering the step H; H. for each individual X.sub.i for optimization in the optimized evolutionary population ps, mapping it into a selection vector S.sub.i; I. constructing a symmetrical co-expression weight matrix W.sub.k={w.sub.p,q}.sub.MM, wherein, the diagonal elements w.sub.p,p representing the selected times of each metabolic feature vector F.sub.p among all the S.sub.i, pM:
w.sub.p,p=.sub.i|ps|s.sub.pS.sub.i; and other elements w.sub.p,q representing the number of selected times when both metabolic feature vectors F.sub.p and F.sub.q, being selected simultaneously in S.sub.i, p, qM, and pq:
w.sub.p,q=.sub.i|ps|s.sub.ps.sub.q;s.sub.p,s.sub.qS.sub.i; J. updating the running counter k=k+1, if k<K, then returning to step C, otherwise, the characters section is done, and entering step K; K. averaging the co-expression weight matrix obtained in each running process and calculating the corresponding probability, before obtaining a final co-expression weight matrix ={.sub.p,q}.sub.MM, wherein, |ps| is the total number of all individuals for optimization in the evolutionary population ps: $_{p, q} = \frac{1}{K .Math. .Math. ps .Math.} .Math. \underset{k K}{.Math.} .Math. .Math. w_{p, q} W_{k};$ L. considering each final S.sub.i output from each FSS as a sampling by an optimization algorithm to the metabolic features dataset space, wherein, S.sub.mS.sub.i and it obeys the Bernoulli distribution of probability p.sub.m, thus, w.sub.p,p is a random variable obeying a secondary distribution of B(|ps|,p.sub.m); M. considering the final co-expression weight matrix as a stable state result of ensemble bagging; N. using the diagonal element .sub.p,p in the final co-expression weight matrix as a weight for importance of the vertex p, and any other .sub.p,q, pq left as a connection weight between the vertices F.sub.p and F.sub.q, before constructing a fully connected weighted network G, then, removing the vertices and edges whose weight is less than a threshold .sub.t, and generating a metabolic co-expression network for the original metabolic features dataset F*; O. outputting the metabolic co-expression network as a result.

2. The construction method for the heuristic metabolic co-expression network according to claim 1, wherein, the step E comprises specifically: E1. supposing the individual for input is X.sub.i={x.sub.m; m=1, 2, . . . , M}, a real number in the range R in all dimensions, then it is binarized into a discrete selection vector S.sub.i={s.sub.m; m=1, 2, . . . , M}: $s_{m} = {\begin{matrix} 1, & if .Math. .Math. x_{m} > 0.5 \\ 0, & otherwise \end{matrix}, s_{m} S_{i};$ E2. for anyone of the m-th selection value s.sub.m in S.sub.i, if the value is 1, then the corresponding metabolic feature vector F.sub.m is selected to be contained in the constructed features subset F.sub.s, otherwise, F.sub.m will not be selected;
F.sub.S={F.sub.m;m=1,2, . . . ,M,s.sub.m=1}; E3. Calculating the approximate multivariate mutual information values in F.sub.S and treating as the original fitness function value; E4. defining a sparse fitness function value as a 1-norm of vector X.sub.i:
f.sub.spr.(X.sub.i)=X.sub.i.sub.1; E5. calculating a total fitness function value of the current individual X.sub.i as:
f(X.sub.i)=f.sub.raw(X.sub.i)+f.sub.spr.(X.sub.i); wherein, is a Lagrange multiplier; E6. if the total fitness function value of each individual for optimization has been calculated, then turning to step E7, otherwise, turning to step E1; E7. calculating a shared fitness function value of each individual for optimization: $f_{share} (X_{i}) = f (X_{i}) .Math. (1 + {.Math.}_{X_{j} ps, {.Math. x_{i} - x_{j} .Math.}_{2} < r, j i} .Math. {(1 - \frac{{.Math. x_{i} - x_{j} .Math.}_{2}}{r})}^{}), .Math. X_{i} ps,$ wherein, r is a radius of aggregation, is a disperse factor.

3. The construction method for the metabolic co-expression network according to claim 2, wherein, the step E3 comprises specifically: E31. supposing C is a labeled vector according to N samples of F, then, the calculation of the mutual information of F.sub.S is:
I(F.sub.S;C)=H(F.sub.S)H(F.sub.s|C)=H(F.sub.S).sub.ccp(c)H(F.sub.s|c); wherein, p(c) is the appearance probability of label c, H( ) is the entropy of variance; E32. Taking N samples in F, as vertices, and using their mutual Euclidean distances as weights for edges, to construct a minimum spanning tree (MST), then L(F.sub.S) is the sum of weights for edges of the specific MST:
L.sub.(F.sub.S)=.sub.e.sub.i,j.sub.MST(F.sub.S.sub.)e.sub.i,j.sup.; wherein, is a positive constant close to 0; E33. the multivariate mutual information of F.sub.s is calculated as:
I.sub.appx.(F.sub.S;C)=L.sub.(F.sub.S).sub.cCp(c)L.sub.(F.sub.S|c); thus, the original fitness function value is defined as:
f.sub.raw(X.sub.i)=I.sub.appx.(F.sub.S;C).

4. A construction system for heuristic metabolic co-expression network, wherein, it comprises: a standardization module, applied to execute preprocess for standardization to the original metabolic features dataset F*, and make all M's metabolic feature vectors have a zero mean and a unit deviation in each dimension; $F_{m} = \frac{F_{m}^{*} -_{m}}{_{m}}, F_{m}^{*} F^{*};$ wherein, F={F.sub.m; m=1, 2, . . . , M} is the metabolic features dataset after preprocess, .sub.m and .sub.m are the mean and deviation of the m-th original metabolic feature vector F*.sub.m, respectively; an initialization module for a running counter, applied to set a total running times K for FSS, and initialize the running counter k=1; an evolutionary population construction module, applied to construct a multimodal optimized evolutionary population ps, and initialize each contained individual for optimization X.sub.ips into an M-dimensional random vector uniformly distributed in the range of R=[0,1]; an iteration counter initialization module, applied to set a total running times of iteration algorithm as G, and initialize an iteration counter g=1; a fitness function value computational module, applied to calculate the shared fitness function value of each individual for optimization in the evolutionary population ps; a population optimization module, applied to use a heuristic computational intelligence algorithm to optimize the evolutionary population ps, after calculating all the shared fitness function values of individuals for optimization; an iteration counter updating module, applied to update the iteration counter g=g+1, if g<G, and return to the fitness function value computational module; otherwise, the specific optimization process finishes, and it enters into a mapping module; a mapping module, applied to map each individual for optimization X.sub.i in the optimized evolutionary population ps into a selection vector S.sub.i; a co-expression weight matrix construction module, applied to construct a symmetrical co-expression weight matrix W.sub.k={w.sub.p,q}.sub.MM, wherein, the diagonal elements w.sub.p,p represent the number of selected times for each metabolic feature vector F.sub.p in all S.sub.i, pM:
w.sub.p,p=.sub.i|ps|s.sub.pS.sub.i, and other elements w.sub.p,q represent the selected times when both metabolic character vectors F.sub.p and F.sub.q are selected simultaneously in S.sub.i, p, qM, and pq:
W.sub.p,q=.sub.i|ps|s.sub.ps.sub.q;s.sub.p,s.sub.qS.sub.i; a running counter updating module, applied to update the running counter k=k+1, if k<K, then return to the evolutionary population construction module, otherwise, the FSS is done, and it enters an average module; an average module, applied to average the co-expression weight matrix obtained in each running process, and calculate the corresponding probability, before obtaining a final co-expression weight matrix ={.sub.p,q}.sub.MM, wherein, |ps| is the total number of all individuals for optimization in the evolutionary population ps: $_{p, q} = \frac{1}{K .Math. .Math. ps .Math.} .Math. \underset{k K}{.Math.} .Math. .Math. w_{p, q} W_{k};$ a sampling module, applied to consider each final S.sub.i output from each FSS as a sampling by the optimization algorithms to the metabolic features dataset space, wherein, S.sub.mS.sub.i and it obeys the Bernoulli distribution of probability p.sub.m, thus w.sub.p,p is a random variable obeying a secondary distribution of B(|ps|,p.sub.m); a stable state result outputting module, applied to consider the final co-expression weight matrix as a stable state result of ensemble bagging; a metabolic co-expression network computational module, applied to use the diagonal element .sub.p,p in the final co-expression weight matrix as a weight for importance of the vertex p, and any other .sub.p,q, pq left as a connection weight between the vertices F.sub.p and F.sub.q, before constructing a fully connected weighted network G, then, remove the vertices and edges whose weight is less than the threshold .sub.t, and generate a metabolic co-expression network for the original metabolic features dataset F*; a metabolic co-expression network outputting module, applied to output the metabolic co-expression network as the result.

5. The construction system for a heuristic metabolic co-expression network according to claim 4, wherein, the said fitness function value computational module comprises specifically: a binarization unit, applied to binarize an individual for input into a discrete selection vector S.sub.i={s.sub.m; m=1, 2, . . . , M}, supposing that the individual for input is X.sub.i={x.sub.m; m=1, 2, . . . , M}, which is a real number in the range R in all dimensions: $s_{m} = {\begin{matrix} 1, & if .Math. .Math. x_{m} > 0.5 \\ 0, & otherwise \end{matrix}, s_{m} S_{i};$ a selection unit, applied to select the corresponding metabolic feature vector F.sub.m to be contained in the constructed features subset F.sub.s, otherwise, F.sub.m will not be selected;
F.sub.S={F.sub.m;m=1,2, . . . ,M,s.sub.m=1}; an original fitness function value computational unit, applied to calculate the approximate multivariate mutual information values in F.sub.S and treat as the original fitness function values; a definition unit, applied to define a sparse fitness function value as a 1-norm of vector X.sub.i:
f.sub.spr.(X.sub.i)=X.sub.i.sub.1; a total fitness function value computational unit, applied to calculate the total fitness function value of the current individual X.sub.i as:
f(X.sub.i)=f.sub.raw(X.sub.i)+f.sub.spr.(X.sub.i); wherein, is a Lagrange multiplier; a judgment unit, applied to decide if the total fitness function value of each individual for optimization has been calculated or not, if so, then turning to a shared fitness function value computational unit, otherwise, turning to the binarization unit; a shared fitness function value computational unit, applied to calculate a shared fitness function value of each individual for optimization: $f_{share} (X_{i}) = f (X_{i}) .Math. (1 + {.Math.}_{X_{j} ps, {.Math. x_{i} - x_{j} .Math.}_{2} < r, j i} .Math. {(1 - \frac{{.Math. x_{i} - x_{j} .Math.}_{2}}{r})}^{}), .Math. X_{i} ps,$ wherein, r is the radius of aggregation, c is the disperse factor.

6. The construction system for a metabolic co-expression network according to claim 5, wherein, the original fitness function value computational unit comprises specifically: a mutual information calculation sub-unit, applied to calculate the mutual information of F.sub.S, supposing C is labeled vectors according to N samples of F:
I(F.sub.S;C)=H(F.sub.S)H(F.sub.s|C)=H(F.sub.S).sub.cCp(c)H(F.sub.s|c), wherein, p(c) is the appearance probability of label c, H( ) is the entropy of variance; an edge weight value computational sub-unit, applied to take N samples in F.sub.s as vertices, and using their mutual Euclidean distances as weights for edges, before constructing an MST, then L.sub.(F.sub.S) is the sum of weights for edges of the specific MST:
L.sub.(F.sub.S)=.sub.e.sub.i,j.sub.MST(F.sub.S.sub.)e.sub.i,j.sup., wherein, is a positive constant close to 0; a functional value computation sub-unit, applied to calculate the multivariate mutual information of F.sub.s as:
I.sub.appx.(F.sub.S;C)=L.sub.(F.sub.S).sub.cCp(c)L.sub.(F.sub.S|c); thus, the original fitness function value is defined as:
f.sub.raw(X.sub.i)=I.sub.appx.(F.sub.S;C).

Description

BRIEF DESCRIPTION OF THE DRAWINGS

[0079] FIG. 1 illustrates a flow chart of a preferred embodiment on the construction method for heuristic metabolic co-expression network as described in the present application.

[0080] FIG. 2 illustrates a detailed flow chart of taking samples in F.sub.S as vertices to construct an MST as described in the present application.

[0081] FIG. 3 illustrates a detailed flow chart of using a threshold for segmentations to construct a metabolic co-expression network as described in the present application.

DETAILED DESCRIPTION

[0082] The present invention provides a construction system for heuristic metabolic co-expression network and the system thereof, In order to make the purpose, technical solution and the advantages of the present invention clearer and more explicit, further detailed descriptions of the present invention are stated here, referencing to the attached drawings and some embodiments of the present invention. It should be understood that the detailed embodiments of the invention described here are used to explain the present invention only, instead of limiting the present invention.

[0083] Referencing to FIG. 1, which is a flow chart of a preferred embodiment on the construction method for heuristic metabolic co-expression network as described in the present application, as shown in the figure, it comprises the following steps:

[0084] 1). Executes preprocess for standardization to an original metabolic features dataset F*, and makes all M's metabolic feature vectors have a zero mean and a unit deviation in each dimension:

[00013] $F_{m} = \frac{F_{m}^{*} -_{m}}{_{m}}, F_{m}^{*} F^{*};$

wherein, F={F.sub.m; m=1, 2, . . . , M} is the metabolic features dataset after preprocess, .sub.m and .sub.m are the mean and deviation of the m-th original metabolic feature vector F*.sub.m, respectively;

[0085] 2). Sets a total running times for FSS as K, and initializes the running counter k=1;

[0086] 3). Constructs a multimodal optimized evolutionary population ps, and initializes each contained individual for optimization X.sub.ips into an M-dimensional random vector equally distributed in a range of R=[0,1];

[0087] 4). Sets a total times of iteration algorithm as G, and initializes the iteration counter g=1;

[0088] 5). Calculates a shared fitness function value for each individual for optimization in the evolutionary population ps;

[0089] 6). Uses a heuristic computational intelligence algorithm to optimize the evolutionary population ps, after calculating all the shared fitness function values of individuals for optimization;

[0090] 7). Updates an iteration counter g=g+1, if g<G, returns to 5); otherwise, the specific optimization finishes, and it enters step 8);

[0091] 8). Maps each individual for optimization X.sub.i in the optimized evolutionary population ps into a selection vector S.sub.i;

[0092] 9). Constructs a symmetrical co-expression weight matrix W.sub.k={W.sub.p,q}.sub.MM, wherein, the diagonal elements w.sub.p,p represent the selected times of each metabolic feature vector F.sub.p in all S.sub.i, pM:

[00014] $w_{p, p} = \underset{i .Math. ps .Math.}{.Math.} .Math. .Math. s_{p} S_{i}$

[0093] and other elements w.sub.p,q represent the selected times when both metabolic character vectors F.sub.p and F.sub.q are selected simultaneously, p, qM, pq:

w.sub.p,q=.sub.i|ps|s.sub.ps.sub.q;s.sub.p,s.sub.qs.sub.i;

[0094] 10). Updates the running counter k=k+1, if k<K, returns to step 3), otherwise, FSS is done, and it enters step 11);

[0095] 11). Averages the co-expression weight matrixes obtained in each running process, and calculates the corresponding probabilities, before obtaining a final co-expression weight matrix ={.sub.p,q}.sub.MM, wherein, |ps| is the total number of all individuals for optimization in the evolutionary population ps:

[00015] $_{p, q} = \frac{1}{K .Math. .Math. ps .Math.} .Math. {.Math.}_{k K} .Math. w_{p, q} W_{k}; .Math.$

[0096] 12). Considers each final S.sub.i output from each FSS as a sampling by the optimization algorithms to the metabolic features dataset space, wherein, S.sub.mS.sub.i, and it obeys the Bernoulli distribution of probability p.sub.m, thus w.sub.p,p is a random variable obeying a secondary distribution of B(|ps|,p.sub.m);

[0097] 13). Considers the final co-expression weight matrix as a stable state result of ensemble bagging;

[0098] 14). Uses the diagonal element .sub.p,p in the final co-expression weight matrix as a weight for importance of the vertex p, and any .sub.p,q, pq left as a connection weight between the vertices F.sub.p and F.sub.q, before constructing a fully connected weighted network G, then, removes the vertices and edges whose weight is less than the threshold .sub.t, and generates a metabolic co-expression network for the original metabolic features dataset F*;

[0099] 15). Outputs the said metabolic co-expression network as the result.

[0100] Specifically, in the step 1), before executing an FSS, preprocess for standardization to the original metabolic features dataset F* are executed, and all M's metabolic feature vectors are made have a zero mean and a unit deviation in each dimension.

[00016] $F_{m} = \frac{F_{m}^{*} -_{m}}{_{m}}, F_{m}^{*} F^{*};$

[0101] In the step 2), sets the total running times for FSS as K, and initializes the running counter k=1;

[0102] In the step 3), constructs a multimodal optimized evolutionary population ps, and initializes each contained individual for optimization X.sub.ips into an M-dimensional random vector equally distributed in a range of R=[0,1];

[0103] In the step 4), an optimized design for FSS is started. Sets the total times of iteration algorithm as G, and initializes the iteration counter g=1;

[0104] In the step 5), calculates a shared fitness function value for each individual for optimization in the evolutionary population ps.

[0105] The said step 5) includes specifically:

[0106] a. Supposing the individual for input (that is, the input individual for optimization) is X.sub.i={x.sub.m; m=1, 2, . . . , M}, which is a real number in the range R for all dimensions, it is then binarized into discrete selection vector S.sub.i={s.sub.m; m=1, 2, . . . , M}:

[00017] $s_{m} = {\begin{matrix} 1, & if .Math. .Math. x_{m} > 0.5 \\ 0, & otherwise \end{matrix}, s_{m} S_{i};$

wherein, otherwise means all cases other than x.sub.m>0.5.

[0107] b. For anyone of the m-th selection value s.sub.m in S.sub.i, if the value is 1, then the corresponding metabolic feature vector F.sub.m is selected to be contained in the constructed features subset F.sub.s; otherwise, F.sub.m will not be selected;

F.sub.S={F.sub.m;=1,2, . . . ,M,s.sub.m=1};

[0108] c. Calculates the approximate multivariate mutual information values in F.sub.S and treats as the original fitness function values;

[0109] d. Defines a sparse fitness function value as the 1-norm of vector X.sub.i:

f.sub.spr.(X.sub.i)=X.sub.i.sub.1;

introducing this specific value may make the algorithm select a feature from the most important core metabolite.

[0110] e. Calculates the total fitness function value of the current individual X.sub.i as:

f(X.sub.i)=f.sub.raw(X.sub.i)+f.sub.spr.(X.sub.i);

wherein, is a Lagrange multiplier;

[0111] f. If the total fitness function value of each individual for optimization has already been calculated, then turns to step 5).g), otherwise, turns to step 5).a);

[0112] g. Calculates the shared fitness function value of each individual for optimization, using a fitness sharing method:

[00018] $f_{share} (X_{i}) = f (X_{i}) .Math. (1 + {.Math.}_{X_{j} ps, {.Math. X_{i} - X_{j} .Math.}_{2} < r, j i} .Math. .Math. {(1 - \frac{{.Math. X_{i} - X_{j} .Math.}_{2}}{r})}^{}), .Math. X_{i} ps;$

wherein, r is a radius of aggregation, is a disperse factor. The specific method may execute a multimodal optimization to the searching algorithm, and obtain all the global or local optima in a features space (that is, an FSS).

[0113] The said step c comprises specifically:

[0114] i. Supposing C is a labeled vector according to N samples of F, then, the calculation of the mutual information of F.sub.S is:

I(F.sub.S;C)=H(F.sub.S)H(F.sub.s|C)=H(F.sub.S).sub.ccp(c)H(F.sub.s|c),

wherein, p(c) is the appearance probability of label c, and its value may be estimated based on the samples in the dataset; H( ) is an entropy of variance, which may be obtained by using Rnyi's -Entropy:

[00019] $H (F_{S}) = \frac{1}{1 -} [\log .Math. .Math. \frac{L_{} (F_{S})}{N^{}} - \log .Math. .Math.]$

wherein, is a constant approaching to 1, is a deviation correction value independent to the probability distribution, so it has:

H(F.sub.S)L.sub.(F.sub.S),

which shows a positive correlation.

[0115] ii. Taking N samples in F.sub.s as vertices, and using their mutual Euclidean distances as weights for edges, before constructing an MST, then L.sub.(F.sub.S) is the sum of weights for edges in the specific MST:

L.sub.(F.sub.S)=.sub.e.sub.i,j.sub.MST(F.sub.S.sub.)e.sub.i,j.sup.,

wherein, is a positive constant close to 0; and a commonly used MST construction algorithm includes a Prim algorithm and more.

[0116] Shown as FIG. 2, F.sub.S={pt.sub.1=(9,3), pt.sub.2=(3,5), pt.sub.3=(7,7), pt.sub.4=(5,10), pt.sub.5=(10,12)}, which is composed by 5 samples, then, its MST has:

e.sub.1,3=pt.sub.1pt.sub.3=4.47;

e.sub.2,3=pt.sub.2pt.sub.3=4.47;

e.sub.3,5=pt.sub.3pt.sub.5=4.47;

e.sub.3,4=pt.sub.3pt.sub.4=4.47;

L.sub.1(F.sub.S)=4.47+4.47+5.83+3.60=18.37.

[0117] iii. The multivariate mutual information of F.sub.s is calculated as:

I.sub.appx.(F.sub.S;C)=L.sub.(F.sub.S).sub.cCp(c)L.sub.(F.sub.S|c),

the greater the value is, the more significant of the linkage between the metabolic feature subset and the physiological state of the target is. Thus, the original fitness function value is defined as:

f.sub.raw(X.sub.i)=I.sub.appx.(F.sub.S;C);

[0118] In the step 6), after calculating all shared fitness function values of the individuals for optimization, a heuristic computational intelligence algorithm is used to optimize the evolutionary population ps; a commonly used method includes Differential evolution (DE) or Memetic algorithm (MA).

[0119] In the step 7), updates the iteration counter g=g+1, if g<G, then returns to 5); otherwise, the specific optimization finishes, and it enters step 8).

[0120] In the step 8), for each individual for optimization X.sub.i in ps after optimization, it is mapped into a selection vector S.sub.i using the method described in 5)a).

[0121] In the step 9), a symmetrical co-expression weight matrix W.sub.k={W.sub.p,q}.sub.MM is constructed, wherein, the diagonal element w.sub.p,p, pM represents a selected times for each metabolic feature vector F.sub.p in all S.sub.i:

w.sub.p,p=.sub.i,|ps|s.sub.pS.sub.i;

[0122] and other elements w.sub.p,q, p, qM, pq represent the selected times when both metabolic character vectors F.sub.p and F.sub.q are selected simultaneously:

w.sub.p,q=.sub.i|ps|s.sub.ps.sub.q;s.sub.p,s.sub.qS.sub.i;

[0123] In the step 10), updates the running counter k=k+1, if k<K, then returns to step 3), otherwise, the FSS is done, and it enters step 11);

[0124] In the step 11), averages the co-expression weight matrixes obtained in each running process, and calculates the corresponding probabilities, then obtains a final co-expression weight matrix ={.sub.p,q}.sub.MM, wherein, |ps| is the total number of all individual for optimization in the evolutionary population ps:

[00020] $_{p, q} = \frac{1}{K .Math. .Math. ps .Math.} .Math. {.Math.}_{k K} .Math. w_{p, q} W_{k};$

[0125] In the step 12), supposing in each FSS, each output final S.sub.i is considered as a sampling by the optimization algorithms to the metabolic features dataset space, wherein, S.sub.mS.sub.i, and obeys the Bernoulli distribution of probability p.sub.m, then w.sub.p,p is a random variable obeying a secondary distribution of B(|ps|, p.sub.m). Then under the condition of the population size |ps| is set as:

[00021] $.Math. ps .Math. = .Math. \frac{5}{\min (p_{m}, 1 - p_{m})} .Math.,$

it may be considered as obeying a normal distribution N(, ) having a mean =|ps|p.sub.m and a deviation =|ps|p.sub.m(1p.sub.m). Thus, the total running times K may be obtained by the following equation:

[00022] $K = .Math. \max ({(\frac{z^{*}}{.Math.})}^{2} .Math. \frac{p_{m} (1 - p_{m})}{.Math. ps .Math.}) .Math.$

wherein, z* is a confidence value, and is a maximum range for error of the mean.

[0126] For example, supposing that p.sub.m [0.05, 0.95] is a selection probability of F.sub.m, then under the condition of using privates for optimization at a number of |ps|=100 in each features selection process and running repeatedly for a times of K=6, then, it is ensured that the average error of .sub.p,p value is no more than =5%, in a confidence range of 98% (z*=2.33).

[0127] In the step 13), under the specific confidence value, it is possible to consider the final co-expression weight matrix a stable state result of ensemble bagging, for example, the threshold for segmentations may be set as .sub.t=0.5.

[0128] In the step 14), as shown in FIG. 3, the diagonal element .sub.p,p in the final co-expression weight matrix is used as a weight for importance of the vertex p (the metabolite feature F.sub.p), and any .sub.p,q, pq left is used as a connection weight between the vertices F.sub.p and F.sub.q, before constructing a fully connected weighted network G, then, the vertices and edges whose weight is less than the threshold .sub.t, are removed and a metabolic co-expression network for the original metabolic features dataset F* is generated.

[0129] In the step 15), the said metabolic co-expression network is output as the result.

[0130] Based on the above described method, the present application further provides a construction system for heuristic metabolic co-expression network, wherein, it comprises:

[0131] a standardization module, applied to execute preprocess for standardization to the original metabolic features dataset F*, and make all M's metabolic feature vectors have a zero mean and a unit deviation in each dimension:

[00023] $F_{m} = \frac{F_{m}^{*} -_{m}}{_{m}}, F_{m}^{*} F^{*};$

[0132] an initialization module for running counter, applied to set a total running times for FSS as K, and initialize the running counter k=1;

[0133] an evolutionary population construction module, applied to construct a multimodal optimized evolutionary population ps, and initialize each contained individual for optimization X.sub.ips into an M-dimensional random vector equally distributed in a range of R=[0,1];

[0134] an iteration counter initialization module, applied to set the total times of iteration algorithm as G, and initialize the iteration counter g=1;

[0135] a fitness function value computational module, applied to calculate the shared fitness function value for each individual for optimization in the evolutionary population ps;

[0136] a population optimization module, applied to use a heuristic computational intelligence algorithm to optimize the evolutionary population ps, after calculating all the shared fitness function values of individuals for optimization;

[0137] an iteration counter updating module, applied to update the iteration counter g=g+1, if g<G, then return to the fitness function value computational module; otherwise, the specific optimization finishes, and it enters into a mapping module;

[0138] a mapping module, applied to map each individual for optimization X.sub.i in the optimized evolutionary population ps into a selection vector S.sub.i;

[0139] a co-expression weight matrix construction module, applied to construct a symmetrical co-expression weight matrix W.sub.k={w.sub.p,q}.sub.MM, wherein, the diagonal elements w.sub.p,p represent the selected times of each metabolic feature vector F.sub.p in all S.sub.i, pM:

w.sub.p,p=.sub.i|ps|s.sub.pS.sub.i,

[0140] while other elements w.sub.p,q represent the selected times when both metabolic character vectors F.sub.p and F.sub.q are selected simultaneously, p, qM, pq:

w.sub.p,q=.sub.i|ps|s.sub.ps.sub.q;s.sub.p,s.sub.qS.sub.i;

[0141] a running counter updating module, applied to update the running counter k=k+1, if k<K, then return to the evolutionary population construction module, otherwise, the FSS is done, and it enters an average module;

[0142] an average module, applied to average all the co-expression weight matrixes obtained in each running process, and calculate the corresponding probabilities, before obtaining a final co-expression weight matrix ={.sub.p,q}.sub.MM, wherein, |ps| is the total number of all individuals for optimization in the evolutionary population ps:

[00024] $_{p, q} = \frac{1}{K .Math. .Math. ps .Math.} .Math. \underset{k K}{.Math.} .Math. .Math. w_{p, q} W_{k};$

[0143] a sampling module, applied to consider each final S.sub.i output from each FSS as a sampling by the optimization algorithms to the metabolic features dataset space, wherein, S.sub.mS.sub.i, and it obeys the Bernoulli distribution of probability p.sub.m, thus, w.sub.p,p is a random variable obeying a secondary distribution of B(|ps|,p.sub.m);

[0144] a stable state result outputting module, applied to consider the final co-expression weight matrix as a stable state result of ensemble bagging;

[0145] a metabolic co-expression network computational module, applied to use the diagonal element .sub.p,p in the final co-expression weight matrix as a weight for importance of the vertex p, and any other .sub.p,q, pq left as a connection weight between the vertices F.sub.p and F.sub.q, before constructing a fully connected weighted network G, then, remove the vertices and edges whose weight is less than the threshold .sub.t, and generate a metabolic co-expression network for the original metabolic features dataset F*;

[0146] a metabolic co-expression network outputting module, applied to output the said metabolic co-expression network as the result.

[0147] Wherein, the said fitness function value computational module comprises specifically:

[0148] a binarization unit, applied to binarize an individual for input into discrete selection vector S.sub.i={s.sub.m; m=1, 2, . . . , M}, supposing that the individual for input is X.sub.i={x.sub.m; m=1, 2, . . . , M}, which is a real number in the range R in all dimensions:

[00025] $s_{m} = {\begin{matrix} 1, & if .Math. .Math. x_{m} > 0.5 \\ 0, & otherwise \end{matrix}, s_{m} S_{i};$

[0149] a selection unit, applied to select a corresponding metabolic feature vector F.sub.m to be contained in the constructed features subset F.sub.s, if anyone of the m-th selection value s.sub.m in S.sub.i is 1, otherwise, F.sub.m will not be selected;

F.sub.S={F.sub.m;m=1,2, . . . ,M,s.sub.m=1};

[0150] an original fitness function value computational unit, applied to calculate the approximate multivariate mutual information values in F.sub.S and treat as the original fitness function values;

[0151] a definition unit, applied to define a sparse fitness function value as a 1-norm of vector X.sub.i:

f.sub.spr.(X.sub.i)=X.sub.i.sub.1;

[0152] a total fitness function value computational unit, applied to calculate the total fitness function value of the current individual X.sub.i as:

f(X.sub.i)=f.sub.raw(X.sub.i)+f.sub.spr.(X.sub.i),

wherein, is a Lagrange multiplier;

[0153] a judgment unit, applied to check if the total fitness function value of each individual for optimization has been calculated or not, if so, then turn to a shared fitness function value computational unit, otherwise, turn to the binarization unit;

[0154] a shared fitness function value computational unit, applied to calculate a shared fitness function value of each individual for optimization:

[00026] $f_{share} (X_{i}) = f (X_{i}) .Math. (1 + {.Math.}_{X_{j} ps, {.Math. x_{i} - x_{j} .Math.}_{2} < r, j i} .Math. {(1 - \frac{{.Math. x_{i} - x_{j} .Math.}_{2}}{r})}^{}), .Math. X_{i} ps,$

wherein, r is the radius of aggregation, is the disperse factor.

[0155] The said construction system for a metabolic co-expression network, wherein, the said original fitness function value computational unit comprises specifically:

[0156] a mutual information calculation sub-unit, applied to calculate the mutual information of F.sub.S, supposing C is labeled vectors according to N samples of F:

I(F.sub.S;C)=H(F.sub.S)H(F.sub.s|C)=H(F.sub.S).sub.cCp(c)H(F.sub.s|c),

wherein, p(c) is the appearance probability of label c, H( ) is the entropy of variance;

[0157] an edge weight value computational sub-unit, applied to take N samples in F.sub.s as vertices, and using their mutual Euclidean distances as weights for edges, before constructing an MST, then L.sub.(F.sub.S) is the sum of weights for edges of the specific MST:

L.sub.(F.sub.S)=.sub.e.sub.i,j.sub.MST(F.sub.S.sub.)e.sub.i,j.sup.;

wherein, is a positive constant close to 0;

[0158] a functional value computation sub-unit, applied to calculate the multivariate mutual information of F.sub.s as:

I.sub.appx.(F.sub.S;C)=L.sub.(F.sub.S).sub.cCp(c)L.sub.(F.sub.S|c);

thus, the original fitness function value is defined as:

f.sub.raw(X.sub.i)=I.sub.appx.(F.sub.S;C).

[0159] It should be understood that, the application of the present invention is not limited to the above examples listed. Ordinary technical personnel in this field can improve or change the applications according to the above descriptions, all of these improvements and transforms should belong to the scope of protection in the appended claims of the present invention.

CONSTRUCTION METHOD FOR HEURISTIC METABOLIC CO-EXPRESSION NETWORK AND THE SYSTEM THEREOF

Assignee

Inventors

Cpc classification

Classification Explorer

G16B40/00

PHYSICS

Classification Explorer

G16B5/20

PHYSICS

Classification Explorer

G16B5/00

PHYSICS

International classification

Classification Explorer

G06F19/12

PHYSICS

Classification Explorer

G06F19/24

PHYSICS

Abstract

Claims

Description