METHOD OF HOP COUNT MATRIX RECOVERY BASED ON DECISION TREE CLASSIFIER
20220292318 · 2022-09-15
Assignee
Inventors
CPC classification
H04W84/18
ELECTRICITY
H04W40/023
ELECTRICITY
G06F18/214
PHYSICS
H04W64/006
ELECTRICITY
Y02D30/70
GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
International classification
Abstract
A method of hop count matrix recovery based on a decision tree classifier, includes: S1: performing a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries; S2: constructing a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and modeling the observed hop counts in the hop count matrix as labels of the training sample set, wherein a maximum hop count represents a number of classes; S3: training a decision tree classifier according to the training sample set obtained in step S2; and S4: constructing a feature for an unobserved hop count, to obtain an unknown sample; and inputting the unknown sample to the trained decision tree classifier, to obtain a class of the unknown sample which represents a missing hop count at a corresponding position in the matrix, to recover a complete hop count matrix H.
Claims
1. A method of hop count matrix recovery based on a decision tree classifier, comprising the following steps: S1: performing a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries; S2: constructing a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and modeling the observed hop counts in the hop count matrix as labels of the training sample set, wherein a maximum hop count represents a number of classes; S3: training a decision tree classifier according to the training sample set obtained in step S2; and S4: constructing a feature for an unobserved hop count, to obtain an unknown sample; and inputting the unknown sample to the trained decision tree classifier, to obtain a class of the unknown sample which equals a missing hop count at a corresponding position in the matrix, so as to recover a complete hop count matrix H.
2. The method according to claim 1, wherein a hop count vector of any node i is constructed: h.sub.i={h.sub.i1, h.sub.i2, . . . , h.sub.in}, where h.sub.ij represents a hop count between node i and node j, i=1, . . . , n; j=1, . . . , n; and the hop count matrix H with missing entries is represented by formula (2):
3. The method according to claim 2, wherein after step S1 and before step S2, if a symmetric position of a missing hop count is observed, the missing hop count is supplemented by using a hop count at the symmetric position.
4. The method according to claim 3, wherein step S2 specifically comprises: S201: traversing the hop count matrix {tilde over (H)}, and performing a next step if a hop count at a certain position is observed; otherwise, traversing a next value in the hop count matrix {tilde over (H)}; S202: using node i and node j to represent two nodes between which a hop count is missing, and with respect to all other nodes k, k=1, 2, . . . , n in a network, calculating a minimum hop count sum of a hop count {tilde over (h)}.sub.ik from node i to node k and a hop count {tilde over (h)}.sub.kj from node k to node j, as a first feature of a training sample; S203: calculating an average value of hop counts from neighboring nodes of node i to node j and hop counts from neighboring nodes of node j to node i, as a second feature of the training sample; S204: using the observed hop count as a class, forming the training sample by using the class together with the first feature and second feature, and adding the training sample to the training sample set; and S205: traversing the entire hop count matrix {tilde over (H)}, and obtaining the training sample set after the traversing is finished.
5. The method according to claim 4, wherein step S203 specifically comprises: initializing two neighbor lists L.sub.i and L.sub.j, selecting neighbors of node i according to a hop count vector {tilde over (h)}.sub.i, and selecting neighbors of node j according to a hop count vector {tilde over (h)}.sub.j; storing indexes of neighboring nodes of node i and indexes of neighboring nodes of node j in the corresponding neighbor lists L.sub.i and L.sub.j respectively, wherein a variable n.sub.i represents a number of available neighboring nodes of node i, and a variable n.sub.j represents a number of available neighboring nodes of node j; and if a hop count {tilde over (h)}.sub.L.sub.
6. The method according to claim 5, wherein the constructed training sample set is expressed as formula (5):
S=[s.sub.1,s.sub.2, . . . ,s.sub.N]=[(f.sub.1.sup.1,f.sub.1.sup.2,c.sub.1);(f.sub.2.sup.1,f.sub.2.sup.2,c.sub.2); . . . (f.sub.N.sup.1,f.sub.N.sup.2,c.sub.N)], (5) wherein f.sub.m.sup.1 represents a first feature of an m.sup.th training sample, f.sub.m.sup.2 represents a second feature of the m.sup.th training sample, and c.sub.m∈{1, 2, . . . , h.sub.max} represents a class of the m.sup.th training sample, m=1, 2, . . . ,N, and h.sub.max is a maximum hop count.
7. The method according to claim 6, wherein step S4 specifically comprises: S401: if the hop count at a certain position in the hop count matrix {tilde over (H)} is not observed, performing a next step; otherwise, traversing a next value in the hop count matrix {tilde over (H)}; S402: using node i and node j to represent two nodes between which a hop count is missing, and with respect to all other nodes k, k=1, 2, . . . , n in the network, calculating a minimum hop count sum of a hop count {tilde over (h)}.sub.ik from node i to node k and a hop count {tilde over (h)}.sub.kj from node k to node j, as a first feature of the unknown sample; S403: calculating an average value of hop counts from neighboring nodes of node i to node j and hop counts from neighboring nodes of node j to node i, as a second feature of the unknown sample; S404: forming the unknown sample by using the constructed first feature and second feature of the unknown sample, inputting the unknown sample to the trained decision tree classifier to obtain the class of the unknown sample, that is, obtain the hop count at the position; and S405: if traversing of the hop count matrix {tilde over (H)} is not finished, traversing a next value in the matrix and returning to step S401; and if traversing of the hop count matrix {tilde over (H)} is finished, obtaining the recovered hop count matrix.
8. The method according to claim 7, wherein the constructed unknown sample is expressed as follows:
9. A computer system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to claim 1 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to claim 1 are implemented.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0051] The present disclosure is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment 1
[0052] As shown in
[0053] S1: Perform a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries.
[0054] S2: Construct a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and model the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents the number of classes.
[0055] S3: Train a decision tree classifier according to the training sample set obtained in step S2.
[0056] S4: Construct a feature for an unobserved hop count, to obtain an unknown sample; and input the unknown sample to the trained decision tree classifier, to obtain a class of the unknown sample which equals a missing hop count at a corresponding position in the matrix, so as to recover a complete hop count matrix H.
[0057] In a specific embodiment, as shown in
[0058] Through a flooding process, every node in the network can obtain minimum hop counts to all other nodes. A hop count vector of any node i is constructed: h.sub.i={h.sub.i1, h.sub.i2, . . . , h.sub.in}, where h.sub.ij represents a hop count between node i and node j; i=1, . . . , n; j=1, . . . , n. A hop count matrix is constructed by using the hop count vectors of all the nodes, as shown in formula (1):
H=[h.sub.ij].sub.n×n (1)
[0059] However, in the flooding process, due to node failures or attacks from malicious nodes, some entries in the hop count matrix H are missing, where the hop count matrix with missing entries is expressed as formula (2):
{tilde over (H)}=H⊙Ω, (2)
wherein ⊙ represents a Hadamard (entrywise) product, and Ω=[ω.sub.ij].sub.n×n is a binary matrix that indicates whether the hop count at the corresponding position of the hop count matrix is observed, with ω.sub.ij=1 if {tilde over (h)}.sub.ij is observed and ω.sub.ij=0 if it is missing.
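The masking relation described by formula (2) can be sketched numerically. The following is a minimal illustration assuming a small 5-node network and a random observation mask; the matrix values, the observation ratio, and all variable names are illustrative, not taken from the embodiment:

```python
import numpy as np

# Illustrative 5-node hop count matrix H (values made up for this sketch).
H = np.array([
    [0, 1, 2, 2, 3],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 2],
    [2, 2, 1, 0, 1],
    [3, 2, 2, 1, 0],
])
rng = np.random.default_rng(0)
Omega = (rng.random(H.shape) < 0.7).astype(int)  # omega_ij = 1: observed
np.fill_diagonal(Omega, 1)                       # a node's own hop count (0) is known
H_tilde = H * Omega                              # Hadamard product H ⊙ Ω
```

Every entry of `H_tilde` is either the true hop count (where the mask is 1) or 0 (where the mask is 0), matching the convention in paragraph [0062] that unobserved entries equal 0.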
[0060] The objective of the method in this embodiment is to obtain the original hop count matrix H from the matrix {tilde over (H)} with missing entries; to this end, the recovery problem of the hop count matrix is modeled as a classification problem.
[0061] In a specific embodiment, after step S1 and before step S2, in order to increase the number of training samples, if a symmetric position of a missing hop count is observed, the missing hop count is supplemented by using a hop count at the symmetric position. That is, if one of the hop count {tilde over (h)}.sub.ij from node i to node j and the hop count {tilde over (h)}.sub.ji from node j to node i is observed, the hop count at the other position is supplemented by the hop count at the symmetric position.
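The symmetric supplement described above can be sketched as follows; a minimal sketch in which a missing entry (stored as 0) is filled from its observed symmetric counterpart. The function name `fill_symmetric` is illustrative, not from the embodiment:

```python
import numpy as np

# If h_ij is missing (0) but the symmetric entry h_ji is observed,
# copy the observed value over to increase the number of training samples.
def fill_symmetric(H_tilde):
    H = H_tilde.copy()
    n = H.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j and H[i, j] == 0 and H[j, i] != 0:
                H[i, j] = H[j, i]
    return H
```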
[0062] In a specific embodiment, the training sample set is initialized to be null. To construct the training sample set, the hop count matrix needs to be traversed. When the hop count {tilde over (h)}.sub.ij between node i and node j is not equal to 0, it indicates that the hop count between node i and node j is observed, and the hop count is used as a label of a training sample.
[0063] In a specific embodiment, for the recovery of the hop count matrix, the only available information is the partially observed entries in the incomplete hop count matrix. Therefore, in this embodiment, the training sample set is constructed by using the observed entries, and topological information in the network is used to train the decision tree classifier. The training sample set is constructed by using relationships between hop counts in the network. The hop counts in the hop count matrix are modeled as labels of the training sample set, and a maximum hop count h.sub.max in the network represents the number of classes; that is, a hop count in the network is an integer from 1 to h.sub.max.
[0064] Step S2 specifically includes the following sub-steps:
[0065] S201: Initialize the training sample set to be null; in order to construct the training sample set, traverse the hop count matrix {tilde over (H)}, and when the hop count {tilde over (h)}.sub.ij between node i and node j is not equal to 0, which indicates that the hop count between node i and node j is observed, use the hop count as a label of a training sample, and perform a next step; otherwise, traverse a next value in the hop count matrix {tilde over (H)}.
[0066] S202: According to features of the hop count matrix, calculate hop counts of node i and node j with respect to other nodes k in the network, as shown in formula (3):
h.sub.ij=min(h.sub.ik+h.sub.kj),k=1,2, . . . ,n,k≠i,k≠j (3)
[0067] For a hop count matrix without missing entries, when h.sub.ij≠1, h.sub.ij=min(h.sub.ik+h.sub.kj), k=1, 2, . . . , n, k≠i, k≠j. However, for a hop count matrix with missing entries, the relationship between {tilde over (h)}.sub.ij and min({tilde over (h)}.sub.ik+{tilde over (h)}.sub.kj) is not clear. Therefore, the relationship between the two can be learned through a decision tree classifier.
[0068] Therefore, for all other nodes k, k=1, 2, . . . , n in the network, if the hop count {tilde over (h)}.sub.ik from node i to node k and the hop count {tilde over (h)}.sub.kj from node k to node j are observed, a sum of the two hop counts is calculated. All the nodes in the network are traversed to obtain a minimum value as a first feature of a training sample:
f.sup.1=min({tilde over (h)}.sub.ik+{tilde over (h)}.sub.kj).
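A minimal sketch of computing this first feature, assuming missing entries are stored as 0; `first_feature` is an illustrative helper name, and the fallback value of 0 when no relay node k has both hop counts observed is an assumption of this sketch, not specified in the embodiment:

```python
# First feature f1 = min over relay nodes k of (h_ik + h_kj), using only
# relays whose hop counts to both i and j are observed (nonzero).
def first_feature(H_tilde, i, j):
    n = len(H_tilde)
    sums = [H_tilde[i][k] + H_tilde[k][j]
            for k in range(n)
            if k not in (i, j) and H_tilde[i][k] != 0 and H_tilde[k][j] != 0]
    return min(sums) if sums else 0  # fallback 0 is an assumption
```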
[0069] S203: Due to the limited prediction capability of a single feature, construct neighbor information of a node pair as an additional feature, so that the two features jointly classify an unknown hop count. The hop count between node i and node j is closely related to the hop counts from neighboring nodes of node i to node j and the hop counts from neighboring nodes of node j to node i. Because the neighboring nodes are distributed around the node, the values of the hop counts from the neighboring nodes of node i to node j can be [{tilde over (h)}.sub.ij−1, {tilde over (h)}.sub.ij, {tilde over (h)}.sub.ij+1]. Therefore, when the node distribution is sufficiently even and no neighboring node is missing, the hop count between node i and node j can be obtained by calculating the hop counts from the neighboring nodes of node i to node j. However, because the node distribution in the network is not completely even and some neighboring nodes may be missing, this relationship also needs to be learned by the decision tree classifier. An average value of the hop counts from the neighboring nodes of node i to node j and the hop counts from the neighboring nodes of node j to node i is calculated to serve as a second feature of the training sample.
[0070] Specifically, two neighbor lists L.sub.i and L.sub.j are initialized, neighbors of node i are selected according to a hop count vector {tilde over (h)}.sub.i, and neighbors of node j are selected according to a hop count vector {tilde over (h)}.sub.j. Indexes of neighboring nodes of node i are stored in the neighbor list L.sub.i, and indexes of neighboring nodes of node j are stored in the neighbor list L.sub.j. Variables n.sub.i and n.sub.j respectively represent the numbers of available neighboring nodes of node i and node j. If a hop count {tilde over (h)}.sub.L.sub.
[0071] The second feature of the training sample is calculated as shown in equation (4):
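Since the body of equation (4) is not reproduced above, the following is a hedged sketch of the neighbor-average second feature as described in paragraphs [0069]-[0070], assuming that a "neighbor" is any node at an observed hop count of 1, that missing entries are stored as 0, and that the fallback of 0 when no such hop count is observed is an assumption of this sketch:

```python
# Second feature: average of observed hop counts from i's neighbors to j
# and from j's neighbors to i (the neighbor lists L_i and L_j of [0070]).
def second_feature(H_tilde, i, j):
    n = len(H_tilde)
    L_i = [k for k in range(n) if k != i and H_tilde[i][k] == 1]  # neighbors of i
    L_j = [k for k in range(n) if k != j and H_tilde[j][k] == 1]  # neighbors of j
    vals = [H_tilde[k][j] for k in L_i if H_tilde[k][j] != 0]
    vals += [H_tilde[k][i] for k in L_j if H_tilde[k][i] != 0]
    return sum(vals) / len(vals) if vals else 0  # fallback 0 is an assumption
```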
[0072] S204: Use the observed hop count as a class, and form a training sample by using the class together with the constructed first feature and second feature of the sample, where the training sample is expressed as follows:
[0073] The training sample is added to the training sample set.
[0074] S205: Traverse the hop count matrix {tilde over (H)}, and obtain the training sample set after the traversing is finished, where the training sample set is expressed as formula (5):
S=[s.sub.1,s.sub.2, . . . ,s.sub.N]=[(f.sub.1.sup.1,f.sub.1.sup.2,c.sub.1);(f.sub.2.sup.1,f.sub.2.sup.2,c.sub.2); . . . (f.sub.N.sup.1,f.sub.N.sup.2,c.sub.N)], (5)
wherein f.sub.m.sup.1 represents a first feature of an m.sup.th training sample, f.sub.m.sup.2 represents a second feature of the m.sup.th training sample, and c.sub.m∈{1, 2, . . . , h.sub.max} represents a class of the m.sup.th training sample; m=1, 2, . . . , N, and N is the number of training samples.
[0075] The decision tree classifier is trained by using the obtained training sample set.
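The training step can be sketched with an off-the-shelf decision tree. The following minimal illustration uses scikit-learn, which is one possible implementation rather than one mandated by the embodiment; each training sample is a (f1, f2, class) triple as in formula (5), and all values below are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative training set S = [(f1, f2, c), ...] in the shape of formula (5).
S = [
    (2, 1.0, 1), (2, 2.0, 2), (3, 2.5, 2),
    (4, 3.0, 3), (4, 4.0, 4), (5, 4.5, 4),
]
X = [(f1, f2) for f1, f2, _ in S]  # feature pairs
y = [c for _, _, c in S]           # observed hop counts as class labels
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
```

The trained `clf` then predicts an integer class, i.e. a hop count, for each unknown sample in step S4.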
[0076] In a specific embodiment, a feature is constructed for each unobserved hop count, to obtain unknown samples; and the hop count matrix {tilde over (H)} is traversed again, to find missing hop counts in the hop count matrix {tilde over (H)}. Specifically:
[0077] S401: If a hop count at a certain position in the hop count matrix {tilde over (H)} is not observed, perform a next step; otherwise, traverse a next value in the hop count matrix {tilde over (H)}.
[0078] S402: For each missing hop count, construct a first feature of an unknown sample: with respect to all other nodes k, k=1, 2, . . . , n in the network, if a hop count {tilde over (h)}.sub.ik from node i to node k and a hop count {tilde over (h)}.sub.kj from node k to node j are observed, calculate a sum of the two hop counts. All the nodes in the network are traversed to obtain a minimum value as the first feature of the unknown sample:
f′.sup.1=min({tilde over (h)}.sub.ik+{tilde over (h)}.sub.kj).
[0079] S403: Initialize two neighbor lists L.sub.i and L.sub.j, select neighbors of node i according to a hop count vector {tilde over (h)}.sub.i, and select neighbors of node j according to a hop count vector {tilde over (h)}.sub.j. Indexes of neighboring nodes of node i and indexes of neighboring nodes of node j are stored in the neighbor lists L.sub.i and L.sub.j respectively. Variables n.sub.i and n.sub.j respectively represent the numbers of available neighboring nodes of node i and node j. If a hop count {tilde over (h)}.sub.L.sub.
[0080] S404: Form an unknown sample by using the constructed first feature and second feature of the unknown sample, where the unknown sample is expressed as follows:
[0081] The unknown sample is inputted to the trained decision tree classifier to obtain a class of the unknown sample, that is, obtain the hop count at the position.
[0082] S405: If traversing of the hop count matrix {tilde over (H)} is not finished, traverse a next value in the matrix and return to step S401; and if traversing of the hop count matrix {tilde over (H)} is finished, obtain the recovered hop count matrix.
[0083] In this embodiment, features are constructed for all missing values in the hop count matrix {tilde over (H)}, and then are classified by using the decision tree classifier, to obtain the hop count at each position with the missing value, thereby obtaining the recovered complete hop count matrix.
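The S401-S405 traversal can be sketched end to end as follows. This is a minimal illustration using only the first feature and a caller-supplied `predict` stand-in for the trained classifier; the actual method uses both features and the trained decision tree:

```python
import numpy as np

# For every unobserved entry (stored as 0), build a feature and ask the
# classifier for the missing hop count; observed entries are kept as-is.
def recover(H_tilde, predict):
    n = H_tilde.shape[0]
    H = H_tilde.copy()
    for i in range(n):
        for j in range(n):
            if i != j and H_tilde[i, j] == 0:  # missing hop count
                sums = [H_tilde[i, k] + H_tilde[k, j] for k in range(n)
                        if k not in (i, j) and H_tilde[i, k] and H_tilde[k, j]]
                f1 = min(sums) if sums else 0   # first feature (assumed fallback 0)
                H[i, j] = predict(f1)           # classifier output = hop count
    return H
```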
[0084] An application scenario of this embodiment is described below, to reflect the advantage of the method in this embodiment over other methods; the parameters in the experiments are not fixed. The performance of this embodiment is compared with that of a method of hop count matrix recovery based on a naive Bayes classifier (HCMR-NBC) and that of a singular value thresholding (SVT) method. For ease of description, the contrast solutions and the method of hop count matrix recovery based on decision tree (HCMR-DT) in this embodiment are all denoted by abbreviations in the figures. Normalized root mean square errors (NRMSEs) of the recovered hop count matrix
[0085] A 100×100 square monitored area containing 20 anchor nodes is considered, where the nodes are randomly distributed in the monitored area. The communication radii of the anchor nodes and unknown nodes are set to 20. In the experiments, effects of observation ratio, matrix dimension (obtained by varying the node density) and network topology (specifically, uniform network, O-type network and S-type network) on the performance of the algorithm were considered. In addition, to reduce random errors, all the experiments were repeated 100 times to obtain final results.
[0086] Effects of different matrix dimensions on the reconstruction performance of the algorithm were first compared experimentally.
[0087] As can be learned from
[0088] This embodiment also compares the effect of observation ratio on the reconstruction performance of the algorithm, and
[0089] In addition, the adaptability of the method described in this embodiment to different networks is also demonstrated. Comparison was carried out on topological networks with uniform node distribution, S-type node distribution and O-type node distribution.
[0090] The hop count matrix recovered by using the method proposed in this embodiment has a wide range of applications in the field of IoT. In order to verify the significance of hop count recovery for IoT applications, the effectiveness of hop count matrix recovery is verified here with localization as a practical scenario. The classical DV-hop algorithm is used as the localization algorithm. The hop count matrices recovered by using the solution of this embodiment and the other two contrast solutions, as well as the hop count matrix without missing entries, are used for localization at the same time, and the average localization error of unknown nodes is evaluated using the NRMSE of localization, which is calculated as shown in equation (8):
wherein ({circumflex over (x)}.sub.k, ŷ.sub.k) represents an estimated position of an unknown node k, (x.sub.k,y.sub.k) represents a real position of the unknown node k. N.sub.u represents the number of unknown nodes, and R is the communication radius of each node.
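The body of equation (8) is not reproduced above. Based on the symbol definitions, a hedged reconstruction of the localization NRMSE is the root-mean-square position error over the N.sub.u unknown nodes, normalized by the communication radius R:

```python
import math

# Assumed reconstruction of equation (8):
# NRMSE = sqrt((1/N_u) * sum_k ((x̂_k - x_k)^2 + (ŷ_k - y_k)^2)) / R
def localization_nrmse(estimated, real, R):
    N_u = len(real)                     # number of unknown nodes
    mse = sum((xh - x) ** 2 + (yh - y) ** 2
              for (xh, yh), (x, y) in zip(estimated, real)) / N_u
    return math.sqrt(mse) / R           # normalized by communication radius
```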
[0092] It can be seen that the method proposed in this embodiment maintains the best localization performance throughout. In addition, the differences in localization performance of the recovered hop count matrices at different matrix dimensions were compared with an observation ratio of 40%. With reference to
Embodiment 2
[0093] A computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following method steps when executing the computer program:
[0094] S1: Perform a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries.
[0095] S2: Construct a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and model the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents the number of classes.
[0096] S3: Train a decision tree classifier according to the training sample set obtained in step S2.
[0097] S4: Construct features for unobserved hop counts, to obtain unknown samples.
[0098] S5: Input the unknown samples to the trained decision tree classifier, to obtain classes of the unknown samples, that is, obtain missing hop counts at corresponding positions in the matrix, to recover a complete hop count matrix H.
Embodiment 3
[0099] A computer-readable storage medium storing a computer program is provided, where the following method steps are implemented when the computer program is executed by a processor:
[0100] S1: Perform a flooding process to acquire a hop count matrix {tilde over (H)} with missing entries.
[0101] S2: Construct a training sample set according to relationships between a part of observed hop counts in the hop count matrix {tilde over (H)}, and model the observed hop counts in the hop count matrix as labels of the training sample set, where a maximum hop count represents the number of classes.
[0102] S3: Train a decision tree classifier according to the training sample set obtained in step S2.
[0103] S4: Construct features for unobserved hop counts, to obtain unknown samples.
[0104] S5: Input the unknown samples to the trained decision tree classifier, to obtain classes of the unknown samples, that is, obtain missing hop counts at corresponding positions in the matrix, to recover a complete hop count matrix H.
[0105] It is apparent that the above embodiments are merely intended to describe the present disclosure clearly, rather than to limit the implementations of the present disclosure. Any modification, equivalent substitution and improvement made within the spirit and principle of the present disclosure should fall within the protection scope of the claims of the present disclosure.