TWO-STAGE FREQUENCY SELECTION METHOD AND DEVICE FOR MICROWAVE FREQUENCY SWEEP DATA
20230048665 · 2023-02-16
Inventors
- Zhenbo WEI (Hangzhou, CN)
- Jinyang ZHANG (Hangzhou, CN)
- Jun Wang (Hangzhou, CN)
- Dongdong DU (Hangzhou, CN)
- Shaoming CHENG (Hangzhou, CN)
Cpc classification
G06F18/21
PHYSICS
Abstract
Disclosed is a two-stage frequency selection method and device for microwave frequency sweep data. The method includes: acquiring microwave frequency sweep data; performing frequency selection on the microwave frequency sweep data by using a random forest-recursive feature elimination algorithm, taking a preset parameter in the random forest-recursive feature elimination algorithm as a hyper-parameter, changing the value of the hyper-parameter, and generating a series of candidate frequency subsets within different frequencies; building prediction models on the basis of the frequency sweep data corresponding to the candidate frequency subsets of different frequencies; evaluating the performance of each prediction model by means of 10-fold cross validation, and calculating evaluation index values of model performance; and taking the evaluation indexes as a voting basis, and selecting an optimal frequency subset by using a majority voting method.
Claims
1. A two-stage frequency selection method for microwave frequency sweep data, comprising: acquiring microwave frequency sweep data; normalizing the microwave frequency sweep data, and then dividing out an attenuation training data set and a phase shift training data set, wherein the two data sets exist in the form of a data table, the vertical direction of the data table represents a frequency domain {f.sub.1, f.sub.2, K, f.sub.i, K, f.sub.n}, the horizontal direction represents a sample domain {X.sub.1, X.sub.2, K, X.sub.j, K, X.sub.m}, and the corresponding data elements are attenuation values A or phase shift values Phi; by using a random forest-recursive feature elimination algorithm, performing frequency selection on the microwave frequency sweep data, taking a preset parameter in the random forest-recursive feature elimination algorithm as a hyper-parameter, changing the value of the hyper-parameter, and generating a series of candidate frequency subsets within different frequencies, wherein the step comprises: (2.1) training, by using the random forest algorithm, a sample attribute prediction model on the attenuation training data set; (2.2) obtaining the importance of attenuation features corresponding to each frequency, sorting the frequencies according to the importance of features, and finding out frequencies with the lowest importance of the corresponding features; (2.3) removing attenuation feature data corresponding to the frequencies with the lowest importance of the corresponding attenuation features from the attenuation training data set, and retraining the sample attribute prediction model on the updated attenuation training data set by using the random forest algorithm; (2.4) repeating steps (2.2) and (2.3) until only the data corresponding to PreNum frequencies remain in the attenuation training data set, and recording the set consisting of the PreNum frequencies as a frequency set F.sub.A; (2.5) training a sample attribute prediction model on 
the phase shift training data set by using the random forest algorithm; (2.6) obtaining the importance of phase shift features corresponding to each frequency, sorting the frequencies according to the importance of features, and finding out frequencies with the lowest importance of the corresponding features; (2.7) removing phase shift feature data corresponding to the frequencies with the lowest importance of the corresponding phase shift features from the phase shift training data set, and retraining the sample attribute prediction model on the updated phase shift training data set by using the random forest algorithm; (2.8) repeating steps (2.6) and (2.7) until only the data corresponding to PreNum frequencies remains in the phase shift training data set, and recording the set consisting of the PreNum frequencies as a frequency set F.sub.P; (2.9) taking the intersection of the frequency set F.sub.A and the frequency set F.sub.Pto obtain a candidate frequency subset F.sub.sub; and (2.10) changing the value of the preset parameter PreNum of the random forest-recursive feature elimination algorithm, and repeating steps (2.1) to (2.9) to obtain a series of candidate frequency subsets within different frequencies; building prediction models on the basis of the frequency sweep data corresponding to the candidate frequency subsets of different frequencies, wherein this step comprises: each candidate frequency subset corresponding to a frequency sequence number subset, extracting corresponding data from the attenuation training data set and the phase shift training data set respectively by using the frequency sequence number subsets, and combining the two parts of data into attenuation-phase shift frequency sweep data sets; and taking each attenuation-phase shift frequency sweep data set as input data and sample attribute values as output data, and building prediction models for the sample attribute values by using learning algorithms; wherein each candidate frequency 
subset corresponding to a frequency sequence number subset, extracting corresponding data from the attenuation training data set and the phase shift training data set respectively by using the frequency sequence number subsets, and combining the two parts of data into attenuation-phase shift frequency sweep data sets, which comprises: (4.1) searching for the sequence number of each frequency in the candidate frequency subset in the normalized attenuation frequency sweep data set or phase shift frequency sweep data set to form a frequency sequence number subset; (4.2) repeating step (4.1) until the frequency sequence number subset corresponding to each candidate frequency subset in step (3) is obtained; (4.3) extracting corresponding data from the attenuation training data set according to the frequency sequence number subset; (4.4) extracting corresponding data from the phase shift training data set according to the frequency sequence number subset; (4.5) vertically splicing the two parts of data extracted from the attenuation training data set and the phase shift training data set respectively to obtain an attenuation-phase shift frequency sweep data set corresponding to the candidate frequency subset; and (4.6) repeating steps (4.3)-(4.5) until a corresponding attenuation-phase shift frequency sweep data set is obtained for each candidate frequency subset; evaluating the performance of each prediction model by means of 10 fold cross validation, and calculating evaluation index values of model performance; and taking the evaluation indexes as a voting basis, and selecting an optimal frequency subset by using a majority voting method, which comprises: (6.1) using R.sup.2 as an index of the voting basis, selecting top k models with the maximum R.sup.2 value under each of T algorithms, obtaining a frequency subset corresponding to each model, and selecting a frequency subset with the most votes by using the majority voting method on the T×k candidate results, denoted 
as F.sub.opt.sup.R.sup.2.
2. The two-stage frequency selection method for microwave frequency sweep data according to claim 1, wherein taking the evaluation indexes as a voting basis, and selecting an optimal frequency subset by using a majority voting method comprises: taking the evaluation indexes as a voting basis, by using the majority voting method, selecting an optimal prediction model, obtaining an attenuation-phase shift frequency sweep data set corresponding to the optimal prediction model, and then obtaining a frequency subset corresponding to the attenuation-phase shift frequency sweep data set, namely the optimal frequency subset.
Description
BRIEF DESCRIPTION OF DRAWINGS
[0054] The accompanying drawings described herein are used to provide further understanding of the present disclosure and constitute a part of the present disclosure. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute improper limitations of the present disclosure. In the drawings:
DESCRIPTION OF EMBODIMENTS
[0061] In order to make the objectives, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without any creative efforts shall fall within the protection scope of the present disclosure.
Embodiment 1
[0062]
[0063] Step S102, microwave frequency sweep data is acquired;
[0064] In the embodiment, the test device shown in
[0065] Step S103, after the microwave frequency sweep data is acquired, the method further includes:
[0066] The microwave frequency sweep data is normalized, and then an attenuation training data set and a phase shift training data set are divided out.
[0067] In an embodiment, z-score normalization is performed on the original attenuation frequency sweep data set A.sub.original and the original phase shift frequency sweep data set P.sub.original, and the specific formula is as follows:
$$x^{*} = \frac{x - m}{s}$$
[0068] In the formula, x* is the normalized data, x is the original data, m represents the mean value of the data, and s represents the standard deviation of the data. Normalized frequency sweep data sets A.sub.normalization and P.sub.normalization are obtained; 70% of the frequency sweep data is randomly divided out from A.sub.normalization and combined into an attenuation training data set A.sub.training; and 70% of the frequency sweep data is randomly divided out from P.sub.normalization and combined into a phase shift training data set P.sub.training.
[0069] Both the attenuation frequency sweep data set and the phase shift frequency sweep data set exist in the form of a data table, the vertical direction of the data table represents a frequency domain {f.sub.1, f.sub.2, ..., f.sub.i, ..., f.sub.n}, the horizontal direction represents a sample domain {X.sub.1, X.sub.2, ..., X.sub.j, ..., X.sub.m}, and the corresponding data elements are attenuation values A or phase shift values Phi.
[0070] The data normalization belongs to the category of data non-dimensionalization. The effect of step S103 is to convert data of different specifications to the same specification, which will help model training.
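The normalization and training split of step S103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the z-score is applied per frequency (per table row), the population standard deviation is used, and the helper names zscore_rows, split_training_columns and take_columns are hypothetical. Reusing one set of training columns for both tables is also an assumption, made so that attenuation and phase shift rows stay aligned by sample.

```python
import random
import statistics


def zscore_rows(table):
    """Normalize each row (one frequency) to zero mean and unit standard deviation."""
    normalized = []
    for row in table:
        m = statistics.fmean(row)
        s = statistics.pstdev(row)  # assumption: population standard deviation
        # Constant rows have s == 0; map them to zeros instead of dividing by zero.
        normalized.append([(x - m) / s if s > 0 else 0.0 for x in row])
    return normalized


def split_training_columns(n_samples, fraction=0.7, seed=0):
    """Randomly choose the sample (column) indices that form the training set."""
    rng = random.Random(seed)
    k = round(fraction * n_samples)
    return sorted(rng.sample(range(n_samples), k))


def take_columns(table, columns):
    """Keep only the chosen sample columns of a frequency x sample table."""
    return [[row[j] for j in columns] for row in table]


# Toy attenuation table: 3 frequencies (rows) x 10 samples (columns).
A_original = [[float(i + j) for j in range(10)] for i in range(3)]
A_normalization = zscore_rows(A_original)
train_cols = split_training_columns(n_samples=10)
A_training = take_columns(A_normalization, train_cols)
```

The same train_cols list would be passed to take_columns for the phase shift table, and for the sample attribute values, so that every training column refers to the same physical sample.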
[0071] Step S104, frequency selection is performed on the microwave frequency sweep data by using a random forest-recursive feature elimination algorithm, a preset parameter in the random forest-recursive feature elimination algorithm is taken as a hyper-parameter, the value of the hyper-parameter is changed, and a series of candidate frequency subsets within different frequencies are generated;
[0072] In an embodiment, feature selection is performed on the attenuation training data set and the phase shift training data set respectively by using the random forest-recursive feature elimination algorithm to obtain a frequency set selected on the basis of the attenuation training data set and a frequency set selected on the basis of the phase shift training data set, the intersection of the two frequency sets is taken to obtain a candidate frequency subset, as shown in
[0073] Further, the specific process of this step is shown in
[0074] (2.1) A sample attribute prediction model is trained on the attenuation training data set by using the random forest algorithm.
[0075] (2.2) The importance of attenuation features corresponding to each frequency is obtained, the frequencies are sorted according to the importance of features, and frequencies with the lowest importance of the corresponding features are found out.
[0076] (2.3) Attenuation feature data corresponding to the frequencies with the lowest importance of the corresponding attenuation features is removed from the attenuation training data set, and the sample attribute prediction model is retrained on the updated attenuation training data set by using the random forest algorithm.
[0077] (2.4) Steps (2.2) and (2.3) are repeated until only the data corresponding to PreNum frequencies remain in the attenuation training data set, and the set consisting of the PreNum frequencies is recorded as a frequency set F.sub.A.
[0078] (2.5) A sample attribute prediction model is trained on the phase shift training data set by using the random forest algorithm.
[0079] (2.6) The importance of phase shift features corresponding to each frequency is obtained, the frequencies are sorted according to the importance of features, and frequencies with the lowest importance of the corresponding features are found out.
[0080] (2.7) Phase shift feature data corresponding to the frequencies with the lowest importance of the corresponding phase shift features is removed from the phase shift training data set, and the sample attribute prediction model is retrained on the updated phase shift training data set by using the random forest algorithm.
[0081] (2.8) Steps (2.6) and (2.7) are repeated until only the data corresponding to PreNum frequencies remain in the phase shift training data set, and the set consisting of the PreNum frequencies is recorded as a frequency set F.sub.P.
[0082] (2.9) The intersection of the frequency set F.sub.A and the frequency set F.sub.P is taken to obtain a candidate frequency subset F.sub.sub.
[0083] (2.10) The value of the preset parameter PreNum of the random forest-recursive feature elimination algorithm is changed, and steps (2.1) to (2.9) are repeated to obtain a series of candidate frequency subsets within different frequencies.
[0084] The effect of step S104 is, on the basis of the attenuation training data set and the phase shift training data set obtained in step S103, generating candidate frequency subsets by using the random forest-recursive feature elimination algorithm.
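Steps (2.1) to (2.10) can be sketched as the loop below. It is an illustrative sketch under assumptions rather than the embodiment's code: one frequency is removed per iteration, the function names are hypothetical, and the importance measure is passed in as a callable. rf_importance shows how random forest importances could be supplied if scikit-learn is installed; the small demo instead uses abs_covariance_importance, a stand-in that is not a random forest and exists only so the example runs without third-party packages.

```python
def rf_importance(X, y):
    """Random forest feature importances (requires scikit-learn; not used in the demo)."""
    from sklearn.ensemble import RandomForestRegressor
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    return list(model.feature_importances_)


def abs_covariance_importance(X, y):
    """Stand-in importance for a dependency-free demo (NOT a random forest)."""
    n = len(y)
    y_mean = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (t - y_mean) for c, t in zip(col, y)) / n
        scores.append(abs(cov))
    return scores


def recursive_elimination(table, y, freqs, pre_num, importance_fn):
    """Steps (2.1)-(2.4): drop the least important frequency until pre_num remain.

    table has one row per frequency and one column per sample.
    """
    rows = list(table)
    kept = list(freqs)
    while len(kept) > pre_num:
        X = [list(col) for col in zip(*rows)]  # transpose to samples x frequencies
        importance = importance_fn(X, y)       # retrained on the reduced data each pass
        worst = min(range(len(kept)), key=lambda i: importance[i])
        del rows[worst]
        del kept[worst]
    return set(kept)


def candidate_subsets(A_train, P_train, y, freqs, pre_nums, importance_fn):
    """Steps (2.9)-(2.10): intersect F_A and F_P for each value of PreNum."""
    subsets = {}
    for pre_num in pre_nums:
        F_A = recursive_elimination(A_train, y, freqs, pre_num, importance_fn)
        F_P = recursive_elimination(P_train, y, freqs, pre_num, importance_fn)
        subsets[pre_num] = sorted(F_A & F_P)
    return subsets


# Toy data: 4 frequencies (hypothetical values) x 5 samples.
freqs = [2.0, 2.5, 3.0, 3.5]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
A_train = [[1, 2, 3, 4, 5], [0, 0, 0, 0, 1], [2, 4, 6, 8, 10], [5, 5, 5, 5, 5]]
P_train = [[5, 5, 5, 5, 5], [1, 2, 3, 4, 5], [3, 6, 9, 12, 15], [0, 0, 0, 0, 1]]
subsets = candidate_subsets(A_train, P_train, y, freqs, [2, 3], abs_covariance_importance)
```

Passing rf_importance in place of abs_covariance_importance would bring the sketch closer to the random forest-recursive feature elimination described above. Because the intersection can be smaller than PreNum, different PreNum values may yield candidate subsets of different sizes.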
[0085] Step S105, prediction models are built on the basis of the frequency sweep data corresponding to the candidate frequency subsets of different frequencies.
[0086] In an embodiment, this step includes two sub-steps:
[0087] Step S1051, each candidate frequency subset corresponds to a frequency sequence number subset, corresponding data is extracted from the attenuation training data set and the phase shift training data set respectively by using the frequency sequence number subsets, and the two parts of data are combined into attenuation-phase shift frequency sweep data sets; specifically, this step includes:
[0088] (4.1) The sequence number of each frequency in the candidate frequency subset is searched in the normalized attenuation frequency sweep data set or phase shift frequency sweep data set to form a frequency sequence number subset.
[0089] (4.2) Step (4.1) is repeated until the frequency sequence number subset corresponding to each candidate frequency subset in step (3) is obtained.
[0090] (4.3) Corresponding data is extracted from the attenuation training data set according to the frequency sequence number subset.
[0091] (4.4) Corresponding data is extracted from the phase shift training data set according to the frequency sequence number subset.
[0092] (4.5) The two parts of data extracted from the attenuation training data set and the phase shift training data set respectively are vertically spliced to obtain an attenuation-phase shift frequency sweep data set corresponding to the candidate frequency subset.
[0093] (4.6) Steps (4.3)-(4.5) are repeated until a corresponding attenuation-phase shift frequency sweep data set is obtained for each candidate frequency subset.
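Steps (4.1) to (4.6) amount to a row lookup followed by a vertical concatenation, which can be sketched as below. The function names are hypothetical, and the use of 1-based sequence numbers is an assumption made for illustration; the tables are taken to have one row per frequency and one column per sample, as described for step S103.

```python
def sequence_numbers(all_freqs, subset):
    """Step (4.1): 1-based row numbers of the subset frequencies in the full table."""
    position = {f: i + 1 for i, f in enumerate(all_freqs)}
    return [position[f] for f in subset]


def splice(A_train, P_train, seq_numbers):
    """Steps (4.3)-(4.5): extract the rows from both tables and stack them vertically."""
    a_rows = [A_train[k - 1] for k in seq_numbers]
    p_rows = [P_train[k - 1] for k in seq_numbers]
    return a_rows + p_rows  # attenuation rows first, then phase shift rows


# Toy tables: 4 frequencies (rows) x 3 samples (columns), hypothetical values.
all_freqs = [2.0, 2.5, 3.0, 3.5]
A_train = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
P_train = [[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20, 21]]

candidates = {"F_sub1": [3.0], "F_sub2": [2.5, 3.0]}
# Steps (4.2) and (4.6): repeat for every candidate frequency subset.
datasets = {
    name: splice(A_train, P_train, sequence_numbers(all_freqs, subset))
    for name, subset in candidates.items()
}
```

A candidate subset with k frequencies therefore yields a data set with 2k rows, one attenuation row and one phase shift row per selected frequency.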
[0094] Step S1052, each attenuation-phase shift frequency sweep data set is taken as input data, sample attribute values are taken as output data, and prediction models are built for the sample attribute values by using learning algorithms.
[0095] In an embodiment, as shown in
[0096] The effect of step S105 is, on the basis of the generated candidate frequency subsets, combining the obtained original microwave frequency sweep data into corresponding attenuation-phase shift frequency sweep data sets, and then building models by using different regression algorithms.
[0097] Step S106, the performance of each prediction model is evaluated by means of 10-fold cross validation, and evaluation index values of model performance are calculated.
[0098] In an embodiment, as shown in
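[0099] The determination coefficient R.sup.2 is:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}$$
[0100] The RMSE is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}$$
[0101] The MAE is:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|$$
[0102] Where y.sub.i is the real moisture content of a corn sample, ŷ.sub.i is the predicted value of the moisture content of the corn sample, ȳ is the mean of the real moisture contents, and n is the number of samples.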
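The evaluation of step S106 can be sketched as follows, using the standard definitions of R.sup.2, RMSE and MAE. Several points are assumptions for illustration: the function names, the way folds are formed, and the choice to compute each index once on the pooled out-of-fold predictions rather than averaging per fold. The fit_predict argument is a hypothetical callable standing in for any of the learning algorithms, and least_squares_1d is a toy model used only to exercise the loop. Here X has one row per sample, so a frequency x sample data set would be transposed first.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_predictions(fit_predict, X, y, k=10, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    y_pred = [None] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    return y_pred


def least_squares_1d(X_train, y_train, X_test):
    """Toy single-feature linear regression, used only to exercise the loop."""
    xs = [row[0] for row in X_train]
    mx, my = sum(xs) / len(xs), sum(y_train) / len(y_train)
    slope = sum((x - mx) * (t - my) for x, t in zip(xs, y_train)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [my + slope * (row[0] - mx) for row in X_test]


# Noise-free toy data: 20 samples, one feature.
X = [[float(i)] for i in range(20)]
y = [2.0 * i + 1.0 for i in range(20)]
y_hat = k_fold_predictions(least_squares_1d, X, y, k=10)
scores = {"R2": r2_score(y, y_hat), "RMSE": rmse(y, y_hat), "MAE": mae(y, y_hat)}
```

Running the same loop for every combination of learning algorithm and candidate data set gives the table of index values on which the subsequent voting is based.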
[0103] Step S110, the evaluation indexes are taken as a voting basis, and an optimal frequency subset is selected by using a majority voting method.
[0104] In an embodiment, the evaluation indexes are taken as a voting basis, an optimal prediction model is selected by using the majority voting method, an attenuation-phase shift frequency sweep data set corresponding to the optimal prediction model is obtained, and then a frequency subset corresponding to the attenuation-phase shift frequency sweep data set, that is, the optimal frequency subset, is obtained. More specifically, this step includes:
[0105] (6.1) In the embodiment, R.sup.2 is first used as the index of the voting basis. The top 5 models with the maximum R.sup.2 values are selected under each algorithm, the frequency subset sequence number corresponding to each model is obtained, and the frequency subset with the most votes is selected by applying the majority voting method (MVM) to the 6×5 candidate results. As shown in Table 1, the 3.sup.rd frequency subset F.sub.sub3 obtains the most votes.
[0106] (6.2) Then RMSE is used as the index of the voting basis. The top 5 models with the minimum RMSE values are selected under each algorithm, the frequency subset sequence number corresponding to each model is obtained, and the frequency subset with the most votes is selected by applying the MVM to the 6×5 candidate results. As shown in Table 1, the 3.sup.rd and 4.sup.th frequency subsets F.sub.sub3 and F.sub.sub4 obtain the most votes at the same time.
[0107] (6.3) Finally, MAE is used as the index of the voting basis. The top 5 models with the minimum MAE values are selected under each algorithm, the frequency subset sequence number corresponding to each model is obtained, and the frequency subset with the most votes is selected by applying the MVM to the 6×5 candidate results. As shown in Table 1, the 3.sup.rd and 4.sup.th frequency subsets F.sub.sub3 and F.sub.sub4 again obtain the most votes at the same time.
[0108] (6.4) The optimal frequency set is selected after the three rounds of voting for the following reasons:
[0109] 1. The frequency subset F.sub.sub3 is selected as the optimal frequency set under the three evaluation indexes.
[0110] 2. The frequency subset F.sub.sub3 involves fewer measurement frequencies than the frequency subset F.sub.sub4.
[0111] Therefore, the frequency subset F.sub.sub3 is selected as the final optimal frequency set.
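TABLE 1 Results of selecting optimal frequency sets from candidate frequency subsets by using the voting method MVM. Columns Top 1 to Top 5 give the sequence numbers of the top 5 frequency subsets.
Evaluation index | Regression algorithm | Top 1 | Top 2 | Top 3 | Top 4 | Top 5 | Optimal frequency subset sequence number
R.sup.2 | MLR | 4 | 12 | 8 | 3 | 6 | 3
R.sup.2 | SVM | 4 | 3 | 5 | 2 | 1 |
R.sup.2 | RF | 5 | 16 | 17 | 12 | 3 |
R.sup.2 | AdaBoost | 16 | 5 | 7 | 1 | 10 |
R.sup.2 | XGBoost | 2 | 1 | 3 | 4 | 17 |
R.sup.2 | DNN | 8 | 14 | 4 | 3 | 7 |
RMSE | MLR | 4 | 12 | 8 | 3 | 6 | 3.sup.a, 4.sup.a
RMSE | SVM | 4 | 3 | 5 | 2 | 1 |
RMSE | RF | 5 | 3 | 17 | 12 | 4 |
RMSE | AdaBoost | 5 | 7 | 1 | 16 | 10 |
RMSE | XGBoost | 2 | 1 | 3 | 4 | 17 |
RMSE | DNN | 8 | 14 | 7 | 4 | 3 |
MAE | MLR | 12 | 7 | 11 | 6 | 8 | 3.sup.a, 4.sup.a
MAE | SVM | 5 | 4 | 3 | 6 | 2 |
MAE | RF | 3 | 5 | 17 | 4 | 14 |
MAE | AdaBoost | 5 | 16 | 7 | 10 | 1 |
MAE | XGBoost | 2 | 1 | 4 | 3 | 17 |
MAE | DNN | 14 | 7 | 8 | 3 | 4 |
.sup.a indicates that the frequency subset obtains the same number of votes as the other frequency subset.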
[0112] The effect of step S110 is to complete the selection of the optimal frequency set by using the majority voting method (MVM).
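The vote counting of steps (6.1) to (6.3) can be reproduced from the sequence numbers listed in Table 1. The sketch below is illustrative: the dictionary layout and the function name majority_vote are assumptions, and taking the subsets that win under all three indexes is only one way to express reason 1 of step (6.4). The sizes of the candidate subsets are not listed in Table 1, so the tie-break on the number of measurement frequencies is not reproduced.

```python
from collections import Counter

# Top-5 frequency subset sequence numbers per algorithm, copied from Table 1.
TOP5 = {
    "R2": {
        "MLR": [4, 12, 8, 3, 6],
        "SVM": [4, 3, 5, 2, 1],
        "RF": [5, 16, 17, 12, 3],
        "AdaBoost": [16, 5, 7, 1, 10],
        "XGBoost": [2, 1, 3, 4, 17],
        "DNN": [8, 14, 4, 3, 7],
    },
    "RMSE": {
        "MLR": [4, 12, 8, 3, 6],
        "SVM": [4, 3, 5, 2, 1],
        "RF": [5, 3, 17, 12, 4],
        "AdaBoost": [5, 7, 1, 16, 10],
        "XGBoost": [2, 1, 3, 4, 17],
        "DNN": [8, 14, 7, 4, 3],
    },
    "MAE": {
        "MLR": [12, 7, 11, 6, 8],
        "SVM": [5, 4, 3, 6, 2],
        "RF": [3, 5, 17, 4, 14],
        "AdaBoost": [5, 16, 7, 10, 1],
        "XGBoost": [2, 1, 4, 3, 17],
        "DNN": [14, 7, 8, 3, 4],
    },
}


def majority_vote(top_lists):
    """Return every subset number that ties for the most votes over the 6x5 results."""
    votes = Counter(n for ranked in top_lists.values() for n in ranked)
    best = max(votes.values())
    return sorted(n for n, v in votes.items() if v == best)


winners = {index: majority_vote(lists) for index, lists in TOP5.items()}
# Subsets that win (or tie for the win) under all three evaluation indexes.
common = set.intersection(*(set(w) for w in winners.values()))
```

With the values of Table 1, subset 3 wins alone under R.sup.2 and ties with subset 4 under RMSE and MAE, so it is the only subset that wins under all three indexes.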
Embodiment 2
[0113] As shown in
[0114] an acquisition module 102, configured to acquire microwave frequency sweep data;
[0115] a generation module 104, configured to perform frequency selection on the microwave frequency sweep data by using a random forest-recursive feature elimination algorithm, take a preset parameter in the random forest-recursive feature elimination algorithm as a hyper-parameter, change the value of the hyper-parameter, and generate a series of candidate frequency subsets within different frequencies;
[0116] a building module 106, configured to build prediction models on the basis of the frequency sweep data corresponding to the candidate frequency subsets of different frequencies;
[0117] a calculation module 108, configured to evaluate the performance of each prediction model by means of 10-fold cross validation, and calculate evaluation index values of model performance; and
[0118] a selection module 110, configured to take the evaluation indexes as a voting basis, and select an optimal frequency subset by using a majority voting method.
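When the device is implemented as software functional units, the five modules can be composed as a simple pipeline. The sketch below is one possible arrangement under assumptions, not the structure of the embodiment: the class name, the field names and the callable signatures are hypothetical, and the lambdas in the usage lines are placeholders rather than real acquisition or modelling code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FrequencySelectionDevice:
    """Chains the five modules; each field is a pluggable callable."""

    acquire: Callable[[], Any]         # acquisition module 102
    generate: Callable[[Any], Any]     # generation module 104
    build: Callable[[Any, Any], Any]   # building module 106
    calculate: Callable[[Any], Any]    # calculation module 108
    select: Callable[[Any], Any]       # selection module 110

    def run(self):
        data = self.acquire()
        candidates = self.generate(data)
        models = self.build(data, candidates)
        scores = self.calculate(models)
        return self.select(scores)


# Placeholder callables that only demonstrate the data flow between modules.
device = FrequencySelectionDevice(
    acquire=lambda: [3, 1, 2],
    generate=lambda data: [data[:2], data[1:]],
    build=lambda data, candidates: [sum(c) for c in candidates],
    calculate=lambda models: dict(enumerate(models)),
    select=lambda scores: max(scores, key=scores.get),
)
result = device.run()
```

Keeping each module behind a callable matches the statement that the division of units is a logical function division: units can be merged, replaced or distributed without changing the order of the data flow.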
[0119] The sequence numbers of the foregoing embodiments of the present disclosure are merely for description, and do not imply the preference among the embodiments.
[0120] In the above embodiments of the present disclosure, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
[0121] In the several embodiments provided in the present disclosure, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiment described above is only illustrative. For example, the division of the units may be a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or may be integrated into another system, or some features may be ignored or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection by some interfaces, units or modules, and may be in electrical or other forms.
[0122] The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple units. The objectives of the solutions of the embodiments may be implemented by selecting part of or all of the units according to actual needs.
[0123] In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
[0124] When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure substantially, or the part of the present disclosure making contribution to the prior art, or all of or part of the technical solution may be embodied in the form of a software product, and the computer software product is stored in a storage medium, which includes a plurality of instructions enabling a computer device (which may be a personal computer, a server or a network device) to execute all of or part of the steps in the methods of the embodiments of the present disclosure. The aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, a magnetic disk or an optical disk.
[0125] Described above are only the preferred embodiments of the present disclosure, and the present disclosure is not limited thereto. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.