Biologically inspired methods and systems for automatically determining the modulation types of radio signals using stacked de-noising autoencoders
10003483 · 2018-06-19
Assignee
Inventors
- Benjamin J. Migliori (San Diego, CA, US)
- Daniel J. Gebhardt (San Diego, CA, US)
- Daniel C. Grady (La Jolla, CA, US)
- Riley Zeller-Townson (Hampstead, NC, US)
CPC classification
H04B1/0003
ELECTRICITY
G06N3/049
PHYSICS
H04L27/0008
ELECTRICITY
International classification
G06N99/00
PHYSICS
H04B1/00
ELECTRICITY
Abstract
Class types of input signals having unknown class types are automatically classified using a neural network. The neural network learns features associated with a plurality of different observed signals having respective different known class types. The neural network then recognizes features of the input signals having unknown class types that at least partially match at least some of the features associated with the plurality of different observed signals having respective different known class types. The neural network determines probabilities that each of the input signals has each of the known class types based on strengths of the matches between the recognized features of the input signals and the features associated with the plurality of different observed signals. The neural network classifies each of the input signals as having one of the respective different known class types based on a highest determined probability.
Claims
1. A method for automatically determining class types of input signals having unknown class types, comprising: a) learning, by a neural network including multiple stacked sparse denoising autoencoders (SSDA) with weighted connections, features associated with a plurality of different observed signals having respective different known class types, wherein step a) comprises adjusting assigned weights of the connections based on the features of the plurality of different observed signals; b) refining, by a softmax component, the adjusted weights of the connections based on outputs of the SSDA; c) recognizing, by the SSDA, features of the input signals having unknown class types that at least partially match at least some of the features associated with the plurality of different observed signals having respective different known class types; d) determining, by the softmax component, probabilities that each of the input signals has each of the known class types based on strengths of matches between recognized features of each of the input signals and the features associated with the plurality of different observed signals; and e) classifying, by the softmax component, each of the input signals as having one of the respective different known class types based on a highest determined probability for each input signal in a manner that is accurate in noisy environments.
2. The method of claim 1, wherein the class types are modulation types.
3. The method of claim 1, wherein adjusting the assigned weights of the connections includes comparing outputs of the SSDA to corresponding inputs and adjusting the assigned weights of the connections automatically based on a difference between the outputs and the corresponding inputs.
4. The method of claim 3, wherein the assigned weights are repeatedly adjusted to minimize the difference between the outputs and the corresponding inputs.
5. The method of claim 1, wherein refining the adjusted weights of the connections includes estimating, for each output of the SSDA, a probability that the output has a known class type, determining whether the estimated probability is correct, and repeatedly refining the adjusted weights of the connections until the estimated probability is substantially correct.
6. A system for automatically determining modulation types of input signals having unknown modulation types, comprising: multiple stacked sparse denoising autoencoders (SSDA) with weighted connections, the SSDA configured to: during a training phase, learn features associated with a plurality of different observed signals having different respective known modulation types and adjusting assigned weights of the connections based on the features of the plurality of different observed signals; and during a classification phase, recognize features of the input signals that at least partially match at least some of the features associated with the plurality of different observed signals having different respective known modulation types and produce outputs indicative of strengths of the matches of the recognized features of the input signals with the features associated with the plurality of different observed signals; and a softmax component configured to: during the training phase, refine the adjusted weights of the connections based on outputs of the SSDA; and during the classification phase, determine probabilities that each of the input signals has each of the known modulation types based on outputs of the SSDA and classify each of the input signals as having one of the different respective known modulation types based on a highest determined probability for each input signal in a manner that is accurate in noisy environments.
7. The system of claim 6, wherein, during the training phase, the SSDA adjusts the assigned weights of the connections by comparing outputs of SSDA to corresponding inputs and adjusting the assigned weights of the connections automatically based on a difference between the outputs and the corresponding inputs.
8. The system of claim 7, wherein the assigned weights of the connections are repeatedly adjusted until the outputs of the SSDA are substantially the same as the corresponding inputs.
9. The system of claim 6, wherein the softmax component refines the adjusted weights of the connections by estimating, for each output of the SSDA, a probability that the output has a known modulation type, determining whether the estimated probability is correct, and repeatedly refining the adjusted weights of the connections of the SSDA until the estimated probability is substantially correct.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The novel features of the present invention will be best understood from the accompanying drawings, taken in conjunction with the accompanying description, in which similarly-referenced characters refer to similarly-referenced parts, and in which:
DETAILED DESCRIPTION OF THE EMBODIMENTS
(14) According to an illustrative embodiment, a method, device and system are provided for automatically classifying class types of input signals having unknown class types. Features associated with a plurality of different observed signals having respective different known class types are learned by a neural network. The neural network then recognizes features of the input signals having unknown class types that at least partially match at least some of the features associated with the plurality of different observed signals having respective different known class types. The neural network determines probabilities that each of the input signals has each of the known class types based on strengths of matches between the recognized features of the input signals and the features associated with the plurality of different observed signals. The neural network classifies each of the input signals as having one of the respective different known class types based on a highest determined probability.
(15) The method, device and system are accurate in noisy environments and can be considered to be biologically inspired in that they learn modulation types much in the same way an animal might learn patterns without advance knowledge.
(16) Although modulated radio signals are in many ways quite different from the signals that biological neural systems have evolved to detect and process, they nonetheless propagate through an environment that produces many biologically relevant sources of noise. It is reasonable to consider whether principles of biological sensing may lead to useful alternatives to existing statistical approaches for radio signal processing. As demonstrated below, the architecture for classifying modulation types according to illustrative embodiments does generate useful receptive fields that allow classification methods to better discriminate amongst classes.
(17) The approach described herein differs significantly from conventional approaches to automatic modulation classification (AMC) in that it is model-free and expert-free on the ingest side, and requires only example signals to detect future signals modulated in the same manner. This stands in sharp contrast to conventional likelihood-based (LB) and feature-based (FB) AMC methods which, as noted above, require expert input at each stage.
(20) To aid in understanding of the modulation classification performed according to illustrative embodiments, a problem formulation is presented below. Consider a deterministic system M which accepts a signal s and emits a prediction M(s) of which one of a fixed number N.sub.mod of modulation families encodes s. Letting T(s) be the true modulation of s, one way to characterize M is with its full joint distribution, where P.sub.c.sup.(i|i) represents the probability of correct prediction for a given signal and is given by:
P.sub.c.sup.(i|i) = Prob[M(s) = i and T(s) = i]   (1)
(21) A measure of the performance of M is the average correct classification across all tested modulation families, P.sub.cc:
(22) P.sub.cc = (1/N.sub.mod) Σ.sub.i P.sub.c.sup.(i|i)   (2)
(23) Cases are also considered in which P.sub.cc is allowed to be a function of signal to noise ratio (SNR).
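As a concrete illustration of this metric, the following sketch (Python/NumPy; the function and variable names are illustrative and not taken from the patent) computes the fraction of correctly classified signals within each modulation family and averages those fractions to obtain P.sub.cc:

    import numpy as np

    def correct_classification_rate(true_labels, predicted_labels, n_mod):
        """Average per-family correct-classification rate (P_cc)."""
        true_labels = np.asarray(true_labels)
        predicted_labels = np.asarray(predicted_labels)
        per_class = np.empty(n_mod)
        for i in range(n_mod):
            in_class = true_labels == i                       # signals whose true modulation is i
            per_class[i] = np.mean(predicted_labels[in_class] == i)
        return per_class.mean()                               # average over modulation families

    # Example: 3 modulation families, 6 test signals
    print(correct_classification_rate([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_mod=3))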
(24) According to illustrative embodiments, the automatic modulation classification methodology relies on a deep neural network, shown as an autoencoder 230 in the accompanying drawings.
(26) According to illustrative embodiments, the autoencoder 230 includes inputs, multiple hidden layers, e.g., Layer 1 and Layer 2, and outputs. Each hidden layer includes weighted connections going into or coming out of neurons. The values coming into neurons are multiplied in the neurons by the weights of the connections through which they enter the neurons. Thus, each neuron has a number of weights or weight values, also called receptive fields. The neurons output a strong signal when a feature of the input signal matches the feature corresponding to a receptive field of the neuron. That is, the stronger the similarity or match between a feature of an input signal to the feature corresponding to a receptive field of a neuron, the stronger the signal output by the neuron.
(27) The weights of the connections may be set initially based on observed characteristics associated with the known modulation types. By adjusting the weights of the connections through training of the autoencoder 230, the neurons can be tuned to respond to particular characteristics of the input signals, e.g., to produce strong outputs when input signals have features that correspond to (or strongly match) features of signals having known modulation types.
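The relationship between a neuron's weights and its response can be illustrated with the following toy sketch (Python/NumPy; all names and sizes are illustrative assumptions): the neuron's output is its weighted input sum passed through a non-linearity, so an input resembling the neuron's receptive field excites it most strongly.

    import numpy as np

    rng = np.random.default_rng(0)
    receptive_field = rng.standard_normal(8)          # weight vector of one hidden neuron

    def neuron_output(x, w, bias=0.0):
        return np.tanh(np.dot(w, x) + bias)           # weighted sum followed by a non-linearity

    matching_input = receptive_field / np.linalg.norm(receptive_field)
    random_input = rng.standard_normal(8)
    random_input /= np.linalg.norm(random_input)

    # The input aligned with the receptive field produces the strongest activation.
    print(neuron_output(matching_input, receptive_field))
    print(neuron_output(random_input, receptive_field))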
(28) As noted above, according to illustrative embodiments, training includes two stages: unsupervised pre-training and supervised fine-tuning.
(30) Once the unsupervised training is completed, the autoencoder 230 is combined with a softmax 250 for supervised fine-tuning, during which the adjusted weights of the connections are refined based on the outputs of the autoencoder 230.
(31) Once the autoencoder 230 is trained, the weight values (also referred to as receptive fields) of the neurons will correspond to features of signals having known modulation types. Thus, when new input signals having unknown modulation types enter the neurons of the autoencoder 230, the neurons will output strong signals for those input signals that have features that substantially match the features corresponding to the weight values. Through unsupervised training and fine-tuning, the autoencoder 230 learns different features of signals that are associated with different known modulation types, outputting strong signals when an input signal has features that substantially match features of at least one known modulation type.
(32) Once the autoencoder 230 is fine-tuned, it can be used along with the softmax 250 to classify input signals having unknown modulation types.
(33) Although only one autoencoder 230 is shown, multiple autoencoders may be stacked, with the hidden layer activations of one autoencoder supplied as the input to the next autoencoder in the stack.
(34) The architecture described above was evaluated experimentally as described below.
(35) A binary file, produced by randomly choosing byte values, was used as the signal input. This binary data was modulated as in-phase and quadrature I/Q samples using each of six modulation methods including: on-off keying (OOK), Gaussian frequency-shift keying (GFSK), Gaussian minimum-shift keying (GMSK), differential binary phase-shift keying (DBPSK), differential quadrature phase-shift keying (DQPSK), and orthogonal frequency-division multiplexing (OFDM).
(36) For each modulation, the samples were upconverted to the carrier frequency by a BladeRF SDR. The SDR was configured in RF loop-back mode, such that the RF signal was sent and received only within the device's circuitry, and not to an external antenna. This arrangement provides added realism by incorporating the upconversion and radio effects, but without unwanted third-party signals that could pollute the controlled testing.
(37) The signal sampling rate was set so that the number of samples per symbol (N.sub.SpS) was consistent for every modulation type, except for OFDM. In contrast with the other modulation techniques, OFDM encodes data on multiple carrier frequencies simultaneously, within the same symbol, and modulates each carrier frequency independently. For experimental purposes, an existing OFDM signal processing component was used that operates with a symbol rate different than the other configurations, but with the same sample rate. This rate is identical for both the transmission and reception of the signal. The received RF signal was down-converted at the radio and the resulting I/Q samples were stored for analysis.
(38) For generation and preprocessing of training data, data files need to be arranged into a format and structure for use by the neural network. For this purpose, the I/Q data was split into segments consisting of N.sub.SpV samples, or samples per vector. A segment is composed of interleaved I and Q values for each sample, forming a vector of length 2N.sub.SpV. Thus, each vector contained N.sub.SpV/N.sub.SpS symbols. These vectors were placed into two sets, train and test (sizes N.sub.Vtrain and N.sub.Vtest), such that both the modulation type and positions within the set were random. The parameter N.sub.SpV is identical for each modulation type for all the experiments described herein. The specific values of all parameters are shown in Table I. It should be appreciated that these parameters are shown by way of example, and that other parameters could be used.
(39) TABLE I
  Description                                 Parameter      Value
  samples per symbol                          N.sub.SpS      10
  samples per vector                          N.sub.SpV      100
  number of training vectors                  N.sub.Vtrain   60000
  number of training vectors per modulation   N.sub.Vmod     10000
  number of test vectors                      N.sub.Vtest    10000
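The segmentation described above can be sketched as follows (Python/NumPy; the function name is an illustrative assumption, and the value of N.sub.SpV follows Table I). I and Q samples are interleaved and the stream is split into vectors of length 2·N.sub.SpV:

    import numpy as np

    N_SPV = 100   # samples per vector (Table I)

    def make_vectors(i_samples, q_samples, n_spv=N_SPV):
        """Interleave I and Q samples and split them into vectors of length 2*n_spv."""
        n_vectors = len(i_samples) // n_spv
        iq = np.empty(2 * n_vectors * n_spv)
        iq[0::2] = i_samples[:n_vectors * n_spv]      # even positions hold I values
        iq[1::2] = q_samples[:n_vectors * n_spv]      # odd positions hold Q values
        return iq.reshape(n_vectors, 2 * n_spv)

    # Example: 1000 complex baseband samples yield 10 vectors of length 200
    rng = np.random.default_rng(0)
    i, q = rng.standard_normal(1000), rng.standard_normal(1000)
    print(make_vectors(i, q).shape)   # (10, 200)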
(40) Starting from a signal sample vector s as described above, the input units of the first autoencoder may be set to values given by x, where x may be computed as x = Z·s. Then, the values of the hidden layer units within the autoencoder may be calculated according to:
y = σ(W·c(x) + b.sub.v)   (3)
and the values of the output units are calculated as:
z = σ(W.sup.T·y + b.sub.h)   (4)
(41) Here, σ is a non-linear activation function that operates element-wise on its argument, and c is a stochastic corruptor which adds noise according to some noise model to its input. The function c is non-deterministic. That is, c may corrupt the same sample vector x in different ways every time x is passed through it. As noted above, after training, the decoding output layers of the autoencoder are discarded. For a system having stacked autoencoders, the hidden layer activations are then used as the input layer to the next autoencoder.
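A minimal sketch of a single denoising autoencoder layer following Equations 3 and 4 is given below (Python/NumPy; the sign-flip corruptor is one possible reading of the Bernoulli corruption listed later in Table II, and the layer sizes and names are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    def corrupt(x, p_flip=0.2):
        """Stochastic corruptor c: flip the sign of a random fraction of the entries."""
        mask = rng.random(x.shape) < p_flip
        return np.where(mask, -x, x)

    def dae_forward(x, W, b_v, b_h, activation=np.tanh):
        y = activation(W @ corrupt(x) + b_v)          # hidden units, Eq. 3
        z = activation(W.T @ y + b_h)                 # reconstructed output units, Eq. 4
        return y, z

    n_in, n_hidden = 200, 500                         # e.g. a 2*N_SpV input and 500 hidden units
    W = 0.01 * rng.standard_normal((n_hidden, n_in))
    b_v, b_h = np.zeros(n_hidden), np.zeros(n_in)
    y, z = dae_forward(rng.standard_normal(n_in), W, b_v, b_h)
    print(y.shape, z.shape)                           # (500,) (200,)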
(42) An overly sparse or compact representation may be unable to distinguish between identical modulations shifted in time. Thus, the number of neurons in the first and second layers was chosen such that, with fully sparse activation constraints (5% of total neurons), there would still be a significant number of neurons active for a given sample (i.e., 25).
(43) The parameters of a single autoencoder are the weight matrix W and bias vectors b.sub.v and b.sub.h. According to illustrative embodiments, these parameters are adjusted via unsupervised pre-training so that the output layer reproduces the input as precisely as possible while also subjecting it to a constraint designed to encourage sparse activation of hidden layer units, that is, to encourage hidden layer unit activations to remain near 0 except for a small fraction. The overall cost function for a single autoencoder layer is:
J(W, b.sub.v, b.sub.h) = <‖z.sub.i − x.sub.i‖.sup.2>.sub.i + β Σ.sub.k KL(ρ, ρ̂.sub.k)   (5)
(44) Here, i indexes over data vectors and k indexes over hidden layer units. Parameters β and ρ are weighting and sparsity parameters, respectively, x.sub.i is the i-th data vector, z.sub.i is the corresponding output layer activation, ρ̂.sub.k is the average activation level of the k-th hidden unit over all data vectors, and
(45) KL(ρ, ρ̂.sub.k) = ρ log(ρ/ρ̂.sub.k) + (1 − ρ) log((1 − ρ)/(1 − ρ̂.sub.k))
is recognized as the Kullback-Leibler divergence.
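The cost of Equation 5 can be sketched as follows (Python/NumPy; illustrative only, and mapping the tanh activations into (0, 1) before measuring average activation is an assumption rather than a detail taken from the patent):

    import numpy as np

    def kl_divergence(rho, rho_hat, eps=1e-8):
        """Kullback-Leibler divergence between target rho and average activation rho_hat."""
        rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
        return rho * np.log(rho / rho_hat) + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))

    def sparse_dae_cost(X, Z, Y, beta=3.0, rho=0.05):
        """Eq. 5: mean squared reconstruction error plus weighted sparsity penalty."""
        reconstruction = np.mean(np.sum((Z - X) ** 2, axis=1))   # <||z_i - x_i||^2>_i
        rho_hat = np.mean((Y + 1.0) / 2.0, axis=0)               # average activation of each hidden unit
        return reconstruction + beta * np.sum(kl_divergence(rho, rho_hat))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 200))
    W = 0.01 * rng.standard_normal((500, 200))
    Y = np.tanh(X @ W.T)                                         # hidden activations
    Z = np.tanh(Y @ W)                                           # reconstructions
    print(sparse_dae_cost(X, Z, Y))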
(46) The hidden layer activations of one autoencoder can be supplied as the input to another autoencoder, leading to a stacked architecture. Denoting the input, hidden, and output units of a single autoencoder at layer l as x.sup.(l), y.sup.(l), and z.sup.(l), respectively, the process of forward propagation through the entire network of autoencoders proceeds sequentially according to:
y.sup.(l) = σ(W.sup.(l)·c.sub.l(y.sup.(l−1)) + b.sub.v.sup.(l))   (6)
for l = 1 . . . L, and with the convention that y.sup.(0) is the input layer.
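For a pre-trained stack, the forward pass of Equation 6 reduces to repeatedly applying each layer's encoder, as in the sketch below (Python/NumPy; illustrative, with corruption shown as optional because it would typically be disabled once training is complete):

    import numpy as np

    rng = np.random.default_rng(0)

    def forward_stack(x, layers, activation=np.tanh, p_flips=None):
        """Eq. 6: propagate y^(0) = x through stacked encoders (W^(l), b_v^(l))."""
        y = x
        for l, (W, b_v) in enumerate(layers):
            if p_flips is not None:                   # optional Bernoulli corruption c_l
                mask = rng.random(y.shape) < p_flips[l]
                y = np.where(mask, -y, y)
            y = activation(W @ y + b_v)               # y^(l)
        return y

    # Two stacked 500-unit layers applied to a 200-dimensional input vector
    layers = [(0.01 * rng.standard_normal((500, 200)), np.zeros(500)),
              (0.01 * rng.standard_normal((500, 500)), np.zeros(500))]
    print(forward_stack(rng.standard_normal(200), layers).shape)   # (500,)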
(47) Sequential, unsupervised training of individual autoencoder layers was conducted using stochastic gradient descent with a batch size of 100 and the AdaGrad method, based on the I/Q data set described previously. The parameters used in this example for training are listed in Table II below. It should be appreciated that these parameters are provided by way of example, and that other parameters may be used.
(48) TABLE II
  Description               Symbol      Value
  activation function       σ           tanh
  layer 1 corruption        c.sub.1     Bernoulli, p.sub.flip = 0.2
  layer 2 corruption        c.sub.2     Bernoulli, p.sub.flip = 0.3
  layer 1 sparsity target   ρ.sub.1     0.05
  layer 2 sparsity target   ρ.sub.2     0.00
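For reference, the AdaGrad update used during pre-training scales each parameter's step by its accumulated squared gradient, as in the sketch below (Python/NumPy; the learning rate and epsilon are illustrative defaults, not values from the patent):

    import numpy as np

    class AdaGrad:
        """Per-parameter learning rates scaled by accumulated squared gradients."""
        def __init__(self, shape, learning_rate=0.01, eps=1e-8):
            self.learning_rate = learning_rate
            self.eps = eps
            self.accumulated = np.zeros(shape)

        def step(self, param, grad):
            self.accumulated += grad ** 2
            return param - self.learning_rate * grad / (np.sqrt(self.accumulated) + self.eps)

    # Example: one update of a weight matrix given the gradient for a mini-batch of 100 vectors
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((500, 200))
    grad_W = rng.standard_normal((500, 200))
    W = AdaGrad(W.shape).step(W, grad_W)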
(49) The unsupervised pre-training phase was followed by supervised fine-tuning. For this phase, the pre-trained autoencoders were organized into a purely feed-forward multilayer perceptron according to Equation 6, with an additional final layer given as:
y.sup.(L) = softmax(W.sup.(L)·y.sup.(L−1) + b.sup.(L))   (7)
where L is the total number of layers.
(50) Interpreting the final output vector of the multilayer perceptron as a probability distribution over modulation families, supervised learning attempts were made to minimize the negative log-likelihood function with an additional L2 regularization term to encourage the model to retain the sparsely activating features learned during the unsupervised phase. The regularization coefficient λ was set to a value of 1 or 0, depending on the desired experiment configuration. Explicitly, where n is the number of samples, L is the total number of layers, y.sup.(l) is the output of layer l, and W.sup.(l) indicates the weight matrix between layers l and l+1, the loss function of the multi-layer perceptron is given by:
(51) J = −(1/n) Σ.sub.i=1 . . . n log(y.sup.(L)).sub.t.sub.i + λ Σ.sub.l Σ.sub.j=1 . . . s.sub.l Σ.sub.k=1 . . . s.sub.l+1 (W.sub.jk.sup.(l)).sup.2   (8)
where t.sub.i indicates the index corresponding to the correct label for sample i, and s.sub.l is the number of units in layer l.
(52) Equation 8 can be minimized using batch stochastic gradient descent, resulting in the trained classification architecture described above.
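A sketch of the supervised objective of Equations 7 and 8 is given below (Python/NumPy; illustrative names and shapes): the final layer produces class probabilities via a softmax, and the loss is the mean negative log-likelihood of the correct labels plus an L2 penalty on the weight matrices.

    import numpy as np

    def softmax(a):
        e = np.exp(a - a.max(axis=-1, keepdims=True))    # numerically stable softmax
        return e / e.sum(axis=-1, keepdims=True)

    def fine_tuning_loss(probabilities, labels, weight_matrices, lam=1.0):
        """Eq. 8: negative log-likelihood plus L2 regularization over all weight matrices."""
        n = len(labels)
        nll = -np.mean(np.log(probabilities[np.arange(n), labels]))
        l2 = sum(np.sum(W ** 2) for W in weight_matrices)
        return nll + lam * l2

    # Example: 4 samples, 6 modulation classes, one final weight matrix
    rng = np.random.default_rng(0)
    hidden = rng.standard_normal((4, 500))                           # y^(L-1), output of the last encoder
    W_out, b_out = 0.01 * rng.standard_normal((500, 6)), np.zeros(6)
    probs = softmax(hidden @ W_out + b_out)                          # Eq. 7
    print(fine_tuning_loss(probs, np.array([0, 3, 5, 1]), [W_out], lam=0.0))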
(53) To assess the performance of the system with a more realistic channel model, the test data set was altered with additive white Gaussian noise (AWGN). These data configurations were used as input in a purely feed-forward mode, in that the system was not re-trained, and its modulation classification output was evaluated. AWGN was added to each set of signal modulation types, such that for each set the resulting signal-to-noise ratio (SNR) matched a given value. This was necessary since each modulation type, as sampled by the radio, had different average power levels. For each of these signal modulation sets, {S.sub.mod}, the added noise power, P.sub.noise, is:
(54) P.sub.noise = α·(1/N.sub.s(mod)) Σ.sub.t (1/ℓ)‖s.sub.t‖.sup.2
where N.sub.s(mod) is the number of sample vectors for a particular modulation, s.sub.t is an individual signal sample vector of length ℓ, and α is a factor chosen such that 10 log(P.sub.{S}/P.sub.noise) matches the desired SNR.
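One way to implement this noise injection is sketched below (Python/NumPy; illustrative): the average per-element power of a modulation set is measured, and Gaussian noise is added with a variance chosen so that 10·log10(P.sub.S/P.sub.noise) equals the requested SNR.

    import numpy as np

    def add_awgn(vectors, snr_db, rng=None):
        """Add white Gaussian noise so that the set-level SNR matches snr_db."""
        rng = rng or np.random.default_rng()
        signal_power = np.mean(vectors ** 2)                       # average power of the modulation set
        noise_power = signal_power / (10.0 ** (snr_db / 10.0))     # alpha * P_S with alpha = 10^(-SNR/10)
        return vectors + rng.normal(0.0, np.sqrt(noise_power), vectors.shape)

    rng = np.random.default_rng(0)
    clean = rng.standard_normal((10000, 200))                      # one modulation's test vectors
    noisy = add_awgn(clean, snr_db=0.0, rng=rng)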
(55) Examples of modulated data input vectors with the addition of noise are shown in the accompanying drawings.
(56) The overall classification accuracy P.sub.cc (Equation 2) was measured for architectures which varied in the number of layers and the types of costs enforced during training. A cost for non-sparse activation was used as an L1 penalty (sparsity), and a cost for weight magnitude was used as an L2 penalty (weight decay). The architectures were chosen to study the effects of adding additional regularizations on the ability of the system to classify radio modulations.
(57) For illustrative purposes, seven architectures were explored as summarized in Table III below. These architectures included a simple softmax classifier, a multi-layer perceptron (MLP) without pre-training, a single layer denoising autoencoder architecture A without sparsity or L2 regularization (weight decay), a single layer denoising autoencoder architecture B with sparsity and L2 regularization (weight decay), a double layer denoising autoencoder architecture C without sparsity and L2 regularization (weight decay), a double layer denoising autoencoder architecture D with sparsity and L2 regularization (weight decay), and a deep five layer denoising autoencoder architecture E with regularization (weight decay). The exact number of neurons (500 in layers 1 and 2, 250 in layers 3 and 4, and 100 in layer 5) was chosen arbitrarily to conform to available computing resources. To prevent learning of a trivial mapping, either the layer-to-layer dimensionality or sparsity constraint was altered between each pair of layers.
(58) The misclassification rates for each of the experimental architectures are shown in Table III. As can be seen from Table III, architectures A, C, D, and E performed approximately two orders of magnitude better than the softmax classifier alone on the test set in the absence of noise. With both L2 regularization and sparsity constraints, the number of training examples required to obtain convergence increased, and in particular architecture D required significantly more than the others. However, this was offset by the increased performance of architecture D in the presence of channel noise.
(59) TABLE III
  Label          P.sub.cc (%)   P.sub.cc (0 dB) (%)   Neurons (N.sub.1/N.sub.2)   Sparsity (ρ.sub.1/ρ.sub.2)   Regularization
  Softmax Only   46.9           36.6                  /                           /                            N/A
  MLP only       55.6           —                     /                           /                            Yes, Dropout
  A              99.91          64.9                  500/                        0.05/                        No
  B              90.8           73.0                  500/                        0.05/                        Yes
  C              90.86          74.7                  500/500                     0.05/                        No
  D              99.56          91.9                  500/500                     0.05/0.00                    Yes
  E              99.10          65.0                  500/500/250/250/100         0.05/0.00/0.10/0.00/0.25     Yes
(60) The ability to classify modulations under low signal-to-noise ratios (SNR) is one of the crucial abilities of a successful AMC algorithm. The system's performance was tested by measuring P.sub.cc as a function of SNR. Through testing, it was discovered that the AMC algorithm described herein degrades gracefully as the SNR decreases and approaches random chance (P.sub.cc = 1/6) at −20 dB. The performance of each example configuration was measured across this range of SNR values.
(62) For each example in the test set, Gaussian noise was added to produce the desired signal-to-noise ratio before presenting the example to the neural network.
(64) For applications to real signals, the magnitude of the integral under the P.sub.cc-versus-SNR curve provides a useful summary of classifier robustness across noise levels.
(65) The precision and sensitivity of the classifier described herein for each modulation family as a function of the SNR was also examined. Precision is a measure, within the set of samples predicted to have a given modulation, of the fraction that actually have that modulation. Sensitivity is a measure, within the set of samples that actually have a given modulation, of the fraction predicted to have that modulation.
(66) Let m.sub.i and y.sub.i be the true and predicted class label, respectively, for sample i. Then the precision of the classifier for class k is:
(67) precision.sub.k = Σ.sub.i [m.sub.i = k][y.sub.i = k]/Σ.sub.i [y.sub.i = k]
and the sensitivity is:
(68) sensitivity.sub.k = Σ.sub.i [m.sub.i = k][y.sub.i = k]/Σ.sub.i [m.sub.i = k]
where the brackets denote the indicator function ([p] = 1 if p is true and 0 otherwise).
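These definitions translate directly into code; the sketch below (Python/NumPy; illustrative) uses boolean masks as the indicator function:

    import numpy as np

    def precision_and_sensitivity(true_labels, predicted_labels, k):
        m = np.asarray(true_labels)
        y = np.asarray(predicted_labels)
        true_positives = np.sum((m == k) & (y == k))
        precision = true_positives / max(np.sum(y == k), 1)      # fraction of predicted-k that are truly k
        sensitivity = true_positives / max(np.sum(m == k), 1)    # fraction of true-k predicted as k
        return precision, sensitivity

    print(precision_and_sensitivity([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], k=1))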
(70) Although P.sub.cc is a good indication of classifier performance overall, it is helpful to identify specific modulation types that may be more or less challenging for the method described herein. To do this, a confusion matrix of dimension N.sub.mod×N.sub.mod was constructed, consisting of the probabilities P.sub.c.sup.(j|i) that a signal having true modulation type i is predicted to have modulation type j.
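Such a matrix can be assembled as in the sketch below (Python/NumPy; illustrative), with each row normalized by the number of test vectors having that true modulation so that the diagonal holds the per-family correct-classification rates:

    import numpy as np

    def confusion_matrix(true_labels, predicted_labels, n_mod):
        counts = np.zeros((n_mod, n_mod))
        for t, p in zip(true_labels, predicted_labels):
            counts[t, p] += 1
        row_totals = counts.sum(axis=1, keepdims=True)
        return counts / np.maximum(row_totals, 1)      # row i: distribution of predictions for true class i

    C = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_mod=3)
    print(np.diag(C))    # per-family correct-classification rates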
(71) The performance of the classifier system according to illustrative embodiments (P.sub.cc=92% at 0 dB in a 6-way AMC task) is competitive when compared with the performance of AMC using LB or FB methods, as well as ANN-based FB methods. Crucially, unlike existing methods, prior knowledge of modulation design or characteristics is completely unnecessary for the classifier system described herein. Additionally, the methodology described herein was evaluated on sequences of 10 symbols or 100 I/Q time points. This is substantially fewer time points than most existing AMC methods use, and makes the methodology for classifying modulation according to illustrative embodiments more likely to be valuable for classification in dynamically shifting environments.
(72) The use of unsupervised pre-training is crucial to the AMC task. This was observed by exploring the overall classification performance of the SSDA neural network with unsupervised pre-training and L2 regularization (architecture D, Table III) versus a multi-layer perceptron (MLP) trained with dropout and L2 regularization, but without unsupervised pre-training. An MLP architecture was configured with 50% dropout on each layer and L2 regularization as in architecture D. This architecture initially failed to converge over the first 200 epochs (time periods) examined. A sweep was then performed to characterize the parameter sensitivity of the MLP architecture. The convergence of the model was found to be highly sensitive to the learning rate. A change of 1×10.sup.−5 could cause the model to have no improvement over random chance.
(73) Choosing a learning rate of 1.5×10.sup.−5, training was then performed for the same number of epochs as the unsupervised pre-trained architectures. Although the initial convergence rate was similar, the MLP convergence became asymptotic at an error rate of 55%. This asymptotic behavior was observed with stochastic gradient descent with momentum and with other learning rules, such as the Adaptive SubGradient (AdaGrad) rule. These results indicate the challenge of using simple machine learning models to perform AMC. Although it may be possible to configure an MLP such that it would converge for an AMC task, the relative robustness of the system is significantly reduced and the difficulty of parameter selection increases. By using unsupervised pre-training, parameter sensitivity is substantially reduced, and total training time and accuracy are improved.
(74) Regularization is typically prescribed in neural networks to prevent overfitting and to improve generalization. Unsupervised pre-training can also be considered a form of regularization, used to find a starting point such that final generalization error is reduced. However, it has been observed that, in an AMC task, regularization assists in classifying exemplars that are corrupted by effects not found in the training set. This was demonstrated by examining the classification performance of the architectures described herein against a dataset corrupted with additive white Gaussian noise (AWGN), a typical challenge in radio-frequency propagation testing.
(75) When classifying samples from the test set which have been corrupted by noise, the most heavily regularized and pre-trained network tested (architecture D) exhibited the best overall performance. In the absence of noise, the best performance was observed in the unconstrained single-layer architecture (P.sub.cc=99.91%). To quantify performance in the presence of noise, the SNR required to achieve a specific P.sub.cc can be examined, e.g., P.sub.cc=90%, or classification error 1−P.sub.cc=10%. By this measure, the unconstrained single-layer network (architecture A) had poorer performance, requiring an SNR of 7 dB to reach P.sub.cc=90%. The addition of a second layer with constraints (architecture C) results in a modest improvement of 2 dB. When sparse pre-training and L2 regularization are included as constraints (architecture D), the same performance can be achieved at an SNR of 1 dB. This represents an improvement of 6 dB over the unconstrained single-layer network. This corresponds to a 4-fold increase in maximum noise level for a given detection rate.
(76) The addition of sparsity appears to be crucial to this performance increase, and may be a result of forcing the selection of the most valuable receptive fields (rather than simply the ones that best fit the training data).
(77) The performance of the single-layer architecture also indicates that addition of such regularizations can have drawbacks that must be compensated for; without a second layer, a fully regularized single-layer network does not converge to adequately high performance levels, as shown by the results of architecture B, described above. However, it does generalize better than an MLP alone in the presence of noise. This may be because it must rely on a limited selection of receptive fields, and with a small network and strong constraints, there may not be enough neurons active to adequately represent the necessary features for classification. These same primitive features, however, may remain intact during signal corruption and thus allow higher low-SNR performance.
(78) It should be appreciated that the size and number of layers in the de-noising autoencoder configuration described herein could be altered, e.g., to allow for a longer time series of I/Q samples to be processed. A deeper architecture with more layers was tested to see if additional layers would improve overall classification, or outweigh the regularization effects and reduce generalization for untrained environmental noise. In prior work on deep neural network architectures, it is typical to find that adding a layer improves performance by less than 1%, and in noise-free conditions this agrees with the results shown in Table III. The deeper model that was tested consisted of architecture D with an additional two layers, subject to similar sparsity constraints (see architecture E, Table III). Interestingly, this model converged to high accuracy very quickly. Thus, the addition of additional pre-trained layers resulted in a rapidly converging, highly accurate classifier. Unfortunately, this configuration also performed substantially worse when exposed to signals in an AWGN channel. This may be a somewhat desirable form of overfitting. That is, by adding additional layers, the classifier becomes highly tuned to the properties of the input set but also may be somewhat inflexible. To improve generalization, one could explore the use of convolutional networks to provide strong regularization (in terms of a limited number of shared receptive fields) while using a deeper representation to achieve high accuracy. It is possible such a network may achieve the rapid convergence seen with the deep SSDA, but without the loss in performance in the presence of unmodeled noise.
(79) Some insights come from studying how the classifier begins to fail under noisy conditions, as can be understood from the per-class precision and sensitivity results.
(80) Recall that precision is a measure, within the set of samples predicted to have a given modulation, of the fraction that actually have that modulation. Sensitivity is a measure, within the set of samples that actually have a given modulation, of the fraction predicted to have that modulation. Precision for a class can be high if only a single example of that class is identified. Sensitivity for a class can be high if every sample in that class is assigned to that class. These results show that the degradation in performance under noise is not random. For example, the classification system systematically over-predicts GMSK (as seen in the corresponding columns of the confusion matrix).
(81) Where a traditional AMC architecture would rely on features that are selected for a specific modulation family, the system described herein learns features that are used for classifying multiple families. A single feature vector (receptive field) might play a role in reconstructing or identifying both GMSK and OFDM, for example, and the manner in which these vectors fail to fit noisy versions of their different target families is reflected in the way in which performance does not degrade uniformly for each family. A possible mitigation for this potential crosstalk may be as simple as adding more neurons to the autoencoder layers, as this will increase the number of possible receptive fields that are learned.
(82) The use of unsupervised feature extraction raises an important question with regard to receptive fields. That question is what sort of signal features the classifier system is sensitive to. As explained above, the receptive fields in an autoencoder system are simply the weights between the input layer and the target layer, and they describe the input that maximally excites the target neuron. These features can be thought of as the primitive features of the input.
(83) Through experimentation, it was demonstrated that the biologically-inspired artificial neural network described herein was able to recreate Gabor-like receptive fields when trained on natural images. As those skilled in the art will appreciate, this is a good indicator that the artificial neural network described herein produces accurate classification of input data. This is also an indication that it can be used to generate useful information regarding a non-biological sensory input, e.g., in-phase and quadrature (I/Q) signals acquired in the radio frequency spectrum.
(85) Though there may be hundreds of neurons in a given classifier system, only a representative selection of the learned receptive fields is illustrated in the accompanying drawings.
(88) Referring to the overall process, the autoencoder 230 is first trained by learning features of sample signals having known modulation types during unsupervised pre-training, after which the softmax 250 refines the adjusted connection weights during supervised fine-tuning.
(89) Once trained, at step 830, the autoencoder 230 recognizes features of input signals having unknown modulation types that at least partially match at least some of the features of the sample signals having known modulation types. At step 840, the softmax 250 determines probabilities that each input signal has each of the known modulation types. This determination is made based on the outputs of the autoencoder 230 indicative of the strengths of the matches between the recognized features of the input signals and the features of the sample signals having known modulation types. At step 850, the softmax 250 classifies each input signal as having one of the known modulation types based on the highest determined probability.
(91) It should further be appreciated that, although the process described above is directed to the analysis of one data stream of radio samples, the system and methodology described herein could be used to simultaneously analyze and classify multiple incoming data streams. The data streams could all be of the same type of data, e.g., I/Q modulated radio data from several phased antennas. The data could also be of different types, for example, I/Q modulated radio data and inertial navigation system data.
(92) It should further be appreciated that the unsupervised training phase described above, which could be based on measured or synthetic data sources, could be replaced with other methods of setting the autoencoder weights. This could be important in situations where compartmentalization of information is of great importance. In such situations, the autoencoder weights could be prepared at a high security level based on compartmentalized information. Once trained, the system and methodology described herein could be deployed in a lower-security area.
(94) The term application, or variants thereof, is used expansively herein to include routines, program modules, program, components, data structures, algorithms, and the like. Applications can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, handheld-computing devices, microprocessor-based, programmable consumer electronics, combinations thereof, and the like. The terminology computer-readable media and variants thereof, as used in the specification and claims, includes non-transitory storage media. Storage media can include volatile and/or non-volatile, removable and/or non-removable media, such as, for example, RAM, ROM, EEPROM, flash memory or other memory technology, CDROM, DVD, or other optical disk storage, magnetic tape, magnetic disk storage, or other magnetic storage devices or any other medium that can be used to store information that can be accessed.
(95) Referring to an example computing device 900 with which the system and methodology described herein may be implemented, the device 900 includes a processor 910 and a memory 930.
(96) Although not shown, the computing device 900 may also include a physical hard drive. The processor 910 communicates with the memory 930 and the hard drive via, e.g., an address/data bus (not shown). The processor 910 can be any commercially available or custom microprocessor. The memory 930 is representative of the overall hierarchy of memory devices containing the software and data used to implement the functionality of the device 900. The memory 930 can include, but is not limited to, the types of memory devices described above. As shown, the memory 930 stores the applications 940, the database 950, and an operating system (OS) 960.
(97) The applications 940 can be stored in the memory 930 and/or in a firmware (not shown) as executable instructions, and can be executed by the processor 910. The applications 940 include various programs that implement the various features of the device 900. For example, the applications 940 may include applications to implement the functions of the autoencoder 230 and the softmax 250 (including pre-processing, training, and post-training classification), as well as an application to convert input radio samples to a vector.
(98) The database 950 represents the static and dynamic data used by the applications 940, the OS 960, and other software programs that may reside in the memory. The database 950 may be used to store various data including data needed to execute the applications 940, e.g., data indicative of different modulation types.
(99) While the memory 930 is illustrated as residing proximate the processor 910, it should be understood that at least a portion of the memory 930 can be a remotely accessed storage system, for example, a server on a communication network, a remote hard disk drive, a removable storage medium, combinations thereof, and the like.
(100) It should be understood that the computing device 900 is described herein by way of example, and that the system and methodology described herein may be implemented using other computing configurations.
(101) Unsupervised training methods such as those presented according to illustrative embodiments of the invention allow for much greater flexibility in terms of incorporating unusual characteristics of environmental noise, accommodating signals for which no detailed model may be available, and in adapting to changes in environmental or signal characteristics over time through the use of on-line learning techniques. ANN based methods are also actively being researched for use within the radio front-end processing stages. One example is to use a multilayer perceptron for channel equalization. These efforts are orthogonal and may be potentially complementary to the methodology described herein.
(102) In considering complementary domains and methods, it should be noted that the task of automatic modulation classification for radio signal data is conceptually similar to tasks from related fields, such as phoneme or word detection in speech processing, although the domain presents unique challenges in terms of sample rate and robustness to noisy environments. It is also noted that recent work in acoustic modeling with deep networks has found that significant improvements are possible by leveraging up to 7 layers of autoencoder units, and the architecture presented herein will likely permit many more optimizations. Additional improvements may come in the form of convolutional autoencoders.
(103) Another possible route towards improved performance, especially in the application to streaming or online analysis, is the implementation of the architecture described herein as a spiking neural network. Spiking neural networks (SNNs) are another step towards biologically-relevant systems, as they seek to represent information as discrete temporal events much like biological nervous systems do with action potentials. SNNs can natively represent information contained in signal timing with higher resolution than clocked systems of equivalent sophistication, and open up a much larger parameter space for encoding information. They provide new opportunities for unsupervised learning (spike-timing dependent plasticity), optimization (spiking neuron models), and efficient bandwidth usage (spike coding). Architecture D (Table III) has been implemented as an SNN in full spiking simulation, with near-identical performance on the same task described herein.
(104) As the persistence and level of the sparsity constraints increase, the general performance of the classifier system described herein improves in environmental conditions under which the classifier was not specifically trained. Under no noise, all explored architectures that successfully converge perform similarly well, but it was found that biologically motivated principles result in a system which performs markedly better under environmental noise. This is particularly interesting in light of the prevailing explanations for the sparse coding principle, among them robustness to environmental noise. The results presented herein indicate that this principle is still valid and useful in problem domains that are rarely associated with sensing by natural organisms. Importantly, biologically-inspired sensing principles, implemented using hierarchical neural networks, do not require a biologically-inspired input. This suggests that other areas for which both machine and human perception are limited (e.g., network traffic, equipment temperature, and data from power grids) may benefit from application of the methods proposed herein.
(105) The results of the architecture and methodology described herein differ from much prior work in neural-network processing of time-varying signals (speech recognition, for example) by focusing narrowly on ingesting raw waveform data, rather than spectrogram or filter bank features, and extracting useful features for later tasks. Even relatively simple networks can do useful processing of radio signals with extremely limited samples and in the presence of environmental noise. The results also differ from the prior work in AMC, as they do not make use of expert knowledge and can construct effective features that adapt to both signals and the propagation environment with competitive performance. This opens up new opportunities for efficient use of an increasingly complex electromagnetic signaling environment. Biologically-inspired feature extraction, in the form of sparsity and unsupervised pre-training, can enhance neural-network AMC even under noise conditions not modeled in the training data.
(106) It will be understood that many additional changes in the details, materials, steps and arrangement of parts, which have been herein described and illustrated to explain the nature of the invention, may be made by those skilled in the art within the principle and scope of the invention as expressed in the appended claims.