Method, system, and computer program product to employ a multi-layered neural network for classification
11537840 · 2022-12-27
Assignee
Inventors
- Danilo Pietro Pau (Sesto San Giovanni, IT)
- Emanuele Plebani (Sotto il Monte Giovanni, IT)
- Fabio Giuseppe DE AMBROGGI (Biassono, IT)
- Floriana Guido (Milan, IT)
- Angelo Bosco (Giarre, IT)
Cpc classification
G06N5/01
PHYSICS
International classification
Abstract
A neural network classifies an input signal. For example, an accelerometer signal may be classified to detect human activity. In a first convolutional layer, two-valued weights are applied to the input signal. In a first two-valued function layer coupled at input to an output of the first convolutional layer, a two-valued function is applied. In a second convolutional layer coupled at input to an output of the first two-valued function layer, weights of the second convolutional layer are applied. In a fully-connected layer coupled at input to an output of the second convolutional layer, two-valued weights of the fully connected layer are applied. In a second two-valued function layer coupled at input to an output of the fully connected layer, a two-valued function of the second two-valued function layer is applied. A classifier classifies the input signal based on an output signal of the second two-valued function layer.
Claims
1. A method, comprising: applying, in a first convolutional layer of a neural network, two-valued weights of the first convolutional layer to an input signal received via a first input of the first convolutional layer to produce a first output signal; applying, in a first normalization layer of the neural network wherein a second input of the first normalization layer directly receives the first output signal from a first output of the first convolutional layer, normalization of the first output signal directly from the first convolutional layer to produce a second output signal; applying, in a first two-valued function layer of the neural network wherein a third input of the first two-valued function layer directly receives the second output signal from a second output of the first normalization layer, a two-valued function of the first two-valued function layer to the second output signal directly from the first normalization layer to produce a third output signal; applying, in a second convolutional layer of the neural network wherein a fourth input of the second convolutional layer directly receives the third output signal from a third output of the first two-valued function layer, weights of the second convolutional layer to the third output signal directly from the first two-valued function layer to produce a fourth output signal; applying, in a max pooling layer of the neural network wherein a fifth input of the max pooling layer directly receives the fourth output signal from a fourth output of the second convolutional layer, max pooling to the fourth output signal directly from the second convolutional layer to produce a fifth output signal; applying, in a fully-connected layer of the neural network wherein a sixth input of the fully-connected layer directly receives the fifth output signal from a fifth output of the max pooling layer, two-valued weights of the fully connected layer to the fifth output signal directly from the max pooling layer to produce 
a sixth output signal; applying, in a second normalization layer of the neural network wherein a seventh input of the second normalization layer directly receives the sixth output signal from a sixth output of the fully-connected layer, normalization to the sixth output signal directly from the fully-connected layer to produce a seventh output signal; applying, in a second two-valued function layer of the neural network wherein an eighth input of the second two-value function layer directly receives the seventh output signal from a seventh output of the second normalization layer, a two-valued function of the second two-valued function layer to the seventh output signal directly from the second normalization layer to produce an eighth output signal; and classifying, using a classifier of the neural network wherein a ninth input of the classifier receives the eighth output signal from an eighth output of the second two-valued function layer, the input signal based on the eighth output signal directly from the second two-valued function layer.
2. The method of claim 1, wherein the applying two-valued weights in the first convolutional layer comprises applying a set of filters to the input signal, thereby generating respective filtered output signals.
3. The method of claim 1, wherein the applying weights in the second convolutional layer comprises applying a set of filters to the third output signal from the first two-valued function layer and the method comprises, in the second convolutional layer, adding together outputs from the filters in the set of filters, generating respective single values.
4. The method of claim 1, wherein the classifying comprises applying softmax classification.
5. The method of claim 1, comprising: applying pre-neural network processing to an acceleration signal, thereby generating the input signal of the first convolutional layer, the pre-neural network processing including filtering to separate a dynamic acceleration component from a gravity component of the acceleration signal.
6. The method of claim 5, wherein the filtering to separate the dynamic acceleration component from the gravity component comprises one of infinite impulse response filtering or exponential moving averaging.
7. The method of claim 5 wherein the pre-neural network processing includes: applying a gravitational rotation to the filtered acceleration signal.
8. The method of claim 1, comprising: applying post-neural network processing to an output of the classifier, the post-neural network processing including at least one of: temporal filtering to remove mis-classification errors; and heuristic filtering.
9. The method of claim 1 wherein the weights of the second convolutional layer are two-valued weights.
10. A computing device, comprising: neural network circuitry; a first convolutional layer of the neural network circuitry having a first input and a first output, which, in operation, applies two-valued weights of the first convolutional layer to produce a first output signal; a first normalization layer of the neural network circuitry having a second input and a second output—the second input configured to directly receive the first output signal from the first output of the first convolutional layer, wherein the first normalization layer, in operation, normalizes the first output signal directly from the first convolutional layer to produce a second output signal; a first two-valued function layer of the neural network circuitry having a third input and a third output—the third input configured to directly receive the second output signal from the second output of the first normalization layer, wherein the first two-valued function layer, in operation, applies a two-valued function of the first two-valued function layer to the second output signal directly from the first normalization layer to produce a third output signal; a second convolutional layer of the neural network circuitry having a fourth input and a fourth output—the fourth input configured to directly receive the third output signal from the third output of the first two-valued functional layer, wherein the second convolutional layer, in operation, applies weights of the second convolutional layer to the third output signal directly from the first two-valued function layer to produce a fourth output signal; a max pooling layer of the neural network circuitry having a fifth input and a fifth output—the fifth input configured to directly receive the fourth output signal from the fourth output of the second convolutional layer, wherein the max pooling layer, in operation, applies max pooling to the fourth output signal directly from the second convolutional layer to produce a fifth output signal; a 
fully-connected layer of the neural network circuitry having a sixth input and a sixth output—the sixth input configured to directly receive the fifth output signal from the fifth output of the max pooling layer, wherein the fully connected layer, in operation, applies two-valued weights of the fully connected layer to the fifth output signal directly from the max pooling layer to produce a sixth output signal; a second normalization layer of the neural network circuitry having a seventh input and a seventh output—the seventh input configured to directly receive the sixth output signal from the sixth output of the fully connected layer, wherein the second normalization layer, in operation, normalizes the sixth output signal directly from the fully-connected layer to produce a seventh output signal; a second two-valued function layer of the neural network circuitry having an eighth input and an eighth output—the eighth input configured to directly receive the seventh output signal from the seventh output of the second normalization layer, wherein the second two-valued function layer, in operation, applies a two-valued function of the second two-valued function layer to the seventh output signal directly from the second normalization layer to produce an eighth output signal; and a classifier of the neural network circuitry having a ninth input configured to directly receive the eighth output signal from the eighth output of the second two-valued function layer, wherein the classifier, in operation, classifies an input signal to the first input of the first convolutional layer based on the eighth output signal directly from the second two-valued function layer.
11. The computing device of claim 10 wherein the first convolutional layer comprises a set of filters, which, in operation, generate respective filtered signals.
12. The computing device of claim 10 wherein the second convolutional layer comprises a set of filters coupled to an adder.
13. A system, comprising: an input interface; and digital signal processing circuitry, coupled to the input interface, wherein the digital signal processing circuitry, in operation, implements a neural network comprising: a first convolutional layer, which, in operation, applies two-valued weights to an input signal received via the input interface to produce a first output; a first normalization layer directly coupled to the first convolutional layer, which, in operation, normalizes the first output directly received from the first convolutional layer to produce a second output; a first two-valued function layer directly coupled to the first normalization layer, which, in operation, applies a first two-valued function to the second output directly received from the first normalization layer to produce a third output; a second convolutional layer directly coupled to the first two-valued function layer, which, in operation, applies weights to the third output directly received from the first two-valued function layer to produce a fourth output; a max pooling layer directly coupled to the second convolutional layer, which, in operation, applies max pooling to the fourth output directly received from the second convolutional layer to produce a fifth output; a max pooling layer directly coupled to the second convolutional layer, which, in operation, applies max pooling to the fourth output directly received from the second convolutional layer to produce a fifth output; a fully-connected layer directly coupled to the max pooling layer, which, in operation, applies two-valued weights to the fifth output directly received from the max pooling layer to produce a sixth output; a second normalization layer directly coupled to the fully-connected layer, which, in operation, normalizes the sixth output directly received from the fully-connected layer to produce a seventh output; a second two-valued function layer directly coupled to the second normalization layer, which, in 
operation, applies a two-valued function to the seventh output directly received from the second normalization layer to produce an eighth output; and a classifier directly coupled to the second two-valued function layer, which, in operation, classifies the input signal received via the input interface based on the eighth output directly received from the second two-valued function layer.
14. The system of claim 13, comprising: pre-neural network processing circuitry coupled to the input interface, the pre-neural network processing circuitry including a filter and a gravitational rotator.
15. The system of claim 13, comprising: post-neural network processing circuitry coupled to the input interface, the post-neural network processing circuitry including a temporal filter and a heuristic filter.
16. The system of claim 13, comprising: an accelerometer.
17. The system of claim 16, comprising: a gyroscope.
18. The system of claim 16, comprising a chip including the digital signal processing circuitry and the accelerometer.
19. A non-transitory computer-readable medium having contents which configure digital signal processing circuitry to implement a neural network, the neural network comprising: a first convolutional layer which, in operation, applies two-valued weights to an input signal; a first normalization layer directly coupled at a first input to a first output of the first convolutional layer, and which, in operation, normalizes the first output; a first two-valued function layer directly coupled at a second input to a second output of the first normalization layer, and which, in operation, applies a first two-valued function to the second output; a second convolutional layer directly coupled at a third input to a third output of the first two-valued function layer, and which, in operation, applies weights to the third output; a max pooling layer directly coupled at a fourth input to a fourth output of the second convolutional layer, and which, in operation, applies max pooling to the fourth output; a fully-connected layer directly coupled at a fifth input to a fifth output of the max pooling layer, and which, in operation, applies two-valued weights to the fifth output; a second normalization layer directly coupled at a sixth input to a sixth output of the fully-connected layer, and which, in operation, normalizes the sixth output; a second two-valued function layer directly coupled at a seventh input to a seventh output of the second normalization layer, and which, in operation, applies a two-valued function to the seventh output; and a classifier coupled at an eighth input to an eighth output of the second two-valued function layer, and which, in operation, classifies the input signal based on the eighth output of the second two-valued function layer.
20. The non-transitory computer-readable medium of claim 19 wherein the contents comprise instructions executed by the digital signal processing circuitry.
21. The non-transitory computer-readable medium of claim 20 wherein the instructions, when executed by the digital signal processing circuitry, cause the digital signal processing circuitry to filter the input signal provided to the first convolutional layer.
Description
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
(1) One or more embodiments will now be described, by way of example only with reference to the annexed figures, wherein:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
DETAILED DESCRIPTION
(9) In the ensuing description, one or more specific details are illustrated, aimed at providing an in-depth understanding of examples of embodiments of this description. The embodiments may be obtained without one or more of the specific details, or with other methods, components, materials, etc. In other cases, known structures, materials, or operations are not illustrated or described in detail so that certain aspects of embodiments will not be obscured.
(10) Reference to “an embodiment” or “one embodiment” in the framework of the present description is intended to indicate that a particular configuration, structure, or characteristic described in relation to the embodiment is comprised in at least one embodiment. Hence, phrases such as “in an embodiment” or “in one embodiment” that may be present in one or more points of the present description do not necessarily refer to one and the same embodiment. Moreover, particular conformations, structures, or characteristics may be combined in any adequate way in one or more embodiments.
(11) The references used herein are provided merely for convenience and hence do not define the extent of protection or the scope of the embodiments.
(12) A neural network can be defined as two-valued if both weights and activations are constrained to be enumerated with 2 numbers, e.g., either +1 or −1, at both run time and training time (during which parameter gradients are computed). This approach can drastically reduce memory size and associated accesses, with most arithmetic operations replaced with narrow bit-wise operations.
(13) In the literature, an early example of such a neural network can be found in Courbariaux, M. et al.: “Binaryconnect: Training deep neural networks with binary weights during propagation,” Advances in Neural Information Processing Systems, 2015.
(14) There, a neural network is discussed where binarized weights are used during both training and testing phases.
(15) An example of a fully binary network (weights and activations) is provided by the Binarized Neural Network (BNN) also proposed by Courbariaux, M. et al. in: “Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1.” arXiv preprint arXiv: 1602.02830 (2016).
(16) In their experiments, Courbariaux et al. refer to a MultiLayer Perceptron (MLP) network as exemplified in
(17) The crosses in
(18) As is well known in computational networks, the activation function (out=f(in)) of a node defines the output of that node given an input or set of inputs. In artificial neural networks this function is also called the transfer function (out=f(in)).
(19) The FC (Fully Connected) block 13 is repeated a number of times N equal to the number of hidden layers in the network, e.g., N=3.
(20) Courbariaux et al. also refer to a Convolutional Network (ConvNet) as exemplified in
(21) The convolutional block structure may differ for the number of filters applied in the convolutional layer.
(22) Courbariaux et al. trained the network of
(23) Test error rates documented with comparable network architectures are 0.94% on the MNIST image dataset (see, e.g., Goodfellow, Ian J. et al.: “Maxout Networks”, arXiv preprint arXiv: 1302.4389 (2013)), 1.69% on SVHN images (see, e.g., Lin, Min et al.: “Network in network”, arXiv preprint arXiv: 1312.4400 (2013)) and 7.62% on CIFAR-10 images (see, e.g., Lee, Chen-Yu et al.: “Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree”, International Conference on Artificial Intelligence and Statistics, 2016).
(24) It is noted that Courbariaux et al. achieved results very close to those cited by way of comparison: 0.96% on MNIST images, 2.53% on SVHN images and 10.15% on CIFAR-10 images.
(25) The type of networks proposed by Courbariaux et al. may thus facilitate decreasing complexity and memory by paying a price in terms of accuracy, e.g., up to 10.15% on CIFAR-10 images.
(26) It is otherwise noted that satisfactory results obtained in experiments in image classification and with benchmark datasets may translate into inadequate performance of the same procedures if applied to human activity recognition (briefly, HAR) that processes data acquired by an accelerometer and not by an imager, because of the very different nature of the input data (accelerations vs pixels).
(27) As discussed in the following, if applied to recognizing a dataset composed by classes of different human activities (HAR) sampled with accelerometer data, pipelines as depicted in
(28) One or more embodiments may address the HAR accuracy problem by means of a pipeline comprising a neural network which may integrate two-valued layers, normalization layers and max pooling layers in a sort of hybrid arrangement which may distinguish over prior arrangements, for example, as follows: one or more embodiments may go beyond conventional arrangements comprising, e.g., two convolutional layers and two fully connected layers, with less memory required to hold parameters and less computational complexity; weights may be constrained to be two-valued (enumerated by 2 numbers), e.g., either +1 or −1, while, in order to facilitate achieving a desired accuracy, activations are two-valued only where desirable, e.g., in some complex layers where they facilitate reducing execution time and lead to a simpler hardware implementation. Moreover, activations may also be enumerated with two values in some layers to help achieve a desired accuracy and lower complexity, while avoiding applying two-value enumeration to all network activations in an arbitrary manner.
(29) One or more embodiments may thus provide a hybrid neural network (HNN) in a pipeline which may comprise also pre-processing and post-processing phases.
(30) One or more embodiments may provide a procedure aimed at human activity recognition or HAR where input data (signal x) are acquired from an accelerometer A (plus possibly a gyroscope G), as visible, e.g., in
(31)
(32) Certain possible embodiments of the circuit blocks 101 to 105 in
(33) In one or more embodiments, the filter 101 may comprise a (e.g., IIR) low-pass filter (e.g., of order 4) which separates the fast changing dynamic acceleration component (a) from a slowly changing gravity component (g).
(34) As an alternative to such filtering, in order to remove the g component (so that the average is zero), an exponential moving average (EMA) can be used, e.g.:
$$\hat{g}_t = \alpha x_t + (1-\alpha)\,\hat{g}_{t-1}$$
The associated coefficient α can be defined experimentally so that, for small values, $\hat{g}_t$ identifies the average component (g acceleration). Therefore
$$\hat{x}_t = x_t - \hat{g}_t$$
so that such a filter behaves as a high-pass filter.
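By way of non-limiting illustration, the EMA-based gravity removal just described may be sketched as follows. The function name `remove_gravity_ema`, the default value of `alpha` and the choice of initializing the gravity estimate with the first sample are assumptions made for this sketch only.

```python
def remove_gravity_ema(samples, alpha=0.05):
    """Split one acceleration axis into (dynamic, gravity) components.

    Implements g_hat[t] = alpha * x[t] + (1 - alpha) * g_hat[t-1]
    and x_hat[t] = x[t] - g_hat[t].
    """
    if not samples:
        return [], []
    g_hat = samples[0]  # assumption: start the estimate at the first sample
    dynamic, gravity = [], []
    for x in samples:
        g_hat = alpha * x + (1.0 - alpha) * g_hat  # slowly varying average
        gravity.append(g_hat)
        dynamic.append(x - g_hat)                  # high-pass residual
    return dynamic, gravity


if __name__ == "__main__":
    dyn, grav = remove_gravity_ema([9.81] * 5 + [12.0, 9.81, 9.81], alpha=0.1)
    print([round(d, 3) for d in dyn])
```

A constant input yields a zero dynamic component, which is the high-pass behavior noted above.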
(35) The gravity rotation block 102 may facilitate having g always oriented toward the bottom vertical side (conventionally defined as direction −z), e.g., by means of the Rodrigues rotation formula (see, e.g., https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula), aligning the z-axis to gravity:
(36)
where θ represents the rotation angle and v the rotation axis.
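A minimal sketch of such a gravity rotation is given below, using the standard form of the Rodrigues formula, x_rot = x cos θ + (v × x) sin θ + v (v · x)(1 − cos θ). The way the axis v and the angle θ are derived from the gravity estimate (cross product and arc cosine with respect to −z) is an assumption of this sketch, as are the function names.

```python
import math


def rodrigues_rotate(x, v, theta):
    """Rotate the 3-vector x about the unit axis v by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    cross = (v[1] * x[2] - v[2] * x[1],
             v[2] * x[0] - v[0] * x[2],
             v[0] * x[1] - v[1] * x[0])
    dot = v[0] * x[0] + v[1] * x[1] + v[2] * x[2]
    return tuple(x[i] * c + cross[i] * s + v[i] * dot * (1.0 - c)
                 for i in range(3))


def gravity_alignment(g, target=(0.0, 0.0, -1.0)):
    """Return (axis, angle) that rotates a non-zero gravity estimate g onto target."""
    norm_g = math.sqrt(sum(gi * gi for gi in g))
    u = tuple(gi / norm_g for gi in g)
    axis = (u[1] * target[2] - u[2] * target[1],
            u[2] * target[0] - u[0] * target[2],
            u[0] * target[1] - u[1] * target[0])
    norm_axis = math.sqrt(sum(a * a for a in axis))
    cos_theta = max(-1.0, min(1.0, sum(ui * ti for ui, ti in zip(u, target))))
    if norm_axis < 1e-12:
        # g is parallel or anti-parallel to the default -z target:
        # the x-axis is perpendicular to it and can serve as rotation axis.
        return (1.0, 0.0, 0.0), (0.0 if cos_theta > 0 else math.pi)
    return tuple(a / norm_axis for a in axis), math.acos(cos_theta)


if __name__ == "__main__":
    v, theta = gravity_alignment((0.0, 9.81, 0.0))
    print(rodrigues_rotate((0.0, 9.81, 0.0), v, theta))
```

The same axis and angle would then be applied to every sample of the filtered acceleration signal.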
(37) The (e.g., acceleration) signal obtained by pre-processing (at 101 and 102) the signal x and input to the neural network circuit 103 is indicated as AS.
(38) Turning now for brevity to the elements downstream of the neural network circuit 103, the post-processing performed on the output signal OS from the neural network circuit 103 may comprise different approaches of filtering, e.g., in a temporal filter 104.
(39) A simple approach for 104 is a voting filter, where the class which occurs most frequently in a temporal window is selected. If the temporal window is T steps long and $n_k$ is the number of predictions for class k, the selected class will be:
(40)
(41) Various known prediction models return probabilities for each class which represent how likely the prediction is to be true. A more accurate approach is to average all the probabilities over the window and find only at that point the most likely class at the time t:
(42)
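The two approaches just described (majority voting over a window, and averaging of the class probabilities over the window before picking the most likely class) may be sketched as follows. The function names and the list-based data layout are assumptions of this sketch.

```python
from collections import Counter


def vote(window_labels):
    """Return the class predicted most often in the window."""
    return Counter(window_labels).most_common(1)[0][0]


def mean_probability_class(window_probs):
    """Return the index of the class with the highest summed probability.

    window_probs is a list of T probability vectors, one per time step;
    dividing the sums by T would not change the winning class.
    """
    n_classes = len(window_probs[0])
    sums = [0.0] * n_classes
    for probs in window_probs:
        for k, p_k in enumerate(probs):
            sums[k] += p_k
    return max(range(n_classes), key=lambda k: sums[k])


if __name__ == "__main__":
    print(vote(["walk", "walk", "run", "walk", "bike"]))
    print(mean_probability_class([[0.6, 0.4], [0.1, 0.9], [0.55, 0.45]]))
```

In the second call, class 0 wins two of the three per-step decisions, yet class 1 has the larger average probability, which shows how the two filters can disagree.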
(43) The average may be implemented more efficiently by using an exponential average:
$$\hat{p}_k(t) = \alpha\, p_k(t) + (1-\alpha)\,\hat{p}_k(t-1)$$
where $\hat{p}_k$ is the currently estimated average and α is a coefficient representing the “inverse effective window length”, e.g., if α=0.1 the average will roughly depend on the last 10 prediction samples. The value of α can also be adapted to the likelihood of the last prediction, using larger values for more confident predictions and smaller values for less confident predictions. In that case, α is an increasing function of the probability of the most likely prediction, that is:
(44)
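A minimal sketch of the exponential average with a confidence-dependent coefficient is given below. The linear mapping from the highest class probability to α, the bounds `alpha_min` and `alpha_max`, and the function name are hypothetical choices made for this sketch; any increasing function could be used in their place.

```python
def ema_probabilities(prob_stream, alpha_min=0.05, alpha_max=0.3):
    """Yield (selected class index, averaged probabilities) per time step."""
    p_hat = None
    for probs in prob_stream:
        if p_hat is None:
            p_hat = list(probs)  # assumption: start from the first prediction
        else:
            # Hypothetical increasing function of the top probability.
            alpha = alpha_min + (alpha_max - alpha_min) * max(probs)
            p_hat = [alpha * p + (1.0 - alpha) * ph
                     for p, ph in zip(probs, p_hat)]
        best = max(range(len(p_hat)), key=p_hat.__getitem__)
        yield best, list(p_hat)


if __name__ == "__main__":
    stream = [[0.9, 0.1]] * 5 + [[0.2, 0.8]]
    for cls, p in ema_probabilities(stream):
        print(cls, [round(v, 3) for v in p])
```

In the usage example, a single contrary prediction at the end does not change the selected class, since the averaged probabilities move only part of the way toward it.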
(45) Such temporal filters work satisfactorily if the class does not change over a large temporal period, allowing the errors to average out, but may increase latency and introduce prediction errors near class transitions.
(46) A different procedure may be used to independently estimate when a HAR regime change has occurred, e.g., by estimating an autoregressive moving average model (ARMA; see, e.g., https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model) on short “stretches” of data, which can be assumed to come from signals belonging to a same class, and checking when the prediction error exceeds a given threshold:
$$\hat{y}(t) = a_1 y(t-1) + \dots + a_p y(t-p), \qquad |\hat{y}(t) - y(t)| > thr$$
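The thresholding step may be sketched as follows for the autoregressive predictor written above. The coefficients are assumed to have been estimated beforehand on a homogeneous stretch of data (the estimation itself is not shown), and the function name is an assumption of this sketch.

```python
def ar_change_points(y, coeffs, thr):
    """Return the indices t where |y_hat(t) - y(t)| exceeds thr.

    y_hat(t) = a_1 * y(t-1) + ... + a_p * y(t-p), with coeffs = [a_1, ..., a_p].
    """
    p = len(coeffs)
    changes = []
    for t in range(p, len(y)):
        y_hat = sum(a * y[t - i] for i, a in enumerate(coeffs, start=1))
        if abs(y_hat - y[t]) > thr:
            changes.append(t)
    return changes


if __name__ == "__main__":
    signal = [1.0] * 10 + [5.0] * 10
    print(ar_change_points(signal, [1.0], thr=1.0))
```

With a first-order predictor that repeats the previous sample, only the step between the two constant stretches is flagged.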
(47) Alternatively, as the classifier will be less confident around HAR transitions, changes may be detected by searching short intervals where the filtered probabilities are all below a given threshold.
(48) Once the changes have been detected, the temporal filters are aligned to the changes, e.g., by setting the value of T in order to fit the temporal window on homogeneous prediction signals.
(49) Alternatively, one or more embodiments may adopt post-processing as exemplified, e.g., in U.S. patent application Ser. No. 15/280,463 to which European Patent Application No. 17193073.8 corresponds (essentially a median filtering, based on finite state automata, FSA).
(50) While capable of removing transient errors as caused by noise or incorrect predictions, a temporal filter 104 as discussed may not have adequate knowledge about the problem and may not correct systematic estimation errors, such as a class predicted with much higher probability than others, because this introduces errors in the mean.
(51) In order to reduce errors, one or more embodiments may adopt a heuristic filter as indicated at 105 in
(52) While such a heuristic filter can be applied to the raw predictions from the classifier 103, if cascaded to (“downstream”) a temporal filter as 104 and after alignment on transition boundaries it may facilitate obtaining higher accuracy.
(53) For instance, the transitions between one class (e.g., source such as jogging) and another (e.g., destination such as walking) may be confirmed only after a given number of predictions of the destination class over a temporal window have been found with the predictions over the maximum time interval over all the pairs stored in a queue.
(54) The size of the temporal window and the number of confirmations may depend on the pair of classes (source, destination). For transitions which are deemed exceedingly unlikely or impossible (e.g., as revealed by post processing FSA, with, e.g., changing from source such as biking to destination such as driving in human activity recognition being a case in point) the number of confirmations required may be set to infinity.
(55) In a simplified version, the map of window sizes and confirmation may depend only on the destination class.
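One possible reading of the transition confirmation described above is sketched below: a change from the current (source) class to a new (destination) class is accepted only once the destination class has been predicted a required number of times within a window whose size depends on the (source, destination) pair. The rule table, its values, the default rule and the function name are hypothetical.

```python
import math
from collections import deque


def heuristic_filter(predictions, rules, default=(5, 3)):
    """Confirm class transitions using per-pair (window, confirmations) rules.

    rules maps (source, destination) to (window size, confirmations needed);
    math.inf confirmations makes a transition impossible.
    """
    max_window = max([w for w, _ in rules.values()] + [default[0]])
    history = deque(maxlen=max_window)  # queue sized for the longest window
    current = None
    out = []
    for pred in predictions:
        history.append(pred)
        if current is None:
            current = pred
        elif pred != current:
            window, needed = rules.get((current, pred), default)
            recent = list(history)[-window:]
            if recent.count(pred) >= needed:
                current = pred  # transition confirmed
        out.append(current)
    return out


if __name__ == "__main__":
    rules = {("jogging", "walking"): (4, 3),
             ("biking", "driving"): (5, math.inf)}
    print(heuristic_filter(["jogging"] * 3 + ["walking"] * 4, rules))
    print(heuristic_filter(["biking"] * 2 + ["driving"] * 6, rules))
```

The simplified version mentioned above would key the rule table on the destination class alone.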
(56) In one or more embodiments, a heuristic filter 105 may exploit the fact that potential estimation errors may be known at training time from the confusion matrix, which shows for each pair of (predicted, ground truth) classes the percentage of predictions; ideally, a perfect classifier has a diagonal confusion matrix, with all diagonal values equal to 1 and all other values equal to 0.
(57) Given an interval between two detected changes, the filter may estimate the distribution of predictions over the interval, e.g., by counting the occurrences or by estimating the parameters of a multinomial distribution over the predictions. The mis-classification pairs (predicted, ground truth) that are known to occur from the confusion matrix and that have a probability higher than a threshold can be corrected by replacing the predicted class with the estimated ground truth class.
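A minimal sketch of such a correction is given below. It assumes that the most frequent class in the interval is taken as the estimated ground truth, and that the confusion matrix is available as a dictionary keyed by (predicted, ground truth) pairs; both choices, as well as the names and the threshold value, are assumptions of this sketch.

```python
from collections import Counter


def correct_interval(predictions, confusion, threshold=0.1):
    """Replace known mis-classifications inside one homogeneous interval.

    confusion[(predicted, truth)] holds the fraction of predictions known from
    training; pairs above threshold are treated as correctable confusions.
    """
    counts = Counter(predictions)             # distribution over the interval
    dominant = counts.most_common(1)[0][0]    # estimated ground truth class
    corrected = []
    for pred in predictions:
        known = confusion.get((pred, dominant), 0.0) > threshold
        corrected.append(dominant if pred != dominant and known else pred)
    return corrected


if __name__ == "__main__":
    confusion = {("jogging", "walking"): 0.15, ("driving", "walking"): 0.01}
    interval = ["walking"] * 6 + ["jogging"] * 2 + ["driving"]
    print(correct_interval(interval, confusion))
```

In the usage example the two "jogging" predictions are replaced because that confusion is frequent, while "driving" is left untouched because the corresponding confusion falls below the threshold.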
(58) In one or more embodiments, the output from the filters 104, 105 may be a classification $C_1, C_2, \ldots, C_N$ (corrected over the classification produced at 103) identifying a wearer's activity (e.g., stationary, walking, running, biking, driving) as a function of the input signal x as provided, e.g., from an accelerometer plus possibly a gyroscope and subjected to neural network processing in the network 103.
(59) In one or more embodiments, the hybrid neural network (HNN) circuit 103 of
(60) In the following description of possible exemplary embodiments, various “multi-valued” entities will be discussed, namely entities that can assume plural values, e.g., two, three, and so on, virtually any (positive) integer value. Certain entities (e.g., signals/weights/activations) will be expressly referred to as “two-valued” entities insofar as, in one or more embodiments, these latter entities may be intended to assume only two values (e.g., +1 or −1, +1 or 0, and so on), that is, may have a range of possible values limited to two values.
(61) In
(62) The acceleration signal AS supplied to the convolutional layer 1031 comprises, e.g., a three-dimensional time-varying signal measured with a tri-axial accelerometer, divided into windows of fixed length. Each axis of the input signal is processed separately.
(63) The convolutional layer 1031 applies a set of C filters (each one represents a channel) with length k on the signal and returns C different outputs ASxM, equal in number to the filters, which are passed on to the normalization block 1032.
(64) There, a mean (average) value, e.g., as computed during the neural network training phase, is subtracted from each sample. The mean values calculated are equal in number to the channels C.
(65) The circuit block 1033 in the first stage 103A comprises a two-valued function (e.g., N bits → 1 bit) that returns as an output the sign of the input, e.g., +1 or −1.
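The first stage (convolution with two-valued weights at 1031, mean subtraction at 1032, sign function at 1033) may be sketched for a single input axis as follows. The function names, the "valid" sliding-window convention and the mapping of a zero input to +1 are assumptions of this sketch.

```python
def binary_conv1d(signal, filters):
    """Apply C filters of +1/-1 weights; each weight only flips a sign."""
    outputs = []
    for weights in filters:
        k = len(weights)
        channel = []
        for t in range(len(signal) - k + 1):
            acc = 0.0
            for i in range(k):
                # No multiplication: add or subtract the input sample.
                acc += signal[t + i] if weights[i] > 0 else -signal[t + i]
            channel.append(acc)
        outputs.append(channel)
    return outputs


def normalize(channels, means):
    """Subtract one mean per channel (as computed during training)."""
    return [[v - m for v in ch] for ch, m in zip(channels, means)]


def sign(channels):
    """Two-valued function: +1 for non-negative inputs, -1 otherwise."""
    return [[1 if v >= 0 else -1 for v in ch] for ch in channels]


if __name__ == "__main__":
    conv = binary_conv1d([1.0, 2.0, 3.0, 4.0], [[1, 1], [1, -1]])
    print(sign(normalize(conv, means=[5.0, 0.0])))
```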
(66) The circuit block 1033 exemplified in
(67) The presence of two-value enumerated weights in the layer 1031 may lead to appreciable savings in memory footprint (e.g., from 32-bit or 64-bit floating point for the GPU implementation used to train the neural network, or 16-bit fixed point for a possible hardwired implementation of the neural network, down to 1 bit per weight) and in memory reads/writes; moreover, costly floating and fixed point multiplications are replaced with simpler sign changes of the input.
(68) This also facilitates hardware implementations without multipliers, which are a major source of complexity, and considering the area for implementing a multiplier (e.g., one third of the total area in a low power reduced instruction set DSP processor), silicon area costs are significantly reduced as well as power consumption.
(69) In one or more embodiments, the second section 103B (
(70)
(71) In one or more embodiments, weights in the second section 103B are enumerated with two-values, so that, e.g., 16-bit fixed point multiply-accumulations can be replaced with 1-bit XNOR-bitcount operations, thereby substantially reducing the associated hardware complexity and offering the opportunity to exploit parallelization.
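The replacement of multiply-accumulations with XNOR-bitcount operations may be illustrated as follows. The encoding (bit 1 for +1, bit 0 for −1) and the function names are assumptions of this sketch: for two vectors of n two-valued elements, the dot product equals twice the number of matching bit positions minus n.

```python
def pack(values):
    """Pack a sequence of +1/-1 values into an integer (bit 1 means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # XNOR, then bitcount
    return 2 * matches - n


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(xnor_dot(pack(a), pack(b), len(a)), sum(x * y for x, y in zip(a, b)))
```

Both printed values agree, and a single machine word can carry many weight/activation pairs at once, which is where the opportunity for parallelization arises.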
(72) Also, the two-valued enumeration of the activations to the convolutional layer 1034 was found to have an appreciable impact on the second stage 103B where most operations (approximately 60%) are performed.
(73) In one or more embodiments, the structure of the third section 103C (1035, 1036, 1037 and 1038 as exemplified in
(74) In comparison to the first circuit section 103A, the convolutional layer is replaced in the third circuit section 103C of
(75) The operations carried out by the units in the layer 1035 can be summarized by the following equation:
(76)
(77) where $x_{ijk}$ represents the input sample organized in a three-dimensional matrix, i, j and k represent the indices of the sample elements, $W_{ijk}$ represents the corresponding (two-valued) weights and $y_u$ represents the output of a single unit, working in parallel with the others, of the fully connected layer.
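Under the assumption that each unit computes a weighted sum of all input elements, i.e., $y_u = \sum_{i,j,k} W_{ijk}\, x_{ijk}$ with one weight set per unit, the layer 1035 may be sketched as follows. The nested-list layout and the function name are assumptions of this sketch.

```python
def fully_connected(x, unit_weights):
    """Return one output per unit for a three-dimensional input x[i][j][k].

    unit_weights holds, for each unit, +1/-1 weights with the same shape as x,
    so each product reduces to adding or subtracting the input element.
    """
    outputs = []
    for weights in unit_weights:
        acc = 0.0
        for x_i, w_i in zip(x, weights):
            for x_ij, w_ij in zip(x_i, w_i):
                for x_ijk, w_ijk in zip(x_ij, w_ij):
                    acc += x_ijk if w_ijk > 0 else -x_ijk
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]]]
    w = [[[[1, 1], [1, 1]]], [[[1, -1], [1, -1]]]]
    print(fully_connected(x, w))
```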
(78) Even though the weights applied are two-value enumerated, the parameters for use in this stage may take most of the memory size (e.g., about 80%) insofar as each neuron embodies a number of parameters equal to the input signal values.
(79) The output of the layer 1035 is a vector with a length equal to the number of units considered, which is supplied to a normalization layer 1036 (e.g., again subtracting a mean value as discussed previously for the layer 1032 in the first section 103A) followed by a two-valued function/circuit 1037 which produces, starting from an N-bit signal, a 1-bit, two-valued signal.
(80) It will be otherwise appreciated that, while not mandatory, the normalization blocks 1032, 1036 may be helpful, e.g., in terms of dynamic of the network nodes.
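By way of non-limiting illustration only, normalization by mean subtraction followed by a two-valued function may be sketched as follows. The use of a sign function with zero mapped to +1, the per-unit mean values and the function names are assumptions made for the example.

```python
def normalize(values, means):
    """Subtract a (previously learnt) mean value from each unit output."""
    return [v - m for v, m in zip(values, means)]


def two_valued(values):
    """Reduce each N-bit value to a 1-bit, two-valued signal (+1 or -1)."""
    # Assumption: zero is mapped to +1.
    return [1 if v >= 0 else -1 for v in values]


# Hypothetical unit outputs and mean values.
unit_outputs = [5, -3, 0, 12]
means = [2, -4, 1, 12]
normalized = normalize(unit_outputs, means)
binary = two_valued(normalized)
```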
(81) A classifier 1038, such as, for instance, a SoftMax classifier (see, e.g., https://en.wikipedia.org/wiki/Softmax_function), as the last stage of the third section 103C may then produce, from the two-valued output of the circuit 1037, an output signal OS to be supplied to the error removal/correction filters 104 and 105.
(82) In one or more embodiments, in this section input activations may not be enumerated with two-values.
(83) The input to the classifier (e.g., SoftMax) layer 1038, in the case exemplified, is the output vector of the previous stage 1037, therefore each unit in this layer implements an equation of the type:
(84)
(85) That is the predicted probability for the j-th class given a sample input vector x (that is the output of 1037) and a weighting vector w, learnt during an (e.g., off-line) training phase, and where the index K represents the number of inputs.
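By way of non-limiting illustration only, and assuming the standard softmax form in which the probability of the j-th class is the exponential of the score of that class divided by the sum of the exponentials of the scores of all classes, the classifier may be sketched as follows. The function name, the number of classes and the weight values are hypothetical.

```python
import math


def softmax_probabilities(x, class_weights):
    """Return one probability per class for the input vector x.

    class_weights holds one weighting vector per class; the score of a
    class is the inner product of x with that vector.
    """
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in class_weights]
    peak = max(scores)  # subtracted for numerical stability only
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-valued input (the output of 1037) and three classes.
x = [+1, -1, +1, +1]
class_weights = [
    [+1, -1, +1, +1],   # score  4
    [-1, +1, -1, -1],   # score -4
    [+1, +1, +1, -1],   # score  0
]
probabilities = softmax_probabilities(x, class_weights)
predicted_class = probabilities.index(max(probabilities))
```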
(86) The (multi-valued) output OS from this last stage 1038 represents, e.g., the probability of the input signal x (on the left of
(87) A hybrid neural network as exemplified herein has demonstrated the ability of detecting five human activities with high precision, using a small number of operations and limited memory. Accuracy is illustrated through confusion matrices.
(88) Table 1 reports measured results on an in-house created dataset DB (Dataset version 1.6) which stores 3 axial accelerations at 16 Hz as a result of several human activities, manually annotated to generate the ground truth association between input signals x (
(89) TABLE 1. Confusion matrix obtained with hybrid neural network with No. of filters = 8; Fully connected units FC 1 = 64; max Pooling = (4, 1); Average Recall = 97.513%

Actual     | Predicted Stationary | Predicted Walking | Predicted Running | Predicted Biking | Predicted Driving
Stationary | 98.383 | 0.000 | 0.000 | 0.013 | 1.617
Walking | 0.000 | 99.280 | 0.411 | 0.309 | 0.000
Running | 0.000 | 2.531 | 97.175 | 0.220 | 0.073
Biking | 2.538 | 0.887 | 0.000 | 94.924 | 1.651
Driving | 0.000 | 0.000 | 0.000 | 2.199 | 97.801
(90) Tables 2 and 3 below report the confusion matrices of Courbariaux's MLP and ConvNet, respectively.
(91) TABLE 2. Confusion matrix obtained for Courbariaux's MLP Model on Dataset 1.6. Average recall = 54.826%

Actual     | Predicted Stationary | Predicted Walking | Predicted Running | Predicted Biking | Predicted Driving
Stationary | 98.217 | 0.000 | 0.000 | 0.000 | 1.783
Walking | 0.000 | 100.000 | 0.000 | 0.000 | 0.000
Running | 0.000 | 99.725 | 0.000 | 0.202 | 0.073
Biking | 1.322 | 55.693 | 0.000 | 33.739 | 9.247
Driving | 57.218 | 0.000 | 0.000 | 0.610 | 42.173
(92) TABLE 3. Confusion matrix obtained for Courbariaux's ConvNet Model on Dataset 1.6. Average recall = 76.270%

Actual     | Predicted Stationary | Predicted Walking | Predicted Running | Predicted Biking | Predicted Driving
Stationary | 99.414 | 0.000 | 0.420 | 0.000 | 0.166
Walking | 0.000 | 12.944 | 87.056 | 0.000 | 0.000
Running | 0.018 | 1.119 | 98.844 | 0.000 | 0.018
Biking | 2.423 | 0.559 | 9.025 | 87.151 | 0.843
Driving | 11.713 | 0.000 | 3.832 | 1.459 | 82.996
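By way of non-limiting illustration only, the average recall figures quoted for Tables 1 to 3 agree, to within rounding, with the unweighted mean of the diagonal entries of each confusion matrix, as the following sketch shows. The function name `average_recall` is an assumption made for the example; the matrix values are those of the tables.

```python
# Rows: actual class; columns: predicted class (percent).
# Class order: Stationary, Walking, Running, Biking, Driving.
TABLE_1 = [
    [98.383, 0.000, 0.000, 0.013, 1.617],
    [0.000, 99.280, 0.411, 0.309, 0.000],
    [0.000, 2.531, 97.175, 0.220, 0.073],
    [2.538, 0.887, 0.000, 94.924, 1.651],
    [0.000, 0.000, 0.000, 2.199, 97.801],
]
TABLE_2 = [
    [98.217, 0.000, 0.000, 0.000, 1.783],
    [0.000, 100.000, 0.000, 0.000, 0.000],
    [0.000, 99.725, 0.000, 0.202, 0.073],
    [1.322, 55.693, 0.000, 33.739, 9.247],
    [57.218, 0.000, 0.000, 0.610, 42.173],
]
TABLE_3 = [
    [99.414, 0.000, 0.420, 0.000, 0.166],
    [0.000, 12.944, 87.056, 0.000, 0.000],
    [0.018, 1.119, 98.844, 0.000, 0.018],
    [2.423, 0.559, 9.025, 87.151, 0.843],
    [11.713, 0.000, 3.832, 1.459, 82.996],
]


def average_recall(matrix):
    """Unweighted mean of the per-class recalls (the diagonal entries)."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


recalls = [average_recall(t) for t in (TABLE_1, TABLE_2, TABLE_3)]
```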
(93) Table 4 below provides some data on the complexity of a hybrid neural network according to embodiments.
(94) TABLE 4. Complexity data

Layer | Parameters [bytes] | Operations per window* | Parallel Op.
Conv 1 (1031) | (8 × 5)/8 = 5 | 5 × (20 × 3 × 8) = 2400 ADD | 20 × 3 × 8
Norm 1 (1032) | Mean: (8 × 16)/8 = 16 | (20 × 3 × 8) = 480 SUB | 20 × 3 × 8
Conv 2 (1034) | (8 × 8 × 5)/8 = 40 | 5 × 8 × (16 × 3 × 8) = 15360 ADD | 16 × 3 × 8
Max pool layer (1040) | / | 3 × (4 × 3 × 8) = 576 SUB + COMP | 4 × 3 × 8
FC 1st stage (1035) | (8 × 4 × 3 × 64)/8 = 768 | 64 × (4 × 3 × 8) = 6144 ADD | 64
Norm 2 (1036) | Mean: (64 × 16)/8 = 128 | 64 SUB | 64
FC 2 SoftMax (1038) | (64 × 5)/8 = 40 | 64 × 5 = 320 ADD | 5
TOT | 997 | 25344 | /

(*) per second @ 16 Hz if the window is shifted by 16 samples
(95) As shown in Table 4, only 1 Kbyte of parameters may be stored in memory and about 25,000 operations (sums and subtractions) are carried out, assuming 3-axial acceleration acquired at 16 Hz.
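By way of non-limiting illustration only, the totals of Table 4 may be reproduced from the per-layer entries as follows. The list layout and variable names are assumptions made for the example; the operation count of the max pooling layer is taken as listed in Table 4 (576).

```python
# (layer, parameter bits, operations per window), following Table 4.
LAYERS = [
    ("Conv 1 (1031)", 8 * 5, 5 * (20 * 3 * 8)),
    ("Norm 1 (1032)", 8 * 16, 20 * 3 * 8),
    ("Conv 2 (1034)", 8 * 8 * 5, 5 * 8 * (16 * 3 * 8)),
    ("Max pool (1040)", 0, 576),  # no parameters; count as listed in Table 4
    ("FC 1st stage (1035)", 8 * 4 * 3 * 64, 64 * (4 * 3 * 8)),
    ("Norm 2 (1036)", 64 * 16, 64),
    ("FC 2 SoftMax (1038)", 64 * 5, 64 * 5),
]

total_parameter_bytes = sum(bits for _, bits, _ in LAYERS) // 8
total_operations = sum(ops for _, _, ops in LAYERS)

# Share of the parameter memory taken by the first fully connected stage.
fc1_share = (8 * 4 * 3 * 64) / sum(bits for _, bits, _ in LAYERS)
```

Under these figures the first fully connected stage accounts for roughly three quarters of the stored parameters, in line with the observation that this stage may take most of the memory.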
(96) The rightmost column in Table 4 also reports the notional inner parallelism available for each layer of the hybrid neural network 103.
(97) The average accuracy obtained was 97.513%, while the (best) validation error was 5.98%.
(98) By way of comparison, accuracy measured using the Courbariaux models was 54.826% and 76.270%, respectively, while the validation error rate does not fall below 16%. Therefore, even if all multiply-accumulations are replaced with 1-bit XNOR-count operations, thus reducing complexity, the accuracy of the state-of-the-art algorithms (Courbariaux's MLP and ConvNet) is considerably lower than the accuracy which may be achieved with one or more embodiments.
(99) It is otherwise noted that a digital implementation of one or more embodiments is advantageous, as it can be adapted to run, e.g., at 25 kHz or lower by exploiting the inner parallelism of each layer.
(100) Table 5 summarizes further differences between a hybrid neural network (HNN) according to embodiments and Courbariaux's MLP and ConvNet pipeline stages.
(101) A first difference is that one or more embodiments do not require replication of a well-defined group of layers as is the case of conventional solutions such as MLP and ConvNet.
(102) Another difference is that one or more embodiments as exemplified herein do not involve batch normalization after max pooling and an enumeration with two-values before a fully connected layer.
(103) Still another difference lies in the ability of one or more embodiments to benefit from both pre-processing (e.g., the input filter 101 and the gravity rotation 102) and post-processing (e.g., the temporal filter 104) suited for processing acceleration, while Courbariaux's MLP and ConvNet (discussed previously) are conceived for image processing and not for processing acceleration signals: they are therefore applied to pixels and do not deal with gravity-related pre-processing as implemented, e.g., in stages 101 and 102.
(104) Furthermore, one or more embodiments can use a SoftMax classification layer with weights each one enumerated with two values.
(105) TABLE 5. Differences between embodiments (HNN) and MLP and ConvNet pipeline stages.

Courbariaux MLP | Courbariaux ConvNet | HNN
54.8% | 76.3% | 97.5%
FC | Conv | IIR or EMA - 101
BN | BN | GR - 102
B (FC, BN and B repeated N times) | B | Conv - 1031
FC | Conv | N - 1032
BN | MP | TVE - 1033
 | BN | Conv - 1034
 | B (Conv, BN, B, Conv, MP, BN, B repeated 3 times) | MP - 1040
 | FC | FC - 1035
 | BN | N - 1036
 | B (FC, BN, B repeated 2 times) | TVE - 1037
 | FC | SM - 1038
 | BN | TF - 104
 | | HF - 105

Legend: EMA = Exponential moving average; IIR = Infinite Impulse Response; GR = Gravity rotation; TVE = Two-value enumeration (N bits >>> 1 bit); BN = Batch Normalization; N = Normalization; MP = Max Pooling; FC = Fully Connected; Conv = Convolutional; SM = SoftMax; TF = Temporal Filter; HF = Heuristic Filter
(106) One or more embodiments may significantly reduce the set of possible output values.
(107) For instance, in the case of the SoftMax layer 1038 at the end of the network, the number of distinct values is (n_inputs×n_outputs×2), where, e.g., n_inputs=128 is the number of hidden binary states and n_outputs=5 is the number of recognized classes.
(108) A discretization of output values is thus indicative of the possible activation and weight enumeration with two-values according to embodiments. Also, two-valued enumeration patterns (e.g., +1/−1) applied (by way of testing) as an input (AS) may correspondingly restrict the number of distinct values processed in the first convolutional layer (e.g., 1031) and affect the statistics of activations in (all) subsequent layers.
(109) One or more embodiments may feature a range of a few kHz, a low memory footprint (e.g., 1 KB) which, associated with multiplier-less circuits, enables a (very) low-frequency implementation as depicted in
(110)
(111) For example, the accelerometer A may produce samples of three-axis acceleration at a certain frequency (e.g., 16 Hz) that feed the pipeline 100, whose output is an index C.sub.1 C.sub.2 . . . C.sub.N to a class of recognized human activity (e.g., walking, running, biking etc.).
(112) The pipeline 100 can be implemented on a digital signal processor according to a general layout as exemplified in
(113) The following designations may apply to the blocks illustrated in
(114) 2000: program counter
(115) 2002: instruction cache
(116) 2004: instruction fetch unit
(117) 2006: instruction decode
(118) 2008: address generation
(119) 2010: arithmetic logic unit
(120) 2012: register file
(121) 2014: single instruction multiple data two-valued operations
(122) 2016: load/store unit
(123) 2018: co-processor interface
(124) 2020: arithmetic floating point unit
(125) 2022: bus interface
(126) 2024: data memory
(127) A processor as exemplified in
(128) A coprocessor floating point unit (see, e.g., the interface 2018) can optionally accelerate pre- and post-processing operations (for example, 101, 102 in
(129) A typical power dissipation figure of such a digital signal processor (e.g., as in the STREW™ family of processors available with the assignee company) can be as low as 20 μW per MHz with 90 nm technology (EEMBC benchmark; see, e.g., http://www.eembc.org/).
(130) A non-parallelized implementation at 25 kHz may involve a power consumption of about 0.5 μW. A pipeline spreading intermediate calculations, in a parallel implementation, for each input acceleration sample may turn out to be (at least) ⅓ less complex (e.g., 16 kHz) while a ×2 parallel implementation can be operated at about 8 kHz, if not even lower. This corresponds to power consumption figures at least three times lower (conservatively) than the one achieved by current sensor solutions such as LIS2DW12 (e.g., 1.1 μA (ODR=12.5 Hz) in active low-power mode with a minimum power supply of 1.62 V) available with the assignee company.
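By way of non-limiting illustration only, the power figures above may be checked with the following arithmetic sketch. It assumes that power scales linearly with clock frequency at the quoted 20 μW per MHz and that the sensor power is the product of the quoted current and supply voltage; the names used are hypothetical.

```python
DSP_UW_PER_MHZ = 20.0  # quoted dissipation figure, in microwatts per MHz


def dsp_power_uw(clock_hz):
    """Estimated power in microwatts, assuming linear scaling with clock."""
    return DSP_UW_PER_MHZ * clock_hz / 1_000_000


non_parallel_uw = dsp_power_uw(25_000)   # non-parallelized, 25 kHz
pipelined_uw = dsp_power_uw(16_000)      # pipelined example, 16 kHz
parallel_x2_uw = dsp_power_uw(8_000)     # x2 parallel example, 8 kHz

# LIS2DW12 figure quoted above: 1.1 microamperes at 1.62 V.
sensor_uw = 1.1 * 1.62
ratio = sensor_uw / non_parallel_uw
```

Under these assumptions the 25 kHz case evaluates to 0.5 μW against about 1.78 μW for the sensor figure, that is, a ratio slightly above three.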
(131) This facilitates producing ultra-low-power high-performance three-axis (linear) A.I. accelerometers.
(132) While
(133) The following designations may apply to the blocks illustrated in
(134) 300: digital signal processor—DSP
(135) 302: host central processing unit—CPU
(136) 304: on-chip memory (e.g., RAM, ROM, FLASH)
(137) 306: memory controller
(138) 308: off-chip RAM memory
(139) 310: memory controller
(140) 312: off-chip ROM/FLASH memory.
(141) An approach as exemplified in
(142) In one or more embodiments, a method may comprise: receiving an input signal (e.g., AS, possibly obtained by pre-processing a “raw” signal x) and applying (artificial) neural network processing (e.g., 103) to the input signal to produce an output signal (e.g., OS) therefrom, wherein the neural network processing comprises: first neural network processing (e.g., 103A) comprising first convolutional layer processing (e.g., 1031), wherein two-valued weights are applied to the input signal, and (possibly after normalization at 1032) a two-valued function (e.g., 1033) to produce a two-valued signal from the result of the first convolutional layer processing, second neural network processing (e.g., 103B) comprising further convolutional layer processing (e.g., 1034) applied to the two-valued signal from the first neural network processing with two-valued weights and (two-valued) activations, and third neural network processing (e.g., 103C) comprising fully connected layer processing (e.g., 1035), wherein two-valued weights are applied to the signal from the second neural network processing, and (possibly after normalization at 1036) a respective two-valued function (e.g., 1037) to produce a two-valued signal from the result of the fully connected layer processing, and classifier processing (e.g., 1038) to produce said output signal from the two-valued signal from the respective two-valued function.
(143) One or more embodiments may comprise applying normalization (e.g., at 1032 and/or 1036) to: the output from the first convolutional layer processing fed to the two-valued function in the first neural network processing, and/or the output from the fully connected layer processing fed to the respective two-valued function in the third neural network processing.
(144) In one or more embodiments, the first convolutional layer processing in the first neural network processing may comprise applying a set of filters to the input signal and returning respective filtered outputs (e.g., ASxM).
(145) In one or more embodiments, the second convolutional layer processing in the second neural network processing may comprise applying a set of filters to the signal from the first neural network processing and adding together (e.g., 103a) the outputs from the filters in the set of filters to provide respective single values (e.g., y.sub.1, y.sub.2, . . . , y.sub.C) for processing in the third neural network processing.
(146) In one or more embodiments, the second neural network processing may comprise max pooling processing (e.g., 1040) of the result of the second convolutional layer processing.
(147) In one or more embodiments, classifier processing in the third neural network processing may comprise softmax classifier processing.
(148) One or more embodiments may comprise applying neural network processing to an input signal pre-processed by at least one of: filtering (e.g., 101) to separate a dynamic acceleration component from gravity, and/or gravity rotation (e.g., 102).
(149) In one or more embodiments, filtering to separate a dynamic acceleration component from gravity may comprise one of infinite impulse response filtering or exponential moving averaging.
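By way of non-limiting illustration only, exponential moving averaging used to separate a dynamic acceleration component from gravity may be sketched as follows for a single axis. The smoothing factor `alpha`, the initialization with the first sample and the function name are assumptions made for the example and are not taken from the embodiments.

```python
def split_gravity(samples, alpha=0.1):
    """Split one acceleration axis into gravity and dynamic components.

    The exponential moving average tracks the slowly varying (gravity)
    part; the residual is taken as the dynamic acceleration.
    """
    if not samples:
        return [], []
    gravity_estimate = samples[0]  # assumption: start from the first sample
    gravity, dynamic = [], []
    for sample in samples:
        gravity_estimate = alpha * sample + (1.0 - alpha) * gravity_estimate
        gravity.append(gravity_estimate)
        dynamic.append(sample - gravity_estimate)
    return gravity, dynamic


# A constant input is attributed entirely to gravity.
_, dynamic_constant = split_gravity([9.81] * 8)

# A step in the input initially appears in the dynamic component.
gravity_step, dynamic_step = split_gravity([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
```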
(150) One or more embodiments may comprise post-processing (e.g., 104, 105) the output signal from the neural network processing by at least one of: temporal filtering (e.g., 104) to remove mis-classification errors, and/or heuristic filtering (e.g., 105) to correct systematic prediction errors and/or to correct transition between classification classes.
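By way of non-limiting illustration only, temporal filtering of the classifier output may be sketched as a sliding-window majority vote. The majority-vote rule, the window length and the function name are assumptions made for the example; the embodiments are not limited to, and do not necessarily use, this rule.

```python
from collections import Counter, deque


def temporal_filter(labels, window=5):
    """Replace each label with the most frequent label in the recent window."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in labels:
        recent.append(label)
        # most_common(1) returns the label with the highest count.
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Hypothetical classifier output with one isolated mis-classification.
raw = ["walking"] * 4 + ["running"] + ["walking"] * 4
filtered = temporal_filter(raw, window=5)
```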
(151) In one or more embodiments, a system (e.g., 100) may comprise an (artificial) neural network circuit (e.g., 103) having first (e.g., 103A), second (e.g., 103B) and third (e.g., 103C) neural network circuit blocks, wherein: the first neural network circuit block comprises a first convolutional layer, wherein two-valued weights are applied to the input signal, and a two-valued circuit to produce a two-valued signal from the result of the first convolutional layer, the second neural network circuit block comprises a further convolutional layer active on the signal from the first neural network processing with two-valued weights and (two-valued) activations, and the third neural network circuit block comprises a fully connected layer, wherein two-valued weights are applied to the signal from the second neural network circuit block, a respective two-valued circuit to produce a two-valued signal from the result of the fully connected layer (1035), and a classifier to produce an output signal (OS) from the two-valued signal from the respective two-valued function,
(152) wherein the first, second and third neural network circuit blocks are configured to operate with the method of one or more embodiments.
(153) One or more embodiments may comprise pre-processing circuitry of the input signal (e.g., x>>>>AS) applied to said neural network circuit, the pre-processing circuitry comprising at least one of: a filter (e.g., 101) to separate a dynamic acceleration component from gravity, and/or a gravity rotator (e.g., 102).
(154) One or more embodiments may comprise post-processing circuitry (e.g., 104, 105) of the output signal (OS) from the neural network circuit, the post-processing circuitry comprising at least one of: a temporal filter (e.g., 104) to remove mis-classification errors, and/or a heuristic filter (e.g., 105) to correct systematic prediction errors and/or to correct transition between classification classes.
(155) One or more embodiments may comprise both the temporal filter and the heuristic filter with the heuristic filter downstream of the temporal filter.
(156) One or more embodiments may comprise: at least one sensor (e.g., A, G) providing said input signal, and a processing pipeline (e.g., 100) implementing said neural network circuit having first, second and third neural network circuit blocks.
(157) In one or more embodiments, the at least one sensor and the processing pipeline may be integrated in a single chip (e.g., CP).
(158) In one or more embodiments, the at least one sensor and the processing pipeline may be integrated in distinct chips (e.g., CP1, CP2).
(159) In one or more embodiments, the at least one sensor may comprise one of: an accelerometer, or the combination of an accelerometer and a gyroscope.
(160) One or more embodiments may comprise a computer program product loadable in the memory (e.g., 304, 308, 312) of at least one processing circuit (e.g., 300, 302) and comprising software code portions for executing the steps of the method of one or more embodiments when the product is run on at least one processing circuit.
(161) Without prejudice to the underlying principles, the details and embodiments may vary, even significantly, with respect to what has been described by way of example only, without departing from the extent of protection.
(162) Some embodiments may take the form of or include computer program products. For example, according to one embodiment there is provided a computer readable medium including a computer program adapted to perform one or more of the methods or functions described above. The medium may be a physical storage medium such as for example a Read Only Memory (ROM) chip, or a disk such as a Digital Versatile Disk (DVD-ROM), Compact Disk (CD-ROM), a hard disk, a memory, a network, or a portable media article to be read by an appropriate drive or via an appropriate connection, including as encoded in one or more barcodes or other related codes stored on one or more such computer-readable mediums and being readable by an appropriate reader device.
(163) Furthermore, in some embodiments, some of the systems and/or modules and/or circuits and/or blocks may be implemented or provided in other manners, such as at least partially in firmware and/or hardware, including, but not limited to, one or more application-specific integrated circuits (ASICs), digital signal processors, discrete circuitry, logic gates, standard integrated circuits, state machines, look-up tables, controllers (e.g., by executing appropriate instructions, and including microcontrollers and/or embedded controllers), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), etc., as well as devices that employ RFID technology, and various combinations thereof.
(164) The various embodiments described above can be combined to provide further embodiments. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.
(165) These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.