Neuromorphic Analog Signal Processor for Predictive Maintenance of Machines
20230081715 · 2023-03-16
CPC classification
G06F18/24147
PHYSICS
A61B5/165
HUMAN NECESSITIES
A61B5/02416
HUMAN NECESSITIES
A61B5/1123
HUMAN NECESSITIES
A61B5/02055
HUMAN NECESSITIES
G06N3/0442
PHYSICS
A61B2562/0219
HUMAN NECESSITIES
Abstract
Systems, methods, and devices are provided for predictive maintenance of machines. An example apparatus includes a vibration sensor configured to sense vibrations of a vibration source and an analog circuit. The analog circuit comprises a plurality of operational amplifiers and a plurality of resistors. The analog circuit is coupled to the vibration sensor and configured to: receive an analog signal from the vibration sensor; and compute an output based on the analog signal, by performing a portion of a trained neural network.
Claims
1. A system comprising: a hardware apparatus comprising: a vibration sensor configured to sense vibrations of a vibration source of a machine; an analog circuit comprising a plurality of operational amplifiers and a plurality of resistors, wherein the analog circuit is coupled to the vibration sensor and configured to: receive an analog signal from the vibration sensor; and compute an output based on the analog signal, by performing a portion of a trained neural network; a transceiver coupled to the analog circuit and configured to receive the output from the analog circuit and transmit the output over a low power wide area network (LPWAN); and a digital circuit communicatively coupled to the transceiver of the hardware apparatus via the LPWAN, wherein the digital circuit is configured to: receive the output from the analog circuit; and predict a state of the machine for maintenance, based on the output.
2. The system of claim 1, wherein the digital circuit comprises one or more digital computing units selected from the group consisting of: CPUs, GPUs, RISCs, FPGAs, and ASICs.
3. The system of claim 1, wherein the digital circuit comprises a processor configured to perform data classification.
4. The system of claim 3, wherein the data classification is performed by a neural network that is distinct from the trained neural network.
5. The system of claim 3, wherein the data classification is performed using k-nearest neighbors (k-NN).
6. The system of claim 1, wherein the output of the analog circuit represents embeddings and the digital circuit is configured to use the embeddings to classify the analog signal.
7. A hardware apparatus comprising: a vibration sensor configured to sense vibrations of a vibration source; an analog circuit comprising a plurality of operational amplifiers and a plurality of resistors, wherein the analog circuit is coupled to the vibration sensor and configured to: receive an analog signal from the vibration sensor; and compute an output based on the analog signal, by performing a portion of a trained neural network.
8. The hardware apparatus of claim 7, further comprising: a transceiver coupled to the analog circuit, wherein the transceiver is configured to receive the output from the analog circuit and transmit the output over a low power wide area network (LPWAN).
9. The hardware apparatus of claim 7, wherein the vibration sensor is disposed adjacent to a movable part of a machine, and the vibration sensor is configured to collect vibration data for the movable part.
10. The hardware apparatus of claim 9, wherein the movable part includes a ball bearing of the machine.
11. The hardware apparatus of claim 7, wherein the output of the analog circuit represents embeddings used for at least one of: defining a source of vibration, predicting failures of a machine coupled to the vibration source, and generating suggestions for maintenance of the machine.
12. The hardware apparatus of claim 7, wherein the vibration sensor is disposed in or on a tire and is configured to collect vibration data for the tire.
13. The hardware apparatus of claim 7, wherein the output represents embeddings used to predict at least one of: a road surface, a physical condition, a tire condition, a suspension condition, or a time-to-failure of the vibration source.
14. The hardware apparatus of claim 7, wherein the vibration sensor is configured to sample signals in a range of 0 to 20 kilohertz (kHz).
15. The hardware apparatus of claim 7, wherein the vibration sensor is configured to sample signals up to 41 kHz.
16. The hardware apparatus of claim 7, wherein the vibration sensor is configured to sample signals below a Nyquist sampling rate for fault detection and classification in a compressed domain.
17. The hardware apparatus of claim 7, wherein the vibration sensor is configured to sample signals for compressed sensing (CS) for condition classification of rolling element bearings in rotating machines.
18. The hardware apparatus of claim 7, wherein the trained neural network comprises a plurality of layers of neurons including a first set of layers and a second set of layers and the analog circuit is configured to implement the first set of layers.
19. The hardware apparatus of claim 18, wherein the second set of layers consists of a last layer of the trained neural network.
20. The hardware apparatus of claim 7, wherein the trained neural network comprises a deep neural network for unsupervised learning based on a sparse autoencoder.
21. The hardware apparatus of claim 7, wherein the trained neural network comprises a ResNet Convolutional Neural Network (CNN) with global average pooling (GAP) for feature learning and fault diagnosis of rolling bearings.
22. The hardware apparatus of claim 7, wherein the vibration sensor is configured to sample and output a one-dimensional time domain signal of a rolling bearing fault signal.
23. The hardware apparatus of claim 7, wherein the trained neural network comprises a stacked noise reduction autoencoder.
24. The hardware apparatus of claim 7, wherein the analog circuit is configured to be powered by vibrations of the vibration source.
25. The hardware apparatus of claim 24, further comprising a power harvesting circuit configured to harvest power from vibrations of the vibration source and supply power to the analog circuit.
26. The hardware apparatus of claim 7, wherein the plurality of operational amplifiers is configured to implement neurons of the portion of the trained neural network, and wherein the plurality of resistors is configured to implement connections between neurons of the portion of the trained neural network.
27. The hardware apparatus of claim 7, wherein the analog circuit is configured to implement an optimized neural network corresponding to the trained neural network.
28. The hardware apparatus of claim 7, wherein values of the plurality of resistors are based on weights of connections of the trained neural network.
29. The hardware apparatus of claim 7, wherein the plurality of resistors is configured to connect the plurality of operational amplifiers.
30. The hardware apparatus of claim 7, wherein the analog circuit comprises resistors in a backend-of-the-line (BEOL).
31. The hardware apparatus of claim 7, wherein: (i) the trained neural network is an autoencoder comprising an encoder portion and a decoder portion; (ii) the encoder portion reconstructs an input vector at an output layer after nonlinear transformations performed by hidden layers; (iii) the analog circuit corresponds to the encoder portion of the autoencoder, the encoder portion comprising the hidden layers; and (iv) the analog circuit is configured to compute a representation of the input vector in fewer dimensions than an input space of the input vector.
32. The hardware apparatus of claim 7, wherein the analog circuit is configured to generate compressed data that encodes vibration sensor data based on vibration features from the vibration sensor.
33. A method comprising: sensing vibrations of a vibration source using a vibration sensor to obtain an analog signal; computing an output based on the analog signal, by performing a portion of a trained neural network, using an analog circuit comprising a plurality of operational amplifiers and a plurality of resistors; and transmitting the output over a low power wide area network (LPWAN) using a transceiver.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0164] For a better understanding of the aforementioned systems, methods, and graphical user interfaces, as well as additional systems, methods, and graphical user interfaces that provide data visualization analytics and data preparation, reference should be made to the Description of Implementations below, in conjunction with the following drawings in which like reference numerals refer to corresponding parts throughout the figures.
[0224] Reference will now be made to implementations, examples of which are illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without requiring these specific details.
DESCRIPTION OF IMPLEMENTATIONS
[0228] In some implementations, components of the system 100 described above are implemented in one or more computing devices or server systems as computing modules.
[0234] Some implementations include one or more optional modules 244 as shown in
[0235] Some implementations include a lithographic mask generation module 248 that further includes lithographic masks 250 for resistances (corresponding to connections) and/or lithographic masks for analog components (e.g., operational amplifiers, multipliers, delay blocks, etc.) other than the resistances (or connections). In some implementations, lithographic masks are generated based on the chip design layout produced with the Cadence, Synopsys, or Mentor Graphics software packages. Some implementations use a design kit from a silicon wafer manufacturing plant (sometimes called a fab). The lithographic masks are intended to be used in the particular fab that provides the design kit (e.g., the TSMC 65 nm design kit). The generated lithographic mask files are used to fabricate the chip at the fab. In some implementations, the chip design in the Cadence, Mentor Graphics, or Synopsys software packages is generated semi-automatically from the SPICE or Fast SPICE (Mentor Graphics) circuit. In some implementations, a user with chip design skills drives the conversion from the SPICE or Fast SPICE circuit into a Cadence, Mentor Graphics, or Synopsys chip design. Some implementations combine Cadence design blocks for a single neuron unit, establishing proper interconnects between the blocks.
[0236] Some implementations include a library generation module 254 that further includes libraries of lithographic masks 256. Examples of library generation are described below in reference to
[0237] Some implementations include an Integrated Circuit (IC) fabrication module 258 that further includes Analog-to-Digital Conversion (ADC), Digital-to-Analog Conversion (DAC), or other similar interfaces 260, and/or fabricated ICs or models 262. Example integrated circuits and/or related modules are described below in reference to
[0238] Some implementations include an energy efficiency optimization module 264 that further includes an inferencing module 266, a signal monitoring module 268, and/or a power optimization module 270. Examples of energy efficiency optimizations are described below in reference to
[0239] Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e., sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise rearranged in various implementations. In some implementations, the memory 214 stores a subset of the modules and data structures identified above. Furthermore, in some implementations, the memory 214 stores additional modules or data structures not described above.
[0240] Although
Example Process for Generating Schematic Models of Analog Networks
[0242] In the description above and below, a math neuron is a mathematical function that receives one or more weighted inputs and produces a scalar output. In some implementations, a math neuron can have memory (e.g., long short-term memory (LSTM), recurrent neuron). A trivial neuron is a math neuron that performs a function representing an ‘ideal’ mathematical neuron, $V^{out} = f\left(\sum_i V_i^{in} \cdot \omega_i + \mathrm{bias}\right)$, where $f(x)$ is an activation function. An SNM is a schematic model with analog components (e.g., operational amplifiers, resistors $R_1, \ldots, R_n$, and other components) representing a specific type of math neuron (for example, a trivial neuron) in schematic form. The SNM output voltage is given by a corresponding formula that depends on the $K$ input voltages and the SNM component values, $V^{out} = g(V_1^{in}, \ldots, V_K^{in}, R_1, \ldots, R_n)$. According to some implementations, with properly selected component values, the SNM formula is equivalent to the math neuron formula with a desired set of weights. In some implementations, the set of weights is fully determined by the resistors used in an SNM. A target (analog) neural network 304 (sometimes called a T-network) is a set of math neurons that have a defined SNM representation, together with weighted connections between them, forming a neural network. A T-network follows several restrictions, such as an inbound limit (a maximum number of inbound connections for any neuron within the T-network), an outbound limit (a maximum number of outbound connections for any neuron within the T-network), and a signal range (e.g., all signals should be inside a predefined signal range). T-transformation (322) is a process of converting some desired neural network, such as MobileNet, to a corresponding T-network. A SPICE model 306 is a SPICE neural network model of a T-network 304, where each math neuron is substituted with a corresponding one or more SNMs.
A Cadence NN model 310 is a Cadence model of the T-network 304, where each math neuron is substituted with a corresponding one or more SNMs. Also, as described herein, two networks L and M have mathematical equivalence if, for all neuron outputs of these networks, $|V_i^L - V_i^M| < \varepsilon$, where $\varepsilon$ is relatively small (e.g., between 0.1% and 1% of the operating voltage range). Also, two networks L and M have functional equivalence if, for a given validation input data set $\{I_1, \ldots, I_n\}$, the classification results are mostly the same, i.e., $P(L(I_k) = M(I_k)) = 1 - \varepsilon$, where $\varepsilon$ is relatively small.
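The definitions above (trivial neuron, mathematical equivalence, functional equivalence) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the described implementation: the function names are hypothetical, ε for mathematical equivalence is expressed as a fraction of the operating voltage range, and the probability in the functional-equivalence definition is replaced by the empirical agreement rate on a validation set (required to be at least $1 - \varepsilon$).

```python
from typing import Callable, Sequence


def trivial_neuron(v_in: Sequence[float], weights: Sequence[float], bias: float,
                   f: Callable[[float], float]) -> float:
    """Ideal math neuron: V_out = f(sum_i V_i_in * w_i + bias)."""
    return f(sum(v * w for v, w in zip(v_in, weights)) + bias)


def mathematically_equivalent(v_l: Sequence[float], v_m: Sequence[float],
                              v_range: float, rel_eps: float = 0.01) -> bool:
    """True if |V_i^L - V_i^M| < eps for every neuron output i.

    Assumption: eps = rel_eps * operating voltage range; the text suggests
    0.1% to 1% of the range, i.e. rel_eps between 0.001 and 0.01.
    """
    if len(v_l) != len(v_m):
        raise ValueError("both networks must expose the same neuron outputs")
    eps = rel_eps * v_range
    return all(abs(a - b) < eps for a, b in zip(v_l, v_m))


def functionally_equivalent(labels_l: Sequence[int], labels_m: Sequence[int],
                            eps: float = 0.01) -> bool:
    """True if L and M classify at least a (1 - eps) fraction of inputs identically.

    The empirical agreement rate stands in for P(L(I_k) = M(I_k)).
    """
    if len(labels_l) != len(labels_m) or not labels_l:
        raise ValueError("need two equally long, non-empty label sequences")
    agreement = sum(a == b for a, b in zip(labels_l, labels_m)) / len(labels_l)
    return agreement >= 1.0 - eps
```

Mathematical equivalence compares every neuron output, so it is the stricter test; functional equivalence only looks at final classification results and tolerates small internal deviations.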
Example Input Neural Networks
[0246] Some implementations store the layout or organization of the input neural networks, including the number of neurons in each layer, the total number of neurons, the operations or activation functions of each neuron, and/or the connections between the neurons, in the memory 214 as the neural network topology 224.
[0250] Some implementations use Keras training that converges in approximately 1000 iterations and results in weights for the connections. In some implementations, the weights are stored in memory 214 as part of the weights 222. In the following example, the data format is ‘Neuron [1st link weight, 2nd link weight, bias]’.
[0251] N1 [−0.9824321, 0.976517, −0.00204677];
[0252] N2 [1.0066702, −1.0101418, −0.00045485];
[0253] N3 [1.0357606, 1.0072469, −0.00483723];
[0254] N4 [−0.07376373, −0.7682612, 0.0]; and
[0255] N5 [1.0029935, −1.1994369, −0.00147767].
[0256] Next, to compute resistor values for the connections between the neurons, some implementations compute a resistor range. Some implementations set the resistor nominal values (R+, R−) to 1 MΩ, the possible resistor range to 100 kΩ to 1 MΩ, and the nominal series to E24. Some implementations compute the w1, w2, and wbias resistor values for each connection as follows. For each weight value wi (e.g., the weights 222), some implementations evaluate all possible (Ri−, Ri+) resistor pair options within the chosen nominal series and choose the resistor pair that produces the minimal error value.
[0257] The following table provides example values for the weights w1, w2, and bias, for each connection, according to some implementations.
TABLE-US-00001

| Connection | Model value | R− (MΩ) | R+ (MΩ) | Implemented value |
|---|---|---|---|---|
| N1_w1 | −0.9824321 | 0.36 | 0.56 | −0.992063 |
| N1_w2 | 0.976517 | 0.56 | 0.36 | 0.992063 |
| N1_bias | −0.00204677 | 0.1 | 0.1 | 0.0 |
| N2_w1 | 1.0066702 | 0.43 | 0.3 | 1.007752 |
| N2_w2 | −1.0101418 | 0.18 | 0.22 | −1.010101 |
| N2_bias | −0.00045485 | 0.1 | 0.1 | 0.0 |
| N3_w1 | 1.0357606 | 0.91 | 0.47 | 1.028758 |
| N3_w2 | 1.0072469 | 0.43 | 0.3 | 1.007752 |
| N3_bias | −0.00483723 | 0.1 | 0.1 | 0.0 |
| N4_w1 | −0.07376373 | 0.91 | 1.0 | −0.098901 |
| N4_w2 | −0.7682612 | 0.3 | 0.39 | −0.769231 |
| N4_bias | 0.0 | 0.1 | 0.1 | 0.0 |
| N5_w1 | 1.0029935 | 0.43 | 0.3 | 1.007752 |
| N5_w2 | −1.1994369 | 0.3 | 0.47 | −1.205674 |
| N5_bias | −0.00147767 | 0.1 | 0.1 | 0.0 |
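The exhaustive resistor-pair search described in [0256] can be sketched as below. Two points are assumptions rather than statements from the text. First, the realized weight is taken as $w = R_{nom}/R^{+} - R_{nom}/R^{-}$ with $R_{nom} = 1$ MΩ. This relation reproduces every "Implemented value" in the table above (for example, $1/0.56 - 1/0.36 \approx -0.992063$). Second, the error is measured as the absolute difference from the model weight. The function names are hypothetical.

```python
import itertools
from typing import List, Tuple

# E24 preferred-number series, one decade.
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]
R_NOMINAL = 1.0  # MΩ, nominal value of R+ and R- from the text


def e24_values(r_min: float = 0.1, r_max: float = 1.0) -> List[float]:
    """E24 resistor values in MΩ within [r_min, r_max] (100 kΩ to 1 MΩ)."""
    values = {round(base * scale, 6) for base in E24 for scale in (0.1, 1.0)}
    return sorted(v for v in values if r_min <= v <= r_max)


def implemented_weight(r_minus: float, r_plus: float) -> float:
    """Weight realized by a resistor pair, assuming w = R_nom/R+ - R_nom/R-."""
    return R_NOMINAL / r_plus - R_NOMINAL / r_minus


def choose_pair(w: float) -> Tuple[float, float, float]:
    """Try every (R-, R+) pair in the series; return (R-, R+, |error|) with minimal error."""
    best = None
    for r_minus, r_plus in itertools.product(e24_values(), repeat=2):
        err = abs(implemented_weight(r_minus, r_plus) - w)
        if best is None or err < best[2]:
            best = (r_minus, r_plus, err)
    return best


if __name__ == "__main__":
    for name, w in [("N1_w1", -0.9824321), ("N2_w2", -1.0101418), ("N4_w1", -0.07376373)]:
        r_minus, r_plus, err = choose_pair(w)
        print(f"{name}: R- = {r_minus} MΩ, R+ = {r_plus} MΩ, error = {err:.6f}")
```

Because the search is exhaustive over all 25 × 25 pairs, its error is never larger than that of the pair listed in the table. It may still pick a different pair if the original procedure used another error measure or tie-breaking rule. Small biases map to equal resistors (e.g., 0.1 MΩ and 0.1 MΩ), which realize a weight of exactly 0.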
Example Advantages of Transformed Neural Networks
[0258] Before describing examples of transformation, it is worth noting some of the advantages of the transformed neural networks over conventional architectures. As described herein, the input trained neural networks are transformed to pyramid- or trapezium-shaped analog networks. Some of the advantages of pyramids or trapezia over cross-bars include lower latency, simultaneous analog signal propagation, the possibility of manufacture using standard integrated circuit (IC) design elements, including resistors and operational amplifiers, high parallelism of computation, high accuracy (e.g., accuracy increases with the number of layers, relative to conventional methods), tolerance towards error(s) in each weight and/or at each connection (e.g., pyramids balance the errors), low RC (low resistance-capacitance delay related to propagation of the signal through the network), and/or the ability to manipulate the biases and functions of each neuron in each layer of the transformed network. Also, a pyramid is an excellent computation block by itself, since it is a multilayer perceptron that can model any neural network with one output. Networks with several outputs are implemented using different pyramid or trapezium geometries, according to some implementations. A pyramid can be thought of as a multilayer perceptron with one output and several layers (e.g., N layers), where each neuron has n inputs and 1 output. Similarly, a trapezium is a pyramid-like multilayer perceptron in which each neuron has n inputs and m outputs, where n and m are limited by analog IC chip design constraints, according to some implementations.
[0259] Some implementations perform lossless transformation of any trained neural network into subsystems of pyramids or trapezia. Thus, pyramids and trapezia can be used as universal building blocks for transforming any neural network. An advantage of pyramid- or trapezium-based neural networks is the possibility of realizing any neural network with standard IC analog elements (e.g., operational amplifiers, resistors, and signal delay lines in the case of recurrent neurons) using standard lithography techniques. It is also possible to restrict the weights of transformed networks to some interval. In other words, lossless transformation is performed with weights limited to some predefined range, according to some implementations. Another advantage of using pyramids or trapezia is the high degree of parallelism in signal processing, or the simultaneous propagation of analog signals, which increases the speed of calculations and lowers latency. Moreover, many modern neural networks are sparsely connected and are much better (e.g., more compact, with low RC values and no leakage currents) when transformed into pyramids than into cross-bars. Pyramid and trapezium networks are relatively more compact than cross-bar-based memristor networks.
[0260] Furthermore, analog neuromorphic trapezia-like chips possess a number of properties that are not typical of analog devices. For example, the signal-to-noise ratio does not increase with the number of cascades in the analog chip, external noise is suppressed, and the influence of temperature is greatly reduced. Such properties make trapezia-like analog neuromorphic chips analogous to digital circuits. For example, individual neurons based on operational amplifiers level the signal, operate at frequencies of 20,000 to 100,000 Hz, and are not influenced by noise or signals with frequencies higher than the operational range, according to some implementations. Trapezia-like analog neuromorphic chips also filter the output signal owing to the way operational amplifiers function. Such trapezia-like analog neuromorphic chips suppress common-mode (synphase) noise. Because the outputs of operational amplifiers are low-impedance, noise is also significantly reduced. Due to the leveling of the signal at each operational amplifier output and the synchronous operation of the amplifiers, temperature-induced parameter drift does not influence the signals at the final outputs. A trapezia-like analog neuromorphic circuit is tolerant of errors and noise in the input signals and of deviations in resistor values, which correspond to weight values in the neural network. Trapezia-like analog neuromorphic networks are also tolerant of any kind of systematic error, such as an error in resistor value settings, if the error is the same for all resistors, owing to the very nature of analog neuromorphic trapezia-like circuits based on operational amplifiers.
Example Lossless Transformation (T-Transformation) of Trained Neural Networks
[0261] In some implementations, the example transformations described herein are performed by the neural network transformation module 226, which transforms trained neural networks 220, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
Example Transformations with Target Neurons with N Inputs and 1 Output
[0265] In some implementations, the example transformations described herein are performed by the neural network transformation module 226, which transforms trained neural networks 220, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
Single Layer Perceptron with One Output
[0266] Suppose a single layer perceptron SLP(K,1) includes K inputs and one output neuron with activation function F. Suppose further that $U \in \mathbb{R}^K$ is a vector of weights for SLP(K,1). The following algorithm Neuron2TNN1 constructs a T-neural network from T-neurons with N inputs and 1 output (referred to as TN(N,1)).
Algorithm Neuron2TNN1
[0267] 1. Construct an input layer for T-NN by including all inputs from SLP(K,1).
[0268] 2. If $K > N$ then:
…
$w_j^1 = u_j, \quad j = 1, \ldots, K$
[0277] d. Terminate the algorithm.
[0278] 4. Set $l = 1$.
[0279] 5. If $m_l > N$:
[0280] a. Divide the $m_l$ neurons into $m_{l+1} = \lceil m_l / N \rceil$ groups, every group consisting of no more than N neurons.
[0281] b. Construct the hidden layer $LTH_{l+1}$ of the T-NN from $m_{l+1}$ neurons, every neuron having an identity activation function.
[0282] c. Connect the input neurons from every group to the corresponding neuron from the next layer.
[0283] d. Set the weights of the new connections according to the following equation:
$w_j^{l+1} = 1$
…
[0289] d. Terminate the algorithm.
[0290] 7. Repeat steps 5 and 6.
[0291] Here $\lceil x \rceil$ is the minimum integer that is no less than x. The number of layers in the T-NN constructed by means of the algorithm Neuron2TNN1 is $h = \lceil \log_N K \rceil$. The total number of weights in the T-NN is:
[0295] The output value of the T-NN is calculated according to the following formula:
$y = F(W^m W^{m-1} \cdots W^2 W^1 x)$
[0296] The output of the first layer is calculated as an output vector according to the following formula:
[0297] Multiplying the obtained vector by the weight matrix of the second layer:
[0298] Every subsequent layer outputs a vector whose components are linear combinations of sub-vectors of x.
[0299] Finally, the T-NN's output is equal to:
[0300] This is the same value as the one calculated in SLP(K,1) for the same input vector x. So the output values of SLP(K,1) and the constructed T-NN are equal.
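The construction and the equivalence argument of Neuron2TNN1 can be checked with a minimal Python sketch. It rests on two assumptions. First, neurons are grouped into contiguous blocks of at most N. Second, the first hidden layer keeps the original weights $u_j$ for the inputs in its group, which is consistent with the statement that every layer outputs linear combinations of sub-vectors of x. Later layers use unit weights ($w_j^{l+1} = 1$), and only the output neuron applies F. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # rows: neurons of the next layer; columns: previous layer


def neuron_to_tnn1(u: Sequence[float], n: int) -> List[Matrix]:
    """Weight matrices of a TN(n, 1) network equivalent to one neuron with weights u."""
    if n < 2:
        raise ValueError("each T-neuron needs at least 2 inputs")
    k = len(u)
    if k <= n:
        # K <= N: a single output neuron connected to all K inputs, w_j = u_j.
        return [[list(u)]]
    # K > N: first hidden layer of ceil(K/N) identity neurons; group i keeps weights u_j.
    m = math.ceil(k / n)
    first = [[0.0] * k for _ in range(m)]
    for j, w in enumerate(u):
        first[j // n][j] = w
    layers: List[Matrix] = [first]
    # Step 5: while more than N neurons remain, sum groups of at most N with unit weights.
    while m > n:
        m_next = math.ceil(m / n)
        layer = [[0.0] * m for _ in range(m_next)]
        for j in range(m):
            layer[j // n][j] = 1.0
        layers.append(layer)
        m = m_next
    # Step 6: the output neuron sums the remaining m <= N neurons with unit weights.
    layers.append([[1.0] * m])
    return layers


def evaluate(layers: List[Matrix], x: Sequence[float], f: Callable[[float], float]) -> float:
    """Identity activation in the hidden layers; F only at the output neuron."""
    v = list(x)
    for w in layers:
        v = [sum(wij * vj for wij, vj in zip(row, v)) for row in w]
    return f(v[0])


def max_fan_in(layers: List[Matrix]) -> int:
    """Largest number of nonzero incoming weights of any neuron."""
    return max(sum(1 for wij in row if wij != 0.0) for w in layers for row in w)
```

For K = 10 inputs and N = 3, the network has three layers of weights (4, 2, and 1 neurons), matching $h = \lceil \log_3 10 \rceil = 3$. No neuron has more than three inputs, and `evaluate` agrees with $F(u \cdot x)$ up to floating-point rounding.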
Single Layer Perceptron with Several Outputs
[0301] Suppose there is a single layer perceptron SLP(K, L) with K inputs and L output neurons, each neuron performing an activation function F. Suppose further that $U \in \mathbb{R}^{L \times K}$ is a weight matrix for SLP(K, L). The following algorithm Layer2TNN1 constructs a T-neural network from neurons TN(N, 1).
Algorithm Layer2TNN1
[0302] 1. For every output neuron $i = 1, \ldots, L$:
[0303] a. Apply the algorithm Neuron2TNN1 to $SLP_i(K, 1)$, consisting of K inputs, 1 output neuron, and weight vector $U_{ij}$, $j = 1, 2, \ldots, K$. A TNN1 is constructed as a result.
[0304] 2. Construct PTNN by composing all TNN1 into one neural net:
[0305] a. Concatenate the input vectors of all TNN1, so the input of PTNN has L groups of K inputs, with each group being a copy of the SLP(K, L)'s input layer.
[0306] The output of the PTNN is equal to the SLP(K, L)'s output for the same input vector because the outputs of every pair $SLP_i(K, 1)$ and TNN1 are equal.
Multilayer Perceptron
[0307] Suppose a multilayer perceptron (MLP) includes K inputs, S layers, and $L_i$ calculation neurons in the i-th layer, represented as MLP(K, S, $L_1, \ldots, L_S$). Suppose $U_i \in \mathbb{R}^{L_i \times L_{i-1}}$ is the weight matrix of the i-th layer, with $L_0 = K$.
[0308] The following is an example algorithm to construct a T-neural network from neurons TN(N, 1), according to some implementations.
Algorithm MLP2TNN1
[0309] 1. For every layer $i = 1, \ldots, S$:
a. Apply the algorithm Layer2TNN1 to $SLP_i(L_{i-1}, L_i)$, consisting of $L_{i-1}$ inputs, $L_i$ output neurons, and a weight matrix $U_i$, constructing $PTNN_i$ as a result.
[0310] 2. Construct MTNN by stacking all $PTNN_i$ into one neural net; the output of $PTNN_{i-1}$ is set as the input for $PTNN_i$.
[0311] The output of the MTNN is equal to the MLP(K, S, $L_1, \ldots, L_S$)'s output for the same input vector because the outputs of every pair $SLP_i(L_{i-1}, L_i)$ and $PTNN_i$ are equal.
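The equality arguments for Layer2TNN1 and MLP2TNN1 can be illustrated numerically. The sketch below evaluates, rather than wires up, the composed network. Each output neuron of each layer becomes its own pyramid of identity neurons with fan-in at most N and is fed its own copy of the layer input. The layers are stacked so that one layer's outputs feed the next. The result is compared with an ordinary MLP forward pass. Biases are omitted because the SLPs above are defined by weight matrices only. All names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def pyramid_sum(terms: Sequence[float], n: int) -> Tuple[float, int]:
    """Add weighted inputs through a pyramid of identity neurons with fan-in <= n.
    Returns the sum and the number of neuron layers used (output neuron included)."""
    if n < 2:
        raise ValueError("fan-in must be at least 2")
    level, depth = list(terms), 1
    while len(level) > n:
        level = [sum(level[i:i + n]) for i in range(0, len(level), n)]
        depth += 1
    return sum(level), depth


def layer2tnn1(u: Matrix, x: Sequence[float], f: Callable[[float], float], n: int) -> List[float]:
    """Layer2TNN1: one pyramid per output neuron, each reading its own copy of x."""
    return [f(pyramid_sum([w * xj for w, xj in zip(row, x)], n)[0]) for row in u]


def mlp2tnn1(weights: List[Matrix], x: Sequence[float],
             f: Callable[[float], float], n: int) -> List[float]:
    """MLP2TNN1: stack the per-layer PTNNs; PTNN_{i-1}'s output feeds PTNN_i."""
    v = list(x)
    for u in weights:
        v = layer2tnn1(u, v, f, n)
    return v


def mlp_reference(weights: List[Matrix], x: Sequence[float],
                  f: Callable[[float], float]) -> List[float]:
    """Plain MLP forward pass used as the ground truth."""
    v = list(x)
    for u in weights:
        v = [f(sum(w * vj for w, vj in zip(row, v))) for row in u]
    return v
```

Only the summation order differs between the two paths, so their outputs match up to floating-point rounding. This sum-regrouping argument is the one behind [0306] and [0311].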
Example T-Transformations with Target Neurons with $N_I$ Inputs and $N_O$ Outputs
[0312] In some implementations, the example transformations described herein are performed by the neural network transformation module 226, which transforms trained neural networks 220, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
Example Transformation of Single Layer Perceptron with Several Outputs
[0313] Suppose a single layer perceptron SLP(K, L) includes K inputs and L output neurons, each neuron performing an activation function F. Suppose further that $U \in \mathbb{R}^{L \times K}$ is a weight matrix for SLP(K,L). The following algorithm constructs a T-neural network from neurons TN($N_I$, $N_O$), according to some implementations.
Algorithm Layer2TNNX
[0314] 1. Construct a PTNN from SLP(K,L) by using the algorithm Layer2TNN1 (see description above). PTNN has an input layer consisting of L groups of K inputs.
[0315] 2. Compose $\lceil L / N_O \rceil$ subsets from the L groups. Each subset contains no more than $N_O$ groups of input vector copies.
[0316] 3. Replace the groups in every subset with one copy of the input vector.
[0317] 4. Construct PTNNX by rebuilding the connections in every subset, making $N_O$ output connections from every input neuron.
[0318] According to some implementations, output of the PTNNX is calculated by means of the same formulas as for PTNN (described above), so the outputs are equal.
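The bookkeeping in steps 2 and 3 of Layer2TNNX can be shown with a short sketch. It assumes that the L per-output pyramids are partitioned into contiguous subsets of at most $N_O$. All pyramids in a subset then read from one shared copy of the input vector, so each input neuron fans out to at most $N_O$ pyramids. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def layer2tnnx_subsets(num_outputs: int, n_o: int) -> List[List[int]]:
    """Split the L per-output pyramids into ceil(L / N_O) subsets of at most N_O.
    Every pyramid in a subset is then fed from one shared copy of the input vector."""
    if n_o < 1:
        raise ValueError("N_O must be positive")
    return [list(range(s, min(s + n_o, num_outputs))) for s in range(0, num_outputs, n_o)]


def input_layer_sizes(num_inputs: int, num_outputs: int, n_o: int) -> Tuple[int, int]:
    """Input-layer size of PTNN (L copies of K inputs) versus PTNNX (one copy per subset)."""
    subsets = layer2tnnx_subsets(num_outputs, n_o)
    return num_outputs * num_inputs, len(subsets) * num_inputs
```

For example, with K = 8 inputs, L = 10 outputs, and $N_O = 4$, there are $\lceil 10/4 \rceil = 3$ subsets. The input layer shrinks from 80 neurons in PTNN to 24 in PTNNX, while the arithmetic of each pyramid is unchanged.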
Example Transformation of Multilayer Perceptron
[0320] Suppose a multilayer perceptron (MLP) includes K inputs, S layers, and $L_i$ calculation neurons in the i-th layer, represented as MLP(K, S, $L_1, \ldots, L_S$). Suppose $U_i \in \mathbb{R}^{L_i \times L_{i-1}}$ is the weight matrix of the i-th layer, with $L_0 = K$.
Algorithm MLP2TNNX
[0321] 1. For every layer $i = 1, \ldots, S$:
[0322] a. Apply the algorithm Layer2TNNX to $SLP_i(L_{i-1}, L_i)$, consisting of $L_{i-1}$ inputs, $L_i$ output neurons, and weight matrix $U_i$. $PTNNX_i$ is constructed as a result.
[0323] 2. Construct MTNNX by stacking all $PTNNX_i$ into one neural net:
[0324] a. The output of $PTNNX_{i-1}$ is set as the input for $PTNNX_i$.
[0325] According to some implementations, the output of the MTNNX is equal to the MLP(K, S, $L_1, \ldots, L_S$)'s output for the same input vector, because the outputs of every pair $SLP_i(L_{i-1}, L_i)$ and $PTNNX_i$ are equal.
Example Transformation of Recurrent Neural Network
[0326] A Recurrent Neural Network (RNN) contains backward connections that allow it to retain information.
[0327] Data processing in an RNN is performed by means of the following formula:
$h_t = f\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right)$
[0328] In the equation above, $x_t$ is the current input vector, and $h_{t-1}$ is the RNN's output for the previous input vector $x_{t-1}$. This expression consists of several operations: calculation of the linear combinations for two fully connected layers, $W^{(hh)} h_{t-1}$ and $W^{(hx)} x_t$; element-wise addition; and non-linear function calculation (f). The first and third operations can be implemented by a trapezium-based network (one fully connected layer is implemented by a pyramid-based network, a special case of trapezium networks). The second operation is a common operation that can be implemented in networks of any structure.
[0329] In some implementations, the RNN's layer without recurrent connections is transformed by means of Layer2TNNX algorithm described above. After transformation is completed, recurrent links are added between related neurons. Some implementations use delay blocks described below in reference to
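The decomposition of the RNN update into its three operation types, plus the recurrent link, can be sketched as follows. It assumes $f = \tanh$, and the function names are hypothetical. The previous state is simply held and fed back on the next step, which is the role the text assigns to delay blocks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Fully connected layer without activation (one pyramid per output row)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def rnn_step(w_hh, w_hx, h_prev: Sequence[float], x_t: Sequence[float],
             f: Callable[[float], float] = math.tanh) -> Vector:
    a = matvec(w_hh, h_prev)               # (1) linear combination of the delayed state
    b = matvec(w_hx, x_t)                  # (1) linear combination of the current input
    s = [ai + bi for ai, bi in zip(a, b)]  # (2) element-wise addition
    return [f(si) for si in s]             # (3) non-linear function


def run_rnn(w_hh, w_hx, xs: Sequence[Sequence[float]], h0: Sequence[float]) -> List[Vector]:
    """Recurrent link: h_t is held (delayed) and fed back as h_{t-1} at the next step."""
    h, outputs = list(h0), []
    for x_t in xs:
        h = rnn_step(w_hh, w_hx, h, x_t)
        outputs.append(h)
    return outputs
```

With a single unit, $W^{(hh)} = 0.5$, $W^{(hx)} = 1$, $h_0 = 0$, and inputs 1 then 0, the outputs are $\tanh(1)$ and $\tanh(0.5 \tanh(1))$.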
Example Transformation of LSTM Network
[0330] A Long Short-Term Memory (LSTM) neural network is a special case of an RNN. An LSTM network's operations are represented by the following equations:
$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$;
$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$;
$D_t = \tanh(W_D [h_{t-1}, x_t] + b_D)$;
$C_t = f_t \odot C_{t-1} + i_t \odot D_t$;
$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$; and
$h_t = o_t \odot \tanh(C_t)$.
[0331] In the equations above, $W_f$, $W_i$, $W_D$, and $W_o$ are trainable weight matrices; $b_f$, $b_i$, $b_D$, and $b_o$ are trainable biases; $x_t$ is the current input vector; $h_{t-1}$ is an internal state of the LSTM calculated for the previous input vector $x_{t-1}$; and $o_t$ is the output for the current input vector. In the equations, the subscript t denotes a time instance t, and the subscript t-1 denotes a time instance t-1.
[0333] There are several types of operations used in these expressions: (i) calculation of linear combinations for several fully connected layers, (ii) element-wise addition, (iii) the Hadamard product, and (iv) non-linear function calculation (e.g., sigmoid (σ) and hyperbolic tangent (tanh)). Some implementations implement operations (i) and (iv) with a trapezium-based network (one fully connected layer is implemented by a pyramid-based network, a special case of trapezium networks). Some implementations use networks of various structures for operations (ii) and (iii), which are common operations.
[0334] According to some implementations, an LSTM layer, with its recurrent connections removed, is transformed using the Layer2TNNX algorithm described above. After the transformation is completed, recurrent links are added between the related neurons, according to some implementations.
Example Transformation of GRU Networks
[0337] A Gated Recurrent Unit (GRU) neural network is a special case of an RNN. The operations of a GRU are represented by the following expressions:
$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1})\\
r_t &= \sigma(W_r x_t + U_r h_{t-1})\\
j_t &= \tanh(W x_t + r_t \times U h_{t-1})\\
h_t &= z_t \times h_{t-1} + (1 - z_t) \times j_t
\end{aligned}
$$
[0338] In the equations above, $x_t$ is the current input vector, $h_{t-1}$ is the output calculated for the previous input vector $x_{t-1}$, $z_t$ and $r_t$ are the update and reset gates, and $j_t$ is the candidate state.
[0341] The operation types used in a GRU are the same as those used in LSTM networks (described above), so a GRU is transformed into trapezium-based networks following the principles described above for LSTM networks (e.g., using the Layer2TNNX algorithm), according to some implementations.
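A corresponding minimal sketch of one GRU step, following the equations in [0337] (with $r_t$ multiplying $U h_{t-1}$ element-wise), shows that it uses the same operation types as the LSTM: fully connected layers, element-wise addition, Hadamard products, and sigmoid/tanh non-linearities. The dict keys and function names are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, v):
    """Fully connected layer without bias: returns M @ v as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gru_step(p, h_prev, x_t):
    # Update and reset gates: two fully connected layers, addition, sigmoid
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x_t), matvec(p["Uz"], h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev))]
    # Candidate state: r_t multiplies U h_{t-1} element-wise (Hadamard product)
    uh = matvec(p["U"], h_prev)
    j = [math.tanh(a + rk * b) for a, rk, b in zip(matvec(p["W"], x_t), r, uh)]
    # Interpolation: h_t = z_t * h_{t-1} + (1 - z_t) * j_t
    return [zk * hk + (1.0 - zk) * jk for zk, hk, jk in zip(z, h_prev, j)]

# One hidden unit, one input, all-zero weights: z_t = 0.5 and j_t = 0
p = {k: [[0.0]] for k in ("Wz", "Uz", "Wr", "Ur", "W", "U")}
h = gru_step(p, [0.8], [1.0])
```

With all-zero weights the update gate is 0.5 and the candidate is 0, so the state is halved.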
Example Transformation of Convolutional Neural Network
[0342] In general, Convolutional Neural Networks (CNNs) include several basic operations, such as convolution (a set of linear combinations of fragments of an image, or of an internal map, with a kernel), activation functions, and pooling (e.g., max, mean, etc.). Every calculation neuron in a CNN follows the general processing scheme of a neuron in an MLP: a linear combination of some inputs followed by calculation of an activation function. Therefore, a CNN is transformed using the MLP2TNNX algorithm described above for multilayer perceptrons, according to some implementations.
[0343] Conv1D is a convolution performed over the time coordinate.
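To see why a CNN fits the MLP-style transformation, the sketch below builds the dense weight matrix of a "valid" Conv1D with stride 1. Each row is the kernel shifted along the time axis, so every output sample is an ordinary neuron: a linear combination of a window of inputs followed by an activation. It assumes the cross-correlation convention used by common deep-learning libraries (no kernel flip) and ReLU as the activation; all names are illustrative.

```python
def conv1d_as_dense(kernel, n_in):
    """Dense (mostly zero) weight matrix of a 'valid' Conv1D with stride 1.

    Row t holds the kernel shifted to position t, so output y[t] is a plain
    neuron computing a linear combination of x[t], ..., x[t + k - 1].
    """
    k = len(kernel)
    rows = []
    for t in range(n_in - k + 1):
        row = [0.0] * n_in
        row[t:t + k] = kernel
        rows.append(row)
    return rows

def conv1d_direct(x, kernel):
    """Sliding-window Conv1D (cross-correlation, no kernel flip)."""
    k = len(kernel)
    return [sum(kernel[j] * x[t + j] for j in range(k)) for t in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

x = [1.0, 2.0, -1.0, 0.5, 3.0]
kernel = [0.5, -1.0, 2.0]
W = conv1d_as_dense(kernel, len(x))
dense_out = relu([sum(w * xi for w, xi in zip(row, x)) for row in W])
direct_out = relu(conv1d_direct(x, kernel))
print(dense_out, direct_out)  # both [0.0, 3.0, 5.0]
```

The shared, sparse rows of `W` are exactly the kind of layer that the pyramid and trapezium transformations accept as input.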
[0344] In some implementations, convolutional layers are represented by trapezium-like neurons, and fully connected layers are represented by a cross-bar of resistors. Implementations that use cross-bars calculate a resistance matrix for the cross-bars.
Example Approximation Algorithm for Single Layer Perceptron with Multiple Outputs
[0345] In some implementations, the example transformations described herein are performed by the neural network transformation module 226, which transforms the trained neural networks 220, and/or by the analog neural network optimization module 246, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
[0346] Suppose a single layer perceptron SLP(K, L) includes K inputs and L output neurons, each output neuron performing an activation function F. Suppose further that $U \in \mathbb{R}^{L \times K}$ is the weight matrix of SLP(K, L). The following is an example of constructing a T-neural network from neurons TN($N_I$, $N_O$) using an approximation algorithm, Layer2TNNX_Approx, according to some implementations. The algorithm first applies the Layer2TNN1 algorithm (described above) to decrease the number of neurons and connections, and then applies Layer2TNNX to process the input of reduced size. The outputs of the resulting neural net are calculated using shared weights of the layers constructed by the Layer2TNN1 algorithm. The number of these layers is determined by p, a parameter of the algorithm. If p = 0, only the Layer2TNNX algorithm is applied and the transformation is equivalent. If p > 0, then p layers have shared weights and the transformation is approximate.
Algorithm Layer2TNNX_Approx
[0347] 1. Set the parameter p to a value from the set {0, 1, . . . , $\log_{N_{\ldots}}$ … − 1}.
[0348] 2. If p > 0, apply the algorithm Layer2TNN1 with neuron TN($N_I$, 1) to the net SLP(K, L) and construct the first p layers of the resulting subnet (PNN). The net PNN has … neurons in the output layer.
[0349] 3. Apply the algorithm Layer2TNNX with a neuron TN($N_I$, $N_O$) and construct a neural subnet TNN with $N_p$ inputs and L outputs.
[0350] 4. Set the weights of the PNN net. The weights of every neuron i of the first layer of the PNN are set according to the rule $w_{ik_i}$ … for all weights j of this neuron except $k_i$. All other weights of the PNN net are set to 1.
… $w_{ik_i}$ … All other weights of the TNN are set to 1.
[0352] 6. Set the activation functions of all neurons of the last layer of the TNN subnet to F. The activation functions of all other neurons are identity.
Approximation Algorithm for Multilayer Perceptron with Several Outputs
[0354] Suppose a multilayer perceptron (MLP) includes K inputs, S layers, and $L_i$ calculation neurons in the $i$-th layer, denoted MLP(K, S, $L_1$, . . . , $L_S$). Suppose further that $U_i \in \mathbb{R}^{L_i \times L_{i-1}}$ …
Algorithm MLP2TNNX_Approx
[0355] 1. For every layer i = 1, . . . , S:
[0356] a. Apply the algorithm Layer2TNNX_Approx (described above) to $\text{SLP}_i(L_{i-1}, L_i)$, which consists of $L_{i-1}$ inputs, $L_i$ output neurons, and the weight matrix $U_i$. If i = 1, then $L_0 = K$. Let $\text{PTNNX}_i$ denote the network constructed in this step.
[0357] 2. Construct an MTNNX (a multilayer perceptron) by stacking all $\text{PTNNX}_i$ into one neural net, where the output of $\text{PTNNX}_{i-1}$ is set as the input of $\text{PTNNX}_i$.
Example Methods of Compression of Transformed Neural Networks
[0358] In some implementations, the example transformations described herein are performed by the neural network transformation module 226, which transforms the trained neural networks 220, and/or by the analog neural network optimization module 246, based on the mathematical formulations 230, the basic function blocks 232, the analog component models 234, and/or the analog design constraints 236, to obtain the transformed neural networks 228.
[0359] This section describes example methods of compressing transformed neural networks, according to some implementations. Some implementations compress analog pyramid-like neural networks in order to minimize the number of operational amplifiers and resistors necessary to realize the analog network on chip. In some implementations, the method of compressing analog neural networks is pruning, similar to pruning in software neural networks. There are nevertheless some peculiarities in compressing pyramid-like analog networks, which are realized in hardware as analog IC chips. Since elements such as operational amplifiers and resistors realize the neurons and weights of analog neural networks, it is crucial to minimize the number of operational amplifiers and resistors placed on chip. This also helps minimize the power consumption of the chip. Modern neural networks, such as convolutional neural networks, can be compressed by a factor of 5 to 200 without significant loss of accuracy. Often, whole blocks of modern neural networks can be pruned without significant loss of accuracy. The transformation of dense neural networks into sparsely connected pyramid-, trapezium-, or cross-bar-like neural networks creates opportunities to prune these sparsely connected analog networks, which are then represented by operational amplifiers and resistors in analog IC chips. In some implementations, such techniques are applied in addition to conventional neural network compression techniques. In some implementations, the compression techniques are applied based on the specific architecture of the input neural network and/or the transformed neural networks (e.g., pyramids versus trapezia versus cross-bars).
[0360] For example, since the networks are realized by means of analog elements, such as operational amplifiers, some implementations determine the current that flows through each operational amplifier when the standard training dataset is presented, and thereby determine whether a knot (an operational amplifier) is needed on the chip at all. Some implementations analyze the SPICE model of the chip and determine the knots and connections where no current flows and no power is consumed. Some implementations determine the current flow through the analog IC network and thus determine the knots and connections, which are then pruned. In addition, some implementations remove a connection if its resistance is too high, and/or substitute a direct connection for a resistor if its resistance is too low. Some implementations prune a knot if all connections leading to it have weights lower than a predetermined threshold (e.g., close to 0), delete the connections of an operational amplifier whose output is always zero, and/or replace an operational amplifier with a linear junction if the amplifier computes a linear function without amplification.
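The knot-level pruning rules in [0360] can be sketched as follows, assuming that per-knot incoming weights and output currents recorded over a dataset (for example, exported from a SPICE simulation) are already available as plain lists. The thresholds and function names are hypothetical; the sketch only covers two rules (all incoming weights near zero, or an output that is always zero) and the removal of the pruned knots' outgoing connections.

```python
def find_prunable_knots(in_weights, output_currents, w_threshold=1e-3, i_threshold=1e-9):
    """Return indices of knots (op-amp neurons) that can be pruned.

    in_weights[k]      -- weights of all connections leading into knot k
    output_currents[k] -- output currents of knot k recorded over a dataset
    A knot is prunable if every incoming weight is below w_threshold in
    magnitude, or if its output current never exceeds i_threshold.
    """
    prunable = []
    for k, (weights, currents) in enumerate(zip(in_weights, output_currents)):
        weak_inputs = all(abs(w) < w_threshold for w in weights)
        dead_output = all(abs(i) <= i_threshold for i in currents)
        if weak_inputs or dead_output:
            prunable.append(k)
    return prunable

def prune(next_layer_weights, prunable):
    """Zero the connections from pruned knots into the next layer.

    Rows are next-layer neurons; column k is the connection from knot k.
    """
    dead = set(prunable)
    return [[0.0 if k in dead else w for k, w in enumerate(row)] for row in next_layer_weights]

in_weights = [[0.5, -0.2], [1e-5, -2e-4], [0.3, 0.1]]
output_currents = [[1e-4, 2e-4], [0.0, 0.0], [0.0, 0.0]]
knots = find_prunable_knots(in_weights, output_currents)   # [1, 2]
pruned = prune([[1.0, 2.0, 3.0]], knots)                    # [[1.0, 0.0, 0.0]]
```

Knot 1 is pruned because all its incoming weights are negligible, and knot 2 because it never produces output current.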
[0361] Some implementations apply compression techniques specific to pyramid, trapezium, or cross-bar types of neural networks. Some implementations generate pyramids or trapezia whose neurons have more inputs than they would without compression, thereby minimizing the number of layers in the pyramid or trapezium. Some implementations generate a more compact trapezium network by maximizing the number of outputs of each neuron.
Example Generation of Optimal Resistor Set
[0362] In some implementations, the example computations described herein are performed by the weight matrix computation or weight quantization module 238 (e.g., using the resistance calculation module 240), which computes the weights 272 for connections of the transformed neural networks and/or the corresponding resistance values 242 for the weights 272.
[0363] This section describes an example of generating an optimal resistor set for a trained neural network, according to some implementations. An example method is provided for converting connection weights to resistor nominal values for implementing the neural network (sometimes called an NN model) on a microchip with as few resistor nominal values as possible and as high an allowed resistor variance as possible.
[0364] Suppose a test set ‘Test’ includes around 10,000 input vectors (x and y coordinates), with both coordinates varying over the range [0; 1] with a step of 0.01. Suppose the output of network NN for a given input X is Out = NN(X). Suppose further that the class of an input value is found as follows: $\text{Class}_{nn}(X) = 1$ if $NN(X) > 0.61$, and $0$ otherwise.
[0365] The following compares a mathematical network model M with a schematic network model S. The schematic network model includes a possible resistor variance rv and processes the ‘Test’ set, producing a different vector of output values $S(\text{Test}) = Out_s$ each time. The output error is defined by the following equation:
[0366] Classification error is defined by the following equation:
[0367] Some implementations set the desired classification error to no more than 1%.
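The test setup in [0364] to [0367] can be sketched as below. Because the output-error and classification-error equations are not reproduced here, the sketch assumes the classification error is the fraction of test points whose class differs between models M and S. The grid step, the 0.61 threshold, and the 1% target come from the text; the function names are illustrative. With a step of 0.01, each coordinate takes 101 values, so the grid has 10,201 points.

```python
def make_test_grid(step=0.01):
    """All (x, y) pairs with both coordinates in [0, 1] at the given step."""
    n = round(1.0 / step) + 1  # 101 points per axis for step 0.01
    return [(i * step, j * step) for i in range(n) for j in range(n)]

def classify(out, threshold=0.61):
    """Class_nn(X) = 1 if NN(X) > 0.61, else 0."""
    return 1 if out > threshold else 0

def classification_error(model_m, model_s, test):
    # Assumed definition: fraction of points where M and S disagree on the class
    mismatches = sum(classify(model_m(p)) != classify(model_s(p)) for p in test)
    return mismatches / len(test)

test = make_test_grid()
# Toy stand-ins for the mathematical model M and schematic model S
err = classification_error(lambda p: p[0], lambda p: p[0], test)
meets_target = err <= 0.01  # desired classification error of at most 1%
```

In practice, `model_s` would wrap a circuit simulation of S with resistor variance rv, re-sampled on each run.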
Example Error Analysis
[0369] Suppose another network O produces output values with a constant shift relative to the corresponding M output values; there would then be a classification error between O and M. To keep the classification error below 1%, this shift should be in the range [−0.045, 0.040]. Thus, the possible output error for S is 45 mV.
[0370] The possible weight error is determined by analyzing the dependency between the output error and the relative weight/bias error over the whole network. The charts 1710 and 1720 shown in
Example Process for Choosing Resistor Set
[0371] A resistor set, together with an {R+, R−} pair chosen from this set, has a value function over the required weight range [−wlim; wlim] for a given degree of resistor error r_err. In some implementations, the value function of a resistor set is calculated as follows:
[0372] The array of possible weight options is calculated, together with the average weight error, which depends on the resistor error;
[0373] The weight options in the array are limited to the required weight range [−wlim; wlim];
[0374] Values that are worse than neighboring values in terms of weight error are removed;
[0375] An array of distances between neighboring values is calculated; and
[0376] The value function is computed from the mean square or the maximum of the distances array.
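The value-function steps above can be sketched in Python as follows. Several details are assumptions, not taken from the source: weights are formed as $w = R_{\text{ref}}/R_a - R_{\text{ref}}/R_b$ for $R_a, R_b$ drawn from the set (mirroring the resistor-ratio form used for neuron weights in [0439], with $R_{\text{ref}}$ standing in for R+ = R−); the weight error is a first-order worst-case estimate $2\,r_{err}(R_{\text{ref}}/R_a + R_{\text{ref}}/R_b)$; and step [0374] is interpreted as "of two options closer than their error estimate, keep the more precise one". Lower values are better.

```python
import itertools
import math

def weight_options(resistors, r_ref, r_err):
    """All (weight, error) pairs for w = r_ref/Ra - r_ref/Rb with Ra, Rb from the set.

    The error is an assumed first-order worst case for relative tolerance r_err.
    """
    opts = []
    for ra, rb in itertools.product(resistors, repeat=2):
        w = r_ref / ra - r_ref / rb
        err = 2 * r_err * (r_ref / ra + r_ref / rb)
        opts.append((w, err))
    return opts

def value_function(resistors, r_ref, r_err, wlim, mode="max"):
    # Steps 1-2: weight options limited to [-wlim, wlim], sorted by weight
    opts = sorted(o for o in weight_options(resistors, r_ref, r_err) if -wlim <= o[0] <= wlim)
    # Step 3 (assumed interpretation): merge options closer than their error,
    # keeping the one with the smaller error
    kept = []
    for w, e in opts:
        if kept and w - kept[-1][0] <= max(e, kept[-1][1]):
            if e < kept[-1][1]:
                kept[-1] = (w, e)
        else:
            kept.append((w, e))
    # Step 4: distances between neighbouring weights, including the range edges
    grid = [-wlim] + [w for w, _ in kept] + [wlim]
    gaps = [b - a for a, b in zip(grid, grid[1:])]
    # Step 5: worst-case spacing, or root-mean-square spacing
    if mode == "max":
        return max(gaps)
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

coarse = value_function([1.0, 2.0], r_ref=1.0, r_err=0.0, wlim=1.0)       # 0.5
finer = value_function([1.0, 2.0, 4.0], r_ref=1.0, r_err=0.0, wlim=1.0)   # 0.25
```

Adding a resistor value fills in the weight range more densely and so lowers the value function.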
[0377] Some implementations iteratively search for an optimal resistor set by consecutively adjusting each resistor value in the set by a learning-rate value. In some implementations, the learning rate changes over time. In some implementations, the initial resistor set is chosen to be uniform (e.g., [1; 1; . . . ; 1]), with the minimum and maximum resistor values chosen within a range of two orders of magnitude (e.g., [1; 100] or [0.1; 10]). Some implementations choose R+ = R−. In some implementations, the iterative process converges to a local minimum. In one case, the process resulted in the following set: [0.17, 1.036, 0.238, 0.21, 0.362, 1.473, 0.858, 0.69, 5.138, 1.215, 2.083, 0.275]. This is a locally optimal set of 12 resistors for the weight range [−2; 2] with rmin = 0.1 (minimum resistance), rmax = 10 (maximum resistance), and r_err = 0.001 (the estimated error of the resistance). Some implementations do not use the whole available range [rmin; rmax] to find a good local optimum; only part of the available range (in this case, [0.17; 5.138]) is used. The resistor set values are relative, not absolute. In this case, a relative value range of about 30 is sufficient for the resistor set.
[0378] In one instance, the following resistor set of length 20 is obtained for the above-mentioned parameters: [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048, 1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02]. In this example, the value 1.763 is also used as the R− = R+ value. This set is subsequently used to produce the weights for the NN, yielding the corresponding model S. The mean square output error of model S was 11 mV given a relative resistor error close to zero, so the set of 20 resistors is more than required. The maximum error over a set of input data was calculated to be 33 mV. In one instance, model S together with DAC and ADC converters with 256 levels was analyzed as a separate model, and the result showed a 14 mV mean square output error and a 49 mV maximum output error. An output error of 45 mV on the NN corresponds to a relative recognition error of 1%. The 45 mV output error value also corresponds to a 0.01 relative or 0.01 absolute weight error, which is acceptable. The maximum weight modulus in the NN is 1.94. In this way, the optimal (or near-optimal) resistor set is determined using the iterative process, based on the desired weight range [−wlim; wlim], the relative resistor error, and the possible resistor range.
[0379] Typically, a very broad resistor set is not very beneficial (e.g., a range of 1 to 1.5 orders of magnitude is enough) unless different precision is required within different layers or parts of the weight spectrum. For example, suppose the weights are in the range [0, 1], but most of the weights are in the range [0, 0.001]; then better precision is needed within that narrower range. In the example described above, given a relative resistor error close to zero, the set of 20 resistors is more than sufficient to quantize the NN with the given precision. In one instance, on the set of resistors [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048, 1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02] (values are relative), an average S output error of 11 mV was obtained.
Example Process for Quantization of Resistor Values
[0380] In some implementations, the example computations described herein are performed by the weight matrix computation or weight quantization module 238 (e.g., using the resistance calculation module 240), which computes the weights 272 for connections of the transformed neural networks and/or the corresponding resistance values 242 for the weights 272.
[0381] This section describes an example process for quantizing resistor values corresponding to weights of a trained neural network, according to some implementations. The example process substantially simplifies the process of manufacturing chips using analog hardware components for realizing neural networks. As described above, some implementations use resistors to represent neural network weights and/or biases for operational amplifiers that represent analog neurons. The example process described here specifically reduces the complexity in lithographically fabricating sets of resistors for the chip. With the procedure of quantizing the resistor values, only select values of resistances are needed for chip manufacture. In this way, the example process simplifies the overall process of chip manufacture and enables automatic resistor lithographic mask manufacturing on demand.
[0383] The following equations determine the weights based on the resistor values: [0384] The voltage at the output of a neuron is determined by the following equation:
[0386] The following example optimization procedure quantizes the value of each resistance and minimizes the error of the neural network output, according to some implementations:
[0387] 1. Obtain a set of connection weights and biases {w1, . . . , wn, b}.
[0388] 2. Obtain the possible minimum and maximum resistor values {Rmin, Rmax}. These parameters are determined by the technology used for manufacturing. Some implementations use TaN or tellurium high-resistivity materials. In some implementations, the minimum resistor value is determined by the minimum square that can be formed lithographically. The maximum value is determined by the length allowable for resistors (e.g., resistors made from TaN or tellurium) to fit in the desired area, which is in turn determined by the area of an operational amplifier square on the lithographic mask. In some implementations, the area of the resistor arrays is smaller than the area of one operational amplifier, since the resistor arrays are stacked (e.g., one in BEOL, another in FEOL).
[0389] 3. Assume that each resistor has a relative tolerance value r_err.
[0390] 4. The goal is to select a set of resistor values {R1, . . . , Rn} of given length N within the defined range [Rmin; Rmax], based on the {w1, . . . , wn, b} values. An example search algorithm is provided below to find a sub-optimal {R1, . . . , Rn} set based on particular optimality criteria.
[0391] 5. Another algorithm chooses {Rn, Rp, Rni, Rpi} for a network once {R1, . . . , Rn} is determined.
Example {R1, . . . , Rn} Search Algorithm
[0392] Some implementations use an iterative approach for the resistor set search. Some implementations select an initial (random or uniform) set {R1, . . . , Rn} within the defined range. Some implementations select one of the elements of the resistor set as the R− = R+ value. Some implementations alter each resistor within the set by the current learning-rate value for as long as such alterations produce a ‘better’ set (according to a value function). This process is repeated for all resistors within the set and with several different learning-rate values, until no further improvement is possible.
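A minimal sketch of this coordinate-wise search follows. It assumes a multiplicative step (each resistor is scaled by 1 + lr or 1/(1 + lr) and clamped to [rmin, rmax]) and uses a deliberately simplified value function: the largest gap between achievable weights $1/R_a - 1/R_b$ in [−wlim, wlim], with R+ = R− fixed at 1 in relative units. Both choices, and all names, are illustrative rather than the exact definitions used by the procedure.

```python
import itertools

def max_gap(resistors, r_ref=1.0, wlim=2.0):
    """Simplified value function (assumption): largest spacing between achievable
    weights r_ref/Ra - r_ref/Rb inside [-wlim, wlim], including the range edges."""
    ws = sorted({r_ref / a - r_ref / b for a, b in itertools.product(resistors, repeat=2)
                 if abs(r_ref / a - r_ref / b) <= wlim})
    grid = [-wlim] + ws + [wlim]
    return max(b - a for a, b in zip(grid, grid[1:]))

def search_resistor_set(initial, value_fn, rmin, rmax,
                        learning_rates=(0.5, 0.1, 0.02), max_rounds=100):
    """Nudge each resistor up or down by the current learning rate (relative step)
    and keep any change that lowers value_fn; stop when a full pass brings no gain."""
    rs = list(initial)
    best = value_fn(rs)
    for lr in learning_rates:
        for _ in range(max_rounds):
            improved = False
            for k in range(len(rs)):
                for factor in (1 + lr, 1 / (1 + lr)):
                    cand = rs.copy()
                    cand[k] = min(rmax, max(rmin, rs[k] * factor))
                    v = value_fn(cand)
                    if v < best:
                        rs, best, improved = cand, v, True
            if not improved:
                break
    return rs, best

# Uniform start [1; 1; 1; 1] within [0.1; 10], as in the example range
rs, best = search_resistor_set([1.0] * 4, max_gap, rmin=0.1, rmax=10.0)
```

Like the procedure described above, this greedy search only reaches a local optimum; different starting sets or learning-rate schedules can give different results.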
[0393] Some implementations define the value function of a resistor set as follows: [0394] Possible weight options are calculated according to the formula (described above):
[0399] Suppose the required weight range [−wlim; wlim] for a model is set to [−5; 5], and the other parameters include N = 20, r_err = 0.1%, rmin = 100 kΩ, and rmax = 5 MΩ. Here, rmin and rmax are the minimum and maximum resistance values, respectively.
[0400] In one instance, the following resistor set of length 20 was obtained for the above-mentioned parameters: [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048, 1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02] MΩ, with R− = R+ = 1.763 MΩ.
Example {Rn, Rp, Rni, Rpi} Search Algorithm
[0401] Some implementations determine Rn and Rp using an iterative algorithm such as the algorithm described above. Some implementations set Rp = Rn (the tasks of determining Rn and Rp are symmetrical, and the two quantities typically converge to similar values). Then, for each weight $w_i$, some implementations select a pair of resistances {Rni, Rpi} that minimizes the estimated weight error value:
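Because the estimated weight-error formula is not reproduced here, the following sketch assumes the weight model $w = R_p/R_{pi} - R_n/R_{ni}$ (the form given in [0439] with $K_{\text{amp}}$ absorbed) and a cost equal to |target − achieved| plus a first-order worst-case tolerance term. It uses the 20-value set and Rp = Rn = 1.763 MΩ from [0400]; the function names are hypothetical.

```python
import itertools

# Resistor set and Rp = Rn value from [0400], in MOhm
RESISTOR_SET = [0.300, 0.461, 0.519, 0.566, 0.648, 0.655, 0.689, 0.996, 1.006, 1.048,
                1.186, 1.222, 1.261, 1.435, 1.488, 1.524, 1.584, 1.763, 1.896, 2.02]
R_P = R_N = 1.763

def pair_weight(rp, rn, rpi, rni):
    """Assumed weight model: w = Rp/Rpi - Rn/Rni."""
    return rp / rpi - rn / rni

def choose_pair(w, resistors, rp, rn, r_err):
    """Pick (Rpi, Rni) minimizing |w - achieved weight| plus a tolerance penalty."""
    def cost(pair):
        rpi, rni = pair
        deviation = abs(w - pair_weight(rp, rn, rpi, rni))
        tolerance = 2 * r_err * (rp / rpi + rn / rni)  # first-order worst case
        return deviation + tolerance
    return min(itertools.product(resistors, repeat=2), key=cost)

for w in (-0.126, 0.065, 0.5):
    rpi, rni = choose_pair(w, RESISTOR_SET, R_P, R_N, r_err=0.001)
    print(w, (rpi, rni), round(pair_weight(R_P, R_N, rpi, rni), 4))
```

The tolerance term favors larger resistors when several pairs give similar weights, since their absolute contribution to the weight error is smaller.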
[0402] Some implementations subsequently use the {Rni; Rpi; Rn; Rp} value set to implement the neural network schematics. In one instance, the schematics produced a mean square output error (sometimes called the S mean square output error, described above) of 11 mV and a maximum error of 33 mV over a set of 10,000 uniformly distributed input data samples, according to some implementations. In one instance, the S model was analyzed together with digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) with 256 levels as a separate model. That model produced a 14 mV mean square output error and a 49 mV maximum output error on the same data set, according to some implementations. DACs and ADCs have discrete levels because they convert analog values to digital codes and vice versa. An 8-bit digital value has 256 levels, so the precision of an 8-bit ADC cannot be better than 1/256 of its full-scale range.
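The 1/256 limit can be checked with an idealized 8-bit ADC/DAC round trip. The sketch assumes a unipolar input range [0, full_scale] and a truncating converter, so the reconstruction error never exceeds one step of full_scale/256. Real converters add offset, gain, and non-linearity errors that are not modeled here, and the function names are illustrative.

```python
def adc(v, bits=8, full_scale=1.0):
    """Ideal truncating ADC: map v in [0, full_scale] to an integer code."""
    levels = 2 ** bits
    code = int(v / full_scale * levels)
    return min(max(code, 0), levels - 1)

def dac(code, bits=8, full_scale=1.0):
    """Ideal DAC: map a code back to the lower edge of its quantization step."""
    return code * full_scale / 2 ** bits

samples = [i / 1000 for i in range(1001)]
max_err = max(abs(dac(adc(v)) - v) for v in samples)
print(max_err, 1 / 256)
```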
[0403] Some implementations calculate the resistance values for analog IC chips, when the weights of connections are known, based on Kirchhoff's circuit laws and basic principles of operational amplifiers (described below in reference to
[0404] Some implementations manufacture resistors in a lithography layer where resistors are formed as cylindrical holes in the SiO2 matrix, and the resistance value is set by the diameter of the hole. Some implementations use amorphous TaN, TiN, or CrN, or tellurium, as the highly resistive material to make high-density resistor arrays. Certain Ta-to-N, Ti-to-N, and Cr-to-N ratios provide high resistance for making ultra-dense arrays of high-resistivity elements. For example, for TaN, Ta5N6, and Ta3N5, the higher the ratio of N to Ta, the higher the resistivity. Some implementations use Ti2N, TiN, CrN, or Cr5N, and determine the ratios accordingly. TaN deposition is a standard procedure in chip manufacturing and is available at all major foundries.
Example Operational Amplifier
[0407] In some implementations, operational amplifiers such as the example described above are used as the basic element of integrated circuits for the hardware realization of neural networks. In some implementations, the operational amplifiers have an area of 40 square microns and are fabricated according to the 45 nm node standard.
[0408] In some implementations, activation functions such as ReLU, hyperbolic tangent, and sigmoid functions are represented by operational amplifiers with a modified output cascade. For example, a ReLU, sigmoid, or hyperbolic tangent function is realized as the output cascade of an operational amplifier (sometimes called an OpAmp) using the corresponding well-known analog schematics, according to some implementations.
[0409] In the examples described above and below, in some implementations, the operational amplifiers are replaced by inverters, current mirrors, two-quadrant or four-quadrant multipliers, and/or other analog functional blocks that allow a weighted summation operation.
Example Scheme of a LSTM Block
[0411] a “neuron O” assembled on the operational amplifiers U1 20094 and U2 20100, shown in
[0412] a “neuron C” assembled on the operational amplifiers U3 20098 (shown in
[0413] a “neuron I” assembled on the operational amplifiers U5 20102 and U6 20104, shown in
[0414] The outputs of modules X2 20080 (
Example Scheme of a Multiplier Block
[0418] Referring to
[0419] Referring back to
[0420] Similar transformations that occur with the signals include: [0421] negB 21012 and V_one 21020 are input to a multiplexer assembled on NMOS transistors M11 21070, M12 21072, M13 21074, and M14 21076, and PMOS transistors M15 21078 and M16 21080. The output of this multiplexer is input to the M5 21058 NMOS transistor (shown in
[0429] The current mirror (transistors M1 21052, M2 21053, M3 21054, and M4 21056) powers the left portion of the four-quadrant multiplier circuit, made with transistors M5 21058, M6 21060, M7 21062, M8 21064, M9 21066, and M10 21068. Current mirrors (on transistors M25 21098, M26 21100, M27 21102, and M28 21104) power the right portion of the four-quadrant multiplier, made with transistors M29 21106, M30 21108, M31 21110, M32 21112, M33 21114, and M34 21116. The multiplication result is taken from the resistor Ro 21022, connected in parallel with the transistor M3 21054, and the resistor Ro 21188, connected in parallel with the transistor M28 21104, and is supplied to the adder on U3 21044. The output of U3 21044 is supplied to an adder with a gain of 7.1, assembled on U5 21048, whose second input is compensated by the reference voltage set by resistors R1 21024 and R2 21026 and the buffer U4 21046, as shown in
Example Scheme of a Sigmoid Block
[0432] The sigmoid function is formed by adding the corresponding reference voltages on a differential module assembled on the transistors M1 2266 and M2 2268. A current mirror for the differential stage is assembled with the active-regulation operational amplifier U3 2254 and the NMOS transistor M3 2270. The signal from the differential stage is taken from the NMOS transistor M2 and the resistor R5 2220 and is input to the adder U2 2252. The output signal sigm_out 2210 is taken from the output of the adder U2 2252.
Example Scheme of a Hyperbolic Tangent Block
Example Scheme of a Single Neuron OP1 CMOS OpAmp
Example Scheme of a Single Neuron OP3 CMOS OpAmp
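[0439] The weights of the connections of a single neuron (with two inputs and one output) are set by resistor ratios:
$$
w_1 = \frac{R_{\text{feedback}}}{R_{1+}} - \frac{R_{\text{feedback}}}{R_{1-}},\quad
w_2 = \frac{R_{\text{feedback}}}{R_{2+}} - \frac{R_{\text{feedback}}}{R_{2-}},\quad
w_{\text{bias}} = \frac{R_{\text{feedback}}}{R_{\text{bias}+}} - \frac{R_{\text{feedback}}}{R_{\text{bias}-}};
$$
$$
w_1 = \frac{R_p K_{\text{amp}}}{R_{1+}} - \frac{R_n K_{\text{amp}}}{R_{1-}},\quad
w_2 = \frac{R_p K_{\text{amp}}}{R_{2+}} - \frac{R_n K_{\text{amp}}}{R_{2-}},\quad
w_{\text{bias}} = \frac{R_p K_{\text{amp}}}{R_{\text{bias}+}} - \frac{R_n K_{\text{amp}}}{R_{\text{bias}-}},
$$
where $K_{\text{amp}} = R_{1\text{ReLU}}/R_{2\text{ReLU}}$. The value $R_{\text{feedback}} = 100\text{k}$ is used only for calculating $w_1$, $w_2$, and $w_{\text{bias}}$. According to some implementations, example values include $R_{\text{feedback}} = 100\text{k}$, $R_n = R_p = R_{\text{com}} = 10\text{k}$, and $K_{\text{amp,ReLU}} = 1 + 90\text{k}/10\text{k} = 10$, giving
$$
w_1 = \frac{10\text{k}\cdot 10}{22.1\text{k}} - \frac{10\text{k}\cdot 10}{21.5\text{k}} = -0.126276,\quad
w_2 = \frac{10\text{k}\cdot 10}{75\text{k}} - \frac{10\text{k}\cdot 10}{71.5\text{k}} = -0.065268,\quad
w_{\text{bias}} = \frac{10\text{k}\cdot 10}{71.5\text{k}} - \frac{10\text{k}\cdot 10}{78.7\text{k}} = 0.127953.
$$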
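The example weights in [0439] can be reproduced numerically. This sketch evaluates the $R_p K_{\text{amp}}$ form with the example values ($R_p = R_n = 10$ k and $K_{\text{amp}} = 1 + 90\text{k}/10\text{k} = 10$); the function name is illustrative.

```python
def neuron_weight(r_plus, r_minus, r_p=10e3, r_n=10e3, k_amp=10.0):
    """Connection weight: w = Rp*K_amp/R+ - Rn*K_amp/R- (resistances in ohms)."""
    return r_p * k_amp / r_plus - r_n * k_amp / r_minus

k_amp = 1 + 90e3 / 10e3                                # ReLU stage gain from the example: 10
w1 = neuron_weight(22.1e3, 21.5e3, k_amp=k_amp)        # R1+ = 22.1k, R1- = 21.5k
w2 = neuron_weight(75e3, 71.5e3, k_amp=k_amp)          # R2+ = 75k,   R2- = 71.5k
wbias = neuron_weight(71.5e3, 78.7e3, k_amp=k_amp)     # Rbias+ = 71.5k, Rbias- = 78.7k
print(f"w1={w1:.6f} w2={w2:.6f} wbias={wbias:.6f}")
# prints: w1=-0.126276 w2=-0.065268 wbias=0.127953
```

Because $R_p K_{\text{amp}} = 100$ k here, this matches the $R_{\text{feedback}} = 100$ k form as well.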
[0440] The input of the negative link adder of the neuron (M1-M17) is received from the positive link adder of the neuron (M17-M32) through the Rcom resistor.
Example Methods for Analog Hardware Realization of Trained Neural Networks
[0443] The method also includes transforming (2710) the neural network topology to an equivalent analog network of analog components. Referring next to
[0444] Referring next to
$V$ represents an input, and A and B are predetermined coefficient values (e.g., A = −0.1; B = 11.3) of the sigmoid activation block; (iv) a hyperbolic tangent activation block (2742) with block output $V^{\text{out}} = A \tanh(B \cdot V^{\text{in}})$, where $V^{\text{in}}$ represents an input, and A and B are predetermined coefficient values (e.g., A = 0.1, B = −10.1); and a signal delay block (2744) with block output $U(t) = V(t - dt)$, where $t$ represents the current time period, $V(t-1)$ represents an output of the signal delay block for the preceding time period $t-1$, and $dt$ is a delay value.
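For simulation purposes, the hyperbolic tangent and signal delay blocks can be modeled in discrete time as below. The sketch assumes the delay dt is a whole number of time steps and that the delay line starts at 0; the coefficient defaults are the example values A = 0.1, B = −10.1. The sigmoid block is omitted because its output formula is not reproduced here, and the names are illustrative.

```python
import math
from collections import deque

def tanh_block(v_in, a=0.1, b=-10.1):
    """Hyperbolic tangent activation block: V_out = A * tanh(B * V_in)."""
    return a * math.tanh(b * v_in)

class DelayBlock:
    """Discrete-time signal delay block: U(t) = V(t - dt), dt in whole time steps."""

    def __init__(self, dt, initial=0.0):
        self.buffer = deque([initial] * dt)

    def step(self, v):
        # Push the new input and emit the value received dt steps ago
        self.buffer.append(v)
        return self.buffer.popleft()

d = DelayBlock(2)
outputs = [d.step(v) for v in [1.0, 2.0, 3.0, 4.0]]  # [0.0, 0.0, 1.0, 2.0]
```

With the example coefficients, the tanh block saturates near ±0.1 and inverts the sign of its input because B is negative.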
[0445] Referring now back to
[0446] Referring now back to
[0447] Referring next to
[0448] Referring next to
[0449] Referring now back to
[0450] The method also includes generating (2714) a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components. Referring next to
[0451] Referring next to
[0452] Referring now back to
[0453] Referring now back to
Example Methods for Constrained Analog Hardware Realization of Neural Networks
[0455] The method also includes calculating (28008) one or more connection constraints based on analog integrated circuit (IC) design constraints (e.g., the constraints 236). For example, IC design constraints can set the current limit (e.g., 1 A), and the neuron schematics and operational amplifier (OpAmp) design can set the OpAmp output current range to [0, 10 mA], which limits the output connections of a neuron to 100. That is, a neuron with 100 outputs lets current flow to the next layer through 100 connections, but the current at the output of the operational amplifier is limited to 10 mA, so some implementations use a maximum of 100 outputs (0.1 mA per connection × 100 = 10 mA). To overcome this limit, some implementations use current repeaters to increase the number of outputs beyond 100, for example.
[0456] The method also includes transforming (28010) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent sparsely connected network of analog components satisfying the one or more connection constraints.
[0457] In some implementations, transforming the neural network topology includes deriving (28012) a possible input connection degree $N_i$ and output connection degree $N_o$, according to the one or more connection constraints.
[0458] Referring next to … $\log_{N_i} K$ + $\log_{N_{\ldots}}$ … $- 1$ layers, such that the input connection degree does not exceed $N_i$ and the output connection degree does not exceed $N_o$.
[0459] Referring next to … $\log_{N_{\ldots}}$ …, $\log_{N_{\ldots}}$ …) layers. Each layer $m$ is represented by a corresponding weight matrix $U_m$, where absent connections are represented with zeros, such that the input connection degree does not exceed $N_i$ and the output connection degree does not exceed $N_o$. The equation $U = \prod_{m=1}^{M} U_m$ is satisfied with a predetermined precision. The predetermined precision is a reasonable precision value that statistically guarantees that the altered network's output differs from the reference network's output by no more than an allowed error value; this error value is task-dependent (typically between 0.1% and 1%).
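The two conditions above (degree limits on every $U_m$, and the product reproducing $U$ within a tolerance) can be checked with a short sketch. It assumes that rows index neurons of the next layer and columns index neurons of the previous layer, so the layers compose as $U_M \cdots U_1$ under the column-vector convention; the order written in $\prod_{m=1}^{M} U_m$ depends on the convention used. Names and the tolerance are illustrative.

```python
def matmul(A, B):
    """Plain matrix product A @ B for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def degrees_ok(U_m, n_i, n_o):
    """Input degree = nonzeros per row; output degree = nonzeros per column."""
    in_ok = all(sum(1 for w in row if w != 0) <= n_i for row in U_m)
    out_ok = all(sum(1 for row in U_m if row[j] != 0) <= n_o for j in range(len(U_m[0])))
    return in_ok and out_ok

def check_factorization(U, layers, n_i, n_o, tol=1e-9):
    """layers = [U_1, ..., U_M], applied in that order to a column vector."""
    if not all(degrees_ok(U_m, n_i, n_o) for U_m in layers):
        return False
    composite = layers[0]
    for U_m in layers[1:]:
        composite = matmul(U_m, composite)
    if len(composite) != len(U) or len(composite[0]) != len(U[0]):
        return False
    return all(abs(a - b) <= tol for ra, rb in zip(composite, U) for a, b in zip(ra, rb))

# One dense neuron with 4 inputs, split into two layers with N_i = 2, N_o = 1
U = [[1.0, 2.0, 3.0, 4.0]]
U1 = [[1.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 4.0]]
U2 = [[1.0, 1.0]]
print(check_factorization(U, [U1, U2], n_i=2, n_o=1))  # True
```

The single dense layer `[U]` fails the same check because its one neuron has 4 inputs, exceeding $N_i = 2$.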
[0460] Referring next to … $\log_{N_{\ldots}}$ …, $\log_{N_{\ldots}}$ …) layers. Each layer $m$ is represented by a corresponding weight matrix $U_m$, where absent connections are represented with zeros, such that the input connection degree does not exceed $N_i$ and the output connection degree does not exceed $N_o$, and the equation $U = \prod_{m=1}^{M} U_m$ is satisfied with a predetermined precision.
[0461] Referring next to
[0462] Referring back to
[0463] Referring now to
[0464] Referring next to
[0465] Referring next to … $\log_N K$ …; and (iii) constructing (28062) the equivalent sparsely connected network with the K inputs, m layers, and connection degree N. The equivalent sparsely connected network includes respective one or more analog neurons in each of the m layers. Each analog neuron of the first m−1 layers implements the identity transform, and the analog neuron of the last layer implements the activation function F of the calculation neuron of the single layer perceptron. Furthermore, in such cases, computing (28064) the weight matrix for the equivalent sparsely connected network includes calculating (28066) a weight vector W for the connections of the equivalent sparsely connected network by solving a system of equations based on the weight vector U. The system of equations includes K equations with S variables, and S is computed using the equation
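A minimal pyramid construction for a single calculation neuron is sketched below. The system of K equations is generally underdetermined; this sketch assumes one particular solution (first-layer weights equal to U, all deeper weights equal to 1), so the product of weights along each input-to-output path equals the corresponding entry of U. With connection degree N it produces $\lceil \log_N K \rceil$ layers (at least one). The data layout and names are hypothetical.

```python
import math

def build_pyramid(U, n):
    """Pyramid for one neuron y = F(U . x) with K = len(U) inputs and degree n.

    Each layer groups its inputs into chunks of at most n and feeds each chunk
    into one neuron. First-layer weights copy U; deeper weights are 1. Returns
    a list of layers, each a list of (input_indices, weights) per neuron.
    """
    layers, width, first = [], len(U), True
    while first or width > 1:
        neurons = []
        for start in range(0, width, n):
            idx = list(range(start, min(start + n, width)))
            weights = [U[i] for i in idx] if first else [1.0] * len(idx)
            neurons.append((idx, weights))
        layers.append(neurons)
        width, first = len(neurons), False
    return layers

def run_pyramid(layers, x, F):
    """Identity neurons in the first m-1 layers; F only in the last layer."""
    v = x
    for depth, layer in enumerate(layers):
        v = [sum(w * v[i] for i, w in zip(idx, ws)) for idx, ws in layer]
        if depth == len(layers) - 1:
            v = [F(a) for a in v]
    return v[0]

U = [0.5, -1.0, 2.0, 0.25, 1.5]
x = [1, 2, 3, 4, 5]
pyramid = build_pyramid(U, n=2)
y = run_pyramid(pyramid, x, math.tanh)
direct = math.tanh(sum(u * xi for u, xi in zip(U, x)))
```

For K = 5 and N = 2, the pyramid has three layers (5 → 3 → 2 → 1 neurons), and `y` matches the dense neuron output `direct`.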
[0466] Referring next to … $\log_N K$ …; (iii) decomposing (28076) the single layer perceptron into L single layer perceptron networks, each of which includes a respective calculation neuron of the L calculation neurons; (iv) for each single layer perceptron network (28078) of the L single layer perceptron networks, constructing (28080) a respective equivalent pyramid-like sub-network for the respective single layer perceptron network with the K inputs, the m layers, and the connection degree N. The equivalent pyramid-like sub-network includes one or more respective analog neurons in each of the m layers; each analog neuron of the first m−1 layers implements the identity transform, and the analog neuron of the last layer implements the activation function of the respective calculation neuron corresponding to the respective single layer perceptron; and (v) constructing (28082) the equivalent sparsely connected network by concatenating each equivalent pyramid-like sub-network, including concatenating the input of each equivalent pyramid-like sub-network for the L single layer perceptron networks to form an input vector with L*K inputs. Furthermore, in such cases, computing (28084) the weight matrix for the equivalent sparsely connected network includes, for each single layer perceptron network (28086) of the L single layer perceptron networks, (i) setting (28088) a weight vector $U = V_i$, the $i$-th row of the weight matrix V corresponding to the respective calculation neuron of the respective single layer perceptron network, and (ii) calculating (28090) a weight vector $W_i$ for the connections of the respective equivalent pyramid-like sub-network by solving a system of equations based on the weight vector U. The system of equations includes K equations with S variables, and S is computed using the equation
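The decomposition into L pyramid-like sub-networks with a concatenated input of L*K entries can be sketched as follows. Each pyramid is evaluated with an assumed weight solution (first-layer weights taken from the row $V_i$, deeper weights equal to 1), which is one valid choice rather than the one the method necessarily computes; the names and data layout are illustrative.

```python
import math

def pyramid_output(u, x, n, F):
    """Pyramid of connection degree n: first-layer weights u, deeper weights 1, F last."""
    v = [ui * xi for ui, xi in zip(u, x)]                 # weighted first-layer inputs
    v = [sum(v[s:s + n]) for s in range(0, len(v), n)]    # first-layer neurons
    while len(v) > 1:                                     # identity layers, weights 1
        v = [sum(v[s:s + n]) for s in range(0, len(v), n)]
    return F(v[0])

def slp_as_pyramids(V, x, n, F):
    """Decompose an SLP with weight matrix V (L x K) into L pyramid sub-networks.

    The concatenated input has L*K entries: one copy of x per sub-network.
    """
    K, L = len(x), len(V)
    concatenated = x * L                                  # input vector with L*K inputs
    outputs = []
    for i in range(L):
        x_i = concatenated[i * K:(i + 1) * K]             # inputs of sub-network i
        outputs.append(pyramid_output(V[i], x_i, n, F))   # U = V_i, the i-th row of V
    return outputs

V = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
x = [1.0, 1.0, 2.0]
linear = slp_as_pyramids(V, x, n=2, F=lambda a: a)        # [9.0, -0.5]
squashed = slp_as_pyramids(V, x, n=2, F=math.tanh)
```

Each sub-network reproduces one output neuron of the original perceptron; the price is that every input is duplicated L times in the concatenated input vector.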
[0467] Referring next to $\log_N K_{i,j}$
. $K_{i,j}$ is the number of inputs for the respective calculation neuron in the multi-layer perceptron, and (b) constructing (28104) the respective equivalent pyramid-like sub-network for the respective single layer perceptron network with $K_{i,j}$ inputs, the m layers, and the connection degree N. The equivalent pyramid-like sub-network includes one or more respective analog neurons in each layer of the m layers; each analog neuron of the first m-1 layers implements an identity transform, and an analog neuron of the last layer implements the activation function of the respective calculation neuron corresponding to the respective single layer perceptron network; and (iv) constructing (28106) the equivalent sparsely connected network by concatenating each equivalent pyramid-like sub-network, including concatenating the input of each equivalent pyramid-like sub-network for the Q single layer perceptron networks to form an input vector with $Q \cdot K_{i,j}$ inputs. In such cases, computing (28108) the weight matrix for the equivalent sparsely connected network includes, for each single layer perceptron network (28110) of the Q single layer perceptron networks: (i) setting (28112) a weight vector $U = V_i^j$, the $i$-th row of the weight matrix V corresponding to the respective calculation neuron corresponding to the respective single layer perceptron network, where j is the corresponding layer of the respective calculation neuron in the multi-layer perceptron; and (ii) calculating (28114) a weight vector $W_i$ for connections of the respective equivalent pyramid-like sub-network by solving a system of equations based on the weight vector U. The system of equations includes $K_{i,j}$ equations with S variables, and S is computed using the equation
[0468] Referring next to $\log_N K_{i,j}$
. Here j is the corresponding layer of the respective calculation neuron in the CNN, and $K_{i,j}$ is the number of inputs for the respective calculation neuron in the CNN; and (b) constructing the respective equivalent pyramid-like sub-network for the respective single layer perceptron network with $K_{i,j}$ inputs, the m layers, and the connection degree N. The equivalent pyramid-like sub-network includes one or more respective analog neurons in each layer of the m layers; each analog neuron of the first m-1 layers implements an identity transform, and an analog neuron of the last layer implements the activation function of the respective calculation neuron corresponding to the respective single layer perceptron network; and (iv) constructing (28130) the equivalent sparsely connected network by concatenating each equivalent pyramid-like sub-network, including concatenating the input of each equivalent pyramid-like sub-network for the Q single layer perceptron networks to form an input vector with $Q \cdot K_{i,j}$ inputs. In such cases, computing (28132) the weight matrix for the equivalent sparsely connected network includes, for each single layer perceptron network (28134) of the Q single layer perceptron networks: (i) setting a weight vector $U = V_i^j$, the $i$-th row of the weight matrix V corresponding to the respective calculation neuron corresponding to the respective single layer perceptron network, where j is the corresponding layer of the respective calculation neuron in the CNN; and (ii) calculating a weight vector $W_i$ for connections of the respective equivalent pyramid-like sub-network by solving a system of equations based on the weight vector U. The system of equations includes $K_{i,j}$ equations with S variables, and S is computed using the equation
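The decomposition of a layer with L calculation neurons into L pyramid-like sub-networks, each fed by its own copy of the K inputs inside the concatenated $L \cdot K$ input vector, can be sketched in Python. This is a minimal sketch that assumes the first-layer weights of sub-network i equal the row $U = V_i$ and all other weights are 1 (one valid solution, not necessarily the text's); `pyramid_sum` and `decomposed_layer` are hypothetical names.

```python
from typing import Callable, List, Sequence


def pyramid_sum(values: Sequence[float], n: int) -> float:
    """Reduce values through identity neurons of fan-in <= n until one remains."""
    if n < 2:
        raise ValueError("connection degree N must be at least 2")
    signals = list(values)
    while len(signals) > 1:
        signals = [sum(signals[i:i + n]) for i in range(0, len(signals), n)]
    return signals[0]


def decomposed_layer(x: Sequence[float], v: Sequence[Sequence[float]], n: int,
                     f: Callable[[float], float]) -> List[float]:
    """Evaluate y = f(V x) as L independent pyramid sub-networks.

    Sub-network i uses U = V_i (row i of V) as its first-layer weights and
    reads its own K-element slice of the concatenated L*K input vector.
    """
    k = len(x)
    concatenated = [xj for _ in v for xj in x]  # L*K inputs
    outputs = []
    for i, row in enumerate(v):
        xi = concatenated[i * k:(i + 1) * k]
        weighted = [w * xj for w, xj in zip(row, xi)]
        outputs.append(f(pyramid_sum(weighted, n)))
    return outputs


def identity(s: float) -> float:
    return s


V = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]  # L = 3 calculation neurons, K = 2
print(decomposed_layer([2.0, 1.0], V, n=2, f=identity))  # [4.0, 5.0, 1.5]
```

For a multi-layer perceptron or a CNN, the same per-neuron step would be repeated for every calculation neuron i of layer j, with $K_{i,j}$ inputs and $U = V_i^j$.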
[0469] Referring next to
analog neurons performing the identity activation function, and a layer $LA_o$ with L analog neurons performing the activation function F, such that each analog neuron in the layer $LA_p$ has $N_O$ outputs, each analog neuron in the layer $LA_h$ has not more than $N_I$ inputs and $N_O$ outputs, and each analog neuron in the layer $LA_o$ has $N_I$ inputs. In some such cases, computing (28148) the weight matrix for the equivalent sparsely connected network includes generating (2850) sparse weight matrices $W_o$ and $W_h$ by solving a matrix equation $W_o \cdot W_h = W$ that includes $K \cdot L$ equations in $K \cdot N_O + L \cdot N_I$ variables, so that the total output of the layer $LA_o$ is calculated using the equation $Y_o = F(W_o \cdot W_h \cdot x)$. The sparse weight matrix $W_o \in \mathbb{R}^{K \times M}$ represents connections between the layers $LA_p$ and $LA_h$, and the sparse weight matrix $W_h \in \mathbb{R}^{M \times L}$ represents connections between the layers $LA_h$ and $LA_o$.
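A quick feasibility check for this trapezium factorization compares the $K \cdot L$ equations with the $K \cdot N_O + L \cdot N_I$ unknowns, and bounds the hidden-layer width M from the stated fan-in and fan-out limits. The bound on M is derived here from those limits rather than taken from the text, and `trapezium_budget` is a hypothetical helper. For a generic W, having at least as many variables as equations is necessary but not sufficient, because $W_o \cdot W_h = W$ is bilinear in the unknowns.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def trapezium_budget(k: int, l: int, n_o: int, n_i: int) -> Tuple[int, int, int]:
    """Return (equations, variables, min_hidden) for W_o . W_h = W.

    equations  = K*L            one per entry of the K x L matrix W
    variables  = K*N_O + L*N_I  non-zero entries of W_o and W_h
    min_hidden = smallest M that can accept K*N_O incoming connections at
                 <= N_I inputs per hidden neuron and emit L*N_I outgoing
                 connections at <= N_O outputs per hidden neuron
    """
    equations = k * l
    variables = k * n_o + l * n_i
    min_hidden = max(ceil_div(k * n_o, n_i), ceil_div(l * n_i, n_o))
    return equations, variables, min_hidden


eq, var, m_min = trapezium_budget(k=200, l=200, n_o=100, n_i=100)
print(eq, var, m_min)  # 40000 40000 200
print(var >= eq)       # True
```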
[0470] Referring next to
[0471] Referring next to
[0472] Referring next to
[0473] Referring next to
[0474] Referring next to
[0475] Referring next to log.sub.N.sub.
−1}; (iii) in accordance with a determination that p > 0, constructing (28186) a pyramid neural network that forms the first p layers of the equivalent sparsely connected network, such that the pyramid neural network has $N_p = \lceil K / N_I^{p} \rceil$ neurons in its output layer. Each neuron in the pyramid neural network performs the identity function; and (iv) constructing (28188) a trapezium neural network with $N_p$ inputs and L outputs. Each neuron in the last layer of the trapezium neural network performs the activation function F, and all other neurons perform the identity function. Also, in such cases, computing (28190) the weight matrix for the equivalent sparsely connected network includes: (i) generating (28192) weights for the pyramid neural network, including (i) setting weights of every neuron i of the first layer of the pyramid neural network according to the following rule: (a) $w_{ik_i}$
for all weights j of the neuron except $k_i$; and (ii) setting all other weights of the pyramid neural network to 1; and (ii) generating (28194) weights for the trapezium neural network, including (i) setting weights of each neuron i of the first layer of the trapezium neural network (considering the whole net, this is the (p+1)-th layer) according to the equation
and (ii) setting other weights of the trapezium neural network to 1.
[0476] Referring next to
[0477] Referring back to
Example Methods of Calculating Resistance Values for Analog Hardware Realization of Trained Neural Networks
[0478]
[0479] The method includes obtaining (2906) a neural network topology (e.g., the topology 224) and weights (e.g., the weights 222) of a trained neural network (e.g., the networks 220). In some implementations, weight quantization is performed during training. In some implementations, the trained neural network is trained (2908) so that each layer of the neural network topology has quantized weights (e.g., a particular value from a list of discrete values; e.g., each layer has only 3 weight values of +1, 0, −1).
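Training with quantized weights usually relies on a mapping from real-valued weights to the allowed levels. The sketch below shows only such a mapping to {-1, 0, +1}. The threshold rule (0.7 times the mean absolute weight, a heuristic from the ternary weight network literature) is an assumption, since the text does not specify how weights are quantized during training, and `ternarize` is a hypothetical name.

```python
from typing import List, Sequence


def ternarize(weights: Sequence[float], threshold_ratio: float = 0.7) -> List[int]:
    """Map one layer's weights to the three levels {-1, 0, +1}.

    Assumption: weights with |w| <= threshold_ratio * mean(|w|) become 0,
    the rest keep only their sign. No scaling factor is applied, because
    the text describes levels of exactly +1, 0 and -1.
    """
    if not weights:
        return []
    delta = threshold_ratio * sum(abs(w) for w in weights) / len(weights)
    return [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]


print(ternarize([0.9, -0.05, -1.2, 0.3]))  # [1, 0, -1, 0]
```

In quantization-aware training, a mapping like this would typically be applied in the forward pass while gradients update the underlying real-valued weights.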
[0480] The method also includes transforming (2910) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons.
[0481] The method also includes computing (2912) a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection.
[0482] The method also includes generating (2914) a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix and represents a resistance value.
[0483] Referring next to
within the range $[-R_{base}, R_{base}]$ for all combinations of $\{R_i, R_j\}$ within the limited-length set of resistance values. In some implementations, weight values are outside this range, but the mean squared distance between weights within this range is minimized; (iii) selecting (2922) a resistance value $R^{+} = R^{-}$ from the limited-length set of resistance values, either for each analog neuron or for each layer of the equivalent analog network, based on the maximum weight $w_{max}$ of the incoming connections and bias of each neuron or of each layer of the equivalent analog network, such that $R^{+} = R^{-}$ is the resistor set value closest to $R_{base} \cdot w_{max}$. In some implementations, $R^{+}$ and $R^{-}$ are chosen (2924) independently for each layer of the equivalent analog network. In some implementations, $R^{+}$ and $R^{-}$ are chosen (2926) independently for each analog neuron of the equivalent analog network; and (iv) for each element of the weight matrix, selecting (2928) a respective first resistance value $R_1$ and a respective second resistance value $R_2$ that minimize an error according to the equation
for all possible values of $R_1$ and $R_2$ within the predetermined range of possible resistance values. Here w is the respective element of the weight matrix, and $r_{err}$ is a predetermined relative tolerance value for the possible resistance values.
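The per-weight selection of $R_1$ and $R_2$ can be sketched as an exhaustive search over pairs drawn from the limited set of resistance values. This sketch assumes that a connection realizes the weight $R^{+}/R_1 - R^{-}/R_2$ with $R^{+} = R^{-} = r$, and it minimizes the plain absolute error; the text's error equation, which involves $r_{err}$, is not reproduced here, so the criterion is a stand-in. `best_pair` is a hypothetical name.

```python
from itertools import product
from typing import Sequence, Tuple


def best_pair(w: float, r: float, values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R1, R2, realized) with realized = r/R1 - r/R2 closest to w.

    Assumes R+ = R- = r and a weight of r/R1 - r/R2 per connection; the
    tolerance term r_err from the text is not modelled here.
    """
    if not values:
        raise ValueError("the resistor set must not be empty")
    best = None
    for r1, r2 in product(values, repeat=2):
        realized = r / r1 - r / r2
        err = abs(w - realized)
        if best is None or err < best[0]:
            best = (err, r1, r2, realized)
    _, r1, r2, realized = best
    return r1, r2, realized


print(best_pair(0.7, 1.0, [1.0, 2.0, 4.0]))  # (1.0, 4.0, 0.75)
```

Exhaustive search costs $O(|S|^2)$ evaluations per weight for a resistor set S, which stays small for sets of a few dozen values.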
[0484] Referring next to
[0485] Referring next to
[0486] Referring next to
[0487] Referring next to
Example Methods of Optimizations for Analog Hardware Realization of Trained Neural Networks
[0488]
[0489] The method includes obtaining (3006) a neural network topology (e.g., the topology 224) and weights (e.g., the weights 222) of a trained neural network (e.g., the networks 220).
[0490] The method also includes transforming (3008) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors. Each operational amplifier represents an analog neuron of the equivalent analog network, and each resistor represents a connection between two analog neurons.
[0491] Referring next to
[0492] Referring next to
[0493] Referring back to
[0494] Referring next to
[0495] Referring next to
[0496] Referring back to
[0497] The method also includes pruning (3014) the equivalent analog network to reduce the number of the operational amplifiers or the resistors, based on the resistance matrix, to obtain an optimized analog network of analog components.
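One simple pruning rule consistent with this step treats a negligible weight as an absent resistor (an open circuit) and then removes neurons, and hence operational amplifiers, that no longer drive anything. The threshold `eps`, the edge-dictionary representation, and the function name `prune` are assumptions for illustration; the text does not state which pruning criteria are used.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source neuron, destination neuron)


def prune(weights: Dict[Edge, float], outputs: Set[str],
          eps: float = 1e-6) -> Dict[Edge, float]:
    """Remove negligible connections, then neurons that feed nothing.

    A weight with |w| <= eps is treated as an absent resistor. Neurons left
    without any outgoing connection (other than network outputs) are dropped
    together with their incoming resistors, repeating until nothing changes.
    """
    kept = {e: w for e, w in weights.items() if abs(w) > eps}
    while True:
        drives_something = {src for src, _ in kept}
        dead = {dst for _, dst in kept
                if dst not in drives_something and dst not in outputs}
        if not dead:
            return kept
        kept = {(s, d): w for (s, d), w in kept.items() if d not in dead}


w = {("x1", "h1"): 0.5, ("x2", "h1"): 0.0, ("x1", "h2"): 0.3,
     ("h1", "y"): 1.0, ("h2", "y"): 1e-9}
print(sorted(prune(w, {"y"})))  # [('h1', 'y'), ('x1', 'h1')]
```

Here the near-zero weight from `h2` makes `h2` a dead end, so its incoming resistor from `x1` is removed as well.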
[0498] Referring next to
[0499] Referring next to
[0500] Referring next to
[0501] Referring next to
[0502] Referring next to
[0503] Referring next to
[0504] Referring next to
[0505] Referring next to
Example Analog Neuromorphic Integrated Circuits and Fabrication Methods
Example Methods for Fabricating Analog Integrated Circuits for Neural Networks
[0506]
[0507] The method also includes transforming (3106) the neural network topology (e.g., using the neural network transformation module 226) to an equivalent analog network of analog components including a plurality of operational amplifiers and a plurality of resistors (for recurrent neural networks, the analog components also include signal delay lines, multipliers, Tanh analog blocks, and Sigmoid analog blocks). Each operational amplifier represents a respective analog neuron, and each resistor represents a respective connection between a respective first analog neuron and a respective second analog neuron.
[0508] The method also includes computing (3108) a weight matrix for the equivalent analog network based on the weights of the trained neural network. Each element of the weight matrix represents a respective connection.
[0509] The method also includes generating (3110) a resistance matrix for the weight matrix. Each element of the resistance matrix corresponds to a respective weight of the weight matrix.
[0510] The method also includes generating (3112) one or more lithographic masks (e.g., generating the masks 250 and/or 252 using the mask generation module 248) for fabricating a circuit implementing the equivalent analog network of analog components based on the resistance matrix, and fabricating (3114) the circuit (e.g., the ICs 262) based on the one or more lithographic masks using a lithographic process.
[0511] Referring next to
[0512] Referring next to
[0513] Referring next to
[0514] Referring next to
[0515] Referring next to
[0516] Referring next to
[0517] Referring next to
[0518] Referring next to
[0519] Referring next to
[0520] Referring next to
[0521] Referring next to
[0522] Referring next to
[0523] Referring next to
[0524] Referring next to
[0525] Referring next to
[0526] Referring next to
[0527] Some implementations include components that are not integrated into the chip (i.e., external elements connected to the chip) selected from the group consisting of: voice recognition, video signal processing, image sensing, temperature sensing, pressure sensing, radar processing, LiDAR processing, battery management, MOSFET circuits current and voltage, accelerometers, gyroscopes, magnetic sensors, heart rate sensors, gas sensors, volume sensors, liquid level sensors, GPS satellite signal, human body conductance sensor, gas flow sensor, concentration sensor, pH meter, and IR vision sensors.
[0528] Examples of analog neuromorphic integrated circuits manufactured according to the processes described above are provided in the following section, according to some implementations.
Example Analog Neuromorphic IC for Selective Gas Detection
[0529] In some implementations, a neuromorphic IC is manufactured according to the processes described above. The neuromorphic IC is based on a Deep Convolutional Neural Network trained for selective sensing of different gases in a gas mixture containing some amount of the gases to be detected. The Deep Convolutional Neural Network is trained using training datasets containing signals of arrays of gas sensors (e.g., 2 to 25 sensors) in response to different gas mixtures. The integrated circuit (or the chip manufactured according to the techniques described herein) can be used to determine one or more known gases in the gas mixture, despite the presence of other gases in the mixture.
[0530] In some implementations, the trained neural network is a multi-label 1D-DCNN network used for gas-mixture classification. In some implementations, the network is designed to detect 3 binary gas components based on measurements from 16 gas sensors. In some implementations, the 1D-DCNN includes sensor-wise 1D convolutional blocks (16 such blocks), 3 common 1D convolutional blocks, and 3 Dense layers. In some implementations, the performance of the 1D-DCNN network on this task is 96.3%.
[0531] In some implementations, the original network is T-transformed with the following parameters: maximum input and output connections per neuron = 100; delay blocks can produce a delay of any number of time steps; and a signal limit of 5.
[0532] In some implementations, the resulting T-network has the following properties: 15 layers, approximately 100,000 analog neurons, approximately 4,900,000 connections.
Example Analog Neuromorphic IC for MOSFET Failure Prediction
[0533] MOSFET on-resistance degradation due to thermal stress is a well-known and serious problem in power electronics. In real-world applications, the MOSFET device temperature frequently changes over short periods of time. These temperature sweeps produce thermal degradation of the device, as a result of which the device might exhibit exponential. This effect is typically studied by power cycling, which produces temperature gradients that cause MOSFET degradation.
[0534] In some implementations, a neuromorphic IC is manufactured according to the processes described above. The neuromorphic IC is based on a network discussed in the article titled “Real-time Deep Learning at the Edge for Scalable Reliability Modeling of SI-MOSFET Power Electronics Converters” for predicting the remaining useful life (RUL) of a MOSFET device. The neural network can be used to determine the RUL of a device with an accuracy of over 80%.
[0535] In some implementations, the network is trained on the NASA MOSFET Dataset, which contains thermal aging time series for 42 different MOSFETs. Data is sampled every 400 ms and typically includes several hours of data for each device. The network contains 4 LSTM layers of 64 neurons each, followed by 2 Dense layers of 64 and 1 neurons, respectively.
[0536] In some implementations, the network is T-transformed with the following parameters: maximum input and output connections per neuron = 100 and a signal limit of 5; the resulting T-network had the following properties: 18 layers, approximately 3,000 neurons (e.g., 137 neurons), and approximately 120,000 connections (e.g., 123,200 connections).
Example Analog Neuromorphic IC for Lithium Ion Battery Health and SoC Monitoring
[0537] In some implementations, a neuromorphic IC is manufactured according to the processes described above. The neuromorphic IC can be used for predictive analytics of lithium-ion batteries in Battery Management Systems (BMS). A BMS typically provides functions such as overcharge and over-discharge protection, monitoring of State of Health (SOH) and State of Charge (SOC), and load balancing across several cells. SOH and SOC monitoring normally requires a digital data processor, which adds to the cost of the device and consumes power. In some implementations, the integrated circuit is used to obtain precise SOC and SOH data without implementing a digital data processor on the device. In some implementations, the integrated circuit determines SOC with over 99% accuracy and SOH with over 98% accuracy.
[0538] In some implementations, network operation is based on analysis of the battery's discharge curve as well as temperature, and/or the data is presented as a time series. Some implementations use data from the NASA Battery Usage dataset, which presents data from the continuous usage of 6 commercially available Li-Ion batteries. In some implementations, the network includes an input layer, 2 LSTM layers of 64 neurons each, and an output dense layer of 2 neurons (SOC and SOH values).
[0539] In some implementations, the network is T-transformed with the following parameters: maximum input and output connections per neuron = 100 and a signal limit of 5. In some implementations, the resulting T-network has the following properties: 9 layers, approximately 1,200 neurons (e.g., 1,271 neurons), and approximately 50,000 connections (e.g., 51,776 connections). In some implementations, the network, whose operation is likewise based on analysis of the battery's discharge curve and temperature, is the IndRNN network disclosed in the paper titled “State-of-Health Estimation of Li-ion Batteries in Electric Vehicle Using IndRNN under Variable Load Condition”, designed for processing data from the NASA Battery Usage dataset. The IndRNN network contains an input layer with 18 neurons, a simple recurrent layer of 100 neurons, and a dense layer of 1 neuron.
[0540] In some implementations, the IndRNN network is T-transformed with the following parameters: maximum input and output connections per neuron = 100 and a signal limit of 5. In some implementations, the resulting T-network had the following properties: 4 layers, approximately 200 neurons (e.g., 201 neurons), and approximately 2,000 connections (e.g., 2,300 connections). Some implementations output only SOH, with an estimation error of 1.3%. In some implementations, the SOC is obtained in the same way as the SOH.
Example Analog Neuromorphic IC for Keyword Spotting
[0541] In some implementations, a neuromorphic IC is manufactured according to the processes described above. The neuromorphic IC can be used for keyword spotting.
[0542] The input network is a neural network with 2-D Convolutional and 2-D Depthwise Convolutional layers, with an input audio mel-spectrogram of size 49×10. In some implementations, the network includes 5 convolutional layers, 4 depthwise convolutional layers, an average pooling layer, and a final dense layer.
[0543] In some implementations, the network is pre-trained to recognize 10 short spoken keywords (“yes”, “no”, “up”, “down”, “left”, “right”, “on”, “off”, “stop”, “go”) from the Google Speech Commands Dataset, with a recognition accuracy of 94.4%.
[0544] In some implementations, the integrated circuit is manufactured based on a Depthwise Separable Convolutional Neural Network (DS-CNN) for voice command identification. In some implementations, the original DS-CNN network is T-transformed with the following parameters: maximum input and output connections per neuron = 100 and a signal limit of 5. In some implementations, the resulting T-network had the following properties: 13 layers, approximately 72,000 neurons, and approximately 2.6 million connections.
Example DS-CNN Keyword Spotting Network
[0545] In one instance, a keyword spotting network is transformed to a T-network, according to some implementations. The network is a neural network of 2-D Convolutional and 2-D Depthwise Convolutional layers, with an input audio spectrogram of size 49×10. The network consists of 5 convolutional layers, 4 depthwise convolutional layers, an average pooling layer, and a final dense layer. The network is pre-trained to recognize 10 short spoken keywords (“yes”, “no”, “up”, “down”, “left”, “right”, “on”, “off”, “stop”, “go”) from the Google Speech Commands Dataset https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html. There are 2 additional classes, corresponding to ‘silence’ and ‘unknown’. The network output is a softmax of length 12.
[0546] The trained neural network (input to the transformation) had a recognition accuracy of 94.4%, according to some implementations. In the neural network topology, each convolutional layer is followed by a BatchNorm layer and a ReLU layer; the ReLU activations are unbounded, and the network includes around 2.5 million multiply-add operations.
[0547] After transformation, the transformed analog network was tested with a test set of 1000 samples (100 of each spoken command). All of these samples are also test samples in the original dataset. The original DS-CNN network gave a recognition error of close to 5.7% on this test set. The network was converted to a T-network of trivial neurons. Batch normalization (sometimes referred to as BatchNorm) layers in ‘test’ mode produce a simple linear signal transformation, so they can be interpreted as a weight multiplier plus an additional bias. Convolutional, AveragePooling, and Dense layers are T-transformed quite straightforwardly. The softmax activation function was not implemented in the T-network and was applied to the T-network output separately.
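The observation that an inference-mode BatchNorm layer reduces to a weight multiplier plus an extra bias can be made concrete by folding it into the preceding layer. This minimal sketch uses plain Python lists with one weight row per output channel (for a convolution, a row would be the flattened kernel of that channel). `fold_batchnorm` and `dense` are hypothetical names, and the default epsilon of 1e-3 matches the Keras default (PyTorch uses 1e-5), which is an assumption about the training framework.

```python
import math
from typing import List, Sequence, Tuple


def fold_batchnorm(w: Sequence[Sequence[float]], b: Sequence[float],
                   gamma: Sequence[float], beta: Sequence[float],
                   mean: Sequence[float], var: Sequence[float],
                   eps: float = 1e-3) -> Tuple[List[List[float]], List[float]]:
    """Fold inference-mode BatchNorm into the preceding affine layer.

    BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta is affine in z, so
    with s = gamma / sqrt(var + eps) per output channel:
        w' = s * w                   (weight multiplier)
        b' = s * (b - mean) + beta   (additional bias)
    """
    new_w, new_b = [], []
    for row, bi, g, bt, mu, v in zip(w, b, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)
        new_w.append([s * wi for wi in row])
        new_b.append(s * (bi - mu) + bt)
    return new_w, new_b


def dense(w: Sequence[Sequence[float]], b: Sequence[float],
          x: Sequence[float]) -> List[float]:
    """Plain affine layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


w, b = [[1.0, 2.0], [0.5, -1.0]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [4.0, 0.25]
fw, fb = fold_batchnorm(w, b, gamma, beta, mean, var)
print(dense(fw, fb, [0.3, -0.7]))  # matches BN(dense(w, b, x)) up to rounding
```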
[0548] The resulting T-network had 12 layers, including an input layer, approximately 72,000 neurons, and approximately 2.5 million connections.
[0549]
[0550] Various examples of setting network limitations for the transformed network are described herein, according to some implementations. Regarding the signal limit: because the ReLU activations used in the network are unbounded, some implementations use a signal limit on each layer. This could potentially affect mathematical equivalence. To this end, some implementations use a signal limit of 5 on all layers, which corresponds to a power supply voltage of 5 relative to the input signal range.
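The effect of the signal limit on an unbounded ReLU can be modelled as a clipped ReLU. The sketch below, with hypothetical helper names, saturates outputs at 5 and counts how many activations the limit actually changes; those are the activations at which the limited network departs from the original one.

```python
from typing import Sequence

SIGNAL_LIMIT = 5.0  # signal limit from the text, relative to the input signal range


def limited_relu(x: float, limit: float = SIGNAL_LIMIT) -> float:
    """ReLU whose output saturates at `limit`, mimicking a bounded supply voltage."""
    return min(max(x, 0.0), limit)


def count_clipped(pre_activations: Sequence[float], limit: float = SIGNAL_LIMIT) -> int:
    """Number of activations the limit changes relative to an unbounded ReLU."""
    return sum(1 for a in pre_activations if a > limit)


acts = [-1.0, 0.5, 3.2, 7.5]
print([limited_relu(a) for a in acts])  # [0.0, 0.5, 3.2, 5.0]
print(count_clipped(acts))              # 1
```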
[0551] For quantizing the weights, some implementations use a nominal set of 30 resistors [0.001, 0.003, 0.01, 0.03, 0.1, 0.324, 0.353, 0.436, 0.508, 0.542, 0.544, 0.596, 0.73, 0.767, 0.914, 0.985, 0.989, 1.043, 1.101, 1.149, 1.157, 1.253, 1.329, 1.432, 1.501, 1.597, 1.896, 2.233, 2.582, 2.844].
[0552] Some implementations select R− and R+ values (see description above) separately for each layer. For each layer, some implementations select the value that delivers the highest weight accuracy. In some implementations, all the weights (including biases) in the T-network are subsequently quantized (e.g., set to the closest value that can be achieved with the input or chosen resistors).
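The per-layer choice of $R^{+} = R^{-}$ and the subsequent weight quantization can be sketched as follows, using the 30 nominal values from the preceding paragraph. Assumptions: a connection realizes the weight $r/R_1 - r/R_2$ with both resistors taken from the nominal set, r is also drawn from that set, and "most weight accuracy" is read as the smallest sum of squared quantization errors over the layer. All function names are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

# Nominal resistor set quoted in the text (units are not stated there).
NOMINAL = [0.001, 0.003, 0.01, 0.03, 0.1, 0.324, 0.353, 0.436, 0.508, 0.542,
           0.544, 0.596, 0.73, 0.767, 0.914, 0.985, 0.989, 1.043, 1.101, 1.149,
           1.157, 1.253, 1.329, 1.432, 1.501, 1.597, 1.896, 2.233, 2.582, 2.844]


def achievable(r: float, values: Sequence[float]) -> List[float]:
    """Sorted weights r/R1 - r/R2 reachable with R1, R2 drawn from `values`."""
    return sorted({r / a - r / b for a in values for b in values})


def nearest(w: float, grid: Sequence[float]) -> float:
    """Closest value to w in a sorted, non-empty grid."""
    i = bisect.bisect_left(grid, w)
    return min(grid[max(i - 1, 0):i + 1], key=lambda g: abs(g - w))


def choose_layer_resistor(weights: Sequence[float],
                          values: Sequence[float] = NOMINAL) -> Tuple[float, List[float]]:
    """Pick r = R+ = R- for one layer, minimising squared quantisation error."""
    best_r, best_q, best_err = values[0], [], float("inf")
    for r in values:
        grid = achievable(r, values)
        q = [nearest(w, grid) for w in weights]
        err = sum((w - qw) ** 2 for w, qw in zip(weights, q))
        if err < best_err:
            best_r, best_q, best_err = r, q, err
    return best_r, best_q


layer = [0.5, -1.2, 0.0, 2.0, 0.05]
r, quantized = choose_layer_resistor(layer)
print(r, quantized)
```

Each candidate r yields up to 900 resistor pairs, so the search is cheap per layer; a real flow would also account for resistor tolerances, which this sketch ignores.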
[0553] Some implementations convert the output layer as follows. The output layer is a dense layer that does not have a ReLU activation. Instead, the layer has a softmax activation, which is not implemented in the T-conversion and is left for the digital part, according to some implementations. Some implementations perform no additional conversion.
Example Analog Neuromorphic IC for Obtaining Heartrate
[0554] PPG is an optically obtained plethysmogram that can be used to detect blood volume changes in the microvascular bed of tissue. A PPG is often obtained by using a pulse oximeter, which illuminates the skin and measures changes in light absorption. PPG is often processed to determine heart rate in devices such as fitness trackers. Deriving heart rate (HR) from a PPG signal is an essential task in edge-device computing. PPG data obtained from a wrist-worn device usually yields a reliable heart rate only when the device is stable. If the wearer is engaged in physical exercise, obtaining heart rate from PPG data produces poor results unless it is combined with inertial sensor data.
[0555] In some implementations, an integrated circuit based on a combination of Convolutional Neural Network and LSTM layers can be used to precisely determine the pulse rate based on data from a photoplethysmography (PPG) sensor and a 3-axis accelerometer. The integrated circuit can be used to suppress motion artifacts in PPG data and to determine the pulse rate during physical exercise, such as jogging, fitness exercises, and climbing stairs, with an accuracy exceeding 90%.
[0556] In some implementations, the input network is trained with PPG data from the PPG-Dalia dataset. Data is collected for 15 individuals performing various physical activities for a predetermined duration (e.g., 1-4 hours each). The training data includes wrist-based sensor data containing PPG, 3-axis accelerometer, temperature, and electrodermal response signals sampled at 4 to 64 Hz, and reference heart rate data obtained from an ECG sensor with sampling around 2 Hz. The original data was split into sequences of 1000 time steps (around 15 seconds) with a shift of 500 time steps, producing 16,541 samples in total. The dataset was split into 13,233 training samples and 3,308 test samples.
[0557] In some implementations, the input network included 2 Conv1D layers with 16 filters each, performing time series convolution, 2 LSTM layers of 16 neurons each, and 2 dense layers of 16 and 1 neurons, respectively. In some implementations, the network produces a mean squared error (MSE) of less than 6 beats per minute over the test set.
[0558] In some implementations, the network is T-transformed with the following parameters: delay blocks can produce a delay of any number of time steps, maximum input and output connections per neuron = 100, and a signal limit of 5. In some implementations, the resulting T-network had the following properties: 15 layers, approximately 700 neurons (e.g., 713 neurons), and approximately 12,000 connections (e.g., 12,072 connections).
Example Processing PPG Data with T-Converted LSTM Network
[0559] As described above, for recurrent neurons, some implementations use a signal delay block, which is added to each recurrent connection of GRU and LSTM neurons. In some implementations, the delay block has an external cycle timer (e.g., a digital timer) that activates the delay block with a constant period dt. This activation produces an output of x(t−dt), where x(t) is the input signal of the delay block. The activation frequency can, for instance, correspond to the network input signal frequency (e.g., the output frequency of analog sensors processed by a T-converted network). Typically, all delay blocks are activated simultaneously with the same activation signal. Some blocks can be activated simultaneously at one frequency, and other blocks at another frequency. In some implementations, these frequencies have a common multiplier, and the signals are synchronized. In some implementations, multiple delay blocks are applied to one signal, producing an additive time shift. Examples of delay blocks are described above in reference to
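A discrete-time model makes the delay-block behaviour concrete: each timer activation (period dt) outputs the value received a fixed number of activations earlier, and chaining blocks on one signal adds their shifts. The class name `DelayBlock`, the zero initial state, and the multi-step buffer are modelling assumptions, not a description of the circuit.

```python
from collections import deque


class DelayBlock:
    """Discrete-time model of a delay block driven by an external cycle timer.

    Each tick() stands for one timer activation (period dt). The block
    returns the value it received `steps` activations earlier, i.e.
    x(t - steps*dt), and returns `initial` until its buffer has filled.
    """

    def __init__(self, steps: int = 1, initial: float = 0.0):
        if steps < 1:
            raise ValueError("steps must be >= 1")
        self._buffer = deque([initial] * steps, maxlen=steps)

    def tick(self, x: float) -> float:
        delayed = self._buffer[0]
        self._buffer.append(x)  # maxlen discards the oldest sample
        return delayed


# Two blocks applied in series to one signal: the shifts add (2 + 3 = 5).
first, second = DelayBlock(2), DelayBlock(3)
print([second.tick(first.tick(float(v))) for v in range(1, 9)])
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
```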
[0560] The network for processing PPG data uses one or more LSTM neurons, according to some implementations. Examples of LSTM neuron implementations are described above in reference to
[0561] The network also uses Conv1D, a convolution performed over time coordinate. Examples of Conv1D implementations are described above in reference to
[0562] Details of PPG data are described herein, according to some implementations. PPG is an optically obtained plethysmogram that can be used to detect blood volume changes in the microvascular bed of tissue. A PPG is often obtained by using a pulse oximeter, which illuminates the skin and measures changes in light absorption. PPG is often processed to determine heart rate in devices such as fitness trackers. Deriving heart rate (HR) from a PPG signal is an essential task in edge-device computing.
[0563] Some implementations use PPG data from the Capnobase PPG dataset. The data contains raw PPG signals for 42 individuals, each 8 min long and sampled at 300 samples per second, and reference heart rate data obtained from an ECG sensor sampled at around 1 sample per second. For training and evaluation, some implementations split the original data into sequences of 6000 time steps with a shift of 1000 time steps, yielding a total of 5838 samples.
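The sample count follows from sliding-window arithmetic. Assuming each Capnobase recording is exactly 8 minutes long and only complete windows are kept, a recording of 144,000 time steps yields 139 windows of 6000 steps at a shift of 1000, and 42 recordings give the 5838 samples stated in the paragraph. `window_count` is a hypothetical helper.

```python
def window_count(length: int, window: int, shift: int) -> int:
    """Number of complete windows of `window` steps taken every `shift` steps."""
    if length < window:
        return 0
    return (length - window) // shift + 1


steps_per_record = 8 * 60 * 300                          # 8 min at 300 samples/s = 144,000
per_record = window_count(steps_per_record, 6000, 1000)  # 139 windows per individual
print(per_record, 42 * per_record)                       # 139 5838
```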
[0564] In some implementations, the trained input neural network obtains heart rate (HR) from PPG data with 1-3% accuracy.
[0565] This section describes a relatively simple neural network in order to demonstrate how T-conversion and analog processing can deal with this task. This description is provided as an example, according to some implementations.
[0566] In some implementations, the dataset is split into 4,670 training samples and 1,168 test samples. The network included 1 Conv1D layer with 16 filters and a kernel of 20, 2 LSTM layers with 24 neurons each, and 2 dense layers (with 24 and 1 neurons, respectively). In some implementations, after training this network for 200 epochs, test accuracy was found to be 2.1%.
[0567] In some implementations, the input network was T-transformed with the following parameters: delay blocks with periods of 1, 5, and 10 time steps. The resulting T-network had the following properties: 17 layers, 15,448 connections, and 329 neurons (OP3 neurons and multiplier blocks, not counting delay blocks).
Example Analog Neuromorphic Integrated Circuit for Object Recognition Based on Pulsed Doppler Radar Signal
[0568] In some implementations, an integrated circuit is manufactured based on a multi-scale LSTM neural network that can be used to classify objects based on a pulse Doppler radar signal. The IC can be used to classify different objects, such as humans, cars, cyclists, and scooters, based on the Doppler radar signal; it removes clutter and provides the noise to the Doppler radar signal. In some implementations, the accuracy of object classification with the multi-scale LSTM network exceeded 90%.
Example Analog Neuromorphic IC for Human Activity Type Recognition Based on Inertial Sensor Data
[0569] In some implementations, a neuromorphic integrated circuit is manufactured and can be used for human activity type recognition based on multi-channel convolutional neural networks, which take input signals from 3-axis accelerometers and possibly magnetometers and/or gyroscopes of fitness tracking devices, smart watches, or mobile phones. The multi-channel convolutional neural network can be used to distinguish between different types of human activities, such as walking, running, sitting, climbing stairs, and exercising, and can be used for activity tracking. The IC can be used to detect abnormal patterns of human activity based on accelerometer data convolutionally merged with heart rate data. Such an IC can detect pre-stroke or pre-heart-attack states, or signal in case of sudden abnormal patterns caused by injuries or malfunctions due to medical reasons, such as epilepsy and others, according to some implementations.
[0570] In some implementations, the IC is based on the channel-wise 1D convolutional network discussed in the article “Convolutional Neural Networks for Human Activity Recognition using Mobile Sensors.” In some implementations, this network accepts 3-axis accelerometer data as input, sampled at up to 96 Hz. In some implementations, the network is trained on 3 different publicly available datasets, covering activities such as “open then close the dishwasher”, “drink while standing”, “close left hand door”, “jogging”, “walking”, “ascending stairs”, etc. In some implementations, the network included 3 channel-wise Conv networks, each with a Conv layer of 12 filters and a kernel size of 64 followed by a MaxPooling (4) layer, and 2 common Dense layers of 1024 and N neurons, respectively, where N is the number of classes. In some implementations, the activity classification was performed with a low error rate (e.g., 3.12% error).
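A minimal NumPy forward pass of the channel-wise architecture described above, with random (untrained) weights, can make the data flow concrete: each accelerometer axis goes through its own Conv and MaxPooling branch, and the three branches are concatenated before the common Dense layers. The window length of 128 samples, ReLU activations, “valid” convolutions, and N = 6 classes are assumptions chosen only to fix the shapes; they are not taken from the source or the cited article.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_valid(x, kernels, bias):
    """'Valid' 1D cross-correlation of one channel. x: (T,), kernels: (F, K) -> (F, T-K+1)."""
    n_steps, k = x.shape[0], kernels.shape[1]
    windows = np.stack([x[i:i + k] for i in range(n_steps - k + 1)])  # (T-K+1, K)
    return (windows @ kernels.T).T + bias[:, None]


def max_pool(h, size):
    """Non-overlapping max pooling along time; a trailing partial block is dropped."""
    n_filters, n_steps = h.shape
    n_out = n_steps // size
    return h[:, :n_out * size].reshape(n_filters, n_out, size).max(axis=2)


def init_params(window=128, n_classes=6, filters=12, kernel=64, pool=4, hidden=1024):
    """Random, untrained weights with the layer sizes described above."""
    feat = filters * ((window - kernel + 1) // pool)  # flattened features per axis
    return {
        "conv": [(rng.normal(scale=0.1, size=(filters, kernel)), np.zeros(filters))
                 for _ in range(3)],  # one Conv branch per accelerometer axis
        "dense1": (rng.normal(scale=0.01, size=(3 * feat, hidden)), np.zeros(hidden)),
        "dense2": (rng.normal(scale=0.01, size=(hidden, n_classes)), np.zeros(n_classes)),
    }


def channel_wise_forward(signal, params, pool=4):
    """signal: (3, T) window of x/y/z accelerometer samples -> (n_classes,) logits."""
    branches = []
    for axis, (kernels, bias) in enumerate(params["conv"]):
        h = np.maximum(conv1d_valid(signal[axis], kernels, bias), 0.0)  # assumed ReLU
        branches.append(max_pool(h, pool).ravel())
    h = np.concatenate(branches)          # merge the three channel-wise branches
    w1, b1 = params["dense1"]
    h = np.maximum(h @ w1 + b1, 0.0)      # common Dense(1024)
    w2, b2 = params["dense2"]
    return h @ w2 + b2                    # common Dense(N) logits


if __name__ == "__main__":
    params = init_params()
    logits = channel_wise_forward(rng.normal(size=(3, 128)), params)
    print(logits.shape)  # (6,)
```

With a 128-sample window, each branch yields 12 × 16 = 192 features (65 convolution outputs pooled by 4), so the first common Dense layer sees 576 inputs.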
[0571] In some implementations, the network is T-transformed with the following parameters: delay blocks that can produce a delay of any number of time steps, a maximum of 100 input and output connections per neuron, an output layer of 10 neurons, and a signal limit of 5. In some implementations, the resulting T-network had the following properties: 10 layers, approximately 1,300 neurons (e.g., 1,296 neurons), and approximately 20,000 connections (e.g., 20,022 connections).
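One simple way to respect a fan-in limit such as “a maximum of 100 input connections per neuron” is to split a wide weighted sum into a tree of partial-sum neurons, assumed here to be linear so that only the final neuron applies the nonlinearity. The sketch below only counts the neurons and connections such a tree needs. It is an illustrative assumption, not necessarily how the T-transformation handles the limit, and it does not model the output (fan-out) limit; the function names are hypothetical.

```python
import math


def fan_in_tree(n_inputs: int, max_fan_in: int = 100) -> list[int]:
    """Neurons per level of a tree summing n_inputs with at most max_fan_in inputs per neuron.

    The last level always holds the single output neuron.
    """
    if n_inputs < 1 or max_fan_in < 2:
        raise ValueError("need n_inputs >= 1 and max_fan_in >= 2")
    levels = []
    width = n_inputs
    while True:
        width = math.ceil(width / max_fan_in)  # each neuron absorbs up to max_fan_in signals
        levels.append(width)
        if width == 1:
            return levels


def tree_connections(n_inputs: int, max_fan_in: int = 100) -> int:
    """All original inputs plus one connection from every non-final neuron to the next level."""
    levels = fan_in_tree(n_inputs, max_fan_in)
    return n_inputs + sum(levels[:-1])


if __name__ == "__main__":
    # A neuron fed by a 1,024-neuron layer, such as the Dense(1024) layer above.
    print(fan_in_tree(1024), tree_connections(1024))  # [11, 1] 1035
```

For 1,024 inputs and a limit of 100, the wide sum becomes 11 partial-sum neurons feeding one output neuron, at the cost of 11 extra connections and one extra layer.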
Example Transformation of Modular Net Structure for Generating Libraries
[0572] A modular structure of converted neural networks is described herein, according to some implementations. Each module of a modular neural network is obtained by transforming (the whole or a part of) one or more trained neural networks. In some implementations, the one or more trained neural networks are subdivided into parts, which are subsequently transformed into equivalent analog networks. Modular structure is typical of some currently used neural networks, and modular division of neural networks corresponds to a trend in neural network development. Each module can have an arbitrary number of inputs (connections of its input neurons to the output neurons of a connected module) and an arbitrary number of outputs connected to the input layers of a subsequent module. In some implementations, a library of preliminarily developed (seed) transformed modules is created, including lithographic masks for manufacturing each module. A final chip design is obtained by combining (connecting) the preliminarily developed modules. Some implementations perform commutation between the modules. In some implementations, the neurons and connections within a module are translated into a chip design using ready-made module design templates. This significantly simplifies chip manufacturing, which is then accomplished simply by connecting the corresponding modules.
[0573] Some implementations generate libraries of ready-made T-converted neural networks and/or T-converted modules. For example, a CNN layer is one modular building block, an LSTM chain is another, and so on. Larger neural networks (NNs) also have a modular structure (e.g., an LSTM module and a CNN module). In some implementations, libraries of neural networks are more than by-products of the example processes and can be sold independently. For example, a third party can manufacture a neural network starting with the analog circuits, schematics, or designs in the library (e.g., using CADENCE circuits, files, and/or lithography masks). Some implementations generate T-converted neural networks (e.g., networks transformable to CADENCE or similar software) for typical neural networks, and the converted neural networks (or the associated information) are sold to a third party. In some instances, a third party chooses not to disclose the structure and/or purpose of the initial neural network, but uses the conversion software (e.g., the SDK described above) to convert the initial network into trapezia-like networks and passes the transformed networks to a manufacturer to fabricate the transformed network, with a matrix of weights obtained using one of the processes described above, according to some implementations. As another example, where the library of ready-made networks is generated according to the processes described herein, corresponding lithographic masks are generated, and a customer can train one of the available network architectures for their task, perform the lossless transformation (sometimes called T-transformation), and provide the weights to a manufacturer for fabricating a chip for the trained neural network.
[0574] In some implementations, the modular structure concept is also used in the manufacture of multi-chip systems or multi-level 3D chips, where each layer of the 3D chip represents one module. In the case of 3D chips, the outputs of modules are connected to the inputs of connected modules by standard interconnects that provide ohmic contacts between the different layers of a multi-layer 3D chip system. In some implementations, the analog outputs of certain modules are connected to the analog inputs of connected modules through interlayer interconnects. In some implementations, the modular structure is used to make multi-chip processor systems as well. A distinctive feature of such multi-chip assemblies is the analog signal data lines between different chips. Commutation of several analog signals onto one data line, and the corresponding de-commutation of the analog signals at the receiving chip, is accomplished using standard analog signal commutation and de-commutation schemes developed in analog circuitry.
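The source refers only to “standard schemes” of analog commutation. Time-division multiplexing is one such scheme; the sketch below models only its sample ordering (in hardware, analog switches and sample-and-hold stages would perform it) and assumes the signals are sampled synchronously and have equal length. The function names are hypothetical.

```python
def commutate(signals: list[list[float]]) -> list[float]:
    """Interleave equal-length sampled signals onto one line: s0[0], s1[0], ..., s0[1], ..."""
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same number of samples")
    return [s[t] for t in range(length) for s in signals]


def decommutate(line: list[float], n_signals: int) -> list[list[float]]:
    """Recover each signal by taking every n_signals-th sample, starting at its slot offset."""
    return [line[slot::n_signals] for slot in range(n_signals)]


if __name__ == "__main__":
    a, b, c = [0.1, 0.2, 0.3], [1.1, 1.2, 1.3], [2.1, 2.2, 2.3]
    line = commutate([a, b, c])
    print(line)  # [0.1, 1.1, 2.1, 0.2, 1.2, 2.2, 0.3, 1.3, 2.3]
    assert decommutate(line, 3) == [a, b, c]
```

The shared line must carry samples n times faster than each individual signal, which is the usual trade-off of time-division schemes.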
[0575] One main advantage of a chip manufactured according to the techniques described above is that analog signal propagation can be extended to multi-layer chips or multi-chip assemblies, where all signal interconnects and data lines transfer analog signals without the need for analog-to-digital or digital-to-analog conversion. In this way, analog signal transfer and processing can be extended to 3D multi-layer chips or multi-chip assemblies.
Example Methods for Generating Libraries for Hardware Realization of Neural Networks
[0576]
[0577] The method includes obtaining (3206) a plurality of neural network topologies (e.g., the topologies 224), each neural network topology corresponding to a respective neural network (e.g., a neural network 220).
[0578] The method also includes transforming (3208) each neural network topology (e.g., using the neural network transformation module 226) to a respective equivalent analog network of analog components.
[0579] Referring next to
[0580] Referring back to
[0581] Referring next to
[0582] Referring next to
[0583] Referring next to
Example Methods for Optimizing Energy Efficiency of Neuromorphic Analog Integrated Circuits
[0584]
[0585] The method includes obtaining (3306) an integrated circuit (e.g., the ICs 262) implementing an analog network (e.g., the transformed analog neural network 228) of analog components including a plurality of operational amplifiers and a plurality of resistors. The analog network represents a trained neural network (e.g., the neural networks 220), each operational amplifier represents a respective analog neuron, and each resistor represents a respective connection between a respective first analog neuron and a respective second analog neuron.
[0586] The method also includes generating (3308) inferences (e.g., using the inferencing module 266) using the integrated circuit for a plurality of test inputs, including simultaneously transferring signals from one layer to a subsequent layer of the analog network. In some implementations, the analog network has a layered structure, with signals passing simultaneously from each layer to the next. During inference, the signals propagate through the circuit layer by layer; the resulting time delays are minute and can be characterized by simulation at the device level.
[0587] The method also includes, while generating inferences using the integrated circuit, determining (3310) whether the level of signal output of the plurality of operational amplifiers is equilibrated (e.g., using the signal monitoring module 268). After receiving inputs, operational amplifiers go through a transient period (e.g., a period of less than 1 millisecond from transient to plateau signal), after which the signal level is equilibrated and does not change. In accordance with a determination that the level of signal output is equilibrated, the method also includes: (i) determining (3312) an active set of analog neurons of the analog network that influence signal formation for propagation of signals. The active set of neurons need not be part of a layer or layers; in other words, the determination works regardless of whether the analog network includes layers of neurons; and (ii) turning off power (3314) (e.g., using the power optimization module 270) for one or more analog neurons of the analog network, distinct from the active set of analog neurons, for a predetermined period of time. For example, some implementations switch off power (e.g., using the power optimization module 270) to operational amplifiers that are in layers behind the active layer (the layer the signal has reached at that moment) and that do not influence signal formation on the active layer. This can be calculated based on the RC delays of signal propagation through the IC. Thus, all layers behind the operational (active) layer are switched off to save power. Propagation of signals through the chip is therefore like surfing: the wave of signal formation propagates through the chip, and all layers that do not influence signal formation are switched off.
In some implementations, for layer-by-layer networks, the signal propagates from layer to layer, and the method further includes decreasing power consumption before the layer corresponding to the active set of neurons, because no amplification is needed before that layer.
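The paragraph states that the active layer can be calculated from the RC delays of signal propagation. A simplified timing model can be sketched under two assumptions: each layer settles in about 5 RC time constants (a first-order RC response reaches about 99.3% of its final value after 5τ), and layers settle strictly one after another. The R and C values and the function names are hypothetical, and which settled or waiting layers may actually be powered down still depends on the active-set determination described above.

```python
from bisect import bisect_right
from itertools import accumulate


def settle_time(r_ohm: float, c_farad: float, n_tau: float = 5.0) -> float:
    """Approximate settling time of a first-order RC stage: n_tau time constants."""
    return n_tau * r_ohm * c_farad


def active_layer(t: float, layer_delays: list[float]):
    """Index of the layer whose signal is still forming at time t, or None once all have settled."""
    finish_times = list(accumulate(layer_delays))  # time by which each layer has settled
    i = bisect_right(finish_times, t)
    return i if i < len(layer_delays) else None


def layer_status(t: float, layer_delays: list[float]) -> list[str]:
    """'settled' before the active layer, 'active' at it, 'waiting' after it."""
    a = active_layer(t, layer_delays)
    if a is None:
        return ["settled"] * len(layer_delays)
    return ["settled" if i < a else "active" if i == a else "waiting"
            for i in range(len(layer_delays))]


if __name__ == "__main__":
    tau_layer = settle_time(1e6, 1e-12)  # assumed 1 MΩ and 1 pF per stage, about 5 µs
    delays = [tau_layer] * 4             # assumed four identical layers
    print(layer_status(7e-6, delays))    # ['settled', 'active', 'waiting', 'waiting']
```

In this model the “wave” of signal formation is simply the active index moving forward as time passes the cumulative settling times.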
[0588] Referring next to
[0589] Referring next to
[0590] Referring next to
[0591] Referring next to
[0592] Referring next to
[0593] Referring next to
[0594] Referring next to
[0595] Referring next to
[0596] Referring next to
[0597] Some implementations include means for delaying and/or controlling signal propagation from layer to layer of the resulting hardware-implemented neural network.
Example Transformation of MobileNet v.1
[0598] An example transformation of MobileNet v.1 into an equivalent analog network is described herein, according to some implementations. In some implementations, single analog neurons are generated, then converted into SPICE schematics with a transformation of weights from MobileNet into resistor values. MobileNet v1 architecture is depicted in the Table shown in
[0599] In some implementations, the resulting transformed network included 30 layers (including an input layer), approximately 104,000 analog neurons, and approximately 11 million connections. After transformation, the average absolute output error of the transformed network relative to MobileNet v.1 (calculated over 100 random samples) was 4.9e-8.
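The average absolute output error can be estimated by running both models on the same random inputs and averaging the element-wise absolute differences. In this sketch, `reference` and `transformed` are hypothetical stand-ins for the software MobileNet and a simulation of its analog equivalent, and drawing inputs uniformly from [−1, 1] is an assumption; the source states only that 100 random samples were used.

```python
import random
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], Sequence[float]]


def mean_abs_error(reference: Model, transformed: Model, n_inputs: int,
                   n_samples: int = 100, seed: int = 0) -> float:
    """Mean |reference(x) - transformed(x)| over all outputs of n_samples random inputs."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        y_ref, y_tr = reference(x), transformed(x)
        total += sum(abs(a - b) for a, b in zip(y_ref, y_tr))
        count += len(y_ref)
    return total / count


def relu6(x: Sequence[float]) -> list[float]:
    """Toy 'software' model: element-wise ReLU6."""
    return [min(max(v, 0.0), 6.0) for v in x]


def relu6_offset(x: Sequence[float]) -> list[float]:
    """Toy 'transformed' model with a tiny constant output error."""
    return [v + 1e-8 for v in relu6(x)]


if __name__ == "__main__":
    print(mean_abs_error(relu6, relu6_offset, n_inputs=8))  # about 1e-8
```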
[0600] Because every convolutional layer (and the other layers) of MobileNet uses the ReLU6 activation function, the output signal of each layer of the transformed network is also limited to 6. As part of the transformation, the weights are brought into accordance with a resistor nominal set; each nominal set makes a different set of weight values achievable. Some implementations use the E24, E48, and E96 resistor nominal sets within the range of 0.1-1 MΩ. Because the weight ranges vary from layer to layer, and for most layers the weight values do not exceed 1-2, some implementations decrease the R− and R+ values to achieve greater weight accuracy. In some implementations, the R− and R+ values are chosen separately for each layer from the set [0.05, 0.1, 0.2, 0.5, 1] MΩ; for each layer, the value that delivers the greatest weight accuracy is chosen. Then all the weights (including biases) in the transformed network are ‘quantized’, i.e., set to the closest value that can be achieved with the resistors used. In some implementations, this quantization reduced the accuracy of the transformed network relative to the original MobileNet, as shown in the Table below, which lists the mean square error of the transformed network when using different resistor sets.
Resistor set       Mean square error
E24, 0.1-1 MΩ      0.01
E24, 0.1-5 MΩ      0.004
E48, 0.1-1 MΩ      0.007
E96, 0.1-1 MΩ      0.003
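The per-layer choice of R− and R+ and the subsequent weight ‘quantization’ can be sketched as follows. The source does not give the weight-to-resistance mapping here, so this sketch assumes a weight magnitude of R_ref / R, where R_ref is the layer's R− or R+ value and R is a nominal E24 resistor in the 0.1-1 MΩ range. It further assumes that the sign is realized by connecting to the R+ or R− side and that a weight of 0 is realized by omitting the resistor. All names are hypothetical.

```python
import math

# Standard E24 preferred-number series (one decade).
E24 = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.7, 3.0,
       3.3, 3.6, 3.9, 4.3, 4.7, 5.1, 5.6, 6.2, 6.8, 7.5, 8.2, 9.1]


def resistor_values(series, lo=0.1, hi=1.0):
    """Nominal resistor values (in MΩ) of a series that fall within [lo, hi]."""
    values = set()
    for decade in (0.01, 0.1, 1.0, 10.0):
        for v in series:
            r = round(v * decade, 6)  # remove floating-point noise such as 0.11000000000000001
            if lo - 1e-12 <= r <= hi + 1e-12:
                values.add(r)
    return sorted(values)


def quantize(w, r_ref, values):
    """Closest weight achievable as +/- r_ref / R (assumed mapping), or 0 (no resistor)."""
    if w == 0:
        return 0.0
    best = min((r_ref / r for r in values), key=lambda m: abs(m - abs(w)))
    if abs(w) < abs(best - abs(w)):  # omitting the connection is closer
        return 0.0
    return math.copysign(best, w)


def best_reference(weights, values, candidates=(0.05, 0.1, 0.2, 0.5, 1.0)):
    """Per-layer R_ref (in MΩ) that minimizes the mean squared quantization error."""
    def mse(r_ref):
        return sum((quantize(w, r_ref, values) - w) ** 2 for w in weights) / len(weights)
    return min(candidates, key=mse)


if __name__ == "__main__":
    values = resistor_values(E24)   # 0.1, 0.11, ..., 0.91, 1.0 MΩ
    layer = [1.5, -1.8, 0.3]        # toy layer weights
    r_ref = best_reference(layer, values)
    print(r_ref, [round(quantize(w, r_ref, values), 3) for w in layer])
```

With R_ref = 0.1 MΩ the largest achievable magnitude is 1.0, so a layer containing weights of 1.5 and -1.8 favors R_ref = 0.2 MΩ. This mirrors the per-layer selection described above.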
Example Analog Hardware Realization of Trained Neural Networks for Voice Clarity
[0601] Some implementations provide a method for fabricating a neuromorphic Integrated Circuit for voice clarification, using the techniques described above. Various types of trained neural networks can be used for this purpose. For example, a neural network can be trained to identify only one voice, suppressing and removing everything else. In particular, the neural network can identify the voice closest to the microphone. As another example, a neural network can be trained to identify several voices, suppressing and removing everything else. Voices can be identified and preserved regardless of their distance from the microphone(s). Alternatively, voices can be prioritized by their distances from the microphone(s) and given different weights in the output signal based on their respective distances from the microphone. As another alternative, voices can be identified and preserved regardless of their relative strength (e.g., volume). As yet another alternative, voices can be prioritized by their relative strength and given different weights in the output signal based on their respective relative strength. A neural network can process the signal originating from the microphone(s). Such a signal may include analog and/or digital signals. A neural network can process an analog and/or digital signal that is transmitted over a transmission medium and received by the neural network. Such a signal can be transmitted across wireless or digital/internet networks for the purposes of phone communication. Such a signal can also be input after pre- and post-processing of the original voice(s), either before the signal is ready to be transmitted or after the signal has been transmitted and delivered to the recipient. As another example, a neural network can process a signal that is a mix of several voice signals with associated noise. In particular, such a mix can be delivered to the recipient from several different sources.
Such a signal can be pre- and post-processed by different methods for different components. As yet another example, a neural network can process a signal that is a mix of several external voice signals with associated noise, combined with the recipient's own voice(s). In particular, such a mix can be delivered to the recipient from several different sources, including the recipient's own voice overlapped with the recipient's own noise. Such a signal can be pre- and/or post-processed by different methods for different components. The clarification of voice(s) can be performed on the combined signal. As another example, a neural network can process a signal that includes voice(s) from the recipient side. In particular, such a signal can be processed before it is transmitted to the other party. Such a signal can be processed by the neural network before it is pre- and/or post-processed by different methods prior to transmission.
Example Methods for Extracting Voice from Inbound or Outbound Analog Noisy Signal
[0602] Described herein are example techniques for the extraction of voice from a noisy signal, both inbound and outbound, where noise can be either stationary or non-stationary, using a neuromorphic analog Integrated Circuit. Such a circuit implements a noise suppression neural network at the hardware level. The circuit design of the analog neuromorphic Integrated Circuit is realized by converting (using techniques described above) a noise suppression (or voice extraction) neural network.
[0603] As described in the Background and Summary sections, extracting voice from a noisy signal is of great importance for communication in smartphones, smartwatches, notebooks, and other voice-transmitting devices. There are conventional realizations of noise cancellation or active noise suppression using a dual-microphone scheme, where the signal from one microphone is used to cancel noise at the main microphone. However, such two-microphone combinations do not cancel all noise, especially non-stationary noise. There are also filters that can remove stationary noise from inbound or outbound analog signals. There are also software realizations of neural networks that extract voice from a noisy signal by converting part of the signal using a Fourier transform and then attenuating components that are not similar to voice. These products are realized as software applications, which can be installed on smartphones or notebook computers and can effectively suppress noise coming from a microphone. However, such applications require high computational power and consequently consume more power. They also require powerful processors, which cannot be installed in earbuds or other miniature devices.
[0604] Described herein are techniques for voice extraction using a specially designed Integrated Circuit, realized from a trained neural network. The Integrated Circuit is realized as a hardware solution and is represented by a set of operational amplifiers and resistors, connected in such a way that the resulting neuromorphic hardware chip operates similarly to the initial neural network (e.g., the neural network realized in software), with an absolute error that does not exceed a maximum threshold (e.g., 1% absolute) relative to the error of the software neural network. The schematics of the Integrated Circuit are obtained using the techniques described above, thus ensuring full equivalence between the analog neuromorphic hardware realization of the neural network and its initial software neural network model. The analog Integrated Circuit may be used for voice extraction from noisy analog inbound or outbound signals, with low latency and low power consumption.
[0605] In some implementations, the hardware realization of a voice extraction neural network can be used to process both inbound and outbound noisy signals. In some implementations, the Integrated Circuit has a direct analog input and is placed adjacent to a microphone or speaker of a smartphone, smartwatch, earbuds, notebook computer, or similar device. The Integrated Circuit supports telecommunication voice transfer by extracting voice from noisy analog signals. Such a solution suppresses both stationary and non-stationary noise in inbound or outbound analog signals (e.g., signals from a microphone or signals directed to a speaker or earbuds), unlike the conventional methods described above, which do not cancel all non-stationary noise.
[0606] The resulting hardware realization of a voice extraction algorithm is characterized by low-power operation, low latency, and a small die area, which makes analog hardware realization an advantageous solution for noise reduction in smartphones, earbuds, notebook computers, tablets, and other voice-transmitting devices, compared with software neural network voice extraction algorithms. The small die area makes it possible to include the Integrated Circuit in true wireless stereo (TWS) earbuds or other miniature devices. Such analog Integrated Circuits may also be used for two-way voice extraction (noise reduction) in notebook PCs or smartphones, where a neuromorphic analog integrated circuit is installed both at the analog output of the microphone and at the analog input of the speaker or earbuds.
Example Neuromorphic Analog Integrated Circuit for Voice Clarity
[0607] Some implementations obtain a convolutional neural network with 1D convolutions (e.g., as described in “Single Channel Speech Enhancement Using A Convolutional Neural Network,” by T. Kounovsky and J. Malek, 2017), an example of which is shown in
[0608] The network architecture shown in
Example Transformation of Fully Connected (Dense) Layers
[0609]
Example Transformation of Conv1D Layers
[0610]
Example Transformation of MaxPooling Layers
[0611]
Human Activity Recognition Using Analog Neuromorphic Computing Hardware
[0617] Some implementations include two neural networks in a pipeline. Details on the data, model, training, validation, and main results for the two neural networks are described herein. In some implementations, a dataset contains labeled experiments with different types of activity (e.g., more than 50 types of activity), each of which corresponds to one of several basic activity classes. For example, the basic activity classes include ‘SLS’, which indicates a resting state; ‘Walk’, which indicates walking, including on a treadmill; ‘Run’, which indicates running, including on a treadmill; and ‘Intensive’, which indicates various high-load exercises from crossfit workouts. The two neural networks accomplish two tasks: creating a descriptor (the model converts a time series into a vector, i.e., an embedding) and classifying base classes of activities (the model matches the time series with one of four base classes). The descriptor task includes training and testing a model (sometimes called a descriptor) that receives fixed-length data windows from one or more sensors (e.g., an accelerometer) as input and outputs a vector. Embeddings of windows with the same activity class are likely to be close to each other, so they can be easily separated, and a simple classifier can be built on top of this model. The classification task includes training and testing the classifier for the base classes of activities.
[0618] A dataset may correspond to a set of experiments. In some implementations, the dataset includes input for one or more users. For example, one person could participate in several experiments. In some implementations, each person may participate in only one experiment. Each experiment contains information, such as the values of the signal from an accelerometer and activity marks. After processing the dataset, a sequence of windows (data sequences of a fixed length) is obtained. In some implementations, the sequence of windows contains data from 3 channels from the accelerometer, as well as an activity label for each window (or a mode of activity for each window). Training a descriptor model may include two stages: training a neural network model for the descriptor and examining the quality of resulting embeddings. For the first stage, the dataset is divided into two parts: training and validation. The first part is used to train the descriptor, and the second part is used to analyze the quality of these models, according to some implementations. For example,
[0619] In some implementations, special types of activities are selected: activities whose windows are excluded from the training dataset and used only in the validation process. To study the quality of the descriptor, five special activities were selected (crossfit_airbike, crossfit_row, crossfit_ski, crossfit_situp, and walk_treadmill), and the accelerometer data included patterns for these activities. Based on the selected special classes, the nine experiments with the largest number of windows containing these activities were selected. The resulting validation set contained each special activity class in at least two experiments. In the experiments, a window length of 125 samples and a step of 25 samples were chosen at a sampling frequency of 25 Hz (i.e., 5-second windows with a 1-second step).
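The windowing described above (window length 125, step 25, at 25 Hz) can be sketched as follows. Labeling each window with its most frequent per-sample activity label (the mode) is an assumption based on the “mode of activity for each window” wording earlier in this section. The function name and the dummy data are hypothetical.

```python
from collections import Counter


def make_windows(samples, labels, length=125, step=25):
    """Split a per-sample stream into fixed-length windows labeled with their most frequent label."""
    windows = []
    for start in range(0, len(samples) - length + 1, step):
        end = start + length
        label = Counter(labels[start:end]).most_common(1)[0][0]
        windows.append((samples[start:end], label))
    return windows


if __name__ == "__main__":
    fs = 25                                   # sampling frequency in Hz
    samples = [(0.0, 0.0, 9.8)] * 250         # 10 s of dummy 3-axis accelerometer samples
    labels = ["walk"] * 150 + ["run"] * 100
    windows = make_windows(samples, labels)
    print(len(windows), 125 / fs, 25 / fs)    # 6 5.0 1.0
    print([label for _, label in windows])    # ['walk', 'walk', 'walk', 'walk', 'run', 'run']
```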
Example Dataset for Descriptor Validation
[0620] The descriptor was trained on the previously selected training experiments, with the windows for the special classes removed. The validation process includes dividing the validation dataset into training and testing samples. Training samples are used to train classifiers (e.g., KNN), and testing samples are used to test the classifiers. To maintain the balance of classes between training and testing, the samples are divided as follows.
Example Dataset for Base Class Classifier Validation
[0621] In addition to testing descriptors, the quality of the basic activity classification model is also tested. For testing the classifiers, the same experiments may be used as for training the descriptor. However, these experiments are no longer labeled with types of activities but with basic classes of activities. The dataset contains four basic activity classes: SLS, Walk, Run, and Intensive. Thus, all 9 previously selected experiments participate in the study of the quality of the base class classifier. However, for a greater variety of activity types in the validation set, two more experiments with transport activities are added, including transport subway, transport escalator, transport bus, and transport tram. As a result, the training dataset consists of 99 experiments (from 64 users) with lengths from 37 minutes to 122 minutes (about 338,000 windows in total), and the validation dataset consists of 11 experiments (from 11 users) with lengths from 38 minutes to 89 minutes (about 43,000 windows in total).
Example Model
[0622] In some implementations, the model consists of three parts: an encoder, a decoder, and a classifier head. The process of training the model is divided into two stages: training the encoder and decoder (together sometimes called an autoencoder) and training the classifier. First, the autoencoder is trained; then the autoencoder is frozen, and the classifier, which receives the embedding from the encoder as input, is trained, according to some implementations.
Example Descriptor Model
[0623] As a first step, the autoencoder is trained. The autoencoder includes an encoder (e.g., the encoder 4200) and a decoder (e.g., the decoder 4222). In some implementations, the autoencoder includes one-dimensional convolutions (Conv1d in the encoder and ConvTranspose1d in the decoder), and BatchNorm is applied after each convolutional layer. An example kernel size in the Conv1d and ConvTranspose1d layers is 5. In some implementations, the first 4 convolutional layers have stride 1, and the remaining layers have stride 2. ReLU is used as the activation function after each layer. In some implementations, the output of the encoder is a vector of length 16. The result of training this model is an autoencoder that can be used as a descriptor. In some implementations, a window of accelerometer signals is input to the descriptor, and at the output an embedding (a vector of fixed length) is formed for the window, characterizing the activity that the user performed (similar activities have close vectors in the resulting vector space). In some implementations, each embedding is a vector of 16 values. Each of these 16 values encodes a feature of the user's movement; together, the 16 values represent a specific movement and form a digital fingerprint of that movement. To illustrate, suppose an example embedding includes the following values: −2.3534, −28.4428, 10.2809, −10.5756, −1.0527, −7.4458, 1.3814, −1.9068, 10.5118, 1.2902, 26.5022, 1.9261, −2.9055, 3.9552, −0.3831, and 13.4464. Each value represents a feature of a specific movement.
[0624] In some implementations, as shown in the example shown in
[0625] In some implementations, as shown in the example shown in
Example Base Class Classifier Model
[0626] Next, the classifier of the base classes is trained. The encoder trained in the first step (described above) forms an embedding, which is input to a multilayer perceptron, according to some implementations. Some implementations use linear layers with ReLU activation. In some implementations, the model outputs a vector of length 4 (e.g., a classification into 4 basic activity classes). If a window was marked as an unknown activity type, it received the label 1 and did not participate in training the model or in calculating metrics, because it cannot be said unambiguously which base class of activity took place in that window.
[0627] In some implementations, as shown in the example shown in
Example Training
[0628] In some experiments, the model was trained on an NVIDIA V100 GPU; RAM consumption during training reached 40 GB, and the average time per epoch was 2 minutes 15 seconds. Training was split into two stages: training the autoencoder (for the descriptor task) and training the classifier of the basic classes. Training consisted of 40 epochs for the descriptor and 40 epochs for the classifier, with a learning rate of 0.001; for the descriptor, a learning rate scheduler halved the learning rate every 10 epochs. The main metric used to assess model quality is the F2 score, which can be calculated using Equation (1) below. In Equation (1), P indicates precision, and R indicates recall.
f2_score(P,R)=5*P*R/(4*P+R) (1)
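Equation (1) is the F-beta score with β = 2, which weights recall more heavily than precision. The sketch below implements it together with the step learning-rate schedule described above (0.001, halved every 10 epochs); the function names are hypothetical. As a check, the walk_treadmill row of Table 1 (P = 0.979, R = 1.000) gives F2 ≈ 0.996, matching the table.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 reduces to Equation (1): 5*P*R / (4*P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def step_lr(epoch: int, base_lr: float = 0.001, step: int = 10, factor: float = 0.5) -> float:
    """Learning rate for a 0-based epoch: base_lr, halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)


if __name__ == "__main__":
    print(round(f_beta(0.979, 1.000), 3))          # 0.996
    print([step_lr(e) for e in (0, 10, 20, 30)])   # rates for the four 10-epoch blocks
```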
[0629] For the descriptor model, some implementations use the MSE loss (an example of which is shown in Equation (2) below) as the loss function. In Equation (2), T is the true vector, P is the predicted vector, and N is the vector length.
mse_loss(T,P)=(1/N)*Σ_{i=1..N}(T_i-P_i)^2 (2)
[0630] For the base class classifier model, some implementations use CrossEntropyLoss (an example of which is shown in Equation (3) below) as the loss function. In this formula, P is the predicted vector of length C, C is the number of classes, and class is the index of the true class.
cross_entropy_loss(P,class)=-P[class]+log(Σ_{j=1..C}exp(P[j])) (3)
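Both loss functions can be sketched in plain Python. The cross-entropy follows the form documented for PyTorch's CrossEntropyLoss applied to raw (unnormalized) scores, −P[class] + log Σ_j exp(P[j]), computed with the usual max-shift for numerical stability. The function names are hypothetical.

```python
import math


def mse_loss(t, p):
    """Mean squared error between the true vector t and the predicted vector p."""
    return sum((ti - pi) ** 2 for ti, pi in zip(t, p)) / len(t)


def cross_entropy_loss(p, cls):
    """-p[cls] + log(sum_j exp(p[j])) for raw scores p and true class index cls."""
    m = max(p)  # subtracting the max avoids overflow in exp()
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in p))
    return log_sum_exp - p[cls]


if __name__ == "__main__":
    print(mse_loss([1.0, 2.0], [1.0, 4.0]))         # 2.0
    print(cross_entropy_loss([0.0, 0.0, 0.0, 0.0], 1))  # log(4), about 1.386
```

With four equal scores, the predicted distribution is uniform, so the loss equals log 4 regardless of which base class is the true one.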
Example Results
Descriptor Model
[0631] Using a model trained on the training sample, embeddings were obtained for all windows of the validation dataset. The embeddings were then divided into training and testing sets, using the algorithm described above, to train and test a simple classifier of activity types. KNN (K-Nearest Neighbors) with 5 neighbors was chosen as the classifier. This classifier was trained separately for each special type of activity, i.e., a binary classification was performed. The predictions from the binary classifier are smoothed using a median filter with a window of 15 predictions (equivalent to 15 seconds). The metric values of this classifier on the training and testing samples are presented below in Tables 1 and 2, respectively.
TABLE 1 Training scores for KNN model, trained on embeddings
Activity           Accuracy   F1      F2      Precision   Recall
crossfit_airbike   0.984      0.922   0.903   0.961       0.892
crossfit_row       0.996      0.943   0.922   0.983       0.909
crossfit_situp     0.997      0.947   0.918   1.000       0.901
crossfit_ski       0.993      0.868   0.838   0.945       0.822
walk_treadmill     0.997      0.989   0.996   0.979       1.000
TABLE 2 Test scores for KNN model, trained on embeddings
Activity           Accuracy   F1      F2      Precision   Recall
crossfit_airbike   0.984      0.895   0.935   0.839       0.965
crossfit_row       0.987      0.876   0.900   0.849       0.921
crossfit_situp     0.991      0.789   0.727   0.987       0.696
crossfit_ski       0.989      0.878   0.862   0.908       0.852
walk_treadmill     0.989      0.928   0.907   0.963       0.894
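The per-activity evaluation can be sketched as a binary KNN classifier (k = 5) on embeddings, followed by a 15-prediction median filter. This is a plain-Python illustration, not the code used for the experiments. Euclidean distance, simple majority voting, and windows truncated at the sequence edges (taking the lower median) are choices made here, and the function names are hypothetical.

```python
import math
import statistics


def knn_predict(train_x, train_y, x, k=5):
    """Binary KNN: majority vote (labels 0/1) among the k nearest embeddings (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def median_smooth(preds, window=15):
    """Median filter over a centered window; windows are truncated at the sequence edges."""
    half = window // 2
    return [statistics.median_low(preds[max(0, i - half): i + half + 1])
            for i in range(len(preds))]


if __name__ == "__main__":
    train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
    train_y = [0, 0, 0, 1, 1, 1]
    print(knn_predict(train_x, train_y, (5.0, 5.0)))  # 1
    raw = [0] * 10 + [1] + [0] * 10                    # an isolated false positive
    print(median_smooth(raw))                          # every element is 0
```

Because each prediction covers a 1-second step, the 15-prediction median window suppresses isolated detections shorter than about 8 seconds.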
[0632]
[0633]
[0634]
Example Results for Base Class Classifier Model
[0635] In some experiments, at the end of training, the loss value settled at 0.6993 on the training dataset and at 0.7253 on the validation dataset.
TABLE 3 Mean base class metrics across valid experiments
Class       Accuracy   F1      F2      Precision   Recall
SLS         0.701      0.638   0.676   0.656       0.744
Walk        0.794      0.406   0.478   0.394       0.606
Run         0.998      0.982   0.988   0.972       0.993
Intensive   0.733      0.692   0.604   0.936       0.557
[0636]
[0637]
[0638]
[0639]
[0640]
Example Human Activity Recognition Device
[0641]
[0642] In some implementations, the human activity recognition device 5000 further includes the one or more sensors (e.g., the internal sensors 5008) configured to collect the plurality of electrical signals during the human activity.
[0643] In some implementations, the trained neural network model (implemented by the integrated circuit 5002) is an autoencoder that includes an encoder (e.g., the encoder 4200) and a decoder (e.g., the decoder 4222).
[0644] In some implementations, the one or more digital components 5004 implement a trained machine learning classifier that is a KNN (K-Nearest Neighbors) classifier (e.g., the classifier 4244), which can be retrained. In some implementations, the number of neighbors for the KNN classifier equals five. In some implementations, the trained machine learning classifier is trained separately for each of the plurality of predefined human activities using binary classification.
[0645] In some implementations, the one or more digital components 5004 are further configured to smooth the output of the trained machine learning classifier to obtain a basic class of activity.
[0646] In some implementations, the one or more sensors include one or more of IMUs, cameras, microphones, and biofeedback devices.
[0647] In some implementations, the integrated circuit 5002 is fabricated by steps comprising: obtaining a neural network topology and weights of the trained neural network model; transforming the neural network topology into an equivalent analog network of analog components; computing a weight matrix for the equivalent analog network based on the weights of the trained neural network model (each element of the weight matrix represents one or more connections between analog components of the equivalent analog network); generating a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components; and fabricating the integrated circuit, according to the schematic model, using a lithographic process. Examples of related neural network topologies, trained neural networks, analog networks, conversion methods, and fabrication steps are shown and described above in reference to
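The step of selecting component values can be illustrated with a simplified sketch. It assumes, hypothetically, that each neuron is an ideal inverting summing amplifier, for which V_out = -(sum over inputs of (R_f / R_i) * V_i), so a connection weight w maps to an input resistance R_i = R_f / |w|. The feedback resistance, the manufacturable resistance range, the pruning rule, and the helper names are illustrative assumptions rather than values or functions from the disclosure; how weight polarity is wired (for example, with differential input pairs) is left open.

```python
# Hypothetical design parameters (not from the disclosure).
R_FEEDBACK = 100e3          # feedback resistor R_f, in ohms
R_MIN, R_MAX = 1e3, 10e6    # assumed manufacturable resistor range, in ohms


def weight_to_resistor(w, r_f=R_FEEDBACK, r_min=R_MIN, r_max=R_MAX):
    """Hypothetical mapping of one weight to (resistance, polarity) for an ideal inverting summer.

    Gain magnitude |w| = r_f / r_i, so r_i = r_f / |w|. Returns None when the weight
    is zero or too small to realize within r_max (the connection is pruned).
    Very large weights are clipped to r_min, which caps the achievable gain.
    """
    if w == 0:
        return None
    r = r_f / abs(w)
    if r > r_max:
        return None
    return max(r, r_min), (1 if w > 0 else -1)


def weight_matrix_to_resistors(weights):
    """Apply weight_to_resistor element-wise to a weight matrix given as a list of rows."""
    return [[weight_to_resistor(w) for w in row] for row in weights]
```

Pruning connections whose resistance would exceed the available range is one way to obtain the sparse, "only necessary connections" layout mentioned in this disclosure; the threshold here is arbitrary.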
Example Methods for Recognizing Human Activities
[0649] The method also includes forming (5108) a feature vector by extracting a plurality of features from the plurality of electrical signals. The features correspond to inputs for a neural network model trained to generate a plurality of descriptors for a plurality of predefined human activities. For example, raw electrical signals from the sensors may be input to a deserializer and sampled (e.g., at 20 to 25 Hz) to extract features from the electrical signals. The processed signals may be scaled and/or shifted. An accelerometer generates digital signals, which are converted to analog signals before the transformations are applied. Some implementations select only a single axis or a pair of axes of acceleration data when extracting features for forming the feature vectors. Some implementations normalize acceleration data and/or create segments from the acceleration data (the sample size may define the number of segments per feature vector). In some implementations, the feature vectors are formed from raw analog signals using a deserializer and a sample-and-hold array, where the number of sample-and-hold elements equals the number of inputs of the analog neural network.
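The normalization and segmentation steps can be sketched for a single acceleration axis as follows. This hypothetical sketch assumes z-score normalization and, by default, non-overlapping windows whose length equals the number of inputs of the analog network; the function names and the normalization choice are illustrative assumptions, not the disclosed procedure.

```python
from statistics import mean, pstdev


def normalize(signal):
    """Hypothetical z-score normalization of one acceleration axis (zero mean, unit variance)."""
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)
    return [(s - mu) / sigma for s in signal]


def make_feature_vectors(signal, n_inputs, step=None):
    """Cut the normalized signal into windows of n_inputs samples.

    n_inputs stands in for the number of inputs of the analog network; step
    defaults to n_inputs (non-overlapping windows). Trailing samples that do
    not fill a whole window are dropped.
    """
    step = step or n_inputs
    x = normalize(signal)
    return [x[i:i + n_inputs] for i in range(0, len(x) - n_inputs + 1, step)]
```

For example, sampling at 25 Hz with a hypothetical 2-second window would give `n_inputs = 50`.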
[0650] The method also includes applying (5110) an analog neurocomputing hardware device to the feature vector (in other words, inputting the feature vector to the analog neurocomputing hardware) to generate an embedding vector that specifies a descriptor. The analog neurocomputing hardware device implements the trained neural network model. In some implementations, the trained neural network model is an autoencoder, which includes an encoder and a decoder. An autoencoder is also suitable for keyword spotting, and the exact keywords can be defined after the network is trained. Images require 2D convolutions, whereas sound can be processed with 1D convolutions, which resemble the example autoencoder architecture described above in reference to
[0651] The method also includes applying (5112) a trained machine learning classifier to the embedding vector to classify the activity of the user as one of the predefined human activities. In some implementations, the trained machine learning classifier is a KNN (K-Nearest Neighbors) classifier. In some implementations, the number of neighbors for the KNN classifier equals five. In some implementations, the trained machine learning classifier is trained separately for each of the predefined human activities using binary classification. Some implementations also smooth the output of the trained machine learning classifier to obtain a basic class of activity. Some implementations perform post-processing of the output of the human activity recognition device in an I/O device 5016, which may be a consumer device, such as a smart watch. Smoothing the activity labels may include averaging the predictions within a time range instead of taking just one prediction at each time step, or using some other, more complicated formula. Some implementations perform a prediction every second. A user may not need to know the activity every second. For example, the user may want to know that information during a specific time of day or in certain time periods (e.g., from 1:00 to 1:45, when the user is typically running). Smoothing may be used for that purpose, according to some implementations.
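Smoothing by averaging predictions over a time range might look like the following hypothetical sketch. It assumes the classifier emits one dictionary of per-activity scores per prediction, tagged with a timestamp in seconds; the averaging rule and the name `smooth_over_period` are illustrative, and other formulas are equally possible.

```python
from collections import defaultdict


def smooth_over_period(score_history, start, end):
    """Hypothetical smoothing: average per-activity scores for predictions with start <= t < end.

    score_history: iterable of (t_seconds, {activity: score}) pairs.
    Returns the activity with the highest average score, or None if no
    prediction falls in the period.
    """
    totals = defaultdict(float)
    count = 0
    for t, scores in score_history:
        if start <= t < end:
            for activity, score in scores.items():
                totals[activity] += score
            count += 1
    if count == 0:
        return None
    averages = {activity: total / count for activity, total in totals.items()}
    return max(averages, key=averages.get)
```

For the 1:00 to 1:45 example, `start` and `end` would be those clock times expressed in the same units as the timestamps; a single misclassified second inside the period then has little effect on the reported activity.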
[0652] In some implementations, the analog neurocomputing hardware device is fabricated by steps including: obtaining a neural network topology and weights of the trained neural network model; transforming the neural network topology into an equivalent analog network of analog components; computing a weight matrix for the equivalent analog network based on the weights of the trained neural network model (each element of the weight matrix represents one or more connections between analog components of the equivalent analog network); generating a schematic model for implementing the equivalent analog network based on the weight matrix, including selecting component values for the analog components; and fabricating an integrated circuit, according to the schematic model, using a lithographic process. Examples of these steps are described above in reference to
[0654] The method also includes using (5212) the plurality of embedding vectors to classify the activity of the user as one of the predefined human activities. In some implementations, the method further includes: receiving (5214), from the user, a set of descriptors that describes specific physical activities; and using the set of descriptors and the plurality of embedding vectors to classify the activity of the user as one of the specific physical activities. In some implementations, the method further includes generating statistics of personal daily routines of the user based on classifying the activity of the user as one of the specific physical activities.
[0655] In some implementations, the method further includes: storing (5216), for the user, the plurality of embedding vectors as describing a specific activity; and using the plurality of embedding vectors for classifying subsequent activities of the user as the specific activity.
[0656] In some implementations, the method further includes: receiving (5218), from a trainer (not the user), a set of descriptors that describes a specific activity; and providing feedback to the user if the activity matches the specific activity based on the plurality of embedding vectors and the set of descriptors.
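Storing a user's embeddings as the description of a specific activity, and later matching new embeddings against stored or trainer-provided descriptors, could be sketched as nearest-centroid matching with cosine similarity. This is an assumption made for illustration: the disclosure does not specify the similarity measure, and the class name `ActivityTemplates` and the 0.9 threshold are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class ActivityTemplates:
    """Hypothetical per-user store of activity descriptors."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical minimum cosine similarity for a match
        self.templates = {}

    def enroll(self, name, embeddings):
        """Store the mean of a user's (or trainer's) embeddings as the descriptor of an activity."""
        self.templates[name] = centroid(embeddings)

    def classify(self, embedding):
        """Return the best-matching stored activity, or None if none is similar enough."""
        best, best_sim = None, -1.0
        for name, template in self.templates.items():
            sim = cosine(embedding, template)
            if sim > best_sim:
                best, best_sim = name, sim
        return best if best_sim >= self.threshold else None

    def matches(self, name, embedding):
        """Feedback check: does this embedding match the stored descriptor for `name`?"""
        return cosine(embedding, self.templates[name]) >= self.threshold
```

A trainer's descriptors could be enrolled in the same way, and `matches` then used to decide whether to give the user positive or corrective feedback.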
Neuromorphic Analog Signal Processor for Predictive Maintenance of Machines
[0657] Sensors that monitor rotating or reciprocating parts of machines typically generate signals in the range of 0 to 20 kHz, which requires a wideband connection to a control unit. As a result, the sensors consume a lot of power, which makes battery operation almost impossible. An apparatus based on analog neuromorphic hardware is suitable for this application. The apparatus provides embeddings (descriptors) from the analog signal, which substantially reduces the signal bandwidth. Embeddings fully characterize the rotational or reciprocating movements of machine parts and distinguish between normal movements and abnormalities.
[0658] Systems and devices according to the techniques described herein may be used to receive raw data signals and add intelligence to various sensors. In some implementations, an architecture contains artificial neurons (the nodes performing computations) and axons (the connections with weights between the nodes) implemented using circuitry elements. The neurons may be implemented as operational amplifiers, and axons may be implemented using thin-film resistors. Some implementations embody the approach of a sparse neural network with only the necessary connections between neurons required for inference. In contrast to in-memory designs, where each neuron is connected to each neighboring neuron, the devices according to the techniques described here simplify chip layout. The techniques work well for Convolutional Neural Networks (CNN), where connections are very sparse, as well as Recurrent Neural Networks (RNNs), transformers, and autoencoders. Techniques for converting a trained and optimized neural network model into a chip structure are described above. The chip architecture offers area utilization close to 100% and may use 8 bits per weight of a trained neural network. Such techniques enable faster time to market, lower technical risks, and better performance. Furthermore, processors according to the techniques described herein may use hybrid cores (using analog and digital hardware) similar to human brain data processing.
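To illustrate what 8 bits per weight means in practice, the following sketch applies symmetric uniform quantization to a list of weights. The scheme (one scale per weight list, signed integer codes in [-127, 127]) and the function names are assumptions for illustration; the disclosure does not state how weights are quantized.

```python
def quantize_weights(weights, bits=8):
    """Hypothetical symmetric uniform quantization: each weight w is approximated by code * scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    largest = max(abs(w) for w in weights)
    if largest == 0:
        return [0] * len(weights), 1.0
    scale = largest / qmax
    # Clamp guards against floating-point rounding pushing a code just past qmax.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from integer codes."""
    return [c * scale for c in codes]
```

Under this scheme, each recovered weight differs from the original by at most half a quantization step (`scale / 2`).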
[0659] Some implementations combine a fixed-weights method, which completely separates inference from training, and a fixed chip structure, similar to the human visual nerve and retina, with a flexible function that differs depending on the application. The flexible function is responsible for further classification of the received embeddings.
[0660] In machine learning, it is well known that, after several hundred training cycles (also known as epochs), the weights and structure of the first 80-90% of the layers of a deep convolutional neural network remain essentially fixed, and in subsequent cycles only the last few layers, which are responsible for classification, continue to change their weights. This property is also exploited by transfer learning techniques, and it is used here for implementing hybrid analog/digital hardware. In this hybrid approach, a fixed neural network is responsible for pattern detection (e.g., for generating embeddings), and it is combined with a flexible algorithm responsible for pattern interpretation. The flexible algorithm can be implemented in digital or analog hardware and may include additional flexible neural networks. In some implementations, the hybrid core includes (i) a fixed neuromorphic analog core that has ultra-low power consumption and low latency, for generating embeddings, and (ii) a flexible digital core for final classification.
[0661] Embeddings are representations that contain densely packed information about sensory input and are formed by a neural network or a biological nervous system. Embeddings are formed in the hidden layers of a neural network and contain the most significant information about the input data. Embeddings may be used as input data for further efficient processing, such as data classification and interpretation.
[0662] A classification task may be accomplished in two stages. In the first stage (i), a neural network $G$ is applied to an input vector $x \in \mathbb{R}^N$ to generate an embedding $v$, where $v$ is a vector with $M$ dimensions and $M < N$. Applying the neural network $G$ uses a set of trainable parameters $W_G$. This can be expressed concisely as $v = G(x, W_G)$, where $x$ is input data, $x \in \mathbb{R}^N$, $G$ is a neural network for building embeddings, $W_G$ are trainable parameters of $G$, $v$ is an embedding, $v \in \mathbb{R}^M$, $\mathbb{R}$ represents the set of real numbers, and $N$ and $M$ are positive integers.
[0663] In the second stage (ii), a classifier $C$ is applied to the embedding $v$ to compute an output result $y$, which has one or more components. Applying the classifier $C$ uses trainable parameters $W_C$. This can be expressed concisely as $y = C(v, W_C)$, where $C$ is the final classifier, $W_C$ are trainable parameters of $C$, and $y$ is a classification result for input vector $x$.
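The two-stage formulation v = G(x, W_G), y = C(v, W_C) can be sketched in plain Python. In this hypothetical example, G is a single fixed dense layer with tanh activation mapping R^N to R^M (here N = 4 and M = 2), standing in for the fixed analog part, and C is a nearest-centroid classifier whose parameters W_C (one centroid per class) are the only values that are trained, standing in for the flexible digital part. Both choices, and the helper `train_C`, are assumptions for illustration; the names G and C follow the notation above.

```python
import math


def G(x, W_G):
    """Fixed embedding network (hypothetical): one dense tanh layer, R^N -> R^M with M < N."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_G]


def train_C(embeddings, labels):
    """Fit the flexible part (hypothetical): W_C holds one centroid (mean embedding) per class."""
    sums, counts = {}, {}
    for v, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, vi in enumerate(v):
            acc[i] += vi
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def C(v, W_C):
    """Classify an embedding as the class whose centroid is nearest (Euclidean)."""
    return min(W_C, key=lambda y: math.dist(v, W_C[y]))


# Usage: W_G stays fixed; only W_C is learned from labelled embeddings.
W_G = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # N = 4 -> M = 2
X = [[1, 1, 0, 0], [0.9, 1.1, 0, 0.1], [0, 0, 1, 1], [0.1, 0, 1, 0.9]]
Y = ["a", "a", "b", "b"]
W_C = train_C([G(x, W_G) for x in X], Y)
```

Retraining only `train_C` on new labelled embeddings changes the classifier without touching W_G, which mirrors the fixed/flexible split described above.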
[0664] As described above, 80-90% of neural network weights do not change after several epochs of training, which makes them amenable to so-called transfer learning. These weights can form part of a new neural network with fixed weights (e.g., weights corresponding to 80% of the original neural network). The rest of the neural network (e.g., the remaining 20% of the layers) can be trained separately and may be implemented using a flexible hardware system (without fixed weights). This combines fixed parts of a neural network, realized as resistors in a neuromorphic analog signal processor, with flexible parts of the neural network realized in digital circuitry (e.g., in the MCU of the device, in a RISC-V processor, in an FPGA, or in a CPU), which may be integrated with the analog part (e.g., on the same chip as the neuromorphic analog signal processor). The flexible part of a network may be implemented using standard technologies, may be developed using programming languages such as Python, C/C++, and assembly language together with specialized frameworks (e.g., TensorFlow or Torch), and may run on conventional digital computing units, such as CPUs, GPUs, RISCs, FPGAs, or similar devices, depending on the target device and/or application. The digital computing units may also act as a digital controller, providing signals to interfaces and multiplexing power signals within the neuromorphic analog signal processor. The flexible part may be implemented using compute-in-memory and programmable memory tiles (e.g., SST SuperFlash memory), memristors, or other types of programmable memory. The flexible part may also be implemented as a classification neural network or classification algorithm running on a CPU of the system or on external computing resources, since classification is relatively simple and not resource intensive. Because the fixed part is computationally intensive and the flexible part is not, the two parts of a neural network may be distributed logically across different hardware.
[0665] As described above in the Background section, vibration sensors are commonly used to measure vibrations in machinery, tracks, railway cars, wind turbines, and oil and gas pumps. The signals due to the vibrations may be transferred wirelessly to analytic equipment. Typically, these signals include a large amount of data, and the resulting data flow may shorten the battery life of operating sensor nodes. Analog neuromorphic hardware according to the disclosed technology helps reduce the data flow from vibration sensors (in some cases by up to 99.9%).
[0666] Some implementations use an encoder-decoder approach and transmit the embeddings extracted from the initial data via long-range, low-power technology (e.g., LoRa or a similar low-power technology). Autoencoder systems and their embeddings may be used to create new classes describing new vibration-sensor signals, even if the autoencoder was not trained to recognize these types of signal patterns.
[0667] Analog neuromorphic hardware based on the disclosed technology may be used to implement an encoder neural network that receives a range of different vibration signals from various vibration sensors. The output of the analog hardware may be analyzed by a digital system to recognize machine malfunctions. Some implementations detect a predetermined range of frequencies (e.g., up to 20 kHz or up to 60 kHz) and may use an extended sampling rate (e.g., up to 41 kHz). This allows early failure prediction (e.g., predictions up to one month or more in advance) for rotating or moving parts of machinery. The use of embeddings reduces the amount of data sent to a cloud infrastructure and addresses the fundamental problem of the low bandwidth available to internet-of-things (IoT) systems.
[0668] Compressed sensing is one of the technologies used for condition classification of rolling element bearings in rotating machines. This technology allows sampling below the Nyquist sampling rate. It is one of several methods for fault detection and classification in the compressed domain (i.e., without reconstructing the original signal). Compressed sensing may be used for intelligent condition monitoring for bearing faults from highly compressed measurements using sparse over-complete features. Compressed sensing may be used to produce highly compressed measurements of an original bearing dataset. Then a deep neural network (DNN) with an unsupervised feature learning algorithm based on a sparse autoencoder may be used for learning overcomplete sparse representations of these compressed datasets. Finally, the fault classification may be achieved using a softmax regression layer. An autoencoder may be viewed as a source of embeddings that are subsequently processed by a classifier (a softmax regression layer).
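The measurement step of compressed sensing can be sketched as a linear projection y = Phi x of an N-sample signal onto M < N random directions. This hypothetical sketch uses a Gaussian random measurement matrix, a common choice in the compressed sensing literature; the subsequent sparse-autoencoder feature learning and softmax classification are not shown, and the sizes and function names are illustrative assumptions.

```python
import random


def measurement_matrix(m, n, seed=0):
    """Hypothetical Gaussian measurement matrix Phi (m x n) with entries drawn from N(0, 1/m)."""
    rng = random.Random(seed)
    std = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(m)]


def compress(x, phi):
    """Compressed measurements y = Phi x: m numbers instead of n samples."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]
```

With `measurement_matrix(16, 256)`, each 256-sample segment is represented by 16 measurements, a 16-fold reduction; whether fault classes remain separable at a given ratio depends on the signal and is not guaranteed.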
[0669] Compressing the raw data received from sensors enables transmission of a smaller amount of data than uncompressed signals would require. The reduced data can be transmitted over low-throughput lines, and redundancy or error correction can be added for noisy lines. Additionally, the reduced data may be easily processed by a receiver using digital processing techniques. These techniques save energy, which makes it possible to extend battery life and/or to use battery-less energy harvesting. Neural networks can be used to extract only the relevant features of a large data set. In this way, neural networks may provide higher-quality signal pre-processing than conventional algorithmic data compression.
[0671] A ResNet Convolutional Neural Network (CNN) may be used for fault analysis, as shown in
[0672] To effectively detect, locate, and identify faults in rolling bearings, a stacked noise reduction (denoising) autoencoder may be used to extract features from the original vibration signals; these features may then be provided as input to a backpropagation (BP) network classifier, as described in Y. Gu, et al., A Denoising Autoencoder-Based Bearing Fault Diagnosis System for Time-Domain Vibration Signals, 2021. The outputs of this classifier represent different fault categories.
[0674] As described above, analog neuromorphic hardware may be used for predictive maintenance of rotating and reciprocating parts of machines. The hardware may be used to generate embeddings (descriptors) from an analog signal with a 20 kHz bandwidth; each embedding may be packed into 1000 bits and then transferred over a low-bandwidth channel (e.g., LoRa or another LPWAN (Low-Power Wide-Area Network) technology).
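The bandwidth reduction can be estimated with simple arithmetic. Assuming, hypothetically, that the 20 kHz signal is digitized at 41 kHz (above the Nyquist rate of 40 kHz) with 16 bits per sample, and that one 1000-bit embedding is sent per second, the transmitted data rate drops by roughly 99.85%. The sample width, the embedding rate, and the helper names are assumptions, not values from the disclosure.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sampling rate that avoids aliasing for a baseband signal."""
    return 2 * bandwidth_hz


def raw_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of the digitized raw signal."""
    return sample_rate_hz * bits_per_sample


def reduction(raw_bps, embedding_bits, embeddings_per_second):
    """Fraction of the raw bit rate removed by sending embeddings instead."""
    return 1.0 - embedding_bits * embeddings_per_second / raw_bps


bandwidth = 20_000          # Hz, from the text
fs = 41_000                 # Hz, assumed sampling rate
assert fs > nyquist_rate(bandwidth)
raw = raw_bit_rate(fs, 16)  # assumed 16-bit samples: 656,000 bit/s
print(f"{reduction(raw, 1000, 1):.4%}")  # assumed one 1000-bit embedding per second; prints 99.8476%
```

Sending embeddings less often than once per second would push the reduction closer to the 99.9% figure mentioned in this disclosure.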
[0675] Some implementations generate embeddings separately, using analog neuromorphic hardware with fixed weights, according to the techniques described above. The resulting signal may subsequently be analyzed in the cloud or in an edge cloud. The cloud infrastructure may use a neural network (or layers of a neural network) with flexible weights (e.g., the neural network may be retrained). In this way, analog neuromorphic hardware with fixed weights can be applied to predictive maintenance tasks.
[0676] Some implementations train a neural network (or a portion thereof) so that the trained neural network can be implemented using analog neuromorphic hardware according to the techniques described above and generates descriptors for vibration signals from rotating or reciprocating parts, which makes it possible to distinguish abnormal operation of machines from normal operation.
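One simple way to turn descriptors into a normal/abnormal decision is to measure how far each new embedding lies from the centroid of embeddings recorded during normal operation. The sketch below is a hypothetical illustration of that idea, using a threshold of the mean plus three standard deviations of the normal distances; the disclosure does not prescribe this rule, and the function names are illustrative.

```python
import math
from statistics import mean, pstdev


def fit_normal_profile(normal_embeddings, k=3.0):
    """Hypothetical profile: centroid of normal embeddings and a distance threshold (mean + k * std)."""
    dim = len(normal_embeddings[0])
    center = [mean(v[i] for v in normal_embeddings) for i in range(dim)]
    distances = [math.dist(v, center) for v in normal_embeddings]
    threshold = mean(distances) + k * pstdev(distances)
    return center, threshold


def is_abnormal(embedding, center, threshold):
    """Flag an embedding whose distance from the normal centroid exceeds the threshold."""
    return math.dist(embedding, center) > threshold
```

In a deployment of the kind described above, the profile would be fitted on embeddings received from a healthy machine, and new embeddings arriving over the low-bandwidth link would be checked with `is_abnormal`.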
[0677] Some implementations use a hybrid core including analog neuromorphic hardware according to the techniques described above. The analog neuromorphic hardware implements the fixed part, in a chip located close to the sensors. Because of its extremely low power consumption, the chip can run on battery power. Hardware and/or software for analysis may be implemented in the cloud, after the output of the analog neuromorphic hardware is transmitted over a low-bandwidth wireless connection, such as LoRa.
[0678] In some implementations, an apparatus or a system that includes the hybrid core may be used for monitoring bearings or any other rotating parts. Bearings and other rotating parts (or any movable part) may generate vibrations caused by rotation during operation. The movable part may include any rotating or reciprocating part, a transverse moving part, a feed mechanism, or an auxiliary part of a machine. Periodicity in the vibrations may be used for predicting machine maintenance problems. Some implementations use a vibration wireless sensor node, which is a device that measures the amplitude and frequency of vibration in equipment. Such nodes may be installed on or near parts of a machine and may consist of a vibration sensor (with analog or digital output), an analog-to-digital converter (ADC), a communication module, and a battery. Such devices may operate on a schedule, but their operating time may be limited by the high power consumption of a wideband communication module, which may transmit a large amount of raw data to analytic equipment for analysis and interpretation. Instead of, or in addition to, such hardware, analog neuromorphic hardware, or a hybrid core containing such analog hardware, may be used as part of a sensor node that receives raw data from a sensor. Such hardware may be used to pre-process raw signals and to extract embeddings. The pre-processed data may be up to 99.9% smaller than the original input data. LPWAN technology such as LoRa can then be used instead of traditional wideband communication technology (for example, Wi-Fi or Bluetooth). At the onset of any deviation from a normal working mode, the vibration pattern of a bearing or other rotating part of the equipment typically changes. After sufficient information has been collected, the signals may be analyzed to determine the source of the vibration, make failure predictions, and suggest appropriate maintenance.
The use of ultra-low-power chips greatly extends battery life. In addition, energy harvesting technology may be used to collect power from the vibration source. In this way, the techniques described here may be used to collect and transmit vibration data much more frequently, making earlier and more precise analytical predictions possible.
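For comparison, the amplitude and frequency that a conventional vibration sensor node reports can be computed digitally as below. This is a hypothetical baseline using a naive discrete Fourier transform in plain Python (quadratic in the signal length, so suitable only for short illustrative signals); it is not the analog neuromorphic processing described above, and the function names are illustrative.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest non-DC DFT bin, computed with a naive DFT."""
    n = len(signal)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):  # bins up to the Nyquist frequency, skipping DC
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n


def rms(signal):
    """Root-mean-square amplitude of the signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))
```

A node that streams raw samples for this kind of analysis must transmit every sample, which is the bandwidth cost that on-sensor embedding extraction aims to avoid.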
[0679] In some implementations, hybrid hardware may be used for monitoring vibration in tires. A smart tire sensor node is a device that includes a vibration sensor (with analog or digital output), a neuromorphic analog signal processor, a LoRa communication module (or other LPWAN module), and a battery with or without an energy harvesting module. This device may be disposed in or on a tire and may collect vibration data while a vehicle is moving. Initial data arriving at the neuromorphic analog signal processor (chip) may be pre-processed and/or transformed into embeddings, which reduces data volume by as much as 99.9%. The reduced data may be transmitted by an LPWAN module to a gateway and then to an application (e.g., an application hosted in a cloud) for further analytics and interpretation. The embeddings output by such hardware may be used to predict road surface and physical conditions, as well as to monitor tire and suspension condition or to estimate a time-to-failure of the vibration source (or a corresponding machine).
[0681] In some implementations, the hardware apparatus 5620 further includes a transceiver 5606 coupled to the analog circuit 5602 and configured to receive the output 5616 from the analog circuit 5602 and transmit the output 5616 over a low power wide area network (LPWAN) 5618.
[0682] In some implementations, the vibration sensor 5608 is disposed adjacent to a movable part (e.g., a rotating or reciprocating part) of a machine 5610 (sometimes called a vibration source), and the vibration sensor 5608 is configured to collect vibration signals 5612 from the rotating or reciprocating part. In some implementations, the rotating part includes a ball bearing of the machine.
[0683] In some implementations, the output 5616 of the analog circuit 5602 represents embeddings used for at least one of: defining a source of vibration, predicting failures of a machine coupled to the vibration source, or generating suggestions for maintenance of the machine.
[0684] In some implementations, the vibration sensor 5608 is disposed in or on a tire and is configured to collect vibration data for the tire, which is an example of a vibration source.
[0685] In some implementations, the output 5616 represents embeddings used to predict at least one of: a road surface, a physical condition, a tire condition, a suspension condition, or a time-to-failure of the vibration source.
[0686] In some implementations, the vibration sensor 5608 is configured to sample signals in a range of 0 to 20 kilohertz (kHz).
[0687] In some implementations, the vibration sensor 5608 is configured to sample signals up to 41 kHz.
[0688] In some implementations, the vibration sensor 5608 is configured to sample signals below a Nyquist sampling rate for fault detection and classification in a compressed domain.
[0689] In some implementations, the vibration sensor 5608 is configured to sample signals for compressed sensing (CS) for condition classification of rolling element bearings in rotating machines.
[0690] In some implementations, the trained neural network comprises a plurality of layers of neurons including a first set of layers (e.g., all the layers in the ResNet CNN shown in
[0691] In some implementations, the trained neural network includes a deep neural network for unsupervised learning based on a sparse autoencoder (e.g., the example sparse autoencoder shown and described above in reference to
[0692] In some implementations, the trained neural network comprises a ResNet Convolutional Neural Network (CNN) with global average pooling (GAP) for feature learning and fault diagnosis of rolling bearings.
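The two ingredients named above, residual (shortcut) connections and global average pooling, can be sketched on a single-channel 1-D signal. This is a deliberately reduced, hypothetical illustration: one channel, one fixed kernel, zero padding, and ReLU, with illustrative function names. A practical ResNet for bearing fault diagnosis would stack several convolutions per block with many learned multi-channel kernels.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding so the output has the same length as x."""
    k = len(kernel)
    left = k // 2
    padded = [0.0] * left + list(x) + [0.0] * (k - 1 - left)
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def residual_block(x, kernel):
    """Simplified residual block: ReLU(conv(x) + x), with an identity shortcut."""
    return [max(0.0, c + xi) for c, xi in zip(conv1d_same(x, kernel), x)]


def global_average_pool(channels):
    """Global average pooling: one mean value per channel over the time axis."""
    return [sum(ch) / len(ch) for ch in channels]
```

Global average pooling makes the feature vector length depend only on the number of channels, not on the length of the vibration segment, which is one reason it is paired with convolutional feature extractors.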
[0693] In some implementations, the vibration sensor 5608 is configured to sample and output a one-dimensional time domain signal of a rolling bearing fault signal.
[0694] In some implementations, the trained neural network comprises a stacked noise reduction autoencoder.
[0695] In some implementations, the analog circuit 5602 is configured to be powered by vibrations of the vibration source 5610.
[0696] In some implementations, the hardware apparatus 5620 further includes a power harvesting circuit configured to harvest power from vibrations 5612 of the vibration source 5610 and supply power to the analog circuit 5602.
[0697] In some implementations, the plurality of operational amplifiers is configured to implement neurons of the portion of the trained neural network, and the plurality of resistors is configured to implement axons or connections between neurons of the portion of the trained neural network.
[0698] In some implementations, the analog circuit 5602 is configured to implement an optimized neural network corresponding to the trained neural network.
[0699] In some implementations, values of the plurality of resistors are based on weights of connections of the trained neural network.
[0700] In some implementations, the plurality of resistors is configured to connect the plurality of operational amplifiers.
[0701] In some implementations, the analog circuit 5602 comprises resistors in the back-end-of-line (BEOL).
[0702] In some implementations, the trained neural network is an autoencoder comprising an encoder portion and a decoder portion. The autoencoder reconstructs an input vector at an output layer after nonlinear transformations performed by hidden layers. The analog circuit 5602 corresponds to the encoder portion of the autoencoder, which comprises hidden layers. The analog circuit 5602 is configured to compute a representation of the input vector in fewer dimensions than the input space of the input vector.
[0703] In some implementations, the analog circuit 5602 is configured to generate compressed data that encodes vibration sensor data based on vibration features from the vibration sensor 5608.
[0704] In another aspect, the system 5600 includes the hardware apparatus 5620, the transceiver 5606, and a digital circuit 5604. The hardware apparatus 5620 includes the vibration sensor 5608 configured to sense vibrations of the vibration source 5610 of a machine. The hardware apparatus 5620 also includes the analog circuit 5602 comprising a plurality of operational amplifiers and a plurality of resistors. The analog circuit 5602 is coupled to the vibration sensor 5608 and configured to receive an analog signal 5614 from the vibration sensor 5608 and compute an output 5616 based on the analog signal 5614, by performing a portion of a trained neural network. The transceiver 5606 is coupled to the analog circuit 5602 and configured to receive the output 5616 from the analog circuit 5602 and transmit the output 5616 over a low power wide area network (LPWAN) 5618. The analog output is converted to digital form before transmission; an analog-to-digital converter (ADC) for this purpose may be included in the hardware apparatus 5620 or may be included as part of the transceiver 5606. The digital circuit 5604 is communicatively coupled to the transceiver 5606 of the hardware apparatus 5620 via the LPWAN. The digital circuit 5604 is configured to receive the output from the analog circuit 5602 (via the transceiver 5606) and predict the state 5624 of the machine for maintenance (e.g., by a device 5622 for maintaining the machine), based on the output.
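Because the output is converted before transmission, one hypothetical way to fit an embedding into the 1000-bit budget mentioned in this section is to quantize 125 values to 8 bits each. The sketch assumes embedding values lie in [-1, 1] (for example, tanh outputs) and uses a uniform 8-bit code; these choices and the function names are illustrative assumptions, not the disclosed encoding.

```python
def pack_embedding(embedding, lo=-1.0, hi=1.0):
    """Hypothetical 8-bit packing: clip each value to [lo, hi] and map it to a byte 0..255."""
    out = bytearray()
    for v in embedding:
        v = min(max(v, lo), hi)
        out.append(round((v - lo) / (hi - lo) * 255))
    return bytes(out)


def unpack_embedding(payload, lo=-1.0, hi=1.0):
    """Inverse mapping on the receiving side (digital circuit), up to quantization error."""
    return [lo + b / 255 * (hi - lo) for b in payload]
```

With 125 dimensions the payload is 125 bytes (1000 bits), and each restored value is within half a quantization step, 1/255, of the clipped original.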
[0705] In some implementations, the digital circuit 5604 includes one or more digital computing units, including one or more of: CPUs, GPUs, RISCs, FPGAs, and ASICs.
[0706] In some implementations, the digital circuit 5604 comprises a processor configured to perform data classification. In some implementations, the data classification is performed by a neural network that is distinct from the trained neural network. In some implementations, the data classification is performed using k-nearest neighbors (k-NN).
[0707] In some implementations, the output of the analog circuit 5602 represents embeddings and the digital circuit 5604 is configured to use the embeddings to classify the analog signal 5614.
[0708] In some implementations, a method is provided for vibration sensing. The method includes sensing vibrations of a vibration source using a vibration sensor (e.g., the vibration sensor 5608) to obtain an analog signal (e.g., the analog signal 5614). The method also includes computing an output (e.g., the output 5616) based on the analog signal, by performing a portion of a trained neural network, using an analog circuit (e.g., the analog circuit 5602) comprising a plurality of operational amplifiers and a plurality of resistors. The method also includes transmitting the output over a low power wide area network (LPWAN) 5618 using a transceiver 5606.
[0709] The terminology used in the description of the invention herein is for the purpose of describing particular implementations only and is not intended to be limiting of the invention. As used in the description of the invention and the appended claims, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof.
[0710] The foregoing description, for purpose of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The implementations were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various implementations with various modifications as are suited to the particular use contemplated.