INDUSTRIAL DEVICE AND METHOD FOR BUILDING AND/OR PROCESSING A KNOWLEDGE GRAPH
20220229400 · 2022-07-21
Inventors
CPC classification
G06F16/254
PHYSICS
G06N3/042
PHYSICS
G06N7/01
PHYSICS
G06N5/01
PHYSICS
G06N3/049
PHYSICS
International classification
Abstract
Provided is an industrial device for building and/or processing a knowledge graph, with at least one sensor and/or at least one data source configured for providing raw data, with an ETL component, configured for converting the raw data into triple statements, using mapping rules, with a triple store, storing the triple statements as a dynamically changing knowledge graph, with a learning component, configured for processing the triple statements in a learning mode, and for performing an inference in an inference mode, and with a control component, configured for switching between different modes of operation of the learning component.
Claims
1. An industrial device for building and/or processing a knowledge graph comprising: at least one sensor and/or at least one data source configured for providing raw data; an ETL component, configured for converting the raw data into triple statements, using mapping rules; a triple store, storing the triple statements as a dynamically changing knowledge graph; a learning component, configured for processing the triple statements in a learning mode, and for performing an inference in an inference mode; and a control component, configured for switching between different modes of operation of the learning component.
2. The industrial device according to claim 1, wherein the learning component and/or the control component implement a RESCAL algorithm, a TransE algorithm, a DistMult algorithm, or a Graph convolutional neural network.
3. The industrial device according to claim 1, wherein the industrial device is a field device, an edge device, a sensor device, an industrial controller, a PLC controller, an industrial PC implementing a SCADA system, a network hub, a network switch, an industrial ethernet switch, or an industrial gateway connecting an automation system to cloud computing resources.
4. The industrial device according to claim 1, wherein the control component is autonomous or processes external signals.
5. The industrial device according to claim 1, wherein the learning component is configured for calculating a likelihood of a triple statement during inference mode.
6. The industrial device according to claim 1, wherein the triple store also stores a pre-loaded static sub-graph.
7. The industrial device according to claim 1, further comprising a statement handler, configured for triggering an automated action based on the inference of the learning component.
8. The industrial device according to claim 1, wherein the knowledge graph is an industrial knowledge graph describing parts of an industrial system; with nodes of the knowledge graph representing physical objects, sensors, industrial controllers, robots, drives, manufactured objects, tools and/or elements of a bill of materials; and with nodes of the knowledge graph representing abstract entities, attributes, configurations or skills of the physical objects, production schedules and plans, and/or sensor measurements.
9. The industrial device according to claim 1, wherein the learning component and/or the control component are implemented as neuromorphic hardware, as an application specific integrated circuit, a field-programmable gate array, a wafer-scale integration, hardware with mixed-mode VLSI neurons, a neuromorphic processor, a neural processing unit or a mixed-signal neuromorphic processor.
10. The industrial device according to claim 1, wherein the learning component includes: an input layer containing node embedding populations of neurons, with each node embedding population representing an entity contained in the triple statements; and an output layer, containing output neurons configured for representing a likelihood for each possible triple statement; and models a probabilistic, sampling-based model derived from an energy function, wherein the triple statements have minimal energy; and wherein the control component is configured for switching the learning component into a data-driven learning mode, configured for training the component with a maximum likelihood learning algorithm minimizing energy in the probabilistic, sampling-based model, using only the triple statements, which are assigned low energy values; into a sampling mode, in which the learning component supports generation of triple statements; and into a model-driven learning mode, configured for training the component with the maximum likelihood learning algorithm using only the generated triple statements, with the learning component learning to assign high energy values to the generated triple statements.
11. The industrial device according to claim 10, wherein the control component is configured to alternatingly: present inputs to the learning component by selectively activating subject and object populations among the node embedding populations; set hyperparameters of the learning component, in particular a factor (η) that modulates learning updates of the learning component; read output of the learning component; and use output of the learning component as feedback to the learning component.
12. The industrial device according to claim 10, wherein the output layer has one output neuron for each possible relation type of the knowledge graph.
13. The industrial device according to claim 12, wherein the output neurons are stochastic dendritic output neurons, storing embeddings of relations that are given between a subject and an object in the triple statements in their dendrites, summing all dendritic branches into a final score, which is transformed into a probability using an activation function.
14. The industrial device according to claim 13, wherein depending on the mode of the learning component, an output of the activation function is a prediction of the likelihood of a triple statement or a transition probability.
15. The industrial device according to claim 13, wherein learning updates for relation embeddings are computed directly in dendritic trees of the stochastic, dendritic output neurons.
16. The industrial device according to claim 10, wherein learning updates for entity embeddings are computed using static feedback connections from each output neuron to neurons of the node embedding populations.
17. The industrial device according to claim 10, wherein in the sampling mode, by sampling from the activation function, a binary output signals to the control component whether a triple statement is accepted.
18. The industrial device according to claim 10, wherein the learning component includes first neurons forming a first node embedding population, representing a first entity contained in the triple statements by first spike times of the first neurons during a recurring time interval; wherein the learning component includes second neurons forming a second node embedding population, representing a second entity contained in the triple statements by second spike times of the second neurons during the recurring time interval; and wherein a relation between the first entity and the second entity is represented as the differences between the first spike times and the second spike times.
19. The industrial device according to claim 18, wherein the differences between the first spike times and the second spike times consider an order of the first spike times in relation to the second spike times, or wherein the differences are absolute values.
20. The industrial device according to claim 18, wherein the relation is stored in one of the output neurons; and wherein the relation is given by vector components that are stored in dendrites of the output neuron.
21. The industrial device according to claim 18, wherein the first neurons are connected to a monitoring neuron; wherein each first neuron is connected to a corresponding parrot neuron; wherein the parrot neurons are connected to the output neurons; and wherein the parrot neurons are connected to an inhibiting neuron.
22. The industrial device according to claim 18, wherein the first neurons and the second neurons are spiking neurons, non-leaky integrate-and-fire neurons or current-based leaky integrate-and-fire neurons.
23. The industrial device according to claim 18, wherein each of the first neurons and second neurons only spikes once during the recurring time interval, or wherein only a first spike during the recurring time interval is counted.
24. The industrial device according to claim 10, wherein each node embedding population is connected to an inhibiting neuron, and therefore selectable by inhibition of the inhibiting neuron.
25. A method for building and/or processing a knowledge graph by an industrial device, the method comprising: providing, by at least one sensor and/or at least one data source, raw data; converting, by an ETL component, the raw data into triple statements, using mapping rules; storing, by a triple store, the triple statements as a dynamically changing knowledge graph; processing, by a learning component, the triple statements in a learning mode; switching, by a control component, operation of the learning component from the learning mode to an inference mode; and performing, by the control component, an inference in the inference mode.
Description
BRIEF DESCRIPTION
[0065] Some of the embodiments will be described in detail, with reference to the following figures, wherein like designations denote like members, wherein:
DETAILED DESCRIPTION
[0090] In the following description, various aspects of embodiments of the present invention and embodiments thereof will be described. However, it will be understood by those skilled in the art that embodiments may be practiced with only some or all aspects thereof. For purposes of explanation, specific numbers and configurations are set forth in order to provide a thorough understanding. However, it will also be apparent to those skilled in the art that the embodiments may be practiced without these specific details.
[0091] In the following description, the terms “mode” and “phase” are used interchangeably. If a learning component runs in a first mode, then it also runs for the duration of a first phase, and vice versa. Also, the terms “triple” and “triple statement” will be used interchangeably.
[0092] Nickel, M., Tresp, V. & Kriegel, H.-P.: A three-way model for collective learning on multi-relational data, in ICML '11 (2011), pp. 809-816, disclose RESCAL, a widely used graph embedding algorithm. The entire contents of that document are incorporated herein by reference.
[0093] Yang, B., Yih, W.-t., He, X., Gao, J. and Deng, L.: Embedding entities and relations for learning and inference in knowledge bases, arXiv preprint arXiv:1412.6575 (2014), disclose DistMult, which is an alternative to RESCAL. The entire contents of that document are incorporated herein by reference.
[0094] Bordes, A. et al.: Translating embeddings for modeling multi-relational data, in Advances in neural information processing systems (2013), pp. 2787-2795, disclose TransE, which is a translation based embedding method. The entire contents of that document are incorporated herein by reference.
[0095] Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I. and Welling, M.: Modeling Relational Data with Graph Convolutional Networks, arXiv preprint arXiv:1703.06103 (2017), disclose Graph Convolutional Neural networks. The entire contents of that document are incorporated herein by reference.
[0096] Hopfield, J. J.: Neural networks and physical systems with emergent collective computational abilities, in Proceedings of the national academy of sciences 79, pp. 2554-2558 (1982), discloses energy-based models for computational neuroscience and artificial intelligence. The entire contents of that document are incorporated herein by reference.
[0097] Hinton, G. E., Sejnowski, T. J., et al.: Learning and relearning in Boltzmann machines, in Parallel distributed processing: Explorations in the microstructure of cognition 1, 2 (1986), disclose Boltzmann machines, which combine sampling with energy-based models, using wake-sleep learning. The entire contents of that document are incorporated herein by reference.
[0099] Mostafa, H.: Supervised learning based on temporal coding in spiking neural networks, in IEEE transactions on neural networks and learning systems 29.7 (2017), pp. 3227-3235, discloses the nLIF model, which is particularly relevant for the sections “Weight gradients” and “Regularization of weights” below. The entire contents of that document are incorporated herein by reference.
[0100] Comsa, I. M., et al.: Temporal coding in spiking neural networks with alpha synaptic function, arXiv preprint arXiv:1907.13223 (2019), disclose an extension of the results of Mostafa (2017) for the current-based LIF model. The entire contents of that document are incorporated herein by reference.
[0101] Göltz, J., et al.: Fast and deep: Energy-efficient neuromorphic learning with first-spike times, arXiv:1912.11443 (2020), also disclose an extension of the results of Mostafa (2017) for the current-based LIF model, allowing for broad applications in neuromorphics and more complex dynamics. The entire contents of that document are incorporated herein by reference.
[0102] The industrial device ED contains one or more sensors S or is connected to them. The industrial device can also be connected to one or more data sources DS or contain them. In other words, the data sources DS can also be local, for example containing or providing internal events in a PLC controller.
[0103] Examples of the industrial device are a field device, an edge device, a sensor device, an industrial controller, in particular a PLC controller, an industrial PC implementing a SCADA system, a network hub, a network switch, in particular an industrial ethernet switch, or an industrial gateway connecting an automation system to cloud computing resources.
[0104] The sensors S and data sources DS feed raw data RD into an ETL component ETLC of the industrial device ED. The task of the ETL component ETLC is to extract, transform and load (ETL) sensor data and other events observed at the industrial device ED and received as raw data RD into triple statements T according to a predefined vocabulary (a set of entities and relationships) externally deployed in the industrial device ED in the form of a set of mapping rules MR. The mapping rules MR can map local observations contained in the raw data RD such as sensor values, internal system states or external stimuli to the triple statements T, which are semantic triples in the form ‘s-p-o’ (entity s has relation p with entity o), for example RDF triples. Different alternatives for mapping the raw data RD to the triple statements T exist in the literature, e.g., R2RML for mapping between relational database data and RDF. In this case a similar format can be generated to map events contained in the raw data RD to the triple statements T. An alternative to R2RML is RML, an upcoming, more general standard that is not limited to relational databases or tabular data.
[0105] Examples for the triple statements T are [0106] “temperature_sensor has_reading elevated”, [0107] “ultrasonic sensor has_state positive”, [0108] “machine_operator sets_mode test”, or [0109] “applicationX reads_data variableY”,
which correspond to events such as [0110] a built-in temperature sensor as one of the sensors S showing a higher than usual reading, [0111] an ultrasonic sensor as one of the sensors S detecting an object, [0112] an operator setting the device in test mode, or [0113] an external application reading certain local variables.
[0114] The latter information may be available from events that are logged in an internal memory of the industrial device ED and fed into the raw data RD. The ETL component ETLC applies the mapping rules MR, converting specific sets of local readings contained in the raw data RD into the triple statements T.
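The mapping from raw readings to triple statements described above can be sketched as follows. This is a minimal illustration, not the patented mapping-rule format: the rule functions, field names and the temperature threshold are hypothetical.

```python
# Illustrative sketch of an ETL component applying mapping rules MR to one
# raw-data record, producing 's-p-o' triple statements. All rule names,
# dictionary keys and thresholds are assumptions for this example.

def temperature_rule(raw):
    """Map an elevated temperature reading to a triple (threshold is hypothetical)."""
    if raw.get("temperature_c", 0) > 80:
        return ("temperature_sensor", "has_reading", "elevated")
    return None

def ultrasonic_rule(raw):
    """Map an ultrasonic detection event to a triple."""
    if raw.get("ultrasonic_detected"):
        return ("ultrasonic_sensor", "has_state", "positive")
    return None

MAPPING_RULES = [temperature_rule, ultrasonic_rule]

def etl_convert(raw_data):
    """Apply every mapping rule to a raw-data record and collect the triples."""
    triples = []
    for rule in MAPPING_RULES:
        triple = rule(raw_data)
        if triple is not None:
            triples.append(triple)
    return triples

triples = etl_convert({"temperature_c": 91.5, "ultrasonic_detected": True})
# Each resulting triple could then be serialized, e.g., as an RDF statement
# and stored in the embedded triple store ETS.
```

In a real deployment the rules would be deployed externally to the device, e.g., in an R2RML- or RML-like declarative format, rather than hardcoded as Python functions.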
[0115] The triple statements T are stored in an embedded triple store ETS, creating a dynamically changing knowledge graph. The embedded triple store ETS is a local database in a permanent storage of the industrial device ED (e.g., an SD card or hard disk).
[0116] Besides the previously described triple statements T, which are created locally and dynamically by the ETL component ETLC, and which can be termed observed triple statements, the embedded triple store ETS can contain a pre-loaded set of triple statements which constitute a static sub-graph SSG, i.e., a part of the knowledge graph which does not depend on the local observations contained in the raw data RD, i.e., is static in nature. The static sub-graph SSG can provide, for example, a self-description of the system (e.g., which sensors are available, which user-roles or applications can interact with it, etc.). The triple statements of the static sub-graph SSG are also stored in the embedded triple store ETS. They can be linked to the observed data and provide additional context.
[0117] All triple statements stored in the embedded triple store ETS are provided to a learning component LC, the central element of the architecture. The learning component LC implements a machine learning algorithm such as the ones described below. The learning component LC can perform both learning as well as inference (predictions). It is controlled by a control component CC that can switch between different modes of operation of the learning component LC, either autonomously (e.g., periodically) or based on external stimuli (e.g., a specific system state, or an operator provided input).
[0118] One of the selected modes of operation of the learning component LC is a learning mode, where the triple statements T are provided to the learning component LC, which in response iteratively updates its internal state with learning updates LU according to a specific cost function as described below. A further mode of operation is inference mode, where the learning component LC makes predictions about the likelihood of unobserved triple statements. Inference mode can either be a free-running mode, whereby random triple statements are generated by the learning component LC based on the accumulated knowledge, or a targeted inference mode, where the control component CC specifically sets the learning component LC in such a way that the likelihood of specific triple statements is evaluated.
[0119] Finally, the industrial device ED can be programmed to take specific actions whenever the learning component LC predicts specific events with an inference IF. Such actions are programmed via a set of handling rules HR that map specific triple statements to software routines to be executed. The handling rules HR are executed by a statement handler SH that receives the inference IF of the learning component LC.
[0120] For instance, in a link prediction setting, the inference IF could be a prediction of a certain triple statement, e.g., “system enters_state error”, by the learning component LC. This inference IF can trigger a routine that alerts a human operator or that initiates a controlled shutdown of the industrial device ED or a connected system. Other types of trigger, different from link prediction, are also possible. For instance, in an anomaly detection setting, a handler could be associated with the actual observation of a specific triple statement whenever its predicted likelihood (inference IF) by the learning component LC is low, indicating that an unexpected event has occurred.
[0121] In a simple case, the handling rules HR can be hardcoded in the industrial device ED (e.g., a fire alarm that tries to predict the likelihood of a fire), but in a more general case can be programmed in a more complex device (e.g., a PLC controller as industrial device ED) from an external source, linking the predictions of the learning component LC to programmable software routines such as PLC function blocks.
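A minimal sketch of the statement handler SH follows, assuming a simple dictionary-based rule format. The routine names and the likelihood threshold are illustrative assumptions, standing in for the programmable software routines (e.g., PLC function blocks) mentioned above.

```python
# Hedged sketch of handling rules HR and a statement handler SH: a rule maps
# a specific triple statement to a named routine, which is triggered when the
# learning component's inference IF predicts that triple with sufficient
# likelihood. Names and the threshold are hypothetical.

HANDLING_RULES = {
    ("system", "enters_state", "error"): "alert_operator",
}

def alert_operator():
    """Illustrative routine, e.g., notifying a human operator."""
    return "operator alerted"

ROUTINES = {"alert_operator": alert_operator}

def statement_handler(inferred_triple, likelihood, threshold=0.9):
    """Trigger the routine bound to a predicted triple once its likelihood
    exceeds a (hypothetical) threshold; otherwise do nothing."""
    routine_name = HANDLING_RULES.get(inferred_triple)
    if routine_name is not None and likelihood >= threshold:
        return ROUTINES[routine_name]()
    return None

result = statement_handler(("system", "enters_state", "error"), 0.97)
```

For the anomaly-detection variant described above, the comparison would be inverted: the handler fires when an observed triple's predicted likelihood is below a threshold.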
[0122] Various learning algorithms and optimization functions are described in the following, which are suitable for implementing the learning component LC and/or control component CC. Some of these algorithms combine learning and inference in a seamless manner and are suitable for implementation in low-power, highly scalable processing units, e.g., neural network accelerators or neuromorphic processors such as spiking neural network systems.
[0123] The learning component LC (and the control component CC if it guides the learning process) can be implemented with any algorithm that can be trained on the basis of knowledge graphs. The embedded triple store ETS contains potentially multiple graphs derived from system observation (triple statements T generated by the ETL component ETLC, plus the pre-loaded set of triple statements which constitute the static sub-graph SSG). Separation into multiple graphs can be done on the basis of time (e.g., separating observations corresponding to specific time periods) or any other similar criterion. For example, in an industrial manufacturing system, the triple statements T can be separated into independent graphs depending on the type of action being carried out by the industrial manufacturing system, or the type of good being manufactured, when the triple statements T are observed.
[0124] The learning component LC (and the control component CC if it guides the learning process) can be implemented using either transductive algorithms, which are able to learn representations for a fixed graph, for example RESCAL, TransE, or DistMult, or inductive algorithms, which can learn filters that generalize across different graphs, for example Graph Convolutional Neural networks (Graph CNN). In the case of the former an individual model is trained for each graph (feeding triple statements T corresponding to each single graph to independent model instances) whereas in the case of the latter, a single model is trained based on all the graphs.
[0125] In either case, we can differentiate between a learning mode, where the triple statements T are presented to the learning component LC, which learns a set of internal operations, parameters and coefficients required to solve a specific training objective, and an inference mode, where the learning component LC evaluates the likelihood of newly observed or hypothetical triple statements on the basis of the learned parameters. The training objective defines a task that the learning algorithm implemented in the learning component LC tries to solve, adjusting the model parameters in the process. If the industrial device ED is an embedded device, then it is advantageous to perform this step in a semi-supervised or unsupervised manner, i.e., without explicitly providing ground truth labels (i.e., the solution to the problem). In the case of a graph algorithm, this can be accomplished, for instance, by using a link prediction task as the training objective. In this setting, the learning process is iteratively presented with batches containing samples from the observed triples, together with internally generated negative examples (non-observed semantic triples). The objective is to minimize a loss function over the selected examples that assigns a lower loss when positive examples receive a high likelihood and negative examples a low likelihood, iteratively adjusting the model parameters accordingly.
[0126] The algorithm selected determines the specific internal operations and parameters as well as the specific loss/scoring function that guides the learning process, which can be implemented in a conventional CPU or DSP processing unit of the industrial device ED, or alternatively on specialized machine learning co-processors. For example, in the case of a RESCAL implementation a graph is initially converted to its adjacency form with which the RESCAL gradient descent optimization process is performed. The mathematical foundations of this approach will be explained in more detail in later embodiments. An alternative is provided by the scoring function of DistMult, which reduces the number of parameters by imposing additional constraints in the learned representations. A further alternative would be to use a translation based embedding method, such as TransE which uses the distance between object embedding and subject embedding translated by a vectorial representation of the predicate connecting them.
[0127] The previous examples can be considered as decoder based embedding methods. In the case of a Graph CNN based implementation, the algorithm to be trained consists of an encoder and a decoder. The encoder comprises multiple convolutional and dense filters which are applied to the observed graph provided in a tensor formulation, given by an adjacency matrix indicating existing edges between nodes, and a set of node features which typically correspond to literal values assigned to the corresponding node in the RDF representation in the embedded triple store ETS, to which a transformation can be optionally applied in advance (e.g. a clustering step if the literal is of numeric type, or a simple encoding into integer values if the literal is of categorical type). On the other hand, the decoder can be implemented by a DistMult or similar decoder network that performs link scoring from pairs of entity embeddings.
[0128] It should be noted that most of the score functions required by knowledge graph learning algorithms, in addition to tunable parameters which are optimized during learning, typically also contain a set of hyperparameters that control the learning process of the learning component LC itself, such as learning rates, batch sizes, iteration counts, aggregation schemes and other model hyperparameters present in the loss function. In the context of the present embodiment, these can be preconfigured within the control component CC and/or the learning component LC in the industrial device ED with known working values determined by offline experimentation. Alternatively, performing a complete or partial hyperparameter search and tuning directly on the industrial device ED would also be possible, at the cost of potentially having to perform an increased number of learning steps in order to locally evaluate the performance of the algorithms for different sets of hyperparameters on the basis of an additional set of triple statements reserved for this purpose.
[0129] To set up the industrial device ED, the mapping rules MR need to be defined and stored on the industrial device ED. The learning process can be controlled with external operator input into the control component CC and feedback, or be autonomous as described above.
[0131] The learning component LC is composed of two parts: first, a pool of node embedding populations NEP of neurons N that represent embeddings of graph entities (i.e., the subjects and objects in the triple statements), and second, a population of stochastic, dendritic output neurons SDON that perform the calculations (scoring of triple statements, proposing of new triple statements).
[0132] Each entity in the graph is represented by one of the node embedding populations NEP, storing both its embeddings (real-valued entries) and accumulated gradient updates. The neurons N of each node embedding population NEP project statically one-to-one to dendritic compartments of the stochastic, dendritic output neurons SDON, where inputs are multiplied together with a third factor R.
[0136] In general, the learning component LC can be operated in three modes or phases controlled by a single parameter η=[1,0,−1]: a data-driven learning mode (η=1), a sampling mode (η=0), and a model-driven learning mode (η=−1).
[0137] An additional input ζ is used to explicitly control plasticity, i.e., how to clamp the stochastic, dendritic output neurons SDON, apply learning updates LU or clear (reset to 0) accumulated updates.
[0140] In the data-driven learning mode, the learning component LC is trained with a maximum likelihood learning algorithm using only the observed triple statements T, which are assigned low energy values.
[0141] In the sampling mode, the learning component LC supports the generation of triple statements, which can subsequently be used as input for the model-driven learning mode.
[0144] In case of many entities, to reduce the amount of required wiring, a sparse connectivity can be used between the node embedding populations NEP and the stochastic, dendritic output neurons SDON. To realize the RESCAL score function, each node embedding population NEP has to be doubled (once for subjects and objects, as the scoring function is not symmetric). This way, each graph entity has now two embeddings (for subject and object, respectively), which can be synchronized again by including “subj_embedding isIdenticalTo obj_embedding” triple statements in the training data.
[0145] The learning component LC combines global parameters, feedback and local operations to realize distributed computing that is rendered controllable by the control component CC, allowing a seamless transition between inference and learning in the same system.
Tensor-Based Graph Embeddings
[0146] A widely used graph embedding algorithm is RESCAL. In RESCAL, a graph is represented as a tensor X.sub.s,p,o, where entries are 1 if a triple ‘s-p-o’ (entity s has relation p with entity o) occurs in the graph and 0 otherwise. This allows us to rephrase the goal of finding embeddings as a tensor factorization problem

X_{s,p,o} ≈ θ_{s,p,o} = e_s^T R_p e_o,

with each graph entity s being represented by a vector e.sub.s and each relation p by a matrix R.sub.p. The problem of finding embeddings is then equivalent to minimizing the reconstruction loss

L = Σ_{s,p,o} (X_{s,p,o} − e_s^T R_p e_o)^2,

which can either be done using alternating least-square optimization or gradient-descent-based optimization. Usually, we are only aware of valid triples; the validity of all other triples is unknown to us and cannot be modeled by setting the respective tensor entries to 0. However, only training on positive triples would result in trivial solutions that score all possible triples high. To avoid this, so-called ‘negative samples’ are generated from the training data by randomly exchanging either the subject or object entity in a data triple, e.g., ‘s-p-o’∈D → ‘a-p-o’ or ‘s-p-o’∈D → ‘s-p-b’. During training, these negative samples are then presented as invalid triples with tensor entry 0. However, negative samples are not kept but newly generated for each parameter update.
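Negative-sample generation as described above can be sketched as follows; the entity names and the even 50/50 choice between corrupting the subject or the object are illustrative.

```python
# Sketch of negative sampling: corrupt a data triple by replacing either its
# subject or its object with a different, randomly chosen entity, keeping the
# predicate fixed. Fresh negatives would be drawn for each parameter update.
import random

def corrupt(triple, entities, rng=random.Random(0)):
    """Return a negative sample derived from a data triple."""
    s, p, o = triple
    if rng.random() < 0.5:
        s = rng.choice([x for x in entities if x != s])  # corrupt subject
    else:
        o = rng.choice([x for x in entities if x != o])  # corrupt object
    return (s, p, o)

negative = corrupt(("s", "p", "o"), entities=["s", "o", "a", "b"])
```

The corrupted triple differs from the original in exactly one of subject or object, matching the ‘s-p-o’∈D → ‘a-p-o’ / ‘s-p-b’ scheme described above.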
Energy-Based Tensor Factorization
[0147] We propose a probabilistic model of graph embeddings based on an energy function that takes inspiration from the RESCAL scoring function. Energy-based models have a long history in computational neuroscience and artificial intelligence, and we use this as a vehicle to explore possible dynamic systems that are capable of implementing computations on multi-relational graph data.
Energy Function for Triples
[0148] Given a tensor X that represents a graph (or subgraph), we assign it the energy

E(X) = −Σ_{s,p,o} X_{s,p,o} θ_{s,p,o},

where θ.sub.s,p,o is the RESCAL score function (Eq. (4)). From this, we define the probability of observing X

P(X) = exp(−E(X)) / Σ_{X′} exp(−E(X′)),

where we sum over all possible graph realizations X′. Here, the X.sub.s,p,o∈{0,1} are binary random variables indicating whether a triple exists, with the probability depending on the score of the triple. For instance, a triple (s, p, o) with positive score θ.sub.s,p,o is assigned a negative energy and hence a higher probability that X.sub.s,p,o=1. This elevates RESCAL to a probabilistic model by assuming that the observed graph is merely a sample from an underlying probability distribution, i.e., it is a collection of random variables. Since triples are treated independently here, the probability can be rewritten as

P(X) = Π_{s,p,o} σ(θ_{s,p,o})^{X_{s,p,o}} (1 − σ(θ_{s,p,o}))^{1−X_{s,p,o}},

where σ(·) is the logistic function. Thus, the probability of a single triple (s,p,o) appearing is given by σ(θ.sub.s,p,o).
Maximum Likelihood Learning
[0149] The model is trained using maximum likelihood learning, i.e., node and edge embeddings are adjusted such that the likelihood (or log-likelihood) of observed triples is maximized
where D is a list of subgraphs (data graphs) available for learning. These update rules can be rewritten as
[0150] Relations learn to match the inner product of subject and object embeddings they occur with, while node embeddings learn to match the latent representation of their counterpart, e.g., e.sub.s learns to match the latent representation R.sub.pe.sub.o of the object if the triple ‘s-p-o’ is in the data. Both learning rules consist of two phases, a data-driven phase and a model-driven phase, similar to the wake-sleep algorithm used to train, e.g., Boltzmann machines. In contrast to the data-driven phase, during the model-driven phase the likelihood of model-generated triples is reduced. Thus, different from graph embedding algorithms like RESCAL, no negative samples are required to train the model.
Sampling for Triple-Generation
[0151] To generate triples from the model, we use Markov Chain Monte Carlo (MCMC) sampling—more precisely, the Metropolis-Hastings algorithm—with negative sampling as the proposal distribution. For instance, if the triple (s, p, o) is in the data set, we propose a new sample by randomly replacing either subject, predicate or object, and accepting the change with probability
T({s,p,o}→{s,p,q})=min[1, exp(e.sub.s.sup.TR.sub.p(e.sub.q−e.sub.o))] (13)
[0152] The transition probability directly depends on the distance between the embeddings, i.e., if the embeddings of nodes (or relations) are close to each other, a transition is more likely. This process can be repeated on the new sample to generate a chain of samples, exploring the neighborhood of the data triple under the model distribution. It can further be used to approximate conditional or marginal probabilities, e.g., by keeping the subject fixed and sampling over predicates and objects.
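One such Metropolis-Hastings step can be sketched as below; for brevity only the object is corrupted here, and we use the standard acceptance probability min(1, exp(theta_new − theta_old)), which Eq. (13) expresses through the embedding difference (all names are illustrative):

```python
import numpy as np

def metropolis_step(triple, embeddings, relations, rng):
    """Propose a negative sample by replacing the object entity and
    accept it with probability min(1, exp(e_s^T R_p (e_q - e_o)))."""
    s, p, o = triple
    q = rng.choice([e for e in embeddings if e != o])
    e_s, e_o, e_q = embeddings[s], embeddings[o], embeddings[q]
    R_p = relations[p]
    # the transition probability depends on the embedding distance:
    # close embeddings make a transition more likely
    accept = min(1.0, np.exp(e_s @ R_p @ (e_q - e_o)))
    return (s, p, q) if rng.random() < accept else triple
```

Repeating this step on the returned triple produces the chain of samples described above.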
Network Implementation
[0153] The described learning rules and sampling dynamics suggest a neural network structure with specific connectivity and neuron types as shown in
with η∈{−1, 0, 1}. Through η, the stochastic, dendritic output neurons SDON can both return the probability σ(·) of a triple statement being true (η=0) and the transition probabilities T(·) required for sampling (η=−1 or η=1).
[0158] η is further used to gate between three different phases or modes for learning: the data-driven learning mode shown in
ΔR.sub.p∝η·s(θ.sub.s,p,o)e.sub.s.sup.Te.sub.o (15.1)
Δe.sub.s∝η·s(θ.sub.s,p,o)R.sub.pe.sub.o (15.2)
Δe.sub.o∝η·s(θ.sub.s,p,o)e.sub.s.sup.TR.sub.p (15.3)
where updates are only applied when the stochastic, dendritic output neuron SDON ‘spiked’, i.e., sampling σ(θ.sub.s,p,o) returns s(θ.sub.s,p,o)=1.
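A sketch of these η-gated, spike-conditional updates follows; interpreting the relation update of Eq. (15.1) as an outer product of the embeddings is an assumption on our part, and the learning rate is a hypothetical hyperparameter:

```python
import numpy as np

def gated_update(e_s, R_p, e_o, eta, rng, lr=0.01):
    """Eta-gated updates in the spirit of Eqs. (15.1)-(15.3): applied
    only when the stochastic output neuron 'spiked', i.e., sampling
    sigma(theta) returned 1; eta in {-1, 0, 1} selects the phase."""
    theta = e_s @ R_p @ e_o
    sigma = 1.0 / (1.0 + np.exp(-theta))
    spiked = rng.random() < sigma        # s(theta) in {0, 1}
    if eta == 0 or not spiked:
        return R_p, e_s, e_o             # inference only, no update
    dR = eta * np.outer(e_s, e_o)        # cf. Eq. (15.1)
    de_s = eta * (R_p @ e_o)             # cf. Eq. (15.2)
    de_o = eta * (R_p.T @ e_s)           # cf. Eq. (15.3)
    return R_p + lr * dR, e_s + lr * de_s, e_o + lr * de_o
```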
[0159] In this architecture, the learning rule Eq. (11) takes the form of a contrastive Hebbian learning rule and Eq. (12) of a contrastive predictive learning rule. To update the embeddings of the node embedding populations NEP, feedback signals have to be sent from the stochastic, dendritic output neurons SDON to the neurons N—which can be done through a pre-wired feedback structure due to the simple and static forward connectivity, as shown in
[0160] Input is presented to the network by selecting the corresponding node embedding populations NEP and stochastic, dendritic output neurons SDON, which can be achieved through inhibitory gating, resembling a ‘memory recall’ of learned concepts. Alternatively, the learned embeddings of concepts could also be interpreted as attractor states of a memory network. During the sampling phase, feedback from the stochastic, dendritic output neurons SDON (Eq. (13)) is used to decide whether the network switches to another memory (or attractor state).
[0162] Both the forward inference path and the learning path only require spike times and utilize a biologically inspired neuron model found in the current generation of neuromorphic, spike-based processors, as will be described in more detail in later embodiments. Furthermore, similarly to the previous embodiments, static feedback connections between the node embedding populations NEP and the output neurons ON are utilized to transmit parameter updates. Different from the previous embodiments, no probabilistic sampling is performed by the system.
[0165] To select node embedding populations NEP, for example the two active node embedding populations NEP shown in
[0167] For the following embodiments, the numbering of the equations begins anew.
[0168] In the following, we explain our spike-based graph embedding model (SpikE) and derive the required learning rule.
[0169] Spike-Based Graph Embeddings
[0170] From graphs to spikes:
[0171] Our model takes inspiration from TransE, a shallow graph embedding algorithm where node embeddings are represented as vectors and relations as vector translations (see Section “Translating Embeddings” for more details). We found that these vector representations can be mapped to spike times and translations to spike time differences, offering a natural transition from the graph domain to SNNs.
[0172] We propose that the embedding of a node s is given by the single spike times of a first node embedding population NEP1 of size N, t.sub.s∈[t.sub.0, t.sub.max].sup.N, as shown in
[0175] This coding scheme maps the rich semantic space of graphs into the spike domain, where the spike patterns of two populations encode how the represented entities relate to each other, not only for one single relation p, but for the whole set of relations spanning the semantic space. To achieve this, learned relations encompass a range of patterns, from mere coincidence detection to complex spike time patterns. In fact, coding of relations as spike coincidence detection naturally appears as a special case in our model when training SNNs on real data, see for instance
[0176] Formally, the ranking of triples can be written as
ϑ.sub.s,p,o=Σ.sub.i||d(t.sub.s, t.sub.o)−r.sub.p||.sub.i (1)
where d is the distance between spike times and the sum runs over vector components. In the remainder of the document, we call ϑ.sub.s,p,o the score of the triple (s, p, o), where valid triples have a score close to 0 and invalid ones a score >>0. We define the distance function for SpikE to be
d.sub.Δ(t.sub.s,t.sub.o)=t.sub.s−t.sub.o (2)
where both the order and the distance of spike times are used to encode relations. The distance function can be modified to incorporate only absolute spike time differences,
d.sub.s(t.sub.s,t.sub.o)=||t.sub.s−t.sub.o|| (3)
such that there is no difference between subject and object populations. We call this version of the model SpikE-S.
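The score of Eq. (1) with the distance functions of Eqs. (2) and (3) can be sketched as follows (names are illustrative):

```python
import numpy as np

def spike_score(t_s, t_o, r_p, symmetric=False):
    """Triple score theta_spo = sum_i |d(t_s, t_o)_i - (r_p)_i|, with
    d(t_s, t_o) = t_s - t_o for SpikE and |t_s - t_o| for SpikE-S.
    Valid triples score close to 0, invalid ones much larger."""
    d = np.abs(t_s - t_o) if symmetric else t_s - t_o
    return float(np.sum(np.abs(d - r_p)))
```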
Network Implementation
[0178] A suitable neuron model that satisfies the requirements of the presented coding scheme, i.e., single-spike coding and analytical tractability, is the nLIF neuron model. For similar reasons, it has recently been used in hierarchical networks utilizing spike-latency codes. For the neuron populations encoding entities (the node embedding populations), we use the nLIF model with an exponential synaptic kernel
where u.sub.s,i is the membrane potential of the ith neuron of population s, τ.sub.s the synaptic time constant and θ(·) the Heaviside function. A spike is emitted when the membrane potential crosses a threshold value u.sub.th. W.sub.s,i,j are synaptic weights from a pre-synaptic neuron population, with every neuron j emitting a single spike at fixed time t.sub.j.
[0179] Eq. (4) can be solved analytically
which is later used to derive a learning rule for the embedding populations. For relations, we use output neurons ON. Each output neuron ON consists of a ‘dendritic tree’, where branch k evaluates the kth component of the spike pattern difference, i.e., ||d(t.sub.s, t.sub.o)−r.sub.p||.sub.k, and the tree structure subsequently sums over all contributions, giving ϑ.sub.s,p,o.
[0180] Different from ordinary feedforward or recurrent SNNs, the input is not given by a signal that first has to be translated into spike times and is then fed into the first layer (or specific input neurons) of the network. Instead, inputs to the network are observed triples ‘s-p-o’, i.e., statements that have been observed to be true. Since all possible entities are represented as neuron populations, the input simply gates which populations become active.
[0181] Learning Rules
[0182] To learn spike-based embeddings for entities and relations, we use a soft margin loss
where η.sub.s,p,o∈{−1, 1} is a modulating teaching signal that establishes whether an observed triple ‘s-p-o’ is regarded as valid (η.sub.s,p,o=1) or invalid (η.sub.s,p,o=−1). This is required to avoid collapse to zero-embeddings that simply score all possible triples with 0. In the graph embedding literature, invalid examples are generated by corrupting valid triples, i.e., given a training triple ‘s-p-o’, either s or o is randomly replaced, a procedure called ‘negative sampling’.
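A minimal sketch of such a soft margin loss; the margin hyperparameter and its exact placement are assumptions of ours, but the role of the teaching signal η is as described above:

```python
import numpy as np

def soft_margin_loss(theta, eta, margin=1.0):
    """Soft margin loss on a triple score theta: for valid triples
    (eta = +1) the loss decreases as theta approaches 0, for invalid
    triples (eta = -1) it decreases as theta grows, preventing the
    collapse to zero-embeddings mentioned above."""
    return float(np.log1p(np.exp(eta * (theta - margin))))
```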
[0183] The learning rules are derived by minimizing the loss Eq. (6b) via gradient descent. In addition, we add a regularization term to the weight learning rule that counters silent neurons. The gradient for entities can be separated into a loss-dependent error and a neuron-model-specific term
while the gradient for relations only consists of the error
The error terms are given by (see section “Spike-based model”)
for SpikE and
for SpikE-S, where σ(·) is the logistic function.
[0185] The neuron-specific term can be evaluated using Eq. (5), resulting in (see section “Spike-based model”)
[0186] For relations, all quantities in the update rule are accessible in the output neuron ON. Apart from an output error, this is also true for the update rules of nLIF spike times. Specifically, the learning rules only depend on spike times—or rather spike time differences—pre-synaptic weights and neuron-specific constants, compatible with recently proposed learning rules for SNNs.
Experiments
[0187] Data:
[0189] To evaluate the performance of the spike-based model, we generated graph data from an industrial automation system as shown in
[0190] For the following experiments, we use a recording from the industrial automation system with some default network and app activity, resulting in a knowledge graph KG with 3529 nodes, 11 node types, 2 applications, 21 IP addresses, 39 relations, 360 network events and 472 data access events. We randomly split the graph with a ratio of 8/2 into mutually exclusive training and test sets, resulting in 12399 training and 2463 test triples.
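The random 8/2 split into mutually exclusive training and test sets can be sketched as follows (seed and names are illustrative):

```python
import random

def split_triples(triples, ratio=0.8, seed=0):
    """Shuffle the graph's triples and split them into mutually
    exclusive training and test sets with the given ratio."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    cut = int(ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```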
[0196] We present a model for spike-based graph embeddings, where nodes and relations of a knowledge graph are mapped to spike times and spike time differences in an SNN, respectively. This allows a natural transition from symbolic elements in a graph to the temporal domain of SNNs, going beyond traditional data formats by enabling the encoding of complex structures into spikes. Representations are learned using gradient descent on an output cost function, which yields learning rules that depend on spike times and neuron-specific variables.
[0197] In our model, the input gates which populations become active and are consequently updated by plasticity. This memory mechanism allows the propagation of knowledge through all neuron populations, despite the input being isolated triple statements.
[0198] After training, the learned embeddings can be used to evaluate or predict arbitrary triples that are covered by the semantic space of the knowledge graph. Moreover, learned spike embeddings can be used as input to other SNNs, providing a native conversion of data into spike-based input.
[0199] The nLIF neuron model used in this embodiment is well suited to represent embeddings, but it comes with the drawback of a missing leak term, i.e., the neurons are modeled as integrators with infinite memory. This is critical for neuromorphic implementations, where—most often—variations of the nLIF model with leak are realized. Gradient-based optimization of current-based LIF neurons, i.e., nLIF with leak, can be used in alternative embodiments, making them applicable to energy-efficient neuromorphic implementations. Moreover, output neurons take a simple, but function-specific form that is different from ordinary nLIF neurons. Although realizable in neuromorphic devices, we believe that alternative forms are possible. For instance, each output neuron might be represented by a small forward network of spiking neurons, or relations could be represented by learnable delays.
[0200] Finally, the presented results bridge the areas of graph analytics and SNNs, promising exciting industrial applications of event-based neuromorphic devices, e.g., as energy efficient and flexible processing and learning units for online evaluation of industrial graph data.
METHODS
Translating Embeddings
[0201] In TransE, entities and relations are embedded as vectors in an N-dimensional vector space. If a triple ‘s-p-o’ is valid, then subject e.sub.s and object e.sub.o vectors are connected via the relation vector r.sub.p, i.e., relations represent translations between subjects and objects in the vector space
e.sub.s+r.sub.p≈e.sub.o (11)
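Eq. (11) corresponds to the usual TransE distance score, sketched here; a valid triple scores close to 0:

```python
import numpy as np

def transe_score(e_s, r_p, e_o):
    """TransE score ||e_s + r_p - e_o||: small when the relation
    vector translates the subject onto the object (Eq. (11))."""
    return float(np.linalg.norm(e_s + r_p - e_o))
```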
[0202] In our experiments, similar to SpikE, we use a soft margin loss to learn the embeddings of TransE.
[0203] Spike-Based Model
[0204] Spike Time Gradients
[0205] The gradients for d.sub.s can be calculated as follows
[0206] All other gradients can be obtained similarly.
[0207] Weight gradients:
[0208] The spike times of nLIF neurons can be calculated analytically by setting the membrane potential equal to the spike threshold u.sub.th, i.e., u.sub.s,i(t*)=u.sub.th:
[0209] In addition, for a neuron to spike, three additional conditions have to be met: [0210] the neuron has not spiked yet, [0211] the input is strong enough to push the membrane potential above threshold, i.e.,
the spike occurs before the next causal pre-synaptic spike t.sub.c
t*<t.sub.c (16)
[0212] From this, we can calculate the gradient
where we used that
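The analytic spike time and the spiking conditions above can be illustrated numerically; this sketch assumes the standard nLIF form with exponential synaptic kernel (cf. Eqs. (4) and (5)), with all names and parameter values chosen for illustration:

```python
import numpy as np

def nlif_potential(t, w, t_pre, tau=1.0):
    """Integrated nLIF membrane potential with exponential synaptic
    kernel: u(t) = tau * sum_j w_j (1 - exp(-(t - t_j)/tau)) over
    causal pre-synaptic spikes t_j < t."""
    w, t_pre = np.asarray(w, float), np.asarray(t_pre, float)
    causal = t > t_pre
    return float(tau * np.sum(w[causal] * (1.0 - np.exp(-(t - t_pre[causal]) / tau))))

def nlif_spike_time(w, t_pre, u_th=1.0, tau=1.0):
    """Solve u(t*) = u_th for t*, assuming all pre-synaptic spikes in
    t_pre are causal. Returns None if the input is too weak to push
    the membrane potential above threshold."""
    w, t_pre = np.asarray(w, float), np.asarray(t_pre, float)
    drive = tau * np.sum(w)
    if drive <= u_th:
        return None  # condition: input strong enough to reach u_th
    return float(tau * np.log(tau * np.sum(w * np.exp(t_pre / tau)) / (drive - u_th)))
```

Plugging the returned t* back into the membrane potential recovers the threshold, which is the defining condition above.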
[0213] Regularization of weights:
[0214] To ensure that all neurons in the embedding populations spike, we use the regularization term L.sub.δ
with w.sub.s,i=Σ.sub.jW.sub.s,ij.
Alternative Gating
[0215] As was shown in
Synchronizing Subject and Object Population
[0216] If an entity is represented by distinct subject s and object o populations, these representations will differ after training, although they represent the same entity. By adding triples of the form ‘s—#isIdenticalTo—o’ and keeping r.sub.isIdenticalTo=0, further alignment can be enforced, which improves performance during training.
[0217] Although the present invention has been disclosed in the form of preferred embodiments and variations thereon, it will be understood that numerous additional modifications and variations could be made thereto without departing from the scope of the invention.
[0218] For the sake of clarity, it is to be understood that the use of “a” or “an” throughout this application does not exclude a plurality, and “comprising” does not exclude other steps or elements.