Deep neural network system for similarity-based graph representations

Abstract

There is described a neural network system implemented by one or more computers for determining graph similarity. The neural network system comprises one or more neural networks configured to process an input graph to generate a node state representation vector for each node of the input graph and an edge representation vector for each edge of the input graph; and process the node state representation vectors and the edge representation vectors to generate a vector representation of the input graph. The neural network system further comprises one or more processors configured to: receive a first graph; receive a second graph; generate a vector representation of the first graph; generate a vector representation of the second graph; determine a similarity score for the first graph and the second graph based upon the vector representations of the first graph and the second graph.

Claims

1. A neural network system implemented by one or more computers for determining graph similarity, the neural network system comprising: one or more neural networks configured to: process an input graph to generate a node state representation vector for each node of the input graph and an edge representation vector for each edge of the input graph; and process the node state representation vectors and the edge representation vectors to generate a vector representation of the input graph; one or more processors configured to: receive a first graph; receive a second graph; generate a vector representation of the first graph using the one or more neural networks; generate a vector representation of the second graph using the one or more neural networks; determine a similarity score for the first graph and the second graph based upon the vector representations of the first graph and the second graph, wherein the one or more neural networks are configured to process a pair of input graphs to generate a vector representation of each input graph of the pair of input graphs, the vector representation of each input graph being based upon both input graphs of the pair of graphs; and wherein the vector representation of the first graph and the vector representation of the second graph are generated based upon inputting the first graph and the second graph as a pair of input graphs to the one or more neural networks.

2. The system of claim 1, wherein the one or more neural networks comprises: an encoder neural network subsystem configured to process an input graph to generate the node state representation vector for each node of the input graph and the edge representation vector for each edge of the input graph; a propagation neural network subsystem configured to update the node state representation vector associated with a particular node of the input graph based upon the node state representation vectors associated with one or more adjacent nodes in the input graph and the edge representation vectors associated with the edges connecting the particular node to the one or more adjacent nodes in the input graph; an aggregator neural network subsystem configured to process the node state representation vectors associated with each node of the input graph to generate the vector representation of the input graph.

3. The system of claim 2, wherein the encoder neural network subsystem comprises: a node encoder neural network configured to generate the node state representation vector; and an edge encoder neural network configured to generate the edge representation vector.

4. The system of claim 3, wherein the node encoder neural network comprises a plurality of hidden layers and an output layer; and the node state representation vector comprises a plurality of node state representation vectors for each node of the input graph comprising a vector corresponding to each of the plurality of hidden layers and the output layer.

5. The system of claim 2, wherein the propagation neural network subsystem comprises: a message generation neural network configured to process the node state representation vector associated with a particular node and the node state representation vector associated with an adjacent node in the input graph and the edge representation vector associated with the edge connecting the particular node and the adjacent node to generate a message vector associated with the adjacent node in the input graph; and a node update neural network configured to generate an updated node state representation vector associated with the particular node based upon the current node state representation vector associated with the particular node and message vectors generated for each adjacent node adjacent the particular node in the input graph.

6. The system of claim 5, wherein the node update neural network is further configured to process a summation of the message vectors generated for each adjacent node adjacent the particular node in the input graph to generate the updated node state representation vector.

7. The system of claim 6, wherein the summation is a weighted sum or an attention-based weighted sum.

8. The system of claim 5, wherein the node update neural network is a recurrent type of neural network.

9. The system of claim 5, wherein the propagation neural network subsystem is further configured to generate a cross-graph matching vector based upon a similarity between a particular node of the input graph and one or more nodes of a second input graph; and wherein the node update neural network is further configured to generate an updated node state representation vector associated with the particular node based upon the current node state representation vector associated with the particular node, the message vectors generated for each adjacent node adjacent the particular node in the input graph, and the cross-graph matching vector.

10. The system of claim 9, wherein the similarity between a particular node of the input graph and one or more nodes of the second input graph is based upon a difference between the node state representation vector associated with the particular node of the input graph and a weighted sum of the node state representation vectors associated with each of the one or more nodes of the second input graph.

11. The system of claim 10, wherein the weighted sum is a weighted sum of all of the node state representation vectors associated with each of the nodes of the second input graph.

12. The system of claim 11, wherein the weight for each respective node of the one or more nodes of the second input graph is based upon a similarity score between the node state representation vector associated with the particular node of the input graph and the node state representation vector associated with the respective node of the second input graph.

13. The system of claim 2, wherein the node state representation vectors associated with each node of the input graph undergo a plurality of updates.

14. The system of claim 4, wherein the one or more neural networks comprises a propagation neural network subsystem for each of the plurality of hidden layers and the output layer of the node encoder neural network.

15. The system of claim 2, wherein the aggregator neural network subsystem comprises an aggregator neural network configured to process a node state representation vector to generate a transformed vector associated with the node; and wherein the vector representation of the input graph is based upon a summation of the transformed vectors associated with each node of the input graph.

16. The system of claim 15, wherein the summation of the transformed vectors is a weighted summation.

17. The system of claim 1, wherein the similarity score is based upon one of the following: a Euclidean distance, a cosine similarity or a Hamming distance.

18. The system of claim 1, wherein the one or more neural networks are trained based upon optimization of a loss function based upon a similarity between a pair of graphs.

19. The system of claim 1, wherein the one or more neural networks are trained based upon optimization of a loss function based upon a relative similarity between a triplet of graphs.

20. The system of claim 18, wherein the loss function is a margin-based loss function.

21. The system of claim 1, wherein the graph represents one of the following: a molecule, a computer program function, a parse tree, a computer network, a vehicular traffic network, and a knowledge graph.

22. A method of determining graph similarity, the method comprising: receiving, by one or more processors, a first graph; receiving, by the one or more processors, a second graph; generating, by one or more neural networks, a vector representation of the first graph; generating, by the one or more neural networks, a vector representation of the second graph, wherein generating, by the one or more neural networks, a vector representation of a graph comprises: processing, by the one or more neural networks, the graph to generate a node state representation vector for each node of the graph and an edge representation vector for each edge of the graph; and processing, by the one or more neural networks, the node state representation vectors and the edge representation vectors to generate the vector representation of the graph; determining, by the one or more processors, a similarity score based upon the vector representations of the first graph and the second graph, wherein the one or more neural networks are configured to process a pair of input graphs to generate a vector representation of each input graph of the pair of input graphs, the vector representation of each input graph being based upon both input graphs of the pair of graphs; and wherein the vector representation of the first graph and the vector representation of the second graph are generated based upon inputting the first graph and the second graph as a pair of input graphs to the one or more neural networks.

23. The system of claim 1, wherein the first graph is a first control flow graph associated with a first binary function having a known vulnerability; and wherein the second graph is a second control flow graph associated with a second binary function; and wherein the one or more processors are further configured to: determine that the second binary function is vulnerable if the determined similarity score exceeds a threshold similarity score.

24. The system of claim 23, wherein a node of a control flow graph represents a block of instructions and an edge of a control flow graph represents control flow between blocks of instructions.

25. The system of claim 23, wherein the neural network system is trained based upon a similarity metric such that control flow graphs associated with a binary function having the same source code have higher similarity than control flow graphs associated with a binary function having different source code.

26. The system of claim 24, wherein each node of the control flow graph is associated with a feature vector based upon instruction types associated with the block of instructions represented by the node.

27. The system of claim 26, wherein the one or more neural networks further comprises a neural network configured to generate the feature vector based upon the instruction types associated with the block of instructions represented by the node.

28. The system of claim 1 further comprising: a memory storing a database comprising a plurality of records, each record of the plurality of records being associated with a respective graph; and wherein the first graph is a query graph; and wherein the second graph is a respective graph associated with a respective record of the plurality of records; and wherein determining the similarity score for the first graph and the second graph comprises determining a similarity score between the query graph and each of the respective graphs associated with each record of the plurality of records based upon the vector representations of the query graph and the respective graphs; and wherein the one or more processors are further configured to: output data associated with one or more records based upon the determined similarity scores.

29. The system of claim 28, wherein the one or more neural networks are further configured to process a pair of input graphs to generate a vector representation of each input graph of the pair of input graphs, the vector representation of each input graph being based upon both input graphs of the pair of graphs; and wherein the one or more processors are further configured to: determine a set of candidate graphs based upon the determined similarity scores between the vector representations of the query graph and each respective graph associated with each of the plurality of records; determine a second vector representation for each query graph and candidate graph pair in the set of candidate graphs using the one or more neural networks; determine a similarity score for each query graph and candidate graph pair based upon the determined second vector representations; and outputting data associated with the record associated with a candidate graph based upon the determined similarity scores for each query graph and candidate graph pair.

30. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to implement a neural network system for determining graph similarity, the neural network system comprising: one or more neural networks configured to: process an input graph to generate a node state representation vector for each node of the input graph and an edge representation vector for each edge of the input graph; and process the node state representation vectors and the edge representation vectors to generate a vector representation of the input graph; one or more processors configured to: receive a first graph; receive a second graph; generate a vector representation of the first graph using the one or more neural networks; generate a vector representation of the second graph using the one or more neural networks; determine a similarity score for the first graph and the second graph based upon the vector representations of the first graph and the second graph, wherein the one or more neural networks are configured to process a pair of input graphs to generate a vector representation of each input graph of the pair of input graphs, the vector representation of each input graph being based upon both input graphs of the pair of graphs; and wherein the vector representation of the first graph and the vector representation of the second graph are generated based upon inputting the first graph and the second graph as a pair of input graphs to the one or more neural networks.

Description

BRIEF DESCRIPTION OF THE DRAWINGS

(1) FIG. 1 shows an example neural network system for determining graph similarity.

(2) FIG. 2 shows exemplary neural network subsystems of the neural network system in more detail.

(3) FIG. 3 shows an exemplary configuration of the propagation neural network subsystem.

(4) FIG. 4 shows another exemplary configuration of the propagation neural network subsystem.

(5) FIGS. 5A and 5B show visualizations of attention weights for two pairs of exemplary graphs.

(6) FIG. 6 is a flow chart of an example process for determining graph similarity.

(7) FIG. 7 shows two exemplary control flow graphs.

DETAILED DESCRIPTION

(8) FIG. 1 shows an example neural network system 100 for determining graph similarity. The neural network system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented. As such, the system comprises one or more processors.

(9) The one or more processors are configured to receive a first graph 101 and a second graph 102 of which it is desired to determine a similarity between. The one or more processors are further configured to generate a vector representation 103 of the first graph and a vector representation 104 of the second graph using one or more neural networks 105. The one or more processors are also configured to determine a similarity score 106 for the first graph 101 and the second graph 102 based upon the generated vector representations 103, 104 of the first and second graphs. The similarity score 106 may be determined using a similarity scorer subsystem 107. The computation of a similarity score 106 is described in more detail below.

(10) The system 100 also comprises the one or more neural networks 105 for generating a vector representation of an input graph. In this regard, the one or more neural networks 105 are configured to process an input graph to generate a node state representation vector for each node of the input graph and an edge representation vector for each edge of the input graph. The one or more neural networks 105 are further configured to process the node state representation vectors and the edge representation vectors to generate a vector representation of the input graph.

(11) It is also possible for the one or more neural networks 105 to process a pair of input graphs together to generate a vector representation of each input graph of the pair of input graphs. Thus, the vector representation of each input graph is based upon both input graphs of the pair of input graphs rather than processing each input graph separately and generating the vector representation of the input graphs independently.

(12) That is, the one or more neural networks 105 may be configured to generate the vector representation of the first graph 103 and the vector representation of the second graph 105 by processing the first graph 101 and the second graph 102 individually. Where a vector representation of an input graph is obtained by processing an individual input graph, this is referred to as a graph embedding model. Alternatively, the one or more neural networks 105 may be configured to generate the vector representation of the first graph 103 and the vector representation of the second graph 105 by processing the first graph 101 and the second graph 102 together. This case is referred to as a graph matching model. Both embedding models and matching models will be described in more detail below.

(13) Referring now to FIG. 2, in both graph embedding and graph matching models, the one or more neural networks 105 may comprise an encoder neural network subsystem 201, a propagation neural network subsystem 202 and an aggregator neural network subsystem 203.

(14) The encoder neural network subsystem 201 may be configured to process an input graph 204 to generate the node state representation vector 205 for each node of the input graph and the edge representation vector 206 for each edge of the input graph. For both the graph embedding and graph matching models, input graphs are typically processed individually by the encoder neural network subsystem 201 and thus the node state representation vectors 205 and the edge representation vectors 206 for an input graph 204 are generated based upon the input graph 204 independent of any second input graph. In the graph matching model, the first and second input graphs may be processed in parallel to generate the node state representation vectors and edge representation vectors for the first and second input graphs at the same time.

(15) The encoder neural network subsystem 201 may further comprise a node encoder neural network configured to generate the node state representation vector and an edge encoder neural network configured to generate the edge representation vector. For example, the node encoder neural network may be a multi-layer perceptron network and the node state representation vector for a node may be obtained as follows:
h.sub.i.sup.(0)=MLP.sub.node(x.sub.i),∀i∈V (1)
where V is the set of nodes of the graph, i is an index over the set of nodes, x.sub.i is a feature vector associated with node i, MLP.sub.node is the node encoder neural network and h.sub.i.sup.(0) is the node state representation vector for node i. Where nodes do not have associated feature vectors, x.sub.i may, for example, be initialized to a vector of 1s.

(16) In a similar manner, the edge encoder neural network may also be a multi-layer perceptron network and the edge representation vector for an edge may be obtained as follows:
e.sub.ij=MLP.sub.edge(x.sub.ij),∀(i,j)∈E (2)

(17) where E is the set of edges of the graph, (i,j) is an edge in the set connecting a node i and node j of the graph, x.sub.ij is feature vector associated with edge (i,j), MLP.sub.edge is the edge encoder neural network and e.sub.ij is the edge representation vector for edge (i,j). Where edges do not have associated feature vectors, x.sub.ij may, for example, be initialized to a vector of 1s.

(18) The node encoder neural network may comprise a plurality of hidden layers and an output layer. The node state representation vector may be a concatenation of the hidden layer activations and the output layer activations. As such, the node state representation vector may comprise a plurality of node state representation vectors for each node of the input graph and may comprise a vector corresponding to each of the plurality of hidden layers and the output layers. Alternatively, the node state representation vector may comprise any subset of the hidden layer and output layer vectors concatenated.

(19) The propagation neural network subsystem 202 may be configured to update the node state representation vector 205 associated with a particular node of the input graph based upon the node state representation vectors associated with one or more adjacent nodes in the input graph and the edge representation vectors associated with the edges connecting the particular node to the one or more adjacent nodes in the input graph. That is, for a particular node, the node state representation vector 205 may be updated based upon the nodes and edges that the particular node is connected to. In this way, the node state representation vector 205 for a particular node is updated based upon information from the particular node's local neighbourhood. The node state representation vector is therefore also based upon the structure of the graph rather than treating a graph as just an independent set of nodes. Further details of this updating are provided below with reference to FIGS. 3 and 4.

(20) The aggregator neural network subsystem 203 may be configured to process the node state representation vectors associated with each node of the input graph to generate the vector representation 208 of the input graph 204. The node state representation vectors processed by the aggregator neural network subsystem 203 may be the updated node state representation vectors 207 output by the propagation neural network subsystem 202. Alternatively, the node state representation vectors 205 may be those output by the encoder neural network subsystem 201 or may be a combination of updated node state representation vectors for a subset of nodes and initial node state representation vectors for the rest of the nodes of the input graph 204.

(21) As an example, the vector representation 208 of the input graph 204 may be obtained as follows:

(22) $\begin{matrix} h_{G} = {MLP}_{G} (\underset{i \in V}{.Math.} σ ({MLP}_{gate} (h_{i}^{(T)})) ⊙ MLP (h_{i}^{(T)})) & (3) \end{matrix}$

(23) where h.sub.i.sup.(T) is the node state representation vector for node i after T rounds of updates, ⊙ is an element-wise vector multiplication, MLP is an aggregator neural network configured to process a node state representation vector to generate a transformed vector associated with the node, MLP.sub.gate is a multi-layer perceptron configured to output a gating vector for node i based upon its node state representation vector, MLP.sub.G is a further multi-layer perceptron and h.sub.G is the vector representation of the input graph. In general terms, the above equation (3) computes a vector representation of the input graph by processing a weighted sum of the node state representation vectors of each node of the input graph through a neural network. MLP.sub.gate and MLP may comprise a single linear layer each having the same dimensionality as the required dimensionality of the vector representation of the graph. MLP.sub.G may comprise one hidden layer with ReLU non-linearity. It will be appreciated that other architectures and other non-linearities may be chosen as deemed appropriate by a person skilled in the art.

(24) Equation (3) may be used in both the graph embedding and graph matching models. In the graph matching model, once the node state representation vectors have been obtained and undergone a desired number of rounds of updating, the node state representation vectors for the first and second input graphs may be processed independently to obtain vector representations of the first and second input graphs respectively.

(25) The vector representation of a graph may be a vector of real-values or may be a vector of binary values. It will be appreciated that where the vector is binary valued, the binary values may be two values other than 0 and 1, and may, for example, be −1 and 1. A vector of binary values may be advantageous in cases where it is necessary to search through a large database of graphs with low latency where efficient nearest neighbor search algorithms may be applied.

(26) A similarity score 106 for any two graphs may be determined based upon a comparison of the vector representations of the two respective graphs. For example, the comparison may be based upon any vector space metric such as a Euclidean distance, cosine similarity or Hamming distance.

(27) Referring now to FIG. 3, an exemplary configuration of the propagation neural network subsystem 202 will be described in more detail. The propagation neural network subsystem 202 may comprise a message generation neural network 301 and a node update neural network 302.

(28) The message generation neural network 301 may be configured to process the node state representation vector associated with a particular node and the node state representation vector associated with an adjacent node in the input graph and the edge representation vector associated with the edge connecting the particular node and the adjacent node to generate a message vector associated with the adjacent node in the input graph. That is, for each node that a particular node is connected to, a message vector can be generated using the message generation neural network 301 as exemplified below:
m.sub.j.fwdarw.if.sub.message(h.sub.i.sup.(t),h.sub.j.sup.(t),e.sub.ij) (4)

(29) where m.sub.j.fwdarw.i is a generated message vector for the connection between node j and node i, h.sub.i.sup.(t) is the current node state representation vector for node i after the latest update round t, h.sub.j.sup.(t) is the current node state representation vector for node j after the latest update round t, e.sub.ij is the edge representation vector associated with the edge connecting nodes i and j, and f.sub.message is the message generation neural network 301. As an example, the message generation neural network 301 may be a multi-layer perceptron having one hidden layer with rectified linear units (ReLU). However, it will be appreciated that other forms of neural network may be suitable for implementing the message generation neural network 301. The inputs, h.sub.i.sup.(t), h.sub.j.sup.(t), e.sub.ij to the message generation neural network 301 may be concatenated together to form a single vector input.

(30) The node update neural network 302 may be configured to generate an updated node state representation vector associated with the particular node based upon the current node state representation vector associated with the particular node and the message vectors 303 generated for each adjacent node adjacent the particular node in the input graph. That is, a particular node's state vector representation may be updated based upon information obtained from adjacent nodes as exemplified below

(31) $\begin{matrix} h_{i}^{(t + 1)} = f_{node} (h_{i}^{(t)}, \underset{j : (j, i) \in E}{.Math.} m_{j .fwdarw. i}) & (5) \end{matrix}$

(32) where Σ.sub.j:(j,i)∈E m.sub.j.fwdarw.i is a sum of the message vectors associated with all nodes connected to node i, h.sub.i.sup.(t) is the current node state representation vector for node i after the latest update round t, h.sub.i.sup.(t+1) is the updated node state representation vector for node i, and f.sub.node is the node update neural network 302. As an example, the node update neural network 302 may be a multi-layer perceptron or a recurrent neural network comprising conventional recurrent neural network units (RNN), gated recurrent units (GRU), or long-term short term memory units (LSTM). It will be appreciated that the sum in the above equation (5) may be replaced by an alternative aggregation operator, for example, a mean, a maximum, or an attention based weighted sum amongst others.

(33) The system 100 may be configured to perform multiple rounds of node state vector representation updates. That is, as shown by the dashed line 304 in FIG. 3, an updated node state representation vector 207 may be fed back into the message generation neural network 301 for further processing to generate further message vectors in order to perform a second round of updating by the node update neural network 302. By performing multiple rounds of propagation, nodes may accumulate information in relation to the structure of the graph outside of the node's local neighbourhood. The number of rounds of updates may be chosen as deemed appropriate by a person skilled in the art. As an example, the number of rounds of updates may be between 1 and 5.

(34) Referring now to FIG. 4, an alternative configuration of the propagation neural network subsystem 202 will be described. In particular, this alternative configuration is particularly suitable for use as part of the graph matching model. This configuration also comprises a message generation neural network 401 and a node update neural network 402 along with an additional matching subsystem 403.

(35) The message generation neural network 401 may be configured to process the node state representation vectors 404, 406 and the edge representation vectors 405, 407 of a first and second input graph 101, 102 respectively to generate message vectors 408. The message generation neural network 401 is configured to generate message vectors 408 in a similar manner to the message generation neural network 301 of FIG. 3. For example, a message vector may be generated as follows:
m.sub.j.fwdarw.i=f.sub.message(h.sub.i.sup.(t),h.sub.j.sup.(t),e.sub.ij),∀(i,j)∈E.sub.1∪E.sub.2 (6)

(36) where E.sub.1 and E.sub.2 are the set of edges of the first input graph and the second input graph respectively with the other terms having the same meaning as equation (4) above. As there are no edges connecting nodes of the first and second graphs, the message generation neural network 401 of FIG. 4 effectively operates in the same way as the message generation neural network 301 of FIG. 3 when applied to two graphs individually.

(37) The matching subsystem 403 may be configured to generate a cross-graph matching vector 409 based upon a similarity between a particular node of the first input graph 101 and one or more nodes of the second input graph 102. In general terms, the cross-graph matching vector provides a measure of how well a node in one graph can be matched to one or more nodes in a second graph.

(38) As an example, the cross-graph matching vector may be obtained as follows:
μ.sub.j.fwdarw.i=f.sub.match(h.sub.i.sup.(t),h.sub.j.sup.(t)),∀i∈V.sub.1,j∈V.sub.2, or i∈V.sub.2,j∈V.sub.1 (7)

(39) where V.sub.1 and V.sub.2 are the set of nodes in the first and second graphs respectively, i and j are a pair of nodes, each node of the pair from different graphs for which a matching is to be performed, h.sub.i.sup.(t) and h.sub.j.sup.(t) are the respective current node state representation vectors of nodes i and j, f.sub.match is a chosen matching function, and μ.sub.j.fwdarw.i is the cross-graph matching vector for nodes i and j.

(40) The matching function, f.sub.match, (and hence the similarity between a node of one input graph and a node of another input graph) may be based upon a difference between the node state representation vector associated with the particular node of the input graph and a weighted sum of the node state representation vectors associated with each of the one or more nodes of the second input graph. For example, an attention based weighted sum may be used as follows:

(41) $\begin{matrix} a_{j .fwdarw. i} = \frac{\exp (s_{h} (h_{i}^{(t)}, h_{j}^{(t)}))}{{.Math.}_{j^{'}} \exp (s_{h} (h_{i}^{(t)}, h_{j^{'}}^{(t)}))} & (8) \\ μ_{j .fwdarw. i} = a_{j .fwdarw. i} (h_{i}^{(t)} - h_{j}^{(t)}) & (9) \end{matrix}$

(42) where a.sub.j.fwdarw.i is the attention weight for the pair of nodes i and j, the sum in the denominator runs over one or more nodes j′ in the graph comprising node j, and s.sub.h is vector-based similarity metric such as a Euclidean distance, cosine similarity or Hamming distance.

(43) FIGS. 5A and 5B show visualizations of attention weights for two pairs of exemplary graphs. In FIG. 5A, the pair of graphs has a graph edit distance of 1. In FIG. 5B, the pair of graphs has a graph edit distance of 2. The left-hand drawing of both FIGS. 5A and 5B shows the attention from the left graph to the right graph of the pair of graphs. The right-hand drawing of both FIGS. 5A and 5B show the attention from the right graph to the left graph of the same pair of graphs. The darker lines indicate a larger attention weight between a node of graph and a node of the other graph.

(44) The node update neural network 402 may be configured to generate an updated node state representation vector associated with a particular node based upon the current node state representation vector associated with the particular node, the message vectors generated for each adjacent node adjacent the particular node in its input graph, and the cross-graph matching vector 409. For example, an updated node state representation vector may be generated as follows:

(45) $\begin{matrix} h_{i}^{(t + 1)} = f_{node} (h_{i}^{(t)}, \underset{j}{.Math.} m_{j .fwdarw. i}, \underset{j^{'}}{.Math.} μ_{j^{'} .fwdarw. i}) & (10) \end{matrix}$

(46) where h.sub.i.sup.(t+1) is the updated node state representation vector for node i, f.sub.node is the node update network 402, h.sub.i.sup.(t) is the current node state representation vector for node i, Σ.sub.jm.sub.j.fwdarw.i is a sum of the message vectors associated with all nodes connected to node i, and Σ.sub.j,μ.sub.j′.fwdarw.i is a sum of the computed cross-graph matching vectors for node i and one or more nodes j′ of the other input graph. Where the cross-graph matching vectors are computed using the attention-based weighted sum of equation (9), the sum of the computed cross-graph matching vectors may be re-written as:

(47) $\begin{matrix} \underset{j^{'}}{.Math.} μ_{j^{'} .fwdarw. i} = \underset{j}{.Math.} a_{j^{'} .fwdarw. i} (h_{i}^{(t)} - h_{j^{'}}^{(t)}) = h_{i}^{(t)} - \underset{j^{'}}{.Math.} (a_{j^{'} .fwdarw. i} h_{j^{'}}^{(t)}) & (11) \end{matrix}$

(48) Similar to the node update neural network 302 of FIG. 3, the node update neural network 402 in this configuration may also be a multi-layer perceptron or a recurrent neural network comprising conventional recurrent neural network units (RNN), gated recurrent units (GRU), or long-term short term memory units (LSTM).

(49) Similar to the configuration of FIG. 3, multiple rounds of node state vector representation updates may be performed and as shown by the dashed lines 412, 413 in FIG. 4. Updated node state representation vectors 410, 411 for the first and second input graphs may be fed back into the message generation neural network 401 for further processing to generate further message vectors and into the matching subsystem 403 to generate further cross-graph matching vectors in order to perform additional rounds of updating by the node update neural network 402.

(50) The one or more neural networks of the neural network system 100 may be trained using a training dataset of graphs comprising example pairs or triplets of graphs. Where the training data comprises pairs of graphs, each pair has an associated label indicating whether the two graphs are similar or dissimilar. Where the training data comprises a triplet of graphs, the triplet may be labelled based upon a relative similarity, for example, whether a first graph of the triplet is more similar to the second or third graph of the triplet.

(51) The one or more neural networks may be trained based upon optimizing a loss function using gradient descent based algorithms. The loss function may be a margin-based loss function. For example, where the training dataset comprises pairs of graphs, the loss function may be a margin-based pairwise loss having the form:
L.sub.pair= custom character .sub.(G.sub.1.sub.,G.sub.2.sub.,t)[max{0,γ−t(1−d(G.sub.1,G.sub.2))}] (12)

(52) where γ>0 is a margin parameter, t is the label for the pair of graphs G.sub.1, G.sub.2, for example, −1 for a pair of dissimilar graphs, 1 for a pair of similar graphs, and d(G.sub.1, G.sub.2) is the Euclidean distance between the vector representations of the two graphs. The above loss function encourages d(G.sub.1, G.sub.2)<1−γ where the pair is similar (t=1), and d(G.sub.1, G.sub.2)<1+γ when the pair is dissimilar (t=−1).

(53) Where the training data comprises triplets of graphs having a relative similarity, the loss function may be a margin-based triplet loss having the form:
L.sub.triplet= custom character .sub.(G.sub.1.sub.,G.sub.2.sub.,G.sub.3.sub.)[max{0,d(G.sub.1,G.sub.2)−d(G.sub.1,G.sub.3)+γ}] (13)

(54) where the terms have the same meaning as in equation (12). The loss function encourages d(G.sub.1, G.sub.2) to be smaller than d(G.sub.1, G.sub.3) by at least a margin γ.

(55) Alternatively, rather than using the Euclidean distance as part of the loss function, it is possible to use the Hamming distance and to minimize the Hamming distance between similar graphs and maximize the Hamming distance for dissimilar graphs.

(56) For example, the loss functions may take the form:
L.sub.pair= custom character .sub.(G.sub.1.sub.,G.sub.2.sub.,t)[(t−s(G.sub.1,G.sub.2)).sup.2]/4 (14)
L.sub.triplet=.sub.(G.sub.1.sub.,G.sub.2.sub.,G.sub.3.sub.)[(s(G.sub.1,G.sub.2)−1).sup.2+(s(G.sub.1,G.sub.3))+1).sup.2}]/8 (15)

(57) and where

(58) $\begin{matrix} s (G_{1}, G_{2}) = \frac{1}{H} {.Math.}_{i = 1}^{H} \tanh (h_{G_{1} i}) .Math. \tanh (h_{G_{2} i}) & (16) \end{matrix}$
is an approximate average Hamming distance.

(59) Referring now to FIG. 6, processing for determining graph similarity will now be described. It will be appreciated that the processing may be performed by the neural network system 100. At step S601, a first graph 101 is received by one or more processors. At step S602, a second graph 102 is received by the one or more processors.

(60) A vector representation of the first graph is generated by one or more neural networks at step S603 and a vector representation of the second graph is generated by the one or more neural networks at step S604. Generating a vector representation of a graph may comprise processing the graph to generate a node state representation vector for each node of the graph and an edge representation vector for each edge of the graph by the one or more neural networks. The node state representation vectors and the edge representation vectors are then processed by the one or more neural networks to generate the vector representation of the graph. The operations for generating a vector representation of a graph may be carried out as described above with reference to FIGS. 2 to 4.

(61) At step S605, the one or more processors then determines a similarity score based upon the vector representations of the first and second graphs generated at steps S603 and S604.

(62) It will be appreciated that whilst the above processing is presented as being carried out in a particular order, it is not intended to limit to any particular ordering of steps and the above steps may be carried out in a different order. For example, the vector representation of the first graph may be determined prior to receipt of the second graph. It is also possible that steps are carried out in parallel rather than as a sequential process. For example, the generation of the first and second vector representations of the graph may be performed in parallel.

(63) The above described neural network system may be used as part of a system for binary function vulnerability detection. For example, where a program, library or portion of binary code has a known vulnerability, it is desirable to determine whether there exists other binary code that may be susceptible to the same vulnerability or where there exists a database of code having known vulnerabilities and it is desirable to determine whether a query piece of code is susceptible to any known vulnerabilities in the database. This is particularly useful where access to source code is not available, for example, when dealing with commercial or embedded software or suspicious executables. Alternatively, where source code is available, the system may also be applied to source code and is not limited to binary functions.

(64) A binary function may be represented as a control flow graph. A node in a control flow graph may be a basic block of assembly instructions with edges representing control flow, for example, a jump or return instruction used in branching, loops or function calls. A control flow graph may be obtained by processing the binary function using a disassembler and a code analyzer. FIG. 7 shows two exemplary control flow graphs 701, 702 which are deemed to be similar as they have been generated based upon the same function but using different compilers. FIG. 7 also shows a third exemplary control flow graph 703 which has been generated using a different function and is therefore dissimilar to the first and second control flow graphs 701, 702.

(65) This type of the problem is challenging as graphs having different structures may still be similar whilst graphs having small differences may not be similar.

(66) The system for binary function vulnerability detection may comprise one or more processors configured to receive a first control flow graph associated a first binary function having a known vulnerability and a second control flow graph associated with a second binary function to be tested. Using the neural network system described above, vector representations of the first and second control flow graphs may be obtained and a similarity score between them may be generated. The second binary function may be determined as having the same vulnerability as the first binary function if the similarity score exceeds a threshold. Further binary functions may be compared with the first binary function in a similar manner to determine their susceptibility. By using the neural network system as described above, it is possible to learn a graph representation and similarity scoring that is that improves performance over classical graph theoretical matching algorithms and hand crafted features.

(67) The system for binary function vulnerability detection may be trained based upon a similarity metric such that control flow graphs associated with a binary function having the same source code have higher similarity than control flow graphs associated with a binary function having different source code. For example, a training dataset may be generated by compiling source code using different compilers or different compiler optimization flags.

(68) A control flow graph may also be associated with a feature vector based upon instruction types associated with the block of instructions represented by the node. These feature vectors may also be generated using a neural network and trained jointly with the neural network system for determining graph similarity.

(69) In another example, the neural network system for determining graph similarity may be used as part of a system for retrieval of data. A graph may be provided as a query and a database may be searched to retrieve associated data. For example, the graph may be a representation of a molecule for which it is desired to find other molecules having similar properties.

(70) The system for retrieving data associated with a graph may comprise one or more processors configured to receive a query graph and to generate a vector representation of the query graph using the one or more neural network as described above. The system may further comprise a database comprising a plurality of records, each record of the plurality of records being associated with a respective graph. For each record of the plurality of records, the one or more processors may be configured to process the vector representation of the query graph and a vector representation associated with the respective graph associated with the record to determine a respective similarity score and to output data associated with one or more records based upon the determined similarity score.

(71) The above described neural network system may be used as part of a system for other technical applications. For example, a graph may represent a drug molecule and the system may be used to compare molecules e.g., for generating new potentially viable drugs. Each respective node of the graph may represent an atom of the molecule or a secondary structural element of a protein such as an alpha helix or beta sheet, i.e., a part of the molecule. Each respective edge of the graph may represent an interaction between nodes such as a chemical bond between the atoms of the molecule or one or more hydrogen bonds between the secondary structural elements, i.e., chemical bonds between the parts. The system may be trained on data representing molecules and optionally their known properties. The trained system may then be used to identify graphs representing other physically realistic molecules with the same or similar properties to those in the training data e.g., molecules that bind to a particular site and/or families of drugs. Based on the similarity score the system may be used to generate molecule such as a drug candidate, which may then be evaluated e.g., in silico or by synthesizing and testing the molecule in vitro or in vivo.

(72) In another example, a graph may represent a transportation network in which the nodes represent physical locations and the edges routing paths. The system may be used to determine a route by comparing graphs representing routing paths e.g., to configure an efficient transportation network. Following on from this the system may be used for controlling an object, e.g., a vehicle on a road or a robot in a warehouse, to travel along the determined route.

(73) In a similar way a graph may represent a computer network or electronic communications network in which the nodes represent entities on the network such as servers and routers and the edges routing paths. The system may then be used to determine a route for communications data e.g., a data packet, e.g., by comparing graphs representing different network configurations, and subsequently to route data e.g., the packet.

(74) In a further similar way a graph may represent an electronic circuit design e.g., an integrated circuit design, in which the nodes represent elements in the circuit such as transistors or logic elements and the edges routing paths between the nodes. The system may then be used to determine routes for communicating between or connecting the elements e.g., by comparing graphs representing different routing configurations, and may subsequently be used to build an electronic circuit to the design.

(75) In another similar way a graph may represent a static or moving physical structure in which the nodes represent elements of the structure and the edges connections between the elements. For example in the case of a building the nodes may represent e.g., floors, walls, roof, foundations, piers, decking and the edges connecting elements such as girders, beams, struts and ties. In another application the nodes may represent elements of a machine or deployable structure and edges their connections. The system may then be for comparing structures e.g., to design or evaluate a structure; the result may then be used to construct a structure to the design.

(76) In a further example a graph may represent a visual scene and may capture semantic relationships between objects/parts of objects. Thus the nodes may represent elements of the scene e.g., increasingly larger elements of the scene and the edges relationships between the elements e.g., to which objects elements of the scene belong. The system may then be used to compare different scenes on the basis of their semantic content, e.g., for identifying of classifying a scene or for identifying a group of objects defining an object or coherent part of a scene e.g., to facilitate object/scene manipulation (editing) or information extraction (interpretation).

(77) In a further example a graph may represent a parse trees or other linguistic structures for use in natural language processing and translation. For example the nodes may represent words, word pieces, or phrases, in a natural language and the edges their relations. The system may then be used to compare different pieces of natural language text on the basis of their semantic content.

(78) It will be appreciated that there exists many other technical applications for generated graph structures and representations based upon similarity.

(79) This specification uses the term “configured” in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.

(80) Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. The computer storage medium is not, however, a propagated signal.

(81) The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

(82) A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.

(83) As used in this specification, an “engine,” or “software engine,” refers to a software implemented input/output system that provides an output that is different from the input. An engine can be an encoded block of functionality, such as a library, a platform, a software development kit (“SDK”), or an object. Each engine can be implemented on any appropriate type of computing device, e.g., servers, mobile phones, tablet computers, notebook computers, music players, e-book readers, laptop or desktop computers, PDAs, smart phones, or other stationary or portable devices, that includes one or more processors and computer readable media. Additionally, two or more of the engines may be implemented on the same computing device, or on different computing devices.

(84) The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). For example, the processes and logic flows can be performed by and apparatus can also be implemented as a graphics processing unit (GPU).

(85) Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

(86) Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

(87) To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.

(88) Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.

(89) The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

(90) While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

(91) Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

(92) Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Deep neural network system for similarity-based graph representations

Assignee

Inventors

Cpc classification

Classification Explorer

G06N3/044

PHYSICS

Classification Explorer

G06V10/82

PHYSICS

Classification Explorer

G06F16/9024

PHYSICS

Classification Explorer

G06V30/1988

PHYSICS

Classification Explorer

G06F17/16

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06F18/24317

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06N3/045

PHYSICS

Classification Explorer

G06F21/563

PHYSICS

Classification Explorer

G06V10/426

PHYSICS

Classification Explorer

G06F18/22

PHYSICS

Classification Explorer

G06F21/577

PHYSICS

International classification

Classification Explorer

G08B23/00

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06F21/57

PHYSICS

Classification Explorer

G06F11/00

PHYSICS

Classification Explorer

G06F12/14

PHYSICS

Classification Explorer

G06F12/16

PHYSICS

Classification Explorer

G06F17/16

PHYSICS

Classification Explorer

G06K9/62

PHYSICS

Classification Explorer

G06F16/901

PHYSICS

Abstract

Claims

Description