Covariant Neural Network Architecture for Determining Atomic Potentials

Abstract

Methods and systems for computationally simulating an N-body physical system are disclosed. A compound object X having N elementary parts E may be decomposed into J subsystems, each including one or more of the elementary parts and having a position vector r.sub.j and state vector .sub.j. A mural network having J nodes each corresponding to one of the subsystems may be constructed, the nodes including leaf nodes, a non-leaf root node, and intermediate non-leaf nodes, each being configured to compute an activation corresponding to the state of a respective subsystem. Upon receiving input data for the parts E, each node may compute .sub.j from r.sub.j and .sub.j of its child nodes using a covariant aggregation rule representing .sub.j as a tensor that is covariant to rotations of the rotation group SO(3). A Clebsch-Gordan transform may be applied to reduce tensor products to irreducible covariant vectors, and .sub.j of the root node may be computed as output of the ANN.

Claims

1. A method for computationally simulating an N-body physical system, wherein the N-body physical system is represented mathematically as a compound object X having N elementary parts E={e.sub.i}, i=1, . . . , N, each e.sub.i representing one of the N bodies of the N-body physical system, wherein X is hierarchically decomposed into J subsystems, P.sub.j, j=1, . . . , J, each P.sub.j comprising one or more of the elementary parts of E, and wherein each P.sub.j is described by a position vector r.sub.j and an internal state vector .sub.j, the method being implemented on a computing device and comprising: constructing a hierarchical artificial neural network (ANN) having J nodes each corresponding to one of the J subsystems, the J nodes including m2 leaf nodes as ANN inputs, m=1 non-leaf root node as ANN output, and m1 intermediate non-leaf nodes, wherein each node is a neuron of the ANN and is configured to compute an activation corresponding to a different one of the internal state vectors .sub.j, and wherein: for each leaf node, .sub.j describes the internal state of a respective one of the P.sub.j subsystems having just a single elementary part e.sub.i, for each given intermediate non-leaf node, .sub.j describes the internal state of a respective one of the P.sub.j subsystems having 2k<N parts e.sub.i that are each comprised in a child node of the given intermediate non-leaf node, and for the root node, .sub.j describes the internal state of a subsystem P.sub.j having k=N elementary parts e.sub.i that are each comprised in a child node of the root node; at the computing device, receiving input data to the leaf nodes specifying respective position vectors and respective internal state vectors of the N elementary parts E; for each given non-leaf node, computing .sub.j from the position vectors and internal states of all the child nodes of the given non-leaf node according to a covariant aggregation rule that represents .sub.j as a tensor object that is covariant to rotations of the rotation group SO(3); applying a Clebsch-Gordan transform to reduce tensor products of the state vectors of the nodes to irreducible covariant vectors; and computing .sub.j of the root node as output of the ANN, to determine a simulation of the internal state of the N-body physical system.

2. The method of claim 1, wherein the tensor products of the state vectors and application of the Clebsch-Gordan transform comprise mathematical operations that are nonlinear, and wherein applying the Clebsch-Gordan transform to reduce the tensor products of the state vectors of the nodes to irreducible covariant vectors comprises applying the nonlinear operations in Fourier space.

3. The method of claim 1, wherein the m>2 leaf nodes form an input layer of the hierarchical ANN, the m=1 non-leaf root node forms an single-node output layer of the hierarchical ANN, and the m1 intermediate non-leaf nodes are distributed among m1 intermediate layers of the hierarchical ANN, and wherein the hierarchical ANN is one of: a strict tree structure, each successive layer after the input layer comprising one or more parent nodes of one or more child nodes that reside only in an immediately preceding layer; or a non-strict tree structure, each successive layer after the input layer comprising one or more parent nodes of one or more child nodes that reside among more than preceding layer.

4. The method of claim 3, wherein each given non-leaf node computing .sub.j from the position vectors and internal states of all the child nodes of the given non-leaf node comprises the given non-leaf node receiving the activation of each of its child nodes, the activation of each given child node comprising the internal state of the given child node.

5. The method of claim 1, wherein the J subsystems correspond to a hierarchy of substructures of the compound object X, from smallest to largest, the largest corresponding to the entirety of X, wherein each of the P.sub.j subsystems that has just a single elementary part e.sub.i corresponds to a single one of the smallest substructures, wherein the subsystem P.sub.j that has k=N elementary parts e.sub.i corresponds to the largest substructure, and wherein the P.sub.j subsystems that have 2k<N parts e.sub.i correspond to substructures between the smallest and largest.

6. The method of claim 1, wherein the J subsystems correspond to a hierarchy of substructures of the compound object X, and wherein each node of the hierarchical ANN corresponds to one of the substructures of the compound object X, wherein each respective non-leaf node corresponds to a respective substructure of the compound object X comprising the substructures of all of the child nodes of the respective non-leaf node, wherein each respective leaf node corresponds to a particular substructure of the compound object X comprising a single elementary part e.sub.i, and wherein the internal state of each given subsystem corresponds to a respective potential energy function due to physical interactions among the substructures of the child nodes of the node corresponding to the given subsystem.

7. The method of claim 6, wherein the hierarchical ANN comprises adjustable weights shared among two or more of the nodes, and wherein the method further comprises training the ANN to learn the potential energy functions of all of the subsystems by adjusting the weights of the nodes corresponding to the subsystems.

8. The method of claim 7, wherein training the ANN to learn the potential energy functions comprises: providing training data to the input layer, the training data comprising for the N-body physical system one or more known training sets, each including: (i) a given configuration of position vectors, and (ii) a known potential function for the given configuration; for each of the training sets, comparing a computed potential function output from the non-leaf root node with the known potential function for the given configuration; and based on the comparing, adjusting the weights to achieve agreement, to within a threshold level, between the computed potential functions and the known potential functions across the training sets.

9. The method of claim 8, wherein each of the training sets is at least one of empirical measurements of the N-body physical system, or ab initio computations of forces and energies of the N-body physical system.

10. The method of claim 1, wherein the compound object X is comprised of molecules, wherein each elementary part e.sub.i is an atom, and wherein .sub.j for each node represents atomic potentials and forces experienced by each corresponding subsystem P.sub.j due the presence and relative positions of each of the other P.sub.j subsystems.

11. A computing device configured for computationally simulating an N-body physical system, wherein the N-body physical system is represented mathematically as a compound object X having N elementary parts E={e.sub.i}, i=1, . . . , N, each e.sub.i representing one of the N bodies of the N-body physical system, wherein X is hierarchically decomposed into J subsystems, P.sub.j, j=1, . . . , J, each P.sub.j comprising one or more of the elementary parts of E, and wherein each P.sub.j is described by a position vector r.sub.j and an internal state vector .sub.j, the computing device comprising: one or more processors; and memory configured to store computer-executable instructions that, when executed by the one or more processors, cause the computing device to carry out computational operations including: constructing a hierarchical artificial neural network (ANN) having J nodes each corresponding to one of the J subsystems, the J nodes including m2 leaf nodes as ANN inputs, m=1 non-leaf root node as ANN output, and m1 intermediate non-leaf nodes, wherein each node is a neuron of the ANN and is configured to compute an activation corresponding to a different one of the internal state vectors .sub.j, and wherein: for each leaf node, .sub.j describes the internal state of a respective one of the P.sub.j subsystems having just a single elementary part e.sub.i, for each given intermediate non-leaf node, .sub.j describes the internal state of a respective one of the P.sub.j subsystems having 2k<N parts e.sub.i that are each comprised in a child node of the given intermediate non-leaf node, and for the root node, .sub.j describes the internal state of a subsystem P.sub.j having k=N elementary parts e.sub.i that are each comprised in a child node of the root node; receiving input data to the leaf nodes specifying respective position vectors and respective internal state vectors of the N elementary parts E; for each given non-leaf node, computing .sub.j from the position vectors and internal states of all the child nodes of the given non-leaf node according to a covariant aggregation rule that represents .sub.j as a tensor object that is covariant to rotations of the rotation group SO(3); applying a Clebsch-Gordan transform to reduce a tensor product of the state vectors of the nodes to irreducible covariant vectors; and computing .sub.j of the root node as output of the ANN to determine the internal state of the N-body physical system.

12. The computing device of claim 11, wherein the tensor products of the state vectors and application of the Clebsch-Gordan transform comprise mathematical operations that are nonlinear, and wherein applying the Clebsch-Gordan transform to reduce the tensor products of the state vectors of the nodes to irreducible covariant vectors comprises applying the nonlinear operations in Fourier space.

13. The computing device of claim 11, wherein the m2 leaf nodes form an input layer of the hierarchical ANN, the m=1 non-leaf root node forms an single-node output layer of the hierarchical ANN, and the m1 intermediate non-leaf nodes are distributed among m1 intermediate layers of the hierarchical ANN, wherein the hierarchical ANN is one of: a strict tree structure, each successive layer after the input layer comprising one or more parent nodes of one or more child nodes that reside only in an immediately preceding layer; or a non-strict tree structure, each successive layer after the input layer comprising one or more parent nodes of one or more child nodes that reside among more than preceding layer, and wherein each given non-leaf node computing .sub.j from the position vectors and internal states of all the child nodes of the given non-leaf node comprises the given non-leaf node receiving the activation of each of its child nodes, the activation of each given child node comprising the internal state of the given child node.

14. The computing device of claim 11, wherein the J subsystems correspond to a hierarchy of substructures of the compound object X, from smallest to largest, the largest corresponding to the entirety of X, wherein each of the P.sub.j subsystems that has just a single elementary part e.sub.i corresponds to a single one of the smallest substructures, wherein the subsystem P.sub.j that has k=N elementary parts e.sub.i corresponds to the largest substructure, and wherein the P.sub.j subsystems that have 2k<N parts e.sub.i correspond to substructures between the smallest and largest.

15. The computing device of claim 11, wherein the J subsystems correspond to a hierarchy of substructures of the compound object X, and wherein each node of the hierarchical ANN corresponds to one of the substructures of the compound object X, wherein each respective non-leaf node corresponds to a respective substructure of the compound object X comprising the substructures of all of the child nodes of the respective non-leaf node, wherein each respective leaf node corresponds to a particular substructure of the compound object X comprising a single elementary part e.sub.i, and wherein the internal state of each given subsystem corresponds to a respective potential energy function due to physical interactions among the substructures of the child nodes of the node corresponding to the given subsystem.

16. The computing device of claim 15, wherein the hierarchical ANN comprises adjustable weights shared among two or more of the nodes, and wherein the computational operations further comprise training the ANN to learn the potential energy functions of all of the subsystems by adjusting the weights of the nodes corresponding to the subsystems.

17. The computing device of claim 16, wherein training the ANN to learn the potential energy functions comprises: providing training data to the input layer, the training data comprising for the N-body physical system one or more known training sets, each including: (i) a given configuration of position vectors, and (ii) a known potential function for the given configuration; for each of the training sets, comparing a computed potential function output from the non-leaf root node with the known potential function for the given configuration; and based on the comparing, adjusting the weights to achieve agreement, to within a threshold level, between the computed potential functions and the known potential functions across the training sets.

18. The computing device of claim 17, wherein each of the training sets is at least one of empirical measurements of the N-body physical system, or ab initio computations of forces and energies of the N-body physical system

19. The computing device of claim 11, wherein the compound object X is comprised of molecules, wherein each elementary part e.sub.i is an atom, and wherein .sub.j for each node represents atomic potentials and forces experienced by each corresponding subsystem P.sub.j due the presence and relative positions of each of the other P.sub.j subsystems.

20. An article of manufacture comprising a non-transitory computer readable media having computer-readable instructions stored thereon for computationally simulating an N-body physical system, wherein the N-body physical system is represented mathematically as a compound object X having N elementary parts E={e.sub.i}, i=1, . . . , N, each e.sub.i representing one of the N bodies of the N-body physical system, wherein X is hierarchically decomposed into J subsystems, P.sub.j, j=1, . . . , J, each P.sub.j comprising one or more of the elementary parts of E, and wherein each P.sub.j is described by a position vector r.sub.j and an internal state vector .sub.j, and wherein the instructions, when executed by one or more processors of a computing device, cause the computing device to carry out operations including: constructing a hierarchical artificial neural network (ANN) having J nodes each corresponding to one of the J subsystems, the J nodes including m2 leaf nodes as ANN inputs, m=1 non-leaf root node as ANN output, and m1 intermediate non-leaf nodes, wherein each node is a neuron of the ANN and is configured to compute an activation corresponding to a different one of the internal state vectors .sub.j, and wherein: for each leaf node, .sub.j describes the internal state of a respective one of the P.sub.j subsystems having just a single elementary part e.sub.i, for each given intermediate non-leaf node, .sub.j describes the internal state of a respective one of the P.sub.j subsystems having 2k<N parts e.sub.i that are each comprised in a child node of the given intermediate non-leaf node, and for the root node, .sub.j describes the internal state of a subsystem P.sub.j having k=N elementary parts e.sub.i that are each comprised in a child node of the root node; receiving input data to the leaf nodes specifying respective position vectors and respective internal state vectors of the N elementary parts E; for each given non-leaf node, computing .sub.j from the position vectors and internal states of all the child nodes of the given non-leaf node according to a covariant aggregation rule that represents .sub.j as a tensor object that is covariant to rotations of the rotation group SO(3); applying a Clebsch-Gordan transform to reduce a tensor product of the state vectors of the nodes to irreducible covariant vectors; and computing .sub.j of the root node as output of the ANN to determine the internal state of the N-body physical system.

Description

BRIEF DESCRIPTION OF DRAWINGS

[0016] FIG. 1 depicts a simplified block diagram of an example computing device, in accordance with example embodiments.

[0017] FIG. 2 is a conceptual illustration of two types of tree-like artificial neural network, one strict tree-like and the other non-strict tree-like, in accordance with example embodiments.

[0018] FIG. 3A is a conceptual illustration of an N-body system, in accordance with example embodiments.

[0019] FIG. 3B is a conceptual illustration of an N-body system showing a second level of substructure, in accordance with example embodiments.

[0020] FIG. 3C is a conceptual illustration of an N-body system showing a third level of substructure, in accordance with example embodiments.

[0021] FIG. 3D is a conceptual illustration of an N-body system showing a fourth level of substructure, in accordance with example embodiments.

[0022] FIG. 3E is a conceptual illustration of an N-body system showing a fifth level of substructure, in accordance with example embodiments.

[0023] FIG. 3F is a conceptual illustration of a decomposition of an N-body system in terms of subsystems and internal states, in accordance with example embodiments.

[0024] FIG. 4A is a conceptual illustration of compositional scheme for a compound object representing an N-body system, in accordance with example embodiments.

[0025] FIG. 4B is a conceptual illustration of compositional neural network for simulating an N-body system, in accordance with example embodiments.

[0026] FIG. 5 is a flow chart of an example method, in accordance with example embodiments.

DETAILED DESCRIPTION

[0027] Example methods, devices, and systems are described herein. It should be understood that the words example and exemplary are used herein to mean serving as an example, instance, or illustration. Any embodiment or feature described herein as being an example or exemplary is not necessarily to be construed as preferred or advantageous over other embodiments or features unless stated as such. Thus, other embodiments can be utilized and other changes can be made without departing from the scope of the subject matter presented herein.

[0028] Accordingly, the example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations. For example, the separation of features into client and server components may occur in a number of ways.

[0029] Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.

[0030] Additionally, any enumeration of elements, blocks, or steps in this specification or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are carried out in a particular order.

I. INTRODUCTION

[0031] Example embodiments of a covariant hierarchical neural network architecture, referred to herein as N-body comp-nets, are described herein in terms of molecular structure, and in particular, atomic potentials of molecular systems. The example of such molecular systems provides a convenient basis for connecting analytic concepts of N-body comp-nets to physical systems that may be illustratively conceptualized. For example, a physical hierarchy of structures and substructures of molecular constituents (e.g., atoms) may lend itself to a descriptive visualization. Similarly, the concept of rotational and/or translational invariance (or, more generally, invariance to spatial transformations) may be easily grasped at a conceptual level in terms of the ability of a neural network to learn to recognized complex systems regardless of their spatial orientations when presented to the neural network. And consideration of learning atomic and/or molecular potentials of such systems can help tie the structure of the constituents to their physics in an intuitive manner. However, the example of molecular/atomic systems and potentials is not, and should not, be viewed as limiting with respect to either the analytical framework or the applicability of N-body comp-nets.

[0032] More specifically, the challenges described abovenamely the ability to recognize multiscale structure while maintaining invariance with respect to spatial transformationmay be met by the inventor's novel application of concepts of group representation theory to neural networks. The inventor's introduction of Clebsch-Gordan decompositions into hierarchically structured neural networks is one aspect of example embodiments described herein that makes N-body comp-nets broadly applicable to problems beyond the example of molecular/atomic systems and potentials. In particular, it supplies an analytical prescription for how neural networks may be constructed and/or adapted to simulate a wide range of physical systems, as well as address problems in areas such as computer vision, and computer graphics (and, more generally, point-cloud representations), among others.

[0033] In relation to physical systems described by way of example herein, neurons of an example N-body comp-net may be described as representing internal states of subsystems of a physical system being modeled. This too, however, is a convenient illustration that may be conceptually connected to the physics of molecular and/or atomic systems. Thus, in accordance with example embodiments, internal state may be a convenient computational representation of the activations of neurons of a comp-net. In other applications, the activations may be associated with other physical properties or analytical characteristics of the problem at hand. In either case (and in others), a common aspect of activations of a comp-net is the transformational properties provided by tensor representation and the Clebsch-Gordan decompositions it admits. These are aspects that enable neural networks to meet challenges that have previously vexed their operation. Practical applications of simulations of N-body comp-nets are extensive.

[0034] In relation to molecular structure and dynamics, N-body comp-nets may be used to learn, compute, and/or predict (in addition to potential energies) forces, metastable states, and transition probabilities. Applied or integrated in a context of larger structure, N-body comp-nets may be extended to areas of material design, such as tensile strength, design of new drug compounds, simulation of protein folding, design of new battery technologies and new types of photovoltaics. Other areas of applicability of N-body comp-nets may include prediction of protein-ligand interactions, protein-protein interactions, and properties of small molecules, including solubility and lipophilicity. Additional applications may also include protein structure prediction and structure refinement, protein design, DNA interactions, drug interactions, protein interactions, nucleic acid interactions, protein-lipid-nucleic acid interactions, molecule/ligand interactions, drug permeability measurements, and predicting protein folding and unfolding. As this list of examples suggests, N-body comp-nets may provide a basis for wide applicability, both in terms of the classes and/or types of specific problems tackled, and the conceptual variety of problems they can address.

II. EXAMPLE COMPUTING DEVICES

[0035] FIG. 1 is a simplified block diagram of a computing device 100, in accordance with example embodiments. As shown, the computing device 100 may include processor(s) 102, memory 104, network interface(s) 106, and an input/output unit 108. By way of example, the components are communicatively connected by a bus 110. The bus could also provide power from a power supply (not shown). In particular, computing device 100 may be configured to perform at least one function of and/or related to implementing all or portions of artificial neural networks 200, 202, and/or 400-B, machine learning system 700, and/or method 500, all of which are described below.

[0036] Memory 104 may include firmware, a kernel, and applications, among other forms and functions of memory. As described, the memory 104 may store machine-language instructions, such as programming code or non-transitory computer-readable storage media, that may be executed by the processor 102 in order to carry out operations that implement the methods, scenarios, and techniques as described herein and in accompanying documents and/or at least part of the functionality of the example devices, networks, and systems described herein. In some examples, memory 104 may be implemented using a single physical device (e.g., one magnetic or disc storage unit), while in other examples, memory 104 may be implemented using two or more physical devices. In some examples, memory 104 may include storage for one or more machine learning systems and/or one or more machine learning models as described herein.

[0037] Processors 102 may include one or more general purpose processors and/or one or more special purpose processors (e.g., digital signal processors (DSPs) or graphics processing units (GPUs). Processors 102 may be configured to execute computer-readable instructions that are contained in memory 104 and/or other instructions as described herein.

[0038] Network interface(s) 106 may provide network connectivity to the computing system 100, such as to the internet or other public and/or private networks. Networks may be used to connect the computing system 100 with one or more other computing devices, such as servers or other computing systems. In an example embodiment, multiple computing systems could be communicatively connected, and example methods could be implemented in a distributed fashion.

[0039] Client device 112 may be a user client or terminal that includes an interactive display, such as a GUI. Client device 112 maybe used for user access to programs, applications, and data of the computing device 100. For example, a GUI could be used for graphical interaction with programs and applications described herein. In some configurations, the client device 112 may itself be a computing device; in other configurations, the computing device 100 may incorporate, or be configured to operate as, a client device.

[0040] Database 114 may include input data, such as images, configurations of N-body systems, or other data used in the techniques described herein. Data could be acquired for processing and/or recognition by a neural network, including artificial neural networks 200, 202, and/or 400-B. The data could additionally or alternatively be training data, which may be input to a neural network, for training, such as determination of weighting factors applied at various layers of the neural network. Database 114 could be used for other purposes as well.

III. EXAMPLE ARTIFICIAL NEURAL NETWORKS FOR REPRESENTING STRUCTURED OBJECTS

[0041] Example embodiments of N-body neural networks for simulation and modeling may be described in terms of some of the structures and features of classical feed-forward neural networks. Accordingly, a brief review of classical feed-forward networks is presented below in order to provide a context for describing an example general purpose neural architecture for representing structured objects referred to herein as compositional networks.

[0042] A prototypical feed-forward neural network consists of some number of neurons { custom-character } arranged in L+1 distinct layers. Layer =0 is referred to as the input layer, and is where training and testing data enter the network, while the inputs of the neurons in layers =1, 2, . . . , L are the outputs {} of the neurons in the previous layer. Each neuron computes its output, also called its activation, using a simple rule such as

custom-character =(+)(1)

where the { custom-character } weights and {} biases are learnable parameters, while is a fixed nonlinearity, such a sigmoid function or a ReLU operator. The output of the network appears in layer L, also referred to as the output layer. As computational entities or constructs implemented as software or other machine language code executable on a computing device, such as computing device 100, neural networks are also commonly referred to as artificial neural networks or ANNs. The term ANN may also refer to a broader class of neural network architectures than feed-forward networks, and is used without loss of generality to refer to example embodiments of neural networks described herein.

[0043] During training of a feed-forward neural network, training data are input, and the output layer results are compared with the desired output by means of a loss function. The gradient of the loss may be back-propagated through the network to update the parameters, typically by some variant of stochastic gradient descent. During real-time or live operation, testing data, representing some object (e.g., a digital image) or system (e.g., a molecule) having an unknown a priori output result, are fed into the network. The result may represent a prediction by the network of the correct output result to within some prescribed statistical uncertainty, for example. The accuracy of the prediction may depend on the appropriateness of the network configuration for solving the problem, as well as the amount and/or quality of the training.

[0044] The neurons and layers of feed-forward neural networks may be arranged in tree-like structures. FIG. 2 is a conceptual illustration of two types of tree-like artificial neural network. In particular, ANN 200 depicts a feed-forward neural network having strict tree-like structure, while ANN 200 depicts a feed-forward neural network having non-strict tree-like. Both ANNs have an input layer 204 having four neurons f1, f2, f3, and f4, and an output layer 206 having a single neuron f11. Neurons are also referred to as nodes in describing their configuration and connections in a neural network. The four input neurons in the example are referred to as leaf-nodes, and the single output neuron is referred to as a root node.

[0045] In each ANN, neurons f5, f6, and f7 reside in a first hidden layer after the input layer 204, and neurons f8, f9, and f10 reside in a second hidden layer, which is also just before the output layer 206. The neurons in the hidden layers are also referred to as hidden node and/or non-leaf nodes. Note that the root node is also a non-leaf node. In addition, there could be ANNs having more than two hidden layers, or even just one hidden layer.

[0046] Input data IN.sub.1, IN.sub.2, IN.sub.3, and IN.sub.4 are input to the input neurons of each layer, and a single output D_OUT is output from the output neuron of each ANN. Connections between neurons (directed arrows in FIG. 2) correspond to activations fed forward from one neuron to the next. In particular, one or more nodes that provide input to a given node are referred to as child nodes of the given node, and the given node is referred to as the parent node of the child nodes. For the purposes of the discussion herein, strict tree-like ANNs, such as ANN 200, are distinguished from non-strict tree-like ANNs, such as ANN 202, by the types of parent-child connections they each have.

[0047] More specifically, in a strict tree-like ANN, the each child node of a parent node resides in a layer immediately prior to the layer in which the parent node resides. Three examples are indicated in ANN 200. Namely, f4 which is a child of f7 resides in the layer immediately prior to the f7's layer. Similarly, f7 which is a child of f10 resides in the layer immediately prior to the f10's layer, and f10 which is a child of f11 resides in the layer immediately prior to the f11's layer. It may be seen by inspection that the same relationship holds for all the connected nodes of ANN 200.

[0048] In a non-strict tree-like ANN, the each child node of a parent node resides in a layer prior to the layer in which the parent node resides, but it need not be the immediately prior layer. Three examples are indicated in ANN 202. Namely, f1 which is a child of f8 resides two layers ahead of f8's layer. Similarly, f4 which is a child of f10 resides two layers ahead of f10's layer. However, and f5 which is a child of f8 resides in the layer immediately prior to the f8's layer. Thus, a non-strict tree-like ANN may include a mix of inter-layer relationships.

[0049] Feed-forward neural networks (especially deep, i.e., ones with many layers) have been demonstrated to be quite successful in their predicative capabilities due in part to their ability to implicitly decompose complex objects into their constituent parts. This may be particularly the case for convolutional neural networks (CNNs), commonly used in computer vision. In CNNs, the weights in each layer are tied together, which tends to force the neurons to learn increasingly complex visual features, from simple edge detectors all the way to complex shapes such as human eyes, mouths, faces, and so on.

[0050] There has been recent interest in extending neural networks to learning from structured objects, such as graphs. A range of architectures have been proposed for this purpose, many of them based on various generalizations of the notion of convolution to these domains.

[0051] One particular architecture, which makes the part-based aspect of neural modeling very explicit, is that of compositional networks (comp-nets), introduced previously by the inventor. In accordance with example embodiments, comp-nets may represent a structured object X in terms of a decomposition of X into a hierarchy of parts, subparts, subsubparts, and so on, down to some number of elementary parts {e.sub.i}. Referring to the parts, subparts, subsubparts, and so on, simply as parts or subsystems P.sub.i, the decomposition may be considered as forming a so-called composition scheme of a collection of P that make up the hierarchy.

[0052] FIGS. 3A-3F illustrate conceptually the decomposition of an N-body physical system, such as a molecule, into an example hierarchy of subsystems. FIG. 3A first shows the example N-body physical system made up of constituent particles, such as atoms of a molecule. In the figure, and arrow point from a central particle and labeled F may represent the aggregate or total vector force on the central particle due to the physical interaction of the other particles. These might be electrostatic or other inter-atomic forces, for example.

[0053] By way of example, FIG. 3B shows a first level of the subsystem hierarchy. In the illustration, particular groupings of the particles represent a first level of subsystem or subparts. As shown, there appears to be six grouping, corresponding to six subparts. FIG. 3C show a next (second) level of the subsystem hierarchy, again by way of example. In this case, there are four groupings, corresponding to four subparts. Similarly, FIG. 3D shows the third level of groupings, this one having three subsystems, and FIG. 3E shows the top level of the hierarchy, having a single grouping that includes all of the lower level subsystems. In practice, the decomposition may be determined according to known or expected properties of the physical system under consideration. It will be appreciated, however, that the conceptual illustrations of FIGS. 3A-3E do not necessarily convey any such physical considerations. Rather, they merely depict the decomposition concept for the purposes of the discussion herein.

[0054] FIG. 3F illustrates how a decomposition scheme may be translated to a comp-net architecture. In accordance with example embodiments, each subsystem of a hierarchy may correspond to a node (or neuron) of an ANN in which successive layers represent successive layers of the compositional hierarchy. In accordance with at least some example embodiments, and as described in detail below, each subsystem may be described by a spatial position vector and an internal state vector. This is indicated by the labels r and |> in each of the subsystems shown in FIG. 3F. In the comp-net representation, the internal state vector of each subsystem may be computed as the activation of the corresponding neuron. The inputs to each non-leaf node may be the activation of one or more child nodes of one or more prior layers, each child node representing a subsystem of a lower level of the hierarchy.

[0055] Returning to consideration of the decomposition and the composition scheme, since each part P.sub.i can be a sub-part of more than one higher level part, the composition scheme is not necessarily a strict tree, but is rather a DAG (directed acyclic graph). An exact definition, in accordance with example embodiments, is as follows.

[0056] Definition 1. Let X be a compound object with n elementary parts ={e.sub.1, . . . , e.sub.n}. A composition scheme D for X is a directed acyclic graph (DAG) in which each node n.sub.i is associated with some subset P.sub.i of (these subsets are called the parts of X) in such a way that

[0057] 1. If n.sub.i is a leaf node, then P.sub.i contains a single elementary part e.sub.(i).

[0058] 2. D has a unique root node n.sub.i, which corresponds to the entire set {e.sub.1 . . . , e.sub.n}.

[0059] 3. For any two nodes n.sub.i and n.sub.j, if n.sub.i is a descendant of n.sub.j, then P.sub.iP.sub.j.

[0060] In accordance with example embodiments, a comp-net is a composition scheme that may be reinterpreted as a feed-forward neural network. In particular, in a comp-net each neuron n.sub.i also has an activation f.sub.i. For leaf nodes, f.sub.i may be some simple pre-defined vector representation of the corresponding elementary part e(i). For internal nodes, f.sub.i may be computed from the activations f.sub.ch.sub.1, . . . , f.sub.ch.sub.k of the children of n.sub.i by the use of some aggregation function (f.sub.ch.sub.1, . . . , f.sub.ch.sub.k) similar to equation (1). Finally, the output of the comp-net is the output of the root node n.sub.r.

[0061] FIGS. 4A and 4B further illustrate by way of example a translation from a hierarchical composition scheme of a compound object to a corresponding compositional neural network (comp-net). Specifically, FIG. 4A depicts a composition scheme 400-A in which leaf-nodes n.sub.1, n.sub.2, n.sub.3, and n.sub.4 of the first (lowest) level of the hierarchy correspond to single-element subsystems {e.sub.1}, {e.sub.2}, {e.sub.3}, and {e.sub.4}, respectively.

[0062] At the next (second) level up in the hierarchy, non-leaf nodes n.sub.5, n.sub.6, and n.sub.7 each contain two-element subsystems, each subsystem being built from a respective combination of two first-level subsystems. For example, as shown, n.sub.5 contains {e.sub.3, e.sub.4} from nodes n.sub.3 and n.sub.4, respectively. The arrows pointing from n.sub.3 and n.sub.4 to n.sub.5 indicate this relationship.

[0063] At the third level up, non-leaf nodes n.sub.8, n.sub.9, and n.sub.10 each contain three-element subsystems, each subsystem being built from a respective combination of subsystems from the previous levels. For example, as shown, n.sub.10 contains {e.sub.1, e.sub.4} from the two-element subsystem at n.sub.6, and {e.sub.2} from the single-element subsystem at n.sub.2. The arrows pointing from n.sub.6 and n.sub.2 to n.sub.10 indicate this relationship.

[0064] Finally, at the top level, the (non-leaf) root node n.sub.i contains all four elementary parts in subsystem {e.sub.i, e.sub.2, e.sub.3, e.sub.4} from the previous level. Note that subsystems at a given level above the lowest (leaf) level may overlap in terms of common (shared) elementary parts and/or common (shared) lower-level subsystems. It may also be seen by inspection that the example composition scheme 400-A corresponds to a non-strict tree-like structure.

[0065] FIG. 4B illustrates an example comp-net 400-B that corresponds to the composition scheme 400-A of FIG. 4A. In this illustration, the neurons {n.sub.1}, {n.sub.2}, . . . , {n.sub.r} of comp-net 400-B correspond, respectively, to the nodes n.sub.1, n.sub.2, . . . , n.sub.r of the composition scheme 400-A, and the arrow connecting the neurons correspond to the relationships between the nodes in the composition scheme. The neuron are also associated with respective activations f.sub.1, f.sub.2, . . . , f.sub.r, as shown. Thus, as illustrated in this example, the activations f.sub.3 and f.sub.4 are inputs to neuron (node) {n.sub.5}, which uses them in computing its activation f.sub.5.

[0066] The inventor has previously detailed the behavior of comp-nets under transformations of X, in particular, how to ensure that the output of the network is invariant with respect to spurious permutations of the elementary parts, whilst retaining as much information about the combinatorial structure of X as possible. This is significant in graph learning, where X is a graph, e.sub.1, . . . , e.sub.n are its vertices, and {P.sub.i} are subgraphs of different radii. The proposed solution, covariant compositional networks (CCNs), involves turning the {f.sub.i} activations into tensors that transform in prescribed ways with respect to permutations of the elementary parts making up each P.sub.i.

[0067] Referring again to FIG. 3F, the activations of the nodes of a comp-net may describe states of the subsystems corresponding to the nodes. In a representation of a physical N-body system, the computation of the state of a given node may characterize physical interactions between the constituent subsystems of the given node. In accordance with example embodiments, and as described in detail below, the activations are constructed to ensure that the states are tensorial objects with spatial character, and in particular that they are covariant with rotations in the sense that they transform under rotations according to specific irreducible representations of the rotation group.

IV. ANALYTICAL DESCRIPTION OF, AND THEORETICAL BASES FOR, COVARIANT COMP-NETS

[0068] A. Compositional Models for Atomic Environments

[0069] Decomposing complex systems into a hierarchy of interacting subsystems at different scales is a recurring theme in physics, from coarse graining approaches to renormalization group theory. The same approach applied to the atomic neighborhood lends itself naturally to learning force fields. For example, to calculate the aggregate force on the central atom, in a first approximation one might just sum up independent contributions from each of its neighbors. In a second approximation, one would also consider the modifying effect of the local neighborhoods of the neighbors. A third order approximation would involve considering the neighborhoods of the atoms in these neighborhoods, and so on.

[0070] The inventor has recognized that the compositional networks formalism is thus a natural framework for force field learning. In particular, comp-nets may be considered in which the elementary parts correspond to actual physical atoms, the internal nodes correspond to subsystems P.sub.i made up of multiple atoms. In accordance with example embodiments, the corresponding activation, now denoted .sub.i, and referred to herein as the state of P.sub.i, may effectively be considered a learned coarse grained representation of P.sub.i. What makes physical problems different from, such as learning graphs, however, is their spatial character. In particular: [0071] 1. Each subsystem P.sub.i may now also be associated with a vector r.sub.i custom-character .sup.3 specifying its spatial position. [0072] 2. The interaction between two subsystems P.sub.i and P.sub.j depends not only on their relative positions, but also on their relative orientations. Therefore, .sub.i and .sub.j must also have spatial character, somewhat similarly to the terms of the monopole, dipole, quadrupole, etc. expansions, for example.
If the entire the atomic environment is rotated around the central atom by some rotation RSO(3).sup.3, the position vectors transform as r.sub.i custom-character Rr.sub.i. Mathematically, the second point above says that the .sub.i activations (states) must also transform under rotations in a predictable way, which is expressed by saying that they must be rotationally covariant.

[0073] Group Representations and N-Body Networks

[0074] Just as covariance to permutations is a critical constraint on the graph CCNs, covariance to rotations is the guiding principle behind CCNs for learning atomic force fields. To describe this concept in its general form, a starting assumption may be taken to be that any given activation is representable as a d dimensional (complex valued) vector, and that the transformation that undergoes under a rotation R is linear, i.e., custom-character (R) for some matrix (R).

[0075] The linearity assumption is sufficient to guarantee that for R, RSO(3), (R)(R)=(RR). Complex matrix valued functions satisfying this criterion are called representations of the group SO(3). Standard theorems in representation theory indicate that any compact group G (such as SO(3)) has a sequence of so-called inequivalent irreducible representations .sub.0, .sub.1, . . . (irreps, for short), and that any other representation of G can be reduced into a direct sum of irreps in the sense that there is some invertible matrix C and sequence of integers .sub.0, .sub.1, . . . such that

(R)=C.sup.1[ custom-character (R)]C.(2)

[0076] Here custom-character is called the multiplicity of in , and r=(.sub.0, .sub.1, . . . ) is called the type of . Another feature of the representation theory of compact groups is that the irreps can always be chosen to be unitary, i.e., (R.sup.1)=(R).sup.1=(R).sup., where M.sup. denotes the Hermitian conjugate (conjugate transpose) of the matrix M. In the following it may be assumed that irreps satisfy this condition. If is also unitary, then the transformation matrix C will be unitary too, so C.sup.1 may be replaced with C.sup..

[0077] In the specific case of the rotation group SO(3), the irreps are sometimes called Wigner D-matrices. The custom-character =0 irrep consists of the one dimensional constant matrices .sub.0(R)=(1), the =0 irrep (up to conjugation) is equivalent to the rotation matrices themselves, while for general , assuming that (, , ) are the Euler angles of R, [(R)].sub.m,m=e.sup.im(, ), where are the well known spherical harmonic functions. In general, the dimensionality of custom-character is +1, i.e., (R).

[0078] Definition 2. custom-character .sup.d is said to be an SO(3)-covariant vector of type =(.sub.0, .sub.1, 2, . . . ) if under the action of rotations it transforms as

custom-character [(R)].(3)

Setting

= custom-character (4)

custom-character may be called the (l, m)-fragment of , and

custom-character =

may be called the custom-character 'th part of . A covariant vector of type =(0, 0, . . . , 0, 1), where the single 1 corresponds to .sub.k, may be called an irreducible vector of order k or an irreducible .sub.k-vector. Note that a first order irreducible vector is just a scalar.

[0079] A benefit of the above definition is that each fragment custom-character transforms in the very simple way (R). Note that the terms fragment and part are not necessarily standard in the literature, but are used here for being useful in describing covariant neural architectures. Also note that unlike equation (2), there is no matrix C in equations (3) and (4). This is because if a given vector w transforms according to a general representation whose decomposition does include a nontrivial C, this matrix may be easily be factored out by redefining as C. Here custom-character is sometimes also called the projection of to the 'th isotypic subspace of the representation space that lives in, and =.sup.0.sup.1 . . . is called the isotypic decomposition of . With these representation theoretic tools in hand, the concept of SO(3)-covariant N-body neural networks may be defined as follows.

[0080] Definition 3. Let S be a physical system made up of n particles .sub.1, . . . , .sub.n. An SO(3)-covariant N-body neural network N for S is a composition scheme D in which [0081] 1. Each node n.sub.j, which may also be referred to as a gate, is associated with [0082] (a) a physical subsystem P.sub.j of S; [0083] (b) a vector r.sub.j custom-character .sup.3 describing the spatial position of P.sub.j; [0084] (c) a vector .sub.j that that describes the internal state of P.sub.j and is type .sub.j covariant to rotations. [0085] 2. If n.sub.j is a leaf node, then .sub.j is determined by the corresponding particle .sub.J. [0086] 3. If n.sub.j is a non-leaf node and its children are n.sub.ch.sub.1, . . . , n.sub.ch.sub.k, then .sub.j is computed as

.sub.j=.sub.j({circumflex over (r)}.sub.ch.sub.1, . . . ,{circumflex over (r)}.sub.ch.sub.k,{circumflex over (r)}.sub.ch.sub.1, . . . ,{circumflex over (r)}.sub.ch.sub.k,.sub.ch.sub.1, . . . ,.sub.ch.sub.k),(5) [0087] where {circumflex over (r)}.sub.ch.sub.1=r.sub.ch.sub.ir.sub.j and {circumflex over (r)}.sub.i=|{circumflex over (r)}.sub.i|. In the discussion herein, .sub.j is referred to as the local aggregation rule. [0088] 4. D has a unique root n.sub.r, and the output of the network, i.e., the learned state of the entire system is .sub.r. In the case of learning scalar valued functions, such as the atomic potential, .sub.r is just a scalar.

[0089] In accordance with example embodiments, Definition 3 may be considered as defining a general architecture for learning the state of N-body physical systems with much wider applicability than just learning atomic potentials. Also in accordance with example embodiments the .sub.j aggregation rules may be defined in such a way as to guarantee that each y is SO(3)-covariant. This is what is addressed in the following section.

[0090] B. Covariant Aggregation Rules

[0091] To define the aggregation function to be used in SO(3)-covariant comp-nets, it may only be assumed that is a polynomial in the relative positions {circumflex over (r)}.sub.ch.sub.1, . . . , {circumflex over (r)}.sub.ch.sub.k, the constituent state vectors .sub.ch.sub.1 . . . , .sub.ch.sub.k and the inverse distances 1/{circumflex over (r)}.sub.ch.sub.1, . . . , 1/{circumflex over (r)}.sub.ch.sub.k. Specifically, it may be said that is a (P,Q,S)-order aggregation function if each component of =.sub.j({circumflex over (r)}.sub.ch.sub.1, . . . , {circumflex over (r)}.sub.ch.sub.k, {circumflex over (r)}.sub.ch.sub.1, . . . , {circumflex over (r)}.sub.ch.sub.k, .sub.ch.sub.1 . . . , .sub.ch.sub.k) is a polynomial of order at most p in each component of r.sub.ch.sub.i, a polynomial of at most q in each component of .sub.ch.sub.i, and a polynomial of order at most s in each 1/({circumflex over (r)}.sub.ch.sub.i). Any such can be expressed as

[00001] $\begin{matrix} (.Math.) = .Math. (_{p, q, s} .Math. r_{{ch}_{1}}^{{.Math.}_{p_{1}}} .Math. .Math. .Math. r_{{ch}_{k}}^{{.Math.}_{p_{k}}} .Math._{{ch}_{1}}^{{.Math.}_{q_{1}}} .Math. .Math. .Math._{{ch}_{k}}^{{.Math.}_{q_{k}}} .Math. {\hat{r}}_{{ch}_{1}}^{- s_{1}} .Math. .Math. .Math. {\hat{r}}_{{ch}_{k}}^{- s_{k}}), & (6) \end{matrix}$

where p, q and s are multi-indices of positive integers with p.sub.iP, q.sub.iQ and s.sub.iS, and custom-character is a linear function. The tensor products appearing in equation (6) are formidably large object and in most cases may be impractical to compute explicitly. Accordingly, this equation is meant to emphasize that any learnable parameters of the network must be implicit in the linear operator custom-character .

[0092] The more stringent requirements on custom-character arise from the covariance criterion. The inventor has recognized that understanding these may be aided by the observation that for any sequence .sub.1, . . . , .sub.p of (not necessarily irreducible) representations of a compact group G, their tensor product

(R)=.sub.1(R).Math..sub.2(R).Math. . . . .Math..sub.p(R)

is also a representation of G. Consequently, has a decomposition into irreps, similar to equation (2). As an immediate corollary, any product of SO(3) covariant vectors can be similarly decomposed. In particular, by applying the appropriate unitary matrix C, the sum of tensor products appearing in equation (6) can be decomposed into a sum of irreducible fragments in the form

[00002] $_{= 0}^{} .Math._{m = 1}^{_{}} .Math._{m}^{} = C (_{p, q, s} .Math. r_{{ch}_{1}}^{{.Math.}_{p_{1}}} .Math. .Math. .Math. r_{{ch}_{k}}^{{.Math.}_{p_{k}}} .Math._{{ch}_{1}}^{{.Math.}_{q_{1}}} .Math. .Math. .Math._{{ch}_{k}}^{{.Math.}_{q_{k}}} .Math. {\hat{r}}_{{ch}_{1}}^{- s_{1}} .Math. .Math. .Math. {\hat{r}}_{{ch}_{k}}^{- s_{k}}) .$

More explicitly,

[00003] $\begin{matrix} _{m}^{} = T_{m}^{} (_{p, q, s} .Math. r_{{ch}_{1}}^{{.Math.}_{p_{1}}} .Math. .Math. .Math. r_{{ch}_{k}}^{{.Math.}_{p_{k}}} .Math._{{ch}_{1}}^{{.Math.}_{q_{1}}} .Math. .Math. .Math._{{ch}_{k}}^{{.Math.}_{q_{k}}} .Math. {\hat{r}}_{{ch}_{1}}^{- s_{1}} .Math. .Math. .Math. {\hat{r}}_{{ch}_{k}}^{- s_{k}}), & (7) \end{matrix}$

where T.sub.1.sup.0, . . . , T.sub..sub.0.sup.0, T.sub.1.sup.1, . . . , T.sub..sub.2.sup.0, . . . , T.sub..sub.L.sup.L is an appropriate sequence of projection operators. In accordance with example embodiments, the following proposition may provide a foundational result.

[0093] Proposition 1. The output of the aggregation function of equation (6) is a -covariant vector if and only if custom-character is of the form

custom-character ( . . . )=.sub.m.sub..(8)

[0094] Equivalently, collecting all custom-character fragments with the same into a matrix , all (.sub.,m).sub.m,m weights into a matrix and reinterpreting the output of as a collection of matrices rather than a single long vector, equation (8) may be expressed as

custom-character ( . . . )=({tilde over (F)}.sup.0W.sup.0,{tilde over (F)}.sup.1W.sup.1, . . . ,{tilde over (F)}.sup.LW.sup.L).(9)

[0095] Proposition 1 indicates that custom-character is only allowed to mix fragments with the same , and that fragments can only be mixed in their entirety, rather than picking out their individual components. These are fundamental consequences of equivariance. However, there are no further restrictions on the ( mixing matrices.

[0096] In accordance with example embodiments, in an N-body neural network, the custom-character matrices are shared across (some subsets of) nodes, and it is these mixing (weight) matrices that the network learns from training data. The matrices can be regarded as generalized matrix valued activations. Since each interacts with the matrices linearly, the network can be trained the usual way by backpropagating gradients of whatever loss function is applied to the output node n.sub.r, whose activation may typically be scalar valued.

[0097] It may be noted that N-body neural networks have no additional nonlinearity outside of , since that would break covariance. In contrast, in most existing neural network architectures, as explained above, each neuron first takes a linear combination of its inputs weighted by learned weights and then applies a fixed pointwise nonlinearity, a. In accordance with architecture of N-body neural networks as described by way of example herein, the nonlinearity is hidden in the way that the custom-character fragments are computed, since a tensor product is a nonlinear function of its factors. On the other hand, mixing the resulting fragments with the weight matrices is a linear operation. Thus, in N-body neural networks as describe herein, the nonlinear part of the operation precedes the linear part.

[0098] The generic polynomial aggregation function of equation (6) may be too general to be used in a practical N-body network, and may be too costly computationally. Instead, in accordance with example embodiments, a few specific types of low order gates may be used, such as those described below

[0099] Zeroth Order Interaction Gates

[0100] Zeroth order interaction gates aggregate the states of their children and combine them with their relative position vectors, but do not capture interactions between the children. A simple example of such a gate would be one where

( . . . )= custom-character (.sub.i=1.sup.k(.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.i),.sub.i=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sup.1(.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.i),.sub.i=1.sup.k{circumflex over (r)}.sub.ch.sub.i(.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.i)).(10)

[0101] Note that the summations in these formulae ensure that the output is invariant with respect to permuting the children and also reduce the generality of equation (6) because the direct sum is replaced by an explicit summation (this can also be interpreted as tying some of the mixing weights together in a particular way). Let L be the largest custom-character for which 0 in the inputs. In the L=0 case each .sub.ch.sub.i state is a scalar quantity, such as electric charge. In the L=1 case it is a vector, such as the dipole moment. In the L=2 case it can encode the quadropole moment, and so on. A gate of the above form can learn how to combine such moments into a single (higher order) moment corresponding to the parent system.

[0102] It may be instructive to see how many parameters a gate of this type has. For this purpose, the simple case that each .sub.ch.sub.i is of type r=(1, 1, . . . , 1) (up to custom-character =L) may be assumed. The type of {circumflex over (r)}.sub.ch.sub.i is (0, 1). According to the Clebsch-Gordan rules, as described in more detail below, the product of two such vectors is a vector of type (1, 3, 2, . . . , 2, 1) (of length L+1). It may be further assume that desired output type is again =(1, 1, . . . , 1) of length L. This means that the custom-character =L+1 fragment does not even have to be computed, and the size of the weight matrices appearing in equation (9) are

W.sub.0 custom-character .sup.13 W.sub.1.sup.19 W.sub.2.sup.16 . . . W.sub.L.sup.16.

[0103] The size of these matrices changes dramatically as more channels are allowed. For example, if each of the input states are of type =(c, c, . . . , c), the type of .sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.i becomes (c, 3c, 2c, . . . , 2c, 1c). Assuming again an output of type =(c, c, . . . , c), the weight matrices become custom-character

W.sub.0 custom-character .sup.c3c W.sub.1.sup.c9c W.sub.2.sup.c6c . . . W.sub.L.sup.c6c.

[0104] In many networks, however, the number of channels increases with height in the network. Allowing the output type to be as rich as possible, without inducing linear redundancies, the output type becomes (3c, 9c, 6c, . . . , 6c, 3c), and

W.sub.0 custom-character .sup.3c3c W.sub.1.sup.9c9c W.sub.2.sup.9c6c . . . W.sub.L.sup.6c6c.

[0105] First Order Interaction Gates

[0106] In first order interaction, gates each of the children interact with each other, and the parent aggregates these pairwise interactions. A simple example would be computing the total energy of a collection of charged bodies, which might be done with a gate of the form

( . . . )= custom-character (.sub.i,j=1.sup.k(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.j),.sub.i,j=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sup.1{circumflex over (r)}.sub.ch.sub.j.sup.1(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.j),.sub.i,j=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sup.2{circumflex over (r)}.sub.ch.sub.j.sup.2(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.j),.sub.i,j=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sup.3{circumflex over (r)}.sub.ch.sub.j.sup.3(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.Math.{circumflex over (r)}.sub.ch.sub.j)).(11)

[0107] Generalizing equation (6) slightly, if the interaction only depends on the relative positions of the child systems, another form that may be used is

( . . . )= custom-character (.sub.i,j=1.sup.k(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j),.sub.i,j=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j.sup.1(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.sub.ch.sub.j),.sub.i,j=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j.sup.2(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j),.sub.i,j=1.sup.k{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j.sup.3(.sub.ch.sub.i.Math..sub.ch.sub.j.Math.{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j)),(12)

where {circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j={circumflex over (r)}.sub.ch.sub.i{circumflex over (r)}.sub.ch.sub.j and {circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j=|{circumflex over (r)}.sub.ch.sub.i.sub.,ch.sub.j|.

[0108] It will be appreciated that in the above, electrostatics was used only as an example. In practice, there would typically be no need to learn electrostatic interactions because they are already described by classical physics. Rather, using the zeroth and first order interaction gates may be envisaged as constituents of a larger network for learning more complicated interactions with no simple closed form that nonetheless broadly follow similar scaling laws as classical interactions.

[0109] C. Clebsch-Gordan Transforms

[0110] It may now be explained how the custom-character projection maps appearing in equation (7) are computed. This is significant because the nonlinearities in N-body neural network as described herein are the tensor products, and, in accordance with example embodiments, the architecture needs to incorporate the ability to reduce vectors into a direct sum of irreducibles again straight after the tensor product operation.

[0111] To this end the inventor has recognized that representation theory provides a clear prescription for how this operation is to be performed. For any compact group G, given two irreducible representations custom-character and , the decomposition of .Math. into a direct sum of irreducibles

custom-character (R).Math.(R)=[(R)](13)

is called the Clebsch-Gordan transform. In the specific case of SO(3), the multiplicities take on the very simple form

[00004] $_{_{1},_{2}} () = {\begin{matrix} 1 .Math. .Math. if .Math. .Math. .Math._{1} -_{2} .Math._{1} +_{2} \\ 0 .Math. .Math. otherwise, \end{matrix}$

and the elements of the custom-character matrices can also be computed relatively easily via closed form formulae.

[0112] It may be seen immediately that equation (13) prescribes how to reduce the product of covariant vectors into irreducible fragments. Assuming for example that .sub.1 is an irreducible custom-character vector and .sub.2 is an irreducible vector, .sub.1.Math..sub.2 decomposes into irreducible fragments in the form

.sub.1.Math..sub.2= custom-character where =(.sub.1.Math..sub.2),

and custom-character is the part of matrix corresponding to the 'th block. Thus, in this case the operator just corresponds to multiplying the tensor product by . By linearity, the above relationship also extends to non-irreducible vectors. If .sub.1 is of type .sub.1 and .sub.2 is of type .sub.2, then

.sub.1.Math..sub.2= custom-character

where

.sub..sub.1.sub.,.sub.2( custom-character )=.Math..Math.[|.sub.1.sub.2|.sub.1+.sub.2],

and custom-character [.Math.] is the indicator function. Once again, the actual fragments are computed by applying the appropriate matrix to the appropriate combination of irreducible fragments of .sub.1 and .sub.2. It is also clear that by applying the Clebsch-Gordan decomposition recurisively, a tensor product of any order may be decomposed, for example,

.sub.1.Math..sub.2.Math..sub.3.Math. . . . .Math..sub.k=((.sub.1.Math..sub.2).Math..sub.3).Math. . . . .Math..sub.k.

[0113] In practical computations of such higher order products, optimizing the order of operations and reusing potential intermediate results may be used to minimize computational cost.

V. EXAMPLE METHOD

[0114] Example methods may be implemented as machine language instructions stored one or another form of the computer-readable storage, and accessible by the one or more processors of a computing device and/or system, and that, when executed by the one or more processors cause the computing device and/or system to carry out the various operations and functions of the methods described herein. By way of example, storage for instructions may include a non-transitory computer readable medium. In example operation, the stored instructions may be made accessible to one or more processors of a computing device or system. Execution of the instructions by the one or more processors may then cause the computing device or system to carry various operations of the example method.

[0115] FIG. 5 is a flow chart of an example method 500, according to example embodiments. Specifically, example method 500 is a computational method for simulating an N-body physical system, wherein the N-body physical system is represented mathematically as a compound object X having N elementary parts E={e.sub.i}, i=1, . . . , N, each e.sub.i representing one of the N bodies of the N-body physical system. In accordance with example embodiments, X may be hierarchically decomposed into J subsystems, P.sub.j, j=1, . . . , J, each P.sub.j may include one or more of the elementary parts of E. Further, each P.sub.j may be described by a position vector r.sub.j and an internal state vector .sub.j. The steps of example method 500 may be carried out by a computing device, such as computing device 100.

[0116] At step 502, a hierarchical artificial neural network (ANN) having J nodes each corresponding to one of the J subsystems, may be constructed. In the context of a computer-implemented method, constructing an ANN may correspond to implementing the ANN in software or other machine language code. This may entail implementing data structures and operational and/or functional objects according to predefined classes as specified in various instructions, for example. The J nodes may m2 leaf nodes as ANN inputs, m=1 non-leaf root node as ANN output, and m1 intermediate non-leaf nodes. Each node may be considered a neuron of the ANN and may be configured to compute an activation corresponding to a different one of the internal state vectors .sub.j according to node type. In particular, for each leaf node, .sub.j may describe the internal state of a respective one of the P.sub.j subsystems having just a single elementary part e.sub.i; for each given intermediate non-leaf node, .sub.j may describe the internal state of a respective one of the P.sub.j subsystems having 2k<N parts e.sub.i that are each comprised in a child node of the given intermediate non-leaf node; and for the root node, .sub.j may describe the internal state of a subsystem P.sub.j having k=N elementary parts e.sub.i that are each part of a child node of the root node.

[0117] At step 502, the computing device may receive input data to the leaf nodes specifying respective position vectors and respective internal state vectors of the N elementary parts E.

[0118] At step 506, for each given non-leaf node, .sub.j may be computed from the position vectors and internal states of all the child nodes of the given non-leaf node according to a covariant aggregation rule that represents .sub.j as a tensor object that is covariant to rotations of the rotation group SO(3).

[0119] At step 508, a Clebsch-Gordan transform may be applied to reduce tensor products of the state vectors of the nodes to irreducible covariant vectors.

[0120] Finally, at step 510, .sub.j of the root node may be computed as output of the ANN. As such, the result may take the form of, or correspond to, a simulation of the internal state of the N-body physical system.

[0121] In accordance with example embodiments, the tensor products of the state vectors and application of the Clebsch-Gordan transform entail mathematical operations that are nonlinear. Further applying the Clebsch-Gordan transform to reduce the tensor products of the state vectors of the nodes to irreducible covariant vectors may entail applying the nonlinear operations in Fourier space.

[0122] In accordance with example embodiments, the m2 leaf nodes may form an input layer of the hierarchical ANN, the m=1 non-leaf root node may form an single-node output layer of the hierarchical ANN, and the m1 intermediate non-leaf nodes may be distributed among m1 intermediate layers of the hierarchical ANN. In addition, the hierarchical ANN is one of a strict tree-like structure, or a non-strict tree-like structure. As described above, in a strict tree-like structure, each successive layer after the input layer may include one or more parent nodes of one or more child nodes that reside only in an immediately preceding layer. As also described above, in a non-strict tree structure, each successive layer after the input layer may include one or more parent nodes of one or more child nodes that reside among more than preceding layer.

[0123] In further accordance with example embodiments, each given non-leaf node computing .sub.j from the position vectors and internal states of all the child nodes of the given non-leaf node may entail the given non-leaf node receiving the activation of each of its child nodes. In an example embodiment, the activation of each given child node may correspond to the internal state of the given child node.

[0124] In accordance with example embodiments, the J subsystems may correspond to a hierarchy of substructures of the compound object X, from smallest to largest, the largest corresponding to the entirety of X. In this scheme, each of the P.sub.j subsystems that has just a single elementary part e.sub.i may correspond to a single one of the smallest substructures, the subsystem P.sub.j that has k=N elementary parts e.sub.i may correspond to the largest substructure, and wherein the P.sub.j subsystems that have 2k<N parts e.sub.i may correspond to substructures between the smallest and largest.

[0125] In further accordance with example embodiments, the J subsystems may correspond to a hierarchy of substructures of the compound object X, such that each node of the hierarchical ANN corresponds to one of the substructures of the compound object X. As such, each respective non-leaf node may correspond to a respective substructure of the compound object X that includes the substructures of all of the child nodes of the respective non-leaf node, and each respective leaf node may correspond to a particular substructure of the compound object X comprising a single elementary part e.sub.i. In an example embodiment, the internal state of each given subsystem may then correspond to a respective potential energy function due to physical interactions among the substructures of the child nodes of the node corresponding to the given subsystem.

[0126] In still further accordance with example embodiments, the hierarchical ANN may include adjustable weights shared among two or more of the nodes, such that the method further comprises training the ANN to learn the potential energy functions of all of the subsystems by adjusting the weights of the nodes corresponding to the subsystems.

[0127] In further accordance with example embodiments, training the ANN to learn the potential energy functions may entail providing training data to the input layer, where the training data includes for the N-body physical system one or more known training sets. Each training set may include (i) a given configuration of position vectors, and (ii) a known potential function for the given configuration. Training may thus entail, for each of the training sets, comparing a computed potential function output from the non-leaf root node with the known potential function for the given configuration, and based on the comparing, adjusting the weights to achieve agreement, to within a threshold level, between the computed potential functions and the known potential functions across the training sets.

[0128] Further, as the training sets may be associated with multiple different known configurations, an N-body comp-net may learn to recognize potentials from multiple examples. In this way, the N-body comp-net may later be applied to provide simulation results for new configurations that have not been previously analyzed. And as discussed above, learning molecular potentials represents a non-limiting example of physical properties or characteristics that an N-body comp-net may learn during training, and later predict from live testing data.

[0129] In further accordance with example embodiments, each of the training sets may include empirical measurements of the N-body physical system, ab initio computations of forces and energies of the N-body physical system, or a mixture of both.

[0130] In an example embodiment, method 500 may be applied to simulate molecules. As such, the compound object X may be or include molecules, and each elementary part e.sub.i may be an atom. In this application of method 500, .sub.j for each node may represent atomic potentials and forces experienced by each corresponding subsystem P.sub.j due the presence and relative positions of each of the other P.sub.j subsystems.

CONCLUSION

[0131] Using neural networks to learn to the behavior and properties of complex physical systems shows considerable promise. However, physical systems have nontrivial invariance properties (in particular, invariance to translations, rotations and the exchange of identical elementary parts) that must be strictly respected.

[0132] Methods and systems disclosed here employ a new type of generalized convolutional neural network architecture, N-body networks, which provides a flexible framework for modeling interacting systems of various types, while taking into account these invariances (symmetries). An example application for N-body networks learning atomic potentials (force fields) for molecular dynamics simulations. However, N-body networks may be used more broadly, for modeling a variety of systems.

[0133] N-body networks are distinguished from earlier neural network models for physical systems in that [0134] 1. The model is based on a hierarchical (but not necessarily strictly tree-like) decomposition of the system into subsystems at different levels, which is directly reflected in the structure of the neural network. [0135] 2. Each subsystem is identified with a neuron (or gate) n.sub.i in the network, and the output (activation) .sub.i of the neuron becomes a representation of the subsystem's internal state. [0136] 3. The .sub.i states are tensorial objects with spatial character, in particular they are covariant with rotations in the sense that they transform under rotations according to specific irreducible representations of the rotation group. The gates are specially constructed to ensure that this covariance property is preserved through the network. [0137] 4. Unlike most other neural network architectures, the nonlinearities in N-body networks are not pointwise operations, but are applied in Fourier space, i.e., directly to the irreducible parts of the state vector objects. This is only possible because (a) the nonlinearities arise as a consequence of taking tensor products of covariant objects (b) the tensor products are decomposed into irreducible parts by the Clebsch-Gordan transform.

[0138] Advantageously, the last of these ideas may be particularly promising, because it allows for constructing neural that operate entirely in Fourier space, and use tensor products combined with Clebsch-Gordan transforms to induce nonlinearities.

[0139] While example embodiments of N-body networks have been described in terms of molecular or atomic systems and potentials, applicability may be significantly broader. In particular, while .sub.j of a given subsystem has be described as the internal state of a system (or subsystem), this should not be interpreted as limiting the scope with respect to other applications.

[0140] In addition, application of N-body networks to learning the energy function of the system is also just one possible non-limiting example. In particular, the architecture can also be used for learning a variety of other things, such as solubility, affinity for binding to some kind of target, and as well as other physical, chemical, or biological properties.

[0141] As a further example of broader applicability, DFT (e.g., ab initio) and other models that may provide training data and models for N-body networks can provide forces in addition to energies. The force information may be relatively easily integrated into the N-body network framework because the force is the gradient of the energy, and neural networks already propagate gradients. This opens the possibility of learning from derivatives as well.

[0142] More generally, neural networks may be flexibly extended and/or applied in joint operation. As such, the example application described herein may be considered a convenient supervised learning setting for illustrative purposes. However, applying the Clebsch-Gordan approach to N-body comp-nets may also be used (possibly as part of a larger architecture) to optimize the structure of atomic systems or generate new molecules for a particular goal, such as drug design.

[0143] Example embodiments herein provide a novel and efficient approach to computationally simulating an N-body physical system with covariant, compositional neural networks.

[0144] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purpose of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.

Covariant Neural Network Architecture for Determining Atomic Potentials

Inventors

Cpc classification

Classification Explorer

G16C20/30

PHYSICS

Classification Explorer

G06N3/084

PHYSICS

Classification Explorer

G16B40/20

PHYSICS

Classification Explorer

G06N5/046

PHYSICS

Classification Explorer

G06N3/045

PHYSICS

Classification Explorer

G16B5/00

PHYSICS

Classification Explorer

G06N3/0463

PHYSICS

Classification Explorer

G16C20/70

PHYSICS

Classification Explorer

G16C10/00

PHYSICS

Classification Explorer

G06N3/048

PHYSICS

International classification

Classification Explorer

G16B5/00

PHYSICS

Classification Explorer

G06N3/04

PHYSICS

Classification Explorer

G06N3/08

PHYSICS

Classification Explorer

G06N5/04

PHYSICS

Classification Explorer

G16B40/20

PHYSICS

Abstract

Claims

Description