Suppressing interaction between bonded particles
11264120 · 2022-03-01
Assignee
Inventors
- Ping Tak Peter Tang (Edison, NJ)
- J. P. Grossman (Huntington, NY, US)
- Brannon Batson (Brooklyn, NY)
- Ron Dror (Stanford, CA, US)
Cpc classification
G16C20/90
PHYSICS
International classification
G16C10/00
PHYSICS
Abstract
A method for managing flow of particles into an array of pairwise-point-interaction-modules includes receiving a first set of particles into a first queue. The first set is a proper subset of a second set of particles that comprises all particles that are to be passed into the array of pairwise-point-interaction-modules during a current time period. Prior to having received all particles from the second set, particles from the first set are allowed to pass from the first queue into the array.
Claims
1. A method comprising causing a simulation machine for molecular-dynamic simulation to manage flow of particles into an array of pairwise-point-interaction-modules, wherein said simulation machine comprises nodes connected to each other by a network, said nodes collectively representing a volume with each node corresponding to a particular portion of said simulation space, said simulation machine further comprising a first queue and an array of pairwise-point-interaction modules, wherein causing said simulation machine for molecular-dynamic simulation to manage flow of said particles into said array comprises receiving a first set of particles into said first queue, said first set of particles being a proper subset of a second set of particles, wherein said second set of particles comprises all particles that are to be passed into said array of pairwise-point-interaction-modules during a current time period, and prior to having received all particles from said second set, allowing said particles from said first set to pass from said first queue into said array.
2. The method of claim 1, further comprising continuing to load particles from said first set into said array as additional particles from said second set are received into said first queue.
3. The method of claim 1, further comprising receiving a third set of particles into a second queue, wherein said third set of particles comprises all particles that are to only be loaded into said array during said current time period, and wherein allowing said particles from said first set to pass from said first queue into said array occurs only after all particles from said third set have been loaded into said array.
4. The method of claim 1, further comprising receiving a third set of particles into second and third queues, wherein said third set of particles comprises all particles that are to be loaded into said array during said current time period, and wherein allowing said particles from said first set to pass from said first queue into said array occurs only after all particles from said third set have been loaded into said array.
5. The method of claim 1, further comprising receiving a fourth set of particles into a third queue, wherein said fourth set of particles is to be both loaded and streamed into said array, wherein streaming of particles from said fourth set commences only after completion of loading of particles from said fourth set.
6. The method of claim 1, wherein said first queue is a logical queue, wherein said method further comprises selecting said first and second set of particles from a plurality of selected physical queues.
7. A non-transitory and tangible computer-readable medium having encoded thereon software that, when executed by a molecular dynamic simulation machine, causes execution of the method of claim 1 by said molecular dynamic simulation machine, wherein said molecular dynamic simulation machine comprises nodes connected to each other by a network, said nodes collectively representing a volume with each node corresponding to a particular portion of said simulation space, a first queue, and an array of pairwise-point-interaction modules.
8. An apparatus comprising a molecular dynamics simulator comprising nodes, an array, and a first queue, wherein said nodes are connected to each other by a network and collectively represent a volume with each node corresponding to a particular portion of said simulation space, wherein said array is an array of pairwise-point-interaction modules, wherein said simulation machine is configured to execute the method of claim 1.
9. The apparatus of claim 8, wherein each of said nodes comprises an application-specific integrated circuit that comprises flex tiles comprising geometry cores and a common memory that is available to all of said geometry cores.
10. The apparatus of claim 9, wherein said flex tiles further comprise a network interface for enabling communication with other components of said node.
11. The apparatus of claim 9, wherein said flex tiles further comprise a dispatch unit to provide hardware support for fine-grained event-driven computation.
12. The apparatus of claim 9, wherein said application-specific integrated circuit further comprises a logic analyzer that captures and stores activity of said node.
13. The apparatus of claim 9, wherein said application-specific integrated circuit further comprises a host interface that provides communication with an external host.
14. The apparatus of claim 9, wherein said application-specific integrated circuit further comprises interaction tiles that receive particles and grid points from said flex tiles via an on-chip mesh network and that enqueue said particles into queues that are stored in a local memory.
15. The apparatus of claim 8, further comprising a particle director, a plurality of queues, and an interaction tile, wherein said first queue being one of said queues in said plurality of queues, wherein said particle director places particles arriving at said interaction tile into different queues.
16. The apparatus of claim 8, wherein said first queue is a first-in-first-out queue.
17. The apparatus of claim 8, wherein said first queue is one of a plurality of queues, each of which is programmed to know how many particles are to arrive during said current time-period.
18. The apparatus of claim 8, wherein said first queue is one of a plurality of queues, each of which is programmed to know how many particles that are expected to arrive during said current time-period have arrived.
19. A method for simulating interactions between pairs of particles using nodes, each of which comprises a module set that comprises one or more pairwise-point-interaction-modules, said method comprising: at each node, carrying out iterations, each of which corresponds to a corresponding time interval from a sequence of time intervals, each of said iterations comprising: at a first node, beginning to receive information about particles in a set of particles, said set of particles consisting of first, second, and third parts that are disjoint from each other, as particles from said first part are received, storing said particles in a first queue, as particles from said second part are received, storing said particles in a second queue, as particles from said third part are received, storing said particles in a third queue, and, before having received all of said information, beginning a simulation of interactions between particles in pairs of particles from said first set, said particles in said pairs of particles consisting of only those that have already been received and excluding those that have yet to be received, wherein said interactions include interactions between particles in said first and second parts and interactions between particles in said second and third parts and exclude interactions between particles in said first and second parts and wherein said simulation includes loading information from all particles in said first and third parts into a module set of said first node, streaming information from particles in at least one of said second and third parts through said module set after having loaded and prior to having received information for all particles in said second part, and evaluating said interactions in said module set.
20. An apparatus comprising a molecular-dynamics simulator for simulating interactions between pairs of particles, wherein said simulator comprises nodes, each of which comprises a plurality of queues and a module set, wherein said module set comprises one or more pairwise-point-interaction-modules, wherein said simulator is configured to carry out iterations, each of which corresponds to a corresponding time interval from a sequence of time intervals, each of said iterations comprising: at a first node, beginning to receive information about particles in a set of particles, said set of particles consisting of first, second, and third parts that are disjoint from each other, as particles from said first part are received, storing said particles in a first queue from said plurality of queues, as particles from said second part are received, storing said particles in a second queue from said plurality of queues, as particles from said third part are received, storing said particles in a third queue from said plurality of queues, and, before having received all of said information, beginning a simulation of interactions between particles in pairs of particles from said first set, said particles in said pairs of particles consisting of only those that have already been received and excluding those that have yet to be received, wherein said interactions include interactions between particles in said first and second parts and interactions between particles in said second and third parts and exclude interactions between particles in said first and second parts and wherein said simulation includes loading information from all particles in said first and third parts into a module set of said first node, streaming information from particles in at least one of said second and third parts through said module set after having loaded and prior to having received information for all particles in said second part, and evaluating said interactions in said module set.
Description
BRIEF DESCRIPTION OF THE FIGURES
DETAILED DESCRIPTION
(11) Molecular dynamics simulation involves simulating the motion of particles in response to forces. Because many of these forces are short-range forces, most computations involving a particle are restricted to interactions with nearby particles. Thus, computations involving a neighborhood of particles can often be carried out largely independently of computations involving other neighborhoods of particles. This property lends itself to parallel processing.
(12) To take advantage of this inherent parallelism, a simulation machine 10 for molecular dynamic simulation, as shown in
(13) Because of the inherent parallelism, it is useful to divide the simulation volume into node boxes, each of which is handled by one of the nodes 12. A description of the manner in which calculations are allocated among different nodes can be found in Shaw, “
(14) Referring to
(15) The host interface 18 provides communication with an external host via a PCI link. The logic analyzer 20 is used primarily to capture and store node activity for debugging. Each node 12 also includes communication interfaces 22 for data transmission between neighboring nodes in each of the three local coordinate directions. Within a node 12, data transmission between the components of the node 12 is carried out by an on-chip mesh network.
(16) As shown in
(17) As shown in
(18) The interaction tile 16, and specifically the interaction controller 34, receives particles and grid points from the flex tiles 14 via the on-chip mesh network. It then enqueues these particles and grid points into queues 36 that are stored in local memory 38. The operation of the interaction controller 34 is controlled by instructions 40 received from a geometry core 42 in the abbreviated flex tile 44.
(19) In addition to a geometry core 42, the abbreviated flex tile 44 has a local memory 46, a dispatch unit 48, and a network interface 50, all of which serve functions similar to those described in connection with
(20) The simulation machine 10 simulates the evolution of a collection of particles by repeatedly calculating and integrating all inter-particle forces in small time steps. At the beginning of each time step, each flex tile 14 uses the on-chip network to send packets to interaction tiles 16 on the same node and on other nodes. These packets contain information about particles that interact with each other in ways that the interaction tile 16 will ultimately reveal through calculation.
(21) As noted above, each node 12 is responsible for computations concerning particles within its node box. Due to resource limitations, it may be necessary to further divide each node box into sub-boxes. Each sub-box has some particle population. This particle population fluctuates over time as a result of particles moving within the simulation volume in response to forces exerted by other particles. Each interaction tile 16 will receive, from the flex tiles 14, some variable number of particles from multiple sub-boxes. Each interaction tile 16 also receives, from the flex tiles 14, count packets, each of which reports how many particles to expect from each sub-box. These count packets are used by the interaction tile 16 to determine when it has received all particles from all sub-boxes.
(22) To accurately simulate the motion of particles, it has been found necessary to evaluate particle interactions at time intervals that are very close together. In a typical simulation, particle interactions are evaluated every few femtoseconds of simulated time.
(23) Many interesting events occur on timescales of milliseconds or longer, involve hundreds of thousands of particles, or both. Examples of events of this type include those that arise in biochemical systems in which biological macromolecules interact.
(24) Simulation of systems in which events unfold on such long time scales takes a great deal of real time. This is because the exchange rate between real time and simulated time is presently on the order of a billion to one. Thus, it is necessary to spend microseconds of real time in order to compute the interactions required to advance the simulated time by femtoseconds. While this may seem fast, to place matters in perspective, this means that in order to advance the simulation by just one millisecond of simulated time at this exchange rate, it is necessary for the simulation machine 10 to work for one million seconds, which is a little over eleven days of continuous computation.
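The arithmetic behind this estimate can be checked with a short sketch. The exchange rate of one billion to one is taken from the text; the function name and its parameters are illustrative only.

```python
# Wall-clock cost of a simulation at a fixed real-time/simulated-time ratio.
EXCHANGE_RATE = 1e9      # real seconds per simulated second (order of magnitude from the text)
SECONDS_PER_DAY = 86_400


def wall_clock_seconds(simulated_seconds, exchange_rate=EXCHANGE_RATE):
    """Real time needed to advance the simulation by `simulated_seconds`."""
    return simulated_seconds * exchange_rate


one_ms = wall_clock_seconds(1e-3)   # one millisecond of simulated time
days = one_ms / SECONDS_PER_DAY     # roughly 11.6 days
one_fs = wall_clock_seconds(1e-15)  # one femtosecond costs about a microsecond
print(f"{one_ms:.0f} s of computation, about {days:.1f} days")
```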
(25) The following discussion refers to particles being stored in or streaming through certain hardware. It should be understood that a “particle” means information representing a particle, or a simulated particle, and not actual particles.
(26) Particles arrive at an interaction tile 16 in no particular order. As they arrive, a particle director 74 places them into different first-in-first-out (
(27) Since particles arrive in no particular order, they are also placed in queues 76 in no particular order. However, the order in which queues 76 will be used is known in advance. Additionally, each queue is programmed in advance to know how many sub-boxes worth of particles it is expecting. Because of the count packets being received from the flex tiles 14, each queue can also determine how many particles will arrive for each sub-box. As a result, each queue 76 knows how many particles to expect, and whether or not they have all arrived.
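The bookkeeping described above can be sketched in software as follows. This is a minimal illustration, not the hardware implementation; the class name `CountingQueue` and its methods are hypothetical, and it assumes that a count packet may arrive before or after the particles it describes.

```python
from dataclasses import dataclass, field


@dataclass
class CountingQueue:
    """Queue that knows how many sub-boxes to expect and learns per-sub-box counts."""
    expected_sub_boxes: int                        # programmed in advance
    expected: dict = field(default_factory=dict)   # sub_box -> count announced by a count packet
    received: dict = field(default_factory=dict)   # sub_box -> particles seen so far
    items: list = field(default_factory=list)      # particles, in arrival order

    def on_count_packet(self, sub_box, count):
        self.expected[sub_box] = count

    def on_particle(self, sub_box, particle):
        self.items.append(particle)
        self.received[sub_box] = self.received.get(sub_box, 0) + 1

    def is_complete(self):
        # Not complete until every expected sub-box has announced its count...
        if len(self.expected) < self.expected_sub_boxes:
            return False
        # ...and every announced count has been met.
        return all(self.received.get(sb, 0) == n for sb, n in self.expected.items())
```

Because completeness depends only on counts, the order in which particles and count packets arrive does not matter.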
(28) The queues 76 are divided into first queues 78, second queues 80, and third queues 82. Particles that are to be loaded into the
(29) For example, particles that are only within a first volume of space are placed in the first queue 78, from which they are loaded into the
(30) In one embodiment, the interaction controller 34 waits until the first and third queues 78, 82 have been filled. As noted above, this information is available because the flex tiles 14 have been sending count packets along with the particles. The interaction controller 34 then loads all the particles from the first and third queues 78, 82 into the
(31) Upon detecting that loading is complete, the interaction controller 34 determines whether the second queue 80 is full. The third queue 82 is of course known to be full by this point since loading into the
(32) In an alternative method for managing flow of particles into the
(33) Once the particles are all loaded into the
(34) In an alternative practice, it is useful to regard the queues shown in
(35) In the course of being streamed through the
(36) A test must therefore be devised to answer the question, “Should an interaction between these two particles be computed?”
(37) One test for deciding whether or not an interaction should be computed is to ask whether the two particles are close enough to each other to make computation worthwhile. If two particles are too far apart, no interaction will be calculated.
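A distance test of this kind can be sketched as below. The cutoff radius and the function name are assumptions made for illustration; the text does not specify a cutoff value, and the sketch ignores any periodic boundary handling.

```python
def within_cutoff(p, q, cutoff):
    """Return True if 3-D points p and q are no farther apart than `cutoff`.

    Squared distances are compared so that no square root is needed.
    """
    dx, dy, dz = (a - b for a, b in zip(p, q))
    return dx * dx + dy * dy + dz * dz <= cutoff * cutoff
```

For example, `within_cutoff((0, 0, 0), (3, 4, 0), 5.0)` is true because the separation is exactly 5, while the same pair fails the test with a cutoff of 4.9.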
(38) However, although this test is a good approximation, it is complicated by the fact that sometimes interactions between particles should not be calculated even if the particles are close together. This complication arises in molecular dynamics because atoms can be covalently bonded together to form molecules. In that case, the forces that hold these particles in a bond easily dwarf the inter-particle forces that are being simulated. In known simulation machines, these interactions are still calculated, but are later removed in a correction pipeline.
(39) To remedy this deficiency, and to thereby eliminate the need for a correction pipeline, each particle is associated with a topological identifier that communicates the nature of a topological relationship between that particle and other nearby particles. Without loss of generality, this topological identifier will be discussed in connection with atoms that bond together with other atoms to form molecules, and in particular, biological macromolecules such as proteins and lipids.
(40) Referring to
(41) Referring now to
(42) The topological identifier 64 encodes topological relationships between atoms 60 in a molecule 58. Thus, by comparing the topological identifiers 64 of two atoms 60, it is possible to define a topological distance between the two atoms 60. This enables determination, with high accuracy, of whether an interaction between those atoms 60 should be excluded even if those atoms are otherwise close enough so that interaction would normally be calculated.
(43) Comparing topological identifiers 64 therefore avoids the vast majority of corrections that would normally have been carried out in a correction pipeline, and also avoids wasteful computation. As will be discussed below, there are some special cases where computations will be carried out even if they should not be. However, there are so few of these cases that correction of the calculation can be done by software instead of by having a separate hardware correction pipeline.
(44) The implementation of topological identifiers 64 described herein relies on the fact that many molecules 58 feature a backbone of atoms with primary side-chains branching off the backbone. These primary side-chains can have secondary side-chains. These secondary side-chains can have tertiary side-chains and so on ad infinitum. However, it has been found that most molecules 58 of interest have a backbone with primary side-chains branching off the backbone, and secondary side-chains branching off the primary side-chains. Thus, a practical implementation requires that only primary and secondary side-chains be accounted for.
(45) In the embodiment described herein, each atom 60 is assigned an integer quartet. The members of the quartet identify a backbone, primary and secondary side-chains, and a termination flag. More generally, the topological identifier is an integer tuple with N+2 elements, where N is the number of levels of side chains to be accounted for. In the present embodiment, N=2 because only a primary and secondary side-chain are to be accounted for.
(46) For any atom 60, the first element of the quartet is a backbone identifier that identifies that atom's associated backbone atom. Where an atom 60 is itself the backbone atom, for purposes of assigning the first element, that atom 60 is considered to be associated with itself. All atoms of side-chains that ultimately connect to the same backbone atom would have the same backbone identifier.
(47) The second element of the quartet is a primary side-chain identifier that identifies the primary side-chain associated with the atom 60. The primary side-chain has an atom 60 that is bonded directly to the backbone. All atoms that are in the same side-chain, or are in side-chains connected to that same side-chain would have the same primary side-chain identifier.
(48) The third element of the quartet is a secondary side-chain identifier that defines the second level side-chain associated with the atom.
(49) Finally, the fourth element is a terminal flag that identifies whether or not the atom is a terminal atom. A terminal atom is one that is bound to the rest of the molecule by only one covalent bond. As used herein, the term “covalent bond” is independent of the number of electronic orbitals participating in the bond, and therefore includes double bonds and triple bonds.
(50) It should be apparent that the above scheme is recursive in nature and can be extended to any number of side-chains by simply adding suitable elements between the terminal flag and the backbone identifier.
(51) In the illustrated embodiment, the topological identifier is an integer quartet (n, m, k, t). Backbone atoms are identified as (n, 0, 0, 0). Atoms in a primary side-chain off the nth backbone atom are identified as (n, m, 0, 0) where m is an integer greater than or equal to 1 that represents the distance along the chain between that atom and the backbone atom. Atoms in a secondary side-chain are identified as (n, m, k, 0) where k is an integer greater than or equal to 1 that represents the distance between that atom and the atom at which the secondary side-chain intersects the primary side-chain, (n, m, 0, 0). A terminal atom, which only has a single neighbor, has its terminal flag set to 1. Thus, a terminal atom that has, as its neighbor, atom (n, m, k, 0) will have as its topological identifier 64 the integer quartet (n, m, k, 1).
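The labeling rules of the illustrated embodiment can be written down as a small sketch. The type name `TopoId` and the helper functions are hypothetical; only the quartet layout (n, m, k, t) and the rule for terminal atoms come from the text.

```python
from typing import NamedTuple


class TopoId(NamedTuple):
    n: int      # backbone identifier
    m: int = 0  # position along the primary side-chain (0 on the backbone)
    k: int = 0  # position along the secondary side-chain (0 if none)
    t: int = 0  # terminal flag


def backbone(n):
    return TopoId(n)


def primary(n, m):
    if m < 1:
        raise ValueError("primary side-chain atoms have m >= 1")
    return TopoId(n, m)


def secondary(n, m, k):
    if m < 1 or k < 1:
        raise ValueError("secondary side-chain atoms have m >= 1 and k >= 1")
    return TopoId(n, m, k)


def terminal_of(neighbor):
    """A terminal atom copies its single neighbor's quartet and sets the flag."""
    if neighbor.t != 0:
        raise ValueError("a terminal atom cannot be the neighbor of another terminal atom")
    return neighbor._replace(t=1)
```

Under these rules, a terminal atom bonded to backbone atom 8 would carry the quartet (8, 0, 0, 1).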
(52)
(53) The atom attached to backbone atom 4 is a terminal atom because it has only one neighbor. Consistent with the rules, its quartet is the same as its neighbor's quartet, i.e. backbone atom 8's quartet, except its terminal flag is set to 1. Terminal atoms can also be found attached to backbone atoms 6 and 7 with corresponding quartets built according to the same rule.
(54) The assignment of topological identifiers to side-chains can be seen by inspecting the identifiers of atoms that are in the side-chain off of backbone atom 5. As shown, each quartet for all atoms that ultimately connect to backbone atom 5 will have n=5. All atoms in the same primary side-chain have the same value of m, while all atoms in the same secondary side-chain have the same value of k.
(55) The topological distance between two atoms having integer quartets $(n_1, m_1, k_1, t_1)$ and $(n_2, m_2, k_2, t_2)$ resolves into three cases.
(56) In the first case, the atoms are attached to different backbone atoms and therefore have different backbone identifiers. This means that $n_1 \neq n_2$. In such a case, the distance is obtained by taking the magnitude of the difference between the backbone identifiers and adding it to the sum of all the remaining elements of the two integer quartets:
$$|n_1 - n_2| + m_1 + m_2 + k_1 + k_2 + t_1 + t_2$$
(57) In the second case, the two backbone identifiers are the same, but the two atoms are on different primary side-chains. Thus, $n_1 = n_2$ but $m_1 \neq m_2$. In that case, the distance is identical to that for the first case, but instead of adding together the primary side-chain identifiers, one evaluates the magnitude of their difference:
$$|n_1 - n_2| + |m_1 - m_2| + k_1 + k_2 + t_1 + t_2$$
(58) In the third case, the two atoms are on the same primary side-chain but they are on different secondary side-chains. This means that $n_1 = n_2$ and $m_1 = m_2$, but $k_1 \neq k_2$. In that case, the distance is computed the same way as the second case, but instead of adding together the secondary side-chain identifiers, one evaluates the magnitude of their difference:
$$|n_1 - n_2| + |m_1 - m_2| + |k_1 - k_2| + t_1 + t_2$$
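The three cases can be collected into a single function, sketched below. The function name is illustrative. The text does not state what happens when all of n, m, and k agree; the sketch assumes that the third-case expression is applied there as well, which gives a distance of 1 between a terminal atom and its neighbor.

```python
def topological_distance(a, b):
    """Topological distance between two quartets (n, m, k, t)."""
    n1, m1, k1, t1 = a
    n2, m2, k2, t2 = b
    if n1 != n2:
        # Case 1: different backbone atoms.
        return abs(n1 - n2) + m1 + m2 + k1 + k2 + t1 + t2
    if m1 != m2:
        # Case 2: same backbone atom, different primary side-chain positions.
        return abs(m1 - m2) + k1 + k2 + t1 + t2
    # Case 3 (and, by assumption, the case k1 == k2): |n1-n2| and |m1-m2| are zero.
    return abs(k1 - k2) + t1 + t2
```

For instance, backbone atom (3, 0, 0, 0) and side-chain atom (5, 2, 0, 0) are at distance 2 + 2 = 4 under the first case.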
(59) The foregoing method for assigning a topological identifier 64 assumes that the graph of a molecule 58 is an acyclic graph with at most one side-chain emanating from any atom. While this is true for most molecules of interest, there are exceptions. In such cases, a small number of edges are removed from the molecule's graph until this condition is met.
(60)
(61) An atom's topological identifier 64 is bundled with its position as it makes its way through the PPIM array 52. As a result, the encoding must be as compact as possible.
(62) As a practical matter, in molecules of interest, most side-chains are short. For most lipids and proteins, three bits is sufficient to encode the primary side-chain identifier, and one bit is enough to encode the secondary side-chain identifier. One more bit is needed to encode the terminal flag. Thus, the remaining bits can be used to encode the backbone identifier.
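One possible packing that follows these bit budgets is sketched below. The field widths for m, k, and t come from the text; the total width of 16 bits and the ordering of the fields within the word are assumptions made purely for illustration.

```python
M_BITS, K_BITS, T_BITS = 3, 1, 1               # widths given in the text
TOTAL_BITS = 16                                # assumed word size, not from the source
N_BITS = TOTAL_BITS - M_BITS - K_BITS - T_BITS  # remaining bits hold the backbone identifier


def pack(n, m, k, t):
    """Pack a quartet into one integer, laid out as [ n | m | k | t ]."""
    if not (0 <= n < (1 << N_BITS) and 0 <= m < (1 << M_BITS)
            and 0 <= k < (1 << K_BITS) and t in (0, 1)):
        # Identifiers that do not fit would have to be handled elsewhere.
        raise ValueError("identifier does not fit in the allotted bits")
    word = (n << M_BITS) | m
    word = (word << K_BITS) | k
    return (word << T_BITS) | t


def unpack(word):
    """Inverse of pack()."""
    t = word & ((1 << T_BITS) - 1)
    word >>= T_BITS
    k = word & ((1 << K_BITS) - 1)
    word >>= K_BITS
    m = word & ((1 << M_BITS) - 1)
    n = word >> M_BITS
    return (n, m, k, t)
```

With the assumed 16-bit word, 11 bits remain for the backbone identifier, so `pack` rejects any backbone index of 2048 or more.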
(63) In some cases, a chemical system is too large for all backbone identifiers to be encoded because there are not enough bits allocated to carry out the encoding. In other cases, side-chains cannot be encoded in the bits available. Both of these cases, like the case in which the molecule has rings, must also be corrected in software.
(64) During the course of evaluating quantities, such as inter-particle forces, it is often necessary to carry out computations that involve evaluating a function of an argument. Evaluating a function, particularly a transcendental function, is a time-consuming task. To speed up this task, it is known to simply look up the value of the function for a particular argument in a look-up table. However, a table that provides low approximation error would be prohibitively large. Another approach is to divide the domain of the function into parts and to approximate the desired function using a parametric form, such as a cubic form. In such cases, a table provides a mapping from a domain region to coefficients of a cubic polynomial. A non-uniform partition of the domain can be used to provide a finer partition in those parts of the domain in which the cubic polynomial does not match the function well, for example, in those parts of the domain in which the function to be approximated is changing fast. Conversely, coarser partitions can be used where the match is good.
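The table-of-cubics approach with a non-uniform partition can be sketched as follows. The text does not say how the coefficients are obtained; this sketch assumes a cubic Hermite fit that matches the function and its derivative at each breakpoint. The example function 1/r, the breakpoints, and all names are illustrative.

```python
from bisect import bisect_right


def build_table(f, df, breaks):
    """Return one (a, c0, c1, c2, c3) row per interval [a, b] of `breaks`."""
    table = []
    for a, b in zip(breaks, breaks[1:]):
        h = b - a
        fa, fb, da, db = f(a), f(b), df(a), df(b)
        slope = (fb - fa) / h
        # Hermite coefficients in the local variable s = x - a.
        c2 = (3 * slope - 2 * da - db) / h
        c3 = (da + db - 2 * slope) / (h * h)
        table.append((a, fa, da, c2, c3))
    return table


def evaluate(table, breaks, x):
    """Look up the interval containing x and evaluate its cubic by Horner's rule."""
    i = bisect_right(breaks, x) - 1
    i = min(max(i, 0), len(table) - 1)  # clamp to the first/last interval
    a, c0, c1, c2, c3 = table[i]
    s = x - a
    return c0 + s * (c1 + s * (c2 + s * c3))


# Finer intervals near r = 0.5, where 1/r changes fastest; coarser ones farther out.
BREAKS = [0.5, 0.625, 0.75, 1.0, 1.25, 1.5, 2.0, 2.5, 3.0, 4.0]
TABLE = build_table(lambda r: 1.0 / r, lambda r: -1.0 / (r * r), BREAKS)
```

A uniform partition with the same worst-case error would need the narrowest spacing everywhere, which is the growth in table size that the non-uniform partition is meant to avoid.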
(65) In some cases, the function's value changes so rapidly for certain regions of its domain that even using the cubic polynomial would require that the look-up table of cubic coefficients have prohibitively many entries in order to adequately model the function in those portions of its domain.
(66)
(67) Naturally, since the approximation is a piecewise one, the coefficients of the polynomial change throughout the approximated function's domain. In fact, this is precisely why there have to be multiple entries in the table 84. As is apparent from
(68) In an alternative embodiment, shown in
(69) It is known to use the PPIM array 52 in connection with charge spreading and force interpolation using GSE methods described in Shan et al., “
(70) In an improvement of the method described therein, in which all grid locations are passed into the PPIM array 52, an alternative method exploits the fact that grid locations are not randomly located but actually have some spatial regularity. By exploiting this regularity, it becomes possible to pass selected locations into the array and derive the grid locations from those selected locations.
(71) For example, let X be a set of m grid locations $(x_1, x_2, \ldots, x_m)$ where each $x_i$ is a position vector having a dimensionality that is appropriate to the simulation space. According to the prior art method, to pass the grid locations in X, one would pass all m points into the
(72) In an improved method, there exists a set Y of n locations $(y_1, y_2, \ldots, y_n)$ where n<m. There also exists a rule R such that X=R(Y). Thus, rather than pass X into the
(73) For example, if two vectors in the set X were $(x, y, z_1)$ and $(x, y, z_2)$, then one could simply pass a set Y that included the vector $(x, y, (z_1+z_2)/2)$. Then, if one knew the grid spacing, one could recover the original two vectors of the set X. Alternatively, the set Y could just equal every other point from set X, in which case one could reconstruct the original set X by adding the appropriate grid spacing to the appropriate coordinates in the vectors in Y.
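Both example rules R can be sketched in a few lines. The function names are hypothetical, and the sketch assumes a known, uniform grid spacing along the axis being expanded.

```python
def expand_midpoints(Y, spacing):
    """Rule 1: each point of Y is the midpoint of two grid locations along z."""
    X = []
    for (x, y, zc) in Y:
        X.append((x, y, zc - spacing / 2))
        X.append((x, y, zc + spacing / 2))
    return X


def expand_every_other(Y, spacing, axis=2):
    """Rule 2: Y holds every other grid location; add the spacing to get the rest."""
    X = []
    for p in Y:
        q = list(p)
        q[axis] += spacing
        X.extend([tuple(p), tuple(q)])
    return X
```

In both cases only half as many locations are passed in, and the full set X is rebuilt from Y and the spacing alone.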
(74) In one method, the set Y of locations is loaded into the
(75) In one embodiment of the simulation machine 10, the particles that arrive at the interaction tile 16 for processing are always associated with a current time step. However, an alternative embodiment introduces a phase bit associated with data packets carrying particle data into the interaction tile 16. The phase bit's value is associated with a particular time step. This provides a way to distinguish between particles in two different time steps. As a result, it is possible for the interaction tile 16 to receive data packets associated with two time steps.
(76) In an embodiment that accommodates a phase bit, the interaction controller 34 maintains queues 36 associated with each value of phase bit. Upon receiving a particle, the phase bit of the particle is inspected and the particle is placed in a queue that is appropriate to the phase bit. The phase bit thus permits the interaction tile 16 to receive data associated with different time steps, and thereby eliminates the need to synchronize the interaction tiles 16 and the flex tiles 14. In operation, only queues corresponding to the current time step are loaded into the
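Routing by phase bit can be illustrated with the following sketch. The class and method names are hypothetical. The sketch assumes a single phase bit whose value alternates between consecutive time steps, which is one way, though not necessarily the only way, to associate the bit with a time step.

```python
from collections import deque


class PhasedQueues:
    """One queue per phase-bit value; only the current phase is drained."""

    def __init__(self):
        self.queues = {0: deque(), 1: deque()}
        self.current_phase = 0

    def receive(self, phase_bit, particle):
        # Particles for the next time step may arrive early; they simply wait.
        self.queues[phase_bit].append(particle)

    def drain_current(self):
        q = self.queues[self.current_phase]
        out = list(q)
        q.clear()
        return out

    def advance(self):
        # Assumption: the phase bit toggles at each time step.
        self.current_phase ^= 1
```

A sender that has already moved on to the next time step can therefore transmit without waiting, because its packets land in the queue for the other phase.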
(77) The invention is described in further detail in the attached appendix, the content of which is hereby incorporated by reference in its entirety.