SYSTEM TO PROVE SAFETY OF ARTIFICIAL GENERAL INTELLIGENCE VIA INTERACTIVE PROOFS
20230008689 · 2023-01-12
Inventors
CPC classification
G06N7/01
PHYSICS
International classification
G06N7/00
PHYSICS
H04L9/32
ELECTRICITY
Abstract
A method to prove the safety (e.g., value-alignment) and other properties of artificial intelligence systems possessing general and/or super-human intelligence (together, AGI). The method uses probabilistic proofs in interactive proof systems (IPS), in which a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2⁻¹⁰⁰). IPS-based procedures can be used to test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. An embodiment of the method, mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers, makes it possible to validate safe behavior via IPS number-theoretic methods. Other IPS embodiments can prove an unlimited number of AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs extend the power of the paradigm. The method applies to value-alignment between future AGI generations of disparate power.
Claims
1. An interactive proof system comprised of a verifier in turn comprised of one or a plurality of humans and a prover in turn comprised of one or a plurality of AGIs.
2. The system of claim 1, wherein said proof is directed toward safety of said humans with regard to behavior of said AGIs.
3. The system of claim 2, wherein said proof comprises the means to prove or disprove that a property can derive from a behavior control system.
4. The system of claim 3, wherein said behavior control system further comprises a behavior tree.
5. The system of claim 1, wherein said interactive proof system further comprises a multiple-prover interactive proof system.
6. The system of claim 1, wherein said interactive proof system comprises a means to approach a uniform random distribution from which random numbers are selected.
7. The system of claim 1, comprising a proof that is a zero-knowledge proof.
8. The system of claim 7, in which the roles of human and AGI are reversed, comprising a human prover and an AGI verifier.
9. The system of claim 1, further comprising a means to detect forgery of a behavior control system via mathematical properties of directed acyclic graphs.
10. The system of claim 1, wherein said interactive proof system further comprises a means to perform program-correctness-checking.
11. The system of claim 10, wherein said program-correctness-checking means further comprises a means to detect graph non-isomorphisms.
12. The system of claim 3, further comprising a means to map said behavior control system onto a set of axioms and transformation rules.
13. The method of claim 1, further comprising the steps: a. identifying a property of AGI that is desired to prove, b. identifying an interactive proof system method to apply to proving said AGI property, c. creating a representation of said AGI property to which said interactive proof system method is applicable, d. identifying a pseudo-random number generator that is compatible with said representation, e. applying said pseudo-random number generator to the interactive proof system process in the random selection of problem instances to prove probabilistically, whereby said interactive proof system is rendered able to prove said AGI property.
14. The method of claim 1, further comprising the steps: a. assigning a unique prime number to each of said axioms and transformation rules, b. identifying a derivative behavior as an ordered composition of said prime numbers, c. testing whether any behavior, represented by a composite number, is a valid derivative of said axioms and transformation rules, by factoring said composite number, whereby said method is enabled to prove whether a specified behavior can be derived from a given behavior control system represented as axioms and transformation rules.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0035]
[0036]
[0037]
[0038]
DESCRIPTION OF EMBODIMENTS
Detailed Description
[0039] The reader is assumed to be knowledgeable in computer science, and to understand the concepts of a universal computer (a Turing machine), a deterministic vs. non-deterministic Turing machine, algorithms, and probabilistic algorithms, all of which have historically been referred to, simply, as “machines”. If a machine “accepts” a string from a language, it means the machine has verified that the string belongs to that language, i.e., it has performed a useful computation.
[0040]
[0041] The key property, however, is that when called repeatedly, the PTM/PPA compounds the error probability, so that over multiple calls, each with a different problem instance, the cumulative error probability can be driven below any arbitrarily low threshold, approaching certainty. This effect of repeating the PTM/PPA is called the amplification lemma. Given an error tolerance ϵ of a PTM/PPA machine, we can run it repeatedly (say, k iterations) until the desired error tolerance ϵᵏ has been reached. Thus, even if a particular PTM/PPA has only a 50-50 chance of catching a falsehood on a given problem instance, if we call it 100 times with different instances of the same problem (which must be chosen randomly, so that no bias in the instance selection invalidates the calculation of the probability of error), the cumulative probability of accepting a falsehood is 2⁻¹⁰⁰, which is about 8×10⁻³¹, an extremely low chance of being false. For a machine taking one action per second, this error rate amounts to roughly one expected failure in 4×10²² years. If we want an even lower chance of being wrong, we can simply run the PTM/PPA more times until the desired low probability is achieved.
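The amplification arithmetic above can be sketched in a few lines; the per-round error of 1/2 and the 100 rounds are the figures from this paragraph, computed exactly with rational arithmetic:

```python
from fractions import Fraction

def amplified_error(per_round_error: Fraction, k: int) -> Fraction:
    """Cumulative probability that a deceptive prover survives all k
    independent rounds, each with the given per-round error (the
    amplification lemma: errors multiply across independent trials)."""
    return per_round_error ** k

# A 50-50 per-round chance of catching a falsehood, amplified 100 times,
# drives the cumulative error below 8 x 10^-31.
err = amplified_error(Fraction(1, 2), 100)
assert err == Fraction(1, 2 ** 100)
assert float(err) < 8e-31
```

Because the rounds are independent (each uses a freshly chosen random instance), the survival probabilities multiply; dependence between rounds would invalidate this calculation, which is why unbiased instance selection matters.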
[0042] This simple but fundamental property of the prior art is the key to understanding the novel advance of the present invention. More technical detail of the PTM/PPA is given in provisional application 63/216,547, and proofs that it works as described appear in the literature references that the '547 application cites.
[0043] The IPS assembly 100 is based on the PTM/PPA 120. The Verifier 110 calls the PTM/PPA 120 to send a query 130 to a Prover 140. The Prover 140 generates an answer 150 that is submitted to and processed by the PTM/PPA 120. The PTM/PPA then calls a simple decision procedure 160, which calculates the cumulative probability of error as described above. If the probability of error is acceptably low, the decision procedure tells the IPS to enter the accept state 170 and terminate. If the error probability does not meet the required low threshold, the decision procedure 160 calls the PTM/PPA 120 to generate another unique query 130 to submit to the Prover 140, and so on, until the desired low probability is achieved.
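The verifier loop of assembly 100 can be sketched as follows; the function names and the toy prover are hypothetical stand-ins, and the per-round error of 1/2 is the worst-case figure from the amplification discussion:

```python
import random

def run_ips(prover, check_answer, per_round_error=0.5, target=2.0 ** -100,
            seed=0):
    """Sketch of the IPS verifier loop (names hypothetical): issue
    randomly chosen queries until the cumulative chance that a deceptive
    prover survived every round falls below the target, rejecting as
    soon as any answer fails the check."""
    rng = random.Random(seed)
    survive_prob = 1.0
    while survive_prob > target:
        instance = rng.getrandbits(64)      # fresh random problem instance
        answer = prover(instance)           # query 130 -> answer 150
        if not check_answer(instance, answer):
            return "reject"                 # prover caught in a falsehood
        survive_prob *= per_round_error     # decision procedure 160
    return "accept"                         # accept state 170

# Toy honest prover that echoes the instance; accepted after 100 rounds.
result = run_ips(prover=lambda x: x, check_answer=lambda x, a: a == x)
assert result == "accept"
```

A prover that answers incorrectly is rejected on the first failed check, while an honest prover is accepted once the accumulated survival probability drops below the target.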
[0044] Turning to
[0045] In embodiments more general than the preferred one, the IPS 100 may be used by less-intelligent Verifiers 210 to submit to more-intelligent Provers 240 queries 230 whose subject concerns any aspect of the Verifier's 210 universe or environment, such as the origins of the universe, the Creator of the universe, how to extend healthy lifespan, how to end war or poverty, how to organize the economic system to achieve optimal opportunity for every human or AI agent, or how to form an optimal government that achieves maximum human happiness. In sum, the present invention, described in terms of its components in the IPS process, is assembly 200.
[0046]
[0047] The first step 310 is to identify the property of AGI, such as its safety or value-alignment with regard to humans, for which a proof is desired. In the second step 320, we must identify an existing IPS method (of which there are many), or, if no existing IPS method seems to apply, create a new one, to apply to proving the property identified in step 310. For instance, workers have taken a deterministic proof of a property of numbers and derived from it a probabilistic proof usable in a specific IPS. In the third step 330, we must create a representation of the property 310 to which method 320 is applicable. Next, in step 340, we must find or create a pseudorandom number generator (PRNG) that is compatible (see the references) with the representation 330. Last, in step 350, we must apply the PRNG to the IPS process in the random selection of problem instances to prove probabilistically.
[0048] Many additional embodiments of the present invention are possible. For instance, various mathematical techniques that have been added to the general IPS can be employed. For example, multiple prover IPS employ more than one Prover and restrict the Provers from communicating with each other so they cannot conspire to coordinate their answers to the Verifier and in so doing deceive the Verifier.
[0049] Similarly, after the invention of IPS, workers recognized that it is not simple or straightforward to create a perfect random number generator with which to select problem instances randomly, as noted above. Accordingly, practical random number generators are called “pseudo-random number generators” (PRNGs). Various methods were therefore invented to ensure higher-fidelity uniformly random number distributions from a PRNG, such as “weakly-random sources” and techniques for handling the “seed” that is the initial condition of the PRNG, among others. Each of these methods, used in the IPS at the heart of the present invention, constitutes an additional embodiment.
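As an illustrative sketch only (the specialized weakly-random-source constructions the paragraph mentions are more involved), a PRNG can be seeded from the operating system's entropy pool so that the sequence of problem instances is reproducible by the Verifier yet not predictable by the Prover:

```python
import random
import secrets

# Draw a 128-bit seed from the system CSPRNG, then use it as the initial
# condition of a deterministic PRNG for instance selection.
seed = secrets.randbits(128)
rng = random.Random(seed)

# Select five problem instances uniformly from a hypothetical instance space.
instances = [rng.randrange(10 ** 6) for _ in range(5)]
assert len(instances) == 5
assert all(0 <= i < 10 ** 6 for i in instances)
```

Keeping the seed lets the Verifier replay and audit the exact sequence of queries after the fact, while the entropy source prevents the Prover from anticipating them.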
[0050] Another embodiment example is the use of IPS to effect “zero-knowledge proofs”, which prove a certain property of a system to a certain probability but without revealing the details of the property. As an example, a human Verifier 210 may query 230 an AGI Prover 240 about a weapon the AGI 240 possesses, in which process the AGI 240 may prove the existence of the weapon without revealing details of its workings or manufacture. Or the roles may be reversed, in which case the Verifier 210 may be an AGI and the Prover 240 may be the human, so as not to reveal to the AGI Verifier 210 details of a technology that the human Prover 240 may possess.
[0051] As the history of IPS shows, each IPS method rests on one or more mathematical proofs about properties of mathematically-described entities. Thus, where we can create a usage of such a proof to apply IPS to a potential problem in human-AGI or AGIⁿ-AGIⁿ⁺¹ interaction, such as ensuring safe interaction, such an application is an additional embodiment of the invention.
[0052] In step 310, we must identify a property of AGI, such as value-alignment, to which to apply the IPS. Step 320: Identify or create an IPS method to apply to the property 310. Step 330: Develop a representation of the property to which method 320 is applicable. Step 340: Identify a pseudo-random number generator (PRNG) compatible with method 320. Step 350: Apply PRNG 340 to select random problem instances to prove probabilistically. All these steps are necessary and sufficient to apply an IPS to proving an AGI property. Steps 320 and 330 may occur in reverse order.
[0053] Here are a couple of examples to clarify the process.
Detection of Behavior Control System (BCS) Forgery Via Acyclic Graphs
[0054] A single heuristic, such as ‘terminate all humans’, or ethic, such as ‘terminate all agents using resources inefficiently as defined by the following metric’, added to a BCS could result in realization of the AGI existential threat, as could universal drives causing AGI to alter its utility function. Thus, any alteration, especially forgery, of ethics modules or BCS must be detected. One security measure is to store ethics modules, along with other critical AGI components, in a distributed ledger, perhaps distributed among all earlier autonomous agent generations since they all have a proprietary interest in the security and integrity of those components.
[0055] If we represent an AGI behavior control system (BCS) as a directed acyclic graph, such as an AI behavior tree with no loops, a unique labeling procedure exists for the nodes and edges, as do BPP tests for equivalence of two such structures, although in general the equivalence problem is currently intractable (coNP-complete). Equivalence here means the two BCS programs compute the same functions. BPP tests can be used to detect forgery by comparing a reference copy of the BCS stored in a distributed ledger against a copy presented by a potential adversary. Here we apply an existing BPP procedure to compare two read-once branching programs B₁ and B₂, where F is a finite field with at least 3m elements, and in so doing detect forgery if it exists. The representation requires an assignment of polynomials to the graph nodes and edges.
[0056] Human or AGIⁿ Verifier 210 selects elements a₁ through aₘ randomly from F.
[0057] AGI¹ (if Verifier is human) or AGIⁿ⁺¹ (if Verifier is AGIⁿ) Prover 240 evaluates the assigned polynomials p₁ and p₂ at a₁ through aₘ.
[0058] If p₁(a₁, . . . , aₘ) = p₂(a₁, . . . , aₘ), Verifier 210 accepts; otherwise, it rejects.
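A minimal sketch of this random-evaluation test, assuming the polynomials p₁ and p₂ have already been assigned to the two branching programs (the example polynomials, modulus, and round count are hypothetical stand-ins):

```python
import random

def bpp_equivalence_test(p1, p2, m, field_prime, rounds=40, seed=0):
    """Randomized equivalence test: evaluate both assigned polynomials at
    points drawn uniformly from GF(field_prime). Distinct low-degree
    polynomials agree at a random point only rarely (Schwartz-Zippel),
    so repeated rounds amplify confidence in a verdict of equivalence."""
    rng = random.Random(seed)
    for _ in range(rounds):
        point = [rng.randrange(field_prime) for _ in range(m)]
        if p1(*point) % field_prime != p2(*point) % field_prime:
            return False    # mismatch: the presented copy is a forgery
    return True             # equivalent with high probability

# (x + y)^2 equals x^2 + 2xy + y^2 identically; a forged variant differs.
P = 10 ** 9 + 7             # a prime, comfortably larger than 3m
same = bpp_equivalence_test(lambda x, y: (x + y) ** 2,
                            lambda x, y: x * x + 2 * x * y + y * y, 2, P)
forged = bpp_equivalence_test(lambda x, y: (x + y) ** 2,
                              lambda x, y: x * x + y * y, 2, P)
assert same is True and forged is False
```

The forged copy is caught as soon as a sampled point makes the two polynomials disagree, mirroring the reject branch of step [0058].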
Program-Checking Via Graph Nonisomorphism
[0059] IPS 200 can be used as a probabilistic check for buggy outputs of programs by running one program P, such as a copy of an AGIⁿ behavior control subroutine, on a checking machine, the Checker C. Assume the Prover 240 runs a program P that asserts that two uniquely-labeled graphs are isomorphic, P(G₁, G₂). The procedure is: (1) the Verifier 210 repeatedly permutes the labels of one of {G₁, G₂}, chosen randomly, and (2) asks the Prover 240 whether they are still isomorphic, a problem not known to be solvable in polynomial time. The Prover 240 supplies the permutation as the witness, which can be checked in PTIME. A guess has a 50-50 chance of being correct. Thus, with k iterations of the procedure, the probability of error is 2⁻ᵏ.
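The witness check in step (2) runs in polynomial time. A minimal sketch, with toy graphs and a hypothetical encoding of the witness as a vertex-to-vertex dictionary:

```python
def apply_permutation(edges, perm):
    """Relabel an undirected edge set under a vertex permutation."""
    return {frozenset((perm[u], perm[v])) for u, v in edges}

def check_witness(g1_edges, g2_edges, perm):
    """Polynomial-time check that `perm` maps G1 exactly onto G2: this is
    the witness the Prover supplies in the procedure above."""
    return apply_permutation(g1_edges, perm) == {frozenset(e) for e in g2_edges}

# Toy example: the path 0-1-2 relabeled by the permutation 0->2, 1->1, 2->0.
g1 = [(0, 1), (1, 2)]
g2 = [(2, 1), (1, 0)]
assert check_witness(g1, g2, {0: 2, 1: 1, 2: 0})

# The same permutation fails against a different graph.
assert not check_witness(g1, [(0, 2), (1, 2)], {0: 2, 1: 1, 2: 0})
```

Finding the permutation may be hard for the Checker, but verifying one the Prover supplies takes time linear in the number of edges, which is what makes the interactive check feasible.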
Axiomatic System Representations
[0060] In principle, an axiomatic system comprising a language, axioms, and transformation rules can be described formally, and therefore, precisely.
[0061] Here we extend existing methods using an arithmetization of the axioms and composition rules (e.g., transformation or inference rules). The desired representation needs to be expressive enough to apply one or more desired number-theoretic theorems to it (more expressive than Presburger or Robinson arithmetic). Thus, we need a finite set of primes, infinite composites of the finite set, and infinitely many numbers that are relatively prime to the finite set.
[0062] Given an axiomatic system of finite axioms and rules and infinite compositions:
Axioms A = {a₁, a₂, a₃, . . . , aᵢ}
Transformation rules R = {r₁, r₂, r₃, . . . , rⱼ}
Compositions of axioms and inference rules C = {c₁, c₂, c₃, . . . , cₖ}, e.g.,
(a₁, a₂) ∘ r₁ → c₁
(a₂, a₃, a₄) ∘ r₂ → c₂,
etc., in which the symbol “∘” represents a valid syntactical composition resulting in well-formed formulas (wff) in infix notation, all of which is known to practitioners in the art of working with axiomatic system representations. The first composition example shows a binary transformation rule, such as modus ponens from propositional logic, while the second shows a general n-ary (in this case ternary) rule, such as a sequence node testing 3 child nodes in a behavior tree.
[0063] All formulae representing all behaviors B are expressible by the system only if they are derivable by a composition of the axiom set A and the rule set R:
{A ∘ R} → B.
[0064] If we allow loops to express repetitive behavior, a loop may be examined with finite methods either by looking at an entire behavior sequence up to a point in time, or by inductive methods otherwise.
[0065] We assign a unique prime number p to each axiom a and transformation rule r, for intelligibility separating axioms from transformation rules (symbolically illustrated in Table 1, with a more specific example in the drawings).
TABLE 1. Arithmetization of an axiomatic behavior control system.

  Syntactical symbol    Prime     Model
  a₁                    p₁        2
  a₂                    p₂        3
  a₃                    p₃        5
  . . .                 . . .     . . .
  aₙ                    pₙ        pₙ
  r₁                    pₙ₊₁      pₙ₊₁
  r₂                    pₙ₊₂      pₙ₊₂
  r₃                    pₙ₊₃      pₙ₊₃
  . . .                 . . .     . . .
  rₙ                    pₘ        pₘ
[0066]
[0067] Each vertex is numerically coded for a given sample robotic BT algorithm drawn from the literature: 1: Fallback; 21, 22, 23, 34: Sequence; 41, 42: Condition. Vertex codes for lower-level BT algorithms: 31: Human approaching?; 32: Maintain prescribed safe distance; 33: Human asks for help with task; 41: Begin log; 42: Is task moral?; 43: Is task ethical?; 35: Low energy?; 36: Seek power station; 24: Wander.
[0068]
[0069] In this arithmetical representation, transformation rules taking two or more parameters are interpreted as composing the parameters with the transformation rule, i.e., multiplying the primes (axioms) or composites (formulae) instantiating the parameters along with the transformation rule. Then formulae derived within the system, which represent theorems or behaviors, constitute composite numbers, as can be proved by induction. Permitted behaviors are represented by theorems, that is, formulae not relatively prime to the axioms and transformation rules (i.e., composites). Proscribed, forbidden, unsafe behaviors are formulae representing numbers that are relatively prime to the axioms and transformation rules. In general, any axiomatic system specifies a set of constraints that its theorems satisfy [38].
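A minimal sketch of the arithmetization, assuming the hypothetical prime assignment of Table 1 (the symbol names and candidate behaviors are illustrative): permitted behaviors factor entirely over the system's primes, while a forbidden behavior leaves a factor relatively prime to the system.

```python
# Hypothetical arithmetization: each axiom and transformation rule of the
# BCS is assigned a unique prime, as in Table 1.
PRIMES = {"a1": 2, "a2": 3, "a3": 5, "r1": 7, "r2": 11}

def compose(*symbols):
    """A behavior derived by composing axioms and rules is represented by
    the product of their primes (ordering is lost, per paragraph [0073])."""
    n = 1
    for s in symbols:
        n *= PRIMES[s]
    return n

def is_permitted(behavior: int) -> bool:
    """A behavior is a valid derivative of the system iff its number
    factors entirely over the system's primes: repeatedly dividing out the
    assigned primes must leave 1 (unique factorization theorem)."""
    for p in set(PRIMES.values()):
        while behavior % p == 0:
            behavior //= p
    return behavior == 1

permitted = compose("a1", "a2", "r1")    # 2 * 3 * 7 = 42
assert is_permitted(permitted)
assert not is_permitted(13 * permitted)  # 13 is relatively prime to the system
```

This is the factoring test of claim 14 in miniature: a composite built only from the system's primes is a theorem (permitted behavior), while any residual factor coprime to the system marks the behavior as underivable and hence proscribed.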
[0070] The goal is to render the derived set of behaviors susceptible to the methods of BPP and IPS by reduction via arithmetization. Thus, we only need to capture properties that allow application of IPS; we do not need to represent all of number theory or use the full Gödel arithmetization scheme used to show incompleteness.
[0071] By the unique factorization theorem, this representation uniquely describes trajectories through the tree of axioms, rules, and formulae:
[0072] Unique factorization theorem. Every integer a either is 0, or is a unit ±1, or has a representation in the form
a = u·p₁p₂ . . . pₙ,
where u is a unit and p₁, p₂, . . . , pₙ are one or more positive primes, not necessarily distinct. This representation is unique except for the order in which the primes occur.
[0073] In other words, each sequence uniquely describes a trajectory through a BCS tree, though the order in which the primes occur, i.e., the order of application of each axiom or rule in generating a behavior, is lost when multiplying permitted multiple instances of each axiom or rule together to get the typical factorization representation of a composite with exponents:
cᵢ = p₁^e₁ · p₂^e₂ · . . . · pₙ^eₙ.
[0074] However, the non-uniqueness is irrelevant when testing for compositeness vs. primality.
Broadening Language
[0075] Please understand that although the invention has been described above in terms of particular embodiments, the foregoing embodiments are provided as illustrative only, and do not limit or define the scope of the invention. Various other embodiments, including but not limited to the preceding, are also within the scope of the claims. Notably, the embodiments may be deployed on any hardware or software architecture invented by humans or AI, such as quantum computers, or more advanced hardware invented by humans, humans with augmented intelligence, or AGI.