From: Eliezer Yudkowsky (sentience@pobox.com)
Date: Tue Oct 19 2004 - 09:02:27 MDT
I've recently been pondering a difficulty with self-modifying AI.  I shall 
illustrate with Juergen Schmidhuber's Gödel Machine, but the problem 
generalizes.
http://www.idsia.ch/~juergen/goedelmachine.html
Juergen Schmidhuber's self-rewriting "Gödel Machine" is based on the notion 
of a machine that executes a self-rewrite as soon as it can produce a proof 
that the self-rewrite is an improvement.  This self-rewriting can extend to 
rewriting the theorem prover or theorem checker.  Leaving aside other 
qualms I have about the Gödel Machine, it appears to me that for the Gödel 
Machine to execute a change that switches to a new theorem-prover or 
proof-verifier, the Gödel Machine must prove that if the new proof-verifier 
accepts P, then P.  Otherwise there is no way for the Gödel Machine to 
prove that the new proof-verifier is useful, i.e., no way to expect results 
accepted by the new verifier to be useful.
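As a toy sketch (my own simplification, not Schmidhuber's formalism), the 
acceptance rule looks something like the following, where 
prove_improvement() is a hypothetical stand-in for the machine's proof 
search:

    # Toy sketch of the Goedel Machine's acceptance rule -- my own
    # simplification, not Schmidhuber's formalism.
    def maybe_self_rewrite(current_code, candidate_code, prove_improvement):
        """Adopt candidate_code only if the *current* prover certifies
        that switching to it increases expected utility."""
        proof = prove_improvement(current_code, candidate_code)
        if proof is not None:
            return candidate_code  # switchprog fires; machine rewrites itself
        return current_code        # otherwise keep running the old code

    # When candidate_code contains a new proof-verifier, the proof that
    # prove_improvement() must find has to include, in effect, "whatever
    # the new verifier accepts is so" -- the condition described above.

    # With a proof search that never succeeds, nothing ever changes:
    unchanged = maybe_self_rewrite("M0", "M1", lambda cur, cand: None)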
Example:  For the Gödel Machine's current theorem-prover to accept a 
proposed new theorem-prover might require establishing that if the new 
theorem-prover accepts a proof that a proposed newer theorem-prover is 
useful, then that newer theorem-prover is useful.  Part of my 
dissatisfaction with the Gödel Machine is that Schmidhuber does not specify 
exactly what is meant by "useful", saying only that it has to do with 
expected utility.  But suppose we have an expected utility problem that 
involves maximizing the number of received apples.  To replace the current 
proof system with a new proof system would presumably involve proving that 
if the new proof system accepts a rewrite as "useful for increasing 
apples", then that rewrite really is useful for increasing apples.  The old 
verifier has to translate a meta-language result (a statement about what 
the new verifier accepts) into an object-language result (a statement about 
apples).
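Schematically (my own paraphrase, not notation from Schmidhuber's paper), 
the obligation on the current verifier is something like:

   Current |-  for every rewrite R:
                 Accepts_New('R is useful for increasing apples')
                   ->  (R is useful for increasing apples)

where Accepts_New is the proposed verifier's acceptance predicate, the 
quoted sentence is a meta-language object, and the unquoted consequent is 
the object-level claim about apples.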
Löb's Theorem establishes that for any proof system capable of encoding its 
own proof process,
|- Provable('P') -> P
   if and only if
|- P.
If you can prove "if P is provable then P", you can prove P.
http://www.sm.luth.se/~torkel/eget/godel/theorems.html
http://www.sm.luth.se/~torkel/eget/godel/loeb.html
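For reference, in the modal logic of provability, writing [] for 
"provable", the theorem appears as the single schema (the characteristic 
axiom of the provability logic GL):

   |-  []([]P -> P)  ->  []P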
In the same way that Gödel's Theorem formalizes the ancient Epimenides 
Paradox "This sentence is false", Löb's Theorem formalizes Haskell Curry's 
Santa Claus Paradox, "If this sentence is true then Santa Claus 
exists."  (Consider: *if* the sentence *were* true, then the antecedent 
would be true, and hence Santa Claus would exist, right?  So if the 
sentence were true, Santa Claus would exist?  This is precisely what the 
sentence asserts, and therefore the sentence is true and Santa Claus does 
exist.)  Löb's Theorem eliminates the self-reference in the Santa Claus 
Paradox via a diagonalization argument, as Gödel did with the Epimenides 
Paradox.  The upshot is that if you can prove "P's provability implies P", 
the Santa Claus Paradox carries through and you can prove P.  If you could 
prove, in Peano Arithmetic, that a proof of P!=NP in Peano Arithmetic would 
imply P!=NP, you could use Löb's construction to prove P!=NP.
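For anyone who wants the mechanics, here is the standard textbook 
derivation in the same notation, abbreviating Provable as Prov.  S is the 
"Santa Claus" sentence supplied by the diagonal lemma, and the steps marked 
HB use the Hilbert-Bernays derivability conditions:  (HB 1) if |- A then 
|- Prov('A'); (HB 2) Prov('A -> B') -> (Prov('A') -> Prov('B')); 
(HB 3) Prov('A') -> Prov('Prov("A")').

   Hypothesis:      |- Prov('P') -> P
   Diagonal lemma:  |- S <-> (Prov('S') -> P)

   1.  |- S -> (Prov('S') -> P)                        (diagonal)
   2.  |- Prov('S -> (Prov("S") -> P)')                (HB 1, from 1)
   3.  |- Prov('S') -> Prov('Prov("S") -> P')          (HB 2, from 2)
   4.  |- Prov('S') -> (Prov('Prov("S")') -> Prov('P'))    (HB 2 again)
   5.  |- Prov('S') -> Prov('Prov("S")')               (HB 3)
   6.  |- Prov('S') -> Prov('P')                       (4, 5)
   7.  |- Prov('S') -> P                               (6, hypothesis)
   8.  |- S                                            (7, diagonal)
   9.  |- Prov('S')                                    (HB 1, from 8)
   10. |- P                                            (7, 9)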
Tarski argued, by a similar diagonal construction, that no consistent 
language can contain its own truth predicate.  If you want to talk about 
the truth of statements in an object language, you need a meta-language, 
and the meta-language in turn cannot talk about its own truth.  You can say 
"The sky is blue" and '"The sky is 
blue" is true', but not ''"The sky is blue" is true' is true', unless you 
invent a new concept of truth, true-1, to describe metalanguage truths in 
meta-meta-language.
http://www.ditext.com/tarski/tarski.html
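Spelling out the hierarchy in the sky-is-blue example (my own indexing, 
counting up from the object language L0):

   L0:  The sky is blue.
   L1:  True-0('The sky is blue.')
   L2:  True-1('True-0("The sky is blue.")')
   ...

No level contains a truth predicate that applies to sentences of that same 
level.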
In more familiar terms, it would seem that Schmidhuber's Gödel Machine must 
prove a new proof system is consistent in order to accept it, and that runs 
smack dab into Gödel's Second Incompleteness Theorem:  any system of the 
kind we're discussing that can prove its own consistency is inconsistent.
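The connection is tight:  Gödel's Second Incompleteness Theorem is the 
special case of Löb's Theorem where P is a refutable sentence such as 
'0=1', since within any such system T we have, provably,

   Con(T)  <->  ~Prov('0=1')  <->  (Prov('0=1') -> 0=1)

so if T |- Con(T), Löb's Theorem yields T |- 0=1, i.e., T is inconsistent.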
Maybe the Gödel Machine's first proof system would start with ZFC set 
theory.  ZFC suffices to prove the consistency of Peano Arithmetic, and 
might accept a rewrite implementing a proof verifier that accepted PA 
proofs - but what about the next iteration of self-rewriting?  The Peano 
prover would not accept another Peano prover.  The same problem holds for 
starting off the Gödel Machine with an axiom about the consistency of Peano 
Arithmetic, a further axiom about the consistency of the new system PA+1, 
et cetera.  Even if we assert infinite levels of consistency in PA+ω, it 
doesn't help.  Just as every descending sequence of ordinals must 
eventually terminate, every sequence of self-rewrites would need to 
eventually terminate.
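Spelling the tower out (using the same names as above):

   PA+1    =  PA    + Con(PA)
   PA+2    =  PA+1  + Con(PA+1)
   ...
   PA+ω    =  PA    + { Con(PA+n) for every finite n }
   PA+ω+1  =  PA+ω  + Con(PA+ω)

PA+ω proves Con(PA+n) for every finite n, yet still does not prove 
Con(PA+ω), so each self-rewrite must step strictly down the tower, and any 
strictly descending sequence of ordinal levels is finite.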
The difficulty with Schmidhuber's Gödel Machine concerns me because the 
problem with Löb's Theorem seems to generalize to other foundations for 
self-rewriting AI.  Even the simplest consistency under reflection of a 
self-modifying AI would seem to require the AI to believe that if its own 
processes assert P, that is, if AI |- P, then that is evidence supporting 
P.  If the AI 
looks at its reflected processes, it will find that the meaning of belief 
P, the causal sequence leading up to the presence of the computational 
tokens representing belief P, is that such-and-such process T proved P.  If 
T's proof of P doesn't seem to the AI to imply P, then why would a 
reflective AI allow T's proof of P to load P as a belief, any more than a 
reflective AI would allow a random number-generator to load beliefs?  But 
if an AI does believe that (AI |- P) -> P, Löb's construction permits an 
automatic proof of P - unless we change one of the other implicit 
assumptions in Löb's argument, or in this whole scenario.
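Putting the worry in one line, in the same notation as before, let T be the 
AI's proof system:

   if   T |-  Prov_T('P') -> P      (the AI trusts its own proofs of P)
   then, by Löb's Theorem,   T |- P

so an unrestricted "whatever I prove is true" schema collapses into 
asserting outright every sentence it covers.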
All the papers I've scanned so far on reflectivity in theorem-proving 
systems use the old towers-of-meta trick to get around Löb's Theorem.  I 
have not been able to find anything on reflectivity that wraps all the way 
around, the way humans think (but then humans are naively vulnerable to 
Epimenides and Santa Claus paradoxes), or the way a reflectively consistent 
seed AI would have to wrap around.
Lest anyone despair of AI, remember that humans can think about this stuff 
without going up in flames, ergo it must be possible somehow.
I have tentative ideas about how to resolve this problem, but I was 
wondering if there's anything in the existing literature.  Anything I 
should read on fully wrap-around reflectivity?  Anything out there on 
reflectivity that doesn't invoke the towers-of-meta trick?
-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence