**From:** Eliezer Yudkowsky (*sentience@pobox.com*)

**Date:** Tue Oct 19 2004 - 09:02:27 MDT


I've recently been pondering a difficulty with self-modifying AI. I shall illustrate with Juergen Schmidhuber's Gödel Machine, but the problem generalizes.

http://www.idsia.ch/~juergen/goedelmachine.html

Juergen Schmidhuber's self-rewriting "Gödel Machine" is based on the notion of a machine that executes a self-rewrite as soon as it can produce a proof that the self-rewrite is an improvement. This self-rewriting can extend to rewriting the theorem prover or theorem checker. Leaving aside other qualms I have about the Gödel Machine, it appears to me that for the Gödel Machine to execute a change that switches to a new theorem-prover or proof-verifier, the Gödel Machine must prove that if the new proof-verifier accepts P, then P. Otherwise there is no way for the Gödel Machine to prove the new proof-verifier is useful, i.e., no way to expect results accepted by the new verifier to be useful.
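The acceptance criterion above can be sketched as a loop. Everything in this sketch - the `verifier`, the `proof_search`, and the toy utility model - is a hypothetical stand-in of my own, not Schmidhuber's actual construction; it shows only the shape of the rule "execute the rewrite as soon as the current verifier accepts a proof of improvement":

```python
# Schematic sketch of the Godel Machine's self-rewrite rule.
# The verifier, proof search, and utility model are toy stand-ins,
# not Schmidhuber's actual construction.

def godel_machine_step(program, verifier, proof_search):
    """One self-rewrite attempt: adopt a candidate rewrite only if the
    *current* proof-verifier accepts a proof that it is an improvement;
    otherwise keep running the current program."""
    candidate, proof = proof_search(program)
    if candidate is not None and verifier(program, candidate, proof):
        return candidate   # execute the proven self-rewrite
    return program

# Toy instantiation: a "program" is just its expected utility (an int),
# a "proof" is the bare claim of the two utilities, and the verifier
# checks that claim directly.
def toy_verifier(program, candidate, proof):
    old, new = proof
    return old == program and new == candidate and new > old

def toy_proof_search(program):
    candidate = program + 1            # propose a slightly better program
    return candidate, (program, candidate)
```

The point of the real difficulty is the `verifier` argument: rewriting it requires the current verifier to certify the new one.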

Example: For the Gödel Machine's current theorem-prover to accept a proposed new theorem-prover might require establishing that if the new theorem-prover accepts a proof that a proposed still-newer theorem-prover is useful, then that still-newer theorem-prover is useful. Part of my dissatisfaction with the Gödel Machine is that Schmidhuber does not specify exactly what is meant by "useful", saying only that it has to do with expected utility. But suppose we have an expected utility problem that involves maximizing the number of received apples. To replace the current proof system with a new proof system would presumably involve proving that if the new proof system accepts a rewrite as "useful for increasing apples", that rewrite is useful for increasing apples. The verifier has to translate a meta-language result into an object-language result.

Löb's Theorem establishes that for any proof system capable of encoding its own proof process,

|- Provable('P') -> P if and only if |- P.

If you can prove "if P is provable then P", you can prove P.

http://www.sm.luth.se/~torkel/eget/godel/theorems.html
http://www.sm.luth.se/~torkel/eget/godel/loeb.html
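For reference, the standard textbook derivation of the nontrivial direction (via the Hilbert-Bernays derivability conditions, writing []P for Provable('P')) runs roughly as follows:

1. Assume |- []P -> P.
2. By diagonalization, construct a sentence L with |- L <-> ([]L -> P).
3. The derivability conditions yield |- []L -> []([]L -> P) and |- []L -> [][]L, which together give |- []L -> []P.
4. Combining with the assumption in step 1: |- []L -> P.
5. By step 2, |- L; hence |- []L; hence, by step 4, |- P.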

In the same way that Gödel's Theorem formalizes the ancient Epimenides Paradox "This sentence is false", Löb's Theorem formalizes Haskell Curry's Santa Claus Paradox, "If this sentence is true then Santa Claus exists." (Consider: *if* the sentence *were* true, then the antecedent would be true, and hence Santa Claus would exist, right? So if the sentence were true, Santa Claus would exist? But this is precisely what the sentence asserts, and therefore the sentence is true and Santa Claus does exist.) Löb's Theorem eliminates the self-reference in the Santa Claus Paradox via a diagonalization argument, as Gödel did with the Epimenides Paradox. The upshot is that if you can prove "P's provability implies P", the Santa Claus argument carries through and you can prove P. If you could prove, in Peano Arithmetic, that a proof of P!=NP in Peano Arithmetic would imply P!=NP, you could use Löb's construction to prove P!=NP.
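The informal argument above can even be walked through mechanically. The following toy (my own illustration, not a real theorem prover) hard-codes the self-referential definition S := (S -> Santa) and replays the steps; any system naive enough to accept that definition ends up deriving Santa:

```python
# Toy, hand-walked rendition of the Santa Claus Paradox -- not a real
# theorem prover. The sentence S is *defined* to be the formula
# (S -> Santa); the code replays the informal argument step by step.

IMP = "->"
Santa = "Santa"
S = "S"
S_DEF = (S, IMP, Santa)   # by self-reference, S *is* the formula S -> Santa

def derive_santa():
    theorems = set()

    # Conditional proof: assume S and try to derive Santa.
    assumed = {S}
    # Unfolding the self-referential definition, assuming S also
    # gives us the formula (S -> Santa).
    assumed.add(S_DEF)
    # Modus ponens inside the subproof: S and (S -> Santa) yield Santa.
    if S in assumed and S_DEF in assumed:
        assumed.add(Santa)

    # Discharging the assumption: we derived Santa from S, so the
    # conditional (S -> Santa) is a theorem outright.
    if Santa in assumed:
        theorems.add(S_DEF)

    # But (S -> Santa) *is* S, by definition -- so S is a theorem too.
    if S_DEF in theorems:
        theorems.add(S)

    # Top-level modus ponens: S and (S -> Santa) yield Santa.
    if S in theorems and S_DEF in theorems:
        theorems.add(Santa)

    return theorems
```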

Tarski's closely related undefinability theorem shows that no consistent language can contain its own truth predicate. If you want to talk about the truth of statements in an object language, you need a meta-language, which in turn cannot talk about its own truth. You can say "The sky is blue" and '"The sky is blue" is true', but not ''"The sky is blue" is true' is true', unless you invent a new concept of truth, true-1, to describe meta-language truths in a meta-meta-language.

http://www.ditext.com/tarski/tarski.html
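The resulting stratification can be modeled in a few lines (an illustrative sketch of Tarski's levels, with names of my own invention): each truth predicate lives one level above the sentences it may talk about, and applying a truth predicate at a sentence's own level is rejected as ill-formed:

```python
# Toy model of Tarski's stratified truth predicates: sentences carry a
# level, and True_n may only be applied to sentences of strictly lower
# level. An illustrative sketch, not a serious formalization.

def base(text):
    """An object-language sentence: level 0."""
    return ("sentence", text, 0)

def true_at(s, level):
    """Apply the truth predicate of a given level to sentence s."""
    return ("true", s, level)

def true(s):
    """The only safe ascent: one level above the sentence talked about."""
    return true_at(s, s[2] + 1)

def well_formed(s):
    """Check the stratification: every truth ascription must sit exactly
    one level above the sentence it describes."""
    if s[0] == "sentence":
        return True
    inner = s[1]
    return s[2] == inner[2] + 1 and well_formed(inner)

sky = base("The sky is blue")   # object language
t1 = true(sky)                  # '"The sky is blue" is true'
t2 = true(t1)                   # needs true-1, one level up again
```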

In more familiar terms, it would seem that Schmidhuber's Gödel Machine must prove a new proof system is consistent in order to accept it, and that runs smack dab into Gödel's original theorem. Any sufficiently strong system that can prove its own consistency is inconsistent.
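This is in fact just Löb's Theorem with P instantiated to a falsehood. Taking P = F for any outright falsehood (say 0=1),

|- Provable('F') -> F

is exactly the assertion of consistency - it says no falsehood is provable - so a system that proves its own consistency thereby proves F, and is inconsistent.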

Maybe the Gödel Machine's first proof system would start with ZFC set theory. ZFC suffices to prove the consistency of Peano Arithmetic, and might accept a rewrite implementing a proof verifier that accepted PA proofs - but what about the next iteration of self-rewriting? The Peano prover would not accept another Peano prover. The same problem holds for starting off the Gödel Machine with an axiom asserting the consistency of Peano Arithmetic, a further axiom asserting the consistency of the new system PA+1, et cetera. Even if we assert infinite levels of consistency in PA+w, it doesn't help. Just as every descending sequence of ordinals must eventually terminate, every sequence of self-rewrites would need to eventually terminate.
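The analogy can be made concrete with a toy model (my own illustration): represent PA+w as a token that may drop to any finite PA+n on the first rewrite, and require each subsequent prover to vouch only for a strictly weaker one. Chains can be made as long as you like, but every chain is finite, just as every descending sequence of ordinals below w+1 terminates:

```python
# Toy model of the descending tower of consistency strength. A prover
# of strength n vouches only for a strictly weaker successor; "PA+w"
# lets the first rewrite drop to any finite PA+n.

OMEGA = "omega"   # stands in for PA+w

def rewrite_chain(first_pick):
    """Follow one maximal chain of self-rewrites, starting from PA+w
    and dropping to PA+first_pick on the first step."""
    chain = [OMEGA]
    strength = first_pick          # from PA+w, choose any finite level
    chain.append(strength)
    while strength > 0:
        strength -= 1              # each new prover must be strictly weaker
        chain.append(strength)
    return chain                   # always finite, however large first_pick
```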

The difficulty with Schmidhuber's Gödel Machine concerns me because the problem with Löb's Theorem seems to generalize to other foundations for self-rewriting AI. Even the simplest consistency under reflection of a self-modifying AI would seem to require the AI to believe that its processes asserting P - that AI |- P - is evidence supporting P. If the AI looks at its reflected processes, it will find that the meaning of belief P, the causal sequence leading up to the presence of the computational tokens representing belief P, is that such-and-such process T proved P. If T's proof of P doesn't seem to the AI to imply P, then why would a reflective AI allow T's proof of P to load P as a belief, any more than a reflective AI would allow a random number generator to load beliefs? But if an AI does believe that (AI |- P) -> P, Löb's construction permits an automatic proof of P - unless we change one of the other implicit assumptions in Löb's argument, or in this whole scenario.

All the papers I've scanned so far on reflectivity in theorem-proving systems use the old towers-of-meta trick to get around Löb's Theorem. I have not been able to find anything on reflectivity that wraps all the way around, the way humans think (but then humans are naively vulnerable to Epimenides and Santa Claus paradoxes), or the way a reflectively consistent seed AI would have to wrap around.

Lest anyone despair of AI, remember that humans can think about this stuff without going up in flames, ergo it must be possible somehow.

I have tentative ideas about how to resolve this problem, but I was wondering if there's anything in the existing literature. Anything I should read on fully wrap-around reflectivity? Anything out there on reflectivity that doesn't invoke the towers-of-meta trick?

-- Eliezer S. Yudkowsky http://intelligence.org/ Research Fellow, Singularity Institute for Artificial Intelligence


*This archive was generated by hypermail 2.1.5: Wed Jul 17 2013 - 04:00:49 MDT*