Re: Suggested AI-Box protocol & AI-Honeypots

From: Eliezer S. Yudkowsky (sentience@pobox.com)
Date: Sat Jul 06 2002 - 17:47:58 MDT

Next message: Mike & Donna Deering: "Re: AI Jailer."
Previous message: Michael Warnock: "Re: Suggested AI-Box protocol & AI-Honeypots"
In reply to: Michael Warnock: "Re: Suggested AI-Box protocol & AI-Honeypots"
Next in thread: Tomaz Kristan: "Re: Suggested AI-Box protocol & AI-Honeypots"

Michael Warnock wrote:
>
> This seems very complete. Most of my ideas for how the AI party has
> been successful are not within the protocol. The only remaining notion
> I think to be reasonable is that Eli is convincing the Gatekeeper party
> that letting him out now increases the chances of real FAI by increasing
> the thought and self-doubt surrounding AI-Boxes and Friendliness.
> This too may be judged to be using a real-world Eli tactic such as a
> secret bribe, which breaks the first of the AI protocols.

That's why the protocol says "no real-world *material* stakes" - bribes
are ruled out but not other means of convincing the Gatekeeper to break
character, if you can manage to do so.

> I tend to think that an AI-Box with a single perimeter and an on/off
> switch would not do the job intended, but possibly influence the AI's
> perception of our own friendliness negatively.

That shouldn't make any difference. We are not dealing with an evolved
retaliator.

> On the other hand I
> think it's important that we not run programs on any substrate capable
> of unaided self-extension at the hardware level without extremely well
> thought-out safeguards.

There are no safeguards against a transhuman. If you want to win, do it
before then.

> If a transhuman AI could exist purely as software on more conventional
> hardware (barring quantum mechanisms in our own brains or extremely
> high requirements), then it should be possible to maintain hardware parity
> that is not under the influence of said software. This parity hardware
> would allow inspection of any and all code. The usefulness of this,
> as I see it, depends upon how algorithmic vs connectionist the program
> is and its level of complexity in general. Even given the most readable
> code, however, a transhuman AI, which has spent most of its existence
> working with code, including its own, would likely be able to slip one
> by even the most expert panel of code-reviewers.

Yep.

> So what else could be done along these lines? How about the
> equivalent of a honeypot? In a traditional *N*X honeypot a
> system image is placed within a host environment to which it has
> no access or even indication that it is there. All outside communication
> is routed to the contained system image. The integrity of the image is
> checked to alert the operator to the hoped-for intrusion. Much can
> be learned about a particular cracker by tricking him into your
> honeypot.
> To produce the equivalent for a transhuman AI may prove
> impossible if it is able to harness superior hardware technology
> within its own perimeter. Also, if a UAI deduced the existence
> of the honeypot it would likely pretend to respect the first
> boundary so as to be let out or wait for an appropriate time to
> blow through the first and attempt the second. At least I think there
> is some possible use in this, though, if a UAI is let into what it
> thinks is the base level of reality and makes its character known,
> but is still contained.

IMO it would be very easy for a transhuman AI (UAI or FAI) to deduce the
existence of the honeypot.

-- 
Eliezer S. Yudkowsky                          http://intelligence.org/
Research Fellow, Singularity Institute for Artificial Intelligence

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:40 MDT