Re: AI Jailer.

From: Moshe Looks (moshel@cs.huji.ac.il)
Date: Tue Jul 16 2002 - 02:40:14 MDT


James Higgins wrote:
> > How about asking the AI to produce a written document justifying its own
> > Friendliness design and safeguards as adequate? This paper could then be
> > sent around to lots of people who had no direct contact whatsoever with
> > the AI, who would try and poke holes in it. Unlike humans, an AI could
>
> This would be interesting, and maybe even helpful, but would prove
> nothing. Even an AI as stupid as I am could trick people using this
> method. First, the AI finds an extremely subtle method to accomplish
> its goals, re-engineering itself as necessary to exploit it. Then it
> writes an extremely thorough paper, even proposing safeguards that are
> not currently in place. Better still, it suggests some that would
> *appear* to prevent the method the AI intends to use. If done properly,
> it is unlikely anyone would notice this tiny crack in the armor.
>
The trick is not to have the AI justify a new design, but the original
design that was implemented by human programmers. If the current
(possibly redesigned) AI is Friendly, it is only by virtue of the
original design. The "fool-the-humans" strategy that you suggested would
not work here, since the AI cannot alter the original design, only
comment on it.

Moshe


