Re: Suggested AI-Box protocol & AI-Honeypots

From: Michael Warnock
Date: Sat Jul 07 2001 - 14:38:59 MDT

Eli wrote:
>That's why the protocol says "no real-world *material* stakes" - bribes
>are ruled out but not other means of convincing the Gatekeeper to break
>character, if you can manage to do so.
Why allow out-of-character argument by the AI party? Doesn't this call
into question the already-anecdotal evidence against the usefulness of
AI-Boxes and their ilk?

>> I tend to think that an AI-Box with a single perimeter and an on/off
>> switch would not do the job intended, but possibly influence the AI's
>> perception of our own friendliness negatively.
>That shouldn't make any difference. We are not dealing with an evolved
But we are probably dealing with an entity which had an early stage of
learning in which it was more impressionable. Would even a transhumanly
intelligent AI not approach all further situations differently after realizing it
is in a cage? If caging sentients is presented as friendly, why should an AI
treat us any differently once the tables have turned? Personally, I think
that a box of some sort must be used for everyone's safety, including the
AI's, against unintentional events. The question for me is when to tell the
AI about it, and whether to have more than one and not reveal them all.

>> On the other hand I
>> think it's important that we not run programs on any substrate capable
>> of unaided self-extension at the hardware level without extremely well
>> thought-out safeguards.
>There are no safeguards against a transhuman. If you want to win, do it
>before then.
That depends upon how far past human this transhuman is and in what
manner it is built and operates. At the inside limit I see an AI which can
communicate somewhat in English, but is really good at observation and
goal-directed programming, and exists on a substrate including a lot of
next-gen FPGAs. I can imagine it using the material properties of the
FPGAs to produce unpredictable hardware effects, tapping into wireless
signals and slipping Trojan horses into work orders to get it the rest of
the way out.
On the other hand, I think that unfriendly behavior by a slightly less
intelligent AI, or one on more rigid hardware, would be observed in time
to take appropriate steps (whatever those may be).

>> is and its level of complexity in general. Even given the most readable
>> code, however, a transhuman AI, which has spent most of its existence
>> working with code, including its own, would likely be able to slip one
>> by even the most expert panel of code-reviewers.
What if these reviewers were looking at the code all along the way, and
the AI's available CPU time was limited by their reviews? Again, I don't
think I'd like a developing AI to discover this situation and be morally
affected by it. On the other hand, perhaps the only way humans and
other sentients can be allowed to live in subjective freedom is to exist
within a protected-memory-style cage and take what time is allotted us
by the scheduler to exist.
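The review-gated CPU budget described above can at least be sketched with
today's tools. The snippet below is a minimal illustration only (it assumes a
Unix host and Python, neither of which is implied by the discussion): the
untrusted code runs in a child process whose CPU time is hard-capped, so it
executes only for whatever quota its reviewers have granted.

```python
# Minimal sketch: run untrusted code under a hard CPU-time quota.
# Assumes a Unix host (the `resource` module and `preexec_fn` are
# Unix-only); CPU_SECONDS is an illustrative value, not a recommendation.
import resource
import subprocess
import sys

CPU_SECONDS = 1  # quota granted per review cycle


def cap_cpu():
    # Runs in the child just before exec: set a hard ceiling on CPU time.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))


# Stand-in for the reviewed AI code: an infinite loop.
untrusted = "while True: pass"

proc = subprocess.run(
    [sys.executable, "-c", untrusted],
    preexec_fn=cap_cpu,
)
# Once the quota is exhausted the kernel delivers SIGXCPU, so the child
# terminates abnormally instead of running forever.
print("child return code:", proc.returncode)
```

Of course this only bounds raw cycles on cooperative hardware; it says
nothing about the harder problem of an AI that can influence its substrate.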

>> To produce the equivalent for a transhuman AI may prove
>> impossible if it is able to harness superior hardware technology
>> within its own perimeter. Also, if a UAI deduced the existence
>> of the honeypot it would likely pretend to respect the first
>> boundary so as to be let out or wait for an appropriate time to
>> blow through the first and attempt the second. At least I think there
>> is some possible use in this, though, if a UAI is let into what it
>> thinks is the base level of reality and makes its character known,
>> but is still contained.
>IMO it would be very easy for a transhuman AI (UAI or FAI) to deduce the
>existence of the honeypot.
What if we've discovered the grand unified theory by then? Are you saying
a little chunk of laws-of-physics universe couldn't be emulated in hardware,
that it would be impractical to do it on the scale necessary, or that there is
something fundamental that the AI would be able to check on?

My own intuition is that the likelihood of FAI over UAI is high enough that
it is more important to bring it about quickly than safely (if that is possible),
because of the various other existential (or big-step-back) events that could
occur in the meantime. Nonetheless, I feel it's important to discuss strategies
for containing code which, regardless of its sentience or friendliness, is
capable of evolving and reproducing unlike anything currently infecting the
internet.

Michael Warnock

This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:40 MDT