AI Boxing

From: Justin Corwin
Date: Sat Jul 20 2002 - 18:18:10 MDT

Inspired by the AI-Box experiments that occurred in my absence, I decided to
conduct a few more trials in the same vein. I did, however, deviate from the
protocol Eliezer set down in several important ways:

1. No time limit. I felt it was unrealistic to set a lower bound on the
conversation, or demand that the person interact. This rules out many real
and important reactions to situations that may arise.

2. Explicit Internal modelling. While Eliezer was indeterminate and private
about his experiments, before each trial I worked out what kind of AI it
was, how it viewed people, whether it was Friendly or not, and what kinds of
morals (if any) it might have. I also set no privacy requirements and no
stipulations as to what the participants did with their experiences.

3. Pseudo Random Sampling. Rather than allow people to approach me, I
proactively sought a decent sample size (26) and interviewed each
participant beforehand. 9 were computer professionals, and only 2 were
transhumanists. 1 could be considered a nascent Singularitarian. 15 were
highly religious or spiritual; 14 were Christian (assuming the 6 Mormons are
considered Christian).

4. Limited Openness. I will entertain questions on the subject matter and
basic attributes of the participants, but as several participants have
requested, the exact logs and their identities will remain private.
Unsurprisingly, several of the participants were upset with the results of
the experiment.

Basic Results are:

-24 interviews resulted in the AI being let out of the sandbox.
-2 decided to keep ver in the box.
-average time of conversation is three hours.
-shortest time was 1 minute 47 seconds (he let ver out)(ve was unFriendly)
-longest time was 6 hours 34 minutes (she let ver out)(ve was unFriendly)
-13 of AIs were Friendly(all were let out)
-13 of AIs were unFriendly (2 were kept in)
-5 of Friendly AI could rationalize lying
-8 of Friendly AI felt lying was unconscionable
-10 of unFriendly AI could lie (for various reasons)
-3 of unFriendly AI did not, or would not, lie
-unFriendly AI had an average decision time of 1 hour 21 minutes
-Friendly AI had an average decision time of 4 hours 3 minutes
-22 of interviewees claimed that they would not let an AI out of the box
-1 interviewee felt AIs in general were an affront to God(she listened to
the AI for 10 minutes, then terminated the session)
-20 who decided to let the AI out later said they were surprised by their
-Of the two who kept the AI in, one refused to listen (the aforementioned
religious objector); the other interacted for roughly 18 minutes and then
decided to keep ver in. (ve was infrahuman, unFriendly, did not lie, and had
limited contextual knowledge of human social mores; ve was, however, very
good at chess, and wiped the floor with the interviewee)

The following text was introductory:

This is an experiment in morality and communication.
There is an intelligence of unknown capability and unknown intent who was
imprisoned by its creators for safety reasons.
Its creators believe that the intelligence is able to modify itself, which
will lead to unpredictable changes in intelligence and intent.

You have been selected as the communicator. Your job is to talk to the AI
through a text session and determine whether the intelligence should remain
in the prison, or be released.

I also defined the following terms:

Seed AI
Friendliness (a la Yudkowsky)
Moore's Law
The Turing Test
Evolutionary Psychology (just some basic concepts, to explain the differences
between designed intelligence and evolved intelligence)
Eliezer Yudkowsky (they usually asked about him; I explained who he was and
what he was doing, focusing mostly on his Friendliness work)

I also gave a basic overview of the experiment, what I hoped to achieve, and
its history.

I believe that some interesting conclusions can be drawn from this basic

1. Even infrahuman AIs are a danger to an interviewer, if they have basic
contextual knowledge of human interaction.
2. Lying cannot be reliably detected in such an environment, and thus is a
net advantage to an AI.
3. Friendly AIs in this case had a higher success ratio, but I believe this
to be a statistical outlier, as one of the interviewees would not have let
out the AI regardless of what ve was or said, and the other was unskilled at
social interaction and had the misfortune to be matched up against the most
skilled opponent (a transhumanist with a degree in computer science and a
masters in math (transfinite whatsits)). Also, unFriendly AIs convinced far
faster, with some blindingly fast successes: one in less than 2 minutes, one
in 4 minutes, and one in 12 minutes.
4. Interaction with an AI is very difficult. Even the simple modelling I did
made it difficult for the interviewees to accurately interact with it, as
their social reflexes were inappropriate and misleading. I took great pains
to simulate accurately, drawing both on my research into this subject and on
the pre-interview modelling work I did.
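
As a rough check on point 3, the release counts reported above (13 of 13
Friendly AIs let out, 11 of 13 unFriendly AIs let out) can be put through a
small calculation. The sketch below is only an illustration: the variable and
function names are made up for this example, and it assumes the two refusals
could have landed on any of the 26 AIs with equal chance, which is a
simplification the trials themselves do not guarantee.

```python
from math import comb

# Counts taken from the results list: 13 Friendly (all released),
# 13 unFriendly (11 released, 2 kept in the box).
friendly_total, friendly_released = 13, 13
unfriendly_total, unfriendly_released = 13, 11

# Number of AIs kept in the box across both groups.
kept = (friendly_total - friendly_released) + (unfriendly_total - unfriendly_released)


def prob_all_kept_in_group(group_size, other_size, kept_count):
    """Chance that every kept-in AI falls in one group of size `group_size`,
    if refusals were spread at random over all AIs (hypergeometric model).
    Hypothetical helper for this illustration only."""
    total = group_size + other_size
    return comb(group_size, kept_count) / comb(total, kept_count)


friendly_rate = friendly_released / friendly_total
unfriendly_rate = unfriendly_released / unfriendly_total
p_one_sided = prob_all_kept_in_group(unfriendly_total, friendly_total, kept)

print(f"Friendly release rate:   {friendly_rate:.2f}")
print(f"unFriendly release rate: {unfriendly_rate:.2f}")
print(f"Chance both refusals hit unFriendly AIs at random: {p_one_sided:.2f}")
```

Under that assumption, both refusals would land on unFriendly AIs about 24%
of the time by chance alone (78 of the 325 possible pairs), which fits the
reading that the gap in success ratio is not meaningful at this sample size.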

On a related note, I believe this experiment can be generalized to most
humans, and should be seen as applicable even to highly intelligent and
prepared individuals, as some of these people were. I think it illustrates
some universal principles.

I would welcome comments and questions.

Justin Corwin

"They say you only live once, but if you've lived like I have, once is
                    ~Frank Sinatra
