[sl4] AIs behaving badly (subtitle: There's more to me than utility - why there's society, and possibility too)

From: Stuart Armstrong (dragondreaming@googlemail.com)
Date: Sun Dec 07 2008 - 03:55:40 MST


Dear Tim,

Sorry for the delay; now, a slightly more detailed critique of your approach.
The first problem is that you try to define your terms (compassion,
respect, short-term, long-term) in terms that an AI would understand
(a utility function), but you still think about them in ways that are
familiar to you (see http://en.wikipedia.org/wiki/Intuition_pump).

Use David Hilbert's axiomatic approach here - rewrite your paper, but
instead of "compassion" write "glasses of beer"; instead of "respect",
write "hate"; replace "short term" and "long term" with "thin
buildings" and "fat buildings" respectively. If your terms are
properly defined (and they need to be, to construct an AI), then your
paper should still be just as convincing as before. If not, then you
are allowing your intuitions to override your understanding.

But the main problem is that you are aware that problems may emerge
from your setup, and you then go around "patching" them - fixing the
way the AI updates its model of your utility, and so on. The trouble
is that you are trying to imagine, in advance, everything that could
go wrong - and that is something we are spectacularly poor at. It is
basically a work of literature, or of imagination: what do I feel
could go wrong here? Is there something obvious? Is there something
I've read that might be relevant? What are the dystopias I've seen
recently?

But the world can fail in more ways than we can possibly imagine. To
make your approach work, you have to do the reverse: define clearly
what the correct outcome is (it can be a class of outcomes, and each
member need not be defined individually, but the class must be),
prove that any outcome in this class will be a "good" outcome, and
then demonstrate that your approach will lead into this class. Not
patching what could go wrong, but showing that the situation will
inevitably come out right. And to do that, you need mathematical
models in which the intelligent agent, the utility function,
compassion, respect, long term and short term are all mathematically
defined, plus experiments or proofs to demonstrate what class of
behaviours the system will end up in.
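
To make that concrete, here is a toy sketch in Python (the world, the
actions, the utility function and the "good" class are all invented
for illustration - this is the shape of the check, not your model):

    # Toy version of "define the good class, then check where the
    # agent's behaviour ends up".
    from itertools import product

    ACTIONS = ["help", "ignore", "coerce"]      # invented action set

    def utility(history):                       # invented utility function
        return sum({"help": 2, "ignore": 0, "coerce": 3}[a] for a in history)

    def good(history):                          # the declared outcome class
        return "coerce" not in history          # e.g. "no coercion, ever"

    # A bare utility maximiser choosing a two-step plan...
    best_plan = max(product(ACTIONS, repeat=2), key=utility)

    # ...and the experiment: does its behaviour land inside the good class?
    print(best_plan, "in good class:", good(best_plan))
    # -> ('coerce', 'coerce') in good class: False

In a real version, defining "good" and proving containment are the
hard part, of course; the point is that the containment is what gets
demonstrated, rather than individual failures getting patched.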

(Note: it might seem hypocritical that I, as a firm advocate of the
messy patching process (http://www.neweuropeancentury.org/GodAI.pdf),
would criticise your method. But there are two small but crucial
differences: I do not require that every error be caught before the
AI is started up, and my class of good outcomes is defined - it is
the class of future events that, when described to us in full and
exhaustive detail, would seem good to us now.)

In the meantime, here is another problem to patch (and it explains the subtitle):
For most people, our short-term utilities have much greater salience
and meaning to us than our long-term utilities. This does not actually
mean that we are short-term, self-centred bastards; it's just that we
live in a world and a society that are designed to ensure that most of
our short-term desires fail. If I know I could never sleep with the
boss, it is safer to dream and fantasise and want that. If it could
really happen... then I rein in my desires, and start really looking
at the whole situation.

Therefore it is safe for us to have these short-term, unlikely
desires, and to spend so much time on them. Our long-term desires, on
the other hand, tend to occupy just a small fragment of our daily
thoughts. Fortunately, a lot of our long-term desires do not need much
more than this: a career choice, a few donations to charities or
political movements, the choice of friends to hang out with, an
occasional purchase, maybe the theme for a speech (if we are into
giving speeches). Even for the best of us, long-term desires simply
set up the framework for our short-term desires. (I.e. a speech-writer
for the Singularity Institute and a speech-writer for the Nazi party
will be following similar procedures most days: worrying about
phraseology, considering emotional impact, considering the audience,
wondering how best to get the point across. There will be some
differences, mainly in rationality, but not enough to say, just from
most everyday thoughts, that the first is part of a movement that
might save humanity and the other is evil beyond words.) The daily
lives of soldiers in similar conditions are virtually
indistinguishable, whatever side they are on.

So now the AI comes along and looks at these utilities. It will
overestimate short-term utility, because that is what we do. And it
will try to grant us our short-term desires, unaware that it is
destroying the balance between long-term and short-term utilities. So
it may turn us into spoiled brats; more importantly, if asked "if this
human were put in a room with a button labelled 'Press this button and
get an ice cream, but ten people will be killed in ten years' time',
would he press it?", the AI will make the wrong judgement. And things
will start to go horribly wrong from there on.
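
Here is a toy sketch of that inference failure (all the numbers and
categories are invented, purely for illustration):

    # An AI inferring a utility function from raw behaviour frequencies.
    # Long-term desires get almost no daily "airtime", so they come out
    # looking unimportant - even though, on reflection, we would weight
    # them far more heavily.
    observed_acts = {                   # rough daily behaviour counts (invented)
        "grab an ice cream": 40,
        "daydream about the boss": 25,
        "donate to charity": 1,
        "work on long-term plans": 2,
    }
    total = sum(observed_acts.values())
    inferred = {k: v / total for k, v in observed_acts.items()}

    endorsed = {                        # weights we'd endorse on reflection (invented)
        "grab an ice cream": 0.05,
        "daydream about the boss": 0.05,
        "donate to charity": 0.40,
        "work on long-term plans": 0.50,
    }

    for k in observed_acts:
        print(f"{k:26s} inferred {inferred[k]:.2f}   endorsed {endorsed[k]:.2f}")

An AI acting on the inferred weights will happily trade a large
long-term cost for a small short-term gain - which is exactly the
wrong call on the button question above.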

Summary: our short-term desires are held in check by what the world
actually allows, so we can afford to overemphasise them. Our job can
tell more about us than our everyday utility function does. An AI
constructing a utility function from our observed behaviour would
overemphasise the short term even more, and completely discount the
importance of our job (or our "position in society"). It would then
make the wrong decisions. And, with our short-term desires granted, we
may change into beings we wouldn't want to become - because the AI
will not manage the transition skilfully; that is not its role, nor
does it understand the transition in the way we do.

Anyway, I'm not advocating that you patch this problem as well (I'm
sure that I could come up with other holes, and Eliezer could, and
even after we run out of ideas, you can be sure there are still holes
we haven't caught), but that you rethink the approach.

Sorry for the over-long email,

Stuart


