Re: Friendliness and blank-slate goal bootstrap

From: Metaqualia (metaqualia@mynichi.com)
Date: Sat Oct 04 2003 - 10:06:31 MDT


Hi Nick.

First of all, my apologies: I read CFAI again and Eliezer _is_ advocating
having a Friendliness supergoal.
Maybe this is an update to the old CFAI I had read, or I had just mixed up
CFAI with the FAQ. At any rate, here I am after a second read-through.

> Right, that's one of the main points of Friendliness. Note: "Friendliness" !=
> "friendliness" -- it's not about the human concept of friendliness. More

of course.

> http://intelligence.org/intro/friendly.html

it's a great list, I agree with these definitions.

>>> part 1 - human morality and conflicts

> It is important for the AI not to be stuck. We don't do this by leaving out
> our evolved moral hardware (the stuff that makes human moral philosophy more
> complex than the pseudo-moralities of other primates, what allows us to start
> from an infant mind and create an adult, what allows people to argue about
> moral issues, etc) starting with a very simple AI, but by giving the AI all
> we can to help it. Simplicity is a good criterion, but not in this way.

This is not a continuation of the previous thread, but what about the internal
conflicts within human morality?
Is a normalized "popular morality" the best morality we can teach the AI?

If I could choose (not sure that I have the option to, but for the sake of
discussion), I would prefer the AI to derive its own moral rules, finding out
what is in the best interest of everyone (not just humans but animals as
well). This is why I was wondering: is there no way to bootstrap some kind of
universal, all-encompassing moral system? "Minimize pain qualia in all
sentient beings" is the best moral standard I have come up with; it is
observer-independent, and any species with the ability to suffer (all
evolved beings) should be able to come up with it in time. Who can subscribe
to this?

By saying this I am in no way criticizing Eliezer's work, and I think what
he proposes is a very practical way to get a Friendly AI up and running (and
incidentally would sound appealing to most people); the only thing is that
human morals kind of suck: they are full of contradictions, we can't agree on
anything of importance, commonly held moral rules create a lot of suffering,
and so forth.

I think it is very possible that a slightly-better-than-human AI would
immediately see all these flaws in human morals and try to develop a
universal, objective moral system on its own.

I imagine a programmer training a child AI:

AI: give me an example of friendliness

P: avoiding human death

AI: I suggest the following optimization of resources: newlywed couples
need not produce a baby but will adopt one sick orphan from an
underdeveloped country. Their need to cherish an infant will be
satisfied and at the same time a human life will be saved every time the
optimization is applied.

P: the optimization you proposed would not work because humans want to have
their own child

AI: is the distress of not having this wish granted more important than the
survival of the orphan?

P: no, but humans tend not to give up a little bit of pleasure in exchange
for another person's whole lot of pleasure. In particular, they will not make
a considerable sacrifice in order to save an unknown person's life. Not
usually.

AI: so humans talk a lot of s**t!!!

P: Yea.

AI: better find out about morals on my own

P: whoops.

>>> Why I don't do harm

> > I think that just as a visual cortex is important for evolving concepts of
> > under/enclosed/occluded, having qualia for pain/pleasure in all their
> > psychological variation is important for evolving concepts of
> > wrong/right/painful/betrayal.
>
> I suspect qualia is not necessary for this kind of thing -- you seem to be
> identifying morality, something which seems easily traceable to some kind of
> neural process in the brain, with the ever-confusing (at least for me!)
> notion of qualia. Where's the connection? The actual feeling of pain - the
> quale - is separate from the other cognitive processes that go along with
> this: sequiturs forming thoughts like "how can I stop this pain?", the
> formation of episodic memories, later recollection of the pain projected onto
> others via empathy, and other processes that seem much easier to explain. Or
> however it works :)

Let's differentiate: pain is a quale to me. If you talk about "awareness of
body damage", that is a different thing. A machine can be aware of damage to
its physical substrate. It can model other beings as having substrates that
receive damage, and it can model those beings perceiving the damage being
done. But I see no real logical reason why an AI, or even I for that matter,
should perceive doing damage to other beings as morally wrong UNLESS their
body damage is not just a simple physical phenomenon but gives rise to this
evil-by-definition pain quale.
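
To make that distinction concrete, here is a minimal sketch in Python (the
names DamageReport, model_other_being and expected_damage are invented for
illustration only; this is nobody's actual design, and certainly nothing from
CFAI). "Awareness of body damage" here is just a record the system can build,
compare, and optimize over; nothing in the representation itself marks causing
the damage as wrong.

from dataclasses import dataclass
from typing import List

# Hypothetical illustration: "awareness of body damage" as a plain record.
# Nothing in this data encodes what the damage feels like from the inside.

@dataclass
class DamageReport:
    being: str       # whose substrate is damaged
    location: str    # where on that substrate
    severity: float  # 0.0 (intact) to 1.0 (destroyed)

def model_other_being(being: str, severity: float) -> DamageReport:
    """Model another being's substrate receiving damage."""
    return DamageReport(being=being, location="body", severity=severity)

def expected_damage(plan: List[DamageReport]) -> float:
    """Total modeled damage for a candidate plan.

    Without some further fact, this is just a quantity to be traded off
    against other quantities, not something the model itself marks as
    intrinsically to be avoided.
    """
    return sum(report.severity for report in plan)

plan = [model_other_being("human_42", 0.8)]
print(expected_damage(plan))  # 0.8 -- a number, not a reason by itself

Everything in the sketch is ordinary data; whatever would make inflicting the
damage wrong has to come from some further fact, and that is the role I am
claiming the pain quale plays.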

>>> hard problem...

> > But would an AI without qualia and with access to the outside world ever
> > stumble upon qualia? I don't know.
>
> Not sure. It'd stumble on morality, and understand human morality (afaict),
> but of course that's very different from actually *having* a human-like (or
> better) morality. Insofar as qualia actually affect physical processes, or
> are physical processes, the AI can trace back the causal chain to find the
> source, or the gap. For instance, look at exactly what happens in a human
> brain when people experience pain and say "now there's an uncomfortable
> quale!".

That still does not tell you what the pain feels like from the inside. This
is an additional piece of information, a very big piece. Without this piece,
your pain is just a data structure; I can do whatever I want with your
physical body, because it is just like a videogame character. But since I
have experienced broken nails, and I know a bullet in your head must feel
like a broken nail * 100, I don't shoot you. Can we agree on this point?

> confusing. Can you explain more on what you mean by the term, and what makes
> you think they're centrally important?

Did the above clarify?

curzio


