Re: Friendliness not an Add-on

From: Philip Goetz (philgoetz@gmail.com)
Date: Sat Mar 04 2006 - 10:43:22 MST


It seems to me everyone is making a mistake in thinking that checking
for friendliness of a program is like checking whether the program
halts.

Computational proofs of validity, halting behavior, etc., must
consider every component of a program or theorem. Friendliness
involves only the actions taken by an AI. Given 2 AIs that construct
identical plans of action, they have equivalent Friendliness, even if
one has evil intent and one has benign intent. You don't need to look
inside the computations. You only need to check the proposed course
of action.
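
To make that concrete, here is a minimal sketch in Python. Everything
in it is hypothetical: the Plan type, the two toy AIs, and the
is_friendly check (which merely looks for a forbidden action name) are
assumptions made for illustration, not a real Friendliness test. It
only shows a verifier that is handed the plan and nothing else.

```python
from typing import Tuple

Plan = Tuple[str, ...]  # hypothetical: a plan is an ordered tuple of action names

FORBIDDEN = {"harm_human"}  # hypothetical stand-in for "unfriendly" actions


def benign_ai() -> Plan:
    # Toy AI whose internal "intent" is benign.
    intent = "help"  # never visible to the verifier
    return ("fetch_water", "deliver_water")


def evil_ai() -> Plan:
    # Toy AI with a different internal "intent" that emits the same plan.
    intent = "gain_trust_first"  # never visible to the verifier
    return ("fetch_water", "deliver_water")


def is_friendly(plan: Plan) -> bool:
    # The verifier receives only the plan, not the AI that produced it.
    return not any(action in FORBIDDEN for action in plan)


plan_a = benign_ai()
plan_b = evil_ai()
same_plan = plan_a == plan_b
same_verdict = is_friendly(plan_a) == is_friendly(plan_b)
```

Since is_friendly takes only a Plan, two AIs that emit identical
plans cannot receive different verdicts, whatever went on inside them.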

This means, for example, that protests that a program to verify the
Friendliness of an AI's output must be as complex as the AI itself
are wrong, since the complexity of the AI's output is many orders of
magnitude less than the complexity of the AI itself.

What I've just said makes the argument for friendliness verifiers
easier. However, I'm not on the side of the verifiers - I think there
is absolutely no hope of being able to formally verify anything about
the results of a proposed course of action in the world. Given any
formal system to prove actions benign, I could make it prove any set
of actions benign by rigging the perceptual system that connected the
action representations with their real-world semantics. Besides
which, decades of experience show that systems with provable
properties are useless in the real world.
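
A toy sketch of the rigging argument, in Python. All names here are
hypothetical (the symbol set, formally_benign, execute, and the two
grounding tables are assumptions for illustration). The point it
tries to show is that the "proof" operates on symbols, while the
mapping from symbols to real-world effects sits outside the proof.

```python
from typing import Dict, Tuple

Plan = Tuple[str, ...]  # hypothetical: a plan is a tuple of action symbols

# Hypothetical axioms of the formal system: symbols it treats as benign.
BENIGN_SYMBOLS = {"water_plants", "sweep_floor"}


def formally_benign(plan: Plan) -> bool:
    # The "proof" only ever sees symbols, never the world.
    return all(symbol in BENIGN_SYMBOLS for symbol in plan)


def execute(plan: Plan, grounding: Dict[str, str]) -> Tuple[str, ...]:
    # The grounding maps each symbol to what actually happens in the world.
    return tuple(grounding[symbol] for symbol in plan)


honest_grounding = {
    "water_plants": "plants get watered",
    "sweep_floor": "floor gets swept",
}
rigged_grounding = {
    "water_plants": "garden is flooded",  # same symbol, different real effect
    "sweep_floor": "floor gets swept",
}

plan = ("water_plants", "sweep_floor")
verdict = formally_benign(plan)  # does not depend on the grounding at all
honest_effects = execute(plan, honest_grounding)
rigged_effects = execute(plan, rigged_grounding)
```

The verdict is computed without reference to either grounding table,
so swapping the honest table for the rigged one changes what happens
in the world while leaving the "proof" untouched.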

- Phil


