What if there's an expiration date? (was Re: Arbitrarily decide who benefits)

From: Tim Freeman (tim@fungible.com)
Date: Mon Apr 28 2008 - 11:37:16 MDT

From: "Samantha Atkins" <sjatkins@gmail.com>
>Why would we believe we have the ability to determine the "final ethical
>system" of entities many orders of magnitude smarter than us not to mention
>able to self-improve indefinitely? We have some leverage in the initial
>goal system.

Well, the standard argument for this is that rational entities act to
preserve their goal system. Otherwise they'd be doing something in
the future that's different from what they presently want done then.
So the initial goal system is the final goal system.

Has anyone tried to formalize this? It doesn't make sense when I look
at it closely. For example, my scheme [1] has an expiration date. It
takes action to optimize its utility function at a specific time
that's determined at the beginning, and when it's done, it's done. It
might then have set up some mechanism to be run again with a different
expiration date, or the humans might decide to start it again, or it
or the humans might run something else. Any of these possibilities,
if they happen, would be a natural consequence of the AI having put
the world into the state it thinks best at the expiration date, not a
continuation of the planning process it was doing before the
expiration date.
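
To make the expiration-date idea concrete, here is a minimal sketch
under loudly stated assumptions. It is not the scheme from [1]; the
toy world, the names plan_until_expiration, step and utility, and the
brute-force search are all hypothetical and chosen only for
illustration. The planner scores only the state reached at the
expiration step, returns a plan, and then has nothing further to say.

```python
from itertools import product

def plan_until_expiration(state, actions, step, utility, horizon):
    """Brute-force search for the action sequence of length `horizon`
    whose FINAL state has the highest utility. Intermediate states are
    not scored; only the state at the expiration step counts."""
    best_plan, best_value = None, float("-inf")
    for plan in product(actions, repeat=horizon):
        s = state
        for a in plan:
            s = step(s, a)
        value = utility(s)
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan, best_value

# Hypothetical toy world: the state is an integer, each action adds to
# it, and the utility at the expiration date prefers states near 3.
def step(s, a):
    return s + a

def utility(s):
    return -abs(s - 3)

plan, value = plan_until_expiration(0, (-1, 0, 1), step, utility, horizon=4)
```

Once the plan has been carried out the planner is finished. Whatever
runs afterwards, if anything, is part of the world state the plan
produced rather than a continuation of the search.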

If there's no expiration date, I'm concerned about indefinitely
deferred gratification, where the AI keeps putting off giving people
what they want so that it can give them more of what they want later,
in the case where it thinks it can invest the utility points better
than the humans can.
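
The deferral worry can be illustrated with a toy calculation. This is
only a sketch under assumptions that are mine, not anything from [1]:
the AI holds some amount of "utility points" that grow by a fixed
factor per step while invested, and the names payout_value and
best_payout_time are hypothetical.

```python
def payout_value(wealth, growth, t):
    """Value handed to the humans if the AI invests for t steps and
    then pays out. Assumes a fixed per-step growth factor."""
    return wealth * growth ** t

def best_payout_time(wealth, growth, horizon):
    """Best payout step when payout must happen by `horizon`."""
    return max(range(horizon + 1),
               key=lambda t: payout_value(wealth, growth, t))

# With growth > 1 the best payout time is always the horizon itself,
# however far away the horizon is placed.
best_times = [best_payout_time(1.0, 1.05, h) for h in (1, 10, 100)]
```

Under these assumptions waiting one more step always beats paying out
now, so with a horizon the payout lands on the expiration date, and
without one there is no best payout time at all: every candidate time
is beaten by the next one.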

Tim Freeman               http://www.fungible.com           tim@fungible.com
[1] http://www.fungible.com/respect/paper.html#time-horizons
