Date: Thu Mar 13 2008 - 15:54:42 MDT
> From: Rolf Nelson
> Here's some generic unsolicited advice for friendliness proposals.
> 1. It's not sufficient to have the correct solution, it has to be compelling to other people or it will never get implemented.
Well, my experience on SL4 certainly proves that -- so let me try to communicate why my solution is compelling.
My APPROACH is compelling because it is simple, fairly easily explained, robust and LIKELY TO LEAD TO A CORRECT SOLUTION EVEN IF MY CURRENT VERSION OF THE SOLUTION IS WRONG.
The results that *I* am seeing are compelling to *me* because I suddenly have an awesome new Ethics tool that correctly does things that I've never seen done correctly before.
The results that you all are seeing should be compelling because there's this person who suddenly goes berserk and starts yelling "Eureka, I've solved it! All I have to do is DECLARE that I'm Friendly." :-)
The solution is compelling because . . . . well . . . . it *would* be compellingly powerful if y'all believed it to be correct. Sigh. Except that I'm not adequately communicating it so that it looks correct.
The good news (for me) is that this realization suggests another approach that might be compelling --> describing the approach that IS compelling (literally) rather than the solution which is apparently not.
= = = = = = = = = =
In order to simplify the task, I started out by *assuming* that Friendliness is not only possible but actually *reasonably* easy (heresy on this list and part of why I'm having such a tough time getting my message across).
<CRITICAL CLARIFICATION: This assumption is *just a tool* to simplify the approach. I do not believe that there is *any* basis to assert the truth of this assumption and any solutions derived DO NOT rely on the assumption.>
Assuming that Friendliness IS reasonably easy places some *very* specific constraints on the state space of any possible solution. These constraints then make Friendliness easier to solve -- IF there is still a solution in the constrained space.
In particular, since everyone seems to believe that Friendliness is (virtually if not totally) impossible to stabilize, "easiness" seems to require that Friendliness *MUST* be self-stabilizing -- so the approach is entirely focused on that.
<REPEAT: DERIVED ASSUMPTION: Friendliness *MUST* be self-stabilizing>
Next, since Friendliness is at least as rich and complex as the sum of (as Thomas McCabe puts it) "the ten bazillion different things humans value", any self-stabilizing structure must be capable of at least that much complexity to be a solution.
The complexity issue led me to focus on attractors since they can be infinitely complex yet still constrained -- a perfect analogy for Friendliness!
Asking the question "What would be attractive to an AGI (or any other intelligent entity)?" yields the answers "Their own self-interest!" and "Fulfilling their goals!"
Asking the question "What would be most repellent to an AGI (or any other intelligent entity)?" yields the answer "Having their goals interfered with!"
Now we're at the point where I can argue that if we have a set of entities that can fulfill both the personal goal of self-interest AND the "other guy" goal of not interfering with the goals of others, then we have a stable Friendly system.
So how do we collapse the two frequently conflicting goals into one uniform non-conflicting goal?
How about "Don't interfere with the goals of others unless not interfering basically prevents you from fulfilling your goals (explicitly not including low-probability freak events, for you pedants out there)."
That's a pretty close approximation and has the really cool, awesome trait of having all of the basic precepts and conclusions of ethics (according to me) just naturally fall out of the natural implications and effects of everyone having that goal as a primary goal.
Or, in other words, pretty much PROVING (in the loose sense of the word for you pedants) that ETHICS IS SIMPLY ENLIGHTENED SELF-INTEREST BECAUSE THEY BOTH FALL OUT OF THE SAME PRIMARY GOAL STATEMENT.
And THAT, I believe, is *really* exciting and compelling, and thus the slogan "Friendliness: The Ice-9 of Ethics and the Ultimate in Self-Interest".
Now, if you can/do believe the slogan, then you're an idiot for not making a Declaration of Friendliness and attempting to create and join a stable Friendly society/group because doing so is "the Ultimate in Self-Interest".
(Note: There is absolutely no requirement that everyone participate for Friendliness to be in your self-interest -- merely that you have a group of participating entities. The larger the group, the stronger the effect -- which is why the secondary goal of Friendliness is to spread -- but it works just fine even if not everyone plays).
My Declaration of Friendliness was just such an attempt. It took on the primary overriding Friendliness goal (and the secondary goal of spreading Friendliness), added some protections against being taken advantage of by UnFriendlies and Friendly Mimics, and finished by adding the statements necessary to make it a complete closed system/solution that protected both my self-interest and that of others.
My *initial* claims are that my Declaration of Friendliness:
a) is in my self-interest AND
b) does not allow me to commit horrible and unethical acts without breaking the declaration.
My follow-on claim is that IF you can make an AGI that can and does understand (because it is true) that making a Declaration of Friendliness and following through on it is in its own self-interest, then you will have an ETHICAL machine that will only stomp on your goals (or existence)
a) when it is the ethically correct thing to do OR
b) out of IGNORANCE or ERROR (which is an intelligence problem, not a Friendliness problem).
And my final claim is that the above-described AGI is AT LEAST a Friendliness-satisficing AGI (if it isn't actually the most Friendly AGI possible -- which I believe that it is).
= = = = = = = = = =
Mark R. Waser
Vision/Slogan -- Friendliness: The Ice-9 of Ethics and Ultimate in Self-Interest