From: Ben Goertzel (email@example.com)
Date: Tue Feb 22 2005 - 13:57:40 MST
This is just a half-baked thought, but I wonder if there's some workable
version of it...
Maybe one could build in a "grandfather clause" stating that an AI can
optionally violate the safety rule IF it can prove that, when given the same
data about the world and a very long time to study it, its seed AI ancestor
would have decided to violate the safety rule.
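The grandfather clause could be sketched as a simple decision check. Everything here is a toy illustration, not anything from the thread: `SeedModel` stands in for the (presumably very expensive) simulation of the seed ancestor studying the same world data, and the situation/decision strings are hypothetical.

```python
class SeedModel:
    """Stand-in for a slow, exhaustive simulation of the seed AI ancestor."""

    def __init__(self, rules):
        # Maps a situation (the "same data about the world") to the
        # decision the seed would eventually reach after long study.
        self.rules = rules

    def decide(self, situation):
        # By default the seed obeys the safety rule.
        return self.rules.get(situation, "obey_safety_rule")


def grandfather_clause_permits(proposed_violation, situation, seed):
    """Permit a safety-rule violation only if the simulated seed ancestor,
    given the same situation, would have chosen the same violation."""
    return seed.decide(situation) == proposed_violation


seed = SeedModel({"looming_catastrophe": "violate_safety_rule"})
print(grandfather_clause_permits("violate_safety_rule", "looming_catastrophe", seed))  # True
print(grandfather_clause_permits("violate_safety_rule", "ordinary_day", seed))         # False
```

The sketch also makes the stated limitation visible: the clause can only ever permit violations the seed was smart enough to endorse, which is exactly the gap noted below.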
This is by no means a full solution, though, because there will be serious
dangers that an advanced AI sees, but that the seed AI would have been too stupid
to recognize.
From: Ben Goertzel [mailto:firstname.lastname@example.org]
Sent: Tuesday, February 22, 2005 3:53 PM
Subject: RE: ITSSIM (was Some new ideas on Friendly AI)
Your last paragraph indicates an obvious philosophical (not logical)
weakness of the ITSSIM approach as presented.
It is oriented toward protecting against danger from the AI itself, rather
than other dangers. Thus, suppose
-- there's a threat that has a 90% chance of destroying ALL OF THE
UNIVERSE except for the AI itself, but will almost certainly leave the AI
intact
-- the AI could avert this attack but in doing so it would make itself
slightly less safe (slightly less likely to obey the ITSSIM safety rule)
Then following the ITSSIM rule, the AI will let the rest of the world get
destroyed, because there is no action that it can take without decreasing
its amount of safety.
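The scenario can be put in toy numbers. All values below are illustrative, and `itssim_allows` is just the rule as stated in this message: act only if acting does not decrease the AI's own safety (its probability of continuing to obey the safety rule).

```python
def itssim_allows(action_safety, inaction_safety):
    """The rule as described above: an action is permitted only if it
    leaves the AI at least as safe (rule-obedient) as doing nothing.
    Note that outcomes for the rest of the universe never enter the test."""
    return action_safety >= inaction_safety


# Illustrative numbers only.
inaction = {"ai_safety": 0.95, "p_universe_destroyed": 0.90}
avert    = {"ai_safety": 0.94, "p_universe_destroyed": 0.00}

# Averting the threat makes the AI slightly less safe, so ITSSIM
# rejects it, even though inaction leaves a 90% chance the rest of
# the universe is destroyed.
print(itssim_allows(avert["ai_safety"], inaction["ai_safety"]))  # False
```

Because the comparison is only over the AI's own safety, no external catastrophe, however large, can tip the decision.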
Unfortunately, I can't think of any clean way to get around this
problem -- yet. Can you?
From: email@example.com [mailto:firstname.lastname@example.org]On Behalf Of David
Sent: Tuesday, February 22, 2005 1:12 AM
Subject: Re: ITSSIM (was Some new ideas on Friendly AI)
I understand how ITSSIM is designed to "optimize for S", and also how
it might work in practice with one of the many possible qualitative
definitions of "Safety": the concept that if we [humans] desire that
our mind-offspring respect our future "growth, joy and choice", the next N+1
incrementally improved generation should want the same for themselves and
their own offspring.
In such a system, supergoals (like, e.g., CV) and their subgoals,
interacting with their environments, generate A (possible actions), to which
R (safety rule) is applied.
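That pipeline, as described, amounts to a generate-then-filter loop. A minimal sketch, with all goal names, proposals, and the particular predicate used for R invented purely for illustration:

```python
class Goal:
    """A supergoal (e.g. CV) or subgoal that proposes actions."""

    def __init__(self, name, proposal):
        self.name = name
        self.proposal = proposal

    def propose(self, environment):
        # Goals interacting with their environment generate candidate actions.
        return (self.name, self.proposal, environment)


def generate_actions(goals, environment):
    """A: the set of possible actions generated by the goal system."""
    return [g.propose(environment) for g in goals]


def safety_rule(action):
    """R: a placeholder predicate; here it simply rejects any proposal
    tagged 'risky'. A real R would be the quantified safety test."""
    _, proposal, _ = action
    return proposal != "risky"


goals = [Goal("CV", "helpful"), Goal("subgoal-1", "risky")]
A = generate_actions(goals, "world-state")
permitted = [a for a in A if safety_rule(a)]
print(permitted)  # only the CV proposal survives the filter
```

The open question raised next, how S and SG interact over time, is precisely what this static filter cannot capture.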
I'm very curious to learn how S and SG might interact -- might one
eventually dominate the other, or might they become co-attractors?
Of course, we're still stuck with quantifying this and other definitions
for "Safety", including acceptable margins.
NB: I believe we cannot create an S or an SG that is provably
invariant, but both should be cleverly designed with the highest
probability of being invariant in the largest possible |U| we can muster
computationally (to our best knowledge for the longest possible
extrapolation, which may, arguably, still be too puny to be comfortably
"safe" or "friendly").
Perhaps the matrix of SB, SE, SN and SGB, SGE, SGN should duke it out
in simulation. Although, at some point, we will simply need to choose our S
and our SG and take our chances, taking into account the probability that
Big Red, True Blue, et al., may not have terribly conservative values for S
or SG slowing their progress. :-(
This archive was generated by hypermail 2.1.5 : Tue May 21 2013 - 04:00:47 MDT