From: Ben Goertzel (firstname.lastname@example.org)
Date: Wed Feb 23 2005 - 12:36:49 MST
> Now, let's say that AI0 and AI1 are both evaluating an improved
> system, AI'.
> If AI0 accepts AI', that implies that, to the best of its knowledge, AI'
> will not only be nice to puppies, but lead to puppy-niceness
> going forward
> in time as well (possibly including any future AI''s). Now, a subgoal of
> this should be the ITSSIM property "I will only take an action if I can
> prove that it is expected not to decrease safety.", because
> decreased safety
> (in the ITSSIM sense) should clearly lead to decreased
> puppy-niceness over
> the future course of the universe.
This latter sentence is the part of your argument I don't agree with.
It is not clear that decreased safety on the part of the AI is necessarily
going to decrease puppy-niceness.
It could well be that the AI can prove that the way for it to maximize
expected puppy-niceness is for it to take a big risk by no longer being so
methodical -- by giving up on preceding each of its actions with a formal
proof, and just devoting its time to helping puppies rather than to proving
each of its actions. In fact it will usually be the case that most goals
can be achieved better, on average, without going through so much formal
proof. The purpose of ITSSIM is to prevent such decisions. The purpose of the
fancy "emergency" modifications to ITSSIM is to allow it to make such a
decision in cases of severe emergency.
A different way to put your point, however, would be to speak not just about
averages but also about extreme values. One could say "The AI should act in
such a way as to provably increase the expected amount of puppy-niceness,
and provably not increase the odds that the probability of puppy-niceness
falls below 5%." That would be closer to what ITSSIM does: it tries to
guard against the AI taking risks in the interest of maximizing expected
puppy-niceness.
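For concreteness, here is a minimal sketch of that two-part criterion: approve an action only if it does not decrease expected puppy-niceness AND does not increase the odds of a catastrophic outcome. (The toy model, names, and numbers are all invented for illustration; this is not taken from any actual ITSSIM specification.)

```python
from dataclasses import dataclass
from statistics import mean

THRESHOLD = 0.05  # "puppy-niceness falls below 5%" counts as catastrophic

@dataclass
class Action:
    name: str
    outcomes: list[float]  # sampled puppy-niceness levels, one per scenario

def expected_niceness(a: Action) -> float:
    return mean(a.outcomes)

def p_catastrophe(a: Action) -> float:
    # fraction of scenarios in which niceness falls below the threshold
    return sum(o < THRESHOLD for o in a.outcomes) / len(a.outcomes)

def approved(candidate: Action, status_quo: Action) -> bool:
    # Criterion 1: expected niceness must not decrease.
    # Criterion 2: the odds of a catastrophic outcome must not increase.
    return (expected_niceness(candidate) >= expected_niceness(status_quo)
            and p_catastrophe(candidate) <= p_catastrophe(status_quo))

status_quo = Action("stay methodical", [0.6, 0.6, 0.6, 0.6])
gamble = Action("skip the proofs", [0.9, 0.9, 0.9, 0.01])

# The gamble wins on expected value (0.6775 vs 0.6) but is vetoed,
# because it introduces a 25% chance of catastrophe.
print(approved(gamble, status_quo))       # -> False
print(approved(status_quo, status_quo))   # -> True
```

Note that a pure expected-value maximizer would take the gamble; the second criterion is what blocks it.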
The problem is that this relies really heavily on the correct generalization
of puppy-niceness. Note that ITSSIM in itself doesn't rely on
generalization very heavily at all -- the only use of generalization is in
the measurement of "amounts of knowledge." I think that a safety mechanism
that relies heavily on the quality of generalization is, conceptually at
least, less safe than one that doesn't. Of course this conclusion might be
disproven once we have a solid theoretical understanding of this type of
generalization. I see no choice but to rely heavily on generalization in
the context of "emergency measures", though, unfortunately...
This archive was generated by hypermail 2.1.5 : Wed Jul 17 2013 - 04:00:50 MDT