From: Stuart Armstrong (email@example.com)
Date: Mon May 05 2008 - 09:25:38 MDT
Most posts on this ML tacitly assume a blindingly fast singularity:
once we have self-improving AI, we need to sort out Friendliness,
incorporate it into the AI, and then just step back and watch the
Also quite understandable is a slow singularity, which is what we get
if there turn out to be diminishing returns, or unexpected barriers to
intelligence improvement. There we can afford to have
not-entirely-friendly AI's, as long as the whole system is set up to
increase friendliness as
Much more worrying is the threat of a slow-fast singularity: a
situation where there exist higher-than-human-level AI's, improving
only slowly - but with the constant risk of a sudden
singularity-explosion of intelligence.
This situation is particularly dangerous, because the AI's, whether
owned by governments, companies or themselves, would be in constant
competition, with those most successful at accumulating power also
more likely to be able to increase their intelligence. A fully
friendly AI, being constrained in the means it can use, would
presumably be less successful in this competition, meaning that those
most likely to singularise would not be fully friendly. How could we
prepare to deal with this situation?
The corporate model
We already have a good example of organisms dedicated to competitive
conflicts, while (generally) sticking within a certain legal
framework: corporations. So what about extending this model to AI's?
This would mean confining AI competition to certain narrow channels,
but tolerating anything within them.
There are weaknesses, unfortunately, to this model. First, there is the
whole problem of legislating friendliness: if we can barely define it
now, what hope is there to get something decent through parliament?
Also, there would be a definite risk of AI designers following the
letter of the law rather than the spirit, so the system would need
constant legal updating to keep up with ways of gaming it.
This would require a nimbleness never before seen in government.
Secondly, there is the perennial issue of corporate influence on
politics: AI-affiliated companies would be rich and therefore
influential, and would distort legislation in their favour.
Then there is enforcement - if the profits to be made are huge, the
legal consequences mild, or the enforcement intermittent, then AI's
will drift from whatever the legal requirements are. Once the drift
has become widespread, then the pressure will be on the law to change,
rather than all the AI's.
Lastly, if there is a chance for a singularity, one group may push for
it, with the AI's ethical system written for their benefit. The risk
may be high, but the rewards would be immeasurable, and after a
singularity, the law would be irrelevant.
The government model
Another alternative might be a purely government model. A university
or institute would design a suitable moral system, implant it in AI's,
and the government would then lease these AI's to private entities,
forbidding any modification to the ethical code. Many other
functionally equivalent systems are conceivable.
This system has the cost of dramatically slowing AI development and
imposing a particular, government view of friendliness (not huge
costs, seeing the risks). But it has other weaknesses as well. The
difficulty of legislating friendliness remains, though it is less of a
problem than in the corporate model, as the issue is essentially
delegated to experts. There is still the problem of AI-affiliated
corporate influence on politics, though, and more of a vulnerability
to public pressure, which might push the project in detrimental
directions.
But there is more than one government in the world. We might get
ideal circumstances - democratic China and Russia and a reinforced UN
- or we might not (those circumstances are worth pushing for, for FAI
development in general). But even if they are ideal, there will still be
competition between governments, with the benefits going to those who
least regulate AI's. There will still be AI's used in law enforcement,
in the military and by political parties, with the risk that they will
go off to a singularity with a narrow, warped view of friendliness.
The centralised model
This is not a model, just a personal idea as to how to deal with the
problem. It assumes a reinforced UN (though not necessarily a
democratic China and Russia), and progress made on the issue of
Basically, the idea is to set up a single friendly AI with authority
over all others, an AI controller (AIC). That authority can only be
used to delete an AI entirely, and to suggest that an AI be reinstated
after deletion with certain modifications. Each AI would have a
central collection of commands, such as:
1) You must accept deletion if the AIC imposes it
2) You must not interfere with the physical set-up of the AIC
3) You must not interfere with the AIC's data collection operation,
and must give any data it asks for
4) If you have increased your intelligence by a certain (specified)
level, you must check with the AIC before you perform any task
5) If your actions involve the destruction of, or irreversible damage
to, humans, then you must check with the AIC beforehand, or get a
specific, tightly worded permission
6) If the AIC asks you to analyse the behaviour of another AI, you must do so
The 6th point is there because the AIC will probably be of limited
intelligence compared with the AI's it is supposed to regulate, so
will need to use some 'help' from other, smarter AI's.
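To make the command set a little more concrete, here is a rough Python
sketch of commands 1, 4 and 5 only. Everything in it is an assumption made
for illustration: the class names AIC and RegulatedAI, the harms_humans
flag, the growth_threshold value and the placeholder review policy are not
part of the proposal. Commands 2, 3 and 6, and the "tightly worded
permission" route of command 5, are not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Verdict(Enum):
    ALLOW = auto()
    DENY = auto()


@dataclass
class AIC:
    """Hypothetical AI controller: it only answers checks and deletes AIs."""
    deleted: set = field(default_factory=set)

    def review(self, ai_name: str, task: str, harms_humans: bool) -> Verdict:
        # Placeholder policy (an assumption): refuse anything flagged as harmful.
        return Verdict.DENY if harms_humans else Verdict.ALLOW

    def delete(self, ai: "RegulatedAI") -> None:
        ai.accept_deletion()  # command 1: the AI must accept deletion
        self.deleted.add(ai.name)


@dataclass
class RegulatedAI:
    name: str
    aic: AIC
    baseline_intelligence: float = 1.0
    intelligence: float = 1.0
    growth_threshold: float = 2.0  # made-up stand-in for the "specified level"
    alive: bool = True

    def accept_deletion(self) -> None:
        self.alive = False

    def must_check(self, harms_humans: bool) -> bool:
        # Command 4: intelligence has grown past the specified level.
        grown = self.intelligence >= self.baseline_intelligence * self.growth_threshold
        # Command 5: the action would destroy or irreversibly damage humans.
        return grown or harms_humans

    def perform(self, task: str, harms_humans: bool = False) -> bool:
        """Return True if the task may go ahead under this sketch."""
        if not self.alive:
            return False
        if self.must_check(harms_humans):
            return self.aic.review(self.name, task, harms_humans) is Verdict.ALLOW
        return True


if __name__ == "__main__":
    aic = AIC()
    trader = RegulatedAI("trader", aic)
    print(trader.perform("rebalance portfolio"))
    print(trader.perform("demolish building", harms_humans=True))
    aic.delete(trader)
    print(trader.perform("rebalance portfolio"))
```

The point of the sketch is only that the regulated AI's side of the
bargain is small: a deletion hook and one gate in front of every action.
All of the hard judgement sits inside review(), which is exactly the part
the AIC design would have to get right.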
I'm sure there are many holes in this particular proposal (particularly
point 5). But it is the sort of simple instruction set that could be
maintained through increases of intelligence, more easily legislated
for and enforced, immune to most political pressures, and not
interfering with normal competitive businesses and governments. The
main problem would be to design the AIC, get it politically accepted,
and ensure it is not constantly being tinkered with.
But that problem feels much more tractable. And it makes planning for
a slow-fast singularity similar to planning for a slow singularity and
a fast one.
Are there any thoughts on slow-fast singularities? Is my approach
ridiculous, or helpful? Most importantly, are there other ways of
dealing with slow-fast singularities (or with slow singularities,
which are initially indistinguishable from them)?