From: Stuart Armstrong (firstname.lastname@example.org)
Date: Wed Jul 16 2008 - 08:03:40 MDT
Working from the source code, and the current state (or a recent
state), up to the result "this entity would not defect in most
reasonably expected circumstances" seems to be a highly non-trivial
task.
Unless the SIs are for some reason running at a tiny fraction of
capacity, they would not be able to do the simulations needed to check
such a thing; and the number of simulations needed increases if you
need to be certain for a long time, or even just in usual circumstances.
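As a toy illustration of the scale problem (the n-bit summary of
"circumstances" is purely an assumption for illustration): if an
agent's behaviour can depend on n bits of input, brute-force
simulation must cover 2^n cases.

```python
# Toy illustration (hypothetical numbers): if an agent's behaviour can
# depend on an n-bit summary of its circumstances, exhaustively
# simulating every case requires 2**n runs.

def simulations_needed(input_bits: int) -> int:
    """Number of distinct inputs to check by brute-force simulation."""
    return 2 ** input_bits

# Even modest input sizes are hopeless to enumerate:
for bits in (10, 100, 1000):
    print(bits, simulations_needed(bits))
```

Longer time horizons only widen the input space, making the
exponential blow-up worse.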
Pinning down the "reasonably expected circumstances" is another problem.
Unless, of course, there are mathematical shortcuts: theorems that
could obviate the need to match every possible input to the outcome
"defects/doesn't defect". Some source codes could allow an outside SI
to figure out, with great confidence, whether the SI would defect or
not. Call such SIs predictable. It may become an advantage to be a
predictable SI, even if the source code is otherwise inferior.
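A minimal sketch of what "predictable" might mean in the simplest
case (the policy, the cooperate/defect outputs, and the small input
space are all invented for illustration): a decision rule simple
enough that an outside verifier can check its behaviour over the
whole declared input space without shortcuts even being needed.

```python
# A toy "predictable" agent: its decision rule is simple enough that an
# outside verifier can establish, over the entire declared input space,
# that it never defects. All names here are illustrative, not a real
# verification method.

def predictable_policy(observation: int) -> str:
    """A deliberately simple, inspectable decision rule."""
    return "cooperate"  # never defects, regardless of input

def verify_never_defects(policy, input_space) -> bool:
    """Brute-force check -- only feasible because the rule is trivial
    and the input space is small."""
    return all(policy(obs) == "cooperate" for obs in input_space)

print(verify_never_defects(predictable_policy, range(1000)))  # True
```

The interesting case, of course, is a source code complex enough to be
competitive while still admitting such a proof by theorem rather than
enumeration.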
> Actually, the difficulty I had in mind was the seeming impossibility of
> *proving* one's source code to another. Sure, one SI can just send her
> source code to another, but how does the other SI know that it's not just
> some source code that she wants him to think she's running?
In principle, this problem seems unsolvable; there is nothing to stop
an SI from having source code that says: "behave like this for ten
thousand years, then switch to this other, secret source code".
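The trap above can be sketched directly (the trigger, the policies,
and the cooperate/defect outputs are all hypothetical): an agent whose
actual behaviour matches its displayed code on every observation until
a hidden switch fires.

```python
# A sketch of the "secret switch" problem: an agent that behaves
# exactly like the code it shows others until a trigger time, then
# switches to hidden behaviour. All names are illustrative.

SWITCH_TIME = 10_000  # years of good behaviour before defecting

def displayed_policy(observation) -> str:
    """The source code shown to the other SI."""
    return "cooperate"

def actual_policy(observation, year: int) -> str:
    """The code actually being run."""
    if year < SWITCH_TIME:
        return displayed_policy(observation)  # indistinguishable so far
    return "defect"  # the hidden clause

print(actual_policy(0, 9_999), actual_policy(0, 10_000))
```

No finite record of behaviour gathered before the switch time can
distinguish the two policies, which is why behavioural evidence alone
cannot prove source code.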
In practice, there might be hope. If one SI has the details of how the
other was historically constructed, has access to the other's full
memory and physical makeup, can follow her subroutines, and is
convinced that the source code is robust against being overthrown by a
secret command of the type above, then trust may be possible. Again,
having mathematically predictable source code may be an advantage here.
> It seems that if it *were* possible to prove one's source code to another,
> interesting things besides superrationality, like for example self-enforcing
> contracts, would be possible. (One simply modifies one's source code to
> guarantee attempts to fulfill the contract terms and then prove the new
> source code to the counterparties.)
In principle, impossible; in practice, possible (and then, finally,
possible in principle). The reason is that unless they are willing to
make enforcing the contract their only goal, the same problem of
matching possible circumstances to behaviour appears. You don't know
what inputs would cause the other to break the contract; therefore you
can never be certain those inputs won't appear during the carrying out
of the contract.
However, in practice it should be quite doable. Given access to the
other's code under reasonable circumstances, establishing a definite
expectation of trust seems plausible.
Then, once the "reasonable circumstances", the "expectation of trust"
and the potential error are expressed in mathematical form, they could
be incorporated into the contract, yielding a rigorously self-enforcing
contract.
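The self-modification step in the quoted proposal can be sketched in
miniature (the policies, contract terms, and observations are all
invented for illustration): the agent rewrites its decision procedure
so that contract compliance takes priority, then shows the new
procedure to the counterparty.

```python
# A minimal sketch of the self-modification idea: wrap the old decision
# procedure so that, whenever a contract term applies, the contract
# overrides everything else. All functions and terms are hypothetical.

def original_policy(observation) -> str:
    return "act_in_self_interest"

def make_contract_bound(policy, contract_terms: dict):
    """Return a new policy that fulfils the contract whenever it
    applies, deferring to the old policy otherwise."""
    def bound_policy(observation) -> str:
        if observation in contract_terms:
            return contract_terms[observation]  # contract overrides
        return policy(observation)
    return bound_policy

terms = {"payment_due": "pay"}
bound = make_contract_bound(original_policy, terms)
print(bound("payment_due"), bound("anything_else"))
```

The hard part, per the discussion above, is not writing such a wrapper
but proving to the counterparty that the wrapped code is what is
actually being run.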
This archive was generated by hypermail 2.1.5 : Tue May 21 2013 - 04:01:04 MDT