Reasoning about delayed gratification (was Re: Simulation argument in the NY Times)

From: Tim Freeman (tim@fungible.com)
Date: Mon Aug 20 2007 - 06:35:48 MDT


From: Matt Mahoney <matmahoney@yahoo.com>
>[1] Legg, Shane, and Marcus Hutter (2006), A Formal Measure of Machine
>Intelligence, Proc. Annual machine learning conference of Belgium and The
>Netherlands (Benelearn-2006). Ghent, 2006.
>http://www.vetta.org/documents/ui_benelearn.pdf

Excellent reference. Thanks for posting it. I like how they handle
the problem of discounting delayed gratification: the environment
itself is responsible for making delayed reward worth less, and the
only requirement is that the total reward an environment pays out is
at most 1. (That's equation 2 on page 5.)

I had never seen a principled, parameter-free way to do that before.
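
To make the idea concrete, here is a minimal sketch of one environment
that satisfies the bound. It is an illustration only, not code or
notation from the paper. The function name bounded_rewards and the
geometric schedule with factor gamma are assumptions chosen for the
example. The constraint itself does not prescribe any schedule; it only
asks that the emitted rewards sum to at most 1, and this is one way an
environment could meet it.

```python
def bounded_rewards(raw_rewards, gamma=0.5):
    """Scale raw per-step rewards in [0, 1] so the emitted total is at most 1.

    Hypothetical helper: the geometric weights (1 - gamma) * gamma**i are
    an assumption for this sketch, chosen because they sum to less than 1.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    emitted = []
    for i, raw in enumerate(raw_rewards):
        if not 0.0 <= raw <= 1.0:
            raise ValueError("raw rewards must lie in [0, 1]")
        # The weight shrinks with the step index, so the environment itself
        # makes late reward worth less; the agent needs no discount factor.
        emitted.append((1.0 - gamma) * gamma ** i * raw)
    return emitted


# The same raw payoff delivered early versus late.
early = bounded_rewards([1, 0, 0, 0])
late = bounded_rewards([0, 0, 0, 1])
print(sum(early), sum(late))

# Even an agent that earns the maximum at every step stays within the bound.
print(sum(bounded_rewards([1] * 50)) <= 1.0)
```

An agent that simply sums the rewards it receives from this environment
already prefers the early payoff, because the delay penalty lives in the
environment rather than in the agent's utility function.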

-- 
Tim Freeman               http://www.fungible.com           tim@fungible.com

