Reflective consistency

by Eliezer Yudkowsky Mar 22 2016

A decision system is reflectively consistent if it can approve of itself, or approve the construction of similar decision systems (as well as perhaps approving other decision systems too).

A decision system is "reflectively consistent" if it can approve the construction of similar decision systems. For example, if you have an expected utility satisficer (it either takes the null action, or an action with expected utility greater than $~$\theta$~$), then this agent can self-modify to any other design which also either takes no action, or approves a plan with expected utility greater than $~$\theta$~$. A satisficer might also approve changing itself into an expected utility maximizer (if it expects that this self-modification itself leads to expected utility greater than $~$\theta$~$), but it will at least approve replacing itself with another satisficer. On the other hand, a [causal_decision_theory causal decision theorist] is not reflectively consistent: given a chance to self-modify, it will only approve the construction of [son_of_cdt something that is not a causal decision theorist].

A property satisfies the stronger condition of reflective stability when decision systems with that property only approve their own replacement with other decision systems with that property. For example, a paperclip maximizer will under ordinary circumstances only approve code changes that preserve the property of maximizing paperclips, so "wanting to make paperclips" is reflectively stable and not just reflectively consistent.
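The satisficer example can be made concrete with a toy model. The following is a minimal sketch under stated assumptions, not a formalization from this page: every name in it (`satisficer`, `maximizer`, `is_satisficing`, `modification_value`, `satisficer_approves`) is hypothetical, a "menu" is just a dictionary from action names to expected utilities, the null action is assumed to be worth 0, and the sample menus are assumed equally likely.

```python
NULL = "null"  # the null action; assumed (for illustration) to have utility 0


def satisficer(theta):
    """Return a design that takes the first action with EU > theta, else NULL."""
    def choose(menu):
        for action, eu in menu.items():
            if eu > theta:
                return action
        return NULL
    return choose


def maximizer(menu):
    """A design that always takes the action with the highest expected utility."""
    return max(menu, key=menu.get)


def is_satisficing(design, theta, menus):
    """True if, on every sample menu, the design takes NULL or an action with EU > theta."""
    for menu in menus:
        choice = design(menu)
        if choice != NULL and menu[choice] <= theta:
            return False
    return True


def modification_value(design, menus):
    """Average utility from running `design` on the sample menus (assumed equally likely)."""
    total = 0.0
    for menu in menus:
        choice = design(menu)
        total += 0.0 if choice == NULL else menu[choice]
    return total / len(menus)


def satisficer_approves(successor, theta, menus):
    """Toy approval rule: accept a successor that is itself satisficing at theta,
    or whose adoption has expected utility greater than theta."""
    return (is_satisficing(successor, theta, menus)
            or modification_value(successor, menus) > theta)


# Hypothetical sample data, chosen only to exercise both approval routes.
SAMPLE_MENUS = [{"a": 3.0, "b": 9.0}, {"a": 1.0, "b": 2.0}]
THETA = 5.0

print(satisficer_approves(satisficer(6.0), THETA, SAMPLE_MENUS))  # another satisficer
print(satisficer_approves(maximizer, THETA, SAMPLE_MENUS))        # a maximizer
```

In this sketch a stricter satisficer is approved because it never takes a non-null action worth $~$\theta$~$ or less, while the maximizer is approved only because switching to it happens to have expected utility above $~$\theta$~$ on these particular menus. With a higher threshold the same maximizer would be rejected, which mirrors the "might also approve" wording above.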