In the context of a Task AGI, one application of what we call 'conservatism' is the Burrito Problem. Suppose I show the AI five burritos and five non-burritos. Rather than learning the simplest concept that distinguishes burritos from non-burritos and then creating something that is maximally a burrito under this concept, we would like the AI to learn a simple and narrow concept that classifies these five things as burritos according to some simple rule (not just the rule, "only these exact five objects are burritos") but which also classifies as few other objects as burritos as possible. This concept must still be broad enough to permit the construction of a sixth burrito that is not molecularly identical to any of the first five. But it must not be so broad that the concept admits botulinum toxin (because, hey, anything made out of mostly carbon-hydrogen-oxygen-nitrogen that looks like a burrito ought to be fine).
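The narrow-concept idea above can be caricatured in a toy setting. Suppose (purely as an illustrative assumption) that each object is described by a single numeric feature and that hypotheses are intervals. Then a "conservative" learner picks the narrowest interval that covers all the positive examples and none of the negatives, widened just enough that a sixth, non-identical burrito can still fall inside it. All names and data below are hypothetical:

```python
# Toy sketch of conservative concept learning over interval hypotheses.
# Assumption: each object is a single numeric feature; a concept is [lo, hi].

def conservative_interval(positives, negatives, margin=0.1):
    """Return the narrowest interval containing every positive example and
    no negative example, widened by a small margin so that new,
    non-identical positives can still be classified as burritos."""
    lo, hi = min(positives) - margin, max(positives) + margin
    if any(lo <= x <= hi for x in negatives):
        raise ValueError("no consistent narrow interval hypothesis")
    return lo, hi

positives = [2.0, 3.0, 4.0, 5.0, 6.0]          # the five burritos
negatives = [0.0, 9.0, 12.0, -3.0, 20.0]       # the five non-burritos

lo, hi = conservative_interval(positives, negatives)
# A new burrito at 4.5 falls inside the learned concept,
# while a more exotic object at 8.0 does not.
```

The contrast with "the simplest concept that distinguishes the examples" is that a simple but broad rule (say, "feature above 1") would also separate these positives from most of these negatives, yet it classifies far more of the space as burritos; the conservative learner deliberately pays a complexity-style cost to keep the extension small.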
To me, the most natural way to approach this is to take a probability distribution over "what it means to be a burrito," and to produce a thing that is maximally likely to be a burrito rather than a thing which is maximally burrito-like. Of course this still depends on having a good distribution over "what it means to be a burrito" (as does your approach).
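The distinction between "maximally likely to be a burrito" and "maximally burrito-like" can be sketched in the same toy interval setting as above. Keep a (hypothetical, hand-assigned) posterior over several candidate concepts of "burrito," and score a candidate object by the total posterior weight of concepts that classify it as a burrito, rather than by how extreme its membership is under any single concept:

```python
# Toy sketch: produce the thing most likely to be a burrito under a
# posterior over concepts. The posterior weights here are illustrative
# assumptions, not a real learned distribution.

# Each entry: (interval concept of "burrito", posterior weight).
posterior = [((2.0, 6.0), 0.5), ((1.0, 7.0), 0.3), ((2.5, 5.5), 0.2)]

def p_burrito(x):
    """Probability that x is a burrito: total weight of concepts containing x."""
    return sum(w for (lo, hi), w in posterior if lo <= x <= hi)

# Pick the candidate object maximizing the probability of burrito-hood.
candidates = [x / 10 for x in range(0, 100)]
best = max(candidates, key=p_burrito)
```

Note that the maximizer sits in the intersection of all the plausible concepts, i.e. the region every hypothesis agrees is a burrito. Optimizing "burrito-likeness" under the single most probable concept would instead push toward whatever that one concept scores most extremely, which is exactly the failure mode the burrito problem is meant to avoid.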