> That is to say, the “right” behavior is surrounded by a massive crater of “good enough” behaviors, and in the long-term they all converge to the same place. We just need to land in the crater.

This does seem true if you're talking about acts drawn from a human distribution, i.e. if you've smeared actions out over a space such that a uniform probability density over that space roughly matches the distribution of human actions. In that case, actions near the "good enough" behaviors are plausibly also good.

If you're optimizing rather than sampling from a known distribution over human actions (as in quantilizing or similar), then it looks like you'll still run into unforeseen maxima, edge instantiation and the like, any of which could easily end in catastrophe.
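To make the contrast concrete, here is a toy sketch. Everything in it is an assumption made up for illustration: actions are single numbers, "human" actions are standard-normal draws, the proxy utility is just the action's value, and the true utility matches the proxy only inside the human-typical range. The quantilizer is the simple sample-and-rank version: draw from the base distribution, then pick at random from the top `q` fraction by proxy utility.

```python
import random

def human_sampler(rng):
    # Assumed stand-in for "the distribution of human actions".
    return rng.gauss(0.0, 1.0)

def proxy_utility(action):
    # Assumed proxy: bigger looks better, with no upper limit.
    return action

def true_utility(action):
    # Assumed ground truth: agrees with the proxy for human-typical
    # actions, catastrophic far outside that range.
    return action if abs(action) <= 6 else -1000.0

def quantilize(sampler, utility, q=0.1, n=1000, rng=None):
    # Sample n actions from the base distribution, keep the top q
    # fraction by proxy utility, and return one of those at random.
    rng = rng or random.Random()
    candidates = sorted((sampler(rng) for _ in range(n)),
                        key=utility, reverse=True)
    top = candidates[:max(1, int(q * n))]
    return rng.choice(top)

def argmax_action(action_space, utility):
    # Plain optimization: take whatever scores highest on the proxy.
    return max(action_space, key=utility)

if __name__ == "__main__":
    rng = random.Random(0)
    q_action = quantilize(human_sampler, proxy_utility, rng=rng)
    o_action = argmax_action(range(-100, 101), proxy_utility)
    print("quantilized:", q_action, "true utility:", true_utility(q_action))
    print("optimized:  ", o_action, "true utility:", true_utility(o_action))
```

In this toy setup the optimizer goes straight to the edge of the action space, where the proxy and the true utility come apart, while the quantilizer stays inside the region the human distribution actually covers.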