by Paul Christiano Mar 20 2016 updated Mar 20 2016

I think the key question is whether:

  1. the burrito judge needs to be extremely powerful, or
  2. the burrito judge needs to be modestly more powerful than the burrito producer.

In world 1, I agree that the burrito-evaluator seems pretty tough to build. We certainly have disagreements about that case, but I'm happy to set it aside for now.

In world 2, things seem much less scary. Because I only need to run these evaluations with, e.g., 1% probability, the judge can use 50x more resources than the burrito producer while the expected overhead stays modest. So it's imaginable that the judge can be more powerful than the producer.
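The arithmetic behind this trade-off can be sketched as follows. This is only an illustration, assuming that the judge's cost scales linearly with the resources she uses and that each burrito is audited independently with a fixed probability; the function names `expected_overhead` and `max_judge_multiplier` are hypothetical and not part of any existing proposal.

```python
def expected_overhead(audit_probability: float, judge_cost_multiplier: float) -> float:
    """Expected judging cost per burrito, as a fraction of the producer's cost.

    Assumption: the judge runs on a random fraction `audit_probability` of
    burritos and, when she runs, costs `judge_cost_multiplier` times what the
    producer spent.
    """
    return audit_probability * judge_cost_multiplier


def max_judge_multiplier(overhead_budget: float, audit_probability: float) -> float:
    """Largest judge/producer resource ratio that fits within an overhead budget."""
    return overhead_budget / audit_probability


# The figures from the text: audit 1% of the time with a judge using 50x resources.
overhead = expected_overhead(audit_probability=0.01, judge_cost_multiplier=50)
print(f"Expected overhead: {overhead:.0%} of the producer's cost")

# Conversely, a 50% overhead budget at a 1% audit rate affords roughly a 50x judge.
print(f"Affordable judge multiplier: {max_judge_multiplier(0.5, 0.01):.0f}x")
```

Under these assumptions, a 50x judge at a 1% audit rate adds about half the producer's cost in expectation; lowering the audit rate buys a proportionally more powerful judge for the same budget.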

You seem to think that we are in world 1. I think that we are probably in world 2, but I'm certainly not sure. I discuss the issue in this post.

Some observations:

So I don't think that we can just ask the judge to evaluate the burrito; but the judge has enough going for her that I expect we can find some strategy that lets her win. I think this is the biggest open problem for my current approach.