Emin's Page

The Two Envelope Problem


UNDER CONSTRUCTION

Greg Wornell told me about the following variant of the "Two Envelope Problem" which he heard from Imre Teletar. The puzzle is as follows.

Variant of Two Envelope Puzzle

I flip a fair coin until it comes up heads. Let the random variable N represent the resulting number of flips. I put 3^N dollars in one envelope and 3^(N-1) dollars in the other. One of the envelopes is selected at random and given to you. You may look at your envelope and either keep it or switch. Given that you observe Y dollars in your envelope, what should you do to maximize your expected wealth? For example, if you see Y=1 in your envelope, you know for certain that the other envelope contains 3 and so you should switch. But what should you do if you see other values of Y?

Some key points to keep in mind about the puzzle:

  • Y is either 1, 3, 9, 27, etc.
  • In contrast to the original two envelope problem, Y is chosen in a random manner and you could in principle compute the probability for any event you like.
  • Once you think you have the solution, you might want to ask yourself how useful it is to look in the envelope (this is the most interesting part of the puzzle).

My solution to the puzzle is presented below. Don't read further unless you want to know the answer.

    Solution

    The solution which maximizes expected wealth given the observation is to always ask for the other envelope no matter how much the first envelope contains. I will first explain why this maximizes the expected wealth given the observation of the amount in the first envelope. After that, I will address the apparent paradox that there appears to be no "value" in looking in the envelope at all.

    Intuition for Solution

    The intuition for why always switching is best is as follows. It's obvious that you should always switch if Y=1 since in this case you know the other envelope contains 3. Thus we will focus on the case where Y>1. Given that you see Y in your envelope, you know that either N = log_3(Y) (you hold the larger envelope) or N = log_3(Y) + 1 (you hold the smaller envelope). It's twice as likely that N takes the smaller value, since each additional flip needed to reach the first head halves the probability. Specifically, given that you see Y>1, there is a 2/3 chance that the other envelope contains Y/3 and a 1/3 chance that the other envelope contains 3Y. However, the gain from switching in the latter case (2Y) is three times the loss from switching in the former case (2Y/3), which more than offsets the factor of two in probability. Thus going for the larger payoff by always switching is the best thing to do conditioned on observing Y. A formal derivation follows at the end of this article for the mathematically inclined.
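The 2/3 versus 1/3 split can also be checked empirically. The following is a minimal simulation sketch, not part of the original argument; the function names `play_round` and `summarize`, the number of rounds, and the seed are all illustrative choices. It plays the game many times, groups the rounds by the observed amount Y, and reports how often the other envelope held 3Y and the average amount in the other envelope as a multiple of Y.

```python
import random
from collections import defaultdict

def play_round(rng):
    """Simulate one round; return (amount you hold, amount in the other envelope)."""
    n = 1
    while rng.random() < 0.5:  # tails with probability 1/2: flip again
        n += 1
    envelopes = [3 ** n, 3 ** (n - 1)]
    rng.shuffle(envelopes)  # you are handed one of the two at random
    return envelopes[0], envelopes[1]

def summarize(num_rounds=200_000, seed=0):
    """Group rounds by observed Y and tally what the other envelope held."""
    rng = random.Random(seed)
    # y -> [rounds seen, rounds where other == 3y, sum of other amounts]
    stats = defaultdict(lambda: [0, 0, 0])
    for _ in range(num_rounds):
        y, other = play_round(rng)
        entry = stats[y]
        entry[0] += 1
        entry[1] += other == 3 * y
        entry[2] += other
    return stats

if __name__ == "__main__":
    stats = summarize()
    for y in (3, 9, 27):
        count, bigger, total = stats[y]
        # Columns: Y, fraction of rounds with other == 3Y, mean of other / Y
        print(y, bigger / count, total / count / y)
```

If the intuition above is right, the second column should come out near 1/3 and the third near 11/9 for each Y>1, up to sampling noise.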

    An Apparent Paradox

    The obvious objection to this solution is that always switching and always staying are the same on average and should yield exactly the same expected wealth. Specifically, it is true that expected wealth for always switching and always staying cannot be different if this expectation is taken without conditioning on the observation. Put another way, if I always take the envelope I am given and you always take the other envelope, then by symmetry our expected wealth must be the same. Thus it seems that always switching cannot be the optimal solution. We resolve this paradox using three successively deeper arguments.

    First Resolution of the Paradox

    The problem asks us to maximize the expected wealth conditioned on the fact that we see Y in the first envelope. A strategy is good if it maximizes this conditional expected wealth. The paradox essentially claims that the always switch strategy does not maximize the unconditioned expected wealth. Thus one resolution of the paradox is that it simply addresses the wrong question. The paradox is as if I told you that eating a particular food makes me jump high, and you answered that I must be wrong because eating that food does not make me run fast. Clearly the two issues are not the same.

    The astute reader may be unsatisfied with the first resolution. Specifically, the idea behind the paradox is not necessarily that conditional and unconditional expected wealth are the same, but that they are related. To show that there is no useful relationship between conditional and unconditional wealth we turn to the second resolution of the paradox.

    The Second Resolution of the Paradox

    The second reason that the unconditional expected wealth is not useful is that it is infinite. Specifically, any strategy one decides to adopt before the game begins has an expected payoff of infinity. This is because the money in the envelopes grows exponentially with the number of flips, by a factor of 3 per flip, while the probability of reaching that many flips shrinks only by a factor of 2. The fact that the unconditional expected wealth is always infinite essentially means that unconditional expected wealth is a useless concept for this game. Any pair of strategies is equivalent as far as unconditional expected wealth is concerned. Thus saying that always switching and always staying are equivalent in unconditioned expected wealth is trivial and does not suggest anything about how good always switching is at maximizing the conditional expected wealth.
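To see the divergence concretely, here is a small sketch that sums the contribution to the unconditional expected amount in the envelope you are handed from all outcomes with at most K flips. The helper name `partial_expected_amount` is made up for this illustration. Each value of N=n contributes Pr(N=n) times the average of the two envelopes, which works out to (3/2)^(n-1), so the partial sums grow without bound.

```python
from fractions import Fraction

def partial_expected_amount(max_flips):
    """Sum the contribution to E[Y] from outcomes with N <= max_flips."""
    total = Fraction(0)
    for n in range(1, max_flips + 1):
        prob_n = Fraction(1, 2 ** n)                       # Pr(N = n)
        mean_given_n = Fraction(3 ** n + 3 ** (n - 1), 2)  # average of the two envelopes
        total += prob_n * mean_given_n
    return total

if __name__ == "__main__":
    for k in (1, 5, 10, 20, 40):
        print(k, float(partial_expected_amount(k)))
```

By symmetry the same partial sums apply to the envelope you are not handed, so neither always staying nor always switching has a finite unconditional expectation.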

    This is perhaps a somewhat subtle point, so in an attempt to clarify it, let me be a bit pedantic. The idea behind the paradox is that if A and B are any two strategies which have the same unconditional expected wealth, they must also have the same conditional expected wealth. To write this as a syllogism, let usame(A,B) be the claim that the strategies A and B have the same unconditioned expected wealth, and let csame(A,B) be the claim that these two strategies have the same conditional expected wealth. We can now restate the apparent paradox more precisely in this language as

  1. For any strategies A and B, usame(A,B) implies csame(A,B).
  2. Let SWITCH be the strategy "always switch".
  3. Let STAY be the strategy "always stay".
  4. The predicate usame(SWITCH,STAY) is true by symmetry.
  5. Therefore, the claim csame(SWITCH,STAY) is true by combining the preceding points.

    This syllogism is incorrect. To see this, note that the claim in step 4 is true for any A and B, not just SWITCH and STAY, since every pair of strategies has the same unconditional expected wealth of infinity. So if the above chain of reasoning is correct then every pair of strategies has the same conditional expected wealth. Clearly this conclusion is false because a strategy which keeps an envelope with 1 dollar is always worse than one which exchanges it for the envelope with 3 dollars. Hence the premise in step 1 must be false. Thus the second resolution of the paradox is that there is absolutely no connection between unconditional expected wealth and conditional expected wealth. The reader may still be disturbed by the paradox, so we offer a third analysis.

    A Third Resolution of the Paradox

    Let us modify the original game so that we always stop after T coin flips. This modification prevents infinities from entering the picture and significantly affects the situation. Specifically, in the modified game the always switch strategy is no longer optimal. This is because the maximum possible amount of money in an envelope is 3^T. Thus the optimal strategy is to keep an envelope if its value is 3^T and otherwise exchange it.
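The truncated game is small enough to evaluate exactly. The sketch below rests on one assumption that the text does not spell out: if no head appears in the first T-1 flips, N is simply set to T, so Pr(N=T) = 2^-(T-1). The function and strategy names are hypothetical, chosen for this illustration only.

```python
from fractions import Fraction

def expected_wealth(should_switch, max_flips):
    """Exact unconditional expected wealth in the game capped at max_flips flips.

    Assumption: if no head appears in the first max_flips - 1 flips,
    N is set to max_flips.
    """
    total = Fraction(0)
    for n in range(1, max_flips + 1):
        # Pr(N = n) is 2^-n below the cap; the cap absorbs the remaining mass.
        if n < max_flips:
            prob_n = Fraction(1, 2 ** n)
        else:
            prob_n = Fraction(1, 2 ** (max_flips - 1))
        big, small = 3 ** n, 3 ** (n - 1)
        # You hold either envelope with probability 1/2.
        for held, other in ((big, small), (small, big)):
            final = other if should_switch(held, max_flips) else held
            total += prob_n * Fraction(1, 2) * final
    return total

def always_stay(y, max_flips):
    return False

def always_switch(y, max_flips):
    return True

def keep_only_maximum(y, max_flips):
    # Switch unless the envelope holds the largest possible amount, 3^T.
    return y != 3 ** max_flips

if __name__ == "__main__":
    for strategy in (always_stay, always_switch, keep_only_maximum):
        print(strategy.__name__, expected_wealth(strategy, 2))
```

With T=2 this gives 4 for both always staying and always switching, and 11/2 for keeping only the maximum envelope. Once the expectations are finite, the symmetry argument holds exactly and looking in the envelope has a measurable value.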

    Rigorous derivation that always switching is best

    If you see Y=1 in your envelope you should obviously always switch since you can infer that N=1 and so the other envelope contains more money.

    Instead, if you see Y > 1 in your envelope, you can compute Pr(N|Y) as follows.

    Pr(N=n|Y=y) = Pr(Y=y and N=n)/Pr(Y=y)

    by Bayes' law. Since the envelopes contain 3^N and 3^(N-1), the above expression is non-zero only for n = log_3(y) (you hold the larger envelope) or n = log_3(y) + 1 (you hold the smaller envelope). Write m = log_3(y). Because Pr(N=n) = 2^(-n), we have Pr(N=m) = 2 * Pr(N=m+1). We can therefore evaluate Pr(N|Y) via

    Pr(N=m+1|Y=y) = Pr(Y=y and N=m+1)/Pr(Y=y)
       = Pr(Y=y|N=m+1)*Pr(N=m+1) /
           [Pr(Y=y|N=m)*Pr(N=m) + Pr(Y=y|N=m+1)*Pr(N=m+1)]
       = 1/2 * Pr(N=m+1) / [ 1/2 * 2 * Pr(N=m+1) + 1/2 * Pr(N=m+1)]
       = 1/3

    Thus Pr(N=m|Y=y) = 2/3.

    If N=m the other envelope contains Y/3, and if N=m+1 it contains 3Y. Thus if we keep the envelope our conditional expected wealth is Y, while if we switch then our conditional expected wealth is

    E[Wealth|Y] = Pr(N=m|Y)*Y/3 + Pr(N=m+1|Y)*3Y
                = (2/9) * Y + Y
                = (11/9) Y.
    
    Thus the expected wealth for switching exceeds the expected wealth for staying by (2/9) Y and so we should always switch.
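The arithmetic in this derivation can be double-checked with exact fractions. The sketch below is only a sanity check under the puzzle's stated setup (envelopes of 3^N and 3^(N-1), with Pr(N=n) = 2^-n); the function name and its argument `k`, the exponent of the observed amount Y = 3^k, are illustrative.

```python
from fractions import Fraction

def posterior_and_switch_value(k):
    """For an observed Y = 3**k with k >= 1, return
    (Pr(other = Y/3 | Y), Pr(other = 3Y | Y), E[other | Y] / Y)."""
    y = 3 ** k
    # Joint probabilities Pr(Y = y and N = n) = Pr(N = n) * 1/2.
    joint_hold_larger = Fraction(1, 2 ** k) * Fraction(1, 2)         # N = k, other = y/3
    joint_hold_smaller = Fraction(1, 2 ** (k + 1)) * Fraction(1, 2)  # N = k + 1, other = 3y
    evidence = joint_hold_larger + joint_hold_smaller                # Pr(Y = y)
    p_other_smaller = joint_hold_larger / evidence
    p_other_larger = joint_hold_smaller / evidence
    expected_other = p_other_smaller * Fraction(y, 3) + p_other_larger * 3 * y
    return p_other_smaller, p_other_larger, expected_other / y

if __name__ == "__main__":
    for k in range(1, 6):
        print(3 ** k, posterior_and_switch_value(k))
```

For every k >= 1 the three values are 2/3, 1/3 and 11/9, independent of the observed amount, which is why the recommendation to switch does not depend on Y.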