I can see that that's equal to the formula given in the paper, log_B(M/4B), but I couldn't see any explanation for that formula in the paper. Your explanation makes sense, but where did it come from?
I didn't find it too. But it has to reservse memory for both sub-tree and active buffers. While we'are reserving memory for sub-tree in effective_cache_size and memory for last pages of buffers in maintenance_work_mem.
It seems a bit pessimistic: while it's true that the a subtree can't be larger than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter upper bound on it. The max size of a subtree of depth n can be calculated as the geometric series:
where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but for a large r and small n (which is typical), the 2 * maxIndexTuplesPerPage^levelStep formula gives a value that's almost twice as large as the real max size of a subtree.