Why is there ever a buffer on the root node? It seems like a waste of time to load N tuples from the heap into the root buffer, only to empty the buffer after it fills up. You might as well pull tuples directly from the heap.
Yes, that seems reasonable. The buffer on the root node was in the paper, but I don't see the need for it now either.
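A toy sketch of the argument, not the actual index build code: it assumes the root buffer is emptied in FIFO order and that routing a tuple depends only on the tuple itself. The names `route`, `load_with_root_buffer` and `load_direct` are hypothetical, and the child buffers are plain lists. Under those assumptions, filling a root buffer and then emptying it leaves the child buffers in the same state as pushing each heap tuple down directly.

```python
from collections import defaultdict


def route(tup, n_children):
    # Hypothetical stand-in for "choose the subtree for this tuple".
    return tup % n_children


def load_with_root_buffer(heap, n_children, root_capacity):
    """Collect tuples in a root buffer; empty it into child buffers when full."""
    children = defaultdict(list)
    root_buffer = []
    for tup in heap:
        root_buffer.append(tup)
        if len(root_buffer) == root_capacity:
            for buffered in root_buffer:  # FIFO emptying (an assumption)
                children[route(buffered, n_children)].append(buffered)
            root_buffer.clear()
    # Flush whatever is left once the heap scan ends.
    for buffered in root_buffer:
        children[route(buffered, n_children)].append(buffered)
    return dict(children)


def load_direct(heap, n_children):
    """Skip the root buffer: route each heap tuple straight to a child buffer."""
    children = defaultdict(list)
    for tup in heap:
        children[route(tup, n_children)].append(tup)
    return dict(children)


heap = list(range(10))
same = load_with_root_buffer(heap, 3, 4) == load_direct(heap, 3)
```

The only difference is the extra copy into and out of `root_buffer`, which is the wasted work the question points at.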