Hi,
Thinking about this a bit more, do we really need to build the hash
table on the first pass? Why not do this instead:
(1) batching - read the tuples and stuff them into a simple list, without building the hash table yet
(2) building the hash table - once batching is done, all the tuples are in the simple list, so we know the exact row count,
can size the table properly, and only then build it
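To make the two steps concrete, here is a rough sketch of what I mean. It is not the real executor code: Tuple, TupleList, HashTable, list_push and build_table are all made-up names for illustration, and I'm assuming the hash value is already computed for each tuple and that we want roughly one bucket per tuple (as with NTUP_PER_BUCKET=1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tuple: only a precomputed hash value and a link pointer. */
typedef struct Tuple
{
    uint32_t      hash;
    struct Tuple *next;
} Tuple;

/* Step (1): a plain list that just collects tuples and counts them. */
typedef struct TupleList
{
    Tuple  *head;
    size_t  ntuples;
} TupleList;

/* Hypothetical chained hash table with a power-of-two bucket count. */
typedef struct HashTable
{
    Tuple **buckets;
    size_t  nbuckets;
} HashTable;

/* Step (1): push a tuple onto the list, no hashing into buckets yet. */
static void
list_push(TupleList *list, Tuple *tuple)
{
    assert(list != NULL && tuple != NULL);
    tuple->next = list->head;
    list->head = tuple;
    list->ntuples++;
}

/* Smallest power of two >= n (returns 1 for n == 0). */
static size_t
next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Step (2): the row count is now exact, so size the bucket array once
 * and move every tuple from the list into its bucket.
 * Returns 0 on success, -1 if the bucket array cannot be allocated.
 */
static int
build_table(HashTable *table, TupleList *list)
{
    Tuple *tuple;

    table->nbuckets = next_pow2(list->ntuples);
    table->buckets = calloc(table->nbuckets, sizeof(Tuple *));
    if (table->buckets == NULL)
        return -1;

    tuple = list->head;
    while (tuple != NULL)
    {
        Tuple  *next = tuple->next;     /* save before relinking */
        size_t  b = tuple->hash & (table->nbuckets - 1);

        tuple->next = table->buckets[b];
        table->buckets[b] = tuple;
        tuple = next;
    }
    list->head = NULL;                  /* tuples now belong to the table */
    return 0;
}
```

The point of the sketch is that the bucket array is allocated exactly once, from the real tuple count, so there is no estimate to get wrong and no resize of the table afterwards.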
Also, maybe we could use a regular linear-probing hash table [1], instead of
using the current implementation with NTUP_PER_BUCKET=1. (Although,
that'd be absolutely awful with duplicates.)
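To show why duplicates hurt, here is a toy linear-probing insert that counts how many slots it has to inspect. Again just a sketch under my own assumptions: probe_insert and EMPTY_SLOT are hypothetical names, the table stores bare hash values, and the slot count is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel marking an unused slot. */
#define EMPTY_SLOT UINT32_MAX

/*
 * Insert hash value h into an open-addressing table using linear probing.
 * Returns the number of slots inspected, or 0 if the table is full.
 * nslots must be a power of two; h must not equal EMPTY_SLOT.
 */
static size_t
probe_insert(uint32_t *slots, size_t nslots, uint32_t h)
{
    size_t mask;
    size_t pos;
    size_t probes;

    assert(nslots > 0 && (nslots & (nslots - 1)) == 0);
    assert(h != EMPTY_SLOT);

    mask = nslots - 1;
    pos = h & mask;
    for (probes = 1; probes <= nslots; probes++)
    {
        if (slots[pos] == EMPTY_SLOT)
        {
            slots[pos] = h;
            return probes;
        }
        pos = (pos + 1) & mask;     /* step to the next slot, wrapping */
    }
    return 0;                       /* no free slot found */
}
```

With distinct, well-spread hash values each insert lands in one probe, but k copies of the same value all start at the same slot, so the i-th copy needs i probes and the total is k*(k+1)/2. Lookups for that value have to walk the whole run too.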
regards
Tomas
[1] http://en.wikipedia.org/wiki/Linear_probing