Tom, all,
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> So that's at least going in the right direction.
I agree that this is going in the right direction; it certainly would
make the plan that I *expect* to be chosen more likely, however..
I've been fiddling with this on the very much larger overall database
where this test case came from and have found that hashing the large
table can actually be *faster* and appears to cause a more consistent
and constant amount of disk i/o (which is good).
The test case exhibits a bit of why this is the case- the per-tuple hash
lookup is way closer to the per-tuple cost of building the hash table
than I'd expect.
per-tuple cost to build the hash table (41M tuples): 0.33us
per-tuple cost to scan/do hash lookups (41M tuples): 0.29us (with a small hash table of only 41K entries)
The total difference being: 1233.854 vs. 1428.424, or only 194.570ms in
favor of scanning the big table instead of hashing it.
These numbers are just from those posted with my original email:
http://explain.depesz.com/s/FEq
http://explain.depesz.com/s/FOU
I've seen much worse though- I have one case where hash-the-big-table
took 5s and hash-the-small-table took almost 10s (total times). I'm
trying to see if I can pull that out and isolate how it's different (and
see if it was just due to other load on the box).
What I'm trying to get at in this overall email is: why in the world is
it so expensive to do hash lookups? I would have expected the cost of
the hash table to be *much* more than the cost to do a hash lookup, and
that doing hash lookups against a small hash table would be fast enough
to put serious pressure on the i/o subsystem. Instead, the building of
the hash table actually puts more pressure and can end up being more
efficient overall. We have a process that basically does this a whole
bunch and the "hash-the-big-table" operation takes about 4.7 hrs, while
the "hash-the-small-table" approach went well past 5 hours and was only
about 70% complete.
Thoughts?
Thanks,
Stephen