On Mon, Jun 9, 2014 at 11:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Keep in mind that that standard advice is meant for all-in-memory cases,
> not for cases where the alternative to running with longer hash chains
> is dumping tuples out to disk and reading them back.
Sure, but that doesn't help someone who sets work_mem to some very
large value precisely to ensure that the hash join will be done in
memory. They still won't get the benefit of a smaller NTUP_PER_BUCKET
setting.
> I'm quite prepared to believe that we should change NTUP_PER_BUCKET ...
> but appealing to standard advice isn't a good basis for arguing that.
> Actual performance measurements (in both batched and unbatched cases)
> would be a suitable basis for proposing a change.
Well, it's all in what scenario you test, right? If you test the case
where something overflows work_mem as a result of the increased size
of the bucket array, it's always going to suck. And if you test the
case where that doesn't happen, it's likely to win. I think Stephen
Frost has already done quite a bit of testing in this area, on
previous threads. But there's no one-size-fits-all solution.
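For anyone skimming the thread, the tradeoff is roughly this: a smaller
NTUP_PER_BUCKET means more buckets for the same number of tuples, which
shortens hash chains but enlarges the bucket array, and that extra memory
counts against work_mem. A back-of-the-envelope sketch (hypothetical
function names, not the actual nodeHash.c code):

```c
#include <stddef.h>

/*
 * Sketch only: pick a power-of-two bucket count aiming for roughly
 * ntup_per_bucket tuples per bucket, as a hash join sizing heuristic.
 */
static size_t
choose_nbuckets(size_t ntuples, size_t ntup_per_bucket)
{
    size_t target = ntuples / ntup_per_bucket;
    size_t nbuckets = 1;

    while (nbuckets < target)
        nbuckets <<= 1;
    return nbuckets;
}

/* Memory cost of the bucket array: one chain-head pointer per bucket. */
static size_t
bucket_array_bytes(size_t nbuckets)
{
    return nbuckets * sizeof(void *);
}
```

With a million inner tuples, dropping ntup_per_bucket from 10 to 1 grows
the bucket array by 8x here (131072 vs. 1048576 buckets). If that extra
memory is what pushes the hash table past work_mem and into batching, the
change loses; if the table still fits, the shorter chains win.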
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company