Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: A better way than tweaking NTUP_PER_BUCKET
Date
Msg-id CA+U5nM+GEgqdcjDppezY1vzQtY9=HxM=At0LhOA1NvCynjdr2w@mail.gmail.com
Whole thread Raw
In response to Re: A better way than tweaking NTUP_PER_BUCKET  (Stephen Frost <sfrost@snowman.net>)
Responses Re: A better way than tweaking NTUP_PER_BUCKET  (Stephen Frost <sfrost@snowman.net>)
Re: A better way than tweaking NTUP_PER_BUCKET  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 22 June 2013 14:48, Stephen Frost <sfrost@snowman.net> wrote:

> Based on your argument that we want to have a bucket load which is, on
> average, the size of NTUP_PER_BUCKET, I have to '-1' on this.

That is not my argument. I am pointing out that the comments claim the
code does that, and yet the code does not.

So either we apply the patch to make the code fit the comment, or we
change the comment.

Tom's wish for an average tuples per bucket of 10 may be connected to
that error; I can't say. But we definitely don't end up with an
average of 10 in many cases, which is the basis for previous poor
performance reports.

> What we
> want is to size a table large enough that we never have any true
> collisions (cases where different 32-bit hash values end up in the same
> bucket) *for all tuples being hashed*, that includes both the side
> building the hash table and the side doing the scan.  This should be
> done when we can avoid batching- if we don't have enough to avoid
> batching while having such a large table, we should consider adjusting
> the hash table size down some because, in those cases, we're memory
> constrained.
>
> When we have enough memory to avoid batching, we never want to have to
> check down through a bucket for a tuple which doesn't exist- we should
> simply be able to look in the hash table itself, determine (with a
> single fetch) that there are no entries for that hash value, and throw
> the tuple away and move on to the next.

The bottom line is that you're hoping for perfection. If we knew
exactly how many tuples were in the hash table then we could size it
precisely for the hash join we are about to perform. We don't know
that, so we can't size it precisely. Worse than that is that we know
we badly estimate the number of buckets on a regular basis.

That gives us three choices: if we have a hash table fixed without
prior knowledge then we either (0) we continue to underallocate them
in many cases or (1) potentially overallocate buckets. Or (2) we learn
to cope better with the variation in the number of entries. If we rule
out the first two we have only the last one remaining.

(1) will always be a heuristic, and as you point out could itself be
an extreme setting.

So I think that (2) is the best route: Given that we know with much
better certainty the number of rows in the scanned-relation, we should
be able to examine our hash table after it has been built and decide
whether it would be cheaper to rebuild the hash table with the right
number of buckets, or continue processing with what we have now. Which
is roughly what Heikki proposed already, in January.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Next
From: Fabien COELHO
Date:
Subject: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)