Home > mailing lists

Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: A better way than tweaking NTUP_PER_BUCKET
Date	June 22, 2013 19:19:56
Msg-id	CA+U5nM+GEgqdcjDppezY1vzQtY9=HxM=At0LhOA1NvCynjdr2w@mail.gmail.com Whole thread Raw
In response to	Re: A better way than tweaking NTUP_PER_BUCKET (Stephen Frost <sfrost@snowman.net>)
Responses	Re: A better way than tweaking NTUP_PER_BUCKET (Stephen Frost <sfrost@snowman.net>) Re: A better way than tweaking NTUP_PER_BUCKET (Heikki Linnakangas <hlinnakangas@vmware.com>)
List	pgsql-hackers

Tree view

On 22 June 2013 14:48, Stephen Frost <sfrost@snowman.net> wrote:

> Based on your argument that we want to have a bucket load which is, on
> average, the size of NTUP_PER_BUCKET, I have to '-1' on this.

That is not my argument. I am pointing out that the comments claim the
code does that, and yet the code does not.

So either we apply the patch to make the code fit the comment, or we
change the comment.

Tom's wish for an average tuples per bucket of 10 may be connected to
that error; I can't say. But we definitely don't end up with an
average of 10 in many cases, which is the basis for previous poor
performance reports.

> What we
> want is to size a table large enough that we never have any true
> collisions (cases where different 32-bit hash values end up in the same
> bucket) *for all tuples being hashed*, that includes both the side
> building the hash table and the side doing the scan.  This should be
> done when we can avoid batching- if we don't have enough to avoid
> batching while having such a large table, we should consider adjusting
> the hash table size down some because, in those cases, we're memory
> constrained.
>
> When we have enough memory to avoid batching, we never want to have to
> check down through a bucket for a tuple which doesn't exist- we should
> simply be able to look in the hash table itself, determine (with a
> single fetch) that there are no entries for that hash value, and throw
> the tuple away and move on to the next.

The bottom line is that you're hoping for perfection. If we knew
exactly how many tuples were in the hash table then we could size it
precisely for the hash join we are about to perform. We don't know
that, so we can't size it precisely. Worse than that is that we know
we badly estimate the number of buckets on a regular basis.

That gives us three choices: if we have a hash table fixed without
prior knowledge then we either (0) we continue to underallocate them
in many cases or (1) potentially overallocate buckets. Or (2) we learn
to cope better with the variation in the number of entries. If we rule
out the first two we have only the last one remaining.

(1) will always be a heuristic, and as you point out could itself be
an extreme setting.

So I think that (2) is the best route: Given that we know with much
better certainty the number of rows in the scanned-relation, we should
be able to examine our hash table after it has been built and decide
whether it would be cheaper to rebuild the hash table with the right
number of buckets, or continue processing with what we have now. Which
is roughly what Heikki proposed already, in January.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Bruce Momjian
Date: 22 June 2013, 18:03:57
Subject: Re: Hard limit on WAL space used (because PANIC sucks)

From: Fabien COELHO
Date: 22 June 2013, 19:54:49
Subject: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)

Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

Previous

Next