Re: tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: tweaking NTUP_PER_BUCKET
Date	July 3, 2014 18:50:42
Msg-id	53B5A5FA.4050705@fuzzy.cz
In response to	Re: tweaking NTUP_PER_BUCKET (Stephen Frost <sfrost@snowman.net>)
Responses	Re: tweaking NTUP_PER_BUCKET
List	pgsql-hackers

Hi Stephen,

On 3.7.2014 20:10, Stephen Frost wrote:
> Tomas,
> 
> * Tomas Vondra (tv@fuzzy.cz) wrote:
>> However it's likely there are queries where this may not be the case,
>> i.e. where rebuilding the hash table is not worth it. Let me know if you
>> can construct such a query (I wasn't able to).
> 
> Thanks for working on this! I've been thinking on this for a while
> and this seems like it may be a good approach. Have you considered a
> bloom filter over the buckets..? Also, I'd suggest you check the

I know you've experimented with it, but I haven't looked into that yet.

> archives from about this time last year for test cases that I was
> using which showed cases where hashing the larger table was a better
> choice- those same cases may also show regression here (or at least
> would be something good to test).

Good idea, I'll look at the test cases - thanks.

> Have you tried to work out what a 'worst case' regression for this 
> change would look like? Also, how does the planning around this
> change? Are we more likely now to hash the smaller table (I'd guess
> 'yes' just based on the reduction in NTUP_PER_BUCKET, but did you
> make any changes due to the rehashing cost?)?

The case I was thinking about is an underestimated cardinality of the
inner table combined with a small outer table. That'd lead to a large
hash table but very few lookups, so the cost of the rehash would never
be amortized. I.e. something like this:
 Hash Join
   Seq Scan on small_table (rows=100) (actual rows=100)
   Hash
     Seq Scan on bad_estimate (rows=100) (actual rows=1000000000)
       Filter: ((a < 100) AND (b < 100))

But I wasn't able to reproduce this reasonably, because in practice
that'd lead to a nested loop or something like that (which is a planning
issue, impossible to fix in hashjoin code).
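
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It is not PostgreSQL code and not the costing used by the patch. It assumes a toy model in which every tuple comparison and every tuple move during a rebuild costs one unit, and in which the bucket count is the smallest power of two that keeps the load at or below NTUP_PER_BUCKET. The function names (buckets_for, probe_cost, rehash_pays_off) are made up for illustration.

```python
def buckets_for(rows, ntup_per_bucket):
    """Toy sizing rule (an assumption): smallest power of two such that
    rows / nbuckets does not exceed ntup_per_bucket."""
    n = 1
    while n * ntup_per_bucket < rows:
        n *= 2
    return n


def probe_cost(outer_rows, inner_rows, nbuckets):
    """Expected comparisons: each outer row walks an average chain of
    inner_rows / nbuckets tuples."""
    return outer_rows * (inner_rows / nbuckets)


def rehash_pays_off(outer_rows, est_inner, actual_inner, ntup_per_bucket=1):
    """Compare keeping the undersized table with rebuilding it.

    Rebuilding is charged one unit per inner tuple (hypothetical cost),
    then probes run against the correctly sized table.
    """
    small = buckets_for(est_inner, ntup_per_bucket)    # sized from the estimate
    large = buckets_for(actual_inner, ntup_per_bucket)  # sized from reality
    keep = probe_cost(outer_rows, actual_inner, small)
    rebuild = actual_inner + probe_cost(outer_rows, actual_inner, large)
    return rebuild < keep


# The plan above: 100 outer rows, inner estimated at 100, actually 10^9.
print(rehash_pays_off(100, 100, 10**9))     # False: too few lookups
# Same misestimate, but a million probes amortize the rebuild.
print(rehash_pays_off(10**6, 100, 10**9))   # True
```

Under these assumptions the rebuild only loses when the outer side is tiny relative to the inner side, which is exactly the shape of plan that the planner tends to avoid for other reasons.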

Tomas
