Home > mailing lists

Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date	June 9, 2014 18:01:53
Msg-id	CA+TgmobBm2TeY5dSe0UTMRCuOiYmJ5E17EdTU+YMmsaUMEvprg@mail.gmail.com Whole thread Raw
In response to	why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller (b8flowerfire <b8flowerfire@gmail.com>)
Responses	Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller (Tom Lane <tgl@sss.pgh.pa.us>) Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller (b8flowerfire <b8flowerfire@gmail.com>)
List	pgsql-hackers

Tree view

On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire <b8flowerfire@gmail.com> wrote:
> When I read the source code about the hashjoin, I was very confused that the
> postgresql define the NTUP_PER_BUCKET value as 10.
> Since this value is used to estimate the tuple count in one bucket, is it
> better if we have a smaller value?
> I have not done some experiments, but it seems that we could archive less
> hash collisions and better performance if we decrease the value.
> So could anyone explain to me that why we define NTUP_PER_BUCKET as 10?
> If there exists a specified situation that we would get worse performance or
> some troubles if set NTUP_PER_BUCKET to 1 or 2?

This has come up before.  Basically, the problem is that if you reduce
NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
amount of space available for tuples to the point where the hash join
overflows to multiple batches.  That will be more expensive than
performing the hash join with a higher NTUP_PER_BUCKET.

But having said that, I think the current situation is pretty bad,
too.  NTUP_PER_BUCKET is basically the anticipated load factor for the
hash table, and everything I've ever read about hash table design says
you want that to be something like 1, or 0.25.  So 10 seems really
high.  But I'm not sure exactly what to do to fix the problem, because
there are indeed cases where we will be worse off if we just lower the
value categorically.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Kevin Grittner
Date: 09 June 2014, 18:00:59
Subject: Re: Inaccuracy in VACUUM's tuple count estimates

From: Magnus Hagander
Date: 09 June 2014, 18:03:17
Subject: Re: Supporting Windows SChannel as OpenSSL replacement

Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller - Mailing list pgsql-hackers

Previous

Next