Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller - Mailing list pgsql-hackers

From b8flowerfire
Subject Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date
Msg-id 1402377188082-5806617.post@n5.nabble.com
Whole thread Raw
In response to Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote
> On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire <b8flowerfire@> wrote:
>
> This has come up before.  Basically, the problem is that if you reduce
> NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
> amount of space available for tuples to the point where the hash join
> overflows to multiple batches.  That will be more expensive than
> performing the hash join with a higher NTUP_PER_BUCKET.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Thanks for the explanation, but I don't find it entirely convincing. Simply
reducing the value of NTUP_PER_BUCKET enlarges the pointer array and reduces
the number of tuples per batch, but is that effect significant for
performance?
The utilization of work_mem, I think, is determined by the ratio of the
pointer size to the tuple size.
Let's assume the size of a tuple is 28 bytes, which is quite reasonable
because it's the sum of HJTUPLE_OVERHEAD (at least 8 bytes), the size of
MinimalTupleData (at least 10 bytes), and the tuple contents (assume 10
bytes). The size of a pointer is 4 bytes.
So, if NTUP_PER_BUCKET is set to 10, about (28 * 10) / (28 * 10 + 4), or
roughly 98.6%, of work_mem is used to store tuples. If NTUP_PER_BUCKET is
set to 1, about 28 / (28 + 4), or roughly 87.5%, of work_mem is used to
store tuples, which is about 90% of the original.
As a result, changing NTUP_PER_BUCKET to 1 may increase the number of
batches by only about 10%. Is that enough to affect performance? Or maybe
the calculation can't be done this simply.

Besides, machines have larger main memory now. If we set work_mem larger,
wouldn't the extra-batches effect introduced by a smaller NTUP_PER_BUCKET
value be mitigated?

I have read earlier discussion about NTUP_PER_BUCKET. It seems that if we
changed NTUP_PER_BUCKET to 50 or even larger, performance wouldn't be much
worse, because every tuple in a bucket's chain stores its hash value, so
having more tuples in a bucket only adds some comparisons of two integers.
Is the same true in the other direction, that making it smaller wouldn't
gain much? Is that one of the reasons it is defined as 10?

After all, this is my first time discussing in an open source community.
Thank you all for the replies and the discussion. Thanks.





