Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller - Mailing list pgsql-hackers

From b8flowerfire
Subject Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller
Date
Msg-id 1402377188082-5806617.post@n5.nabble.com
Whole thread Raw
In response to Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: why postgresql define NTUP_PER_BUCKET as 10, not other numbers smaller  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas wrote
> On Mon, Jun 9, 2014 at 4:06 AM, b8flowerfire <b8flowerfire@> wrote:
>
> This has come up before.  Basically, the problem is that if you reduce
> NTUP_PER_BUCKET, the bucket array gets larger, which might reduce the
> amount of space available for tuples to the point where the hash join
> overflows to multiple batches.  That will be more expensive than
> performing the hash join with a higher NTUP_PER_BUCKET.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Thanks for the explanation, but I don't find it entirely convincing. Simply
reducing the value of NTUP_PER_BUCKET enlarges the pointer array and reduces
the number of tuples per batch, but is that effect significant for
performance?
The utilization of work_mem, I think, is determined by the ratio of the
pointer size to the tuple size.
Let's assume the size of a tuple is 28 bytes, which is quite reasonable
because it's the sum of HJTUPLE_OVERHEAD (at least 8 bytes), the size of
MinimalTupleData (at least 10 bytes), and the tuple contents (assume 10
bytes). The size of a pointer is 4 bytes.
So, if NTUP_PER_BUCKET is set to 10, about (28 * 10) / (28 * 10 + 4), or
roughly 98.6%, of work_mem is used to store tuples. If NTUP_PER_BUCKET is
set to 1, about 28 / (28 + 4), or roughly 87.5%, of work_mem is used to
store tuples, which is about 90% of the original.
As a result, changing NTUP_PER_BUCKET to 1 may increase the number of
batches by only about 10%. Is that enough to affect performance? Or maybe
the calculation can't be done this simply.

Besides, machines have larger main memory now. If we set work_mem larger,
wouldn't the extra-batches effect introduced by a smaller NTUP_PER_BUCKET
value be mitigated?

I have read earlier discussion about NTUP_PER_BUCKET. It seems that if we
changed NTUP_PER_BUCKET to 50 or even larger, performance wouldn't be much
worse, because every tuple in a bucket's chain stores its hash value, so
having more tuples in a bucket only adds some comparisons of two integers.
Is the same true in the other direction, that making it smaller wouldn't
gain much? Is that one of the reasons it is defined as 10?

After all, this is my first time discussing in an open source community.
Thank you all for the replies and the discussion. Thanks.





