Home > mailing lists

Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From	Stephen Frost
Subject	Re: A better way than tweaking NTUP_PER_BUCKET
Date	January 26, 2014 01:33:17
Msg-id	20140125223346.GT31026@tamriel.snowman.net Whole thread Raw
In response to	Re: A better way than tweaking NTUP_PER_BUCKET (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: A better way than tweaking NTUP_PER_BUCKET
List	pgsql-hackers

Tree view

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > In the end, I believe we absolutely should do something about this.
> > Hashing a 64M-row table (requiring upwards of 8G) instead of hashing
> > a 2M-row table is really bad of us.
>
> Huh?  I don't see anything in the thread suggesting that we're doing that,
> or that changing NTUP_PER_BUCKET would fix it if we do.  Are you thinking
> of some other discussion?

This thread sprung from or was at least related to, as I recall, my
issues with NTUP_PER_BUCKET over the summer.  Perhaps I'm wrong, but
here's the thread I was referring to:

http://www.postgresql.org/message-id/20130404201612.GM4361@tamriel.snowman.net

Where I demonstrated that we decide to hash a much larger table, rather
than the smaller one, based on the estimated depth of the buckets and
including the costing from that, which is driven based on how big we
decide to make the hash table where we use NTUP_PER_BUCKET to pick a
table size much smaller than we really should be.

> AFAICT, there was no consensus in this thread on what to do, which
> probably has something to do with the lack of concrete performance
> tests presented to back up any particular proposal.

This I entirely agree with- more testing and more information on how
such a change impacts other workloads would be great.  Unfortunately,
while I've provided a couple of test cases and seen similar situations
on IRC, this is very data-dependent which makes it difficult to have
concrete answers for every workload.

Still, I'll try and spend some time w/ pg_bench's schema definition and
writing up some larger queries to run through it (aiui, the default set
of queries won't typically result in a hashjoin) and see what happens
there.
Thanks,
    Stephen

pgsql-hackers by date:

From: Marco Atzeri
Date: 26 January 2014, 01:33:14
Subject: Re: Postgresql for cygwin - 3rd

From: Andrew Dunstan
Date: 26 January 2014, 01:35:13
Subject: Re: What is happening on buildfarm member crake?

Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

Previous

Next