Re: A better way than tweaking NTUP_PER_BUCKET - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: A better way than tweaking NTUP_PER_BUCKET
Date
Msg-id 20140125223346.GT31026@tamriel.snowman.net
Whole thread Raw
In response to Re: A better way than tweaking NTUP_PER_BUCKET  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: A better way than tweaking NTUP_PER_BUCKET  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > In the end, I believe we absolutely should do something about this.
> > Hashing a 64M-row table (requiring upwards of 8G) instead of hashing
> > a 2M-row table is really bad of us.
>
> Huh?  I don't see anything in the thread suggesting that we're doing that,
> or that changing NTUP_PER_BUCKET would fix it if we do.  Are you thinking
> of some other discussion?

This thread sprung from or was at least related to, as I recall, my
issues with NTUP_PER_BUCKET over the summer.  Perhaps I'm wrong, but
here's the thread I was referring to:

http://www.postgresql.org/message-id/20130404201612.GM4361@tamriel.snowman.net

Where I demonstrated that we decide to hash a much larger table, rather
than the smaller one, based on the estimated depth of the buckets and
including the costing from that, which is driven based on how big we
decide to make the hash table where we use NTUP_PER_BUCKET to pick a
table size much smaller than we really should be.

> AFAICT, there was no consensus in this thread on what to do, which
> probably has something to do with the lack of concrete performance
> tests presented to back up any particular proposal.

This I entirely agree with- more testing and more information on how
such a change impacts other workloads would be great.  Unfortunately,
while I've provided a couple of test cases and seen similar situations
on IRC, this is very data-dependent which makes it difficult to have
concrete answers for every workload.

Still, I'll try and spend some time w/ pg_bench's schema definition and
writing up some larger queries to run through it (aiui, the default set
of queries won't typically result in a hashjoin) and see what happens
there.
Thanks,
    Stephen

pgsql-hackers by date:

Previous
From: Marco Atzeri
Date:
Subject: Re: Postgresql for cygwin - 3rd
Next
From: Andrew Dunstan
Date:
Subject: Re: What is happening on buildfarm member crake?