Re: Why hash join instead of nested loop? - Mailing list pgsql-performance

From Rhett Garber
Subject Re: Why hash join instead of nested loop?
Date
Msg-id 41b0fe8905080912113f381f84@mail.gmail.com
Whole thread Raw
In response to Re: Why hash join instead of nested loop?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Bingo, the smaller the sort_mem, the faster that query is.

Thanks a lot to everybody that helped, i'll tweak with these values
more when I get a chance now that I have some guidelines that make
sense.

Rhett

On 8/9/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rhett Garber <rhettg@gmail.com> writes:
> > Well that could be an issue, is this abnormally large:
> > #shared_buffers = 1536          # min 16, at least max_connections*2, 8KB each
> > shared_buffers = 206440
> > #sort_mem = 131072              # min 64, size in KB
> > sort_mem = 524288               # min 64, size in KB
> > vacuum_mem = 131072             # min 1024, size in K
>
> The vacuum_mem number is OK I think, but both of the others seem
> unreasonably large.  Conventional wisdom about shared_buffers is that
> the sweet spot is maybe 10000 or so buffers, rarely more than 50000.
> (Particularly in pre-8.0 releases, there are code paths that grovel
> through all the buffers linearly, so there is a significant cost to
> making it too large.)  Don't worry about it being too small to make
> effective use of RAM --- we rely on the kernel's disk cache to do that.
>
> sort_mem is *per sort*, and so half a gig in a machine with only a
> couple of gig is far too much except when you know you have only one
> query running.  A couple dozen backends each trying to use half a gig
> will drive you into the ground in no time.  Conventional wisdom here
> is that the global setting should be conservatively small (perhaps
> 10Mb to 100Mb depending on how many concurrent backends you expect to
> have), and then you can explicitly increase it locally with SET for
> specific queries that need it.
>
> In terms of the problem at hand, try the test case with a few different
> values of sort_mem (use SET to adjust it, you don't need to touch the
> config file) and see what happens.  I think the cost you're seeing is
> just startup overhead to zero a hash table of a few hundred meg ...
>
>                         regards, tom lane
>

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Why hash join instead of nested loop?
Next
From: Steve Poe
Date:
Subject: Re: Table locking problems?