Home > mailing lists

Re: Why hash join instead of nested loop? - Mailing list pgsql-performance

From	Tom Lane
Subject	Re: Why hash join instead of nested loop?
Date	August 9, 2005 19:02:09
Msg-id	18143.1123614059@sss.pgh.pa.us Whole thread Raw
In response to	Re: Why hash join instead of nested loop? (Rhett Garber <rhettg@gmail.com>)
Responses	Re: Why hash join instead of nested loop?
List	pgsql-performance

Tree view

Rhett Garber <rhettg@gmail.com> writes:
> Well that could be an issue, is this abnormally large:
> #shared_buffers = 1536          # min 16, at least max_connections*2, 8KB each
> shared_buffers = 206440
> #sort_mem = 131072              # min 64, size in KB
> sort_mem = 524288               # min 64, size in KB
> vacuum_mem = 131072             # min 1024, size in K

The vacuum_mem number is OK I think, but both of the others seem
unreasonably large.  Conventional wisdom about shared_buffers is that
the sweet spot is maybe 10000 or so buffers, rarely more than 50000.
(Particularly in pre-8.0 releases, there are code paths that grovel
through all the buffers linearly, so there is a significant cost to
making it too large.)  Don't worry about it being too small to make
effective use of RAM --- we rely on the kernel's disk cache to do that.

sort_mem is *per sort*, and so half a gig in a machine with only a
couple of gig is far too much except when you know you have only one
query running.  A couple dozen backends each trying to use half a gig
will drive you into the ground in no time.  Conventional wisdom here
is that the global setting should be conservatively small (perhaps
10Mb to 100Mb depending on how many concurrent backends you expect to
have), and then you can explicitly increase it locally with SET for
specific queries that need it.

In terms of the problem at hand, try the test case with a few different
values of sort_mem (use SET to adjust it, you don't need to touch the
config file) and see what happens.  I think the cost you're seeing is
just startup overhead to zero a hash table of a few hundred meg ...

            regards, tom lane

pgsql-performance by date:

From: Michael Fuhr
Date: 09 August 2005, 18:54:40
Subject: Re: Table locking problems?

From: Rhett Garber
Date: 09 August 2005, 19:13:12
Subject: Re: Why hash join instead of nested loop?

Re: Why hash join instead of nested loop? - Mailing list pgsql-performance

Previous

Next