Optimizing joins

From: "Eric Peters"
I have some simple joins that are taking FOREVER to run - on a couple tables with only maybe a hundred thousand rows.  I've done vacuum analyze on the tables but they still seem to take a lot of time.  Where should I go first for optimizing my postgres server to handle this better?  Are there memory options to increase a buffer in postgres, or are there specific ways to optimize the query itself?
 
Thanks,
 
Eric

Re: Optimizing joins

From: Scott Marlowe
On Mon, 6 May 2002, Eric Peters wrote:

> I have some simple joins that are taking FOREVER to run - on a couple
> tables with only maybe a hundred thousand rows.  I've done vacuum
> analyze on the tables but they still seem to take a lot of time.  Where
> should I go first for optimizing my postgres server to handle this
> better?  Are there memory options to increase a buffer in postgres, or
> are there specific ways to optimize the query itself?

Two things:

1: use explain to see what's taking so long.
2: create indexes on the fields being used to join the tables.
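For example (a rough sketch; the table and column names here are just
stand-ins for your own schema):

explain
select *
from orders o
join customers c on c.id = o.customer_id;

create index orders_customer_id_idx on orders (customer_id);

The explain output will show you whether the index is actually being picked
up for the join.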

If both of those are done, and the indexes are being used, then you can
increase the shared memory buffers, but your kernel has to be configured to
hand out that much shared memory to postgresql or postgresql will fail to
start.
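In practice that means raising shared_buffers in postgresql.conf and, on
Linux, raising the kernel's shared memory limit; the numbers below are only
illustrative, so size them for your own machine:

# postgresql.conf -- shared_buffers is a count of 8kB buffers on these releases
shared_buffers = 4096

# as root on Linux; add kernel.shmmax to /etc/sysctl.conf to make it permanent
sysctl -w kernel.shmmax=134217728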

If the indexes are NOT being used after an analyze, then you can alter the
numbers the postgresql server uses when deciding on the "cost" of
different options.

random_page_cost
cpu_tuple_cost
cpu_index_tuple_cost
cpu_operator_cost

are the options.  Use show <setting> to see what a setting is currently set
to, and set <setting> = number to change it.

Turning down cpu_index_tuple_cost and random_page_cost favors index scans;
turning down cpu_operator_cost favors sequential scans.
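For instance, to check and then lower random_page_cost for your session (the
value here is only an example):

show random_page_cost;
set random_page_cost = 2;    -- lower values make index scans look cheaper to the planner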

If explain shows your planner using a hashjoin, try

set enable_hashjoin = off

and see if it runs faster.  Near as I can tell, hash joins are just plain
slow on postgresql compared to almost any other option.
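A quick way to test that for a single session (the query is just a stand-in
for your own join):

set enable_hashjoin = off;
explain
select *
from orders o
join customers c on c.id = o.customer_id;
set enable_hashjoin = on;    -- turn it back on once you've compared the plans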