
From: Kevin Brown
Subject: Re: Column correlation drifts, index ignored again
Date:
Msg-id: 20040224035601.GC3090@filer
In response to: Re: Column correlation drifts, index ignored again (Josh Berkus <josh@agliodbs.com>)
List: pgsql-performance
Josh Berkus wrote:
> John,
>
> > I think the key is to get the planner to correctly ballpark the number of
> > rows in the date range.  If it does, I can't imagine it ever deciding to
> > read 1,000,000 rows instead of 1,000 with any sane "cost" setting.  I'm
> > assuming the defaults are sane :)
>
> The default for random_page_cost is sane, but very conservative; it's pretty
> much assuming tables that are bigger than RAM and a single IDE disk.   If
> your setup is better than that, you can lower it.
>
> For example, in the ideal case (database fits in RAM, fast RAM, CPU, and
> random seek on the disk), you can lower it to 1.5.    For less ideal
> situations, 1.8 to 2.5 is reasonable on high-end hardware.

I suspect this ultimately depends on the types of queries you do, the
size of the tables involved, disk cache, etc.

For instance, if you don't have sort_mem set high enough, then things
like large hash joins will spill to disk and almost certainly cause a
lot of contention (random access patterns) even if a sequential scan is
being used to read the table data.  The fix there is, of course, to
increase sort_mem if possible (as long as you don't cause paging during
the operation, which will also slow things down), but you might not
really have that option -- in which case you might see some improvement
by tweaking random_page_cost.
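
If you just want to experiment, both settings can be changed per-session
before touching postgresql.conf.  A rough sketch -- the values, table and
query below are made up, not recommendations:

    SET sort_mem = 32768;          -- in kilobytes, so 32MB per sort/hash step
    SET random_page_cost = 2.0;
    EXPLAIN ANALYZE
    SELECT count(*) FROM orders
     WHERE order_date BETWEEN '2004-01-01' AND '2004-01-31';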

On a system where the PG data is stored on a disk that does other things,
you'll actually want random_page_cost to be *closer* to 1 rather than
further away.  The reason is that the average access time of a sequential
page in that case is much closer to that of a random page than it would
be if the disk in question were dedicated to PG duty.  This also goes for
large RAID setups where multiple kinds of data (e.g., home directories,
log files, etc.) live alongside the PG data -- such setups will see more
unrelated random activity on the disk while PG is working, which makes
PG's sequential access pattern look more like random access.
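
To put rough, purely illustrative numbers on it: on a dedicated disk a
sequential page read might cost ~2ms and a random one ~8ms, a 4:1 ratio,
which lines up with the default random_page_cost of 4.  On a busy shared
disk, each "sequential" PG read is likely to be preceded by someone
else's seek, so the sequential page might cost 6-7ms while the random one
still costs ~8ms -- a ratio much closer to 1:1, and hence a
random_page_cost much closer to 1.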


The best way I can think of to tune random_page_cost is to run EXPLAIN
ANALYZE on the queries you most want to optimize, under the conditions
they're most likely to run in, and then do the same with enable_seqscan
off.  Then look at the ratio of estimated to actual time for the scans
themselves.  Once you've done that, you can tweak random_page_cost up or
down and do further EXPLAINs (with enable_seqscan off and without
ANALYZE) until the ratio of the estimated index scan cost to the actual
index scan time of the same query (obtained earlier via EXPLAIN ANALYZE)
matches the ratio of the estimated sequential scan cost (which won't
change with random_page_cost) to the actual sequential scan time.

So:

1.  set enable_seqscan = on
2.  set random_page_cost = <some really high value to force seqscans>
3.  EXPLAIN ANALYZE query
4.  record the ratio of estimated cost to actual time for the sequential scan.
5.  set enable_seqscan = off
6.  set random_page_cost = <rough estimate of what it should be>
7.  EXPLAIN ANALYZE query
8.  record the actual index scan time(s)
9.  tweak random_page_cost
10.  EXPLAIN query
11.  If the ratio of the estimated index scan cost (from step 10) to the
     actual index scan time (from step 8) is much different than the
     ratio recorded in step 4, go back to step 9.  Reduce
     random_page_cost if the index scan ratio is larger than the
     sequential ratio, increase it if it's smaller.
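
In psql the whole loop looks roughly like this (table, query and numbers
are all made up -- substitute your own and read the figures off your own
EXPLAIN output):

    -- Steps 1-4: force the seqscan, note its estimated cost and actual time.
    SET enable_seqscan = on;
    SET random_page_cost = 100;   -- absurdly high, just to guarantee a seqscan
    EXPLAIN ANALYZE
    SELECT * FROM orders
     WHERE order_date BETWEEN '2004-01-01' AND '2004-01-31';
    -- suppose the seqscan node shows cost ~45000 and actual time ~900ms:
    -- sequential ratio = 45000 / 900 = 50

    -- Steps 5-8: force the index scan and record its actual time once.
    SET enable_seqscan = off;
    SET random_page_cost = 2;     -- first guess
    EXPLAIN ANALYZE
    SELECT * FROM orders
     WHERE order_date BETWEEN '2004-01-01' AND '2004-01-31';
    -- suppose the index scan's actual time is ~120ms

    -- Steps 9-11: adjust and re-run plain EXPLAIN (no ANALYZE needed now)
    -- until the index scan's estimated cost is about 50 * 120 = 6000.
    SET random_page_cost = 1.5;
    EXPLAIN
    SELECT * FROM orders
     WHERE order_date BETWEEN '2004-01-01' AND '2004-01-31';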


As a result, I ended up setting my random_page_cost to 1.5 on my system.
I suspect that the amount of pain you'll suffer when the planner
incorrectly chooses a sequential scan is much greater on average than
the amount of pain if it incorrectly chooses an index scan, so I'd tend
to favor erring on the low side for random_page_cost.


I'll know tomorrow whether or not my tweaking worked properly, as I have
a job that kicks off every night, scans the entire filesystem, and stores
the inode information for every file in a newly-created table, then
"merges" it into the existing file information table.  Each table is
about 2.5 million rows...
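
The "merge" is basically the usual UPDATE-the-existing-rows,
INSERT-the-new-ones dance; a stripped-down sketch with invented table and
column names:

    BEGIN;
    -- refresh the rows we already know about
    UPDATE file_info
       SET size = n.size, mtime = n.mtime
      FROM file_info_new n
     WHERE file_info.path = n.path;
    -- add the files that showed up since the last scan
    INSERT INTO file_info (path, size, mtime)
    SELECT n.path, n.size, n.mtime
      FROM file_info_new n
     WHERE NOT EXISTS (SELECT 1 FROM file_info f WHERE f.path = n.path);
    COMMIT;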


--
Kevin Brown                          kevin@sysexperts.com
