Re: Column correlation drifts, index ignored again - Mailing list pgsql-performance
From | Kevin Brown |
---|---|
Subject | Re: Column correlation drifts, index ignored again |
Date | |
Msg-id | 20040224035601.GC3090@filer |
In response to | Re: Column correlation drifts, index ignored again (Josh Berkus <josh@agliodbs.com>) |
Responses | Re: Column correlation drifts, index ignored again |
List | pgsql-performance |
Josh Berkus wrote:
> John,
>
> > I think the key is to get the planner to correctly ballpark the number of
> > rows in the date range.  If it does, I can't imagine it ever deciding to
> > read 1,000,000 rows instead of 1,000 with any sane "cost" setting.  I'm
> > assuming the defaults are sane :)
>
> The default for random_page_cost is sane, but very conservative; it's
> pretty much assuming tables that are bigger than RAM and a single IDE
> disk.  If your setup is better than that, you can lower it.
>
> For example, in the ideal case (database fits in RAM, fast RAM, CPU, and
> random seek on the disk), you can lower it to 1.5.  For less ideal
> situations, 1.8 to 2.5 is reasonable on high-end hardware.

I suspect this ultimately depends on the types of queries you run, the
size of the tables involved, the disk cache, etc.  For instance, if you
don't have sort_mem set high enough, then things like large hash joins
will spill to disk and almost certainly cause a lot of contention (random
access patterns), even if a sequential scan is being used to read the
table data.  The fix there is, of course, to increase sort_mem if possible
(as long as you don't cause paging during the operation, which will also
slow things down), but you might not really have that option -- in which
case you might see some improvement by tweaking random_page_cost.

On a system where the PG data is stored on a disk that does other things,
you'll actually want random_page_cost to be *closer* to 1, not further
away.  The reason is that the average access time of a sequential page in
that case is much closer to that of a random page than it would be if the
disk in question were dedicated to PG duty.  This also goes for large RAID
setups where multiple types of data (e.g., home directories, log files,
etc.) are stored along with the PG data -- such setups will have more
random activity happening on the disks while PG is working, making PG's
sequential access pattern look more like random access.

The best way I can think of to tune random_page_cost is to run EXPLAIN
ANALYZE on the queries you most want to optimize, under the circumstances
in which they are most likely to be run, then do the same with
enable_seqscan off.  Then look at the ratio of predicted to actual times
for the scans themselves.  Once you've done that, you can tweak
random_page_cost up or down and do further EXPLAINs (with enable_seqscan
off and without ANALYZE) until the ratio of the estimated index scan time
to the actual index scan time (gotten previously via EXPLAIN ANALYZE)
matches the ratio of the estimated sequential scan time (which won't
change based on random_page_cost) to the actual sequential scan time.

So:

1.  set enable_seqscan = on
2.  set random_page_cost = <some really high value to force seqscans>
3.  EXPLAIN ANALYZE query
4.  record the ratio of estimated to actual scan times
5.  set enable_seqscan = off
6.  set random_page_cost = <rough estimate of what it should be>
7.  EXPLAIN ANALYZE query
8.  record the actual index scan time(s)
9.  tweak random_page_cost
10. EXPLAIN query
11. if the ratio of the estimated index scan time (from step 10) to the
    actual index scan time (recorded in step 8) is much different from
    the ratio recorded in step 4, go back to step 9: reduce
    random_page_cost if the index scan ratio is larger than the
    sequential scan ratio, increase it if it's smaller

As a result of doing this, I ended up setting random_page_cost to 1.5 on
my system.
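To make that concrete, here's roughly what the procedure looks like in a
psql session (the table, the query, and every number below are invented
purely for illustration; only the ratios matter):

    -- Steps 1-4: force a sequential scan, record its estimate/actual ratio.
    SET enable_seqscan = on;
    SET random_page_cost = 100;   -- absurdly high, so the seqscan wins
    EXPLAIN ANALYZE SELECT * FROM files WHERE mtime > '2004-02-01';
    --  Seq Scan on files  (cost=0.00..35000.00 ...) (actual time=0.05..1750.00 ...)
    --  seqscan ratio: 35000 / 1750 = 20

    -- Steps 5-8: switch to the index scan, record its actual time.
    SET enable_seqscan = off;
    SET random_page_cost = 2;     -- rough first guess
    EXPLAIN ANALYZE SELECT * FROM files WHERE mtime > '2004-02-01';
    --  Index Scan using files_mtime_idx on files  (cost=0.00..52000.00 ...)
    --                                      (actual time=0.10..2000.00 ...)
    --  index ratio: 52000 / 2000 = 26; larger than 20, so lower the cost

    -- Steps 9-11: plain EXPLAIN only (the actual time is already known).
    SET random_page_cost = 1.5;
    EXPLAIN SELECT * FROM files WHERE mtime > '2004-02-01';
    --  Index Scan using files_mtime_idx on files  (cost=0.00..40000.00 ...)
    --  index ratio: 40000 / 2000 = 20; matches the seqscan ratio, so stop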
I suspect that the amount of pain you'll suffer when the planner
incorrectly chooses a sequential scan is, on average, much greater than
the amount of pain when it incorrectly chooses an index scan, so I'd tend
to favor erring on the low side for random_page_cost.

I'll know tomorrow whether or not my tweaking worked properly: I have a
job that kicks off every night, scans the entire filesystem, stores the
inode information about every file in a newly-created table, and then
"merges" that into the existing file information table (a sketch of such
a merge follows below).  Each table is about 2.5 million rows...
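For anyone curious, a merge like that can be done in plain SQL along these
lines (a simplified sketch with invented table and column names -- the
real job does more than this):

    -- Refresh the rows that already exist in the main table.
    UPDATE file_info
       SET size = n.size, mtime = n.mtime
      FROM file_info_new n
     WHERE file_info.path = n.path;

    -- Add rows for files seen for the first time.
    INSERT INTO file_info (path, size, mtime)
    SELECT n.path, n.size, n.mtime
      FROM file_info_new n
     WHERE NOT EXISTS
           (SELECT 1 FROM file_info f WHERE f.path = n.path);

    -- Remove rows for files that have disappeared from the filesystem.
    DELETE FROM file_info
     WHERE NOT EXISTS
           (SELECT 1 FROM file_info_new n WHERE n.path = file_info.path);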
-- 
Kevin Brown                                          kevin@sysexperts.com