Re: Column correlation drifts, index ignored again - Mailing list pgsql-performance
From | Kevin Brown |
---|---|
Subject | Re: Column correlation drifts, index ignored again |
Date | |
Msg-id | 20040224201419.GD3090@filer |
In response to | Re: Column correlation drifts, index ignored again (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Column correlation drifts, index ignored again; Re: Column correlation drifts, index ignored again |
List | pgsql-performance |
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
> > Kevin,
> >> 1. set enable_seqscan = on
> >> 2. set random_page_cost = <some really high value to force seqscans>
> >> 3. EXPLAIN ANALYZE query
> >> 4. record the ratio of estimated to actual scan times.
> >> 5. set enable_seqscan = off
> >> 6. set random_page_cost = <rough estimate of what it should be>
> >> 7. EXPLAIN ANALYZE query
> >> 8. record the actual index scan time(s)
> >> 9. tweak random_page_cost
> >> 10. EXPLAIN query
> >> 11. If ratio of estimate to actual (recorded in step 8) is much
> >> different than that recorded in step 4, then go back to step 9.
> >> Reduce random_page_cost if the random ratio is larger than the
> >> sequential ratio, increase if it's smaller.
>
> > Nice, we ought to post that somewhere people can find it in the future.
>
> If we post it as recommended procedure we had better put big caveat
> notices on it.  The pitfalls with doing this are:
>
> 1. If you repeat the sequence exactly as given, you will be homing in on
> a RANDOM_PAGE_COST that describes your system's behavior with a fully
> cached query.  It is to be expected that you will end up with 1.0 or
> something very close to it.  The only way to avoid that is to use a
> query that is large enough to blow out your kernel's RAM cache; which of
> course will take long enough that iterating step 10 will be no fun,
> and people will be mighty tempted to take shortcuts.

Oops, you're right. I did this on my system but forgot to put it in the
list of things to do:

0. Fill the page cache with something other than PG data, e.g. by
repeatedly catting several large files and redirecting the output to
/dev/null. The total size of the files should exceed the amount of
memory on the system.

The reason you might not have to do this between the two EXPLAIN ANALYZE
runs is that the first query will scan the table itself while the second
will scan the index. But that was probably specific to the query I was
testing: if the query you're using is complex enough, the system may have
to read data pages from the table after fetching the index pages, in
which case you'll want to fill the page cache again between the two runs.

> 2. Of course, you are computing a RANDOM_PAGE_COST that is relevant to
> just this single query.  Prudence would suggest repeating the process
> with several different queries and taking some sort of average.

Right, and the average should probably be weighted by the relative
frequency with which each query is executed. In my case, the query I was
experimenting with was by far the biggest query that occurs on my system
(though it turns out that there are others in the same process that I
should look at as well).

> When I did the experiments that led up to choosing 4.0 as the default,
> some years ago, it took several days of thrashing the disks on a couple
> of different machines before I had numbers that I didn't think were
> mostly noise :-(.  I am *real* suspicious of any replacement numbers
> that have been derived in just a few minutes.
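For concreteness, here is a rough psql sketch of the loop under discussion,
with Tom's caveats in mind. The table, column, and constant are placeholders
only; substitute the query you actually care about, and flush the page cache
per step 0 before each timed run:

```sql
-- Steps 1-4: force a sequential scan and note estimated cost vs. actual time.
SET enable_seqscan = on;
SET random_page_cost = 100000;   -- absurdly high, so the index looks unattractive
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Steps 5-8: force the index scan and note its estimated cost vs. actual time.
SET enable_seqscan = off;
SET random_page_cost = 4;        -- rough initial guess
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Steps 9-11: adjust and re-check with plain EXPLAIN (only the estimate
-- changes) until the estimate/actual ratio for the index scan is close to
-- the ratio recorded for the sequential scan.
SET random_page_cost = 2.5;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```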
One problem I've been running into is the merge join spilling to disk
because sort_mem isn't big enough. The problem isn't that this happens;
it's that I think the planner underestimates the impact the spill has on
the time the merge join takes. Does the planner even account for the
possibility that a sort or join will spill to disk?

Spilling to disk like that will suddenly make sequential reads perform
much more like random reads, unless the sequential scans are performed in
their entirety between sorts/merges.

In any case, one thing none of this really accounts for is that it's
better to set random_page_cost too low than too high. The reason is that
index scans are more selective than sequential scans: a sequential scan
reads the entire table every time, whereas an index scan reads only the
index pages (and their parents) relevant to the query, plus the table
pages they point to. My experience is that when the planner gets the
selectivity of a query wrong (e.g., because the statistics aren't good
enough or sufficiently up to date), it usually treats the query as less
selective than it really is, estimating more matching rows than there
actually are, and so picks a sequential scan when an index scan would be
more efficient. The autovacuum daemon helps here by keeping the
statistics more up to date.

Certainly you shouldn't go overboard and set random_page_cost too low
"just in case", but it does mean that if you go through the process of
running tests to determine the proper value for random_page_cost, you
should probably pick a value in the lower part of the range you measured.
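On the statistics point above, it's also worth checking what the planner
actually has to work with before leaning too hard on random_page_cost. A
minimal sketch, again using placeholder table and column names:

```sql
-- Inspect the per-column statistics the planner sees ("orders" and
-- "customer_id" are placeholders).  correlation is the figure this thread
-- is about: it drifts as the table's physical row order changes.
SELECT n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';

-- Collect a larger sample for that column (the default statistics target
-- is only 10 in this era of PostgreSQL), then refresh the statistics.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;
```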
--
Kevin Brown                                            kevin@sysexperts.com