Re: Index Scans become Seq Scans after VACUUM ANALYSE

From: Thomas Lockhart
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Msg-id: 3CBD837E.D03E45C@fourpalms.org
In response to: Re: Index Scans become Seq Scans after VACUUM ANALYSE (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses: Re: Index Scans become Seq Scans after VACUUM ANALYSE
List: pgsql-hackers
> Oh, come on Tom, surely I have been around long enough to lend
> credence that I wish to have a positive effect on PostgreSQL
> development. :)

Tom does have a way with words sometimes, eh?

> enable_seqscan=0 disallows sequential scans; that is not what I am
> saying. This is a problem I (and others) have been yapping about for
> a long time. I just think there is sufficient evidence to suggest
> that if a DBA creates an index, there is strong evidence (better than
> statistics) that the index needs to be used. In the event that an
> index exists, there is a strong indication that, without overwhelming
> evidence to the contrary, the index should be used. You have admitted
> that statistics suck, but the existence of an index must weigh
> (heavily) in the evaluation of whether or not to use an index.

Tom is a mathematician by training, and is trying to balance the
optimizer decisions right on the transition between the best and
next-best possibility. Biasing it toward one decision or another when
all of his test cases clearly show the *other* choice would be better
puts it in the realm of an arbitrary choice *not* supported by the
data!

afaict there are *two* areas which might benefit from analysis or
adjustment, but any changes have to be supported by real test cases:

1) the cost estimates used for each property of the data and storage.

2) the statistical sampling done on the actual data during analysis.

(Both are easy to twiddle by hand; see the second postscript below.)

The cost numbers have been tested on what is hopefully a
representative set of machines. It may be possible to have a test
suite which would allow folks to run the same data on many different
platforms, and to contribute other test cases for consideration.
Perhaps you would like to organize a test suite? Tom, do you already
have cases you would like to see in this test?

The statistical sampling (using somewhere between 1 and 10% of the
data, afaicr) *could* get fooled by pathological storage topologies.
So for cases which seem to have reached the "wrong" conclusion, we
should show *what* values for the above would have arrived at the
correct result *without* biasing other potential results in the wrong
direction.

Systems which have optimizing planners can *never* be guaranteed to
generate the actual lowest-cost query plan. Any impression that
Oracle, for example, actually does do that may come from a lack of
visibility into the process, and a lack of a forum for discussing
these edge cases.

Please don't take Tom's claim that he doesn't get excited about wrong
planner choices which are not too wrong as an indication that he isn't
interested. The point is that for *edge cases* the "correct answer"
can never be known until the query is actually run two different ways,
and the planner is *never* allowed to do that (though a person is; see
the first postscript below). So tuning the optimizer over time is the
only way to improve things, and with edge cases a factor of two in
timing is, statistically, an indication that the results are already
close to optimal.

We rarely get reports that the planner made the best choice for a
plan, but of course people usually don't consider optimal performance
to be a reportable problem ;)

                     - Thomas
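P.S. For anyone who wants to try the "run it two different ways"
comparison by hand, here is a minimal sketch using EXPLAIN ANALYZE and
the enable_seqscan parameter (the table "bigtab" and its indexed "id"
column are made-up stand-ins; any indexed table will do):

  SET enable_seqscan = on;    -- let the planner choose freely
  EXPLAIN ANALYZE SELECT * FROM bigtab WHERE id < 1000;

  SET enable_seqscan = off;   -- discourage seq scans so the index plan runs
  EXPLAIN ANALYZE SELECT * FROM bigtab WHERE id < 1000;

Comparing the actual times from the two runs is the only way to know
which plan really was cheaper for that query and that data.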
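P.P.S. The knobs for 1) and 2) above are already exposed, for anyone
who wants to experiment and report test cases: the cost estimates are
run-time parameters, and the sampling granularity can be raised per
column before re-running ANALYZE. Again a sketch with the made-up
"bigtab"; the particular values are illustrative, not recommendations:

  -- 1) cost estimates, e.g. the penalty charged for nonsequential page fetches
  SET random_page_cost = 2.0;   -- the stock default is 4.0

  -- 2) sampling granularity for one column, then rebuild the statistics
  ALTER TABLE bigtab ALTER COLUMN id SET STATISTICS 100;
  ANALYZE bigtab;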