Thread: detecting poor query plans
A fair number of the performance problems posted to the mailing lists are the result of the query planner choosing a poor query plan. Some of these instances are due to a defect in PostgreSQL (e.g. index type comparisons), but many of them are caused by inaccurate statistics: either ANALYZE has not been run recently, or the statistics target for this column needs to be raised.

It occurred to me that these kinds of poor planning decisions could easily be detected by PostgreSQL itself: after we've finished executing a plan, we can trivially compare the # of results produced by each node in the query tree with the # of results the planner expected that node to produce (look at EXPLAIN ANALYZE, for example). If the estimate is off by a significant margin (say, 300%), we could perhaps emit a HINT suggesting that the user re-run ANALYZE or raise the statistics target for the relevant column. We can add a GUC to control whether any of these messages are emitted at all, or perhaps the degree of estimation error that is deemed acceptable.

One potential problem might involve queries that the planner will almost *never* do a good job on (see [1] for example). Until we get around to improving the planner, there isn't much universally-applicable advice we can offer in these situations: running ANALYZE all day or setting the statistics target arbitrarily high won't significantly improve the chosen query plan, so you'd continue to see the hint described above.

BTW, this situation is a simple example of the general technique of using feedback from the executor to improve query optimization decisions. If you're interested, there is plenty of discussion of this topic in the DBMS literature -- I'd be happy to provide pointers to papers. It might be cool to try implementing some of this stuff for PostgreSQL in the future.

Comments?

-Neil

[1] http://www.mail-archive.com/pgsql-performance%40postgresql.org/msg02415.html
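[Editor's sketch] As a rough, standalone illustration of the check being proposed, the C program below compares a node's estimated row count against its actual count and prints a hint when the error factor exceeds a configurable threshold. The names (plan_estimate_hints, check_row_estimate, and so on) are invented for the example only; in the backend the inputs would come from the planner's per-node estimates and the executor's instrumentation, not from hard-coded values.

/*
 * Standalone sketch of the proposed estimate-vs-actual check.
 * Not backend code; all names are illustrative.
 */
#include <stdio.h>

/* GUC-style knobs (hypothetical): enable the hint, and the tolerated error factor */
static int    plan_estimate_hints = 1;
static double plan_estimate_error_threshold = 3.0;    /* i.e. off by 300% */

static void
check_row_estimate(const char *node_label, double estimated_rows, double actual_rows)
{
    double ratio;

    if (!plan_estimate_hints)
        return;

    /* Avoid division by zero: treat anything below one row as one row */
    if (estimated_rows < 1.0)
        estimated_rows = 1.0;
    if (actual_rows < 1.0)
        actual_rows = 1.0;

    /* Symmetric error factor: 10x over- and under-estimates are equally bad */
    ratio = (estimated_rows > actual_rows) ?
        estimated_rows / actual_rows : actual_rows / estimated_rows;

    if (ratio >= plan_estimate_error_threshold)
        printf("HINT: %s: estimated %.0f rows, got %.0f (off by %.1fx); "
               "consider running ANALYZE or raising the column's statistics target\n",
               node_label, estimated_rows, actual_rows, ratio);
}

int
main(void)
{
    /* Example: a scan that was badly under-estimated triggers the hint */
    check_row_estimate("Seq Scan on orders", 120.0, 48000.0);
    return 0;
}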
>>>>> "Neil" == Neil Conway <neilc@samurai.com> writes: Neil> It occurred to me that these kinds of poor planning Neil> decisions could easily be detected by PostgreSQL itself: Neil> after we've finished executing a plan, we can trivially Neil> compare the # of results produced by eachnode in the query Neil> tree with the # of results the planner expected that node to Neil> produce (look at EXPLAINANALYZE, for example). If the Indeed. This is the approach being followed by the LeO project (Learning Optimizer) at IBM Almaden. http://www.almaden.ibm.com/software/dm/SMART/leo.shtml There is a vldb paper that describes it .. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Neil Conway <neilc@samurai.com> writes:
> It occurred to me that these kinds of poor planning decisions could
> easily be detected by PostgreSQL itself: after we've finished
> executing a plan, we can trivially compare the # of results produced
> by each node in the query tree with the # of results the planner
> expected that node to produce (look at EXPLAIN ANALYZE, for
> example). If the estimate is off by a significant margin (say, 300%),
> we could perhaps emit a HINT suggesting that the user re-run ANALYZE

I think such a thing would have such a low signal-to-noise ratio as to be useless :-(. As you note, there are many places where the planner's estimate is routinely off by more than 3x (or any other threshold you might pick instead). In some situations that doesn't really matter, as the same plan would have gotten picked anyway.

Also, since 7.2 came out what we've seen more and more is cases where the row count estimate is acceptably good, but the wrong plan was picked anyway because of deficiencies in the cost equations. The question you really want to know about is not whether the row count estimate is close, it's whether another plan could have done better.

			regards, tom lane
Neil Conway <neilc@samurai.com> writes:

> It occurred to me that these kinds of poor planning decisions could easily
> be detected by PostgreSQL itself: after we've finished executing a plan, we
> can trivially compare the # of results produced by each node in the query
> tree with the # of results the planner expected that node to produce

There's a dual to this as well. If the results were very close but the actual time taken to run the node doesn't match the cost calculated then some optimizer parameter needs to be adjusted. Either one of the cost_* parameters, or random_page_cost, or effective_cache_size, or...

I'm not sure it's as obvious what to put in the HINT though. Ideally these results would have to be gathered and pumped through linear optimization algorithms, which is a lot more work.

--
greg
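[Editor's sketch] A much simpler stand-in for the kind of fitting Greg alludes to: given a handful of (estimated cost, actual milliseconds) pairs for plan nodes, fit a single ms-per-cost-unit factor by least squares and flag nodes that deviate from it. The struct, the sample numbers, and the idea that such pairs have already been collected are all assumptions made for illustration.

/* Sketch: fit a ms-per-cost-unit factor from (estimated cost, actual ms) samples. */
#include <stdio.h>

typedef struct
{
    double est_cost;     /* planner's total cost estimate for the node */
    double actual_ms;    /* measured execution time for the node */
} NodeSample;

static double
fit_ms_per_cost(const NodeSample *s, int n)
{
    /* Least squares through the origin: minimizes sum over i of (ms_i - k * cost_i)^2 */
    double num = 0.0, den = 0.0;

    for (int i = 0; i < n; i++)
    {
        num += s[i].est_cost * s[i].actual_ms;
        den += s[i].est_cost * s[i].est_cost;
    }
    return (den > 0.0) ? num / den : 0.0;
}

int
main(void)
{
    NodeSample samples[] = { {100.0, 12.0}, {2500.0, 310.0}, {40.0, 4.5} };
    double     k = fit_ms_per_cost(samples, 3);

    printf("fitted ms per cost unit: %.4f\n", k);

    /*
     * Nodes whose actual time is far from k * est_cost are candidates for
     * recalibrating parameters such as random_page_cost, the cpu_* costs,
     * or effective_cache_size.
     */
    for (int i = 0; i < 3; i++)
        printf("node %d: residual factor %.2f\n",
               i, samples[i].actual_ms / (k * samples[i].est_cost));
    return 0;
}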
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I think such a thing would have such a low signal-to-noise ratio as
> to be useless :-(. As you note, there are many places where the
> planner's estimate is routinely off by more than 3x (or any other
> threshold you might pick instead).

I wonder, perhaps we could add a "certainty" parameter to the estimated query plan + result sizes + costs produced by the planner. That way, when we run into a planner deficiency we can basically mark the relevant portion of the query tree as a WAG, and not bother with emitting hints for it.

> In some situations that doesn't really matter, as the same plan
> would have gotten picked anyway.

The hint is NOT "the chosen plan was non-optimal"; the hint is "the query planner did not produce an accurate row count estimate for this node in the query tree." The chosen query plan may or may not be optimal -- we're merely pointing out that we chose the plan we did on shaky grounds. The hint might just as well indicate a problem with another query that happens to apply a similar predicate to the column in question.

> The question you really want to know about is not whether the row
> count estimate is close, it's whether another plan could have done
> better.

Perhaps, but is there a reasonable way to answer the second question?

-Neil
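[Editor's sketch] A minimal sketch of what such a "certainty" tag might look like, assuming (purely for illustration) an enum attached to each row estimate; none of these names exist in the real planner, and the point is only that hints are suppressed for estimates that were guesses to begin with.

/* Sketch only: a made-up "certainty" tag on a row estimate, used to keep
 * the hint machinery quiet about estimates the planner never trusted. */
#include <stdio.h>

typedef enum
{
    ESTIMATE_STATISTICAL,   /* derived from up-to-date column statistics */
    ESTIMATE_DEFAULT,       /* fell back to a hard-wired default selectivity */
    ESTIMATE_WAG            /* known planner blind spot: a wild guess */
} EstimateCertainty;

typedef struct
{
    double            rows;
    EstimateCertainty certainty;
} RowEstimate;

static void
maybe_emit_hint(RowEstimate est, double actual_rows)
{
    /* Don't pester the user about estimates we already marked as guesses */
    if (est.certainty != ESTIMATE_STATISTICAL)
        return;

    if (actual_rows > est.rows * 3.0 || actual_rows < est.rows / 3.0)
        printf("HINT: row estimate %.0f vs. actual %.0f; statistics may be stale\n",
               est.rows, actual_rows);
}

int
main(void)
{
    RowEstimate trusted = {100.0, ESTIMATE_STATISTICAL};
    RowEstimate guess   = {100.0, ESTIMATE_WAG};

    maybe_emit_hint(trusted, 5000.0);   /* prints a hint */
    maybe_emit_hint(guess, 5000.0);     /* silent: marked as a WAG */
    return 0;
}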
On Wed, Nov 26, 2003 at 11:59:33AM -0500, Neil Conway wrote:

> > In some situations that doesn't really matter, as the same plan
> > would have gotten picked anyway.
>
> The hint is NOT "the chosen plan was non-optimal"; the hint is "the
> query planner did not produce an accurate row count estimate for this
> node in the query tree."

Maybe it could only be done for SeqScan and IndexScan nodes, which are probably the most common source of bad estimates related to poor statistics.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"At least to kernel hackers, who really are human, despite occasional
rumors to the contrary" (LWN.net)
Greg Stark <gsstark@mit.edu> writes:
> There's a dual to this as well. If the results were very close but
> the actual time taken to run the node doesn't match the cost
> calculated then some optimizer parameter needs to be adjusted.

I was thinking about this, but I couldn't think of how to get it to work properly:

(1) The optimizer's cost metric is somewhat bogus to begin with. ISTM that translating a cost of X into an expected runtime of Y msecs is definitely not trivial to do.

(2) The size of a node's result relation does not depend upon anything outside of PostgreSQL, whereas the exact time it takes to produce that result relation depends on a wide collection of external factors. For example, if the system is under heavy load, queries will take longer than normal to run. Or, if the query invokes a function that happens to occasionally block waiting on some resource, the execution time of the query could be wildly unpredictable.

(3) ISTM we couldn't produce a really helpful hint message, even if we somehow resolved #1 and #2.

-Neil
Neil Conway <neilc@samurai.com> writes:

> I was thinking about this, but I couldn't think of how to get it to
> work properly:
>
> (1) The optimizer's cost metric is somewhat bogus to begin with.
>     ISTM that translating a cost of X into an expected runtime of
>     Y msecs is definitely not trivial to do.

At least for all the possible plans of a given query at a specific point in time, the intention is that the cost be proportional to the execution time.

> the exact time it takes to produce that result relation depends on a wide
> collection of external factors.

That's a valid point. The ms/cost factor may not be constant over time. However I think in the normal case this number will tend towards a fairly consistent value across queries and over time. It will be influenced somewhat by things like cache contention with other applications though.

On further thought, the real problem is that these numbers are only available when running with "explain" on. As shown recently on one of the lists, the cost of the repeated gettimeofday calls can be substantial. It's not really feasible to suggest running all queries with that profiling.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> At least for all the possible plans of a given query at a specific
> point in time the intention is that the cost be proportional to the
> execution time.

Why is this relevant? Given a cost X at a given point in time, the system needs to derive an "expected runtime" Y, and compare Y with the actual runtime. I think that producing Y given an arbitrary X involves so many parameters as to be practically impossible for us to compute with any degree of accuracy.

> That's a valid point. The ms/cost factor may not be constant over
> time. However I think in the normal case this number will tend
> towards a fairly consistent value across queries and over time.

It might be true in the "normal case", but that doesn't seem very helpful to me: in general, the mapping of plan costs to execution time can vary wildly over time. Spewing "hints" to the log whenever the system's workload changes, a checkpoint occurs, or the system's RAID array hiccups doesn't sound like a useful feature to me.

> On further thought the real problem is that these numbers are only
> available when running with "explain" on. As shown recently on one
> of the lists, the cost of the repeated gettimeofday calls can be
> substantial.

That sounds more like an implementation detail than the "real problem" to me -- I think this proposed feature has more fundamental issues.

-Neil
Greg Stark <gsstark@mit.edu> writes:
> That's a valid point. The ms/cost factor may not be constant over time.
> However I think in the normal case this number will tend towards a fairly
> consistent value across queries and over time. It will be influenced somewhat
> by things like cache contention with other applications though.

I think it would be interesting to collect the numbers over a long period of time and try to learn something from the averages. The real hole in Neil's original suggestion was that it assumed that comparisons based on just a single query would be meaningful enough to pester the user about.

> On further thought the real problem is that these numbers are only available
> when running with "explain" on. As shown recently on one of the lists, the
> cost of the repeated gettimeofday calls can be substantial. It's not really
> feasible to suggest running all queries with that profiling.

Yeah. You could imagine a simplified-stats mode that only collects the total runtime (two gettimeofday's per query is nothing) and the row counts (shouldn't be impossibly expensive either, especially if we merged the needed fields into PlanState instead of requiring a separately allocated node). Not sure if that's as useful though.

			regards, tom lane
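[Editor's sketch] For a sense of how cheap such a simplified-stats mode could be, here is a standalone toy along the lines Tom describes: two gettimeofday() calls per query plus a bare per-node row counter. The struct and field names are invented for the example and are not the actual PlanState or Instrumentation fields.

/* Sketch of a lightweight stats mode: one timer pair per query, one counter per node. */
#include <stdio.h>
#include <sys/time.h>

typedef struct
{
    double plan_rows;     /* planner's estimate for this node */
    double actual_rows;   /* incremented once per tuple returned */
} LitePlanNodeStats;

static double
elapsed_msec(struct timeval start, struct timeval end)
{
    return (end.tv_sec - start.tv_sec) * 1000.0 +
           (end.tv_usec - start.tv_usec) / 1000.0;
}

int
main(void)
{
    struct timeval    query_start, query_end;
    LitePlanNodeStats scan = {250.0, 0.0};

    gettimeofday(&query_start, NULL);       /* first of only two timer calls */

    for (int i = 0; i < 100000; i++)        /* stand-in for executing the node */
        scan.actual_rows += 1.0;            /* the only per-tuple bookkeeping */

    gettimeofday(&query_end, NULL);         /* second timer call */

    printf("query ran in %.3f ms; node estimated %.0f rows, returned %.0f\n",
           elapsed_msec(query_start, query_end),
           scan.plan_rows, scan.actual_rows);
    return 0;
}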
> > On further thought the real problem is that these numbers are only available
> > when running with "explain" on. As shown recently on one of the lists, the
> > cost of the repeated gettimeofday calls can be substantial. It's not really
> > feasible to suggest running all queries with that profiling.
>
> Yeah. You could imagine a simplified-stats mode that only collects the
> total runtime (two gettimeofday's per query is nothing) and the row
> counts (shouldn't be impossibly expensive either, especially if we
> merged the needed fields into PlanState instead of requiring a
> separately allocated node). Not sure if that's as useful though.

How about a PGC_POSTMASTER GUC variable which tells postgres to collect details on the planner's performance and how it compares to actual run times? Optionally, we could also have the executor run some/all of the possible plans (presumably only useful for SELECTs) and keep details on the performance of each. At postmaster shutdown (or some other time?) a report could be produced profiling all queries.

The reason I suggest this is that it would have zero impact on production databases but would allow DBAs to profile their databases with real usage patterns in development environments.

Gavin
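[Editor's sketch] As one possible shape for the report Gavin describes, the toy program below records a few per-query samples behind a profiling flag and prints a summary table. The flag, the struct, and the query strings are all invented; a real implementation would hang off an actual PGC_POSTMASTER GUC and the executor, and produce the report at postmaster shutdown rather than from main().

/* Sketch of a shutdown-time profiling report; all names are illustrative. */
#include <stdio.h>

static int profile_planner = 1;   /* stand-in for a PGC_POSTMASTER boolean GUC */

typedef struct
{
    const char *query;
    double      est_rows;
    double      actual_rows;
    double      est_cost;
    double      actual_msec;
} QueryProfile;

#define MAX_PROFILES 100
static QueryProfile profiles[MAX_PROFILES];
static int          nprofiles = 0;

static void
record_query(const char *q, double est_rows, double actual_rows,
             double est_cost, double actual_msec)
{
    if (profile_planner && nprofiles < MAX_PROFILES)
        profiles[nprofiles++] = (QueryProfile){q, est_rows, actual_rows,
                                               est_cost, actual_msec};
}

static void
shutdown_report(void)
{
    printf("%-30s %10s %10s %12s\n", "query", "est rows", "act rows", "ms per cost");
    for (int i = 0; i < nprofiles; i++)
        printf("%-30s %10.0f %10.0f %12.4f\n",
               profiles[i].query, profiles[i].est_rows, profiles[i].actual_rows,
               profiles[i].actual_msec / profiles[i].est_cost);
}

int
main(void)
{
    record_query("SELECT ... FROM orders", 120.0, 48000.0, 900.0, 410.0);
    record_query("SELECT ... FROM customers", 50.0, 55.0, 75.0, 9.0);
    shutdown_report();    /* in the backend this would run at postmaster shutdown */
    return 0;
}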
Gavin Sherry wrote:
> > > On further thought the real problem is that these numbers are only available
> > > when running with "explain" on. As shown recently on one of the lists, the
> > > cost of the repeated gettimeofday calls can be substantial. It's not really
> > > feasible to suggest running all queries with that profiling.
> >
> > Yeah. You could imagine a simplified-stats mode that only collects the
> > total runtime (two gettimeofday's per query is nothing) and the row
> > counts (shouldn't be impossibly expensive either, especially if we
> > merged the needed fields into PlanState instead of requiring a
> > separately allocated node). Not sure if that's as useful though.
>
> How about a PGC_POSTMASTER GUC variable which tells postgres to collect
> details on the planner's performance and how it compares to actual run
> times? Optionally, we could also have the executor run some/all of the
> possible plans (presumably only useful for SELECTs) and keep details on
> the performance of each. At postmaster shutdown (or some other time?) a
> report could be produced profiling all queries.
>
> The reason I suggest this is that it would have zero impact on production
> databases but would allow DBAs to profile their databases with real usage
> patterns in development environments.

Great idea. Under ideal situations it shouldn't be needed, but things are often less than ideal.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073