Thread: Re: Stopgap solution for table-size-estimate updating problem
From: "Zeugswetter Andreas DAZ SD"
Date:
>> One possibility: vacuum already knows how many tuples it removed.  We
>> could set reltuples equal to, say, the mean of the number-of-tuples-
>> after-vacuuming and the number-of-tuples-before.  In a steady state
>> situation this would represent a fairly reasonable choice.  In cases
>> where the table size has actually decreased permanently, it'd take a few
>> cycles of vacuuming before reltuples converges to the new value, but that
>> doesn't seem too bad.
>
> That sounds good to me.  Covers all cases I can see from here.

Yes, sounds good to me also.  I think that would be a good thing even if
viewed in isolation from the rest of the proposal.

I am sorry if I gave the impression that I don't like a change in this
direction in general; I think there is a need for both.  I am only worried
about core OLTP applications where every query is highly tuned (and a
different plan is more often than not counterproductive, especially if it
comes and goes without intervention).

>> A standalone ANALYZE should still do what it does now, though, I think;
>> namely set reltuples to its best estimate of the current value.

good, imho :-)

> A GUC-free solution...but yet manual control is possible.  Sounds good to
> me - and for you Andreas, also?

It is the GUC to keep the optimizer from using the dynamic page count that
I would still like to have.  I especially liked Simon's name for it:

    enable_dynamic_statistics=true

Tom wrote:
>> But I am used to applications
>> that prepare a query and hold the plan for days or weeks. If you happen to
>> create the plan when the table is by chance empty you lost.
>
> You lose in either case, since this proposal doesn't change when
> planning occurs or doesn't occur.

This is not true in my case, since I only "update statistics"/analyze
when the tables have representative content (i.e. not empty).

Andreas
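[A minimal illustration of the statistics under discussion, assuming a
hypothetical table named "orders".  The planner scales its row estimates
from relpages/reltuples in pg_class, which is what the averaging rule
quoted above would adjust.]

    -- Inspect the per-table statistics the planner works from
    -- ("orders" is a hypothetical table name).
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'orders';

    -- Proposed rule: new reltuples = (tuples before vacuum + tuples after) / 2.
    -- If a table shrinks permanently from 1,000,000 to 100,000 rows,
    -- successive vacuums would record roughly 550,000, then 325,000,
    -- then 212,500, ..., converging toward 100,000 over a few cycles.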
"Zeugswetter Andreas DAZ SD" <ZeugswetterA@spardat.at> writes: > Tom wrote: >>> But I am used to applications >>> that prepare a query and hold the plan for days or weeks. If you happen to >>> create the plan when the table is by chance empty you lost. >> >> You lose in either case, since this proposal doesn't change when >> planning occurs or doesn't occur. > This is not true in my case, since I only "update statistics"/analyze > when the tables have representative content (i.e. not empty). I'm unsure why you feel you need a knob to defeat this. The only time when the plan would change from what you think of as the hand-tuned case is when the physical table size is greatly different from what it was when you analyzed. The entire point of wanting to make this change is exactly that in that situation the plan *does* need to change. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I'm unsure why you feel you need a knob to defeat this. The only time
> when the plan would change from what you think of as the hand-tuned
> case is when the physical table size is greatly different from what it
> was when you analyzed. The entire point of wanting to make this change
> is exactly that in that situation the plan *does* need to change.

Simply put because the optimizer isn't infallible.  And some mistakes are
more costly than others.  Continuing to use a plan that worked fine after
an incremental change to the table is unlikely to cause pain, whereas
changing plans opens a Pandora's box of potential catastrophic failures.

Imagine a scenario where the system was running fine using nested loops
and index scans, but the user deletes a few records (at 9am, just as the
site is hitting peak usage and before I'm awake) and suddenly the planner
decides to use sequential scans and hash joins.  The resulting plan may be
far too slow and crash the application.  This is especially likely if the
original plan estimates were off.

You're going to say the opposite is also possible, but it's not really
true.  A DML change that doesn't trigger an execution plan change isn't
going to cause a disproportionate change in the execution time of queries.
It's going to cause a change in execution time proportionate to the change
in the data.  If the user doubles the number of records in the table
(something I can predict the likelihood of) it probably means the query
will take twice as long.  Now there may be a faster plan out there, but
failing to find it just means the query will take twice as long.

If the user halves the number of records and the planner tries to be
clever and switches plans, then it might be right, but it might be wrong.
And the potential damage if it's wrong is unbounded.  It could just take
twice as long, but it could take 1,000 times as long or worse.

For a production OLTP system I would want to be able to control when the
plans change.  In an ideal world I would even want to inspect and test
them before they go live.  The last thing I want is for them to change
spontaneously when I'm not expecting it.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I'm unsure why you feel you need a knob to defeat this.

> Simply put because the optimizer isn't infallible.

And one of the main reasons that it's fallible is because it sometimes
uses grossly obsolete statistics.  We can fix the first-order problems in
this line with the proposed changes.  (Obsolete pg_statistic contents are
an issue too, but they usually have only second-order effects on plan
choices.)

> And some mistakes are more
> costly than others. Continuing to use a plan that worked fine after an
> incremental change to the table is unlikely to cause pain

We're not talking about "incremental" changes; those would be unlikely to
result in a plan change in any case.  The cases that are causing pain are
where the table size has changed by an order of magnitude and the planner
failed to notice.

> You're going to say the opposite is also possible but it's not really true. A
> DML change that doesn't trigger an execution plan change isn't going to cause
> a disproportionate change in the execution time of queries.

Nonsense.  You're assuming incremental changes (ie, only a small
fractional change in table size), but we are getting killed by
non-incremental cases.  If the plan cost estimates are such that a small
fractional change in table size will cause the planner to switch to a
hugely worse plan, then you're living on the edge of disaster anyway.  Or
are you telling me that every time you VACUUM or ANALYZE, you immediately
hand-inspect the plans for every query you use?

A further point is that only VACUUM can decrease the table size, and
VACUUM already updates these stats anyway.  The only "loss of control"
involved here is prevention of a plan change in response to a significant
increase in table size.  Overestimates of result size usually don't
produce as horrible plans as underestimates, so the downside doesn't seem
as large as you make it out to be.

> For a production OLTP system I would want to be able to control when
> the plans change. In an ideal world I would even want to inspect and
> test them before they go live.

This is pure fantasy.  It certainly has nothing to do with the current
state of nor future directions for the planner, and you haven't even
convinced me that it's a desirable goal.  What you are describing is a
brittle, inflexible system that is much more likely to break under
unforeseen circumstances than it is to perform well reliably.

            regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Nonsense. You're assuming incremental changes (ie, only a small
> fractional change in table size), but we are getting killed by
> non-incremental cases. If the plan cost estimates are such that a small
> fractional change in table size will cause the planner to switch to a
> hugely worse plan, then you're living on the edge of disaster anyway.
> Or are you telling me that every time you VACUUM or ANALYZE, you
> immediately hand-inspect the plans for every query you use?

Well, with the current situation the best I can hope for is to run analyze
at times when we can withstand minor outages and I can respond.  Probably
I would run it during off-peak hours.  So basically, while I don't
hand-inspect plans, I'm using the site to test them.  If the site's still
running 5 minutes after the analyze then they're probably ok.

I have actually written up a script that I intend to experiment with.  It
explains every query in the system, then runs analyze within a
transaction, and then reruns explain on every query to check for any
changed plans.  It only commits if there are no changed plans.  This is
all just an experiment though; I'm not sure how effective it'll be.

> A further point is that only VACUUM can decrease the table size, and
> VACUUM already updates these stats anyway. The only "loss of control"
> involved here is prevention of a plan change in response to a
> significant increase in table size. Overestimates of result size
> usually don't produce as horrible plans as underestimates, so the
> downside doesn't seem as large as you make it out to be.

That's true.  I don't think the proposed change makes the situation with
respect to plan stability any worse than the status quo.  But it does seem
to lock us into the idea that plans could change at any time whatsoever.

I'm not sure why VACUUM without ANALYZE updates the statistics at all,
though.  Isn't that what ANALYZE is for?

> This is pure fantasy. It certainly has nothing to do with the current
> state of nor future directions for the planner, and you haven't even
> convinced me that it's a desirable goal. What you are describing is a
> brittle, inflexible system that is much more likely to break under
> unforeseen circumstances than it is to perform well reliably.

Huh.  That's how I see the current setup.  I find the current thinking too
fragile precisely because there's no way to test it and guarantee it will
perform consistently.  I want something that won't suddenly change
behaviour in ways I can't predict.  I want something that will
consistently run the same code path every time except at well-defined
points in time, according to well-defined processes.

I'll point out other databases end up treading the same ground.  Oracle
started with a well defined rules-based system that was too inflexible to
handle complex queries.  So they went to a cost-based optimizer much like
Postgres's current optimizer.  But DBAs resisted for a long time precisely
because they couldn't control it or predict its behaviour as well.  Now
they have a plan stability system where you can plan queries using the
cost based optimizer but then store the plans for future use.  You can
even take the plans and store them and load them on development systems
for testing.

Their system is awfully kludgy though.  Postgres can probably do much
better.

--
greg
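[A rough sketch of the kind of check described above, for a single
representative query; the table and query are hypothetical.  Unlike
VACUUM, ANALYZE can run inside a transaction block, so its statistics
updates can be discarded if the new plan looks wrong.]

    BEGIN;
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- plan under current stats
    ANALYZE orders;
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- plan under refreshed stats
    -- COMMIT if the plans are unchanged (or acceptable); otherwise discard
    -- the new statistics:
    ROLLBACK;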
On Mon, 2004-11-29 at 15:37, Tom Lane wrote:
> "Zeugswetter Andreas DAZ SD" <ZeugswetterA@spardat.at> writes:
> > Tom wrote:
> >>> But I am used to applications
> >>> that prepare a query and hold the plan for days or weeks. If you happen to
> >>> create the plan when the table is by chance empty you lost.
> >>
> >> You lose in either case, since this proposal doesn't change when
> >> planning occurs or doesn't occur.
>
> > This is not true in my case, since I only "update statistics"/analyze
> > when the tables have representative content (i.e. not empty).
>
> I'm unsure why you feel you need a knob to defeat this. The only time
> when the plan would change from what you think of as the hand-tuned
> case is when the physical table size is greatly different from what it
> was when you analyzed. The entire point of wanting to make this change
> is exactly that in that situation the plan *does* need to change.

Well, the cutover between plans is supposed to happen at exactly the right
place, so in theory you should want this.

The margin for error on the various estimates means that the actual
cutover is often some way away from the smooth transition point.  If
you're unlucky enough to have a plan that fluctuates on either side of the
planner's transition point AND where the transition point is misplaced,
then you can get a large discontinuity in execution times.  That's when a
careful man such as Andreas can take extra benefit from manual control.

You're both right.  We should help both the careful tuner and the
short-of-time developer.

--
Best Regards, Simon Riggs
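[One way a careful tuner can already examine both sides of such a cutover,
without the proposed enable_dynamic_statistics knob, is to compare the
planner's estimated costs with an existing planner GUC temporarily
toggled.  The query and table here are hypothetical.]

    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    SET enable_seqscan = off;   -- existing planner setting; discourages seq scans
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    RESET enable_seqscan;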
On 11/29/2004 10:49 AM, Greg Stark wrote:
> I'll point out other databases end up treading the same ground. Oracle
> started with a well defined rules-based system that was too inflexible to
> handle complex queries. So they went to a cost-based optimizer much like
> Postgres's current optimizer. But DBAs resisted for a long time precisely
> because they couldn't control it or predict its behaviour as well. Now
> they have a plan stability system where you can plan queries using the
> cost based optimizer but then store the plans for future use. You can
> even take the plans and store them and load them on development systems
> for testing.

I can attest to this.  I work (middleware, not DBA stuff) with fairly
large Oracle databases (40T, billions of rows).  The data is added in
chunks (tablespaces) and in general does not materially affect the
distribution of the data.  However, Oracle would many times suddenly shove
a query onto a new sub-optimal path after adding the data.  The solution
was to either 1) fix the stats and/or the stored outline in a staging area
manually (DBA), or 2) hint the query in the middleware (uggh -- my group,
MW).  Once good, move the stored outlines to the production hardware --
all is fixed.

For the most part we fix using option 2 because it is generally easier to
hint the query than to fix the stored outline (though our DBAs say they
can).  Using stored outlines has gone a long way to ensure stability on
our systems.

> Their system is awfully kludgy though. Postgres can probably do much better.

--
-Rupa