Re: generalizing the planner knobs - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: generalizing the planner knobs |
Date | |
Msg-id | 1133704639.2906.840.camel@localhost.localdomain Whole thread Raw |
In response to | Re: generalizing the planner knobs (Csaba Nagy <nagy@ecircle-ag.com>) |
Responses |
Re: generalizing the planner knobs
Re: generalizing the planner knobs |
List | pgsql-hackers |
On Fri, 2005-12-02 at 11:07 +0100, Csaba Nagy wrote: > On Thu, 2005-12-01 at 22:01, Tom Lane wrote: > > Greg Stark <gsstark@mit.edu> writes: > > > On the other hand the type I would prefer to see are hints that feed directly > > > into filling in information the planner lacks. This only requires that the > > > user understand his own data and still lets the planner pick the best plan > > > based on the provided information. > > > > This would avoid some issues, but it still is vulnerable to the problem > > that the hint you put in your code today will fail to track changes in > > your data tomorrow. > > Tom, I have to disagree here. At least in our application, we must > provide for an acceptable worst case scenario, and sometimes a slightly > wrong estimate can lead to a plan which is very fast 99% of the time but > completely wrong in 1% of the cases. Sometimes the percentage is 50/50. > I've had this situation with some "limit" plans where the planner had > chosen a wrong index. The problem there was that the planner had > estimated that the query will have 20 rows as a result, but it had less, > and resulted in the complete scan of the index... as opposed to a much > smaller scan that would have resulted by scanning the other index, as > that one would have provided an end condition orders of magnitudes > sooner. Now the statistics will always be only an estimation, and +/- a > few can really make a big difference in some situations. In this > particular situation the index choice of the planner would have been > faster for all cases where there were really 20 rows returned, but I > forced it to always choose the other plan (by adding the proper order > by) because I can't risk a bad result in any of the cases. > In this particular case I was able to force the planner choose a > specific plan, but that might not be always possible, so I guess it > really would make sense to be able to tell the planner how selective > some conditions are. And yes, sometimes I would like to "freeze" a > specific "safe" plan for a specific query, even if it is not optimal. Csaba raises a good point here. Many people say they want "hints" when what they actually require the plan to be both stable and predictable. Tom is right to point out that data can change over time. However, experience with packaged application tuning is that you actually do want to have things work in a stable way, even if that is somewhat sub-optimal because when you have 1000s of statements it is important that it doesn't change after you tune it - otherwise you never finish. So I would like to give that requirement a name "Plan Stability"; the actual solution to that could be many things. Another aspect to this is predictability. At the moment, we optimise according to the first parameter a prepared statement is bound with. Many data distributions contain a small number of values that represent a large fraction of the total. This can mean that it is pretty random whether we will get a SeqScan and be stuck with it, or get an IndexScan and be stuck with it. Either plan being wrong 50% of the time. In these cases, hinting is definitely a very bad thing, since whichever you hint, you'll be wrong. This situation gives us two more requirements: - predictability - because we want to know the worst case - flexibility - because we want to be able to take advantage of the best case, but without causing an unconstrained worst case So IMHO, the requirements list for prepared statement planning is that optimization must be: - Flexible - Predictable - Stable The actual solutions to all of those things could be many and varied. ISTM we could do some of that with another GUC, lets call it prepare_once = on. The system default is to have a prepared statement bound to a plan on its first parameter bind. If we set this to "off", then the statement will replan each time we bind. This would give us both flexibility and predictability. (As ever, someone suggest a better name?). The requirements for very large statement tuning are fairly different from that. e.g. large reports or Data Warehousing queries. In those cases, Tom's warning about the data changing can be critical and we don't have the same need to prepare queries. Neil's original point about needing to avoid SeqScans on some tables but not others hasn't really been addressed. That got turned into hinting on particular nodes/node types, but there are still issues: How do you identify one node in a complex plan? Perhaps the way forward would be to make enable_* accept a list of tables, or a * (like listen_addresses). (And make "on" a synonym for "*" and off a synonym for an empty list). That way, you'd be able to control on a per-table basis what plan types are possible. (Still wouldn't work that well for complex plans where same table referenced multiple times, but its a step in the right direction.) I'm not sure I like that particular idea much, but I'm attempting to address Neil's original point, with which I agree. Best Regards, Simon Riggs
pgsql-hackers by date: