On Mon, Mar 17, 2014 at 11:50 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Mon, Mar 17, 2014 at 12:57 PM, Atri Sharma <atri.jiit@gmail.com> wrote: > >> There's a big difference between saying to the planner, "Use plan X" >> vs "Here's some information describing the data supporting choosing >> plan X intelligently". The latter allows for better plans in the face >> of varied/changing data, integrates with the planner in natural way, >> and encourages users to understand how the planner works. > > +1 > > I was thinking of varying the 'weight' of a user defined plan by an fixed > experimental factor to tell the planner to give higher/lower preference to > this plan, but after your idea above, I think Stephen's point of introducing > a GUC for the factor is the only way possible and I agree with him on the > point that eventually the user will figure out a way to force usage of his > plan using the GUC.
GUC is not the answer beyond the "broad brush" mostly debugging level features they already support. What do you do if your plan simultaneously needs and does not need nestloops?
A query plan is a complicated thing that is the result of detail analysis of the data. I bet there are less than 100 users on the planet with the architectural knowledge of the planner to submit a 'plan'. What users do have is knowledge of the data that the database can't effectively gather for some reason. Looking at my query above, what it would need (assuming the planner could not be made to look through length()) would be something like:
SELECT * FROM foo WHERE
length(bar) <= 1000 WITH SELECTIVITY 0.999 AND length(bar) >= 2 WITH SELECTIVITY 0.999;
Wont this have scaling issues and issues over time as the data in the table changes?
Suppose I make a view with the above query. With time, as the data in the table changes, the selectivity values wont be good for planning. This may potentially lead to a lot of changes in the view definition and other places where this query was used.
In general, I think I step back on my point that specifying the selectivity is a bad idea.
Could this also work (for the time being) for cross-column statistics?