Re: Improving non-joinable EXISTS subqueries - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: Improving non-joinable EXISTS subqueries |
Date | |
Msg-id | 48AC58CD.EE98.0025.0@wicourts.gov Whole thread Raw |
In response to | Re: Improving non-joinable EXISTS subqueries (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: > when we have an EXISTS that could be done both ways, > why not just generate plans for both ways, and leave the decision > which to use until later? That seems good to me. The costs for the slower plan generally come out much higher. When the run times are close, the one that edges out the other doesn't always win, but that's to be expected. EXISTS is hardly unique in that respect. Competing on costs seems better than some more mechanical approach. > Like maybe even execution time? > > We have speculated in the past about having alternative plans that > could be conditionally executed based on information not available > at planning time. This could be seen as a first experiment in that > direction. I am not thinking of a general-purpose AlternativePlan > kind of execution node, because SubPlans aren't actually part of the > main plan-node tree, but an AlternativeSubPlans expression node > type might work. > > The two issues that would obviously have to be faced to make this > work are: > > 1. While the planner is estimating evaluation costs of the qual > conditions for the upper query, which EXISTS implementation do we assume > will be used? It might be that we could still use my original idea of > providing cost_qual_eval() with some context about the likely number of > calls, but what I'm thinking at the moment is that it's not worth the > trouble, because it isn't going to matter that much. Either possibility > is expensive enough compared to ordinary qual conditions that the > planner will be driven in the direction of plans that minimize the > number of EXISTS evaluations, and that's all that we really care about. > So I'd be inclined to just use the numbers for the base (non hashed) > implementation and be done with it. Seems reasonable from this point of view: it seems like you'd never choose a plan worse than the current releases, although you might sometimes miss a plan that would be even faster than the suggested improvement finds. I think it makes sense to defer this until such time (if ever) that it is shown to be worth the effort. > 2. How will the executor make the decision which to use? Well, it's > got access to the overall rowcount estimates that the planner made. > What I'm thinking of doing is having the AlternativeSubPlans node > look at the rowcount estimate of its immediate parent Plan node. > This is actually exactly the right number for a subplan in the > targetlist of the Plan node. For a subplan in the qual list, it's > an underestimate, but probably not an enormous underestimate. > (Assuming that the subplan is at the end of the qual list, which is > where it'd normally be, the expected number of calls of the subplan > would be the output rowcount estimate divided by the estimated > selectivity of the subplan qual --- but at present the latter is always > 0.5 ...) If you meant multiplied by 0.5 I think I followed that. Made sense. > Another technique that we could play with is to have the > AlternativeSubPlans node track the actual number of calls it gets, > and switch from the "retail" implementation to the "hashed" > implementation if that exceeds a threshold. This'd provide some > robustness in the face of bad estimates, although of course it's > not optimal compared to having made the right choice to start with. That sounds interesting, but unless it has value as a prototype for other runtime adaptivity, it sounds like a lot of work for the benefit. I'm not that unhappy with the estimates I'm getting in a properly tuned database. And the execution-time work to process some number of rows this way seems likely to far exceed the work to refine the estimates and costing used to choose a plan. -Kevin
pgsql-hackers by date: