Re: Improving non-joinable EXISTS subqueries - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Improving non-joinable EXISTS subqueries
Date
Msg-id 48AC58CD.EE98.0025.0@wicourts.gov
Whole thread Raw
In response to Re: Improving non-joinable EXISTS subqueries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> when we have an EXISTS that could be done both ways,
> why not just generate plans for both ways, and leave the decision
> which to use until later?
That seems good to me.  The costs for the slower plan generally come
out much higher.  When the run times are close, the one that edges out
the other doesn't always win, but that's to be expected.  EXISTS is
hardly unique in that respect.  Competing on costs seems better than
some more mechanical approach.
> Like maybe even execution time?
> 
> We have speculated in the past about having alternative plans that
> could be conditionally executed based on information not available
> at planning time.  This could be seen as a first experiment in that
> direction.  I am not thinking of a general-purpose AlternativePlan
> kind of execution node, because SubPlans aren't actually part of the
> main plan-node tree, but an AlternativeSubPlans expression node
> type might work.
> 
> The two issues that would obviously have to be faced to make this
> work are:
> 
> 1. While the planner is estimating evaluation costs of the qual
> conditions for the upper query, which EXISTS implementation do we
assume
> will be used?  It might be that we could still use my original idea
of
> providing cost_qual_eval() with some context about the likely number
of
> calls, but what I'm thinking at the moment is that it's not worth
the
> trouble, because it isn't going to matter that much.  Either
possibility
> is expensive enough compared to ordinary qual conditions that the
> planner will be driven in the direction of plans that minimize the
> number of EXISTS evaluations, and that's all that we really care
about.
> So I'd be inclined to just use the numbers for the base (non hashed)
> implementation and be done with it.
Seems reasonable from this point of view: it seems like you'd never
choose a plan worse than the current releases, although you might
sometimes miss a plan that would be even faster than the suggested
improvement finds.  I think it makes sense to defer this until such
time (if ever) that it is shown to be worth the effort.
> 2. How will the executor make the decision which to use?  Well, it's
> got access to the overall rowcount estimates that the planner made.
> What I'm thinking of doing is having the AlternativeSubPlans node
> look at the rowcount estimate of its immediate parent Plan node.
> This is actually exactly the right number for a subplan in the
> targetlist of the Plan node.  For a subplan in the qual list, it's
> an underestimate, but probably not an enormous underestimate.
> (Assuming that the subplan is at the end of the qual list, which is
> where it'd normally be, the expected number of calls of the subplan
> would be the output rowcount estimate divided by the estimated
> selectivity of the subplan qual --- but at present the latter is
always
> 0.5 ...)
If you meant multiplied by 0.5 I think I followed that.  Made sense.
> Another technique that we could play with is to have the
> AlternativeSubPlans node track the actual number of calls it gets,
> and switch from the "retail" implementation to the "hashed"
> implementation if that exceeds a threshold.  This'd provide some
> robustness in the face of bad estimates, although of course it's
> not optimal compared to having made the right choice to start with.
That sounds interesting, but unless it has value as a prototype for
other runtime adaptivity, it sounds like a lot of work for the
benefit.  I'm not that unhappy with the estimates I'm getting in a
properly tuned database.  And the execution-time work to process some
number of rows this way seems likely to far exceed the work to refine
the estimates and costing used to choose a plan.
-Kevin


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: make dist does not work in VPATH
Next
From: "Robert Haas"
Date:
Subject: Re: Improving non-joinable EXISTS subqueries