Re: Get rid of runtime handling of AlternativeSubPlan? - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Get rid of runtime handling of AlternativeSubPlan? |
Date | |
Msg-id | CAApHDvoYDvK9cnoqweKoLzorFCsWWkom_ZFkvxYvM7wp0ojgBw@mail.gmail.com |
In response to | Get rid of runtime handling of AlternativeSubPlan? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Mon, 22 Jun 2020 at 12:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Back in bd3daddaf232d95b0c9ba6f99b0170a0147dd8af, which introduced
> AlternativeSubPlans, I wrote:
>
>     There is a lot more that could be done based on this infrastructure: in
>     particular it's interesting to consider switching to the hash plan if we start
>     out using the non-hashed plan but find a lot more upper rows going by than we
>     expected. I have therefore left some minor inefficiencies in place, such as
>     initializing both subplans even though we will currently only use one.
>
> That commit will be twelve years old come August, and nobody has either
> built anything else atop it or shown any interest in making the plan choice
> switchable mid-run. So it seems like kind of a failed experiment.
>
> Therefore, I'm considering the idea of ripping out all executor support
> for AlternativeSubPlan and instead having the planner replace an
> AlternativeSubPlan with the desired specific SubPlan somewhere late in
> planning, possibly setrefs.c.
>
> Admittedly, the relevant executor support only amounts to a couple hundred
> lines, but that's not nothing. A perhaps-more-useful effect is to get rid
> of the confusing and poorly documented EXPLAIN output that you get for an
> AlternativeSubPlan.
>
> I also noted that the existing subplan-selection logic in
> ExecInitAlternativeSubPlan is really pretty darn bogus, in that it uses a
> one-size-fits-all execution count estimate of parent->plan->plan_rows, no
> matter which subexpression the subplan is in. This is only appropriate
> for subplans in the plan node's targetlist, and can be either too high
> or too low elsewhere. It'd be relatively easy for setrefs.c to do
> better, I think, since it knows which subexpression it's working on
> at any point.

When I was working on [1] a few weeks ago, I did wonder if I'd have to use an AlternativeSubPlan when doing result caching for subqueries.
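To make Tom's point about per-subexpression execution counts concrete, here is a minimal sketch, in Python rather than PostgreSQL's C, of a plan-time choice between the two arms of an AlternativeSubPlan. It assumes each arm can be summarized by a startup cost (for the hashed arm, building the hash table) and a per-call cost, and that the caller supplies an execution-count estimate for the specific subexpression the subplan sits in. `SubPlanAlternative`, `choose_subplan` and the cost figures are hypothetical illustrations, not PostgreSQL APIs or real estimates.

```python
from dataclasses import dataclass


@dataclass
class SubPlanAlternative:
    """Hypothetical stand-in for one arm of an AlternativeSubPlan."""
    name: str
    startup_cost: float   # one-time cost, e.g. building the hash table
    per_call_cost: float  # cost of each evaluation of the subplan


def choose_subplan(alternatives, num_exec):
    """Return the arm with the lowest estimated total cost for num_exec
    evaluations of the enclosing subexpression."""
    return min(alternatives,
               key=lambda a: a.startup_cost + num_exec * a.per_call_cost)


hashed = SubPlanAlternative("hashed", startup_cost=1000.0, per_call_cost=0.5)
plain = SubPlanAlternative("plain", startup_cost=0.0, per_call_cost=20.0)

# With few executions the unhashed arm wins; with many, paying the
# hash-table startup cost once is cheaper overall.
print(choose_subplan([hashed, plain], num_exec=10).name)      # plain
print(choose_subplan([hashed, plain], num_exec=10_000).name)  # hashed
```

With these made-up numbers the break-even point is about 51 executions, so a single one-size-fits-all estimate such as the parent's plan_rows can easily land on the wrong side of it for a subplan in a qual rather than the targetlist.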
The problem is likely the same as the reason they were invented in the first place: we basically don't know how many rows the parent will produce when planning the subplan. In my case, I'm interested in both the number of rows in the outer plan and the ndistinct estimate for the subplan's parameters. If the parameters for the subquery are all distinct, then there's not much sense in trying to cache results for later use; we're never going to need them. Right now, if I wanted to use AlternativeSubPlan to delay this choice until run time, I'd be missing the ndistinct estimate, since that information isn't available in the final plan. Perhaps that's an argument for doing this in setrefs.c instead; I could look up the ndistinct estimate there.

As for switching plans on the fly during execution, I can see the sense in that idea. For the hashed subplan case, we'd likely want to switch to hashing mode if we discovered that there were many more rows in the outer query than we had thought there would be. However, I'm not sure Result Cache would ever need anything similar, since technically we could just switch off the caching if we discovered that our cache hit ratio was either terrible or 0. We would still have an additional node to pull tuples through, however. Switching would also require that the tuple slot type be the same between the alternatives.

David

[1] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
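David's two ideas, judging up front from the ndistinct estimate whether caching can pay off and switching caching off at run time when the hit ratio turns out poor, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not how Result Cache is implemented: it assumes an unbounded cache (so each distinct parameter set misses exactly once), and `estimated_hit_ratio`, `CachingSubplan`, `min_hit_ratio` and `warmup` are hypothetical names and tunables.

```python
def estimated_hit_ratio(outer_rows, ndistinct):
    """Best-case hit ratio assuming every distinct parameter set fits in
    the cache: each distinct value misses once, every repeat is a hit."""
    if outer_rows <= 0:
        return 0.0
    ndistinct = min(ndistinct, outer_rows)
    return (outer_rows - ndistinct) / outer_rows


class CachingSubplan:
    """Hypothetical wrapper that memoizes a parameterized subplan and
    switches caching off at run time if the observed hit ratio is poor."""

    def __init__(self, subplan, min_hit_ratio=0.1, warmup=100):
        self.subplan = subplan          # callable: param -> result
        self.min_hit_ratio = min_hit_ratio
        self.warmup = warmup            # lookups before judging the ratio
        self.cache = {}
        self.hits = 0
        self.lookups = 0
        self.enabled = True

    def __call__(self, param):
        if not self.enabled:
            # Caching is off, but every call still passes through here.
            return self.subplan(param)
        self.lookups += 1
        if param in self.cache:
            self.hits += 1
            return self.cache[param]
        result = self.subplan(param)
        self.cache[param] = result
        # The ratio can only drop on a miss, so only check here.
        if (self.lookups >= self.warmup
                and self.hits / self.lookups < self.min_hit_ratio):
            self.enabled = False
            self.cache.clear()
        return result


# All-distinct parameters: caching is switched off after the warm-up.
distinct = CachingSubplan(lambda p: p * 2)
for i in range(500):
    distinct(i)
print(distinct.enabled)                  # False

# Heavily repeated parameters: the cache stays on.
repeated = CachingSubplan(lambda p: p * 2)
for i in range(500):
    repeated(i % 5)
print(repeated.enabled, repeated.hits)   # True 495
```

Note that the wrapper stays in the call path even after caching is disabled, which mirrors the point above about still having an additional node to pull tuples through.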