On Sun, 27 Sep 2020 at 10:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Thanks for reviewing!
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > 1. I think we should be moving away from using linitial() and second()
> > when we know there are two items in the list. Using list_nth() has
> > less overhead.
>
> Uh, really?
Yeah. Using linitial() and lsecond() will check if the list is
not-NIL. lsecond() does an additional check to ensure the list has at
least two elements. None of which are required since we already know
the list has two elements.
> And if it's true, why would we change all the call sites
> rather than improving the pg_list.h macros?
Maybe we should. Despite the non-NIL check and length check in
list_head(), list_second_cell(), list_third_cell() functions, the
corresponding macro will crash anyway if those functions were to
return NULL. We might as well just use list_nth_cell() to get the
ListCell without any checks to see if the cell exists. I can go off
and fix those separately. I attached a 0004 patch to help explain what
I'm talking about.
> > 2. I did have sight concerns that fix_alternative_subplan() always
> > assumes the list of subplans will always be 2, though on looking at
> > the definition of AlternativeSubPlan, I see always having two in the
> > list is mentioned. It feels like fix_alternative_subplan() wouldn't
> > become much more complex to allow any non-zero number of subplans, but
> > maybe making that happen should wait until there is some need for more
> > than two. It just feels a bit icky to have to document the special
> > case when not having the special case is not that hard to implement.
>
> It seemed to me that dealing with the general case would make
> fix_alternative_subplan() noticeably more complex and less obviously
> correct. I might be wrong though; what specific coding did you have in
> mind?
I had thought something like 0003 (attached). It's a net reduction of
3 entire lines, including the removal of the comment that explained
that there's always two in the list.
> > On a side note, I was playing around with the following case:
> > ...
> > both master and patched seem to not choose to use the hashed subplan
> > which results in a pretty slow execution time. This seems to be down
> > to cost_subplan() doing:
> > /* we only need to fetch 1 tuple; clamp to avoid zero divide */
> > sp_cost.per_tuple += plan_run_cost / clamp_row_est(plan->plan_rows);
> > I imagine / 2 might be more realistic to account for the early abort,
> > which is pretty much what the ALL_SUBLINK and ANY_SUBLINK do just
> > below:
>
> Hm, actually isn't it the other way around? *If* there are any matching
> rows, then what's being done here is an accurate estimate. But if there
> are not, we're going to have to scan the entire subquery output to verify
> that. I wonder if we should just be taking the subquery cost at face
> value, ie be pessimistic not optimistic. If the user is bothering to
> test EXISTS, we should expect that the no-match case does happen.
>
> However, I think that's a distinct concern from this patch; this patch
> is only meant to improve the processing of alternative subplans, not
> to change the costing rules around them. If we fool with it I'd rather
> do so as a separate patch.
Yeah, agreed. I'll open another thread.
David