Re: Planner making bad choice in alternative subplan decision - Mailing list pgsql-hackers

From David Rowley
Subject Re: Planner making bad choice in alternative subplan decision
Date
Msg-id CAApHDvqBoYU8aES4a0t-J15wk1wPMFJDHcyafyfHj7JqJ+u9wg@mail.gmail.com
Whole thread Raw
In response to Re: Planner making bad choice in alternative subplan decision  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Thanks for chipping in here.

On Tue, 29 Sep 2020 at 12:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Any opinions on this?
>
> This patch scares the heck out of me.  It's a pretty much unprincipled
> change in a fundamental selectivity estimator, which is going to affect
> all sorts of queries not only the particular case you have in mind.
> There's no reason to think that the outcome will be positive for other
> cases, either.

hmm. Yeah, I understand your thoughts. The reason why I had thoughts
that this might be an okay route to fix the problem was regarding the
comment above the current good which says, "Is that a good idea?".

I think I've mentioned somewhere on list about a risk-based costing
model, where tag on some sort of risk factor onto a Path and have
add_path() consider the risk and the cost perhaps with influence of
some GUCs to help weight the decision in a certain direction.  A
simple version of the primary case for this is how we multiply
selectivities of quals assuming no correlation. With each
multiplication, we increase the risk of being wrong which would
increase the risk score. How this problem tends to come out and bite
people is how we end up with a selectivity that's so low we think
there's 1 row and we end up doing some subsequent join as a
non-parameterised nested loop join, but it turns out 1 million rows
match and someone has to wait a long time for their query to finish.
The risk+cost based planner would see that that's risky and maybe
consider hashing that 1 row and doing a hash join.  Hashing 1 row is
pretty cheap, not much more expensive than nested looping, but if it
blows up, the explosion is contained.

Anyway, perhaps it's better to fix the more general case that I
mention one day when we have some sort of risk factor in the costing
model and just assign a higher risk to the seq scan path.

> The idea I'd had was to adjust make_subplan and cost_subplan to estimate
> EXIST cases on the basis of either 50% retrieval (same as ANY/ALL) or
> maybe full retrieval if you want to be pessimistic.  I've not had time
> to try it out though.

Yeah, I can look at that again, if you think it's more reasonable. It
was the first place a landed when looking for a fix until I discovered
the problem was more generic than just subplans.

David



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros
Next
From: Tom Lane
Date:
Subject: Re: Small improvements to pg_list.h's linitial(), lsecond(), lthird() etc macros