Home > mailing lists

Re: Planner making bad choice in alternative subplan decision - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Planner making bad choice in alternative subplan decision
Date	September 29, 2020 04:52:00
Msg-id	CAApHDvrohKikkCKH_yLkugmZpeTbpzq5uFV_m2h=KhcxLyTTqw@mail.gmail.com Whole thread Raw
In response to	Re: Planner making bad choice in alternative subplan decision (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Planner making bad choice in alternative subplan decision (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Tue, 29 Sep 2020 at 12:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > Any opinions on this?
>
> This patch scares the heck out of me.  It's a pretty much unprincipled
> change in a fundamental selectivity estimator, which is going to affect
> all sorts of queries not only the particular case you have in mind.
> There's no reason to think that the outcome will be positive for other
> cases, either.
>
> The idea I'd had was to adjust make_subplan and cost_subplan to estimate
> EXIST cases on the basis of either 50% retrieval (same as ANY/ALL) or
> maybe full retrieval if you want to be pessimistic.  I've not had time
> to try it out though.

Here's a patch which does that.  There are no regression test failures.

I'm trying to anticipate areas that this could cause a regression.  I
think generally a hashed subplan should win in about all cases where
we lookup all of the items in the table. The places where it could
cause regression are, where we have to load way more into the hash
table than we're going to lookup. Assuming roughly the same costs for
the subplan for hashing and the non-hashed subplan, then the crossover
point will be about 2 lookups (2 x 50%).  I do wonder if 50% is a bit
too big a number. We did ask the planner for a fast startup plan, so
there is perhaps some reasoning to make the cost multiplier somewhere
between 50% and 1 / nrows. I'm just not sure what that should be.
There's very little to go on cost-wise, or even heuristically. So
consider the patch still a discussion level patch.

David

Attachment

cost_EXISTS_SUBLINKs_as_half_total_cost.patch

pgsql-hackers by date:

From: "tsunakawa.takay@fujitsu.com"
Date: 29 September 2020, 04:51:12
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist

From: Michael Paquier
Date: 29 September 2020, 05:03:12
Subject: Re: __pg_log_level in anonynous enum should be initialized? (Was: pgsql: Change SHA2 implementation based on OpenSSL to use EVP digest ro)

Re: Planner making bad choice in alternative subplan decision - Mailing list pgsql-hackers

Attachment

Previous

Next