Re: disfavoring unparameterized nested loops - Mailing list pgsql-hackers

From John Naylor
Subject Re: disfavoring unparameterized nested loops
Date
Msg-id CAFBsxsFAhMq-nKqY8e3xeTyasyNz0vUC44J84HaCJFj-7PzO6w@mail.gmail.com
Whole thread Raw
In response to Re: disfavoring unparameterized nested loops  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: disfavoring unparameterized nested loops  (Kenneth Marshall <ktm@rice.edu>)
List pgsql-hackers
On Tue, Jun 15, 2021 at 8:00 PM David Rowley <dgrowleyml@gmail.com> wrote:

> In my experience, the most common reason that the planner chooses
> non-parameterized nested loops wrongly is when there's row
> underestimation that says there's just going to be 1 row returned by
> some set of joins.  The problem often comes when some subsequent join
> is planned and the planner sees the given join rel only produces one
> row.  The cheapest join method we have to join 1 row is Nested Loop.
> So the planner just sticks the 1-row join rel on the outer side
> thinking the executor will only need to scan the inner side of the
> join once.  When the outer row count blows up, then we end up scanning
> that inner side many more times. The problem is compounded when you
> nest it a few joins deep
>
> Most of the time when I see that happen it's down to either the
> selectivity of some correlated base-quals being multiplied down to a
> number low enough that we clamp the estimate to be 1 row.   The other
> case is similar, but with join quals.

If an estimate is lower than 1, that should be a red flag that Something Is Wrong. This is kind of a crazy idea, but what if we threw it back the other way by computing 1 / est , and clamping that result to 2 <= res < 10 (or 100 or something)? The theory is, the more impossibly low it is, the more wrong it is. I'm attracted to the idea of dealing with it as an estimation problem and not needing to know about join types. Might have unintended consequences, though.

Long term, it would be great to calculate something about the distribution of cardinality estimates, so we can model risk in the estimates. 

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: SSL/TLS instead of SSL in docs
Next
From: Greg Nancarrow
Date:
Subject: Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values