Re: disfavoring unparameterized nested loops - Mailing list pgsql-hackers

From Andrey Lepikhov
Subject Re: disfavoring unparameterized nested loops
Msg-id 379b65d3-fde4-a9b4-1ea9-4eaa3936e729@postgrespro.ru
In response to Re: disfavoring unparameterized nested loops  (Benjamin Coutu <ben.coutu@zeyos.com>)
Responses Re: disfavoring unparameterized nested loops
List pgsql-hackers
On 29/9/2022 21:32, Benjamin Coutu wrote:
> I'd like to revamp this important discussion.
> 
> As is well described in this fairly recent paper here https://www.vldb.org/pvldb/vol9/p204-leis.pdf (which also looks at Postgres) "estimation errors quickly grow as the number of joins increases, and that these errors are usually the reason for bad plans" - I think we can all get behind that statement.
> 
> While nested loop joins work great when cardinality estimates are correct, they are notoriously bad when the optimizer underestimates and they degrade very fast in such cases - the good vs. bad here is very asymmetric. On the other hand hash joins degrade much more gracefully - they are considered very robust against underestimation. The above mentioned paper illustrates that all major DBMS (including Postgres) tend to underestimate and usually that underestimation increases drastically with the number of joins (see Figures 3+4 of the paper).
> 
> Now, a simple approach to guarding against bad plans that arise from underestimation could be to use what I would call a nested-loop-conviction-multiplier based on the current depth of the join tree, e.g. for a base table that multiplier would obviously be 1, but would then grow (e.g.) quadratically. That conviction-multiplier would *NOT* be used to skew the cardinality estimates themselves, but rather be applied to the overall nested loop join cost at each particular stage of the plan when comparing it to other more robust join strategies like hash or sort-merge joins. That way when we can be sure to have a good estimate at the bottom of the join tree we treat all things equal, but favor nested loops less and less as we move up the join tree for the sake of robustness.
> Also, we can expand the multiplier whenever we fall back to using the default cardinality constant as surely all bets are off at that point - we should definitely treat nested loop joins as out of favor in this instance and that could easily be incorporated by simply increasing the conviction-multiplier.
> 
> What are your thoughts on this simple idea - is it perhaps too simple?
In my practice, a parameterized nested loop can reduce execution time, 
sometimes drastically. If your query touches a lot of tables but 
extracts only a tiny part of the data, and you have good coverage by 
indexes, PNL works great.
Moreover, I have pondered extending parameterization through subqueries 
and groupings.

What would you say about a different approach: a hybrid join? MS SQL 
Server has such a feature [1], and, according to their description, it 
incurs low overhead. It starts with a HashJoin and switches to a 
NestLoop if the inner input turns out to contain too few tuples. 
Wouldn't that solve the issue?

[1] 
https://techcommunity.microsoft.com/t5/sql-server-blog/introducing-batch-mode-adaptive-joins/ba-p/385411

-- 
regards,
Andrey Lepikhov
Postgres Professional



