Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path - Mailing list pgsql-hackers

From Arne Roland
Subject Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Date
Msg-id 1946b902b91a480ba0d29f1c86016d7a@index.de
Whole thread Raw
In response to PATCH: generate fractional cheapest paths in generate_orderedappend_path  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
List pgsql-hackers

Hi,


thanks for looking into it!


For some reason the patch doesn't apply at my end, could you repost one based at the master?


> 1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not
> clear whether to default to cheapest_startup or cheapest_total. We might
> also consider an incremental sort, in which case the startup cost
> matters (for Sort it does not). So we may need an enhanced version of
> get_cheapest_fractional_path_for_pathkeys that generates such paths.
>
> 2) Same for the cheapest_total - maybe there's a partially sorted path,
> and using it with incremental sort on top would be better than using
> cheapest_total_path + sort.

> I'd argue that this patch does not need to add the Incremental Sort
> capabilities in (1) and (2) - it's just another place where we decided
> not to consider this sort variant yet.

I'd say your reasoning is sound. If I'd want to get better partial costs for incremental sorts, I'd look at get_cheapest_fractional_path first. That sounds more important than generate_orderedappend_paths. Either way I'd say that is a completely separate issue and I think that should be looked at separately.


>3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry

> about require_parallel_safe, just like the other functions nearby.

I think it should. We have a ParallelAppend node after all.
I'm not really familiar with the way get_cheapest_fractional_path_for_pathkeys is used, but a quick search suggests to me, that build_minmax_path was thus far the only one using it. And minmax paths are never parallel safe anyway. I think that is the reason it doesn't do that already.


Regards

Arne


From: Tomas Vondra <tomas.vondra@enterprisedb.com>
Sent: Saturday, April 17, 2021 1:52:19 AM
To: pgsql-hackers
Subject: PATCH: generate fractional cheapest paths in generate_orderedappend_path
 
Hi,

This patch is a WIP fix for the issue described in [1], where the
planner picks a more expensive plan with partition-wise joins enabled,
and disabling this option produces a cheaper plan. That's a bit strange
because with the option disabled we consider *fewer* plans, so we should
not be able to generate a cheaper plan.

The problem lies in generate_orderedappend_paths(), which builds two
types of append paths - with minimal startup cost, and with minimal
total cost. That however does not work for queries with LIMIT, which
also need to consider at fractional cost, but the path interesting from
this perspective may be eliminated by other paths.

Consider three paths (this comes from the reproducer shared in [1]):

  A: nestloop_path   startup 0.585000    total 35708.292500
  B: nestloop_path   startup 0.292500    total 150004297.292500
  C: mergejoin_path  startup 9748.112737 total 14102.092737

With some reasonable LIMIT value (e.g. 5% of the data), we really want
to pick path A. But that path is dominated both in startup cost (by B)
and total cost (path C). Hence generate_orderedappend_paths() will
ignore it, and we'll generate a more expensive LIMIT plan.

In [2] Tom proposed to modify generate_orderedappend_paths() to also
consider the fractional cheapest_path, just like we do for startup and
total costs. The attached patch does exactly that, and it does seem to
do the trick.

There are some loose ends, though:

1) If get_cheapest_fractional_path_for_pathkeys returns NULL, it's not
clear whether to default to cheapest_startup or cheapest_total. We might
also consider an incremental sort, in which case the startup cost
matters (for Sort it does not). So we may need an enhanced version of
get_cheapest_fractional_path_for_pathkeys that generates such paths.

2) Same for the cheapest_total - maybe there's a partially sorted path,
and using it with incremental sort on top would be better than using
cheapest_total_path + sort.

3) Not sure if get_cheapest_fractional_path_for_pathkeys should worry
about require_parallel_safe, just like the other functions nearby.

I'd argue that this patch does not need to add the Incremental Sort
capabilities in (1) and (2) - it's just another place where we decided
not to consider this sort variant yet.

I'm not sure how much this depends on partition-wise join, and why
disabling it generates the right plan. The reproducer uses that, but
AFAICS generate_orderedappend_paths() works like this since 2010 (commit
11cad29c915). I'd bet the issue exists since then and it's possible to
get similar cases even without partition-wise join.

I can reproduce it on PostgreSQL 12, though (which however supports
partition-wise join).

Not sure whether this should be backpatched. We didn't get any reports
until now, so it doesn't seem like a pressing issue. OTOH most users
won't actually notice this, they'll just get worse plans without
realizing there's a better option.


regards

[1]
https://www.postgresql.org/message-id/011937a3-7427-b99f-13f1-c07a127cf94c%40enterprisedb.com

[2] https://www.postgresql.org/message-id/4006636.1618577893%40sss.pgh.pa.us

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

Previous
From: "Andrey V. Lepikhov"
Date:
Subject: Re: Asynchronous Append on postgres_fdw nodes.
Next
From: vignesh C
Date:
Subject: Re: [HACKERS] logical decoding of two-phase transactions