Re: BUG #19408: Bad plan for UNION ALL subquery with outer WHERE, ORDER BY, LIMIT, and separate indexes - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #19408: Bad plan for UNION ALL subquery with outer WHERE, ORDER BY, LIMIT, and separate indexes
Date
Msg-id 592230.1771025767@sss.pgh.pa.us
Whole thread Raw
In response to BUG #19408: Bad plan for UNION ALL subquery with outer WHERE, ORDER BY, LIMIT, and separate indexes  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
PG Bug reporting form <noreply@postgresql.org> writes:
> The query planner generates a very suboptimal plan under the following
> circumstances:
> - Each table has an index supporting the WHERE clause
> - Each table has a different index supporting the ORDER BY clause
> - WHERE clause has a very low selectivity for some but not all of the tables
> involved

I poked into this a little bit, because as you say it seems like we
ought to be able to do better.  I found that the blame seems to attach
to generate_orderedappend_paths(), which is being very blinkered about
which combinations of subpaths it will consider for a MergeAppend.
Basically, for each relevant sort ordering (here, only ORDER BY y),
it wants to consider:

1. A MergeAppend in which every subpath returns a presorted result,
and each is the best on startup cost.

2. A MergeAppend in which every subpath returns a presorted result,
and each is the best on total cost.

3. A MergeAppend in which every subpath returns a presorted result,
and each is the best on fractional cost (i.e., estimated cost after
stopping early thanks to LIMIT).

It will consider sorting an unsorted subpath too, but only if there
is no presorted path, no matter how bad that path is.

In this example we need a combination of sorting one non-ordered
input and using one presorted input, and it won't consider that
because it sees a (bad) presorted path for the first input.

I experimented with the attached quick hack.  It arrives at the right
solution for your example, but it's still feeling rather blinkered.
It's not clear to me that the above rule always finds the best
combinations of subpaths.  I wonder if we should tear this code up and
devise some other organizing principle for selecting the subpaths to
consider.  (I can't say that I love the way in which partitioned-input
considerations have been shoved into it, either; maybe a rewrite could
factor that better.)

Another reason why this is only WIP quality is that it causes a bunch
of plan changes in the regression tests.  I haven't looked to see if
they're all reasonable.  Even if they're nominally better plans, some
of them may be defeating the point of the particular test by causing
it to test a different scenario.

            regards, tom lane

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 90275e25872..98b4be70500 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1944,6 +1944,7 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
             Path       *cheapest_startup,
                        *cheapest_total,
                        *cheapest_fractional = NULL;
+            Path        sorted_cheapest = {0};

             /* Locate the right paths, if they are available. */
             cheapest_startup =
@@ -1971,6 +1972,35 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
                 Assert(cheapest_total->param_info == NULL);
             }

+            /*
+             * Even if we found suitable ordered paths, they might be more
+             * expensive than sorting the cheapest-total path. ( We don't
+             * build a full representation of that sorted path here, just make
+             * sorted_cheapest valid enough to perform cost comparisons.)
+             *
+             * Break ties in favor of sorting the cheapest-total path, because
+             * that's probably a less risky solution than relying on an
+             * ordered path.
+             *
+             * XXX is it worth considering incremental sort here?  It seems in
+             * most cases that would not move the needle, so for now don't add
+             * the complexity.
+             */
+            cost_sort(&sorted_cheapest, root, NIL,
+                      childrel->cheapest_total_path->disabled_nodes,
+                      childrel->cheapest_total_path->total_cost,
+                      childrel->cheapest_total_path->rows,
+                      childrel->cheapest_total_path->pathtarget->width,
+                      0.0, work_mem, -1.0);
+
+            if (compare_path_costs(&sorted_cheapest, cheapest_startup,
+                                   STARTUP_COST) <= 0)
+                cheapest_startup = childrel->cheapest_total_path;
+
+            if (compare_path_costs(&sorted_cheapest, cheapest_total,
+                                   TOTAL_COST) <= 0)
+                cheapest_total = childrel->cheapest_total_path;
+
             /*
              * When building a fractional path, determine a cheapest
              * fractional path for each child relation too. Looking at startup
@@ -2012,14 +2042,22 @@ generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
                  * XXX We might consider partially sorted paths too (with an
                  * incremental sort on top). But we'd have to build all the
                  * incremental paths, do the costing etc.
-                 *
-                 * Also, notice whether we actually have different paths for
-                 * the "fractional" and "total" cases.  This helps avoid
-                 * generating two identical ordered append paths.
                  */
                 if (cheapest_fractional == NULL)
                     cheapest_fractional = cheapest_total;
-                else if (cheapest_fractional != cheapest_total)
+
+                /* Again, sorting the cheapest-total path could beat this */
+                if (compare_fractional_path_costs(&sorted_cheapest,
+                                                  cheapest_fractional,
+                                                  path_fraction) <= 0)
+                    cheapest_fractional = childrel->cheapest_total_path;
+
+                /*
+                 * Notice whether we actually have different paths for the
+                 * "fractional" and "total" cases.  This helps avoid
+                 * generating two identical ordered append paths.
+                 */
+                if (cheapest_fractional != cheapest_total)
                     fraction_neq_total = true;
             }


pgsql-bugs by date:

Previous
From: Noah Misch
Date:
Subject: Re: BUG #19406: substring(text) fails on valid UTF-8 toasted value in PostgreSQL 15.16
Next
From: Rafia Sabih
Date:
Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY