Re: Properly pathify the union planner - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Properly pathify the union planner
Date
Msg-id CAMbWs49H5oKvsUM_SrPbrAPkNqabuxHacjWC+Qd77dmVyMyXEQ@mail.gmail.com
Whole thread Raw
In response to Re: Properly pathify the union planner  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Properly pathify the union planner
List pgsql-hackers

On Fri, Mar 8, 2024 at 11:31 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 Mar 2024 at 00:39, Richard Guo <guofenglinux@gmail.com> wrote:
> I would like to have another look, but it might take several days.
> Would that be too late?

Please look. Several days is fine. I'd like to start looking on Monday
or Tuesday next week.

I've had another look, and here are some comments that came to my mind.

* There are cases where the setop_pathkeys of a subquery does not match
the union_pathkeys generated in generate_union_paths() for sorting by
the whole target list.  In such cases, even if we have explicitly sorted
the paths of the subquery to meet the ordering required for its
setop_pathkeys, we cannot find appropriate ordered paths for
MergeAppend.  For instance,

explain (costs off)
(select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b);
                        QUERY PLAN
-----------------------------------------------------------
 Unique
   ->  Sort
         Sort Key: t1.a, t1.b
         ->  Append
               ->  Index Only Scan using t_a_b_idx on t t1
                     Filter: (a = b)
               ->  Index Only Scan using t_a_b_idx on t t2
                     Filter: (a = b)
(8 rows)

(Assume t_a_b_idx is a btree index on t(a, b))

In this query, the setop_pathkeys of the subqueries includes only one
PathKey because 'a' and 'b' are in the same EC inside the subqueries,
while the union_pathkeys of the whole query includes two PathKeys, one
for each target entry.  After we convert the setop_pathkeys to outer
representation, we'd notice that it does not match union_pathkeys.
Consequently, we are unable to recognize that the index scan paths are
already appropriately sorted, leading us to miss the opportunity to
utilize MergeAppend.

Not sure if this case is common enough to be worth paying attention to.

* In build_setop_child_paths() we also create properly sorted partial
paths, which seems not necessary because we do not support parallel
merge append, right?

* Another is minor and relates to cosmetic matters.  When we unique-ify
the result of a UNION, we take the number of distinct groups as equal to
the total input size.  For the Append path and Gather path, we use
'dNumGroups', which is 'rows' of the Append path.  For the MergeAppend
we use 'rows' of the MergeAppend path.  I believe they are supposed to
be the same, but I think it'd be better to keep them consistent: either
use 'dNumGroups' for all the three kinds of paths, or use 'path->rows'
for each path.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: WIP Incremental JSON Parser
Next
From: Mats Kindahl
Date:
Subject: Re: Hooking into ExplainOneQuery() complicated by missing standard_ExplainOneQuery