Thread: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"

Earlier today in [1], a bug was reported regarding a problem with the
code added in 66c0185a3 where I'd failed to handle the case correctly
where the UNION's targetlist has columns which are not sortable.  For
pg_class, that's relfrozenxid, relminmxid and relacl.

The most minimal reproducer prior to the revert is:

set enable_hashagg=0;
explain (costs off) select '123'::xid union select '123'::xid;

There is still some ongoing discussion about this on the release
mailing list as per mentioned by Tom in the commit message in
7204f3591.

At some point that discussion is going to need to circle back onto
-hackers again, and since I've already written a patch to fix the
issue and un-revert Tom's revert. I just wanted a place on -hackers to
allow that code to be viewed and discussed.  I did also post a patch
on [2], but that no longer applies to master due to the revert.

I'll allow the RMT to choose where the outcome of the RMT decision
goes.  Let this thread be for at least the coding portion of this or
be my thread for this patch for the v18 cycle if the RMT rules in
favour of keeping that code reverted for v17.

I've attached 2 patches.

0001 is a simple revert of Tom's revert (7204f3591).
0002 fixes the issue reported by Hubert.

If anyone wants to have a look, I'd be grateful for that.  Tom did
call for further review after this being the 4th issue reported for
66c0185a3.

David

[1] https://postgr.es/message-id/Zktzf926vslR35Fv%40depesz.com
[2] https://www.postgresql.org/message-id/CAApHDvpDQh1NcL7nAsd3YAKj4vgORwesB3GYuNPnEXXRfA2g4w%40mail.gmail.com

Attachment
On 21/05/2024 05:58, David Rowley wrote:
> Let this thread be for at least the coding portion of this or be my
> thread for this patch for the v18 cycle if the RMT rules in favour of
> keeping that code reverted for v17.
> 
> I've attached 2 patches.
> 
> 0001 is a simple revert of Tom's revert (7204f3591).
> 0002 fixes the issue reported by Hubert.
> 
> If anyone wants to have a look, I'd be grateful for that.  Tom did
> call for further review after this being the 4th issue reported for
> 66c0185a3.

My planner experience is a bit rusty, but I took a quick look. Looks 
generally OK to me. Some comments below:

> +    /* For for UNIONs (not UNION ALL), try sorting, if sorting is possible */

Duplicated word: "For for"

> /*
>  * build_setop_child_paths
>  *        Build paths for the set op child relation denoted by 'rel'.
>  *
>  * interesting_pathkeys: if not NIL, also include paths that suit these
>  * pathkeys, sorting any unsorted paths as required.
>  * *pNumGroups: if not NULL, we estimate the number of distinct groups
>  *        in the result, and store it there

The indentation on 'interesting_pathkeys' and '*pNumGroups' is inconsistent.

I have a vague feeling that this comment deserves to be longer. The 
function does a lot of things. How is 'child_tlist' different from 
rel->reltarget for example?

'interesting_pathkeys' is modified by the call to 
add_setop_child_rel_equivalences(): it adds members to the 
EquivalenceClasses of the pathkeys. Is that worth mentioning here, or is 
that obvious to someone who know more about the planner?

>         /*
>          * Create paths to suit final sort order required for setop_pathkeys.
>          * Here we'll sort the cheapest input path (if not sorted already) and
>          * incremental sort any paths which are partially sorted.
>          */
>         is_sorted = pathkeys_count_contained_in(setop_pathkeys,
>                                                 subpath->pathkeys,
>                                                 &presorted_keys);
> 
>         if (!is_sorted)
>         {

Maybe also mention that if it's already sorted, it's used as is.

BTW, could the same machinery be used for INTERSECT as well? There was a 
brief mention of that in the original thread, but I didn't understand 
the details. Not for v17, but I'm curious. I was wondering if 
build_setop_child_paths() should be named build_union_child_paths(), 
since it's only used with UNIONs, but I guess it could be used as is for 
INTERSECT too.


# Testing

postgres=# begin; create table foo as select i from generate_series(1, 
1000000) i; create index on foo (i); commit;
BEGIN
SELECT 1000000
CREATE INDEX
COMMIT
postgres=# set enable_seqscan=off;
SET
postgres=# explain (select 1 as i union select i from foo) order by i;
                                               QUERY PLAN 

------------------------------------------------------------------------------------------------------
  Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
    ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
          Sort Key: (1)
          ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
                ->  Result  (cost=0.00..0.01 rows=1 width=4)
                ->  Index Only Scan using foo_i_idx on foo 
(cost=0.42..26038.42 rows=1000000 width=4)
(6 rows)

I'm disappointed it couldn't produce a MergeAppend plan. If you replace 
the "union" with "union all" you do get a MergeAppend.

Some more cases where I hoped for a MergeAppend:

postgres=# explain (select i, 'foo' from foo union select i, 'foo' from 
foo) order by 1;
                                                  QUERY PLAN 

-------------------------------------------------------------------------------------------------------------
  Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
    ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
          Sort Key: foo.i, ('foo'::text)
          ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
                ->  Index Only Scan using foo_i_idx on foo 
(cost=0.42..26038.42 rows=1000000 width=36)
                ->  Index Only Scan using foo_i_idx on foo foo_1 
(cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)


postgres=# explain (select 'foo', i from foo union select 'bar', i from 
foo) order by 1;
                                                  QUERY PLAN 

-------------------------------------------------------------------------------------------------------------
  Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
    ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
          Sort Key: ('foo'::text), foo.i
          ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
                ->  Index Only Scan using foo_i_idx on foo 
(cost=0.42..26038.42 rows=1000000 width=36)
                ->  Index Only Scan using foo_i_idx on foo foo_1 
(cost=0.42..26038.42 rows=1000000 width=36)
(6 rows)


The following two queries are the same from the user's point of view, 
but one is written using WITH:

postgres=# explain (select i from foo union (select 1::int order by 1) 
union select i from foo) order by 1;
                                                  QUERY PLAN 

------------------------------------------------------------------------------------------------------------
  Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
    ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
          Sort Key: foo.i
          ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)
                ->  Index Only Scan using foo_i_idx on foo 
(cost=0.42..26038.42 rows=1000000 width=4)
                ->  Result  (cost=0.00..0.01 rows=1 width=4)
                ->  Index Only Scan using foo_i_idx on foo foo_1 
(cost=0.42..26038.42 rows=1000000 width=4)
(7 rows)

postgres=# explain with x (i) as (select 1::int order by 1)  (select i 
from foo union select i from x union select i from foo) order by 1;
                                               QUERY PLAN 

------------------------------------------------------------------------------------------------------
  Unique  (cost=0.89..82926.54 rows=2000001 width=4)
    ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
          Sort Key: foo.i
          ->  Index Only Scan using foo_i_idx on foo 
(cost=0.42..26038.42 rows=1000000 width=4)
          ->  Sort  (cost=0.02..0.03 rows=1 width=4)
                Sort Key: (1)
                ->  Result  (cost=0.00..0.01 rows=1 width=4)
          ->  Index Only Scan using foo_i_idx on foo foo_1 
(cost=0.42..26038.42 rows=1000000 width=4)
(8 rows)

I would've expected a MergeAppend in both cases.


None of these test cases are broken as such, you just don't get the 
benefit of the optimization. I suspect they might all have the same root 
cause, as they all involve constants in the target list. I think that's 
a pretty common use case of UNION though.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




On 2024-May-21, David Rowley wrote:

> I've attached 2 patches.
> 
> 0001 is a simple revert of Tom's revert (7204f3591).
> 0002 fixes the issue reported by Hubert.

I would like to request that you don't keep 0001's message as you have
it here.  It'd be more readable to take 66c0185a3d14's whole commit
message with a small suffix like "try 2" in the commit title, and add an
additional second paragraph stating it was transiently reverted by
7204f35919b7.  Otherwise it's harder to make sense of the commit on its
own later.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



On Wed, 22 May 2024 at 00:35, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-May-21, David Rowley wrote:
>
> > I've attached 2 patches.
> >
> > 0001 is a simple revert of Tom's revert (7204f3591).
> > 0002 fixes the issue reported by Hubert.
>
> I would like to request that you don't keep 0001's message as you have
> it here.  It'd be more readable to take 66c0185a3d14's whole commit
> message with a small suffix like "try 2" in the commit title, and add an
> additional second paragraph stating it was transiently reverted by
> 7204f35919b7.  Otherwise it's harder to make sense of the commit on its
> own later.

Thanks for having a look.  I was planning to have the commit message
as per attached. I'd only split the patch for ease of review per
request of Tom. I should have mentioned that here.

I would adjust the exact wording in the final paragraph as required
depending on what plan materialises.

This also fixes up the comment stuff that Heikki mentioned.

David

Attachment
On Tue, May 21, 2024 at 8:44 AM David Rowley <dgrowleyml@gmail.com> wrote:
> Thanks for having a look.  I was planning to have the commit message
> as per attached. I'd only split the patch for ease of review per
> request of Tom. I should have mentioned that here.
>
> I would adjust the exact wording in the final paragraph as required
> depending on what plan materialises.
>
> This also fixes up the comment stuff that Heikki mentioned.

The consensus on pgsql-release was to unrevert this patch and commit
the fix now, rather than waiting for the next beta. However, the
consensus was also to push the un-revert as a separate commit from the
bug fix, rather than together as suggested by Álvaro. Since time is
short due to the impending release and it's very late where you are,
I've taken care of this. Hope that's OK.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com



On Wed, 22 May 2024 at 05:36, Robert Haas <robertmhaas@gmail.com> wrote:
> The consensus on pgsql-release was to unrevert this patch and commit
> the fix now, rather than waiting for the next beta. However, the
> consensus was also to push the un-revert as a separate commit from the
> bug fix, rather than together as suggested by Álvaro. Since time is
> short due to the impending release and it's very late where you are,
> I've taken care of this. Hope that's OK.

Thanks for handling that. It's much appreciated.

David



(Thanks for your review. I'm sorry I didn't have time and energy to
respond properly until now)

On Tue, 21 May 2024 at 23:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> BTW, could the same machinery be used for INTERSECT as well? There was a
> brief mention of that in the original thread, but I didn't understand
> the details. Not for v17, but I'm curious. I was wondering if
> build_setop_child_paths() should be named build_union_child_paths(),
> since it's only used with UNIONs, but I guess it could be used as is for
> INTERSECT too.

I'd previously thought about that, but when I thought about it I'd
considered getting rid of the SetOp Intersect and replacing with a
join.  To do that my conclusion was that we'd first need to improve
joins using IS NOT DISTINCT FROM, as that's the behaviour we need for
correct setop NULL handling.  However, on relooking, I see that we
could still use SetOp Intersect with the flags injected into the
targetlist and get sorted results to it via Merge Append rather than
Append.  That might require better Const handling than what's in the
patch today due to the 1/0 flag that gets added to the subquery tlist.
I was unsure how much trouble to go to for INTERSECT. I spent about 7
years in a job writing queries and don't recall ever feeling the need
to use INTERSECT.  I did use EXCEPT, however... like at least once.
I'll probably circle back to it one day. People maybe don't use it
because it's so terribly optimised.

> # Testing
>
> postgres=# begin; create table foo as select i from generate_series(1,
> 1000000) i; create index on foo (i); commit;
> BEGIN
> SELECT 1000000
> CREATE INDEX
> COMMIT
> postgres=# set enable_seqscan=off;
> SET
> postgres=# explain (select 1 as i union select i from foo) order by i;
>                                                QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------
>   Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
>     ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
>           Sort Key: (1)
>           ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
>                 ->  Result  (cost=0.00..0.01 rows=1 width=4)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=4)
> (6 rows)
>
> I'm disappointed it couldn't produce a MergeAppend plan. If you replace
> the "union" with "union all" you do get a MergeAppend.
>
> Some more cases where I hoped for a MergeAppend:

I've not looked again in detail, but there was some discussion along
these lines in [1].  I think the problem is down to how we remove
redundant PathKeys when the EquivalenceClass has a Const. There can
only be 1 value, so no need for a PathKey to represent that. The
problem with that comes with lack of equivalence visibility through
subqueries.  The following demonstrates:

create table ab(a int, b int, primary key(a,b));
set enable_seqscan=0;
set enable_bitmapscan=0;

explain (costs off) select * from (select * from ab where a=1 order by
b) order by a,b;
                QUERY PLAN
-------------------------------------------
 Sort
   Sort Key: ab.a, ab.b
   ->  Index Only Scan using ab_pkey on ab
         Index Cond: (a = 1)
(4 rows)

explain (costs off) select * from (select * from ab where a=1 order by
b) order by b;
             QUERY PLAN
-------------------------------------
 Index Only Scan using ab_pkey on ab
   Index Cond: (a = 1)
(2 rows)

Because the subquery only publishes that it's ordered by "b", the
outer query thinks it needs to sort on "a,b".  That's a wasted effort
since the subquery has an equivalence class for "a" with a constant.
The outer query doesn't know that.

> postgres=# explain (select i, 'foo' from foo union select i, 'foo' from
> foo) order by 1;
>                                                   QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
>   Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
>     ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
>           Sort Key: foo.i, ('foo'::text)
>           ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=36)
> (6 rows)
>
>
> postgres=# explain (select 'foo', i from foo union select 'bar', i from
> foo) order by 1;
>                                                   QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
>   Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
>     ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
>           Sort Key: ('foo'::text), foo.i
>           ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=36)
> (6 rows)

This isn't great. I think it's for the same reason as mentioned above.
I didn't test, but I think the patch in [1] should fix it.  I need to
spend more time on it before proposing it for v18. It adds some
possibly expensive lookups and requires recursively searching
PathKeys. It's quite complex and needs more study.

> The following two queries are the same from the user's point of view,
> but one is written using WITH:
>
> postgres=# explain (select i from foo union (select 1::int order by 1)
> union select i from foo) order by 1;
>                                                   QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------
>   Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
>     ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
>           Sort Key: foo.i
>           ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=4)
>                 ->  Result  (cost=0.00..0.01 rows=1 width=4)
>                 ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=4)
> (7 rows)
>
> postgres=# explain with x (i) as (select 1::int order by 1)  (select i
> from foo union select i from x union select i from foo) order by 1;
>                                                QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------
>   Unique  (cost=0.89..82926.54 rows=2000001 width=4)
>     ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
>           Sort Key: foo.i
>           ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=4)
>           ->  Sort  (cost=0.02..0.03 rows=1 width=4)
>                 Sort Key: (1)
>                 ->  Result  (cost=0.00..0.01 rows=1 width=4)
>           ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=4)
> (8 rows)
>
> I would've expected a MergeAppend in both cases.

That's surprising. I don't have an answer without debugging and I
can't quite motivate myself to do that right now for this patch.

> None of these test cases are broken as such, you just don't get the
> benefit of the optimization. I suspect they might all have the same root
> cause, as they all involve constants in the target list. I think that's
> a pretty common use case of UNION though.

It's true that there are quite a few things left on the table here. I
think the refactoring work that has been done moves some of the
barriers away for future improvements. There just wasn't enough time
to get those done for v17. I hope to get some time and energy for it
in v18.  I'm just thankful that you found no bugs. If you do happen to
find any, I can tell you a good time not to report them! :)

David

[1] https://www.postgresql.org/message-id/CAApHDvqo1rV8O4pMU2-22iTASBXgnm4kbHF6A8_VMqiDR3hG8A@mail.gmail.com