Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION" - Mailing list pgsql-hackers

From David Rowley
Subject Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"
Date
Msg-id CAApHDvpSnhmyioG9TLzM5yBH0qre963sZ5Rd+fnBaZ__B8jbrA@mail.gmail.com
Whole thread Raw
In response to Re: Path to unreverting "Allow planner to use Merge Append to efficiently implement UNION"  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
(Thanks for your review. I'm sorry I didn't have time and energy to
respond properly until now)

On Tue, 21 May 2024 at 23:48, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> BTW, could the same machinery be used for INTERSECT as well? There was a
> brief mention of that in the original thread, but I didn't understand
> the details. Not for v17, but I'm curious. I was wondering if
> build_setop_child_paths() should be named build_union_child_paths(),
> since it's only used with UNIONs, but I guess it could be used as is for
> INTERSECT too.

I'd previously thought about that, but when I thought about it I'd
considered getting rid of the SetOp Intersect and replacing with a
join.  To do that my conclusion was that we'd first need to improve
joins using IS NOT DISTINCT FROM, as that's the behaviour we need for
correct setop NULL handling.  However, on relooking, I see that we
could still use SetOp Intersect with the flags injected into the
targetlist and get sorted results to it via Merge Append rather than
Append.  That might require better Const handling than what's in the
patch today due to the 1/0 flag that gets added to the subquery tlist.
I was unsure how much trouble to go to for INTERSECT. I spent about 7
years in a job writing queries and don't recall ever feeling the need
to use INTERSECT.  I did use EXCEPT, however... like at least once.
I'll probably circle back to it one day. People maybe don't use it
because it's so terribly optimised.

> # Testing
>
> postgres=# begin; create table foo as select i from generate_series(1,
> 1000000) i; create index on foo (i); commit;
> BEGIN
> SELECT 1000000
> CREATE INDEX
> COMMIT
> postgres=# set enable_seqscan=off;
> SET
> postgres=# explain (select 1 as i union select i from foo) order by i;
>                                                QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------
>   Unique  (cost=144370.89..149370.89 rows=1000001 width=4)
>     ->  Sort  (cost=144370.89..146870.89 rows=1000001 width=4)
>           Sort Key: (1)
>           ->  Append  (cost=0.00..31038.44 rows=1000001 width=4)
>                 ->  Result  (cost=0.00..0.01 rows=1 width=4)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=4)
> (6 rows)
>
> I'm disappointed it couldn't produce a MergeAppend plan. If you replace
> the "union" with "union all" you do get a MergeAppend.
>
> Some more cases where I hoped for a MergeAppend:

I've not looked again in detail, but there was some discussion along
these lines in [1].  I think the problem is down to how we remove
redundant PathKeys when the EquivalenceClass has a Const. There can
only be 1 value, so no need for a PathKey to represent that. The
problem with that comes with lack of equivalence visibility through
subqueries.  The following demonstrates:

create table ab(a int, b int, primary key(a,b));
set enable_seqscan=0;
set enable_bitmapscan=0;

explain (costs off) select * from (select * from ab where a=1 order by
b) order by a,b;
                QUERY PLAN
-------------------------------------------
 Sort
   Sort Key: ab.a, ab.b
   ->  Index Only Scan using ab_pkey on ab
         Index Cond: (a = 1)
(4 rows)

explain (costs off) select * from (select * from ab where a=1 order by
b) order by b;
             QUERY PLAN
-------------------------------------
 Index Only Scan using ab_pkey on ab
   Index Cond: (a = 1)
(2 rows)

Because the subquery only publishes that it's ordered by "b", the
outer query thinks it needs to sort on "a,b".  That's a wasted effort
since the subquery has an equivalence class for "a" with a constant.
The outer query doesn't know that.

> postgres=# explain (select i, 'foo' from foo union select i, 'foo' from
> foo) order by 1;
>                                                   QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
>   Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
>     ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
>           Sort Key: foo.i, ('foo'::text)
>           ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=36)
> (6 rows)
>
>
> postgres=# explain (select 'foo', i from foo union select 'bar', i from
> foo) order by 1;
>                                                   QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------
>   Unique  (cost=380767.54..395767.54 rows=2000000 width=36)
>     ->  Sort  (cost=380767.54..385767.54 rows=2000000 width=36)
>           Sort Key: ('foo'::text), foo.i
>           ->  Append  (cost=0.42..62076.85 rows=2000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=36)
>                 ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=36)
> (6 rows)

This isn't great. I think it's for the same reason as mentioned above.
I didn't test, but I think the patch in [1] should fix it.  I need to
spend more time on it before proposing it for v18. It adds some
possibly expensive lookups and requires recursively searching
PathKeys. It's quite complex and needs more study.

> The following two queries are the same from the user's point of view,
> but one is written using WITH:
>
> postgres=# explain (select i from foo union (select 1::int order by 1)
> union select i from foo) order by 1;
>                                                   QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------
>   Unique  (cost=326083.66..336083.67 rows=2000001 width=4)
>     ->  Sort  (cost=326083.66..331083.67 rows=2000001 width=4)
>           Sort Key: foo.i
>           ->  Append  (cost=0.42..62076.87 rows=2000001 width=4)
>                 ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=4)
>                 ->  Result  (cost=0.00..0.01 rows=1 width=4)
>                 ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=4)
> (7 rows)
>
> postgres=# explain with x (i) as (select 1::int order by 1)  (select i
> from foo union select i from x union select i from foo) order by 1;
>                                                QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------
>   Unique  (cost=0.89..82926.54 rows=2000001 width=4)
>     ->  Merge Append  (cost=0.89..77926.54 rows=2000001 width=4)
>           Sort Key: foo.i
>           ->  Index Only Scan using foo_i_idx on foo
> (cost=0.42..26038.42 rows=1000000 width=4)
>           ->  Sort  (cost=0.02..0.03 rows=1 width=4)
>                 Sort Key: (1)
>                 ->  Result  (cost=0.00..0.01 rows=1 width=4)
>           ->  Index Only Scan using foo_i_idx on foo foo_1
> (cost=0.42..26038.42 rows=1000000 width=4)
> (8 rows)
>
> I would've expected a MergeAppend in both cases.

That's surprising. I don't have an answer without debugging and I
can't quite motivate myself to do that right now for this patch.

> None of these test cases are broken as such, you just don't get the
> benefit of the optimization. I suspect they might all have the same root
> cause, as they all involve constants in the target list. I think that's
> a pretty common use case of UNION though.

It's true that there are quite a few things left on the table here. I
think the refactoring work that has been done moves some of the
barriers away for future improvements. There just wasn't enough time
to get those done for v17. I hope to get some time and energy for it
in v18.  I'm just thankful that you found no bugs. If you do happen to
find any, I can tell you a good time not to report them! :)

David

[1] https://www.postgresql.org/message-id/CAApHDvqo1rV8O4pMU2-22iTASBXgnm4kbHF6A8_VMqiDR3hG8A@mail.gmail.com



pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: Shared detoast Datum proposal
Next
From: Ajin Cherian
Date:
Subject: Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding