On Sat, 4 Jan 2025 at 06:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Actually ... why is that Sort there at all? The whole plan looks like
>
> regression=# explain (costs off)
> SELECT c2 AS ca2, c2 AS ca3 FROM t0
> UNION
> SELECT DISTINCT ca8 AS ca5, ca7 AS ca6
> FROM (SELECT c1 AS ca7, c2 AS ca8 FROM t0) AS ta1
> JOIN
> (SELECT c1 AS ca10, c1 AS ca11 FROM t0) AS ta2
> ON TRUE;
> QUERY PLAN
> ---------------------------------------------------------------------
> Unique
> -> Sort
> Sort Key: t0.c2, t0.c2
> -> Append
> -> Seq Scan on t0
> -> Subquery Scan on "*SELECT* 2"
> -> Sort
> Sort Key: t0_1.c2, t0_1.c1 USING <
> -> HashAggregate
> Group Key: t0_1.c2, t0_1.c1
> -> Nested Loop
> -> Seq Scan on t0 t0_1
> -> Materialize
> -> Seq Scan on t0 t0_2
> (14 rows)
>
> There is no value in forcing a sort of the subquery's output,
> and the previous code didn't do so:
The reason for the 2nd sort, (ignoring the invalidity of the pathkeys)
is due to how build_setop_child_paths() tries sorting the
cheapest_input_path. The problem is that the sorted path gets added
to the same RelOptInfo as the cheapest_input_path so if add_path()
sees they're fuzzily the same cost, the sorted has better pathkeys, so
the actual cheapest path is thrown away. I think the reason this
happens in this case is that there are just not that many rows to
sort, and it's fairly expensive to get those rows.
The reason that the planner doesn't end up doing a Merge Append is
because the sorted path for the first branch of the UNION has the same
column twice in the targetlist and only needs a single PathKey to
sort. Later, that causes the get_cheapest_path_for_pathkeys() in
generate_union_paths() not to find a matching sorted path since
union_pathkeys has 2 PathKeys. I think that might be fixed by the
patch I had being toying around with which I posted in [1]. If we had
something like that then that would make the apparent wasted sort less
of an issue here.
David
[1] https://www.postgresql.org/message-id/CAApHDvqo1rV8O4pMU2-22iTASBXgnm4kbHF6A8_VMqiDR3hG8A%40mail.gmail.com