Home > mailing lists

Re: Even when the data is already ordered, MergeAppend still adds a Sort node - Mailing list pgsql-hackers

From	feichanghong
Subject	Re: Even when the data is already ordered, MergeAppend still adds a Sort node
Date	July 20, 2025 11:49:16
Msg-id	tencent_6815C70B82A2D2D33B56867E67C0478D8906@qq.com Whole thread Raw
In response to	Re: Even when the data is already ordered, MergeAppend still adds a Sort node (Zhang Mingli <zmlpostgres@gmail.com>)
Responses	Re: Even when the data is already ordered, MergeAppend still adds a Sort node
List	pgsql-hackers

Tree view

On Jul 19, 2025, at 13:15, Zhang Mingli <zmlpostgres@gmail.com> wrote:

Hi,

On Jul 18, 2025 at 22:52 +0800, feichanghong <feichanghong@qq.com>, wrote:

explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------
Limit  (cost=366.26..366.27 rows=1 width=8)
->  Sort  (cost=366.26..391.26 rows=9999 width=8)
Sort Key: t.a, t.b
->  Index Only Scan using t_a_b_idx on t  (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(5 rows)
```

Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the
upper layers of the subquery, rather than just keeping the portion trimmed by
`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

The subquery has a qualifier a = 1 that forms an Equivalence Class (EC) whose ec_member contains a constant.
As a result, subroot->sort_pathkeys doesn't need to include column a.
However, in the outer query, there are no such qualifiers to form a similar EC, and the ORDER BY a, b clause means root->sort_pathkeys requires both columns a and b.
When convert_subquery_pathkeys is called, the subpath lacks the pathkeys for column a.

Furthermore, is there a more efficient way to write this, to avoid the
`Sort` node mentioned above?

Yes, your understanding is basically consistent with mine.

A simple solution is to add an EC using a qual:

EXPLAIN SELECT * FROM (SELECT * FROM t WHERE a = 1 AND b > 1 ORDER BY a, b) WHERE a = 1 ORDER BY a, b LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.29..0.32 rows=1 width=8)
-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)
Index Cond: ((a = 1) AND (b > 1))
(3 rows)

Thank you for your suggestion, this method can address simple subquery

scenarios. However, my situation involves a union all, so it's not possible to

add the corresponding equality qualifier at the top level. The SQL is as

follows:

```sql

explain select a, b from (

(select a, b from t t1 where a > 19000 order by a, b)

union all

(select a, b from t t2 where a = 1 and b > 1 order by a, b)

) t order by a, b limit 1;

```

Currently, I have not found a better way to rewrite this, except by optimizing

this scenario from the pg kernel side.

Best Regards,
Fei Changhong

pgsql-hackers by date:

From: Nikhil Kumar Veldanda
Date: 20 July 2025, 11:28:08
Subject: Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)

From: Konstantin Knizhnik
Date: 20 July 2025, 14:48:11
Subject: DSA overflow in hash join

Re: Even when the data is already ordered, MergeAppend still adds a Sort node - Mailing list pgsql-hackers

Previous

Next