Home > mailing lists

Even when the data is already ordered, MergeAppend still adds a Sort node - Mailing list pgsql-hackers

From	feichanghong
Subject	Even when the data is already ordered, MergeAppend still adds a Sort node
Date	July 18 17:37:56
Msg-id	tencent_5912614CE9ED3D3BE3BFB3FC03CAA49A4305@qq.com Whole thread Raw
Responses	Re: Even when the data is already ordered, MergeAppend still adds a Sort node
List	pgsql-hackers

Tree view

Hi,

Recently, I found while using `union all order by limit` that `MergeAppend`

adds a `Sort` node even when child node data is already ordered, leading to

low execution efficiency.

The issue can be reproduced with the following case (on the latest master

branch). In the execution plan below, although `t2` is ordered by `a, b` via

`t_a_b_idx`, a `Sort` node is still added, causing the outer `limit` to be

ineffective. Most of the time is spent on the `Sort` node.

```sql

create table t(a int, b int);

insert into t select 1, i from generate_series(1, 10000)i;

insert into t select i, i from generate_series(10000, 20000)i;

create index on t(a, b);

analyze t;

set enable_bitmapscan to off;

set enable_seqscan to off;

explain select a, b from (

(select a, b from t t1 where a > 19000 order by a, b)

union all

(select a, b from t t2 where a = 1 and b > 1 order by a, b)

) t order by a, b limit 1;

QUERY PLAN

--------------------------------------------------------------------------------------------------

Limit (cost=366.56..366.57 rows=1 width=8)

-> Merge Append (cost=366.56..531.05 rows=10999 width=8)

Sort Key: t1.a, t1.b

-> Index Only Scan using t_a_b_idx on t t1 (cost=0.29..29.79 rows=1000 width=8)

Index Cond: (a > 19000)

-> Sort (cost=366.26..391.26 rows=9999 width=8)

Sort Key: t2.a, t2.b

-> Index Only Scan using t_a_b_idx on t t2 (cost=0.29..316.27 rows=9999 width=8)

Index Cond: ((a = 1) AND (b > 1))

(9 rows)

```

I briefly analyzed the cause of this issue: in the `standard_qp_callback`

function, while setting `sort_pathkeys`, the `a` column is not added to

`sort_pathkeys` because its `EquivalenceClass` contains `Const`. Hence, `a`

is absent in `PlannerInfo->query_pathkeys`. The purpose of this might be to

reduce the number of Sort columns.

Subsequently, when `build_index_paths` generates `IndexPath`, the

`truncate_useless_pathkeys` function removes the `a` column from

`Path->pathkeys`.

Thus, this issue can be minimally reproduced with the following SQL:

```sql

explain select * from (select * from t where a > 19000 order by a, b) order by a, b limit 1;

QUERY PLAN

----------------------------------------------------------------------------------

Limit (cost=0.29..0.32 rows=1 width=8)

-> Index Only Scan using t_a_b_idx on t (cost=0.29..29.79 rows=1000 width=8)

Index Cond: (a > 19000)

(3 rows)

explain select * from (select * from t where a = 1 and b > 1 order by a, b) order by a, b limit 1;

QUERY PLAN

-----------------------------------------------------------------------------------------

Limit (cost=366.26..366.27 rows=1 width=8)

-> Sort (cost=366.26..391.26 rows=9999 width=8)

Sort Key: t.a, t.b

-> Index Only Scan using t_a_b_idx on t (cost=0.29..316.27 rows=9999 width=8)

Index Cond: ((a = 1) AND (b > 1))

(5 rows)

```

Should we retain the complete `pathkeys` in `Path->pathkeys` for use by the

upper layers of the subquery, rather than just keeping the portion trimmed by

`PlannerInfo->query_pathkeys`? I'm not sure if my understanding is correct.

Furthermore, is there a more efficient way to write this, to avoid the

`Sort` node mentioned above?

Best Regards,
Fei Changhong

pgsql-hackers by date:

From: Tender Wang
Date: 18 July, 17:33:50
Subject: Re: postgres_fdw could deparse ArrayCoerceExpr

From: jian he
Date: 18 July, 17:44:26
Subject: Re: pg_dump does not dump domain not-null constraint's comments

Even when the data is already ordered, MergeAppend still adds a Sort node - Mailing list pgsql-hackers

Previous

Next