Thread: Remove extra Sort node above a btree-compatible Index Scan
Hello hackers,
While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
I noticed some unexpected plan diffs when running the xtree and treeb
test modules. Specifically, an additional Sort node appears on top of
an Index Only Scan, even when the index key and sort key are the same.
For example:
--- src/test/modules/xtree/expected/interval.out
+++ src/test/modules/xtree/results/interval.out
@@ -375,10 +375,12 @@
SET enable_seqscan TO false;
EXPLAIN (COSTS OFF)
SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
- QUERY PLAN
---------------------------------------------------------------------
- Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
-(1 row)
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Sort
+ Sort Key: f1
+ -> Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
+(3 rows)
I’ve attached a patch that removes this unnecessary Sort node for
B-tree-compatible indexes. This change should:
- Reduce the number of test failures in the xtree module from 43 to 30
- Reduce the size of regression.diffs from 234K to 123K
Since pathkey comparison is a hot path in query planning and exercised
by many test queries, I plan to gather more performance metrics.
However, in a simple test running make installcheck with and without
the patch, I observed no negative impact on the runtime of the
regression test suite (which doesn’t include other btree-like indexes)
and a positive impact on the same regression tests for xtree.
Regression tests (same plans):
w/o patch:
make installcheck 1.36s user 2.21s system 12% cpu 28.018 total
w/ patch:
make installcheck 1.32s user 2.12s system 12% cpu 28.089 total
xtree tests:
w/o patch (inferior plan w/ extra sort node):
make installcheck 1.52s user 2.42s system 10% cpu 36.817 total
w/ patch (good plan):
make installcheck 1.52s user 2.48s system 12% cpu 32.201 total
I’ve marked the patch as no-cfbot, as it applies only on top of the
aforementioned patch series [1].
Thoughts?
[1] https://www.postgresql.org/message-id/a5dfb7cd-7a89-48ab-a913-e5304eee0854%40eisentraut.org
Best,
Alex
While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
I noticed some unexpected plan diffs when running the xtree and treeb
test modules. Specifically, an additional Sort node appears on top of
an Index Only Scan, even when the index key and sort key are the same.
For example:
--- src/test/modules/xtree/expected/interval.out
+++ src/test/modules/xtree/results/interval.out
@@ -375,10 +375,12 @@
SET enable_seqscan TO false;
EXPLAIN (COSTS OFF)
SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
- QUERY PLAN
---------------------------------------------------------------------
- Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
-(1 row)
+ QUERY PLAN
+--------------------------------------------------------------------------
+ Sort
+ Sort Key: f1
+ -> Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
+(3 rows)
I’ve attached a patch that removes this unnecessary Sort node for
B-tree-compatible indexes. This change should:
- Reduce the number of test failures in the xtree module from 43 to 30
- Reduce the size of regression.diffs from 234K to 123K
Since pathkey comparison is a hot path in query planning and exercised
by many test queries, I plan to gather more performance metrics.
However, in a simple test running make installcheck with and without
the patch, I observed no negative impact on the runtime of the
regression test suite (which doesn’t include other btree-like indexes)
and a positive impact on the same regression tests for xtree.
Regression tests (same plans):
w/o patch:
make installcheck 1.36s user 2.21s system 12% cpu 28.018 total
w/ patch:
make installcheck 1.32s user 2.12s system 12% cpu 28.089 total
xtree tests:
w/o patch (inferior plan w/ extra sort node):
make installcheck 1.52s user 2.42s system 10% cpu 36.817 total
w/ patch (good plan):
make installcheck 1.52s user 2.48s system 12% cpu 32.201 total
I’ve marked the patch as no-cfbot, as it applies only on top of the
aforementioned patch series [1].
Thoughts?
[1] https://www.postgresql.org/message-id/a5dfb7cd-7a89-48ab-a913-e5304eee0854%40eisentraut.org
Best,
Alex
Attachment
On Fri, Feb 28, 2025 at 12:23 AM Alexandra Wang <alexandra.wang.oss@gmail.com> wrote: > While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1], > I noticed some unexpected plan diffs when running the xtree and treeb > test modules. Specifically, an additional Sort node appears on top of > an Index Only Scan, even when the index key and sort key are the same. > For example: > > --- src/test/modules/xtree/expected/interval.out > +++ src/test/modules/xtree/results/interval.out What's the xtree module? There's no such test module? -- Peter Geoghegan
Alexandra Wang <alexandra.wang.oss@gmail.com> writes: > I’ve attached a patch that removes this unnecessary Sort node for > B-tree-compatible indexes. This does not look right at all. You can't just ignore the opfamily etc. while deciding whether two pathkeys represent the same sort ordering, as you did in create_mergejoin_plan(). I don't like pathkeys_have_same_sortop() either. The pathkey data structures were designed to let pointer comparison be sufficient for deciding whether pathkeys are equivalent: see the "canonical pathkey" stuff in pathkeys.c. I think that this patch may be band-aiding over some breakage of that concept, but you've not provided enough context to figure out what the underlying problem is. regards, tom lane
Hi Peter,
On Thu, Feb 27, 2025 at 11:31 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Feb 28, 2025 at 12:23 AM Alexandra Wang
<alexandra.wang.oss@gmail.com> wrote:
> While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
> I noticed some unexpected plan diffs when running the xtree and treeb
> test modules. Specifically, an additional Sort node appears on top of
> an Index Only Scan, even when the index key and sort key are the same.
> For example:
>
> --- src/test/modules/xtree/expected/interval.out
> +++ src/test/modules/xtree/results/interval.out
What's the xtree module? There's no such test module?
Thank you so much for asking!
I borrowed the following three patches from the Index AM API cleanup
thread [1] and attached them here for reference:
thread [1] and attached them here for reference:
v20-0001: TEST - Add loadable modules as tests of the AM API
v20-0003: TEST - Stop requiring BTREE_AM_OID outside btree
v20-0012: Use CompareType more and StrategyNumber less
The v20-0001 patch introduces the xtree loadable test module. Xtree as
an extension is a shallow copy of the btree index, it should behave
exactly like btree. Specifically, it uses the same operators and
strategy numbers as btree, and all of its IndexAMRoutine methods point
to the corresponding btree methods. The only difference is that xtree
belongs to a different opfamily.
v20-0001 adds two loadable modules:
src/test/module/xtree (a shallow copy of btree)
src/test/module/xhash (a shallow copy of hash, not relevant to this discussion)
These test modules are not meant for commit; they are used solely for
testing the Index AM API.
By running "make check" in src/test/module/xtree, a script included in
the patch copies all tests from src/test/regress and replaces all
btree indexes with xtree indexes, allowing verification of how the
Index AM API works with non-core indexes.
The other two patches serve the following purposes:
v20-0003 removes BTREE_AM_OID-specific assertions outside btree code,
allowing other Index AMs.
v20-0012 is nice to have since it also touches the PathKey struct and
adds a CompareType field in PathKey. CompareType is more generic for
sort ordering compared to AM-specific strategy numbers. I’d like to
include this patch because even if two AMs have different strategy
numbers, they could still map to the same CompareType. When it comes
to deciding whether two PathKeys represent the same ordering, what
really matters is the underlying sort operator, and CompareType is
very relevant in that regard.
So, with the above patches applied (v20-0012 is optional for this
demonstration), I can run the following queries:
-- setup
create extension xtree;
create table t (b int, x int);
create index on t using btree(b);
create index on t using xtree(x);
insert into t select i, i from generate_series(1, 1000)i;
analyze t;
-- queries
explain (costs off) select b from t where b < 10 order by (b);
explain (costs off) select x from t where x < 10 order by (x);
-- output:
test=# explain (costs off) select b from t where b < 10 order by (b);
QUERY PLAN
------------------------------------
Index Only Scan using t_b_idx on t
Index Cond: (b < 10)
(2 rows)
test=# explain (costs off) select x from t where x < 10 order by (x);
QUERY PLAN
------------------------------------------
Sort
Sort Key: x
-> Index Only Scan using t_x_idx on t
Index Cond: (x < 10)
(4 rows)
Notice that the xtree index scan introduces an unnecessary Sort node,
even though the Index Only Scan already returns rows of x sorted by
the same operator (int4lt (”<”)) used in the ORDER BY (x) clause.
The problem I'm trying to solve is to eliminate this redundant sort
for btree-like index AMs.
[1] https://www.postgresql.org/message-id/a5dfb7cd-7a89-48ab-a913-e5304eee0854%40eisentraut.org
Best,
Alex