Re: Consider explicit incremental sort for Append and MergeAppend - Mailing list pgsql-hackers
From: Richard Guo
Subject: Re: Consider explicit incremental sort for Append and MergeAppend
Date:
Msg-id: CAMbWs49wSNPPD=FOQqzjPNZ_N9EGDv=7-ou0dFgd0HSwP3fTAg@mail.gmail.com
List: pgsql-hackers
On Mon, May 19, 2025 at 10:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, May 15, 2025 at 9:03 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> > 2. IncrementalSort is not always more effective - it depends on the
> > column's number of groups. In my experience, a non-cost-based decision
> > one day meets the problematic case, and the people who stick with it are
> > much more confused than in the case when planner decision connected to
> > the costings - they trust the cost model or the cost model tuned by GUCs.
>
> I wonder if this could be fixed in nodeIncrementalSort.c. I think it's
> a problem to rely on planner estimates because planner estimates of
> the # of groups are quite unreliable. But at runtime, we can notice
> whether the groups are small or large and adjust the algorithm
> accordingly. In fact, it looks like we already have some logic for
> that:
>
> /*
>  * Sorting many small groups with tuplesort is inefficient. In order to
>  * cope with this problem we don't start a new group until the current one
>  * contains at least DEFAULT_MIN_GROUP_SIZE tuples (unfortunately this also
>  * means we can't assume small groups of tuples all have the same prefix keys.)
>  * When we have a bound that's less than DEFAULT_MIN_GROUP_SIZE we start looking
>  * for the new group as soon as we've met our bound to avoid fetching more
>  * tuples than we absolutely have to fetch.
>  */
> #define DEFAULT_MIN_GROUP_SIZE 32
>
> But maybe this logic needs to be further refined somehow. I can't
> shake the feeling that it's going to be really hard to have every
> place that uses incremental sort decide whether to use an incremental
> sort or a regular sort -- we should try to get to a place where it's
> safe to use an incremental sort when it's possible without worrying
> that it might actually be slower.

Agreed. In some cases we currently don't have the infrastructure to
consider both incremental sort and full sort and compare their costs --
for example, when inserting explicit sorts for mergejoins, or, as in
this case, for Append/MergeAppend.

Besides, I think the planner currently assumes that incremental sort is
always faster than full sort whenever there are presorted keys, and this
premise has been applied in various parts of the code. I checked all the
callers of create_incremental_sort_path, and they all follow the same
logic: if there are presorted keys, only incremental sort is considered.
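For reference, the callers all look roughly like this (paraphrased from
one of them, create_ordered_paths() in planner.c, with the argument
lists trimmed):

    /*
     * Only one of the two is ever built for a given input path: a full
     * sort when there are no presorted keys (or incremental sort is
     * disabled), and an incremental sort otherwise.
     */
    if (presorted_keys == 0 || !enable_incremental_sort)
        sorted_path = (Path *) create_sort_path(root, ordered_rel,
                                                input_path,
                                                root->sort_pathkeys,
                                                limit_tuples);
    else
        sorted_path = (Path *) create_incremental_sort_path(root,
                                                            ordered_rel,
                                                            input_path,
                                                            root->sort_pathkeys,
                                                            presorted_keys,
                                                            limit_tuples);

So the two are never costed against each other in the first place.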
Also, to understand how incremental sort performs in the worst case, I
ran the following test.

create table ab (a int not null, b int not null);
insert into ab
  select 0, x from generate_series(1, 999000) x
  union all
  select x % 1000 + 1, 0 from generate_series(999001, 1000000) x;
create index on ab (a);
vacuum analyze ab;

In this table, group 0 has 999000 rows, and the remaining groups 1-1000
have just 1 row each.

-- incremental sort
explain (analyze, costs off) select * from ab order by a,b;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------
 Incremental Sort (actual time=585.093..714.589 rows=1000000.00 loops=1)
   Sort Key: a, b
   Presorted Key: a
   Full-sort Groups: 33  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
   Pre-sorted Groups: 1  Sort Method: external merge  Average Disk: 17680kB  Peak Disk: 17680kB
   Buffers: shared hit=5273, temp read=2210 written=2222
   ->  Index Scan using ab_a_idx on ab (actual time=0.192..330.289 rows=1000000.00 loops=1)
         Index Searches: 1
         Buffers: shared hit=5273
 Planning Time: 0.354 ms
 Execution Time: 759.693 ms
(11 rows)

-- full sort
explain (analyze, costs off) select * from ab order by a,b;
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 Sort (actual time=570.755..831.825 rows=1000000.00 loops=1)
   Sort Key: a, b
   Sort Method: external merge  Disk: 17704kB
   Buffers: shared hit=5273, temp read=2213 written=2225
   ->  Index Scan using ab_a_idx on ab (actual time=0.187..327.701 rows=1000000.00 loops=1)
         Index Searches: 1
         Buffers: shared hit=5273
 Planning Time: 0.304 ms
 Execution Time: 877.112 ms
(9 rows)

So with the same subpath, incremental sort still outperforms full sort
even if there is a big skew in the number of rows per pre-sorted group
(which is a bit surprising to me). So I think it's generally safe to use
incremental sort whenever possible. There might be some corner cases
where incremental sort is slower than full sort, and I think it would be
best to address those in nodeIncrementalSort.c, as Robert suggested.

Thanks
Richard