Thread: Re: Consider the number of columns in the sort cost model
Hi!

On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov <lepihov@gmail.com> wrote:
> Hi,
>
> I would like to propose a slight elaboration of the sort cost model.
> In practice, we frequently see the choice of suboptimal sortings, which
> slump performance by 10-50%.
>
> The key reason here is the optimiser's blindness to the fact that
> sorting calls a comparison operator for each pair of attributes in the
> tuple. Just for example:
>
> SET work_mem = '128MB';
> CREATE TABLE test (
>   x1 int,x2 int,x3 int,x4 int,x5 int,x6 int,x7 int,x8 int,payload text);
> INSERT INTO test (x1,x2,x3,x4,x5,x6,x7,x8,payload) (
>   SELECT 1,2,3,4,5,6,7,(1E5-x), 'abc'||x
>   FROM generate_series(1,1E5) AS x);
> VACUUM ANALYZE test;
>
> Let's execute the following three queries:
>
> EXPLAIN (ANALYZE)
> SELECT * FROM test WHERE x1<9E4 ORDER BY x8;
> EXPLAIN (ANALYZE)
> SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
> EXPLAIN (ANALYZE)
> SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x4,x5,x6,x7,x8;
>
> My laptop executes these three queries in 100ms, 300ms, and 560ms,
> respectively. At the same time, the costs of these query plans are
> similar. This corner case shows that sorting matters and in the corner
> case, dependence on the number of comparisons looks close to linear.

Indeed. My numbers are 44.167, 88.759, 136.864 ms.

> The patch in the attachment is a simplistic attempt to differentiate
> these sortings by the number of columns. It is not ideal because
> comparisons depend on the duplicates in each column, but it may be the
> subject of further work.
>
> Even such a trivial model allows us to resolve the case like the below:
> CREATE INDEX ON test (x1,x2,x3,x8);
> EXPLAIN (ANALYZE) SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
>
> The current master will choose SeqScan for four columns as well as for a
> single column. With the model, the optimiser will understand that
> sorting four columns costs more than sorting a single column and switch
> to IndexScan.
>
> --
> regards, Andrei Lepikhov

I confirm: before applying your patch, Index Scan is not chosen, whereas
after it, it is.

Two questions:

1) Why do we change the `cost_sort` definition? We can calculate the
length of `pathkeys` inside cost_sort if we want, just like we do inside
cost_gather_merge. No need to make extension developers suffer with an
API change, is there?

2) I'm not very sure about the (1.0 + cmpMultiplier) coefficient. I
understand that we need to multiply by something that is not 0 (and
that's why we add 1), and this is strictly better than multiplying by 2,
but why exactly is this formula like this, not
(1.0 + cmpMultiplier ^ 2 / 10) for example? Maybe we need some more
sophisticated approach?

Also, this patch needs to be rebased, again.

-- 
Best regards,
Kirill Reshke
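[Editor's note: for reference, a tiny standalone sketch (not part of the
patch) of how the per-comparison charge would scale under the proposed
(1.0 + cmpMultiplier) * cpu_operator_cost term versus the current flat
2.0 * cpu_operator_cost. It assumes the default cpu_operator_cost of
0.0025, roughly 90000 sorted rows from the example above, and about
log2(N) comparisons per tuple; the absolute numbers are illustrative
only.]

/*
 * Rough sketch: compare the comparison-cost term of cost_tuplesort()
 * on master (2.0 * cpu_operator_cost) with the proposed form
 * ((1.0 + nkeys) * cpu_operator_cost) for 1, 4 and 8 sort keys.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
	const double cpu_operator_cost = 0.0025;	/* PostgreSQL default */
	const double tuples = 90000.0;				/* rows passing x1<9E4 */
	const double logt = log2(tuples);			/* ~16.5 comparisons per tuple */
	const int	nkeys_list[] = {1, 4, 8};

	for (int i = 0; i < 3; i++)
	{
		int		nkeys = nkeys_list[i];
		double	master = 2.0 * cpu_operator_cost * tuples * logt;
		double	patched = (1.0 + nkeys) * cpu_operator_cost * tuples * logt;

		printf("nkeys=%d: master=%.0f patched=%.0f\n", nkeys, master, patched);
	}
	return 0;
}

[Under these assumptions the comparison term grows roughly linearly with
the number of sort keys (about 2:5:9 for 1, 4 and 8 keys), whereas master
charges the same for all three queries; it says nothing about how many
tiebreak comparisons actually occur, which is the point raised below.]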
On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I am suspicious of that but open to hearing other opinions. The
> coefficient incorporates knowledge about how many comparisons will be
> made with this sorting operator. The caller can have some additional
> knowledge about that. For example, IncrementalSort knows about groups -
> each group has the same prefix, which means each tuple comparison will
> inevitably cause a comparison for each column from the prefix. Also, it
> may be knowledge about the number of distinct values for each column,
> etc. I'm not sure we should elaborate cost_sort a lot here.

I agree that cost_sort() likely needs a bit of work to try to bring it
closer to the current reality. There have been plenty of performance
improvements to sort over the years and I don't recall the costing code
ever being touched; likely that's because there are just so many things
that are not costed correctly or not costed at all.

As for your patch, I'm suspicious that the general idea you're proposing
is an actual improvement. I've only glanced at the patch, but at least
from this fragment here:

@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
 {
 	Cost		startup_cost;
 	Cost		run_cost;
+	int			cmpMultiplier = npathkeys;

and:

 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-			   double tuples, int width,
+			   double tuples, int width, int cmpMultiplier,
 			   Cost comparison_cost, int sort_mem,
 			   double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
 		tuples = 2.0;

 	/* Include the default cost-per-comparison */
-	comparison_cost += 2.0 * cpu_operator_cost;
+	comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;

it seems you're just charging cpu_operator_cost * <number of columns to
sort by>. It seems like it won't be very hard to fool that into doing
the wrong thing when the first column to sort by is distinct or almost
distinct. There are going to be far fewer or no tiebreaker comparisons
in that case.

As mentioned by Kirill, I also don't understand the cost_sort signature
change. Why would you do that over just doing list_length(pathkeys)
within cost_sort()? You're throwing away a parameter that might be the
most useful one of the bunch for allowing better sort cost estimates.

I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key. That might require what we discussed
on another thread about having estimate_num_groups() better use the
n_distinct estimate from the EquivalenceMember with the least distinct
values. It'll be another question if all that can be done and this all
still perform well enough for cost_sort(). You may have to write it
first before we can figure that out. It may be possible to buy back some
of the overheads with some additional caching... Perhaps that could be
done within the EquivalenceClass.

David
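[Editor's note: to make the tiebreak point concrete, here is a
hypothetical sketch of what an ndistinct-aware multiplier could look
like. The function name and shape are illustrative, not taken from the
patch, and it assumes independent, roughly uniformly distributed
columns.]

/*
 * Expected number of per-column comparisons for one tuple-pair
 * comparison.  Key 1 is always compared; key k is only reached when all
 * previous keys tie, which (under the independence assumption) happens
 * with probability ~ prod(1/ndistinct_i) over the preceding keys.
 */
double
expected_cmp_multiplier(const double *ndistinct, int nkeys)
{
	double		multiplier = 0.0;
	double		tie_prob = 1.0;	/* probability that all previous keys tied */

	for (int k = 0; k < nkeys; k++)
	{
		multiplier += tie_prob;
		if (ndistinct[k] > 1.0)
			tie_prob /= ndistinct[k];
	}
	return multiplier;
}

[With a nearly distinct first key the tie probability collapses after
the first term and the multiplier stays close to 1, whereas a flat
"number of sort columns" charge would still scale the cost by nkeys.]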
On 10/15/24 12:15, David Rowley wrote:
> As for your patch, I'm suspicious that the general idea you're
> proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the approach and to introduce
consideration of the number of columns. A conservative development
approach has been preferred before.

> it seems you're just charging cpu_operator_cost * <number of columns
> to sort by>. It seems like it won't be very hard to fool that into
> doing the wrong thing when the first column to sort by is distinct or
> almost distinct. There are going to be far fewer or no tiebreaker
> comparisons in that case.
As I've written above, it is for starters. It allows us to analyse how
the balance between different types of orderings can change in extreme
cases. I can join this patch with the following one, implementing
differentiation by distinct values, but then the effect on the
regression tests will be smoothed out.

The primary idea is 1) to elaborate the GROUP-BY optimisation and 2) to
give the optimiser a tool to choose a more optimal sort by comparing
MergeAppend/Sort/IncrementalSort costs. The whole idea is implemented in
the branch [1] and described in the post [2]. Of course, we
differentiate sortings by the number of distinct values of the first
column (the only trustworthy statistic). It is not so difficult (but I
doubt the real necessity) to use extended statistics and reflect in the
cost the values of different combinations of columns.

> As mentioned by Kirill, I also don't understand the cost_sort
> signature change. Why would you do that over just doing
> list_length(pathkeys) within cost_sort()? You're throwing away a
> parameter that might be the most useful one of the bunch for allowing
> better sort cost estimates.
Ok, maybe it is too much for an intermediate patch. We can change the
interface later, if necessary.

> Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

-- 
regards, Andrei Lepikhov
Hi! Thank you for your work on this subject!

On 23.10.2024 04:39, Andrei Lepikhov wrote:
> On 15/10/2024 12:15, David Rowley wrote:
>> On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> I think maybe what is worth working on is seeing if you can better
>> estimate the number of tiebreak comparisons by checking the number of
>> distinct values for each sort key. That might require what we
>> discussed on another thread about having estimate_num_groups() better
>> use the n_distinct estimate from the EquivalenceMember with the least
>> distinct values. It'll be another question if all that can be done
>> and this all still perform well enough for cost_sort(). You may have
>> to write it first before we can figure that out. It may be possible
>> to buy back some of the overheads with some additional caching...
>> Perhaps that could be done within the EquivalenceClass.
> It seems that your idea with an equivalence class works.
> In the patchset attached, I show all the ideas (I initially wanted to
> discuss it step-by-step).
> In the attached, the first patch is about choosing an adequate
> expression from the list of EC members.
> The second one is the current patch, which considers the number of
> sort columns.
> Third patch - elaboration of the second patch. Use only the trustful
> statistics here - the number of ndistinct values on the first sorted
> column.
> And the last patch is a demo of how I'm going to use the previous
> three patches and add one more strategy to improve the order of
> columns in the GROUP-BY clause.

I have some questions about your patches.

1. You consider two lists, root->group_pathkeys and
root->processed_groupClause. As far as I understand, this is due to the
fact that the pathkey elements are identical to the group clause, and
this identity is checked precisely by this condition:

if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)

To be honest, I didn't find information about this in the code, but did
I understand everything correctly?

2. I noticed that statistics of distinct values are calculated several
times. Maybe I miss something, but this can be avoided if these
statistics can be saved after calculation. For example, I saw that it is
possible that you calculate the same statistic information for the same
equivalence members in cost_incremental_sort and identify_sort_ecmember.
Is it possible to store information in a structure and use it later?

3. I think you should initialize the variable ndist in your patch. I got
a compiler error while building your code:

costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized [-Werror=maybe-uninitialized]
 6694 |                         em->em_ndistinct = ndist;
      |                         ~~~~~~~~~~~~~~~~~^~~~~~~
costsize.c:6680:33: note: ‘ndist’ was declared here
 6680 |                 double          ndist;
      |                                 ^~~~~
cc1: all warnings being treated as errors
gmake[4]: *** [<builtin>: costsize.o] Error 1

4. Before executing the examine_variable function, I think you should
check that node is not NULL:

@@ -575,6 +575,8 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
 		 */
 		continue;

+		Assert (node != NULL);
+
 		examine_variable(root, node, 0, &vardata);
 		if (!HeapTupleIsValid(vardata.statsTuple))
 			continue;

-- 
Regards,
Alena Rybakina
Postgres Professional
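[Editor's note: regarding question 2, the caching being discussed might
look roughly like the following fragment. It is purely illustrative: it
assumes only the em_ndistinct field visible in the compiler output above
(and discussed in the replies below) plus the existing
examine_variable()/get_variable_numdistinct() machinery, and is not the
patch's actual code.]

/*
 * Illustrative sketch: compute the ndistinct estimate for an
 * equivalence member once, cache it in em_ndistinct and reuse it on
 * later calls.  Here 0 is assumed to mean "not computed yet" and -1
 * "no usable statistics".
 */
if (em->em_ndistinct == 0.0)
{
	VariableStatData vardata;
	bool		isdefault;
	double		ndist = -1.0;

	examine_variable(root, (Node *) em->em_expr, 0, &vardata);
	if (HeapTupleIsValid(vardata.statsTuple))
		ndist = get_variable_numdistinct(&vardata, &isdefault);
	ReleaseVariableStats(vardata);

	em->em_ndistinct = ndist;
}

[Whether the value lives in the EquivalenceMember itself or elsewhere is
up to the patch; the point is only that the catalogue lookup happens
once per member.]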
On 29.10.2024 05:47, Andrei Lepikhov wrote:
> On 10/28/24 16:48, Alena Rybakina wrote:
>> To be honest, I didn't find information about this in the code, but
>> did I understand everything correctly?
> Yes
Thanks for the confirmation.

>> 2. I noticed that statistics of distinct values are calculated several
>> times. Maybe I miss something, but this can be avoided if these
>> statistics can be saved after calculation. For example, I saw that it
>> is possible that you calculate the same statistic information for the
>> same equivalence members in cost_incremental_sort and
>> identify_sort_ecmember. Is it possible to store information in a
>> structure and use it later?
> Hmm, I don't see multiple calculations. em_ndistinct has been made
> specifically for this sake. Can you provide a specific case and code
> lines?
When you said that, I saw it. You are right, checking for 0 is what
stops us from useless recalculation. I noticed it right away.

>> 3. I think you should initialize the variable ndist in your patch. I
>> got a compiler error while building your code:
>>
>> costsize.c: In function ‘identify_sort_ecmember’:
>> costsize.c:6694:42: error: ‘ndist’ may be used uninitialized [-Werror=maybe-uninitialized]
>>  6694 |                         em->em_ndistinct = ndist;
>>       |                         ~~~~~~~~~~~~~~~~~^~~~~~~
>> costsize.c:6680:33: note: ‘ndist’ was declared here
>>  6680 |                 double          ndist;
>>       |                                 ^~~~~
>> cc1: all warnings being treated as errors
>> gmake[4]: *** [<builtin>: costsize.o] Error 1
> I think you can just update your compiler. But I added the ndist
> initialisation to make more compilers happy :).
Thank you :)

>> +		Assert (node != NULL);
>> +
>>  		examine_variable(root, node, 0, &vardata);
>>  		if (!HeapTupleIsValid(vardata.statsTuple))
>>  			continue;
> I don't think so. At least until you provide the case when the
> get_sortgroupclause_expr function returns NULL.
Yes, you are right; if this case is possible, then it is connected with
an error in the formation of the target list.

>> And the last patch is a demo of how I'm going to use the previous
>> three patches and add one more strategy to improve the order of
>> columns in the GROUP-BY clause.
> What's more, remember - the patch 0004 here is just to show the
> perspective and is still under construction.
> Anyway, thanks, I found out that the patch set doesn't apply correctly
> because of 828e94c. So, see the new version in the attachment.
I played around with the examples a bit and couldn't figure out
something. When I added the same values to different columns - firstly
in a, later in b - the order of the columns for the sort operation
doesn't change. Isn't this a mistake?

create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
insert into a values (NULL, NULL);
insert into a values (NULL, 1);
insert into a values (1, 1);
insert into a values (1, NULL);
create index a_x1_idx on a(x1);
create index b_x2_idx on b(x2);
create index a_y1_idx on a(y1);
create index b_y2_idx on b(y2);
insert into b select 1, 2 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 1, 3 from generate_series(3,30) as id;
explain analyze select a.x1, s.x2, s.y2 from a left join (select distinct * from b) s on a.x1=s.x2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Hash Right Join (cost=44.99..48.15 rows=5 width=12) (actual time=0.225..0.250 rows=8 loops=1)
Hash Cond: (b.x2 = a.x1)
-> HashAggregate (cost=43.90..46.16 rows=226 width=8) (actual time=0.117..0.123 rows=3 loops=1)
Group Key: b.x2, b.y2
Batches: 1 Memory Usage: 40kB
-> Seq Scan on b (cost=0.00..32.60 rows=2260 width=8) (actual time=0.030..0.044 rows=48 loops=1)
-> Hash (cost=1.04..1.04 rows=4 width=4) (actual time=0.073..0.074 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on a (cost=0.00..1.04 rows=4 width=4) (actual time=0.047..0.051 rows=4 loops=1)
Planning Time: 1.649 ms
Execution Time: 0.485 ms
(11 rows)
delete from b;
insert into b select 2, 1 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 3, 1 from generate_series(3,30) as id;
vacuum analyze;
explain analyze select a.x1, s.x2, s.y2 from a left join (select distinct * from b) s on a.x1=s.x2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1.79..2.86 rows=4 width=12) (actual time=0.083..0.090 rows=4 loops=1)
Hash Cond: (a.x1 = b.x2)
-> Seq Scan on a (cost=0.00..1.04 rows=4 width=4) (actual time=0.010..0.011 rows=4 loops=1)
-> Hash (cost=1.75..1.75 rows=3 width=8) (actual time=0.067..0.068 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> HashAggregate (cost=1.72..1.75 rows=3 width=8) (actual time=0.063..0.064 rows=3 loops=1)
Group Key: b.x2, b.y2
Batches: 1 Memory Usage: 24kB
-> Seq Scan on b (cost=0.00..1.48 rows=48 width=8) (actual time=0.006..0.014 rows=48 loops=1)
Planning Time: 0.391 ms
Execution Time: 0.151 ms
(11 rows)
I also noticed a misspelling, "significance phusics", in this comment:
/*
* Calculate multiplier reflecting the number of comparisons which executor
* have to perform during the sort with this specific order of columns.
*
* The comparison factor f = 1.+F(pathkeys). There 1. incapsulates the
* second-order of significance phusics which cost function doesn't consider.
-- 
Regards,
Alena Rybakina
Postgres Professional
> I played around with the examples a bit and couldn't figure out
> something. When I added the same values to different columns -
> firstly in a, later in b - the order of the columns for the sort
> operation doesn't change. Isn't this a mistake?
>
> create table a (x1 int, y1 int);
> create table b (x2 int, y2 int);
> insert into a values (NULL, NULL);
> insert into a values (NULL, 1);
> insert into a values (1, 1);
> insert into a values (1, NULL);
>
> create index a_x1_idx on a(x1);
> create index b_x2_idx on b(x2);
> create index a_y1_idx on a(y1);
> create index b_y2_idx on b(y2);
>
> insert into b select 1, 2 from generate_series(11,20) as id;
> insert into b select 1, 1 from generate_series(1,10) as id;
> insert into b select 1, 3 from generate_series(3,30) as id;
>
> explain analyze select a.x1, s.x2, s.y2 from a left join (select
> distinct * from b) s on a.x1=s.x2;
>                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=44.99..48.15 rows=5 width=12) (actual time=0.225..0.250 rows=8 loops=1)
>    Hash Cond: (b.x2 = a.x1)
>    ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8) (actual time=0.117..0.123 rows=3 loops=1)
>          Group Key: b.x2, b.y2
>          Batches: 1  Memory Usage: 40kB
>          ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.030..0.044 rows=48 loops=1)
>    ->  Hash  (cost=1.04..1.04 rows=4 width=4) (actual time=0.073..0.074 rows=4 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual time=0.047..0.051 rows=4 loops=1)
>  Planning Time: 1.649 ms
>  Execution Time: 0.485 ms
> (11 rows)
>
> delete from b;
> insert into b select 2, 1 from generate_series(11,20) as id;
> insert into b select 1, 1 from generate_series(1,10) as id;
> insert into b select 3, 1 from generate_series(3,30) as id;
> vacuum analyze;
>
> explain analyze select a.x1, s.x2, s.y2 from a left join (select
> distinct * from b) s on a.x1=s.x2;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual time=0.083..0.090 rows=4 loops=1)
>    Hash Cond: (a.x1 = b.x2)
>    ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual time=0.010..0.011 rows=4 loops=1)
>    ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual time=0.067..0.068 rows=3 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual time=0.063..0.064 rows=3 loops=1)
>                Group Key: b.x2, b.y2
>                Batches: 1  Memory Usage: 24kB
>                ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8) (actual time=0.006..0.014 rows=48 loops=1)
>  Planning Time: 0.391 ms
>  Execution Time: 0.151 ms
> (11 rows)

Sorry, I missed vacuum analyze before deleting all data from table b,
but after running it I still got the same plan.
alena@postgres=# create table a (x1 int, y1 int);
alena@postgres=# create table b (x2 int, y2 int);
alena@postgres=# insert into a values (NULL, NULL);
alena@postgres=# insert into a values (NULL, 1);
alena@postgres=# insert into a values (1, 1);
alena@postgres=# insert into a values (1, NULL);
alena@postgres=# create index a_x1_idx on a(x1);
alena@postgres=# create index b_x2_idx on b(x2);
alena@postgres=# create index a_y1_idx on a(y1);
alena@postgres=# create index b_y2_idx on b(y2);
alena@postgres=# insert into b select 1, 2 from generate_series(11,20) as id;
alena@postgres=# insert into b select 1, 1 from generate_series(1,10) as id;
alena@postgres=# insert into b select 1, 3 from generate_series(3,30) as id;
alena@postgres=# vacuum analyze;
VACUUM
alena@postgres=# explain analyze select a.x1, s.x2, s.y2 from a left join (select distinct * from b) s on a.x1=s.x2;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual time=0.168..0.185 rows=8 loops=1)
   Hash Cond: (a.x1 = b.x2)
   ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual time=0.027..0.029 rows=4 loops=1)
   ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual time=0.129..0.130 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual time=0.119..0.123 rows=3 loops=1)
               Group Key: b.x2, b.y2
               Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8) (actual time=0.013..0.029 rows=48 loops=1)
 Planning Time: 1.464 ms
 Execution Time: 0.352 ms
(11 rows)

-- 
Regards,
Alena Rybakina
Postgres Professional
On 10/31/24 21:00, Alena Rybakina wrote:
> On 29.10.2024 05:47, Andrei Lepikhov wrote:
> I played around with the examples a bit and couldn't figure out
> something. When I added the same values to different columns - firstly
> in a, later in b - the order of the columns for the sort operation
> doesn't change. Isn't this a mistake?
This feature implements a cost-based approach to choosing the grouping
clauses' order. What kind of 'mistake' do you see here? Is there
something wrong with the costs, or do you see a bug in the cost
calculation code? Please specify your request.

-- 
regards, Andrei Lepikhov
Hi folks,

Just wanted to mention that it looks like some CFBot tests are failing,
something around level_tracking in pgss.