Thread: Re: Consider the number of columns in the sort cost model

Re: Consider the number of columns in the sort cost model

From: Kirill Reshke
Hi!

On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov <lepihov@gmail.com> wrote:
>
> Hi,
>
> I would like to propose a slight elaboration of the sort cost model.
> In practice, we frequently see the choice of suboptimal sortings, which
> degrade performance by 10-50%.
>
> The key reason here is the optimiser's blindness to the fact that
> sorting may call the comparison operator for every attribute of each
> pair of tuples it compares. Just for example:
>
> SET work_mem = '128MB';
> CREATE TABLE test (
>    x1 int,x2 int,x3 int,x4 int,x5 int,x6 int,x7 int,x8 int,payload text);
> INSERT INTO test (x1,x2,x3,x4,x5,x6,x7,x8,payload) (
>    SELECT 1,2,3,4,5,6,7,(1E5-x), 'abc'||x
>      FROM generate_series(1,1E5) AS x);
> VACUUM ANALYZE test;
>
> Let's execute the following three queries:
>
> EXPLAIN (ANALYZE)
> SELECT * FROM test WHERE x1<9E4 ORDER BY x8;
> EXPLAIN (ANALYZE)
> SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
> EXPLAIN (ANALYZE)
> SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x4,x5,x6,x7,x8;
>
> My laptop executes these three queries in 100ms, 300ms, and 560ms,
> respectively. At the same time, the costs of these query plans are
> similar. This corner case shows that the number of sort columns matters
> and that, here, the dependence on the number of comparisons looks close
> to linear.


Indeed. My numbers are 44.167, 88.759, and 136.864 ms.


> The patch in the attachment is a simplistic attempt to differentiate
> these sortings by the number of columns. It is not ideal because
> comparisons depend on the duplicates in each column, but it may be the
> subject of further work.
>
> Even such a trivial model allows us to resolve the case like the below:
> CREATE INDEX ON test (x1,x2,x3,x8);
> EXPLAIN (ANALYZE) SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
>
> The current master will choose SeqScan for four columns as well as for a
> single column. With the model, the optimiser will understand that
> sorting four columns costs more than sorting a single column and switch
> to IndexScan.
>
> --
> regards, Andrei Lepikhov


I confirm: before applying your patch, Index Scan is not chosen,
whereas after applying it, it is.

Two questions:
1) Why do we change the `cost_sort` definition? We can calculate the
length of `pathkeys` inside cost_sort if we want, just like we do
inside cost_gather_merge. There is no need to make extension developers
suffer through an API change, is there?

2) I'm not very sure about the (1.0 + cmpMultiplier) coefficient. I
understand that we need to multiply by something that is not 0 (and
that's why we add 1), and this is strictly better than multiplying by
2, but why exactly does the formula look like this and not, say,
(1.0 + cmpMultiplier ^ 2 / 10)? Maybe we need a more sophisticated
approach?
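
To make sure I read it right, here is a rough sketch of what the patch
seems to charge (not tested, just restating the formula; deriving the
multiplier inside cost_sort() itself would also answer my first
question):

    /*
     * cost_sort() already receives the pathkeys list, so the multiplier
     * could be computed locally, without any signature change:
     */
    int     cmpMultiplier = (pathkeys == NIL) ? 1 : list_length(pathkeys);

    /*
     * With the patch, the default per-comparison charge becomes
     *     (1.0 + cmpMultiplier) * cpu_operator_cost
     * instead of the current 2.0 * cpu_operator_cost.  For the three
     * example queries above (1, 4 and 8 sort columns) that charge
     * scales as 2 : 5 : 9, while my measured times above scale roughly
     * as 1 : 2 : 3.
     */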

Also, this patch needs to be rebased, again.

-- 
Best regards,
Kirill Reshke



Re: Consider the number of columns in the sort cost model

From: David Rowley
On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I am suspicious of that but open to hearing other opinions. The
> coefficient incorporates knowledge about how many comparisons will be
> made with this sorting operator. The caller can have some additional
> knowledge about that. For example, IncrementalSort knows about groups -
> each group has the same prefix, which means each tuple comparison will
> inevitably cause a comparison for each column from the prefix. There
> may also be knowledge about the number of distinct values for each column,
> etc. I'm not sure we should elaborate cost_sort a lot here.

I agree that cost_sort() likely needs a bit of work to try to bring it
closer to the current reality.  There have been plenty of performance
improvements to sort over the years, and I don't recall the costing
code ever being touched; likely that's because there are just so many
things that are either costed incorrectly or not costed at all.

As for your patch, I'm not convinced that the general idea you're
proposing is an actual improvement.  I've only glanced at the patch,
but at least from this fragment here:

@@ -2150,9 +2156,12 @@ cost_sort(Path *path, PlannerInfo *root,
 {
  Cost startup_cost;
  Cost run_cost;
+ int cmpMultiplier = npathkeys;

and:

 static void
 cost_tuplesort(Cost *startup_cost, Cost *run_cost,
-    double tuples, int width,
+    double tuples, int width, int cmpMultiplier,
     Cost comparison_cost, int sort_mem,
     double limit_tuples)
 {
@@ -1913,7 +1917,7 @@ cost_tuplesort(Cost *startup_cost, Cost *run_cost,
  tuples = 2.0;

  /* Include the default cost-per-comparison */
- comparison_cost += 2.0 * cpu_operator_cost;
+ comparison_cost += (1.0 + cmpMultiplier) * cpu_operator_cost;

it seems you're just charging cpu_operator_cost * <number of columns
to sort by>. It won't be very hard to fool that into doing the wrong
thing when the first column to sort by is distinct or almost distinct:
there will be far fewer, or no, tiebreaker comparisons in that case.
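
Something like the following is the kind of estimate I mean (a sketch
only: it assumes the sort columns are independent and that a random
pair of tuples ties on a column with probability 1/ndistinct of that
column; the function name and the ndistinct array are invented for
illustration):

static double
expected_comparisons_per_pair(const double *ndistinct, int nkeys)
{
    double      tie_prob = 1.0; /* chance that all earlier keys tied */
    double      expected = 0.0;

    for (int i = 0; i < nkeys; i++)
    {
        /* column i is compared only when every earlier column tied */
        expected += tie_prob;
        tie_prob *= 1.0 / Max(ndistinct[i], 1.0);
    }
    return expected;
}

With a nearly-distinct leading column that stays close to 1.0 no matter
how many sort keys there are, whereas a flat per-column multiplier
charges the full column count.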

As mentioned by Kirill, I also don't understand the cost_sort
signature change. Why would you do that over just doing
list_length(pathkeys) within cost_sort()? You're throwing away a
parameter that might be the most useful one of the bunch for allowing
better sort cost estimates.

I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons by checking the number of
distinct values for each sort key.  That might require what we
discussed on another thread about having estimate_num_groups() better
use the n_distinct estimate from the EquivalenceMember with the least
distinct values.  It'll be another question whether all of that can be
done and still perform well enough for cost_sort(). You may have
to write it first before we can figure that out.  It may be possible
to buy back some of the overheads with some additional caching...
Perhaps that could be done within the EquivalenceClass.
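
For the caching, I'm thinking of something along these lines (the field
and function names are invented here, just to show the shape of it; the
idea is that the ndistinct lookup happens once per EquivalenceMember
rather than once per cost_sort() call):

static double
get_cached_em_ndistinct(PlannerInfo *root, EquivalenceMember *em)
{
    /* em_ndistinct is a hypothetical new field; 0.0 means "not computed" */
    if (em->em_ndistinct == 0.0)
    {
        VariableStatData vardata;
        bool        isdefault;

        examine_variable(root, (Node *) em->em_expr, 0, &vardata);
        if (HeapTupleIsValid(vardata.statsTuple))
            em->em_ndistinct = get_variable_numdistinct(&vardata, &isdefault);
        else
            em->em_ndistinct = -1.0;    /* remember "no stats" as well */
        ReleaseVariableStats(vardata);
    }
    return em->em_ndistinct;
}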

David



Re: Consider the number of columns in the sort cost model

From: Andrei Lepikhov
On 10/15/24 12:15, David Rowley wrote:
> As for your patch, I'm not convinced that the general idea you're
> proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate 
patch. The main purpose here is to discuss the approach and to introduce 
consideration of the number of columns. A conservative development 
approach has been preferred before.
> 
> it seems you're just charging cpu_operator_cost * <number of columns
> to sort by>. It seems like it won't be very hard to fool that into
> doing the wrong thing when the first column to sort by is distinct or
> almost distinct. There's going to be far fewer or no tiebreaker
> comparisons for that case.
As I've written above, this is just a starting point. It allows analysing 
how the balance between different types of orderings can change in 
extreme cases. I can merge this patch with the follow-up one, which 
implements differentiation by distinct values, but then the effect on 
regression tests would be smoothed out.
The primary idea is 1) to elaborate the GROUP-BY optimisation and 2) to 
give the optimiser a tool to choose a more optimal sort by comparing 
MergeAppend/Sort/IncrementalSort costs.
The whole idea is implemented in the branch [1] and described in the 
post [2]. Of course, we differentiate sortings by the number of distinct 
values of the first column (the only trustworthy statistic). It would 
not be so difficult (though I doubt the real necessity) to use extended 
statistics and reflect different combinations of columns in the cost 
values.

> 
> As mentioned by Kirill, I also don't understand the cost_sort
> signature change. Why would you do that over just doing
> list_length(pathkeys) within cost_sort()? You're throwing away a
> parameter that might be the most useful one of the bunch for allowing
> better sort cost estimates.
OK, maybe that is too much for an intermediate patch. We can change the 
interface later if necessary.
> Perhaps that could be done within the EquivalenceClass.
Thanks for the idea!

[1] https://github.com/postgrespro/postgres/tree/sort-columnsnum
[2] https://danolivo.substack.com/p/elaboration-of-the-postgresql-sort

-- 
regards, Andrei Lepikhov




Re: Consider the number of columns in the sort cost model

From: Alena Rybakina
Hi! Thank you for your work on this subject!

On 23.10.2024 04:39, Andrei Lepikhov wrote:
> On 15/10/2024 12:15, David Rowley wrote:
>> On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov <lepihov@gmail.com> wrote:
>> I think maybe what is worth working on is seeing if you can better
>> estimate the number of tiebreak comparisons by checking the number of
>> distinct values for each sort key.  That might require what we
>> discussed on another thread about having estimate_num_groups() better
>> use the n_distinct estimate from the EquivalenceMember with the least
>> distinct values.  It'll be another question if all that can be done
>> and this all still perform well enough for cost_sort(). You may have
>> to write it first before we can figure that out.  It may be possible
>> to buy back some of the overheads with some additional caching...
>> Perhaps that could be done within the EquivalenceClass.
> It seems that your idea with an equivalence class works.
> In the patchset attached, I show all the ideas (I initially wanted to 
> discuss it step-by-step).
> In the attached, the first patch is about choosing an adequate 
> expression from the list of EC members.
> The second one is the current patch, which considers the number of 
> sort columns.
> The third patch is an elaboration of the second one: use only 
> trustworthy statistics here, namely the number of distinct values of 
> the first sorted column.
> And the last patch is a demo of how I'm going to use the previous 
> three patches and add one more strategy to improve the order of 
> columns in the GROUP-BY clause.
>
I have some questions about your patches.

1. You consider two lists: root->group_pathkeys and 
root->processed_groupClause.

As far as I understand, this is because the pathkey elements are meant 
to correspond to the group clauses, and that correspondence is checked 
by this condition:

if (foreach_current_index(lc1) >= root->num_groupby_pathkeys)

To be honest, I didn’t find information about this in the code, but did 
I understand everything correctly?

2. I noticed that statistics of distinct values are calculated several 
times. Maybe I am missing something, but this could be avoided if these 
statistics were saved after calculation. For example, I saw that you may 
calculate the same statistic information for the same equivalence 
members in both cost_incremental_sort and identify_sort_ecmember. Is it 
possible to store this information in a structure and reuse it later?

3. I think you should initialize the variable ndist in your patch. I got 
the following compiler error while building your code (a trivial fix is 
sketched after this list):

costsize.c: In function ‘identify_sort_ecmember’:
costsize.c:6694:42: error: ‘ndist’ may be used uninitialized 
[-Werror=maybe-uninitialized]
  6694 |                         em->em_ndistinct = ndist;
       |                         ~~~~~~~~~~~~~~~~~^~~~~
costsize.c:6680:33: note: ‘ndist’ was declared here
  6680 |                         double  ndist;
       |                                 ^~~
cc1: all warnings being treated as errors
gmake[4]: *** [<builtin>: costsize.o] Error 1

4. Before calling the examine_variable function, I think you should 
check that the node is not NULL.

@@ -575,6 +575,8 @@ get_useful_group_keys_orderings(PlannerInfo *root, Path *path)
               */
              continue;

+        Assert (node != NULL);
+
          examine_variable(root, node, 0, &vardata);
          if (!HeapTupleIsValid(vardata.statsTuple))
              continue;
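
By the way, for item 3 the simplest fix seems to be initialising the
variable at its declaration, e.g. (the fallback value is only my guess
at what identify_sort_ecmember should assume when no statistics are
available):

                        double      ndist = 1.0;    /* fallback when stats are missing */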


-- 
Regards,
Alena Rybakina
Postgres Professional




Re: Consider the number of columns in the sort cost model

From: Alena Rybakina
On 29.10.2024 05:47, Andrei Lepikhov wrote:
> On 10/28/24 16:48, Alena Rybakina wrote:
>> On 23.10.2024 04:39, Andrei Lepikhov wrote:
>>> On 15/10/2024 12:15, David Rowley wrote:
>>> And the last patch is a demo of how I'm going to use the previous
>>> three patches and add one more strategy to improve the order of
>>> columns in the GROUP-BY clause.
>>
>> To be honest, I didn’t find information about this in the code, but
>> did I understand everything correctly?
> Yes
Thanks for the confirmation.

>> 2. I noticed that statistics of distinct values are calculated
>> several times. Maybe I am missing something, but this could be
>> avoided if these statistics were saved after calculation. For
>> example, I saw that you may calculate the same statistic information
>> for the same equivalence members in both cost_incremental_sort and
>> identify_sort_ecmember. Is it possible to store this information in
>> a structure and reuse it later?
> Hmm, I don't see multiple calculations. em_ndistinct was made
> specifically for this purpose. Can you provide a specific case and
> code lines?
When you said that, I saw it. You are right, the check for 0 is what
stops us from a useless recalculation. I noticed it right away.

>> 3. I think you should initialize the variable ndist in your patch. I
>> got a compiler error while building your code.
>>
>> costsize.c: In function ‘identify_sort_ecmember’:
>> costsize.c:6694:42: error: ‘ndist’ may be used uninitialized [-Werror=maybe-uninitialized]
>>   6694 |                         em->em_ndistinct = ndist;
>>        |                         ~~~~~~~~~~~~~~~~~^~~~~
>> costsize.c:6680:33: note: ‘ndist’ was declared here
>>   6680 |                         double  ndist;
>>        |                                 ^~~
>> cc1: all warnings being treated as errors
>> gmake[4]: *** [<builtin>: costsize.o] Error 1
> I think you can just update your compiler. But I added the ndist
> initialisation to make more compilers happy :).
Thank you :)

>> +        Assert (node != NULL);
>> +
>>           examine_variable(root, node, 0, &vardata);
>>           if (!HeapTupleIsValid(vardata.statsTuple))
>>               continue;
> I don't think so - at least not until you provide a case where the
> get_sortgroupclause_expr function returns NULL.
> What's more, remember that patch 0004 here is just meant to show the
> perspective and is still under construction.
> Anyway, thanks, I found out that the patch set doesn't apply correctly
> because of 828e94c. So, see the new version in the attachment.

Yes, you are right, if this case is possible, then it is connected with an error in the formation of the target list.


I played around with the examples a bit and couldn't figure something 
out. When I added the same values to different columns - firstly in a, 
later in b - the order of the columns for the sort operation doesn't 
change. Isn't this a mistake?

create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
insert into a values (NULL, NULL);
insert into a values (NULL, 1);
insert into a values (1, 1);
insert into a values (1, NULL);

create index a_x1_idx on a(x1);
create index b_x2_idx on b(x2);
create index a_y1_idx on a(y1);
create index b_y2_idx on b(y2);

insert into b select 1, 2 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 1, 3 from generate_series(3,30) as id;

explain analyze select a.x1, s.x2, s.y2 from a left join (select distinct * from b) s on a.x1=s.x2;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=44.99..48.15 rows=5 width=12) (actual time=0.225..0.250 rows=8 loops=1)
   Hash Cond: (b.x2 = a.x1)
   ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8) (actual time=0.117..0.123 rows=3 loops=1)
         Group Key: b.x2, b.y2
         Batches: 1  Memory Usage: 40kB
         ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=8) (actual time=0.030..0.044 rows=48 loops=1)
   ->  Hash  (cost=1.04..1.04 rows=4 width=4) (actual time=0.073..0.074 rows=4 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual time=0.047..0.051 rows=4 loops=1)
 Planning Time: 1.649 ms
 Execution Time: 0.485 ms
(11 rows)

delete from b;
insert into b select 2, 1 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 3, 1 from generate_series(3,30) as id;
vacuum analyze;
explain analyze select a.x1, s.x2, s.y2 from a left join (select distinct * from b) s on a.x1=s.x2;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual time=0.083..0.090 rows=4 loops=1)
   Hash Cond: (a.x1 = b.x2)
   ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual time=0.010..0.011 rows=4 loops=1)
   ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual time=0.067..0.068 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual time=0.063..0.064 rows=3 loops=1)
               Group Key: b.x2, b.y2
               Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8) (actual time=0.006..0.014 rows=48 loops=1)
 Planning Time: 0.391 ms
 Execution Time: 0.151 ms
(11 rows)



I noticed a spelling mistake: "significance phusics".

/*
 * Calculate multiplier reflecting the number of comparisons which executor
 * have to perform during the sort with this specific order of columns.
 *
 * The comparison factor f = 1.+F(pathkeys). There 1. incapsulates the
 * second-order of significance phusics which cost function doesn't consider.

-- 
Regards,
Alena Rybakina
Postgres Professional

Re: Consider the number of columns in the sort cost model

From: Alena Rybakina
> I played around with the examples a bit and couldn't figure something 
> out. When I added the same values to different columns - firstly in a, 
> later in b - the order of the columns for the sort operation doesn't 
> change. Isn't this a mistake?
>
> create table a (x1 int, y1 int);
> create table b (x2 int, y2 int);
> insert into a values (NULL, NULL);
> insert into a values (NULL, 1);
> insert into a values (1, 1);
> insert into a values (1, NULL);
>
> create index a_x1_idx on a(x1);
> create index b_x2_idx on b(x2);
> create index a_y1_idx on a(y1);
> create index b_y2_idx on b(y2);
>
> insert into b select 1, 2 from generate_series(11,20) as id;
> insert into b select 1, 1 from generate_series(1,10) as id;
> insert into b select 1, 3 from generate_series(3,30) as id;
>
> explain analyze select a.x1, s.x2, s.y2 from a left join (select 
> distinct * from b) s on a.x1=s.x2;
>                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=44.99..48.15 rows=5 width=12) (actual 
> time=0.225..0.250 rows=8 loops=1)
>    Hash Cond: (b.x2 = a.x1)
>    ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8) (actual 
> time=0.117..0.123 rows=3 loops=1)
>          Group Key: b.x2, b.y2
>          Batches: 1  Memory Usage: 40kB
>          ->  Seq Scan on b  (cost=0.00..32.60 rows=2260 width=8) 
> (actual time=0.030..0.044 rows=48 loops=1)
>    ->  Hash  (cost=1.04..1.04 rows=4 width=4) (actual 
> time=0.073..0.074 rows=4 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual 
> time=0.047..0.051 rows=4 loops=1)
>  Planning Time: 1.649 ms
>  Execution Time: 0.485 ms
> (11 rows)
>
> delete from b;
> insert into b select 2, 1 from generate_series(11,20) as id;
> insert into b select 1, 1 from generate_series(1,10) as id;
> insert into b select 3, 1 from generate_series(3,30) as id;
> vacuum analyze;
> explain analyze select a.x1, s.x2, s.y2 from a left join (select 
> distinct * from b) s on a.x1=s.x2;
>                                                   QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual 
> time=0.083..0.090 rows=4 loops=1)
>    Hash Cond: (a.x1 = b.x2)
>    ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual 
> time=0.010..0.011 rows=4 loops=1)
>    ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual 
> time=0.067..0.068 rows=3 loops=1)
>          Buckets: 1024  Batches: 1  Memory Usage: 9kB
>          ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual 
> time=0.063..0.064 rows=3 loops=1)
>                Group Key: b.x2, b.y2
>                Batches: 1  Memory Usage: 24kB
>                ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8) 
> (actual time=0.006..0.014 rows=48 loops=1)
>  Planning Time: 0.391 ms
>  Execution Time: 0.151 ms
> (11 rows)

Sorry, I missed running vacuum analyze before deleting all the data from 
table b, but after running it I still got the same plan.

alena@postgres=# create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
insert into a values (NULL, NULL);
insert into a values (NULL, 1);
insert into a values (1, 1);
insert into a values (1, NULL);
create index a_x1_idx on a(x1);
create index b_x2_idx on b(x2);
create index a_y1_idx on a(y1);
create index b_y2_idx on b(y2);
insert into b select 1, 2 from generate_series(11,20) as id;
insert into b select 1, 1 from generate_series(1,10) as id;
insert into b select 1, 3 from generate_series(3,30) as id;

alena@postgres=# vacuum analyze;
VACUUM
alena@postgres=# explain analyze select a.x1, s.x2, s.y2 from a left 
join (select distinct * from b) s on a.x1=s.x2;
                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
  Hash Left Join  (cost=1.79..2.86 rows=4 width=12) (actual 
time=0.168..0.185 rows=8 loops=1)
    Hash Cond: (a.x1 = b.x2)
    ->  Seq Scan on a  (cost=0.00..1.04 rows=4 width=4) (actual 
time=0.027..0.029 rows=4 loops=1)
    ->  Hash  (cost=1.75..1.75 rows=3 width=8) (actual time=0.129..0.130 
rows=3 loops=1)
          Buckets: 1024  Batches: 1  Memory Usage: 9kB
          ->  HashAggregate  (cost=1.72..1.75 rows=3 width=8) (actual 
time=0.119..0.123 rows=3 loops=1)
                Group Key: b.x2, b.y2
                Batches: 1  Memory Usage: 24kB
                ->  Seq Scan on b  (cost=0.00..1.48 rows=48 width=8) 
(actual time=0.013..0.029 rows=48 loops=1)
  Planning Time: 1.464 ms
  Execution Time: 0.352 ms
(11 rows)

-- 
Regards,
Alena Rybakina
Postgres Professional