Thread: [Question] Similar Cost but variable execution time in sort

[Question] Similar Cost but variable execution time in sort

From
Ankit Kumar Pandey
Date:
Hi,

This was noticed in 
https://www.postgresql.org/message-id/CAApHDvo2y9S2AO-BPYo7gMPYD0XE2Lo-KFLnqX80fcftqBCcyw@mail.gmail.com

I am bringing it up again.


Consider the following example:

Setup (tuple should be in memory to avoid overshadowing of disk I/O in 
the experimentation):

work_mem = 2048MB

create table abcd(a int, b int, c int, d int);
insert into abcd select x*random(), x*random(), x*random(), x*random() 
from generate_series(1, 100000)x;

select pg_prewarm(abcd);


1. explain analyze select * from abcd order by a;

                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Sort  (cost=9845.82..10095.82 rows=100000 width=16) (actual 
time=134.113..155.990 rows=100000 loops=1)
    Sort Key: a
    Sort Method: quicksort  Memory: 8541kB
    ->  Seq Scan on abcd  (cost=0.00..1541.00 rows=100000 width=16) 
(actual time=0.013..28.418 rows=100000 loops=1)
  Planning Time: 0.392 ms
  Execution Time: 173.702 ms
(6 rows)

2. explain analyze select * from abcde order by a,b;

explain analyze select * from abcd order by a,b;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Sort  (cost=9845.82..10095.82 rows=100000 width=16) (actual 
time=174.676..204.065 rows=100000 loops=1)
    Sort Key: a, b
    Sort Method: quicksort  Memory: 8541kB
    ->  Seq Scan on abcd  (cost=0.00..1541.00 rows=100000 width=16) 
(actual time=0.018..29.213 rows=100000 loops=1)
  Planning Time: 0.055 ms
  Execution Time: 229.119 ms
(6 rows)


3. explain analyze select * from abcd order by a,b,c;

                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
  Sort  (cost=9845.82..10095.82 rows=100000 width=16) (actual 
time=159.829..179.675 rows=100000 loops=1)
    Sort Key: a, b, c
    Sort Method: quicksort  Memory: 8541kB
    ->  Seq Scan on abcd  (cost=0.00..1541.00 rows=100000 width=16) 
(actual time=0.018..31.207 rows=100000 loops=1)
  Planning Time: 0.055 ms
  Execution Time: 195.393 ms
(6 rows)

In above queries, startup and total costs are same, yet execution time 
varies wildly.

Question: If cost is same for similar query, shouldn't execution time be 
similar as well?

 From my observation, we only account for data in cost computation but 
not number of

columns sorted.

Should we not account for number of columns in sort as well?


Relevant discussion: 
https://www.postgresql.org/message-id/CAApHDvoc1m_vo1+XVpMUj+Mfy6rMiPQObM9Y-jZ=Xrwc1gkPFA@mail.gmail.com


Regards,

Ankit





Re: [Question] Similar Cost but variable execution time in sort

From
Tom Lane
Date:
Ankit Kumar Pandey <itsankitkp@gmail.com> writes:
> From my observation, we only account for data in cost computation but 
> not number of columns sorted.
> Should we not account for number of columns in sort as well?

I'm not sure whether simply charging more for 2 sort columns than 1
would help much.  The traditional reasoning for not caring was that
data and I/O costs would swamp comparison costs anyway, but maybe with
ever-increasing memory sizes we're getting to the point where it is
worth refining the model for in-memory sorts.  But see the header
comment for cost_sort().

Also ... not too long ago we tried and failed to install more-complex
sort cost estimates for GROUP BY.  The commit log message for f4c7c410e
gives some of the reasons why that failed, but what it boils down to
is that useful estimates would require information we don't have, such
as a pretty concrete idea of the relative costs of different datatypes'
comparison functions.

In short, maybe there's something to be done here, but I'm afraid
there is a lot of infrastructure slogging needed first, if you want
estimates that are better than garbage-in-garbage-out.

            regards, tom lane



Re: [Question] Similar Cost but variable execution time in sort

From
Ankit Kumar Pandey
Date:
> On 05/03/23 22:21, Tom Lane wrote:

> Ankit Kumar Pandey <itsankitkp@gmail.com> writes:
> > From my observation, we only account for data in cost computation but 
> > not number of columns sorted.
> > Should we not account for number of columns in sort as well?
>
> I'm not sure whether simply charging more for 2 sort columns than 1
> would help much.  The traditional reasoning for not caring was that
> data and I/O costs would swamp comparison costs anyway, but maybe with
> ever-increasing memory sizes we're getting to the point where it is
> worth refining the model for in-memory sorts.  But see the header
> comment for cost_sort().
> 
> Also ... not too long ago we tried and failed to install more-complex
> sort cost estimates for GROUP BY.  The commit log message for f4c7c410e
> gives some of the reasons why that failed, but what it boils down to
> is that useful estimates would require information we don't have, such
> as a pretty concrete idea of the relative costs of different datatypes'
> comparison functions.
>
> In short, maybe there's something to be done here, but I'm afraid
> there is a lot of infrastructure slogging needed first, if you want
> estimates that are better than garbage-in-garbage-out.
>
>            regards, tom lane

Thanks, I can see the challenges in this.

Regards,
Ankit