Is postgres able to share sorts required by common partition window functions? - Mailing list pgsql-general

From Sebastien Arod
Subject Is postgres able to share sorts required by common partition window functions?
Date
Msg-id CADd42iFZWwYNsXjEM_3HWK3QnfiCrMNmpOkZqyBQCabnVxOPtw@mail.gmail.com
Whole thread Raw
Responses Re: Is postgres able to share sorts required by common partition window functions?
Re: Is postgres able to share sorts required by common partition window functions?
List pgsql-general
Hi all,

I'm trying to optimize the following query on postgres 11.6 (running on Aurora)
select distinct
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by c3) AS c3,
  first_value(c4) OVER (PARTITION BY c1 order by c4) AS c4
from  
  t;

 
From the explain plan (attached at the end of the email) I see that postgresql is doing several sorts one per window function and one for the distinct that seems ok.
However all the window functions being on the same partition I would have expected postgresql to "share" a preliminary sort on c1 that would then be useful to reduce the work on all window functions but it doesn't.
I even created an index on c1 hoping that postgresql would be able to use it in order to minimize the cost of the sorts but I couldn't make it use it.

Is there something I am missing?

You can find below a script to set up a table and data to reproduce as well as the explain plan.

Setup Script
create table t(
pk varchar(200) PRIMARY key,
c1 varchar(200),
c2 varchar(200),
c3 varchar(200),
c4 varchar(200)
);
create index i1 on t (c1);

insert into t
(pk, c1, c2, c3, c4 )
select
 generate_series::text pk,
 'Grp' ||(generate_series / 4)::text c1,
 generate_series::text c2,
 generate_series::text c3,
 generate_series::text c4
from generate_series(0, 1000000);

Explain Plan
Unique  (cost=808480.87..820980.88 rows=1000001 width=123) (actual time=7131.675..7781.082 rows=250001 loops=1)
  ->  Sort  (cost=808480.87..810980.87 rows=1000001 width=123) (actual time=7131.673..7603.926 rows=1000001 loops=1)
        Sort Key: c1, (first_value(c2) OVER (?)), (first_value(c3) OVER (?)), (first_value(c4) OVER (?))
        Sort Method: external merge  Disk: 59640kB
        ->  WindowAgg  (cost=558937.90..578937.92 rows=1000001 width=123) (actual time=5179.374..6268.937 rows=1000001 loops=1)
              ->  Sort  (cost=558937.90..561437.90 rows=1000001 width=91) (actual time=5179.355..5679.136 rows=1000001 loops=1)
                    Sort Key: c1, c4
                    Sort Method: external merge  Disk: 52912kB
                    ->  WindowAgg  (cost=336736.93..356736.95 rows=1000001 width=91) (actual time=3260.950..4389.116 rows=1000001 loops=1)
                          ->  Sort  (cost=336736.93..339236.93 rows=1000001 width=59) (actual time=3260.934..3778.385 rows=1000001 loops=1)
                                Sort Key: c1, c3
                                Sort Method: external merge  Disk: 46176kB
                                ->  WindowAgg  (cost=141877.96..161877.98 rows=1000001 width=59) (actual time=1444.692..2477.284 rows=1000001 loops=1)
                                      ->  Sort  (cost=141877.96..144377.96 rows=1000001 width=27) (actual time=1444.669..1906.993 rows=1000001 loops=1)
                                            Sort Key: c1, c2
                                            Sort Method: external merge  Disk: 39424kB
                                            ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=27) (actual time=0.011..177.815 rows=1000001 loops=1)
Planning Time: 0.214 ms
Execution Time: 7839.646 ms

pgsql-general by date:

Previous
From: Christophe Courtois
Date:
Subject: Pgbench : vacuum default behaviour history
Next
From: Michael Lewis
Date:
Subject: Re: Apply LIMIT when computation is logically irrelevant