Thread: Is postgres able to share sorts required by common partition window functions?

Is postgres able to share sorts required by common partition window functions?

From: Sebastien Arod
Hi all,

I'm trying to optimize the following query on postgres 11.6 (running on Aurora)
select distinct
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by c3) AS c3,
  first_value(c4) OVER (PARTITION BY c1 order by c4) AS c4
from  
  t;

 
From the explain plan (attached at the end of the email) I see that postgresql is doing several sorts: one per window function and one for the distinct, which seems OK.
However, since all the window functions are over the same partition, I would have expected postgresql to "share" a preliminary sort on c1 that would reduce the work of all the window functions, but it doesn't.
I even created an index on c1, hoping that postgresql would be able to use it to minimize the cost of the sorts, but I couldn't make it use it.

Is there something I am missing?

You can find below a script to set up a table and data to reproduce the issue, as well as the explain plan.

Setup Script
create table t(
  pk varchar(200) PRIMARY KEY,
  c1 varchar(200),
  c2 varchar(200),
  c3 varchar(200),
  c4 varchar(200)
);
create index i1 on t (c1);

insert into t
(pk, c1, c2, c3, c4 )
select
 generate_series::text pk,
 'Grp' ||(generate_series / 4)::text c1,
 generate_series::text c2,
 generate_series::text c3,
 generate_series::text c4
from generate_series(0, 1000000);

Explain Plan
Unique  (cost=808480.87..820980.88 rows=1000001 width=123) (actual time=7131.675..7781.082 rows=250001 loops=1)
  ->  Sort  (cost=808480.87..810980.87 rows=1000001 width=123) (actual time=7131.673..7603.926 rows=1000001 loops=1)
        Sort Key: c1, (first_value(c2) OVER (?)), (first_value(c3) OVER (?)), (first_value(c4) OVER (?))
        Sort Method: external merge  Disk: 59640kB
        ->  WindowAgg  (cost=558937.90..578937.92 rows=1000001 width=123) (actual time=5179.374..6268.937 rows=1000001 loops=1)
              ->  Sort  (cost=558937.90..561437.90 rows=1000001 width=91) (actual time=5179.355..5679.136 rows=1000001 loops=1)
                    Sort Key: c1, c4
                    Sort Method: external merge  Disk: 52912kB
                    ->  WindowAgg  (cost=336736.93..356736.95 rows=1000001 width=91) (actual time=3260.950..4389.116 rows=1000001 loops=1)
                          ->  Sort  (cost=336736.93..339236.93 rows=1000001 width=59) (actual time=3260.934..3778.385 rows=1000001 loops=1)
                                Sort Key: c1, c3
                                Sort Method: external merge  Disk: 46176kB
                                ->  WindowAgg  (cost=141877.96..161877.98 rows=1000001 width=59) (actual time=1444.692..2477.284 rows=1000001 loops=1)
                                      ->  Sort  (cost=141877.96..144377.96 rows=1000001 width=27) (actual time=1444.669..1906.993 rows=1000001 loops=1)
                                            Sort Key: c1, c2
                                            Sort Method: external merge  Disk: 39424kB
                                            ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=27) (actual time=0.011..177.815 rows=1000001 loops=1)
Planning Time: 0.214 ms
Execution Time: 7839.646 ms

Re: Is postgres able to share sorts required by common partition window functions?

From: Michael Lewis

Does this give the same result and do the optimization you want?

select
  c1,
  min(c2) AS c2,
  min(c3) AS c3,
  min(c4) AS c4
from  
  t
group by
  c1;

Re: Is postgres able to share sorts required by common partition window functions?

From: Sebastien Arod

Hi Michael,

I simplified the real query before posting it here and I now realize that I oversimplified things.

Unfortunately the real query cannot be re-written with a group by.

Some of the window functions are more complex, with ORDER BY clauses using expressions that involve multiple columns.

The following example would be more realistic:

select distinct
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2, c4) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from  
  t;



On Mon, Jul 6, 2020 at 9:55 PM Michael Lewis <mlewis@entrata.com> wrote:
Does this give the same result and do the optimization you want?

select
  c1,
  min(c2) AS c2,
  min(c3) AS c3,
  min(c4) AS c4
from  
  t
group by
  c1;

Re: Is postgres able to share sorts required by common partition window functions?

From: Michael Lewis

Distinct is a great way to get quick results when writing quick & dirty queries, but I rarely see one perform better than a rewrite that avoids the need. It collects a ton of results, orders them, and throws away duplicates in the process. I don't love the idea of that extra work. Did you say you have an index on c1?

select
  c1,
  sub1.c2,
  sub2.c3
from  
  t
join lateral (select c2 from t as t1 where t1.c1 = t.c1 order by c2, c4 limit 1) as sub1 on true
join lateral (select c3 from t as t1 where t1.c1 = t.c1 order by coalesce(c4, '000'), c3 limit 1) as sub2 on true;


select
  c1,
  (select c2 from t as t1 where t1.c1 = t.c1 order by c2, c4 limit 1) AS c2,
  (select c3 from t as t1 where t1.c1 = t.c1 order by coalesce(c4, '000'), c3 limit 1) AS c3
from  
  t;

I don't know the data, but I assume there may be many rows with the same c1 value, so you would likely benefit from getting the distinct set of c1 values first, like below, as your FROM table.

(select c1 from t group by c1 ) AS t
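
Putting the pieces together, a rough untested sketch (grp is just an alias name I'm making up for the distinct-c1 set):

select
  grp.c1,
  sub1.c2,
  sub2.c3
from
  (select c1 from t group by c1) as grp  -- grp: one row per distinct c1 value
join lateral (select c2 from t where t.c1 = grp.c1 order by c2, c4 limit 1) as sub1 on true
join lateral (select c3 from t where t.c1 = grp.c1 order by coalesce(c4, '000'), c3 limit 1) as sub2 on true;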

Re: Is postgres able to share sorts required by common partition window functions?

From: "David G. Johnston"
On Monday, July 6, 2020, Michael Lewis <mlewis@entrata.com> wrote:
Did you say you have an index on c1?
[...]
I don't know the data, but I assume there may be many rows with the same c1 value, so then you would likely benefit from getting that distinct set first like below as your FROM table.

Re-reading the original email I see both the answer to your question and the data being queried.

David J.

Re: Is postgres able to share sorts required by common partition window functions?

From: "David G. Johnston"
On Monday, July 6, 2020, Sebastien Arod <sebastien.arod@gmail.com> wrote:
I would have expected postgresql to "share" a preliminary sort on c1 that would then be useful to reduce the work on all window functions but it doesn't. 

The plan shown does share - the output of one sort goes into another.  Subsequent sorts still have to happen but they should be faster as the first field is already grouped.  Doesn’t change the plan though.
 
I even created an index on c1 hoping that postgresql would be able to use it in order to minimize the cost of the sorts but I couldn't make it use it.

Use it how?  You are still evaluating 250k groups so doing anything piece-wise seems like an expected loss compared to sequential scan.

David J.
 
Re: Is postgres able to share sorts required by common partition window functions?

From: Sebastien Arod

Michael, David thanks for your quick replies.

@Michael
I initially dismissed writing this query using joins or subselects because the real query has about 80 columns, and I was afraid that having 80 joins/subselects would cause issues with postgresql, including the planner falling back to GEQO.
I'll test it anyway with real data and see how it behaves.
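
(For reference, the planner settings that govern this can be checked as below; the defaults noted in the comments are the stock PostgreSQL ones and may differ on Aurora.)

SHOW geqo_threshold;       -- GEQO takes over once the FROM list reaches this many items (default 12)
SHOW join_collapse_limit;  -- explicit JOINs stop being flattened/reordered beyond this many items (default 8)
SHOW from_collapse_limit;  -- subqueries stop being flattened beyond this many items (default 8)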

@David About Subsequent sorts
From my experiments the time of a subsequent sort is not reduced by the previous sort but is in fact higher.
In Query 1 below, the Sort node on key (c1, c2) takes ~1893 ms excluding child nodes.
In Query 2:
  • the Sort node on key (c1) takes ~1558 ms excluding child nodes
  • the Sort node on key (c1, c2) takes ~2964 ms excluding child nodes, which is higher than the time for the same sort key in Query 1, even though in Query 2 postgresql could have benefited from the preliminary sort on c1
I can't figure out why it would be slower.

Query1
explain analyze select
  first_value(c2) OVER (PARTITION BY c1 order by c2)
from t;

Explain Plan  Query1
WindowAgg  (cost=135042.46..155042.48 rows=1000001 width=47) (actual time=1555.005..2639.659 rows=1000001 loops=1)
  ->  Sort  (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1554.989..2070.462 rows=1000001 loops=1)
        Sort Key: c1, c2
        Sort Method: external merge  Disk: 25960kB
        ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.013..177.037 rows=1000001 loops=1)
Planning Time: 0.115 ms
Execution Time: 2693.935 ms

Query2
explain analyze select
  first_value(c2) OVER (PARTITION BY c1),
  first_value(c2) OVER (PARTITION BY c1 order by c2)
from t;
Explain Plan Query2
WindowAgg  (cost=313730.43..333730.45 rows=1000001 width=79) (actual time=4692.257..5702.226 rows=1000001 loops=1)
  ->  Sort  (cost=313730.43..316230.43 rows=1000001 width=47) (actual time=4692.152..5174.928 rows=1000001 loops=1)
        Sort Key: c1, c2
        Sort Method: external merge  Disk: 32688kB
        ->  WindowAgg  (cost=135042.46..152542.48 rows=1000001 width=47) (actual time=1188.109..2210.845 rows=1000001 loops=1)
              ->  Sort  (cost=135042.46..137542.46 rows=1000001 width=15) (actual time=1188.099..1709.253 rows=1000001 loops=1)
                    Sort Key: c1
                    Sort Method: external merge  Disk: 25960kB
                    ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=15) (actual time=0.014..150.435 rows=1000001 loops=1)
Planning Time: 0.059 ms
Execution Time: 5756.591 ms


@David About using an index
You are correct that a simple index cannot be leveraged efficiently.
However, a covering index including all columns and starting with the partition column c1 could be used to avoid sorting on c1, right?

But in my tests it is not used. In the following example covering_index could be used to avoid the sort on c1, but it isn't:
Covering Index
create index covering_index on t (c1, c2, c3, c4);

Query
explain analyze select
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2, c4) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from  
  t;


Plan
WindowAgg  (cost=417853.93..440353.95 rows=1000001 width=123) (actual time=3071.980..4184.030 rows=1000001 loops=1)
  ->  Sort  (cost=417853.93..420353.93 rows=1000001 width=91) (actual time=3071.962..3575.641 rows=1000001 loops=1)
        Sort Key: c1, (COALESCE(c4, '000'::character varying)), c3
        Sort Method: external merge  Disk: 52912kB
        ->  WindowAgg  (cost=193152.96..215652.98 rows=1000001 width=91) (actual time=1268.373..2271.463 rows=1000001 loops=1)
              ->  Sort  (cost=193152.96..195652.96 rows=1000001 width=59) (actual time=1268.357..1711.526 rows=1000001 loops=1)
                    Sort Key: c1, c2, c4
                    Sort Method: external merge  Disk: 46168kB
                    ->  Seq Scan on t  (cost=0.00..18294.01 rows=1000001 width=59) (actual time=0.014..163.347 rows=1000001 loops=1)
Planning Time: 0.081 ms
Execution Time: 4250.367 ms

However, the index is used if one of the PARTITION BY + ORDER BY combinations matches the beginning of the index exactly. For instance, the query:
select
  c1,
  first_value(c2) OVER (PARTITION BY c1 order by c2) AS c2,
  first_value(c3) OVER (PARTITION BY c1 order by coalesce(c4, '000'), c3) AS c3
from  
  t;

uses covering_index to avoid the sort on (c1, c2).

On Tue, Jul 7, 2020 at 12:48 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, July 6, 2020, Sebastien Arod <sebastien.arod@gmail.com> wrote:
I would have expected postgresql to "share" a preliminary sort on c1 that would then be useful to reduce the work on all window functions but it doesn't. 

The plan shown does share - the output of one sort goes into another.  Subsequent sorts still have to happen but they should be faster as the first field is already grouped.  Doesn’t change the plan though.
 
I even created an index on c1 hoping that postgresql would be able to use it in order to minimize the cost of the sorts but I couldn't make it use it.

Use it how?  You are still evaluating 250k groups so doing anything piece-wise seems like an expected loss compared to sequential scan.

David J.
 
Re: Is postgres able to share sorts required by common partition window functions?

From: Michael Lewis

On Monday, July 6, 2020, Michael Lewis <mlewis@entrata.com> wrote:
Did you say you have an index on c1?
[...]
I don't know the data, but I assume there may be many rows with the same c1 value, so then you would likely benefit from getting that distinct set first like below as your FROM table.

> Re-reading the original email I see both the answer to your question and the data being queried.
> David J.

Thanks David. I meant it as a rhetorical question, since yes of course there was an index. I also didn't trust the example to be true to real data in terms of c1 values distribution.


Re: Is postgres able to share sorts required by common partition window functions?

From: Michael Lewis

On Tue, Jul 7, 2020 at 9:01 AM Sebastien Arod <sebastien.arod@gmail.com> wrote:
Michael, David thanks for your quick replies.

@Michael
I initially dismissed writing this query using joins or subselects because the real query has about 80 columns and I was afraid that having 80 joins/subselect would cause issues with postgresql including planner that would fallback to GEQO.
I'll test it anyway with real data and see how it behaves.

Contrived and overly simplified examples often lead to uninformed, bad advice. I would not attempt joins unless the number of distinct c1 values is relatively small. It might go fine though, and depending on your query and the statistics on the table, perhaps setting join_collapse_limit = 1 would be prudent to constrain the planner to your desired plan and avoid any chance of the genetic optimizer getting involved.
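
Something along these lines, scoped to a single transaction (untested sketch):

BEGIN;
SET LOCAL join_collapse_limit = 1;  -- keep the join order exactly as written in the query
-- ... run the big lateral-join query here ...
COMMIT;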

        Sort Method: external merge  Disk: 52912kB
                    Sort Method: external merge  Disk: 46168kB

What is your work_mem set to? Would it be possible to set it higher (for this process) to avoid spilling to disk?
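
For example (the 256MB value is only a placeholder; pick something that fits the instance's memory and expected concurrency):

SHOW work_mem;                 -- stock PostgreSQL default is 4MB; Aurora defaults may differ
SET work_mem = '256MB';        -- session level
-- or scoped to a single transaction:
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the query ...
COMMIT;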