Home > mailing lists

Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys - Mailing list pgsql-hackers

From	Richard Guo
Subject	Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys
Date	June 8, 2023 09:11:17
Msg-id	CAMbWs49jxZkfSP-9rG0Go4W7=pXCgS7jcjYJEERGJF3SoOFAqg@mail.gmail.com Whole thread
In response to	Remove WindowClause PARTITION BY items belonging to redundant pathkeys (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys
List	pgsql-hackers

Tree view

On Thu, Jun 8, 2023 at 7:37 AM David Rowley <dgrowleyml@gmail.com> wrote:

What the attached patch does is process each WindowClause and removes
any items from the PARTITION BY clause that are columns or expressions
relating to redundant PathKeys.

Effectively, this allows the nodeWindowAgg.c code which stops
processing WindowAgg rows when the run condition is met to work as the
PARTITION BY clause is completely removed in the case of the above
query. Removing the redundant PARTITION BY items also has the added
benefit of not having to needlessly check if the next row belongs to
the same partition as the last row. For the above, that check is a
waste of time as all rows have relkind = 'r'

This is a nice optimization. I reviewed it and here are my findings.

In create_windowagg_plan there is such comment that says

* ... Note: in principle, it's possible
* to drop some of the sort columns, if they were proved redundant by
* pathkey logic. However, it doesn't seem worth going out of our way to
* optimize such cases.

Since this patch removes any clauses from the wc->partitionClause for
redundant pathkeys, this comment seems outdated, at least for the sort
columns in partitionClause.

Also I'm wondering if we can do the same optimization to
wc->orderClause. I tested it with the query below and saw performance
gains.

create table t (a int, b int);
insert into t select 1,2 from generate_series(1,100000)i;
analyze t;

explain analyze
select * from
(select a, b, rank() over (PARTITION BY a order by b) rank
from t where b = 2)
where a = 1 and rank <= 10;

With and without this optimization to wc->orderClause the execution time
is 67.279 ms VS. 119.120 ms (both best of 3).

I notice you comment in the patch that doing this is unsafe because it
would change the semantics of peer rows during execution. Would you
please elaborate on that?

Thanks
Richard

pgsql-hackers by date:

From: "Daniel Verite"
Date: 08 June 2023, 09:03:25
Subject: Re: Order changes in PG16 since ICU introduction

From: "Joel Jacobson"
Date: 08 June 2023, 09:41:35
Subject: Re: Do we want a hashset type?

Re: Remove WindowClause PARTITION BY items belonging to redundant pathkeys - Mailing list pgsql-hackers

Previous

Next