Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Avoid extra Sort nodes between WindowAggs when sorting can be reused
Date
Msg-id CAD21AoB3TpWcRqMOMi68V2YXApGcSL_noq2zg+bvV5QfK0EwvQ@mail.gmail.com
Whole thread Raw
In response to Re: Avoid extra Sort nodes between WindowAggs when sorting can bereused  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: Avoid extra Sort nodes between WindowAggs when sorting can bereused  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
On Mon, Jul 2, 2018 at 5:25 PM, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 26 Jun 2018, at 17:11, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
>
>> I took a look at the patch. It applies and compiles, the tests pass.
>
> Thanks for reviewing, and apologies for the slow response.
>
>> Some thoughts about the code:
>>
>> * Postgres lists cache their lengths, so you don't need uniqueLen.
>
> Good point, fixed.
>
>> * Use an array of WindowClauseSortNode's instead of a list. It's more suitable here because you are going to sort
(qsort_listcreates a temporary array). 
>
> I was thinking about that, but opted for code simplicity with a List approach.
> The required size of the array isn’t known ahead of time, so it must either
> potentially overallocate to the upper bound of root->parse->windowClause or use
> heuristics and risk reallocating when growing, neither of which is terribly
> appealing.  Do you have any suggestions or preferences?
>
>> * Reversing the sorted list looks more confusing to me than just sorting it in the proper order right away. A
qsort()comparator returning negative means "left goes before right", but here it is effectively the other way around. 
>
> Changed.
>
>> * There is a function named make_pathkeys_for_window that makes a list of canonical pathkeys given a window clause.
Usingthis function to give you the pathkeys, and then comparing them, would be more future-proof in case we ever start
usinghashing for windowing. Moreover, the canonical pathkeys can be compared with pointer comparison which is much
fasterthan equal(). Still, I'm not sure whether it's going to be convenient to use in this case, or even whether it is
aright thing to do. What do you think? 
>
> That’s an interesting thought, one that didn’t occur to me while hacking.  I’m
> not sure whether is would be wise/clean to overload with this functionality
> though.
>
> Attached updated version also adds a testcase that was clearly missing from the
> previous version and an updated window.out.
>
> cheers ./daniel
>

Thank you for updating the patch! There are two review comments.

The current select_active_windows() function compares the all fields
of WindowClause for the sorting but with this patch we compare only
tleSortGroupRef, sortop and the number of uniqueOrder. I think this
leads a degradation as follows.

=# explain select *, sum(b) over w1, sum(a) over w2, sum(b) over w3
from w window w1 as (partition by a order by a nulls first), w2 as
(partition by a order by a), w3 as (partition by a order by a nulls
first);

* Current HEAD
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 WindowAgg  (cost=369.16..414.36 rows=2260 width=32)
   ->  Sort  (cost=369.16..374.81 rows=2260 width=24)
         Sort Key: a
         ->  WindowAgg  (cost=158.51..243.26 rows=2260 width=24)
               ->  WindowAgg  (cost=158.51..203.71 rows=2260 width=16)
                     ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
                           Sort Key: a NULLS FIRST
                           ->  Seq Scan on w  (cost=0.00..32.60
rows=2260 width=8)
(8 rows)

* With patch
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 WindowAgg 3  (cost=500.72..545.92 rows=2260 width=32)
   ->  Sort  (cost=500.72..506.37 rows=2260 width=24)
         Sort Key: a NULLS FIRST
         ->  WindowAgg 2  (cost=329.61..374.81 rows=2260 width=24)
               ->  Sort  (cost=329.61..335.26 rows=2260 width=16)
                     Sort Key: a
                     ->  WindowAgg 1  (cost=158.51..203.71 rows=2260 width=16)
                           ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
                                 Sort Key: a NULLS FIRST
                                 ->  Seq Scan on w  (cost=0.00..32.60
rows=2260 width=8)
(10 rows)

---
+                        * Generating the uniqueOrder can be offloaded
to the comparison
+                        * function to optimize for the case where we
only have a single
+                        * window.  For now, optimize for readibility.

s/readibility/readability/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgsql-hackers by date:

Previous
From: Nikita Glukhov
Date:
Subject: Re: jsonpath
Next
From: Robert Haas
Date:
Subject: Re: branches_of_interest.txt