Re: Printing window function OVER clauses in EXPLAIN - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Printing window function OVER clauses in EXPLAIN
Date
Msg-id CAKFQuwYqj04gc-t1u27vKKFavF1ME8aFhmv7p8LscqSPnK22xA@mail.gmail.com
Whole thread Raw
In response to Re: Printing window function OVER clauses in EXPLAIN  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Sat, Mar 8, 2025 at 6:15 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, 9 Mar 2025 at 10:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One thing that puzzled me a bit is that many of the outputs
> show "ROWS UNBOUNDED PRECEDING" in window functions where that
> definitely wasn't in the source query.  Eventually I realized
> that that comes from window_row_number_support() and cohorts
> optimizing the query.  While this isn't wrong, I suspect it
> will cause a lot of confusion and questions.  I wonder if we
> should do something to hide the change?

I suspect it might be more confusing if we were to show the user the
original frame options. Isn't EXPLAIN meant to be a window into the
plan that's been or would be executed? I think it would be misleading
to display something different to what will be executed.


Looking at this example:

SELECT
    empno,
    depname,
    row_number() OVER (PARTITION BY depname ORDER BY enroll_date) rn,
    rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS BETWEEN
                 UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) rnk,
    count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN
                   CURRENT ROW AND CURRENT ROW) cnt
FROM empsalary;

The new output is:

 WindowAgg
   Output: empno, depname, (row_number() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)), (rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)), count(*) OVER (PARTITION BY depname ORDER BY enroll_date RANGE BETWEEN CURRENT ROW AND CURRENT ROW), enroll_date
   ->  WindowAgg
         Output: depname, enroll_date, empno, row_number() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING), rank() OVER (PARTITION BY depname ORDER BY enroll_date ROWS UNBOUNDED PRECEDING)
         ->  Sort
               Output: depname, enroll_date, empno
               Sort Key: empsalary.depname, empsalary.enroll_date
               ->  Seq Scan on pg_temp.empsalary
                     Output: depname, enroll_date, empno


It is kinda annoying that row_number and rank have their entire expression output twice when the computation only happens once.  But that is outside the scope; just making an observation.  It just becomes even worse when we fill in the details.

As for the optimization, any reason to not just show that it was done?  In optimize_window_clauses arrange to save the existing_wc somewhere on the relevant window functions then, in explain, output something like:

-> WindowAgg
Output: depname, enroll_date, empno, row_number() OVER (...), rank() OVER (...)
Reframed: row_number() from (default) RANGE => ROWS
(I'm unsure whether we can write "default" here though, it isn't critical.)
Reframed: rank() from UNBOUNDED FOLLOWING => CURRENT ROW

(I initially put the entire frame clause, without omitting default frame_end, there but then figured it defeated the point.  We should only show those elements (type, start, end) that actually are different between the parsed query and what gets executed.)

Which does bring up the point, to what extent should the explain output rely on defaults versus being explicit?  We are omitting frame_end of CURRENT ROW generally here.

David J.

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Commitfest app release on Feb 17 with many improvements
Next
From: Alexander Korotkov
Date:
Subject: Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs