Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options
Date
Msg-id CAApHDvq_-XLUke9A_u16ttuEUFHJp6cZP8YZsSZpmOJ=aMC4uA@mail.gmail.com
Whole thread Raw
In response to Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, 18 Oct 2022 at 12:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Erwin Brandstetter <brsaweda@gmail.com> writes:
> > I am thinking of building a test case to run
> > - all existing window functions
> > - with all basic variants of frame definitions
> > - once with ROWS, once with RANGE
> > - on basic table that has duplicate and NULL values in partition and
> > ordering columns
> > - in all supported major versions
>
> > To verify for which of our window functions ROWS vs. RANGE never makes a
> > difference.
> > That should be obvious in most cases, just to be sure.
>
> > Do you think this would be helpful?
>
> Doubt it.  Per the old saying "testing can prove the presence of bugs,
> but not their absence", this could prove that some functions *do*
> respond to these options, but it cannot prove that a function
> *doesn't*.  Maybe you just didn't try the right test case.

I suppose this is kind of like fuzz testing.  Going by "git log
--grep=sqlsmith", fuzzing certainly has found bugs for us in the past.
I personally wouldn't discourage Erwin from doing this.

For me, my first port of call will be to study the code of each window
function to see if the frame options can affect the result. I *do*
need to spend more time on this still. It would be good to have some
extra assurance on having read the code with some more exhaustive
testing results. If Erwin was to find result variations that I missed
then we might avoid writing some new bugs.

Also, I just did spend a little more time reading a few window
functions and I see percent_rank() is another candidate for this
optimisation. I've never needed to use that function before, but from
the following experiment, it seems to just be (rank() over (order by
...) - 1) / (count(*) over () - 1). Since rank() is already on the
list and count(*) over() contains all rows in the frame, then it seems
percent_rank() can join the club too.

create table t0 as select x*random() as a from generate_series(1,1000000)x;
select * from (select a,percent_rank() over (order by a) pr,(rank()
over (order by a) - 1) / (count(*) over () - 1)::float8 pr2  from t0)
c where pr <> pr2;

David



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Next
From: David Rowley
Date:
Subject: Re: Warning about using pg_stat_reset() and pg_stat_reset_shared()