Home > mailing lists

Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options
Date	October 18, 2022 02:58:47
Msg-id	CAApHDvq_-XLUke9A_u16ttuEUFHJp6cZP8YZsSZpmOJ=aMC4uA@mail.gmail.com Whole thread Raw
In response to	Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Tue, 18 Oct 2022 at 12:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Erwin Brandstetter <brsaweda@gmail.com> writes:
> > I am thinking of building a test case to run
> > - all existing window functions
> > - with all basic variants of frame definitions
> > - once with ROWS, once with RANGE
> > - on basic table that has duplicate and NULL values in partition and
> > ordering columns
> > - in all supported major versions
>
> > To verify for which of our window functions ROWS vs. RANGE never makes a
> > difference.
> > That should be obvious in most cases, just to be sure.
>
> > Do you think this would be helpful?
>
> Doubt it.  Per the old saying "testing can prove the presence of bugs,
> but not their absence", this could prove that some functions *do*
> respond to these options, but it cannot prove that a function
> *doesn't*.  Maybe you just didn't try the right test case.

I suppose this is kind of like fuzz testing.  Going by "git log
--grep=sqlsmith", fuzzing certainly has found bugs for us in the past.
I personally wouldn't discourage Erwin from doing this.

For me, my first port of call will be to study the code of each window
function to see if the frame options can affect the result. I *do*
need to spend more time on this still. It would be good to have some
extra assurance on having read the code with some more exhaustive
testing results. If Erwin was to find result variations that I missed
then we might avoid writing some new bugs.

Also, I just did spend a little more time reading a few window
functions and I see percent_rank() is another candidate for this
optimisation. I've never needed to use that function before, but from
the following experiment, it seems to just be (rank() over (order by
...) - 1) / (count(*) over () - 1). Since rank() is already on the
list and count(*) over() contains all rows in the frame, then it seems
percent_rank() can join the club too.

create table t0 as select x*random() as a from generate_series(1,1000000)x;
select * from (select a,percent_rank() over (order by a) pr,(rank()
over (order by a) - 1) / (count(*) over () - 1)::float8 pr2  from t0)
c where pr <> pr2;

David

pgsql-hackers by date:

From: Peter Geoghegan
Date: 18 October 2022, 02:52:11
Subject: Re: New strategies for freezing, advancing relfrozenxid early

From: David Rowley
Date: 18 October 2022, 03:00:19
Subject: Re: Warning about using pg_stat_reset() and pg_stat_reset_shared()

Re: Allow WindowFuncs prosupport function to use more optimal WindowClause options - Mailing list pgsql-hackers

Previous

Next