Re: Implement for window functions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Implement for window functions
Date
Msg-id 1037735.1610402426@sss.pgh.pa.us
Whole thread Raw
In response to Re: Implement for window functions  (Zhihong Yu <zyu@yugabyte.com>)
Responses Re: Implement for window functions
List pgsql-hackers
I started to look through this patch, and the first thing I'm wondering
about is why bother with a new pg_proc column, ie why not just apply the
behavior to any window function?  The fact that the SQL committee
restricts this syntax to a few window functions is just their usual
design tic of creating one-off syntax that could be much more general.
We've not hesitated to generalize in similar situations in the past.

The main thing I can see against that position is that it's not very
clear what to do if the window function has more than one window-ized
argument --- or at least, the only plausible interpretation would be
to ignore rows in which any of those arguments is null, which this
implementation is incapable of doing (since we don't know exactly
which arguments the function will try to treat as window-ized).
However, having a pronulltreatment column isn't helping that
situation at all: somebody could mark a multi-input window function
as ignoring nulls, and we'd silently do the wrong thing in any case
where those inputs weren't nulls at exactly the same rows.

My thought, therefore, is to drop pg_proc.pronulltreatment and instead
enforce an implementation restriction that when IGNORE NULLS is specified, 
WinGetFuncArgInPartition and WinGetFuncArgInFrame throw an error if
asked about any argument position other than the first one.  As long
as only the first argument is window-ized, the implementation you have
here will act correctly.  If anybody ever finds that annoying, they can
figure out how to relax the restriction at that time.

The need for a TREAT NULLS option to CREATE FUNCTION would thereby
also go away, which is good because I don't think this patch has
fully implemented that (notably, I don't see any pg_dump changes).

As far as the actual implementation goes:

* The undocumented relationship between "relpos" (which used to be
constant and now isn't) and "target" and "step" makes my head hurt.
I'm sure this could be redesigned to be simpler, or if not, at
least it should be commented a lot more thoroughly.

* I'm quite concerned about performance; it looks like this will
degrade to O(N^2) in practical situations, which isn't going to
satisfy anyone.  I think we need to track how many nulls we've
already seen so that we aren't re-visiting earlier rows over and
over.  That should make it possible to un-disable the set_mark
optimization, which is something that's independently catastrophic
for performance.  While I've not stopped to design this fully, maybe
we could keep state along the lines of "there are j rows with null
values of the window-ized argument before row k of the partition."
Updating that by dead reckoning as we navigate would be enough to
fix the O(N^2) problem for typical scenarios.  I think.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Next
From: Daniel Gustafsson
Date:
Subject: Re: Online checksums patch - once again