Home > mailing lists

Re: Implement for window functions - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Implement for window functions
Date	January 11, 2021 22:00:26
Msg-id	1037735.1610402426@sss.pgh.pa.us Whole thread Raw
In response to	Re: Implement for window functions (Zhihong Yu <zyu@yugabyte.com>)
Responses	Re: Implement for window functions
List	pgsql-hackers

Tree view

I started to look through this patch, and the first thing I'm wondering
about is why bother with a new pg_proc column, ie why not just apply the
behavior to any window function?  The fact that the SQL committee
restricts this syntax to a few window functions is just their usual
design tic of creating one-off syntax that could be much more general.
We've not hesitated to generalize in similar situations in the past.

The main thing I can see against that position is that it's not very
clear what to do if the window function has more than one window-ized
argument --- or at least, the only plausible interpretation would be
to ignore rows in which any of those arguments is null, which this
implementation is incapable of doing (since we don't know exactly
which arguments the function will try to treat as window-ized).
However, having a pronulltreatment column isn't helping that
situation at all: somebody could mark a multi-input window function
as ignoring nulls, and we'd silently do the wrong thing in any case
where those inputs weren't nulls at exactly the same rows.

My thought, therefore, is to drop pg_proc.pronulltreatment and instead
enforce an implementation restriction that when IGNORE NULLS is specified, 
WinGetFuncArgInPartition and WinGetFuncArgInFrame throw an error if
asked about any argument position other than the first one.  As long
as only the first argument is window-ized, the implementation you have
here will act correctly.  If anybody ever finds that annoying, they can
figure out how to relax the restriction at that time.

The need for a TREAT NULLS option to CREATE FUNCTION would thereby
also go away, which is good because I don't think this patch has
fully implemented that (notably, I don't see any pg_dump changes).

As far as the actual implementation goes:

* The undocumented relationship between "relpos" (which used to be
constant and now isn't) and "target" and "step" makes my head hurt.
I'm sure this could be redesigned to be simpler, or if not, at
least it should be commented a lot more thoroughly.

* I'm quite concerned about performance; it looks like this will
degrade to O(N^2) in practical situations, which isn't going to
satisfy anyone.  I think we need to track how many nulls we've
already seen so that we aren't re-visiting earlier rows over and
over.  That should make it possible to un-disable the set_mark
optimization, which is something that's independently catastrophic
for performance.  While I've not stopped to design this fully, maybe
we could keep state along the lines of "there are j rows with null
values of the window-ized argument before row k of the partition."
Updating that by dead reckoning as we navigate would be enough to
fix the O(N^2) problem for typical scenarios.  I think.

            regards, tom lane

pgsql-hackers by date:

From: Tomas Vondra
Date: 11 January 2021, 21:51:21
Subject: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

From: Daniel Gustafsson
Date: 11 January 2021, 23:07:57
Subject: Re: Online checksums patch - once again

Re: Implement for window functions - Mailing list pgsql-hackers

Previous

Next