Re: Implement for window functions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Implement |
Date | |
Msg-id | 1037735.1610402426@sss.pgh.pa.us Whole thread Raw |
In response to |
Re: Implement |
Responses |
Re: Implement |
List | pgsql-hackers |
I started to look through this patch, and the first thing I'm wondering about is why bother with a new pg_proc column, ie why not just apply the behavior to any window function? The fact that the SQL committee restricts this syntax to a few window functions is just their usual design tic of creating one-off syntax that could be much more general. We've not hesitated to generalize in similar situations in the past. The main thing I can see against that position is that it's not very clear what to do if the window function has more than one window-ized argument --- or at least, the only plausible interpretation would be to ignore rows in which any of those arguments is null, which this implementation is incapable of doing (since we don't know exactly which arguments the function will try to treat as window-ized). However, having a pronulltreatment column isn't helping that situation at all: somebody could mark a multi-input window function as ignoring nulls, and we'd silently do the wrong thing in any case where those inputs weren't nulls at exactly the same rows. My thought, therefore, is to drop pg_proc.pronulltreatment and instead enforce an implementation restriction that when IGNORE NULLS is specified, WinGetFuncArgInPartition and WinGetFuncArgInFrame throw an error if asked about any argument position other than the first one. As long as only the first argument is window-ized, the implementation you have here will act correctly. If anybody ever finds that annoying, they can figure out how to relax the restriction at that time. The need for a TREAT NULLS option to CREATE FUNCTION would thereby also go away, which is good because I don't think this patch has fully implemented that (notably, I don't see any pg_dump changes). As far as the actual implementation goes: * The undocumented relationship between "relpos" (which used to be constant and now isn't) and "target" and "step" makes my head hurt. I'm sure this could be redesigned to be simpler, or if not, at least it should be commented a lot more thoroughly. * I'm quite concerned about performance; it looks like this will degrade to O(N^2) in practical situations, which isn't going to satisfy anyone. I think we need to track how many nulls we've already seen so that we aren't re-visiting earlier rows over and over. That should make it possible to un-disable the set_mark optimization, which is something that's independently catastrophic for performance. While I've not stopped to design this fully, maybe we could keep state along the lines of "there are j rows with null values of the window-ized argument before row k of the partition." Updating that by dead reckoning as we navigate would be enough to fix the O(N^2) problem for typical scenarios. I think. regards, tom lane
pgsql-hackers by date: