Thread: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, lag, [...]. This is not implemented in PostgreSQL
(http://www.postgresql.org/docs/devel/static/functions-window.html)
I've had a go at implementing this, and I've attached the resulting patch. It's not finished yet, but I was hoping to find out if my solution is along the right lines.
In particular, I'm storing the ignore-nulls flag in the frameOptions of a window function definition, and am adding a function to the windowapi.h to get at these options. I'm keeping the last non-null value in WinGetPartitionLocalMemory (which I hope is the right place), but I'm not using any of the *GetDatum macros to access it.
An example of my change's behaviour:
nwhite=# select *, lag(num,0) ignore nulls over (order by generate_series) from
nwhite-# (select generate_series from generate_series(0,10)) s
nwhite-# left outer join
nwhite-# numbers n
nwhite-# on (s.generate_series = n.num);
generate_series | num | lag
-----------------+-----+-----
0 | |
1 | 1 | 1
2 | | 1
3 | | 1
4 | 4 | 4
5 | 5 | 5
6 | | 5
7 | | 5
8 | | 5
9 | 9 | 9
10 | | 9
(11 rows)
I'd find this feature really useful, so I hope you can help me get my patch to a contributable state.
Thanks -
Nick
(http://www.postgresql.org/docs/devel/static/functions-window.html)
I've had a go at implementing this, and I've attached the resulting patch. It's not finished yet, but I was hoping to find out if my solution is along the right lines.
In particular, I'm storing the ignore-nulls flag in the frameOptions of a window function definition, and am adding a function to the windowapi.h to get at these options. I'm keeping the last non-null value in WinGetPartitionLocalMemory (which I hope is the right place), but I'm not using any of the *GetDatum macros to access it.
An example of my change's behaviour:
nwhite=# select *, lag(num,0) ignore nulls over (order by generate_series) from
nwhite-# (select generate_series from generate_series(0,10)) s
nwhite-# left outer join
nwhite-# numbers n
nwhite-# on (s.generate_series = n.num);
generate_series | num | lag
-----------------+-----+-----
0 | |
1 | 1 | 1
2 | | 1
3 | | 1
4 | 4 | 4
5 | 5 | 5
6 | | 5
7 | | 5
8 | | 5
9 | 9 | 9
10 | | 9
(11 rows)
I'd find this feature really useful, so I hope you can help me get my patch to a contributable state.
Thanks -
Nick
Attachment
Nicholas White <n.j.white@gmail.com> writes: > The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, > lag, [...]. This is not implemented in PostgreSQL > (http://www.postgresql.org/docs/devel/static/functions-window.html) > I've had a go at implementing this, and I've attached the resulting patch. > It's not finished yet, but I was hoping to find out if my solution is along > the right lines. Since we're trying to get 9.3 to closure, this patch probably isn't going to get much attention until the 9.4 development cycle starts (in a couple of months, likely). In the meantime, please add it to the next commitfest list so we remember to come back to it: https://commitfest.postgresql.org/action/commitfest_view?id=18 One comment just from a quick eyeball look is that we really hate adding new keywords that aren't UNRESERVED, because that risks breaking existing applications. Please see if you can refactor the grammar to make those new entries unreserved. regards, tom lane
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
Thanks - I've added it here: https://commitfest.postgresql.org/action/patch_view?id=1096 .
I've also attached a revised version that makes IGNORE and RESPECT UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).
NickI've also attached a revised version that makes IGNORE and RESPECT UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).
On 23 March 2013 14:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Nicholas White <n.j.white@gmail.com> writes:Since we're trying to get 9.3 to closure, this patch probably isn't
> The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead,
> lag, [...]. This is not implemented in PostgreSQL
> (http://www.postgresql.org/docs/devel/static/functions-window.html)
> I've had a go at implementing this, and I've attached the resulting patch.
> It's not finished yet, but I was hoping to find out if my solution is along
> the right lines.
going to get much attention until the 9.4 development cycle starts
(in a couple of months, likely). In the meantime, please add it to
the next commitfest list so we remember to come back to it:
https://commitfest.postgresql.org/action/commitfest_view?id=18
One comment just from a quick eyeball look is that we really hate
adding new keywords that aren't UNRESERVED, because that risks
breaking existing applications. Please see if you can refactor the
grammar to make those new entries unreserved.
regards, tom lane
Attachment
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Hitoshi Harada
Date:
On Sat, Mar 23, 2013 at 3:25 PM, Nicholas White <n.j.white@gmail.com> wrote:
Hm, you made another lookahead in base_yylex to make them unreserved -- looks ok, but not sure if there was no way to do it.
You might want to try byref types such as text. It seems you need to copy the datum to save the value in appropriate memory context. Also, try to create a view on those expressions. I don't think it correctly preserves it.
Thanks,
-- Thanks - I've added it here: https://commitfest.postgresql.org/action/patch_view?id=1096 .
I've also attached a revised version that makes IGNORE and RESPECT UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).
Hm, you made another lookahead in base_yylex to make them unreserved -- looks ok, but not sure if there was no way to do it.
You might want to try byref types such as text. It seems you need to copy the datum to save the value in appropriate memory context. Also, try to create a view on those expressions. I don't think it correctly preserves it.
Thanks,
Hitoshi Harada
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
Thanks for the feedback.
For the parsing changes, it seems I can either make RESPECT and IGNORE reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and IGNORE were just normal unreserved keywords due to ambiguities after a function definition (e.g. select abs(1) respect; - which is currently a valid statement).Thanks -
On 24 March 2013 03:43, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
--On Sat, Mar 23, 2013 at 3:25 PM, Nicholas White <n.j.white@gmail.com> wrote:Thanks - I've added it here: https://commitfest.postgresql.org/action/patch_view?id=1096 .
I've also attached a revised version that makes IGNORE and RESPECT UNRESERVED keywords (following the pattern of NULLS_FIRST and NULLS_LAST).
Hm, you made another lookahead in base_yylex to make them unreserved -- looks ok, but not sure if there was no way to do it.
You might want to try byref types such as text. It seems you need to copy the datum to save the value in appropriate memory context. Also, try to create a view on those expressions. I don't think it correctly preserves it.
Thanks,
Hitoshi Harada
Attachment
On Sun, 2013-03-24 at 20:15 -0400, Nicholas White wrote: > I've redone the leadlag function changes to use datumCopy as you > suggested. However, I've had to remove the NOT_USED #ifdef around > datumFree so I can use it to avoid building up large numbers of copies > of Datums in the memory context while a query is executing. I've > attached the revised patch... > Comments: WinGetResultDatumCopy() calls datumCopy, which will just copy in the current memory context. I think you want it in the per-partition memory context, otherwise the last value in each partition will stick around until the query is done (so many partitions could be a problem). That should be easy enough to do by switching to the winobj->winstate->partcontext memory context before calling datumCopy, and then switching back. For that matter, why store the datum again at all? You can just store the offset of the last non-NULL value in that partition, and then fetch it using WinGetFuncArgInPartition(), right? We really want to avoid any per-tuple allocations. Alternatively, you might look into setting a mark when you get a non-NULL value. Then you could just always fetch the oldest one. Unfortunately, I think that only works with const_offset=true... so the previous idea might be better. I'll leave it to someone else to review the grammar changes. Regards,Jeff Davis
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Alvaro Herrera
Date:
Nicholas White escribió: > For the parsing changes, it seems I can either make RESPECT and IGNORE > reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS > and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and > IGNORE were just normal unreserved keywords due to ambiguities after a > function definition (e.g. select abs(1) respect; - which is currently a > valid statement). Well, making them reserved keywords is not that great, so maybe the lookahead thingy is better. However, this patch introduces the third and fourth uses of the "save the lookahead token" pattern in the "default" switch cases. Can we refactor that bit somehow, to avoid so many duplicates? (For a minute I thought that Andrew Gierth's WITH ORDINALITY patch would add another one, but it seems not.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jun 15, 2013 at 9:37 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Nicholas White escribió: > >> For the parsing changes, it seems I can either make RESPECT and IGNORE >> reserved keywords, or add a lookahead to construct synthetic RESPECT NULLS >> and IGNORE NULLS keywords. The grammar wouldn't compile if RESPECT and >> IGNORE were just normal unreserved keywords due to ambiguities after a >> function definition (e.g. select abs(1) respect; - which is currently a >> valid statement). > > Well, making them reserved keywords is not that great, so maybe the > lookahead thingy is better. However, this patch introduces the third > and fourth uses of the "save the lookahead token" pattern in the > "default" switch cases. Can we refactor that bit somehow, to avoid so > many duplicates? (For a minute I thought that Andrew Gierth's WITH > ORDINALITY patch would add another one, but it seems not.) Making things reserved keywords is painful and I don't like it, but I've started to become skeptical of shifting the problem to the lexer, too. Sometimes special case logic in the lexer about token combining can have surprising consequences in other parts of the grammar. For example, with a lexer hack, what will happen if someone has a column named RESPECT and does SELECT ... ORDER BY respect NULLS LAST? I haven't studied the code in detail so maybe it's fine, but it's something to think about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Hitoshi Harada
Date:
On Sat, Jun 15, 2013 at 1:30 PM, Jeff Davis <pgsql@j-davis.com> wrote:
On Sun, 2013-03-24 at 20:15 -0400, Nicholas White wrote:Comments:
> I've redone the leadlag function changes to use datumCopy as you
> suggested. However, I've had to remove the NOT_USED #ifdef around
> datumFree so I can use it to avoid building up large numbers of copies
> of Datums in the memory context while a query is executing. I've
> attached the revised patch...
>
WinGetResultDatumCopy() calls datumCopy, which will just copy in the
current memory context. I think you want it in the per-partition memory
context, otherwise the last value in each partition will stick around
until the query is done (so many partitions could be a problem). That
should be easy enough to do by switching to the
winobj->winstate->partcontext memory context before calling datumCopy,
and then switching back.
For that matter, why store the datum again at all? You can just store
the offset of the last non-NULL value in that partition, and then fetch
it using WinGetFuncArgInPartition(), right? We really want to avoid any
per-tuple allocations.
I believe WinGetFuncArgInPartition is a bit slow if the offset is far from the current row. So it might make sense to store the last-seen value, but I'm not sure if we need to copy datum every time. I haven't looked into the new patch.
Thanks,
--
Hitoshi Harada
Hitoshi Harada
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
Thanks for the reviews. I've attached a revised patch that has the lexer refactoring Alvaro mentions (arbitarily using a goto rather than a bool flag) and uses Jeff's idea of just storing the index of the last non-null value (if there is one) in the window function's context (and not the Datum itself).
However, Robert's right that SELECT ... ORDER BY respect NULLS LAST will now fail. An earlier iteration of the patch had RESPECT and IGNORE as reserved, but that would have broken tables with columns called "respect" (etc.), which the current version allows. Is this backwards incompatibility acceptable? If not, maybe I should try doing a two-token lookahead in the token-aggregating code between the lexer and the parser (i.e. make a RESPECT_NULLS token out of a sequence of RESPECT NULLS OVER tokens, remembering to replace the OVER token)? Or what about adding an %expect statement to the Bison grammar, confirming that the shift / reduce conflicts caused by using the RESPECT, IGNORE & NULL_P tokens the in out_clause rule are OK?Attachment
On Tue, Jun 18, 2013 at 6:27 PM, Nicholas White <n.j.white@gmail.com> wrote: > Thanks for the reviews. I've attached a revised patch that has the lexer > refactoring Alvaro mentions (arbitarily using a goto rather than a bool > flag) and uses Jeff's idea of just storing the index of the last non-null > value (if there is one) in the window function's context (and not the Datum > itself). > > However, Robert's right that SELECT ... ORDER BY respect NULLS LAST will now > fail. An earlier iteration of the patch had RESPECT and IGNORE as reserved, > but that would have broken tables with columns called "respect" (etc.), > which the current version allows. Is this backwards incompatibility > acceptable? I think it's better to add new partially reserved keywords than to have this kind of random breakage. When you make something a partially-reserved keyword, then things break, but in a fairly well-defined way. Lexer hacks can break things in ways that are much subtler, which we may not even realize for a long time, and which in that case would mean that the word "respect" needs to be quoted in some contexts but not others. That's going to cause a lot of headaches, because pg_dump etc. know about quoting reserved keywords, but they don't know anything about quoting unreserved keywords in contexts where they might happen to be followed by the wrong next word. > If not, maybe I should try doing a two-token lookahead in the > token-aggregating code between the lexer and the parser (i.e. make a > RESPECT_NULLS token out of a sequence of RESPECT NULLS OVER tokens, > remembering to replace the OVER token)? Or what about adding an %expect > statement to the Bison grammar, confirming that the shift / reduce conflicts > caused by using the RESPECT, IGNORE & NULL_P tokens the in out_clause rule > are OK? These lines of inquiry don't seem promising to me. It's going to be complicated and unmaintainable and may just move the failure scenarios to cases that are too obscure for us to reason about. I think the question is whether this feature is really worth adding new reserved keywords for. I have a hard time saying we shouldn't support something that's part of the SQL standard, but personally, it's not something I've seen come up prior to this thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, 2013-06-20 at 10:03 -0400, Robert Haas wrote: > I think the question is whether this feature is really worth adding > new reserved keywords for. I have a hard time saying we shouldn't > support something that's part of the SQL standard, but personally, > it's not something I've seen come up prior to this thread. What's the next step here? The feature sounds useful to me. If the grammar is unacceptable, does someone have an alternative idea, like using new function names instead of grammar? If so, what are reasonable names to use? Also, I think someone mentioned this already, but what about first_value() and last_value()? Shouldn't we do those at the same time? Regards,Jeff Davis
On Fri, Jun 21, 2013 at 12:18 AM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, 2013-06-20 at 10:03 -0400, Robert Haas wrote: >> I think the question is whether this feature is really worth adding >> new reserved keywords for. I have a hard time saying we shouldn't >> support something that's part of the SQL standard, but personally, >> it's not something I've seen come up prior to this thread. > > What's the next step here? Well, ideally, some other people weigh in on the value of the feature vs. the pain of reserving the keywords. > The feature sounds useful to me. ...and there's one person with an opinion now! :-) The other question here is - do we actually have the grammar right? As in, is this actually the syntax we're supposed to be implementing? It looks different from what's shown here, where the IGNORE NULLS is inside the function's parentheses, rather than afterwards: http://rwijk.blogspot.com/2010/06/simulating-laglead-ignore-nulls.html IBM seems to think it's legal either inside or outside the parentheses: http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2594.htm Regardless of what syntax we settle on, we should also make sure that the conflict is intrinsic to the grammar and can't be factored out, as Tom suggested upthread. It's not obvious to me what the actual ambiguity is here. If you've seen "select lag(num,0)" and the lookahead token is "respect", what's the problem? It sort of looks like it could be a column label, but not even unreserved keywords can be column labels, so that's not it. Probably deserves a bit more investigation... > If the grammar is unacceptable, does > someone have an alternative idea, like using new function names instead > of grammar? If so, what are reasonable names to use? We could just add additional, optional Boolean argument to the existing functions. It's non-standard, but we avoid adding keywords. > Also, I think someone mentioned this already, but what about > first_value() and last_value()? Shouldn't we do those at the same time? Not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 2013-06-21 at 09:21 -0400, Robert Haas wrote: > The other question here is - do we actually have the grammar right? > As in, is this actually the syntax we're supposed to be implementing? > It looks different from what's shown here, where the IGNORE NULLS is > inside the function's parentheses, rather than afterwards: > > http://rwijk.blogspot.com/2010/06/simulating-laglead-ignore-nulls.html > > IBM seems to think it's legal either inside or outside the parentheses: > > http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2594.htm The spec seems pretty clear that it falls outside of the parentheses, unless I am missing something. > Regardless of what syntax we settle on, we should also make sure that > the conflict is intrinsic to the grammar and can't be factored out, as > Tom suggested upthread. It's not obvious to me what the actual > ambiguity is here. If you've seen "select lag(num,0)" and the > lookahead token is "respect", what's the problem? It sort of looks > like it could be a column label, but not even unreserved keywords can > be column labels, so that's not it. Probably deserves a bit more > investigation... I think the problem is when the function is used as a table function; e.g.: SELECT * FROM generate_series(1,10) respect; > We could just add additional, optional Boolean argument to the > existing functions. It's non-standard, but we avoid adding keywords. I thought about that, but it is awkward because the argument would have to be constant (or, if not, it would be quite strange). Regards,Jeff Davis
On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis <pgsql@j-davis.com> wrote: >> Regardless of what syntax we settle on, we should also make sure that >> the conflict is intrinsic to the grammar and can't be factored out, as >> Tom suggested upthread. It's not obvious to me what the actual >> ambiguity is here. If you've seen "select lag(num,0)" and the >> lookahead token is "respect", what's the problem? It sort of looks >> like it could be a column label, but not even unreserved keywords can >> be column labels, so that's not it. Probably deserves a bit more >> investigation... > > I think the problem is when the function is used as a table function; > e.g.: > > SELECT * FROM generate_series(1,10) respect; Ah ha. Well, there's probably not much help for that. Disallowing keywords as table aliases would be a cure worse than the disease, I think. I suppose the good news is that there probably aren't many people using RESPECT as a column name, but I have a feeling that we're almost certain to get complaints about reserving IGNORE. I think that will have to be quoted as a PL/pgsql variable name as well. :-( >> We could just add additional, optional Boolean argument to the >> existing functions. It's non-standard, but we avoid adding keywords. > > I thought about that, but it is awkward because the argument would have > to be constant (or, if not, it would be quite strange). True... but e.g. string_agg() has the same issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Troels Nielsen
Date:
Hello all
I've been examining PostgreSQL to gain a greater understanding
of RDBMS. (Thanks for a nice, very educational system!)
In the process I've been looking into a few problems and the
complications of this patch appeared relatively uninvolved, so I
tried to look for a solution.
I found the following:
The grammar conflict appears to be because of ambiguities in:
1. table_ref (used exclusively in FROM clauses)
2. index_elem (used exclusively in INDEX creation statements).
Now, this doesn't seem to make much sense, as AFAICT window functions
are explicitly disallowed in these contexts (transformWindowFuncCall
will yield errors, and I can't really wrap my head around what a
window function call would mean there).
I therefore propose a simple rearrangement of the grammar,
syntactically disallowing window functions in the outer part of those
contexts (a_expr's inside can't and shouldn't be done much about)
which will allow both RESPECT and IGNORE to become unreserved
keywords, without doing any lexer hacking or abusing the grammar.
I've attached a patch which will add RESPECT NULLS and IGNORE NULLS to
the grammar in the right place. Also the window frame options are set
but nothing more, so this patch needs to be merged with Nicholas White's
original patch.
One problem I see with this approach to the grammar is that the
error messages will change when putting window functions in any of the
forbidden places. The new error messages are I think worse and less
specific than the old ones. I suppose that can be fixed though, or
maybe the problem isn't so severe.
Example of old error message:
window functions are not allowed in functions in FROM
New error message:
syntax error at or near "OVER"
in addition I think the original patch as it stands has a few
problems that I haven't seen discussed:
1. The result of the current patch using lead
create table test_table (
id serial,
val integer);
insert into test_table (val) select * from unnest(ARRAY[1,2,3,4,NULL, NULL, NULL, 5, 6, 7]);
select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 | 7
7 | 7
(10 rows)
I would expect it to output:
select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead
-----+------
1 | 3
2 | 4
3 | 5
4 | 6
| 6
| 6
| 6
5 | 7
6 |
7 |
(10 rows)
That is: skip two rows forward not counting null rows.
The lag behavior works well as far as I can see.
2. It would be nice if an error was given when ignore nulls was used
on a function for which it had no effect. Perhaps this should be up to
the different window function themselves to do though.
Apart from those points I think the original patch is nice and provides a functionality
that's definitely nice to have.
Kind Regards
Troels Nielsen
On Fri, Jun 21, 2013 at 8:11 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jun 21, 2013 at 11:33 AM, Jeff Davis <pgsql@j-davis.com> wrote:Ah ha. Well, there's probably not much help for that. Disallowing
>> Regardless of what syntax we settle on, we should also make sure that
>> the conflict is intrinsic to the grammar and can't be factored out, as
>> Tom suggested upthread. It's not obvious to me what the actual
>> ambiguity is here. If you've seen "select lag(num,0)" and the
>> lookahead token is "respect", what's the problem? It sort of looks
>> like it could be a column label, but not even unreserved keywords can
>> be column labels, so that's not it. Probably deserves a bit more
>> investigation...
>
> I think the problem is when the function is used as a table function;
> e.g.:
>
> SELECT * FROM generate_series(1,10) respect;
keywords as table aliases would be a cure worse than the disease, I
think. I suppose the good news is that there probably aren't many
people using RESPECT as a column name, but I have a feeling that we're
almost certain to get complaints about reserving IGNORE. I think that
will have to be quoted as a PL/pgsql variable name as well. :-(True... but e.g. string_agg() has the same issue.
>> We could just add additional, optional Boolean argument to the
>> existing functions. It's non-standard, but we avoid adding keywords.
>
> I thought about that, but it is awkward because the argument would have
> to be constant (or, if not, it would be quite strange).--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
Good catch - I've attached a patch to address your point 1. It now returns the below (i.e. correctly doesn't fill in the saved value if the index is out of the window. However, I'm not sure whether (e.g.) lead-2-ignore-nulls means count forwards two rows, and if that's null use the last one you've seen (the current implementation) or count forwards two non-null rows (as you suggest). The behaviour isn't specified in a (free) draft of the 2003 standard (http://www.wiscorp.com/sql_2003_standard.zip), and I don't have access to the (non-free) final version. Could someone who does have access to it clarify this? I've also added your example to the regression test cases.
-----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 |
7 |
(10 rows)
If the other reviewers are happy with your grammar changes then I'll merge them into the patch. Alternatively, if departing from the standard is OK then we could reorder the keywords so that a window function is like SELECT lag(x,1) OVER RESPECT NULLS (ORDER BY y) - i.e. putting the respect / ignore tokens after the OVER reserved keyword. Although non-standard it'd make the grammar change trivial.
> Also, I think someone mentioned this already, but what about
> first_value() and last_value()? Shouldn't we do those at the same time?
I didn't include this functionality for the first / last value window functions as their implementation is currently a bit different; they just call WinGetFuncArgInFrame to pick out a single value. Making these functions respect nulls would involve changing the single lookup to a walk through the tuples to find the first non-null version, and keeping track of this index in a struct in the context. As this change is reasonably orthogonal I was going to submit it as a separate patch.select val, lead(val, 2) ignore nulls over (order by id) from test_table;
val | lead -----+------
1 | 3
2 | 4
3 | 4
4 | 4
| 4
| 5
| 6
5 | 7
6 |
7 |
(10 rows)
If the other reviewers are happy with your grammar changes then I'll merge them into the patch. Alternatively, if departing from the standard is OK then we could reorder the keywords so that a window function is like SELECT lag(x,1) OVER RESPECT NULLS (ORDER BY y) - i.e. putting the respect / ignore tokens after the OVER reserved keyword. Although non-standard it'd make the grammar change trivial.
> Also, I think someone mentioned this already, but what about
> first_value() and last_value()? Shouldn't we do those at the same time?
Thanks -
Attachment
On Fri, Jun 21, 2013 at 6:29 PM, Troels Nielsen <bn.troels@gmail.com> wrote: > The grammar conflict appears to be because of ambiguities in: > 1. table_ref (used exclusively in FROM clauses) > 2. index_elem (used exclusively in INDEX creation statements). > > Now, this doesn't seem to make much sense, as AFAICT window functions > are explicitly disallowed in these contexts (transformWindowFuncCall > will yield errors, and I can't really wrap my head around what a > window function call would mean there). > > I therefore propose a simple rearrangement of the grammar, > syntactically disallowing window functions in the outer part of those > contexts (a_expr's inside can't and shouldn't be done much about) > which will allow both RESPECT and IGNORE to become unreserved > keywords, without doing any lexer hacking or abusing the grammar. I reviewed this today and I think this is a very nice approach. Thanks for working on it! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
OK - I've attached another iteration of the patch with Troels' grammar changes. I think the only issue remaining is what the standard says about lead semantics. Thanks -
Attachment
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> The result of the current patch using lead
Actually, I think I agree with you (and, FWIW, so does Oracle: http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#autoId18). I've refactored the window function's implementation so that (e.g.) lead(5) means the 5th non-null value away in front of the current row (the previous implementation was the last non-null value returned if the 5th rows in front was null). These semantics are slower, as the require the function to scan through the tuples discarding non-null ones. I've made the implementation use a bitmap in the partition context to cache whether or not a given tuple produces a null. This seems correct (it passes the regression tests) but as it stores row offsets (which are int64s) I was careful not to use bitmap methods that use ints to refer to set members. I've added more explanation in the code's comments. Thanks -
Actually, I think I agree with you (and, FWIW, so does Oracle: http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#autoId18). I've refactored the window function's implementation so that (e.g.) lead(5) means the 5th non-null value away in front of the current row (the previous implementation was the last non-null value returned if the 5th rows in front was null). These semantics are slower, as the require the function to scan through the tuples discarding non-null ones. I've made the implementation use a bitmap in the partition context to cache whether or not a given tuple produces a null. This seems correct (it passes the regression tests) but as it stores row offsets (which are int64s) I was careful not to use bitmap methods that use ints to refer to set members. I've added more explanation in the code's comments. Thanks -
Attachment
On Thu, Jun 27, 2013 at 8:52 PM, Nicholas White <n.j.white@gmail.com> wrote: >> The result of the current patch using lead > > Actually, I think I agree with you (and, FWIW, so does Oracle: > http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm#autoId18). > I've refactored the window function's implementation so that (e.g.) lead(5) > means the 5th non-null value away in front of the current row (the previous > implementation was the last non-null value returned if the 5th rows in front > was null). These semantics are slower, as the require the function to scan > through the tuples discarding non-null ones. I've made the implementation > use a bitmap in the partition context to cache whether or not a given tuple > produces a null. This seems correct (it passes the regression tests) but as > it stores row offsets (which are int64s) I was careful not to use bitmap > methods that use ints to refer to set members. I've added more explanation > in the code's comments. Thanks - The documentation you've added reads kind of funny to me: + [respect nulls]|[ignore nulls] Wouldn't we normally write that as [ [ RESPECT | IGNORE ] NULLS ] ? I've committed the changes from Troels to avoid the grammar conflicts, and I also took the opportunity to make OVER unreserved, as allowed by his refactoring and per related discussion on other threads. This patch will need to be rebased over those changes (sorry), but hopefully it'll help the review of this patch focus in on the issues that are specific to RESPECT/IGNORE NULLS. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Alvaro Herrera
Date:
Robert Haas escribió: > On Thu, Jun 27, 2013 at 8:52 PM, Nicholas White <n.j.white@gmail.com> wrote: > The documentation you've added reads kind of funny to me: > > + [respect nulls]|[ignore nulls] > > Wouldn't we normally write that as [ [ RESPECT | IGNORE ] NULLS ] ? I think it should be [ { RESPECT | IGNORE } NULLS ] One of the leading keywords must be present. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Jun 28, 2013 at 11:41 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas escribió: >> On Thu, Jun 27, 2013 at 8:52 PM, Nicholas White <n.j.white@gmail.com> wrote: > >> The documentation you've added reads kind of funny to me: >> >> + [respect nulls]|[ignore nulls] >> >> Wouldn't we normally write that as [ [ RESPECT | IGNORE ] NULLS ] ? > > I think it should be > [ { RESPECT | IGNORE } NULLS ] > One of the leading keywords must be present. Oh, yeah. What he said. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> This patch will need to be rebased over those changes
See attached -Attachment
On Fri, Jun 28, 2013 at 12:29 PM, Nicholas White <n.j.white@gmail.com> wrote: >> This patch will need to be rebased over those changes > > See attached - Thanks. But I see a few issues... + [respect nulls]|[ignore nulls] You fixed one of these but missed the other. <replaceable class="parameter">default</replaceable> are evaluated with respect to the current row. If omitted, <replaceable class="parameter">offset</replaceable> defaults to 1 and - <replaceable class="parameter">default</replaceable> to null + <literal>IGNORE NULLS</> is specified then the function will be evaluated + as if the rows containing nulls didn't exist. </entry> </row> This looks like you dropped a line during cut-and-paste. + null_values = (Bitmapset *) WinGetPartitionLocalMemory( + winobj, + BITMAPSET_SIZE(words_needed)); + Assert(null_values); This certainly seems ugly - isn't there a way to accomplish this without having to violate the Bitmapset abstraction? + * A slight edge case. Consider: + * + * A | lag(A, 1) + * 1 | + * 2 | 1 + * | ? pgindent will reformat this into oblivion. Use ----- markers around the comment block as done elsewhere in the code where this is an issue, or don't use ASCII art. I haven't really reviewed the windowing-related code in depth; I thought Jeff might jump back in for that part of it. Jeff, is that something you're planning to do? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
I've fixed the problems you mentioned (see attached) - sorry, I was a bit careless with the docs.
> + null_values = (Bitmapset *) WinGetPartitionLocalMemory(
> + winobj,
> + BITMAPSET_SIZE(words_needed));
> + Assert(null_values);
>
> This certainly seems ugly - isn't there a way to accomplish this
> without having to violate the Bitmapset abstraction?
Indeed, it's ugly. I've revised it slightly to: > + null_values = (Bitmapset *) WinGetPartitionLocalMemory(
> + winobj,
> + BITMAPSET_SIZE(words_needed));
> + Assert(null_values);
>
> This certainly seems ugly - isn't there a way to accomplish this
> without having to violate the Bitmapset abstraction?
> null_values = (Bitmapset *) WinGetPartitionLocalMemory(
> winobj,
> BITMAPSET_SIZE(words_needed));
> null_values->nwords = (int) words_needed;
I don't think the solution would be tidier if I re-instated the leadlag_context struct with a single Bitmapset member. Other Bitmapset usage seems to just call bms_make_singleton then bms_add_member over and over again - which afaict will call palloc every BITS_PER_BITMAPWORD calls, which is not really what I want.
Attachment
On Mon, 2013-06-24 at 18:01 +0100, Nicholas White wrote: > Good catch - I've attached a patch to address your point 1. It now > returns the below (i.e. correctly doesn't fill in the saved value if > the index is out of the window. However, I'm not sure whether (e.g.) > lead-2-ignore-nulls means count forwards two rows, and if that's null > use the last one you've seen (the current implementation) or count > forwards two non-null rows (as you suggest). The behaviour isn't > specified in a (free) draft of the 2003 standard > (http://www.wiscorp.com/sql_2003_standard.zip), and I don't have > access to the (non-free) final version. Could someone who does have > access to it clarify this? I've also added your example to the > regression test cases. Reading a later version of the draft, it is specified, but is still slightly unclear. As I see it, the standard describes the behavior in terms of eliminating the NULL rows entirely before applying the offset. This matches Troels's interpretation. Are you aware of any implementations that do something different? > I didn't include this functionality for the first / last value window > functions as their implementation is currently a bit different; they > just call WinGetFuncArgInFrame to pick out a single value. Making > these functions respect nulls would involve changing the single lookup > to a walk through the tuples to find the first non-null version, and > keeping track of this index in a struct in the context. As this change > is reasonably orthogonal I was going to submit it as a separate patch. Sounds good. Regards,Jeff Davis
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Dean Rasheed
Date:
On 29 June 2013 17:30, Jeff Davis <pgsql@j-davis.com> wrote: > > On Mon, 2013-06-24 at 18:01 +0100, Nicholas White wrote: >> Good catch - I've attached a patch to address your point 1. It now >> returns the below (i.e. correctly doesn't fill in the saved value if >> the index is out of the window. However, I'm not sure whether (e.g.) >> lead-2-ignore-nulls means count forwards two rows, and if that's null >> use the last one you've seen (the current implementation) or count >> forwards two non-null rows (as you suggest). The behaviour isn't >> specified in a (free) draft of the 2003 standard >> (http://www.wiscorp.com/sql_2003_standard.zip), and I don't have >> access to the (non-free) final version. Could someone who does have >> access to it clarify this? I've also added your example to the >> regression test cases. > > Reading a later version of the draft, it is specified, but is still > slightly unclear. > > As I see it, the standard describes the behavior in terms of eliminating > the NULL rows entirely before applying the offset. This matches Troels's > interpretation. Are you aware of any implementations that do something > different? > >> I didn't include this functionality for the first / last value window >> functions as their implementation is currently a bit different; they >> just call WinGetFuncArgInFrame to pick out a single value. Making >> these functions respect nulls would involve changing the single lookup >> to a walk through the tuples to find the first non-null version, and >> keeping track of this index in a struct in the context. As this change >> is reasonably orthogonal I was going to submit it as a separate patch. > > Sounds good. > I took a quick look at this and I think there are still a few problems: 1). The ignore/respect nulls flag needs to be per-window-function data, not a window frame option, because the same window may be shared by multiple window function calls. For example, the following test causes a crash: SELECT val, lead(val, 2) IGNORE NULLS OVER w, lead(val, 2) RESPECT NULLS OVER w FROM unnest(ARRAY[1,2,3,4,NULL,NULL, NULL, 5, 6, 7]) AS val WINDOW w as (); The connection to the server was lost. Attempting reset: Failed. 2). As Troels Nielsen said up-thread, I think this should throw a FEATURE_NOT_SUPPORTED error if it is used for window functions that don't support it, rather than silently ignoring the flag. 3). Similarly, the parser accepts ignore/respect nulls for arbitrary aggregate functions over a window, so maybe this should also throw a FEATURE_NOT_SUPPORTED error. Alternatively, it might be trivial to make all aggregate functions work with ignore nulls in a window context, simply by using the existing code for strict aggregate transition functions. That might be quite handy to support things like array_agg(val) IGNORE NULLS OVER(...). Regards, Dean
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> this should throw a FEATURE_NOT_SUPPORTED error if it is used for window functions that don't support it
> arbitrary aggregate functions over a window ... should also throw a FEATURE_NOT_SUPPORTED error.
Fixed (with test cases) in the attached patch.> arbitrary aggregate functions over a window ... should also throw a FEATURE_NOT_SUPPORTED error.
> because the same window may be shared by multiple window function calls.
#2 0x0000000100cdb68b in ExceptionalCondition (conditionName=Could not find the frame base for "ExceptionalCondition".
) at /Users/xxx/postgresql/src/backend/utils/error/assert.c:54
#3 0x00000001009a3c03 in transformWindowFuncCall (pstate=0x7f88228362c8, wfunc=0x7f8822948ec0, windef=0x7f88228353a8) at /Users/xxx/postgresql/src/backend/parser/parse_agg.c:573
Thanks -
Nick
Attachment
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
I've attached another iteration of the patch that fixes the multiple-window bug and adds (& uses) a function to create a Bitmapset using a custom allocator. I don't think there's any outstanding problems with it now.
> Alternatively, it might be trivial to make all aggregate functions work with ignore nulls in a window context
This is a good idea, but I'd like to keep the scope of this patch limited for the time being - I'll look at doing this (along with the first / last / nth value window functions) for a later release.> Alternatively, it might be trivial to make all aggregate functions work with ignore nulls in a window context
Thanks -
Attachment
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Dean Rasheed
Date:
On 1 July 2013 03:07, Nicholas White <n.j.white@gmail.com> wrote: >> Alternatively, it might be trivial to make all aggregate functions work >> with ignore nulls in a window context > > This is a good idea, but I'd like to keep the scope of this patch limited > for the time being Agreed. > - I'll look at doing this (along with the first / last / > nth value window functions) for a later release. > On the other hand, perhaps this is not worth doing for aggregates, since in that case IGNORE NULLS is just a special case of FILTER (WHERE ...). Making IGNORE NULLS work for the other window functions is probably more useful, as you say, in a future patch. Regards, Dean
On Sun, Jun 30, 2013 at 10:07 PM, Nicholas White <n.j.white@gmail.com> wrote: > I've attached another iteration of the patch that fixes the multiple-window > bug and adds (& uses) a function to create a Bitmapset using a custom > allocator. I don't think there's any outstanding problems with it now. I think the right way to do this is to temporarily set the current memory context to winobj->winstate->partcontext while creating or manipulating the Bitmapset and restore it afterwards. Maybe someone will say that's a modularity violation, but surely this is worse... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Dean Rasheed
Date:
On 1 July 2013 03:07, Nicholas White <n.j.white@gmail.com> wrote: > I've attached another iteration of the patch that fixes the multiple-window > bug and adds (& uses) a function to create a Bitmapset using a custom > allocator. I don't think there's any outstanding problems with it now. > I just realised there is another issue (sorry). pg_get_viewdef() needs to be updated so that dumping and restoring a view that uses ignore/respect nulls works properly. Regards, Dean
> pg_get_viewdef() needs to be updated
Ah, good catch - I've fixed this in the attached. I also discovered that there's a parent-child hierarchy of WindowDefs (using relname->name), so instead of cloning the WindowDef (in parse_agg.c) if the frameOptions are different (e.g. by adding the ignore-nulls flag) I create a child of the WindowDef and override the frameOptions. This has the useful side-effect of making pg_get_viewdef work as expected (the previous iteration of the patch produced a copy of the window definintion, not the window name, as it was using a nameless clone), although the output has parentheses around the view name:
> lag(i.i, 2) IGNORE NULLS OVER (w) AS lagged_by_2
I've updated the test cases accordingly. Thanks -
Nick
Ah, good catch - I've fixed this in the attached. I also discovered that there's a parent-child hierarchy of WindowDefs (using relname->name), so instead of cloning the WindowDef (in parse_agg.c) if the frameOptions are different (e.g. by adding the ignore-nulls flag) I create a child of the WindowDef and override the frameOptions. This has the useful side-effect of making pg_get_viewdef work as expected (the previous iteration of the patch produced a copy of the window definintion, not the window name, as it was using a nameless clone), although the output has parentheses around the view name:
> lag(i.i, 2) IGNORE NULLS OVER (w) AS lagged_by_2
I've updated the test cases accordingly. Thanks -
Nick
Attachment
On Mon, 2013-07-01 at 07:40 -0400, Robert Haas wrote: > On Sun, Jun 30, 2013 at 10:07 PM, Nicholas White <n.j.white@gmail.com> wrote: > > I've attached another iteration of the patch that fixes the multiple-window > > bug and adds (& uses) a function to create a Bitmapset using a custom > > allocator. I don't think there's any outstanding problems with it now. > > I think the right way to do this is to temporarily set the current > memory context to winobj->winstate->partcontext while creating or > manipulating the Bitmapset and restore it afterwards. Maybe someone > will say that's a modularity violation, but surely this is worse... I think we should get rid of the bitmapset entirely. For one thing, we want to be able to support large frames, and the size of the allocation for the bitmap is dependent on the size of the frame. It would take a very large frame for that to matter, but conceptually, it doesn't seem right to me. Instead of the bitmapset, we can keep track of two offsets, and the number of rows in between which are non-NULL. That only works with a constant offset; but I'm not inclined to optimize for the special case involving large frames, variable offset which always happens to be large, and IGNORE NULLS. Regards,Jeff Davis
On Fri, 2013-06-28 at 13:26 -0400, Robert Haas wrote: > I haven't really reviewed the windowing-related code in depth; I > thought Jeff might jump back in for that part of it. Jeff, is that > something you're planning to do? Yes, getting back into this patch now after a bit of delay. Regards,Jeff Davis
On Mon, 2013-07-01 at 18:20 -0400, Nicholas White wrote: > > pg_get_viewdef() needs to be updated > > Ah, good catch - I've fixed this in the attached. I also discovered > that there's a parent-child hierarchy of WindowDefs (using > relname->name), so instead of cloning the WindowDef (in parse_agg.c) > if the frameOptions are different (e.g. by adding the ignore-nulls > flag) I create a child of the WindowDef and override the frameOptions. > This has the useful side-effect of making pg_get_viewdef work as > expected (the previous iteration of the patch produced a copy of the > window definintion, not the window name, as it was using a nameless > clone), although the output has parentheses around the view name: > A couple comments:* We shouldn't create an arbitrary number of duplicate windows when many aggregates are specified with IGNORE NULLS.* It's bad form to modify a list while iterating through it. This is just a style issue because there's a break afterward, anyway. Also, I'm concerned that we're changing a reference of the form: OVER w into: OVER (w) in a user-visible way. Is there a problem with having two windowdefs in the p_windowdefs list with the same name and different frameOptions? I think you could just change the matching criteria to be a matching name and matching frameOptions. In the loop, if you find a matching name but frameOptions doesn't match, keep a pointer to the windowdef and create a new one at the end of the loop with the same name. You'll have to be a little careful that any other code knows that names can be duplicated in the list though. Regards,Jeff Davis
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
I've attached a revised version that fixes the issues above: > changing a reference of the form: > OVER w > into: > OVER (w) Fixed (and I've updated the tests). > It's bad form to modify a list while iterating through it. Fixed > We shouldn't create an arbitrary number of duplicate windows Fixed > Is there a problem with having two windowdefs in > the p_windowdefs list with the same name > ... > You'll have to be a little careful that any other code knows that names > can be duplicated in the list though. I'm not sure I really can verify this - as I'm not sure how much contrib / other third-party code has access to this data structure. I'd prefer to be cautious and just create a child window if needed. > I think we should get rid of the bitmapset entirely > ... > Instead of the bitmapset, we can keep track of two offsets I've modified leadlag_common so it uses your suggested algorithm for constant offsets (although it turns out you only need to keep a single int64 index in the context). This algorithm calls WinGetFuncArgInPartition at least twice per row, once to check whether the current row is null (and so check if we have to move the leading / lagged index forward) and either once to get leading / lagging value or more than once to push the leading / lagged value forwards to the next non-null value. I've kept the bitmap solution for the non-constant offset case (i.e. the random partition access case) as I believe it changes the cost of calculating the lead / lagged values for every row in the partition to O(partition size) - whereas a non-caching scan-the-partition solution would be O(partition size * partition size). Is that OK? Thanks - Nick
Attachment
On Thu, 2013-07-11 at 10:51 -0400, Nicholas White wrote: > I've attached a revised version that fixes the issues above: I'll get to this soon, sorry for the delay. Regards,Jeff Davis
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
np, optimising for quality not speed :)
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Josh Berkus
Date:
On 07/15/2013 10:19 AM, Jeff Davis wrote: > On Thu, 2013-07-11 at 10:51 -0400, Nicholas White wrote: >> I've attached a revised version that fixes the issues above: > > I'll get to this soon, sorry for the delay. > > Regards, > Jeff Davis So ... are you doing a final review of this for the CF, Jeff? We need to either commit it or bounce it to the next CF. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Jeff Davis
Date:
On Fri, 2013-07-19 at 09:39 -0700, Josh Berkus wrote: > So ... are you doing a final review of this for the CF, Jeff? We need > to either commit it or bounce it to the next CF. I am going on vacation tomorrow, and I just didn't quite find time to take this to commit. Sorry about that, Nicholas. The patch improved a lot this CF though, so we'll get it in quickly and I don't foresee any problem with it making it in 9.4. (For that matter, am I not supposed to commit between 'fests? Or is it still an option for me to finish up with this after I get back even if we close the CF?) Regards,Jeff Davis
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Heikki Linnakangas
Date:
On 21.07.2013 08:41, Jeff Davis wrote: > (For that matter, am I not supposed to commit between 'fests? Or is it > still an option for me to finish up with this after I get back even if > we close the CF?) It's totally OK to commit stuff between 'fests. - Heikki
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Josh Berkus
Date:
> (For that matter, am I not supposed to commit between 'fests? Or is it > still an option for me to finish up with this after I get back even if > we close the CF?) The idea of the CommitFests is to give committers some *time off* between them. If a committer wants to commit stuff when it's not a CF, that's totally up to them. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Peter Eisentraut
Date:
On Thu, 2013-07-11 at 10:51 -0400, Nicholas White wrote: > I've attached a revised version that fixes the issues above: This patch is in the 2013-09 commitfest but needs a rebase.
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> but needs a rebase. See attached - thanks!
Attachment
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Peter Eisentraut
Date:
On Wed, 2013-08-21 at 22:34 -0400, Nicholas White wrote: > > but needs a rebase. > > See attached - thanks! Please fix these compiler warnings: windowfuncs.c: In function ‘leadlag_common’: windowfuncs.c:366:3: warning: passing argument 1 of ‘bms_initialize’ from incompatible pointer type [enabled by default] In file included from windowfuncs.c:16:0: ../../../../src/include/nodes/bitmapset.h:97:19: note: expected ‘void * (*)(void *, Size)’ but argument is of type ‘void* (*)(struct WindowObjectData *, Size)’ windowfuncs.c:306:8: warning: ‘result’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> Please fix these compiler warnings Fixed - see attached. Thanks -
Attachment
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Alvaro Herrera
Date:
I gave this a quick look. It took me a while to figure out how to apply it -- turns out you used the "ignore whitespace" option to diff, so the only way to apply it was with patch -p1 --ignore-whitespace. Please don't do that. I ran pgindent over the patched code and there were a number of changes. I suggest you run that over your local copy before your next submission, to avoid the next official run to mangle your stuff in unforeseen ways. For instance, the comment starting with "A slight edge case" would be mangled; I suggest enclosing that in /*---- to avoid the problem. (TBH that's the only interesting thing, but avoiding that kind of breakage is worth it IMHO.) First thing I noticed was the funky bms_initialize thingy. There was some controversy upthread about the use of bitmapsets, and it seems you opted for not using them for the constant case as suggested by Jeff; but apparently the other comment by Robert about the custom bms initializer went largely ignored. I agree with him that this is grotty. However, the current API to get partition-local memory is too limited as there's no way to change to the partition's context; instead you only get the option to allocate a certain amount of memory and return that. I think the easiest way to get around this problem is to create a new windowapi.h function which returns the MemoryContext for the partition. Then you can just allocate the BMS in that context. But how do we ensure that the BMS is allocated in a context? You'd have to switch contexts each time you call bms_add_member. I don't have a good answer to this. I used this code in another project: /* * grow the "visited" bitmapset to the index' current size, to avoid * repeated repalloc's */ { BlockNumber lastblock; lastblock = RelationGetNumberOfBlocks(rel); visited = bms_add_member(visited, lastblock); visited = bms_del_member(visited,lastblock); } This way, I know the bitmapset already has enough space for all the bits I need and there will be no further allocation. But this is also grotty. Maybe we should have a new entry point in bitmapset.h, say "bms_grow" that ensures you have enough space for that many bits. Or perhaps add a MemoryContext member to struct Bitmapset, so that all allocations occur therein. I'm not too sure I follow the parse_agg.c changes, but don't you need to free the clone in the branch that finds a duplicate window spec? Is this parent/child relationship thingy a preexisting concept, or are you just coming up with it? It seems a bit unfamiliar to me. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Robert Haas
Date:
On Thu, Sep 26, 2013 at 4:20 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > But how do we ensure that the BMS is allocated in a context? You'd have > to switch contexts each time you call bms_add_member. I don't have a > good answer to this. The coding of bms_add_member is pretty funky. Why doesn't it just repalloc() the input argument if it's not big enough? If it did that, the new allocation would be in the same memory context as the original one, and we'd not need to care about the memory context when adding an element to an already non-empty set. But even if we did decide to switch memory contexts on every call, it would still be much cleaner than this. It's only three lines of code (and about as many instructions) to save and restore CurrentMemoryContext. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Heikki Linnakangas
Date:
On 27.09.2013 14:12, Robert Haas wrote: > On Thu, Sep 26, 2013 at 4:20 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> But how do we ensure that the BMS is allocated in a context? You'd have >> to switch contexts each time you call bms_add_member. I don't have a >> good answer to this. > > The coding of bms_add_member is pretty funky. Why doesn't it just > repalloc() the input argument if it's not big enough? If it did that, > the new allocation would be in the same memory context as the original > one, and we'd not need to care about the memory context when adding an > element to an already non-empty set. > > But even if we did decide to switch memory contexts on every call, it > would still be much cleaner than this. It's only three lines of code > (and about as many instructions) to save and restore > CurrentMemoryContext. [looks] Huh, yeah, as it stands, bms_add_member() is an accident waiting to happen. It's totally non-obvious that it might cause the BMS to jump from another memory context to CurrentMemoryContext. Same with bms_add_members(). And it would be good to Assert in bms_join that both arguments are in the same from MemoryContext. Let's fix that... - Heikki
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> bms_add_member() is an accident waiting to happen I've attached a patch that makes it use repalloc as suggested - is it OK to commit separately? I'll address the lead-lag patch comments in the next couple of days. Thanks -
Attachment
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Heikki Linnakangas
Date:
On 29.09.2013 23:32, Nicholas White wrote: >> bms_add_member() is an accident waiting to happen > > I've attached a patch that makes it use repalloc as suggested You'll have to zero out the extended portion. I tried to demonstrate that by setting RANDOMIZE_ALLOCATED_MEMORY, but surprisingly regression tests still passed. I guess the regression suite doesn't use wide enough bitmapsets to exercise that. But this causes an assertion failure, with RANDOMIZE_ALLOCATED_MEMORY: create table t (i int4); select * from t as t1, t as t2, t as t3, t as t4, t as t5, t as t6, t as t7, t as t8, t as t9, t as t10, t as t11, t as t12, t as t13, t as t14, t as t15, t as t16, t as t17, t as t18, t as t19, t as t20, t as t21, t as t22, t as t23, t as t24, t as t25, t as t26, t as t27, t as t28, t as t29, t as t30, t as t31, t as t32, t as t33, t as t34, t as t35, t as t36, t as t37, t as t38, t as t39, t as t40; > - is it OK to commit separately? I'll address the lead-lag patch > comments in the next couple of days. Thanks Yep, thanks. I committed the attached. After thinking about this some more, I realized that bms_add_member() is still sensitive to CurrentMemoryContext, if the 'a' argument is NULL. That's probably OK for the lag&lead patch - I didn't check - but if we're going to start relying on the fact that bms_add_member() no longer allocates a new bms in CurrentMemoryContext, that needs to be documented. bitmapset.c currently doesn't say a word about memory contexts. So what needs to be done next is to document how the functions in bitmapset.c work wrt. memory contexts. Then double-check that the behavior of all the other "recycling" bms functions is consistent. (At least bms_add_members() needs a similar change). - Heikki
Attachment
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
I've attached another iteration of the lead-lag patch. > I suggest you run that over your local copy before your next submission I ran pgindent before generating my patch (without -w this time), and I've got a few more whitespace differences in the files that I touched. I hope that hasn't added too much noise. > I suggest enclosing that in /*---- to avoid the problem. Done > create a new windowapi.h function which returns the MemoryContext for the partition ... > But even if we did decide to switch memory contexts on every call, it would still be much cleaner than this. I've removed all the bms_initalize code from the patch and am using this solution. As the partition memory is zero-initialised I just store a Bitmapset pointer in the WinGetPartitionLocalMemory. The bms_add_member and bms_is_member functions behave sensibly for null-pointer inputs (they return a bms_make_singleton in the current memory context and false respectively). I've surrounded the calls to bms_make_singleton with a memory context switch (to the partition's context) so the Bitmapset stays in the partition's context. > Maybe we should have a new entry point in bitmapset.h, say "bms_grow" that ensures you have enough space for that manybits This would be useful, as currently n additions require O(n) repallocs, especially as I'm iterating through the indices in ascending order. However, I'd rather "cheat" as I know the number of bits I'll need up front; I can just set the (n+1)-th bit to force a single repalloc to the final size. It's worth noting that other Bitmap implementations (e.g. Java's java.util.BitSet) try to minimise re-allocations by increasing the size to (e.g.) Max(2 * current size, n) if a re-size is needed. > but don't you need to free the clone in the branch that finds a duplicate window spec? Good catch - I've fixed that > Is this parent/child relationship thingy a preexisting concept Yes, although it's not very well documented. I've added a lot of documentation to the WindowDef struct in src/include/nodes/parsenodes.h to explain which of the struct's members use this mechanism. The WindowDef is very much like an object in a higher-level language, where some of the members are 'virtual', so use the parent's version if they don't have a value, and some members are 'final', so values in this member in child WindowDefs are ignore (i.e. the parent WindowDef's value is always used). I don't think this degree of complexity is necessary for the lead-lag patch alone, but since it was there I decided to take advantage of it. Thanks - Nick
Attachment
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Alvaro Herrera
Date:
Nicholas White escribió: > > But even if we did decide to switch memory contexts on every call, it would still be much cleaner than this. > > I've removed all the bms_initalize code from the patch and am using > this solution. As the partition memory is zero-initialised I just > store a Bitmapset pointer in the WinGetPartitionLocalMemory. The > bms_add_member and bms_is_member functions behave sensibly for > null-pointer inputs (they return a bms_make_singleton in the current > memory context and false respectively). I've surrounded the calls to > bms_make_singleton with a memory context switch (to the partition's > context) so the Bitmapset stays in the partition's context. Now that I look again, would GetMemoryChunkContext() be useful here? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
> GetMemoryChunkContext Indeed - you can call that with the value WinGetPartitionLocalMemory returns to get the MemoryContext for the partition - so I've removed the function from windowapi.h that I added to get that. See attached -
Attachment
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Alvaro Herrera
Date:
We have this block: + { + /* + * This is the window we want - but we have to tweak the + * definition slightly (e.g. to support the IGNORE NULLS frame + * option) as we're not using the default (i.e. parent) frame + * options. + * + * We'll create a 'child' (using refname to inherit everything + * from the parent) that just overrides the frame options + * (assuming it doesn't already exist): + */ + WindowDef *clone = makeNode(WindowDef); ... then it goes to populate the clone. When this is done, we use the clone to walk the list of existing WindowDefs, and if there's a match we free this one and use that one. Wouldn't it be better to walk the existing array first looking for a match, and only create a clone if none is found? This would avoid the memory leak problems; I originally pointed out that you're leaking the bits created by this makeNode() call above, but now that I look at it again, I think you're also leaking the bits created by the two copyObject() calls to create the clone. It appears to me that it's simpler to not allocate any memory in the first place, unless necessary. Also, in parsenodes.h, you had the [MANDATORY] and such tags. Three things about that: 1) it looks a lot uglier than the original, so how about the modified version below? and 2) what does "MANDATORY value of NULL" means? Maybe you mean "MANDATORY value or NULL" instead? 3) Exactly what case does the "in this case" phrase refer to? I think the comment should be more explicit. Also, I think this should be its own paragraph instead of being mixed with the "For entries in a" paragraph. /** WindowDef - raw representation of WINDOW and OVER clauses** For entries in a WINDOW list, "name" is the window name beingdefined.* For OVER clauses, we use "name" for the "OVER window" syntax, or "refname"* for the "OVER (window)" syntax,which is subtly different --- the latter* implies overriding the window frame clause.* * In this case, the per-fieldindicators determine what the semantics* are:* [V]irtual* If NULL, then the parent's (refname)value is used.* [M]andatory* Never inherited from the parent, so must be specified; may be NULL.* [S]uper* Always inherited from parent, any local version ignored.*/ typedef struct WindowDef {NodeTag type;char *name; /* [M] window's own name */char *refname; /* [M] referencedwindow name, if any */List *partitionClause; /* [V] PARTITION BY expression list */List *orderClause; /* [M] ORDER BY (list of SortBy) */int frameOptions; /* [M] frame_clause options, see below */Node *startOffset; /* [M] expression for starting bound, if any */Node *endOffset; /* [M] expressionfor ending bound, if any */int location; /* parse location, or -1 if none/unknown */ } WindowDef; In gram.y there are some spurious whitespaces at end-of-line. You should be able to see them with git diff --check. (I don't think we support running pgindent on .y files, which would have otherwise cleaned this up.) A style issue. You have this: + /* + * We can process a constant offset much more efficiently; initially + * we'll scan through the first <offset> non-null rows, and store that + * index. On subsequent rows we'll decide whether to push that index + * forwards to the next non-null value, or just return it again. + */ + leadlag_const_context *context = WinGetPartitionLocalMemory( + winobj, + sizeof(leadlag_const_context)); + int count_forward = 0; I think it'd be better to put the declarations above the comment, and assignment to "context" below the comment. This way, the indentation of the assignment is not so odd. So it'd look like + leadlag_const_context *context; + int count_forward = 0; + + /* + * We can process a constant offset much more efficiently; initially + * we'll scan through the first <offset> non-null rows, and store that + * index. On subsequent rows we'll decide whether to push that index + * forwards to the next non-null value, or just return it again. + */ + context = WinGetPartitionLocalMemory(winobj, + sizeof(leadlag_const_context)); And a final style comment. You have a block like this: if (ignore_nulls && !const_offset) { long block; } else if (ignore_nulls /* && const_offset */) { another long block; } else { more stuff; } I think this looks better like this, even if it causes an extra level of indentation: if (ignore_nulls) { if (const_offset) { some stuff; } else { more; } } else { the third block; } Finally, I'm not really sure about the column you added to the regression tests table. It looks way too artificial; I mean the column name even states what test is going to use that data (respect=gorilla? uhm). I'm not sure what's a better option; maybe if you just named the column "favorite_pet" or something like that, it would appear less random. Maybe it'd be better to just create your own table for this. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Bruce Momjian
Date:
On Fri, Jul 5, 2013 at 02:36:10PM -0700, Jeff Davis wrote: > On Fri, 2013-06-28 at 13:26 -0400, Robert Haas wrote: > > I haven't really reviewed the windowing-related code in depth; I > > thought Jeff might jump back in for that part of it. Jeff, is that > > something you're planning to do? > > Yes, getting back into this patch now after a bit of delay. Jeff, any status on this? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Fri, 2013-11-29 at 16:18 -0500, Bruce Momjian wrote: > On Fri, Jul 5, 2013 at 02:36:10PM -0700, Jeff Davis wrote: > > On Fri, 2013-06-28 at 13:26 -0400, Robert Haas wrote: > > > I haven't really reviewed the windowing-related code in depth; I > > > thought Jeff might jump back in for that part of it. Jeff, is that > > > something you're planning to do? > > > > Yes, getting back into this patch now after a bit of delay. > > Jeff, any status on this? The last message was a review from Alvaro that hasn't been addressed yet. Right now I am looking at the extension templates patch. But this patch is fairly close, so if Nicholas doesn't get to looking at it, I'll see what I can do. Regards,Jeff Davis
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Bruce Momjian
Date:
On Fri, Nov 29, 2013 at 01:21:27PM -0800, Jeff Davis wrote: > On Fri, 2013-11-29 at 16:18 -0500, Bruce Momjian wrote: > > On Fri, Jul 5, 2013 at 02:36:10PM -0700, Jeff Davis wrote: > > > On Fri, 2013-06-28 at 13:26 -0400, Robert Haas wrote: > > > > I haven't really reviewed the windowing-related code in depth; I > > > > thought Jeff might jump back in for that part of it. Jeff, is that > > > > something you're planning to do? > > > > > > Yes, getting back into this patch now after a bit of delay. > > > > Jeff, any status on this? > > The last message was a review from Alvaro that hasn't been addressed > yet. > > Right now I am looking at the extension templates patch. But this patch > is fairly close, so if Nicholas doesn't get to looking at it, I'll see > what I can do. Thank you. I see it is looking very active on the commit-fest: :-) https://commitfest.postgresql.org/action/patch_view?id=1096 Thanks for all the work. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
Thanks for the detailed feedback, I'm sorry it took so long to incorporate it. I've attached the latest version of the patch, fixing in particular: > We have this block: I've re-written this so it only does a single pass through the window definitions (my patch originally added a second pass), and only does the clone if required. > In gram.y there are some spurious whitespaces at end-of-line. Fixed - I didn't know about diff --check, it's very useful! > Also, in parsenodes.h, you had the [MANDATORY] and such tags. I've re-written the comments (without tags) to make it much easier to understand . I agree they were ugly! >Exactly what case does the "in this case" phrase refer to? Clarified in the comments >A style issue. You have this: Fixed > And a final style comment. Fixed > Finally, I'm not really sure about the column you added to the regression tests table. Indeed, it was a bit artificial. I've re-written the tests to use a separate table as you suggest. Thanks - Nick
Attachment
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Abhijit Menon-Sen
Date:
Hi. What's the status of this patch? Jeff, Álvaro, you're listed as reviewers. Have you had a chance to look at the updated version that Nick posted? -- Abhijit
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Nicholas White
Date:
Hi Abhijit - > What's the status of this patch? The latest version of the patch needs a review, and I'd like to get it committed in this CF if possible. Thanks - Nick
On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote: > Thanks for the detailed feedback, I'm sorry it took so long to > incorporate it. I've attached the latest version of the patch, fixing > in particular: I took a good look at this today. * It fails for offset of 0 with IGNORE NULLS. Fixed (trivial). * The tests are locale-sensitive. Fixed (trivial). * The leadlag_common function is just way too long. I refactored the IGNORE NULLS code into it's own function (win_get_arg_ignore_nulls()) with the same API as WinGetFuncArgInPartition. This cleans things up substantially, and makes it easier to add useful comments. * "We're implementing the former semantics, so we'll need to correct slightly" sounds arbitrary, but it's mandated by the standard. That should be clarified. * I did a lot of other refactoring within win_get_arg_ignore_nulls for the constant case. I'm not done yet, and I'm not 100% sure it's a net gain, because the code ended up a little longer. But the previous version was quite hard to follow because of so many special cases around positive versus negative offsets. For instance, having the negative 'next' value in your code actually means something quite different than when it's positive, but it took me a while to figure that out, so I made it into two variables. I hope my code is moving it in a direction that's easier for others to understand. Please let me know if you think I am making things worse with my refactorings. Otherwise I'll keep working on this and hopefully get it committable soon. The attached patch is still a WIP; just posting it here in case you see any red flags. Regards, Jeff Davis
Attachment
On Sun, 2014-07-06 at 21:11 -0700, Jeff Davis wrote: > On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote: > > Thanks for the detailed feedback, I'm sorry it took so long to > > incorporate it. I've attached the latest version of the patch, fixing > > in particular: Looking a little more: * No tests exercise non-const offsets * No tests for default clauses with IGNORE NULLS * The use of bitmapsets is quite ugly. It would be nice if the API would grow the BMS within the memory context in which it was allocated, but I don't even see that the BMS is necessary. Why not just allocate a fixed-size array of bits, and forget the BMS? * Is there a reason you're leaving out first_value/last_value/nth_value? I think they could be supported without a lot of extra work. Regards,Jeff Davis
On Mon, 2014-07-07 at 01:21 -0700, Jeff Davis wrote: > On Sun, 2014-07-06 at 21:11 -0700, Jeff Davis wrote: > > On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote: > > > Thanks for the detailed feedback, I'm sorry it took so long to > > > incorporate it. I've attached the latest version of the patch, fixing > > > in particular: As innocent as this patch seemed at first, it actually opens up a lot of questions. Attached is the (incomplete) edit of the patch so far. Changes from your patch: * changed test to be locale-insensitive * lots of refactoring in the execution itself * fix offset 0 case * many test improvements * remove bitmapset and just use an array bitmap * fix error message typo Open Issues: I don't think exposing the frame options is a good idea. That's an internal concept now, but putting it in windowapi.h will mean that it needs to live forever. The struct is private, so there's no easy hack to access the frame options directly. That means that we need to work with the existing API functions, which is OK because I think that everything we want to do can go into WinGetFuncArgInPartition(). If we do the same thing for WinGetFuncArgInFrame(), then first/last/nth also work. That leaves the questions: * Do we want IGNORE NULLS to work for every window function, or only a specified subset? * If it only works for some window functions, is that hard-coded or driven by the catalog? * If it works for all window functions, could it cause some pre-existing functions to behave strangely? Also, I'm re-thinking Dean's comments here: http://www.postgresql.org/message-id/CAEZATCWT3=P88nv2ThTjvRDLpOsVtAPxaVPe=MaWe-x=GuhSmg@mail.gmail.com He brings up a few good points. I will look into the frame vs. window option, though it looks like you've already at least fixed the crash. His other point about actually eliminating the NULLs from the window itself is interesting, but I don't think it works. IGNORE NULLS ignores *other* rows with NULL, but (per spec) does not ignore the current row. That sounds awkward if you've already removed the NULL rows from the window, but maybe there's something that could work. And there are a few other things I'm still looking into, but hopefully they don't raise new issues. Regards, Jeff Davis
Attachment
On Thu, 2014-07-10 at 23:43 -0700, Jeff Davis wrote: > On Mon, 2014-07-07 at 01:21 -0700, Jeff Davis wrote: > > On Sun, 2014-07-06 at 21:11 -0700, Jeff Davis wrote: > > > On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote: > > > > Thanks for the detailed feedback, I'm sorry it took so long to > > > > incorporate it. I've attached the latest version of the patch, fixing > > > > in particular: > > As innocent as this patch seemed at first, it actually opens up a lot of > questions. > > Attached is the (incomplete) edit of the patch so far. I haven't received much feedback so far on my changes, and I don't think I will finish hacking on this in the next few days, so I will mark it "returned with feedback". I don't think it's too far away, but my changes are substantial enough (and incomplete enough) that some feedback is required. Regards,Jeff Davis
Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Stephen Frost
Date:
Jeff (Reviving an old thread for 2014...) * Jeff Davis (pgsql@j-davis.com) wrote: > On Thu, 2014-07-10 at 23:43 -0700, Jeff Davis wrote: > > On Mon, 2014-07-07 at 01:21 -0700, Jeff Davis wrote: > > > On Sun, 2014-07-06 at 21:11 -0700, Jeff Davis wrote: > > > > On Wed, 2014-04-16 at 12:50 +0100, Nicholas White wrote: > > > > > Thanks for the detailed feedback, I'm sorry it took so long to > > > > > incorporate it. I've attached the latest version of the patch, fixing > > > > > in particular: > > > > As innocent as this patch seemed at first, it actually opens up a lot of > > questions. > > > > Attached is the (incomplete) edit of the patch so far. > > I haven't received much feedback so far on my changes, and I don't think > I will finish hacking on this in the next few days, so I will mark it > "returned with feedback". > > I don't think it's too far away, but my changes are substantial enough > (and incomplete enough) that some feedback is required. Would you have time to work on this for 9.7..? I came across a real-world use case for exactly this capability and was sorely disappointed to discover we didn't support it even though there had been discussion for years on it, which quite a few interested parties. I'll take a look at reviewing your last version of the patch but wanted to get an idea of if you still had interest and might be able to help. Thanks! Stephen
Old thread link: http://www.postgresql.org/message-id/CA+=vxNa5_N1q5q5OkxC0aQnNdbo2Ru6GVw+86wk+oNsUNJDLig@mail.gmail.com On Thu, Apr 14, 2016 at 1:29 PM, Stephen Frost <sfrost@snowman.net> wrote: > Jeff > > (Reviving an old thread for 2014...) > > Would you have time to work on this for 9.7..? I came across a > real-world use case for exactly this capability and was sorely > disappointed to discover we didn't support it even though there had been > discussion for years on it, which quite a few interested parties. > > I'll take a look at reviewing your last version of the patch but wanted > to get an idea of if you still had interest and might be able to help. > > Thanks! > > Stephen There are actually quite a few issues remaining here. First, I think the syntax is still implemented in a bad way. Right now it's part of the OVER clause, and the IGNORE NULLS gets put into the frame options. It doesn't match the way the spec defines the grammar, and I don't see how it really makes sense that it's a part of the frame options or the window object at all. I believe that it should be a part of the FuncCall, and end up in the FuncExpr, so the flag can be read when the function is called without exposing frame options (which we don't want to do). I think the reason it was done as a part of the window object is so that it could save state between calls for the purpose of optimization, but I think that can be done using fn_extra. This change should remove a lot of the complexity about trying to share window definitions, etc. Second, we need a way to tell which functions support IGNORE NULLS and which do not. The grammar as implemented in the patch seems to allow it for any function with an OVER clause (which can include ordinary aggregates). Then the parse analysis hard-codes that only LEAD and LAG accept IGNORE NULLS, and throws an error otherwise. Neither of these things seem right. I think we need a need catalog support to say whether a function honors IGNORE|RESPECT NULLS or not, which means we also need support in CREATE FUNCTION. I think the execution is pretty good, except that (a) we need to keep the state in fn_extra rather than the winstate; and (b) we should get rid of the bitmaps and just do a naive scan unless we really think non-constant offsets will be important. We can always optimize more later. Regards, Jeff Davis
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
"David G. Johnston"
Date:
Just doing a drive-by...
Old thread link:
http://www.postgresql.org/message-id/CA+=vxNa5_N1q5q5OkxC0aQnNdbo2Ru6GVw+86wk+oNsUNJDLig@mail.gmail.com
On Thu, Apr 14, 2016 at 1:29 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Jeff
>
> (Reviving an old thread for 2014...)
>
> Would you have time to work on this for 9.7..? I came across a
> real-world use case for exactly this capability and was sorely
> disappointed to discover we didn't support it even though there had been
> discussion for years on it, which quite a few interested parties.
>
First, I think the syntax is still implemented in a bad way. Right now
it's part of the OVER clause, and the IGNORE NULLS gets put into the
frame options. It doesn't match the way the spec defines the grammar,
and I don't see how it really makes sense that it's a part of the
frame options or the window object at all.
How does the relatively new FILTER clause play into this, if at all?
I think we need a need catalog support to say
whether a function honors IGNORE|RESPECT NULLS or not, which means we
also need support in CREATE FUNCTION.
We already have "STRICT" for deciding whether a function processes nulls. Wouldn't this need to exist on the "CREATE AGGREGATE"
Rhetorical question: I presume we are going to punt on the issue, since it is non-standard, but what is supposed to happen with a window invocation that ignores nulls but has a non-strict function that returns a non-null on null input?
David J.
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Jeff Davis
Date:
On Fri, May 20, 2016 at 1:41 PM, David G. Johnston <david.g.johnston@gmail.com> wrote: > How does the relatively new FILTER clause play into this, if at all? My interpretation of the standard is that FILTER is not allowable for a window function, and IGNORE|RESPECT NULLS is not allowable for an ordinary aggregate. So if we support IGNORE|RESPECT NULLS for anything other than a window function, we have to come up with our own semantics. > We already have "STRICT" for deciding whether a function processes nulls. > Wouldn't this need to exist on the "CREATE AGGREGATE" STRICT defines behavior at DDL time. I was suggesting that we might want a DDL-time flag to indicate whether a function can make use of the query-time IGNORE|RESPECT NULLS option. In other words, most functions wouldn't know what to do with IGNORE|RESPECT NULLS, but perhaps some would if we allowed them the option. Perhaps I didn't understand your point? Regards, Jeff Davis
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
"David G. Johnston"
Date:
On Fri, May 20, 2016 at 1:41 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> How does the relatively new FILTER clause play into this, if at all?
My interpretation of the standard is that FILTER is not allowable for
a window function, and IGNORE|RESPECT NULLS is not allowable for an
ordinary aggregate.
So if we support IGNORE|RESPECT NULLS for anything other than a window
function, we have to come up with our own semantics.
> We already have "STRICT" for deciding whether a function processes nulls.
> Wouldn't this need to exist on the "CREATE AGGREGATE"
STRICT defines behavior at DDL time. I was suggesting that we might
want a DDL-time flag to indicate whether a function can make use of
the query-time IGNORE|RESPECT NULLS option. In other words, most
functions wouldn't know what to do with IGNORE|RESPECT NULLS, but
perhaps some would if we allowed them the option.
Perhaps I didn't understand your point?
The "this" in the quote doesn't refer to STRICT but rather the non-existence feature that we are talking about.
I am trying to make the point that the DDL syntax for "RESPECT|IGNORE NULLS" should be on the "CREATE AGGREGATE" page and not the "CREATE FUNCTION" page.
As far as a plain function cares its only responsibility with respect to NULLs is defined by the STRICT property. As this behavior only manifests in aggregate situations the corresponding property should be defined there as well.
David J.
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Emre Hasegeli
Date:
> My interpretation of the standard is that FILTER is not allowable for > a window function, and IGNORE|RESPECT NULLS is not allowable for an > ordinary aggregate. Yes, it is clear. > So if we support IGNORE|RESPECT NULLS for anything other than a window > function, we have to come up with our own semantics. I don't think this clause is useful for aggregates especially while we already have the FILTER clause. Though, I can see this error message being useful: > ERROR: IGNORE NULLS is only implemented for the lead and lag window functions Can we still give this message when the syntax is not part of the OVER clause? Thank you for returning back to this patch.
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Dean Rasheed
Date:
On 23 May 2016 at 17:01, Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, May 20, 2016 at 1:41 PM, David G. Johnston > <david.g.johnston@gmail.com> wrote: >> How does the relatively new FILTER clause play into this, if at all? > > My interpretation of the standard is that FILTER is not allowable for > a window function, and IGNORE|RESPECT NULLS is not allowable for an > ordinary aggregate. > That may be so, but we already support FILTER for all windows functions as well as aggregates: https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS so to be clear, what we're talking about here is just supporting SQL standard syntax for window functions, rather than adding any new functionality, right? > So if we support IGNORE|RESPECT NULLS for anything other than a window > function, we have to come up with our own semantics. > Given that we can already do this using FILTER for aggregates, and that IGNORE|RESPECT NULLS for aggregates is not part of the SQL standard, I see no reason to support it. Regards, Dean
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Andrew Gierth
Date:
>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: Dean> That may be so, but we already support FILTER for all windowsDean> functions as well as aggregates: Not so: "If FILTER is specified, then only the input rows for which the filter_clause evaluates to true are fed to the window function; other rows are discarded. Only window functions that are aggregates accept a FILTER clause." (Per spec, FILTER appears only in the <aggregate function> production, which is just one of the several options for <window function>.) -- Andrew (irc:RhodiumToad)
Re: Re: Request for Patch Feedback: Lag & Lead Window Functions Can Ignore Nulls
From
Dean Rasheed
Date:
On 30 May 2016 at 15:44, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote: >>>>>> "Dean" == Dean Rasheed <dean.a.rasheed@gmail.com> writes: > > Dean> That may be so, but we already support FILTER for all windows > Dean> functions as well as aggregates: > > Not so: > > "If FILTER is specified, then only the input rows for which the > filter_clause evaluates to true are fed to the window function; other > rows are discarded. Only window functions that are aggregates accept a > FILTER clause." > Ah, yes. It's all coming back to me now. Regards, Dean