Thread: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Oliver Ford
Date:
Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
FIRST/LAST to the non-aggregate window functions.

A previous patch
(https://www.postgresql.org/message-id/CA+=vxNa5_N1q5q5OkxC0aQnNdbo2Ru6GVw+86wk+oNsUNJDLig@mail.gmail.com)
partially implemented this feature. However, that patch worked by
adding the null treatment clause to the window frame's frameOptions
variable, and consequently had the limitation that it wasn't possible
to reuse a window frame definition in a single query where two
functions were called that had different null treatment options. This
meant that the patch was never committed. The attached path takes a
different approach which gets around this limitation.

For example, the following query would not work correctly with the
implementation in the old patch but does with the attached patch:

WITH cte (x) AS (
select null union select 1 union select 2)
SELECT x,
first_value(x) over w as with_default,
first_value(x) respect nulls over w as with_respect,
first_value(x) ignore nulls over w as with_ignore
from cte WINDOW w as (order by x nulls first rows between unbounded
preceding and unbounded following);
 x | with_default | with_respect | with_ignore
---+--------------+--------------+-------------
   |              |              |           1
 1 |              |              |           1
 2 |              |              |           1
(3 rows)


== Implementation ==

The patch adds two types to the pg_type catalog: "ignorenulls" and
"fromlast". These types are of the Boolean category, and work as
wrappers around the bool type. They are used as function arguments to
extra versions of the window functions that take additional boolean
arguments. RESPECT NULLS and FROM FIRST are ignored by the parser, but
IGNORE NULLS and FROM LAST lead to the extra versions being called
with arguments to ignore nulls and order from last.

== Testing ==

Updated documentation and added regression tests. All existing tests
pass. This change will need a catversion bump.
Thanks to Krasiyan Andreev for initially testing this patch.

Attachment

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
David Fetter
Date:
On Fri, Jul 13, 2018 at 01:52:00PM +0100, Oliver Ford wrote:
> Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
> FIRST/LAST to the non-aggregate window functions.

Please find attached an updated version for OID drift.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Krasiyan Andreev
Date:
Hi,
Patch applies and compiles, all included tests and building of the docs pass.
I am using last version from more than two months ago in production environment with real data and I didn't find any bugs, 
so I'm marking this patch as ready for committer in the commitfest app.

На сб, 28.07.2018 г. в 22:00 ч. David Fetter <david@fetter.org> написа:
On Fri, Jul 13, 2018 at 01:52:00PM +0100, Oliver Ford wrote:
> Adds the options RESPECT/IGNORE NULLS (null treatment clause) and FROM
> FIRST/LAST to the non-aggregate window functions.

Please find attached an updated version for OID drift.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
>>>>> "Krasiyan" == Krasiyan Andreev <krasiyan@gmail.com> writes:

 Krasiyan> Hi,

 Krasiyan> Patch applies and compiles, all included tests and building
 Krasiyan> of the docs pass. I am using last version from more than two
 Krasiyan> months ago in production environment with real data and I
 Krasiyan> didn't find any bugs, so I'm marking this patch as ready for
 Krasiyan> committer in the commitfest app.

Unfortunately, reviewing it from a committer perspective - I can't
possibly commit this as it stands, and anything I did to it would be
basically a rewrite of much of it.

Some of the problems could be fixed. For example the type names could be
given pg_* prefixes (it's clearly not acceptable to create random
special-purpose boolean subtypes in pg_catalog and _not_ give them such
a prefix), and the precedence hackery in gram.y could have comments
added (gram.y is already bad enough; _anything_ fancy with precedence
has to be described in the comments). But I don't like that hack with
the special types at all, and I think that needs a better solution.

Normally I'd push hard to try and get some solution that's sufficiently
generic to allow user-defined functions to make use of the feature. But
I think the SQL spec people have managed to make that literally
impossible in this case, what with the FROM keyword appearing in the
middle of a production and not followed by anything sufficiently
distinctive to even use for extra token lookahead.

Also, as has been pointed out in a number of previous features, we're
starting to accumulate identifiers that are reserved in subtly different
ways from our basic four-category system (which is itself a significant
elaboration compared to the spec's simple reserved/unreserved
distinction). As I recall this objection was specifically raised for
CUBE, but justified there by the existence of the contrib/cube extension
(and the fact that the standard CUBE() construct is used only in very
specific places in the syntax). This patch would make lead / lag /
first_value / last_value / nth_value syntactically "special" while not
actually reserving them (beyond having them in unreserved_keywords); I
think serious consideration should be given to whether they should
instead become col_name_keywords (which would, I believe, make it
unnecessary to mess with precedence).

Anyone have any thoughts or comments on the above?

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Normally I'd push hard to try and get some solution that's sufficiently
> generic to allow user-defined functions to make use of the feature. But
> I think the SQL spec people have managed to make that literally
> impossible in this case, what with the FROM keyword appearing in the
> middle of a production and not followed by anything sufficiently
> distinctive to even use for extra token lookahead.

Yeah.  Is there any appetite for a "Just Say No" approach?  That is,
refuse to implement the spec's syntax on the grounds that it's too
brain-dead to even consider, and instead provide some less random,
more extensibility-friendly way to accomplish the same thing?

The FROM FIRST/LAST bit seems particularly badly thought through,
because AFAICS it is flat out ambiguous with a normal FROM clause
immediately following the window function call.  The only way to
make it not so would be to make FIRST and LAST be fully reserved,
which is neither a good idea nor spec-compliant.

In short, there's a really good case to be made here that the SQL
committee is completely clueless about syntax design, and so we
shouldn't follow this particular pied piper.

> ... This patch would make lead / lag /
> first_value / last_value / nth_value syntactically "special" while not
> actually reserving them (beyond having them in unreserved_keywords); I
> think serious consideration should be given to whether they should
> instead become col_name_keywords (which would, I believe, make it
> unnecessary to mess with precedence).

I agree that messing with the precedence rules is likely to have
unforeseen and undesirable side-effects.  Generally, if you need
to create a precedence rule, that's because your grammar is
ambiguous.  Precedence fixes that in a well-behaved way only for
cases that actually are very much like operator precedence rules.
Otherwise, you may just be papering over something that isn't
working very well.  See e.g. commits 670a6c7a2 and 12b716457
for past cases where we learned that the hard way.  (The latter
also points out that if you must have a precedence hack, it's
safer to hack individual rules than to stick precedences onto
terminal symbols.)  In the case at hand, since the proposed patch
doesn't make FIRST and LAST be fully reserved, it seems just about
certain that it can be made to misbehave, including failing on
queries that were and should remain legal.

            regards, tom lane


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> The FROM FIRST/LAST bit seems particularly badly thought through,
 Tom> because AFAICS it is flat out ambiguous with a normal FROM clause
 Tom> immediately following the window function call. The only way to
 Tom> make it not so would be to make FIRST and LAST be fully reserved,
 Tom> which is neither a good idea nor spec-compliant.

In the actual spec syntax it's not ambiguous at all because NTH_VALUE is
a reserved word (as are LEAD, LAG, FIRST_VALUE and LAST_VALUE), and OVER
is a mandatory clause in its syntax, so a FROM appearing before the OVER
must be part of a FROM FIRST/LAST and not introducing a FROM-clause.

In our syntax, if we made NTH_VALUE etc. a col_name_keyword (and thus
not legal as a function name outside its own special syntax) it would
also become unambiguous.

i.e. given this token sequence (with . marking the current posision):

  select nth_value(x) . from first ignore 

if we know up front that "nth_value" is a window function and not any
other kind of function, we know that we have to shift the "from" rather
than reducing the select-list because we haven't seen an "over" yet.
(Neither "first" nor "ignore" are reserved, so "select foo(x) from first
ignore;" is a valid and complete query, and without reserving the
function name we'd need at least four tokens of lookahead to decide
otherwise.)

This is why I think the col_name_keyword option needs to be given
serious consideration - it still doesn't reserve the names as strongly
as the spec does, but enough to make the standard syntax work without
needing any dubious hacks.

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> The FROM FIRST/LAST bit seems particularly badly thought through,
>  Tom> because AFAICS it is flat out ambiguous with a normal FROM clause
>  Tom> immediately following the window function call. The only way to
>  Tom> make it not so would be to make FIRST and LAST be fully reserved,
>  Tom> which is neither a good idea nor spec-compliant.

> In the actual spec syntax it's not ambiguous at all because NTH_VALUE is
> a reserved word (as are LEAD, LAG, FIRST_VALUE and LAST_VALUE), and OVER
> is a mandatory clause in its syntax, so a FROM appearing before the OVER
> must be part of a FROM FIRST/LAST and not introducing a FROM-clause.

Hmm ...

> In our syntax, if we made NTH_VALUE etc. a col_name_keyword (and thus
> not legal as a function name outside its own special syntax) it would
> also become unambiguous.
> i.e. given this token sequence (with . marking the current posision):
>   select nth_value(x) . from first ignore 
> if we know up front that "nth_value" is a window function and not any
> other kind of function, we know that we have to shift the "from" rather
> than reducing the select-list because we haven't seen an "over" yet.

I don't really find that to be a desirable solution, because quite aside
from the extensibility problem, it would mean that a lot of errors
become "syntax error" where we formerly gave a more useful message.

This does open up a thought about how to proceed, though.  I'd been
trying to think of a way to solve this using base_yylex's ability to
do some internal lookahead and change token types based on that.
If you just think of recognizing FROM FIRST/LAST, you get nowhere
because that's still legal in other contexts.  But if you were to
look for FROM followed by FIRST/LAST followed by IGNORE/RESPECT/OVER,
I think that could only validly happen in this syntax.  It'd take
some work to extend base_yylex to look ahead 2 tokens not one, but
I'm sure that could be done.  (You'd also need a lookahead rule to
match "IGNORE/RESPECT NULLS OVER", but that seems just as doable.)
Then the relevant productions use FROM_LA, IGNORE_LA, RESPECT_LA
instead of the corresponding bare tokens, and the grammar no longer
has an ambiguity problem.

            regards, tom lane


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> If you just think of recognizing FROM FIRST/LAST, you get nowhere
 Tom> because that's still legal in other contexts. But if you were to
 Tom> look for FROM followed by FIRST/LAST followed by
 Tom> IGNORE/RESPECT/OVER, I think that could only validly happen in
 Tom> this syntax.

No; you need to go four tokens ahead in total, not three. Assuming
nth_value is unreserved, then

  select nth_value(x) from first ignore;

is a valid query that has nth_value(x) as an expression, "first" as a
table name and "ignore" as its alias. Only when you see NULLS after
IGNORE, or OVER after FIRST/LAST, do you know that you're looking at
a window function and not a from clause.

So FROM_LA would have to mean "FROM" followed by any of:

  FIRST IGNORE NULLS
  LAST IGNORE NULLS
  FIRST RESPECT NULLS
  LAST RESPECT NULLS
  FIRST OVER
  LAST OVER

Remember that while OVER is reserved, all of FIRST, LAST, RESPECT and
IGNORE are unreserved.

 Tom> It'd take some work to extend base_yylex to look ahead 2 tokens
 Tom> not one, but I'm sure that could be done. (You'd also need a
 Tom> lookahead rule to match "IGNORE/RESPECT NULLS OVER", but that
 Tom> seems just as doable.) Then the relevant productions use FROM_LA,
 Tom> IGNORE_LA, RESPECT_LA instead of the corresponding bare tokens,
 Tom> and the grammar no longer has an ambiguity problem.

Yeah, but at the cost of having to extend base_yylex to go 3 tokens
ahead (not 2) rather than the current single lookahead slot.

Doable, certainly (probably not much harder to do 3 than 2 actually)

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> If you just think of recognizing FROM FIRST/LAST, you get nowhere
>  Tom> because that's still legal in other contexts. But if you were to
>  Tom> look for FROM followed by FIRST/LAST followed by
>  Tom> IGNORE/RESPECT/OVER, I think that could only validly happen in
>  Tom> this syntax.

> No; you need to go four tokens ahead in total, not three. Assuming
> nth_value is unreserved, then
>   select nth_value(x) from first ignore;
> is a valid query that has nth_value(x) as an expression, "first" as a
> table name and "ignore" as its alias.

No, because once IGNORE is a keyword, even unreserved, it's not legal
as an AS-less alias.  We'd be breaking queries like that no matter what.

(I know there are people around here who'd like to remove that
restriction, but it's not happening anytime soon IMO.)

            regards, tom lane


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> select nth_value(x) from first ignore;

 Tom> No, because once IGNORE is a keyword, even unreserved, it's not
 Tom> legal as an AS-less alias.

That rule only applies in the select-list, not in the FROM clause; table
aliases in FROM are just ColId, so they can be anything except a fully
reserved or type_func_name keyword.

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
So I've tried to rough out a decision tree for the various options on
how this might be implemented (discarding the "use precedence hacks"
option). Opinions? Additions?

(formatted for emacs outline-mode)

* 1. use lexical lookahead

  +: relatively straightforward parser changes
  +: no new reserved words
  +: has the option of working extensibly with all functions

  -: base_yylex needs extending to 3 lookahead tokens

** 1.1. Allow from/ignore clause on all (or all non-agg) window function calls

  If the clauses are legal on all window functions, what to do about existing
  window functions for which the clauses do not make sense?

*** 1.1.1. Ignore the clause when the function isn't aware of it

  +: simple
  -: somewhat surprising for users perhaps?

*** 1.1.2. Change the behavior of the windowapi in some consistent way

  Not sure if this can work.
  +: fairly simple(maybe?) and predictable
  -: changes the behavior of existing window functions

** 1.2. Allow from/ignore clause on only certain functions

  +: avoids any unexpected behavior
  -: needs some way to control what functions allow it

*** 1.2.1. Check the function name in parse analysis against a fixed list.

  +: simple
  -: not extensible

*** 1.2.2. Provide some option in CREATE FUNCTION

  +: extensible
  -: fairly intrusive, adding stuff to create function and pg_proc

*** 1.2.3. Do something magical with function argument types

  +: doesn't need changes in create function / pg_proc
  -: it's an ugly hack

* 2. reserve nth_value etc. as functions

  +: follows the spec reasonably well
  +: less of a hack than extending base_yylex

  -: new reserved words
  -: more parser rules
  -: not extensible

  (now goto 1.2.1)

* 3. "just say no" to the spec

  e.g. add new functions like lead_ignore_nulls(), or add extra boolean
  args to lead() etc. telling them to skip nulls

  +: simple
  -: doesn't conform to spec
  -: using extra args isn't quite the right semantics

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> So I've tried to rough out a decision tree for the various options on
> how this might be implemented (discarding the "use precedence hacks"
> option). Opinions? Additions?

I think it'd be worth at least drafting an implementation for the
lexical-lookahead fix.  I think it's likely that we'll need to extend
base_yylex to do more lookahead in the future even if we don't do it
for this, given the SQL committee's evident love for COBOL-ish syntax
and lack of regard for what you can do in LALR(1).

The questions of how we interface to the individual window functions
are really independent of how we handle the parsing problem.  My
first inclination is to just pass the flags down to the window functions
(store them in WindowObject and provide some additional inquiry functions
in windowapi.h) and let them deal with it.

>   If the clauses are legal on all window functions, what to do about existing
>   window functions for which the clauses do not make sense?

Option 1: do nothing, document that nothing happens if w.f. doesn't
implement it.

Option 2: record whether the inquiry functions got called.  At end of
query, error out if they weren't and the options were used.

It's also worth wondering if we couldn't just implement the flags in
some generic fashion and not need to involve the window functions at
all.  FROM LAST, for example, could and perhaps should be implemented
by inverting the sort order.  Possibly IGNORE NULLS could be implemented
inside the WinGetFuncArgXXX functions?  These behaviors might or might
not make much sense with other window functions, but that doesn't seem
like it's our problem.

            regards, tom lane


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> So I've tried to rough out a decision tree for the various options
 >> on how this might be implemented (discarding the "use precedence
 >> hacks" option). Opinions? Additions?

 Tom> I think it'd be worth at least drafting an implementation for the
 Tom> lexical-lookahead fix. I think it's likely that we'll need to
 Tom> extend base_yylex to do more lookahead in the future even if we
 Tom> don't do it for this, given the SQL committee's evident love for
 Tom> COBOL-ish syntax and lack of regard for what you can do in
 Tom> LALR(1).

That's not _quite_ a fair criticism of the SQL committee; they're not
ignoring the capabilities of parsers, they're just not making their
syntax robust in the presence of the kinds of extensions and
generalizations that we want to do.

Without exception that I know of, every time we've run into a problem
that needed ugly precedence rules or extra lookahead it's been caused by
us not reserving something that is reserved in the spec, or by us
allowing constructs that are not in the spec at all (postfix operators,
especially), or by us deliberately generalizing what the spec allows
(e.g. allowing full expressions where the spec only allows column
references, or allowing extra parens around subselects) or where we've
repurposed syntax from the spec in an incompatible way (as in
ANY(array)).

Anyway, for the time being I will mark this patch as "returned with
feedback".

 >> If the clauses are legal on all window functions, what to do about
 >> existing window functions for which the clauses do not make sense?

 Tom> Option 1: do nothing, document that nothing happens if w.f.
 Tom> doesn't implement it.

That was 1.1.1 on my list.

 Tom> Option 2: record whether the inquiry functions got called. At end
 Tom> of query, error out if they weren't and the options were used.

Erroring at the _end_ of the query seems a bit of a potential surprise.

 Tom> It's also worth wondering if we couldn't just implement the flags
 Tom> in some generic fashion and not need to involve the window
 Tom> functions at all.

That was what I meant by option 1.1.2 on my list.

 Tom> FROM LAST, for example, could and perhaps should be implemented by
 Tom> inverting the sort order.

Actually that can't work for reasons brought up in the recent discussion
of optimization of window function sorts: if you change the sort order
you potentially disturb the ordering of peer rows, and the spec requires
that an (nth_value(x,n) from last over w) and (otherfunc(x) over w) for
order-equivalent windows "w" must see the peer rows in the same order.

So FROM LAST really does have to keep the original sort order, and count
backwards from the end of the window.

 Tom> Possibly IGNORE NULLS could be implemented inside the
 Tom> WinGetFuncArgXXX functions? These behaviors might or might not
 Tom> make much sense with other window functions, but that doesn't seem
 Tom> like it's our problem.

That's about what I was thinking for option 1.1.2, yes.

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Andrew Gierth
Date:
>>>>> "Krasiyan" == Krasiyan Andreev <krasiyan@gmail.com> writes:

 Krasiyan> I am using last version from more than two months ago in
 Krasiyan> production environment with real data and I didn't find any
 Krasiyan> bugs, so I'm marking this patch as ready for committer in the
 Krasiyan> commitfest app.

Oliver (or anyone else), do you plan to continue working on this in the
immediate future, in line with the comments from myself and Tom in this
thread? If so I'll bump it to the next CF, otherwise I'll mark it
"returned with feedback".

I'm happy to help out with further work on this patch if needed, time
permitting.

-- 
Andrew (irc:RhodiumToad)


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Stephen Frost
Date:
Greetings,

This seems to have died out, and that's pretty unfortunate because this
is awfully useful SQL standard syntax that people look for and wish we
had.

* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:
> So I've tried to rough out a decision tree for the various options on
> how this might be implemented (discarding the "use precedence hacks"
> option). Opinions? Additions?
>
> (formatted for emacs outline-mode)
>
> * 1. use lexical lookahead
>
>   +: relatively straightforward parser changes
>   +: no new reserved words
>   +: has the option of working extensibly with all functions
>
>   -: base_yylex needs extending to 3 lookahead tokens

This sounds awful grotty and challenging to do and get right, and the
alternative (just reserving these, as the spec does) doesn't seem so
draconian as to be that much of an issue.

> * 2. reserve nth_value etc. as functions
>
>   +: follows the spec reasonably well
>   +: less of a hack than extending base_yylex
>
>   -: new reserved words
>   -: more parser rules
>   -: not extensible
>

For my 2c, at least, reserving these strikes me as entirely reasonable.
Yes, it sucks that we have to partially-reserve some additional
keywords, but such is life.  I get that we'll throw syntax errors
sometimes when we might have given a better error, but I think we can
accept that.

>   (now goto 1.2.1)

Hmm, not sure this was right?  but sure, I'll try...

> *** 1.2.1. Check the function name in parse analysis against a fixed list.
>
>   +: simple
>   -: not extensible

Seems like this is more-or-less required since we'd be reserving them..?

> *** 1.2.2. Provide some option in CREATE FUNCTION
>
>   +: extensible
>   -: fairly intrusive, adding stuff to create function and pg_proc

How would this work though, if we reserve the functions as keywords..?
Maybe I'm not entirely following, but wouldn't attempts to use other
functions end up with syntax errors in at least some of the cases,
meaning that having other functions support this wouldn't really work?
I don't particularly like the idea that some built-in functions would
always work but others would work but only some of the time.

> *** 1.2.3. Do something magical with function argument types
>
>   +: doesn't need changes in create function / pg_proc
>   -: it's an ugly hack

Not really a fan of 'ugly hack'.

> * 3. "just say no" to the spec
>
>   e.g. add new functions like lead_ignore_nulls(), or add extra boolean
>   args to lead() etc. telling them to skip nulls
>
>   +: simple
>   -: doesn't conform to spec
>   -: using extra args isn't quite the right semantics

Ugh, no thank you.

Thanks!

Stephen

Attachment

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Krasiyan Andreev
Date:
Thank you very much for feedback and yes, that is very useful SQL syntax.
Maybe you miss my previous answer, but you are right, that patch is currently dead,
because some important design questions must be discussed here, before patch rewriting.

I have dropped support of from first/last for nth_value(),
but also I reimplemented it in a different way,
by using negative number for the position argument, to be able to get the same frame in exact reverse order.
After that patch becomes much more simple and some concerns about precedence hack has gone.

I have not renamed special bool type "ignorenulls"
(I know that it is not acceptable way for calling extra version
of window functions, but also it makes things very easy and it can reuse frames),
but I removed the other special bool type "fromlast".

Attached file was for PostgreSQL 13 (master git branch, last commit fest),
everything was working and patch was at the time in very good shape, all tests was passed.

I read previous review and suggestions from Tom about special bool type and unreserved keywords and also,
that IGNORE NULLS could be implemented inside the WinGetFuncArgXXX functions,
but I am not sure how exactly to proceed (some example will be very helpful).

На чт, 30.04.2020 г. в 21:58 Stephen Frost <sfrost@snowman.net> написа:
Greetings,

This seems to have died out, and that's pretty unfortunate because this
is awfully useful SQL standard syntax that people look for and wish we
had.

* Andrew Gierth (andrew@tao11.riddles.org.uk) wrote:
> So I've tried to rough out a decision tree for the various options on
> how this might be implemented (discarding the "use precedence hacks"
> option). Opinions? Additions?
>
> (formatted for emacs outline-mode)
>
> * 1. use lexical lookahead
>
>   +: relatively straightforward parser changes
>   +: no new reserved words
>   +: has the option of working extensibly with all functions
>
>   -: base_yylex needs extending to 3 lookahead tokens

This sounds awful grotty and challenging to do and get right, and the
alternative (just reserving these, as the spec does) doesn't seem so
draconian as to be that much of an issue.

> * 2. reserve nth_value etc. as functions
>
>   +: follows the spec reasonably well
>   +: less of a hack than extending base_yylex
>
>   -: new reserved words
>   -: more parser rules
>   -: not extensible
>

For my 2c, at least, reserving these strikes me as entirely reasonable.
Yes, it sucks that we have to partially-reserve some additional
keywords, but such is life.  I get that we'll throw syntax errors
sometimes when we might have given a better error, but I think we can
accept that.

>   (now goto 1.2.1)

Hmm, not sure this was right?  but sure, I'll try...

> *** 1.2.1. Check the function name in parse analysis against a fixed list.
>
>   +: simple
>   -: not extensible

Seems like this is more-or-less required since we'd be reserving them..?

> *** 1.2.2. Provide some option in CREATE FUNCTION
>
>   +: extensible
>   -: fairly intrusive, adding stuff to create function and pg_proc

How would this work though, if we reserve the functions as keywords..?
Maybe I'm not entirely following, but wouldn't attempts to use other
functions end up with syntax errors in at least some of the cases,
meaning that having other functions support this wouldn't really work?
I don't particularly like the idea that some built-in functions would
always work but others would work but only some of the time.

> *** 1.2.3. Do something magical with function argument types
>
>   +: doesn't need changes in create function / pg_proc
>   -: it's an ugly hack

Not really a fan of 'ugly hack'.

> * 3. "just say no" to the spec
>
>   e.g. add new functions like lead_ignore_nulls(), or add extra boolean
>   args to lead() etc. telling them to skip nulls
>
>   +: simple
>   -: doesn't conform to spec
>   -: using extra args isn't quite the right semantics

Ugh, no thank you.

Thanks!

Stephen
Attachment

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
I revisited the thread:
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com

and came up with attached POC patch (I used some varibale names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users. For FIRST/LAST I am not so excited since there are alternatives
as our document stats, so FIRST/LAST are not included in the patch.

Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS. I think it's not hard to implement it for others (lead, lag,
first_value and last_value).  No document nor test patches are
included for now.

Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

> The questions of how we interface to the individual window functions
> are really independent of how we handle the parsing problem.  My
> first inclination is to just pass the flags down to the window functions
> (store them in WindowObject and provide some additional inquiry functions
> in windowapi.h) and let them deal with it.

I agree with this.  Also I do not change the prototype of
nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
parser to parse/analysis and finally to WindowObject.

> It's also worth wondering if we couldn't just implement the flags in
> some generic fashion and not need to involve the window functions at
> all.  FROM LAST, for example, could and perhaps should be implemented
> by inverting the sort order.  Possibly IGNORE NULLS could be implemented
> inside the WinGetFuncArgXXX functions?  These behaviors might or might
> not make much sense with other window functions, but that doesn't seem
> like it's our problem.

Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
this direction (not implemented in the patch at this point).

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
From 07f01f8859e159c908ada72e8f53daf51e0b8bdf Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Sat, 22 Apr 2023 16:52:50 +0900
Subject: [PATCH v1 1/3] Implement IGNORE or RESPECT NULLS parse/analysis part.

Implement SQL standard's IGNORE/RESPECT NULLS clause for window functions.
For now, only nth_value() can use this option.
---
 src/backend/parser/gram.y       | 22 ++++++++++++++++++----
 src/backend/parser/parse_func.c | 13 +++++++++++++
 src/include/nodes/parsenodes.h  |  8 ++++++++
 src/include/nodes/primnodes.h   |  2 ++
 src/include/parser/kwlist.h     |  2 ++
 5 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index acf6cf4866..2980ecd666 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
     MergeWhenClause *mergewhen;
     struct KeyActions *keyactions;
     struct KeyAction *keyaction;
+    NullTreatment    nulltreatment;
 }
 
 %type <node>    stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -661,6 +662,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
                     json_object_constructor_null_clause_opt
                     json_array_constructor_null_clause_opt
 
+%type <nulltreatment>        opt_null_treatment
+
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -718,7 +721,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
     HANDLER HAVING HEADER_P HOLD HOUR_P
 
-    IDENTITY_P IF_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
+    IDENTITY_P IF_P IGNORE_P ILIKE IMMEDIATE IMMUTABLE IMPLICIT_P IMPORT_P IN_P INCLUDE
     INCLUDING INCREMENT INDENT INDEX INDEXES INHERIT INHERITS INITIALLY INLINE_P
     INNER_P INOUT INPUT_P INSENSITIVE INSERT INSTEAD INT_P INTEGER
     INTERSECT INTERVAL INTO INVOKER IS ISNULL ISOLATION
@@ -752,7 +755,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 
     RANGE READ REAL REASSIGN RECHECK RECURSIVE REF_P REFERENCES REFERENCING
     REFRESH REINDEX RELATIVE_P RELEASE RENAME REPEATABLE REPLACE REPLICA
-    RESET RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
+    RESET RESPECT RESTART RESTRICT RETURN RETURNING RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
     ROUTINE ROUTINES ROW ROWS RULE
 
     SAVEPOINT SCALAR SCHEMA SCHEMAS SCROLL SEARCH SECOND_P SECURITY SELECT
@@ -15213,7 +15216,7 @@ func_application: func_name '(' ')'
  * (Note that many of the special SQL functions wouldn't actually make any
  * sense as functional index entries, but we ignore that consideration here.)
  */
-func_expr: func_application within_group_clause filter_clause over_clause
+func_expr: func_application within_group_clause filter_clause opt_null_treatment over_clause
                 {
                     FuncCall   *n = (FuncCall *) $1;
 
@@ -15246,7 +15249,8 @@ func_expr: func_application within_group_clause filter_clause over_clause
                         n->agg_within_group = true;
                     }
                     n->agg_filter = $3;
-                    n->over = $4;
+                    n->null_treatment = $4;
+                    n->over = $5;
                     $$ = (Node *) n;
                 }
             | json_aggregate_func filter_clause over_clause
@@ -15790,6 +15794,14 @@ filter_clause:
             | /*EMPTY*/                                { $$ = NULL; }
         ;
 
+/*
+ * Window function option clauses
+ */
+opt_null_treatment:
+            RESPECT NULLS_P                            { $$ = RESPECT_NULLS; }
+            | IGNORE_P NULLS_P                        { $$ = IGNORE_NULLS; }
+            | /*EMPTY*/                                { $$ = NULL_TREATMENT_NOT_SET; }
+        ;
 
 /*
  * Window Definitions
@@ -17111,6 +17123,7 @@ unreserved_keyword:
             | HOUR_P
             | IDENTITY_P
             | IF_P
+            | IGNORE_P
             | IMMEDIATE
             | IMMUTABLE
             | IMPLICIT_P
@@ -17223,6 +17236,7 @@ unreserved_keyword:
             | REPLACE
             | REPLICA
             | RESET
+            | RESPECT
             | RESTART
             | RESTRICT
             | RETURN
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index b3f0b6a137..92af0d10f1 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -31,6 +31,7 @@
 #include "parser/parse_target.h"
 #include "parser/parse_type.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 
@@ -99,6 +100,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
     bool        agg_distinct = (fn ? fn->agg_distinct : false);
     bool        func_variadic = (fn ? fn->func_variadic : false);
     CoercionForm funcformat = (fn ? fn->funcformat : COERCE_EXPLICIT_CALL);
+    NullTreatment null_treatment = (fn ? fn->null_treatment : NULL_TREATMENT_NOT_SET);
     bool        could_be_projection;
     Oid            rettype;
     Oid            funcid;
@@ -534,6 +536,13 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
                      errmsg("window function %s cannot have WITHIN GROUP",
                             NameListToString(funcname)),
                      parser_errposition(pstate, location)));
+        /* Check RESPECT NULLS or IGNORE NULLS is specified. They are only allowed with nth_value */
+        if (null_treatment != NULL_TREATMENT_NOT_SET && funcid != F_NTH_VALUE)
+            ereport(ERROR,
+                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                     errmsg("window function %s cannot have RESPECT NULLS or IGNORE NULLS",
+                            NameListToString(funcname)),
+                     parser_errposition(pstate, location)));
     }
     else if (fdresult == FUNCDETAIL_COERCION)
     {
@@ -835,6 +844,10 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
         wfunc->winagg = (fdresult == FUNCDETAIL_AGGREGATE);
         wfunc->aggfilter = agg_filter;
         wfunc->location = location;
+        if (null_treatment == IGNORE_NULLS)
+            wfunc->ignorenulls = true;
+        else
+            wfunc->ignorenulls = false;
 
         /*
          * agg_star is allowed for aggregate functions but distinct isn't
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cc7b32b279..f13ae26a24 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -404,6 +404,13 @@ typedef struct RoleSpec
     int            location;        /* token location, or -1 if unknown */
 } RoleSpec;
 
+typedef enum NullTreatment
+{
+    NULL_TREATMENT_NOT_SET = 0,
+    RESPECT_NULLS,
+    IGNORE_NULLS
+} NullTreatment;
+
 /*
  * FuncCall - a function or aggregate invocation
  *
@@ -431,6 +438,7 @@ typedef struct FuncCall
     bool        agg_distinct;    /* arguments were labeled DISTINCT */
     bool        func_variadic;    /* last argument was labeled VARIADIC */
     CoercionForm funcformat;    /* how to display this node */
+    NullTreatment null_treatment;    /* RESPECT_NULLS or IGNORE NULLS */
     int            location;        /* token location, or -1 if unknown */
 } FuncCall;
 
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index be9c29f0bf..213297dbd3 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -559,6 +559,8 @@ typedef struct WindowFunc
     bool        winstar pg_node_attr(query_jumble_ignore);
     /* is function a simple aggregate? */
     bool        winagg pg_node_attr(query_jumble_ignore);
+    /* true if IGNORE NULLS, false if RESPECT NULLS */
+    bool        ignorenulls pg_node_attr(query_jumble_ignore);
     /* token location, or -1 if unknown */
     int            location;
 } WindowFunc;
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index f5b2e61ca5..c7e61a8f0e 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -198,6 +198,7 @@ PG_KEYWORD("hold", HOLD, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("hour", HOUR_P, UNRESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("identity", IDENTITY_P, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("if", IF_P, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("ignore", IGNORE_P, UNRESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("ilike", ILIKE, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
 PG_KEYWORD("immediate", IMMEDIATE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("immutable", IMMUTABLE, UNRESERVED_KEYWORD, BARE_LABEL)
@@ -360,6 +361,7 @@ PG_KEYWORD("repeatable", REPEATABLE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replace", REPLACE, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("replica", REPLICA, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("reset", RESET, UNRESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("respect", RESPECT, UNRESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("restart", RESTART, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("restrict", RESTRICT, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("return", RETURN, UNRESERVED_KEYWORD, BARE_LABEL)
-- 
2.25.1

From 3dc6f4bb897f76247589db018716bf5680d5331c Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Sat, 22 Apr 2023 16:58:48 +0900
Subject: [PATCH v1 2/3] Implement IGNORE or RESPECT NULLS planner part.

---
 src/backend/optimizer/util/clauses.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index a9c7bc342e..40fc62c447 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2474,6 +2474,7 @@ eval_const_expressions_mutator(Node *node,
                 newexpr->winref = expr->winref;
                 newexpr->winstar = expr->winstar;
                 newexpr->winagg = expr->winagg;
+                newexpr->ignorenulls = expr->ignorenulls;
                 newexpr->location = expr->location;
 
                 return (Node *) newexpr;
-- 
2.25.1

From a78feec9bf7b08c644c2b3089b2de9237d4fcd9e Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii <ishii@postgresql.org>
Date: Sat, 22 Apr 2023 17:00:06 +0900
Subject: [PATCH v1 3/3] Implement IGNORE or RESPECT NULLS executor and window
 functions part.

---
 src/backend/executor/nodeWindowAgg.c | 11 ++++++++++
 src/backend/utils/adt/windowfuncs.c  | 30 +++++++++++++++++++++++++---
 src/include/windowapi.h              |  2 ++
 3 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 3ac581a711..7e2affb12c 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -69,6 +69,7 @@ typedef struct WindowObjectData
     int            readptr;        /* tuplestore read pointer for this fn */
     int64        markpos;        /* row that markptr is positioned on */
     int64        seekpos;        /* row that readptr is positioned on */
+    WindowFunc    *wfunc;            /* WindowFunc of this function */
 } WindowObjectData;
 
 /*
@@ -2617,6 +2618,7 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
             winobj->winstate = winstate;
             winobj->argstates = wfuncstate->args;
             winobj->localmem = NULL;
+            winobj->wfunc = wfunc;
             perfuncstate->winobj = winobj;
 
             /* It's a real window function, so set up to call it. */
@@ -3620,3 +3622,12 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull)
     return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno),
                         econtext, isnull);
 }
+
+/*
+ * Return current WindowFunc
+ */
+WindowFunc    *
+WinGetWindowFunc(WindowObject winobj)
+{
+    return winobj->wfunc;
+}
diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c
index b87a624fb2..919295ba13 100644
--- a/src/backend/utils/adt/windowfuncs.c
+++ b/src/backend/utils/adt/windowfuncs.c
@@ -693,6 +693,7 @@ window_nth_value(PG_FUNCTION_ARGS)
     bool        const_offset;
     Datum        result;
     bool        isnull;
+    bool        isout;
     int32        nth;
 
     nth = DatumGetInt32(WinGetFuncArgCurrent(winobj, 1, &isnull));
@@ -705,9 +706,32 @@ window_nth_value(PG_FUNCTION_ARGS)
                 (errcode(ERRCODE_INVALID_ARGUMENT_FOR_NTH_VALUE),
                  errmsg("argument of nth_value must be greater than zero")));
 
-    result = WinGetFuncArgInFrame(winobj, 0,
-                                  nth - 1, WINDOW_SEEK_HEAD, const_offset,
-                                  &isnull, NULL);
+    if (WinGetWindowFunc(winobj)->ignorenulls)
+    {
+        int        i, n;
+
+        i = n = 0;
+
+        for (;;)
+        {
+            result = WinGetFuncArgInFrame(winobj, 0,
+                                          i++, WINDOW_SEEK_HEAD, false,
+                                          &isnull, &isout);
+            if (isout)
+                break;
+
+            if (!isnull)
+            {
+                if (n == nth - 1)
+                    break;
+                n++;
+            }
+        }
+    }
+    else
+        result = WinGetFuncArgInFrame(winobj, 0,
+                                      nth - 1, WINDOW_SEEK_HEAD, const_offset,
+                                      &isnull, NULL);
     if (isnull)
         PG_RETURN_NULL();
 
diff --git a/src/include/windowapi.h b/src/include/windowapi.h
index b8c2c565d1..64f7d4c84d 100644
--- a/src/include/windowapi.h
+++ b/src/include/windowapi.h
@@ -61,4 +61,6 @@ extern Datum WinGetFuncArgInFrame(WindowObject winobj, int argno,
 extern Datum WinGetFuncArgCurrent(WindowObject winobj, int argno,
                                   bool *isnull);
 
+extern WindowFunc    *WinGetWindowFunc(WindowObject winobj);
+
 #endif                            /* WINDOWAPI_H */
-- 
2.25.1


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Oliver Ford
Date:


On Sat, 22 Apr 2023, 13:14 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:
I revisited the thread:
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com

and came up with attached POC patch (I used some varibale names
appearing in the Krasiyan Andreev's patch). I really love to have
RESPECT/IGNORE NULLS because I believe they are convenient for
users. For FIRST/LAST I am not so excited since there are alternatives
as our document stats, so FIRST/LAST are not included in the patch.

Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
NULLS. I think it's not hard to implement it for others (lead, lag,
first_value and last_value).  No document nor test patches are
included for now.

I've actually recently been looking at this feature again recently as well. One thing I wondered, but would need consensus, is to take the SEEK_HEAD/SEEK_TAIL case statements out of WinGetFuncArgInPartition. This function is only called by leadlag_common, which uses SEEK_CURRENT, so those case statements are never reached. Taking them out simplifies the code as it is but means future features might need it re-added (although I'm not sure the use case for it, as that function is for window funcs that ignore the frame options).


Note that RESPECT/IGNORE are not registered as reserved keywords in
this patch (but registered as unreserved keywords). I am not sure if
this is acceptable or not.

> The questions of how we interface to the individual window functions
> are really independent of how we handle the parsing problem.  My
> first inclination is to just pass the flags down to the window functions
> (store them in WindowObject and provide some additional inquiry functions
> in windowapi.h) and let them deal with it.
I agree with this.  Also I do not change the prototype of
nth_value. So I pass RESPECT/IGNORE NULLS information from the raw
parser to parse/analysis and finally to WindowObject.

This is a much better option than my older patch which needed to change the functions.


> It's also worth wondering if we couldn't just implement the flags in
> some generic fashion and not need to involve the window functions at
> all.  FROM LAST, for example, could and perhaps should be implemented
> by inverting the sort order.  Possibly IGNORE NULLS could be implemented
> inside the WinGetFuncArgXXX functions?  These behaviors might or might
> not make much sense with other window functions, but that doesn't seem
> like it's our problem.

Yes, probably we could make WinGetFuncArgXXX a little bit smarter in
this direction (not implemented in the patch at this point).

+1 for doing it here. Maybe also refactor WinGetFuncArgInFrame, putting the exclusion checks in a static function as that function is already pretty big?


Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Vik Fearing
Date:
On 4/22/23 14:14, Tatsuo Ishii wrote:
> I revisited the thread:
> https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com
> 
> and came up with attached POC patch (I used some varibale names
> appearing in the Krasiyan Andreev's patch). I really love to have
> RESPECT/IGNORE NULLS because I believe they are convenient for
> users.

Excellent.  I was thinking about picking my version of this patch up 
again, but I think this might be better than mine.

I am curious why set_mark is false in the IGNORE version instead of also 
being const_offset.  Surely the nth non-null in the frame will never go 
backwards.

Dealing with marks was the main reason (I think) that my patch was not 
accepted.

> For FIRST/LAST I am not so excited since there are alternatives
> as our document stats,

I disagree with this.  The point of having FROM LAST is to avoid 
calculating a new window and running a new pass over it.

 > so FIRST/LAST are not included in the patch.

I do agree that we can have <null treatment> without <from first or 
last> so let's move forward with this and handle the latter later.

> Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
> NULLS.

This should not be hard coded.  It should be a new field in pg_proc 
(with a sanity check that it is only true for window functions).  That 
way custom window functions can implement it.

> I think it's not hard to implement it for others (lead, lag,
> first_value and last_value).

It doesn't seem like it should be, no.

> No document nor test patches are included for now.

I can volunteer to work on these if you want.

> Note that RESPECT/IGNORE are not registered as reserved keywords in
> this patch (but registered as unreserved keywords). I am not sure if
> this is acceptable or not.

For me, this is perfectly okay.  Keep them at the lowest level of 
reservation as possible.
-- 
Vik Fearing




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tom Lane
Date:
Vik Fearing <vik@postgresfriends.org> writes:
> On 4/22/23 14:14, Tatsuo Ishii wrote:
>> Note that RESPECT/IGNORE are not registered as reserved keywords in
>> this patch (but registered as unreserved keywords). I am not sure if
>> this is acceptable or not.

> For me, this is perfectly okay.  Keep them at the lowest level of 
> reservation as possible.

Yeah, keep them unreserved if at all possible.  Any higher reservation
level risks breaking existing applications that might be using these
words as column or function names.

            regards, tom lane



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
> Excellent.  I was thinking about picking my version of this patch up
> again, but I think this might be better than mine.

Thanks.

> I am curious why set_mark is false in the IGNORE version instead of
> also being const_offset.  Surely the nth non-null in the frame will
> never go backwards.

Initially I thought that too. But when I used const_offset instead of
false. I got an error:

ERROR:  cannot fetch row before WindowObject's mark position

> I do agree that we can have <null treatment> without <from first or
> last> so let's move forward with this and handle the latter later.

Agreed.

>> Currently in the patch only nth_value is allowed to use RESPECT/IGNORE
>> NULLS.
>
> This should not be hard coded.  It should be a new field in pg_proc
> (with a sanity check that it is only true for window functions).  That
> way custom window functions can implement it.

There were some discussions on this in the past.
https://www.postgresql.org/message-id/flat/CAGMVOdsbtRwE_4%2Bv8zjH1d9xfovDeQAGLkP_B6k69_VoFEgX-A%40mail.gmail.com

It seems Tom and Andrew thought that "1.1.2. Change the behavior of
the windowapi in some consistent way" is ambitious. If we follow this
direction, I think each window function should check WindowFunc struct
passed by WinGetWindowFunc (added in my patch) to check whether IGNORE
NULLS can be applied or not in the function. If not, error out. This
way, we don't need to add a new field to pg_proc.

>> No document nor test patches are included for now.
> 
> I can volunteer to work on these if you want.

Thanks! I think you can work on top of the last patch posted by Krasiyan Andreev:
https://www.postgresql.org/message-id/CAN1PwonAnC-KkRyY%2BDtRmxQ8rjdJw%2BgcOsHruLr6EnF7zSMH%3DQ%40mail.gmail.com

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
> Vik Fearing <vik@postgresfriends.org> writes:
>> On 4/22/23 14:14, Tatsuo Ishii wrote:
>>> Note that RESPECT/IGNORE are not registered as reserved keywords in
>>> this patch (but registered as unreserved keywords). I am not sure if
>>> this is acceptable or not.
> 
>> For me, this is perfectly okay.  Keep them at the lowest level of 
>> reservation as possible.
> 
> Yeah, keep them unreserved if at all possible.  Any higher reservation
> level risks breaking existing applications that might be using these
> words as column or function names.

Agreed.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Oliver Ford
Date:


On Sun, Apr 23, 2023 at 4:29 AM Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>
>> For me, this is perfectly okay.  Keep them at the lowest level of
>> reservation as possible.
>
> Yeah, keep them unreserved if at all possible.  Any higher reservation
> level risks breaking existing applications that might be using these
> words as column or function names.

Agreed.
 
Attached is a new version of the code and tests to implement this. There's now no modification to windowfuncs.c or the catalog,
it's only a bool added to FuncCall which if set to true, ignores nulls. It adds IGNORE/RESPECT at the Unreserved, As Label level.

The implementation also aims at better performance over previous versions by not disabling set_mark, and using an array to
track previous non-null positions in SEEK_HEAD or SEEK_CURRENT with Forward (lead, but not lag). The mark is set if a row
is out of frame and further rows can't be in frame (to ensure it works with an exclusion clause).

The attached test patch is mostly the same as in the previous patch set, but it doesn't fail on row_number anymore as the main patch
only rejects aggregate functions. The test patch also adds a test for EXCLUDE CURRENT ROW and for two contiguous null rows.

I've not yet tested custom window functions with the patch, but I'm happy to add them to the test patch in v2 if we want to go this way
in implementing this feature.
Attachment

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
> The attached test patch is mostly the same as in the previous patch
> set, but it doesn't fail on row_number anymore as the main patch
> only rejects aggregate functions. The test patch also adds a test for

> +SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds

I think the standard does not allow to specify RESPECT NULLS other
than lead, lag, first_value, last_value and nth_value. Unless we agree
that PostgreSQL violates the standard in this regard, you should not
allow to use RESPECT NULLS for the window functions, expect lead etc.
and aggregates.

See my patch.

> +/*
> + * Window function option clauses
> + */
> +opt_null_treatment:
> +            RESPECT NULLS_P                            { $$ = RESPECT_NULLS; }
> +            | IGNORE_P NULLS_P                        { $$ = IGNORE_NULLS; }
> +            | /*EMPTY*/                                { $$ = NULL_TREATMENT_NOT_SET; }
> +        ;

With this, you can check if null treatment clause is used or not in
each window function.

In my previous patch I did the check in parse/analysis but I think
it's better to be checked in each window function. This way,

- need not to add a column to pg_proc.

- allow user defined window functions to decide by themselves whether
  they can accept null treatment option.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
>> The attached test patch is mostly the same as in the previous patch
>> set, but it doesn't fail on row_number anymore as the main patch
>> only rejects aggregate functions. The test patch also adds a test for
> 
>> +SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds
> 
> I think the standard does not allow to specify RESPECT NULLS other
> than lead, lag, first_value, last_value and nth_value. Unless we agree
> that PostgreSQL violates the standard in this regard, you should not
> allow to use RESPECT NULLS for the window functions, expect lead etc.
> and aggregates.
> 
> See my patch.
> 
>> +/*
>> + * Window function option clauses
>> + */
>> +opt_null_treatment:
>> +            RESPECT NULLS_P                            { $$ = RESPECT_NULLS; }
>> +            | IGNORE_P NULLS_P                        { $$ = IGNORE_NULLS; }
>> +            | /*EMPTY*/                                { $$ = NULL_TREATMENT_NOT_SET; }
>> +        ;
> 
> With this, you can check if null treatment clause is used or not in
> each window function.
> 
> In my previous patch I did the check in parse/analysis but I think
> it's better to be checked in each window function. This way,
> 
> - need not to add a column to pg_proc.
> 
> - allow user defined window functions to decide by themselves whether
>   they can accept null treatment option.

Attached is the patch to implement this (on top of your patch).

test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s;
ERROR:  window function row_number cannot have RESPECT NULLS or IGNORE NULLS

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index fac0e05dee..b8519d9890 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -74,6 +74,7 @@ typedef struct WindowObjectData
     int64        *win_nonnulls;    /* tracks non-nulls in ignore nulls mode */
     int            nonnulls_size;    /* track size of the win_nonnulls array */
     int            nonnulls_len;    /* track length of the win_nonnulls array */
+    WindowFunc    *wfunc;            /* WindowFunc of this function */
 } WindowObjectData;
 
 /*
@@ -2634,6 +2635,8 @@ ExecInitWindowAgg(WindowAgg *node, EState *estate, int eflags)
                 winobj->nonnulls_len = 0;
             }
 
+            winobj->wfunc = wfunc;
+
             /* It's a real window function, so set up to call it. */
             fmgr_info_cxt(wfunc->winfnoid, &perfuncstate->flinfo,
                           econtext->ecxt_per_query_memory);
@@ -3881,3 +3884,24 @@ WinGetFuncArgCurrent(WindowObject winobj, int argno, bool *isnull)
     return ExecEvalExpr((ExprState *) list_nth(winobj->argstates, argno),
                         econtext, isnull);
 }
+
+/*
+ * Error out that this window function cannot have null treatement.
+ */
+void
+ErrorOutNullTreatment(WindowObject winobj)
+{
+    char            *fname;
+
+    Assert(WindowObjectIsValid(winobj));
+
+    if (winobj->wfunc->null_treatment == NULL_TREATMENT_NOT_SET)
+        return;
+
+    fname = get_func_name(winobj->wfunc->winfnoid);
+
+    ereport(ERROR,
+            (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+             errmsg("window function %s cannot have RESPECT NULLS or IGNORE NULLS",
+                    fname)));
+}
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 01fd16acf9..05e64c4569 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -2475,6 +2475,7 @@ eval_const_expressions_mutator(Node *node,
                 newexpr->winstar = expr->winstar;
                 newexpr->winagg = expr->winagg;
                 newexpr->ignore_nulls = expr->ignore_nulls;
+                newexpr->null_treatment = expr->null_treatment;
                 newexpr->location = expr->location;
 
                 return (Node *) newexpr;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 58c00bfd4f..e131428e85 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -276,6 +276,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
     MergeWhenClause *mergewhen;
     struct KeyActions *keyactions;
     struct KeyAction *keyaction;
+    NullTreatment    nulltreatment;
 }
 
 %type <node>    stmt toplevel_stmt schema_stmt routine_body_stmt
@@ -633,7 +634,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
                 opt_frame_clause frame_extent frame_bound
 %type <ival>    opt_window_exclusion_clause
 %type <str>        opt_existing_window_name
-%type <boolean> null_treatment
 %type <boolean> opt_if_not_exists
 %type <boolean> opt_unique_null_treatment
 %type <ival>    generated_when override_kind
@@ -662,6 +662,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
                     json_object_constructor_null_clause_opt
                     json_array_constructor_null_clause_opt
 
+%type <nulltreatment>        null_treatment
+
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -15247,7 +15249,7 @@ func_expr: func_application within_group_clause filter_clause null_treatment ove
                         n->agg_within_group = true;
                     }
                     n->agg_filter = $3;
-                    n->ignore_nulls = $4;
+                    n->null_treatment = $4;
                     n->over = $5;
                     $$ = (Node *) n;
                 }
@@ -15797,9 +15799,9 @@ filter_clause:
  * Window Definitions
  */
 null_treatment:
-            IGNORE_P NULLS_P                        { $$ = true; }
-            | RESPECT_P NULLS_P                        { $$ = false; }
-            | /*EMPTY*/                                { $$ = false; }
+            RESPECT_P NULLS_P                        { $$ = RESPECT_NULLS; }
+            | IGNORE_P NULLS_P                        { $$ = IGNORE_NULLS; }
+            | /*EMPTY*/                                { $$ = NULL_TREATMENT_NOT_SET; }
         ;
 
 window_clause:
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index afa4bcc8d1..63af8ca6aa 100644
--- a/src/backend/parser/parse_func.c
+++ b/src/backend/parser/parse_func.c
@@ -98,7 +98,7 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
     bool        agg_star = (fn ? fn->agg_star : false);
     bool        agg_distinct = (fn ? fn->agg_distinct : false);
     bool        func_variadic = (fn ? fn->func_variadic : false);
-    bool        ignore_nulls = (fn ? fn->ignore_nulls : false);
+    NullTreatment null_treatment = (fn ? fn->null_treatment : NULL_TREATMENT_NOT_SET);
     CoercionForm funcformat = (fn ? fn->funcformat : COERCE_EXPLICIT_CALL);
     bool        could_be_projection;
     Oid            rettype;
@@ -516,11 +516,12 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
                                 NameListToString(funcname)),
                          parser_errposition(pstate, location)));
 
-            /* It also can't treat nulls as a window function */
-            if (ignore_nulls)
+            /* Aggregate functions cannot have null treatment clause */
+            if (null_treatment != NULL_TREATMENT_NOT_SET)
                 ereport(ERROR,
                         (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-                         errmsg("aggregate functions do not accept RESPECT/IGNORE NULLS"),
+                         errmsg("aggregate function %s cannot have RESPECT NULLS or IGNORE NULLS",
+                                NameListToString(funcname)),
                          parser_errposition(pstate, location)));
         }
     }
@@ -842,7 +843,8 @@ ParseFuncOrColumn(ParseState *pstate, List *funcname, List *fargs,
         wfunc->winstar = agg_star;
         wfunc->winagg = (fdresult == FUNCDETAIL_AGGREGATE);
         wfunc->aggfilter = agg_filter;
-        wfunc->ignore_nulls = ignore_nulls;
+        wfunc->null_treatment = null_treatment;
+        wfunc->ignore_nulls = (null_treatment == IGNORE_NULLS);
         wfunc->location = location;
 
         /*
diff --git a/src/backend/utils/adt/windowfuncs.c b/src/backend/utils/adt/windowfuncs.c
index b87a624fb2..297e787927 100644
--- a/src/backend/utils/adt/windowfuncs.c
+++ b/src/backend/utils/adt/windowfuncs.c
@@ -85,6 +85,9 @@ window_row_number(PG_FUNCTION_ARGS)
     WindowObject winobj = PG_WINDOW_OBJECT();
     int64        curpos = WinGetCurrentPosition(winobj);
 
+    /* row_number() does not support null treatment */
+    ErrorOutNullTreatment(winobj);
+
     WinSetMarkPosition(winobj, curpos);
     PG_RETURN_INT64(curpos + 1);
 }
@@ -140,6 +143,9 @@ window_rank(PG_FUNCTION_ARGS)
     rank_context *context;
     bool        up;
 
+    /* rank() does not support null treatment */
+    ErrorOutNullTreatment(winobj);
+
     up = rank_up(winobj);
     context = (rank_context *)
         WinGetPartitionLocalMemory(winobj, sizeof(rank_context));
@@ -202,6 +208,9 @@ window_dense_rank(PG_FUNCTION_ARGS)
     rank_context *context;
     bool        up;
 
+    /* dense_rank() does not support null treatment */
+    ErrorOutNullTreatment(winobj);
+
     up = rank_up(winobj);
     context = (rank_context *)
         WinGetPartitionLocalMemory(winobj, sizeof(rank_context));
@@ -266,6 +275,9 @@ window_percent_rank(PG_FUNCTION_ARGS)
 
     Assert(totalrows > 0);
 
+    /* percent_rank() does not support null treatment */
+    ErrorOutNullTreatment(winobj);
+
     up = rank_up(winobj);
     context = (rank_context *)
         WinGetPartitionLocalMemory(winobj, sizeof(rank_context));
@@ -335,6 +347,9 @@ window_cume_dist(PG_FUNCTION_ARGS)
 
     Assert(totalrows > 0);
 
+    /* cume_dist() does not support null treatment */
+    ErrorOutNullTreatment(winobj);
+
     up = rank_up(winobj);
     context = (rank_context *)
         WinGetPartitionLocalMemory(winobj, sizeof(rank_context));
@@ -412,6 +427,9 @@ window_ntile(PG_FUNCTION_ARGS)
     WindowObject winobj = PG_WINDOW_OBJECT();
     ntile_context *context;
 
+    /* ntile() does not support null treatment */
+    ErrorOutNullTreatment(winobj);
+
     context = (ntile_context *)
         WinGetPartitionLocalMemory(winobj, sizeof(ntile_context));
 
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 073e2469ba..32fbab46a0 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -432,6 +432,7 @@ typedef struct FuncCall
     bool        agg_distinct;    /* arguments were labeled DISTINCT */
     bool        func_variadic;    /* last argument was labeled VARIADIC */
     CoercionForm funcformat;    /* how to display this node */
+    NullTreatment null_treatment;    /* RESPECT_NULLS or IGNORE NULLS */
     int            location;        /* token location, or -1 if unknown */
 } FuncCall;
 
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index 221b5e6218..545b5e5ac8 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -532,6 +532,13 @@ typedef struct GroupingFunc
     int            location;
 } GroupingFunc;
 
+typedef enum NullTreatment
+{
+    NULL_TREATMENT_NOT_SET = 0,
+    RESPECT_NULLS,
+    IGNORE_NULLS
+} NullTreatment;
+
 /*
  * WindowFunc
  *
@@ -559,7 +566,8 @@ typedef struct WindowFunc
     bool        winstar pg_node_attr(query_jumble_ignore);
     /* is function a simple aggregate? */
     bool        winagg pg_node_attr(query_jumble_ignore);
-    /* ignore nulls */
+    /* null treatement */
+    NullTreatment null_treatment pg_node_attr(query_jumble_ignore);
     bool        ignore_nulls;
     /* token location, or -1 if unknown */
     int            location;
diff --git a/src/include/windowapi.h b/src/include/windowapi.h
index b8c2c565d1..8a50478ee9 100644
--- a/src/include/windowapi.h
+++ b/src/include/windowapi.h
@@ -61,4 +61,6 @@ extern Datum WinGetFuncArgInFrame(WindowObject winobj, int argno,
 extern Datum WinGetFuncArgCurrent(WindowObject winobj, int argno,
                                   bool *isnull);
 
+extern void    ErrorOutNullTreatment(WindowObject winobj);
+
 #endif                            /* WINDOWAPI_H */

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Oliver Ford
Date:


On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:
Attached is the patch to implement this (on top of your patch).

test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s;
ERROR:  window function row_number cannot have RESPECT NULLS or IGNORE NULLS

The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it was suggested to make the feature generalizable, beyond what the standard says it should be limited to.

With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple column arguments (which I'll add in v2 of the patch if the design's accepted).

So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it generalized yet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column arguments).


Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
> The last time this was discussed (
> https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us)
> it was suggested to make the feature generalizable, beyond what the
> standard says it should be limited to.

I have read the mail. In my understanding nobody said that standard
window functions should all accept the null treatment clause.

Also Tom said:
https://www.postgresql.org/message-id/5567.1537884439%40sss.pgh.pa.us
> The questions of how we interface to the individual window functions
> are really independent of how we handle the parsing problem.  My
> first inclination is to just pass the flags down to the window functions
> (store them in WindowObject and provide some additional inquiry functions
> in windowapi.h) and let them deal with it.

As I said before I totally agree with this. With my patch if a
(custom) window function does not want to accept null treatment
clause, it just calls ErrorOutNullTreatment(). It will raise an error
if IGNORE NULLS or RESPECT NULLS is provided. If it does call the
function, it is up to the function how to deal with the null
treatment. In another word, the infrastructure does not have fixed
rules to allow/disallow null treatment clause for each window
function. It's "delegated" to each window function.

Anyway we can change the rule for other than nth_value etc. later
easily once my patch is brought in.

> With it generalizable, there would need to be extra checks for custom
> functions, such as if they allow multiple column arguments (which I'll add
> in v2 of the patch if the design's accepted).

I am not sure if allowing-multiple-column-arguments patch should be
provided with null-treatment patch.

> So I think we need a consensus on whether to stick to limiting it to
> several specific functions, or making it generalized yet agreeing the rules
> to limit it (such as no agg functions, and no functions with multiple
> column arguments).

Let's see the discussion...

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Vik Fearing
Date:
On 9/7/24 22:25, Oliver Ford wrote:
> On Sat, May 6, 2023 at 9:41 AM Oliver Ford <ojford@gmail.com> wrote:
>>
>>
>>
>> On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:
>>>
>>> Attached is the patch to implement this (on top of your patch).
>>>
>>> test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s;
>>> ERROR:  window function row_number cannot have RESPECT NULLS or IGNORE NULLS
>>
>>
>> The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it was
suggestedto make the feature generalizable, beyond what the standard says it should be limited to.
 
>>
>> With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple
columnarguments (which I'll add in v2 of the patch if the design's accepted).
 
>>
>> So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it
generalizedyet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column
arguments).
> 
> Reviving this thread, I've attached a rebased patch with code, docs,
> and tests and added it to November commitfest.

Excellent!  One of these days we'll get this in. :-)

I have a problem with this test, though:

     SELECT sum(orbit) RESPECT NULLS OVER () FROM planets; -- succeeds

Why should that succeed?  Especially since aggregates such as SUM() will 
ignore nulls!  The error message on its partner seems to confirm this:

     SELECT sum(orbit) IGNORE NULLS OVER () FROM planets; -- fails
     ERROR:  aggregate functions do not accept RESPECT/IGNORE NULLS

I believe they should both fail.
-- 
Vik Fearing




Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
>> >> On Sat, 6 May 2023, 04:57 Tatsuo Ishii, <ishii@sraoss.co.jp> wrote:
>> >>>
>> >>> Attached is the patch to implement this (on top of your patch).
>> >>>
>> >>> test=# SELECT row_number() RESPECT NULLS OVER () FROM (SELECT 1) AS s;
>> >>> ERROR:  window function row_number cannot have RESPECT NULLS or IGNORE NULLS
>> >>
>> >>
>> >> The last time this was discussed (https://www.postgresql.org/message-id/1037735.1610402426%40sss.pgh.pa.us) it
wassuggested to make the feature generalizable, beyond what the standard says it should be limited to.
 
>> >>
>> >> With it generalizable, there would need to be extra checks for custom functions, such as if they allow multiple
columnarguments (which I'll add in v2 of the patch if the design's accepted).
 
>> >>
>> >> So I think we need a consensus on whether to stick to limiting it to several specific functions, or making it
generalizedyet agreeing the rules to limit it (such as no agg functions, and no functions with multiple column
arguments).

It seems you allow to use IGNORE NULLS for all window functions. If
the case, you should explicitely stat that in the docs. Otherwise
users will be confused because;

1) The SQL standard says IGNORE NULLS only for lead, lag, first_value,
last_value and nth_value.

2) Some window function returns same rows with IGNORE NULLS/RESPECT
NULLS. Consider following case.

test=# create table t1(i int);
CREATE TABLE
test=# insert into t1 values(NULL),(NULL);
INSERT 0 2
test=# select * from t1;
 i 
---
  
  
(2 rows)

test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER BY i);
 row_number 
------------
          1
          2
(2 rows)

The t1 table only contains NULL rows. By using IGNORE NULLS, I think
it's no wonder that a user expects 0 rows returned, if there's no
mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just
ignored in some window functions.

Instead I think it's better that other than lead, lag, first_value,
last_value and nth_value each window function errors out if IGNORE
NULLS/RESPECT NULL are passed to these window functions.

I take a look at the patch and noticed that following functions have
no comments on what they are doing and what are the arguments. Please
look into other functions in nodeWindowAgg.c and add appropriate
comments to those functions.

+static void increment_notnulls(WindowObject winobj, int64 pos)

+static Datum ignorenulls_getfuncarginpartition(WindowObject winobj, int argno,
+                        int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) {

+static Datum ignorenulls_getfuncarginframe(WindowObject winobj, int argno,
+                        int relpos, int seektype, bool set_mark, bool *isnull, bool *isout) {

Also the coding style does not fit into our coding standard. They should be written something like:

static void
increment_notnulls(WindowObject winobj, int64 pos)

static Datum
ignorenulls_getfuncarginpartition(WindowObject winobj, int argno,
                        int relpos, int seektype, bool set_mark, bool *isnull, bool *isout)
{

static Datum
ignorenulls_getfuncarginframe(WindowObject winobj, int argno,
                        int relpos, int seektype, bool set_mark, bool *isnull, bool *isout)
{

See also:
https://www.postgresql.org/docs/current/source-format.html

+    int            ignore_nulls;    /* ignore nulls */

You should add more comment here. I.e. what values are possible for
ignore_nulls.

I also notice that you have an array in memory which records non-null
row positions in a partition. The position is represented in int64,
which means 1 entry consumes 8 bytes. If my understanding is correct,
the array continues to grow up to the partition size. Also the array
is created for each window function (is it really necessary?). I worry
about this because it might consume excessive memory for big
partitions.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp



Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
"David G. Johnston"
Date:
On Wednesday, September 11, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:

test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER BY i);
 row_number
------------
          1
          2
(2 rows)

The t1 table only contains NULL rows. By using IGNORE NULLS, I think
it's no wonder that a user expects 0 rows returned, if there's no
mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just
ignored in some window functions.

My nieve understanding of the nulls treatment is computations are affected, therefore a zero-argument function is incapable of abiding by this clause (it should error…).  Your claim that this should somehow produce zero rows confuses me on two fronts.  One, window function should be incapable of affecting how many rows are returned.  The query must output two rows regardless of the result of the window expression (it should at worse produce the null value).  Two, to produce said null value you have to be ignoring the row due to the order by clause seeing a null.  But the order by isn’t part of the computation.

David J.

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

From
Tatsuo Ishii
Date:
> On Wednesday, September 11, 2024, Tatsuo Ishii <ishii@postgresql.org> wrote:
> 
>>
>> test=# SELECT row_number() IGNORE NULLS OVER w FROM t1 WINDOW w AS (ORDER
>> BY i);
>>  row_number
>> ------------
>>           1
>>           2
>> (2 rows)
>>
>> The t1 table only contains NULL rows. By using IGNORE NULLS, I think
>> it's no wonder that a user expects 0 rows returned, if there's no
>> mention in the docs that actually IGNORE NULLS/RESPECT NULLS are just
>> ignored in some window functions.
>>
> 
> My nieve understanding of the nulls treatment is computations are affected,
> therefore a zero-argument function is incapable of abiding by this clause
> (it should error…).

Yes. I actually claimed that row_number() should error out if the
clause is provided.

> Instead I think it's better that other than lead, lag, first_value,
> last_value and nth_value each window function errors out if IGNORE
> NULLS/RESPECT NULL are passed to these window functions.

> Your claim that this should somehow produce zero rows
> confuses me on two fronts.  One, window function should be incapable of
> affecting how many rows are returned.  The query must output two rows
> regardless of the result of the window expression (it should at worse
> produce the null value).  Two, to produce said null value you have to be
> ignoring the row due to the order by clause seeing a null.  But the order
> by isn’t part of the computation.

Well I did not claim that. I just gave a possible example what users
could misunderstand. Probably my example was not so good.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp