Thread: Compatible defaults for LEAD/LAG

Compatible defaults for LEAD/LAG

From

Vik Fearing

Date:

31 May 2020, 17:20:10

I noticed that the PostgreSQL entry in a pan-database feature matrix by
Modern SQL was not reflecting the reality of our features.[1]

It turns out that test case used by the author produced an error which
the tool took to mean the feature was not implemented.  I don't have the
actual test, but here is a simulation of it:


  postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
  postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
  postgres-# ORDER BY n;

  ERROR:  function lag(numeric, integer, integer) does not exist
  LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
               ^
  HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.


Attached is a patch that fixes this issue using the new anycompatible
pseudotype.  I am hoping this can be slipped into 13 even though it
requires a catversion bump after BETA1.

I looked for other functions with a similar issue but didn't find any.


[1] https://twitter.com/pg_xocolatl/status/1266694496194093057
-- 
Vik Fearing

Attachment

lead_lag_compatible_default.diff

Re: Compatible defaults for LEAD/LAG

From

Tom Lane

Date:

31 May 2020, 19:53:21

Vik Fearing <vik@postgresfriends.org> writes:
>   postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>   postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
>   postgres-# ORDER BY n;
>   ERROR:  function lag(numeric, integer, integer) does not exist
>   LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>                ^

Yeah, we have similar issues elsewhere.

> Attached is a patch that fixes this issue using the new anycompatible
> pseudotype.  I am hoping this can be slipped into 13 even though it
> requires a catversion bump after BETA1.

When the anycompatible patch went in, I thought for a little bit about
trying to use it with existing built-in functions, but didn't have the
time to investigate the issue in detail.  I'm not in favor of hacking
things one-function-at-a-time here; we should look through the whole
library and see what we've got.

The main thing that makes this perhaps-not-trivial is that an
anycompatible-ified function will match more cases than it would have
before, possibly causing conflicts if the function or operator name
is overloaded.  We'd have to look at such cases and decide what we
want to do --- one answer would be to drop some of the alternatives
and rely on the parser to add casts, but that might slow things down.

Anyway, I agree that this is a good direction to pursue, but not in
a last-minute-hack-for-v13 way.

            regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Vik Fearing

Date:

31 May 2020, 20:02:29

On 5/31/20 9:53 PM, Tom Lane wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>>   postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>>   postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
>>   postgres-# ORDER BY n;
>>   ERROR:  function lag(numeric, integer, integer) does not exist
>>   LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>>                ^
> 
> Yeah, we have similar issues elsewhere.
> 
>> Attached is a patch that fixes this issue using the new anycompatible
>> pseudotype.  I am hoping this can be slipped into 13 even though it
>> requires a catversion bump after BETA1.
> 
> When the anycompatible patch went in, I thought for a little bit about
> trying to use it with existing built-in functions, but didn't have the
> time to investigate the issue in detail.  I'm not in favor of hacking
> things one-function-at-a-time here; we should look through the whole
> library and see what we've got.
> 
> The main thing that makes this perhaps-not-trivial is that an
> anycompatible-ified function will match more cases than it would have
> before, possibly causing conflicts if the function or operator name
> is overloaded.  We'd have to look at such cases and decide what we
> want to do --- one answer would be to drop some of the alternatives
> and rely on the parser to add casts, but that might slow things down.
> 
> Anyway, I agree that this is a good direction to pursue, but not in
> a last-minute-hack-for-v13 way.

Fair enough.  I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/
-- 
Vik Fearing

Re: Compatible defaults for LEAD/LAG

From

Vik Fearing

Date:

31 May 2020, 23:30:21

On 5/31/20 9:53 PM, Tom Lane wrote:
> When the anycompatible patch went in, I thought for a little bit about
> trying to use it with existing built-in functions, but didn't have the
> time to investigate the issue in detail.  I'm not in favor of hacking
> things one-function-at-a-time here; we should look through the whole
> library and see what we've got.

BTW, I did go through pg_proc.dat to see what we've got and these were
the only two I found.  I mentioned that in a part you didn't quote.  Now
I went through again, this time using a query on pg_proc itself, and I
missed array_replace during my manual scan.

array_replace, lead, and lag are the only functions we have that take
more than one anyelement.

There are many functions (seemingly all operator implementations) that
take multiple anyarray, anyrange, and anyenum; but none with anynonarray
and only those three for anyelement.  Are you sure we don't want to give
at least the anycompatible type a nice public workout with this?
-- 
Vik Fearing

Re: Compatible defaults for LEAD/LAG

From

Tom Lane

Date:

01 June 2020, 02:07:29

Vik Fearing <vik@postgresfriends.org> writes:
> On 5/31/20 9:53 PM, Tom Lane wrote:
>> When the anycompatible patch went in, I thought for a little bit about
>> trying to use it with existing built-in functions, but didn't have the
>> time to investigate the issue in detail.  I'm not in favor of hacking
>> things one-function-at-a-time here; we should look through the whole
>> library and see what we've got.

> array_replace, lead, and lag are the only functions we have that take
> more than one anyelement.

That's just the tip of the iceberg, though.  If you consider all the
old-style polymorphic types, we have

                     proc                      
-----------------------------------------------
 array_append(anyarray,anyelement)
 array_cat(anyarray,anyarray)
 array_eq(anyarray,anyarray)
 array_ge(anyarray,anyarray)
 array_gt(anyarray,anyarray)
 array_larger(anyarray,anyarray)
 array_le(anyarray,anyarray)
 array_lt(anyarray,anyarray)
 array_ne(anyarray,anyarray)
 array_position(anyarray,anyelement)
 array_position(anyarray,anyelement,integer)
 array_positions(anyarray,anyelement)
 array_prepend(anyelement,anyarray)
 array_remove(anyarray,anyelement)
 array_replace(anyarray,anyelement,anyelement)
 array_smaller(anyarray,anyarray)
 arraycontained(anyarray,anyarray)
 arraycontains(anyarray,anyarray)
 arrayoverlap(anyarray,anyarray)
 btarraycmp(anyarray,anyarray)
 elem_contained_by_range(anyelement,anyrange)
 lag(anyelement,integer,anyelement)
 lead(anyelement,integer,anyelement)
 range_adjacent(anyrange,anyrange)
 range_after(anyrange,anyrange)
 range_before(anyrange,anyrange)
 range_cmp(anyrange,anyrange)
 range_contained_by(anyrange,anyrange)
 range_contains(anyrange,anyrange)
 range_contains_elem(anyrange,anyelement)
 range_eq(anyrange,anyrange)
 range_ge(anyrange,anyrange)
 range_gist_same(anyrange,anyrange,internal)
 range_gt(anyrange,anyrange)
 range_intersect(anyrange,anyrange)
 range_le(anyrange,anyrange)
 range_lt(anyrange,anyrange)
 range_merge(anyrange,anyrange)
 range_minus(anyrange,anyrange)
 range_ne(anyrange,anyrange)
 range_overlaps(anyrange,anyrange)
 range_overleft(anyrange,anyrange)
 range_overright(anyrange,anyrange)
 range_union(anyrange,anyrange)
 width_bucket(anyelement,anyarray)
(45 rows)

(I ignored anyenum here, since it has no mapping to the anycompatible
family.)  Some of these are internal or can otherwise be discounted,
but surely there'd be a market for, say, "int8[] || int4".  The big
question that raises is can we do it without creating impossible confusion
with text-style concatenation.

> Are you sure we don't want to give
> at least the anycompatible type a nice public workout with this?

Yes, I'm quite sure.  If the idea crashes and burns for some reason,
we'll be glad we didn't buy into it full-speed-ahead right away.

            regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

01 June 2020, 03:35:59

po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Vik Fearing <vik@postgresfriends.org> writes:
> On 5/31/20 9:53 PM, Tom Lane wrote:
>> When the anycompatible patch went in, I thought for a little bit about
>> trying to use it with existing built-in functions, but didn't have the
>> time to investigate the issue in detail. I'm not in favor of hacking
>> things one-function-at-a-time here; we should look through the whole
>> library and see what we've got.

> array_replace, lead, and lag are the only functions we have that take
> more than one anyelement.

That's just the tip of the iceberg, though. If you consider all the
old-style polymorphic types, we have

proc
-----------------------------------------------
array_append(anyarray,anyelement)
array_cat(anyarray,anyarray)
array_eq(anyarray,anyarray)
array_ge(anyarray,anyarray)
array_gt(anyarray,anyarray)
array_larger(anyarray,anyarray)
array_le(anyarray,anyarray)
array_lt(anyarray,anyarray)
array_ne(anyarray,anyarray)
array_position(anyarray,anyelement)
array_position(anyarray,anyelement,integer)
array_positions(anyarray,anyelement)
array_prepend(anyelement,anyarray)
array_remove(anyarray,anyelement)
array_replace(anyarray,anyelement,anyelement)
array_smaller(anyarray,anyarray)
arraycontained(anyarray,anyarray)
arraycontains(anyarray,anyarray)
arrayoverlap(anyarray,anyarray)
btarraycmp(anyarray,anyarray)

I am not sure, if using anycompatible for buildin's array functions can be good idea. Theoretically a users can do new performance errors due hidden cast of a longer array.

For example array_positions(int[], numeric). In this case conversion int[] to numeric[] can be bad idea in some cases. Reversely in this case we want to convert numeric to int. When it is not possible without loss, then we can return false, or maybe raise exception. I designed anycompatible* for usage when the parameters has "symmetric weight", and cast to most common type should not to have performance issue. It is not this case. When I though about this cases, and about designing functions compatible with Oracle I though about another generic family (families) with different behave (specified by suffix or maybe with typemod or with some syntax):

a) force_cast .. it can be good for array's functions - array_position(anyarray, anyelement_force_cast), array_position(anyarray, anyelement(force_cast)) .. this is often behave in Oracle

b) force_safe_cast .. special kind of casting - safer variant of tolerant Oracle's casting - 1.0::int is valid, 1.1::int is not valid in this type of casting

anycompatible* family can helps with some cases, but it is not silver bullet for all unfriendly, or not compatible situation (mainly for buildin functionality).

Regards

Pavel

elem_contained_by_range(anyelement,anyrange)
lag(anyelement,integer,anyelement)
lead(anyelement,integer,anyelement)
range_adjacent(anyrange,anyrange)
range_after(anyrange,anyrange)
range_before(anyrange,anyrange)
range_cmp(anyrange,anyrange)
range_contained_by(anyrange,anyrange)
range_contains(anyrange,anyrange)
range_contains_elem(anyrange,anyelement)
range_eq(anyrange,anyrange)
range_ge(anyrange,anyrange)
range_gist_same(anyrange,anyrange,internal)
range_gt(anyrange,anyrange)
range_intersect(anyrange,anyrange)
range_le(anyrange,anyrange)
range_lt(anyrange,anyrange)
range_merge(anyrange,anyrange)
range_minus(anyrange,anyrange)
range_ne(anyrange,anyrange)
range_overlaps(anyrange,anyrange)
range_overleft(anyrange,anyrange)
range_overright(anyrange,anyrange)
range_union(anyrange,anyrange)
width_bucket(anyelement,anyarray)
(45 rows)

(I ignored anyenum here, since it has no mapping to the anycompatible
family.) Some of these are internal or can otherwise be discounted,
but surely there'd be a market for, say, "int8[] || int4". The big
question that raises is can we do it without creating impossible confusion
with text-style concatenation.

> Are you sure we don't want to give
> at least the anycompatible type a nice public workout with this?

Yes, I'm quite sure. If the idea crashes and burns for some reason,
we'll be glad we didn't buy into it full-speed-ahead right away.

regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Tom Lane

Date:

01 June 2020, 15:36:44

Pavel Stehule <pavel.stehule@gmail.com> writes:
> po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> That's just the tip of the iceberg, though.  If you consider all the
>> old-style polymorphic types, we have [for example]
>> array_append(anyarray,anyelement)

> I am not sure, if using anycompatible for buildin's array functions can be
> good idea. Theoretically a users can do new performance errors due hidden
> cast of a longer array.

I don't buy that argument.  If the query requires casting int4[] to
int8[], making the user do it by hand isn't going to improve performance
over having the parser insert the coercion automatically.  Sure, there
will be some fraction of queries that could be rewritten to avoid the
need for any cast, but so what?  Often the performance difference isn't
going to matter; and when it does, I don't see that this is any different
from hundreds of other cases in which there are more-efficient and
less-efficient ways to write a query.  SQL is full of performance traps
and always will be.  You're also assuming that when the user gets an
"operator does not exist" error from "int4[] || int8", that will magically
lead them to choosing an optimal substitute.  I know of no reason to
believe that --- it's at least as likely that they'll conclude it just
can't be done, as in the LAG() example we started the thread with.  So
I think most people would be much happier if the system just did something
reasonable, and they can optimize later if it's important.

> When I
> though about this cases, and about designing functions compatible with
> Oracle I though about another generic family (families) with different
> behave (specified by suffix or maybe with typemod or with some syntax):

So we're already deciding anycompatible can't get the job done?  Maybe
it's a good thing we had this conversation now.  It's not too late to
rip the feature out of v13 altogether, and try again later.  But if
you think I'm going to commit yet another variant of polymorphism on
top of this one, you're mistaken.

            regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

01 June 2020, 16:26:47

po 1. 6. 2020 v 17:36 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> That's just the tip of the iceberg, though. If you consider all the
>> old-style polymorphic types, we have [for example]
>> array_append(anyarray,anyelement)

> I am not sure, if using anycompatible for buildin's array functions can be
> good idea. Theoretically a users can do new performance errors due hidden
> cast of a longer array.

I don't buy that argument. If the query requires casting int4[] to
int8[], making the user do it by hand isn't going to improve performance
over having the parser insert the coercion automatically. Sure, there
will be some fraction of queries that could be rewritten to avoid the
need for any cast, but so what? Often the performance difference isn't
going to matter; and when it does, I don't see that this is any different
from hundreds of other cases in which there are more-efficient and
less-efficient ways to write a query. SQL is full of performance traps
and always will be. You're also assuming that when the user gets an
"operator does not exist" error from "int4[] || int8", that will magically
lead them to choosing an optimal substitute. I know of no reason to
believe that --- it's at least as likely that they'll conclude it just
can't be done, as in the LAG() example we started the thread with. So
I think most people would be much happier if the system just did something
reasonable, and they can optimize later if it's important.

> When I
> though about this cases, and about designing functions compatible with
> Oracle I though about another generic family (families) with different
> behave (specified by suffix or maybe with typemod or with some syntax):

So we're already deciding anycompatible can't get the job done? Maybe
it's a good thing we had this conversation now. It's not too late to
rip the feature out of v13 altogether, and try again later. But if
you think I'm going to commit yet another variant of polymorphism on
top of this one, you're mistaken.

anycompatible types are fully conssistent with Postgres buildin functions supported by parser. It is main benefit and important benefit.

Without anycompatible types is pretty hard to write variadic functions with friendly behave like buildin functions has.

It can be perfect for LAG() example. It does almost same work what we did manually in parser.

Surely, it is not compatible with Oracle's polymorphism rules, because

a) Our and Postgres type system is different (sometimes very different).

b) Oracle's casting rules depends on argument positions and some specific exceptions - so it is not possible to map it on Postgres type system (or system of polymorphic types).

I wrote and I spent lot of time on this feature to be used - by core developers, by extension's developers. Like lot of other feature - it can carry more comfort with some risk of performance issues.

On second hand if we use this feature for buildin functions, there is zero impact of current applications - there should not be any problem with compatibility or performance.

Maybe I am too old, and last year was too terrible so I have a problem to imagine a "intelligent" user now :)

Regards

Pavel

Although I can imagine other enhancing of polymorphic types, I don't propose any new, and I don't expect any enhancing in near feature. Is true, so there are not requests and I think so "anycompatible" and "any" are more than good enough for 99.99% developers.

I am little bit surprised so semi compatibility mode implemented in Orafce is good enough and full compatibility with Oracle a) isn't possible, b) isn't requested by visible group of users (or users who need it are satisfied by EDB).

regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

13 July 2020, 17:23:57

ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org> napsal:

On 5/31/20 9:53 PM, Tom Lane wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>> postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>> postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
>> postgres-# ORDER BY n;
>> ERROR: function lag(numeric, integer, integer) does not exist
>> LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>> ^
>
> Yeah, we have similar issues elsewhere.
>
>> Attached is a patch that fixes this issue using the new anycompatible
>> pseudotype. I am hoping this can be slipped into 13 even though it
>> requires a catversion bump after BETA1.
>
> When the anycompatible patch went in, I thought for a little bit about
> trying to use it with existing built-in functions, but didn't have the
> time to investigate the issue in detail. I'm not in favor of hacking
> things one-function-at-a-time here; we should look through the whole
> library and see what we've got.
>
> The main thing that makes this perhaps-not-trivial is that an
> anycompatible-ified function will match more cases than it would have
> before, possibly causing conflicts if the function or operator name
> is overloaded. We'd have to look at such cases and decide what we
> want to do --- one answer would be to drop some of the alternatives
> and rely on the parser to add casts, but that might slow things down.
>
> Anyway, I agree that this is a good direction to pursue, but not in
> a last-minute-hack-for-v13 way.

Fair enough. I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/
--
Vik Fearing

The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue.

Other databases try to coerce default constant expression to value expression. I found a description about this behaviour for MSSQL, Sybase, BigQuery.

I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test it.

Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result should be type of value expression.

IWD 9075-2:201?(E) 6.10 <window function> - j) v)

Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour (implemented by patch) is in half cases out of standard :(.

These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important this fact is.

For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler than implementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably needs a few lines too).

There has to be a decision, because now we can go in both directions. After choosing there will not be a way back.

Regards

Pavel

Re: Compatible defaults for LEAD/LAG

From

Daniel Gustafsson

Date:

23 July 2020, 11:29:42

> On 13 Jul 2020, at 19:23, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> napsal:
> On 5/31/20 9:53 PM, Tom Lane wrote:
> > Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> writes:
> >>   postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >>   postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
> >>   postgres-# ORDER BY n;
> >>   ERROR:  function lag(numeric, integer, integer) does not exist
> >>   LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >>                ^
> >
> > Yeah, we have similar issues elsewhere.
> >
> >> Attached is a patch that fixes this issue using the new anycompatible
> >> pseudotype.  I am hoping this can be slipped into 13 even though it
> >> requires a catversion bump after BETA1.
> >
> > When the anycompatible patch went in, I thought for a little bit about
> > trying to use it with existing built-in functions, but didn't have the
> > time to investigate the issue in detail.  I'm not in favor of hacking
> > things one-function-at-a-time here; we should look through the whole
> > library and see what we've got.
> >
> > The main thing that makes this perhaps-not-trivial is that an
> > anycompatible-ified function will match more cases than it would have
> > before, possibly causing conflicts if the function or operator name
> > is overloaded.  We'd have to look at such cases and decide what we
> > want to do --- one answer would be to drop some of the alternatives
> > and rely on the parser to add casts, but that might slow things down.
> >
> > Anyway, I agree that this is a good direction to pursue, but not in
> > a last-minute-hack-for-v13 way.
>
> Fair enough.  I put it in the commitfest app for version 14.
> https://commitfest.postgresql.org/28/2574/ <https://commitfest.postgresql.org/28/2574/>
> --
> Vik Fearing
>
> The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue.
>
> Other databases try to coerce default constant expression to value expression. I found a description about this
behaviourfor MSSQL, Sybase, BigQuery. 
>
> I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test
it.
>
> Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result
shouldbe type of value expression. 
>
> IWD 9075-2:201?(E) 6.10 <window function> - j) v)
>
> Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour
(implementedby patch) is in half cases out of standard :(. 
>
> These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important
thisfact is. 
>
> For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler
thanimplementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably
needsa few lines too). 
>
> There has to be a decision, because now we can go in both directions. After choosing there will not be a way back.

This patch is marked waiting for author, but it's not clear from this review
what we're waiting on.  Should this be moved to Needs review or Ready for
Committer instead to solicit the input from a committer?  ISTM the latter is
more suitable. Or did I misunderstand your review?

cheers ./daniel

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

23 July 2020, 11:55:19

čt 23. 7. 2020 v 13:29 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:

> On 13 Jul 2020, at 19:23, Pavel Stehule <pavel.stehule@gmail.com> wrote:

> ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> napsal:
> On 5/31/20 9:53 PM, Tom Lane wrote:
> > Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> writes:
> >> postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >> postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
> >> postgres-# ORDER BY n;
> >> ERROR: function lag(numeric, integer, integer) does not exist
> >> LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >> ^
> >
> > Yeah, we have similar issues elsewhere.
> >
> >> Attached is a patch that fixes this issue using the new anycompatible
> >> pseudotype. I am hoping this can be slipped into 13 even though it
> >> requires a catversion bump after BETA1.
> >
> > When the anycompatible patch went in, I thought for a little bit about
> > trying to use it with existing built-in functions, but didn't have the
> > time to investigate the issue in detail. I'm not in favor of hacking
> > things one-function-at-a-time here; we should look through the whole
> > library and see what we've got.
> >
> > The main thing that makes this perhaps-not-trivial is that an
> > anycompatible-ified function will match more cases than it would have
> > before, possibly causing conflicts if the function or operator name
> > is overloaded. We'd have to look at such cases and decide what we
> > want to do --- one answer would be to drop some of the alternatives
> > and rely on the parser to add casts, but that might slow things down.
> >
> > Anyway, I agree that this is a good direction to pursue, but not in
> > a last-minute-hack-for-v13 way.
>
> Fair enough. I put it in the commitfest app for version 14.
> https://commitfest.postgresql.org/28/2574/ <https://commitfest.postgresql.org/28/2574/>
> --
> Vik Fearing
>
> The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue.
>
> Other databases try to coerce default constant expression to value expression. I found a description about this behaviour for MSSQL, Sybase, BigQuery.
>
> I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test it.
>
> Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result should be type of value expression.
>
> IWD 9075-2:201?(E) 6.10 <window function> - j) v)
>
> Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour (implemented by patch) is in half cases out of standard :(.
>
> These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important this fact is.
>
> For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler than implementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably needs a few lines too).
>
> There has to be a decision, because now we can go in both directions. After choosing there will not be a way back.

This patch is marked waiting for author, but it's not clear from this review
what we're waiting on. Should this be moved to Needs review or Ready for
Committer instead to solicit the input from a committer? ISTM the latter is
more suitable. Or did I misunderstand your review?

Unfortunately, I don't know - I am not sure about committing this patch, and I am not able to qualify for impact.

This is nice example of usage of anycompatible type (that is consistent with other things in Postgres), but standard says something else.

It can be easily solved with https://commitfest.postgresql.org/28/2081/ - but Tom doesn't like this patch.

I am more inclined to think so this feature should be implemented differently - there is no strong reason to go against standard or against the implementations of other databases (and increase the costs of porting). Now the implementation is limited, but allowed behaviour is 100% ANSI.

On second hand, the implemented feature (behaviour) is pretty small, and really corner case, so maybe a simple solution (although it isn't perfect) is good.

So I would listen to the opinions of other people.

Regards

Pavel

cheers ./daniel

Re: Compatible defaults for LEAD/LAG

From

Tom Lane

Date:

30 August 2020, 21:59:39

Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument.  The case at hand is what to do
if we have, say,

    select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2".  The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work.  But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise.  It is not normally the case that
any information-losing cast would be applied silently within an
expression.

So this deviation from spec doesn't bother me; we have much bigger ones.

My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion.  I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

                      oid                      | prorettype 
-----------------------------------------------+------------
 lag(anyelement,integer,anyelement)            | anyelement
 lead(anyelement,integer,anyelement)           | anyelement
 width_bucket(anyelement,anyarray)             | integer
 btarraycmp(anyarray,anyarray)                 | integer
 array_eq(anyarray,anyarray)                   | boolean
 array_ne(anyarray,anyarray)                   | boolean
 array_lt(anyarray,anyarray)                   | boolean
 array_gt(anyarray,anyarray)                   | boolean
 array_le(anyarray,anyarray)                   | boolean
 array_ge(anyarray,anyarray)                   | boolean
 array_append(anyarray,anyelement)             | anyarray
 array_prepend(anyelement,anyarray)            | anyarray
 array_cat(anyarray,anyarray)                  | anyarray
 array_larger(anyarray,anyarray)               | anyarray
 array_smaller(anyarray,anyarray)              | anyarray
 array_position(anyarray,anyelement)           | integer
 array_position(anyarray,anyelement,integer)   | integer
 array_positions(anyarray,anyelement)          | integer[]
 array_remove(anyarray,anyelement)             | anyarray
 array_replace(anyarray,anyelement,anyelement) | anyarray
 arrayoverlap(anyarray,anyarray)               | boolean
 arraycontains(anyarray,anyarray)              | boolean
 arraycontained(anyarray,anyarray)             | boolean
 elem_contained_by_range(anyelement,anyrange)  | boolean
 range_contains_elem(anyrange,anyelement)      | boolean
 range_eq(anyrange,anyrange)                   | boolean
 range_ne(anyrange,anyrange)                   | boolean
 range_overlaps(anyrange,anyrange)             | boolean
 range_contains(anyrange,anyrange)             | boolean
 range_contained_by(anyrange,anyrange)         | boolean
 range_adjacent(anyrange,anyrange)             | boolean
 range_before(anyrange,anyrange)               | boolean
 range_after(anyrange,anyrange)                | boolean
 range_overleft(anyrange,anyrange)             | boolean
 range_overright(anyrange,anyrange)            | boolean
 range_union(anyrange,anyrange)                | anyrange
 range_merge(anyrange,anyrange)                | anyrange
 range_intersect(anyrange,anyrange)            | anyrange
 range_minus(anyrange,anyrange)                | anyrange
 range_cmp(anyrange,anyrange)                  | integer
 range_lt(anyrange,anyrange)                   | boolean
 range_le(anyrange,anyrange)                   | boolean
 range_ge(anyrange,anyrange)                   | boolean
 range_gt(anyrange,anyrange)                   | boolean
 range_gist_same(anyrange,anyrange,internal)   | internal
(45 rows)

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so.  But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list.  As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity.  Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

            regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

31 August 2020, 05:05:41

ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument. The case at hand is what to do
if we have, say,

select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.

postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1

postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)

It is maybe strange, but it is not an unusual pattern in SQL. I think it can be analogy with default clause

DECLARE a int DEFAULT 1.2;

The default value doesn't change a type of variable. This is maybe too stupid example. There can be other little bit more realistic

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...

DECLARE x int DEFAULT a;

BEGIN

...

I am afraid about performance - if default value can change type, then some other things can stop work - like using index.

For *this* case we don't speak about some operations between two independent operands or function arguments. We are speaking about

the value and about a *default* for the value.

So this deviation from spec doesn't bother me; we have much bigger ones.

ok, if it is acceptable for other people, I can accept it too - I understand well so it is a corner case and there is some consistency with other Postgres features.

Maybe this difference should be mentioned in documentation.

My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion. I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:

oid | prorettype
-----------------------------------------------+------------
lag(anyelement,integer,anyelement) | anyelement
lead(anyelement,integer,anyelement) | anyelement
width_bucket(anyelement,anyarray) | integer
btarraycmp(anyarray,anyarray) | integer
array_eq(anyarray,anyarray) | boolean
array_ne(anyarray,anyarray) | boolean
array_lt(anyarray,anyarray) | boolean
array_gt(anyarray,anyarray) | boolean
array_le(anyarray,anyarray) | boolean
array_ge(anyarray,anyarray) | boolean
array_append(anyarray,anyelement) | anyarray
array_prepend(anyelement,anyarray) | anyarray
array_cat(anyarray,anyarray) | anyarray
array_larger(anyarray,anyarray) | anyarray
array_smaller(anyarray,anyarray) | anyarray
array_position(anyarray,anyelement) | integer
array_position(anyarray,anyelement,integer) | integer
array_positions(anyarray,anyelement) | integer[]
array_remove(anyarray,anyelement) | anyarray
array_replace(anyarray,anyelement,anyelement) | anyarray
arrayoverlap(anyarray,anyarray) | boolean
arraycontains(anyarray,anyarray) | boolean
arraycontained(anyarray,anyarray) | boolean
elem_contained_by_range(anyelement,anyrange) | boolean
range_contains_elem(anyrange,anyelement) | boolean
range_eq(anyrange,anyrange) | boolean
range_ne(anyrange,anyrange) | boolean
range_overlaps(anyrange,anyrange) | boolean
range_contains(anyrange,anyrange) | boolean
range_contained_by(anyrange,anyrange) | boolean
range_adjacent(anyrange,anyrange) | boolean
range_before(anyrange,anyrange) | boolean
range_after(anyrange,anyrange) | boolean
range_overleft(anyrange,anyrange) | boolean
range_overright(anyrange,anyrange) | boolean
range_union(anyrange,anyrange) | anyrange
range_merge(anyrange,anyrange) | anyrange
range_intersect(anyrange,anyrange) | anyrange
range_minus(anyrange,anyrange) | anyrange
range_cmp(anyrange,anyrange) | integer
range_lt(anyrange,anyrange) | boolean
range_le(anyrange,anyrange) | boolean
range_ge(anyrange,anyrange) | boolean
range_gt(anyrange,anyrange) | boolean
range_gist_same(anyrange,anyrange,internal) | internal
(45 rows)

Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so. But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like

SELECT array[1] < array[1.1];

would work the same way that "SELECT 1 < 1.1" would.

There it has sense without any discussion. But it is a little bit different than using the default value in the LAG function.

I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list. As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity. Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.

So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.

Pavel

regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

04 September 2020, 06:33:36

po 31. 8. 2020 v 7:05 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.

I don't particularly buy this argument. The case at hand is what to do
if we have, say,

select lag(integer_column, 1, 1.2) over ...

The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But

(1) I doubt that anybody actually writes such things;

(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.

postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1

postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)

It is maybe strange, but it is not an unusual pattern in SQL. I think it can be analogy with default clause

DECLARE a int DEFAULT 1.2;

The default value doesn't change a type of variable. This is maybe too stupid example. There can be other little bit more realistic

CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
DECLARE x int DEFAULT a;
BEGIN
...

I am afraid about performance - if default value can change type, then some other things can stop work - like using index.

For *this* case we don't speak about some operations between two independent operands or function arguments. We are speaking about
the value and about a *default* for the value.

So this deviation from spec doesn't bother me; we have much bigger ones.

ok, if it is acceptable for other people, I can accept it too - I understand well so it is a corner case and there is some consistency with other Postgres features.

Maybe this difference should be mentioned in documentation.

I thought more about this problem, and I think so ANSI specification is semantically fully correct - it is consistent with application of default value elsewhere in SQL environment.

In this case the optional argument is not "any" expression. It is the default value for some expression . In other cases we usually use forced explicit cast.

Unfortunately we do not have good tools for generic implementation of this situation. In other cases there the functions have special support in parser for this case (example xmltable)

I see few possibilities how to finish and close this issue:

1. use anycompatible type and add note to documentation so expression of optional argument can change a result type, and so this is Postgres specific - other databases and ANSI SQL disallow this.

It is the most simple solution, and it is good enough for this case, that is not extra important.

2. choose the correct type somewhere inside the parser - for these two functions.

3. introduce and implement some generic solution for this case - it can be implemented on C level via some function helper or on syntax level

CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval a%type) ...

"defval argname%type" means for caller's expression "CAST(defval AS typeof(argname))"

@3 can be a very interesting and useful feature, but it needs an agreement and harder work

@2 this is 100% correct solution without hard work (but I am not sure if there can be an agreement on this implementation)

@1 it is good enough for this issue without harder work and probably there we can find an agreement simply.

Regards

Pavel

Re: Compatible defaults for LEAD/LAG

From

Tom Lane

Date:

22 September 2020, 00:33:24

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I see few possibilities how to finish and close this issue:
> 1. use anycompatible type and add note to documentation so expression of
> optional argument can change a result type, and so this is Postgres
> specific - other databases and ANSI SQL disallow this.
> It is the most simple solution, and it is good enough for this case, that
> is not extra important.
> 2. choose the correct type somewhere inside the parser - for these two
> functions.
> 3. introduce and implement some generic solution for this case - it can be
> implemented on C level via some function helper or on syntax level
>    CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
> a%type) ...
> "defval argname%type" means for caller's expression "CAST(defval AS
> typeof(argname))"

I continue to feel that the spec's definition of this is not so
obviously right that we should jump through hoops to duplicate it.
In fact, I don't even agree that we need a disclaimer in the docs
saying that it's not quite the same.  Only the most nitpicky
language lawyers would ever even notice.

In hopes of moving this along a bit, I experimented with converting
the other functions I listed to use anycompatible.  I soon found that
touching any functions/operators that are listed in operator classes
would open a can of worms far larger than I'd previously supposed.
To maintain consistency, we'd have to propagate the datatype changes
to *every* function/operator associated with those opfamilies --- many
of which only have one any-foo input and thus aren't on my original
list.  (An example here is that extending btree array_ops will require
changing the max(anyarray) and min(anyarray) aggregates too.)  We'd
then end up with a situation that would be rather confusing from a
user's standpoint, in that it'd be quite un-obvious why some array
functions use anyarray while other ones use anycompatiblearray.

So I backed off to just changing the functions/operators that have
no opclass connections, such as array_cat.  Even that has some
downsides --- for example, in the attached patch, it's necessary
to change some polymorphism.sql examples that explicitly reference
array_cat(anyarray).  I wonder whether this change would break a
significant number of user-defined aggregates or operators.

(Note that I think we'd have to resist the temptation to fix that
by letting CREATE AGGREGATE et al accept support functions that
take anyarray/anycompatiblearray (etc) interchangeably.  A lot of
the security analysis that went into CVE-2020-14350 depended on
the assumption that utility commands only do exact lookups of
support functions.  If we tried to be lax about this, we'd
re-introduce the possibility for hostile capture of function
references in extension scripts.)

Another interesting issue, not seen in the attached but which
came up while I was experimenting with the more aggressive patch,
was this failure in the polymorphism test:

 select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
-ERROR:  cannot compare arrays of different element types
+ERROR:  function max(anyarray) does not exist

That's because we don't accept pg_stats.histogram_bounds (of
declared type anyarray) as a valid input for anycompatiblearray.
I wonder if that isn't a bug we need to fix in the anycompatible
patch, independently of anything else.  We'd not be able to deduce
an actual element type from such an input, but we already cannot
do so when we match it to an anyarray parameter.

Anyway, attached find

0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.

0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.

I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs.  I'm slightly discouraged about
whether 0002 is worth proceeding with.  Any thoughts?

            regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461b748d89..14cb8e2ce6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19621,17 +19621,17 @@ SELECT count(*) FROM sometable;
         <indexterm>
          <primary>lag</primary>
         </indexterm>
-        <function>lag</function> ( <parameter>value</parameter> <type>anyelement</type>
+        <function>lag</function> ( <parameter>value</parameter> <type>anycompatible</type>
           <optional>, <parameter>offset</parameter> <type>integer</type>
-          <optional>, <parameter>default</parameter> <type>anyelement</type> </optional></optional> )
-        <returnvalue>anyelement</returnvalue>
+          <optional>, <parameter>default</parameter> <type>anycompatible</type> </optional></optional> )
+        <returnvalue>anycompatible</returnvalue>
        </para>
        <para>
         Returns <parameter>value</parameter> evaluated at
         the row that is <parameter>offset</parameter>
         rows before the current row within the partition; if there is no such
         row, instead returns <parameter>default</parameter>
-        (which must be of the same type as
+        (which must be of a type compatible with
         <parameter>value</parameter>).
         Both <parameter>offset</parameter> and
         <parameter>default</parameter> are evaluated
@@ -19646,17 +19646,17 @@ SELECT count(*) FROM sometable;
         <indexterm>
          <primary>lead</primary>
         </indexterm>
-        <function>lead</function> ( <parameter>value</parameter> <type>anyelement</type>
+        <function>lead</function> ( <parameter>value</parameter> <type>anycompatible</type>
           <optional>, <parameter>offset</parameter> <type>integer</type>
-          <optional>, <parameter>default</parameter> <type>anyelement</type> </optional></optional> )
-        <returnvalue>anyelement</returnvalue>
+          <optional>, <parameter>default</parameter> <type>anycompatible</type> </optional></optional> )
+        <returnvalue>anycompatible</returnvalue>
        </para>
        <para>
         Returns <parameter>value</parameter> evaluated at
         the row that is <parameter>offset</parameter>
         rows after the current row within the partition; if there is no such
         row, instead returns <parameter>default</parameter>
-        (which must be of the same type as
+        (which must be of a type compatible with
         <parameter>value</parameter>).
         Both <parameter>offset</parameter> and
         <parameter>default</parameter> are evaluated
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..6e7832b0f4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9729,8 +9729,8 @@
   proname => 'lag', prokind => 'w', prorettype => 'anyelement',
   proargtypes => 'anyelement int4', prosrc => 'window_lag_with_offset' },
 { oid => '3108', descr => 'fetch the Nth preceding row value with default',
-  proname => 'lag', prokind => 'w', prorettype => 'anyelement',
-  proargtypes => 'anyelement int4 anyelement',
+  proname => 'lag', prokind => 'w', prorettype => 'anycompatible',
+  proargtypes => 'anycompatible int4 anycompatible',
   prosrc => 'window_lag_with_offset_and_default' },
 { oid => '3109', descr => 'fetch the following row value',
   proname => 'lead', prokind => 'w', prorettype => 'anyelement',
@@ -9739,8 +9739,8 @@
   proname => 'lead', prokind => 'w', prorettype => 'anyelement',
   proargtypes => 'anyelement int4', prosrc => 'window_lead_with_offset' },
 { oid => '3111', descr => 'fetch the Nth following row value with default',
-  proname => 'lead', prokind => 'w', prorettype => 'anyelement',
-  proargtypes => 'anyelement int4 anyelement',
+  proname => 'lead', prokind => 'w', prorettype => 'anycompatible',
+  proargtypes => 'anycompatible int4 anycompatible',
   prosrc => 'window_lead_with_offset_and_default' },
 { oid => '3112', descr => 'fetch the first row value',
   proname => 'first_value', prokind => 'w', prorettype => 'anyelement',
diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out
index 21c6cac491..19e2ac518a 100644
--- a/src/test/regress/expected/window.out
+++ b/src/test/regress/expected/window.out
@@ -300,6 +300,21 @@ SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM t
    0 |   3 |    3
 (10 rows)

+SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 
+ lag | ten | four
+-----+-----+------
+   0 |   0 |    0
+   0 |   0 |    0
+   4 |   4 |    0
+ 0.7 |   1 |    1
+   1 |   1 |    1
+   1 |   7 |    1
+   7 |   9 |    1
+ 0.7 |   0 |    2
+ 0.7 |   1 |    3
+ 0.7 |   3 |    3
+(10 rows)
+
 SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
  lead | ten | four
 ------+-----+------
@@ -345,6 +360,21 @@ SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FRO
    -1 |   3 |    3
 (10 rows)

+SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 
+ lead | ten | four
+------+-----+------
+    0 |   0 |    0
+    8 |   0 |    0
+ -1.4 |   4 |    0
+    2 |   1 |    1
+   14 |   1 |    1
+   18 |   7 |    1
+ -1.4 |   9 |    1
+ -1.4 |   0 |    2
+    6 |   1 |    3
+ -1.4 |   3 |    3
+(10 rows)
+
 SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
  first_value | ten | four
 -------------+-----+------
diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql
index 9485aebce8..eae5fa6017 100644
--- a/src/test/regress/sql/window.sql
+++ b/src/test/regress/sql/window.sql
@@ -63,12 +63,14 @@ SELECT lag(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHER
 SELECT lag(ten, four) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

 SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
+SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 

 SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

 SELECT lead(ten * 2, 1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

 SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;
+SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY
four,ten; 

 SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10;

diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index 7cc812adda..c3377ebe46 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -168,14 +168,14 @@
   oprcode => 'textnename', oprrest => 'neqsel', oprjoin => 'neqjoinsel' },

 { oid => '349', descr => 'append element onto end of array',
-  oprname => '||', oprleft => 'anyarray', oprright => 'anyelement',
-  oprresult => 'anyarray', oprcode => 'array_append' },
+  oprname => '||', oprleft => 'anycompatiblearray', oprright => 'anycompatible',
+  oprresult => 'anycompatiblearray', oprcode => 'array_append' },
 { oid => '374', descr => 'prepend element onto front of array',
-  oprname => '||', oprleft => 'anyelement', oprright => 'anyarray',
-  oprresult => 'anyarray', oprcode => 'array_prepend' },
+  oprname => '||', oprleft => 'anycompatible', oprright => 'anycompatiblearray',
+  oprresult => 'anycompatiblearray', oprcode => 'array_prepend' },
 { oid => '375', descr => 'concatenate',
-  oprname => '||', oprleft => 'anyarray', oprright => 'anyarray',
-  oprresult => 'anyarray', oprcode => 'array_cat' },
+  oprname => '||', oprleft => 'anycompatiblearray', oprright => 'anycompatiblearray',
+  oprresult => 'anycompatiblearray', oprcode => 'array_cat' },

 { oid => '352', descr => 'equal',
   oprname => '=', oprcanhash => 't', oprleft => 'xid', oprright => 'xid',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6e7832b0f4..b239b5d42d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1549,14 +1549,14 @@
   proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray',
   prosrc => 'array_cardinality' },
 { oid => '378', descr => 'append element onto end of array',
-  proname => 'array_append', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_append' },
+  proname => 'array_append', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' },
 { oid => '379', descr => 'prepend element onto front of array',
-  proname => 'array_prepend', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyelement anyarray', prosrc => 'array_prepend' },
+  proname => 'array_prepend', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatible anycompatiblearray', prosrc => 'array_prepend' },
 { oid => '383',
-  proname => 'array_cat', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyarray', prosrc => 'array_cat' },
+  proname => 'array_cat', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatiblearray', prosrc => 'array_cat' },
 { oid => '394', descr => 'split delimited text',
   proname => 'string_to_array', proisstrict => 'f', prorettype => '_text',
   proargtypes => 'text text', prosrc => 'text_to_array' },
@@ -1588,15 +1588,15 @@
   proargtypes => 'anyarray anyarray', prosrc => 'array_smaller' },
 { oid => '3277', descr => 'returns an offset of value in array',
   proname => 'array_position', proisstrict => 'f', prorettype => 'int4',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_position' },
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_position' },
 { oid => '3278',
   descr => 'returns an offset of value in array with start index',
   proname => 'array_position', proisstrict => 'f', prorettype => 'int4',
-  proargtypes => 'anyarray anyelement int4', prosrc => 'array_position_start' },
+  proargtypes => 'anycompatiblearray anycompatible int4', prosrc => 'array_position_start' },
 { oid => '3279',
   descr => 'returns an array of offsets of some value in array',
   proname => 'array_positions', proisstrict => 'f', prorettype => '_int4',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_positions' },
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_positions' },
 { oid => '1191', descr => 'array subscripts generator',
   proname => 'generate_subscripts', prorows => '1000', proretset => 't',
   prorettype => 'int4', proargtypes => 'anyarray int4 bool',
@@ -1621,11 +1621,11 @@
   proargtypes => 'internal', prosrc => 'array_unnest_support' },
 { oid => '3167',
   descr => 'remove any occurrences of an element from an array',
-  proname => 'array_remove', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyelement', prosrc => 'array_remove' },
+  proname => 'array_remove', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_remove' },
 { oid => '3168', descr => 'replace any occurrences of an element in an array',
-  proname => 'array_replace', proisstrict => 'f', prorettype => 'anyarray',
-  proargtypes => 'anyarray anyelement anyelement', prosrc => 'array_replace' },
+  proname => 'array_replace', proisstrict => 'f', prorettype => 'anycompatiblearray',
+  proargtypes => 'anycompatiblearray anycompatible anycompatible', prosrc => 'array_replace' },
 { oid => '2333', descr => 'aggregate transition function',
   proname => 'array_agg_transfn', proisstrict => 'f', prorettype => 'internal',
   proargtypes => 'internal anynonarray', prosrc => 'array_agg_transfn' },
@@ -1651,7 +1651,8 @@
 { oid => '3218',
   descr => 'bucket number of operand given a sorted array of bucket lower bounds',
   proname => 'width_bucket', prorettype => 'int4',
-  proargtypes => 'anyelement anyarray', prosrc => 'width_bucket_array' },
+  proargtypes => 'anycompatible anycompatiblearray',
+  prosrc => 'width_bucket_array' },
 { oid => '3816', descr => 'array typanalyze',
   proname => 'array_typanalyze', provolatile => 's', prorettype => 'bool',
   proargtypes => 'internal', prosrc => 'array_typanalyze' },
diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out
index 1ff40764d9..2c3bb0a60b 100644
--- a/src/test/regress/expected/polymorphism.out
+++ b/src/test/regress/expected/polymorphism.out
@@ -729,10 +729,10 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;
 (5 rows)

 -- another sort of polymorphic aggregate
-CREATE AGGREGATE array_cat_accum (anyarray)
+CREATE AGGREGATE array_cat_accum (anycompatiblearray)
 (
     sfunc = array_cat,
-    stype = anyarray,
+    stype = anycompatiblearray,
     initcond = '{}'
 );
 SELECT array_cat_accum(i)
@@ -786,16 +786,16 @@ create aggregate build_group(int8, integer) (
   STYPE = int8[]
 );
 -- check proper resolution of data types for polymorphic transfn/finalfn
-create function first_el(anyarray) returns anyelement as
+create function first_el(anycompatiblearray) returns anycompatible as
 'select $1[1]' language sql strict immutable;
 create aggregate first_el_agg_f8(float8) (
   SFUNC = array_append,
   STYPE = float8[],
   FINALFUNC = first_el
 );
-create aggregate first_el_agg_any(anyelement) (
+create aggregate first_el_agg_any(anycompatible) (
   SFUNC = array_append,
-  STYPE = anyarray,
+  STYPE = anycompatiblearray,
   FINALFUNC = first_el
 );
 select first_el_agg_f8(x::float8) from generate_series(1,10) x;
diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql
index e5222f1f81..70a21c8978 100644
--- a/src/test/regress/sql/polymorphism.sql
+++ b/src/test/regress/sql/polymorphism.sql
@@ -498,10 +498,10 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl;

 -- another sort of polymorphic aggregate

-CREATE AGGREGATE array_cat_accum (anyarray)
+CREATE AGGREGATE array_cat_accum (anycompatiblearray)
 (
     sfunc = array_cat,
-    stype = anyarray,
+    stype = anycompatiblearray,
     initcond = '{}'
 );

@@ -549,7 +549,7 @@ create aggregate build_group(int8, integer) (

 -- check proper resolution of data types for polymorphic transfn/finalfn

-create function first_el(anyarray) returns anyelement as
+create function first_el(anycompatiblearray) returns anycompatible as
 'select $1[1]' language sql strict immutable;

 create aggregate first_el_agg_f8(float8) (
@@ -558,9 +558,9 @@ create aggregate first_el_agg_f8(float8) (
   FINALFUNC = first_el
 );

-create aggregate first_el_agg_any(anyelement) (
+create aggregate first_el_agg_any(anycompatible) (
   SFUNC = array_append,
-  STYPE = anyarray,
+  STYPE = anycompatiblearray,
   FINALFUNC = first_el
 );

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

24 September 2020, 19:34:03

út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> I see few possibilities how to finish and close this issue:
> 1. use anycompatible type and add note to documentation so expression of
> optional argument can change a result type, and so this is Postgres
> specific - other databases and ANSI SQL disallow this.
> It is the most simple solution, and it is good enough for this case, that
> is not extra important.
> 2. choose the correct type somewhere inside the parser - for these two
> functions.
> 3. introduce and implement some generic solution for this case - it can be
> implemented on C level via some function helper or on syntax level
> CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
> a%type) ...
> "defval argname%type" means for caller's expression "CAST(defval AS
> typeof(argname))"

I continue to feel that the spec's definition of this is not so
obviously right that we should jump through hoops to duplicate it.
In fact, I don't even agree that we need a disclaimer in the docs
saying that it's not quite the same. Only the most nitpicky
language lawyers would ever even notice.

In hopes of moving this along a bit, I experimented with converting
the other functions I listed to use anycompatible. I soon found that
touching any functions/operators that are listed in operator classes
would open a can of worms far larger than I'd previously supposed.
To maintain consistency, we'd have to propagate the datatype changes
to *every* function/operator associated with those opfamilies --- many
of which only have one any-foo input and thus aren't on my original
list. (An example here is that extending btree array_ops will require
changing the max(anyarray) and min(anyarray) aggregates too.) We'd
then end up with a situation that would be rather confusing from a
user's standpoint, in that it'd be quite un-obvious why some array
functions use anyarray while other ones use anycompatiblearray.

So I backed off to just changing the functions/operators that have
no opclass connections, such as array_cat. Even that has some
downsides --- for example, in the attached patch, it's necessary
to change some polymorphism.sql examples that explicitly reference
array_cat(anyarray). I wonder whether this change would break a
significant number of user-defined aggregates or operators.

(Note that I think we'd have to resist the temptation to fix that
by letting CREATE AGGREGATE et al accept support functions that
take anyarray/anycompatiblearray (etc) interchangeably. A lot of
the security analysis that went into CVE-2020-14350 depended on
the assumption that utility commands only do exact lookups of
support functions. If we tried to be lax about this, we'd
re-introduce the possibility for hostile capture of function
references in extension scripts.)

Another interesting issue, not seen in the attached but which
came up while I was experimenting with the more aggressive patch,
was this failure in the polymorphism test:

select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
-ERROR: cannot compare arrays of different element types
+ERROR: function max(anyarray) does not exist

That's because we don't accept pg_stats.histogram_bounds (of
declared type anyarray) as a valid input for anycompatiblearray.
I wonder if that isn't a bug we need to fix in the anycompatible
patch, independently of anything else. We'd not be able to deduce
an actual element type from such an input, but we already cannot
do so when we match it to an anyarray parameter.

Anyway, attached find

0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.

0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.

I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs. I'm slightly discouraged about
whether 0002 is worth proceeding with. Any thoughts?

I think so 0002 has sense - more than doc I miss related regress tests, but it is partially covered by anycompatible tests

Anyway I tested both patches and there is not problem with compilation, building doc, and make check-world passed

I'll mark this patch as ready for committer

Best regards

Pavel

regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Tom Lane

Date:

04 November 2020, 21:12:29

Pavel Stehule <pavel.stehule@gmail.com> writes:
> út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> Anyway, attached find
>> 0001 - updates Vik's original patch to HEAD and tweaks the
>> grammar in the docs a bit.
>> 0002 - add-on patch to convert array_append, array_prepend,
>> array_cat, array_position, array_positions, array_remove,
>> array_replace, and width_bucket to use anycompatiblearray.
>> I think 0001 is committable, but 0002 is just WIP since
>> I didn't touch the docs.  I'm slightly discouraged about
>> whether 0002 is worth proceeding with.  Any thoughts?

> I think so 0002 has sense - more than doc I miss related regress tests, but
> it is partially covered by anycompatible tests

I didn't see any need for particularly exhaustive testing, but
I did add one new test for an operator and one for a function.
Pushed with that and the necessary docs work.

            regards, tom lane

Re: Compatible defaults for LEAD/LAG

From

Pavel Stehule

Date:

04 November 2020, 22:00:17

st 4. 11. 2020 v 22:12 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:
> út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> Anyway, attached find
>> 0001 - updates Vik's original patch to HEAD and tweaks the
>> grammar in the docs a bit.
>> 0002 - add-on patch to convert array_append, array_prepend,
>> array_cat, array_position, array_positions, array_remove,
>> array_replace, and width_bucket to use anycompatiblearray.
>> I think 0001 is committable, but 0002 is just WIP since
>> I didn't touch the docs. I'm slightly discouraged about
>> whether 0002 is worth proceeding with. Any thoughts?

> I think so 0002 has sense - more than doc I miss related regress tests, but
> it is partially covered by anycompatible tests

I didn't see any need for particularly exhaustive testing, but
I did add one new test for an operator and one for a function.
Pushed with that and the necessary docs work.

ok, Thank you

Pavel

regards, tom lane