Thread: Compatible defaults for LEAD/LAG
I noticed that the PostgreSQL entry in a pan-database feature matrix by Modern SQL was not reflecting the reality of our features.[1] It turns out that test case used by the author produced an error which the tool took to mean the feature was not implemented. I don't have the actual test, but here is a simulation of it: postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n) postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n) postgres-# ORDER BY n; ERROR: function lag(numeric, integer, integer) does not exist LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n) ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts. Attached is a patch that fixes this issue using the new anycompatible pseudotype. I am hoping this can be slipped into 13 even though it requires a catversion bump after BETA1. I looked for other functions with a similar issue but didn't find any. [1] https://twitter.com/pg_xocolatl/status/1266694496194093057 -- Vik Fearing
Attachment
Vik Fearing <vik@postgresfriends.org> writes: > postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n) > postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n) > postgres-# ORDER BY n; > ERROR: function lag(numeric, integer, integer) does not exist > LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n) > ^ Yeah, we have similar issues elsewhere. > Attached is a patch that fixes this issue using the new anycompatible > pseudotype. I am hoping this can be slipped into 13 even though it > requires a catversion bump after BETA1. When the anycompatible patch went in, I thought for a little bit about trying to use it with existing built-in functions, but didn't have the time to investigate the issue in detail. I'm not in favor of hacking things one-function-at-a-time here; we should look through the whole library and see what we've got. The main thing that makes this perhaps-not-trivial is that an anycompatible-ified function will match more cases than it would have before, possibly causing conflicts if the function or operator name is overloaded. We'd have to look at such cases and decide what we want to do --- one answer would be to drop some of the alternatives and rely on the parser to add casts, but that might slow things down. Anyway, I agree that this is a good direction to pursue, but not in a last-minute-hack-for-v13 way. regards, tom lane
On 5/31/20 9:53 PM, Tom Lane wrote: > Vik Fearing <vik@postgresfriends.org> writes: >> postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n) >> postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n) >> postgres-# ORDER BY n; >> ERROR: function lag(numeric, integer, integer) does not exist >> LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n) >> ^ > > Yeah, we have similar issues elsewhere. > >> Attached is a patch that fixes this issue using the new anycompatible >> pseudotype. I am hoping this can be slipped into 13 even though it >> requires a catversion bump after BETA1. > > When the anycompatible patch went in, I thought for a little bit about > trying to use it with existing built-in functions, but didn't have the > time to investigate the issue in detail. I'm not in favor of hacking > things one-function-at-a-time here; we should look through the whole > library and see what we've got. > > The main thing that makes this perhaps-not-trivial is that an > anycompatible-ified function will match more cases than it would have > before, possibly causing conflicts if the function or operator name > is overloaded. We'd have to look at such cases and decide what we > want to do --- one answer would be to drop some of the alternatives > and rely on the parser to add casts, but that might slow things down. > > Anyway, I agree that this is a good direction to pursue, but not in > a last-minute-hack-for-v13 way. Fair enough. I put it in the commitfest app for version 14. https://commitfest.postgresql.org/28/2574/ -- Vik Fearing
On 5/31/20 9:53 PM, Tom Lane wrote: > When the anycompatible patch went in, I thought for a little bit about > trying to use it with existing built-in functions, but didn't have the > time to investigate the issue in detail. I'm not in favor of hacking > things one-function-at-a-time here; we should look through the whole > library and see what we've got. BTW, I did go through pg_proc.dat to see what we've got and these were the only two I found. I mentioned that in a part you didn't quote. Now I went through again, this time using a query on pg_proc itself, and I missed array_replace during my manual scan. array_replace, lead, and lag are the only functions we have that take more than one anyelement. There are many functions (seemingly all operator implementations) that take multiple anyarray, anyrange, and anyenum; but none with anynonarray and only those three for anyelement. Are you sure we don't want to give at least the anycompatible type a nice public workout with this? -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes: > On 5/31/20 9:53 PM, Tom Lane wrote: >> When the anycompatible patch went in, I thought for a little bit about >> trying to use it with existing built-in functions, but didn't have the >> time to investigate the issue in detail. I'm not in favor of hacking >> things one-function-at-a-time here; we should look through the whole >> library and see what we've got. > array_replace, lead, and lag are the only functions we have that take > more than one anyelement. That's just the tip of the iceberg, though. If you consider all the old-style polymorphic types, we have proc ----------------------------------------------- array_append(anyarray,anyelement) array_cat(anyarray,anyarray) array_eq(anyarray,anyarray) array_ge(anyarray,anyarray) array_gt(anyarray,anyarray) array_larger(anyarray,anyarray) array_le(anyarray,anyarray) array_lt(anyarray,anyarray) array_ne(anyarray,anyarray) array_position(anyarray,anyelement) array_position(anyarray,anyelement,integer) array_positions(anyarray,anyelement) array_prepend(anyelement,anyarray) array_remove(anyarray,anyelement) array_replace(anyarray,anyelement,anyelement) array_smaller(anyarray,anyarray) arraycontained(anyarray,anyarray) arraycontains(anyarray,anyarray) arrayoverlap(anyarray,anyarray) btarraycmp(anyarray,anyarray) elem_contained_by_range(anyelement,anyrange) lag(anyelement,integer,anyelement) lead(anyelement,integer,anyelement) range_adjacent(anyrange,anyrange) range_after(anyrange,anyrange) range_before(anyrange,anyrange) range_cmp(anyrange,anyrange) range_contained_by(anyrange,anyrange) range_contains(anyrange,anyrange) range_contains_elem(anyrange,anyelement) range_eq(anyrange,anyrange) range_ge(anyrange,anyrange) range_gist_same(anyrange,anyrange,internal) range_gt(anyrange,anyrange) range_intersect(anyrange,anyrange) range_le(anyrange,anyrange) range_lt(anyrange,anyrange) range_merge(anyrange,anyrange) range_minus(anyrange,anyrange) range_ne(anyrange,anyrange) range_overlaps(anyrange,anyrange) range_overleft(anyrange,anyrange) range_overright(anyrange,anyrange) range_union(anyrange,anyrange) width_bucket(anyelement,anyarray) (45 rows) (I ignored anyenum here, since it has no mapping to the anycompatible family.) Some of these are internal or can otherwise be discounted, but surely there'd be a market for, say, "int8[] || int4". The big question that raises is can we do it without creating impossible confusion with text-style concatenation. > Are you sure we don't want to give > at least the anycompatible type a nice public workout with this? Yes, I'm quite sure. If the idea crashes and burns for some reason, we'll be glad we didn't buy into it full-speed-ahead right away. regards, tom lane
po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Vik Fearing <vik@postgresfriends.org> writes:
> On 5/31/20 9:53 PM, Tom Lane wrote:
>> When the anycompatible patch went in, I thought for a little bit about
>> trying to use it with existing built-in functions, but didn't have the
>> time to investigate the issue in detail. I'm not in favor of hacking
>> things one-function-at-a-time here; we should look through the whole
>> library and see what we've got.
> array_replace, lead, and lag are the only functions we have that take
> more than one anyelement.
That's just the tip of the iceberg, though. If you consider all the
old-style polymorphic types, we have
proc
-----------------------------------------------
array_append(anyarray,anyelement)
array_cat(anyarray,anyarray)
array_eq(anyarray,anyarray)
array_ge(anyarray,anyarray)
array_gt(anyarray,anyarray)
array_larger(anyarray,anyarray)
array_le(anyarray,anyarray)
array_lt(anyarray,anyarray)
array_ne(anyarray,anyarray)
array_position(anyarray,anyelement)
array_position(anyarray,anyelement,integer)
array_positions(anyarray,anyelement)
array_prepend(anyelement,anyarray)
array_remove(anyarray,anyelement)
array_replace(anyarray,anyelement,anyelement)
array_smaller(anyarray,anyarray)
arraycontained(anyarray,anyarray)
arraycontains(anyarray,anyarray)
arrayoverlap(anyarray,anyarray)
btarraycmp(anyarray,anyarray)
I am not sure, if using anycompatible for buildin's array functions can be good idea. Theoretically a users can do new performance errors due hidden cast of a longer array.
For example array_positions(int[], numeric). In this case conversion int[] to numeric[] can be bad idea in some cases. Reversely in this case we want to convert numeric to int. When it is not possible without loss, then we can return false, or maybe raise exception. I designed anycompatible* for usage when the parameters has "symmetric weight", and cast to most common type should not to have performance issue. It is not this case. When I though about this cases, and about designing functions compatible with Oracle I though about another generic family (families) with different behave (specified by suffix or maybe with typemod or with some syntax):
a) force_cast .. it can be good for array's functions - array_position(anyarray, anyelement_force_cast), array_position(anyarray, anyelement(force_cast)) .. this is often behave in Oracle
b) force_safe_cast .. special kind of casting - safer variant of tolerant Oracle's casting - 1.0::int is valid, 1.1::int is not valid in this type of casting
anycompatible* family can helps with some cases, but it is not silver bullet for all unfriendly, or not compatible situation (mainly for buildin functionality).
Regards
Pavel
elem_contained_by_range(anyelement,anyrange)
lag(anyelement,integer,anyelement)
lead(anyelement,integer,anyelement)
range_adjacent(anyrange,anyrange)
range_after(anyrange,anyrange)
range_before(anyrange,anyrange)
range_cmp(anyrange,anyrange)
range_contained_by(anyrange,anyrange)
range_contains(anyrange,anyrange)
range_contains_elem(anyrange,anyelement)
range_eq(anyrange,anyrange)
range_ge(anyrange,anyrange)
range_gist_same(anyrange,anyrange,internal)
range_gt(anyrange,anyrange)
range_intersect(anyrange,anyrange)
range_le(anyrange,anyrange)
range_lt(anyrange,anyrange)
range_merge(anyrange,anyrange)
range_minus(anyrange,anyrange)
range_ne(anyrange,anyrange)
range_overlaps(anyrange,anyrange)
range_overleft(anyrange,anyrange)
range_overright(anyrange,anyrange)
range_union(anyrange,anyrange)
width_bucket(anyelement,anyarray)
(45 rows)
(I ignored anyenum here, since it has no mapping to the anycompatible
family.) Some of these are internal or can otherwise be discounted,
but surely there'd be a market for, say, "int8[] || int4". The big
question that raises is can we do it without creating impossible confusion
with text-style concatenation.
> Are you sure we don't want to give
> at least the anycompatible type a nice public workout with this?
Yes, I'm quite sure. If the idea crashes and burns for some reason,
we'll be glad we didn't buy into it full-speed-ahead right away.
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: >> That's just the tip of the iceberg, though. If you consider all the >> old-style polymorphic types, we have [for example] >> array_append(anyarray,anyelement) > I am not sure, if using anycompatible for buildin's array functions can be > good idea. Theoretically a users can do new performance errors due hidden > cast of a longer array. I don't buy that argument. If the query requires casting int4[] to int8[], making the user do it by hand isn't going to improve performance over having the parser insert the coercion automatically. Sure, there will be some fraction of queries that could be rewritten to avoid the need for any cast, but so what? Often the performance difference isn't going to matter; and when it does, I don't see that this is any different from hundreds of other cases in which there are more-efficient and less-efficient ways to write a query. SQL is full of performance traps and always will be. You're also assuming that when the user gets an "operator does not exist" error from "int4[] || int8", that will magically lead them to choosing an optimal substitute. I know of no reason to believe that --- it's at least as likely that they'll conclude it just can't be done, as in the LAG() example we started the thread with. So I think most people would be much happier if the system just did something reasonable, and they can optimize later if it's important. > When I > though about this cases, and about designing functions compatible with > Oracle I though about another generic family (families) with different > behave (specified by suffix or maybe with typemod or with some syntax): So we're already deciding anycompatible can't get the job done? Maybe it's a good thing we had this conversation now. It's not too late to rip the feature out of v13 altogether, and try again later. But if you think I'm going to commit yet another variant of polymorphism on top of this one, you're mistaken. regards, tom lane
po 1. 6. 2020 v 17:36 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> po 1. 6. 2020 v 4:07 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> That's just the tip of the iceberg, though. If you consider all the
>> old-style polymorphic types, we have [for example]
>> array_append(anyarray,anyelement)
> I am not sure, if using anycompatible for buildin's array functions can be
> good idea. Theoretically a users can do new performance errors due hidden
> cast of a longer array.
I don't buy that argument. If the query requires casting int4[] to
int8[], making the user do it by hand isn't going to improve performance
over having the parser insert the coercion automatically. Sure, there
will be some fraction of queries that could be rewritten to avoid the
need for any cast, but so what? Often the performance difference isn't
going to matter; and when it does, I don't see that this is any different
from hundreds of other cases in which there are more-efficient and
less-efficient ways to write a query. SQL is full of performance traps
and always will be. You're also assuming that when the user gets an
"operator does not exist" error from "int4[] || int8", that will magically
lead them to choosing an optimal substitute. I know of no reason to
believe that --- it's at least as likely that they'll conclude it just
can't be done, as in the LAG() example we started the thread with. So
I think most people would be much happier if the system just did something
reasonable, and they can optimize later if it's important.
> When I
> though about this cases, and about designing functions compatible with
> Oracle I though about another generic family (families) with different
> behave (specified by suffix or maybe with typemod or with some syntax):
So we're already deciding anycompatible can't get the job done? Maybe
it's a good thing we had this conversation now. It's not too late to
rip the feature out of v13 altogether, and try again later. But if
you think I'm going to commit yet another variant of polymorphism on
top of this one, you're mistaken.
anycompatible types are fully conssistent with Postgres buildin functions supported by parser. It is main benefit and important benefit.
Without anycompatible types is pretty hard to write variadic functions with friendly behave like buildin functions has.
It can be perfect for LAG() example. It does almost same work what we did manually in parser.
Surely, it is not compatible with Oracle's polymorphism rules, because
a) Our and Postgres type system is different (sometimes very different).
b) Oracle's casting rules depends on argument positions and some specific exceptions - so it is not possible to map it on Postgres type system (or system of polymorphic types).
I wrote and I spent lot of time on this feature to be used - by core developers, by extension's developers. Like lot of other feature - it can carry more comfort with some risk of performance issues.
On second hand if we use this feature for buildin functions, there is zero impact of current applications - there should not be any problem with compatibility or performance.
Maybe I am too old, and last year was too terrible so I have a problem to imagine a "intelligent" user now :)
Regards
Pavel
Although I can imagine other enhancing of polymorphic types, I don't propose any new, and I don't expect any enhancing in near feature. Is true, so there are not requests and I think so "anycompatible" and "any" are more than good enough for 99.99% developers.
I am little bit surprised so semi compatibility mode implemented in Orafce is good enough and full compatibility with Oracle a) isn't possible, b) isn't requested by visible group of users (or users who need it are satisfied by EDB).
regards, tom lane
Hi
ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org> napsal:
On 5/31/20 9:53 PM, Tom Lane wrote:
> Vik Fearing <vik@postgresfriends.org> writes:
>> postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>> postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
>> postgres-# ORDER BY n;
>> ERROR: function lag(numeric, integer, integer) does not exist
>> LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
>> ^
>
> Yeah, we have similar issues elsewhere.
>
>> Attached is a patch that fixes this issue using the new anycompatible
>> pseudotype. I am hoping this can be slipped into 13 even though it
>> requires a catversion bump after BETA1.
>
> When the anycompatible patch went in, I thought for a little bit about
> trying to use it with existing built-in functions, but didn't have the
> time to investigate the issue in detail. I'm not in favor of hacking
> things one-function-at-a-time here; we should look through the whole
> library and see what we've got.
>
> The main thing that makes this perhaps-not-trivial is that an
> anycompatible-ified function will match more cases than it would have
> before, possibly causing conflicts if the function or operator name
> is overloaded. We'd have to look at such cases and decide what we
> want to do --- one answer would be to drop some of the alternatives
> and rely on the parser to add casts, but that might slow things down.
>
> Anyway, I agree that this is a good direction to pursue, but not in
> a last-minute-hack-for-v13 way.
Fair enough. I put it in the commitfest app for version 14.
https://commitfest.postgresql.org/28/2574/
--
Vik Fearing
The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue.
Other databases try to coerce default constant expression to value expression. I found a description about this behaviour for MSSQL, Sybase, BigQuery.
I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test it.
Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result should be type of value expression.
IWD 9075-2:201?(E) 6.10 <window function> - j) v)
Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour (implemented by patch) is in half cases out of standard :(.
These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important this fact is.
For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler than implementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably needs a few lines too).
There has to be a decision, because now we can go in both directions. After choosing there will not be a way back.
Regards
Pavel
> On 13 Jul 2020, at 19:23, Pavel Stehule <pavel.stehule@gmail.com> wrote: > ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> napsal: > On 5/31/20 9:53 PM, Tom Lane wrote: > > Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> writes: > >> postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n) > >> postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n) > >> postgres-# ORDER BY n; > >> ERROR: function lag(numeric, integer, integer) does not exist > >> LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n) > >> ^ > > > > Yeah, we have similar issues elsewhere. > > > >> Attached is a patch that fixes this issue using the new anycompatible > >> pseudotype. I am hoping this can be slipped into 13 even though it > >> requires a catversion bump after BETA1. > > > > When the anycompatible patch went in, I thought for a little bit about > > trying to use it with existing built-in functions, but didn't have the > > time to investigate the issue in detail. I'm not in favor of hacking > > things one-function-at-a-time here; we should look through the whole > > library and see what we've got. > > > > The main thing that makes this perhaps-not-trivial is that an > > anycompatible-ified function will match more cases than it would have > > before, possibly causing conflicts if the function or operator name > > is overloaded. We'd have to look at such cases and decide what we > > want to do --- one answer would be to drop some of the alternatives > > and rely on the parser to add casts, but that might slow things down. > > > > Anyway, I agree that this is a good direction to pursue, but not in > > a last-minute-hack-for-v13 way. > > Fair enough. I put it in the commitfest app for version 14. > https://commitfest.postgresql.org/28/2574/ <https://commitfest.postgresql.org/28/2574/> > -- > Vik Fearing > > The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue. > > Other databases try to coerce default constant expression to value expression. I found a description about this behaviourfor MSSQL, Sybase, BigQuery. > > I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test it. > > Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result shouldbe type of value expression. > > IWD 9075-2:201?(E) 6.10 <window function> - j) v) > > Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour (implementedby patch) is in half cases out of standard :(. > > These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important thisfact is. > > For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler thanimplementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably needsa few lines too). > > There has to be a decision, because now we can go in both directions. After choosing there will not be a way back. This patch is marked waiting for author, but it's not clear from this review what we're waiting on. Should this be moved to Needs review or Ready for Committer instead to solicit the input from a committer? ISTM the latter is more suitable. Or did I misunderstand your review? cheers ./daniel
čt 23. 7. 2020 v 13:29 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
> On 13 Jul 2020, at 19:23, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> ne 31. 5. 2020 v 22:02 odesílatel Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> napsal:
> On 5/31/20 9:53 PM, Tom Lane wrote:
> > Vik Fearing <vik@postgresfriends.org <mailto:vik@postgresfriends.org>> writes:
> >> postgres=# SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >> postgres-# FROM (VALUES (1.1), (2.2), (3.3)) AS v (n)
> >> postgres-# ORDER BY n;
> >> ERROR: function lag(numeric, integer, integer) does not exist
> >> LINE 1: SELECT LAG(n, 1, -99) OVER (ORDER BY n)
> >> ^
> >
> > Yeah, we have similar issues elsewhere.
> >
> >> Attached is a patch that fixes this issue using the new anycompatible
> >> pseudotype. I am hoping this can be slipped into 13 even though it
> >> requires a catversion bump after BETA1.
> >
> > When the anycompatible patch went in, I thought for a little bit about
> > trying to use it with existing built-in functions, but didn't have the
> > time to investigate the issue in detail. I'm not in favor of hacking
> > things one-function-at-a-time here; we should look through the whole
> > library and see what we've got.
> >
> > The main thing that makes this perhaps-not-trivial is that an
> > anycompatible-ified function will match more cases than it would have
> > before, possibly causing conflicts if the function or operator name
> > is overloaded. We'd have to look at such cases and decide what we
> > want to do --- one answer would be to drop some of the alternatives
> > and rely on the parser to add casts, but that might slow things down.
> >
> > Anyway, I agree that this is a good direction to pursue, but not in
> > a last-minute-hack-for-v13 way.
>
> Fair enough. I put it in the commitfest app for version 14.
> https://commitfest.postgresql.org/28/2574/ <https://commitfest.postgresql.org/28/2574/>
> --
> Vik Fearing
>
> The patch is ok. It is almost trivial. It solves one issue, but maybe it introduces a new issue.
>
> Other databases try to coerce default constant expression to value expression. I found a description about this behaviour for MSSQL, Sybase, BigQuery.
>
> I didn't find related documentation for Oracle, and I have not a access to some running instance of Oracle to test it.
>
> Ansi SQL say - type of default expression should be compatible with lag expression, and declared type of result should be type of value expression.
>
> IWD 9075-2:201?(E) 6.10 <window function> - j) v)
>
> Current implementation is limited (and the behaviour is not user friendly in all details), but new behaviour (implemented by patch) is in half cases out of standard :(.
>
> These cases are probably not often - and they are really corner cases, but I cannot to qualify how much important this fact is.
>
> For users, the implemented feature is better, and it is safe. Implementation is trivial - is significantly simpler than implementation that is 100% standard, although it should not be a hard problem too (in analyze stage it probably needs a few lines too).
>
> There has to be a decision, because now we can go in both directions. After choosing there will not be a way back.
This patch is marked waiting for author, but it's not clear from this review
what we're waiting on. Should this be moved to Needs review or Ready for
Committer instead to solicit the input from a committer? ISTM the latter is
more suitable. Or did I misunderstand your review?
Unfortunately, I don't know - I am not sure about committing this patch, and I am not able to qualify for impact.
This is nice example of usage of anycompatible type (that is consistent with other things in Postgres), but standard says something else.
It can be easily solved with https://commitfest.postgresql.org/28/2081/ - but Tom doesn't like this patch.
I am more inclined to think so this feature should be implemented differently - there is no strong reason to go against standard or against the implementations of other databases (and increase the costs of porting). Now the implementation is limited, but allowed behaviour is 100% ANSI.
On second hand, the implemented feature (behaviour) is pretty small, and really corner case, so maybe a simple solution (although it isn't perfect) is good.
So I would listen to the opinions of other people.
Regards
Pavel
cheers ./daniel
Pavel Stehule <pavel.stehule@gmail.com> writes: > This is nice example of usage of anycompatible type (that is consistent > with other things in Postgres), but standard says something else. > It can be easily solved with https://commitfest.postgresql.org/28/2081/ - > but Tom doesn't like this patch. > I am more inclined to think so this feature should be implemented > differently - there is no strong reason to go against standard or against > the implementations of other databases (and increase the costs of porting). > Now the implementation is limited, but allowed behaviour is 100% ANSI. I don't particularly buy this argument. The case at hand is what to do if we have, say, select lag(integer_column, 1, 1.2) over ... The proposed patch would result in the output being of type numeric, and any rows using the default would show "1.2". The spec says that the right thing is to return integer, and we should round the default to "1" to make that work. But (1) I doubt that anybody actually writes such things; (2) For anyone who does write it, the spec's behavior fails to meet the principle of least surprise. It is not normally the case that any information-losing cast would be applied silently within an expression. So this deviation from spec doesn't bother me; we have much bigger ones. My concern with this patch is what I said upthread: I'm not sure that we should be adjusting this behavior in a piecemeal fashion. I looked through pg_proc to find all the functions that have more than one any* argument, and found these: oid | prorettype -----------------------------------------------+------------ lag(anyelement,integer,anyelement) | anyelement lead(anyelement,integer,anyelement) | anyelement width_bucket(anyelement,anyarray) | integer btarraycmp(anyarray,anyarray) | integer array_eq(anyarray,anyarray) | boolean array_ne(anyarray,anyarray) | boolean array_lt(anyarray,anyarray) | boolean array_gt(anyarray,anyarray) | boolean array_le(anyarray,anyarray) | boolean array_ge(anyarray,anyarray) | boolean array_append(anyarray,anyelement) | anyarray array_prepend(anyelement,anyarray) | anyarray array_cat(anyarray,anyarray) | anyarray array_larger(anyarray,anyarray) | anyarray array_smaller(anyarray,anyarray) | anyarray array_position(anyarray,anyelement) | integer array_position(anyarray,anyelement,integer) | integer array_positions(anyarray,anyelement) | integer[] array_remove(anyarray,anyelement) | anyarray array_replace(anyarray,anyelement,anyelement) | anyarray arrayoverlap(anyarray,anyarray) | boolean arraycontains(anyarray,anyarray) | boolean arraycontained(anyarray,anyarray) | boolean elem_contained_by_range(anyelement,anyrange) | boolean range_contains_elem(anyrange,anyelement) | boolean range_eq(anyrange,anyrange) | boolean range_ne(anyrange,anyrange) | boolean range_overlaps(anyrange,anyrange) | boolean range_contains(anyrange,anyrange) | boolean range_contained_by(anyrange,anyrange) | boolean range_adjacent(anyrange,anyrange) | boolean range_before(anyrange,anyrange) | boolean range_after(anyrange,anyrange) | boolean range_overleft(anyrange,anyrange) | boolean range_overright(anyrange,anyrange) | boolean range_union(anyrange,anyrange) | anyrange range_merge(anyrange,anyrange) | anyrange range_intersect(anyrange,anyrange) | anyrange range_minus(anyrange,anyrange) | anyrange range_cmp(anyrange,anyrange) | integer range_lt(anyrange,anyrange) | boolean range_le(anyrange,anyrange) | boolean range_ge(anyrange,anyrange) | boolean range_gt(anyrange,anyrange) | boolean range_gist_same(anyrange,anyrange,internal) | internal (45 rows) Now, there's no point in changing range_eq and the later entries in this table (the ones taking two anyrange's); our rather lame definition of anycompatiblerange means that we'd get no benefit from doing so. But I think there's a strong case for changing everything before range_eq. It'd be nice if something like SELECT array[1] < array[1.1]; would work the same way that "SELECT 1 < 1.1" would. I checked the other concern that I had about whether the more flexible polymorphic definitions would create any new ambiguities, and I don't see any problems with this list. As functions, none of these names are overloaded, except with different numbers of arguments so there's no ambiguity. Most of the array functions are also operators, but the competing operators do not take arrays, so it doesn't look like there's any issue on that side either. So I think we should be more ambitious and generalize all of these to use anycompatiblearray etc. regards, tom lane
ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.
I don't particularly buy this argument. The case at hand is what to do
if we have, say,
select lag(integer_column, 1, 1.2) over ...
The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But
(1) I doubt that anybody actually writes such things;
(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.
postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1
postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)
It is maybe strange, but it is not an unusual pattern in SQL. I think it can be analogy with default clause
DECLARE a int DEFAULT 1.2;
The default value doesn't change a type of variable. This is maybe too stupid example. There can be other little bit more realistic
CREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...
DECLARE x int DEFAULT a;
BEGIN
...
I am afraid about performance - if default value can change type, then some other things can stop work - like using index.
For *this* case we don't speak about some operations between two independent operands or function arguments. We are speaking about
the value and about a *default* for the value.
So this deviation from spec doesn't bother me; we have much bigger ones.
ok, if it is acceptable for other people, I can accept it too - I understand well so it is a corner case and there is some consistency with other Postgres features.
Maybe this difference should be mentioned in documentation.
My concern with this patch is what I said upthread: I'm not sure that
we should be adjusting this behavior in a piecemeal fashion. I looked
through pg_proc to find all the functions that have more than one any*
argument, and found these:
oid | prorettype
-----------------------------------------------+------------
lag(anyelement,integer,anyelement) | anyelement
lead(anyelement,integer,anyelement) | anyelement
width_bucket(anyelement,anyarray) | integer
btarraycmp(anyarray,anyarray) | integer
array_eq(anyarray,anyarray) | boolean
array_ne(anyarray,anyarray) | boolean
array_lt(anyarray,anyarray) | boolean
array_gt(anyarray,anyarray) | boolean
array_le(anyarray,anyarray) | boolean
array_ge(anyarray,anyarray) | boolean
array_append(anyarray,anyelement) | anyarray
array_prepend(anyelement,anyarray) | anyarray
array_cat(anyarray,anyarray) | anyarray
array_larger(anyarray,anyarray) | anyarray
array_smaller(anyarray,anyarray) | anyarray
array_position(anyarray,anyelement) | integer
array_position(anyarray,anyelement,integer) | integer
array_positions(anyarray,anyelement) | integer[]
array_remove(anyarray,anyelement) | anyarray
array_replace(anyarray,anyelement,anyelement) | anyarray
arrayoverlap(anyarray,anyarray) | boolean
arraycontains(anyarray,anyarray) | boolean
arraycontained(anyarray,anyarray) | boolean
elem_contained_by_range(anyelement,anyrange) | boolean
range_contains_elem(anyrange,anyelement) | boolean
range_eq(anyrange,anyrange) | boolean
range_ne(anyrange,anyrange) | boolean
range_overlaps(anyrange,anyrange) | boolean
range_contains(anyrange,anyrange) | boolean
range_contained_by(anyrange,anyrange) | boolean
range_adjacent(anyrange,anyrange) | boolean
range_before(anyrange,anyrange) | boolean
range_after(anyrange,anyrange) | boolean
range_overleft(anyrange,anyrange) | boolean
range_overright(anyrange,anyrange) | boolean
range_union(anyrange,anyrange) | anyrange
range_merge(anyrange,anyrange) | anyrange
range_intersect(anyrange,anyrange) | anyrange
range_minus(anyrange,anyrange) | anyrange
range_cmp(anyrange,anyrange) | integer
range_lt(anyrange,anyrange) | boolean
range_le(anyrange,anyrange) | boolean
range_ge(anyrange,anyrange) | boolean
range_gt(anyrange,anyrange) | boolean
range_gist_same(anyrange,anyrange,internal) | internal
(45 rows)
Now, there's no point in changing range_eq and the later entries in this
table (the ones taking two anyrange's); our rather lame definition of
anycompatiblerange means that we'd get no benefit from doing so. But
I think there's a strong case for changing everything before range_eq.
It'd be nice if something like
SELECT array[1] < array[1.1];
would work the same way that "SELECT 1 < 1.1" would.
There it has sense without any discussion. But it is a little bit different than using the default value in the LAG function.
I checked the other concern that I had about whether the more flexible
polymorphic definitions would create any new ambiguities, and I don't
see any problems with this list. As functions, none of these names are
overloaded, except with different numbers of arguments so there's no
ambiguity. Most of the array functions are also operators, but the
competing operators do not take arrays, so it doesn't look like there's
any issue on that side either.
So I think we should be more ambitious and generalize all of these
to use anycompatiblearray etc.
+1
Pavel
regards, tom lane
po 31. 8. 2020 v 7:05 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
ne 30. 8. 2020 v 23:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:Pavel Stehule <pavel.stehule@gmail.com> writes:
> This is nice example of usage of anycompatible type (that is consistent
> with other things in Postgres), but standard says something else.
> It can be easily solved with https://commitfest.postgresql.org/28/2081/ -
> but Tom doesn't like this patch.
> I am more inclined to think so this feature should be implemented
> differently - there is no strong reason to go against standard or against
> the implementations of other databases (and increase the costs of porting).
> Now the implementation is limited, but allowed behaviour is 100% ANSI.
I don't particularly buy this argument. The case at hand is what to do
if we have, say,
select lag(integer_column, 1, 1.2) over ...
The proposed patch would result in the output being of type numeric,
and any rows using the default would show "1.2". The spec says that
the right thing is to return integer, and we should round the default
to "1" to make that work. But
(1) I doubt that anybody actually writes such things;
(2) For anyone who does write it, the spec's behavior fails to meet
the principle of least surprise. It is not normally the case that
any information-losing cast would be applied silently within an
expression.postgres=# create table foo(a int);
CREATE TABLE
postgres=# insert into foo values(1.1);
INSERT 0 1postgres=# create table foo(a int default 1.1);
CREATE TABLE
postgres=# insert into foo values(default);
INSERT 0 1
postgres=# select * from foo;
┌───┐
│ a │
╞═══╡
│ 1 │
└───┘
(1 row)It is maybe strange, but it is not an unusual pattern in SQL. I think it can be analogy with default clauseDECLARE a int DEFAULT 1.2;The default value doesn't change a type of variable. This is maybe too stupid example. There can be other little bit more realisticCREATE OR REPLACE FUNCTION foo(a numeric, b numeric, ...DECLARE x int DEFAULT a;BEGIN...I am afraid about performance - if default value can change type, then some other things can stop work - like using index.For *this* case we don't speak about some operations between two independent operands or function arguments. We are speaking aboutthe value and about a *default* for the value.
So this deviation from spec doesn't bother me; we have much bigger ones.ok, if it is acceptable for other people, I can accept it too - I understand well so it is a corner case and there is some consistency with other Postgres features.Maybe this difference should be mentioned in documentation.
In this case the optional argument is not "any" expression. It is the default value for some expression . In other cases we usually use forced explicit cast.
Unfortunately we do not have good tools for generic implementation of this situation. In other cases there the functions have special support in parser for this case (example xmltable)
I see few possibilities how to finish and close this issue:
1. use anycompatible type and add note to documentation so expression of optional argument can change a result type, and so this is Postgres specific - other databases and ANSI SQL disallow this.
It is the most simple solution, and it is good enough for this case, that is not extra important.
2. choose the correct type somewhere inside the parser - for these two functions.
3. introduce and implement some generic solution for this case - it can be implemented on C level via some function helper or on syntax level
CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval a%type) ...
"defval argname%type" means for caller's expression "CAST(defval AS typeof(argname))"
@3 can be a very interesting and useful feature, but it needs an agreement and harder work
@2 this is 100% correct solution without hard work (but I am not sure if there can be an agreement on this implementation)
@1 it is good enough for this issue without harder work and probably there we can find an agreement simply.
Regards
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes: > I see few possibilities how to finish and close this issue: > 1. use anycompatible type and add note to documentation so expression of > optional argument can change a result type, and so this is Postgres > specific - other databases and ANSI SQL disallow this. > It is the most simple solution, and it is good enough for this case, that > is not extra important. > 2. choose the correct type somewhere inside the parser - for these two > functions. > 3. introduce and implement some generic solution for this case - it can be > implemented on C level via some function helper or on syntax level > CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval > a%type) ... > "defval argname%type" means for caller's expression "CAST(defval AS > typeof(argname))" I continue to feel that the spec's definition of this is not so obviously right that we should jump through hoops to duplicate it. In fact, I don't even agree that we need a disclaimer in the docs saying that it's not quite the same. Only the most nitpicky language lawyers would ever even notice. In hopes of moving this along a bit, I experimented with converting the other functions I listed to use anycompatible. I soon found that touching any functions/operators that are listed in operator classes would open a can of worms far larger than I'd previously supposed. To maintain consistency, we'd have to propagate the datatype changes to *every* function/operator associated with those opfamilies --- many of which only have one any-foo input and thus aren't on my original list. (An example here is that extending btree array_ops will require changing the max(anyarray) and min(anyarray) aggregates too.) We'd then end up with a situation that would be rather confusing from a user's standpoint, in that it'd be quite un-obvious why some array functions use anyarray while other ones use anycompatiblearray. So I backed off to just changing the functions/operators that have no opclass connections, such as array_cat. Even that has some downsides --- for example, in the attached patch, it's necessary to change some polymorphism.sql examples that explicitly reference array_cat(anyarray). I wonder whether this change would break a significant number of user-defined aggregates or operators. (Note that I think we'd have to resist the temptation to fix that by letting CREATE AGGREGATE et al accept support functions that take anyarray/anycompatiblearray (etc) interchangeably. A lot of the security analysis that went into CVE-2020-14350 depended on the assumption that utility commands only do exact lookups of support functions. If we tried to be lax about this, we'd re-introduce the possibility for hostile capture of function references in extension scripts.) Another interesting issue, not seen in the attached but which came up while I was experimenting with the more aggressive patch, was this failure in the polymorphism test: select max(histogram_bounds) from pg_stats where tablename = 'pg_am'; -ERROR: cannot compare arrays of different element types +ERROR: function max(anyarray) does not exist That's because we don't accept pg_stats.histogram_bounds (of declared type anyarray) as a valid input for anycompatiblearray. I wonder if that isn't a bug we need to fix in the anycompatible patch, independently of anything else. We'd not be able to deduce an actual element type from such an input, but we already cannot do so when we match it to an anyarray parameter. Anyway, attached find 0001 - updates Vik's original patch to HEAD and tweaks the grammar in the docs a bit. 0002 - add-on patch to convert array_append, array_prepend, array_cat, array_position, array_positions, array_remove, array_replace, and width_bucket to use anycompatiblearray. I think 0001 is committable, but 0002 is just WIP since I didn't touch the docs. I'm slightly discouraged about whether 0002 is worth proceeding with. Any thoughts? regards, tom lane diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 461b748d89..14cb8e2ce6 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -19621,17 +19621,17 @@ SELECT count(*) FROM sometable; <indexterm> <primary>lag</primary> </indexterm> - <function>lag</function> ( <parameter>value</parameter> <type>anyelement</type> + <function>lag</function> ( <parameter>value</parameter> <type>anycompatible</type> <optional>, <parameter>offset</parameter> <type>integer</type> - <optional>, <parameter>default</parameter> <type>anyelement</type> </optional></optional> ) - <returnvalue>anyelement</returnvalue> + <optional>, <parameter>default</parameter> <type>anycompatible</type> </optional></optional> ) + <returnvalue>anycompatible</returnvalue> </para> <para> Returns <parameter>value</parameter> evaluated at the row that is <parameter>offset</parameter> rows before the current row within the partition; if there is no such row, instead returns <parameter>default</parameter> - (which must be of the same type as + (which must be of a type compatible with <parameter>value</parameter>). Both <parameter>offset</parameter> and <parameter>default</parameter> are evaluated @@ -19646,17 +19646,17 @@ SELECT count(*) FROM sometable; <indexterm> <primary>lead</primary> </indexterm> - <function>lead</function> ( <parameter>value</parameter> <type>anyelement</type> + <function>lead</function> ( <parameter>value</parameter> <type>anycompatible</type> <optional>, <parameter>offset</parameter> <type>integer</type> - <optional>, <parameter>default</parameter> <type>anyelement</type> </optional></optional> ) - <returnvalue>anyelement</returnvalue> + <optional>, <parameter>default</parameter> <type>anycompatible</type> </optional></optional> ) + <returnvalue>anycompatible</returnvalue> </para> <para> Returns <parameter>value</parameter> evaluated at the row that is <parameter>offset</parameter> rows after the current row within the partition; if there is no such row, instead returns <parameter>default</parameter> - (which must be of the same type as + (which must be of a type compatible with <parameter>value</parameter>). Both <parameter>offset</parameter> and <parameter>default</parameter> are evaluated diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index f48f5fb4d9..6e7832b0f4 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -9729,8 +9729,8 @@ proname => 'lag', prokind => 'w', prorettype => 'anyelement', proargtypes => 'anyelement int4', prosrc => 'window_lag_with_offset' }, { oid => '3108', descr => 'fetch the Nth preceding row value with default', - proname => 'lag', prokind => 'w', prorettype => 'anyelement', - proargtypes => 'anyelement int4 anyelement', + proname => 'lag', prokind => 'w', prorettype => 'anycompatible', + proargtypes => 'anycompatible int4 anycompatible', prosrc => 'window_lag_with_offset_and_default' }, { oid => '3109', descr => 'fetch the following row value', proname => 'lead', prokind => 'w', prorettype => 'anyelement', @@ -9739,8 +9739,8 @@ proname => 'lead', prokind => 'w', prorettype => 'anyelement', proargtypes => 'anyelement int4', prosrc => 'window_lead_with_offset' }, { oid => '3111', descr => 'fetch the Nth following row value with default', - proname => 'lead', prokind => 'w', prorettype => 'anyelement', - proargtypes => 'anyelement int4 anyelement', + proname => 'lead', prokind => 'w', prorettype => 'anycompatible', + proargtypes => 'anycompatible int4 anycompatible', prosrc => 'window_lead_with_offset_and_default' }, { oid => '3112', descr => 'fetch the first row value', proname => 'first_value', prokind => 'w', prorettype => 'anyelement', diff --git a/src/test/regress/expected/window.out b/src/test/regress/expected/window.out index 21c6cac491..19e2ac518a 100644 --- a/src/test/regress/expected/window.out +++ b/src/test/regress/expected/window.out @@ -300,6 +300,21 @@ SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM t 0 | 3 | 3 (10 rows) +SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY four,ten; + lag | ten | four +-----+-----+------ + 0 | 0 | 0 + 0 | 0 | 0 + 4 | 4 | 0 + 0.7 | 1 | 1 + 1 | 1 | 1 + 1 | 7 | 1 + 7 | 9 | 1 + 0.7 | 0 | 2 + 0.7 | 1 | 3 + 0.7 | 3 | 3 +(10 rows) + SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; lead | ten | four ------+-----+------ @@ -345,6 +360,21 @@ SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FRO -1 | 3 | 3 (10 rows) +SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY four,ten; + lead | ten | four +------+-----+------ + 0 | 0 | 0 + 8 | 0 | 0 + -1.4 | 4 | 0 + 2 | 1 | 1 + 14 | 1 | 1 + 18 | 7 | 1 + -1.4 | 9 | 1 + -1.4 | 0 | 2 + 6 | 1 | 3 + -1.4 | 3 | 3 +(10 rows) + SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; first_value | ten | four -------------+-----+------ diff --git a/src/test/regress/sql/window.sql b/src/test/regress/sql/window.sql index 9485aebce8..eae5fa6017 100644 --- a/src/test/regress/sql/window.sql +++ b/src/test/regress/sql/window.sql @@ -63,12 +63,14 @@ SELECT lag(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHER SELECT lag(ten, four) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lag(ten, four, 0) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; +SELECT lag(ten, four, 0.7) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY four,ten; SELECT lead(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lead(ten * 2, 1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; SELECT lead(ten * 2, 1, -1) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; +SELECT lead(ten * 2, 1, -1.4) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10 ORDER BY four,ten; SELECT first_value(ten) OVER (PARTITION BY four ORDER BY ten), ten, four FROM tenk1 WHERE unique2 < 10; diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat index 7cc812adda..c3377ebe46 100644 --- a/src/include/catalog/pg_operator.dat +++ b/src/include/catalog/pg_operator.dat @@ -168,14 +168,14 @@ oprcode => 'textnename', oprrest => 'neqsel', oprjoin => 'neqjoinsel' }, { oid => '349', descr => 'append element onto end of array', - oprname => '||', oprleft => 'anyarray', oprright => 'anyelement', - oprresult => 'anyarray', oprcode => 'array_append' }, + oprname => '||', oprleft => 'anycompatiblearray', oprright => 'anycompatible', + oprresult => 'anycompatiblearray', oprcode => 'array_append' }, { oid => '374', descr => 'prepend element onto front of array', - oprname => '||', oprleft => 'anyelement', oprright => 'anyarray', - oprresult => 'anyarray', oprcode => 'array_prepend' }, + oprname => '||', oprleft => 'anycompatible', oprright => 'anycompatiblearray', + oprresult => 'anycompatiblearray', oprcode => 'array_prepend' }, { oid => '375', descr => 'concatenate', - oprname => '||', oprleft => 'anyarray', oprright => 'anyarray', - oprresult => 'anyarray', oprcode => 'array_cat' }, + oprname => '||', oprleft => 'anycompatiblearray', oprright => 'anycompatiblearray', + oprresult => 'anycompatiblearray', oprcode => 'array_cat' }, { oid => '352', descr => 'equal', oprname => '=', oprcanhash => 't', oprleft => 'xid', oprright => 'xid', diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 6e7832b0f4..b239b5d42d 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -1549,14 +1549,14 @@ proname => 'cardinality', prorettype => 'int4', proargtypes => 'anyarray', prosrc => 'array_cardinality' }, { oid => '378', descr => 'append element onto end of array', - proname => 'array_append', proisstrict => 'f', prorettype => 'anyarray', - proargtypes => 'anyarray anyelement', prosrc => 'array_append' }, + proname => 'array_append', proisstrict => 'f', prorettype => 'anycompatiblearray', + proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_append' }, { oid => '379', descr => 'prepend element onto front of array', - proname => 'array_prepend', proisstrict => 'f', prorettype => 'anyarray', - proargtypes => 'anyelement anyarray', prosrc => 'array_prepend' }, + proname => 'array_prepend', proisstrict => 'f', prorettype => 'anycompatiblearray', + proargtypes => 'anycompatible anycompatiblearray', prosrc => 'array_prepend' }, { oid => '383', - proname => 'array_cat', proisstrict => 'f', prorettype => 'anyarray', - proargtypes => 'anyarray anyarray', prosrc => 'array_cat' }, + proname => 'array_cat', proisstrict => 'f', prorettype => 'anycompatiblearray', + proargtypes => 'anycompatiblearray anycompatiblearray', prosrc => 'array_cat' }, { oid => '394', descr => 'split delimited text', proname => 'string_to_array', proisstrict => 'f', prorettype => '_text', proargtypes => 'text text', prosrc => 'text_to_array' }, @@ -1588,15 +1588,15 @@ proargtypes => 'anyarray anyarray', prosrc => 'array_smaller' }, { oid => '3277', descr => 'returns an offset of value in array', proname => 'array_position', proisstrict => 'f', prorettype => 'int4', - proargtypes => 'anyarray anyelement', prosrc => 'array_position' }, + proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_position' }, { oid => '3278', descr => 'returns an offset of value in array with start index', proname => 'array_position', proisstrict => 'f', prorettype => 'int4', - proargtypes => 'anyarray anyelement int4', prosrc => 'array_position_start' }, + proargtypes => 'anycompatiblearray anycompatible int4', prosrc => 'array_position_start' }, { oid => '3279', descr => 'returns an array of offsets of some value in array', proname => 'array_positions', proisstrict => 'f', prorettype => '_int4', - proargtypes => 'anyarray anyelement', prosrc => 'array_positions' }, + proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_positions' }, { oid => '1191', descr => 'array subscripts generator', proname => 'generate_subscripts', prorows => '1000', proretset => 't', prorettype => 'int4', proargtypes => 'anyarray int4 bool', @@ -1621,11 +1621,11 @@ proargtypes => 'internal', prosrc => 'array_unnest_support' }, { oid => '3167', descr => 'remove any occurrences of an element from an array', - proname => 'array_remove', proisstrict => 'f', prorettype => 'anyarray', - proargtypes => 'anyarray anyelement', prosrc => 'array_remove' }, + proname => 'array_remove', proisstrict => 'f', prorettype => 'anycompatiblearray', + proargtypes => 'anycompatiblearray anycompatible', prosrc => 'array_remove' }, { oid => '3168', descr => 'replace any occurrences of an element in an array', - proname => 'array_replace', proisstrict => 'f', prorettype => 'anyarray', - proargtypes => 'anyarray anyelement anyelement', prosrc => 'array_replace' }, + proname => 'array_replace', proisstrict => 'f', prorettype => 'anycompatiblearray', + proargtypes => 'anycompatiblearray anycompatible anycompatible', prosrc => 'array_replace' }, { oid => '2333', descr => 'aggregate transition function', proname => 'array_agg_transfn', proisstrict => 'f', prorettype => 'internal', proargtypes => 'internal anynonarray', prosrc => 'array_agg_transfn' }, @@ -1651,7 +1651,8 @@ { oid => '3218', descr => 'bucket number of operand given a sorted array of bucket lower bounds', proname => 'width_bucket', prorettype => 'int4', - proargtypes => 'anyelement anyarray', prosrc => 'width_bucket_array' }, + proargtypes => 'anycompatible anycompatiblearray', + prosrc => 'width_bucket_array' }, { oid => '3816', descr => 'array typanalyze', proname => 'array_typanalyze', provolatile => 's', prorettype => 'bool', proargtypes => 'internal', prosrc => 'array_typanalyze' }, diff --git a/src/test/regress/expected/polymorphism.out b/src/test/regress/expected/polymorphism.out index 1ff40764d9..2c3bb0a60b 100644 --- a/src/test/regress/expected/polymorphism.out +++ b/src/test/regress/expected/polymorphism.out @@ -729,10 +729,10 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl; (5 rows) -- another sort of polymorphic aggregate -CREATE AGGREGATE array_cat_accum (anyarray) +CREATE AGGREGATE array_cat_accum (anycompatiblearray) ( sfunc = array_cat, - stype = anyarray, + stype = anycompatiblearray, initcond = '{}' ); SELECT array_cat_accum(i) @@ -786,16 +786,16 @@ create aggregate build_group(int8, integer) ( STYPE = int8[] ); -- check proper resolution of data types for polymorphic transfn/finalfn -create function first_el(anyarray) returns anyelement as +create function first_el(anycompatiblearray) returns anycompatible as 'select $1[1]' language sql strict immutable; create aggregate first_el_agg_f8(float8) ( SFUNC = array_append, STYPE = float8[], FINALFUNC = first_el ); -create aggregate first_el_agg_any(anyelement) ( +create aggregate first_el_agg_any(anycompatible) ( SFUNC = array_append, - STYPE = anyarray, + STYPE = anycompatiblearray, FINALFUNC = first_el ); select first_el_agg_f8(x::float8) from generate_series(1,10) x; diff --git a/src/test/regress/sql/polymorphism.sql b/src/test/regress/sql/polymorphism.sql index e5222f1f81..70a21c8978 100644 --- a/src/test/regress/sql/polymorphism.sql +++ b/src/test/regress/sql/polymorphism.sql @@ -498,10 +498,10 @@ select q2, sql_if(q2 > 0, q2, q2 + 1) from int8_tbl; -- another sort of polymorphic aggregate -CREATE AGGREGATE array_cat_accum (anyarray) +CREATE AGGREGATE array_cat_accum (anycompatiblearray) ( sfunc = array_cat, - stype = anyarray, + stype = anycompatiblearray, initcond = '{}' ); @@ -549,7 +549,7 @@ create aggregate build_group(int8, integer) ( -- check proper resolution of data types for polymorphic transfn/finalfn -create function first_el(anyarray) returns anyelement as +create function first_el(anycompatiblearray) returns anycompatible as 'select $1[1]' language sql strict immutable; create aggregate first_el_agg_f8(float8) ( @@ -558,9 +558,9 @@ create aggregate first_el_agg_f8(float8) ( FINALFUNC = first_el ); -create aggregate first_el_agg_any(anyelement) ( +create aggregate first_el_agg_any(anycompatible) ( SFUNC = array_append, - STYPE = anyarray, + STYPE = anycompatiblearray, FINALFUNC = first_el );
út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I see few possibilities how to finish and close this issue:
> 1. use anycompatible type and add note to documentation so expression of
> optional argument can change a result type, and so this is Postgres
> specific - other databases and ANSI SQL disallow this.
> It is the most simple solution, and it is good enough for this case, that
> is not extra important.
> 2. choose the correct type somewhere inside the parser - for these two
> functions.
> 3. introduce and implement some generic solution for this case - it can be
> implemented on C level via some function helper or on syntax level
> CREATE OR REPLACE FUNCTION lag(a anyelement, offset int, default defval
> a%type) ...
> "defval argname%type" means for caller's expression "CAST(defval AS
> typeof(argname))"
I continue to feel that the spec's definition of this is not so
obviously right that we should jump through hoops to duplicate it.
In fact, I don't even agree that we need a disclaimer in the docs
saying that it's not quite the same. Only the most nitpicky
language lawyers would ever even notice.
In hopes of moving this along a bit, I experimented with converting
the other functions I listed to use anycompatible. I soon found that
touching any functions/operators that are listed in operator classes
would open a can of worms far larger than I'd previously supposed.
To maintain consistency, we'd have to propagate the datatype changes
to *every* function/operator associated with those opfamilies --- many
of which only have one any-foo input and thus aren't on my original
list. (An example here is that extending btree array_ops will require
changing the max(anyarray) and min(anyarray) aggregates too.) We'd
then end up with a situation that would be rather confusing from a
user's standpoint, in that it'd be quite un-obvious why some array
functions use anyarray while other ones use anycompatiblearray.
So I backed off to just changing the functions/operators that have
no opclass connections, such as array_cat. Even that has some
downsides --- for example, in the attached patch, it's necessary
to change some polymorphism.sql examples that explicitly reference
array_cat(anyarray). I wonder whether this change would break a
significant number of user-defined aggregates or operators.
(Note that I think we'd have to resist the temptation to fix that
by letting CREATE AGGREGATE et al accept support functions that
take anyarray/anycompatiblearray (etc) interchangeably. A lot of
the security analysis that went into CVE-2020-14350 depended on
the assumption that utility commands only do exact lookups of
support functions. If we tried to be lax about this, we'd
re-introduce the possibility for hostile capture of function
references in extension scripts.)
Another interesting issue, not seen in the attached but which
came up while I was experimenting with the more aggressive patch,
was this failure in the polymorphism test:
select max(histogram_bounds) from pg_stats where tablename = 'pg_am';
-ERROR: cannot compare arrays of different element types
+ERROR: function max(anyarray) does not exist
That's because we don't accept pg_stats.histogram_bounds (of
declared type anyarray) as a valid input for anycompatiblearray.
I wonder if that isn't a bug we need to fix in the anycompatible
patch, independently of anything else. We'd not be able to deduce
an actual element type from such an input, but we already cannot
do so when we match it to an anyarray parameter.
Anyway, attached find
0001 - updates Vik's original patch to HEAD and tweaks the
grammar in the docs a bit.
0002 - add-on patch to convert array_append, array_prepend,
array_cat, array_position, array_positions, array_remove,
array_replace, and width_bucket to use anycompatiblearray.
I think 0001 is committable, but 0002 is just WIP since
I didn't touch the docs. I'm slightly discouraged about
whether 0002 is worth proceeding with. Any thoughts?
I think so 0002 has sense - more than doc I miss related regress tests, but it is partially covered by anycompatible tests
Anyway I tested both patches and there is not problem with compilation, building doc, and make check-world passed
I'll mark this patch as ready for committer
Best regards
Pavel
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: >> Anyway, attached find >> 0001 - updates Vik's original patch to HEAD and tweaks the >> grammar in the docs a bit. >> 0002 - add-on patch to convert array_append, array_prepend, >> array_cat, array_position, array_positions, array_remove, >> array_replace, and width_bucket to use anycompatiblearray. >> I think 0001 is committable, but 0002 is just WIP since >> I didn't touch the docs. I'm slightly discouraged about >> whether 0002 is worth proceeding with. Any thoughts? > I think so 0002 has sense - more than doc I miss related regress tests, but > it is partially covered by anycompatible tests I didn't see any need for particularly exhaustive testing, but I did add one new test for an operator and one for a function. Pushed with that and the necessary docs work. regards, tom lane
st 4. 11. 2020 v 22:12 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> út 22. 9. 2020 v 2:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> Anyway, attached find
>> 0001 - updates Vik's original patch to HEAD and tweaks the
>> grammar in the docs a bit.
>> 0002 - add-on patch to convert array_append, array_prepend,
>> array_cat, array_position, array_positions, array_remove,
>> array_replace, and width_bucket to use anycompatiblearray.
>> I think 0001 is committable, but 0002 is just WIP since
>> I didn't touch the docs. I'm slightly discouraged about
>> whether 0002 is worth proceeding with. Any thoughts?
> I think so 0002 has sense - more than doc I miss related regress tests, but
> it is partially covered by anycompatible tests
I didn't see any need for particularly exhaustive testing, but
I did add one new test for an operator and one for a function.
Pushed with that and the necessary docs work.
ok, Thank you
Pavel
regards, tom lane