Thread: Implicit casts with generic arrays
I've looked into cutting back on the implicit casts to text, which exposed the following little gem. The expressions 'abc' || 34 34 || 'abc' would no longer work, with the following error message: ERROR: 22P02: array value must start with "{" or dimension information That's because the best matches are now respectively anyarray || anyelement anyelement || anyarray Now either this is just too bad and users of a system with reduced casts to text will have to live with this odd error message, or coercing any old unknown constant to anyarray isn't such a good idea. Comments? -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > I've looked into cutting back on the implicit casts to text, which > exposed the following little gem. > The expressions > 'abc' || 34 > 34 || 'abc' > would no longer work, with the following error message: > ERROR: 22P02: array value must start with "{" or dimension information Hm, that's annoying. Not that the expressions fail --- we want them to --- but that the error message is so unhelpful. Since ANYARRAY is already special to the type system, I don't have a problem with inserting some special case to prevent this, but I'm not sure what the special case should be. Is it too klugy to say "don't implicitly cast unknown to anyarray unless the literal's value starts with { or ["? We've never made casting decisions depend on the contents of strings before, and I'm really loath to make 'em do so now. Seems basically we'd want to not cast unknown to anyarray unless there is some additional bit of context suggesting that that's the right thing. But what should that extra requirement be? Can we go as far as not doing this cast implicitly at all? regards, tom lane
Am Dienstag, 27. Februar 2007 19:50 schrieb Tom Lane: > Seems basically we'd want to not cast unknown to anyarray unless there > is some additional bit of context suggesting that that's the right thing. > But what should that extra requirement be? Can we go as far as not > doing this cast implicitly at all? We could say that unknown is not taken as anyarray input if the entire function/operator argument list consists of anyelement or anyarray. But that might be even harder to comprehend. With the ARRAY[...] syntax available, converting unknown to anyarray might be altogether unnecessary. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Awhile back, I wrote: > Peter Eisentraut <peter_e@gmx.net> writes: >> I've looked into cutting back on the implicit casts to text, which >> exposed the following little gem. >> The expressions >> 'abc' || 34 >> 34 || 'abc' >> would no longer work, with the following error message: >> ERROR: 22P02: array value must start with "{" or dimension information > Hm, that's annoying. Not that the expressions fail --- we want them to > --- but that the error message is so unhelpful. I've looked into this more closely. The problem basically is that the parser sees these alternatives for binary || operators: select oid,oid::regoperator,oprcode from pg_operator where oprname = '||';oid | oid | oprcode ------+-------------------------+--------------- 349 | ||(anyarray,anyelement) | array_append 374 | ||(anyelement,anyarray)| array_prepend 375 | ||(anyarray,anyarray) | array_cat 654 | ||(text,text) | textcat1797| ||(bit,bit) | bitcat2018 | ||(bytea,bytea) | byteacat (6 rows) If there is no implicit cast from int to text, then operator 349 is the *only* candidate that is not immediately eliminated by the lack of any way to cast an integer 34 to its right argument type. So as far as the parser is concerned there is no ambiguity. If we hack things to prevent matching unknown to anyarray, as was suggested in the previous discussion, we'll get "operator does not exist: "unknown" || integer". Which is better than the 22P02 error, but still not great. It furthermore seems that the two operators anyarray || anyelement and anyelement || anyarray are really the only cases where an undesirable match to anyarray might occur. The other operators that take anyarray take it on both sides, which means that they'd not be preferred unless the other operand was discernibly an array. I don't think we want a solution that causes "knownarraycolumn = '{1,2,3}'" to start failing. That argument is even more compelling on the function side, because for instance there isn't a lot of doubt about the user's intent if he writes "array_append('{1,2,3}', 34)". So after reflecting on all that, it doesn't seem like a good idea to hack the type-coercion code to discriminate against matching unknown to anyarray. It looks to me like we have a very narrow problem and we should tailor a very narrow solution. What I am currently thinking we should do is make oper() specifically test for the case of operator 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, and throw a custom error message hinting that the other operand needs to be cast to text. In the long run maybe we should choose some other name for the array_append and array_prepend operators to avoid the confusion with concatenation. It seems to me that "concatenation" normally implies "stringing together similar objects", which these two operators definitely don't do, and so you could argue that || was a bad name for them from the get-go. But compatibility worries would mean we couldn't eliminate the old names for quite a long time, so maybe it's too late for that. Comments? regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > So after reflecting on all that, it doesn't seem like a good idea to > hack the type-coercion code to discriminate against matching unknown > to anyarray. It looks to me like we have a very narrow problem and > we should tailor a very narrow solution. What I am currently thinking > we should do is make oper() specifically test for the case of operator > 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, > and throw a custom error message hinting that the other operand > needs to be cast to text. Wouldn't that mean that 'foo'||'bar' would *still* fail? It really seems to me that at some point down the line we're going to cave and admit that users do expect 'foo' to be a string first and cast to other types only if the context requires it. That would mean we should be considering matching "unknown" as text first without casting and only if that fails looking for other types. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
I wrote: > It looks to me like we have a very narrow problem and > we should tailor a very narrow solution. What I am currently thinking > we should do is make oper() specifically test for the case of operator > 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, > and throw a custom error message hinting that the other operand > needs to be cast to text. I've been experimenting with another solution, which is to not add any weird error cases but instead add operators that will capture the problem cases back away from the anyelement||anyarray operators. My current prototype is create function catany(text, anyelement) returns text as $$ select $1 || $2::text $$ language sql; create function catany(anyelement, text) returns text as $$ select $1::text || $2 $$ language sql; create operator || (procedure = catany, leftarg = text, rightarg = anyelement); create operator || (procedure = catany, leftarg = anyelement, rightarg = text); which seems to mostly do the "right" thing. This approach would have one nice property, namely eliminating the single biggest point of push-back we are likely to get from removing the implicit casts to text. I have no doubt that practically the only reasonable use-case for that behavior was to let people concatenate stuff without being too picky about casts, and this mostly preserves that ability. It's not perfect, because it only fixes cases in which at least one operand is either unknown or implicitly coercible to text. But in practice I think that would cover 99% of cases, since typical usages tend to alternate literals and data values. Thoughts? Is this too klugy for words? regards, tom lane
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> we should do is make oper() specifically test for the case of operator >> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input, >> and throw a custom error message hinting that the other operand >> needs to be cast to text. > Wouldn't that mean that 'foo'||'bar' would *still* fail? No, because that would preferentially match to text || text, it being a preferred-type case. The current behavior with the implicit casts removed is template1=# select 'abc' || '34';?column? ----------abc34 (1 row) ie, this was matched to the text || text operator; template1=# select 'abc' || 34; ERROR: array value must start with "{" or dimension information ie, this was matched to the anyarray || anyelement operator --- because it clearly can't match text || text. > It really seems to me that at some point down the line we're going to > cave and admit that users do expect 'foo' to be a string first and > cast to other types only if the context requires it. We already do that to some extent, as shown above; and it's got approximately nothing to do with this problem anyway. The cases where we have got a problem are where the other argument is clearly *not* text. But having said that, I'm currently leaning to the other solution of generalizing the || operator (and only that operator) instead of fooling with the type resolution rules. regards, tom lane
I wrote: > I've been experimenting with another solution, which is to not add any > weird error cases but instead add operators that will capture the > problem cases back away from the anyelement||anyarray operators. > My current prototype is > create function catany(text, anyelement) returns text as > $$ select $1 || $2::text $$ language sql; > create function catany(anyelement, text) returns text as > $$ select $1::text || $2 $$ language sql; > create operator || (procedure = catany, leftarg = text, rightarg = anyelement); > create operator || (procedure = catany, leftarg = anyelement, rightarg = text); > which seems to mostly do the "right" thing. This approach would have > one nice property, namely eliminating the single biggest point of > push-back we are likely to get from removing the implicit casts to text. I've been testing this approach more, and finding that it really captures a bit too much: some cases that you'd prefer went to anyelement||anyarray will be captured by the text||anyelement operator. For example in 8.2 this is mapped to array_prepend: regression=# select 'x'::text || array['aa','bb','cc']; ?column? --------------{x,aa,bb,cc} (1 row) but with the experimental code you get textcat: catany=# select 'x'::text || array['aa','bb','cc']; ?column? -------------x{aa,bb,cc} (1 row) Basically the textcat operators will capture any case where the scalar side is implicitly coercible to text, because the type resolution rules will prefer that. There are some hacks we could make to make this less probable (eg, declare the capturing operators as taking varchar instead of text) but I can't find any complete solution short of changing the resolution rules themselves. Which I'm loath to do since it might have unexpected side-effects. What I would like to propose is that we deprecate use of || as the operator name for array_prepend and array_append, and invent new recommended names for them. As I said earlier, these operators aren't exactly concatenation in any normal sense anyway, since they don't treat their operands symmetrically. My first thought is to suggest using the shifting symbols: anyelement >> anyarray anyarray << anyelement but perhaps someone will have a better suggestion. If we do that, then we have a solution for anyone whose array prepend or append operator is unexpectedly captured by text concatenation: use the new names instead. Now this is only going to seem like a good idea if you agree that we should have some capturing operators like these. But if we don't, I think we are going to get a lot of push-back from people whose concatenations of random datatypes suddenly stopped working. Essentially this proposal is putting the compatibility hit of tightening the implicit cast rules onto people who are using array append/prepend instead of people who are using concatenation without explicit casts. I think there are a lot fewer of the former than the latter. Comments? regards, tom lane
Tom Lane wrote: > >>> The expressions >>> 'abc' || 34 >>> 34 || 'abc' >>> would no longer work, with the following error message: >>> ERROR: 22P02: array value must start with "{" or dimension information > >> Hm, that's annoying. Not that the expressions fail --- we want them to >> --- but that the error message is so unhelpful. indeed > In the long run maybe we should choose some other name for the > array_append and array_prepend operators to avoid the confusion with > concatenation. It seems to me that "concatenation" normally implies > "stringing together similar objects", which these two operators > definitely don't do, and so you could argue that || was a bad name > for them from the get-go. But compatibility worries would mean we > couldn't eliminate the old names for quite a long time, so maybe > it's too late for that. > > Comments? Originally I saw this situation as as requiring the concatenation operator per SQL 2003: <array value expression> ::= <array concatenation> | <array primary> <array concatenation> ::= <array value expression 1> <concatenation operator> <array primary> <concatenation operator> ::= || <array value expression 1> ::= <array value expression> <array primary> ::= <value expression primary> <value expression primary> ::= <parenthesized value expression> | <nonparenthesized value expression primary> <parenthesized value expression> ::= <left paren> <value expression> <right paren> <value expression> ::= <common value expression> | <boolean value expression> | <row value expression> <common value expression> ::= <numeric value expression> | <string value expression> | <datetime value expression> | <interval value expression> | <user-defined type value expression> | <reference value expression> | <collection valueexpression> <nonparenthesized value expression primary> ::= <unsigned value specification> | <column reference> | <set functionspecification> | <window function> | <scalar subquery> | <case expression> | <cast specification> | <field reference> | <subtype treatment> | <method invocation> | <static method invocation> | <new specification> | <attributeor method reference> | <reference resolution> | <collection value constructor> | <array element reference> | <multiset element reference> | <routine invocation> | <next value expression> <collection value constructor> ::= <array value constructor> | <multiset value constructor> <unsigned value specification> ::= <unsigned literal> | <general value specification> <unsigned literal> ::= <unsigned numeric literal> | <general literal> <general literal> ::= <character string literal> | <national character string literal> | <Unicode character string literal> | <binary string literal> | <datetime literal> | <interval literal> | <boolean literal> What I can't decide now is whether all the above means the anyelement in this operation ought to be in parens or not. It seems to me that the anyelement can be any literal _except_ a <signed numeric literal>. In that case the spec seems to require parenthesis. Joe
> For example in 8.2 this is mapped to array_prepend: > > regression=# select 'x'::text || array['aa','bb','cc']; > ?column? > -------------- > {x,aa,bb,cc} > (1 row) > > but with the experimental code you get textcat: > > catany=# select 'x'::text || array['aa','bb','cc']; > ?column? > ------------- > x{aa,bb,cc} > (1 row) This is what I would have expected || to give, and not what 8.2 does. So disregarding the rest of the argument I think that array_[pre|ap]pend should have other operators. Andreas
> > > For example in 8.2 this is mapped to array_prepend: > > > > regression=# select 'x'::text || array['aa','bb','cc']; > > ?column? > > -------------- > > {x,aa,bb,cc} > > (1 row) > > > > but with the experimental code you get textcat: > > > > catany=# select 'x'::text || array['aa','bb','cc']; > > ?column? > > ------------- > > x{aa,bb,cc} > > (1 row) > > This is what I would have expected || to give, and not what 8.2 does. > So disregarding the rest of the argument I think that array_[pre|ap]pend > should have other operators. > > Andreas > I thing so current behave is more intuitive and practical. Result x{aa,bb,cc} is nonsens. Array concation have to have higher priority than text concation. Pavel
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> In the long run maybe we should choose some other name for the >> array_append and array_prepend operators to avoid the confusion with >> concatenation. It seems to me that "concatenation" normally implies >> "stringing together similar objects", which these two operators >> definitely don't do, and so you could argue that || was a bad name >> for them from the get-go. > Originally I saw this situation as as requiring the concatenation > operator per SQL 2003: Maybe I am missing something, but the only such construct I see in SQL2003 is concatenation of arrays of equal rank. There is nothing corresponding to array_prepend or array_append. I do have a plan B if people don't want to rename the operators, though. It looks to me like we could eliminate the conflict if we invented a new polymorphic pseudotype called "anynonarray" or some such, which would act like anyelement *except* it would not match an array. Then, declaring the capturing operators as text||anynonarray and anynonarray||text would prevent them from matching any case where either side was known to be an array type. But they would (I think) still win out in cases such as scalar || 'unknown literal'. The end result would be that concatenations involving a known-array value would be array concatenation, but you could force them to be text concatenation, if that's what you wanted, by explicitly casting the array value(s) to text. I was a bit hesitant to propose this since I couldn't immediately think of any other use-case for such a pseudotype. It's not a huge amount of added code (cf. anyenum) but it's definitely a visible wart on the type system. Comments? regards, tom lane
Tom Lane wrote: > I do have a plan B if people don't want to rename the operators, though. > It looks to me like we could eliminate the conflict if we invented a new > polymorphic pseudotype called "anynonarray" or some such, which would > act like anyelement *except* it would not match an array. Then, > declaring the capturing operators as text||anynonarray and > anynonarray||text would prevent them from matching any case where either > side was known to be an array type. But they would (I think) still win > out in cases such as scalar || 'unknown literal'. The end result would > be that concatenations involving a known-array value would be array > concatenation, but you could force them to be text concatenation, if > that's what you wanted, by explicitly casting the array value(s) to text. > > I was a bit hesitant to propose this since I couldn't immediately think > of any other use-case for such a pseudotype. It's not a huge amount of > added code (cf. anyenum) but it's definitely a visible wart on the type > system. Comments? On the contrary, I would think that it fits nicely to "close the loop" on the anyarray/anyelement feature set. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> I do have a plan B if people don't want to rename the operators, though. >> It looks to me like we could eliminate the conflict if we invented a new >> polymorphic pseudotype called "anynonarray" or some such, which would >> act like anyelement *except* it would not match an array. >> ... >> I was a bit hesitant to propose this since I couldn't immediately think >> of any other use-case for such a pseudotype. It's not a huge amount of >> added code (cf. anyenum) but it's definitely a visible wart on the type >> system. Comments? > On the contrary, I would think that it fits nicely to "close the loop" > on the anyarray/anyelement feature set. OK, I'll go code this up and verify that it behaves like I think it will... regards, tom lane
Tom Lane wrote: > Joe Conway <mail@joeconway.com> writes: >> Tom Lane wrote: >>> In the long run maybe we should choose some other name for the >>> array_append and array_prepend operators to avoid the confusion with >>> concatenation. It seems to me that "concatenation" normally implies >>> "stringing together similar objects", which these two operators >>> definitely don't do, and so you could argue that || was a bad name >>> for them from the get-go. > >> Originally I saw this situation as as requiring the concatenation >> operator per SQL 2003: > > Maybe I am missing something, but the only such construct I see in > SQL2003 is concatenation of arrays of equal rank. There is nothing > corresponding to array_prepend or array_append. Well, I've never claimed to be particularly good at interpreting the SQL spec, but as an example... <array concatenation> ::= <array value expression 1> || <array primary> <array primary> ::= <value expression primary> ::= <nonparenthesized value expression primary> ::= <unsigned value specification> ::= <unsigned literal> ::= <unsigned numeric literal> Doesn't this mean that array concatenation should include things like: <array value expression> || <unsigned numeric literal> e.g. ARRAY[1,2,3] || 42 ? > I do have a plan B if people don't want to rename the operators, though. > It looks to me like we could eliminate the conflict if we invented a new > polymorphic pseudotype called "anynonarray" or some such, which would > act like anyelement *except* it would not match an array. Then, > declaring the capturing operators as text||anynonarray and > anynonarray||text would prevent them from matching any case where either > side was known to be an array type. But they would (I think) still win > out in cases such as scalar || 'unknown literal'. The end result would > be that concatenations involving a known-array value would be array > concatenation, but you could force them to be text concatenation, if > that's what you wanted, by explicitly casting the array value(s) to text. That sounds reasonable to me. Joe
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> Maybe I am missing something, but the only such construct I see in >> SQL2003 is concatenation of arrays of equal rank. There is nothing >> corresponding to array_prepend or array_append. > Well, I've never claimed to be particularly good at interpreting the SQL > spec, but as an example... > <array concatenation> ::= > <array value expression 1> || <array primary> > <array primary> ::= > <value expression primary> ::= > <nonparenthesized value expression primary> ::= > <unsigned value specification> ::= > <unsigned literal> ::= > <unsigned numeric literal> > Doesn't this mean that array concatenation should include things like: > <array value expression> || <unsigned numeric literal> No, because the first syntax rule for that is 1) The declared type of <value expression primary> shall be an array type. However, assuming that the anynonarray idea works out, we can do that and not worry about touching the array operators. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> It looks to me like we could eliminate the conflict if we invented a new >> polymorphic pseudotype called "anynonarray" or some such, which would >> act like anyelement *except* it would not match an array. > ... > On the contrary, I would think that it fits nicely to "close the loop" > on the anyarray/anyelement feature set. OK, I hacked this together and it seems to behave at least as reasonably as 8.2 does. "8.3" here means HEAD + anynonarray + capturing concat operators. I used integer as an example of a type for which 8.2 has an implicit cast to text, and point as an example of a type for which it doesn't: Expression 8.3 8.2 text || text text concat text concat text || 'unknown' text concat text concat text || text[] array concat array concat text || non-text array error error text || non-text scalar text concat text concat [1] integer || integer error text concat integer || 'unknown' text concat text concat integer || integer[] array concat array concat integer || non-integer array error error integer || non-integer scalar error text concat [1] point || point error error point || 'unknown' text concat 'array value must start ...' point || point[] array concat array concat point || non-point array error error point || non-point scalar error error text[] || text[] array concat array concat text[] || 'unknown' error error text[] || non-text array error error text[] || non-text scalar error error [1] for types for which 8.2 has an implicit cast to text, else it fails. These are:bigintsmallintintegeroidrealdouble precisionnumericdatetime without time zonetime with time zonetimestamp withouttime zonetimestamp with time zoneinterval (I was interested to find that there were cases where 8.2 would come out with the dreaded "array value must start with "{" or dimension information" error.) I think that the above chart is pretty defensible; the only cases that fail now where they worked before are concatenations where neither side is either text or an unadorned string literal. Frankly, I think this: catany=# select 3 || 0.4; ERROR: operator does not exist: integer || numeric is way preferable to this: regression=# select 3 || 0.4;?column? ----------30.4 (1 row) which is what 8.2 does --- if you want text concatenation you should make at least *some* effort to signal that, like casting one side to text or at least quoting it. Run-together concatenations like catany=# select 'sin(' || 2 || ')';?column? ----------sin(2) (1 row) will work as long as at least one of the first two concatenated items is textual or an unadorned string literal. Barring objections I'll clean this up and commit it. regards, tom lane