Thread: Implicit casts with generic arrays

Implicit casts with generic arrays

From

Peter Eisentraut

Date:

27 February 2007, 14:03:07

I've looked into cutting back on the implicit casts to text, which 
exposed the following little gem.

The expressions

'abc' || 34
34 || 'abc'

would no longer work, with the following error message:

ERROR:  22P02: array value must start with "{" or dimension information

That's because the best matches are now respectively

anyarray || anyelement
anyelement || anyarray

Now either this is just too bad and users of a system with reduced casts 
to text will have to live with this odd error message, or coercing any 
old unknown constant to anyarray isn't such a good idea.

Comments?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

27 February 2007, 14:50:57

Peter Eisentraut <peter_e@gmx.net> writes:
> I've looked into cutting back on the implicit casts to text, which 
> exposed the following little gem.

> The expressions
> 'abc' || 34
> 34 || 'abc'
> would no longer work, with the following error message:
> ERROR:  22P02: array value must start with "{" or dimension information

Hm, that's annoying.  Not that the expressions fail --- we want them to
--- but that the error message is so unhelpful.

Since ANYARRAY is already special to the type system, I don't have a
problem with inserting some special case to prevent this, but I'm not
sure what the special case should be.  Is it too klugy to say "don't
implicitly cast unknown to anyarray unless the literal's value starts
with { or ["?  We've never made casting decisions depend on the contents
of strings before, and I'm really loath to make 'em do so now.

Seems basically we'd want to not cast unknown to anyarray unless there
is some additional bit of context suggesting that that's the right thing.
But what should that extra requirement be?  Can we go as far as not
doing this cast implicitly at all?
        regards, tom lane

Re: Implicit casts with generic arrays

From

Peter Eisentraut

Date:

28 February 2007, 08:40:25

Am Dienstag, 27. Februar 2007 19:50 schrieb Tom Lane:
> Seems basically we'd want to not cast unknown to anyarray unless there
> is some additional bit of context suggesting that that's the right thing.
> But what should that extra requirement be?  Can we go as far as not
> doing this cast implicitly at all?

We could say that unknown is not taken as anyarray input if the entire 
function/operator argument list consists of anyelement or anyarray.  But that 
might be even harder to comprehend.  With the ARRAY[...] syntax available, 
converting unknown to anyarray might be altogether unnecessary.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

04 June 2007, 17:30:44

Awhile back, I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> I've looked into cutting back on the implicit casts to text, which 
>> exposed the following little gem.

>> The expressions
>> 'abc' || 34
>> 34 || 'abc'
>> would no longer work, with the following error message:
>> ERROR:  22P02: array value must start with "{" or dimension information

> Hm, that's annoying.  Not that the expressions fail --- we want them to
> --- but that the error message is so unhelpful.

I've looked into this more closely.  The problem basically is that the
parser sees these alternatives for binary || operators:

select oid,oid::regoperator,oprcode from pg_operator where oprname = '||';oid  |           oid           |    oprcode
------+-------------------------+--------------- 349 | ||(anyarray,anyelement) | array_append 374 |
||(anyelement,anyarray)| array_prepend 375 | ||(anyarray,anyarray)   | array_cat 654 | ||(text,text)           |
textcat1797| ||(bit,bit)             | bitcat2018 | ||(bytea,bytea)         | byteacat

(6 rows)

If there is no implicit cast from int to text, then operator 349 is the
*only* candidate that is not immediately eliminated by the lack of any
way to cast an integer 34 to its right argument type.  So as far as the
parser is concerned there is no ambiguity.  If we hack things to prevent
matching unknown to anyarray, as was suggested in the previous
discussion, we'll get "operator does not exist: "unknown" || integer".
Which is better than the 22P02 error, but still not great.

It furthermore seems that the two operators anyarray || anyelement and
anyelement || anyarray are really the only cases where an undesirable
match to anyarray might occur.  The other operators that take anyarray
take it on both sides, which means that they'd not be preferred unless
the other operand was discernibly an array.  I don't think we want a
solution that causes "knownarraycolumn = '{1,2,3}'" to start failing.

That argument is even more compelling on the function side, because
for instance there isn't a lot of doubt about the user's intent if he
writes "array_append('{1,2,3}', 34)".

So after reflecting on all that, it doesn't seem like a good idea to
hack the type-coercion code to discriminate against matching unknown
to anyarray.  It looks to me like we have a very narrow problem and
we should tailor a very narrow solution.  What I am currently thinking
we should do is make oper() specifically test for the case of operator
349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
and throw a custom error message hinting that the other operand
needs to be cast to text.

In the long run maybe we should choose some other name for the
array_append and array_prepend operators to avoid the confusion with
concatenation.  It seems to me that "concatenation" normally implies
"stringing together similar objects", which these two operators
definitely don't do, and so you could argue that || was a bad name
for them from the get-go.  But compatibility worries would mean we
couldn't eliminate the old names for quite a long time, so maybe
it's too late for that.

Comments?
        regards, tom lane

Re: Implicit casts with generic arrays

From

Gregory Stark

Date:

04 June 2007, 17:59:24

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> So after reflecting on all that, it doesn't seem like a good idea to
> hack the type-coercion code to discriminate against matching unknown
> to anyarray.  It looks to me like we have a very narrow problem and
> we should tailor a very narrow solution.  What I am currently thinking
> we should do is make oper() specifically test for the case of operator
> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
> and throw a custom error message hinting that the other operand
> needs to be cast to text.

Wouldn't that mean that 'foo'||'bar' would *still* fail?

It really seems to me that at some point down the line we're going to cave and
admit that users do expect 'foo' to be a string first and cast to other types
only if the context requires it. That would mean we should be considering
matching "unknown" as text first without casting and only if that fails
looking for other types.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

04 June 2007, 20:06:51

I wrote:
> It looks to me like we have a very narrow problem and
> we should tailor a very narrow solution.  What I am currently thinking
> we should do is make oper() specifically test for the case of operator
> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
> and throw a custom error message hinting that the other operand
> needs to be cast to text.

I've been experimenting with another solution, which is to not add any
weird error cases but instead add operators that will capture the
problem cases back away from the anyelement||anyarray operators.
My current prototype is

create function catany(text, anyelement) returns text as  $$ select $1 || $2::text $$ language sql;
create function catany(anyelement, text) returns text as  $$ select $1::text || $2 $$ language sql;

create operator || (procedure = catany, leftarg = text, rightarg = anyelement);
create operator || (procedure = catany, leftarg = anyelement, rightarg = text);

which seems to mostly do the "right" thing.  This approach would have
one nice property, namely eliminating the single biggest point of
push-back we are likely to get from removing the implicit casts to text.
I have no doubt that practically the only reasonable use-case for that
behavior was to let people concatenate stuff without being too picky
about casts, and this mostly preserves that ability.  It's not perfect,
because it only fixes cases in which at least one operand is either
unknown or implicitly coercible to text.  But in practice I think that
would cover 99% of cases, since typical usages tend to alternate
literals and data values.

Thoughts?  Is this too klugy for words?
        regards, tom lane

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

04 June 2007, 22:25:51

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> we should do is make oper() specifically test for the case of operator
>> 349 with UNKNOWN left input, or operator 374 with UNKNOWN right input,
>> and throw a custom error message hinting that the other operand
>> needs to be cast to text.

> Wouldn't that mean that 'foo'||'bar' would *still* fail?

No, because that would preferentially match to text || text, it being
a preferred-type case.  The current behavior with the implicit casts
removed is

template1=# select 'abc' || '34';?column? 
----------abc34
(1 row)

ie, this was matched to the text || text operator;

template1=# select 'abc' || 34;
ERROR:  array value must start with "{" or dimension information

ie, this was matched to the anyarray || anyelement operator --- because
it clearly can't match text || text.

> It really seems to me that at some point down the line we're going to
> cave and admit that users do expect 'foo' to be a string first and
> cast to other types only if the context requires it.

We already do that to some extent, as shown above; and it's got
approximately nothing to do with this problem anyway.  The cases where
we have got a problem are where the other argument is clearly *not*
text.

But having said that, I'm currently leaning to the other solution of
generalizing the || operator (and only that operator) instead of
fooling with the type resolution rules.
        regards, tom lane

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

05 June 2007, 14:02:39

I wrote:
> I've been experimenting with another solution, which is to not add any
> weird error cases but instead add operators that will capture the
> problem cases back away from the anyelement||anyarray operators.
> My current prototype is

> create function catany(text, anyelement) returns text as 
>   $$ select $1 || $2::text $$ language sql;
> create function catany(anyelement, text) returns text as 
>   $$ select $1::text || $2 $$ language sql;

> create operator || (procedure = catany, leftarg = text, rightarg = anyelement);
> create operator || (procedure = catany, leftarg = anyelement, rightarg = text);

> which seems to mostly do the "right" thing.  This approach would have
> one nice property, namely eliminating the single biggest point of
> push-back we are likely to get from removing the implicit casts to text.

I've been testing this approach more, and finding that it really
captures a bit too much: some cases that you'd prefer went to
anyelement||anyarray will be captured by the text||anyelement operator.
For example in 8.2 this is mapped to array_prepend:

regression=# select 'x'::text || array['aa','bb','cc'];  ?column?
--------------{x,aa,bb,cc}
(1 row)

but with the experimental code you get textcat:

catany=# select 'x'::text || array['aa','bb','cc']; ?column?
-------------x{aa,bb,cc}
(1 row)

Basically the textcat operators will capture any case where the scalar
side is implicitly coercible to text, because the type resolution rules
will prefer that.  There are some hacks we could make to make this less
probable (eg, declare the capturing operators as taking varchar instead
of text) but I can't find any complete solution short of changing the
resolution rules themselves.  Which I'm loath to do since it might have
unexpected side-effects.

What I would like to propose is that we deprecate use of || as the
operator name for array_prepend and array_append, and invent new
recommended names for them.  As I said earlier, these operators
aren't exactly concatenation in any normal sense anyway, since they
don't treat their operands symmetrically.  My first thought is to
suggest using the shifting symbols:    anyelement >> anyarray    anyarray << anyelement
but perhaps someone will have a better suggestion.  If we do that, then
we have a solution for anyone whose array prepend or append operator is
unexpectedly captured by text concatenation: use the new names instead.

Now this is only going to seem like a good idea if you agree that we
should have some capturing operators like these.  But if we don't,
I think we are going to get a lot of push-back from people whose
concatenations of random datatypes suddenly stopped working.
Essentially this proposal is putting the compatibility hit of tightening
the implicit cast rules onto people who are using array append/prepend
instead of people who are using concatenation without explicit casts.
I think there are a lot fewer of the former than the latter.

Comments?
        regards, tom lane

Re: Implicit casts with generic arrays

From

Joe Conway

Date:

06 June 2007, 00:03:14

Tom Lane wrote:
> 
>>> The expressions
>>> 'abc' || 34
>>> 34 || 'abc'
>>> would no longer work, with the following error message:
>>> ERROR:  22P02: array value must start with "{" or dimension information
> 
>> Hm, that's annoying.  Not that the expressions fail --- we want them to
>> --- but that the error message is so unhelpful.

indeed

> In the long run maybe we should choose some other name for the
> array_append and array_prepend operators to avoid the confusion with
> concatenation.  It seems to me that "concatenation" normally implies
> "stringing together similar objects", which these two operators
> definitely don't do, and so you could argue that || was a bad name
> for them from the get-go.  But compatibility worries would mean we
> couldn't eliminate the old names for quite a long time, so maybe
> it's too late for that.
> 
> Comments?

Originally I saw this situation as as requiring the concatenation 
operator per SQL 2003:

<array value expression> ::=    <array concatenation>  | <array primary>
<array concatenation> ::=   <array value expression 1> <concatenation operator> <array primary>
<concatenation operator> ::= ||

<array value expression 1> ::= <array value expression>
<array primary> ::= <value expression primary>
<value expression primary> ::=    <parenthesized value expression>  | <nonparenthesized value expression primary>
<parenthesized value expression> ::=    <left paren> <value expression> <right paren>
<value expression> ::=    <common value expression>  | <boolean value expression>  | <row value expression>
<common value expression> ::=    <numeric value expression>  | <string value expression>  | <datetime value expression>
| <interval value expression>  | <user-defined type value expression>  | <reference value expression>  | <collection
valueexpression>
 
<nonparenthesized value expression primary> ::=    <unsigned value specification>  | <column reference>  | <set
functionspecification>  | <window function>  | <scalar subquery>  | <case expression>  | <cast specification>  | <field
reference> | <subtype treatment>  | <method invocation>  | <static method invocation>  | <new specification>  |
<attributeor method reference>  | <reference resolution>  | <collection value constructor>  | <array element reference>
| <multiset element reference>  | <routine invocation>  | <next value expression>
 
<collection value constructor> ::=    <array value constructor>  | <multiset value constructor>
<unsigned value specification> ::=    <unsigned literal>  | <general value specification>
<unsigned literal> ::=    <unsigned numeric literal>  | <general literal>
<general literal> ::=    <character string literal>  | <national character string literal>  | <Unicode character string
literal> | <binary string literal>  | <datetime literal>  | <interval literal>  | <boolean literal>
 


What I can't decide now is whether all the above means the anyelement in 
this operation ought to be in parens or not. It seems to me that the 
anyelement can be any literal _except_ a <signed numeric literal>. In 
that case the spec seems to require parenthesis.

Joe

Re: Implicit casts with generic arrays

From

"Zeugswetter Andreas ADI SD"

Date:

06 June 2007, 04:36:50

> For example in 8.2 this is mapped to array_prepend:
>
> regression=# select 'x'::text || array['aa','bb','cc'];
>    ?column?
> --------------
>  {x,aa,bb,cc}
> (1 row)
>
> but with the experimental code you get textcat:
>
> catany=# select 'x'::text || array['aa','bb','cc'];
>   ?column?
> -------------
>  x{aa,bb,cc}
> (1 row)

This is what I would have expected || to give, and not what 8.2 does.
So disregarding the rest of the argument I think that array_[pre|ap]pend
should have other operators.

Andreas

Re: Implicit casts with generic arrays

From

"Pavel Stehule"

Date:

06 June 2007, 05:17:37

>
> > For example in 8.2 this is mapped to array_prepend:
> >
> > regression=# select 'x'::text || array['aa','bb','cc'];
> >    ?column?
> > --------------
> >  {x,aa,bb,cc}
> > (1 row)
> >
> > but with the experimental code you get textcat:
> >
> > catany=# select 'x'::text || array['aa','bb','cc'];
> >   ?column?
> > -------------
> >  x{aa,bb,cc}
> > (1 row)
>
> This is what I would have expected || to give, and not what 8.2 does.
> So disregarding the rest of the argument I think that array_[pre|ap]pend
> should have other operators.
>
> Andreas
>

I thing so current behave is more intuitive and practical. Result
x{aa,bb,cc} is nonsens. Array concation have to have higher priority
than text concation.

Pavel

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

06 June 2007, 11:21:46

Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> In the long run maybe we should choose some other name for the
>> array_append and array_prepend operators to avoid the confusion with
>> concatenation.  It seems to me that "concatenation" normally implies
>> "stringing together similar objects", which these two operators
>> definitely don't do, and so you could argue that || was a bad name
>> for them from the get-go.

> Originally I saw this situation as as requiring the concatenation 
> operator per SQL 2003:

Maybe I am missing something, but the only such construct I see in
SQL2003 is concatenation of arrays of equal rank.  There is nothing
corresponding to array_prepend or array_append.

I do have a plan B if people don't want to rename the operators, though.
It looks to me like we could eliminate the conflict if we invented a new
polymorphic pseudotype called "anynonarray" or some such, which would
act like anyelement *except* it would not match an array.  Then,
declaring the capturing operators as text||anynonarray and
anynonarray||text would prevent them from matching any case where either
side was known to be an array type.  But they would (I think) still win
out in cases such as scalar || 'unknown literal'.  The end result would
be that concatenations involving a known-array value would be array
concatenation, but you could force them to be text concatenation, if
that's what you wanted, by explicitly casting the array value(s) to text.

I was a bit hesitant to propose this since I couldn't immediately think
of any other use-case for such a pseudotype.  It's not a huge amount of
added code (cf. anyenum) but it's definitely a visible wart on the type
system.  Comments?
        regards, tom lane

Re: Implicit casts with generic arrays

From

Alvaro Herrera

Date:

06 June 2007, 11:37:17

Tom Lane wrote:

> I do have a plan B if people don't want to rename the operators, though.
> It looks to me like we could eliminate the conflict if we invented a new
> polymorphic pseudotype called "anynonarray" or some such, which would
> act like anyelement *except* it would not match an array.  Then,
> declaring the capturing operators as text||anynonarray and
> anynonarray||text would prevent them from matching any case where either
> side was known to be an array type.  But they would (I think) still win
> out in cases such as scalar || 'unknown literal'.  The end result would
> be that concatenations involving a known-array value would be array
> concatenation, but you could force them to be text concatenation, if
> that's what you wanted, by explicitly casting the array value(s) to text.
> 
> I was a bit hesitant to propose this since I couldn't immediately think
> of any other use-case for such a pseudotype.  It's not a huge amount of
> added code (cf. anyenum) but it's definitely a visible wart on the type
> system.  Comments?

On the contrary, I would think that it fits nicely to "close the loop"
on the anyarray/anyelement feature set.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

06 June 2007, 12:23:11

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I do have a plan B if people don't want to rename the operators, though.
>> It looks to me like we could eliminate the conflict if we invented a new
>> polymorphic pseudotype called "anynonarray" or some such, which would
>> act like anyelement *except* it would not match an array.
>> ...
>> I was a bit hesitant to propose this since I couldn't immediately think
>> of any other use-case for such a pseudotype.  It's not a huge amount of
>> added code (cf. anyenum) but it's definitely a visible wart on the type
>> system.  Comments?

> On the contrary, I would think that it fits nicely to "close the loop"
> on the anyarray/anyelement feature set.

OK, I'll go code this up and verify that it behaves like I think it will...
        regards, tom lane

Re: Implicit casts with generic arrays

From

Joe Conway

Date:

06 June 2007, 14:38:09

Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Tom Lane wrote:
>>> In the long run maybe we should choose some other name for the
>>> array_append and array_prepend operators to avoid the confusion with
>>> concatenation.  It seems to me that "concatenation" normally implies
>>> "stringing together similar objects", which these two operators
>>> definitely don't do, and so you could argue that || was a bad name
>>> for them from the get-go.
> 
>> Originally I saw this situation as as requiring the concatenation 
>> operator per SQL 2003:
> 
> Maybe I am missing something, but the only such construct I see in
> SQL2003 is concatenation of arrays of equal rank.  There is nothing
> corresponding to array_prepend or array_append.

Well, I've never claimed to be particularly good at interpreting the SQL 
spec, but as an example...

<array concatenation> ::=   <array value expression 1> || <array primary>
<array primary> ::=
<value expression primary> ::=
<nonparenthesized value expression primary> ::=
<unsigned value specification> ::=
<unsigned literal> ::=    <unsigned numeric literal>

Doesn't this mean that array concatenation should include things like:
   <array value expression> || <unsigned numeric literal>

e.g.
  ARRAY[1,2,3] || 42
?

> I do have a plan B if people don't want to rename the operators, though.
> It looks to me like we could eliminate the conflict if we invented a new
> polymorphic pseudotype called "anynonarray" or some such, which would
> act like anyelement *except* it would not match an array.  Then,
> declaring the capturing operators as text||anynonarray and
> anynonarray||text would prevent them from matching any case where either
> side was known to be an array type.  But they would (I think) still win
> out in cases such as scalar || 'unknown literal'.  The end result would
> be that concatenations involving a known-array value would be array
> concatenation, but you could force them to be text concatenation, if
> that's what you wanted, by explicitly casting the array value(s) to text.

That sounds reasonable to me.

Joe

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

06 June 2007, 17:22:55

Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> Maybe I am missing something, but the only such construct I see in
>> SQL2003 is concatenation of arrays of equal rank.  There is nothing
>> corresponding to array_prepend or array_append.

> Well, I've never claimed to be particularly good at interpreting the SQL 
> spec, but as an example...

> <array concatenation> ::=
>     <array value expression 1> || <array primary>
> <array primary> ::=
> <value expression primary> ::=
> <nonparenthesized value expression primary> ::=
> <unsigned value specification> ::=
> <unsigned literal> ::=
>      <unsigned numeric literal>

> Doesn't this mean that array concatenation should include things like:
>     <array value expression> || <unsigned numeric literal>

No, because the first syntax rule for that is
        1) The declared type of <value expression primary> shall be an           array type.

However, assuming that the anynonarray idea works out, we can do that
and not worry about touching the array operators.
        regards, tom lane

Re: Implicit casts with generic arrays

From

Tom Lane

Date:

06 June 2007, 18:04:02

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It looks to me like we could eliminate the conflict if we invented a new
>> polymorphic pseudotype called "anynonarray" or some such, which would
>> act like anyelement *except* it would not match an array.
> ...
> On the contrary, I would think that it fits nicely to "close the loop"
> on the anyarray/anyelement feature set.

OK, I hacked this together and it seems to behave at least as reasonably
as 8.2 does.  "8.3" here means HEAD + anynonarray + capturing concat
operators.  I used integer as an example of a type for which 8.2 has an
implicit cast to text, and point as an example of a type for which it
doesn't:

Expression            8.3        8.2

text || text            text concat    text concat
text || 'unknown'        text concat    text concat
text || text[]            array concat    array concat
text || non-text array        error        error
text || non-text scalar        text concat    text concat [1]

integer || integer        error        text concat
integer || 'unknown'        text concat    text concat
integer || integer[]        array concat    array concat
integer || non-integer array    error        error
integer || non-integer scalar    error        text concat [1]

point || point            error        error
point || 'unknown'        text concat    'array value must start ...'
point || point[]        array concat    array concat
point || non-point array    error        error
point || non-point scalar    error        error

text[] || text[]        array concat    array concat
text[] || 'unknown'        error        error
text[] || non-text array    error        error
text[] || non-text scalar    error        error

[1] for types for which 8.2 has an implicit cast to text, else it fails.
These are:bigintsmallintintegeroidrealdouble precisionnumericdatetime without time zonetime with time zonetimestamp
withouttime zonetimestamp with time zoneinterval

(I was interested to find that there were cases where 8.2 would come out
with the dreaded "array value must start with "{" or dimension
information" error.)

I think that the above chart is pretty defensible; the only cases that
fail now where they worked before are concatenations where neither side
is either text or an unadorned string literal.  Frankly, I think this:

catany=# select 3 || 0.4;
ERROR:  operator does not exist: integer || numeric

is way preferable to this:

regression=# select 3 || 0.4;?column?
----------30.4
(1 row)

which is what 8.2 does --- if you want text concatenation you should
make at least *some* effort to signal that, like casting one side to
text or at least quoting it.  Run-together concatenations like

catany=# select 'sin(' || 2 || ')';?column?
----------sin(2)
(1 row)

will work as long as at least one of the first two concatenated items is
textual or an unadorned string literal.

Barring objections I'll clean this up and commit it.
        regards, tom lane